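Hi, I have a data file with no header row and 14 fields, and I'm using a regex matching method to check several of the fields against patterns. I check the data column by column. The validation itself works fine; the only problem is that it returns True/False. When a value doesn't match its pattern, I'd like to print the whole row, or print the malformed value. I'm not sure how to do that.
Below is the code I have tried so far: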
import pandas as pd
import re

try:
    # Tab-separated file with no header row; read every field as a string
    df = pd.read_csv(r"E:\data.txt", delimiter='\t', dtype=str, header=None)
    x = len(df.index)
    print(x)
except Exception as e:
    print(e)

# str.match returns a boolean Series, so each assignment below
# overwrites the column with True/False
try:
    df[0] = df[0].str.match(r'(\d\d\d\d-\d\d-\d\d)')
except Exception as e:
    print(e)
try:
    df[1] = df[1].str.match(r'(\d\d:\d\d:\d\d)')
except Exception as e:
    print(e)
try:
    df[2] = df[2].str.match(r'(\d\d.\d\d\d\d*)')
except Exception as e:
    print(e)
try:
    df[3] = df[3].str.match(r'-(\d\d.\d\d\d\d*)')
except Exception as e:
    print(e)
try:
    df[5] = df[5].str.match(r'(\d\d*)')
except Exception as e:
    print(e)
try:
    df[6] = df[6].str.match(r'(\d\d.\d\d\d\d*)')
except Exception as e:
    print(e)
try:
    df[7] = df[7].str.match(r'(\d\d.\d\d\d\d*)')
except Exception as e:
    print(e)
print(df)
try:
    df[13] = df[13].str.match(r'(\d\d*)')
except Exception as e:
    print(e)
print(df)
Below is the sample dataset:
018-01-01 00:04:43 43.71678 -79.44384 Tow 53 43.75544 -79.43828 C100 1 WL28 CCG P2 00:16:29
2018-01-01 00:09:10 43.78304 -79.23663 Lockout 30 43.79497 -79.23394 C2 4 WL5 CCG 00:10:05
2018-01-01 00:15:49 43.24116 -79.85282 Lockout 134 43.39425 -79.98044 H23 9 F109 CCG 00:48:16
2018-01-01 00:16:47 43.76756 -79.41196 Flatbed Tow 435 43.77409 -79.49313 C23 10 FB88 CCG 00:18:19
2018-01-01 00:18:53 43.26671 -79.96222 Tow 172 43.2412 -79.85274 H23 11 F109 CCG 02:42:04
2018-01-01 00:22:59 43.8088942 -79.2698542 No service 35 43.78196 -79.2351 C2 50001 WL5 CLUB_AUTO 00:23:04
2018-01-01 00:25:39 43.57866 -79.63927 Tow 304 43.59991 -79.67094 C950 14 F157 CCG 02:46:21
2018-01-01 00:26:27 43.72097 -79.47553 Lockout 152 43.81375 -79.36767 C950 15 F124 CCG P2 00:50:35
2018-01-01 00:26:56 43.785702 -79.729198 Jump Start/Battery Test 55 43.68537 -79.80871 C28 50003 FB6 CCG 00:52:26
2018-01-01 00:28:08 43.79901 -79.42031 Flatbed Tow 67 43.94571 -79.44134 C950 50004 F124 CLUB_AUTO 00:35:10
2018-01-01 00:33:26 43.67615 -79.7707 Tow 84 0 0 C28 19 FB6 CCG P2 00:54:30
2018-01-01 00:41:30 44.07323 -79.48489 Tow 9 44.06664 -79.42858 C512 22 FB1 CCG 00:42:50
2018-01-01 00:43:36 43.62484 -79.55517 Tow 43 43.68514 -79.59623 C16 23 WL1 CCG P5 00:53:31
2018-01-01 00:43:40 43.7088 -79.39456 Flat tire, with Spare 64 43.70485 -79.29617 A18 24 LS1 CCG 00:47:33
2018-01-01 00:47:24 43.87896 -79.49169 Tow 96 43.81937 -79.56436 C950 26 F157 CCG 01:12:33
2018-01-01 00:48:17 44.90311 -79.43861 Winch/Extrication 87 0 0 R130 27 SHOP CCG P2 00:58:46
2018-01-01 00:48:38 0 0 Flat tire, with Spare 72 43.22824 -79.77316 G30 50006 WL1 RAP_RSO RP 00:54:38
2018-01-01 00:51:39 44.26556 -78.35797 No service 55 44.25686 -78.28151 P151 30 FB3 CCG 00:54:11
2018-01-01 00:52:36 43.68888 -79.32561 Tow 272 43.7969 -79.42919 C950 31 F117 CCG 04:04:21
2018-01-01 00:53:08 43.68968 -79.74152 No service 132 43.69461 -79.71206 C28 32 FB6 CCG 02:30:03
2018-01-01 00:55:14 44.2455058 -76.9499712 Tow 49 44.28096 -76.56847 K112M 50007 MD2 CLUB_AUTO 00:57:35
2018-01-01 00:57:00 46.31401 -83.94554 Winch/Extrication 189 46.52989 -84.37611 R829 36 FB2 CCG 01:11:20
2018-01-01 00:58:23 43.444523 -80.497246 No service 59 43.48807 -80.5573 G23 50008 F105 CCG 01:10:51
2018-01-01 00:59:31 42.26581 -82.4183 Tow 84
IIUC, maybe you could try something like this:
dd = {
    1: r'(\d\d:\d\d:\d\d)',
    2: r'(\d\d.\d{4,})',
    3: r'-(\d\d.\d{4,})',
    5: r'(\d\d*)',
}
for i, j in dd.items():
    print(f'Checking column {i}')
    # Keep only the rows whose value in column i does not match the pattern
    print(df[~df[i].astype(str).str.match(j)])
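
If you would rather see the malformed values themselves instead of whole rows, the same idea can be extended roughly as below. This is only a sketch under a few assumptions of mine: the three-row sample is a made-up cut of the data above (first four fields only), the names patterns, bad_cells and bad_rows are hypothetical, and I have swapped in str.fullmatch with an escaped dot so that the entire value has to fit the pattern, which is stricter than the str.match calls in the question.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same tab-separated layout
# (only the first four fields, for brevity).
sample = (
    "018-01-01\t00:04:43\t43.71678\t-79.44384\n"
    "2018-01-01\t00:09:10\t43.78304\t-79.23663\n"
    "2018-01-01\t00:48:38\t0\t0\n"
)
df = pd.read_csv(io.StringIO(sample), delimiter="\t", dtype=str, header=None)

# Column index -> pattern. These patterns are assumptions; adjust as needed.
patterns = {
    0: r"\d{4}-\d{2}-\d{2}",
    1: r"\d{2}:\d{2}:\d{2}",
    2: r"\d{2}\.\d{4,}",
    3: r"-\d{2}\.\d{4,}",
}

bad_cells = []
for col, pattern in patterns.items():
    # fullmatch requires the whole value to match;
    # na=False counts missing values as not matching.
    ok = df[col].str.fullmatch(pattern, na=False)
    for row, value in df.loc[~ok, col].items():
        bad_cells.append((row, col, value))

# Report each malformed value with its position
for row, col, value in bad_cells:
    print(f"row {row}, column {col}: {value!r}")

# Whole rows containing at least one malformed value; df itself is not modified
bad_rows = df.loc[sorted({row for row, _, _ in bad_cells})]
print(bad_rows)
```

Because the boolean mask is kept in a separate variable rather than assigned back to df[col], the original values stay available for printing.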