I have a DataFrame:
# col3 is padded to four rows so that all columns have the same length
df = pd.DataFrame({'col1': ['afs', 'chk', 'est', 'app'],
                   'col2': ['ofcr', 'guar', 'ltv', 'gender'],
                   'col3': ['code', 'mod', 'xxx', 'zzz']})
And I have a dictionary:
dict = {'ofcr':'officer','chk':'check','mod':'modification','est':'estimated','app':'application', 'gender':'gender'}
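I need to go through df and replace every value that matches a dictionary key with the corresponding dictionary value. I can use: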
df["col1"] = df["col1"].map(dict)
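But this converts the non-matches to NaN. What I want is to keep the token as it is, but append "-UNKNOWN" or some similarly obvious marker to the string so that I can deal with it later. I tried a loop: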
for tok in df['col1']:
    if tok in dict.values():
        df.replace(dict, inplace=True)
    if tok not in dict.values():
        df.replace(tok, tok + '-UNKNOWN', inplace=True)
    print(tok)
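This does replace the matches (in every column, not just the one passed in, which is fine for my purposes), but it leaves the non-matches untouched.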
You can use applymap():
import pandas as pd

df = pd.DataFrame({'col1': ['afs', 'chk', 'est', 'app'],
                   'col2': ['ofcr', 'guar', 'ltv', 'gender'],
                   'col3': ['code', 'mod', 'xxx', 'zzz']})
dct = {'ofcr': 'officer', 'chk': 'check', 'mod': 'modification',
       'est': 'estimated', 'app': 'application', 'gender': 'gender'}

# Look up each cell in dct; if the key is missing, keep the token and append a marker
print(df.applymap(lambda x: dct.get(x, x + '-UNKNOWN')))
Prints:
          col1          col2          col3
0  afs-UNKNOWN       officer  code-UNKNOWN
1        check  guar-UNKNOWN  modification
2    estimated   ltv-UNKNOWN   xxx-UNKNOWN
3  application        gender   zzz-UNKNOWN
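If you would rather stay close to the map() call from the question, a column-wise variant is also possible. The following is only a sketch: mark_unknown is a hypothetical helper name, not a pandas function, and it assumes every cell is a string and that no dictionary value is itself NaN. It relies on Series.map leaving NaN for missing keys and then fills those gaps with the original token plus the marker. As far as I know, applymap is deprecated since pandas 2.1 in favour of DataFrame.map, so a column-wise approach like this also avoids the deprecation warning on newer versions.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['afs', 'chk', 'est', 'app'],
                   'col2': ['ofcr', 'guar', 'ltv', 'gender'],
                   'col3': ['code', 'mod', 'xxx', 'zzz']})
dct = {'ofcr': 'officer', 'chk': 'check', 'mod': 'modification',
       'est': 'estimated', 'app': 'application', 'gender': 'gender'}


def mark_unknown(col, mapping, suffix='-UNKNOWN'):
    """Hypothetical helper: translate tokens via mapping, tag the rest with suffix."""
    mapped = col.map(mapping)            # tokens missing from mapping become NaN
    return mapped.fillna(col + suffix)   # fill those with the original token + suffix


# Single column, as in the question
col1 = mark_unknown(df['col1'], dct)

# Every column at once; apply passes each column to the helper as a Series
out = df.apply(mark_unknown, mapping=dct)
print(out)
```

Note that the helper should be applied only once to a given column: on a second pass the already translated values (for example 'check') are no longer dictionary keys and would be tagged as unknown too.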