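I'm standardizing the education data in my dataset against a dictionary of university/college names. How can I run code against the dictionary and get the desired output? The data consists of abbreviations and colloquial names.
Could someone provide an example in R? I'm also open to trying it in Python; R is just my preference.
Here is a sample of my dictionary: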
*University Name Dictionary
California Institute of Technology
New York University
Massachusetts Institute of Technology
Georgia Institute of Technology
Rutgers University
University of California, Berkley
University of California, Los Angeles
Here is my data:
*Education
Cal Tech
NYU
MIT
Ga Tech
Georgia Tech
Rutgers
Berkley
UCLA
This is what I want:
*Education      *New Education
Cal Tech        California Institute of Technology
NYU             New York University
MIT             Massachusetts Institute of Technology
Ga Tech         Georgia Institute of Technology
Georgia Tech    Georgia Institute of Technology
Rutgers         Rutgers University
Berkley         University of California, Berkley
UCLA            University of California, Los Angeles
Apologies if a solution already exists; I just couldn't find one. Any help would be greatly appreciated.
pandas has a replace(dictionary) method, where dictionary looks like {"Cal Tech": "California Institute of Technology"}. Since pandas.DataFrame was inspired by R, R may have something similar.
import pandas as pd

# Mapping from each abbreviation or colloquial name to the official name
data = {
    'Cal Tech': 'California Institute of Technology',
    'NYU': 'New York University',
    'MIT': 'Massachusetts Institute of Technology',
    'Ga Tech': 'Georgia Institute of Technology',
    'Georgia Tech': 'Georgia Institute of Technology',
    'Rutgers': 'Rutgers University',
    'Berkley': 'University of California, Berkley',
    'UCLA': 'University of California, Los Angeles',
}

df = pd.DataFrame({
    'Education': ['Cal Tech', 'NYU', 'MIT', 'Ga Tech', 'Georgia Tech', 'Rutgers', 'Berkley', 'UCLA']
})

# Replace every cell that exactly matches a key with the mapped value
df['New Education'] = df['Education'].replace(data)
print(df)
Result:
Education New Education
0 Cal Tech California Institute of Technology
1 NYU New York University
2 MIT Massachusetts Institute of Technology
3 Ga Tech Georgia Institute of Technology
4 Georgia Tech Georgia Institute of Technology
5 Rutgers Rutgers University
6 Berkley University of California, Berkley
7 UCLA University of California, Los Angeles
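Note that replace leaves any value without a key in the dictionary unchanged, so gaps in the mapping can go unnoticed. The following is only a minimal sketch of one way to surface them, using Series.map instead of replace. The shortened mapping and the 'Stanford' entry are hypothetical sample data, not part of the question.

```python
import pandas as pd

# Hypothetical sample: 'Stanford' is deliberately missing from the mapping
mapping = {
    'NYU': 'New York University',
    'MIT': 'Massachusetts Institute of Technology',
}
df = pd.DataFrame({'Education': ['NYU', 'MIT', 'Stanford']})

# map() yields NaN for every value that has no key in the mapping
mapped = df['Education'].map(mapping)

# Raw values that still need a dictionary entry
unmatched = df.loc[mapped.isna(), 'Education'].unique().tolist()

# Keep the original text wherever no mapping exists
df['New Education'] = mapped.fillna(df['Education'])

print(unmatched)
print(df)
```

Reviewing the unmatched list after each run is a cheap way to decide which abbreviations to add to the dictionary next.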
With regex=True, replace treats each key as a regular expression, so it can also substitute matches inside longer strings:
import pandas as pd

# Same mapping as before; with regex=True each key is used as a pattern
data = {
    'Cal Tech': 'California Institute of Technology',
    'NYU': 'New York University',
    'MIT': 'Massachusetts Institute of Technology',
    'Ga Tech': 'Georgia Institute of Technology',
    'Georgia Tech': 'Georgia Institute of Technology',
    'Rutgers': 'Rutgers University',
    'Berkley': 'University of California, Berkley',
    'UCLA': 'University of California, Los Angeles',
}

df = pd.DataFrame({
    'Education': ['I am from MIT']
})

# Substitute any occurrence of a key, even inside a longer string
df['New Education'] = df['Education'].replace(data, regex=True)
print(df)
Result:
Education New Education
0 I am from MIT I am from Massachusetts Institute of Technology
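One caveat: a plain regex replacement matches substrings, so a key such as 'MIT' would also match inside a longer word like 'SMITH'. Below is a rough sketch of one possible guard, using only the standard re module with word boundaries and escaped keys. The expand helper, the shortened mapping, and the 'SMITH College' input are made up for illustration.

```python
import re

# Shortened mapping for illustration
mapping = {
    'MIT': 'Massachusetts Institute of Technology',
    'NYU': 'New York University',
}

# Try longer keys first so a long key wins over a shorter key it contains
keys = sorted(mapping, key=len, reverse=True)

# \b restricts matches to whole words; re.escape keeps keys literal
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

def expand(text):
    """Replace whole-word occurrences of mapping keys in text."""
    return pattern.sub(lambda m: mapping[m.group(0)], text)

print(expand('I am from MIT'))
print(expand('SMITH College'))
```

To apply it to a column, something like df['Education'].map(expand) should work, since Series.map also accepts a function.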