I have a JSON file that I load into a pandas DataFrame:
# Bring in data
audit = pd.read_json('audit_2018-03-02.json')
Several of the columns now hold a list of strings in each row, for example:
foo
[By Audience, By Vendor]
[By Month, By Keyword, By Ad Group, By Service]
[By Month, By To Date, By Keyword, By Ad Group]
I am trying to iterate over the column foo and build a DataFrame from it.
This is what I tried:
list_of_records = [
(i['By Month'],
i['By Keyword'],
i['By Ad Group'],
i['By Audience'],
i['By Vender'],
i['By Week'],
i['By To Date'],
i['By Creative'],
i['By Strategy'],
i['By Converstion'],
i['By Geo'],
i['By Campaign']
)
for i, in zip(audit['foo'])
]
Dimensions_Measured = pd.DataFrame.from_records(
list_of_records,
columns = ['By Month', 'By Keyword', 'By Ad Group', 'By Audience', 'By Vender',
'By Week', 'By To Date', 'By Creative', 'By Strategy', 'By Converstion',
'By Geo', 'By Campaign']
)
But I get the error TypeError: list indices must be integers, not str
Any ideas on how to achieve this?
Should I apply some kind of one-hot encoding and then create the DataFrame?
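Each element of foo is a list, so i['By Month'] indexes a list with a string, which is what raises the TypeError.
You can convert a Series of lists into a DataFrame with one column per list position via pd.Series.values.tolist():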
import pandas as pd

foo = pd.Series([['By Audience', 'By Vendor'],
                 ['By Month', 'By Keyword', 'By Ad Group', 'By Service'],
                 ['By Month', 'By To Date', 'By Keyword', 'By Ad Group']])
df = pd.DataFrame(foo.values.tolist())
#              0           1            2            3
# 0  By Audience   By Vendor         None         None
# 1     By Month  By Keyword  By Ad Group   By Service
# 2     By Month  By To Date   By Keyword  By Ad Group
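If what you actually want is the one-hot layout mentioned in the question (one column per label, 1 where the row's list contains it), here is a minimal sketch. It assumes none of the labels contain the '|' character, since that is used as a temporary separator; the name indicators is only illustrative.

```python
import pandas as pd

foo = pd.Series([['By Audience', 'By Vendor'],
                 ['By Month', 'By Keyword', 'By Ad Group', 'By Service'],
                 ['By Month', 'By To Date', 'By Keyword', 'By Ad Group']])

# Join each list into a single '|'-separated string, then let
# str.get_dummies split on '|' and build one 0/1 column per distinct label.
# Assumption: no label contains '|' itself.
indicators = foo.str.join('|').str.get_dummies(sep='|')

print(indicators)
```

The columns come out in alphabetical order, and every row keeps the index of the original Series, so the result can be joined back onto audit if needed.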