I have the following data:
json_str = "[{“key1”: “value1”, “key2”= “value2”,
“key3”: “{“key_a”: “value_a1”, “key_b”: “value_b1”, “key_c”: “value_c1”}”,“key4”: 4},
{“key1”: “value5”, “key2”= “value6”,
“key3”: “{“key_a”: “value_a2”, “key_b”: “value_b2”, “key_c”: “value_c2”}”,“key4”: 8}]"
I want to convert it to a pandas DataFrame. I tried this:
#code1
data = pd.read_json(json_str)
print(data)
#code2
data = pd.read_json(json_str, typ ='series')
print(data)
#code3
data = pd.DataFrame.from_dict([json_str], orient='columns', dtype= None)
print(data)
All three attempts give the same error:
ValueError: Unexpected character found when decoding object value
Then I tried:
data = json.loads(json_str)
print(data)
Error: json.decoder.JSONDecodeError: Expecting ',' delimiter
I can't use .replace(), because I need a column named "key3" that holds the JSON value, for example: {“key_a”: “value_a1”, “key_b”: “value_b1”, “key_c”: “value_c1”}
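A quick check like the one below might help narrow down what trips the parsers. It is only a sketch: `sample` is a hypothetical one-record string in the same shape as `json_str`, and it assumes the real data contains the curly quotes and the `=` exactly as pasted here.

```python
import json

# Hypothetical one-record sample in the same shape as json_str (assumption:
# the real string contains these curly quotes and the "=" after key2).
sample = '[{“key1”: “value1”, “key2”= “value2”, “key4”: 4}]'

try:
    json.loads(sample)
    parsed = True
except json.JSONDecodeError as exc:
    parsed = False
    print("json.loads failed:", exc)

# Collect every non-ASCII character; JSON strings must be delimited by the
# plain ASCII double quote (U+0022), so curly quotes are not accepted.
suspects = sorted({ch for ch in sample if ord(ch) > 127})
print(suspects)
```

If the curly quotes show up in `suspects`, the string is not valid JSON as it stands and has to be cleaned before any JSON parser will accept it.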
The data needs to be cleaned first. Here is one way to do it:
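import ast
from functools import reduce
import pandas as pd

# Replacement table; dicts keep insertion order, so the curly quotes are replaced first
di = {'“': "'", '”': "'", "'{": '{', "}'": '}', '=': ':'}
# Apply each replacement in turn to the raw string from the question
new = reduce(lambda x, y: x.replace(y, di[y]), di, json_str)
# pd.json_normalize replaces pd.io.json.json_normalize, deprecated since pandas 1.0
df = pd.json_normalize(ast.literal_eval(new))
print(df)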
     key1    key2 key3.key_a key3.key_b key3.key_c  key4
0  value1  value2   value_a1   value_b1   value_c1     4
1  value5  value6   value_a2   value_b2   value_c2     8
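If the goal is the one stated in the question, a single "key3" column that holds the whole nested value instead of flattened key3.* columns, a variation along these lines might work. It is a sketch under assumptions: `raw` is a hypothetical shortened sample in the same shape as `json_str`, and the names `records`, `df_nested` and `df_text` are made up for illustration. The cleaning step is the same; only the DataFrame construction differs.

```python
import ast
import json
from functools import reduce

import pandas as pd

# Assumption: same shape as json_str in the question, shortened to one record.
raw = ('[{“key1”: “value1”, “key2”= “value2”, '
       '“key3”: “{“key_a”: “value_a1”, “key_b”: “value_b1”}”, “key4”: 4}]')

# Same replacement table as above; the curly quotes must be replaced first.
di = {'“': "'", '”': "'", "'{": '{', "}'": '}', '=': ':'}
cleaned = reduce(lambda s, old: s.replace(old, di[old]), di, raw)

# literal_eval yields a list of dicts in which key3 is a nested dict.
records = ast.literal_eval(cleaned)

# pd.DataFrame does not flatten nested dicts, so key3 stays a single column.
df_nested = pd.DataFrame(records)

# Optionally store key3 as a JSON string rather than a Python dict.
df_text = df_nested.assign(key3=df_nested['key3'].map(json.dumps))
print(df_text)
```

Note that the blanket "=" replacement would also alter any value that happens to contain "=", so this is only safe for data shaped like the sample.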