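I can take a text file, read each line, create a dictionary per line, update (append) it for each line, and store the result in a JSON file. The problem is that the JSON file cannot be read back correctly. Does the error point to a problem with how the file is stored?
The text file looks like: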
84.txt;Frankenstein, or the Modern Prometheus;Mary Wollstonecraft (Godwin) Shelley
98.txt;A Tale of Two Cities;Charles Dickens
...
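Each line in this format appears to carry the book number first and the author last, with the title in between. A minimal sketch of parsing one line on that assumption follows. `parse_book_line` is a hypothetical helper, not part of the original script, and joining the middle fields is just one way to handle a title that itself contains a semicolon.

```python
def parse_book_line(line):
    """Split '<number>.txt;<title>;<author>' into a dict.

    Assumption: the first field is the book number and the last field is
    the author. Everything in between is treated as the title, so a title
    that contains ';' is kept in one piece.
    """
    fields = line.strip().split(';')
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 ';'-separated fields: {line!r}")

    # Drop a leading '#' and surrounding spaces, then the '.txt' suffix.
    number = fields[0].strip().lstrip('#').strip()
    if number.endswith('.txt'):
        number = number[:-len('.txt')]

    return {
        'BookNumber': number,
        'BookTitle': ';'.join(fields[1:-1]).strip(),
        'AuthorName': fields[-1].strip(),
    }


record = parse_book_line('98.txt;A Tale of Two Cities;Charles Dickens\n')
print(record)
```

Because the title is rebuilt from every field between the first and the last, no special case is needed for a single book number.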
import json
import re

path = "C:\\...\\data\\"
books = {}
books_json = {}
final_book_json = {}

file = open(path + 'books\\set_of_books.txt', 'r')
json_list = file.readlines()
open(path + 'books\\books_json.json', 'w').close()  # used to clean each test
json_create = []
i = 0

for line in json_list:
    line = line.replace('#', '')
    line = line.replace('.txt', '')
    line = line.replace('\n', '')
    line = line.split(';', 4)
    BookNumber = line[0]
    BookTitle = line[1]
    AuthorName = line[-1]
    if BookNumber == ' 2701':
        BookNumber = line[0]
        BookTitle1 = line[1]
        BookTitle2 = line[2]
        AuthorName = line[3]
        BookTitle = BookTitle1 + ';' + BookTitle2  # needed to combine title into one to fit dict format
    books = json.dumps({'AuthorName': AuthorName, 'BookNumber': BookNumber, 'BookTitle': BookTitle})
    books_json = json.loads(books)
    final_book_json.update(books_json)
    # Appends one JSON object to the output file for every input line
    with open(path + 'books\\books_json.json', 'a') as out_put:
        json.dump(books_json, out_put)

# Read the whole file back as a single JSON document
with open(path + 'books\\books_json.json', 'r') as out_put:
    print(json.load(out_put))
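The reported error is: JSONDecodeError: Extra data: line 1 column 133 (char 132), which points at the boundary between the first "}" and "{". I'm not sure whether the JSON is supposed to be laid out as a flat file like this. Opened in an editor, the output file looks like:
{"AuthorName": "Mary Wollstonecraft (Godwin) Shelley", "BookNumber": "84", "BookTitle": "Frankenstein, or the Modern Prometheus"}{"AuthorName": "Charles Dickens", "BookNumber": "98", "BookTitle": "A Tale of Two Cities"} ...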
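The error is consistent with how the file was written: each `json.dump` call in append mode writes one complete object, so the file ends up holding several objects back to back, while `json.load` expects exactly one JSON value. One possible fix, sketched here under assumptions, is to collect the records in a list and dump that list once. The names `write_books` and `read_books`, the temporary output path, and the sample records are illustrative only.

```python
import json
import os
import tempfile


def write_books(records, json_path):
    """Write all records as one JSON array, i.e. a single top-level value."""
    with open(json_path, 'w', encoding='utf-8') as out_file:
        json.dump(records, out_file, indent=2)


def read_books(json_path):
    """Load the array written by write_books back into a list of dicts."""
    with open(json_path, 'r', encoding='utf-8') as in_file:
        return json.load(in_file)


# Two objects written back to back are not one JSON document.
try:
    json.loads('{"BookNumber": "84"}{"BookNumber": "98"}')
except json.JSONDecodeError as err:
    print('concatenated objects fail:', err.msg)

# Sample records standing in for the dictionaries built in the loop.
records = [
    {'AuthorName': 'Mary Wollstonecraft (Godwin) Shelley',
     'BookNumber': '84',
     'BookTitle': 'Frankenstein, or the Modern Prometheus'},
    {'AuthorName': 'Charles Dickens',
     'BookNumber': '98',
     'BookTitle': 'A Tale of Two Cities'},
]

# Hypothetical output location; the original script writes under its own path.
json_path = os.path.join(tempfile.mkdtemp(), 'books_json.json')
write_books(records, json_path)
loaded = read_books(json_path)
print(len(loaded), 'records read back')
```

If appending one record at a time matters, a common alternative is to write one object per line and parse each line separately with `json.loads`.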
I ended up changing my approach and used pandas to read the text, then split the single-cell input.
import pandas as pd

books = pd.read_csv(path + 'books\\set_of_books.txt', sep='\t', names=('r', 't', 'a'))
#print(books.head(10))

# Function to clean the raw ('r') input data
def clean_line(cell):
    ...
    return cell

books['r'] = books['r'].apply(clean_line)
books = books['r'].str.split(';', expand=True)
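Once the fields sit in separate columns, pandas can also write them out as one valid JSON document. The sketch below assumes three fields per line (number, title, author) with the number already cleaned. The column names and the in-memory sample rows are illustrative, and the two-step split is only one way to keep a semicolon inside a title.

```python
import json

import pandas as pd

# Sample rows standing in for the cleaned 'r' column.
raw = pd.Series([
    '84;Frankenstein, or the Modern Prometheus;Mary Wollstonecraft (Godwin) Shelley',
    '98;A Tale of Two Cities;Charles Dickens',
])

# Split the book number off from the left, then the author off from the
# right, so any ';' inside the title stays with the title.
head = raw.str.split(';', n=1, expand=True)
tail = head[1].str.rsplit(';', n=1, expand=True)

books = pd.DataFrame({
    'BookNumber': head[0],
    'BookTitle': tail[0],
    'AuthorName': tail[1],
})

# orient='records' produces a single JSON array with one object per row.
json_text = books.to_json(orient='records')
records = json.loads(json_text)
print(records[0]['BookTitle'])
```

Passing a file path to `to_json` instead of reading its return value would write the same array to disk.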