Unable to parse a JSON string taken from an HTML tag attribute in Python


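I'm making an AJAX call to an endpoint (I did not create this API) whose response is JSON. In that JSON there is a key named content whose type is string. As far as I can tell, this content is HTML data that contains some JSON. I'd like to parse the JSON embedded in the HTML data, but when I call json.loads() on the string, I keep getting the following error: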

{JSONDecodeError}JSONDecodeError('Expecting property name enclosed in double quotes: line 1 column 2 (char 1)')

I really don't understand why this error occurs.

This is the JSON string I'm trying to parse:

{\"name\":\"ThreadMainListItemNormalizer\",\"props\":{\"thread\":{\"threadId\":4369992,\"threadTypeId\":1,\"titleSlug\":\"sebamed-sale-extra-soft-baby-cream-ps239-anti-dandruff-shampoo-ps387\",\"title\":\"Sebamed sale - extra soft baby cream \£2.39 / anti dandruff shampoo \£3.87\",\"currentUserVoteDirection\":\"\",\"commentCount\":0,\"status\":\"Activated\",\"isExpired\":false,\"isNew\":true,\"isPinned\":false,\"isTrending\":null,\"isBookmarked\":false,\"isLocal\":false,\"temperature\":0,\"temperatureLevel\":\"\",\"type\":\"Deal\",\"nsfw\":false,\"deletedAt\":null,\"publishedAt\":1720003748,\"voucherCode\":\"\",\"link\":\"https://www.justmylook.com/sebamed-m583\",\"merchant\":{\"merchantId\":45518,\"merchantName\":\"Justmylook\",\"merchantUrlName\":\"justmylook.co.uk\",\"isMerchantPageEnabled\":true},\"price\":2.39,\"nextBestPrice\":0,\"percentage\":0,\"discountType\":null,\"shipping\":{\"isFree\":1,\"price\":0},\"user\":{\"userId\":2701300,\"username\":\"Manish_N\",\"title\":\"\",\"avatar\":{\"path\":\"users/raw/default\",\"name\":\"2701300_6\",\"slotId\":\"default\",\"width\":0,\"height\":0,\"version\":6,\"unattached\":false,\"uid\":\"2701300_6.raw\",\"ext\":\"raw\"},\"persona\":{\"text\":null,\"type\":null},\"isBanned\":false,\"isDeletedOrPendingDeletion\":false,\"isUserProfileHidden\":false}}}}
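For reference, the error can be reproduced with a much shorter string. The sample below is a hypothetical, trimmed-down stand-in for the real response, and it assumes the backslashes are literally present in the string value rather than being an artifact of how the debugger displays it:

```python
import json

# Hypothetical shortened sample in the same escaped form as the real string.
escaped = r'{\"name\":\"ThreadMainListItemNormalizer\",\"props\":{\"thread\":{\"threadId\":4369992}}}'

try:
    json.loads(escaped)
    error_message = None
except json.JSONDecodeError as exc:
    # Keep the message so it can be compared with the one reported above.
    error_message = str(exc)

print(error_message)
```

The message points at char 1: right after the opening brace the parser finds a backslash rather than the double quote that should start a key.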

If I paste the JSON string above into an online JSON validator, it reports invalid JSON. However, when I run the string through an online unescape tool, I get the following output:

"name":"ThreadMainListItemNormalizer","props":{"thread":{"threadId":4369991,"threadTypeId":1,"titleSlug":"samsung-55-qn700c-neo-qled-8k-hdr-smart-tv","title":"Samsung 55\" QN700C Neo QLED 8K HDR Smart TV Sold by Reliant Direct FBA","currentUserVoteDirection":"","commentCount":0,"status":"Activated","isExpired":false,"isNew":true,"isPinned":false,"isTrending":null,"isBookmarked":false,"isLocal":false,"temperature":0.59,"temperatureLevel":"Hot1","type":"Deal","nsfw":false,"deletedAt":null,"publishedAt":1720003637,"voucherCode":"","link":"https://www.amazon.co.uk/dp/B0BWFNLPTP?smid=A2CN43WDI0AWCL","merchant":{"merchantId":1650,"merchantName":"Amazon","merchantUrlName":"amazon-uk","isMerchantPageEnabled":true},"price":999,"nextBestPrice":1198,"percentage":0,"discountType":null,"shipping":{"isFree":1,"price":0},"user":{"userId":2679277,"username":"ben.jammin","title":"","avatar":{"path":"users/raw/default","name":"2679277_1","slotId":"default","width":0,"height":0,"version":1,"unattached":false,"uid":"2679277_1.raw","ext":"raw"},"persona":{"text":null,"type":null},"isBanned":false,"isDeletedOrPendingDeletion":false,"isUserProfileHidden":false}}}}

This actually is valid JSON. My problem arises when I try to replicate what the unescape tool does and unescape the string in Python.

I have tried the following solutions:

  • Using ast.literal_eval(), which fails with the following error:

    {SyntaxError}SyntaxError('unexpected character after line continuation character', ('<unknown>', 1, 3, '{\\"name\\":\\"ThreadMainListItemNo...:null,\\"type\\":null},\\"isBanned\\":false,\\"isDeletedOrPendingDeletion\\":false,\\"isUserProfileHidden\\":false}}}}', 1, 0))
    
  • Using the .encode('raw_unicode_escape').decode('unicode_escape') approach I found suggested elsewhere, but after calling json.loads() on the unescaped string I get the following error:

    {JSONDecodeError}JSONDecodeError('Invalid \\escape: line 1 column 224 (char 223)')
    
python json beautifulsoup
1 Answer

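Just use a simple str.replace; it seems to be enough in this case. Even if there is an escaped backslash, as in a\\"b, the replace strategy will still keep one backslash character:

new_json = old_json.replace(r'\"', '"')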
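One caveat worth checking: the string quoted in the question also contains \£, which is not a valid JSON escape and looks like the cause of the Invalid \escape error. If that backslash really is in the data, replacing only the escaped quotes would still leave it behind. The sketch below is one possible alternative, not part of the original answer: it strips a single level of backslash escaping with a regex. The helper name unescape_one_level and the shortened sample are hypothetical, and the approach assumes the attribute value was escaped simply by prefixing characters with a backslash.

```python
import json
import re


def unescape_one_level(text):
    # Remove one level of backslash escaping: a backslash followed by any
    # character is replaced by that character alone.
    return re.sub(r'\\(.)', lambda match: match.group(1), text)


# Hypothetical shortened sample modelled on the string in the question.
old_json = r'{\"title\":\"baby cream \£2.39\",\"price\":2.39,\"isNew\":true}'

# Replacing only the escaped quotes leaves the stray backslash before the
# pound sign, which json.loads rejects.
try:
    json.loads(old_json.replace(r'\"', '"'))
    replace_error = None
except json.JSONDecodeError as exc:
    replace_error = str(exc)

# Stripping one escaping level from every escaped character parses cleanly.
data = json.loads(unescape_one_level(old_json))
print(data["title"])
```

With this helper, a doubly escaped quote inside a value (three backslashes followed by a quote) comes out as a single backslash plus quote, which json.loads then reads as a literal quote character.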
