How do I preserve spaces in the "name" part of name/value pairs when importing JSON-formatted data (Python)?


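I have been looking for help with importing JSON-formatted data from a URL (I am new to working with JSON) and received a great answer to my previous question.

However, I have run into trouble. Some of my property names contain spaces. For example, "Property1" and several other property names in my previous question may actually be "Property1_word1 Property1_word2". The current solution keeps only the first word of each property name. I could get away with that at first, but now I need all of the words. I would appreciate any hints; I have not found a solution so far.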


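Edit (all the information is provided here so that there is no need to refer to the previous post):

I want to import data from a website. First, I saved the website's content (below) as a file. In my previous question, every property name consisted of a single word. Now I am dealing with property names that consist of multiple words. Below is an example in which the names of Property1, Property4, and Property8 contain multiple words.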

{
    "payload": {
        "allShortcutsEnabled": false,
        "fileTree": {
            "": {
                "items": [
                    {
                        "name": "thing",
                        "path": "thing",
                        "contentType": "directory"
                    },
                    {
                        "name": ".repurlignore",
                        "path": ".repurlignore",
                        "contentType": "file"
                    },
                    {
                        "name": "README.md",
                        "path": "README.md",
                        "contentType": "file"
                    },
                    {
                        "name": "thing2",
                        "path": "thing2",
                        "contentType": "file"
                    },
                    {
                        "name": "thing3",
                        "path": "thing3",
                        "contentType": "file"
                    },
                    {
                        "name": "thing4",
                        "path": "thing4",
                        "contentType": "file"
                    },
                    {
                        "name": "thing5",
                        "path": "thing5",
                        "contentType": "file"
                    },
                    {
                        "name": "thing6",
                        "path": "thing6",
                        "contentType": "file"
                    },
                    {
                        "name": "thing7",
                        "path": "thing7",
                        "contentType": "file"
                    },
                    {
                        "name": "thing8",
                        "path": "thing8",
                        "contentType": "file"
                    },
                    {
                        "name": "thing9",
                        "path": "thing9",
                        "contentType": "file"
                    },
                    {
                        "name": "thing10",
                        "path": "thing10",
                        "contentType": "file"
                    },
                    {
                        "name": "thing11",
                        "path": "thing11",
                        "contentType": "file"
                    }
                ],
                "totalCount": 500
            }
        },
        "fileTreeProcessingTime": 5.262188,
        "foldersToFetch": [],
        "reducedMotionEnabled": null,
        "repo": {
            "id": 1234567,
            "defaultBranch": "main",
            "name": "repository",
            "ownerLogin": "contributor",
            "currentUserCanPush": false,
            "isFork": false,
            "isEmpty": false,
            "createdAt": "2023-10-31",
            "ownerAvatar": "https://avatars.repurlusercontent.com/u/98765432?v=1",
            "public": true,
            "private": false,
            "isOrgOwned": false
        },
        "symbolsExpanded": false,
        "treeExpanded": true,
        "refInfo": {
            "name": "main",
            "listCacheKey": "v0:13579",
            "canEdit": false,
            "refType": "branch",
            "currentOid": "identifier"
        },
        "path": "thing2",
        "currentUser": null,
        "blob": {
            "rawLines": [
                "        C_1H_4   Methane                  ",
                "            5.00000        Property1_word1 Property1_word2                              ",
                "             20.00000        Property2                     ",
                "           500.66500        Property3                              ",
                "           100.00000        Property4_word1 Property4_word2                                           ",
                "         -4453.98887        Property5                                      ",
                "           100.48200        Property6                                   ",
                "            59.75258        Property7                                         ",
                "             5.33645        Property8_word1 Property8_word2                                         ",
                "             0.00000        Property9         ",
                "           645.07777        Property10                                       ",
                "             0.00000        Property11                           ",
                "             0.00000        Property12                           ",
                "             0.00000        Property13                             ",
                "             0.00000        Property14                             ",
                "             0.00000        Property15                             ",
                "             0.00000        Property16                             ",
                "             0.00000        Property17                   ",
                "             0.00000        Property18                            ",
                "             0.00000        Property19                   ",
                "             0.00000        Property20                             ",
                "             0.00000        Property21                   ",
                "             0.00000        Property22                             ",
                "             0.00000        Property23                   ",
                "             0.00000        Property24                    ",
                "             0.00000        Property25                    ",
                "             0.57876        Property26                                           ",
                "             4.00000        Property27                                               ",
                "             0.00000        Property28                    ",
                "             0.00000        Property29               ",
                "             0.00000        Property30                  ",
                "             0.00000        Property31            ",
                "             0.00000        Property32                  ",
                "             1.00000        Property33                         ",
                "             0.00000        Property34                       ",
                "            26.00000        Property35                             ",
                "             1.44571        Property36                               ",
                "             1.08756        Property37                            ",
                "             0.00000        Property38                          ",
                "             0.00000        Property39                        ",
                "             0.00000        Property40                        ",
                "             6.00000        Property41                       ",
                "             9.00000        Property42                                         ",
                "             0.00000        Property43                                         "
            ],
            "stylingDirectives": [
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                [],
                []
            ],
            "csv": null,
            "csvError": null,
            "dependabotInfo": {
                "showConfigurationBanner": false,
                "configFilePath": null,
                "networkDependabotPath": "/contributor/repository/network/updates",
                "dismissConfigurationNoticePath": "/settings/dismiss-notice/dependabot_configuration_notice",
                "configurationNoticeDismissed": null,
                "repoAlertsPath": "/contributor/repository/security/dependabot",
                "repoSecurityAndAnalysisPath": "/contributor/repository/settings/security_analysis",
                "repoOwnerIsOrg": false,
                "currentUserCanAdminRepo": false
            },
            "displayName": "thing2",
            "displayUrl": "https://repurl.com/contributor/repository/blob/main/thing2?raw=true",
            "headerInfo": {
                "blobSize": "3.37 KB",
                "deleteInfo": {
                    "deleteTooltip": "You must be signed in to make or propose changes"
                },
                "editInfo": {
                    "editTooltip": "XXX"
                },
                "ghDesktopPath": "https://desktop.repurl.com",
                "repurlLfsPath": null,
                "onBranch": true,
                "shortPath": "5678",
                "siteNavLoginPath": "/login?return_to=identifier",
                "isCSV": false,
                "isRichtext": false,
                "toc": null,
                "lineInfo": {
                    "truncatedLoc": "33",
                    "truncatedSloc": "33"
                },
                "mode": "executable file"
            },
            "image": false,
            "isCodeownersFile": null,
            "isPlain": false,
            "isValidLegacyIssueTemplate": false,
            "issueTemplateHelpUrl": "https://docs.repurl.com/articles/about-issue",
            "issueTemplate": null,
            "discussionTemplate": null,
            "language": null,
            "languageID": null,
            "large": false,
            "loggedIn": false,
            "newDiscussionPath": "/contributor/repository/issues/new",
            "newIssuePath": "/contributor/repository/issues/new",
            "planSupportInfo": {
                "repoOption1": null,
                "repoOption2": null,
                "requestFullPath": "/contributor/repository/blob/main/thing2",
                "repoOption4": null,
                "repoOption5": null,
                "repoOption6": null,
                "repoOption7": null
            },
            "repoOption8": {
                "repoOption9": "/settings/dismiss-notice/repoOption10",
                "releasePath": "/contributor/repository/releases/new=true",
                "repoOption11": false,
                "repoOption12": false
            },
            "rawBlobUrl": "https://repurl.com/contributor/repository/raw/main/thing2",
            "repoOption13": false,
            "richText": null,
            "renderedFileInfo": null,
            "shortPath": null,
            "tabSize": 8,
            "topBannersInfo": {
                "overridingGlobalFundingFile": false,
                "universalPath": null,
                "repoOwner": "contributor",
                "repoName": "repository",
                "repoOption14": false,
                "citationHelpUrl": "https://docs.repurl.com/en/repurl/archiving/about",
                "repoOption15": false,
                "repoOption16": null
            },
            "truncated": false,
            "viewable": true,
            "workflowRedirectUrl": null,
            "symbols": {
                "timedOut": false,
                "notAnalyzed": true,
                "symbols": []
            }
        },
        "collabInfo": null,
        "collabMod": false,
        "wtsdf_signifier": {
            "/contributor/repository/branches": {
                "post": "identifier"
            },
            "/repos/preferences": {
                "post": "identifier"
            }
        }
    },
    "title": "repository/thing2 at main \\u0000 contributor/repository"
}

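Here is the code that handles property names consisting of a single word (because of the whitespace-removal commands, only the first word of a multi-word name is imported):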

import json
import pandas as pd

f = open("yourJson.json", "r")
data = json.load(f)
f.close()

# Get what we want to extract from the json
to_extract = data["payload"]["blob"]["rawLines"]

# Remove useless whitespace
stripped = [e.strip() for e in to_extract]
trimmed = [" ".join(e.split()) for e in stripped]

# Transform the list of string to a dict
as_dict = {e.split(' ')[0]: e.split(' ')[1] for e in trimmed}

# Load the dict with pandas
df = pd.DataFrame(as_dict.items(), columns=['Value', 'Property'])

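I have tried various solutions (for example, not stripping the whitespace, or specifying the exact property names associated with the data I need), but I am so lost with JSON that the errors make no sense to me.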

python json python-3.x pandas file-io
1 answer

Spaces are allowed in JSON keys; if that is your concern, such keys are perfectly valid.

{
    "My name is": "Efe"
}
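
As a quick check of the point above, here is a minimal sketch showing that Python's standard `json` module keeps spaces in keys unchanged. The JSON text and the variable names `text` and `record` are made up for illustration; they are not from the question's data.

```python
import json

# Hypothetical JSON text with spaces inside the keys
text = '{"My name is": "Efe", "Property1_word1 Property1_word2": 5.0}'

# json.loads returns a dict whose keys are the exact strings from the JSON
record = json.loads(text)

print(record["My name is"])
print(list(record))
```

So the JSON parser itself is unlikely to be what drops the second word; the loss more plausibly happens later, when the strings are split.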

If you want to remove unwanted whitespace from a string, you can use this:

mystring = " Hello "
mystring = mystring.strip()

#'Hello'
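
Applied to the `rawLines` strings in the question, one possible approach is to strip each line and then split it only once, so that everything after the value stays together as the name. This is a sketch under assumptions: the `sample` data below is a shortened, hypothetical copy of the question's structure, the helper name `parse_raw_lines` is invented here, and it assumes every line is a value followed by a name.

```python
import json

# Hypothetical, shortened sample mirroring the structure in the question;
# the real data would come from json.load() on the saved file.
sample = json.loads("""
{
  "payload": {"blob": {"rawLines": [
    "        C_1H_4   Methane                  ",
    "            5.00000        Property1_word1 Property1_word2          ",
    "             20.00000        Property2                     ",
    "             0.00000        Property9         "
  ]}}
}
""")


def parse_raw_lines(raw_lines):
    """Split each line into (value, name), keeping spaces inside the name."""
    pairs = []
    for line in raw_lines:
        # maxsplit=1 splits only at the first run of whitespace, so the
        # rest of the line (the full property name) stays in one piece.
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:  # skip blank or single-token lines
            value, name = parts
            pairs.append((value, name))
    return pairs


pairs = parse_raw_lines(sample["payload"]["blob"]["rawLines"])
for value, name in pairs:
    print(value, "|", name)
```

Keeping the result as a list of tuples rather than a dict keyed by value avoids losing rows when several properties share the same value (for example `0.00000`). If pandas is available, `pd.DataFrame(pairs, columns=["Value", "Property"])` should build the same two-column table as in the question.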

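If you could edit the question so that all of the material is in one place, without referring to the old question, the problem and the code would be easier to follow.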
