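I have been looking for help importing JSON-formatted data from a URL (I'm new to working with JSON) and received a great answer to my earlier question.
However, I've run into trouble. Some of my property names contain spaces. For example, "Property1" and several other property names from my previous question may actually be "Property1_word1 Property1_word2". The current solution keeps only the first word of the property name. At first I could get away with that, but now I need all of the words. I would appreciate any hints; I haven't been able to find a solution so far.
Edit (providing all the information here so there is no need to refer to the previous post):
I want to import data from a website. First, I saved the site's content (below) as a file. In my previous question, every property name consisted of a single word. Now I am dealing with property names made up of several words. Below is an example in which the names of Property1, Property4, and Property8 contain multiple words.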
{
"payload": {
"allShortcutsEnabled": false,
"fileTree": {
"": {
"items": [
{
"name": "thing",
"path": "thing",
"contentType": "directory"
},
{
"name": ".repurlignore",
"path": ".repurlignore",
"contentType": "file"
},
{
"name": "README.md",
"path": "README.md",
"contentType": "file"
},
{
"name": "thing2",
"path": "thing2",
"contentType": "file"
},
{
"name": "thing3",
"path": "thing3",
"contentType": "file"
},
{
"name": "thing4",
"path": "thing4",
"contentType": "file"
},
{
"name": "thing5",
"path": "thing5",
"contentType": "file"
},
{
"name": "thing6",
"path": "thing6",
"contentType": "file"
},
{
"name": "thing7",
"path": "thing7",
"contentType": "file"
},
{
"name": "thing8",
"path": "thing8",
"contentType": "file"
},
{
"name": "thing9",
"path": "thing9",
"contentType": "file"
},
{
"name": "thing10",
"path": "thing10",
"contentType": "file"
},
{
"name": "thing11",
"path": "thing11",
"contentType": "file"
}
],
"totalCount": 500
}
},
"fileTreeProcessingTime": 5.262188,
"foldersToFetch": [],
"reducedMotionEnabled": null,
"repo": {
"id": 1234567,
"defaultBranch": "main",
"name": "repository",
"ownerLogin": "contributor",
"currentUserCanPush": false,
"isFork": false,
"isEmpty": false,
"createdAt": "2023-10-31",
"ownerAvatar": "https://avatars.repurlusercontent.com/u/98765432?v=1",
"public": true,
"private": false,
"isOrgOwned": false
},
"symbolsExpanded": false,
"treeExpanded": true,
"refInfo": {
"name": "main",
"listCacheKey": "v0:13579",
"canEdit": false,
"refType": "branch",
"currentOid": "identifier"
},
"path": "thing2",
"currentUser": null,
"blob": {
"rawLines": [
" C_1H_4 Methane ",
" 5.00000 Property1_word1 Property1_word2 ",
" 20.00000 Property2 ",
" 500.66500 Property3 ",
" 100.00000 Property4_word1 Property4_word2 ",
" -4453.98887 Property5 ",
" 100.48200 Property6 ",
" 59.75258 Property7 ",
" 5.33645 Property8_word1 Property8_word2 ",
" 0.00000 Property9 ",
" 645.07777 Property10 ",
" 0.00000 Property11 ",
" 0.00000 Property12 ",
" 0.00000 Property13 ",
" 0.00000 Property14 ",
" 0.00000 Property15 ",
" 0.00000 Property16 ",
" 0.00000 Property17 ",
" 0.00000 Property18 ",
" 0.00000 Property19 ",
" 0.00000 Property20 ",
" 0.00000 Property21 ",
" 0.00000 Property22 ",
" 0.00000 Property23 ",
" 0.00000 Property24 ",
" 0.00000 Property25 ",
" 0.57876 Property26 ",
" 4.00000 Property27 ",
" 0.00000 Property28 ",
" 0.00000 Property29 ",
" 0.00000 Property30 ",
" 0.00000 Property31 ",
" 0.00000 Property32 ",
" 1.00000 Property33 ",
" 0.00000 Property34 ",
" 26.00000 Property35 ",
" 1.44571 Property36 ",
" 1.08756 Property37 ",
" 0.00000 Property38 ",
" 0.00000 Property39 ",
" 0.00000 Property40 ",
" 6.00000 Property41 ",
" 9.00000 Property42 ",
" 0.00000 Property43 "
],
"stylingDirectives": [
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[]
],
"csv": null,
"csvError": null,
"dependabotInfo": {
"showConfigurationBanner": false,
"configFilePath": null,
"networkDependabotPath": "/contributor/repository/network/updates",
"dismissConfigurationNoticePath": "/settings/dismiss-notice/dependabot_configuration_notice",
"configurationNoticeDismissed": null,
"repoAlertsPath": "/contributor/repository/security/dependabot",
"repoSecurityAndAnalysisPath": "/contributor/repository/settings/security_analysis",
"repoOwnerIsOrg": false,
"currentUserCanAdminRepo": false
},
"displayName": "thing2",
"displayUrl": "https://repurl.com/contributor/repository/blob/main/thing2?raw=true",
"headerInfo": {
"blobSize": "3.37 KB",
"deleteInfo": {
"deleteTooltip": "You must be signed in to make or propose changes"
},
"editInfo": {
"editTooltip": "XXX"
},
"ghDesktopPath": "https://desktop.repurl.com",
"repurlLfsPath": null,
"onBranch": true,
"shortPath": "5678",
"siteNavLoginPath": "/login?return_to=identifier",
"isCSV": false,
"isRichtext": false,
"toc": null,
"lineInfo": {
"truncatedLoc": "33",
"truncatedSloc": "33"
},
"mode": "executable file"
},
"image": false,
"isCodeownersFile": null,
"isPlain": false,
"isValidLegacyIssueTemplate": false,
"issueTemplateHelpUrl": "https://docs.repurl.com/articles/about-issue",
"issueTemplate": null,
"discussionTemplate": null,
"language": null,
"languageID": null,
"large": false,
"loggedIn": false,
"newDiscussionPath": "/contributor/repository/issues/new",
"newIssuePath": "/contributor/repository/issues/new",
"planSupportInfo": {
"repoOption1": null,
"repoOption2": null,
"requestFullPath": "/contributor/repository/blob/main/thing2",
"repoOption4": null,
"repoOption5": null,
"repoOption6": null,
"repoOption7": null
},
"repoOption8": {
"repoOption9": "/settings/dismiss-notice/repoOption10",
"releasePath": "/contributor/repository/releases/new=true",
"repoOption11": false,
"repoOption12": false
},
"rawBlobUrl": "https://repurl.com/contributor/repository/raw/main/thing2",
"repoOption13": false,
"richText": null,
"renderedFileInfo": null,
"shortPath": null,
"tabSize": 8,
"topBannersInfo": {
"overridingGlobalFundingFile": false,
"universalPath": null,
"repoOwner": "contributor",
"repoName": "repository",
"repoOption14": false,
"citationHelpUrl": "https://docs.repurl.com/en/repurl/archiving/about",
"repoOption15": false,
"repoOption16": null
},
"truncated": false,
"viewable": true,
"workflowRedirectUrl": null,
"symbols": {
"timedOut": false,
"notAnalyzed": true,
"symbols": []
}
},
"collabInfo": null,
"collabMod": false,
"wtsdf_signifier": {
"/contributor/repository/branches": {
"post": "identifier"
},
"/repos/preferences": {
"post": "identifier"
}
}
},
"title": "repository/thing2 at main \\u0000 contributor/repository"
}
Here is the code that handles single-word property names (the whitespace-removal step ends up importing only the first word of multi-word names):
import json
import pandas as pd
# Read the saved JSON file; the context manager closes it automatically
with open("yourJson.json", "r") as f:
    data = json.load(f)
# Get what we want to extract from the json
to_extract = data["payload"]["blob"]["rawLines"]
# Remove useless whitespace
stripped = [e.strip() for e in to_extract]
trimmed = [" ".join(e.split()) for e in stripped]
# Transform the list of string to a dict
as_dict = {e.split(' ')[0]: e.split(' ')[1] for e in trimmed}
# Load the dict with pandas
df = pd.DataFrame(as_dict.items(), columns=['Value', 'Property'])
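I have tried various fixes (for example, not stripping the whitespace, or specifying the exact property names associated with the data I need), but I'm so lost with JSON that the error messages don't make sense to me.
You can use spaces in JSON keys; if that is your concern, it is not invalid.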
{
"My name is": "Efe"
}
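As a quick check of the point above, here is a minimal sketch that parses that same object with Python's standard `json` module and reads the key containing spaces. The variable names `parsed` and `value` are only for illustration.

```python
import json

# A JSON object whose key contains spaces is valid and parses normally
parsed = json.loads('{"My name is": "Efe"}')

# The key is looked up with its spaces intact
value = parsed["My name is"]
print(value)  # Efe
```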
If you want to remove unwanted whitespace from a string, you can use this:
mystring = " Hello "
mystring = mystring.strip()
#'Hello'
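For the multi-word names in the question, one possible approach (a sketch, not tested against the real file) is to split each line only at the first run of whitespace, so everything after the value stays together as the name. The `raw_lines` list and the `split_line` helper below are hypothetical; the sample lines are copied from the "rawLines" array in the question, and the sketch assumes every non-blank line holds a value followed by at least one word.

```python
# Hypothetical sample copied from a few "rawLines" entries in the question
raw_lines = [
    "                         C_1H_4 Methane        ",
    "         5.00000 Property1_word1 Property1_word2        ",
    "        20.00000 Property2        ",
    "         0.00000 Property9        ",
    "         0.00000 Property11        ",
]


def split_line(line):
    """Return (value, property name), keeping multi-word names whole."""
    # With no separator given, split() skips leading whitespace, and
    # maxsplit=1 stops after the first token, so the rest stays together.
    # Assumption: the line has at least two tokens, otherwise this raises.
    value, name = line.split(maxsplit=1)
    # Collapse repeated spaces inside the name and drop the trailing ones
    return value, " ".join(name.split())


# A list of tuples (rather than a dict keyed by value) keeps rows whose
# values repeat, such as the many 0.00000 entries
rows = [split_line(line) for line in raw_lines if line.strip()]

for value, name in rows:
    print(value, "|", name)
```

If this fits your data, `rows` can be handed to pandas with `pd.DataFrame(rows, columns=['Value', 'Property'])`, in place of the dict-based step in the question's code.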
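It would be easier to see the problem and the code if you edited the question to include all the material in one place, without referring to the old question.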