How do I clean a JSON file into a DataFrame?


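I am scraping basketball stats from MaxPreps (a high school sports statistics site). I managed to pull the data into VS Code, but the JSON is huge and messy. I can see all the correct numbers and player names, but they are not arranged in tidy rows and columns. How can I turn the JSON into a clean row/column DataFrame?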

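I tried pandas' json_normalize, but I'm not sure what I'm looking at in the output. I compared it with the response from the NBA's stats site, whose data looks far more organized, as if the response were already the stats table shown on the website.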

My code:

import pandas as pd
import requests
pd.set_option('display.max_columns', None)
import numpy as np

test_url = 'https://production.api.maxpreps.com/gatewayweb/react/team-season-player-stats/rollup/v1?teamId=cb3c4816-4749-4381-8c4c-5613ef8c89c9&sportSeasonId=77be7c75-cdf9-483d-867f-ea2af557e731'
url_json = requests.get(url=test_url).json()


df_normal = pd.json_normalize(url_json)
print(df_normal)
#print(url_json)
  {'status': 200, 'message': 'Success', 'cacheResult': 'None', 'data': {'teamId': 'cb3c4816-4749-4381-8c4c-5613ef8c89c9', 'sportSeasonId': '77be7c75-cdf9-483d-867f-ea2af557e731', 'groups': [{'name': 'Game Stats', 'subgroups': [{'name': '', 'stats': {'columns': [{'name': 'Jersey', 'header': '#', 'displayName': '#', 'isSortedColumn': True, 'overallValue': None, 'sortDirection': 1, 'columnType': 1}, {'name': 'Name', 'header': 'Name', 'displayName': 'Name', 'isSortedColumn': False, 'overallValue': None, 'sortDirection': 0, 'columnType': 2}, {'name': 'GamesPlayed', 'header': 'GP', 'displayName': 'Games Played', 'isSortedColumn': True, 'overallValue': '29', 'sortDirection': 2, 'columnType': 0}, {'name': 'MinutesPerGame', 'header': 'MPG', 'displayName': 'Minutes Per Game', 'isSortedColumn': True, 'overallValue': '0', 'sortDirection': 2, 'columnType': 0}, {'name': 'PointsPerGame', 'header': 'PPG', 'displayName': 'Points Per Game', 'isSortedColumn': True, 'overallValue': '53.2

^^^ JSON data

What I want it to look like:

parameters
: 
{LeagueID: "00", PerMode: "Totals", StatCategory: "PTS", Season: "All Time", SeasonType: "Playoffs",…}
resource
: 
"leagueleaders"
resultSet
: 
{name: "LeagueLeaders",…}
headers
: 
["PLAYER_ID", "PLAYER_NAME", "GP", "MIN", "FGM", "FGA", "FG_PCT", "FG3M", "FG3A", "FG3_PCT", "FTM",…]
name
: 
"LeagueLeaders"
rowSet
: 
[,…]
[0 … 99]
0
: 
[2544, "LeBron James", 287, 11858, 2928, 5896, 0.497, 470, 1415, 0.332, 1836, 2479, 0.741, 430, 2153,…]
1
: 
[893, "Michael Jordan", 179, 7474, 2188, 4497, 0.487, 148, 446, 0.332, 1463, 1766, 0.828, 305, 847,…]
2
: 
[76003, "Kareem Abdul-Jabbar", 237, 8851, 2356, 4422, 0.533, 0, 4, 0, 1050, 1419, 0.74, 505, 1273,…]
3
: 
[977, "Kobe Bryant", 220, 8641, 2014, 4499, 0.448, 292, 882, 0.331, 1320, 1617, 0.816, 230, 889, 1119,…]
{'resource': 'leagueleaders', 'parameters': {'LeagueID': '00', 'PerMode': 'Totals', 'StatCategory': 'PTS', 'Season': 'All Time', 'SeasonType': 'Playoffs', 'Scope': 'S', 'ActiveFlag': 'No'}, 'resultSet': {'name': 'LeagueLeaders', 'headers': ['PLAYER_ID', 'PLAYER_NAME', 'GP', 'MIN', 'FGM', 'FGA', 'FG_PCT', 'FG3M', 'FG3A', 'FG3_PCT', 'FTM', 'FTA', 'FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS', 'AST_TOV', 'STL_TOV', 'EFG_PCT', 'TS_PCT', 'GP_RANK', 'MIN_RANK', 'FGM_RANK', 'FGA_RANK', 'FG_PCT_RANK', 'FG3M_RANK', 'FG3A_RANK', 'FG3_PCT_RANK', 'FTM_RANK', 'FTA_RANK', 'FT_PCT_RANK', 'OREB_RANK', 'DREB_RANK', 'REB_RANK', 'AST_RANK', 'STL_RANK', 'BLK_RANK', 'TOV_RANK', 'PF_RANK', 'PTS_RANK', 'AST_TOV_RANK', 'STL_TOV_RANK', 'EFG_PCT1', 'TS_PCT1'], 'rowSet': [[2544, 'LeBron James', 287, 11858, 2928, 5896, 0.497, 470, 1415, 0.332, 1836, 2479, 0.741, 430, 2153, 2583, 2067, 483, 275, 1034, 655, 8162, 1.999, 0.467, 0.536, 0.584, 1, 1, 1, 1, 591, 3, 2, 714, 1, 1, 1262, 16, 1, 4, 2, 1, 10, 1, 8, 1, 469, 1078, 491, 405], [893, 'Michael Jordan', 179, 7474, 2188, 4497, 0.487, 148, 446, 0.332, 1463, 1766, 0.828, 305, 847, 1152, 1022, 376, 158, 546, 541, 5987, 1.872, 0.689, 0.503, 0.568, 19, 12, 3, 3, 65

^^^ Stats from the NBA

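A similar question has been asked before, but it used NBA.com stats, and I'm not sure how that translates to the data I get from MaxPreps. Basically, I want to take my MaxPreps data and get it into a clean DataFrame so I can start plotting it.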
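For comparison, the NBA response loads easily because it already separates a flat `headers` list from a `rowSet` list of equally long rows. A minimal sketch, using a trimmed copy of the payload shown above (the variable names `nba_json`, `result`, and `nba_df` are illustrative, and only the first five fields of two rows are kept):

```python
import pandas as pd

# Trimmed copy of the NBA-style payload shown above: only the first five
# headers and the matching first five values of two rows are kept.
nba_json = {
    "resource": "leagueleaders",
    "resultSet": {
        "name": "LeagueLeaders",
        "headers": ["PLAYER_ID", "PLAYER_NAME", "GP", "MIN", "FGM"],
        "rowSet": [
            [2544, "LeBron James", 287, 11858, 2928],
            [893, "Michael Jordan", 179, 7474, 2188],
        ],
    },
}

# Each inner list in rowSet lines up position by position with headers,
# so the DataFrame constructor can take both directly.
result = nba_json["resultSet"]
nba_df = pd.DataFrame(result["rowSet"], columns=result["headers"])
print(nba_df)
```

The MaxPreps response has no such flat pairing: column metadata and row values sit in separate nested lists, so they have to be matched up by position before a DataFrame can be built.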

python pandas dataframe visualization
1 Answer

Probably not the cleanest solution, but this seems to do the trick. This format needs more wrangling than simply importing it into a pandas DataFrame.

import requests
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

test_url = 'https://production.api.maxpreps.com/gatewayweb/react/team-season-player-stats/rollup/v1?teamId=cb3c4816-4749-4381-8c4c-5613ef8c89c9&sportSeasonId=77be7c75-cdf9-483d-867f-ea2af557e731'
url_json = requests.get(url=test_url).json()

# Build one table per stat group (e.g. 'Game Stats').
df_list = []
for group in url_json['data']['groups']:
    records = []
    for subgroup in group['subgroups']:
        columns = subgroup['stats']['columns']
        for row in subgroup['stats']['rows']:
            # Pair each column's display name with the value at the same
            # position in the row.
            row_dict = {}
            for col_idx, col in enumerate(columns):
                row_dict[col['displayName']] = row['columns'][col_idx]['value']
            # Appending (rather than keying by row index) keeps rows from
            # every subgroup instead of overwriting earlier ones.
            records.append(row_dict)
    if records:
        df_list.append(pd.DataFrame(records))

# Stack the per-group tables into a single DataFrame.
df_out = pd.concat(df_list)
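
Since the question mentions plotting, it may help to check the parsing offline and convert the stats to numbers, because values such as '29' in the response are strings. The sketch below is only an illustration: the `sample` payload is made up, its `rows` shape is assumed from the parsing code above, and `stats_to_frame` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical payload: the shape is assumed from the parsing code above,
# and the player names and numbers are made up.
sample = {
    "data": {
        "groups": [
            {
                "name": "Game Stats",
                "subgroups": [
                    {
                        "name": "",
                        "stats": {
                            "columns": [
                                {"displayName": "#"},
                                {"displayName": "Name"},
                                {"displayName": "Games Played"},
                                {"displayName": "Points Per Game"},
                            ],
                            "rows": [
                                {"columns": [{"value": "3"}, {"value": "Player A"},
                                             {"value": "29"}, {"value": "12.5"}]},
                                {"columns": [{"value": "11"}, {"value": "Player B"},
                                             {"value": "27"}, {"value": "8.1"}]},
                            ],
                        },
                    }
                ],
            }
        ]
    }
}


def stats_to_frame(payload):
    """Flatten every group/subgroup into one DataFrame, one row per stats row."""
    records = []
    for group in payload["data"]["groups"]:
        for subgroup in group["subgroups"]:
            headers = [col["displayName"] for col in subgroup["stats"]["columns"]]
            for row in subgroup["stats"]["rows"]:
                values = [cell["value"] for cell in row["columns"]]
                record = dict(zip(headers, values))
                record["Group"] = group["name"]  # remember which group the row came from
                records.append(record)
    return pd.DataFrame(records)


df = stats_to_frame(sample)

# Convert every column except the text ones to numbers; values that cannot
# be parsed become NaN instead of raising.
text_cols = ["Name", "Group"]
for col in df.columns.difference(text_cols):
    df[col] = pd.to_numeric(df[col], errors="coerce")
print(df)
```

With the real response, `stats_to_frame(url_json)` would be the equivalent call, provided the live payload matches the assumed shape.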