Accessing deeply nested fields with DuckDB SQL


I have some deeply nested data from a JSON file that I am trying to load into DuckDB:

  "events": [
    {
      "id": "401638586",
      "uid": "s:40~l:41~e:401638586",
      "date": "2024-03-21T18:45Z",
      "name": "Wagner Seahawks at North Carolina Tar Heels",
      "shortName": "WAG VS UNC",
      "season": {
        "year": 2024,
        "type": 3,
        "slug": "post-season"
      },
      "competitions": [
        {
          "id": "401638586",
          "uid": "s:40~l:41~e:401638586~c:401638586",
          "date": "2024-03-21T18:45Z",
          "attendance": 18223,
          "type": {
            "id": "6",
            "abbreviation": "TRNMNT"
          },
          "timeValid": true,
          "neutralSite": true,
          "conferenceCompetition": false,
          "playByPlayAvailable": true,
          "recent": false,
.
.
.

The query I am using looks like this:

select 
    games['id'] as id, 
    games['date'] as date,
    games['season']['year'] as season_year,
    games['season']['slug'] as season_slug,
    '2024-03-21' as partition_date, 
    games['name'] as name, 
    games['shortName'] as short_name, 
    games['status']['period'] as period,
    games['status']['type']['completed'] as completed, 
    games['competitions'][0]['neutralSite'] as neutral, 
    games['competitions'][0]['conferenceCompetition'] as in_conference, 
    games['competitions'][0]['playByPlayAvailable'] as pbp_available 
from (
    select 
        unnest(events) as games 
    from read_json('/path/to/json/data')
)
limit 1;

The output looks like this:

            id = 401638586
          date = 2024-03-21T18:45Z
   season_year = 2024
   season_slug = post-season
partition_date = 2024-03-21
          name = Wagner Seahawks at North Carolina Tar Heels
    short_name = WAG VS UNC
        period = 2
     completed = true
       neutral = 
 in_conference = 
 pbp_available = 

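As you can see, once the query reaches the nested "competitions" field, it returns only empty/NULL values. How can I access these fields correctly? I tried using json_extract but could not get it to work.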

sql json duckdb

1 Answer

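You are using [0] instead of [1]. read_json parses the file into native DuckDB types, so "competitions" is a LIST of STRUCTs rather than a JSON value. List indexes start at 1, and an out-of-range index such as [0] yields NULL instead of raising an error, which is why those three columns come back empty.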

See the warning in the documentation:

Following PostgreSQL's conventions, DuckDB uses 1-based indexing for arrays and lists but 0-based indexing for the JSON data type.
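
The two conventions can be checked directly in a DuckDB shell. The following is a minimal sketch that uses only literal values (the numbers and the small JSON object are made up for illustration, not taken from the file in the question). It assumes the JSON extension is available, as it is in standard DuckDB builds.

```sql
-- LIST values are 1-based. Index 0 is out of range and yields NULL, not an error.
SELECT
    [10, 20, 30][1] AS list_index_1,   -- 10
    [10, 20, 30][0] AS list_index_0;   -- NULL

-- JSON values are 0-based: '$[0]' addresses the first array element.
SELECT
    json_extract('[10, 20, 30]', '$[0]') AS json_index_0;   -- 10

-- The same 0-based rule applies inside a longer JSON path.
SELECT
    json_extract(
        '{"competitions": [{"neutralSite": true}]}',
        '$.competitions[0].neutralSite'
    ) AS neutral;   -- true
```

So [0] is the right index only when the value is still of type JSON, for example when going through json_extract. Once the data has been read into LIST and STRUCT columns, the first element is [1].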

Using [1] will produce the expected values:

    games['competitions'][1]['neutralSite'] as neutral,
    games['competitions'][1]['conferenceCompetition'] as in_conference,
    games['competitions'][1]['playByPlayAvailable'] as pbp_available

Rows: 1
Columns: 10
$ id              <str> '401638586'
$ date            <str> '2024-03-21T18:45Z'
$ season_year     <i64> 2024
$ season_slug     <str> 'post-season'
$ partition_date  <str> '2024-03-21'
$ name            <str> 'Wagner Seahawks at North Carolina Tar Heels'
$ short_name      <str> 'WAG VS UNC'
$ neutral        <bool> True
$ in_conference  <bool> False
$ pbp_available  <bool> True
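
To reproduce the behaviour without the original file, here is a small self-contained sketch. The inline src table is a hypothetical stand-in for the result of read_json, trimmed to a handful of fields from the sample above. The column names neutral_with_0 and neutral_with_1 are only there to put the two indexes side by side.

```sql
-- Hypothetical stand-in for read_json('/path/to/json/data'):
-- one row whose "events" column is a LIST of STRUCTs.
WITH src AS (
    SELECT [
        {
            'id': '401638586',
            'competitions': [
                {
                    'neutralSite': true,
                    'conferenceCompetition': false,
                    'playByPlayAvailable': true
                }
            ]
        }
    ] AS events
)
SELECT
    games['id'] AS id,
    -- Index 0 is out of range for a LIST, so this column is NULL.
    games['competitions'][0]['neutralSite'] AS neutral_with_0,
    -- Index 1 is the first list element.
    games['competitions'][1]['neutralSite'] AS neutral_with_1,
    games['competitions'][1]['conferenceCompetition'] AS in_conference,
    games['competitions'][1]['playByPlayAvailable'] AS pbp_available
FROM (
    SELECT unnest(events) AS games
    FROM src
);
```

With this data, neutral_with_0 should come back NULL while the columns indexed with [1] return the values from the first competition.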
    