Traversing a nested list of sections in Python


I have a list of dictionaries named "sections", in the following format:

  [{
      "elements": [
        "/sections/1",
        "/sections/5",
        "/sections/6",
        "/sections/7"
      ]
    },
    {
      "elements": [
        "/sections/2",
        "/sections/3",
        "/sections/4"
      ]
    },
    {
      "elements": [
        "/paragraphs/0"
      ]
    },
    {
      "elements": [
        "/paragraphs/1"
      ]
    },
    {
      "elements": [
        "/paragraphs/2"
      ]
    },
    {
      "elements": [
        "/paragraphs/3",
        "/tables/0",
        "/paragraphs/5",
        "/paragraphs/6",
        "/paragraphs/7",
        "/paragraphs/8",
        "/paragraphs/9",
        "/paragraphs/10",
        "/paragraphs/11",
        "/paragraphs/12",
        "/paragraphs/13",
        "/paragraphs/14",
        "/paragraphs/15",
        "/paragraphs/16",
        "/paragraphs/17",
        "/paragraphs/18"
      ]
    },
    {
      "elements": [
        "/paragraphs/19",
        "/paragraphs/21",
        "/paragraphs/22",
        "/paragraphs/23",
        "/paragraphs/24",
        "/paragraphs/25",
        "/paragraphs/26",
        "/paragraphs/27",
        "/paragraphs/28",
        "/paragraphs/29",
        "/paragraphs/30",
        "/paragraphs/31",
        "/paragraphs/32",
        "/paragraphs/33",
        "/paragraphs/34",
        "/paragraphs/35",
        "/paragraphs/36",
        "/paragraphs/37",
        "/paragraphs/38",
        "/paragraphs/39",
        "/paragraphs/40",
        "/paragraphs/41",
        "/paragraphs/42"
      ]
    }]

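This is sample output from the Azure Document Intelligence JSON. I want to traverse these sections. "sections" is a list whose entries can reference nested sections as well as paragraphs.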

For example, print(sections[0]) gives me:

{'elements': ['/sections/1', '/sections/5', '/sections/6', '/sections/7']}

"/sections/1" can be read as a reference to sections[1], and likewise for the other entries.

The nesting hierarchy is:

Section--->Paragraph

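I want to traverse the list and flatten the output.

I have another dictionary of paragraphs, keyed by paragraph number with the actual paragraph content as the value, which I want to look up.

So I would like to traverse this list of sections and get the paragraphs as output, for example: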

["/paragraphs/0","/paragraphs/1","/paragraphs/3","/tables/0","/paragraphs/5"...]

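Once I have the output in this format, I can write another function to pull the exact information from the paragraph dictionary (I will do that part myself).

I need help writing the traversal code/function in an optimized way. I wrote something, but it does not give the correct result.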

def CheckSectCondition(sect_elems):
    if len([s for s in sect_elems if "sect" in s]) == 0:
        return True
    else:
        return False
    
all_text = ""
for i in range(0,len(section_data)):
    curr_section = section_data[i]
    curr_section_elements = curr_section['elements']
    if CheckSectCondition(curr_section_elements) == False:
        while CheckSectCondition(curr_section_elements) == False:
            for i in curr_section_elements:
                if i[1:5] == 'sect':
                    sub_sec_name = i.split('/')[-1]
                    sub_sec_elements = section_data[int(sub_sec_name)]['elements']
                    print(sub_sec_elements)
                    #again iterate
                elif i[1:5] == 'para':
                    print(i)
                    #do something
                elif i[1:5] == 'tabl':
                    print(i)
                    #do something
            CheckSectCondition(curr_section_elements) == True

### here section_data is the sections list

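Any help would be much appreciated. I am not familiar with recursive programming, and sections can be nested inside other sections to several levels.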

python python-3.x algorithm traversal
1 Answer

Here is a basic way to do this:

t = [{
      "elements": [
        "/sections/1",
        "/sections/5",
        "/sections/6",
        "/sections/7"
      ]
    },
    {
      "elements": [
        "/sections/2",
        "/sections/3",
        "/sections/4"
      ]
    },
    {
      "elements": [
        "/paragraphs/0"
      ]
    },
    {
      "elements": [
        "/paragraphs/1"
      ]
    },
    {
      "elements": [
        "/paragraphs/2"
      ]
    },
    {
      "elements": [
        "/paragraphs/3",
        "/tables/0",
        "/paragraphs/5",
        "/paragraphs/6",
        "/paragraphs/7",
        "/paragraphs/8",
        "/paragraphs/9",
        "/paragraphs/10",
        "/paragraphs/11",
        "/paragraphs/12",
        "/paragraphs/13",
        "/paragraphs/14",
        "/paragraphs/15",
        "/paragraphs/16",
        "/paragraphs/17",
        "/paragraphs/18"
      ]
    },
    {
      "elements": [
        "/paragraphs/19",
        "/paragraphs/21",
        "/paragraphs/22",
        "/paragraphs/23",
        "/paragraphs/24",
        "/paragraphs/25",
        "/paragraphs/26",
        "/paragraphs/27",
        "/paragraphs/28",
        "/paragraphs/29",
        "/paragraphs/30",
        "/paragraphs/31",
        "/paragraphs/32",
        "/paragraphs/33",
        "/paragraphs/34",
        "/paragraphs/35",
        "/paragraphs/36",
        "/paragraphs/37",
        "/paragraphs/38",
        "/paragraphs/39",
        "/paragraphs/40",
        "/paragraphs/41",
        "/paragraphs/42"
      ]
    }]
    
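# Concatenate every section's elements in list order.
# Note: "/sections/..." references are copied as-is, not expanded.
r = []
for k in t:
    r.extend(k['elements'])
print(r)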

Let me know whether this is sufficient for your use case; we can then optimize it further as needed.
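The loop above concatenates every section's elements, so any "/sections/N" reference stays in the result instead of being expanded. If the goal is only the leaf references (paragraphs, tables) in document order, a recursive walk is one option. The following is a minimal sketch, not a definitive implementation. It assumes that sections[0] is the root of the hierarchy and that "/sections/N" always means sections[N]. The function name flatten_section and the small demo list are made up for illustration. Because "/sections/7" in the sample data points past the end of the seven-entry list, the sketch skips references that are out of range, and it also skips a section that is already being expanded so that a cyclic reference cannot recurse forever.

```python
def flatten_section(sections, index=0, _active=None):
    """Return the leaf references under sections[index], in document order.

    Hypothetical helper: assumes "/sections/N" refers to sections[N].
    """
    if _active is None:
        _active = set()  # indexes of sections currently being expanded

    # Skip references that fall outside the list, or that point back to a
    # section we are already inside (which would be a cycle).
    if not 0 <= index < len(sections) or index in _active:
        return []

    _active.add(index)
    flat = []
    for ref in sections[index]["elements"]:
        # "/sections/1" -> kind "sections", number "1"
        kind, _, number = ref.strip("/").partition("/")
        if kind == "sections":
            # Nested section: expand it in place.
            flat.extend(flatten_section(sections, int(number), _active))
        else:
            # Paragraph, table, or any other leaf: keep the reference itself.
            flat.append(ref)
    _active.discard(index)
    return flat


# Small made-up example; "/sections/9" deliberately points at nothing.
demo = [
    {"elements": ["/sections/1", "/sections/3", "/sections/9"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/tables/0", "/paragraphs/2"]},
]
print(flatten_section(demo))
# ['/paragraphs/0', '/paragraphs/1', '/tables/0', '/paragraphs/2']
```

With the list from the question, this would be called as flatten_section(t). Starting from the root rather than looping over every section avoids listing a nested section's paragraphs twice.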
