I have a list of dictionaries named "sections" in the following format:
[{
"elements": [
"/sections/1",
"/sections/5",
"/sections/6",
"/sections/7"
]
},
{
"elements": [
"/sections/2",
"/sections/3",
"/sections/4"
]
},
{
"elements": [
"/paragraphs/0"
]
},
{
"elements": [
"/paragraphs/1"
]
},
{
"elements": [
"/paragraphs/2"
]
},
{
"elements": [
"/paragraphs/3",
"/tables/0",
"/paragraphs/5",
"/paragraphs/6",
"/paragraphs/7",
"/paragraphs/8",
"/paragraphs/9",
"/paragraphs/10",
"/paragraphs/11",
"/paragraphs/12",
"/paragraphs/13",
"/paragraphs/14",
"/paragraphs/15",
"/paragraphs/16",
"/paragraphs/17",
"/paragraphs/18"
]
},
{
"elements": [
"/paragraphs/19",
"/paragraphs/21",
"/paragraphs/22",
"/paragraphs/23",
"/paragraphs/24",
"/paragraphs/25",
"/paragraphs/26",
"/paragraphs/27",
"/paragraphs/28",
"/paragraphs/29",
"/paragraphs/30",
"/paragraphs/31",
"/paragraphs/32",
"/paragraphs/33",
"/paragraphs/34",
"/paragraphs/35",
"/paragraphs/36",
"/paragraphs/37",
"/paragraphs/38",
"/paragraphs/39",
"/paragraphs/40",
"/paragraphs/41",
"/paragraphs/42"
]
}]
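It is sample output from the Azure Document Intelligence JSON. I want to traverse these sections. "sections" is a list whose entries may in turn reference nested sections or paragraphs.
For example,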
print(sections[0])
gives me {'elements': ['/sections/1', '/sections/5', '/sections/6', '/sections/7']}
"/sections/1" can be interpreted as
sections[1]
and likewise for the others.
The nesting hierarchy is
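To make that mapping concrete, here is a minimal sketch of turning a reference string into a collection name and an integer index. The helper name `parse_ref` is hypothetical, and it assumes every reference has exactly the "/<kind>/<number>" shape shown in the sample data.

```python
def parse_ref(ref):
    """Split a reference such as "/sections/1" into ("sections", 1).

    Assumes the reference has exactly two parts after the leading slash.
    """
    _, kind, index = ref.split("/")
    return kind, int(index)


print(parse_ref("/sections/1"))     # ('sections', 1)
print(parse_ref("/paragraphs/19"))  # ('paragraphs', 19)
```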
Section--->Paragraph
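I want to traverse the list and flatten the output.
I have a separate paragraphs dictionary, keyed by paragraph number with the actual paragraph content as the value, that I want to look up against.
So I would like to traverse this sections list and get the paragraph references as output, for example: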
["/paragraphs/0","/paragraphs/1","/paragraphs/3","/tables/0","/paragraphs/5"...]
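Once I have the output in this format, I can write another function to pull the exact content from the paragraphs dictionary. (I will do that part myself.)
I need help writing the traversal code/function in an optimized way. I wrote something, but it does not give the correct result.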
def CheckSectCondition(sect_elems):
    if len([s for s in sect_elems if "sect" in s]) == 0:
        return True
    else:
        return False

all_text = ""
for i in range(0,len(section_data)):
    curr_section = section_data[i]
    curr_section_elements = curr_section['elements']
    if CheckSectCondition(curr_section_elements) == False:
        while CheckSectCondition(curr_section_elements) == False:
            for i in curr_section_elements:
                if i[1:5] == 'sect':
                    sub_sec_name = i.split('/')[-1]
                    sub_sec_elements = section_data[int(sub_sec_name)]['elements']
                    print(sub_sec_elements)
                    #again iterate
                elif i[1:5] == 'para':
                    print(i)
                    #do something
                elif i[1:5] == 'tabl':
                    print(i)
                    #do something
            CheckSectCondition(curr_section_elements) == True
### here section_data is the sections list
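Any help would be greatly appreciated. I am not familiar with recursive programming, and sections within sections can be nested several levels deep.
Here is a basic way to do this: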
t = [{
"elements": [
"/sections/1",
"/sections/5",
"/sections/6",
"/sections/7"
]
},
{
"elements": [
"/sections/2",
"/sections/3",
"/sections/4"
]
},
{
"elements": [
"/paragraphs/0"
]
},
{
"elements": [
"/paragraphs/1"
]
},
{
"elements": [
"/paragraphs/2"
]
},
{
"elements": [
"/paragraphs/3",
"/tables/0",
"/paragraphs/5",
"/paragraphs/6",
"/paragraphs/7",
"/paragraphs/8",
"/paragraphs/9",
"/paragraphs/10",
"/paragraphs/11",
"/paragraphs/12",
"/paragraphs/13",
"/paragraphs/14",
"/paragraphs/15",
"/paragraphs/16",
"/paragraphs/17",
"/paragraphs/18"
]
},
{
"elements": [
"/paragraphs/19",
"/paragraphs/21",
"/paragraphs/22",
"/paragraphs/23",
"/paragraphs/24",
"/paragraphs/25",
"/paragraphs/26",
"/paragraphs/27",
"/paragraphs/28",
"/paragraphs/29",
"/paragraphs/30",
"/paragraphs/31",
"/paragraphs/32",
"/paragraphs/33",
"/paragraphs/34",
"/paragraphs/35",
"/paragraphs/36",
"/paragraphs/37",
"/paragraphs/38",
"/paragraphs/39",
"/paragraphs/40",
"/paragraphs/41",
"/paragraphs/42"
]
}]
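# Concatenate every section's elements in list order.
# Note: "/sections/N" references are copied as-is, not followed.
r = []
for k in t:
    r.extend(k['elements'])
print(r)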
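The loop above copies every entry, so the "/sections/N" references stay in the result alongside the paragraphs. If the goal is to get only the leaf references (paragraphs and tables) in document order, one possible approach is a short recursive walk. This is a sketch, not a definitive implementation: the function name `flatten` is hypothetical, and it assumes that index 0 is the root section, that every other section is reachable from it, and that no section refers back to one of its ancestors.

```python
def flatten(sections, index=0):
    """Return the leaf references under sections[index], depth-first.

    Assumption: the references form a tree (no section refers back to
    one of its ancestors), otherwise the recursion would not terminate.
    """
    leaves = []
    for ref in sections[index]["elements"]:
        if ref.startswith("/sections/"):
            # "/sections/3" -> 3, then descend into that section.
            child = int(ref.rsplit("/", 1)[1])
            leaves.extend(flatten(sections, child))
        else:
            # Paragraphs, tables, and anything else are kept as leaves.
            leaves.append(ref)
    return leaves


# Small made-up sample with two levels of nesting.
sample = [
    {"elements": ["/sections/1", "/sections/3"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/paragraphs/2", "/tables/0"]},
]
print(flatten(sample))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/tables/0']
```

With the list `t` defined above, `flatten(t)` would start at `t[0]` and follow each "/sections/N" reference in turn.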
Let me know whether this is sufficient for your use case; we can then optimize it further as needed.
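Since the question mentions being unfamiliar with recursion, the same traversal can also be sketched with an explicit stack instead of function calls. The name `flatten_iterative` and the `root` parameter are hypothetical. The sketch assumes section 0 is the root, and it skips any section it has already expanded, which is a choice made here to guard against circular references rather than something the source data is known to require.

```python
def flatten_iterative(sections, root=0):
    """Collect leaf references in document order without recursion."""
    leaves = []
    seen = {root}  # section indices that were already expanded
    # Pending references; reversed so that pop() returns them in order.
    stack = list(reversed(sections[root]["elements"]))
    while stack:
        ref = stack.pop()
        if ref.startswith("/sections/"):
            idx = int(ref.rsplit("/", 1)[1])
            if idx in seen:
                continue  # already expanded: avoid loops and repeats
            seen.add(idx)
            stack.extend(reversed(sections[idx]["elements"]))
        else:
            leaves.append(ref)
    return leaves


# Small made-up sample with two levels of nesting.
nested = [
    {"elements": ["/sections/1", "/sections/3"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/paragraphs/2", "/tables/0"]},
]
print(flatten_iterative(nested))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/tables/0']
```

The stack version avoids Python's recursion limit on very deep nesting, at the cost of being slightly harder to read.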