I am very new to Cosmos DB and need some help extracting certain information from a large JSON document.
My JSON looks like this:
{
    "id": "abc",
    "doc1": {
        "documentName": "xyz",
        "documentUrl": "xyz",
        "documentType": ".xlsx",
        "contents": [
            {
                "Date": "2022-06-10T00:00:00",
                "Type": "Interaction",
                "Subject": "ABC",
                "Description": "My name is ABC."
            },
            {
                "Date": "2022-12-01T00:00:00",
                "Type": "Interaction",
                "Subject": "DEF",
                "Description": "I live in a town named DEF."
            },
            {
                "Date": "2023-03-15T00:00:00",
                "Type": "Interaction",
                "Subject": "IJK",
                "Description": "He is known as IJK."
            }
        ]
    },
    "doc2": {
        "documentName": "wyc",
        "documentUrl": "wyc",
        "documentType": ".xlsx",
        "contents": [
            {
                "Date": "2023-12-05T00:00:00",
                "Type": "Task",
                "Subject": "KLM",
                "Description": "She has a friend who is called as KLM.",
                "Status": "Completed"
            },
            {
                "Date": "2023-03-15T00:00:00",
                "Type": "Task",
                "Subject": "ROQ",
                "Description": "The dessert is named as ROQ.",
                "Status": "Completed"
            },
            {
                "Date": "2023-07-15T00:00:00",
                "Type": "Task",
                "Subject": "VDI",
                "Description": "We need to know the name of the school that VDI goes to.",
                "Status": "Open"
            }
        ]
    },
    "doc3": {
        "documentName": "ckl",
        "documentUrl": "ckl",
        "documentType": ".pdf",
        "contents": [
            {
                "pageNo": 1,
                "pageText": "Hi this place is known to have awesome desserts."
            },
            {
                "pageNo": 2,
                "pageText": "Hello World."
            },
            {
                "pageNo": 3,
                "pageText": "It is a beautiful day."
            },
            {
                "pageNo": 4,
                "pageText": "Sorry I think you have reached the wrong number."
            }
        ]
    }
}
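I am trying to extract "Date", "Subject" and "Description" from doc1; "Date", "Subject", "Description" and "Status" from doc2; and "documentUrl" and "pageText" (only for "pageNo" 2 and 3) from doc3.
I have tried the answers to other Stack Overflow questions, but none of them worked for me. I would appreciate it if someone could help me with this.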
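You haven't really specified what you want the output to look like, so I have had to make some assumptions. Try this query and see whether it meets your requirements. Note that the array filter on doc3 is set to pages 2 and 3 only. You can of course change the query to put the data in its own fields rather than the way I have done it below: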
SELECT
    c.id,
    ARRAY(SELECT VALUE t.Date FROM t IN c.doc1.contents) AS doc1Date,
    ARRAY(SELECT VALUE t.Subject FROM t IN c.doc1.contents) AS doc1Subject,
    ARRAY(SELECT VALUE t.Description FROM t IN c.doc1.contents) AS doc1Description,
    ARRAY(SELECT VALUE t.Date FROM t IN c.doc2.contents) AS doc2Date,
    ARRAY(SELECT VALUE t.Status FROM t IN c.doc2.contents) AS doc2Status,
    c.doc3.documentUrl AS doc3url,
    ARRAY(SELECT VALUE t.pageText FROM t IN c.doc3.contents WHERE t["pageNo"] IN (2, 3)) AS doc3pageText
FROM c
WHERE c.id = "abc"
It returns a response like the following.
[
    {
        "id": "abc",
        "doc1Date": [
            "2022-06-10T00:00:00",
            "2022-12-01T00:00:00",
            "2023-03-15T00:00:00"
        ],
        "doc1Subject": [
            "ABC",
            "DEF",
            "IJK"
        ],
        "doc1Description": [
            "My name is ABC.",
            "I live in a town named DEF.",
            "He is known as IJK."
        ],
        "doc2Date": [
            "2023-12-05T00:00:00",
            "2023-03-15T00:00:00",
            "2023-07-15T00:00:00"
        ],
        "doc2Status": [
            "Completed",
            "Completed",
            "Open"
        ],
        "doc3url": "ckl",
        "doc3pageText": [
            "Hello World.",
            "It is a beautiful day."
        ]
    }
]
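The query above returns one parallel array per property and does not project the "Subject" and "Description" of doc2, which the question also asked for. If you would rather keep the fields of each entry together, here is a rough variant. It is only a sketch and I have not run it against your container. It assumes that an ARRAY subquery without VALUE returns one object per array element, and the aliases doc1Items, doc2Items, doc3Url and doc3PageText are names I made up for illustration.

```sql
-- Sketch only: doc1Items, doc2Items, doc3Url and doc3PageText are hypothetical aliases.
-- If your query editor rejects comment lines, remove these two lines.
SELECT
    c.id,
    ARRAY(
        SELECT t.Date, t.Subject, t.Description
        FROM t IN c.doc1.contents
    ) AS doc1Items,
    ARRAY(
        SELECT t.Date, t.Subject, t.Description, t.Status
        FROM t IN c.doc2.contents
    ) AS doc2Items,
    c.doc3.documentUrl AS doc3Url,
    ARRAY(
        SELECT VALUE t.pageText
        FROM t IN c.doc3.contents
        WHERE t.pageNo IN (2, 3)
    ) AS doc3PageText
FROM c
WHERE c.id = "abc"
```

With this shape, each element of doc1Items and doc2Items should be a small object holding the selected properties of one entry, so a "Subject" stays next to its own "Date" and "Description" instead of being matched up by array position.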