I am building a Microsoft Fabric pipeline and I want to copy some data from an API URL into a Fabric Lakehouse. The problem I am facing is that it writes the same data to the table in the Lakehouse duplicated 864 times, and I don't understand why.
I also have an Authentication header and an OSvC-CREST-Application-Context header, which are required to retrieve data from this API.
I am doing this with a Copy data activity, and for now I just want to get the details of a single item using an API URL similar to this: https://mysite.example.com/services/rest/connect/v1.4/items/1
The result of this API call looks like this:
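In Python terms, the request I am making is roughly the following (the credential and context values here are placeholders, not my real ones):

```python
import urllib.request

def build_item_request(site: str, item_id: int,
                       auth: str, app_context: str) -> urllib.request.Request:
    """Build the GET request for one item, with the two required headers."""
    url = f"https://{site}/services/rest/connect/v1.4/items/{item_id}"
    return urllib.request.Request(url, headers={
        "Authorization": auth,
        "OSvC-CREST-Application-Context": app_context,
    })

# Placeholder credentials for illustration only.
req = build_item_request("mysite.example.com", 1, "Basic <base64>", "fabric-copy")
print(req.full_url)  # https://mysite.example.com/services/rest/connect/v1.4/items/1
```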
{
"id": 1,
"lookupName": "ITEM1",
"createdTime": "2016-01-19T11:08:02.000Z",
"updatedTime": "2024-11-06T08:57:50.000Z",
"locations": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/locations"
},
{
"rel": "full",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/locations/{location_id}",
"templated": true
}
]
},
"banner": {
"importanceFlag": {
"id": 3,
"lookupName": "High"
},
"text": " ",
"updatedByperson": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/persons/2"
},
{
"rel": "canonical",
"href": "https://mysite.example.com/services/rest/connect/v1.4/persons/2"
},
{
"rel": "describedby",
"href": "https://mysite.example.com/services/rest/connect/v1.4/metadata-catalog/persons",
"mediaType": "application/schema+json"
}
]
},
"updatedTime": "2016-01-22T15:00:26.000Z"
},
"customFields": {
"c": {
"maintenance": true,
"blacklist": false,
"org_remote": null,
"tip_cont": {
"id": 68,
"lookupName": "abc"
},
"amef_cnt": null,
"term_date": null,
"term_reason": null,
"phisical_country": "Romania"
}
},
"externalReference": null,
"attachments": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/attachments"
},
{
"rel": "full",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/attachments/{attachment_id}",
"templated": true
}
]
},
"industry": null,
"login": "log.in",
"name": "item1",
"nameFurigana": null,
"notes": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/notes"
},
{
"rel": "full",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/notes/{note_id}",
"templated": true
}
]
},
"itemHierarchy": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/itemHierarchy"
},
{
"rel": "full",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/itemHierarchy/{itemHierarchy_id}",
"templated": true
}
]
},
"parent": null,
"actionSettings": {
"acquiredDate": null,
"actionperson": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/persons/2"
},
{
"rel": "canonical",
"href": "https://mysite.example.com/services/rest/connect/v1.4/persons/2"
},
{
"rel": "describedby",
"href": "https://mysite.example.com/services/rest/connect/v1.4/metadata-catalog/persons",
"mediaType": "application/schema+json"
}
]
},
"total": {
"currency": {
"id": 2,
"lookupName": "RON"
},
"exchangeRate": null
}
},
"serviceSettings": {
"sLAInstances": {
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/serviceSettings/sLAInstances"
},
{
"rel": "full",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1/serviceSettings/sLAInstances/{sLAInstance_id}",
"templated": true
}
]
}
},
"source": {
"id": 10,
"lookupName": "Contact",
"parents": [
{
"id": 3,
"lookupName": "consola"
}
]
},
"supersededBy": null,
"links": [
{
"rel": "self",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1"
},
{
"rel": "canonical",
"href": "https://mysite.example.com/services/rest/connect/v1.4/items/1"
},
{
"rel": "describedby",
"href": "https://mysite.example.com/services/rest/connect/v1.4/metadata-catalog/items",
"mediaType": "application/schema+json"
}
]
}
I don't think pagination rules are needed here, but I initially set the pagination rule RFC5988 = True and then changed it to MaxRequestNumber = 1 just to test whether I would get a different result. I didn't; the result was the same.
The output after the pipeline run looks like this:
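As I understand it, the RFC 5988 pagination rule just follows the rel="next" URL in each response's Link header, and a single-item response has no such link, so the setting should make no difference here. Roughly this logic (parse_next is only my illustration, not anything from Fabric):

```python
import re
from typing import Optional

def parse_next(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from an RFC 5988 Link header, if any."""
    if not link_header:
        return None
    for part in link_header.split(","):
        m = re.search(r'<([^>]+)>\s*;\s*rel="next"', part)
        if m:
            return m.group(1)
    return None

# A list endpoint advertises the next page; a single-item response does not.
hdr = '<https://mysite.example.com/services/rest/connect/v1.4/items?page=2>; rel="next"'
print(parse_next(hdr))   # -> the ?page=2 URL
print(parse_next(None))  # None
```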
{
"dataRead": 16168,
"dataWritten": 5933,
"filesWritten": 1,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"rowsRead": 1,
"rowsCopied": 864,
"copyDuration": 17,
"throughput": 1.47,
"errors": [],
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 1,
"executionDetails": [
{
"source": {
"type": "RestService"
},
"sink": {
"type": "Lakehouse"
},
"status": "Succeeded",
"start": "12/13/2024, 4:35:22 PM",
"duration": 17,
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 1,
"profile": {
"queue": {
"status": "Completed",
"duration": 6
},
"transfer": {
"status": "Completed",
"duration": 11,
"details": {
"readingFromSource": {
"type": "RestService",
"workingDuration": 0,
"timeToFirstByte": 0
},
"writingToSink": {
"type": "Lakehouse",
"workingDuration": 0
}
}
}
},
"detailedDurations": {
"queuingDuration": 6,
"timeToFirstByte": 0,
"transferDuration": 11
}
}
],
"dataConsistencyVerification": {
"VerificationResult": "Unsupported"
}
}
Why am I getting duplicated values, and how can I fix it? Or is there perhaps another way to get the data from the API, such as fetching it with a Web activity and then writing it to the Lakehouse with another activity?
You can follow the steps below to achieve your requirement. Create a pipeline and run a Web activity to retrieve the API details, including the page count.
After the Web activity executes successfully, create a REST API linked service for the source and a Lakehouse linked service for the sink. On the success path of the Web activity, add a ForEach activity and set the following range function as the ForEach items:
@range(1,activity('restAPI').output.total_pages)
Inside the ForEach, add a Copy activity. Using the REST linked service you created, create a REST API dataset with a dataset parameter rurl and the value @dataset().rurl as its relative URL. Add it as the source of the Copy activity, set GET as the request method, and pass ?page=@{item()} as the value of rurl.
Using the Lakehouse linked service, create a Lakehouse table dataset with a dataset parameter tablename. Add it as the sink with the value Apipage@{item()} for tablename and enable the auto-create table option. Map the data as required, then debug the pipeline; it will execute successfully and the data will be copied.
The sample API has two pages, which is why two tables containing the full data were created.
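The ForEach pattern above, read total_pages once and then copy each ?page=N into its own table, can be sketched outside Fabric like this (get_page and write_table are hypothetical stand-ins for the Copy activity's REST source and Lakehouse sink):

```python
def copy_all_pages(base_url, total_pages, get_page, write_table):
    """Mirror the pipeline: one Copy activity run per page, one table per page."""
    for n in range(1, total_pages + 1):          # @range(1, total_pages)
        rows = get_page(f"{base_url}?page={n}")  # rurl = ?page=@{item()}
        write_table(f"Apipage{n}", rows)         # tablename = Apipage@{item()}

# Tiny in-memory stand-ins for the REST source and Lakehouse sink.
fake_api = {f"x?page={n}": [{"page": n}] for n in (1, 2)}
tables = {}
copy_all_pages("x", 2, fake_api.__getitem__,
               lambda name, rows: tables.update({name: rows}))
print(sorted(tables))  # ['Apipage1', 'Apipage2']
```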