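How can we create a dimensional model on Delta Lake using a Synapse Serverless SQL pool, without using Azure Analytics Services?
Please help me with the steps...
I found that it can be done on Data Lake.

To create a dimensional model on Delta Lake using a Synapse Serverless SQL pool, you can follow these steps: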
Create the required data source and the source file formats, such as Parquet and delimited text, using the following code:
CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat]
WITH (
FORMAT_TYPE = PARQUET
);
CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat]
WITH ( FORMAT_TYPE = DELIMITEDTEXT ,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = FALSE
));
CREATE EXTERNAL DATA SOURCE [files_badls_dfs_core_windows_net]
WITH (
LOCATION = 'abfss://<containerName>@<ADLSName>.dfs.core.windows.net'
);
Create the source external table using the following code:
CREATE EXTERNAL TABLE dbo.employee (
[EMPLOYEE_ID] Int,
[FIRST_NAME] nvarchar(4000),
[LAST_NAME] nvarchar(4000),
[EMAIL] nvarchar(4000),
[PHONE_NUMBER] nvarchar(4000),
[HIRE_DATE] nvarchar(4000),
[JOB_ID] nvarchar(4000),
[SALARY] INT,
[COMMISSION_PCT] nvarchar(4000),
[MANAGER_ID] INT,
[DEPARTMENT_ID] INT
)
WITH (
LOCATION = 'inputs/employees.csv',
DATA_SOURCE = [files_badls_dfs_core_windows_net],
FILE_FORMAT = [SynapseDelimitedTextFormat]
);
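Before moving on, it may be worth checking that the external table actually resolves to the CSV file. A minimal sketch, assuming the file format, data source, and table above were created exactly as written:

```sql
-- Sanity check: read a few rows through the external table defined above.
-- If the path or the file format is wrong, this query fails or returns garbled columns.
SELECT TOP 10
    EMPLOYEE_ID,
    FIRST_NAME,
    LAST_NAME,
    HIRE_DATE
FROM dbo.employee;
```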
For the initial dimension load, extract the source data from the CSV file and load it into the data lake using the following code:
CREATE EXTERNAL TABLE Dimemployee WITH
(
LOCATION = 'datawarehouse/conformed/dimemployee/1',
DATA_SOURCE= files_badls_dfs_core_windows_net,
FILE_FORMAT = SynapseParquetFormat
)
AS
SELECT ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) as Employeekey,
EMPLOYEE_ID as EmployeeBusinesskey,
FIRST_NAME,
GETDATE() as DateTimeLoaded
FROM dbo.employee;
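The dimension data is loaded into a sequence-numbered folder structure; the initial load above, for example, writes to datawarehouse/conformed/dimemployee/1.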
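A Serverless SQL pool cannot update files in place, so each later load would typically go to the next numbered folder, with a view reading across all of them. The following is only a sketch of that idea, not the original author's method. The view name vwDimEmployee, the table name Dimemployee_2, and the folder number 2 are assumptions made for illustration.

```sql
-- Hypothetical view that reads every numbered load folder of the dimension.
CREATE VIEW dbo.vwDimEmployee
AS
SELECT d.Employeekey, d.EmployeeBusinesskey, d.FIRST_NAME, d.DateTimeLoaded
FROM OPENROWSET(
    BULK 'datawarehouse/conformed/dimemployee/*/*.parquet',
    DATA_SOURCE = 'files_badls_dfs_core_windows_net',
    FORMAT = 'PARQUET'
) AS d;
GO

-- Hypothetical incremental load: write only new employees to folder 2,
-- continuing the surrogate key after the highest key already loaded.
CREATE EXTERNAL TABLE dbo.Dimemployee_2 WITH
(
    LOCATION = 'datawarehouse/conformed/dimemployee/2',
    DATA_SOURCE = files_badls_dfs_core_windows_net,
    FILE_FORMAT = SynapseParquetFormat
)
AS
SELECT ROW_NUMBER() OVER (ORDER BY s.EMPLOYEE_ID) + m.MaxKey AS Employeekey,
       s.EMPLOYEE_ID AS EmployeeBusinesskey,
       s.FIRST_NAME,
       GETDATE() AS DateTimeLoaded
FROM dbo.employee AS s
CROSS JOIN (SELECT ISNULL(MAX(Employeekey), 0) AS MaxKey
            FROM dbo.vwDimEmployee) AS m
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.vwDimEmployee AS d
                  WHERE d.EmployeeBusinesskey = s.EMPLOYEE_ID);
```

Queries against the dimension would then go through the view rather than through the individual external tables, so that new folders are picked up without further changes.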
For further steps, you can follow this.