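I am trying to import some packages from the Snowflake conda channel in a Python model inside dbt. But when I run dbt build on the model, I get the following error:
Cannot create a Python function with the specified packages. Please check your packages specification and try again.
However, when I compile the same code in the dbt Cloud IDE and run it in Snowsight, it works fine (I have to select the packages to import from the "Anaconda Packages" tab of the "Packages" section in Snowsight).
Here is my model code: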
from datetime import datetime
import pandas as pd
from snowflake.snowpark.functions import col
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss
import numpy as np


def model(dbt, session):
    dbt.config(
        packages = [
            'faiss-cpu',
            'numpy',
            'pandas',
            'scikit-learn',
        ]
    )

    # Normalize scraped product names: strip accents, upper-case, keep alphanumerics, collapse spaces
    scrapped_products = dbt.ref('mart_dwh__d_products').to_pandas()
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.upper()
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.replace('[^a-zA-Z0-9 ]', ' ', regex=True)
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.replace(' +', ' ', regex=True)

    # Apply the same normalization to the customer catalog
    customer_products = dbt.ref('seed__mil_products_catalog').to_pandas()
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.upper()
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.replace('[^a-zA-Z0-9 ]', ' ', regex=True)
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.replace(' +', ' ', regex=True)

    # Character n-gram TF-IDF vectors, fitted on the scraped names
    tfidf_vectorizer = TfidfVectorizer(analyzer='char', ngram_range=(1, 3))
    scrapped_product_vectors = tfidf_vectorizer.fit_transform(scrapped_products['PRODUCT_NAME']).toarray()
    customer_product_vectors = tfidf_vectorizer.transform(customer_products['PRODUCT_NAME']).toarray()

    # Nearest-neighbour search: one closest scraped product per customer product
    d = scrapped_product_vectors.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(np.array(scrapped_product_vectors, dtype=np.float32))
    distances, indices = index.search(np.array(customer_product_vectors, dtype=np.float32), 1)

    match_result = []
    for i, (idx_row, dist_row) in enumerate(zip(indices, distances)):
        match_result.append({
            'CUSTOMER_PRODUCT_ID': customer_products.iloc[i]['PRODUCT_ID'],  # ID of the current customer product
            'SCRAPPED_PRODUCT_ID': scrapped_products.iloc[idx_row[0]]['ID'],  # Matched scraped product ID
            'SIMILARITY_RATIO': 1 / (1 + dist_row[0]),  # Convert the (squared) L2 distance to a similarity score
            'PROCESSED_AT': datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")  # Timestamp of processing
        })

    final_df = pd.DataFrame(match_result)
    session.use_database(dbt.this.database)
    session.use_schema(dbt.this.schema)
    return_df = session.create_dataframe(final_df)
    return return_df
How I get it to work in Snowsight:
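I would like to know what I am missing. I have tried various approaches, but none of them seem to work; I always end up with the same error. I also tried uploading my packages to a stage and referencing them in my code, but that produced the same error as well.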
EDIT
Python version of the procedure generated by dbt:
Error when trying to use Python 3.8 in Snowsight:
Is there a way to set a custom Python version in dbt?
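For reference, a minimal sketch of what pinning the version might look like. It assumes the installed dbt-snowflake adapter accepts a `python_version` key in `dbt.config`, which should be checked against the adapter documentation for the version in use. The value "3.11" is only an illustrative choice, and the model body is reduced to a placeholder:

```python
def model(dbt, session):
    # Assumption: `python_version` is honored by the installed dbt-snowflake adapter.
    dbt.config(
        materialized="table",
        python_version="3.11",  # illustrative; pick a version for which every package is published
        packages=["faiss-cpu", "numpy", "pandas", "scikit-learn"],
    )
    # Placeholder body: the real model would build and return the match table.
    return dbt.ref("mart_dwh__d_products")
```

If the key is accepted, the generated stored procedure should be created with that runtime instead of the default, which would make the package resolution match what Snowsight does when a newer Python version is selected.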