Loading Snowflake conda packages in a dbt Python model


I'm trying to import some packages from the Snowflake conda channel in a Python model inside dbt. But when I run dbt build on the model, I get the following error:

Cannot create a Python function with the specified packages. Please check your packages specification and try again.

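However, when I compile the same code in the dbt Cloud IDE and run it in Snowsight, it works fine (in Snowsight I have to pick the packages to import from the "Anaconda Packages" tab of the "Packages" menu).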

Here is my model code:

from datetime import datetime
import pandas as pd
from snowflake.snowpark.functions import col
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss
import numpy as np

def model(dbt, session):

    dbt.config(
        packages = [
            'faiss-cpu',
            'numpy',
            'pandas',
            'scikit-learn',
        ]
    )

    # Scraped products: strip accents, upper-case, keep only letters, digits and single spaces
    scrapped_products = dbt.ref('mart_dwh__d_products').to_pandas()
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.upper()
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.replace('[^a-zA-Z0-9 ]', ' ', regex=True)
    scrapped_products['PRODUCT_NAME'] = scrapped_products['PRODUCT_NAME'].str.replace(' +', ' ', regex=True)

    # Customer catalog: same cleanup
    customer_products = dbt.ref('seed__mil_products_catalog').to_pandas()
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.upper()
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.replace('[^a-zA-Z0-9 ]', ' ', regex=True)
    customer_products['PRODUCT_NAME'] = customer_products['PRODUCT_NAME'].str.replace(' +', ' ', regex=True)

    # Character 1- to 3-gram TF-IDF vectors; the vocabulary is fitted on the scraped names only
    tfidf_vectorizer = TfidfVectorizer(analyzer='char', ngram_range=(1, 3))
    scrapped_product_vectors = tfidf_vectorizer.fit_transform(scrapped_products['PRODUCT_NAME']).toarray()
    customer_product_vectors = tfidf_vectorizer.transform(customer_products['PRODUCT_NAME']).toarray()

    # Exact (brute-force) L2 index over the scraped product vectors; faiss expects float32
    d = scrapped_product_vectors.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(np.array(scrapped_product_vectors, dtype=np.float32))

    # For each customer product, find the single closest scraped product (k=1)
    distances, indices = index.search(np.array(customer_product_vectors, dtype=np.float32), 1)

    match_result = []

    for i, (neighbor_ids, neighbor_dists) in enumerate(zip(indices, distances)):
        match_result.append({
            'CUSTOMER_PRODUCT_ID': customer_products.iloc[i]['PRODUCT_ID'],  # ID of the current customer product
            'SCRAPPED_PRODUCT_ID': scrapped_products.iloc[neighbor_ids[0]]['ID'],  # Matched scrapped product ID
            'SIMILARITY_RATIO': 1 / (1 + neighbor_dists[0]),  # IndexFlatL2 returns squared L2 distances; map them to (0, 1]
            'PROCESSED_AT': datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")  # Timestamp of processing
        })

    final_df = pd.DataFrame(match_result)

    session.use_database(dbt.this.database)
    session.use_schema(dbt.this.schema)
    return_df = session.create_dataframe(final_df)

    return return_df
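The same name cleanup is applied to both DataFrames. A plain-Python equivalent of that chain for a single string can be unit-tested without a Snowflake session. This is a sketch, not part of the original model: `normalize_product_name` is a hypothetical name, and unlike the pandas `.str` accessor it does not pass missing values (NaN/None) through, so nullable columns need a guard.

```python
import re
import unicodedata


def normalize_product_name(name: str) -> str:
    """Hypothetical per-string equivalent of the pandas .str chain in the model."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with errors="ignore" then drops the marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", errors="ignore")
        .decode("utf-8")
    )
    # Upper-case, replace every non-alphanumeric character with a space,
    # then collapse runs of spaces (leading/trailing spaces are kept, as in the model).
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", " ", ascii_name.upper())
    return re.sub(r" +", " ", cleaned)


print(normalize_product_name("Café-Crème 250g"))  # CAFE CREME 250G
```

If the column has no nulls, `scrapped_products['PRODUCT_NAME'].map(normalize_product_name)` should give the same result as the four assignments in the model.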

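`faiss.IndexFlatL2` performs exact, exhaustive search and reports squared Euclidean distances, so `SIMILARITY_RATIO` is `1 / (1 + squared distance)`. For spot-checking a handful of rows, a pure-Python reference of the `k=1` search might look like this. `nearest_neighbor` is a hypothetical helper; it scans every vector (O(n·m·d)), so it is not a substitute for faiss on real data, and it breaks ties in favour of the lowest index, which faiss does not necessarily do.

```python
from typing import List, Sequence, Tuple


def nearest_neighbor(
    base: Sequence[Sequence[float]], queries: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """For each query, return (index of closest base vector, squared L2 distance)."""
    results = []
    for q in queries:
        best_idx, best_dist = -1, float("inf")
        for i, v in enumerate(base):
            # Squared Euclidean distance, the quantity IndexFlatL2 reports.
            dist = sum((a - b) ** 2 for a, b in zip(q, v))
            if dist < best_dist:  # strict "<" keeps the first index on ties
                best_idx, best_dist = i, dist
        results.append((best_idx, best_dist))
    return results


base = [[0.0, 0.0], [3.0, 4.0]]
queries = [[3.0, 3.0]]
for idx, dist in nearest_neighbor(base, queries):
    # Same similarity formula as the model: 1 / (1 + distance).
    print(idx, dist, 1 / (1 + dist))  # 1 1.0 0.5
```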
Here is how I get it to work in Snowsight:

[Screenshot: Snowsight]

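I'd like to know what I'm missing. I've tried several approaches, but none of them seem to work: I always get the same error. I also tried uploading my packages to a stage and referencing them in my code, but ended up with the same error.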


Edit

Python version of the procedure generated by dbt:

[Screenshot: dbt-generated procedure Python version]
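Roughly speaking, a Snowflake Python procedure declares both a `RUNTIME_VERSION` and a `PACKAGES` list, and the packages must be resolvable for that runtime. The sketch below renders those clauses from a dbt-style package list to make the pairing visible. `render_proc_header` is a hypothetical helper, the DDL dbt actually emits may differ, and it is an assumption here that `snowflake-snowpark-python` is always added to the list.

```python
from typing import Iterable


def render_proc_header(packages: Iterable[str], runtime_version: str) -> str:
    """Hypothetical helper: build the LANGUAGE/RUNTIME_VERSION/PACKAGES/HANDLER
    part of a Snowflake Python procedure definition."""
    # Assumption: the Snowpark package is always required by the handler.
    required = ["snowflake-snowpark-python"]
    # Keep the original order and drop duplicates.
    all_packages = list(dict.fromkeys(required + list(packages)))
    quoted = ", ".join(f"'{p}'" for p in all_packages)
    return (
        "LANGUAGE PYTHON\n"
        f"RUNTIME_VERSION = '{runtime_version}'\n"
        f"PACKAGES = ({quoted})\n"
        "HANDLER = 'main'"
    )


print(render_proc_header(["faiss-cpu", "numpy", "pandas", "scikit-learn"], "3.8"))
```

If any listed package has no build for the declared runtime, creating the procedure can fail, which is consistent with the error in the question.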

Error when trying to use Python 3.8 in Snowsight:

[Screenshot: Error when trying to use Python 3.8 in Snowsight]
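One way to check whether the error comes from a package/runtime mismatch is to look the packages up in Snowflake's `INFORMATION_SCHEMA.PACKAGES` view. This sketch assumes the view exposes `PACKAGE_NAME`, `VERSION`, `LANGUAGE` and `RUNTIME_VERSION` columns in your account and that the query is run through Snowpark's `session.sql(...).collect()`; `availability_query` and `missing_packages` are hypothetical helpers.

```python
from typing import Iterable, List, Mapping


def availability_query(packages: Iterable[str]) -> str:
    # Query against Snowflake's INFORMATION_SCHEMA.PACKAGES view.
    # Assumption: the view has a RUNTIME_VERSION column in your account.
    names = ", ".join(f"'{p}'" for p in packages)
    return (
        "SELECT package_name, version, runtime_version "
        "FROM information_schema.packages "
        f"WHERE language = 'python' AND package_name IN ({names})"
    )


def missing_packages(
    requested: Iterable[str], rows: Iterable[Mapping[str, str]], runtime: str
) -> List[str]:
    """Return the requested packages that have no row for the given Python runtime."""
    available = {
        row["PACKAGE_NAME"] for row in rows if row["RUNTIME_VERSION"] == runtime
    }
    return [p for p in requested if p not in available]


# Hypothetical usage inside a Snowpark session:
# pkgs = ["faiss-cpu", "numpy", "pandas", "scikit-learn"]
# rows = session.sql(availability_query(pkgs)).collect()
# print(missing_packages(pkgs, rows, runtime="3.8"))
```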

Is there a way to set a custom Python version in dbt?

python snowflake-cloud-data-platform dbt python-packaging
1 Answer

To set a custom Python version, I simply specified it in the config block, like this:

[Screenshot: Changing the Python version for a Python dbt model]
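The screenshot is not reproduced here; a sketch of what such a config block might look like, assuming dbt-snowflake's `python_version` model config. The value `"3.11"` is only an example: use a runtime your Snowflake account supports and for which every listed package is available.

```python
def model(dbt, session):
    # python_version sets the Python runtime of the procedure dbt-snowflake
    # creates for this model. "3.11" is an assumed example value.
    dbt.config(
        python_version="3.11",
        packages=["faiss-cpu", "numpy", "pandas", "scikit-learn"],
    )
    # ... rest of the model unchanged; a Snowpark DataFrame must be returned ...
    return dbt.ref("mart_dwh__d_products")
```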

And it works fine.

[Screenshot: Execution output]

© www.soinside.com 2019 - 2024. All rights reserved.