Upserting into a Pinecone index


I'm trying to upsert some data from a list into a Pinecone index using the following code:

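from transformers import BertTokenizer, BertModel
from pinecone import Pinecone, ServerlessSpec

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def generate_embeddings(text):
    # BERT accepts at most 512 tokens, so longer articles are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    outputs = model(**inputs)
    # Mean-pool the token vectors into a single 1-D numpy array (768 values)
    embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().detach().numpy()
    return embeddings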

embeddings = [generate_embeddings(article) for article in article_content]


pc = Pinecone(api_key="API_KEY")
pc.create_index(
    name="index-name",
    dimension=768, # bert-base-uncased produces 768-dimensional embeddings
    metric="cosine", # Replace with your model metric
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    ) 
)
index = pc.Index("index-name")

index.upsert(embeddings)

But I get the following error:


Traceback (most recent call last):
  File "c:\Users\tvish\llama api.py", line 158, in <module>
    index.upsert(embeddings)
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\utils\error_handling.py", line 11, in inner_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 175, in upsert
    return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 206, in _upsert_batch
    vectors=list(map(vec_builder, vectors)),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 202, in <lambda>
    vec_builder = lambda v: VectorFactory.build(v, check_type=_check_type)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\vector_factory.py", line 30, in build
    raise ValueError(f"Invalid vector value passed: cannot interpret type {type(item)}")
ValueError: Invalid vector value passed: cannot interpret type <class 'numpy.ndarray'>
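The last frame shows the client's VectorFactory.build (pinecone\data\vector_factory.py, line 30) rejecting an item by its type. The function below is a hypothetical, simplified stand-in for that kind of check, not the library's actual source. It assumes, as Pinecone's upsert documentation describes, that each item should be an (id, values) tuple or a dict with "id" and "values" keys. Under that assumption a numpy.ndarray is rejected, and a bare list of floats would most likely be rejected the same way, because neither carries an ID.

```python
from collections.abc import Mapping


def build_vector(item):
    """Hypothetical, simplified model of a per-item type check in upsert."""
    if isinstance(item, tuple):
        # (id, values) form
        vector_id, values = item[0], item[1]
        return {"id": vector_id, "values": list(values)}
    if isinstance(item, Mapping):
        # {"id": ..., "values": ...} form
        return {"id": item["id"], "values": list(item["values"])}
    # Anything else (numpy arrays, bare lists, scalars) is not recognized
    raise ValueError(f"Invalid vector value passed: cannot interpret type {type(item)}")
```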

This is my first time working with vectors, and I found most of this code online. Does anyone know what this error means and how to fix it?

Thanks

python vectorization pinecone
1 Answer

Your embeddings are numpy array objects, not the lists of floats that Pinecone's upsert method expects. Try calling .tolist() on each array to convert it into a list of floats.
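A minimal sketch of that conversion, assuming the index accepts a list of (id, values) tuples as shown in Pinecone's upsert documentation. The helper name to_upsert_records and the default "article-<n>" IDs are hypothetical; the optional dimension check is there because the index dimension must match the embedding length (768 for bert-base-uncased).

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def to_upsert_records(
    embeddings: Iterable,
    ids: Optional[Sequence[str]] = None,
    dimension: Optional[int] = None,
) -> List[Tuple[str, List[float]]]:
    """Hypothetical helper: convert embeddings into (id, values) tuples.

    `ids` defaults to made-up "article-<n>" IDs; `dimension`, if given,
    should equal the dimension the index was created with.
    """
    records: List[Tuple[str, List[float]]] = []
    for i, emb in enumerate(embeddings):
        # numpy.ndarray exposes .tolist(); other sequences are copied with list()
        values = emb.tolist() if hasattr(emb, "tolist") else list(emb)
        values = [float(x) for x in values]  # plain Python floats
        if dimension is not None and len(values) != dimension:
            raise ValueError(
                f"vector {i} has {len(values)} values, index expects {dimension}"
            )
        vector_id = ids[i] if ids is not None else f"article-{i}"
        records.append((vector_id, values))
    return records
```

With the question's variables, the call would then look roughly like index.upsert(vectors=to_upsert_records(embeddings, dimension=768)), provided the index was created with dimension=768 rather than 4096.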

© www.soinside.com 2019 - 2024. All rights reserved.