I'm trying to upsert some data from a list into a Pinecone index using the following code:
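from transformers import BertTokenizer, BertModel
from pinecone import Pinecone, ServerlessSpec

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def generate_embeddings(text):
    # Truncate to BERT's 512-token limit so long articles don't fail
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    outputs = model(**inputs)
    # Mean-pool the token embeddings into one 768-dimensional vector (a numpy array)
    embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().detach().numpy()
    return embeddings

# article_content: the list of article strings to embed
embeddings = [generate_embeddings(article) for article in article_content]

pc = Pinecone(api_key="API_KEY")
pc.create_index(
    name="index-name",
    dimension=768,  # bert-base-uncased produces 768-dimensional embeddings
    metric="cosine",  # Replace with your model metric
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
index = pc.Index("index-name")
index.upsert(embeddings)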
But I get the following error:
Traceback (most recent call last):
File "c:\Users\tvish\llama api.py", line 158, in <module>
index.upsert(embeddings)
File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\utils\error_handling.py", line 11, in inner_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 175, in upsert
return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 206, in _upsert_batch
vectors=list(map(vec_builder, vectors)),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 202, in <lambda>
vec_builder = lambda v: VectorFactory.build(v, check_type=_check_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\vector_factory.py", line 30, in build
raise ValueError(f"Invalid vector value passed: cannot interpret type {type(item)}")
ValueError: Invalid vector value passed: cannot interpret type <class 'numpy.ndarray'>
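This is my first time working with vectors, and most of this code is something I just found online. Does anyone know what this error means and how to fix it?
Thanks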
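Your embeddings are NumPy array objects, not the lists of floats that Pinecone's upsert method expects. Try calling .tolist() on each array to convert it to a list of floats. Note that upsert also needs an ID for every vector: pass a list of (id, values) tuples (or dicts with "id" and "values" keys) rather than bare lists of floats, otherwise you will get the same "cannot interpret type" error, just for <class 'list'> instead of numpy.ndarray.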
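A minimal sketch of that conversion, assuming the Pinecone v3+ Python client, whose upsert accepts (id, values) tuples (the VectorFactory.build frame in the traceback is where tuples and dicts are accepted and anything else is rejected). The helper name to_pinecone_vectors and the default "0", "1", ... ID scheme are hypothetical; in practice you would use IDs that identify your articles. It only depends on NumPy, so it can be tried without a Pinecone account.

```python
import numpy as np


def to_pinecone_vectors(embeddings, ids=None, dimension=None):
    """Hypothetical helper: turn NumPy embeddings into (id, values) tuples.

    embeddings: iterable of numpy arrays, e.g. the output of generate_embeddings.
    ids: optional list of string IDs; defaults to "0", "1", ... (illustrative only).
    dimension: optional expected vector length (the index dimension) to check.
    """
    embeddings = list(embeddings)
    if ids is None:
        ids = [str(i) for i in range(len(embeddings))]
    if len(ids) != len(embeddings):
        raise ValueError("need exactly one ID per embedding")

    vectors = []
    for vec_id, emb in zip(ids, embeddings):
        # Flatten shapes like (1, 768) to (768,), then convert the numpy
        # floats into plain Python floats with .tolist().
        values = np.asarray(emb, dtype=float).ravel().tolist()
        if dimension is not None and len(values) != dimension:
            raise ValueError(
                f"vector {vec_id!r} has {len(values)} values, index expects {dimension}"
            )
        vectors.append((vec_id, values))
    return vectors


if __name__ == "__main__":
    # Random stand-ins for real BERT embeddings
    fake = [np.random.rand(768).astype(np.float32) for _ in range(3)]
    vectors = to_pinecone_vectors(fake, dimension=768)
    print(vectors[0][0], len(vectors[0][1]), type(vectors[0][1][0]).__name__)
```

With the code from the question, the call would then look something like index.upsert(vectors=to_pinecone_vectors(embeddings, dimension=768)). The dimension check is there because bert-base-uncased produces 768-dimensional vectors, and Pinecone rejects vectors whose length does not match the dimension the index was created with; catching that locally gives a clearer message.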