I am using the Llama 3 model from the Hugging Face library in a Kaggle notebook, and I hit this error when running the pipeline module. I have removed most of the stack trace, because otherwise the post would be almost entirely code with no description.
RuntimeError Traceback (most recent call last)
Cell In[19], line 17, in Llama_Chat(system_role, user_msg)
     12 def Llama_Chat(system_role,user_msg):
     13     messages = [
     14         {"role": "system", "content": system_role},
     15         {"role": "user", "content": user_msg},
     16     ]
---> 17     outputs = pipeline(
     18         messages,
     19         max_new_tokens=256,
     20         temperature = 0.1
     21
     22     )
     24     reply=outputs[0]["generated_text"][-1]["content"]
     25     return reply
File /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:169, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
167 output = module._old_forward(*args, **kwargs)
168 else:
--> 169 output = module._old_forward(*args, **kwargs)
170 return module._hf_hook.post_forward(module, output)
File /opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:603, in LlamaSdpaAttention.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, cache_position, position_embeddings, **kwargs)
599 # We dispatch to SDPA's Flash Attention or Efficient kernels via this `is_causal` if statement instead of an inline conditional assignment
600 # in SDPA to support both torch.compile's dynamic shapes and full graph options. An inline conditional prevents dynamic shapes from compiling.
601 is_causal = True if causal_mask is None and q_len > 1 else False
--> 603 attn_output = torch.nn.functional.scaled_dot_product_attention(
604 query_states,
605 key_states,
606 value_states,
607 attn_mask=causal_mask,
608 dropout_p=self.attention_dropout if self.training else 0.0,
609 is_causal=is_causal,
610 )
612 attn_output = attn_output.transpose(1, 2).contiguous()
613 attn_output = attn_output.view(bsz, q_len, -1)
RuntimeError: cutlassF: no kernel found to launch!
This is the error I get when running a Hugging Face model with the transformers library on Kaggle. I have checked the CUDA and PyTorch versions and they look fine. ChatGPT, Claude, etc. all suggest a version mismatch, but that has not gotten me anywhere.
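For reference, this is roughly how I checked the environment (a minimal sketch; the exact GPU name depends on which Kaggle accelerator is selected):

import torch

print(torch.__version__)                     # PyTorch build, e.g. 2.x.x+cuXXX
print(torch.version.cuda)                    # CUDA version PyTorch was built against
print(torch.cuda.is_available())             # True if a GPU is visible
print(torch.cuda.get_device_name(0))         # e.g. Tesla T4 / P100 on Kaggle
print(torch.cuda.get_device_capability(0))   # compute capability, e.g. (7, 5) for T4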
Try disabling the following SDPA backends (set them to False) before running the pipeline:
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
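Disabling these two backends makes torch.nn.functional.scaled_dot_product_attention fall back to the plain math kernel instead of the cutlass/flash kernels that raise "cutlassF: no kernel found to launch!" on this GPU. Below is a minimal sketch of where the calls would go; the model id and dtype are assumptions (meta-llama/Meta-Llama-3-8B-Instruct in float16), not taken from your post, and the switches must run before the pipeline is first called:

import torch
from transformers import pipeline

# Force SDPA to fall back to the math kernel; the flash / memory-efficient
# kernels are the ones failing with "cutlassF: no kernel found to launch!".
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model id
    torch_dtype=torch.float16,                    # assumed dtype
    device_map="auto",
)

def Llama_Chat(system_role, user_msg):
    messages = [
        {"role": "system", "content": system_role},
        {"role": "user", "content": user_msg},
    ]
    outputs = pipe(
        messages,
        max_new_tokens=256,
        do_sample=True,      # temperature only has an effect when sampling
        temperature=0.1,
    )
    return outputs[0]["generated_text"][-1]["content"]

print(Llama_Chat("You are a helpful assistant.", "Hello!"))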