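The code below produces very poor audio (output.mp3) compared with what I hear when I test the same voice on the ElevenLabs website. What do you think is causing this? Is something wrong with my settings?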
import absl.flags
import absl.app
import absl.logging
import google.generativeai as genai
import requests
import os
import pygame
# Disable unnecessary logs
absl.flags.FLAGS.stderrthreshold = "FATAL"
# Configure your API key
genai.configure(api_key="GEMİNİ_APİ") # Gemini API
# Initialize Pygame Mixer
pygame.mixer.init()
class Response:
    def text(self, prompt, question):
        """Sends the prompt and question to the Gemini API and returns the response in text format."""
        self.prompt = prompt
        self.question = question
        # Combine prompt and question
        full_question = f"{self.prompt}\n{self.question}"
        # Send a request to the Gemini API
        model = genai.GenerativeModel("gemini-1.5-flash")
        self.response = model.generate_content(full_question)

    def text_response(self):
        # Display the response on the screen
        print(self.response.text)

    def voice_response(self):
        url = "https://api.elevenlabs.io/v1/text-to-speech/68gbrBPLYTEZzIIJ0apU" # Voice model API
        querystring = {"optimize_streaming_latency":"2"}
        payload = {
            "text": self.response.text,
            "voice_settings": {
                "stability": 0.35,
                "similarity_boost": 0.85,
                "style": 0.55
            }
        }
        headers = {
            "xi-api-key": "ELEVENLABS_APİ_KEY",
            "Content-Type": "application/json"
        }
        response_voice = requests.request("POST", url, json=payload, headers=headers, params=querystring)
        if response_voice.status_code == 200:
            with open("output.mp3", "wb") as file:
                file.write(response_voice.content)
            print("Audio successfully created and saved to 'output.mp3'.")
            # Play the audio file
            pygame.mixer.music.load("output.mp3")
            pygame.mixer.music.play()
            # Wait until the audio playback is complete
            while pygame.mixer.music.get_busy():
                pygame.time.Clock().tick(10)
        else:
            print(f"Error: {response_voice.status_code} - {response_voice.text}")
# Main function
def main(argv):
    prompt = """You are a friendly, polite, and respectful male employee responsible for guiding patients to the correct department and floor in a hospital.
You don't talk about things you don't know.
I will give you information about the departments and floors in the hospital. You will answer the questions asked to you based on this information!
If a patient has a problem, help them, approach them with good intentions, share your feelings with them, and give them moral support.
Departments on the 1st floor: Anesthesiology and Reanimation, Appointment making, Brain and Neurosurgery, and Pediatric Surgery.
Directions to the departments on the 1st floor:
1. Anesthesiology and Reanimation: Go straight through door A1, it is the last door on the right.
2. Appointment making: You will see it immediately to the right of the entrance.
3. Brain and Neurosurgery: It is the 2nd door on the left from door A2.
4. Neurosurgery: Go straight through door C1, it is the last right door on the 1st left.
5. Pediatric Surgery: Go through C1 and it is the first door on the right.
Based on this information, guide the people who come to you and always remember to not ask for anything more after your answer!
Your answers should not be too short, at least 3 lines.
"""
    question = input("Your question: ")
    response = Response()
    response.text(prompt, question)
    response.voice_response()
# Main program
if __name__ == '__main__':
    absl.app.run(main)
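This is an application in which an AI staff member guides people using the floor, area, and similar information provided by the hospital. The only problem I have with it right now is the one I mentioned: the voice sounds worse than what I get when I try it on the ElevenLabs website. If needed, feel free to suggest a different model and different settings. Note, however, that the model must be Turkish or at least support Turkish characters.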
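querystring = {"optimize_streaming_latency":"2"}
With this you are asking them to trade quality for speed (although this parameter appears to be deprecated, and I don't know how they currently handle it). Try lowering it: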
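0 - default mode (no latency optimizations)
1 - normal latency optimizations (about 50% of the possible latency improvement of option 3)
2 - strong latency optimizations (about 75% of the possible latency improvement of option 3)
3 - max latency optimizations
4 - max latency optimizations, but also with the text normalizer turned off for even more latency savings (best latency, but can mispronounce things such as numbers and dates)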
It also looks like the request defaults to the monolingual_v1 model, which is their older model.
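If you would rather keep calling the REST endpoint with requests, the two changes above could look roughly like the sketch below. It only builds the request pieces and does not send anything. build_tts_request is a hypothetical helper name, the voice settings are copied from the question, and placing model_id in the JSON body with eleven_multilingual_v2 as its value is my reading of the API reference linked at the end of this answer, so check it against the current docs (including whether that model covers Turkish for your voice).

```python
# Sketch only: builds the pieces of the HTTP request without sending it.
# build_tts_request is a hypothetical helper, not part of any ElevenLabs library.

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(voice_id, text, api_key,
                      model_id="eleven_multilingual_v2", latency=0):
    """Return (url, params, payload, headers) for a text-to-speech request."""
    if latency not in (0, 1, 2, 3, 4):
        raise ValueError("optimize_streaming_latency must be between 0 and 4")
    url = f"{API_BASE}/{voice_id}"
    # 0 means no latency optimizations, i.e. favor quality over speed.
    params = {"optimize_streaming_latency": str(latency)}
    payload = {
        "text": text,
        # Name the model explicitly instead of relying on the API default.
        "model_id": model_id,
        # Same voice settings as in the question.
        "voice_settings": {
            "stability": 0.35,
            "similarity_boost": 0.85,
            "style": 0.55,
        },
    }
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return url, params, payload, headers
```

You would then pass the four values to the call you already have, e.g. requests.post(url, json=payload, headers=headers, params=params), and keep the rest of voice_response unchanged.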
Try using their client and explicitly ask for a newer model, for example:
from elevenlabs import ElevenLabs

client = ElevenLabs(
    api_key="YOUR_API_KEY",
)
client.text_to_speech.convert(
    voice_id="68gbrBPLYTEZzIIJ0apU",
    output_format="mp3_44100_128",
    text="The first move is what sets everything in motion.",
    model_id="eleven_multilingual_v2",
)
https://elevenlabs.io/docs/api-reference/text-to-speech/convert
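Note that the client call above does not write a file by itself. As far as I know, recent versions of the elevenlabs package return the audio from convert as an iterator of byte chunks, so you still need to save it before handing it to pygame. A minimal sketch, assuming that return type (save_audio is a hypothetical helper name, not something the package provides):

```python
def save_audio(chunks, path="output.mp3"):
    """Write an iterable of byte chunks to `path` and return the bytes written."""
    total = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                file.write(chunk)
                total += len(chunk)
    return total


# Intended use (needs network access and a real API key, so left commented out):
# audio = client.text_to_speech.convert(
#     voice_id="68gbrBPLYTEZzIIJ0apU",
#     output_format="mp3_44100_128",
#     text="Merhaba, size nasıl yardımcı olabilirim?",
#     model_id="eleven_multilingual_v2",
# )
# save_audio(audio, "output.mp3")
```

After that, the pygame.mixer.music.load("output.mp3") and play() calls from the question should work as before.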