) can be heard.
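Ultimately, the goal is to use TTS over the phone, which matters to them. Per this response, I was able to confirm that the Google TTS demo does not support the <phoneme>
tag, so I turned to code to generate the output.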
// Script-style Main; assumes the Google.Cloud.TextToSpeech.V1 NuGet package is
// referenced and the Google.Cloud.TextToSpeech.V1, System, and System.IO namespaces are imported.
void Main()
{
    var builder = new TextToSpeechClientBuilder();
    builder.ApiKey = "<api key from Google TTS project>";
    var client = builder.Build();

    // The input to be synthesized, can be provided as text or SSML.
    var input = new SynthesisInput
    {
        Ssml = "<speak><phoneme alphabet=\"ipa\" ph=\"ˌjɑː.kɔː.ˈkiː\">dad</phoneme></speak>",
    };

    // Build the voice request.
    var voiceSelection = new VoiceSelectionParams
    {
        LanguageCode = "en-US",
        Name = "en-US-Wavenet-C",
    };

    // Specify the type of audio file.
    var audioConfig = new AudioConfig
    {
        AudioEncoding = AudioEncoding.Mp3
    };

    // Perform the text-to-speech request.
    var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);

    // Write the response to the output file.
    using (var output = File.Create(@"C:\Temp\yakoke.mp3"))
    {
        response.AudioContent.WriteTo(output);
    }
    Console.WriteLine("Audio content written to file \"yakoke.mp3\"");
}
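The code produces audio (this is what it generated for me), but the result is incorrect in two ways. First, the second syllable should sound like a long "o" (as in
"boy"). According to the supported phonemes and levels of stress for US English, the code
ɔː
should produce this sound. Instead, I hear a repeat of the first vowel.
Second, the stress markers do not affect the accent on any syllable. In the reference audio, the last syllable clearly receives the primary stress. What am I missing here?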
I tested this with Python, and here is my SSML:
<speak><phoneme alphabet=\"ipa\" ph=\"ˌjɑː.kʊ.ˈkiː\">yakoke</phoneme></speak>
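For anyone who wants to repeat the test, here is a minimal sketch of how the Python side might look. It assumes the google-cloud-texttospeech package is installed and that application default credentials are configured. The helper names build_phoneme_ssml and synthesize_to_file are hypothetical, and the voice name is simply copied from the C# code in the question.

```python
from xml.sax.saxutils import escape, quoteattr


def build_phoneme_ssml(word: str, ipa: str) -> str:
    """Wrap `word` in an SSML <phoneme> tag carrying an IPA transcription."""
    # quoteattr adds the surrounding quotes and escapes XML-special characters;
    # escape does the same for the fallback text between the tags.
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
        "</speak>"
    )


def synthesize_to_file(ssml: str, path: str,
                       voice_name: str = "en-US-Wavenet-C") -> None:
    """Send SSML to Cloud Text-to-Speech and write the MP3 bytes to `path`."""
    # Imported lazily so the SSML helper above works without the client library.
    # Assumption: credentials are picked up from the environment.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name=voice_name
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(path, "wb") as out:
        out.write(response.audio_content)
```

Calling synthesize_to_file once with the ɔː transcription from the question and once with the ʊ variant above, writing to two different files, makes it easy to compare the two vowels by ear.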