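I am using Google Cloud TTS with PHP to generate audio files. Large texts are split into small chunks, audio is generated for each chunk, and the resulting media is joined together and saved as an mp3 file.
This works fine when the audio is encoded as MP3, but with LINEAR16 encoding the merge does not work: the resulting audio contains only the first segment.
The documentation says the ->getAudioContent() method returns base64-encoded audio, but when called from PHP the result is not encoded; it is just the raw audio file. You can see it working with this file: https://storage.googleapis.com/gspeech-audio-storage/gspeech_en-US:2:I:1:0_ef4285edf0514f0b90636718b155c089_9df685f7e3bd36d8ec36cf4faec53c2d.mp3
I have included the code that produces the main content. Any idea how to combine LINEAR16-encoded audio files using PHP?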
...
use Google\Cloud\TextToSpeech\V1\AudioConfig;
use Google\Cloud\TextToSpeech\V1\AudioEncoding;
use Google\Cloud\TextToSpeech\V1\SsmlVoiceGender;
use Google\Cloud\TextToSpeech\V1\SynthesisInput;
use Google\Cloud\TextToSpeech\V1\TextToSpeechClient;
use Google\Cloud\TextToSpeech\V1\VoiceSelectionParams;
$client = new TextToSpeechClient();
$synthesisInputText = new SynthesisInput();
$audioConfig = new AudioConfig();
// $audioConfig->setAudioEncoding(AudioEncoding::MP3); works fine
$audioConfig->setAudioEncoding(AudioEncoding::LINEAR16);
$voice = new VoiceSelectionParams();
$voice->setLanguageCode('en-US');
$voice->setName('en-US-Neural2-A');
// Accumulates the binary audio returned for each text chunk.
$media_total = '';
foreach ($txt_array as $k => $txt) {
    $synthesisInputText->setText($txt);
    $response = $client->synthesizeSpeech($synthesisInputText, $voice, $audioConfig);
    // Raw bytes as returned by the PHP client (not base64 text).
    $media = $response->getAudioContent();
    // $media = base64_encode($media); did not work
    $media_total .= $media;
}
With LINEAR16 encoding, $media_total plays back only the audio of the first text chunk. With MP3 encoding, it works fine.
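One way to narrow this down is to check whether each LINEAR16 chunk already starts with its own RIFF/WAVE header. If it does, a player reading the concatenated data would likely trust the size declared by the first header and stop there. The sketch below is only a diagnostic under that assumption; `inspectWavPrefix` is a hypothetical helper, not part of the Google client library.

```php
<?php
// Hypothetical diagnostic helper: reports whether a binary string begins with
// a RIFF/WAVE header and how many bytes that header says the file contains.
function inspectWavPrefix(string $bytes): array
{
    if (strlen($bytes) < 12
        || substr($bytes, 0, 4) !== 'RIFF'
        || substr($bytes, 8, 4) !== 'WAVE') {
        return ['is_wav' => false, 'declared_bytes' => null];
    }
    // Bytes 4..7 hold the RIFF chunk size (little-endian, 32-bit);
    // the full file length is that value plus the 8 bytes before it.
    $riffSize = unpack('V', substr($bytes, 4, 4))[1];
    return ['is_wav' => true, 'declared_bytes' => $riffSize + 8];
}
```

Calling `inspectWavPrefix($media)` inside the loop and comparing `declared_bytes` with `strlen($media)` would show whether every chunk is a complete WAV file on its own.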
I tried adding setSampleRateHertz:
$rateHeartz = 16000;
$audioConfig->setSampleRateHertz($rateHeartz);
and wrapping the combined PCM data in a WAV file, like this:
$wav_header = createWavHeader(strlen($pcm_data), $rateHeartz, 1);
file_put_contents('output.wav', $wav_header . $pcm_data);
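// Builds the 44-byte RIFF/WAVE header for 16-bit PCM audio.
function createWavHeader($pcm_data_size, $sample_rate, $channels) {
    $block_align = $channels * 2;             // bytes per sample frame (16-bit samples)
    $byte_rate = $sample_rate * $block_align; // bytes per second
    // V = 32-bit little-endian, v = 16-bit little-endian.
    $header = pack('VVVVVvvVVvvVV',
        0x46464952,          // "RIFF"
        36 + $pcm_data_size, // RIFF chunk size
        0x45564157,          // "WAVE"
        0x20746d66,          // "fmt "
        16,                  // fmt chunk size
        1,                   // audio format: PCM
        $channels,
        $sample_rate,
        $byte_rate,
        $block_align,
        16,                  // bits per sample
        0x61746164,          // "data"
        $pcm_data_size
    );
    return $header;
}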
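For reference, here is a minimal sketch of one possible merge. It assumes each LINEAR16 response is a complete WAV file with its own header, so the header has to be removed from every chunk before the samples are joined and a single new header is written. The names `extractPcm`, `buildWavHeader` and `mergeLinear16` are hypothetical, and the sample rate passed in must match the one requested in the audio config.

```php
<?php
// Hypothetical helper: returns only the sample bytes of one WAV chunk by
// walking its RIFF sub-chunks until "data". Input that does not start with
// a RIFF/WAVE header is assumed to be headerless PCM and returned unchanged.
function extractPcm(string $wav): string
{
    if (strlen($wav) < 12
        || substr($wav, 0, 4) !== 'RIFF'
        || substr($wav, 8, 4) !== 'WAVE') {
        return $wav;
    }
    $length = strlen($wav);
    $offset = 12; // skip "RIFF", the size field and "WAVE"
    while ($offset + 8 <= $length) {
        $id   = substr($wav, $offset, 4);
        $size = unpack('V', substr($wav, $offset + 4, 4))[1];
        if ($id === 'data') {
            // Clamp in case the declared size exceeds the bytes actually present.
            return substr($wav, $offset + 8, min($size, $length - $offset - 8));
        }
        $offset += 8 + $size + ($size % 2); // sub-chunks are padded to even length
    }
    return ''; // no "data" sub-chunk found
}

// Hypothetical helper: builds a 44-byte header for 16-bit PCM.
function buildWavHeader(int $dataSize, int $sampleRate, int $channels): string
{
    $blockAlign = $channels * 2; // bytes per sample frame
    return pack(
        'A4VA4A4VvvVVvvA4V',
        'RIFF', 36 + $dataSize, 'WAVE',
        'fmt ', 16, 1, $channels, $sampleRate,
        $sampleRate * $blockAlign, $blockAlign, 16,
        'data', $dataSize
    );
}

// Strips the header from every chunk, joins the samples, and prepends one header.
function mergeLinear16(array $chunks, int $sampleRate, int $channels = 1): string
{
    $pcm = '';
    foreach ($chunks as $chunk) {
        $pcm .= extractPcm($chunk);
    }
    return buildWavHeader(strlen($pcm), $sampleRate, $channels) . $pcm;
}
```

With the loop changed to collect each `getAudioContent()` result in an array (say `$chunks`, a name used here only for illustration), the output could be written with `file_put_contents('output.wav', mergeLinear16($chunks, 16000));`. Searching for the "data" sub-chunk instead of cutting a fixed 44 bytes is a deliberate choice, since the header length is not guaranteed if extra sub-chunks are present.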