我正在尝试在客户端录制音频,将其发送到“服务器”,然后通过Sphinx4在“服务器”上使用语音转文本。我的代码:
public class SoundModifier implements Runnable
{
private static final String ACOUSTIC_MODEL = "resource:/edu/cmu/sphinx/models/en-us/en-us";
private static final String DICTIONARY_PATH = "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict";
private static final String GRAMMAR_PATH = "resource:/edu/cmu/sphinx/demo/dialog/";
private static final String LANGUAGE_MODEL = "resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin";
// other unrelated stuff
public SoundModifier(ConcurrentLinkedQueue inputQueue, ConcurrentLinkedQueue outputQueue, String saveFolder) throws IOException
{
Configuration configuration = new Configuration();
configuration.setAcousticModelPath(ACOUSTIC_MODEL);
configuration.setDictionaryPath(DICTIONARY_PATH);
configuration.setLanguageModelPath(LANGUAGE_MODEL);
configuration.setSampleRate(16000);
recognizer = new StreamSpeechRecognizer(configuration);
// other unrelated stuff
}
@Override
public void run()
{
var now = ZonedDateTime.now();
while(running)
{
while (inputQueue.size() > 0)
{
byte[] chunk = (byte[]) inputQueue.poll();
byte[] copy = Arrays.copyOf(chunk, chunk.length);
try
{
getText(copy);
}
catch (IOException ex)
{
Logger.getLogger(SoundModifier.class.getName()).log(Level.SEVERE, null, ex);
}
recordBytes.write(copy, 0, copy.length);
byte[][] send = new byte[][]{"audio".getBytes(), copy };
outputQueue.add(send);
}
}
String time = now.getYear() + "-" + now.getMonthValue() + "-" + now.getDayOfMonth() + "--" + now.getHour() + "-" + now.getMinute() + "-" + now.getSecond();
String filename = saveFolder + time + " SoundModifier.wav";
File file = new File(filename);
try
{
save(file);
}
catch (IOException ex)
{
Logger.getLogger(SoundRecorder.class.getName()).log(Level.WARNING, null, ex);
}
}
private ArrayList<WordResult> getText(byte[] input) throws IOException
{
ArrayList<WordResult> utteredWords = new ArrayList<>();
stream = new ByteArrayInputStream(input);
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null)
{
// var words = result.getWords();
// System.out.println("words: " + words);
// utteredWords.addAll(words);
System.out.format("Hypothesis: %s\n", result.getHypothesis());
serverFrame.setASRText(result.getHypothesis());
}
recognizer.stopRecognition();
return utteredWords;
}
public void save(File wavFile) throws IOException
{
byte[] audioData = recordBytes.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(audioData);
try (AudioInputStream audioInputStream = new AudioInputStream(bais, format, audioData.length / format.getFrameSize()))
{
AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, wavFile);
}
recordBytes.close();
LOGGER.log(Level.INFO, "recordBytes close");
}
}
这将产生以下输出:
11:23:33.703 INFO trieNgramModel LM Cache Size: 0 Hits: 0 Misses: 0
11:23:33.703 INFO speedTracker # ----------------------------- Timers----------------------------------------
11:23:33.703 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
11:23:33.703 INFO speedTracker Load Dictionary 46 0.0350s 0.0340s 0.0740s 0.0415s 1.9100s
11:23:33.703 INFO speedTracker Load AM 1 0.8700s 0.8700s 0.8700s 0.8700s 0.8700s
11:23:33.703 INFO speedTracker Frontend 184 0.0000s 0.0000s 0.0030s 0.0000s 0.0090s
11:23:33.703 INFO speedTracker Load LM 46 0.2640s 0.2320s 0.3450s 0.2699s 12.4150s
11:23:33.703 INFO speedTracker Score 184 0.0000s 0.0000s 0.0030s 0.0000s 0.0090s
11:23:33.703 INFO speedTracker Prune 460 0.0000s 0.0000s 0.0000s 0.0000s 0.0000s
11:23:33.703 INFO speedTracker Grow 644 0.0000s 0.0000s 0.0030s 0.0001s 0.0380s
11:23:33.703 INFO speedTracker Compile 46 0.3450s 0.2990s 0.6200s 0.3422s 15.7400s
11:23:33.703 INFO speedTracker Total Time Audio: 5.89s Proc: 0.03s 0.00 X real time
11:23:33.703 INFO memoryTracker Mem Total: 1186.00 Mb Free: 689.00 Mb
11:23:33.703 INFO memoryTracker Used: This: 497.00 Mb Avg: 657.31 Mb Max: 1468.03 Mb
11:23:33.703 INFO dictionary Loading dictionary from: jar:file:/C:/Users/???/.m2/repository/de/sciss/sphinx4-data/1.0.0/sphinx4-data-1.0.0.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
11:23:33.743 INFO dictionary Loading filler dictionary from: jar:file:/C:/Users/???/.m2/repository/de/sciss/sphinx4-data/1.0.0/sphinx4-data-1.0.0.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
11:23:33.743 INFO trieNgramModel Loading n-gram language model from: jar:file:/C:/Users/???/.m2/repository/de/sciss/sphinx4-data/1.0.0/sphinx4-data-1.0.0.jar!/edu/cmu/sphinx/models/en-us/en-us.lm.bin
11:23:33.902 INFO dictionary The dictionary is missing a phonetic transcription for the word '3-d'
11:23:33.903 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word '3-d'
11:23:33.903 INFO dictionary The dictionary is missing a phonetic transcription for the word 'adjustors'
11:23:33.904 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'adjustors'
11:23:33.904 INFO dictionary The dictionary is missing a phonetic transcription for the word 'adulyadej'
11:23:33.904 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'adulyadej'
11:23:33.915 INFO dictionary The dictionary is missing a phonetic transcription for the word 'chloroflourocarbons'
11:23:33.915 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'chloroflourocarbons'
11:23:33.925 INFO dictionary The dictionary is missing a phonetic transcription for the word 'déjà'
11:23:33.925 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'déjà'
11:23:33.940 INFO dictionary The dictionary is missing a phonetic transcription for the word 'iife'
11:23:33.940 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'iife'
11:23:33.952 INFO dictionary The dictionary is missing a phonetic transcription for the word 'mm-hm'
11:23:33.952 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'mm-hm'
11:23:33.952 INFO dictionary The dictionary is missing a phonetic transcription for the word 'mm-hmm'
11:23:33.952 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'mm-hmm'
11:23:33.952 INFO dictionary The dictionary is missing a phonetic transcription for the word 'mmmm'
11:23:33.952 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'mmmm'
11:23:33.954 INFO dictionary The dictionary is missing a phonetic transcription for the word 'ngo's'
11:23:33.954 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'ngo's'
11:23:33.956 INFO dictionary The dictionary is missing a phonetic transcription for the word 'occured'
11:23:33.956 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'occured'
11:23:33.956 INFO dictionary The dictionary is missing a phonetic transcription for the word 'offical'
11:23:33.956 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'offical'
11:23:33.956 INFO dictionary The dictionary is missing a phonetic transcription for the word 'officals'
11:23:33.956 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'officals'
11:23:33.963 INFO dictionary The dictionary is missing a phonetic transcription for the word 'port_au_prince'
11:23:33.963 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'port_au_prince'
11:23:33.963 INFO dictionary The dictionary is missing a phonetic transcription for the word 'possiblity'
11:23:33.963 WARNING trieNgramModel The dictionary is missing a phonetic transcription for the word 'possiblity'
11:23:33.987 WARNING trieNgramModel Dictionary is missing 15 words that are contained in the language model.
11:23:34.080 INFO dictionary The dictionary is missing a phonetic transcription for the word 'offical'
11:23:34.080 INFO dictionary The dictionary is missing a phonetic transcription for the word 'mm-hm'
11:23:34.081 INFO dictionary The dictionary is missing a phonetic transcription for the word 'adulyadej'
11:23:34.081 INFO dictionary The dictionary is missing a phonetic transcription for the word 'adjustors'
11:23:34.082 INFO dictionary The dictionary is missing a phonetic transcription for the word 'mm-hmm'
11:23:34.082 INFO dictionary The dictionary is missing a phonetic transcription for the word 'ngo's'
11:23:34.082 INFO dictionary The dictionary is missing a phonetic transcription for the word 'officals'
11:23:34.083 INFO dictionary The dictionary is missing a phonetic transcription for the word 'chloroflourocarbons'
11:23:34.083 INFO dictionary The dictionary is missing a phonetic transcription for the word '3-d'
11:23:34.084 INFO dictionary The dictionary is missing a phonetic transcription for the word 'déjà'
11:23:34.085 INFO dictionary The dictionary is missing a phonetic transcription for the word 'port_au_prince'
11:23:34.086 INFO dictionary The dictionary is missing a phonetic transcription for the word 'mmmm'
11:23:34.086 INFO dictionary The dictionary is missing a phonetic transcription for the word 'iife'
11:23:34.089 INFO dictionary The dictionary is missing a phonetic transcription for the word 'possiblity'
11:23:34.090 INFO dictionary The dictionary is missing a phonetic transcription for the word 'occured'
11:23:34.281 INFO lexTreeLinguist Max CI Units 43
11:23:34.281 INFO lexTreeLinguist Unit table size 79507
11:23:34.281 INFO speedTracker # ----------------------------- Timers----------------------------------------
11:23:34.281 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
11:23:34.281 INFO speedTracker Load Dictionary 47 0.0400s 0.0340s 0.0740s 0.0415s 1.9500s
11:23:34.281 INFO speedTracker Load AM 1 0.8700s 0.8700s 0.8700s 0.8700s 0.8700s
11:23:34.281 INFO speedTracker Frontend 184 0.0000s 0.0000s 0.0030s 0.0000s 0.0090s
11:23:34.281 INFO speedTracker Load LM 47 0.2440s 0.2320s 0.3450s 0.2693s 12.6590s
11:23:34.281 INFO speedTracker Score 184 0.0000s 0.0000s 0.0030s 0.0000s 0.0090s
11:23:34.281 INFO speedTracker Prune 460 0.0000s 0.0000s 0.0000s 0.0000s 0.0000s
11:23:34.281 INFO speedTracker Grow 644 0.0000s 0.0000s 0.0030s 0.0001s 0.0380s
11:23:34.281 INFO speedTracker Compile 47 0.2940s 0.2940s 0.6200s 0.3411s 16.0340s
11:23:34.282 INFO speedTracker This Time Audio: 0.13s Proc: 0.00s Speed: 0.00 X real time
11:23:34.282 INFO speedTracker Total Time Audio: 6.02s Proc: 0.03s 0.00 X real time
11:23:34.282 INFO memoryTracker Mem Total: 1186.00 Mb Free: 301.00 Mb
11:23:34.282 INFO memoryTracker Used: This: 885.00 Mb Avg: 659.76 Mb Max: 1468.03 Mb
11:23:34.282 INFO trieNgramModel LM Cache Size: 0 Hits: 0 Misses: 0
Hypothesis:
11:23:34.282 INFO trieNgramModel LM Cache Size: 0 Hits: 0 Misses: 0
11:23:34.282 INFO speedTracker # ----------------------------- Timers----------------------------------------
11:23:34.282 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
11:23:34.282 INFO speedTracker Load Dictionary 47 0.0400s 0.0340s 0.0740s 0.0415s 1.9500s
11:23:34.282 INFO speedTracker Load AM 1 0.8700s 0.8700s 0.8700s 0.8700s 0.8700s
11:23:34.282 INFO speedTracker Frontend 188 0.0000s 0.0000s 0.0030s 0.0000s 0.0090s
11:23:34.282 INFO speedTracker Load LM 47 0.2440s 0.2320s 0.3450s 0.2693s 12.6590s
11:23:34.282 INFO speedTracker Score 188 0.0000s 0.0000s 0.0030s 0.0000s 0.0090s
11:23:34.282 INFO speedTracker Prune 470 0.0000s 0.0000s 0.0000s 0.0000s 0.0000s
11:23:34.282 INFO speedTracker Grow 658 0.0000s 0.0000s 0.0030s 0.0001s 0.0380s
11:23:34.282 INFO speedTracker Compile 47 0.2940s 0.2940s 0.6200s 0.3411s 16.0340s
11:23:34.282 INFO speedTracker Total Time Audio: 6.02s Proc: 0.03s 0.00 X real time
11:23:34.282 INFO memoryTracker Mem Total: 1186.00 Mb Free: 301.00 Mb
11:23:34.282 INFO memoryTracker Used: This: 885.00 Mb Avg: 662.16 Mb Max: 1468.03 Mb
[这种类型的输出在我与客户端一起录制音频时重复(并且实际上比录制时间更长,即使看起来处理时间为0.03秒)。
音频格式在其他地方定义:
public class StaticAudioFormat
{
private static final int channels = 1;
private static final boolean signed = true;
private static final boolean bigEndian = false;
private static final float sampleRate = 16000;
private static final int sampleSizeInBits = 16;
/**
* Defines a default audio format used to record
*/
static AudioFormat getAudioFormat()
{
return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
}
}
我可以事后将保存的音频读入Audacity,听起来不错。我可以使用以下方式转录录制的文件:
recognizer.startRecognition(new FileInputStream("???/2020-5-12--13-9-37 SoundModifier.wav"));
SpeechResult result = recognizer.getResult();
recognizer.stopRecognition();
System.out.println("---------------------------------------------------------------");
while ((result = recognizer.getResult()) != null) {
System.out.println(result.getHypothesis());
}
System.out.println("---------------------------------------------------------------");
...它正确输出了我说的内容。
要使Sphinx4的StreamSpeechRecognizer从语音实时输出文本,我需要做什么?
编辑:我在Windows上,可能无法使用某些选项。
Sphinx4速度非常慢,您无法识别实时性良好的准确性。尝试更现代的内容,例如Vosk-API