Speech recognition (SR) is an interdisciplinary subfield of computational linguistics that combines knowledge and research from linguistics, computer science, and electrical engineering to develop methods and technologies that enable the recognition and translation of spoken language by computers and computerized devices, such as those categorized as smart technologies and robotics.
In file included from src/pyaudio/device_api.c:1:
In file included from src/pyaudio/device_api.h:7:
/Library/Frameworks/Python.framework/Versions/3.13/include/python3.13/Python.h:19:10: ...
https://www.google.com/speech-api/v2/recognize?...
There are plenty of Node modules that simply wrap the browser's speech recognition, which is no help if you are not running in a browser. There are also several modules that act as interfaces to external services, which can perform speech recognition for you if you send them audio.
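For illustration, here is a minimal Node (18+) sketch of that second category, posting a short FLAC clip to the speech-api/v2 endpoint shown above. The query parameters, the audio/x-flac content type, and the newline-separated response format are assumptions based on how wrapper modules typically call this unofficial endpoint; an API key is required.

// Hypothetical Node 18+ sketch; the parameters below mirror how
// wrapper modules typically call this unofficial endpoint.
const fs = require('node:fs');

async function recognize(flacPath, apiKey) {
  const audio = fs.readFileSync(flacPath);
  const url = 'https://www.google.com/speech-api/v2/recognize'
    + '?output=json&lang=en-us&key=' + apiKey;
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'audio/x-flac; rate=16000' },
    body: audio,
  });
  // The endpoint tends to return newline-separated JSON objects,
  // sometimes starting with an empty {"result":[]} stanza.
  const text = await res.text();
  for (const line of text.split('\n')) {
    if (line.trim()) console.log(JSON.parse(line));
  }
}

recognize('clip.flac', process.env.SPEECH_API_KEY);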
Android webkitSpeechRecognition: the isFinal variable does not show the correct value
var recognition = new webkitSpeechRecognition();

recognition.onresult = function (e) {
  // This is called every time the Speech Recognition hears you speak.
  // You may say "How's it going today"; the recognition will try to
  // interpret what you're saying while you're speaking. For example,
  // while you're speaking it may go: "house", "how's it going",
  // "how's it going today". As it interprets, it returns an object
  // whose properties include "e.results[i].isFinal", where "i" indexes
  // the array of returned results. In this case the result with a
  // transcript of "house" would have an "isFinal" value of false,
  // whereas the result with a transcript of "how's it going today"
  // would have an "isFinal" value of true, because it is the final
  // interpretation of that particular utterance.
  //
  // HOWEVER, the problem I'm having is that on a mobile device,
  // "e.results[i].isFinal" always has a value of true, even when it's
  // not the final interpretation. It works correctly on desktop.
  // Both are using Chrome.
  if (e.results[e.results.length - 1].isFinal) {
    var finalTranscript = '';
    for (var i = 0; i < e.results.length; i++) {
      finalTranscript += e.results[i][0].transcript;
    }
    console.log(finalTranscript);
    document.getElementById('output').innerHTML = finalTranscript;
  }
};
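One workaround sometimes suggested for this (a sketch only, not a confirmed fix) is to stop trusting isFinal on mobile and instead treat a transcript as final once no new result has arrived for a short, hand-tuned interval:

// Hypothetical debounce workaround, reusing the `recognition`
// instance created above. The 800 ms delay is an assumption to
// tune per device.
var finalizeTimer = null;

recognition.onresult = function (e) {
  var transcript = '';
  for (var i = 0; i < e.results.length; i++) {
    transcript += e.results[i][0].transcript;
  }
  clearTimeout(finalizeTimer);
  finalizeTimer = setTimeout(function () {
    // No new interim result for 800 ms: assume this one is final.
    console.log(transcript);
    document.getElementById('output').innerHTML = transcript;
  }, 800);
};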
To work around this, I implemented audio preprocessing with the Web Audio API:
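The original preprocessing code is not shown; a representative sketch of what such a Web Audio chain can look like (the high-pass cutoff and the use of a compressor are illustrative assumptions, not the poster's actual settings):

// Representative sketch only; the poster's actual chain is not shown.
async function preprocessMic() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);

  // High-pass filter to drop low-frequency rumble (cutoff is illustrative).
  const highpass = ctx.createBiquadFilter();
  highpass.type = 'highpass';
  highpass.frequency.value = 100;

  // Compressor to even out volume before recognition.
  const compressor = ctx.createDynamicsCompressor();

  const sink = ctx.createMediaStreamDestination();
  source.connect(highpass).connect(compressor).connect(sink);

  // sink.stream can now be recorded or handed to a recognizer.
  return sink.stream;
}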
OS X Mavericks now includes speech dictation, which is very useful. I am trying to use its command capability to create my own digital life assistant, but I cannot find how to use the recognition...
Android SpeechRecognizer with EXTRA_AUDIO_SOURCE still listens to the microphone instead of reading from the file
private var recorder: MediaRecorder? = null
private var recognizer: SpeechRecognizer? = null
private val mediaFormat = MediaRecorder.OutputFormat.MPEG_4
private val audioEncoding = MediaRecorder.AudioEncoder.DEFAULT
private var currentRecordingFile: String = "recording_0.3gp"
private var recordingParcel: ParcelFileDescriptor? = null

// [ {"text": "speech to text result", "file": "path to clip recording"}, "time": "datetime" ]
private var translations = mutableStateListOf<Map<String, String>>()

private fun startTalking() {
    startRecording()
}

private fun stopTalking() {
    stopRecording()
    startRecognizing()
}

private fun startRecording() {
    val num = translations.count()
    currentRecordingFile = "$externalCacheDir/recording_$num.3gp"
    recorder = MediaRecorder(this).apply {
        setAudioSource(MediaRecorder.AudioSource.MIC)
        setOutputFormat(mediaFormat)
        setAudioEncoder(audioEncoding)
        setAudioChannels(1)
        setAudioSamplingRate(16000)
        setAudioEncodingBitRate(64000)
        setOutputFile(currentRecordingFile)
        try {
            prepare()
        } catch (e: IOException) {
            Log.e("startRecording", e.toString())
        }
        start()
    }
}

private fun stopRecording() {
    recorder?.apply {
        stop()
        release()
    }
    recorder = null
}

private fun startRecognizing() {
    val file = File(currentRecordingFile)
    recordingParcel = ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY)
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "in-ID")
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "in-ID")
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, recordingParcel)
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_ENCODING, audioEncoding)
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_CHANNEL_COUNT, 1)
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_SAMPLING_RATE, 16000)
    try {
        recognizer = SpeechRecognizer.createSpeechRecognizer(this)
        recognizer?.setRecognitionListener(this)
        recognizer?.startListening(intent)
    } catch (e: Exception) {
        Log.e("SpeechRecognizer", e.message.toString())
    }
}

private fun stopRecognizing() {
    recordingParcel?.close()
    recognizer?.stopListening()
    recognizer?.destroy()
    recognizer = null
}

override fun onError(error: Int) {
    Log.e("Speech onError", error.toString())
    stopRecognizing()
}

override fun onResults(results: Bundle) {
    val words: ArrayList<String>? = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
    if (words != null) {
        val sentence = words.joinToString(separator = " ")
        val translation = mapOf("text" to sentence, "file" to currentRecordingFile)
        translations.add(translation)
        Log.e("CURR RESULT", sentence)
    }
    stopRecognizing()
}
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.lang = 'en-US';
recognition.onresult = (event) => {...}
recognition.start();
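In continuous mode the engine can still stop on its own (silence timeouts, network errors), so a common pattern, sketched below with a hypothetical shouldListen flag, is to restart it from the onend handler:

// Sketch: auto-restart continuous recognition when the engine stops
// on its own. `shouldListen` is a hypothetical flag the app flips
// when the user explicitly starts or stops listening.
let shouldListen = true;

recognition.onend = () => {
  if (shouldListen) recognition.start();
};

recognition.onresult = (event) => {
  // resultIndex points at the first result that changed in this event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    if (event.results[i].isFinal) {
      console.log(event.results[i][0].transcript);
    }
  }
};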
I am fairly new to Azure Speech Services. I am using a Twilio/Plivo service to connect a phone number to Azure STT, with further processing after transcription. My problem is that when I speak, it is...
I am currently using Azure speech-to-text in my project. It recognizes speech input directly from the microphone (which is what I want) and saves the text output, but I am also interested in saving...
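For reference, a minimal JavaScript sketch of that baseline, microphone input through the Azure Speech SDK (microsoft-cognitiveservices-speech-sdk); the key and region values are placeholders, and saving the audio alongside the text is the part the question leaves open:

// Minimal sketch with the Azure Speech SDK; 'YOUR_KEY' and
// 'YOUR_REGION' are placeholders.
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

const speechConfig = sdk.SpeechConfig.fromSubscription('YOUR_KEY', 'YOUR_REGION');
const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

recognizer.recognizeOnceAsync(result => {
  // result.text holds the transcript of the recognized utterance.
  console.log(result.text);
  recognizer.close();
});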
WebSocket connection issue with Docker Compose and React JS
I am having trouble with a local deployment. When running a Kaldi server and a React JS frontend with Docker Compose, the problem lies in the WebSocket connection. When Kaldi...
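When debugging this kind of setup, it can help to take React out of the loop and test the socket directly. A bare-bones client is sketched below; the /client/ws/speech path follows the kaldi-gstreamer-server convention and, like the port, is an assumption about this particular deployment:

// Bare-bones test client. The path follows the kaldi-gstreamer-server
// convention and is an assumption; from the browser you need a
// host-mapped port (inside Compose, the service name would be used
// only for container-to-container traffic).
const ws = new WebSocket('ws://localhost:8080/client/ws/speech');

ws.onopen = () => console.log('connected');
ws.onerror = (e) => console.log('socket error', e);
ws.onclose = (e) => console.log('closed', e.code, e.reason);
ws.onmessage = (msg) => {
  const data = JSON.parse(msg.data);
  // kaldi-gstreamer-server style payload (an assumption here).
  if (data.result) console.log(data.result.hypotheses[0].transcript);
};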
I am trying to build a Python AI using the speech_recognition module. I want to add a hot-word detection feature to the AI, so I tried to implement it with the speech recognition module, but it does not work...
In a Tauri JS application, I am recording audio from JS and processing it, then sending the data through a Rust handler to a Python subprocess. In the Python script I use Vosk to convert the speech to text in real time...
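A common shape for that handoff is sketched below; the transcribe_chunk command name and its payload are hypothetical, and the import path assumes Tauri v2 (@tauri-apps/api/tauri on v1):

// Hypothetical handoff: record with MediaRecorder, then pass the raw
// bytes to a Rust command named "transcribe_chunk" (name and payload
// shape are assumptions), which forwards them to the Python/Vosk child.
import { invoke } from '@tauri-apps/api/core';

async function sendClip(blob) {
  const bytes = Array.from(new Uint8Array(await blob.arrayBuffer()));
  const transcript = await invoke('transcribe_chunk', { audio: bytes });
  console.log(transcript);
}

async function startMicCapture() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const rec = new MediaRecorder(stream);
  rec.ondataavailable = (e) => sendClip(e.data);
  rec.start(1000); // emit a chunk roughly every second
}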
I am experimenting with a virtual assistant and want to know how to make it respond to being called by name. Currently I have a button that activates listen() so that the program starts listening.
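Whatever the library, the usual approach is to keep the recognizer listening and only act when the transcript contains the assistant's name. A browser-JS sketch of that gate, matching the earlier snippets (the wake word "jarvis" is a placeholder):

// Wake-word gate; "jarvis" is a placeholder name.
const recognition = new webkitSpeechRecognition();
recognition.continuous = true;

recognition.onresult = (event) => {
  const transcript = event.results[event.results.length - 1][0]
    .transcript.trim().toLowerCase();
  const idx = transcript.indexOf('jarvis');
  if (idx !== -1) {
    // Everything after the name is treated as the command.
    console.log('command:', transcript.slice(idx + 'jarvis'.length).trim());
  }
};

recognition.start();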