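After reading dozens of posts on this topic, I have concluded that what I am trying to achieve is very difficult, but it should not be impossible.
My app helps users learn and improve their English. I am trying to implement "listen and repeat" audio lessons: the user listens to a passage through Android's text-to-speech (I can already save the text-to-speech audio as a wav file in the cache folder) and then repeats the text they heard. Once the recording is done, I will use the musicg library to compare the two audio tracks for similarity and give the user a score.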
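I tried this approach without success. I was able to save a pcm audio file, but not only is the file not recognized as audio by any player, I cannot even play it with Android's AudioTrack class.
I also tried converting the pcm audio to wav using this link, to no avail. The resulting .wav file plays neither on my Mac nor on my phone.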
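For context, a raw pcm file carries no header, which is why players do not recognize it, and a converted .wav file will typically fail to play if the header fields do not match how the samples were actually recorded. The following is a minimal sketch of the conversion in plain Kotlin, not the code from the linked post. The function names `wavHeader` and `pcmToWav` are hypothetical, and the defaults (16 kHz, mono, 16-bit) are assumptions that must be replaced by whatever values were used when recording.

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Builds the 44-byte canonical RIFF/WAVE header for uncompressed PCM data.
// All multi-byte fields are little-endian.
fun wavHeader(pcmSize: Int, sampleRate: Int, channels: Int, bitsPerSample: Int): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second
    return ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN).apply {
        put("RIFF".toByteArray(Charsets.US_ASCII))
        putInt(36 + pcmSize)                        // size of everything after this field
        put("WAVE".toByteArray(Charsets.US_ASCII))
        put("fmt ".toByteArray(Charsets.US_ASCII))
        putInt(16)                                  // fmt chunk size for PCM
        putShort(1.toShort())                       // audio format 1 = linear PCM
        putShort(channels.toShort())
        putInt(sampleRate)
        putInt(byteRate)
        putShort(blockAlign.toShort())
        putShort(bitsPerSample.toShort())
        put("data".toByteArray(Charsets.US_ASCII))
        putInt(pcmSize)                             // number of PCM bytes that follow
    }.array()
}

// Hypothetical helper: copies raw PCM into a new file prefixed with a matching header.
// The default parameters are assumptions; they must equal the recording settings.
fun pcmToWav(
    pcm: File,
    wav: File,
    sampleRate: Int = 16000,
    channels: Int = 1,
    bitsPerSample: Int = 16
) {
    val data = pcm.readBytes()
    wav.outputStream().use { out ->
        out.write(wavHeader(data.size, sampleRate, channels, bitsPerSample))
        out.write(data)
    }
}
```

If the output still does not play, the first thing to compare is the sample rate, channel count, and encoding passed here against the ones used by the recorder.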
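I also tried the SpeechRecognizer class, but I could not save the recognized audio: onBufferReceived is never called.
My "last chance" was to use intent.getData() on the RecognizerIntent result to extract the audio URI and then use contentResolver to produce the audio file, but intent.getData() always returns null.
Here is my Kotlin code: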
class AudioLessonsActivity : AppCompatActivity(), RecognitionListener, TextToSpeech.OnInitListener {
    private val permission = 100
    private lateinit var returnedText: TextView
    private lateinit var toggleButton: ToggleButton
    private lateinit var progressBar: ProgressBar
    private lateinit var speech: SpeechRecognizer
    private lateinit var recognizerIntent: Intent
    private var logTag = "VoiceRecognitionActivity"
    private lateinit var ttobj: TextToSpeech
    private val mUtteranceID = "totts"
    lateinit var recordAudioResultLauncher: ActivityResultLauncher<Intent>

    // Called once the text-to-speech engine is ready.
    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            val result = ttobj.setLanguage(Locale.US)
            val toSpeak = "Hello my friend"
            ttobj.speak(toSpeak, TextToSpeech.QUEUE_FLUSH, null, "")
            saveToAudioFile(toSpeak)
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "The Language not supported!")
            }
        }
    }

    // Launches the system speech recognition activity.
    private fun setRecordAudio() {
        val recordAudioIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
        recordAudioIntent.putExtra(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
        )
        recordAudioIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.US.toString())
        recordAudioIntent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say anything, please")
        recordAudioResultLauncher.launch(recordAudioIntent)
    }

    // Registers the result callback that is supposed to receive the audio URI.
    private fun promptSpeechInput() {
        recordAudioResultLauncher = registerForActivityResult(
            ActivityResultContracts.StartActivityForResult()
        ) { result ->
            if (result.resultCode == Activity.RESULT_OK) {
                val recordData = result.data?.data //=> always null
                //val data = result.data?.extras!![Intent.ACTION_REC] as Uri?
                //val bundle: Bundle? = recordData?.extras
                //ArrayList<String> matches = bundle.getStringArrayList(RecognizerIntent.EXTRA_RESULTS)
                val audioUri: Uri? = recordData
                val filestream: InputStream? =
                    audioUri?.let { contentResolver.openInputStream(it) }
            }
        }
    }

    // Saves the synthesized speech as a wav file in the cache folder.
    private fun saveToAudioFile(text: String) {
        val mAudioFilename = this.cacheDir.toString() + "/file1.wav"
        ttobj.synthesizeToFile(text, null, File(mAudioFilename), mUtteranceID)
    }

    private fun doSpeech() {
        ttobj = TextToSpeech(this, this)
    }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_audiolessons)
        promptSpeechInput()
        doSpeech()
        title = "KotlinApp"
        returnedText = findViewById(R.id.textView)
        progressBar = findViewById(R.id.progressBar)
        toggleButton = findViewById(R.id.toggleButton)
        progressBar.visibility = View.VISIBLE
        speech = SpeechRecognizer.createSpeechRecognizer(this)
        Log.i(logTag, "isRecognitionAvailable: " + SpeechRecognizer.isRecognitionAvailable(this))
        speech.setRecognitionListener(this)
        recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
        recognizerIntent.putExtra("android.speech.extra.GET_AUDIO_FORMAT", "audio/AMR")
        // Note: each putExtra with the same key overwrites the previous value,
        // so only the last EXTRA_LANGUAGE (Locale.US) takes effect.
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en")
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-GB")
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.US.toString())
        //intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, Locale.UK.toString())
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3)
        toggleButton.setOnCheckedChangeListener { _, isChecked ->
            if (isChecked) {
                progressBar.visibility = View.VISIBLE
                progressBar.isIndeterminate = true
                ActivityCompat.requestPermissions(
                    this@AudioLessonsActivity,
                    arrayOf(Manifest.permission.RECORD_AUDIO),
                    permission
                )
            } else {
                progressBar.isIndeterminate = false
                progressBar.visibility = View.VISIBLE
                speech.stopListening()
            }
        }
    }

    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String?>,
        grantResults: IntArray
    ) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        when (requestCode) {
            permission -> if (grantResults.isNotEmpty() && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                //speech.startListening(recognizerIntent)
                setRecordAudio()
            } else {
                Toast.makeText(this@AudioLessonsActivity, "Permission Denied!", Toast.LENGTH_SHORT).show()
            }
        }
    }
}
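I should add that the file must be in .wav format, because musicg only supports wav files.
Well, I will answer this myself. Thanks to the great OmRecorder library, I finally got it working.
I can now get a playable .wav audio file of the user's voice in the app's cache folder.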