How do I detect when the user starts and stops speaking in Android? (Speech recognition in Android)

Question

I have done a lot of research and gone through many resources to solve my problem, but I haven't found a proper solution.

I have developed an application, and now I want to add voice-based functionality to it.

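The required functionality is:

1) When the user starts speaking, the app should start recording audio/video, and

2) When the user stops speaking, the recorded audio/video should be played back.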

Note: here "video" means whatever the user does during that period, for example tapping a button or some kind of animation.

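I don't want to use Google's default speech recognizer on Android, because it requires an internet connection and my app works offline. I also came across CMU-Sphinx, but it doesn't help with my requirements.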

Edit: I'd also like to add that I have already implemented this with start and stop buttons, but I don't want to use those buttons.

If anyone has any ideas or suggestions, please let me know.

Answer 1

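The simplest and most common approach is to count the number of zero crossings in the audio (i.e. the points where the sign of the signal changes, e.g. from positive to negative).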

If this value is too high, the sound is unlikely to be speech. If it is too low, again, it is unlikely to be speech.

Combine this with a simple energy level (how loud the audio is) and you have a very robust solution.
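A minimal sketch of the zero-crossing-plus-energy idea, assuming 16-bit mono PCM frames (on Android these could come from `AudioRecord.read(short[], int, int)`). The class name and all three thresholds are hypothetical placeholders that would need tuning per device; the count treats a sign change in either direction as a crossing, and zero-valued samples are counted as non-negative.

```java
/**
 * Hypothetical frame classifier: a frame counts as speech only if it is loud
 * enough AND its zero-crossing rate lies inside a plausible band.
 */
final class ZeroCrossingVad {
    private final double minEnergy; // mean-square level (normalised) below which the frame is silence
    private final double minZcr;    // lower bound on crossings per sample
    private final double maxZcr;    // upper bound; above this the frame looks like noise/hiss

    ZeroCrossingVad(double minEnergy, double minZcr, double maxZcr) {
        this.minEnergy = minEnergy;
        this.minZcr = minZcr;
        this.maxZcr = maxZcr;
    }

    /** Counts sign changes between consecutive samples (zero is treated as non-negative). */
    static int zeroCrossings(short[] frame) {
        int count = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) {
                count++;
            }
        }
        return count;
    }

    /** Mean of the squared samples, with samples normalised to [-1, 1). */
    static double energy(short[] frame) {
        if (frame.length == 0) {
            return 0;
        }
        double sum = 0;
        for (short s : frame) {
            double x = s / 32768.0;
            sum += x * x;
        }
        return sum / frame.length;
    }

    /** True if the frame is loud enough and its zero-crossing rate is in [minZcr, maxZcr]. */
    boolean isSpeech(short[] frame) {
        if (frame.length < 2) {
            return false;
        }
        double zcr = zeroCrossings(frame) / (double) (frame.length - 1);
        return energy(frame) >= minEnergy && zcr >= minZcr && zcr <= maxZcr;
    }
}
```

Checking energy first would let you skip the crossing count for silent frames; both are cheap enough that the order rarely matters on a phone.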

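If you need a more accurate system, things get much more complex. One approach is to extract audio features (e.g. MFCCs) from "training data", model them with something like a GMM, and then score the features extracted from the live audio against the GMM. This way you can model the likelihood that a given audio frame is speech versus non-speech. However, this is not a simple process.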
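To make the scoring step concrete, here is a minimal sketch, assuming the per-frame feature vectors (e.g. MFCCs) are already computed and that two diagonal-covariance GMMs, one for speech and one for non-speech, were trained offline (feature extraction and EM training are not shown). The class name is hypothetical; the frame is labelled speech when the speech model's log-likelihood is higher, i.e. a log-likelihood-ratio test with threshold 0.

```java
/**
 * Hypothetical diagonal-covariance GMM used only for scoring.
 * Parameters are assumed to come from offline training.
 */
final class DiagonalGmm {
    private final double[] weights;     // mixture weights, assumed > 0 and summing to 1
    private final double[][] means;     // [component][dimension]
    private final double[][] variances; // [component][dimension], all > 0

    DiagonalGmm(double[] weights, double[][] means, double[][] variances) {
        this.weights = weights;
        this.means = means;
        this.variances = variances;
    }

    /** log p(x) = log sum_k w_k * N(x; mean_k, diag(var_k)), computed with log-sum-exp. */
    double logLikelihood(double[] x) {
        double[] logTerms = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double lp = Math.log(weights[c]);
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - means[c][d];
                lp -= 0.5 * (Math.log(2 * Math.PI * variances[c][d]) + diff * diff / variances[c][d]);
            }
            logTerms[c] = lp;
            max = Math.max(max, lp);
        }
        // Subtract the largest term before exponentiating to avoid underflow
        double sum = 0;
        for (double t : logTerms) {
            sum += Math.exp(t - max);
        }
        return max + Math.log(sum);
    }

    /** Log-likelihood-ratio decision: true if the frame is more likely under the speech model. */
    static boolean isSpeech(DiagonalGmm speech, DiagonalGmm nonSpeech, double[] features) {
        return speech.logLikelihood(features) > nonSpeech.logLikelihood(features);
    }
}
```

In practice the per-frame decisions are usually smoothed over several frames before deciding that speech started or stopped.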

I strongly recommend going with zero crossings, since they're easy to implement and work fine 99% of the time :)

Answer 2

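You could try adding listeners to application events such as navigation, clicks, animations, etc. In the listener implementation you can then trigger the start/stop functionality.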

http://tseng-blog.nge-web.net/blog/2009/02/14/implementing-listeners-in-your-android-java-application/

Take a look at these examples; they might help you.
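The linked article isn't reproduced here. As a rough, hypothetical illustration of the idea, the "video" part of the question (the user's taps and animations) could be captured by having the app's existing listeners report each action to a small recorder that is started and stopped together with the audio. All class and method names below are made up for this sketch; on Android, `onAction` would be called from e.g. a `View.OnClickListener`, and replay timing could be scheduled with `Handler.postDelayed`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: records the user's actions between start() and stop()
 * so they can be replayed together with the recorded audio.
 */
final class ActionRecorder {
    /** One user action and when it happened, relative to start(). */
    static final class Action {
        final long offsetMillis;
        final String name;

        Action(long offsetMillis, String name) {
            this.offsetMillis = offsetMillis;
            this.name = name;
        }
    }

    private final List<Action> actions = new ArrayList<>();
    private long startMillis;
    private boolean recording;

    /** Starts a new take and discards the previous one. */
    void start(long nowMillis) {
        actions.clear();
        startMillis = nowMillis;
        recording = true;
    }

    /** Called from the app's click/navigation/animation listeners; ignored when not recording. */
    void onAction(String name, long nowMillis) {
        if (recording) {
            actions.add(new Action(nowMillis - startMillis, name));
        }
    }

    /** Stops recording and returns a copy of the recorded actions, in order. */
    List<Action> stop() {
        recording = false;
        return new ArrayList<>(actions);
    }

    /** Hands each recorded action to the player in order; scheduling by offsetMillis is up to the caller. */
    void replay(Consumer<Action> player) {
        for (Action action : actions) {
            player.accept(action);
        }
    }
}
```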

But I have to ask: the app behavior you describe sounds like you're reinventing Talking Tom? :-P

Answer 3

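Below is the code I use in an iPhone app that does exactly the same thing. The code is Objective-C++, but I've commented it heavily. It runs inside the recording queue's callback function. I'm sure a similar approach is possible on the Android platform.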

This approach works very well in almost every acoustic environment I've tried it in, and we use it in our app. You can download it and test it if you want.

Implement it on the Android platform and you're done!

// If there are some audio samples in the audio buffer of the recording queue
if (inNumPackets > 0) {
    // The following 4 vDSP (Accelerate framework) calls convert the 16-bit samples
    // to float, take their absolute values, scale them by aqr->divider and sum them.
    // See Apple's vDSP documentation for details.
    vDSP_vflt16((SInt16*)inBuffer->mAudioData, 1, aqr->currentFrameSamplesArray, 1, inNumPackets);
    vDSP_vabs(aqr->currentFrameSamplesArray, 1, aqr->currentFrameSamplesArray, 1, inNumPackets);
    vDSP_vsmul(aqr->currentFrameSamplesArray, 1, &aqr->divider, aqr->currentFrameSamplesArray, 1, inNumPackets);
    vDSP_sve(aqr->currentFrameSamplesArray, 1, &aqr->instantPower, inNumPackets);
    // Dividing the sum by the number of samples gives the mean absolute amplitude
    // of the current buffer, used here as its instantaneous power
    aqr->instantPower /= (CGFloat)inNumPackets;
    // Take the log; add a small number first to avoid -inf/NaN from log10f(0)
    aqr->instantPower = log10f(aqr->instantPower + 0.001f);
    // instantAvgPower averages the power over a longer window of time than instantPower
    aqr->instantAvgPower = aqr->instantAvgPower * 0.95f + 0.05f * aqr->instantPower;
    // avgPower averages over an even longer window of time than instantAvgPower
    aqr->avgPower = aqr->avgPower * 0.97f + 0.03f * aqr->instantAvgPower;
    // This ratio tells us when to start recording
    CGFloat ratio = aqr->avgPower / aqr->instantPower;
    // If we are not already writing to an audio file and the ratio is bigger than
    // a hardcoded threshold, start writing! (The threshold depends on the quality
    // of the device's microphone; I set it to 1.5 for an iPhone.)
    if (!aqr->writeToFile && ratio > aqr->recordingThreshold) {
        aqr->writeToFile = YES;
    }
    if (aqr->writeToFile) {
        // Write the packets to the file
        XThrowIfError(AudioFileWritePackets(aqr->mRecordFile, FALSE, inBuffer->mAudioDataByteSize,
                                            inPacketDesc, aqr->mRecordPacket, &inNumPackets, inBuffer->mAudioData),
                      "AudioFileWritePackets failed");
        aqr->mRecordPacket += inNumPackets;
        // While recording, if instantAvgPower is lower than avgPower,
        // increase the countToStopRecording counter...
        if (aqr->instantAvgPower < aqr->avgPower) {
            aqr->countToStopRecording++;
        }
        // ...otherwise reset it to 0
        else {
            aqr->countToStopRecording = 0;
        }
        // If the power has been low for more than 30 consecutive audio buffers, OR we have
        // recorded TOO much audio (the user spoke longer than a time limit), stop recording
        if (aqr->countToStopRecording > 30 || aqr->mRecordPacket > kMaxAudioPacketsDuration) {
            aqr->countToStopRecording = 0;
            aqr->writeToFile = NO;
            // Notify the audio player that recording has finished
            // and start playing the audio!
            dispatch_async(dispatch_get_main_queue(), ^{[[NSNotificationCenter defaultCenter] postNotificationName:@"RecordingEndedPlayNow" object:nil];});
        }
    }
}
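As a possible starting point for the Android port suggested above, here is a plain-Java sketch of the same power-ratio logic, processing one buffer of 16-bit PCM samples per call (on Android the buffers could come from `AudioRecord.read(short[], int, int)`). Several details are assumptions not shown in the original: `aqr->divider` is taken to be 1/32768, both running averages start at 0, and the recorded-sample counter is reset at each start. The class name is hypothetical; writing the file and starting playback are left to the caller, signalled by the returned event.

```java
/**
 * Hypothetical Java port of the power-ratio detector: returns START when speech
 * begins and STOP when it ends (or the take gets too long), NONE otherwise.
 */
final class PowerRatioDetector {
    enum Event { NONE, START, STOP }

    private static final double DIVIDER = 1.0 / 32768.0; // assumed normalisation for 16-bit PCM

    private final double recordingThreshold; // 1.5 on an iPhone in the original
    private final int quietBuffersToStop;    // 30 in the original
    private final long maxRecordedSamples;   // plays the role of kMaxAudioPacketsDuration

    private double instantAvgPower = 0; // initial values are assumptions
    private double avgPower = 0;
    private boolean recording;
    private int countToStopRecording;
    private long recordedSamples;

    PowerRatioDetector(double recordingThreshold, int quietBuffersToStop, long maxRecordedSamples) {
        this.recordingThreshold = recordingThreshold;
        this.quietBuffersToStop = quietBuffersToStop;
        this.maxRecordedSamples = maxRecordedSamples;
    }

    boolean isRecording() {
        return recording;
    }

    /** Processes the first `length` samples of one buffer and reports state changes. */
    Event process(short[] buffer, int length) {
        if (length <= 0) {
            return Event.NONE;
        }
        // Mean absolute amplitude of the buffer, normalised to [0, 1]
        double sum = 0;
        for (int i = 0; i < length; i++) {
            sum += Math.abs(buffer[i]) * DIVIDER;
        }
        // Log scale; + 0.001 avoids log10(0)
        double instantPower = Math.log10(sum / length + 0.001);
        // Two exponential moving averages over increasingly long windows
        instantAvgPower = instantAvgPower * 0.95 + 0.05 * instantPower;
        avgPower = avgPower * 0.97 + 0.03 * instantAvgPower;
        double ratio = avgPower / instantPower;

        Event event = Event.NONE;
        if (!recording && ratio > recordingThreshold) {
            recording = true;
            recordedSamples = 0; // assumption: each take starts counting from zero
            event = Event.START;
        }
        if (recording) {
            recordedSamples += length; // the caller would write this buffer to its file here
            countToStopRecording = (instantAvgPower < avgPower) ? countToStopRecording + 1 : 0;
            if (countToStopRecording > quietBuffersToStop || recordedSamples > maxRecordedSamples) {
                countToStopRecording = 0;
                recording = false;
                event = Event.STOP; // the caller would start playback here
            }
        }
        return event;
    }
}
```

Because the averages start at 0 here, the detector needs a short warm-up of background audio before the ratio becomes meaningful.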

Best!

Answer 4

Here is some simple code to detect when the user stops speaking. I'm checking the value of

recorder.getMaxAmplitude();

Sample code:

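public void startRecording() throws IOException {
    // Assumes `recorder` is an already-started MediaRecorder; `isPolling` is a
    // `private volatile boolean` field added to the enclosing class
    isPolling = true;
    Thread thread = new Thread() {
        @Override
        public void run() {
            // Poll until checkValue() decides that the user has stopped speaking
            while (isPolling) {
                try {
                    sleep(100);
                    if (recorder != null) {
                        // Peak amplitude sampled since the previous call
                        checkValue(recorder.getMaxAmplitude());
                    }
                } catch (InterruptedException | IllegalStateException e) {
                    // Interrupted, or the recorder is no longer usable: stop polling
                    return;
                }
            }
        }
    };
    thread.start();
}

checkValue function:

public void checkValue(int amplitude) {
    try {
        if (amplitude > 1000) {
            // Loud enough to count as speech (1000 is an empirical threshold)
            Log.d("I", "Amplitude : " + amplitude);
            // Wait 2 s: the next getMaxAmplitude() reading then covers this whole
            // window, so short pauses between words do not end the recording
            Thread.sleep(2000);
            isListened = true;
        } else if (isListened) {
            // The user spoke earlier and the level is now low: speech has ended
            Log.d("I", "Stop me");
            isListened = false;        // reset for the next recording
            isPolling = false;         // ends the polling loop in startRecording()
            recordingDialog.dismiss(); // Dialog.dismiss() may be called from any thread
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag so the polling loop exits
        Thread.currentThread().interrupt();
    }
}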
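An alternative to blocking the polling thread with `Thread.sleep(2000)` is to keep a timestamp of the last loud reading and report "stopped speaking" once the level has stayed low for long enough. The sketch below is a hypothetical, Android-free version of that idea: feed it each polled value (e.g. from `MediaRecorder.getMaxAmplitude()`) together with a monotonic time such as `SystemClock.elapsedRealtime()`. The class name is made up; the 1000 threshold and 2000 ms window are taken from the answer above.

```java
/**
 * Hypothetical end-of-speech detector driven by periodic peak-amplitude readings.
 */
final class EndOfSpeechDetector {
    private final int speechThreshold; // readings above this count as speech (1000 above)
    private final long silenceMillis;  // how long the level must stay low (2000 above)
    private boolean heardSpeech;
    private long lastLoudMillis;

    EndOfSpeechDetector(int speechThreshold, long silenceMillis) {
        this.speechThreshold = speechThreshold;
        this.silenceMillis = silenceMillis;
    }

    /**
     * Returns true once per utterance: on the first reading taken after speech
     * was heard and the level has then stayed low for at least silenceMillis.
     */
    boolean onAmplitude(int amplitude, long nowMillis) {
        if (amplitude > speechThreshold) {
            heardSpeech = true;
            lastLoudMillis = nowMillis;
            return false;
        }
        if (heardSpeech && nowMillis - lastLoudMillis >= silenceMillis) {
            heardSpeech = false; // reset so the next utterance can be detected
            return true;
        }
        return false;
    }
}
```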

I know this question is old and has already been answered, but this small snippet may help others.

Question votes: 3, answers: 4