How to Convert Speech to Text in Python

见贤思齐 · Posted on 2024-9-11 16:20:07

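1. Overview

Learn how to use the SpeechRecognition Python library to perform speech recognition and convert spoken audio to text in Python.

2. Speech AI library

2.1 A very capable transcription library

Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.

As a result, we do not need to build any machine learning model from scratch: the library provides convenient wrappers around several well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, and others).

Note that if you would rather not use an API and instead run inference directly on a machine learning model, be sure to check out the companion tutorial, in which I show how to perform speech recognition in Python with current state-of-the-art machine learning models. If you want other ways to perform ASR, see the comprehensive speech recognition tutorial.

Related: How to Translate Text in Python.

2.2 Installation

Let's start by installing the libraries with pip:

```
pip3 install SpeechRecognition pydub
```

Now open a new Python file and import the library:

```python
import speech_recognition as sr
```

A nice feature of this library is that it supports several recognition engines:

- CMU Sphinx (offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (offline)

We will use Google Speech Recognition here because it is simple and does not require an API key.

2.3 Transcribing an audio file

Make sure the current directory contains an audio file with English speech (the original post links to the file used here, if you want to follow along):

```python
filename = "16-122828-0002.wav"
```

This file comes from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the file name. Let's initialize the speech recognizer:

```python
# initialize the recognizer
r = sr.Recognizer()
```

The code below loads the audio file and converts the speech to text using Google Speech Recognition:

```python
# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio into memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)
```

This takes a few seconds to complete, because the audio is uploaded to Google and the result is fetched back. Here is my result:

```
I believe you're just talking nonsense
```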
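Before choosing between the single-request approach in section 2.3 and a chunked approach, it can help to know how long a recording is. The following is a minimal sketch using only the standard-library `wave` module, which reads uncompressed PCM WAV files. The names `wav_duration_seconds`, `needs_chunking` and `write_silent_wav` are hypothetical helpers introduced here, and the 60-second cut-off is an assumed value for illustration, not a limit documented by the SpeechRecognition library.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def needs_chunking(path, max_seconds=60.0):
    """True if the file is longer than max_seconds (an assumed cut-off)."""
    return wav_duration_seconds(path) > max_seconds


def write_silent_wav(path, seconds, framerate=16000):
    """Hypothetical demo helper: write `seconds` of 16-bit mono silence."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(framerate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * framerate))


if __name__ == "__main__":
    # Create a short throwaway file so the sketch runs without real audio.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.wav")
        write_silent_wav(demo_path, seconds=3)
        print(wav_duration_seconds(demo_path))  # 3.0
        print(needs_chunking(demo_path))        # False
```

In practice you would call `needs_chunking("16-122828-0002.wav")` on your own file and pick the cut-off by experiment.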
The code in section 2.3 works well for small or medium-sized audio files. In the next section, we write code for large files.

2.4 Transcribing large audio files

If you want to perform speech recognition on a long audio file, the function below handles that well:

```python
# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function to recognize speech in the audio file
# so that we don't repeat ourselves in other functions
def transcribe_audio(path):
    # use the audio file as the audio source
    with sr.AudioFile(path) as source:
        audio_listened = r.record(source)
        # try converting it to text
        text = r.recognize_google(audio_listened)
    return text

# a function that splits the audio file into chunks on silence
# and applies speech recognition
def get_large_audio_transcription_on_silence(path):
    """Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)
    # split audio where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len=500,
        # adjust this per requirement
        silence_thresh=sound.dBFS-14,
        # keep 500 ms of silence around each chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
```

Note: you need to install Pydub with pip for the code above to work.

The function above uses split_on_silence() from the pydub.silence module to split the audio data into chunks at silences. The min_silence_len parameter is the minimum length of silence, in milliseconds, used for a split. silence_thresh is the threshold: anything quieter than this is treated as silence, and I set it to 14 dB below the average dBFS. The keep_silence parameter is the amount of silence, in milliseconds, left at the beginning and end of each detected chunk.

These values will not suit every sound file, so experiment with them for your own large audio files.

After that, we iterate over all the chunks, convert each piece of speech to text, and concatenate the results. Here is an example run:

```python
path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription_on_silence(path))
```

Note: the original post links to the 7601-291468-0006.wav file.

Output:

```
audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat.
audio-chunks\chunk2.wav : At a short distance from the city.
audio-chunks\chunk3.wav : Just at what is now called dutch street.
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity.
audio-chunks\chunk5.wav : Patent smokejacks.
audio-chunks\chunk6.wav : It required a horse to work some.
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire.
audio-chunks\chunk8.wav : Carts that went before the horses.
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrong headed contrivances.
audio-chunks\chunk10.wav : So just understand can found it all beholders.

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrong headed contrivances. So just understand can found it all beholders.
```

So the function automatically creates a folder for us, stores the chunks of the original audio file there, and then runs speech recognition on each of them.

If you want to split the audio file into fixed intervals instead, you can use the following function:

```python
# a function that splits the audio file into fixed interval chunks
# and applies speech recognition
def get_large_audio_transcription_fixed_interval(path, minutes=5):
    """Splitting the large audio file into fixed interval chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)
    # split the audio file into chunks
    chunk_length_ms = int(1000 * 60 * minutes)  # convert to milliseconds
    chunks = [sound[i:i + chunk_length_ms] for i in range(0, len(sound), chunk_length_ms)]
    folder_name = "audio-fixed-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
```

By default, this function splits a large audio file into 5-minute chunks. You can change the minutes parameter to suit your needs. Since my audio file is not that large, I split it into 10-second chunks:

```python
print("\nFull text:", get_large_audio_transcription_fixed_interval(path, minutes=1/6))
```

Output:

```
audio-fixed-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called.
audio-fixed-chunks\chunk2.wav : Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some.
audio-fixed-chunks\chunk3.wav : Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head.
audio-fixed-chunks\chunk4.wav : Contrivances that astonished and confound it all beholders.

Full text: His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called. Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some. Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head. Contrivances that astonished and confound it all beholders.
```

2.5 Reading from the microphone

This requires PyAudio to be installed on your machine. The installation steps depend on your operating system:

Windows

You can install it directly with pip:

```
$ pip3 install pyaudio
```

Linux

You need to install the dependencies first:

```
$ sudo apt-get install python-pyaudio python3-pyaudio
$ pip3 install pyaudio
```

macOS

You need to install portaudio first, then you can install it with pip:

```
$ brew install portaudio
$ pip3 install pyaudio
```

Now let's use the microphone to convert our speech:

```python
import speech_recognition as sr

# create the recognizer (same object as in the earlier sections)
r = sr.Recognizer()

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)
```

This listens to your microphone for 5 seconds and then tries to convert the speech to text.

It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone. The duration parameter of record() stops the recording after 5 seconds, and the audio data is then uploaded to Google to obtain the output text.

You can also use the offset parameter of record() to start recording after offset seconds.

In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech, you can use:

```python
text = r.recognize_google(audio_data, language="es-ES")
```

See this Stack Overflow answer for the supported languages.

3. Conclusion

As you can see, converting speech to text with this library is easy and straightforward. The library is widely used in the wild. See the official documentation.

If you also want to convert text to speech in Python, check out the related tutorial.

Related: How to Recognize Optical Characters in Images in Python.

Happy coding!
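A short note on the fixed-interval splitting in section 2.4: the chunk boundaries can be worked out without loading any audio, which makes it easy to see how many requests a given file will produce. The sketch below mirrors the arithmetic of `chunk_length_ms` and the `range()` step from that function. `fixed_interval_bounds` is a hypothetical helper introduced for illustration, and the 35,000 ms length is an assumed example value, not the measured length of the sample file.

```python
def fixed_interval_bounds(total_ms, minutes=5):
    """Return (start_ms, end_ms) pairs covering total_ms in fixed-size steps."""
    chunk_length_ms = int(1000 * 60 * minutes)  # convert minutes to milliseconds
    if chunk_length_ms <= 0:
        raise ValueError("minutes must give a chunk length of at least 1 ms")
    return [
        # the final chunk is clipped to the end of the audio, so it may be shorter
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


if __name__ == "__main__":
    # Assumed 35-second recording split into 10-second chunks (minutes=1/6).
    for start, end in fixed_interval_bounds(35_000, minutes=1 / 6):
        print(start, end)
```

With these assumed numbers the helper yields four chunks, the last one 5 seconds long. Because the cut points ignore pauses, a word can be split across two chunks, which is what happens to "wrong head" and "contrivances" in the fixed-interval output in section 2.4.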
