How to extract subtitles from YouTube videos in various languages

Question

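I use the code below to extract subtitles from YouTube videos, but it only works for videos in English. I have some videos in Spanish, so I would like to know how to modify the code to extract Spanish subtitles.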

from pytube import YouTube
from youtube_transcript_api import YouTubeTranscriptApi

# Define the video URL or ID of the YouTube video you want to extract text from
video_url = 'https://www.youtube.com/watch?v=xYgoNiSo-kY'

# Download the video using pytube
youtube = YouTube(video_url)
video = youtube.streams.get_highest_resolution()
video.download()

# Get the downloaded video file path
video_path = video.default_filename

# Get the video ID from the URL
video_id = video_url.split('v=')[-1]

# Get the transcript for the specified video ID
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# Extract the text from the transcript
captions_text = ''
for segment in transcript:
    caption = segment['text']
    captions_text += caption + ' '

# Print the extracted text
print(captions_text)

Answer 1

Use list_transcripts to get the list of available languages:

Example:

video_id = 'xYgoNiSo-kY'
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

Then loop over the transcript_list variable to see the available languages:

Example:

for tr in transcript_list:
  print(tr.language_code)

In this case, the result is:

es

Modify the code to loop over the languages available for the video and download the captions for each one:

Example:

# Variable to store the downloaded captions:
all_captions = []

# Loop over all languages available for this video and download the captions:
for tr in transcript_list:
  print("Downloading captions in " + tr.language + "...")
  transcript_obtained_in_language = transcript_list.find_transcript([tr.language_code]).fetch()
  # Start from an empty string for each language
  captions_text = ''
  for segment in transcript_obtained_in_language:
    captions_text += segment['text'] + ' '
  all_captions.append({"language": tr.language_code + " - " + tr.language, "captions": captions_text})
  print("="*20)
print("Done")

The all_captions variable will then hold the captions and languages obtained for the given video_id.
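To pull the Spanish text back out of all_captions afterwards, something like the following might help. It is only a sketch: segments_to_text and captions_for are hypothetical helper names, the sample data is made up, and it assumes each entry is a dictionary with language and captions keys (no trailing space in the key name) and that each segment is a dictionary with a text key, as in the code above.

```python
def segments_to_text(segments):
    """Join the 'text' field of each caption segment with single spaces."""
    return ' '.join(segment['text'] for segment in segments)


def captions_for(all_captions, language_code):
    """Return the captions stored for language_code, or None if absent."""
    for entry in all_captions:
        # Labels look like "es - Spanish"; compare only the code before " - ".
        if entry['language'].split(' - ')[0] == language_code:
            return entry['captions']
    return None


# Made-up sample data shaped like the result of the loop above.
sample_segments = [{'text': 'Hola'}, {'text': 'mundo'}]
all_captions = [
    {'language': 'es - Spanish', 'captions': segments_to_text(sample_segments)},
]

print(captions_for(all_captions, 'es'))  # prints "Hola mundo"
```

Using ' '.join avoids the trailing space that repeated string concatenation leaves at the end of the text.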

Answer 2

You can add a languages parameter to the call. Try this:

Change from:

transcript = YouTubeTranscriptApi.get_transcript(video_id)

To:

transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['es'])
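The languages argument is a list in order of preference, so several codes can be passed and the first one available is used. Below is a minimal sketch, assuming the same get_transcript call as above (other releases of youtube_transcript_api may expose a different interface). extract_video_id and fetch_caption_text are hypothetical helper names, and the URL parsing is an addition that tolerates extra query parameters such as &t=30s, which split('v=') does not.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a watch URL or a youtu.be short URL."""
    parsed = urlparse(url)
    if parsed.hostname == 'youtu.be':
        # Short URLs carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip('/')
    # Watch URLs carry the ID in the "v" query parameter.
    return parse_qs(parsed.query)['v'][0]


def fetch_caption_text(video_url, languages=('es', 'en')):
    """Fetch captions in the first available language and join them (sketch)."""
    # Imported here so extract_video_id works without the third-party package.
    from youtube_transcript_api import YouTubeTranscriptApi

    video_id = extract_video_id(video_url)
    transcript = YouTubeTranscriptApi.get_transcript(
        video_id, languages=list(languages)
    )
    return ' '.join(segment['text'] for segment in transcript)
```

Calling fetch_caption_text('https://www.youtube.com/watch?v=xYgoNiSo-kY') would then try Spanish first and fall back to English if no Spanish transcript exists.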
