How to Easily Read Subtitles from YouTube Videos Using Python

April 14, 2024

YouTube hosts videos on almost every topic, and many of them come with subtitles or closed captions. Extracting those subtitles is useful for analysis, data mining, or creating transcripts. In this tutorial, we’ll read subtitles from YouTube videos using Python and the YouTube Transcript API, a third-party library published as youtube_transcript_api.

You can follow along with the code examples in this article or access the complete Jupyter Notebook here.

Prerequisites

Before we start, make sure you have the following prerequisites:

Python installed on your system
Access to the internet to install the required libraries

Step 1: Install the Required Libraries

To read subtitles from YouTube videos, we’ll use the youtube_transcript_api library. We’ll also use the pandas library to convert the subtitles into a DataFrame. Run the following commands in a Jupyter Notebook cell to install both libraries (in a regular terminal, drop the leading !):

!pip install youtube_transcript_api
!pip install pandas

Step 2: Import the Required Libraries

In your Python script or Jupyter Notebook, import the required libraries:

from youtube_transcript_api import YouTubeTranscriptApi
import pandas as pd

Step 3: Fetch the Transcript

To fetch the transcript for a specific YouTube video, you need the video ID. You can find the video ID in the URL of the video. For example, if the video URL is https://www.youtube.com/watch?v=aKEatGCJUGM, the video ID is aKEatGCJUGM.
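The ID lookup described above can also be done in code. The following is a minimal sketch using only the standard library; extract_video_id is a hypothetical helper name, not part of youtube_transcript_api, and it assumes only the two common URL shapes (watch links and youtu.be short links).

```python
from urllib.parse import urlparse, parse_qs

# Hosts assumed to serve standard watch links (an assumption for this sketch).
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper; it is not part of youtube_transcript_api.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links look like https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None

    # Standard links look like https://www.youtube.com/watch?v=<id>
    if host in WATCH_HOSTS and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


video_id = extract_video_id("https://www.youtube.com/watch?v=aKEatGCJUGM")
print(video_id)  # aKEatGCJUGM
```

Other URL shapes (embeds, playlists, shorts) are not handled here and return None.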

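Use the YouTubeTranscriptApi.get_transcript() function to fetch the transcript. This call matches the library releases available when this article was written; if it fails on a newer release, check the documentation for your installed version: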

video_id = "aKEatGCJUGM"
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=["iw"])

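In this example, we’re fetching the transcript for the video with ID aKEatGCJUGM and requesting Hebrew ("iw"). YouTube often lists Hebrew under the legacy code "iw" rather than "he". The languages argument is a list of language codes in order of preference, so you can pass several codes, such as ["iw", "he"], and change them according to your requirements.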

Step 4: Convert the Transcript to a DataFrame

The transcript obtained from the YouTube Transcript API is a list of dictionaries, where each dictionary represents a segment of the transcript. To make it easier to work with the data, we can convert it into a pandas DataFrame:
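To make the list-of-dictionaries shape concrete, here is a small sketch that runs without network access. The sample values are invented for illustration, and transcript_to_text is a hypothetical helper name; only the keys text, start, and duration come from the tutorial.

```python
# Invented sample in the shape described above: one dictionary per segment.
sample_transcript = [
    {"text": "Hello and welcome", "start": 0.0, "duration": 2.5},
    {"text": "to this\ntutorial", "start": 2.5, "duration": 1.8},
    {"text": "about subtitles", "start": 4.3, "duration": 2.0},
]


def transcript_to_text(segments):
    """Join the text of all segments into one string (hypothetical helper)."""
    # A segment's text may contain line breaks; flatten them to spaces.
    parts = [segment["text"].replace("\n", " ").strip() for segment in segments]
    # Skip segments that are empty after stripping.
    return " ".join(part for part in parts if part)


full_text = transcript_to_text(sample_transcript)
print(full_text)  # Hello and welcome to this tutorial about subtitles
```

The same function should work on the transcript list fetched in Step 3, assuming it has the keys shown here.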

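# Build one row per transcript segment
data = []
for segment in transcript:
    text = segment['text']
    start = segment['start']        # start time in seconds
    duration = segment['duration']  # length of the segment in seconds
    # The end time is the start time plus the duration
    data.append([video_id, start, start + duration, text])

df = pd.DataFrame(data, columns=['video_id', 'start_time', 'end_time', 'text'])
print(df)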

This code snippet iterates over each segment of the transcript, extracts the text, start time, and duration, computes the end time as start plus duration, and appends the result to a list called data. Finally, it creates a DataFrame called df with columns for the video ID, start time, end time, and text.
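The start and end times are plain numbers of seconds, which can be hard to read for longer videos. The sketch below converts seconds to an HH:MM:SS string; format_timestamp is a hypothetical helper name, and the rows are invented values in the same column order as the DataFrame.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (hypothetical helper)."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented rows: (video_id, start_time, end_time, text)
rows = [
    ("aKEatGCJUGM", 0.0, 2.5, "first segment"),
    ("aKEatGCJUGM", 75.4, 78.0, "second segment"),
    ("aKEatGCJUGM", 3671.9, 3675.0, "third segment"),
]

for video_id, start_time, end_time, text in rows:
    print(format_timestamp(start_time), "-", format_timestamp(end_time), text)
```

With the DataFrame from this step, the same function could be applied to a whole column, for example df['start_time'].apply(format_timestamp).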

Step 5: Convert the DataFrame to JSON or CSV (Optional)

If you need the transcript data in JSON or CSV format, you can easily convert the DataFrame to the desired format:

# Convert to JSON
json_data = df[['text', 'start_time']].to_json(force_ascii=False, orient='records')

# Convert to CSV
df[['text', 'start_time']].to_csv('text.txt', index=False, header=False)

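These code snippets convert the text and start_time columns to JSON and CSV. Because no file path is passed, to_json returns the JSON as a string, and force_ascii=False keeps non-ASCII text such as Hebrew readable instead of escaping it. The to_csv call writes the rows to text.txt without an index column or header row. You can customize the columns and options based on your requirements.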
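With orient='records', the JSON output is an array with one object per row. The sketch below reproduces that shape with the standard json module so it runs without pandas; the records are invented, and ensure_ascii=False is used here as the json-module counterpart of force_ascii=False.

```python
import json

# Invented rows standing in for df[['text', 'start_time']].
records = [
    {"text": "שלום", "start_time": 0.0},
    {"text": "hello", "start_time": 2.5},
]

# Keep non-ASCII characters as-is, similar to force_ascii=False in to_json.
json_data = json.dumps(records, ensure_ascii=False)

# With the default settings, non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(records)
print(escaped)

# Both variants decode back to the same list of dictionaries.
restored = json.loads(json_data)
print(restored == json.loads(escaped))  # True
```

Either form is valid JSON; the unescaped one is simply easier to read when the subtitles are not in English.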

Conclusion

Reading subtitles from YouTube videos using Python is a straightforward process with the help of the YouTube Transcript API. By following the steps outlined in this tutorial, you can easily fetch transcripts, convert them into a structured format like a DataFrame, and further process or analyze the data as needed.

The YouTube Transcript API provides a convenient way to access subtitles programmatically, opening up possibilities for various applications such as sentiment analysis, keyword extraction, or generating summaries of video content.

Feel free to explore the different options and customize the code to suit your specific requirements. Happy subtitle extraction!
