Using Azure to add subtitles to a video
Author: Jac Timms
I needed to create English subtitles for a video recorded in French and figured I'd use Azure rather than pay for a dedicated subtitling service. In this guide, I'll walk through how to use Azure Speech Service to generate subtitles for your videos automatically.
Prerequisites
Before we begin, you'll need:
- An Azure subscription (you can create one for free)
- A video file you want to add subtitles to
- Basic familiarity with command line tools
Step 1: Extract Audio from Video and Convert to WAV
First, we need to extract the audio from your video file and convert it to WAV format:
On macOS
Use the built-in afconvert command:
afconvert -f WAVE -d LEI16@44100 input.m4a output.wav
On Windows
Use ffmpeg (you'll need to install it first):
ffmpeg -i input.mp4 -acodec pcm_s16le -ar 44100 output.wav
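If you'd rather drop the video stream explicitly and produce the format Azure's documentation describes as the default Speech input (16 kHz, 16-bit mono PCM WAV), the ffmpeg command can be wrapped in a small function. This is a minimal sketch, not the setup used above: the 16 kHz mono settings, the extract_audio name and the DRY_RUN switch (which prints the command instead of running it) are assumptions, and it expects ffmpeg on your PATH. ffmpeg also runs on macOS if you install it there.

```bash
#!/usr/bin/env bash
# Sketch: extract the audio track of a video as 16 kHz, 16-bit mono PCM WAV.
# Assumes ffmpeg is on PATH. Set DRY_RUN=1 to print the command instead of running it.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# extract_audio VIDEO [OUTPUT]  (OUTPUT defaults to VIDEO with a .wav extension)
extract_audio() {
  local video="$1"
  local output="${2:-${video%.*}.wav}"
  # -vn drops the video stream; pcm_s16le is 16-bit PCM; -ar/-ac set 16 kHz mono.
  run ffmpeg -i "$video" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$output"
}
```

For example, extract_audio input.mp4 writes input.wav, which can then be passed to spx recognize in Step 5.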
Step 2: Set Up Azure Speech Service
- Create a Speech resource in the Azure portal
- Once created, get your Speech resource key and region from the "Keys and Endpoint" section
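The same resource can also be created from the command line with the Azure CLI instead of the portal. A minimal sketch, assuming the Azure CLI is installed and you have already run az login; the resource group, resource name and region defaults are hypothetical placeholders, and DRY_RUN=1 prints the commands rather than running them. F0 is the free pricing tier and S0 the standard paid tier.

```bash
#!/usr/bin/env bash
# Sketch: create a Speech resource and read its key with the Azure CLI.
# Assumes `az login` has been run. Names and region below are hypothetical defaults.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

RESOURCE_GROUP="${RESOURCE_GROUP:-subtitles-rg}"
SPEECH_NAME="${SPEECH_NAME:-my-speech-resource}"
REGION="${REGION:-northeurope}"

create_speech_resource() {
  run az group create --name "$RESOURCE_GROUP" --location "$REGION"
  # F0 is the free tier; switch to S0 for the standard paid tier.
  run az cognitiveservices account create --name "$SPEECH_NAME" \
    --resource-group "$RESOURCE_GROUP" --kind SpeechServices --sku F0 \
    --location "$REGION" --yes
}

# Prints only key1, ready to hand to `spx config @key --set`.
show_speech_key() {
  run az cognitiveservices account keys list --name "$SPEECH_NAME" \
    --resource-group "$RESOURCE_GROUP" --query key1 --output tsv
}
```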
Step 3: Install the Speech CLI
The Azure Speech CLI is the easiest way to generate subtitles. Install it using the .NET CLI:
Note: You'll also need the .NET 6 SDK installed. If, like me, you are on an ARM64 Mac, the Cognitive Services Speech SDK doesn't support it yet, so you'll need to use Rosetta to run the .NET CLI.
I have a guide on running .NET x64 under Rosetta on ARM64 Macs if you need help with that.
Once you have the .NET 6 SDK installed, install the Speech CLI:
dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
Step 4: Configure Speech CLI
Set up your Azure Speech Service credentials:
spx config @key --set YOUR-SUBSCRIPTION-KEY
spx config @region --set YOUR-REGION
Replace YOUR-SUBSCRIPTION-KEY with your Speech resource key and YOUR-REGION with your resource region (e.g., westus, northeurope).
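To avoid typing the key directly into commands (and your shell history), you can keep it in environment variables and pass it through. A minimal sketch: SPEECH_KEY and SPEECH_REGION are variable names chosen for this example and are passed to spx config explicitly, and DRY_RUN=1 prints the spx commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: configure the Speech CLI from environment variables.
# SPEECH_KEY and SPEECH_REGION are names chosen for this example.
# Set DRY_RUN=1 to print the spx commands instead of running them.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_spx() {
  # ${VAR:?msg} stops with msg if VAR is unset or empty.
  : "${SPEECH_KEY:?Set SPEECH_KEY to your Speech resource key}"
  : "${SPEECH_REGION:?Set SPEECH_REGION to your region, e.g. northeurope}"
  run spx config @key --set "$SPEECH_KEY"
  run spx config @region --set "$SPEECH_REGION"
}
```

Export the two variables in your shell, then call configure_spx once.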
If you are on Windows or Linux, you're also best off installing GStreamer, which the Speech SDK and CLI use to decode compressed audio; Microsoft's Speech SDK documentation has installation instructions.
If you are on macOS, GStreamer isn't supported. I tried to get it working for this, but it just refused.
Step 5: Generate Subtitles
Now you can generate subtitles in WebVTT format. *Important: specifying the --format flag triggers the GStreamer dependency.*
On macOS (running under Rosetta), use the WAV file from Step 1:
spx recognize --file input.wav --language fr-FR --output vtt file subtitles.vtt
Or on Windows or Linux using GStreamer:
spx recognize --file your-audio.m4a --format any --output vtt file subtitles.vtt --output srt file subtitles.srt
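Since the two commands differ mainly in how the input is decoded, a small wrapper can pick one based on the file extension. A sketch under these assumptions: WAV input goes through the plain command (no GStreamer needed), anything else gets --format any; the make_subtitles name is hypothetical, the language defaults to fr-FR to match this video, and DRY_RUN=1 prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: choose the spx command based on the input file type.
# Assumes spx is configured (Step 4); compressed input also needs GStreamer.
# Set DRY_RUN=1 to print the command instead of running it.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_subtitles INPUT [LANGUAGE]  (LANGUAGE defaults to fr-FR)
make_subtitles() {
  local input="$1" language="${2:-fr-FR}"
  local base="${input%.*}"
  case "${input##*.}" in
    wav | WAV)
      # Uncompressed PCM: no --format flag, so GStreamer is not needed.
      run spx recognize --file "$input" --language "$language" \
        --output vtt file "$base.vtt"
      ;;
    *)
      # Compressed audio or video: --format any relies on GStreamer to decode it.
      run spx recognize --file "$input" --format any --language "$language" \
        --output vtt file "$base.vtt" --output srt file "$base.srt"
      ;;
  esac
}
```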
Additional options you can use:
- --profanity masked: masks profanity in the output
- --phrases "Phrase1;Phrase2": improves recognition of specific phrases
- --property SpeechServiceResponse_StablePartialResultThreshold=5: only returns a word in partial (interim) results once it has appeared that many times, giving fewer but more stable partial results
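With many domain terms, it can be easier to keep them one per line in a text file and join them into the semicolon-separated --phrases value. A minimal sketch: the phrases file and the build_phrases and recognize_with_options names are hypothetical, and DRY_RUN=1 prints the resulting spx command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: load custom phrases from a file (one per line) and pass them to spx.
# The file layout and function names are hypothetical.
# Set DRY_RUN=1 to print the spx command instead of running it.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_phrases FILE: skip blank lines and join the rest with ';'.
build_phrases() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd ';' -
}

# recognize_with_options INPUT.wav PHRASES_FILE
recognize_with_options() {
  local input="$1" phrases_file="$2"
  run spx recognize --file "$input" --language fr-FR \
    --profanity masked --phrases "$(build_phrases "$phrases_file")" \
    --output vtt file "${input%.*}.vtt"
}
```

Keeping the list in a file makes it easy to reuse across videos on the same topic.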
Output Format
The command generates a WebVTT file that looks like this:
WEBVTT

00:00:00.170 --> 00:00:03.230
Welcome to this video tutorial.

00:00:03.230 --> 00:00:06.450
Today we'll be discussing Azure services.
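If you only generated the WebVTT file but a player or editor wants SRT, the conversion is mostly mechanical: drop the WEBVTT header, number each cue, and use a comma instead of a dot before the milliseconds. The sketch below does this with awk, and also shows one way to attach the subtitles to the video itself as a soft subtitle track with ffmpeg (mov_text is the MP4 subtitle codec; -c copy avoids re-encoding audio and video). It assumes timestamps include hours, as in the sample above, and ignores cue styling; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: convert a WebVTT file to SRT and mux subtitles into an MP4.
# Assumes timestamps include hours (hh:mm:ss.mmm) and cues carry no styling.
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# vtt_to_srt FILE.vtt > FILE.srt
vtt_to_srt() {
  awk '
    /^WEBVTT/ { next }                 # skip the file header
    /-->/ {                            # a timing line starts a new cue
      n++
      if (n > 1) print ""              # blank line between SRT cues
      print n                          # SRT cues are numbered from 1
      gsub(/\./, ",")                  # 00:00:03.230 becomes 00:00:03,230
      print
      incue = 1
      next
    }
    /^[ \t\r]*$/ { incue = 0; next }   # a blank line ends the cue
    incue { print }                    # subtitle text inside a cue
  ' "$1"
}

# add_subtitle_track VIDEO SUBS OUTPUT.mp4
# Copies audio/video as-is and stores the subtitles as a mov_text track.
add_subtitle_track() {
  run ffmpeg -i "$1" -i "$2" -map 0 -map 1 -c copy -c:s mov_text "$3"
}
```

For example: vtt_to_srt subtitles.vtt > subtitles.srt, then add_subtitle_track input.mp4 subtitles.srt output.mp4. Most players let viewers toggle a soft track on and off; burning subtitles into the picture instead would require re-encoding the video.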
Clean Up
When you're done, you can remove the Speech resource from the Azure portal if you don't plan to use it again. This ensures you won't incur any additional costs.
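The cleanup can also be scripted with the Azure CLI. A minimal sketch, assuming the same hypothetical resource group and resource names used when creating the resource; deleting the resource group removes everything in it, so only do that if the group was created just for this. DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: remove the Speech resource with the Azure CLI once you're done.
# Resource group and resource names are hypothetical; match them to your own.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

RESOURCE_GROUP="${RESOURCE_GROUP:-subtitles-rg}"
SPEECH_NAME="${SPEECH_NAME:-my-speech-resource}"

# Deletes just the Speech resource.
delete_speech_resource() {
  run az cognitiveservices account delete --name "$SPEECH_NAME" --resource-group "$RESOURCE_GROUP"
}

# Deletes the whole resource group (and everything in it) without a confirmation prompt.
delete_resource_group() {
  run az group delete --name "$RESOURCE_GROUP" --yes --no-wait
}
```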
Tips for Better Results
- Use high-quality audio for better recognition accuracy
- Add custom phrases for domain-specific terminology
- Test with a small segment first to verify the quality
- Consider post-editing the subtitles for perfect accuracy
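For testing with a small segment first, ffmpeg's -t option can cut the first minute or so of audio to try before processing the whole video. A minimal sketch: the trial_run name, the 60-second default and the -sample file suffix are assumptions for illustration, it expects ffmpeg and a configured spx on your PATH, and DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: transcribe only the start of a video to check quality cheaply.
# Assumes ffmpeg and spx are on PATH and spx is already configured (Step 4).
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# trial_run VIDEO [SECONDS] [LANGUAGE]
trial_run() {
  local video="$1" seconds="${2:-60}" language="${3:-fr-FR}"
  local sample="${video%.*}-sample.wav"
  # -t limits the output to the first SECONDS of audio.
  run ffmpeg -i "$video" -t "$seconds" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$sample"
  run spx recognize --file "$sample" --language "$language" \
    --output vtt file "${video%.*}-sample.vtt"
}
```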
Conclusion
Azure Speech Service provides a powerful and automated way to generate subtitles for your videos. While the output might need some manual refinement depending on your needs, it significantly reduces the time and effort required compared to manual transcription.
For more advanced scenarios or programmatic access, Azure Speech Service also provides SDKs for various programming languages including C#, Python, and JavaScript.