Building a Speech-to-Text Application with OpenAI's Whisper API, Node.js, and TypeScript
1. Introduction
In this tutorial, we aim to guide you through the process of transcribing and translating audio files using OpenAI’s Whisper API, Node.js, and TypeScript. By the end of this guide, you will have a functioning application that can transcribe audio files into text and translate them into English.
2. Prerequisites
Before we begin, make sure you have the following installed on your system:
- Node.js: You can download it from the official Node.js website.
- npm (Node Package Manager): It comes bundled with Node.js, so if you’ve installed Node.js, you already have npm.
- TypeScript: Install it globally on your system using npm with the command `npm install -g typescript`.
3. Setting Up the Project
Let’s start by setting up a new project:
- Create a new directory for your project with `mkdir whisper-transcription-service`.
- Navigate into the new directory with `cd whisper-transcription-service`.
- Initialize a new Node.js project by running `npm init -y`. This will create a new `package.json` file in your project directory.
- Install the necessary dependencies with `npm install openai dotenv`:
  - `openai`: the official OpenAI client library for Node.js. It provides convenient methods for interacting with the OpenAI API.
  - `dotenv`: loads environment variables from a `.env` file into `process.env`. We will use this to load our OpenAI API key.
- Create a `.env` file in the project root containing your OpenAI API key under the name `OPENAI_API_KEY`.
- Initialize a new TypeScript configuration file by running `tsc --init`. This will create a new `tsconfig.json` file in your project directory.
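The `.env` file, which the `dotenv` package will load later, holds your OpenAI API key. It might look like this (the value shown is a placeholder, not a real key):

```shell
# .env — keep this file out of version control
OPENAI_API_KEY=your-api-key-here
```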
4. Creating the Transcription Service
We will create a TranscriptionService class that will handle the transcription and translation of our audio files. This class will have two methods: transcribeAudio and translateAudio.
Create a new file named transcriptionService.ts and add the following code:
transcriptionService.ts
```ts
import { Configuration, OpenAIApi } from 'openai';
import * as fs from 'fs';

export class TranscriptionService {
  private openai: OpenAIApi;

  constructor(apiKey: string) {
    const configuration = new Configuration({ apiKey });
    this.openai = new OpenAIApi(configuration);
  }

  // Transcribes speech in the audio file to text in its original language.
  async transcribeAudio(filePath: string): Promise<string> {
    const audioStream = fs.createReadStream(filePath);
    // The cast is needed because the SDK's types expect a browser File object.
    const response = await this.openai.createTranscription(audioStream as any, 'whisper-1');
    return response.data.text;
  }

  // Translates speech in the audio file into English text.
  async translateAudio(filePath: string): Promise<string> {
    const audioStream = fs.createReadStream(filePath);
    const response = await this.openai.createTranslation(audioStream as any, 'whisper-1');
    return response.data.text;
  }
}
```
In the constructor of our TranscriptionService class, we initialize the OpenAIApi object with our API key. We will use this object to interact with the OpenAI API.
5. Implementing the Transcription Logic
The transcribeAudio method uses the createTranscription method of the OpenAIApi object. It reads the audio file from the provided file path and sends it to the OpenAI API for transcription. The response from the API contains the transcribed text, which we return from our method.
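Before sending a file to the API, it can be worth checking that its extension is one Whisper accepts; otherwise the call fails only after the upload. A minimal sketch (the helper `isSupportedAudioFile` is our own, and the extension list should be verified against the current OpenAI documentation):

```typescript
import * as path from 'path';

// File extensions the Whisper API commonly accepts (verify against current docs).
const SUPPORTED_EXTENSIONS = new Set([
  '.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm',
]);

export function isSupportedAudioFile(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(path.extname(filePath).toLowerCase());
}
```

Calling this before `transcribeAudio` lets you reject unsupported files with a clear message instead of an API error.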
6. Implementing the Translation Logic
The translateAudio method works similarly, but uses the createTranslation method of the OpenAIApi object. It reads the audio file from the provided file path and sends it to the OpenAI API for translation into English. The response from the API contains the translated text, which we return from our method.
7. Using the Transcription Service
Now that we have our TranscriptionService class ready, we can use it to transcribe and translate our audio files. Create a new file named index.ts and add the following code:
index.ts
```ts
import { TranscriptionService } from './transcriptionService';
import * as dotenv from 'dotenv';

// Load environment variables (including OPENAI_API_KEY) from the .env file.
dotenv.config();

const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
  throw new Error('OPENAI_API_KEY is not set. Add it to your .env file.');
}

const transcriptionService = new TranscriptionService(apiKey);

async function main() {
  const transcription = await transcriptionService.transcribeAudio('audio.mp3');
  console.log('Transcription:', transcription);

  const translation = await transcriptionService.translateAudio('german.m4a');
  console.log('Translation:', translation);
}

main().catch(console.error);
```
In this code, we first load our OpenAI API key from a .env file using the dotenv package. We then create an instance of our TranscriptionService class and use it to transcribe and translate our audio files. The transcribed and translated text is then logged to the console.
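If you have many files, you could fan the calls out with Promise.all. The sketch below hides the service behind a `Transcriber` function type (a device of this example, not part of the SDK) so the pattern is clear; in practice you would pass `(f) => transcriptionService.transcribeAudio(f)`:

```typescript
// A stand-in type for transcriptionService.transcribeAudio.
type Transcriber = (filePath: string) => Promise<string>;

// Transcribe several files concurrently, pairing each result with its path.
export async function transcribeAll(
  files: string[],
  transcribe: Transcriber,
): Promise<Array<{ file: string; text: string }>> {
  return Promise.all(
    files.map(async (file) => ({ file, text: await transcribe(file) })),
  );
}
```

Note that this fires all requests at once; for large batches you may want to cap concurrency to stay within API rate limits.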
8. Conclusion
Congratulations! You have successfully created an application that can transcribe and translate audio files using OpenAI’s Whisper API, Node.js, and TypeScript.
This application can be extended in many ways. For example, you could add error handling to make it more robust, support more audio formats, or integrate it with other services to create a more complex application. The possibilities are endless!
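As a starting point for the error handling mentioned above, transient API failures (rate limits, network hiccups) are often handled with retries and exponential backoff. A generic sketch (the `withRetry` helper is our own, not part of the openai package):

```typescript
// Retry an async operation with exponential backoff between attempts.
export async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Wait 500 ms, 1000 ms, 2000 ms, ... before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

You could then wrap calls like `withRetry(() => transcriptionService.transcribeAudio('audio.mp3'))`.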