Building a Speech-to-Text Application with OpenAI's Whisper API, Node.js, and TypeScript

In this tutorial, we aim to guide you through the process of transcribing and translating audio files using OpenAI’s Whisper API, Node.js, and TypeScript. By the end of this guide, you will have a functioning application that can transcribe audio files into text and translate them into English.

Before we begin, make sure you have the following installed on your system:

  • Node.js: You can download it from the official Node.js website.
  • npm (Node Package Manager): It comes bundled with Node.js, so if you’ve installed Node.js, you already have npm.
  • TypeScript: Install it globally on your system using npm with the command npm install -g typescript.

Let’s start by setting up a new project:

  1. Create a new directory for your project. You can do this using the command mkdir whisper-transcription-service.
  2. Navigate into the new directory with cd whisper-transcription-service.
  3. Initialize a new Node.js project by running npm init -y. This will create a new package.json file in your project directory.
  4. Install the necessary dependencies for our project. We will need the openai and dotenv packages. Install them using npm with the command npm install openai dotenv. (The code in this tutorial uses the v3 client interface, Configuration and OpenAIApi; if npm installs a newer major version, pin it with npm install openai@3.)
    • openai: This is the official OpenAI client library for Node.js. It provides convenient methods for interacting with the OpenAI API.
    • dotenv: This package loads environment variables from a .env file into process.env. We will use this to load our OpenAI API key from a .env file.
  5. Initialize a new TypeScript configuration file by running tsc --init. This will create a new tsconfig.json file in your project directory.
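The dotenv package expects a file named .env in the root of your project. Create it now and add your OpenAI API key (the value below is a placeholder; replace it with your own key from the OpenAI dashboard):

.env
OPENAI_API_KEY=your-api-key-here

Make sure to add .env to your .gitignore so your key is never committed to version control.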


4. Creating the Transcription Service


We will create a TranscriptionService class that will handle the transcription and translation of our audio files. This class will have two methods: transcribeAudio and translateAudio.

Create a new file named transcriptionService.ts and add the following code:

transcriptionService.ts
import { Configuration, OpenAIApi } from 'openai';
import * as fs from 'fs';

export class TranscriptionService {
  private openai: OpenAIApi;

  constructor(apiKey: string) {
    const configuration = new Configuration({
      apiKey: apiKey,
    });
    this.openai = new OpenAIApi(configuration);
  }

  async transcribeAudio(filePath: string) {
    const audioStream = fs.createReadStream(filePath);
    // The SDK's type definitions expect a File, so we cast the stream.
    const response = await this.openai.createTranscription(audioStream as any, 'whisper-1');
    return response.data.text;
  }

  async translateAudio(filePath: string) {
    const audioStream = fs.createReadStream(filePath);
    // Translation uses createTranslation, which always outputs English.
    const response = await this.openai.createTranslation(audioStream as any, 'whisper-1');
    return response.data.text;
  }
}

In the constructor of our TranscriptionService class, we initialize the OpenAIApi object with our API key. We will use this object to interact with the OpenAI API.

5. Implementing the Transcription Logic


The transcribeAudio method uses the createTranscription method of the OpenAIApi object to transcribe our audio files, while the translateAudio method uses the createTranslation method to translate them into English. Each method reads the audio file from the provided file path and sends it to the OpenAI API. The API responds with a JSON object containing the transcribed or translated text, which we return from our method.
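In v3 of the openai package, responses are Axios-style wrappers, so the transcript text lives under the response's data property rather than on the response itself. A minimal sketch of the extraction, assuming the body has the shape { text: string } and using a mocked response so no network call is needed:

```typescript
// Assumed shape of the Whisper API response body (the default JSON format).
interface WhisperResponseBody {
  text: string;
}

// The SDK wraps the body, so the transcript is at response.data.text.
// We mock the wrapper here to show the extraction in isolation.
const mockResponse = { data: { text: 'Hello from Whisper' } as WhisperResponseBody };

const transcript: string = mockResponse.data.text;
console.log(transcript); // prints "Hello from Whisper"
```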

6. Implementing the Translation Logic


This method works similarly to the transcribeAudio method. It reads the audio file from the provided file path and sends it to the OpenAI API for translation. The response from the API is a JSON object that contains the translated text, which we return from our method.
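One point worth making explicit: createTranscription returns text in the language spoken in the audio, while createTranslation always returns English. A small hypothetical helper (not part of the OpenAI SDK) that encodes this choice by mapping the desired outcome to the SDK method name:

```typescript
type AudioTask = 'transcribe' | 'translate';

// createTranscription keeps the source language; createTranslation outputs English.
function methodFor(task: AudioTask): 'createTranscription' | 'createTranslation' {
  return task === 'transcribe' ? 'createTranscription' : 'createTranslation';
}

console.log(methodFor('translate')); // prints "createTranslation"
```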

7. Using the Transcription Service


Now that we have our TranscriptionService class ready, we can use it to transcribe and translate our audio files. Create a new file named index.ts and add the following code:

index.ts
import { TranscriptionService } from './transcriptionService';
import * as dotenv from 'dotenv';

dotenv.config();

const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
  throw new Error('OPENAI_API_KEY is not set. Add it to your .env file.');
}

const transcriptionService = new TranscriptionService(apiKey);

async function main() {
  const transcription = await transcriptionService.transcribeAudio('audio.mp3');
  console.log('Transcription:', transcription);

  const translation = await transcriptionService.translateAudio('german.m4a');
  console.log('Translation:', translation);
}

main().catch(console.error);

In this code, we first load our OpenAI API key from a .env file using the dotenv package. We then create an instance of our TranscriptionService class and use it to transcribe and translate our audio files. The transcribed and translated text is then logged to the console.

Congratulations! You have successfully created an application that can transcribe and translate audio files using OpenAI’s Whisper API, Node.js, and TypeScript.

This application can be extended in many ways. For example, you could add error handling to make it more robust, support more audio formats, or integrate it with other services to create a more complex application. The possibilities are endless!
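As a first extension, here is a sketch of validating file extensions before uploading. The list reflects the formats OpenAI's Whisper API documentation lists as supported at the time of writing; treat it as an assumption to verify against the current docs:

```typescript
import * as path from 'path';

// Formats listed in OpenAI's Whisper API docs (verify against current docs).
const SUPPORTED_EXTENSIONS = ['.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm'];

// Returns true when the file's extension is one the API accepts.
function isSupportedAudioFile(filePath: string): boolean {
  const extension = path.extname(filePath).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(extension);
}

console.log(isSupportedAudioFile('audio.mp3'));  // prints true
console.log(isSupportedAudioFile('notes.txt'));  // prints false
```

Calling this check before fs.createReadStream lets you fail fast with a clear message instead of waiting for the API to reject the upload.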