OpenAI API Bible Volume 2

Chapter 2: Audio Understanding and Generation with Whisper and GPT-4o

Practical Exercises — Chapter 2

Exercise 1: Transcribe an English Audio File

Task:

Use the Whisper API to transcribe a short .mp3 audio file containing English speech.

Solution:

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Open the audio file in binary mode and send it to Whisper
with open("english_note.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"  # returns a plain string
    )

print("Transcript:\n", transcript)

Exercise 2: Translate Foreign Language Audio into English

Task:

Upload a non-English audio file and translate it into English using the Whisper API.

Solution:

# assumes: from openai import OpenAI; client = OpenAI()
with open("spanish_clip.mp3", "rb") as audio_file:
    translated = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

print("Translation:\n", translated)
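Audio endpoints can fail transiently (rate limits, network blips). A minimal retry helper, our own rather than the SDK's (recent `openai` clients also accept a built-in `max_retries` setting), might look like:

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))


# Usage sketch, assuming a configured client = OpenAI():
# translated = with_retries(
#     lambda: client.audio.translations.create(
#         model="whisper-1",
#         file=open("spanish_clip.mp3", "rb"),
#         response_format="text",
#     )
# )
```

In production you would catch only retryable errors (e.g. rate-limit and connection exceptions) rather than bare `Exception`.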

Exercise 3: Upload and Analyze an Audio File with GPT-4o

Task:

Upload an .mp3 audio file and ask GPT-4o to summarize it.

Solution:

import base64

# assumes: from openai import OpenAI; client = OpenAI()
# The chat endpoint takes audio as a base64-encoded "input_audio" content
# part; audio input requires an audio-capable model such as
# gpt-4o-audio-preview.
with open("meeting_summary.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize this meeting."},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "mp3"}}
            ]
        }
    ]
)

print("Summary:\n", response.choices[0].message.content)
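Oversized files are a common failure mode when sending audio: Whisper caps uploads at 25 MB per file. A small guard (our own helper, not an SDK function) catches this before the request is made:

```python
import os

MAX_AUDIO_MB = 25  # Whisper's documented per-file upload limit


def check_audio_size(path: str, limit_mb: float = MAX_AUDIO_MB) -> bool:
    """Return True if the file at `path` is within the upload limit."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb <= limit_mb


# Usage sketch:
# if not check_audio_size("meeting_summary.mp3"):
#     raise ValueError("Audio file too large; split it before uploading.")
```

Files over the limit need to be compressed or split into chunks (e.g. with a tool like ffmpeg) and processed piece by piece.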

Exercise 4: Generate a Spoken Response Using Text-to-Speech

Task:

Take a GPT-generated reply and convert it to speech using OpenAI’s TTS API.

Solution:

text_to_speak = "Sure! The marketing meeting discussed Q3 strategies and budget allocations."

# assumes: from openai import OpenAI; client = OpenAI()
speech = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input=text_to_speak
)

with open("spoken_reply.mp3", "wb") as f:
    f.write(speech.content)

print("Voice reply saved as 'spoken_reply.mp3'")
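The TTS endpoint limits `input` to 4096 characters, so longer replies must be split before synthesis. A sketch of a chunker (our own helper) that prefers sentence boundaries and hard-splits anything longer than the limit:

```python
def chunk_text(text: str, limit: int = 4096) -> list[str]:
    """Split `text` into chunks of at most `limit` characters, preferring
    sentence boundaries; an over-long sentence is hard-split."""
    # Mark likely sentence ends, then split into candidate pieces.
    marked = text.replace(". ", ".\n").replace("! ", "!\n").replace("? ", "?\n")
    pieces = []
    for sent in marked.split("\n"):
        while len(sent) > limit:          # hard-split oversized sentences
            pieces.append(sent[:limit])
            sent = sent[limit:]
        if sent:
            pieces.append(sent)
    # Greedily pack pieces back together up to the limit.
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip() if current else piece
        if len(candidate) > limit and current:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the speech endpoint separately and the resulting MP3 segments concatenated.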

Exercise 5: Build a Voice-to-Voice Mini Assistant

Task:

Build a basic pipeline that accepts audio, generates a response using GPT-4o, and replies back using synthesized voice.

Solution:

import base64

# assumes: from openai import OpenAI; client = OpenAI()

# Step 1: Read the user's audio and base64-encode it
with open("user_voice_prompt.mp3", "rb") as f:
    user_audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# Step 2: An audio-capable GPT-4o model generates a text reply
chat = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please answer this question politely."},
                {"type": "input_audio",
                 "input_audio": {"data": user_audio_b64, "format": "mp3"}}
            ]
        }
    ]
)

reply = chat.choices[0].message.content

# Step 3: Convert the reply to speech
tts = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    input=reply
)

with open("voice_response.mp3", "wb") as f:
    f.write(tts.content)

print("Assistant reply saved as 'voice_response.mp3'")
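To make the pipeline reusable, and testable without network calls, the steps above can be factored into one function that takes the "respond" and "speak" stages as plain callables. The function and stage names here are illustrative, not SDK APIs:

```python
def voice_pipeline(audio_bytes: bytes, respond, speak) -> bytes:
    """Run the voice-to-voice flow: audio in -> text reply -> audio out.

    respond: callable taking audio bytes and returning a text reply
    speak:   callable taking text and returning synthesized audio bytes
    """
    reply_text = respond(audio_bytes)
    return speak(reply_text)


# Stub stages stand in for the GPT-4o and TTS calls during testing:
fake_respond = lambda audio: f"I heard {len(audio)} bytes of audio."
fake_speak = lambda text: text.upper().encode()

out = voice_pipeline(b"\x00\x01\x02", fake_respond, fake_speak)
print(out)  # b'I HEARD 3 BYTES OF AUDIO.'
```

In production the two stubs would wrap the chat-completion and `audio.speech` calls from the exercise; keeping them injectable means the flow can be unit-tested offline.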

In these exercises, you practiced:

  • Uploading and transcribing audio with Whisper
  • Translating foreign speech to English
  • Summarizing and interpreting audio with GPT-4o
  • Converting GPT replies into natural-sounding speech
  • Building your first voice-to-voice assistant pipeline

You now have the tools to integrate speech into any AI project, whether you're building a language tutor, an accessibility assistant, a voice-based productivity tool, or a smart speaker experience.
