OpenAI API Bible Volume 2

Chapter 6: Cross-Model AI Suites

Practical Exercises — Chapter 6

Exercise 1: Modularize Your Multimodal Logic

Task:

Refactor your existing multimodal pipeline (Whisper → GPT → DALL·E) into three clearly separated Python modules: transcribe.py, summarize.py, and generate_image.py.

Solution:

# transcribe.py
import openai

def transcribe_audio(file_path):
    with open(file_path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", file=audio_file)
    return result["text"]

# summarize.py
import openai

def summarize_text(text):
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize this text for clarity and brevity."},
            {"role": "user", "content": text}
        ]
    )
    return response["choices"][0]["message"]["content"]

# generate_image.py
import openai

def generate_image_from_prompt(prompt):
    response = openai.Image.create(
        prompt=prompt,
        model="dall-e-3",
        size="1024x1024",
        response_format="url"
    )
    return response["data"][0]["url"]

Exercise 2: Build a Local File Processor

Task:

Create a script that reads an audio file from the local system and automatically generates a transcription, summary, image prompt, and image.

Solution:

from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt
import requests

audio_file = "example.m4a"
transcript = transcribe_audio(audio_file)
summary = summarize_text(transcript)
image_prompt = summary  # or use a dedicated prompt builder (see the sketch after this script)
image_url = generate_image_from_prompt(image_prompt)

# Download and save image
img_data = requests.get(image_url).content
with open("output_image.png", "wb") as f:
    f.write(img_data)

print("Transcript:", transcript)
print("Summary:", summary)
print("Image Prompt:", image_prompt)
print("Image URL:", image_url)

Exercise 3: Add Logging to Your Pipeline

Task:

Add a logging system that tracks when a file was processed and logs each stage of the pipeline.

Solution:

import logging

logging.basicConfig(
    filename="pipeline.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

logging.info("Started processing example.m4a")
# After each step:
logging.info("Transcription complete")
logging.info("Summary complete")
logging.info("Image generated successfully")

Exercise 4: Deploy the Flask Dashboard on Render

Task:

Take your app.py and HTML files and deploy them using Render. Make sure to:

  • Create a requirements.txt
  • Use gunicorn as your server
  • Add OPENAI_API_KEY as an environment variable
  • Test the URL on both desktop and mobile

Deployment Notes:

# requirements.txt should include:
Flask
openai
python-dotenv
gunicorn
requests

Start command on Render:

gunicorn app:app
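
The start command gunicorn app:app means "import the module app.py and serve the object named app", so it assumes app.py defines a module-level Flask instance called app. A minimal skeleton that satisfies that assumption looks like the sketch below; your real app.py from this chapter will have more routes.

# app.py — minimal skeleton; the module-level name "app" is what gunicorn app:app expects
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Dashboard is running"

if __name__ == "__main__":
    # Local development only; Render runs this module through gunicorn instead
    app.run(debug=True)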

Exercise 5: Auto-Process Audio Uploads in Background

Task:

Create a local script that watches a directory for .mp3 or .m4a files and processes them automatically.

Solution (Watch Script):

import os
import time
from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt

UPLOAD_DIR = "uploads"
PROCESSED = set()

def watch_folder():
    while True:
        files = [f for f in os.listdir(UPLOAD_DIR) if f.endswith((".m4a", ".mp3"))]
        for f in files:
            if f not in PROCESSED:
                path = os.path.join(UPLOAD_DIR, f)
                transcript = transcribe_audio(path)
                summary = summarize_text(transcript)
                prompt = summary  # reuse the summary as the image prompt
                image_url = generate_image_from_prompt(prompt)
                print("✅ Processed:", f)
                PROCESSED.add(f)
        time.sleep(5)

if __name__ == "__main__":
    watch_folder()
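
Two practical caveats with this watcher: PROCESSED lives only in memory, so restarting the script reprocesses every file, and any exception inside the loop kills the watcher. One way to harden it, assuming you are free to move files on disk, is sketched below: finished files are relocated into a processed/ subfolder (which doubles as a persistent "done" marker) and each file is wrapped in try/except so one failure does not stop the loop.

# watch_safe.py — hardened variant of the watcher above (a sketch, adjust to taste)
import os
import shutil
import time

from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt

UPLOAD_DIR = "uploads"
PROCESSED_DIR = os.path.join(UPLOAD_DIR, "processed")


def watch_folder_safe():
    os.makedirs(PROCESSED_DIR, exist_ok=True)
    while True:
        for f in os.listdir(UPLOAD_DIR):
            if not f.endswith((".m4a", ".mp3")):
                continue
            path = os.path.join(UPLOAD_DIR, f)
            try:
                transcript = transcribe_audio(path)
                summary = summarize_text(transcript)
                generate_image_from_prompt(summary)
                # Moving the file doubles as a persistent "already processed" marker
                shutil.move(path, os.path.join(PROCESSED_DIR, f))
                print("✅ Processed:", f)
            except Exception as exc:
                print("⚠️ Failed:", f, exc)
        time.sleep(5)


if __name__ == "__main__":
    watch_folder_safe()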

Exercise 6: Protect Your API Keys

Task:

Use python-dotenv to load your API key safely from a .env file.

Solution:

# .env
OPENAI_API_KEY=your-openai-key

# In your Python code
from dotenv import load_dotenv
import os
import openai

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
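
If you are on the newer openai>=1.0 SDK, there is no module-level api_key attribute to set; the client picks up OPENAI_API_KEY from the environment on its own once load_dotenv() has run. A minimal sketch:

# Sketch for openai>=1.0: the client reads OPENAI_API_KEY from the environment
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()      # populate os.environ from .env
client = OpenAI()  # picks up OPENAI_API_KEY automatically

Either way, add .env to your .gitignore so the key never lands in version control.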

By completing these exercises, you now have a hands-on understanding of how to:

  • Structure clean, modular AI pipelines
  • Automate file processing workflows
  • Secure your deployment and prepare for scaling
  • Build applications that combine voice, text, and visuals

This chapter forms the practical core of real-world AI products. In the next and final chapter, you’ll combine everything into a fully polished AI suite, complete with documentation, a front end, and a real use case.
