Chapter 6: Cross-Model AI Suites
Practical Exercises — Chapter 6
Exercise 1: Modularize Your Multimodal Logic
Task:
Refactor your existing multimodal pipeline (Whisper → GPT → DALL·E) into three clearly separated Python modules: transcribe.py, summarize.py, and generate_image.py.
Solution:
# transcribe.py
import openai

def transcribe_audio(file_path):
    # Send the audio file to Whisper and return the plain-text transcript.
    # Note: this targets the legacy (pre-1.0) openai Python SDK.
    with open(file_path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", file=audio_file)
    return result["text"]

# summarize.py
import openai

def summarize_text(text):
    # Ask GPT-4o for a short, clear summary of the supplied text.
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize this text for clarity and brevity."},
            {"role": "user", "content": text}
        ]
    )
    return response["choices"][0]["message"]["content"]

# generate_image.py
import openai

def generate_image_from_prompt(prompt):
    # Generate a single DALL·E 3 image and return its hosted URL.
    response = openai.Image.create(
        prompt=prompt,
        model="dall-e-3",
        size="1024x1024",
        response_format="url"
    )
    return response["data"][0]["url"]
Exercise 2: Build a Local File Processor
Task:
Create a script that reads an audio file from the local system and automatically generates a transcription, summary, image prompt, and image.
Solution:
from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt
import requests
audio_file = "example.m4a"
transcript = transcribe_audio(audio_file)
summary = summarize_text(transcript)
image_prompt = summarize_text(transcript)  # Reuses the summarizer; a dedicated prompt function is sketched below
image_url = generate_image_from_prompt(image_prompt)
# Download and save image
img_data = requests.get(image_url).content
with open("output_image.png", "wb") as f:
f.write(img_data)
print("Transcript:", transcript)
print("Summary:", summary)
print("Image Prompt:", image_prompt)
print("Image URL:", image_url)
Exercise 3: Add Logging to Your Pipeline
Task:
Add a logging system that tracks when a file was processed and logs each stage of the pipeline.
Solution:
import logging

logging.basicConfig(
    filename="pipeline.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

logging.info("Started processing example.m4a")

# After each step:
logging.info("Transcription complete")
logging.info("Summary complete")
logging.info("Image generated successfully")
Exercise 4: Deploy the Flask Dashboard on Render
Task:
Take your app.py and HTML files and deploy them using Render. Make sure to:
- Create a requirements.txt
- Use gunicorn as your server
- Add OPENAI_API_KEY as an environment variable
- Test the URL on both desktop and mobile
Deployment Notes:
# requirements.txt should include:
Flask
openai
python-dotenv
gunicorn
requests
Start command on Render:
gunicorn app:app
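The start command gunicorn app:app assumes a module named app.py that exposes a Flask instance named app. A minimal sketch of that entry point is shown below; your real routes and templates will differ.

# app.py — minimal entry point that the gunicorn start command expects (sketch)
from flask import Flask

app = Flask(__name__)  # "app:app" means module app.py, variable app

@app.route("/")
def index():
    return "Pipeline dashboard is running."

if __name__ == "__main__":
    # Local development only; on Render, gunicorn serves the app instead.
    app.run(debug=True)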
Exercise 5: Auto-Process Audio Uploads in Background
Task:
Create a local script that watches a directory for .mp3 or .m4a files and processes them automatically.
Solution (Watch Script):
import os
import time

from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt

UPLOAD_DIR = "uploads"
PROCESSED = set()  # filenames already handled in this session

def watch_folder():
    # Poll the upload directory every few seconds for new audio files.
    while True:
        files = [f for f in os.listdir(UPLOAD_DIR) if f.endswith((".m4a", ".mp3"))]
        for f in files:
            if f not in PROCESSED:
                path = os.path.join(UPLOAD_DIR, f)
                transcript = transcribe_audio(path)
                summary = summarize_text(transcript)
                prompt = summarize_text(transcript)
                image_url = generate_image_from_prompt(prompt)
                print("✅ Processed:", f)
                PROCESSED.add(f)
        time.sleep(5)

watch_folder()
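Polling every five seconds is fine for a small upload folder. If you would rather react to file-system events directly, the third-party watchdog package (pip install watchdog) offers an event-driven alternative. The sketch below assumes the same processing functions as above.

# watch_events.py — event-driven variant using the watchdog package (sketch)
import os
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt

UPLOAD_DIR = "uploads"

class AudioHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Fires whenever a new file appears in the watched directory.
        if event.is_directory or not event.src_path.endswith((".m4a", ".mp3")):
            return
        transcript = transcribe_audio(event.src_path)
        prompt = summarize_text(transcript)
        image_url = generate_image_from_prompt(prompt)
        print("✅ Processed:", os.path.basename(event.src_path))

observer = Observer()
observer.schedule(AudioHandler(), UPLOAD_DIR, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()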
Exercise 6: Protect Your API Keys
Task:
Use python-dotenv to load your API key safely from a .env file.
Solution:
# .env
OPENAI_API_KEY=your-openai-key
# In your Python code
import os
import openai
from dotenv import load_dotenv

load_dotenv()  # reads the .env file into environment variables
openai.api_key = os.getenv("OPENAI_API_KEY")
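Keeping the key out of version control matters as much as loading it safely. A sketch of .gitignore entries that keep secrets and local artifacts out of your repository (the log and upload names match the earlier exercises; adjust to your project):

# .gitignore should include:
.env
pipeline.log
uploads/
__pycache__/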
By completing these exercises, you now have a hands-on understanding of how to:
- Structure clean, modular AI pipelines
- Automate file processing workflows
- Secure your deployment and prepare for scaling
- Build applications that combine voice, text, and visuals
This chapter forms the practical core of real-world AI products. In the next and final chapter, you'll combine everything into a fully polished AI suite, complete with documentation, a front end, and a real use case.