Chapter 6: Cross-Model AI Suites
Practical Exercises — Chapter 6
Exercise 1: Modularize Your Multimodal Logic
Task:
Refactor your existing multimodal pipeline (Whisper → GPT → DALL·E) into three clearly separated Python modules: transcribe.py, summarize.py, and generate_image.py.
Solution:
# transcribe.py
import openai

def transcribe_audio(file_path):
    # Send the audio file to Whisper and return the plain-text transcript.
    # Note: this targets the legacy (pre-1.0) openai Python SDK.
    with open(file_path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", file=audio_file)
    return result["text"]

# summarize.py
import openai

def summarize_text(text):
    # Ask GPT-4o for a short, clear summary of the supplied text.
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize this text for clarity and brevity."},
            {"role": "user", "content": text}
        ]
    )
    return response["choices"][0]["message"]["content"]

# generate_image.py
import openai

def generate_image_from_prompt(prompt):
    # Generate a single DALL·E 3 image and return its hosted URL.
    response = openai.Image.create(
        prompt=prompt,
        model="dall-e-3",
        size="1024x1024",
        response_format="url"
    )
    return response["data"][0]["url"]
Exercise 2: Build a Local File Processor
Task:
Create a script that reads an audio file from the local system and automatically generates a transcription, summary, image prompt, and image.
Solution:
from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt
import requests
audio_file = "example.m4a"
transcript = transcribe_audio(audio_file)
summary = summarize_text(transcript)
image_prompt = summarize_text(transcript)  # Reuses the summarizer; a dedicated prompt function is sketched below
image_url = generate_image_from_prompt(image_prompt)
# Download and save image
img_data = requests.get(image_url).content
with open("output_image.png", "wb") as f:
f.write(img_data)
print("Transcript:", transcript)
print("Summary:", summary)
print("Image Prompt:", image_prompt)
print("Image URL:", image_url)
Exercise 3: Add Logging to Your Pipeline
Task:
Add a logging system that tracks when a file was processed and logs each stage of the pipeline.
Solution:
import logging

logging.basicConfig(
    filename="pipeline.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

logging.info("Started processing example.m4a")

# After each step:
logging.info("Transcription complete")
logging.info("Summary complete")
logging.info("Image generated successfully")
Exercise 4: Deploy the Flask Dashboard on Render
Task:
Take your app.py and HTML files and deploy them using Render. Make sure to:
- Create a requirements.txt
- Use gunicorn as your server
- Add OPENAI_API_KEY as an environment variable
- Test the URL on both desktop and mobile
Deployment Notes:
# requirements.txt should include:
Flask
openai
python-dotenv
gunicorn
requests
Start command on Render:
gunicorn app:app
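The start command gunicorn app:app assumes a module named app.py that exposes a Flask instance named app. A minimal sketch of that entry point is shown below; your real routes and templates will differ.

# app.py — minimal entry point that the gunicorn start command expects (sketch)
from flask import Flask

app = Flask(__name__)  # "app:app" means module app.py, variable app

@app.route("/")
def index():
    return "Pipeline dashboard is running."

if __name__ == "__main__":
    # Local development only; on Render, gunicorn serves the app instead.
    app.run(debug=True)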
Exercise 5: Auto-Process Audio Uploads in Background
Task:
Create a local script that watches a directory for .mp3 or .m4a files and processes them automatically.
Solution (Watch Script):
import os
import time

from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt

UPLOAD_DIR = "uploads"
PROCESSED = set()  # filenames already handled in this session

def watch_folder():
    # Poll the upload directory every few seconds for new audio files.
    while True:
        files = [f for f in os.listdir(UPLOAD_DIR) if f.endswith((".m4a", ".mp3"))]
        for f in files:
            if f not in PROCESSED:
                path = os.path.join(UPLOAD_DIR, f)
                transcript = transcribe_audio(path)
                summary = summarize_text(transcript)
                prompt = summarize_text(transcript)
                image_url = generate_image_from_prompt(prompt)
                print("✅ Processed:", f)
                PROCESSED.add(f)
        time.sleep(5)

watch_folder()
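Polling every five seconds is fine for a small upload folder. If you would rather react to file-system events directly, the third-party watchdog package (pip install watchdog) offers an event-driven alternative. The sketch below assumes the same processing functions as above.

# watch_events.py — event-driven variant using the watchdog package (sketch)
import os
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

from transcribe import transcribe_audio
from summarize import summarize_text
from generate_image import generate_image_from_prompt

UPLOAD_DIR = "uploads"

class AudioHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Fires whenever a new file appears in the watched directory.
        if event.is_directory or not event.src_path.endswith((".m4a", ".mp3")):
            return
        transcript = transcribe_audio(event.src_path)
        prompt = summarize_text(transcript)
        image_url = generate_image_from_prompt(prompt)
        print("✅ Processed:", os.path.basename(event.src_path))

observer = Observer()
observer.schedule(AudioHandler(), UPLOAD_DIR, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()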
Exercise 6: Protect Your API Keys
Task:
Use python-dotenv to load your API key safely from a .env file.
Solution:
# .env
OPENAI_API_KEY=your-openai-key
# In your Python code
import os
import openai
from dotenv import load_dotenv

load_dotenv()  # reads the .env file into environment variables
openai.api_key = os.getenv("OPENAI_API_KEY")
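Keeping the key out of version control matters as much as loading it safely. A sketch of .gitignore entries that keep secrets and local artifacts out of your repository (the log and upload names match the earlier exercises; adjust to your project):

# .gitignore should include:
.env
pipeline.log
uploads/
__pycache__/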
By completing these exercises, you now have a hands-on understanding of how to:
- Structure clean, modular AI pipelines
- Automate file processing workflows
- Secure your deployment and prepare for scaling
- Build applications that combine voice, text, and visuals
This chapter forms the practical core of real-world AI products. In the next and final chapter, you'll combine everything into a fully polished AI suite, complete with documentation, a front end, and a real use case.