Project: Visual Story Generator (GPT-4o + DALL·E image flow from a prompt narrative)
4. Step-by-Step Explanation (Referencing the Code Above)
- **Setup & Initialization:**
  - The script imports the necessary libraries (`openai`, `os`, `time`, `dotenv`, `datetime`).
  - It loads the `OPENAI_API_KEY` from your `.env` file.
  - It initializes the `client = OpenAI(...)` object, which is used for all API interactions. Error handling is included here.
  - Constants for the assistant's configuration, polling interval, and timeout are defined.
- **Create/Retrieve Assistant (`create_or_retrieve_assistant` function):**
  - This function first checks whether an assistant with the specified `ASSISTANT_NAME` already exists, to avoid creating duplicates.
  - If not found, it calls `client.beta.assistants.create(...)` with the defined name, instructions, and model (`gpt-4o`), and crucially enables the `image_generation` tool. This tool uses DALL·E 3 behind the scenes.
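A minimal sketch of that check-then-create logic. The instructions string is a placeholder, and the `image_generation` tool entry follows the tutorial's description of its code:

```python
ASSISTANT_INSTRUCTIONS = (  # placeholder; substitute the tutorial's full instructions
    "You are a visual storyteller. Break the user's idea into scenes, "
    "describe each scene, and generate a matching illustration."
)


def create_or_retrieve_assistant(client, name: str):
    """Reuse an existing assistant with this name, or create a fresh one."""
    for assistant in client.beta.assistants.list(limit=100).data:
        if assistant.name == name:
            return assistant  # found one: avoid creating a duplicate
    return client.beta.assistants.create(
        name=name,
        instructions=ASSISTANT_INSTRUCTIONS,
        model="gpt-4o",
        tools=[{"type": "image_generation"}],  # per the tutorial's description
    )
```

Note that `assistants.list` is paginated; with more than 100 assistants you would need to follow pagination cursors rather than a single call.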
- **Start a Thread (`generate_visual_story` function):**
  - A new conversation thread is created using `client.beta.threads.create()`. Threads store the message history.
- **Send User Prompt (`generate_visual_story` function):**
  - The user's story idea (`user_prompt`) is added to the thread using `client.beta.threads.messages.create(...)` with `role="user"`.
- **Run the Assistant (`generate_visual_story` function):**
  - The assistant is instructed to process the thread using `client.beta.threads.runs.create(...)`, passing the `assistant_id` and `thread_id`.
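The three steps above (thread, message, run) amount to three API calls. A sketch, with `start_story_run` as a hypothetical helper name:

```python
def start_story_run(client, assistant_id: str, user_prompt: str):
    """Create a thread, post the user's story idea, and start a run."""
    thread = client.beta.threads.create()          # empty conversation thread
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=user_prompt,
    )
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )
    return thread.id, run.id
```

Keeping both IDs around matters: the thread ID is needed again to fetch messages, and the run ID to poll for completion.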
- **Wait for Completion (`poll_run_status` function):**
  - Creating the story and images takes time, so the script enters a loop that periodically checks the run's status using `client.beta.threads.runs.retrieve(...)`.
  - It prints the status (`queued`, `in_progress`, `completed`, etc.) for user feedback.
  - The loop continues until the status is `completed` or another terminal state (`failed`, `cancelled`, `expired`), or until the timeout is reached.
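The polling loop can be sketched like this (the set of terminal states matches the ones listed above):

```python
import time

TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def poll_run_status(client, thread_id: str, run_id: str,
                    interval: float = 2.0, timeout: float = 300.0) -> str:
    """Poll until the run reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run_id
        )
        print(f"Run status: {run.status}")   # user feedback while waiting
        if run.status in TERMINAL_STATES:
            return run.status
        time.sleep(interval)                 # wait before re-checking
    raise TimeoutError(f"Run {run_id} did not finish within {timeout}s")
```

Using `time.monotonic()` for the deadline avoids surprises if the system clock is adjusted mid-run.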
- **Retrieve Story Text & Image File IDs (`generate_visual_story` function):**
  - Once the run is complete, `client.beta.threads.messages.list(...)` retrieves all messages from the thread (using `order="asc"` to get them chronologically).
  - The code iterates through the assistant's messages.
  - For content of `type="text"`, it prints the scene description. For content of `type="image_file"`, it prints the `file_id` associated with the generated image. This ID is the key to getting the actual image.
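Splitting the assistant's replies by content type can be sketched as follows; `extract_story_parts` is a hypothetical helper name:

```python
def extract_story_parts(client, thread_id: str):
    """Split the assistant's replies into scene texts and image file IDs."""
    messages = client.beta.threads.messages.list(
        thread_id=thread_id, order="asc"   # chronological order
    )
    scene_texts, image_file_ids = [], []
    for message in messages.data:
        if message.role != "assistant":
            continue                       # skip the user's own prompt
        for part in message.content:
            if part.type == "text":
                scene_texts.append(part.text.value)
            elif part.type == "image_file":
                image_file_ids.append(part.image_file.file_id)
    return scene_texts, image_file_ids
```

Because a single message can mix text and image parts, iterating over `message.content` rather than assuming one part per message keeps scenes and illustrations correctly paired.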
- **Retrieve Image Content (`save_image_from_file_id` function):**
  - This is the crucial step that was added. The helper function is called for each `file_id` obtained in the previous step.
  - It uses `client.files.retrieve_content(file_id)` to fetch the raw binary data of the image.
  - It then saves this binary data into a `.png` file in a specified output directory (`story_images`), including a timestamp and the file ID in the filename for uniqueness.
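A sketch of the download-and-save helper. Note one assumption: recent releases of the `openai` Python SDK deprecate `files.retrieve_content(...)` in favor of `client.files.content(...)`, which returns a binary response, so this sketch uses the newer call:

```python
import os
from datetime import datetime


def save_image_from_file_id(client, file_id: str,
                            output_dir: str = "story_images") -> str:
    """Download the image bytes for file_id and save them as a .png file."""
    os.makedirs(output_dir, exist_ok=True)
    # Newer SDKs: files.content(...) returns a binary response object;
    # older code used the now-deprecated files.retrieve_content(...).
    image_bytes = client.files.content(file_id).read()
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(output_dir, f"{timestamp}_{file_id}.png")
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path
```

The timestamp-plus-file-ID filename means re-running the script never overwrites earlier images.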
Optional Extensions
- Add page numbers or captions based on the `scene_count`.
- Modify the `ASSISTANT_INSTRUCTIONS` to accept a visual style parameter (e.g., "Generate images in a watercolor style").
- Implement logic to load a previous `thread_id` to continue or remix stories.
- Use a library like `ReportLab` or HTML generation to create a formatted PDF or web page output.
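For the HTML-generation extension, a minimal sketch that pairs each scene description with its saved image; the function name and page layout are my own, not from the tutorial:

```python
import html


def story_to_html(scene_texts, image_paths, title="Visual Story"):
    """Render scene texts and saved image paths as a simple standalone page."""
    parts = [f"<h1>{html.escape(title)}</h1>"]
    for number, (text, image) in enumerate(zip(scene_texts, image_paths), 1):
        parts.append(f"<h2>Scene {number}</h2>")
        parts.append(f"<p>{html.escape(text)}</p>")
        parts.append(f'<img src="{html.escape(image)}" alt="Scene {number}">')
    body = "\n".join(parts)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"
```

Write the returned string to, say, `story.html` next to the `story_images` directory and the relative image paths will resolve in a browser.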
What You’ve Built
By running this script, you will have built and run a complete AI storytelling engine that:
- Accepts a natural language prompt.
- Uses GPT-4o to generate sequential scenes with descriptions.
- Invokes the DALL·E 3 tool to render matching illustrations.
- Retrieves both text descriptions and actual image files.
- Builds the components of a visually compelling narrative experience.
This project effectively combines multimodal handling (text prompt -> text + image generation), tool chaining within the Assistants API, asynchronous operation handling, and file retrieval into one practical and fun application.