Project: Visual Story Generator: GPT-4o + DALL·E image flow based on prompt narrative

4. Step-by-Step Explanation (Referencing the Code Above)

  1. Setup & Initialization:
    • The script imports the necessary libraries (openai, os, time, dotenv, datetime).
    • It loads the OPENAI_API_KEY from your .env file.
    • It initializes the client = OpenAI(...) object, which will be used for all API interactions. Error handling is included here.
    • Constants for the assistant's configuration, polling interval, and timeout are defined.
  2. Create/Retrieve Assistant (create_or_retrieve_assistant function):
    • This function first checks if an assistant with the specified ASSISTANT_NAME already exists to avoid creating duplicates.
    • If not found, it calls client.beta.assistants.create(...) using the defined name, instructions, model (gpt-4o), and crucially enables the image_generation tool. This tool uses DALL·E 3 behind the scenes.
  3. Start a Thread (generate_visual_story function):
    • A new conversation thread is created using client.beta.threads.create(). Threads store the message history.
  4. Send User Prompt (generate_visual_story function):
    • The user's story idea (user_prompt) is added to the thread using client.beta.threads.messages.create(...) with role="user".
  5. Run the Assistant (generate_visual_story function):
    • The assistant is instructed to process the thread using client.beta.threads.runs.create(...), passing the assistant_id and thread_id.
  6. Wait for Completion (poll_run_status function):
    • Creating the story and images takes time. The script enters a loop, periodically checking the run's status using client.beta.threads.runs.retrieve(...).
    • It prints the status (queued, in_progress, completed, etc.) for user feedback.
    • The loop continues until the status is completed or another terminal state (failed, cancelled, expired) is reached, or until the timeout expires.
  7. Retrieve Story Text & Image File IDs (generate_visual_story function):
    • Once the run is complete, client.beta.threads.messages.list(...) retrieves all messages from the thread (using order="asc" to get them chronologically).
    • The code iterates through the assistant's messages.
    • For content of type="text", it prints the scene description.
    • For content of type="image_file", it prints the file_id associated with the generated image. This ID is the key to getting the actual image.
  8. Retrieve Image Content (save_image_from_file_id function):
    • This is the crucial final step. For each file_id obtained in the previous step, this helper function is called.
    • It uses client.files.retrieve_content(file_id) to fetch the raw binary data of the image (newer SDK versions expose this download as client.files.content(file_id)).
    • It then saves this binary data into a .png file in a specified output directory (story_images), including a timestamp and the file ID in the filename for uniqueness. A condensed sketch of the whole flow follows this list.
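
To make the eight steps concrete, here is a minimal condensed sketch of the flow, not the full listing above. The constant values and the ASSISTANT_INSTRUCTIONS text are placeholders, the image_generation tool type follows the description in step 2, and the image download uses the current SDK's client.files.content(...) in place of the older retrieve_content helper named in step 8.

import os
import time
from datetime import datetime

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Constants mirroring the full script; the instruction text is a placeholder here.
ASSISTANT_NAME = "Visual Story Generator"
ASSISTANT_INSTRUCTIONS = "Turn the user's idea into sequential scenes, each with a description and an illustration."
POLL_INTERVAL_SECONDS = 5
TIMEOUT_SECONDS = 300
OUTPUT_DIR = "story_images"

def create_or_retrieve_assistant():
    """Step 2: reuse an assistant named ASSISTANT_NAME, or create one."""
    for assistant in client.beta.assistants.list():
        if assistant.name == ASSISTANT_NAME:
            return assistant
    return client.beta.assistants.create(
        name=ASSISTANT_NAME,
        instructions=ASSISTANT_INSTRUCTIONS,
        model="gpt-4o",
        tools=[{"type": "image_generation"}],  # tool type as described in step 2
    )

def poll_run_status(thread_id, run_id):
    """Step 6: poll the run until a terminal state or the timeout."""
    deadline = time.time() + TIMEOUT_SECONDS
    while time.time() < deadline:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        print(f"Run status: {run.status}")
        if run.status in ("completed", "failed", "cancelled", "expired"):
            return run
        time.sleep(POLL_INTERVAL_SECONDS)
    raise TimeoutError("Run did not reach a terminal state within the timeout.")

def save_image_from_file_id(file_id):
    """Step 8: download a generated image and save it as a timestamped PNG."""
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    # The walkthrough names client.files.retrieve_content(file_id); current SDK
    # versions expose the binary download as client.files.content(...).read().
    image_bytes = client.files.content(file_id).read()
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(OUTPUT_DIR, f"{timestamp}_{file_id}.png")
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path

def generate_visual_story(user_prompt):
    assistant = create_or_retrieve_assistant()
    thread = client.beta.threads.create()                          # step 3
    client.beta.threads.messages.create(                           # step 4
        thread_id=thread.id, role="user", content=user_prompt
    )
    run = client.beta.threads.runs.create(                         # step 5
        thread_id=thread.id, assistant_id=assistant.id
    )
    run = poll_run_status(thread.id, run.id)                       # step 6
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status: {run.status}")
    messages = client.beta.threads.messages.list(                  # step 7
        thread_id=thread.id, order="asc"
    )
    for message in messages:
        if message.role != "assistant":
            continue
        for part in message.content:
            if part.type == "text":
                print(part.text.value)                             # scene description
            elif part.type == "image_file":                        # step 8
                print("Saved:", save_image_from_file_id(part.image_file.file_id))

if __name__ == "__main__":
    generate_visual_story("A shy lighthouse keeper befriends a migrating whale.")

Recent versions of the openai SDK also bundle steps 5 and 6 into a single client.beta.threads.runs.create_and_poll(...) call; the explicit polling loop is kept here to mirror the walkthrough.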

Optional Extensions

  • Add page numbers or captions based on the scene_count.
  • Modify the ASSISTANT_INSTRUCTIONS to accept a visual style parameter (e.g., "Generate images in a watercolor style"); a small sketch follows this list.
  • Implement logic to load a previous thread_id to continue or remix stories.
  • Use a library like ReportLab or HTML generation to create a formatted PDF or web page output.
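
As a minimal sketch of the style-parameter idea (the apply_visual_style helper and its style argument are hypothetical, not part of the original script):

def apply_visual_style(user_prompt, style=None):
    """Hypothetical helper: fold a visual style hint into the story prompt."""
    if not style:
        return user_prompt
    return f"{user_prompt}\n\nRender every illustration in a {style} style."

# Usage: generate_visual_story(apply_visual_style(idea, style="watercolor"))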

What You’ve Built

By running this script, you will have built and run a complete AI storytelling engine that:

  • Accepts a natural language prompt.
  • Uses GPT-4o to generate sequential scenes with descriptions.
  • Invokes the DALL·E 3 tool to render matching illustrations.
  • Retrieves both text descriptions and actual image files.
  • Builds the components of a visually compelling narrative experience.

This project effectively combines multimodal handling (text prompt -> text + image generation), tool chaining within the Assistants API, asynchronous operation handling, and file retrieval into one practical and fun application.
