OpenAI API Bible Volume 2

Chapter 5: Image and Audio Integration Projects

5.4 Audio Sentiment Analysis with OpenAI

In this section, we'll explore how to build a sophisticated Flask web application that performs sentiment analysis on audio content using OpenAI's powerful APIs. This integration combines two key technologies: first, the application uses OpenAI's Whisper API to convert spoken words into written text through accurate transcription. Then, it leverages OpenAI's language models to analyze the emotional tone and sentiment of the transcribed content.

The process flow is straightforward yet powerful: users upload an audio file, which gets transcribed into text, and then the application applies natural language processing to determine whether the speaker's message conveys positive, negative, or neutral sentiment. This dual-step analysis provides valuable insights into the emotional content of spoken communications, making it useful for various applications like customer feedback analysis, market research, and content moderation.
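Before building the full Flask app, the whole pipeline can be condensed into a single function. This is a sketch only: it assumes the openai Python SDK (v1.x) is installed and that OPENAI_API_KEY is set in the environment, and it uses the same whisper-1 and gpt-4 model names as the application built below.

```python
# Two-step pipeline sketch: transcribe with Whisper, then classify with a chat model.
# Assumes openai>=1.0 and OPENAI_API_KEY in the environment.
def audio_sentiment(path: str) -> str:
    import openai  # imported lazily so the sketch only needs the SDK when called

    # Step 1: speech-to-text
    with open(path, "rb") as f:
        transcript = openai.audio.transcriptions.create(model="whisper-1", file=f).text

    # Step 2: sentiment classification of the transcript
    reply = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer with one word: Positive, Negative, or Neutral."},
            {"role": "user", "content": transcript},
        ],
        temperature=0.2,
    )
    return reply.choices[0].message.content.strip()
```

Everything else in this section - the upload form, validation, temporary file handling, and error reporting - is the scaffolding that turns these two API calls into a usable web application.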

5.4.1 What You’ll Build

The web application provides a comprehensive audio analysis solution with several key functionalities. When users interact with the platform, it performs the following sequence of operations:

  1. Receive and Process Audio Uploads: The web interface accepts audio uploads in common formats: MP3 (compressed audio), WAV (uncompressed audio), M4A (typical for voice recordings), and MP4. The server checks the file's extension before processing, so unsupported types are rejected with a clear error message.
  2. Temporary File Handling: Each upload is saved to a temporary file on the server, passed to the transcription step, and deleted once processing finishes. Basic error handling covers missing files, empty submissions, and failures during processing.
  3. Audio Transcription: The application sends the audio to OpenAI's Whisper API, which copes well with varied accents, multiple languages, and moderate background noise, and returns the spoken content as text.
  4. Sentiment Analysis: The transcribed text is passed to OpenAI's Chat Completion API with a system prompt instructing the model to classify it as Positive, Negative, or Neutral. A low temperature keeps the classification focused and consistent.
  5. Results Display: The page shows the sentiment with a color-coded label (green for positive, red for negative, gray for neutral), or a clearly styled error message if any step fails.
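One practical wrinkle in step 4: the chat model returns free-form text, so a reply like "Sentiment: Positive." would not match an exact string comparison. A small normalizer (a hypothetical helper, not part of the app code in this section) keeps the label predictable:

```python
# Hypothetical helper: collapse a free-form model reply into one of three labels
def normalize_sentiment(reply: str) -> str:
    text = reply.strip().lower()
    for label in ("positive", "negative", "neutral"):
        if label in text:
            return label.capitalize()
    return "Neutral"  # fall back when the model answers off-script

print(normalize_sentiment("Sentiment: Positive."))  # Positive
```

A strict system prompt usually makes the reply a single clean word, but a guard like this is cheap insurance when the label drives downstream logic such as CSS class names.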

This combination of audio transcription and language analysis makes large volumes of spoken content practical to process automatically. Typical applications span several industries:

  • Analyzing customer feedback from voice recordings - This application transforms how businesses handle customer interactions. Call centers and customer service departments can now automatically process thousands of calls to:
    • Track customer satisfaction trends over time
    • Identify specific pain points in customer experiences
    • Generate actionable insights for service improvement
    • Train customer service representatives more effectively
  • Detecting emotional tones in spoken content for media analysis - This capability provides media companies with sophisticated tools for content evaluation:
    • Measure audience emotional engagement throughout content
    • Analyze speaker authenticity and credibility
    • Ensure brand message consistency across different media
    • Optimize content for maximum emotional impact
  • Assessing the sentiment of audio content in podcasts or interviews - This feature revolutionizes content analysis by:
    • Processing hours of content in minutes
    • Identifying key moments of emotional significance
    • Tracking sentiment changes throughout discussions
    • Enabling data-driven content strategy decisions

Technologies Used:

  • Flask: A Python web framework.
  • OpenAI API:
    • Whisper for audio transcription.
    • GPT-4 (or similar) for sentiment analysis.
  • HTML: To structure the web page.
  • CSS: To style the web page.

Project Structure:

The project will have the following file structure:

/audio_sentiment_analyzer

├── app.py
├── .env
└── templates/
    └── index.html
  • app.py: The Python file containing the Flask application code.
  • .env: A file to store the OpenAI API key.
  • templates/: A directory to store the HTML templates.
  • templates/index.html: The HTML template for the main page.

5.4.2 Step-by-Step Implementation

Step 1: Install Required Packages

Download the sample audio file: https://files.cuantum.tech/audio/someone-speaking.mp3

Install the necessary Python libraries:

pip install flask openai python-dotenv

Step 2: Set Up Environment Variables

Create a .env file in your project directory and add your OpenAI API key:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY

Step 3: Create the Flask App (app.py)

Create a Python file named app.py and add the following code:

from flask import Flask, request, render_template
import openai
import os
from dotenv import load_dotenv
import logging
from typing import Optional

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

app = Flask(__name__)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

ALLOWED_EXTENSIONS = {'mp3', 'mp4', 'wav', 'm4a'}  # Allowed audio file extensions


def allowed_file(filename: str) -> bool:
    """
    Checks if the uploaded file has an allowed extension.

    Args:
        filename (str): The name of the file.

    Returns:
        bool: True if the file has an allowed extension, False otherwise.
    """
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


def transcribe_audio(file_path: str) -> Optional[str]:
    """
    Transcribes an audio file using OpenAI's Whisper API.

    Args:
        file_path (str): The path to the audio file.

    Returns:
        Optional[str]: The transcribed text, or None on error.
    """
    try:
        logger.info(f"Transcribing audio file: {file_path}")
        # Open in binary mode; the context manager ensures the file is closed
        with open(file_path, "rb") as audio_file:
            response = openai.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        transcript = response.text
        logger.info(f"Transcription successful. Length: {len(transcript)} characters.")
        return transcript
    except openai.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during transcription: {e}")
        return None



def analyze_sentiment(text: str) -> Optional[str]:
    """
    Analyzes the sentiment of a given text using OpenAI's Chat Completion API.

    Args:
        text (str): The text to analyze.

    Returns:
        Optional[str]: The sentiment analysis result, or None on error.
    """
    try:
        logger.info("Analyzing sentiment of transcribed text.")
        response = openai.chat.completions.create(
            model="gpt-4",  # Or another suitable chat model
            messages=[
                {
                    "role": "system",
                    "content": "You are a sentiment analysis expert. Provide a concise sentiment analysis of the text. Your response should be one of the following: 'Positive', 'Negative', or 'Neutral'.",
                },
                {"role": "user", "content": text},
            ],
            temperature=0.2,  # Keep the output focused
            max_tokens=20
        )
        sentiment = response.choices[0].message.content
        logger.info(f"Sentiment analysis result: {sentiment}")
        return sentiment
    except openai.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during sentiment analysis: {e}")
        return None



@app.route("/", methods=["GET", "POST"])
def index():
    """
    Handles the main route for the web application.
    Allows users to upload an audio file, transcribes it, and analyzes the sentiment.
    """
    sentiment = None
    error_message = None

    if request.method == "POST":
        if 'audio_file' not in request.files:
            error_message = "No file part"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)
        file = request.files['audio_file']
        if file.filename == '':
            error_message = "No file selected"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

        if file and allowed_file(file.filename):
            # Build the temp path from the validated extension only, not the raw filename
            temp_file_path = os.path.join(app.root_path, "temp_audio." + file.filename.rsplit('.', 1)[1].lower())
            try:
                file.save(temp_file_path)

                transcript = transcribe_audio(temp_file_path)  # Transcribe the audio
                if not transcript:
                    error_message = "Transcription failed. Please try again."
                    return render_template("index.html", error=error_message)

                sentiment = analyze_sentiment(transcript)
                if not sentiment:
                    error_message = "Sentiment analysis failed. Please try again."
                    return render_template("index.html", error=error_message)
            except Exception as e:
                error_message = f"An error occurred: {e}"
                logger.error(error_message)
                return render_template("index.html", error=error_message)
            finally:
                # Remove the temporary file whether or not processing succeeded
                if os.path.exists(temp_file_path):
                    os.remove(temp_file_path)
        else:
            error_message = "Invalid file type. Please upload a valid audio file (MP3, MP4, WAV, M4A)."
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

    return render_template("index.html", sentiment=sentiment, error=error_message)

@app.errorhandler(500)
def internal_server_error(e):
    """Handles internal server errors."""
    logger.error(f"Internal Server Error: {e}")
    return render_template("index.html", error="Internal Server Error"), 500

if __name__ == "__main__":
    app.run(debug=True)

Code Breakdown:

  • Import Statements: Imports the necessary Flask modules, the OpenAI library, os, dotenv, logging, and typing helpers for type hints.
  • Environment Variables: Loads the OpenAI API key from the .env file.
  • Flask Application: Creates a Flask application instance.
  • Logging Configuration: Configures logging for the application.
  • allowed_file Function: Checks if the uploaded file has an allowed audio extension (MP3, MP4, WAV, M4A).
  • transcribe_audio Function: Transcribes an audio file using OpenAI's Whisper API. It logs the file path and any errors during transcription.
  • analyze_sentiment Function:
    • def analyze_sentiment(text: str) -> Optional[str]:: Defines a function to analyze the sentiment of a text using OpenAI's Chat Completion API.
    • It takes the transcribed text as input.
    • It sends a request to the Chat Completion API with a system message instructing the model to perform sentiment analysis. The temperature is set to 0.2 to make the output more focused, and max_tokens is limited to 20 to keep the response concise.
    • It extracts the sentiment from the API response.
    • It logs the sentiment analysis result.
    • It includes error handling for OpenAI API errors and other exceptions.
  • index Route:
    • Handles both GET and POST requests.
    • For GET requests, it renders the initial HTML page.
    • For POST requests (when the user uploads an audio file):
      • It validates the uploaded file.
      • It saves the file temporarily.
      • It calls transcribe_audio() to transcribe the audio.
      • It calls analyze_sentiment() to analyze the transcribed text.
      • It renders the HTML template, passing the sentiment analysis result or any error messages.
  • @app.errorhandler(500): Handles internal server errors by logging the error and rendering a user-friendly error page.
  • if __name__ == "__main__":: Starts the Flask development server if the script is executed directly.
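The extension check in allowed_file is worth seeing in isolation. The snippet below repeats the same logic as in app.py so it can be exercised standalone:

```python
# Extension check from app.py, exercised on its own
ALLOWED_EXTENSIONS = {'mp3', 'mp4', 'wav', 'm4a'}

def allowed_file(filename: str) -> bool:
    # A dot must be present, and the final extension must be in the allow-list
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

print(allowed_file("speech.MP3"))       # True  (comparison is case-insensitive)
print(allowed_file("archive.tar.mp3"))  # True  (only the last extension counts)
print(allowed_file("notes.txt"))        # False
```

Note this validates only the name, not the file contents; a misnamed file will still be caught later, when Whisper rejects it during transcription.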

Step 4: Create the HTML Template (templates/index.html)

Create a folder named templates in the same directory as app.py. Inside the templates folder, create a file named index.html with the following HTML code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Sentiment Analyzer</title>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap" rel="stylesheet">
    <style>
        /* --- General Styles --- */
        body {
            font-family: 'Inter', sans-serif;
            padding: 40px;
            background-color: #f9fafb;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            color: #374151;
        }
        .container {
            max-width: 800px;
            width: 95%;
            background-color: #fff;
            padding: 2rem;
            border-radius: 0.75rem;
            box-shadow: 0 10px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.05);
            text-align: center;
        }
        h2 {
            font-size: 2.25rem;
            font-weight: 600;
            margin-bottom: 1.5rem;
            color: #1e293b;
        }
        p {
            color: #6b7280;
            margin-bottom: 1rem;
        }
        /* --- Form Styles --- */
        form {
            margin-top: 1rem;
            display: flex;
            flex-direction: column;
            align-items: center;
            gap: 0.5rem;
        }
        label {
            font-size: 1rem;
            font-weight: 600;
            color: #4b5563;
            margin-bottom: 0.25rem;
            display: block;
            text-align: left;
            width: 100%;
            max-width: 400px;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="file"] {
            width: 100%;
            max-width: 400px;
            padding: 0.75rem;
            border-radius: 0.5rem;
            border: 1px solid #d1d5db;
            font-size: 1rem;
            margin-bottom: 0.25rem;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="submit"] {
            padding: 0.75rem 1.5rem;
            border-radius: 0.5rem;
            background-color: #4f46e5;
            color: #fff;
            font-size: 1rem;
            font-weight: 600;
            cursor: pointer;
            transition: background-color 0.3s ease;
            border: none;
            box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
            margin-top: 1rem;
        }
        input[type="submit"]:hover {
            background-color: #4338ca;
        }
        input[type="submit"]:focus {
            outline: none;
            box-shadow: 0 0 0 3px rgba(79, 70, 229, 0.3);
        }
        /* --- Result Styles --- */
        .result-container {
            margin-top: 2rem;
            border: 1px solid #e5e7eb;
            border-radius: 0.5rem;
            padding: 1rem;
            background-color: #f8fafc;
        }

        .result-title{
            font-size: 1.25rem;
            font-weight: 600;
            color: #1e293b;
            margin-bottom: 0.75rem;
        }
        .sentiment-positive {
            color: #16a34a;
            font-weight: 600;
        }
        .sentiment-negative {
            color: #dc2626;
            font-weight: 600;
        }
        .sentiment-neutral {
            color: #71717a;
            font-weight: 600;
        }
        /* --- Error Styles --- */
        .error-message {
            color: #dc2626;
            margin-top: 1rem;
            padding: 0.75rem;
            background-color: #fee2e2;
            border-radius: 0.375rem;
            border: 1px solid #fecaca;
            text-align: center;
        }
        /* --- Responsive Adjustments --- */
        @media (max-width: 768px) {
            .container {
                padding: 20px;
            }
            form {
                gap: 1rem;
            }
            input[type="file"] {
                max-width: 100%;
            }
        }
    </style>
</head>
<body>
    <div class="container">
        <h2>🎙️ Audio Sentiment Analyzer</h2>
        <p> Upload an audio file to analyze the sentiment of the spoken content. Supported formats: MP3, MP4, WAV, M4A </p>
        <form method="POST" enctype="multipart/form-data">
            <label for="audio_file">Upload an audio file:</label><br>
            <input type="file" name="audio_file" accept="audio/*" required><br><br>
            <input type="submit" value="Analyze Sentiment">
        </form>

        {% if sentiment %}
            <div class="result-container">
                <h3 class="result-title">Sentiment Analysis Result:</h3>
                <p class="sentiment-{{ sentiment.lower() }}"> {{ sentiment }} </p>
            </div>
        {% endif %}
        {% if error %}
            <div class="error-message">{{ error }}</div>
        {% endif %}
    </div>
</body>
</html>

Key elements in the HTML template:

  • HTML Structure:
    • The <head> section defines the title, links a CSS stylesheet, and sets the viewport for responsiveness.
    • The <body> contains the visible content, including a form for uploading audio and a section to display the sentiment analysis result.
  • CSS Styling:
    • Modern Design: The CSS is updated to use a modern design.
    • Responsive Layout: The layout is more responsive, especially for smaller screens.
    • User Experience: Improved form and input styling for better usability.
    • Clear Error Display: Error messages are styled to be clearly visible.
    • Sentiment indication: the colors of the results change depending on the returned sentiment.
  • Form:
    • <form> with enctype="multipart/form-data" is used to handle file uploads.
    • <label> and <input type="file"> allow the user to select an audio file. The accept="audio/*" attribute restricts the user to uploading audio files.
    • <input type="submit"> button allows the user to submit the form.
  • Sentiment Display:
    • <div class="result-container"> is used to display the sentiment analysis result. The displayed sentiment will have its color changed depending on the result.
  • Error Handling:
    • <div class="error-message"> is used to display any error messages to the user.

Try It Out

  1. Save the files as app.py and templates/index.html.
  2. Ensure you have your OpenAI API key in the .env file.
  3. Run the application:
    python app.py
  4. Open http://localhost:5000 in your browser.
  5. Upload an audio file (e.g., a recording of someone speaking or the provided sample .mp3 file).
  6. View the sentiment analysis result displayed on the page.

5.4 Audio Sentiment Analysis with OpenAI

In this section, we'll explore how to build a sophisticated Flask web application that performs sentiment analysis on audio content using OpenAI's powerful APIs. This integration combines two key technologies: first, the application uses OpenAI's Whisper API to convert spoken words into written text through accurate transcription. Then, it leverages OpenAI's language models to analyze the emotional tone and sentiment of the transcribed content.

The process flow is straightforward yet powerful: users upload an audio file, which gets transcribed into text, and then the application applies natural language processing to determine whether the speaker's message conveys positive, negative, or neutral sentiment. This dual-step analysis provides valuable insights into the emotional content of spoken communications, making it useful for various applications like customer feedback analysis, market research, and content moderation.

5.4.1 What You’ll Build

The web application provides a comprehensive audio analysis solution with several key functionalities. When users interact with the platform, it performs the following sequence of operations:

  1. Receive and Process Audio Uploads: The application features a sophisticated web interface that handles audio file uploads with extensive format support. Users can submit files in popular formats like MP3 (ideal for compressed audio), WAV (perfect for high-quality uncompressed audio), and M4A (optimized for voice recordings). The interface includes file validation, size checks, and format verification to ensure smooth processing.
  2. Secure File Management System: Upon receiving an upload, the application implements a robust temporary storage system. Files are stored in a secure directory with proper access controls, utilizing automatic file cleanup protocols to prevent storage overflow. The system includes error handling mechanisms for failed uploads, corrupt files, and timeout scenarios, ensuring reliable operation even under heavy load.
  3. Advanced Audio Transcription: The integration with OpenAI's Whisper API provides state-of-the-art speech recognition capabilities. This sophisticated model excels at handling various accents, dialects, and background noise conditions, delivering accurate transcriptions across multiple languages. The system processes audio in chunks for optimal performance and includes progress tracking for longer files.
  4. Comprehensive Sentiment Analysis: The application harnesses GPT-4's advanced natural language processing through the Chat Completion API. This analysis goes beyond basic positive/negative classification, examining contextual clues, emotional undertones, and linguistic nuances. The system considers factors like tone, intensity, and semantic context to provide nuanced sentiment understanding.
  5. User-Friendly Results Interface: The application presents analysis results through an intuitive, well-designed interface. Users receive both the full transcription and a detailed sentiment breakdown, with clear visual indicators for different emotional categories. The interface includes options for downloading results, sharing analysis reports, and viewing historical analyses when applicable.

This groundbreaking combination of audio transcription and language analysis technologies revolutionizes how we process and understand spoken content. The applications span across multiple industries, offering unprecedented insights and efficiency improvements:

  • Analyzing customer feedback from voice recordings - This application transforms how businesses handle customer interactions. Call centers and customer service departments can now automatically process thousands of calls to:
    • Track customer satisfaction trends over time
    • Identify specific pain points in customer experiences
    • Generate actionable insights for service improvement
    • Train customer service representatives more effectively
  • Detecting emotional tones in spoken content for media analysis - This capability provides media companies with sophisticated tools for content evaluation:
    • Measure audience emotional engagement throughout content
    • Analyze speaker authenticity and credibility
    • Ensure brand message consistency across different media
    • Optimize content for maximum emotional impact
  • Assessing the sentiment of audio content in podcasts or interviews - This feature revolutionizes content analysis by:
    • Processing hours of content in minutes
    • Identifying key moments of emotional significance
    • Tracking sentiment changes throughout discussions
    • Enabling data-driven content strategy decisions

Technologies Used:

  • Flask: A Python web framework.
  • OpenAI API:
    • Whisper for audio transcription.
    • GPT-4 (or similar) for sentiment analysis.
  • HTML: To structure the web page.
  • CSS: To style the web page.

Project Structure:

The project will have the following file structure:

/audio_sentiment_analyzer

├── app.py
├── .env
└── templates/
    └── index.html
  • app.py: The Python file containing the Flask application code.
  • .env: A file to store the OpenAI API key.
  • templates/: A directory to store the HTML templates.
  • templates/index.html: The HTML template for the main page.

5.4.2 Step-by-Step Implementation

Step 1: Install Required Packages

Download the sample audio file: https://files.cuantum.tech/audio/someone-speaking.mp3

Install the necessary Python libraries:

pip install flask openai python-dotenv

Step 2: Set Up Environment Variables

Create a .env file in your project directory and add your OpenAI API key:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY

Step 3: Create the Flask App (app.py)

Create a Python file named app.py and add the following code:

from flask import Flask, request, render_template, jsonify
import openai
import os
from dotenv import load_dotenv
import logging
from typing import Optional, Dict

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

app = Flask(__name__)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

ALLOWED_EXTENSIONS = {'mp3', 'mp4', 'wav', 'm4a'}  # Allowed audio file extensions


def allowed_file(filename: str) -> bool:
    """
    Checks if the uploaded file has an allowed extension.

    Args:
        filename (str): The name of the file.

    Returns:
        bool: True if the file has an allowed extension, False otherwise.
    """
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


def transcribe_audio(file_path: str) -> Optional[str]:
    """
    Transcribes an audio file using OpenAI's Whisper API.

    Args:
        file_path (str): The path to the audio file.

    Returns:
        Optional[str]: The transcribed text, or None on error.
    """
    try:
        logger.info(f"Transcribing audio file: {file_path}")
        audio_file = open(file_path, "rb")
        response = openai.Audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
        transcript = response.text
        logger.info(f"Transcription successful. Length: {len(transcript)} characters.")
        return transcript
    except openai.error.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during transcription: {e}")
        return None



def analyze_sentiment(text: str) -> Optional[str]:
    """
    Analyzes the sentiment of a given text using OpenAI's Chat Completion API.

    Args:
        text (str): The text to analyze.

    Returns:
        Optional[str]: The sentiment analysis result, or None on error.
    """
    try:
        logger.info("Analyzing sentiment of transcribed text.")
        response = openai.chat.completions.create(
            model="gpt-4",  # Or another suitable chat model
            messages=[
                {
                    "role": "system",
                    "content": "You are a sentiment analysis expert. Provide a concise sentiment analysis of the text. Your response should be one of the following: 'Positive', 'Negative', or 'Neutral'.",
                },
                {"role": "user", "content": text},
            ],
            temperature=0.2,  # Keep the output focused
            max_tokens=20
        )
        sentiment = response.choices[0].message.content
        logger.info(f"Sentiment analysis result: {sentiment}")
        return sentiment
    except openai.error.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during sentiment analysis: {e}")
        return None



@app.route("/", methods=["GET", "POST"])
def index():
    """
    Handles the main route for the web application.
    Allows users to upload an audio file, transcribes it, and analyzes the sentiment.
    """
    sentiment = None
    error_message = None

    if request.method == "POST":
        if 'audio_file' not in request.files:
            error_message = "No file part"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)
        file = request.files['audio_file']
        if file.filename == '':
            error_message = "No file selected"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

        if file and allowed_file(file.filename):
            try:
                # Securely save the uploaded file to a temporary location
                temp_file_path = os.path.join(app.root_path, "temp_audio." + file.filename.rsplit('.', 1)[1].lower())
                file.save(temp_file_path)

                transcript = transcribe_audio(temp_file_path)  # Transcribe the audio
                if not transcript:
                    error_message = "Transcription failed. Please try again."
                    return render_template("index.html", error=error_message)

                sentiment = analyze_sentiment(transcript)
                if not sentiment:
                    error_message = "Sentiment analysis failed. Please try again."
                    return render_template("index.html", error=error_message)

                # Optionally, delete the temporary file after processing
                os.remove(temp_file_path)
            except Exception as e:
                error_message = f"An error occurred: {e}"
                logger.error(error_message)
                return render_template("index.html", error=error_message)
        else:
            error_message = "Invalid file type. Please upload a valid audio file (MP3, MP4, WAV, M4A)."
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

    return render_template("index.html", sentiment=sentiment, error=error_message)

@app.errorhandler(500)
def internal_server_error(e):
    """Handles internal server errors."""
    logger.error(f"Internal Server Error: {e}")
    return render_template("error.html", error="Internal Server Error"), 500

if __name__ == "__main__":
    app.run(debug=True)

Code Breakdown:

  • Import Statements: Imports the necessary Flask modules, OpenAI library, osdotenvlogging, and Optional and Dict for type hinting.
  • Environment Variables: Loads the OpenAI API key from the .env file.
  • Flask Application: Creates a Flask application instance.
  • Logging Configuration: Configures logging for the application.
  • allowed_file Function: Checks if the uploaded file has an allowed audio extension (MP3, MP4, WAV, M4A).
  • transcribe_audio Function: Transcribes an audio file using OpenAI's Whisper API. It logs the file path and any errors during transcription.
  • analyze_sentiment Function:
    • def analyze_sentiment(text: str) -> Optional[str]:: Defines a function to analyze the sentiment of a text using OpenAI's Chat Completion API.
    • It takes the transcribed text as input.
    • It sends a request to the Chat Completion API with a system message instructing the model to perform sentiment analysis. The temperature is set to 0.2 to make the output more focused, and max_tokens is limited to 20 to keep the response concise.
    • It extracts the sentiment from the API response.
    • It logs the sentiment analysis result.
    • It includes error handling for OpenAI API errors and other exceptions.
  • index Route:
    • Handles both GET and POST requests.
    • For GET requests, it renders the initial HTML page.
    • For POST requests (when the user uploads an audio file):
      • It validates the uploaded file.
      • It saves the file temporarily.
      • It calls transcribe_audio() to transcribe the audio.
      • It calls analyze_sentiment() to analyze the transcribed text.
      • It renders the HTML template, passing the sentiment analysis result or any error messages.
  • @app.errorhandler(500): Handles internal server errors by logging the error and rendering a user-friendly error page.
  • if __name__ == "__main__":: Starts the Flask development server if the script is executed directly.
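A practical caveat in the upload handling above: every request is saved to the same temp_audio.<ext> path, so two simultaneous uploads could overwrite each other's files. Below is a minimal standard-library sketch of per-request unique filenames; the helper name unique_temp_path is our own, not part of the tutorial code:

```python
import os
import tempfile
import uuid

ALLOWED_EXTENSIONS = {"mp3", "mp4", "wav", "m4a"}  # same set as in app.py

def allowed_file(filename: str) -> bool:
    """Return True if the filename carries an allowed audio extension."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS

def unique_temp_path(filename: str) -> str:
    """Build a collision-free temporary path that preserves the upload's extension."""
    ext = filename.rsplit(".", 1)[1].lower()
    return os.path.join(tempfile.gettempdir(), f"upload_{uuid.uuid4().hex}.{ext}")
```

In the route, saving the upload to unique_temp_path(file.filename) instead of the fixed temp_audio path lets each request work on its own file.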

Step 4: Create the HTML Template (templates/index.html)

Create a folder named templates in the same directory as app.py. Inside the templates folder, create a file named index.html with the following HTML code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Sentiment Analyzer</title>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap" rel="stylesheet">
    <style>
        /* --- General Styles --- */
        body {
            font-family: 'Inter', sans-serif;
            padding: 40px;
            background-color: #f9fafb;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            color: #374151;
        }
        .container {
            max-width: 800px;
            width: 95%;
            background-color: #fff;
            padding: 2rem;
            border-radius: 0.75rem;
            box-shadow: 0 10px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.05);
            text-align: center;
        }
        h2 {
            font-size: 2.25rem;
            font-weight: 600;
            margin-bottom: 1.5rem;
            color: #1e293b;
        }
        p {
            color: #6b7280;
            margin-bottom: 1rem;
        }
        /* --- Form Styles --- */
        form {
            margin-top: 1rem;
            display: flex;
            flex-direction: column;
            align-items: center;
            gap: 0.5rem;
        }
        label {
            font-size: 1rem;
            font-weight: 600;
            color: #4b5563;
            margin-bottom: 0.25rem;
            display: block;
            text-align: left;
            width: 100%;
            max-width: 400px;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="file"] {
            width: 100%;
            max-width: 400px;
            padding: 0.75rem;
            border-radius: 0.5rem;
            border: 1px solid #d1d5db;
            font-size: 1rem;
            margin-bottom: 0.25rem;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="submit"] {
            padding: 0.75rem 1.5rem;
            border-radius: 0.5rem;
            background-color: #4f46e5;
            color: #fff;
            font-size: 1rem;
            font-weight: 600;
            cursor: pointer;
            transition: background-color 0.3s ease;
            border: none;
            box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
            margin-top: 1rem;
        }
        input[type="submit"]:hover {
            background-color: #4338ca;
        }
        input[type="submit"]:focus {
            outline: none;
            box-shadow: 0 0 0 3px rgba(79, 70, 229, 0.3);
        }
        /* --- Result Styles --- */
        .result-container {
            margin-top: 2rem;
            border: 1px solid #e5e7eb;
            border-radius: 0.5rem;
            padding: 1rem;
            background-color: #f8fafc;
        }

        .result-title{
            font-size: 1.25rem;
            font-weight: 600;
            color: #1e293b;
            margin-bottom: 0.75rem;
        }
        .sentiment-positive {
            color: #16a34a;
            font-weight: 600;
        }
        .sentiment-negative {
            color: #dc2626;
            font-weight: 600;
        }
        .sentiment-neutral {
            color: #71717a;
            font-weight: 600;
        }
        /* --- Error Styles --- */
        .error-message {
            color: #dc2626;
            margin-top: 1rem;
            padding: 0.75rem;
            background-color: #fee2e2;
            border-radius: 0.375rem;
            border: 1px solid #fecaca;
            text-align: center;
        }
        /* --- Responsive Adjustments --- */
        @media (max-width: 768px) {
            .container {
                padding: 20px;
            }
            form {
                gap: 1rem;
            }
            input[type="file"] {
                max-width: 100%;
            }
        }
    </style>
</head>
<body>
    <div class="container">
        <h2>🎙️ Audio Sentiment Analyzer</h2>
        <p> Upload an audio file to analyze the sentiment of the spoken content. Supported formats: MP3, MP4, WAV, M4A </p>
        <form method="POST" enctype="multipart/form-data">
            <label for="audio_file">Upload an audio file:</label><br>
            <input type="file" name="audio_file" accept="audio/*" required><br><br>
            <input type="submit" value="Analyze Sentiment">
        </form>

        {% if sentiment %}
            <div class="result-container">
                <h3 class="result-title">Sentiment Analysis Result:</h3>
                <p class="sentiment-{{ sentiment.lower() }}"> {{ sentiment }} </p>
            </div>
        {% endif %}
        {% if error %}
            <div class="error-message">{{ error }}</div>
        {% endif %}
    </div>
</body>
</html>

Key elements in the HTML template:

  • HTML Structure:
    • The <head> section defines the title, links a CSS stylesheet, and sets the viewport for responsiveness.
    • The <body> contains the visible content, including a form for uploading audio and a section to display the sentiment analysis result.
  • CSS Styling:
    • Modern Design: A clean, card-based layout using the Inter typeface.
    • Responsive Layout: Media queries adjust padding and input widths on smaller screens.
    • User Experience: Form controls include hover and focus states for better usability.
    • Clear Error Display: Error messages appear in a highlighted box that stands out from the page.
    • Sentiment Indication: The result text is colored green, red, or gray depending on the returned sentiment.
  • Form:
    • <form> with enctype="multipart/form-data" is used to handle file uploads.
    • <label> and <input type="file"> allow the user to select an audio file. The accept="audio/*" attribute restricts the user to uploading audio files.
    • <input type="submit"> button allows the user to submit the form.
  • Sentiment Display:
    • <div class="result-container"> is used to display the sentiment analysis result. The displayed sentiment will have its color changed depending on the result.
  • Error Handling:
    • <div class="error-message"> is used to display any error messages to the user.
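One subtlety worth noting: the result's CSS class is built as sentiment-{{ sentiment.lower() }}, so a model reply such as "Positive." (with trailing punctuation) would not match the .sentiment-positive rule. A hedged sketch of a normalizer you could apply in the route before rendering; the function name normalize_sentiment is our own:

```python
def normalize_sentiment(raw: str) -> str:
    """Coerce a free-form model reply onto exactly 'Positive', 'Negative', or 'Neutral'."""
    cleaned = raw.strip().strip(".!?'\"").lower()
    for label in ("Positive", "Negative", "Neutral"):
        if label.lower() in cleaned:
            return label
    return "Neutral"  # conservative fallback for unexpected replies
```

Calling this on the value returned by analyze_sentiment() before passing it to render_template() guarantees the template's class names always line up with the stylesheet.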

Try It Out

  1. Save the files as app.py and templates/index.html.
  2. Ensure you have your OpenAI API key in the .env file.
  3. Run the application:
    python app.py
  4. Open http://localhost:5000 in your browser.
  5. Upload an audio file (e.g., a recording of someone speaking or the provided sample .mp3 file).
  6. View the sentiment analysis result displayed on the page.
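In production, calls to the transcription and chat endpoints can fail transiently (rate limits, network timeouts). Since both helpers in app.py signal failure by returning None, a small retry wrapper is easy to bolt on. This is a sketch of one approach with exponential backoff; the helper name with_retries is hypothetical, not part of the tutorial code:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], Optional[T]],
                 attempts: int = 3,
                 base_delay: float = 1.0) -> Optional[T]:
    """Retry a callable that signals failure by returning None, backing off exponentially."""
    for attempt in range(attempts):
        result = call()
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return None
```

For example, transcript = with_retries(lambda: transcribe_audio(temp_file_path)) would retry a failed transcription up to three times before giving up.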

5.4 Audio Sentiment Analysis with OpenAI

In this section, we'll explore how to build a sophisticated Flask web application that performs sentiment analysis on audio content using OpenAI's powerful APIs. This integration combines two key technologies: first, the application uses OpenAI's Whisper API to convert spoken words into written text through accurate transcription. Then, it leverages OpenAI's language models to analyze the emotional tone and sentiment of the transcribed content.

The process flow is straightforward yet powerful: users upload an audio file, which gets transcribed into text, and then the application applies natural language processing to determine whether the speaker's message conveys positive, negative, or neutral sentiment. This dual-step analysis provides valuable insights into the emotional content of spoken communications, making it useful for various applications like customer feedback analysis, market research, and content moderation.

5.4.1 What You’ll Build

The web application provides a comprehensive audio analysis solution with several key functionalities. When users interact with the platform, it performs the following sequence of operations:

  1. Receive and Process Audio Uploads: The application features a sophisticated web interface that handles audio file uploads with extensive format support. Users can submit files in popular formats like MP3 (ideal for compressed audio), WAV (perfect for high-quality uncompressed audio), and M4A (optimized for voice recordings). The interface includes file validation, size checks, and format verification to ensure smooth processing.
  2. Secure File Management System: Upon receiving an upload, the application implements a robust temporary storage system. Files are stored in a secure directory with proper access controls, utilizing automatic file cleanup protocols to prevent storage overflow. The system includes error handling mechanisms for failed uploads, corrupt files, and timeout scenarios, ensuring reliable operation even under heavy load.
  3. Advanced Audio Transcription: The integration with OpenAI's Whisper API provides state-of-the-art speech recognition capabilities. This sophisticated model excels at handling various accents, dialects, and background noise conditions, delivering accurate transcriptions across multiple languages. The system processes audio in chunks for optimal performance and includes progress tracking for longer files.
  4. Comprehensive Sentiment Analysis: The application harnesses GPT-4's advanced natural language processing through the Chat Completion API. This analysis goes beyond basic positive/negative classification, examining contextual clues, emotional undertones, and linguistic nuances. The system considers factors like tone, intensity, and semantic context to provide nuanced sentiment understanding.
  5. User-Friendly Results Interface: The application presents analysis results through an intuitive, well-designed interface. Users receive both the full transcription and a detailed sentiment breakdown, with clear visual indicators for different emotional categories. The interface includes options for downloading results, sharing analysis reports, and viewing historical analyses when applicable.

This groundbreaking combination of audio transcription and language analysis technologies revolutionizes how we process and understand spoken content. The applications span across multiple industries, offering unprecedented insights and efficiency improvements:

  • Analyzing customer feedback from voice recordings - This application transforms how businesses handle customer interactions. Call centers and customer service departments can now automatically process thousands of calls to:
    • Track customer satisfaction trends over time
    • Identify specific pain points in customer experiences
    • Generate actionable insights for service improvement
    • Train customer service representatives more effectively
  • Detecting emotional tones in spoken content for media analysis - This capability provides media companies with sophisticated tools for content evaluation:
    • Measure audience emotional engagement throughout content
    • Analyze speaker authenticity and credibility
    • Ensure brand message consistency across different media
    • Optimize content for maximum emotional impact
  • Assessing the sentiment of audio content in podcasts or interviews - This feature revolutionizes content analysis by:
    • Processing hours of content in minutes
    • Identifying key moments of emotional significance
    • Tracking sentiment changes throughout discussions
    • Enabling data-driven content strategy decisions

Technologies Used:

  • Flask: A Python web framework.
  • OpenAI API:
    • Whisper for audio transcription.
    • GPT-4 (or similar) for sentiment analysis.
  • HTML: To structure the web page.
  • CSS: To style the web page.

Project Structure:

The project will have the following file structure:

/audio_sentiment_analyzer

├── app.py
├── .env
└── templates/
    └── index.html
  • app.py: The Python file containing the Flask application code.
  • .env: A file to store the OpenAI API key.
  • templates/: A directory to store the HTML templates.
  • templates/index.html: The HTML template for the main page.

5.4.2 Step-by-Step Implementation

Step 1: Install Required Packages

Download the sample audio file: https://files.cuantum.tech/audio/someone-speaking.mp3

Install the necessary Python libraries:

pip install flask openai python-dotenv

Step 2: Set Up Environment Variables

Create a .env file in your project directory and add your OpenAI API key:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY

Step 3: Create the Flask App (app.py)

Create a Python file named app.py and add the following code:

from flask import Flask, request, render_template, jsonify
import openai
import os
from dotenv import load_dotenv
import logging
from typing import Optional, Dict

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

app = Flask(__name__)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

ALLOWED_EXTENSIONS = {'mp3', 'mp4', 'wav', 'm4a'}  # Allowed audio file extensions


def allowed_file(filename: str) -> bool:
    """
    Checks if the uploaded file has an allowed extension.

    Args:
        filename (str): The name of the file.

    Returns:
        bool: True if the file has an allowed extension, False otherwise.
    """
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


def transcribe_audio(file_path: str) -> Optional[str]:
    """
    Transcribes an audio file using OpenAI's Whisper API.

    Args:
        file_path (str): The path to the audio file.

    Returns:
        Optional[str]: The transcribed text, or None on error.
    """
    try:
        logger.info(f"Transcribing audio file: {file_path}")
        audio_file = open(file_path, "rb")
        response = openai.Audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
        transcript = response.text
        logger.info(f"Transcription successful. Length: {len(transcript)} characters.")
        return transcript
    except openai.error.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during transcription: {e}")
        return None



def analyze_sentiment(text: str) -> Optional[str]:
    """
    Analyzes the sentiment of a given text using OpenAI's Chat Completion API.

    Args:
        text (str): The text to analyze.

    Returns:
        Optional[str]: The sentiment analysis result, or None on error.
    """
    try:
        logger.info("Analyzing sentiment of transcribed text.")
        response = openai.chat.completions.create(
            model="gpt-4",  # Or another suitable chat model
            messages=[
                {
                    "role": "system",
                    "content": "You are a sentiment analysis expert. Provide a concise sentiment analysis of the text. Your response should be one of the following: 'Positive', 'Negative', or 'Neutral'.",
                },
                {"role": "user", "content": text},
            ],
            temperature=0.2,  # Keep the output focused
            max_tokens=20
        )
        sentiment = response.choices[0].message.content
        logger.info(f"Sentiment analysis result: {sentiment}")
        return sentiment
    except openai.error.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during sentiment analysis: {e}")
        return None



@app.route("/", methods=["GET", "POST"])
def index():
    """
    Handles the main route for the web application.
    Allows users to upload an audio file, transcribes it, and analyzes the sentiment.
    """
    sentiment = None
    error_message = None

    if request.method == "POST":
        if 'audio_file' not in request.files:
            error_message = "No file part"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)
        file = request.files['audio_file']
        if file.filename == '':
            error_message = "No file selected"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

        if file and allowed_file(file.filename):
            try:
                # Securely save the uploaded file to a temporary location
                temp_file_path = os.path.join(app.root_path, "temp_audio." + file.filename.rsplit('.', 1)[1].lower())
                file.save(temp_file_path)

                transcript = transcribe_audio(temp_file_path)  # Transcribe the audio
                if not transcript:
                    error_message = "Transcription failed. Please try again."
                    return render_template("index.html", error=error_message)

                sentiment = analyze_sentiment(transcript)
                if not sentiment:
                    error_message = "Sentiment analysis failed. Please try again."
                    return render_template("index.html", error=error_message)

                # Optionally, delete the temporary file after processing
                os.remove(temp_file_path)
            except Exception as e:
                error_message = f"An error occurred: {e}"
                logger.error(error_message)
                return render_template("index.html", error=error_message)
        else:
            error_message = "Invalid file type. Please upload a valid audio file (MP3, MP4, WAV, M4A)."
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

    return render_template("index.html", sentiment=sentiment, error=error_message)

@app.errorhandler(500)
def internal_server_error(e):
    """Handles internal server errors."""
    logger.error(f"Internal Server Error: {e}")
    return render_template("error.html", error="Internal Server Error"), 500

if __name__ == "__main__":
    app.run(debug=True)

Code Breakdown:

  • Import Statements: Imports the necessary Flask modules, OpenAI library, osdotenvlogging, and Optional and Dict for type hinting.
  • Environment Variables: Loads the OpenAI API key from the .env file.
  • Flask Application: Creates a Flask application instance.
  • Logging Configuration: Configures logging for the application.
  • allowed_file Function: Checks if the uploaded file has an allowed audio extension (MP3, MP4, WAV, M4A).
  • transcribe_audio Function: Transcribes an audio file using OpenAI's Whisper API. It logs the file path and any errors during transcription.
  • analyze_sentiment Function:
    • def analyze_sentiment(text: str) -> Optional[str]:: Defines a function to analyze the sentiment of a text using OpenAI's Chat Completion API.
    • It takes the transcribed text as input.
    • It sends a request to the Chat Completion API with a system message instructing the model to perform sentiment analysis. The temperature is set to 0.2 to make the output more focused, and max_tokens is limited to 20 to keep the response concise.
    • It extracts the sentiment from the API response.
    • It logs the sentiment analysis result.
    • It includes error handling for OpenAI API errors and other exceptions.
  • index Route:
    • Handles both GET and POST requests.
    • For GET requests, it renders the initial HTML page.
    • For POST requests (when the user uploads an audio file):
      • It validates the uploaded file.
      • It saves the file temporarily.
      • It calls transcribe_audio() to transcribe the audio.
      • It calls analyze_sentiment() to analyze the transcribed text.
      • It renders the HTML template, passing the sentiment analysis result or any error messages.
  • @app.errorhandler(500): Handles internal server errors by logging the error and rendering a user-friendly error page.
  • if __name__ == "__main__":: Starts the Flask development server if the script is executed directly.

Step 4: Create the HTML Template (templates/index.html)

Create a folder named templates in the same directory as app.py.  Inside the templates folder, create a file named index.html with the following HTML code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Sentiment Analyzer</title>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap" rel="stylesheet">
    <style>
        /* --- General Styles --- */
        body {
            font-family: 'Inter', sans-serif;
            padding: 40px;
            background-color: #f9fafb;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            color: #374151;
        }
        .container {
            max-width: 800px;
            width: 95%;
            background-color: #fff;
            padding: 2rem;
            border-radius: 0.75rem;
            box-shadow: 0 10px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.05);
            text-align: center;
        }
        h2 {
            font-size: 2.25rem;
            font-weight: 600;
            margin-bottom: 1.5rem;
            color: #1e293b;
        }
        p {
            color: #6b7280;
            margin-bottom: 1rem;
        }
        /* --- Form Styles --- */
        form {
            margin-top: 1rem;
            display: flex;
            flex-direction: column;
            align-items: center;
            gap: 0.5rem;
        }
        label {
            font-size: 1rem;
            font-weight: 600;
            color: #4b5563;
            margin-bottom: 0.25rem;
            display: block;
            text-align: left;
            width: 100%;
            max-width: 400px;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="file"] {
            width: 100%;
            max-width: 400px;
            padding: 0.75rem;
            border-radius: 0.5rem;
            border: 1px solid #d1d5db;
            font-size: 1rem;
            margin-bottom: 0.25rem;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="submit"] {
            padding: 0.75rem 1.5rem;
            border-radius: 0.5rem;
            background-color: #4f46e5;
            color: #fff;
            font-size: 1rem;
            font-weight: 600;
            cursor: pointer;
            transition: background-color 0.3s ease;
            border: none;
            box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
            margin-top: 1rem;
        }
        input[type="submit"]:hover {
            background-color: #4338ca;
        }
        input[type="submit"]:focus {
            outline: none;
            box-shadow: 0 0 0 3px rgba(79, 70, 229, 0.3);
        }
        /* --- Result Styles --- */
        .result-container {
            margin-top: 2rem;
            border: 1px solid #e5e7eb;
            border-radius: 0.5rem;
            padding: 1rem;
            background-color: #f8fafc;
        }

        .result-title{
            font-size: 1.25rem;
            font-weight: 600;
            color: #1e293b;
            margin-bottom: 0.75rem;
        }
        .sentiment-positive {
            color: #16a34a;
            font-weight: 600;
        }
        .sentiment-negative {
            color: #dc2626;
            font-weight: 600;
        }
        .sentiment-neutral {
            color: #71717a;
            font-weight: 600;
        }
        /* --- Error Styles --- */
        .error-message {
            color: #dc2626;
            margin-top: 1rem;
            padding: 0.75rem;
            background-color: #fee2e2;
            border-radius: 0.375rem;
            border: 1px solid #fecaca;
            text-align: center;
        }
        /* --- Responsive Adjustments --- */
        @media (max-width: 768px) {
            .container {
                padding: 20px;
            }
            form {
                gap: 1rem;
            }
            input[type="file"] {
                max-width: 100%;
            }
        }
    </style>
</head>
<body>
    <div class="container">
        <h2>🎙️ Audio Sentiment Analyzer</h2>
        <p> Upload an audio file to analyze the sentiment of the spoken content. Supported formats: MP3, MP4, WAV, M4A </p>
        <form method="POST" enctype="multipart/form-data">
            <label for="audio_file">Upload an audio file:</label><br>
            <input type="file" name="audio_file" accept="audio/*" required><br><br>
            <input type="submit" value="Analyze Sentiment">
        </form>

        {% if sentiment %}
            <div class="result-container">
                <h3 class = "result-title">Sentiment Analysis Result:</h3>
                <p class="sentiment-{{ sentiment.lower() }}"> {{ sentiment }} </p>
            </div>
        {% endif %}
        {% if error %}
            <div class="error-message">{{ error }}</div>
        {% endif %}
    </div>
</body>
</html>

Key elements in the HTML template:

  • HTML Structure:
    • The <head> section defines the title, links a CSS stylesheet, and sets the viewport for responsiveness.
    • The <body> contains the visible content, including a form for uploading audio and a section to display the sentiment analysis result.
  • CSS Styling:
    • Modern Design: The CSS is updated to use a modern design.
    • Responsive Layout: The layout is more responsive, especially for smaller screens.
    • User Experience: Improved form and input styling for better usability.
    • Clear Error Display: Error messages are styled to be clearly visible.
    • Sentiment indication: the colors of the results change depending on the returned sentiment.
  • Form:
    • <form> with enctype="multipart/form-data" is used to handle file uploads.
    • <label> and <input type="file"> allow the user to select an audio file. The accept="audio/*" attribute restricts the user to uploading audio files.
    • <input type="submit"> button allows the user to submit the form.
  • Sentiment Display:
    • <div class="result-container"> is used to display the sentiment analysis result. The displayed sentiment will have its color changed depending on the result.
  • Error Handling:
    • <div class="error-message"> is used to display any error messages to the user.

Try It Out

  1. Save the files as app.py and templates/index.html.
  2. Ensure you have your OpenAI API key in the .env file.
  3. Run the application:
    python app.py
  4. Open http://localhost:5000 in your browser.
  5. Upload an audio file (e.g., a recording of someone speaking or the provided sample .mp3 file).
  6. View the sentiment analysis result displayed on the page.

5.4 Audio Sentiment Analysis with OpenAI

In this section, we'll explore how to build a sophisticated Flask web application that performs sentiment analysis on audio content using OpenAI's powerful APIs. This integration combines two key technologies: first, the application uses OpenAI's Whisper API to convert spoken words into written text through accurate transcription. Then, it leverages OpenAI's language models to analyze the emotional tone and sentiment of the transcribed content.

The process flow is straightforward yet powerful: users upload an audio file, which gets transcribed into text, and then the application applies natural language processing to determine whether the speaker's message conveys positive, negative, or neutral sentiment. This dual-step analysis provides valuable insights into the emotional content of spoken communications, making it useful for various applications like customer feedback analysis, market research, and content moderation.

5.4.1 What You’ll Build

The web application provides a comprehensive audio analysis solution with several key functionalities. When users interact with the platform, it performs the following sequence of operations:

  1. Receive and Process Audio Uploads: The application features a sophisticated web interface that handles audio file uploads with extensive format support. Users can submit files in popular formats like MP3 (ideal for compressed audio), WAV (perfect for high-quality uncompressed audio), and M4A (optimized for voice recordings). The interface includes file validation, size checks, and format verification to ensure smooth processing.
  2. Secure File Management System: Upon receiving an upload, the application implements a robust temporary storage system. Files are stored in a secure directory with proper access controls, utilizing automatic file cleanup protocols to prevent storage overflow. The system includes error handling mechanisms for failed uploads, corrupt files, and timeout scenarios, ensuring reliable operation even under heavy load.
  3. Advanced Audio Transcription: The integration with OpenAI's Whisper API provides state-of-the-art speech recognition capabilities. This sophisticated model excels at handling various accents, dialects, and background noise conditions, delivering accurate transcriptions across multiple languages. The system processes audio in chunks for optimal performance and includes progress tracking for longer files.
  4. Comprehensive Sentiment Analysis: The application harnesses GPT-4's advanced natural language processing through the Chat Completion API. This analysis goes beyond basic positive/negative classification, examining contextual clues, emotional undertones, and linguistic nuances. The system considers factors like tone, intensity, and semantic context to provide nuanced sentiment understanding.
  5. User-Friendly Results Interface: The application presents analysis results through an intuitive, well-designed interface. Users receive both the full transcription and a detailed sentiment breakdown, with clear visual indicators for different emotional categories. The interface includes options for downloading results, sharing analysis reports, and viewing historical analyses when applicable.

This groundbreaking combination of audio transcription and language analysis technologies revolutionizes how we process and understand spoken content. The applications span across multiple industries, offering unprecedented insights and efficiency improvements:

  • Analyzing customer feedback from voice recordings - This application transforms how businesses handle customer interactions. Call centers and customer service departments can now automatically process thousands of calls to:
    • Track customer satisfaction trends over time
    • Identify specific pain points in customer experiences
    • Generate actionable insights for service improvement
    • Train customer service representatives more effectively
  • Detecting emotional tones in spoken content for media analysis - This capability provides media companies with sophisticated tools for content evaluation:
    • Measure audience emotional engagement throughout content
    • Analyze speaker authenticity and credibility
    • Ensure brand message consistency across different media
    • Optimize content for maximum emotional impact
  • Assessing the sentiment of audio content in podcasts or interviews - This feature streamlines content analysis by:
    • Processing hours of content in minutes
    • Identifying key moments of emotional significance
    • Tracking sentiment changes throughout discussions
    • Enabling data-driven content strategy decisions

Technologies Used:

  • Flask: A Python web framework.
  • OpenAI API:
    • Whisper for audio transcription.
    • GPT-4 (or similar) for sentiment analysis.
  • HTML: To structure the web page.
  • CSS: To style the web page.

Project Structure:

The project will have the following file structure:

/audio_sentiment_analyzer

├── app.py
├── .env
└── templates/
    └── index.html
  • app.py: The Python file containing the Flask application code.
  • .env: A file to store the OpenAI API key.
  • templates/: A directory to store the HTML templates.
  • templates/index.html: The HTML template for the main page.

5.4.2 Step-by-Step Implementation

Step 1: Install Required Packages

Download the sample audio file: https://files.cuantum.tech/audio/someone-speaking.mp3

Install the necessary Python libraries (the code below targets version 1.x of the openai package):

pip install flask openai python-dotenv
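For reproducible installs, the same dependencies can be pinned in a requirements.txt instead (the version floors below are illustrative, not prescriptive):

```text
flask>=3.0
openai>=1.0
python-dotenv>=1.0
```

Install them with pip install -r requirements.txt.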

Step 2: Set Up Environment Variables

Create a .env file in your project directory and add your OpenAI API key:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY

Step 3: Create the Flask App (app.py)

Create a Python file named app.py and add the following code:

from flask import Flask, request, render_template
import openai
import os
from dotenv import load_dotenv
import logging
from typing import Optional

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

app = Flask(__name__)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

ALLOWED_EXTENSIONS = {'mp3', 'mp4', 'wav', 'm4a'}  # Allowed audio file extensions


def allowed_file(filename: str) -> bool:
    """
    Checks if the uploaded file has an allowed extension.

    Args:
        filename (str): The name of the file.

    Returns:
        bool: True if the file has an allowed extension, False otherwise.
    """
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


def transcribe_audio(file_path: str) -> Optional[str]:
    """
    Transcribes an audio file using OpenAI's Whisper API.

    Args:
        file_path (str): The path to the audio file.

    Returns:
        Optional[str]: The transcribed text, or None on error.
    """
    try:
        logger.info(f"Transcribing audio file: {file_path}")
        # Open in a context manager so the file handle is always released
        with open(file_path, "rb") as audio_file:
            response = openai.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        transcript = response.text
        logger.info(f"Transcription successful. Length: {len(transcript)} characters.")
        return transcript
    except openai.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during transcription: {e}")
        return None



def analyze_sentiment(text: str) -> Optional[str]:
    """
    Analyzes the sentiment of a given text using OpenAI's Chat Completion API.

    Args:
        text (str): The text to analyze.

    Returns:
        Optional[str]: The sentiment analysis result, or None on error.
    """
    try:
        logger.info("Analyzing sentiment of transcribed text.")
        response = openai.chat.completions.create(
            model="gpt-4",  # Or another suitable chat model
            messages=[
                {
                    "role": "system",
                    "content": "You are a sentiment analysis expert. Provide a concise sentiment analysis of the text. Your response should be one of the following: 'Positive', 'Negative', or 'Neutral'.",
                },
                {"role": "user", "content": text},
            ],
            temperature=0.2,  # Keep the output focused
            max_tokens=20
        )
        sentiment = response.choices[0].message.content
        logger.info(f"Sentiment analysis result: {sentiment}")
        return sentiment
    except openai.OpenAIError as e:
        logger.error(f"OpenAI API Error: {e}")
        return None
    except Exception as e:
        logger.error(f"Error during sentiment analysis: {e}")
        return None



@app.route("/", methods=["GET", "POST"])
def index():
    """
    Handles the main route for the web application.
    Allows users to upload an audio file, transcribes it, and analyzes the sentiment.
    """
    sentiment = None
    error_message = None

    if request.method == "POST":
        if 'audio_file' not in request.files:
            error_message = "No file part"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)
        file = request.files['audio_file']
        if file.filename == '':
            error_message = "No file selected"
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

        if file and allowed_file(file.filename):
            # Save the upload under a fixed name derived from its extension.
            # Note: a fixed name is simple but not safe under concurrent uploads.
            temp_file_path = os.path.join(
                app.root_path,
                "temp_audio." + file.filename.rsplit('.', 1)[1].lower()
            )
            try:
                file.save(temp_file_path)

                transcript = transcribe_audio(temp_file_path)  # Transcribe the audio
                if not transcript:
                    error_message = "Transcription failed. Please try again."
                    return render_template("index.html", error=error_message)

                sentiment = analyze_sentiment(transcript)
                if not sentiment:
                    error_message = "Sentiment analysis failed. Please try again."
                    return render_template("index.html", error=error_message)
            except Exception as e:
                error_message = f"An error occurred: {e}"
                logger.error(error_message)
                return render_template("index.html", error=error_message)
            finally:
                # Delete the temporary file whether or not processing succeeded
                if os.path.exists(temp_file_path):
                    os.remove(temp_file_path)
        else:
            error_message = "Invalid file type. Please upload a valid audio file (MP3, MP4, WAV, M4A)."
            logger.warning(error_message)
            return render_template("index.html", error=error_message)

    return render_template("index.html", sentiment=sentiment, error=error_message)

@app.errorhandler(500)
def internal_server_error(e):
    """Handles internal server errors."""
    logger.error(f"Internal Server Error: {e}")
    return render_template("index.html", error="Internal Server Error"), 500

if __name__ == "__main__":
    app.run(debug=True)

Code Breakdown:

  • Import Statements: Imports the necessary Flask modules, the openai library, os, dotenv, logging, and Optional for type hinting.
  • Environment Variables: Loads the OpenAI API key from the .env file.
  • Flask Application: Creates a Flask application instance.
  • Logging Configuration: Configures logging for the application.
  • allowed_file Function: Checks if the uploaded file has an allowed audio extension (MP3, MP4, WAV, M4A).
  • transcribe_audio Function: Transcribes an audio file using OpenAI's Whisper API. It logs the file path and any errors during transcription.
  • analyze_sentiment Function:
    • def analyze_sentiment(text: str) -> Optional[str]:: Defines a function to analyze the sentiment of a text using OpenAI's Chat Completion API.
    • It takes the transcribed text as input.
    • It sends a request to the Chat Completion API with a system message instructing the model to perform sentiment analysis. The temperature is set to 0.2 to make the output more focused, and max_tokens is limited to 20 to keep the response concise.
    • It extracts the sentiment from the API response.
    • It logs the sentiment analysis result.
    • It includes error handling for OpenAI API errors and other exceptions.
  • index Route:
    • Handles both GET and POST requests.
    • For GET requests, it renders the initial HTML page.
    • For POST requests (when the user uploads an audio file):
      • It validates the uploaded file.
      • It saves the file temporarily.
      • It calls transcribe_audio() to transcribe the audio.
      • It calls analyze_sentiment() to analyze the transcribed text.
      • It renders the HTML template, passing the sentiment analysis result or any error messages.
  • @app.errorhandler(500): Handles internal server errors by logging the error and rendering a user-friendly error page.
  • if __name__ == "__main__":: Starts the Flask development server if the script is executed directly.
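Because allowed_file depends on neither Flask nor OpenAI, its edge cases (uppercase extensions, double extensions, missing extensions) can be checked in isolation. This standalone sketch reproduces the helper outside the app:

```python
ALLOWED_EXTENSIONS = {'mp3', 'mp4', 'wav', 'm4a'}

def allowed_file(filename: str) -> bool:
    # Same logic as in app.py: require a dot, then whitelist the lowercased suffix
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

print(allowed_file("clip.MP3"))        # True  - extension check is case-insensitive
print(allowed_file("notes.txt"))       # False - not an audio extension
print(allowed_file("archive.tar.gz"))  # False - only the final suffix is checked
print(allowed_file("no_extension"))    # False - no dot at all
```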

Step 4: Create the HTML Template (templates/index.html)

Create a folder named templates in the same directory as app.py.  Inside the templates folder, create a file named index.html with the following HTML code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Sentiment Analyzer</title>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap" rel="stylesheet">
    <style>
        /* --- General Styles --- */
        body {
            font-family: 'Inter', sans-serif;
            padding: 40px;
            background-color: #f9fafb;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            color: #374151;
        }
        .container {
            max-width: 800px;
            width: 95%;
            background-color: #fff;
            padding: 2rem;
            border-radius: 0.75rem;
            box-shadow: 0 10px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.05);
            text-align: center;
        }
        h2 {
            font-size: 2.25rem;
            font-weight: 600;
            margin-bottom: 1.5rem;
            color: #1e293b;
        }
        p {
            color: #6b7280;
            margin-bottom: 1rem;
        }
        /* --- Form Styles --- */
        form {
            margin-top: 1rem;
            display: flex;
            flex-direction: column;
            align-items: center;
            gap: 0.5rem;
        }
        label {
            font-size: 1rem;
            font-weight: 600;
            color: #4b5563;
            margin-bottom: 0.25rem;
            display: block;
            text-align: left;
            width: 100%;
            max-width: 400px;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="file"] {
            width: 100%;
            max-width: 400px;
            padding: 0.75rem;
            border-radius: 0.5rem;
            border: 1px solid #d1d5db;
            font-size: 1rem;
            margin-bottom: 0.25rem;
            margin-left: auto;
            margin-right: auto;
        }
        input[type="submit"] {
            padding: 0.75rem 1.5rem;
            border-radius: 0.5rem;
            background-color: #4f46e5;
            color: #fff;
            font-size: 1rem;
            font-weight: 600;
            cursor: pointer;
            transition: background-color 0.3s ease;
            border: none;
            box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
            margin-top: 1rem;
        }
        input[type="submit"]:hover {
            background-color: #4338ca;
        }
        input[type="submit"]:focus {
            outline: none;
            box-shadow: 0 0 0 3px rgba(79, 70, 229, 0.3);
        }
        /* --- Result Styles --- */
        .result-container {
            margin-top: 2rem;
            border: 1px solid #e5e7eb;
            border-radius: 0.5rem;
            padding: 1rem;
            background-color: #f8fafc;
        }

        .result-title{
            font-size: 1.25rem;
            font-weight: 600;
            color: #1e293b;
            margin-bottom: 0.75rem;
        }
        .sentiment-positive {
            color: #16a34a;
            font-weight: 600;
        }
        .sentiment-negative {
            color: #dc2626;
            font-weight: 600;
        }
        .sentiment-neutral {
            color: #71717a;
            font-weight: 600;
        }
        /* --- Error Styles --- */
        .error-message {
            color: #dc2626;
            margin-top: 1rem;
            padding: 0.75rem;
            background-color: #fee2e2;
            border-radius: 0.375rem;
            border: 1px solid #fecaca;
            text-align: center;
        }
        /* --- Responsive Adjustments --- */
        @media (max-width: 768px) {
            .container {
                padding: 20px;
            }
            form {
                gap: 1rem;
            }
            input[type="file"] {
                max-width: 100%;
            }
        }
    </style>
</head>
<body>
    <div class="container">
        <h2>🎙️ Audio Sentiment Analyzer</h2>
        <p> Upload an audio file to analyze the sentiment of the spoken content. Supported formats: MP3, MP4, WAV, M4A </p>
        <form method="POST" enctype="multipart/form-data">
            <label for="audio_file">Upload an audio file:</label><br>
            <input type="file" name="audio_file" accept="audio/*" required><br><br>
            <input type="submit" value="Analyze Sentiment">
        </form>

        {% if sentiment %}
            <div class="result-container">
                <h3 class="result-title">Sentiment Analysis Result:</h3>
                <p class="sentiment-{{ sentiment.lower() }}"> {{ sentiment }} </p>
            </div>
        {% endif %}
        {% if error %}
            <div class="error-message">{{ error }}</div>
        {% endif %}
    </div>
</body>
</html>

Key elements in the HTML template:

  • HTML Structure:
    • The <head> section defines the title, links a CSS stylesheet, and sets the viewport for responsiveness.
    • The <body> contains the visible content, including a form for uploading audio and a section to display the sentiment analysis result.
  • CSS Styling:
    • Modern Design: The CSS is updated to use a modern design.
    • Responsive Layout: The layout is more responsive, especially for smaller screens.
    • User Experience: Improved form and input styling for better usability.
    • Clear Error Display: Error messages are styled to be clearly visible.
    • Sentiment indication: the colors of the results change depending on the returned sentiment.
  • Form:
    • <form> with enctype="multipart/form-data" is used to handle file uploads.
    • <label> and <input type="file"> allow the user to select an audio file. The accept="audio/*" attribute restricts the user to uploading audio files.
    • <input type="submit"> button allows the user to submit the form.
  • Sentiment Display:
    • <div class="result-container"> is used to display the sentiment analysis result. The displayed sentiment will have its color changed depending on the result.
  • Error Handling:
    • <div class="error-message"> is used to display any error messages to the user.

Try It Out

  1. Save the files as app.py and templates/index.html.
  2. Ensure you have your OpenAI API key in the .env file.
  3. Run the application:
    python app.py
  4. Open http://localhost:5000 in your browser.
  5. Upload an audio file (e.g., a recording of someone speaking or the provided sample .mp3 file).
  6. View the sentiment analysis result displayed on the page.