Chapter 7: Memory and Multi-Turn Conversations
7.3 Storing and Retrieving Past Interactions
In the real world, human conversations build upon shared history and previous interactions. When we talk to someone we know, we naturally draw upon our past experiences with them - their preferences, previous discussions, and the context we've built together. This natural memory system is what makes our conversations feel continuous and personally meaningful.
While OpenAI's language models don't inherently possess long-term memory capabilities, developers can create sophisticated systems to replicate this natural memory process. By implementing a storage and retrieval system for past interactions, you can create an AI assistant that appears to "remember" previous conversations. This involves carefully recording user interactions, storing relevant context, and strategically retrieving this information when needed.
This section demonstrates how to construct a lightweight but powerful memory system that enhances your AI assistant's capabilities. By maintaining conversation history and user preferences across multiple sessions, you can create more meaningful interactions that feel less like isolated exchanges and more like ongoing conversations. Your assistant will be able to reference past discussions, remember user preferences, and maintain context awareness - making it feel more like interacting with a knowledgeable colleague rather than starting fresh with each interaction.
7.3.1 Why Store Interactions?
Storing past interactions unlocks several powerful capabilities that enhance the AI's ability to provide meaningful and contextual responses:
Personalized Responses
The system learns and adapts to individual users by maintaining a detailed profile of their interactions and preferences over time. This personalization happens on multiple levels:
- Communication Style: The system tracks how users express themselves by analyzing multiple aspects of their communication patterns:
  - Formality level: Whether they use casual language ("hey there!") or formal address ("Dear Sir/Madam")
  - Humor usage: Their tendency to use jokes, emojis, or playful language
  - Conversation pace: Whether they prefer quick exchanges or detailed, lengthy discussions
  - Vocabulary choices: Technical vs. simplified language
  - Cultural references: Professional, academic, or pop culture references
For example, if a user consistently uses informal language like "hey" and "thanks!" with emojis, the system adapts by responding in a friendly, casual tone. Conversely, when interacting with business users who maintain formal language and professional terms, the system automatically adjusts to use appropriate business etiquette and industry-standard terminology.
This adaptive communication ensures more natural and effective interactions by matching each user's unique communication style and preferences.
- Technical Proficiency: By analyzing past interactions, the system gauges users' expertise levels in different domains. This allows it to automatically adjust its explanations based on demonstrated knowledge.
For instance, when discussing programming, the system might use advanced terminology like "polymorphism" and "dependency injection" with experienced developers, while offering simpler explanations using real-world analogies for beginners. The system continuously refines this assessment through ongoing interactions - if a user demonstrates increased understanding over time, the technical depth of explanations adjusts accordingly. This adaptive approach ensures that experts aren't slowed down by basic explanations while newcomers aren't overwhelmed by complex technical details.
- Historical Context: The system maintains comprehensive records of previous discussions, projects, and decisions, enabling it to reference past conversations with precision and relevance. This historical tracking operates on multiple levels:
  - Conversation Threading: The system can follow the progression of specific topics across multiple sessions, understanding how discussions evolve and build upon each other.
  - Project Milestones: Important decisions, agreements, and project updates are recorded and can be referenced to maintain consistency in future discussions.
  - User Preferences Evolution: The system tracks how user preferences and requirements change over time, adapting its responses accordingly.
  - Contextual References: When addressing current topics, the system can intelligently reference related past discussions to provide more informed and nuanced responses.
This sophisticated context management creates a seamless conversational experience where users feel understood and valued, as the system demonstrates awareness of their history and ongoing needs. For example, if a user previously discussed challenges with a specific programming framework, the system can reference those earlier conversations when providing new solutions or updates.
- Customization Preferences: The system maintains and applies detailed user preferences across sessions, including:
  - Preferred language and regional variations
    - Language selection (e.g., English, Spanish, Mandarin)
    - Regional dialects and localizations
    - Currency and measurement units
  - Format preferences (bullet points vs. paragraphs)
    - Document structure preferences (hierarchical vs. flat)
    - Visual organization (lists, tables, or flowing text)
    - Code formatting conventions when applicable
  - Level of detail desired in responses
    - Brief summaries vs. comprehensive explanations
    - Technical depth of content
    - Inclusion of examples and analogies
  - Specific terminology or naming conventions
    - Industry-specific vocabulary
    - Preferred technical frameworks or methodologies
    - Company-specific terminology
  - Time zones and working hours
    - Meeting scheduling preferences
    - Notification timing preferences
    - Availability windows for synchronous communication
This comprehensive approach to personalization helps create a more natural, efficient, and engaging interaction that feels tailored to each individual user's needs and preferences.
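One piece of this personalization, surfacing related past discussions, can be sketched without any API calls. The snippet below is a minimal illustration (all names are ours, not from any library): it ranks stored exchanges by keyword overlap with the incoming message, a simple stand-in for the embedding-based retrieval a production system would use.

```python
def find_related_discussions(history, current_message, top_n=2):
    """Rank stored exchanges by keyword overlap with the incoming message."""
    current_words = set(current_message.lower().split())
    scored = []
    for entry in history:
        overlap = len(current_words & set(entry["text"].lower().split()))
        if overlap:
            scored.append((overlap, entry))
    # Highest-overlap entries first; ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_n]]

history = [
    {"session": 1, "text": "We hit a dependency bug in the Django framework"},
    {"session": 2, "text": "Lunch plans for Friday"},
    {"session": 3, "text": "The Django migration bug is fixed now"},
]
related = find_related_discussions(history, "Any update on the Django bug we discussed?")
print([e["session"] for e in related])  # the two Django sessions surface first
```

The same scoring hook could later be swapped for cosine similarity over embeddings without changing the calling code.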
Example: Implementing Personalized Responses
Here's a comprehensive implementation of a personalization system that adapts to user communication styles:
import json
import os
from datetime import datetime
from typing import Dict

from openai import OpenAI

class UserProfile:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.communication_style = {
            "formality_level": 0.5,  # 0 = casual, 1 = formal
            "technical_level": 0.5,  # 0 = beginner, 1 = expert
            "verbosity": 0.5,        # 0 = concise, 1 = detailed
            "emoji_usage": False
        }
        self.preferences = {
            "language": "en",
            "timezone": "UTC",
            "topics_of_interest": []
        }
        self.interaction_history = []

class PersonalizedAIAssistant:
    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)
        self.profiles_dir = "user_profiles"
        os.makedirs(self.profiles_dir, exist_ok=True)
        self.users: Dict[str, UserProfile] = {}

    def _get_profile_path(self, user_id: str) -> str:
        return os.path.join(self.profiles_dir, f"{user_id}.json")

    def load_user_profile(self, user_id: str) -> UserProfile:
        if user_id in self.users:
            return self.users[user_id]
        profile_path = self._get_profile_path(user_id)
        if os.path.exists(profile_path):
            with open(profile_path, 'r') as f:
                data = json.load(f)
            profile = UserProfile(user_id)
            profile.communication_style = data.get('communication_style', profile.communication_style)
            profile.preferences = data.get('preferences', profile.preferences)
            profile.interaction_history = data.get('interaction_history', [])
        else:
            profile = UserProfile(user_id)
        self.users[user_id] = profile
        return profile

    def save_user_profile(self, profile: UserProfile):
        data = {
            'communication_style': profile.communication_style,
            'preferences': profile.preferences,
            'interaction_history': profile.interaction_history
        }
        with open(self._get_profile_path(profile.user_id), 'w') as f:
            json.dump(data, f, indent=2)

    def analyze_message(self, message: str) -> dict:
        """Analyze user message to update communication style metrics."""
        return {
            "formality_level": 0.8 if any(word in message.lower() for word in
                ['please', 'thank you', 'sir', 'madam']) else 0.2,
            "technical_level": 0.8 if any(word in message.lower() for word in
                ['api', 'function', 'implementation', 'code']) else 0.3,
            "emoji_usage": '😊' in message or '👍' in message
        }

    def generate_system_prompt(self, profile: UserProfile) -> str:
        """Create personalized system prompt based on user profile."""
        style = "formal" if profile.communication_style["formality_level"] > 0.5 else "casual"
        tech_level = "technical" if profile.communication_style["technical_level"] > 0.5 else "simple"
        emoji_rule = ("feel free to use emojis" if profile.communication_style['emoji_usage']
                      else "avoid using emojis")
        verbosity = "in detail" if profile.communication_style['verbosity'] > 0.5 else "concisely"
        return (f"You are a helpful assistant that communicates in a {style} style. "
                f"Use {tech_level} language and {emoji_rule}. Communicate {verbosity}.")

    def get_response(self, user_id: str, message: str) -> str:
        profile = self.load_user_profile(user_id)

        # Analyze and update user's communication style
        analysis = self.analyze_message(message)
        profile.communication_style.update(analysis)

        # Prepare conversation context
        messages = [
            {"role": "system", "content": self.generate_system_prompt(profile)},
            {"role": "user", "content": message}
        ]

        # Add relevant history if available, converted to chat-message format
        if profile.interaction_history:
            history_messages = []
            for item in profile.interaction_history[-3:]:  # Last 3 interactions
                history_messages.append({"role": "user", "content": item["user_message"]})
                history_messages.append({"role": "assistant", "content": item["assistant_response"]})
            messages[1:1] = history_messages

        # Get AI response
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0.7,
            max_tokens=150
        )
        reply = response.choices[0].message.content

        # Store interaction and save updated profile
        profile.interaction_history.append({
            "timestamp": datetime.now().isoformat(),
            "user_message": message,
            "assistant_response": reply
        })
        self.save_user_profile(profile)
        return reply

# Usage example
if __name__ == "__main__":
    assistant = PersonalizedAIAssistant("your-api-key-here")

    # Example interactions
    for prompt in [
        "Hey there! Can you help me with Python? 😊",
        "Could you explain the technical implementation of APIs?",
        "Dear Sir, I require assistance with programming."
    ]:
        print(assistant.get_response("user123", prompt))
Code Breakdown:
- Class Structure:
  - UserProfile class maintains individual user information:
    - Communication style metrics (formality, technical level, etc.)
    - Personal preferences (language, timezone)
    - Interaction history
  - PersonalizedAIAssistant class handles the core functionality:
    - Profile management (loading/saving)
    - Message analysis
    - Response generation
- Key Features:
  - Persistent Storage: Profiles are saved as JSON files
  - Style Analysis: Examines messages for communication patterns
  - Dynamic Prompting: Generates customized system prompts
  - Context Management: Maintains conversation history
- Personalization Aspects:
  - Communication Style:
    - Formality level detection
    - Technical language adaptation
    - Emoji usage tracking
  - Response Adaptation:
    - Adjusts verbosity based on user preference
    - Maintains consistent style across interactions
    - Incorporates conversation history
This implementation demonstrates how to create an AI assistant that learns and adapts to each user's communication style while maintaining a persistent memory of interactions. The system continuously updates its understanding of user preferences and adjusts its responses accordingly.
Session Resumption
Users can return to conversations after breaks and have the AI understand the full context of previous discussions. This capability enables seamless conversation continuity, where the AI maintains awareness of prior interactions, user preferences, and established context. For example, if a user discusses a software bug on Monday and returns on Wednesday, the AI can recall the specific details of the bug, proposed solutions, and any attempted fixes without requiring the user to repeat information.
This feature is particularly valuable for complex tasks that span multiple sessions, like project planning or technical troubleshooting. During project planning, the AI can maintain records of previously agreed-upon milestones, resource allocations, and team responsibilities across multiple planning sessions. In technical troubleshooting scenarios, it can track the progression of debugging attempts, remember which solutions were already tried, and build upon previous diagnostic steps.
The AI can reference specific points from earlier conversations and maintain continuity across days or even weeks. This long-term context awareness enables the AI to make more informed suggestions, avoid redundant discussions, and provide more personalized assistance based on the user's historical interactions. For instance, if a user previously expressed a preference for certain programming frameworks or methodologies, the AI can incorporate these preferences into future recommendations without requiring explicit reminders.
Here's a practical implementation of session resumption:
import asyncio
import json
import os
from datetime import datetime
from typing import Dict, List, Optional

from openai import AsyncOpenAI

class SessionManager:
    def __init__(self, storage_dir: str = "sessions"):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)

    def _session_path(self, user_id: str) -> str:
        return os.path.join(self.storage_dir, f"{user_id}.json")

    def save_session(self, user_id: str, session_data: dict):
        """Save session data to persistent storage."""
        with open(self._session_path(user_id), "w") as f:
            json.dump(session_data, f, indent=2)

    def load_session(self, user_id: str) -> Optional[dict]:
        """Load session data from storage."""
        try:
            with open(self._session_path(user_id), "r") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

class ConversationManager:
    def __init__(self, api_key: str, session_manager: SessionManager):
        self.client = AsyncOpenAI(api_key=api_key)
        self.session_manager = session_manager

    def prepare_context(self, user_id: str, new_message: str) -> List[Dict]:
        """Prepare conversation context including session history."""
        # Load previous session if it exists
        session = self.session_manager.load_session(user_id)

        # Initialize context with system message
        context = [{
            "role": "system",
            "content": "You are a helpful assistant with memory of past conversations."
        }]

        # Add last 5 messages from the previous session for context
        if session and 'history' in session:
            context.extend(session['history'][-5:])

        # Add new message
        context.append({
            "role": "user",
            "content": new_message
        })
        return context

    async def process_message(self, user_id: str, message: str) -> str:
        """Process new message with session context."""
        context = self.prepare_context(user_id, message)
        try:
            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=context,
                temperature=0.7,
                max_tokens=150
            )
            assistant_message = response.choices[0].message.content

            # Update session with the new exchange (system prompt excluded)
            session_data = {
                'last_interaction': datetime.now().isoformat(),
                'history': context[1:] + [{
                    "role": "assistant",
                    "content": assistant_message
                }]
            }
            self.session_manager.save_session(user_id, session_data)
            return assistant_message
        except Exception as e:
            print(f"Error processing message: {e}")
            return "I apologize, but I encountered an error processing your message."

# Example usage
async def main():
    conversation_manager = ConversationManager("your-api-key-here", SessionManager())

    # First interaction
    response1 = await conversation_manager.process_message(
        "user123",
        "What's the weather like today?"
    )
    print("Response 1:", response1)

    # Later interaction (session resumption)
    response2 = await conversation_manager.process_message(
        "user123",
        "What did we discuss earlier?"
    )
    print("Response 2:", response2)

if __name__ == "__main__":
    asyncio.run(main())
Code Breakdown:
- SessionManager Class:
  - Handles persistent storage of session data
  - Provides methods for saving and loading session information
  - Maintains user-specific session files
- ConversationManager Class:
  - Manages conversation context and history
  - Prepares context by combining previous session data with new messages
  - Handles interaction with the OpenAI API
- Key Features:
  - Asynchronous Processing: Uses async/await for efficient API calls
  - Context Management: Maintains relevant conversation history
  - Error Handling: Includes robust error management
  - Session Persistence: Saves conversations to disk for later retrieval
- Implementation Details:
  - Uses JSON for session storage
  - Limits context to last 5 messages for efficiency
  - Includes timestamp tracking for session management
  - Maintains conversation roles (system, user, assistant)
This example provides a robust foundation for managing multi-session conversations while maintaining context and user history. It's particularly useful for applications requiring persistent conversation memory across multiple interactions.
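Because each saved session carries a `last_interaction` timestamp, stale context can be expired before resumption. A minimal sketch (the field names follow the example above; the 30-day threshold is an arbitrary choice for illustration):

```python
from datetime import datetime, timedelta

def is_session_stale(session: dict, max_age: timedelta = timedelta(days=30)) -> bool:
    """True if the session's last interaction is older than max_age."""
    last = datetime.fromisoformat(session["last_interaction"])
    return datetime.now() - last > max_age

session = {
    "last_interaction": "2020-01-01T09:30:00",
    "history": [{"role": "user", "content": "Hello"}],
}
if is_session_stale(session):
    session["history"] = []  # drop stale context rather than resuming it
```

A check like this would slot naturally into `prepare_context()`, so very old conversations are summarized or discarded instead of being replayed verbatim.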
Persistent Knowledge
The system maintains a robust and comprehensive record of all significant information exchanged during conversations. This persistent knowledge architecture operates on multiple levels:
- Basic Information Management: The system captures and stores essential operational data in a structured manner. This includes comprehensive tracking of calendar entries such as meetings and appointments, with metadata like attendees and agendas. Project timelines are maintained with detailed milestone tracking, dependencies, and phase transitions.
The system records all deadlines systematically, from task-level due dates to major project deliverables. Regular updates are stored chronologically, including daily reports, status changes, and project modifications. This robust information architecture ensures that all scheduling and project-related data remains easily retrievable, supporting efficient project management and team coordination.
- User-Specific Data: The system maintains detailed profiles of individual users that encompass multiple aspects of their interactions:
  - Personal Preferences: Including preferred communication channels, response formats, and specific domain interests
  - Communication Styles: Tracking whether users prefer formal or casual language, technical or simplified explanations, and their typical response length preferences
  - Technical Expertise: Monitoring and adapting to users' demonstrated knowledge levels across different subjects and adjusting explanations accordingly
  - Historical Patterns: Recording timing of interactions, frequently discussed topics, and common questions or concerns
  - Language Patterns: Noting vocabulary usage, technical terminology familiarity, and preferred examples or analogies
  - Learning Progress: Tracking how users' understanding of various topics evolves over time
This comprehensive user profiling enables the system to deliver increasingly tailored responses that match each user's unique needs and preferences, creating a more effective and engaging interaction experience over time.
- Decision Recording: Critical decisions are systematically documented in a comprehensive manner that includes multiple key components:
  - Context: The full background situation, business environment, and constraints that framed the decision
  - Rationale: Detailed reasoning behind the choice, including:
    - Analysis of alternatives considered
    - Risk assessment and mitigation strategies
    - Expected outcomes and success metrics
  - Stakeholders: Complete documentation of:
    - Decision makers and their roles
    - Affected teams and departments
    - External parties involved
  - Implementation Plan:
    - Step-by-step execution strategy
    - Resource allocation details
    - Timeline and milestones
This systematic documentation process creates a detailed and auditable trail that enables teams to:
- Track the evolution of important decisions
- Understand the complete context of past choices
- Learn from previous experiences
- Make more informed decisions in the future
- Maintain accountability and transparency
- Task Management: The system implements a comprehensive task tracking system that monitors various aspects of project execution:
  - Assignment Tracking: Each task is linked to specific team members or departments responsible for its completion, ensuring clear ownership and responsibility
  - Timeline Management: Detailed due dates are maintained, including both final deadlines and intermediate milestones, allowing for better time management
  - Progress Monitoring: Regular status updates are recorded to track task progression, including completed work, current blockers, and remaining steps
  - Dependency Mapping: The system maintains a clear map of task dependencies, helping teams understand how delays or changes in one task might impact others
  - Resource Allocation: Tracks the distribution of work and resources across team members to prevent overload and ensure efficient project execution
- Project Details: The system maintains comprehensive documentation of technical aspects including:
  - Technical Specifications:
    - Detailed system architecture blueprints
    - Complete API documentation and endpoints
    - Database schemas and data models
    - Third-party integration specifications
  - Project Requirements:
    - Business requirements and objectives
    - Technical requirements and constraints
    - User stories and acceptance criteria
    - Scope definitions and boundaries
  - Challenges and Solutions:
    - Identified technical obstacles
    - Implemented workarounds
    - Performance optimization efforts
    - Security measures and updates
  - Implementation Records:
    - Code documentation and examples
    - Architecture decision records
    - Testing strategies and results
    - Deployment procedures
This comprehensive persistent knowledge base serves multiple purposes: it prevents information loss, enables accurate historical references, supports informed decision-making, and allows for seamless continuation of complex discussions across multiple sessions. The system can easily recall and contextualize information from past interactions, making each conversation more efficient and productive.
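As a concrete illustration of the decision-recording structure described above, here is how one such record might be modeled. The field names are our own choice, not a standard; serializing the record to JSON keeps it storable alongside conversation history.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class DecisionRecord:
    """One auditable decision entry; field names are illustrative."""
    title: str
    context: str              # background situation and constraints
    rationale: str            # reasoning behind the choice
    alternatives: List[str] = field(default_factory=list)
    stakeholders: List[str] = field(default_factory=list)
    milestones: List[str] = field(default_factory=list)

decision = DecisionRecord(
    title="Adopt PostgreSQL for the reporting service",
    context="Reporting queries outgrew the existing SQLite store.",
    rationale="Concurrent writes and mature indexing are required.",
    alternatives=["MySQL", "Keep SQLite and shard"],
    stakeholders=["data team", "platform team"],
    milestones=["schema migration", "dual-write period", "cutover"],
)

# asdict() flattens the record for the JSON audit trail
print(json.dumps(asdict(decision), indent=2))
```

A list of such records, keyed by user or project, gives the system the auditable trail the section describes.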
Here's a comprehensive implementation of Persistent Knowledge using OpenAI API:
import json
from datetime import datetime
from typing import Dict, List, Optional

import tiktoken
from openai import AsyncOpenAI

class PersistentKnowledge:
    def __init__(self, api_key: str):
        self.client = AsyncOpenAI(api_key=api_key)
        self.encoding = tiktoken.get_encoding("o200k_base")  # tokenizer used by gpt-4o

    def count_tokens(self, text: str) -> int:
        """Count tokens in text using tiktoken"""
        return len(self.encoding.encode(text))

    def save_knowledge(self, user_id: str, category: str, data: Dict):
        """Save knowledge to persistent storage with categories"""
        filename = f"knowledge_{user_id}_{category}.json"
        entry = {
            "timestamp": datetime.now().isoformat(),
            "category": category,
            "content": data
        }
        try:
            with open(filename, "r") as f:
                existing_data = json.load(f)
        except FileNotFoundError:
            existing_data = []
        existing_data.append(entry)
        with open(filename, "w") as f:
            json.dump(existing_data, f, indent=2)

    def retrieve_knowledge(self, user_id: str, category: str,
                           max_tokens: int = 2000) -> List[Dict]:
        """Retrieve the most recent knowledge entries within a token limit"""
        filename = f"knowledge_{user_id}_{category}.json"
        try:
            with open(filename, "r") as f:
                all_data = json.load(f)
        except FileNotFoundError:
            return []

        # Walk backwards from the newest entries until the budget is spent
        retrieved_data = []
        current_tokens = 0
        for entry in reversed(all_data):
            tokens = self.count_tokens(json.dumps(entry["content"]))
            if current_tokens + tokens > max_tokens:
                break
            retrieved_data.append(entry)
            current_tokens += tokens
        return list(reversed(retrieved_data))

    async def get_ai_response(self, user_id: str, current_input: str,
                              categories: Optional[List[str]] = None) -> str:
        """Generate AI response with context from persistent knowledge"""
        # Build context from stored knowledge
        context = []
        for category in categories or []:
            knowledge = self.retrieve_knowledge(user_id, category)
            if knowledge:
                context.append(f"\nRelevant {category} knowledge:")
                for entry in knowledge:
                    context.append(json.dumps(entry["content"]))

        # Prepare messages for the API
        messages = [
            {
                "role": "system",
                "content": "You are an assistant with access to persistent knowledge. "
                           "Use this context to provide informed responses."
            },
            {
                "role": "user",
                "content": f"Context:\n{''.join(context)}\n\nCurrent query: {current_input}"
            }
        ]
        try:
            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0.7,
                max_tokens=500
            )
            return response.choices[0].message.content
        except Exception as e:
            return f"Error generating response: {str(e)}"

class KnowledgeManager:
    def __init__(self, api_key: str):
        self.knowledge = PersistentKnowledge(api_key)

    async def process_interaction(self, user_id: str, message: str,
                                  categories: Optional[List[str]] = None) -> str:
        """Process user interaction and maintain knowledge"""
        # Save user input
        self.knowledge.save_knowledge(
            user_id,
            "conversations",
            {"role": "user", "message": message}
        )
        # Get AI response with context
        response = await self.knowledge.get_ai_response(user_id, message, categories)
        # Save AI response
        self.knowledge.save_knowledge(
            user_id,
            "conversations",
            {"role": "assistant", "message": response}
        )
        return response

# Example usage
async def main():
    manager = KnowledgeManager("your-api-key")

    # First interaction
    response1 = await manager.process_interaction(
        "user123",
        "What's the best way to learn Python?",
        ["conversations", "preferences"]
    )
    print("Response 1:", response1)

    # Save user preference
    manager.knowledge.save_knowledge(
        "user123",
        "preferences",
        {"learning_style": "hands-on", "preferred_language": "Python"}
    )

    # Later interaction using stored knowledge
    response2 = await manager.process_interaction(
        "user123",
        "Can you suggest some projects based on my learning style?",
        ["conversations", "preferences"]
    )
    print("Response 2:", response2)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Code Breakdown:
- PersistentKnowledge Class:
  - Handles token counting using tiktoken for context management
  - Implements save_knowledge() for storing categorized information
  - Provides retrieve_knowledge() with token limits for context retrieval
  - Manages AI interactions through get_ai_response()
- KnowledgeManager Class:
  - Provides high-level interface for knowledge management
  - Processes user interactions and maintains conversation history
  - Handles saving both user inputs and AI responses
- Key Features:
  - Categorized Knowledge Storage: Organizes information by user and category
  - Token Management: Ensures context stays within model limits
  - Metadata Tracking: Includes timestamps and categories for all stored data
  - Error Handling: Robust error management for file operations and API calls
- Implementation Benefits:
  - Scalable: Handles multiple users and knowledge categories
  - Efficient: Uses token counting to optimize context usage
  - Flexible: Supports various knowledge types and categories
  - Maintainable: Well-structured code with clear separation of concerns
This implementation provides a solid foundation for AI applications requiring persistent knowledge across conversations. It excels at maintaining user preferences, conversation histories, and other ongoing data while efficiently managing token limits.
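The token-budgeting idea at the heart of retrieve_knowledge() can be shown in isolation. In this sketch a whitespace word count stands in for tiktoken so it runs anywhere; the budget walks backwards from the newest entries and then restores chronological order, exactly as the class above does.

```python
def select_recent_within_budget(entries, max_units=10, cost=lambda e: len(e.split())):
    """Keep the newest entries whose combined cost fits the budget, oldest first."""
    selected = []
    used = 0
    for entry in reversed(entries):  # newest entries get priority
        c = cost(entry)
        if used + c > max_units:
            break
        selected.append(entry)
        used += c
    return list(reversed(selected))  # restore chronological order

entries = [
    "very old note about project kickoff details",   # 7 words
    "mid project update",                            # 3 words
    "latest decision recorded here",                 # 4 words
]
print(select_recent_within_budget(entries, max_units=8))
```

Swapping the `cost` function for a real tokenizer's `len(encoding.encode(text))` turns this into the production version without other changes.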
Conversation Summarization
By maintaining detailed conversation histories, the system can generate comprehensive summaries that capture key points, decisions, and action items. These summaries serve multiple critical functions:
- Context Refreshing: When participants begin new conversation sessions, the system provides concise yet comprehensive summaries of previous discussions. These summaries serve as efficient briefings that:
  - Quickly orient participants on key discussion points
  - Highlight important decisions and outcomes
  - Identify ongoing action items
  - Refresh memory on critical context
This eliminates the time-consuming process of reviewing extensive conversation logs and ensures all participants can immediately engage productively in the current discussion with full context awareness.
- Progress Tracking: Regular summaries serve as a comprehensive tracking mechanism for ongoing discussions and projects. By maintaining detailed records of project evolution, teams can:
  - Monitor Development Phases
    - Track progression from initial concepts to implementation
    - Document iterative improvements and refinements
    - Record key turning points in project direction
  - Analyze Decision History
    - Capture the context behind important choices
    - Document alternative options considered
    - Track outcomes of implemented decisions
  - Identify Project Trends
    - Spot recurring challenges or bottlenecks
    - Recognize successful patterns to replicate
    - Monitor velocity and momentum
  - Facilitate Team Alignment
    - Maintain shared understanding of progress
    - Enable data-driven course corrections
    - Support informed resource allocation
- Knowledge Extraction: The system employs advanced parsing techniques to identify and extract critical information from conversations, including:
  - Key decisions and their rationale
    - Strategic choices made during discussions
    - Supporting evidence and justification
    - Alternative options considered
  - Action items and their owners
    - Specific tasks assigned to team members
    - Clear responsibility assignments
    - Follow-up requirements
  - Important deadlines and milestones
    - Project timeline markers
    - Critical delivery dates
    - Review and checkpoint schedules
  - Unresolved questions or concerns
    - Open technical issues
    - Pending decisions
    - Areas needing clarification
  - Agreements and commitments made
    - Formal decisions reached
    - Resource allocation agreements
    - Timeline commitments
- Report Generation: Summaries can be automatically compiled into various types of reports:
  - Executive briefings
    - High-level overviews for stakeholders
    - Key decisions and strategic implications
    - Resource allocation summaries
  - Meeting minutes
    - Detailed discussion points and outcomes
    - Action items and assignees
    - Timeline commitments made
  - Progress updates
    - Milestone achievements and delays
    - Current blockers and challenges
    - Next steps and priorities
  - Project status reports
    - Overall project health indicators
    - Resource utilization metrics
    - Risk assessments and mitigation strategies
The ability to condense and recall relevant information not only makes conversations more efficient and focused but also ensures that critical information is never lost and can be easily accessed when needed. This systematic approach to conversation summarization helps maintain clarity and continuity across long-term interactions, especially in complex projects or ongoing discussions involving multiple participants.
Example: Implementing Conversation Summarization
Here's a practical implementation of a conversation summarizer using OpenAI's API:
import asyncio
import json
from datetime import datetime
from typing import Dict, List

from openai import AsyncOpenAI

class ConversationSummarizer:
    def __init__(self, api_key: str):
        self.client = AsyncOpenAI(api_key=api_key)

    def create_summary_prompt(self, messages: List[Dict]) -> str:
        """Create a prompt for summarization from messages"""
        conversation = "\n".join(
            f"{msg['role'].title()}: {msg['content']}"
            for msg in messages
        )
        return (
            "Please provide a concise summary of the following conversation, "
            "highlighting key points, decisions, and action items:\n\n"
            f"{conversation}\n\n"
            "Summary should include:\n"
            "1. Main topics discussed\n"
            "2. Key decisions made\n"
            "3. Action items and owners\n"
            "4. Unresolved questions"
        )

    async def generate_summary(self, messages: List[Dict]) -> Dict:
        """Generate a structured summary of the conversation"""
        try:
            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": self.create_summary_prompt(messages)
                }],
                temperature=0.7,
                max_tokens=500
            )
            return {
                "timestamp": datetime.now().isoformat(),
                "message_count": len(messages),
                "summary": response.choices[0].message.content
            }
        except Exception as e:
            return {
                "error": f"Failed to generate summary: {str(e)}",
                "timestamp": datetime.now().isoformat()
            }

    def save_summary(self, summary: Dict, filename: str = "summaries.json"):
        """Save summary to JSON file"""
        try:
            with open(filename, "r") as f:
                summaries = json.load(f)
        except FileNotFoundError:
            summaries = []
        summaries.append(summary)
        with open(filename, "w") as f:
            json.dump(summaries, f, indent=2)

# Example usage
async def main():
    summarizer = ConversationSummarizer("your-api-key")

    # Sample conversation
    messages = [
        {"role": "user", "content": "Let's discuss the new feature implementation."},
        {"role": "assistant", "content": "Sure! What specific aspects would you like to focus on?"},
        {"role": "user", "content": "We need to implement user authentication by next week."},
        {"role": "assistant", "content": "I understand. Let's break down the requirements and timeline."}
    ]

    # Generate and save summary
    summary = await summarizer.generate_summary(messages)
    summarizer.save_summary(summary)
    print("Summary:", summary.get("summary", summary))

if __name__ == "__main__":
    asyncio.run(main())
Code Breakdown:
- Class Structure:
  - ConversationSummarizer class handles all summarization operations
  - Initialization with API key setup
  - Methods for prompt creation, summary generation, and storage
- Key Features:
  - Structured prompt generation for consistent summaries
  - Async API calls for better performance
  - Error handling with structured error responses
  - Persistent storage of summaries
- Implementation Benefits:
  - Scalable: Handles conversations of varying lengths
  - Structured Output: Organized summaries with key information
  - Historical Tracking: Maintains summary history
  - Error Resilient: Failures return a structured error object instead of raising
This example provides a reliable way to generate and maintain conversation summaries, making it easier to track discussion progress and key decisions over time.
7.3.2 Core Architecture
To create an effective memory system for AI interactions, three essential components work together in a coordinated manner to maintain context and ensure conversational continuity. These components form a sophisticated architecture that enables AI systems to access, process, and utilize historical information effectively during ongoing conversations.
The storage component preserves conversation history, the retrieval mechanism intelligently fetches relevant past interactions, and the injection system seamlessly incorporates this historical context into current conversations.
Together, these three pillars create a robust foundation that enables AI systems to maintain meaningful, context-aware dialogues across multiple interactions:
Storage
The foundation of memory management where all interactions are systematically saved and preserved for future use. This critical component serves as the backbone of any AI conversation system:
- Can utilize various storage solutions like JSON files, SQL databases, or cloud storage
- JSON files offer simplicity and portability for smaller applications
- SQL databases provide robust querying and indexing for larger datasets
- Cloud storage enables scalable, distributed access across multiple services
- Should include metadata like timestamps and user identifiers
- Timestamps enable chronological tracking and time-based filtering
- User IDs maintain conversation threads and personalization
- Additional metadata can track conversation topics and contexts
- Must be organized for efficient retrieval and scaling
- Implement proper indexing for quick access to relevant data
- Use data partitioning for improved performance
- Consider compression strategies for long-term storage
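As a concrete sketch of these metadata bullets, a stored interaction could be a small dictionary like the one below. The field names here (user_id, topic) are illustrative choices, not a required schema:

```python
import json
from datetime import datetime, timezone

def make_record(user_id: str, role: str, content: str, topic: str = "") -> dict:
    """Wrap a single message with the metadata the storage layer needs."""
    return {
        "user_id": user_id,    # keeps each user's thread separable
        "role": role,          # "system", "user", or "assistant"
        "content": content,
        "topic": topic,        # optional tag for later filtering
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = make_record("user_123", "user", "How should I cache API responses?",
                     topic="performance")
print(json.dumps(record, indent=2))
```

Because each record is plain JSON, the same shape works whether the backing store is a flat file, a SQL row, or a cloud document store.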
Retrieval
The intelligent system for accessing and filtering relevant past conversations serves as a crucial component in managing conversation history:
- Implements search algorithms to find context-appropriate historical data
- Uses semantic search to match similar topics and themes
- Employs fuzzy matching for flexible text comparison
- Indexes conversations for quick retrieval
- Uses parameters like recency, relevance, and conversation thread
- Prioritizes recent interactions for immediate context
- Weighs relevance based on topic similarity scores
- Maintains thread continuity by tracking conversation flows
- Manages token limits by selecting the most important context
- Implements smart truncation strategies
- Prioritizes key information while removing redundant content
- Dynamically adjusts context window based on model limitations
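A minimal sketch of the recency-plus-relevance weighting described above, using plain keyword overlap as a stand-in for real semantic search (the 50/50 weighting is an arbitrary illustrative choice):

```python
def score(message: dict, query_words: set, index: int, total: int) -> float:
    """Blend recency (position in history) with keyword overlap."""
    recency = (index + 1) / total                    # newer messages score higher
    words = set(message["content"].lower().split())
    overlap = len(words & query_words) / (len(query_words) or 1)
    return 0.5 * recency + 0.5 * overlap

def retrieve(history: list, query: str, k: int = 3) -> list:
    """Pick the k most promising past messages for the current query."""
    q = set(query.lower().split())
    ranked = sorted(
        enumerate(history),
        key=lambda pair: score(pair[1], q, pair[0], len(history)),
        reverse=True,
    )
    # keep the top-k, then restore chronological order before injection
    return [msg for _, msg in sorted(ranked[:k], key=lambda pair: pair[0])]
```

In production you would swap the keyword overlap for embedding similarity, but the overall shape — score, rank, truncate, re-sort — stays the same.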
Injection
The process of seamlessly incorporating historical context requires careful handling to maintain conversation coherence:
- Strategically places retrieved messages into the current conversation flow
- Determines optimal insertion points for historical context
- Filters and prioritizes relevant historical information
- Balances new and historical content for natural flow
- Maintains proper message ordering and relationships
- Preserves chronological sequence of interactions
- Respects conversation threading and reply chains
- Links related topics and discussions appropriately
- Ensures smooth context integration without disrupting the conversation
- Avoids abrupt context switches or information overload
- Uses natural transition phrases and references
- Maintains consistent tone and conversation style
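Putting the three pillars together, injection largely amounts to assembling the messages array in the right order. A minimal sketch, assuming the retrieved messages are already chronologically sorted:

```python
def inject_context(system_prompt: str, retrieved: list, new_message: str) -> list:
    """System prompt first, then historical context, then the new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(retrieved)  # historical messages, oldest first
    messages.append({"role": "user", "content": new_message})
    return messages

messages = inject_context(
    "You are a helpful assistant with memory of past conversations.",
    [{"role": "user", "content": "I prefer Python examples."},
     {"role": "assistant", "content": "Noted - I'll use Python."}],
    "Show me how to parse JSON.",
)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```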
7.3.3 How to Store and Retrieve JSON-based Message Histories in Python
In this section, we'll explore a practical implementation that demonstrates how to create a persistent memory system using JSON files. This approach offers a straightforward way to maintain conversation context across multiple sessions while remaining scalable and easy to maintain.
Our implementation will focus on three key aspects: efficiently storing message histories in JSON format, retrieving relevant conversation context when needed, and managing the data structure to ensure optimal performance. This solution is particularly useful for developers building chatbots, virtual assistants, or any application requiring persistent conversation history.
Step 1: Build the Memory Manager
This module handles storing user interactions in JSON format and retrieving only the relevant ones.
import os
import json
from typing import List, Dict

MEMORY_DIR = "user_memory"
os.makedirs(MEMORY_DIR, exist_ok=True)

# Constants
MAX_HISTORY_MESSAGES = 5  # Truncate history to last 5 messages to manage tokens

def get_memory_path(user_id: str) -> str:
    return os.path.join(MEMORY_DIR, f"{user_id}.json")

def load_history(user_id: str) -> List[Dict[str, str]]:
    path = get_memory_path(user_id)
    if not os.path.exists(path):
        return []
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError:
        return []

def store_interaction(user_id: str, role: str, content: str) -> None:
    message = {"role": role, "content": content}
    path = get_memory_path(user_id)
    history = load_history(user_id)
    history.append(message)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f, indent=2)

def get_recent_history(user_id: str, limit: int = MAX_HISTORY_MESSAGES) -> List[Dict[str, str]]:
    history = load_history(user_id)
    return history[-limit:]
Let's break down this code:
1. Initial Setup
   - Creates a "user_memory" directory to store conversation histories
   - Sets a maximum limit of 5 messages for history management
2. Core Functions
   - get_memory_path(user_id): Creates a unique JSON file path for each user
   - load_history(user_id):
     - Attempts to read the user's conversation history
     - Returns an empty list if the file doesn't exist or is corrupted
   - store_interaction(user_id, role, content):
     - Saves new messages to the user's history file
     - Appends the message to existing history
     - Stores in JSON format with proper indentation
   - get_recent_history(user_id, limit):
     - Retrieves the most recent messages
     - Respects the MAX_HISTORY_MESSAGES limit (5 messages)
3. Key Features
   - Persistent storage: Each user's conversations are saved in separate JSON files
   - Scalability: System can handle multiple users with individual files
   - Controlled context: Allows specific control over how much history to maintain
   - Debug-friendly: JSON format makes it easy to inspect stored conversations
Step 2: Create the Chat Engine Using OpenAI API
Now let’s integrate this memory system with the OpenAI API. We’ll load previous messages, add the new prompt, query the model, and save the response.
import os
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

SYSTEM_PROMPT = {
    "role": "system",
    "content": "You are a helpful assistant that remembers the user's previous messages."
}

def continue_conversation(user_id: str, user_input: str) -> str:
    # Validate input
    if not user_input.strip():
        return "Input cannot be empty."

    # Load memory and prepare messages
    recent_history = get_recent_history(user_id)
    messages = [SYSTEM_PROMPT] + recent_history + [{"role": "user", "content": user_input}]

    # Call OpenAI API
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4o",
            messages=messages,
            max_tokens=300,
            temperature=0.7
        )
        assistant_reply = response["choices"][0]["message"]["content"]

        # Store both messages
        store_interaction(user_id, "user", user_input)
        store_interaction(user_id, "assistant", assistant_reply)

        return assistant_reply
    except Exception as e:
        return f"Something went wrong: {e}"
Let’s break down this code:
1. Initial Setup
- Imports required libraries (openai and dotenv)
- Loads environment variables and sets up OpenAI API key
- Defines a system prompt that establishes the assistant's role
2. Main Function: continue_conversation
- Takes user_id and user_input as parameters
- Input validation to check for empty messages
- Loads conversation history using get_recent_history (defined in previous section)
- Constructs messages array combining:
- System prompt
- Recent conversation history
- Current user input
3. API Interaction
- Makes API call to OpenAI with parameters:
- Uses "gpt-4o" model
- Sets max_tokens to 300
- Uses temperature of 0.7 for balanced creativity
- Extracts assistant's reply from the response
4. Memory Management
- Stores both the user's input and assistant's reply using store_interaction
- Handles errors gracefully with try-except block
This portion creates a stateful conversation system that maintains context across multiple interactions while managing the conversation flow efficiently.
Step 3: Test the Conversation Memory
if __name__ == "__main__":
    user_id = "user_456"

    print("User: What are the best Python libraries for data science?")
    reply1 = continue_conversation(user_id, "What are the best Python libraries for data science?")
    print("Assistant:", reply1)

    print("\nUser: Could you remind me which ones you mentioned?")
    reply2 = continue_conversation(user_id, "Could you remind me which ones you mentioned?")
    print("Assistant:", reply2)
Let's break down this code example for testing conversation memory:
1. Entry Point Check
- The code uses the standard Python if __name__ == "__main__": idiom to ensure this code only runs when the file is executed directly
2. User Setup
- Creates a test user with ID "user_456"
3. Test Conversation Flow
- Demonstrates a two-turn conversation where:
- First turn: Asks about Python libraries for data science
- Second turn: Asks for a reminder of previously mentioned libraries, testing the memory system
4. Implementation Details
- Each interaction uses the continue_conversation() function to:
- Process the user input
- Generate and store the response
- Print both the user input and assistant's response
This test code effectively demonstrates how the system maintains context between multiple interactions, allowing the assistant to reference previous responses when answering follow-up questions.
Benefits of This Approach
- Persistent: All conversations are stored locally by user, ensuring that no interaction history is lost. This means your application can maintain context across multiple sessions, even if the server restarts or the application closes.
- Scalable (to a point): By storing each user's conversation history in their own dedicated JSON file, the system can handle multiple users efficiently. This approach works well for small to medium-sized applications, though for very large deployments you might want to consider a database solution.
- Controllable context: The system gives you complete control over how much conversation history to include in each interaction. You can adjust the memory window size, filter by relevance, or implement custom logic for selecting which previous messages to include in the context.
- Readable: The JSON file format makes it simple to inspect, debug, and modify stored conversations. This is invaluable during development and maintenance, as you can easily view the conversation history in any text editor and validate the data structure.
Optional Enhancements
- Summarization: Instead of storing every message verbatim, implement periodic summarization of conversation history. This technique involves automatically generating concise summaries of longer conversation segments, which helps:
- Reduce token usage in API calls
- Maintain essential context while removing redundant details
- Create a more efficient memory structure
For example, multiple messages about a specific topic could be condensed into a single summary statement.
- Vector Search (Advanced): Transform messages into numerical vectors using embedding models, enabling sophisticated retrieval based on semantic meaning. This approach offers several advantages:
- Discover contextually relevant historical messages even if they use different words
- Prioritize messages based on their relevance to the current conversation
- Enable fast similarity searches across large conversation histories
This is particularly useful for long-running conversations or when specific context needs to be recalled.
- Token Budgeting: Implement smart token management strategies to optimize context window usage. This includes:
- Setting dynamic limits based on conversation importance
- Implementing intelligent pruning of older, less relevant messages
- Maintaining a balance between recent context and important historical information
This ensures you stay within API token limits while preserving the most valuable conversation context.
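The token-budgeting idea above can be sketched as a backward walk over the history. The 4-characters-per-token figure is a rough heuristic for English text, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fit_to_budget(history: list, budget: int) -> list:
    """Walk backwards from the newest message, keeping what fits."""
    kept, used = [], 0
    for msg in reversed(history):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

For production use, a real tokenizer would give exact counts, and the pruning policy could weigh importance as well as recency.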
Best Practices
- Keep the system prompt consistent across interactions
  - Maintain identical prompt wording and instructions throughout the entire conversation lifecycle
  - Use version control to track any system prompt changes across deployments
  - This prevents confusion and contradictory responses by maintaining consistent context
  - It also ensures the AI maintains a reliable personality and behavioral pattern throughout interactions
- Don't overload the context: store everything, but retrieve selectively
  - Implement a comprehensive storage system that maintains complete conversation histories in your database
  - Develop intelligent retrieval algorithms that prioritize relevant context for API calls
  - Use semantic search or embedding-based similarity to find pertinent historical messages
  - Balance token usage by implementing smart pruning strategies for older messages
- Label stored messages clearly (role, content, timestamp) for future filtering or summarization
  - Role: Carefully identify and tag message sources (system, user, or assistant) to maintain clear conversation flow and enable role-based filtering
  - Content: Implement consistent formatting standards for message content, including handling of special characters and maintaining data integrity
  - Timestamp: Add precise temporal metadata to enable sophisticated time-based operations like conversation segmentation and contextual relevance scoring
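With role and timestamp labels in place, later filtering becomes straightforward. A hypothetical example, assuming ISO-8601 timestamps like those produced by datetime.isoformat(), which compare correctly as plain strings:

```python
records = [
    {"role": "user", "content": "plan the sprint", "timestamp": "2025-03-01T10:00:00"},
    {"role": "assistant", "content": "here is a draft plan", "timestamp": "2025-03-01T10:00:05"},
    {"role": "user", "content": "looks good", "timestamp": "2025-03-08T09:00:00"},
]

# ISO-8601 strings sort lexicographically, so plain comparison works
recent_user_msgs = [
    r for r in records
    if r["role"] == "user" and r["timestamp"] >= "2025-03-05T00:00:00"
]
print([r["content"] for r in recent_user_msgs])  # ['looks good']
```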
Long-term memory isn't native to OpenAI's chat models (yet), but with thoughtful engineering, you can create sophisticated memory systems. This limitation means that by default, the model treats each interaction as isolated, without any knowledge of previous conversations. However, developers can implement custom solutions to overcome this limitation.
Here's a detailed explanation of how it works: While the model itself doesn't maintain memories between conversations, your application acts as a memory manager. It stores each interaction in a structured format, typically using databases or file systems. This stored data includes not just the conversation content, but also metadata like timestamps, user information, and conversation topics. By carefully managing these stored conversations and selectively retrieving relevant pieces, you create an illusion of continuous memory - making the AI appear to "remember" previous interactions.
This memory system transforms your AI assistant in several powerful ways:
- Continuity: The assistant can reference past conversations and maintain context over extended periods, creating seamless interactions that build upon previous discussions. For example, if a user mentions their preference for Python programming in one conversation, the system can reference this in future interactions.
- Personality: Consistent response patterns and remembered preferences create a more distinct personality. This includes maintaining a consistent tone, remembering user preferences, and adapting communication styles based on past interactions.
- Understanding: By accessing historical context, responses become more informed and personalized. The system can recall specific details from previous conversations, making interactions feel more natural and contextually aware.
- Depth: The ability to build upon previous conversations enables more sophisticated interactions, allowing for complex problem-solving and long-term project support.
This approach scales remarkably well across different usage scenarios, from individual users to enterprise-level applications. Whether you're building a personal assistant for a single user or a system that serves thousands, the core principle remains the same: you're creating an intelligent memory layer that sits between the user and the API. This architectural approach provides several key capabilities:
- Grow: Continuously accumulate new interactions and learnings, building a rich history of user interactions and preferences over time. This growing knowledge base becomes increasingly valuable for personalizing responses.
- Summarize: Condense lengthy conversation histories into manageable contexts, using advanced techniques like semantic clustering and importance scoring to maintain the most relevant information.
- Adapt: Adjust its retrieval strategies based on conversation patterns, learning which types of historical context are most valuable for different types of interactions.
All while leveraging the same underlying OpenAI API for the actual language processing. This combination of structured memory management and powerful language processing creates a system that can maintain context and personality across multiple conversations while staying within the technical constraints of the API.
7.3 Storing and Retrieving Past Interactions
In the real world, human conversations build upon shared history and previous interactions. When we talk to someone we know, we naturally draw upon our past experiences with them - their preferences, previous discussions, and the context we've built together. This natural memory system is what makes our conversations feel continuous and personally meaningful.
While OpenAI's language models don't inherently possess long-term memory capabilities, developers can create sophisticated systems to replicate this natural memory process. By implementing a storage and retrieval system for past interactions, you can create an AI assistant that appears to "remember" previous conversations. This involves carefully recording user interactions, storing relevant context, and strategically retrieving this information when needed.
This section demonstrates how to construct a lightweight but powerful memory system that enhances your AI assistant's capabilities. By maintaining conversation history and user preferences across multiple sessions, you can create more meaningful interactions that feel less like isolated exchanges and more like ongoing conversations. Your assistant will be able to reference past discussions, remember user preferences, and maintain context awareness - making it feel more like interacting with a knowledgeable colleague rather than starting fresh with each interaction.
7.3.1 Why Store Interactions?
Storing past interactions unlocks several powerful capabilities that enhance the AI's ability to provide meaningful and contextual responses:
Personalized Responses
The system learns and adapts to individual users by maintaining a detailed profile of their interactions and preferences over time. This personalization happens on multiple levels:
- Communication Style: The system tracks how users express themselves by analyzing multiple aspects of their communication patterns:
- Formality level: Whether they use casual language ("hey there!") or formal address ("Dear Sir/Madam")
- Humor usage: Their tendency to use jokes, emojis, or playful language
- Conversation pace: If they prefer quick exchanges or detailed, lengthy discussions
- Vocabulary choices: Technical vs. simplified language
- Cultural references: Professional, academic, or pop culture references
For example, if a user consistently uses informal language like "hey" and "thanks!" with emojis, the system adapts by responding in a friendly, casual tone. Conversely, when interacting with business users who maintain formal language and professional terms, the system automatically adjusts to use appropriate business etiquette and industry-standard terminology.
This adaptive communication ensures more natural and effective interactions by matching each user's unique communication style and preferences.
- Technical Proficiency: By analyzing past interactions, the system gauges users' expertise levels in different domains. This allows it to automatically adjust its explanations based on demonstrated knowledge.
For instance, when discussing programming, the system might use advanced terminology like "polymorphism" and "dependency injection" with experienced developers, while offering simpler explanations using real-world analogies for beginners. The system continuously refines this assessment through ongoing interactions - if a user demonstrates increased understanding over time, the technical depth of explanations adjusts accordingly. This adaptive approach ensures that experts aren't slowed down by basic explanations while newcomers aren't overwhelmed by complex technical details.
- Historical Context: The system maintains comprehensive records of previous discussions, projects, and decisions, enabling it to reference past conversations with precision and relevance. This historical tracking operates on multiple levels:
- Conversation Threading: The system can follow the progression of specific topics across multiple sessions, understanding how discussions evolve and build upon each other.
- Project Milestones: Important decisions, agreements, and project updates are recorded and can be referenced to maintain consistency in future discussions.
- User Preferences Evolution: The system tracks how user preferences and requirements change over time, adapting its responses accordingly.
- Contextual References: When addressing current topics, the system can intelligently reference related past discussions to provide more informed and nuanced responses.
This sophisticated context management creates a seamless conversational experience where users feel understood and valued, as the system demonstrates awareness of their history and ongoing needs. For example, if a user previously discussed challenges with a specific programming framework, the system can reference those earlier conversations when providing new solutions or updates.
- Customization Preferences: The system maintains and applies detailed user preferences across sessions, including:
- Preferred language and regional variations
- Language selection (e.g., English, Spanish, Mandarin)
- Regional dialects and localizations
- Currency and measurement units
- Format preferences (bullet points vs. paragraphs)
- Document structure preferences (hierarchical vs. flat)
- Visual organization (lists, tables, or flowing text)
- Code formatting conventions when applicable
- Level of detail desired in responses
- Brief summaries vs. comprehensive explanations
- Technical depth of content
- Inclusion of examples and analogies
- Specific terminology or naming conventions
- Industry-specific vocabulary
- Preferred technical frameworks or methodologies
- Company-specific terminology
- Time zones and working hours
- Meeting scheduling preferences
- Notification timing preferences
- Availability windows for synchronous communication
- Preferred language and regional variations
This comprehensive approach to personalization helps create a more natural, efficient, and engaging interaction that feels tailored to each individual user's needs and preferences.
Example: Implementing Personalized Responses
Here's a comprehensive implementation of a personalization system that adapts to user communication styles:
import json
import os
from datetime import datetime
from typing import Dict, List, Optional
import openai
class UserProfile:
def __init__(self, user_id: str):
self.user_id = user_id
self.communication_style = {
"formality_level": 0.5, # 0 = casual, 1 = formal
"technical_level": 0.5, # 0 = beginner, 1 = expert
"verbosity": 0.5, # 0 = concise, 1 = detailed
"emoji_usage": False
}
self.preferences = {
"language": "en",
"timezone": "UTC",
"topics_of_interest": []
}
self.interaction_history = []
class PersonalizedAIAssistant:
def __init__(self, api_key: str):
self.api_key = api_key
openai.api_key = api_key
self.profiles_dir = "user_profiles"
os.makedirs(self.profiles_dir, exist_ok=True)
self.users: Dict[str, UserProfile] = {}
def _get_profile_path(self, user_id: str) -> str:
return os.path.join(self.profiles_dir, f"{user_id}.json")
def load_user_profile(self, user_id: str) -> UserProfile:
if user_id in self.users:
return self.users[user_id]
profile_path = self._get_profile_path(user_id)
if os.path.exists(profile_path):
with open(profile_path, 'r') as f:
data = json.load(f)
profile = UserProfile(user_id)
profile.communication_style = data.get('communication_style', profile.communication_style)
profile.preferences = data.get('preferences', profile.preferences)
profile.interaction_history = data.get('interaction_history', [])
else:
profile = UserProfile(user_id)
self.users[user_id] = profile
return profile
def save_user_profile(self, profile: UserProfile):
data = {
'communication_style': profile.communication_style,
'preferences': profile.preferences,
'interaction_history': profile.interaction_history
}
with open(self._get_profile_path(profile.user_id), 'w') as f:
json.dump(data, f, indent=2)
def analyze_message(self, message: str) -> dict:
"""Analyze user message to update communication style metrics."""
return {
"formality_level": 0.8 if any(word in message.lower() for word in
['please', 'thank you', 'sir', 'madam']) else 0.2,
"technical_level": 0.8 if any(word in message.lower() for word in
['api', 'function', 'implementation', 'code']) else 0.3,
"emoji_usage": '😊' in message or '👍' in message
}
def generate_system_prompt(self, profile: UserProfile) -> str:
"""Create personalized system prompt based on user profile."""
style = "formal" if profile.communication_style["formality_level"] > 0.5 else "casual"
tech_level = "technical" if profile.communication_style["technical_level"] > 0.5 else "simple"
return f"""You are a helpful assistant that communicates in a {style} style.
Use {tech_level} language and {'feel free to use emojis'
if profile.communication_style['emoji_usage'] else 'avoid using emojis'}.
Communicate {'in detail' if profile.communication_style['verbosity'] > 0.5
else 'concisely'}."""
async def get_response(self, user_id: str, message: str) -> str:
profile = self.load_user_profile(user_id)
# Analyze and update user's communication style
analysis = self.analyze_message(message)
profile.communication_style.update(analysis)
# Prepare conversation context
messages = [
{"role": "system", "content": self.generate_system_prompt(profile)},
{"role": "user", "content": message}
]
# Add relevant history if available
if profile.interaction_history:
recent_history = profile.interaction_history[-3:] # Last 3 interactions
messages[1:1] = recent_history
# Get AI response
response = openai.ChatCompletion.create(
model="gpt-4o",
messages=messages,
temperature=0.7,
max_tokens=150
)
# Store interaction
interaction = {
"timestamp": datetime.utcnow().isoformat(),
"user_message": message,
"assistant_response": response.choices[0].message.content
}
profile.interaction_history.append(interaction)
# Save updated profile
self.save_user_profile(profile)
return response.choices[0].message.content
# Usage example
if __name__ == "__main__":
assistant = PersonalizedAIAssistant("your-api-key-here")
# Example interactions
responses = [
assistant.get_response("user123", "Hey there! Can you help me with Python? 😊"),
assistant.get_response("user123", "Could you explain the technical implementation of APIs?"),
assistant.get_response("user123", "Dear Sir, I require assistance with programming.")
]
Code Breakdown:
- Class Structure:
- UserProfile class maintains individual user information:
- Communication style metrics (formality, technical level, etc.)
- Personal preferences (language, timezone)
- Interaction history
- PersonalizedAIAssistant class handles the core functionality:
- Profile management (loading/saving)
- Message analysis
- Response generation
- UserProfile class maintains individual user information:
- Key Features:
- Persistent Storage: Profiles are saved as JSON files
- Style Analysis: Examines messages for communication patterns
- Dynamic Prompting: Generates customized system prompts
- Context Management: Maintains conversation history
- Personalization Aspects:
- Communication Style:
- Formality level detection
- Technical language adaptation
- Emoji usage tracking
- Response Adaptation:
- Adjusts verbosity based on user preference
- Maintains consistent style across interactions
- Incorporates conversation history
- Communication Style:
This implementation demonstrates how to create an AI assistant that learns and adapts to each user's communication style while maintaining a persistent memory of interactions. The system continuously updates its understanding of user preferences and adjusts its responses accordingly.
Session Resumption
Users can return to conversations after breaks and have the AI understand the full context of previous discussions. This capability enables seamless conversation continuity, where the AI maintains awareness of prior interactions, user preferences, and established context. For example, if a user discusses a software bug on Monday and returns on Wednesday, the AI can recall the specific details of the bug, proposed solutions, and any attempted fixes without requiring the user to repeat information.
This feature is particularly valuable for complex tasks that span multiple sessions, like project planning or technical troubleshooting. During project planning, the AI can maintain records of previously agreed-upon milestones, resource allocations, and team responsibilities across multiple planning sessions. In technical troubleshooting scenarios, it can track the progression of debugging attempts, remember which solutions were already tried, and build upon previous diagnostic steps.
The AI can reference specific points from earlier conversations and maintain continuity across days or even weeks. This long-term context awareness enables the AI to make more informed suggestions, avoid redundant discussions, and provide more personalized assistance based on the user's historical interactions. For instance, if a user previously expressed a preference for certain programming frameworks or methodologies, the AI can incorporate these preferences into future recommendations without requiring explicit reminders.
Here's a practical implementation of session resumption:
from datetime import datetime
import json
import os
import openai
from typing import List, Dict, Optional

class SessionManager:
    def __init__(self, api_key: str):
        self.api_key = api_key
        openai.api_key = api_key
        self.sessions: Dict[str, Dict] = {}

    def save_session(self, user_id: str, session_data: dict):
        """Save session data to persistent storage."""
        os.makedirs("sessions", exist_ok=True)  # ensure the storage directory exists
        with open(f"sessions/{user_id}.json", "w") as f:
            json.dump(session_data, f)
def load_session(self, user_id: str) -> Optional[dict]:
"""Load session data from storage."""
try:
with open(f"sessions/{user_id}.json", "r") as f:
return json.load(f)
except FileNotFoundError:
return None
class ConversationManager:
def __init__(self, session_manager: SessionManager):
self.session_manager = session_manager
self.current_context: List[Dict] = []
def prepare_context(self, user_id: str, new_message: str) -> List[Dict]:
"""Prepare conversation context including session history."""
# Load previous session if exists
session = self.session_manager.load_session(user_id)
# Initialize context with system message
context = [{
"role": "system",
"content": "You are a helpful assistant with memory of past conversations."
}]
# Add relevant history from previous session
if session and 'history' in session:
# Add last 5 messages from previous session for context
context.extend(session['history'][-5:])
# Add new message
context.append({
"role": "user",
"content": new_message
})
return context
async def process_message(self, user_id: str, message: str) -> str:
"""Process new message with session context."""
context = self.prepare_context(user_id, message)
try:
response = await openai.ChatCompletion.acreate(
model="gpt-4o",
messages=context,
temperature=0.7,
max_tokens=150
)
assistant_message = response.choices[0].message.content
# Update session with new interaction
session_data = {
'last_interaction': datetime.now().isoformat(),
'history': context + [{
"role": "assistant",
"content": assistant_message
}]
}
self.session_manager.save_session(user_id, session_data)
return assistant_message
except Exception as e:
print(f"Error processing message: {e}")
return "I apologize, but I encountered an error processing your message."
# Example usage
async def main():
session_manager = SessionManager("your-api-key-here")
conversation_manager = ConversationManager(session_manager)
# First interaction
response1 = await conversation_manager.process_message(
"user123",
"What's the weather like today?"
)
print("Response 1:", response1)
# Later interaction (session resumption)
response2 = await conversation_manager.process_message(
"user123",
"What did we discuss earlier?"
)
print("Response 2:", response2)
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Code Breakdown:
- SessionManager Class:
- Handles persistent storage of session data
- Provides methods for saving and loading session information
- Maintains user-specific session files
- ConversationManager Class:
- Manages conversation context and history
- Prepares context by combining previous session data with new messages
- Handles interaction with the OpenAI API
- Key Features:
- Asynchronous Processing: Uses async/await for efficient API calls
- Context Management: Maintains relevant conversation history
- Error Handling: Includes robust error management
- Session Persistence: Saves conversations to disk for later retrieval
- Implementation Details:
- Uses JSON for session storage
- Limits context to last 5 messages for efficiency
- Includes timestamp tracking for session management
- Maintains conversation roles (system, user, assistant)
This example provides a robust foundation for managing multi-session conversations while maintaining context and user history. It's particularly useful for applications requiring persistent conversation memory across multiple interactions.
Persistent Knowledge
The system maintains a robust and comprehensive record of all significant information exchanged during conversations. This persistent knowledge architecture operates on multiple levels:
- Basic Information Management: The system captures and stores essential operational data in a structured manner. This includes comprehensive tracking of calendar entries such as meetings and appointments, with metadata like attendees and agendas. Project timelines are maintained with detailed milestone tracking, dependencies, and phase transitions.
The system records all deadlines systematically, from task-level due dates to major project deliverables. Regular updates are stored chronologically, including daily reports, status changes, and project modifications. This robust information architecture ensures that all scheduling and project-related data remains easily retrievable, supporting efficient project management and team coordination.
- User-Specific Data: The system maintains detailed profiles of individual users that encompass multiple aspects of their interactions:
- Personal Preferences: Including preferred communication channels, response formats, and specific domain interests
- Communication Styles: Tracking whether users prefer formal or casual language, technical or simplified explanations, and their typical response length preferences
- Technical Expertise: Monitoring and adapting to users' demonstrated knowledge levels across different subjects and adjusting explanations accordingly
- Historical Patterns: Recording timing of interactions, frequently discussed topics, and common questions or concerns
- Language Patterns: Noting vocabulary usage, technical terminology familiarity, and preferred examples or analogies
- Learning Progress: Tracking how users' understanding of various topics evolves over time
This comprehensive user profiling enables the system to deliver increasingly tailored responses that match each user's unique needs and preferences, creating a more effective and engaging interaction experience over time.
- Decision Recording: Critical decisions are systematically documented in a comprehensive manner that includes multiple key components:
- Context: The full background situation, business environment, and constraints that framed the decision
- Rationale: Detailed reasoning behind the choice, including:
- Analysis of alternatives considered
- Risk assessment and mitigation strategies
- Expected outcomes and success metrics
- Stakeholders: Complete documentation of:
- Decision makers and their roles
- Affected teams and departments
- External parties involved
- Implementation Plan:
- Step-by-step execution strategy
- Resource allocation details
- Timeline and milestones
This systematic documentation process creates a detailed and auditable trail that enables teams to:
- Track the evolution of important decisions
- Understand the complete context of past choices
- Learn from previous experiences
- Make more informed decisions in the future
- Maintain accountability and transparency
- Task Management: The system implements a comprehensive task tracking system that monitors various aspects of project execution:
- Assignment Tracking: Each task is linked to specific team members or departments responsible for its completion, ensuring clear ownership and responsibility
- Timeline Management: Detailed due dates are maintained, including both final deadlines and intermediate milestones, allowing for better time management
- Progress Monitoring: Regular status updates are recorded to track task progression, including completed work, current blockers, and remaining steps
- Dependency Mapping: The system maintains a clear map of task dependencies, helping teams understand how delays or changes in one task might impact others
- Resource Allocation: Tracks the distribution of work and resources across team members to prevent overload and ensure efficient project execution
- Project Details: The system maintains comprehensive documentation of technical aspects including:
  - Technical Specifications:
    - Detailed system architecture blueprints
    - Complete API documentation and endpoints
    - Database schemas and data models
    - Third-party integration specifications
  - Project Requirements:
    - Business requirements and objectives
    - Technical requirements and constraints
    - User stories and acceptance criteria
    - Scope definitions and boundaries
  - Challenges and Solutions:
    - Identified technical obstacles
    - Implemented workarounds
    - Performance optimization efforts
    - Security measures and updates
  - Implementation Records:
    - Code documentation and examples
    - Architecture decision records
    - Testing strategies and results
    - Deployment procedures
This comprehensive persistent knowledge base serves multiple purposes: it prevents information loss, enables accurate historical references, supports informed decision-making, and allows for seamless continuation of complex discussions across multiple sessions. The system can easily recall and contextualize information from past interactions, making each conversation more efficient and productive.
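The user-profile portion of this knowledge base can be kept as a small structured record. The sketch below is purely illustrative (the field names and defaults are assumptions, not a prescribed schema), showing how preference, expertise, and topic-frequency data might be tracked per user:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UserProfile:
    """Illustrative per-user profile; fields are hypothetical examples."""
    user_id: str
    formality: str = "neutral"                         # e.g. "casual" or "formal"
    expertise: Dict[str, str] = field(default_factory=dict)   # topic -> level
    topic_counts: Counter = field(default_factory=Counter)    # discussion frequency

    def record_topic(self, topic: str) -> None:
        """Count one more discussion of a topic."""
        self.topic_counts[topic] += 1

    def top_topics(self, n: int = 3) -> List[str]:
        """Return the user's most frequently discussed topics."""
        return [topic for topic, _ in self.topic_counts.most_common(n)]
```

In practice these fields would be inferred from conversation analysis rather than set directly, and the profile would be serialized alongside the rest of the stored knowledge.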
Here's a comprehensive implementation of Persistent Knowledge using OpenAI API:
from typing import Dict, List, Optional
import json
import openai
from datetime import datetime
import tiktoken
class PersistentKnowledge:
def __init__(self, api_key: str):
self.api_key = api_key
openai.api_key = api_key
self.encoding = tiktoken.encoding_for_model("gpt-4o")
def count_tokens(self, text: str) -> int:
"""Count tokens in text using tiktoken"""
return len(self.encoding.encode(text))
def save_knowledge(self, user_id: str, category: str, data: Dict):
"""Save knowledge to persistent storage with categories"""
filename = f"knowledge_{user_id}_{category}.json"
timestamp = datetime.now().isoformat()
data_with_metadata = {
"timestamp": timestamp,
"category": category,
"content": data
}
try:
with open(filename, "r") as f:
existing_data = json.load(f)
except FileNotFoundError:
existing_data = []
existing_data.append(data_with_metadata)
with open(filename, "w") as f:
json.dump(existing_data, f, indent=2)
def retrieve_knowledge(self, user_id: str, category: str,
max_tokens: int = 2000) -> List[Dict]:
"""Retrieve knowledge with token limit"""
filename = f"knowledge_{user_id}_{category}.json"
try:
with open(filename, "r") as f:
all_data = json.load(f)
except FileNotFoundError:
return []
# Retrieve most recent entries within token limit
retrieved_data = []
current_tokens = 0
for entry in reversed(all_data):
content_str = json.dumps(entry["content"])
tokens = self.count_tokens(content_str)
if current_tokens + tokens <= max_tokens:
retrieved_data.append(entry)
current_tokens += tokens
else:
break
return list(reversed(retrieved_data))
async def get_ai_response(self,
user_id: str,
current_input: str,
categories: List[str] = None) -> str:
"""Generate AI response with context from persistent knowledge"""
        # Build context from stored knowledge
        context_parts = []
        if categories:
            for category in categories:
                knowledge = self.retrieve_knowledge(user_id, category)
                if knowledge:
                    context_parts.append(f"Relevant {category} knowledge:")
                    for entry in knowledge:
                        context_parts.append(json.dumps(entry["content"]))
        context_text = "\n".join(context_parts)

        # Prepare messages for API
        messages = [
            {
                "role": "system",
                "content": "You are an assistant with access to persistent knowledge. "
                           "Use this context to provide informed responses."
            },
            {
                "role": "user",
                "content": f"Context:\n{context_text}\n\nCurrent query: {current_input}"
            }
        ]
try:
response = await openai.ChatCompletion.acreate(
                model="gpt-4o",
messages=messages,
temperature=0.7,
max_tokens=500
)
return response.choices[0].message.content
except Exception as e:
return f"Error generating response: {str(e)}"
class KnowledgeManager:
def __init__(self, api_key: str):
self.knowledge = PersistentKnowledge(api_key)
async def process_interaction(self,
user_id: str,
message: str,
categories: List[str] = None) -> str:
"""Process user interaction and maintain knowledge"""
# Save user input
self.knowledge.save_knowledge(
user_id,
"conversations",
{"role": "user", "message": message}
)
# Get AI response with context
response = await self.knowledge.get_ai_response(
user_id,
message,
categories
)
# Save AI response
self.knowledge.save_knowledge(
user_id,
"conversations",
{"role": "assistant", "message": response}
)
return response
# Example usage
async def main():
manager = KnowledgeManager("your-api-key")
# First interaction
response1 = await manager.process_interaction(
"user123",
"What's the best way to learn Python?",
["conversations", "preferences"]
)
print("Response 1:", response1)
# Save user preference
manager.knowledge.save_knowledge(
"user123",
"preferences",
{"learning_style": "hands-on", "preferred_language": "Python"}
)
# Later interaction using stored knowledge
response2 = await manager.process_interaction(
"user123",
"Can you suggest some projects based on my learning style?",
["conversations", "preferences"]
)
print("Response 2:", response2)
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Code Breakdown:
- PersistentKnowledge Class:
- Handles token counting using tiktoken for context management
- Implements save_knowledge() for storing categorized information
- Provides retrieve_knowledge() with token limits for context retrieval
- Manages AI interactions through get_ai_response()
- KnowledgeManager Class:
- Provides high-level interface for knowledge management
- Processes user interactions and maintains conversation history
- Handles saving both user inputs and AI responses
- Key Features:
- Categorized Knowledge Storage: Organizes information by user and category
- Token Management: Ensures context stays within model limits
- Metadata Tracking: Includes timestamps and categories for all stored data
- Error Handling: Robust error management for file operations and API calls
- Implementation Benefits:
- Scalable: Handles multiple users and knowledge categories
- Efficient: Uses token counting to optimize context usage
- Flexible: Supports various knowledge types and categories
- Maintainable: Well-structured code with clear separation of concerns
This implementation provides a solid foundation for AI applications requiring persistent knowledge across conversations. It excels at maintaining user preferences, conversation histories, and other ongoing data while efficiently managing token limits.
Conversation Summarization
By maintaining detailed conversation histories, the system can generate comprehensive summaries that capture key points, decisions, and action items. These summaries serve multiple critical functions:
- Context Refreshing: When participants begin new conversation sessions, the system provides concise yet comprehensive summaries of previous discussions. These summaries serve as efficient briefings that:
- Quickly orient participants on key discussion points
- Highlight important decisions and outcomes
- Identify ongoing action items
- Refresh memory on critical context
This eliminates the time-consuming process of reviewing extensive conversation logs and ensures all participants can immediately engage productively in the current discussion with full context awareness.
- Progress Tracking: Regular summaries serve as a comprehensive tracking mechanism for ongoing discussions and projects. By maintaining detailed records of project evolution, teams can:
  - Monitor Development Phases
    - Track progression from initial concepts to implementation
    - Document iterative improvements and refinements
    - Record key turning points in project direction
  - Analyze Decision History
    - Capture the context behind important choices
    - Document alternative options considered
    - Track outcomes of implemented decisions
  - Identify Project Trends
    - Spot recurring challenges or bottlenecks
    - Recognize successful patterns to replicate
    - Monitor velocity and momentum
  - Facilitate Team Alignment
    - Maintain shared understanding of progress
    - Enable data-driven course corrections
    - Support informed resource allocation
- Knowledge Extraction: The system employs advanced parsing techniques to identify and extract critical information from conversations, including:
  - Key decisions and their rationale
    - Strategic choices made during discussions
    - Supporting evidence and justification
    - Alternative options considered
  - Action items and their owners
    - Specific tasks assigned to team members
    - Clear responsibility assignments
    - Follow-up requirements
  - Important deadlines and milestones
    - Project timeline markers
    - Critical delivery dates
    - Review and checkpoint schedules
  - Unresolved questions or concerns
    - Open technical issues
    - Pending decisions
    - Areas needing clarification
  - Agreements and commitments made
    - Formal decisions reached
    - Resource allocation agreements
    - Timeline commitments
- Report Generation: Summaries can be automatically compiled into various types of reports:
  - Executive briefings
    - High-level overviews for stakeholders
    - Key decisions and strategic implications
    - Resource allocation summaries
  - Meeting minutes
    - Detailed discussion points and outcomes
    - Action items and assignees
    - Timeline commitments made
  - Progress updates
    - Milestone achievements and delays
    - Current blockers and challenges
    - Next steps and priorities
  - Project status reports
    - Overall project health indicators
    - Resource utilization metrics
    - Risk assessments and mitigation strategies
The ability to condense and recall relevant information not only makes conversations more efficient and focused but also ensures that critical information is never lost and can be easily accessed when needed. This systematic approach to conversation summarization helps maintain clarity and continuity across long-term interactions, especially in complex projects or ongoing discussions involving multiple participants.
Example: Implementing Conversation Summarization
Here's a practical implementation of a conversation summarizer using OpenAI's API:
import openai
import json
from typing import List, Dict
from datetime import datetime
class ConversationSummarizer:
def __init__(self, api_key: str):
self.api_key = api_key
openai.api_key = api_key
def create_summary_prompt(self, messages: List[Dict]) -> str:
"""Create a prompt for summarization from messages"""
conversation = "\n".join([
f"{msg['role'].title()}: {msg['content']}"
for msg in messages
])
return f"""Please provide a concise summary of the following conversation,
highlighting key points, decisions, and action items:
{conversation}
Summary should include:
1. Main topics discussed
2. Key decisions made
3. Action items and owners
4. Unresolved questions
"""
async def generate_summary(self, messages: List[Dict]) -> Dict:
"""Generate a structured summary of the conversation"""
try:
response = await openai.ChatCompletion.acreate(
model="gpt-4o",
messages=[{
"role": "user",
"content": self.create_summary_prompt(messages)
}],
temperature=0.7,
max_tokens=500
)
summary = response.choices[0].message.content
return {
"timestamp": datetime.now().isoformat(),
"message_count": len(messages),
"summary": summary
}
except Exception as e:
return {
"error": f"Failed to generate summary: {str(e)}",
"timestamp": datetime.now().isoformat()
}
def save_summary(self, summary: Dict, filename: str = "summaries.json"):
"""Save summary to JSON file"""
try:
with open(filename, "r") as f:
summaries = json.load(f)
except FileNotFoundError:
summaries = []
summaries.append(summary)
with open(filename, "w") as f:
json.dump(summaries, f, indent=2)
# Example usage
async def main():
summarizer = ConversationSummarizer("your-api-key")
# Sample conversation
messages = [
{"role": "user", "content": "Let's discuss the new feature implementation."},
{"role": "assistant", "content": "Sure! What specific aspects would you like to focus on?"},
{"role": "user", "content": "We need to implement user authentication by next week."},
{"role": "assistant", "content": "I understand. Let's break down the requirements and timeline."}
]
# Generate and save summary
summary = await summarizer.generate_summary(messages)
summarizer.save_summary(summary)
print("Summary:", summary["summary"])
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Code Breakdown:
- Class Structure:
- ConversationSummarizer class handles all summarization operations
- Initialization with API key setup
- Methods for prompt creation, summary generation, and storage
- Key Features:
- Structured prompt generation for consistent summaries
- Async API calls for better performance
- Error handling and logging
- Persistent storage of summaries
- Implementation Benefits:
- Scalable: Handles conversations of varying lengths
- Structured Output: Organized summaries with key information
- Historical Tracking: Maintains summary history
- Error Resilient: Robust error handling and logging
This example provides a reliable way to generate and maintain conversation summaries, making it easier to track discussion progress and key decisions over time.
7.3.2 Core Architecture
To create an effective memory system for AI interactions, three essential components work together in a coordinated manner to maintain context and ensure conversational continuity. These components form a sophisticated architecture that enables AI systems to access, process, and utilize historical information effectively during ongoing conversations.
The storage component preserves conversation history, the retrieval mechanism intelligently fetches relevant past interactions, and the injection system seamlessly incorporates this historical context into current conversations.
Together, these three pillars create a robust foundation that enables AI systems to maintain meaningful, context-aware dialogues across multiple interactions:
Storage
The foundation of memory management where all interactions are systematically saved and preserved for future use. This critical component serves as the backbone of any AI conversation system:
- Can utilize various storage solutions like JSON files, SQL databases, or cloud storage
- JSON files offer simplicity and portability for smaller applications
- SQL databases provide robust querying and indexing for larger datasets
- Cloud storage enables scalable, distributed access across multiple services
- Should include metadata like timestamps and user identifiers
- Timestamps enable chronological tracking and time-based filtering
- User IDs maintain conversation threads and personalization
- Additional metadata can track conversation topics and contexts
- Must be organized for efficient retrieval and scaling
- Implement proper indexing for quick access to relevant data
- Use data partitioning for improved performance
- Consider compression strategies for long-term storage
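The metadata points above can be illustrated with a minimal JSON-backed writer. This is a sketch, not a production store: the directory name, function name, and optional topic tag are illustrative assumptions.

```python
import json
import os
from datetime import datetime, timezone
from typing import Optional

STORE_DIR = "conversation_store"  # illustrative directory name

def save_message(user_id: str, role: str, content: str,
                 topic: Optional[str] = None) -> dict:
    """Append one message, wrapped in metadata, to the user's JSON log."""
    os.makedirs(STORE_DIR, exist_ok=True)
    record = {
        "user_id": user_id,                                   # keeps threads per user
        "timestamp": datetime.now(timezone.utc).isoformat(),  # time-based filtering
        "topic": topic,                                       # optional context tag
        "role": role,
        "content": content,
    }
    path = os.path.join(STORE_DIR, f"{user_id}.json")
    try:
        with open(path, "r", encoding="utf-8") as f:
            log = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        log = []
    log.append(record)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(log, f, indent=2)
    return record
```

A SQL or cloud backend would replace the file read/write with queries, but the record shape (message plus metadata) stays the same.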
Retrieval
The intelligent system for accessing and filtering relevant past conversations serves as a crucial component in managing conversation history:
- Implements search algorithms to find context-appropriate historical data
- Uses semantic search to match similar topics and themes
- Employs fuzzy matching for flexible text comparison
- Indexes conversations for quick retrieval
- Uses parameters like recency, relevance, and conversation thread
- Prioritizes recent interactions for immediate context
- Weighs relevance based on topic similarity scores
- Maintains thread continuity by tracking conversation flows
- Manages token limits by selecting the most important context
- Implements smart truncation strategies
- Prioritizes key information while removing redundant content
- Dynamically adjusts context window based on model limitations
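The recency-and-relevance weighting described above can be sketched in a few lines. This illustration uses naive keyword overlap as a stand-in for semantic similarity, and the 0.4/0.6 weights are arbitrary assumptions; a real system would tune these and use embeddings.

```python
from typing import Dict, List

def score_message(message: Dict, query: str, position: int, total: int) -> float:
    """Combine recency and naive keyword relevance into one score."""
    recency = (position + 1) / total  # later messages score higher
    query_words = set(query.lower().split())
    msg_words = set(message["content"].lower().split())
    relevance = len(query_words & msg_words) / max(len(query_words), 1)
    return 0.4 * recency + 0.6 * relevance  # illustrative weights

def retrieve_context(history: List[Dict], query: str, top_k: int = 3) -> List[Dict]:
    """Return the top_k most useful past messages, in their original order."""
    total = len(history)
    scored = sorted(
        enumerate(history),
        key=lambda pair: score_message(pair[1], query, pair[0], total),
        reverse=True,
    )
    keep = sorted(idx for idx, _ in scored[:top_k])  # restore chronology
    return [history[i] for i in keep]
```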
Injection
The process of seamlessly incorporating historical context requires careful handling to maintain conversation coherence:
- Strategically places retrieved messages into the current conversation flow
- Determines optimal insertion points for historical context
- Filters and prioritizes relevant historical information
- Balances new and historical content for natural flow
- Maintains proper message ordering and relationships
- Preserves chronological sequence of interactions
- Respects conversation threading and reply chains
- Links related topics and discussions appropriately
- Ensures smooth context integration without disrupting the conversation
- Avoids abrupt context switches or information overload
- Uses natural transition phrases and references
- Maintains consistent tone and conversation style
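A minimal sketch of the injection step, assuming the relevant history has already been retrieved. Marking the boundary of historical context with a system-level note is one possible way to keep the transition smooth; other designs fold history into the system prompt instead.

```python
from typing import Dict, List

def inject_context(system_prompt: str,
                   retrieved: List[Dict],
                   new_message: str) -> List[Dict]:
    """Assemble an API-ready message list with historical context injected."""
    messages = [{"role": "system", "content": system_prompt}]
    if retrieved:
        messages.extend(retrieved)  # preserve original ordering and roles
        messages.append({
            "role": "system",
            "content": "The messages above are from earlier sessions.",
        })
    messages.append({"role": "user", "content": new_message})
    return messages
```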
7.3.3 How to Store and Retrieve JSON-based Message Histories in Python
In this section, we'll explore a practical implementation that demonstrates how to create a persistent memory system using JSON files. This approach offers a straightforward way to maintain conversation context across multiple sessions while remaining scalable and easy to maintain.
Our implementation will focus on three key aspects: efficiently storing message histories in JSON format, retrieving relevant conversation context when needed, and managing the data structure to ensure optimal performance. This solution is particularly useful for developers building chatbots, virtual assistants, or any application requiring persistent conversation history.
Step 1: Build the Memory Manager
This module handles storing user interactions in JSON format and retrieving only the relevant ones.
import os
import json
from typing import List, Dict
MEMORY_DIR = "user_memory"
os.makedirs(MEMORY_DIR, exist_ok=True)
# Constants
MAX_HISTORY_MESSAGES = 5 # Truncate history to last 5 messages to manage tokens
def get_memory_path(user_id: str) -> str:
return os.path.join(MEMORY_DIR, f"{user_id}.json")
def load_history(user_id: str) -> List[Dict[str, str]]:
path = get_memory_path(user_id)
if not os.path.exists(path):
return []
try:
with open(path, "r", encoding="utf-8") as f:
return json.load(f)
except json.JSONDecodeError:
return []
def store_interaction(user_id: str, role: str, content: str) -> None:
message = {"role": role, "content": content}
path = get_memory_path(user_id)
history = load_history(user_id)
history.append(message)
with open(path, "w", encoding="utf-8") as f:
json.dump(history, f, indent=2)
def get_recent_history(user_id: str, limit: int = MAX_HISTORY_MESSAGES) -> List[Dict[str, str]]:
history = load_history(user_id)
return history[-limit:]
Let's break down this code:
1. Initial Setup
- Creates a "user_memory" directory to store conversation histories
- Sets a maximum limit of 5 messages for history management
2. Core Functions
- get_memory_path(user_id): Creates a unique JSON file path for each user
- load_history(user_id):
- Attempts to read the user's conversation history
- Returns an empty list if file doesn't exist or is corrupted
- store_interaction(user_id, role, content):
- Saves new messages to the user's history file
- Appends the message to existing history
- Stores in JSON format with proper indentation
- get_recent_history(user_id, limit):
- Retrieves the most recent messages
- Respects the MAX_HISTORY_MESSAGES limit (5 messages)
3. Key Features
- Persistent storage: Each user's conversations are saved in separate JSON files
- Scalability: System can handle multiple users with individual files
- Controlled context: Allows specific control over how much history to maintain
- Debug-friendly: JSON format makes it easy to inspect stored conversations
Step 2: Create the Chat Engine Using OpenAI API
Now let’s integrate this memory system with the OpenAI API. We’ll load previous messages, add the new prompt, query the model, and save the response.
import openai
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
SYSTEM_PROMPT = {
"role": "system",
"content": "You are a helpful assistant that remembers the user's previous messages."
}
def continue_conversation(user_id: str, user_input: str) -> str:
# Validate input
if not user_input.strip():
return "Input cannot be empty."
# Load memory and prepare messages
recent_history = get_recent_history(user_id)
messages = [SYSTEM_PROMPT] + recent_history + [{"role": "user", "content": user_input}]
# Call OpenAI API
try:
response = openai.ChatCompletion.create(
model="gpt-4o",
messages=messages,
max_tokens=300,
temperature=0.7
)
assistant_reply = response["choices"][0]["message"]["content"]
# Store both messages
store_interaction(user_id, "user", user_input)
store_interaction(user_id, "assistant", assistant_reply)
return assistant_reply
except Exception as e:
return f"Something went wrong: {e}"
Let’s break down this code:
1. Initial Setup
- Imports required libraries (openai and dotenv)
- Loads environment variables and sets up OpenAI API key
- Defines a system prompt that establishes the assistant's role
2. Main Function: continue_conversation
- Takes user_id and user_input as parameters
- Input validation to check for empty messages
- Loads conversation history using get_recent_history (defined in previous section)
- Constructs messages array combining:
- System prompt
- Recent conversation history
- Current user input
3. API Interaction
- Makes API call to OpenAI with parameters:
- Uses "gpt-4o" model
- Sets max_tokens to 300
- Uses temperature of 0.7 for balanced creativity
- Extracts assistant's reply from the response
4. Memory Management
- Stores both the user's input and assistant's reply using store_interaction
- Handles errors gracefully with try-except block
This portion creates a stateful conversation system that maintains context across multiple interactions while managing the conversation flow efficiently.
Step 3: Test the Conversation Memory
if __name__ == "__main__":
user_id = "user_456"
print("User: What are the best Python libraries for data science?")
reply1 = continue_conversation(user_id, "What are the best Python libraries for data science?")
print("Assistant:", reply1)
print("\nUser: Could you remind me which ones you mentioned?")
reply2 = continue_conversation(user_id, "Could you remind me which ones you mentioned?")
print("Assistant:", reply2)
Let's break down this code example for testing conversation memory:
1. Entry Point Check
- The code uses the standard Python if __name__ == "__main__": idiom to ensure this code only runs when the file is executed directly
2. User Setup
- Creates a test user with ID "user_456"
3. Test Conversation Flow
- Demonstrates a two-turn conversation where:
- First turn: Asks about Python libraries for data science
- Second turn: Asks for a reminder of previously mentioned libraries, testing the memory system
4. Implementation Details
- Each interaction uses the continue_conversation() function to:
- Process the user input
- Generate and store the response
- Print both the user input and assistant's response
This test code effectively demonstrates how the system maintains context between multiple interactions, allowing the assistant to reference previous responses when answering follow-up questions.
Benefits of This Approach
- Persistent: All conversations are stored locally by user, ensuring that no interaction history is lost. This means your application can maintain context across multiple sessions, even if the server restarts or the application closes.
- Scalable (to a point): By storing each user's conversation history in their own dedicated JSON file, the system can handle multiple users efficiently. This approach works well for small to medium-sized applications, though for very large deployments you might want to consider a database solution.
- Controllable context: The system gives you complete control over how much conversation history to include in each interaction. You can adjust the memory window size, filter by relevance, or implement custom logic for selecting which previous messages to include in the context.
- Readable: The JSON file format makes it simple to inspect, debug, and modify stored conversations. This is invaluable during development and maintenance, as you can easily view the conversation history in any text editor and validate the data structure.
Optional Enhancements
- Summarization: Instead of storing every message verbatim, implement periodic summarization of conversation history. This technique involves automatically generating concise summaries of longer conversation segments, which helps:
- Reduce token usage in API calls
- Maintain essential context while removing redundant details
- Create a more efficient memory structure
For example, multiple messages about a specific topic could be condensed into a single summary statement.
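As a sketch of this idea, the helper below keeps the most recent turns verbatim and collapses everything older into a single summary message. The `summarize` callable is a stand-in stub; in practice you would back it with a chat-completion call that produces the actual summary text:

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def compress_history(messages: List[Message],
                     keep_last: int,
                     summarize: Callable[[List[Message]], str]) -> List[Message]:
    """Collapse all but the last `keep_last` messages into one summary message."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "system",
               "content": f"Summary of earlier conversation: {summarize(older)}"}
    return [summary] + recent

# Stub summarizer for demonstration; swap in an API-backed one in practice.
history = [{"role": "user", "content": f"message {i}"} for i in range(6)]
compact = compress_history(history, keep_last=2,
                           summarize=lambda msgs: f"{len(msgs)} earlier messages")
print(len(compact))  # 3: one summary message plus the last two turns
```

The token savings compound over long conversations, since every API call otherwise resends the full history.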
- Vector Search (Advanced): Transform messages into numerical vectors using embedding models, enabling sophisticated retrieval based on semantic meaning. This approach offers several advantages:
- Discover contextually relevant historical messages even if they use different words
- Prioritize messages based on their relevance to the current conversation
- Enable fast similarity searches across large conversation histories
This is particularly useful for long-running conversations or when specific context needs to be recalled.
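A minimal sketch of such retrieval: rank stored messages by cosine similarity to the query's embedding. The three-dimensional vectors below are toy values for illustration; real embeddings would come from an embedding model and have hundreds or thousands of dimensions:

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: List[float],
          corpus: List[Tuple[str, List[float]]],
          k: int = 2) -> List[str]:
    """Return the k stored messages most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings"; a real system would get these from an embedding model.
stored = [
    ("We discussed pandas DataFrames", [0.9, 0.1, 0.0]),
    ("You prefer dark-mode editors",   [0.0, 0.2, 0.9]),
    ("NumPy arrays came up last week", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], stored, k=2))
```

Because similarity is computed on meaning-bearing vectors rather than keywords, the data-analysis messages rank above the editor-preference one even though none of them share words with the query.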
- Token Budgeting: Implement smart token management strategies to optimize context window usage. This includes:
- Setting dynamic limits based on conversation importance
- Implementing intelligent pruning of older, less relevant messages
- Maintaining a balance between recent context and important historical information
This ensures you stay within API token limits while preserving the most valuable conversation context.
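A minimal pruning pass along these lines might keep the newest messages while an estimated token count stays under budget. The four-characters-per-token estimate is a rough heuristic for illustration; a real implementation would count tokens with tiktoken as shown later in this section:

```python
from typing import Dict, List

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token. Use tiktoken for real counts."""
    return max(1, len(text) // 4)

def fit_to_budget(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep the newest messages whose combined estimated tokens fit the budget."""
    kept: List[Dict[str, str]] = []
    used = 0
    for msg in reversed(messages):          # walk newest first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = [{"role": "user", "content": "x" * 40} for _ in range(10)]  # ~10 tokens each
print(len(fit_to_budget(history, budget=35)))
```

More sophisticated versions weight messages by relevance or importance instead of pure recency, but the budget-checking loop stays the same.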
- Keep the system prompt consistent across interactions
- Maintain identical prompt wording and instructions throughout the entire conversation lifecycle
- Use version control to track any system prompt changes across deployments
- Prevents confusion and contradictory responses by maintaining consistent context
- Ensures the AI maintains a reliable personality and behavioral pattern throughout interactions
- Don't overload the context—store everything, but retrieve selectively
- Implement a comprehensive storage system that maintains complete conversation histories in your database
- Develop intelligent retrieval algorithms that prioritize relevant context for API calls
- Use semantic search or embedding-based similarity to find pertinent historical messages
- Balance token usage by implementing smart pruning strategies for older messages
- Label stored messages clearly (role, content, timestamp) for future filtering or summarization
- Role: Carefully identify and tag message sources (system, user, or assistant) to maintain clear conversation flow and enable role-based filtering
- Content: Implement consistent formatting standards for message content, including handling of special characters and maintaining data integrity
- Timestamp: Add precise temporal metadata to enable sophisticated time-based operations like conversation segmentation and contextual relevance scoring
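A minimal record shape that carries these three labels, together with a role-based filter, might look like the following (the field names match the labels above; the helper names are illustrative):

```python
from datetime import datetime, timezone
from typing import Dict, List

def make_record(role: str, content: str) -> Dict[str, str]:
    """Store every message with role, content, and an ISO-8601 timestamp."""
    assert role in ("system", "user", "assistant")
    return {"role": role,
            "content": content,
            "timestamp": datetime.now(timezone.utc).isoformat()}

def by_role(records: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Role-based filtering, e.g. to summarize only assistant output."""
    return [r for r in records if r["role"] == role]

log = [make_record("user", "Hi"),
       make_record("assistant", "Hello!"),
       make_record("user", "Thanks")]
print([r["content"] for r in by_role(log, "user")])
```

Keeping the record shape this uniform is what makes later filtering, summarization, and time-based segmentation straightforward.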
Long-term memory isn't native to OpenAI's chat models (yet), but with thoughtful engineering, you can create sophisticated memory systems. This limitation means that by default, the model treats each interaction as isolated, without any knowledge of previous conversations. However, developers can implement custom solutions to overcome this limitation.
Here's a detailed explanation of how it works: While the model itself doesn't maintain memories between conversations, your application acts as a memory manager. It stores each interaction in a structured format, typically using databases or file systems. This stored data includes not just the conversation content, but also metadata like timestamps, user information, and conversation topics. By carefully managing these stored conversations and selectively retrieving relevant pieces, you create an illusion of continuous memory - making the AI appear to "remember" previous interactions.
This memory system transforms your AI assistant in several powerful ways:
- Continuity: The assistant can reference past conversations and maintain context over extended periods, creating seamless interactions that build upon previous discussions. For example, if a user mentions their preference for Python programming in one conversation, the system can reference this in future interactions.
- Personality: Consistent response patterns and remembered preferences create a more distinct personality. This includes maintaining a consistent tone, remembering user preferences, and adapting communication styles based on past interactions.
- Understanding: By accessing historical context, responses become more informed and personalized. The system can recall specific details from previous conversations, making interactions feel more natural and contextually aware.
- Depth: The ability to build upon previous conversations enables more sophisticated interactions, allowing for complex problem-solving and long-term project support.
This approach scales remarkably well across different usage scenarios, from individual users to enterprise-level applications. Whether you're building a personal assistant for a single user or a system that serves thousands, the core principle remains the same: you're creating an intelligent memory layer that sits between the user and the API. This architectural approach provides several key capabilities:
- Grow: Continuously accumulate new interactions and learnings, building a rich history of user interactions and preferences over time. This growing knowledge base becomes increasingly valuable for personalizing responses.
- Summarize: Condense lengthy conversation histories into manageable contexts, using advanced techniques like semantic clustering and importance scoring to maintain the most relevant information.
- Adapt: Adjust its retrieval strategies based on conversation patterns, learning which types of historical context are most valuable for different types of interactions.
All while leveraging the same underlying OpenAI API for the actual language processing. This combination of structured memory management and powerful language processing creates a system that can maintain context and personality across multiple conversations while staying within the technical constraints of the API.
7.3 Storing and Retrieving Past Interactions
In the real world, human conversations build upon shared history and previous interactions. When we talk to someone we know, we naturally draw upon our past experiences with them - their preferences, previous discussions, and the context we've built together. This natural memory system is what makes our conversations feel continuous and personally meaningful.
While OpenAI's language models don't inherently possess long-term memory capabilities, developers can create sophisticated systems to replicate this natural memory process. By implementing a storage and retrieval system for past interactions, you can create an AI assistant that appears to "remember" previous conversations. This involves carefully recording user interactions, storing relevant context, and strategically retrieving this information when needed.
This section demonstrates how to construct a lightweight but powerful memory system that enhances your AI assistant's capabilities. By maintaining conversation history and user preferences across multiple sessions, you can create more meaningful interactions that feel less like isolated exchanges and more like ongoing conversations. Your assistant will be able to reference past discussions, remember user preferences, and maintain context awareness - making it feel more like interacting with a knowledgeable colleague rather than starting fresh with each interaction.
7.3.1 Why Store Interactions?
Storing past interactions unlocks several powerful capabilities that enhance the AI's ability to provide meaningful and contextual responses:
Personalized Responses
The system learns and adapts to individual users by maintaining a detailed profile of their interactions and preferences over time. This personalization happens on multiple levels:
- Communication Style: The system tracks how users express themselves by analyzing multiple aspects of their communication patterns:
- Formality level: Whether they use casual language ("hey there!") or formal address ("Dear Sir/Madam")
- Humor usage: Their tendency to use jokes, emojis, or playful language
- Conversation pace: If they prefer quick exchanges or detailed, lengthy discussions
- Vocabulary choices: Technical vs. simplified language
- Cultural references: Professional, academic, or pop culture references
For example, if a user consistently uses informal language like "hey" and "thanks!" with emojis, the system adapts by responding in a friendly, casual tone. Conversely, when interacting with business users who maintain formal language and professional terms, the system automatically adjusts to use appropriate business etiquette and industry-standard terminology.
This adaptive communication ensures more natural and effective interactions by matching each user's unique communication style and preferences.
- Technical Proficiency: By analyzing past interactions, the system gauges users' expertise levels in different domains. This allows it to automatically adjust its explanations based on demonstrated knowledge.
For instance, when discussing programming, the system might use advanced terminology like "polymorphism" and "dependency injection" with experienced developers, while offering simpler explanations using real-world analogies for beginners. The system continuously refines this assessment through ongoing interactions - if a user demonstrates increased understanding over time, the technical depth of explanations adjusts accordingly. This adaptive approach ensures that experts aren't slowed down by basic explanations while newcomers aren't overwhelmed by complex technical details.
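One simple way to model this continuous refinement is an exponential moving average over per-message expertise signals. The 0-to-1 score and the smoothing factor below are illustrative choices, not something the chapter prescribes:

```python
def update_proficiency(current: float, observed: float, alpha: float = 0.3) -> float:
    """Blend a new expertise signal (0 = beginner, 1 = expert) into the running score."""
    return (1 - alpha) * current + alpha * observed

score = 0.5                         # neutral starting point for a new user
for signal in [0.9, 0.8, 1.0]:      # three messages showing advanced vocabulary
    score = update_proficiency(score, signal)
print(round(score, 3))
```

The score drifts toward the evidence without overreacting to any single message, so one unusually technical question does not flip a beginner profile to expert.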
- Historical Context: The system maintains comprehensive records of previous discussions, projects, and decisions, enabling it to reference past conversations with precision and relevance. This historical tracking operates on multiple levels:
- Conversation Threading: The system can follow the progression of specific topics across multiple sessions, understanding how discussions evolve and build upon each other.
- Project Milestones: Important decisions, agreements, and project updates are recorded and can be referenced to maintain consistency in future discussions.
- User Preferences Evolution: The system tracks how user preferences and requirements change over time, adapting its responses accordingly.
- Contextual References: When addressing current topics, the system can intelligently reference related past discussions to provide more informed and nuanced responses.
This sophisticated context management creates a seamless conversational experience where users feel understood and valued, as the system demonstrates awareness of their history and ongoing needs. For example, if a user previously discussed challenges with a specific programming framework, the system can reference those earlier conversations when providing new solutions or updates.
- Customization Preferences: The system maintains and applies detailed user preferences across sessions, including:
- Preferred language and regional variations
- Language selection (e.g., English, Spanish, Mandarin)
- Regional dialects and localizations
- Currency and measurement units
- Format preferences (bullet points vs. paragraphs)
- Document structure preferences (hierarchical vs. flat)
- Visual organization (lists, tables, or flowing text)
- Code formatting conventions when applicable
- Level of detail desired in responses
- Brief summaries vs. comprehensive explanations
- Technical depth of content
- Inclusion of examples and analogies
- Specific terminology or naming conventions
- Industry-specific vocabulary
- Preferred technical frameworks or methodologies
- Company-specific terminology
- Time zones and working hours
- Meeting scheduling preferences
- Notification timing preferences
- Availability windows for synchronous communication
This comprehensive approach to personalization helps create a more natural, efficient, and engaging interaction that feels tailored to each individual user's needs and preferences.
Example: Implementing Personalized Responses
Here's a comprehensive implementation of a personalization system that adapts to user communication styles:
import json
import os
from datetime import datetime
from typing import Dict, List, Optional
import openai
class UserProfile:
def __init__(self, user_id: str):
self.user_id = user_id
self.communication_style = {
"formality_level": 0.5, # 0 = casual, 1 = formal
"technical_level": 0.5, # 0 = beginner, 1 = expert
"verbosity": 0.5, # 0 = concise, 1 = detailed
"emoji_usage": False
}
self.preferences = {
"language": "en",
"timezone": "UTC",
"topics_of_interest": []
}
self.interaction_history = []
class PersonalizedAIAssistant:
def __init__(self, api_key: str):
self.api_key = api_key
openai.api_key = api_key
self.profiles_dir = "user_profiles"
os.makedirs(self.profiles_dir, exist_ok=True)
self.users: Dict[str, UserProfile] = {}
def _get_profile_path(self, user_id: str) -> str:
return os.path.join(self.profiles_dir, f"{user_id}.json")
def load_user_profile(self, user_id: str) -> UserProfile:
if user_id in self.users:
return self.users[user_id]
profile_path = self._get_profile_path(user_id)
if os.path.exists(profile_path):
with open(profile_path, 'r') as f:
data = json.load(f)
profile = UserProfile(user_id)
profile.communication_style = data.get('communication_style', profile.communication_style)
profile.preferences = data.get('preferences', profile.preferences)
profile.interaction_history = data.get('interaction_history', [])
else:
profile = UserProfile(user_id)
self.users[user_id] = profile
return profile
def save_user_profile(self, profile: UserProfile):
data = {
'communication_style': profile.communication_style,
'preferences': profile.preferences,
'interaction_history': profile.interaction_history
}
with open(self._get_profile_path(profile.user_id), 'w') as f:
json.dump(data, f, indent=2)
def analyze_message(self, message: str) -> dict:
"""Analyze user message to update communication style metrics."""
return {
"formality_level": 0.8 if any(word in message.lower() for word in
['please', 'thank you', 'sir', 'madam']) else 0.2,
"technical_level": 0.8 if any(word in message.lower() for word in
['api', 'function', 'implementation', 'code']) else 0.3,
"emoji_usage": '😊' in message or '👍' in message
}
    def generate_system_prompt(self, profile: UserProfile) -> str:
        """Create personalized system prompt based on user profile."""
        style = "formal" if profile.communication_style["formality_level"] > 0.5 else "casual"
        tech_level = "technical" if profile.communication_style["technical_level"] > 0.5 else "simple"
        emoji_rule = ("feel free to use emojis" if profile.communication_style["emoji_usage"]
                      else "avoid using emojis")
        verbosity = "in detail" if profile.communication_style["verbosity"] > 0.5 else "concisely"
        return (f"You are a helpful assistant that communicates in a {style} style. "
                f"Use {tech_level} language and {emoji_rule}. Communicate {verbosity}.")
    def get_response(self, user_id: str, message: str) -> str:
profile = self.load_user_profile(user_id)
# Analyze and update user's communication style
analysis = self.analyze_message(message)
profile.communication_style.update(analysis)
# Prepare conversation context
messages = [
{"role": "system", "content": self.generate_system_prompt(profile)},
{"role": "user", "content": message}
]
        # Add relevant history if available (convert stored interactions to chat messages)
        if profile.interaction_history:
            history_messages = []
            for item in profile.interaction_history[-3:]:  # Last 3 interactions
                history_messages.append({"role": "user", "content": item["user_message"]})
                history_messages.append({"role": "assistant", "content": item["assistant_response"]})
            messages[1:1] = history_messages
# Get AI response
response = openai.ChatCompletion.create(
model="gpt-4o",
messages=messages,
temperature=0.7,
max_tokens=150
)
# Store interaction
interaction = {
"timestamp": datetime.utcnow().isoformat(),
"user_message": message,
"assistant_response": response.choices[0].message.content
}
profile.interaction_history.append(interaction)
# Save updated profile
self.save_user_profile(profile)
return response.choices[0].message.content
# Usage example
if __name__ == "__main__":
assistant = PersonalizedAIAssistant("your-api-key-here")
# Example interactions
responses = [
assistant.get_response("user123", "Hey there! Can you help me with Python? 😊"),
assistant.get_response("user123", "Could you explain the technical implementation of APIs?"),
assistant.get_response("user123", "Dear Sir, I require assistance with programming.")
]
Code Breakdown:
- Class Structure:
- UserProfile class maintains individual user information:
- Communication style metrics (formality, technical level, etc.)
- Personal preferences (language, timezone)
- Interaction history
- PersonalizedAIAssistant class handles the core functionality:
- Profile management (loading/saving)
- Message analysis
- Response generation
- Key Features:
- Persistent Storage: Profiles are saved as JSON files
- Style Analysis: Examines messages for communication patterns
- Dynamic Prompting: Generates customized system prompts
- Context Management: Maintains conversation history
- Personalization Aspects:
- Communication Style:
- Formality level detection
- Technical language adaptation
- Emoji usage tracking
- Response Adaptation:
- Adjusts verbosity based on user preference
- Maintains consistent style across interactions
- Incorporates conversation history
This implementation demonstrates how to create an AI assistant that learns and adapts to each user's communication style while maintaining a persistent memory of interactions. The system continuously updates its understanding of user preferences and adjusts its responses accordingly.
Session Resumption
Users can return to conversations after breaks and have the AI understand the full context of previous discussions. This capability enables seamless conversation continuity, where the AI maintains awareness of prior interactions, user preferences, and established context. For example, if a user discusses a software bug on Monday and returns on Wednesday, the AI can recall the specific details of the bug, proposed solutions, and any attempted fixes without requiring the user to repeat information.
This feature is particularly valuable for complex tasks that span multiple sessions, like project planning or technical troubleshooting. During project planning, the AI can maintain records of previously agreed-upon milestones, resource allocations, and team responsibilities across multiple planning sessions. In technical troubleshooting scenarios, it can track the progression of debugging attempts, remember which solutions were already tried, and build upon previous diagnostic steps.
The AI can reference specific points from earlier conversations and maintain continuity across days or even weeks. This long-term context awareness enables the AI to make more informed suggestions, avoid redundant discussions, and provide more personalized assistance based on the user's historical interactions. For instance, if a user previously expressed a preference for certain programming frameworks or methodologies, the AI can incorporate these preferences into future recommendations without requiring explicit reminders.
Here's a practical implementation of session resumption:
from datetime import datetime
import json
import os
import openai
from typing import List, Dict, Optional

class SessionManager:
    def __init__(self, api_key: str):
        self.api_key = api_key
        openai.api_key = api_key
        os.makedirs("sessions", exist_ok=True)  # ensure the storage directory exists
        self.sessions: Dict[str, Dict] = {}
def save_session(self, user_id: str, session_data: dict):
"""Save session data to persistent storage."""
with open(f"sessions/{user_id}.json", "w") as f:
json.dump(session_data, f)
def load_session(self, user_id: str) -> Optional[dict]:
"""Load session data from storage."""
try:
with open(f"sessions/{user_id}.json", "r") as f:
return json.load(f)
except FileNotFoundError:
return None
class ConversationManager:
def __init__(self, session_manager: SessionManager):
self.session_manager = session_manager
self.current_context: List[Dict] = []
def prepare_context(self, user_id: str, new_message: str) -> List[Dict]:
"""Prepare conversation context including session history."""
# Load previous session if exists
session = self.session_manager.load_session(user_id)
# Initialize context with system message
context = [{
"role": "system",
"content": "You are a helpful assistant with memory of past conversations."
}]
        # Add relevant history from previous session
        if session and 'history' in session:
            # Add the last 5 non-system messages, so the system prompt is not duplicated
            context.extend([m for m in session['history'] if m["role"] != "system"][-5:])
# Add new message
context.append({
"role": "user",
"content": new_message
})
return context
async def process_message(self, user_id: str, message: str) -> str:
"""Process new message with session context."""
context = self.prepare_context(user_id, message)
try:
response = await openai.ChatCompletion.acreate(
model="gpt-4o",
messages=context,
temperature=0.7,
max_tokens=150
)
assistant_message = response.choices[0].message.content
# Update session with new interaction
session_data = {
'last_interaction': datetime.now().isoformat(),
'history': context + [{
"role": "assistant",
"content": assistant_message
}]
}
self.session_manager.save_session(user_id, session_data)
return assistant_message
except Exception as e:
print(f"Error processing message: {e}")
return "I apologize, but I encountered an error processing your message."
# Example usage
async def main():
session_manager = SessionManager("your-api-key-here")
conversation_manager = ConversationManager(session_manager)
# First interaction
response1 = await conversation_manager.process_message(
"user123",
"What's the weather like today?"
)
print("Response 1:", response1)
# Later interaction (session resumption)
response2 = await conversation_manager.process_message(
"user123",
"What did we discuss earlier?"
)
print("Response 2:", response2)
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Code Breakdown:
- SessionManager Class:
- Handles persistent storage of session data
- Provides methods for saving and loading session information
- Maintains user-specific session files
- ConversationManager Class:
- Manages conversation context and history
- Prepares context by combining previous session data with new messages
- Handles interaction with the OpenAI API
- Key Features:
- Asynchronous Processing: Uses async/await for efficient API calls
- Context Management: Maintains relevant conversation history
- Error Handling: Includes robust error management
- Session Persistence: Saves conversations to disk for later retrieval
- Implementation Details:
- Uses JSON for session storage
- Limits context to last 5 messages for efficiency
- Includes timestamp tracking for session management
- Maintains conversation roles (system, user, assistant)
This example provides a robust foundation for managing multi-session conversations while maintaining context and user history. It's particularly useful for applications requiring persistent conversation memory across multiple interactions.
Persistent Knowledge
The system maintains a robust and comprehensive record of all significant information exchanged during conversations. This persistent knowledge architecture operates on multiple levels:
- Basic Information Management: The system captures and stores essential operational data in a structured manner. This includes comprehensive tracking of calendar entries such as meetings and appointments, with metadata like attendees and agendas. Project timelines are maintained with detailed milestone tracking, dependencies, and phase transitions.
The system records all deadlines systematically, from task-level due dates to major project deliverables. Regular updates are stored chronologically, including daily reports, status changes, and project modifications. This robust information architecture ensures that all scheduling and project-related data remains easily retrievable, supporting efficient project management and team coordination.
- User-Specific Data: The system maintains detailed profiles of individual users that encompass multiple aspects of their interactions:
- Personal Preferences: Including preferred communication channels, response formats, and specific domain interests
- Communication Styles: Tracking whether users prefer formal or casual language, technical or simplified explanations, and their typical response length preferences
- Technical Expertise: Monitoring and adapting to users' demonstrated knowledge levels across different subjects and adjusting explanations accordingly
- Historical Patterns: Recording timing of interactions, frequently discussed topics, and common questions or concerns
- Language Patterns: Noting vocabulary usage, technical terminology familiarity, and preferred examples or analogies
- Learning Progress: Tracking how users' understanding of various topics evolves over time
This comprehensive user profiling enables the system to deliver increasingly tailored responses that match each user's unique needs and preferences, creating a more effective and engaging interaction experience over time.
- Decision Recording: Critical decisions are systematically documented in a comprehensive manner that includes multiple key components:
- Context: The full background situation, business environment, and constraints that framed the decision
- Rationale: Detailed reasoning behind the choice, including:
- Analysis of alternatives considered
- Risk assessment and mitigation strategies
- Expected outcomes and success metrics
- Stakeholders: Complete documentation of:
- Decision makers and their roles
- Affected teams and departments
- External parties involved
- Implementation Plan:
- Step-by-step execution strategy
- Resource allocation details
- Timeline and milestones
This systematic documentation process creates a detailed and auditable trail that enables teams to:
- Track the evolution of important decisions
- Understand the complete context of past choices
- Learn from previous experiences
- Make more informed decisions in the future
- Maintain accountability and transparency
- Task Management: The system implements a comprehensive task tracking system that monitors various aspects of project execution:
- Assignment Tracking: Each task is linked to specific team members or departments responsible for its completion, ensuring clear ownership and responsibility
- Timeline Management: Detailed due dates are maintained, including both final deadlines and intermediate milestones, allowing for better time management
- Progress Monitoring: Regular status updates are recorded to track task progression, including completed work, current blockers, and remaining steps
- Dependency Mapping: The system maintains a clear map of task dependencies, helping teams understand how delays or changes in one task might impact others
- Resource Allocation: Tracks the distribution of work and resources across team members to prevent overload and ensure efficient project execution
- Project Details: The system maintains comprehensive documentation of technical aspects including:
- Technical Specifications:
- Detailed system architecture blueprints
- Complete API documentation and endpoints
- Database schemas and data models
- Third-party integration specifications
- Project Requirements:
- Business requirements and objectives
- Technical requirements and constraints
- User stories and acceptance criteria
- Scope definitions and boundaries
- Challenges and Solutions:
- Identified technical obstacles
- Implemented workarounds
- Performance optimization efforts
- Security measures and updates
- Implementation Records:
- Code documentation and examples
- Architecture decision records
- Testing strategies and results
- Deployment procedures
This comprehensive persistent knowledge base serves multiple purposes: it prevents information loss, enables accurate historical references, supports informed decision-making, and allows for seamless continuation of complex discussions across multiple sessions. The system can easily recall and contextualize information from past interactions, making each conversation more efficient and productive.
Here's a comprehensive implementation of Persistent Knowledge using OpenAI API:
from typing import Dict, List, Optional
import json
import openai
from datetime import datetime
import tiktoken
class PersistentKnowledge:
def __init__(self, api_key: str):
self.api_key = api_key
openai.api_key = api_key
self.encoding = tiktoken.encoding_for_model("gpt-4o")
def count_tokens(self, text: str) -> int:
"""Count tokens in text using tiktoken"""
return len(self.encoding.encode(text))
def save_knowledge(self, user_id: str, category: str, data: Dict):
"""Save knowledge to persistent storage with categories"""
filename = f"knowledge_{user_id}_{category}.json"
timestamp = datetime.now().isoformat()
data_with_metadata = {
"timestamp": timestamp,
"category": category,
"content": data
}
try:
with open(filename, "r") as f:
existing_data = json.load(f)
except FileNotFoundError:
existing_data = []
existing_data.append(data_with_metadata)
with open(filename, "w") as f:
json.dump(existing_data, f, indent=2)
def retrieve_knowledge(self, user_id: str, category: str,
max_tokens: int = 2000) -> List[Dict]:
"""Retrieve knowledge with token limit"""
filename = f"knowledge_{user_id}_{category}.json"
try:
with open(filename, "r") as f:
all_data = json.load(f)
except FileNotFoundError:
return []
# Retrieve most recent entries within token limit
retrieved_data = []
current_tokens = 0
for entry in reversed(all_data):
content_str = json.dumps(entry["content"])
tokens = self.count_tokens(content_str)
if current_tokens + tokens <= max_tokens:
retrieved_data.append(entry)
current_tokens += tokens
else:
break
return list(reversed(retrieved_data))
async def get_ai_response(self,
user_id: str,
current_input: str,
categories: List[str] = None) -> str:
"""Generate AI response with context from persistent knowledge"""
# Build context from stored knowledge
context = []
if categories:
for category in categories:
knowledge = self.retrieve_knowledge(user_id, category)
if knowledge:
context.append(f"\nRelevant {category} knowledge:")
for entry in knowledge:
context.append(json.dumps(entry["content"]))
# Prepare messages for API
messages = [
{
"role": "system",
"content": "You are an assistant with access to persistent knowledge. "
"Use this context to provide informed responses."
},
            {
                "role": "user",
                "content": "Context:\n" + "\n".join(context) + f"\n\nCurrent query: {current_input}"
            }
]
try:
response = await openai.ChatCompletion.acreate(
                model="gpt-4o",
messages=messages,
temperature=0.7,
max_tokens=500
)
return response.choices[0].message.content
except Exception as e:
return f"Error generating response: {str(e)}"
class KnowledgeManager:
def __init__(self, api_key: str):
self.knowledge = PersistentKnowledge(api_key)
async def process_interaction(self,
user_id: str,
message: str,
categories: List[str] = None) -> str:
"""Process user interaction and maintain knowledge"""
# Save user input
self.knowledge.save_knowledge(
user_id,
"conversations",
{"role": "user", "message": message}
)
# Get AI response with context
response = await self.knowledge.get_ai_response(
user_id,
message,
categories
)
# Save AI response
self.knowledge.save_knowledge(
user_id,
"conversations",
{"role": "assistant", "message": response}
)
return response
# Example usage
async def main():
manager = KnowledgeManager("your-api-key")
# First interaction
response1 = await manager.process_interaction(
"user123",
"What's the best way to learn Python?",
["conversations", "preferences"]
)
print("Response 1:", response1)
# Save user preference
manager.knowledge.save_knowledge(
"user123",
"preferences",
{"learning_style": "hands-on", "preferred_language": "Python"}
)
# Later interaction using stored knowledge
response2 = await manager.process_interaction(
"user123",
"Can you suggest some projects based on my learning style?",
["conversations", "preferences"]
)
print("Response 2:", response2)
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Code Breakdown:
- PersistentKnowledge Class:
- Handles token counting using tiktoken for context management
- Implements save_knowledge() for storing categorized information
- Provides retrieve_knowledge() with token limits for context retrieval
- Manages AI interactions through get_ai_response()
- KnowledgeManager Class:
- Provides high-level interface for knowledge management
- Processes user interactions and maintains conversation history
- Handles saving both user inputs and AI responses
- Key Features:
- Categorized Knowledge Storage: Organizes information by user and category
- Token Management: Ensures context stays within model limits
- Metadata Tracking: Includes timestamps and categories for all stored data
- Error Handling: Robust error management for file operations and API calls
- Implementation Benefits:
- Scalable: Handles multiple users and knowledge categories
- Efficient: Uses token counting to optimize context usage
- Flexible: Supports various knowledge types and categories
- Maintainable: Well-structured code with clear separation of concerns
This implementation provides a solid foundation for AI applications requiring persistent knowledge across conversations. It excels at maintaining user preferences, conversation histories, and other ongoing data while efficiently managing token limits.
Conversation Summarization
By maintaining detailed conversation histories, the system can generate comprehensive summaries that capture key points, decisions, and action items. These summaries serve multiple critical functions:
- Context Refreshing: When participants begin new conversation sessions, the system provides concise yet comprehensive summaries of previous discussions. These summaries serve as efficient briefings that:
- Quickly orient participants on key discussion points
- Highlight important decisions and outcomes
- Identify ongoing action items
- Refresh memory on critical context
This eliminates the time-consuming process of reviewing extensive conversation logs and ensures all participants can immediately engage productively in the current discussion with full context awareness.
- Progress Tracking: Regular summaries serve as a comprehensive tracking mechanism for ongoing discussions and projects. By maintaining detailed records of project evolution, teams can:
- Monitor Development Phases
- Track progression from initial concepts to implementation
- Document iterative improvements and refinements
- Record key turning points in project direction
- Analyze Decision History
- Capture the context behind important choices
- Document alternative options considered
- Track outcomes of implemented decisions
- Identify Project Trends
- Spot recurring challenges or bottlenecks
- Recognize successful patterns to replicate
- Monitor velocity and momentum
- Facilitate Team Alignment
- Maintain shared understanding of progress
- Enable data-driven course corrections
- Support informed resource allocation
- Knowledge Extraction: The system employs advanced parsing techniques to identify and extract critical information from conversations, including:
- Key decisions and their rationale
- Strategic choices made during discussions
- Supporting evidence and justification
- Alternative options considered
- Action items and their owners
- Specific tasks assigned to team members
- Clear responsibility assignments
- Follow-up requirements
- Important deadlines and milestones
- Project timeline markers
- Critical delivery dates
- Review and checkpoint schedules
- Unresolved questions or concerns
- Open technical issues
- Pending decisions
- Areas needing clarification
- Agreements and commitments made
- Formal decisions reached
- Resource allocation agreements
- Timeline commitments
- Report Generation: Summaries can be automatically compiled into various types of reports:
- Executive briefings
- High-level overviews for stakeholders
- Key decisions and strategic implications
- Resource allocation summaries
- Meeting minutes
- Detailed discussion points and outcomes
- Action items and assignees
- Timeline commitments made
- Progress updates
- Milestone achievements and delays
- Current blockers and challenges
- Next steps and priorities
- Project status reports
- Overall project health indicators
- Resource utilization metrics
- Risk assessments and mitigation strategies
The ability to condense and recall relevant information not only makes conversations more efficient and focused but also ensures that critical information is never lost and can be easily accessed when needed. This systematic approach to conversation summarization helps maintain clarity and continuity across long-term interactions, especially in complex projects or ongoing discussions involving multiple participants.
Example: Implementing Conversation Summarization
Here's a practical implementation of a conversation summarizer using OpenAI's API:
import openai
import json
from typing import List, Dict
from datetime import datetime
class ConversationSummarizer:
def __init__(self, api_key: str):
self.api_key = api_key
openai.api_key = api_key
def create_summary_prompt(self, messages: List[Dict]) -> str:
"""Create a prompt for summarization from messages"""
conversation = "\n".join([
f"{msg['role'].title()}: {msg['content']}"
for msg in messages
])
return f"""Please provide a concise summary of the following conversation,
highlighting key points, decisions, and action items:
{conversation}
Summary should include:
1. Main topics discussed
2. Key decisions made
3. Action items and owners
4. Unresolved questions
"""
async def generate_summary(self, messages: List[Dict]) -> Dict:
"""Generate a structured summary of the conversation"""
try:
response = await openai.ChatCompletion.acreate(
model="gpt-4o",
messages=[{
"role": "user",
"content": self.create_summary_prompt(messages)
}],
temperature=0.7,
max_tokens=500
)
summary = response.choices[0].message.content
return {
"timestamp": datetime.now().isoformat(),
"message_count": len(messages),
"summary": summary
}
except Exception as e:
return {
"error": f"Failed to generate summary: {str(e)}",
"timestamp": datetime.now().isoformat()
}
def save_summary(self, summary: Dict, filename: str = "summaries.json"):
"""Save summary to JSON file"""
try:
with open(filename, "r") as f:
summaries = json.load(f)
except FileNotFoundError:
summaries = []
summaries.append(summary)
with open(filename, "w") as f:
json.dump(summaries, f, indent=2)
# Example usage
async def main():
summarizer = ConversationSummarizer("your-api-key")
# Sample conversation
messages = [
{"role": "user", "content": "Let's discuss the new feature implementation."},
{"role": "assistant", "content": "Sure! What specific aspects would you like to focus on?"},
{"role": "user", "content": "We need to implement user authentication by next week."},
{"role": "assistant", "content": "I understand. Let's break down the requirements and timeline."}
]
# Generate and save summary
summary = await summarizer.generate_summary(messages)
summarizer.save_summary(summary)
print("Summary:", summary["summary"])
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Code Breakdown:
- Class Structure:
- ConversationSummarizer class handles all summarization operations
- Initialization with API key setup
- Methods for prompt creation, summary generation, and storage
- Key Features:
- Structured prompt generation for consistent summaries
- Async API calls for better performance
- Error handling and logging
- Persistent storage of summaries
- Implementation Benefits:
- Scalable: Handles conversations of varying lengths
- Structured Output: Organized summaries with key information
- Historical Tracking: Maintains summary history
- Error Resilient: Robust error handling and logging
This example provides a reliable way to generate and maintain conversation summaries, making it easier to track discussion progress and key decisions over time.
7.3.2 Core Architecture
To create an effective memory system for AI interactions, three essential components work together in a coordinated manner to maintain context and ensure conversational continuity. These components form a sophisticated architecture that enables AI systems to access, process, and utilize historical information effectively during ongoing conversations.
The storage component preserves conversation history, the retrieval mechanism intelligently fetches relevant past interactions, and the injection system seamlessly incorporates this historical context into current conversations.
Together, these three pillars create a robust foundation that enables AI systems to maintain meaningful, context-aware dialogues across multiple interactions:
Storage
The foundation of memory management where all interactions are systematically saved and preserved for future use. This critical component serves as the backbone of any AI conversation system:
- Can utilize various storage solutions like JSON files, SQL databases, or cloud storage
- JSON files offer simplicity and portability for smaller applications
- SQL databases provide robust querying and indexing for larger datasets
- Cloud storage enables scalable, distributed access across multiple services
- Should include metadata like timestamps and user identifiers
- Timestamps enable chronological tracking and time-based filtering
- User IDs maintain conversation threads and personalization
- Additional metadata can track conversation topics and contexts
- Must be organized for efficient retrieval and scaling
- Implement proper indexing for quick access to relevant data
- Use data partitioning for improved performance
- Consider compression strategies for long-term storage
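To make these storage ideas concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and helper functions are illustrative assumptions for this sketch, not part of the JSON-based implementation shown later in this section:

```python
import sqlite3
from datetime import datetime, timezone

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a message table with metadata and an index for fast lookups."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL,
            timestamp TEXT NOT NULL
        )
    """)
    # An index on (user_id, timestamp) supports per-user chronological queries
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_user_time ON messages (user_id, timestamp)"
    )
    return conn

def save_message(conn: sqlite3.Connection, user_id: str, role: str, content: str) -> None:
    """Store a message together with its user ID and timestamp metadata."""
    conn.execute(
        "INSERT INTO messages (user_id, role, content, timestamp) VALUES (?, ?, ?, ?)",
        (user_id, role, content, datetime.now(timezone.utc).isoformat())
    )
    conn.commit()

def load_messages(conn: sqlite3.Connection, user_id: str, limit: int = 5):
    """Return the user's most recent messages in chronological order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE user_id = ? "
        "ORDER BY timestamp DESC, id DESC LIMIT ?",
        (user_id, limit)
    ).fetchall()
    return [{"role": r, "content": c} for r, c in reversed(rows)]
```

Using an in-memory database keeps the sketch self-contained; pointing `init_store` at a file path gives the same persistence across sessions as the JSON approach, with querying and indexing included.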
Retrieval
The intelligent system for accessing and filtering relevant past conversations serves as a crucial component in managing conversation history:
- Implements search algorithms to find context-appropriate historical data
- Uses semantic search to match similar topics and themes
- Employs fuzzy matching for flexible text comparison
- Indexes conversations for quick retrieval
- Uses parameters like recency, relevance, and conversation thread
- Prioritizes recent interactions for immediate context
- Weighs relevance based on topic similarity scores
- Maintains thread continuity by tracking conversation flows
- Manages token limits by selecting the most important context
- Implements smart truncation strategies
- Prioritizes key information while removing redundant content
- Dynamically adjusts context window based on model limitations
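A simple way to see these retrieval parameters working together is a scoring function that blends recency with relevance. The keyword-overlap relevance below is a deliberately crude stand-in for real semantic search, and the function names and weights are illustrative assumptions:

```python
from typing import Dict, List

def score_message(message: Dict, query: str, position: int, total: int,
                  recency_weight: float = 0.5) -> float:
    """Blend a recency score with a crude relevance score.

    Keyword overlap stands in for the semantic similarity a production
    system would compute with embeddings.
    """
    recency = (position + 1) / total  # later messages score higher
    query_words = set(query.lower().split())
    msg_words = set(message["content"].lower().split())
    overlap = len(query_words & msg_words) / max(len(query_words), 1)
    return recency_weight * recency + (1 - recency_weight) * overlap

def select_context(history: List[Dict], query: str, max_messages: int = 3) -> List[Dict]:
    """Pick the top-scoring messages, then restore chronological order."""
    scored = [(score_message(m, query, i, len(history)), i, m)
              for i, m in enumerate(history)]
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:max_messages]
    return [m for _, _, m in sorted(top, key=lambda t: t[1])]
```

Restoring chronological order after selection matters: the model receives a coherent thread rather than messages sorted by score.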
Injection
The process of seamlessly incorporating historical context requires careful handling to maintain conversation coherence:
- Strategically places retrieved messages into the current conversation flow
- Determines optimal insertion points for historical context
- Filters and prioritizes relevant historical information
- Balances new and historical content for natural flow
- Maintains proper message ordering and relationships
- Preserves chronological sequence of interactions
- Respects conversation threading and reply chains
- Links related topics and discussions appropriately
- Ensures smooth context integration without disrupting the conversation
- Avoids abrupt context switches or information overload
- Uses natural transition phrases and references
- Maintains consistent tone and conversation style
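The injection ideas above can be sketched as a small helper that places retrieved history between the system prompt and the current query. The transition sentence appended to the system prompt is one possible approach to smooth integration, not a required convention:

```python
from typing import Dict, List

def inject_context(system_prompt: str,
                   retrieved: List[Dict],
                   current_input: str) -> List[Dict]:
    """Place retrieved history between the system prompt and the new query.

    Keeping retrieved messages in their original chronological order
    preserves threading, and a short transition line in the system prompt
    tells the model why older messages are present.
    """
    messages = [{
        "role": "system",
        "content": system_prompt +
                   " Earlier messages from this user are included for context."
    }]
    messages.extend(retrieved)  # already in chronological order
    messages.append({"role": "user", "content": current_input})
    return messages
```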
7.3.3 How to Store and Retrieve JSON-based Message Histories in Python
In this section, we'll explore a practical implementation that demonstrates how to create a persistent memory system using JSON files. This approach offers a straightforward way to maintain conversation context across multiple sessions while remaining scalable and easy to maintain.
Our implementation will focus on three key aspects: efficiently storing message histories in JSON format, retrieving relevant conversation context when needed, and managing the data structure to ensure optimal performance. This solution is particularly useful for developers building chatbots, virtual assistants, or any application requiring persistent conversation history.
Step 1: Build the Memory Manager
This module handles storing user interactions in JSON format and retrieving only the relevant ones.
import os
import json
from typing import List, Dict
MEMORY_DIR = "user_memory"
os.makedirs(MEMORY_DIR, exist_ok=True)
# Constants
MAX_HISTORY_MESSAGES = 5 # Truncate history to last 5 messages to manage tokens
def get_memory_path(user_id: str) -> str:
return os.path.join(MEMORY_DIR, f"{user_id}.json")
def load_history(user_id: str) -> List[Dict[str, str]]:
path = get_memory_path(user_id)
if not os.path.exists(path):
return []
try:
with open(path, "r", encoding="utf-8") as f:
return json.load(f)
except json.JSONDecodeError:
return []
def store_interaction(user_id: str, role: str, content: str) -> None:
message = {"role": role, "content": content}
path = get_memory_path(user_id)
history = load_history(user_id)
history.append(message)
with open(path, "w", encoding="utf-8") as f:
json.dump(history, f, indent=2)
def get_recent_history(user_id: str, limit: int = MAX_HISTORY_MESSAGES) -> List[Dict[str, str]]:
history = load_history(user_id)
return history[-limit:]
Let's break down this code:
1. Initial Setup
- Creates a "user_memory" directory to store conversation histories
- Sets a maximum limit of 5 messages for history management
2. Core Functions
- get_memory_path(user_id): Creates a unique JSON file path for each user
- load_history(user_id):
- Attempts to read the user's conversation history
- Returns an empty list if file doesn't exist or is corrupted
- store_interaction(user_id, role, content):
- Saves new messages to the user's history file
- Appends the message to existing history
- Stores in JSON format with proper indentation
- get_recent_history(user_id, limit):
- Retrieves the most recent messages
- Respects the MAX_HISTORY_MESSAGES limit (5 messages)
3. Key Features
- Persistent storage: Each user's conversations are saved in separate JSON files
- Scalability: System can handle multiple users with individual files
- Controlled context: Allows specific control over how much history to maintain
- Debug-friendly: JSON format makes it easy to inspect stored conversations
Step 2: Create the Chat Engine Using OpenAI API
Now let’s integrate this memory system with the OpenAI API. We’ll load previous messages, add the new prompt, query the model, and save the response.
import openai
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
SYSTEM_PROMPT = {
"role": "system",
"content": "You are a helpful assistant that remembers the user's previous messages."
}
def continue_conversation(user_id: str, user_input: str) -> str:
# Validate input
if not user_input.strip():
return "Input cannot be empty."
# Load memory and prepare messages
recent_history = get_recent_history(user_id)
messages = [SYSTEM_PROMPT] + recent_history + [{"role": "user", "content": user_input}]
# Call OpenAI API
try:
response = openai.ChatCompletion.create(
model="gpt-4o",
messages=messages,
max_tokens=300,
temperature=0.7
)
assistant_reply = response["choices"][0]["message"]["content"]
# Store both messages
store_interaction(user_id, "user", user_input)
store_interaction(user_id, "assistant", assistant_reply)
return assistant_reply
except Exception as e:
return f"Something went wrong: {e}"
Let’s break down this code:
1. Initial Setup
- Imports required libraries (openai and dotenv)
- Loads environment variables and sets up OpenAI API key
- Defines a system prompt that establishes the assistant's role
2. Main Function: continue_conversation
- Takes user_id and user_input as parameters
- Input validation to check for empty messages
- Loads conversation history using get_recent_history (defined in previous section)
- Constructs messages array combining:
- System prompt
- Recent conversation history
- Current user input
3. API Interaction
- Makes API call to OpenAI with parameters:
- Uses "gpt-4o" model
- Sets max_tokens to 300
- Uses temperature of 0.7 for balanced creativity
- Extracts assistant's reply from the response
4. Memory Management
- Stores both the user's input and assistant's reply using store_interaction
- Handles errors gracefully with try-except block
This portion creates a stateful conversation system that maintains context across multiple interactions while managing the conversation flow efficiently.
Step 3: Test the Conversation Memory
if __name__ == "__main__":
user_id = "user_456"
print("User: What are the best Python libraries for data science?")
reply1 = continue_conversation(user_id, "What are the best Python libraries for data science?")
print("Assistant:", reply1)
print("\nUser: Could you remind me which ones you mentioned?")
reply2 = continue_conversation(user_id, "Could you remind me which ones you mentioned?")
print("Assistant:", reply2)
Let's break down this code example for testing conversation memory:
1. Entry Point Check
- The code uses the standard Python if __name__ == "__main__": idiom to ensure this code only runs when the file is executed directly
2. User Setup
- Creates a test user with ID "user_456"
3. Test Conversation Flow
- Demonstrates a two-turn conversation where:
- First turn: Asks about Python libraries for data science
- Second turn: Asks for a reminder of previously mentioned libraries, testing the memory system
4. Implementation Details
- Each interaction uses the continue_conversation() function to:
- Process the user input
- Generate and store the response
- Print both the user input and assistant's response
This test code effectively demonstrates how the system maintains context between multiple interactions, allowing the assistant to reference previous responses when answering follow-up questions.
Benefits of This Approach
- Persistent: All conversations are stored locally by user, ensuring that no interaction history is lost. This means your application can maintain context across multiple sessions, even if the server restarts or the application closes.
- Scalable (to a point): By storing each user's conversation history in their own dedicated JSON file, the system can handle multiple users efficiently. This approach works well for small to medium-sized applications, though for very large deployments you might want to consider a database solution.
- Controllable context: The system gives you complete control over how much conversation history to include in each interaction. You can adjust the memory window size, filter by relevance, or implement custom logic for selecting which previous messages to include in the context.
- Readable: The JSON file format makes it simple to inspect, debug, and modify stored conversations. This is invaluable during development and maintenance, as you can easily view the conversation history in any text editor and validate the data structure.
Optional Enhancements
- Summarization: Instead of storing every message verbatim, implement periodic summarization of conversation history. This technique involves automatically generating concise summaries of longer conversation segments, which helps:
- Reduce token usage in API calls
- Maintain essential context while removing redundant details
- Create a more efficient memory structure
For example, multiple messages about a specific topic could be condensed into a single summary statement.
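A sketch of this compaction idea follows. To keep the example self-contained, the summarization step is passed in as a function; in a real system it would call the chat completion API with a summarization prompt:

```python
from typing import Callable, Dict, List

def compact_history(history: List[Dict],
                    summarize: Callable[[List[Dict]], str],
                    keep_recent: int = 4) -> List[Dict]:
    """Replace all but the most recent messages with one summary entry.

    `summarize` is injected so the compaction logic stays testable;
    in production it would call the model to summarize the older turns.
    """
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary_msg = {
        "role": "system",
        "content": f"Summary of earlier conversation: {summarize(older)}"
    }
    return [summary_msg] + recent
```

The result is a history whose token cost stays roughly constant: one summary message plus a fixed window of recent turns.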
- Vector Search (Advanced): Transform messages into numerical vectors using embedding models, enabling sophisticated retrieval based on semantic meaning. This approach offers several advantages:
- Discover contextually relevant historical messages even if they use different words
- Prioritize messages based on their relevance to the current conversation
- Enable fast similarity searches across large conversation histories
This is particularly useful for long-running conversations or when specific context needs to be recalled.
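As a rough sketch of the retrieval side, the helper below ranks stored messages by cosine similarity of precomputed embedding vectors. In practice the vectors would come from an embeddings endpoint; the toy three-dimensional vectors used here are purely illustrative:

```python
import math
from typing import Dict, List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a)) *
            math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0

def most_similar(query_vec: Sequence[float],
                 stored: List[Dict], top_k: int = 2) -> List[Dict]:
    """Rank stored messages by similarity of their embedding vectors."""
    ranked = sorted(stored,
                    key=lambda m: cosine_similarity(query_vec, m["embedding"]),
                    reverse=True)
    return ranked[:top_k]
```

Because similarity is computed on meaning-bearing vectors rather than raw text, a query about "felines" could still surface a stored message about "cats".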
- Token Budgeting: Implement smart token management strategies to optimize context window usage. This includes:
- Setting dynamic limits based on conversation importance
- Implementing intelligent pruning of older, less relevant messages
- Maintaining a balance between recent context and important historical information
This ensures you stay within API token limits while preserving the most valuable conversation context.
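One possible token-budgeting sketch keeps the newest messages that fit within a budget. The four-characters-per-token estimate is a rough heuristic assumed for this example; a production system would count tokens with a real tokenizer such as tiktoken:

```python
from typing import Dict, List

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fit_to_budget(history: List[Dict], budget: int) -> List[Dict]:
    """Keep the newest messages whose combined cost fits the token budget."""
    kept: List[Dict] = []
    used = 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```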
- Keep the system prompt consistent across interactions
- Maintain identical prompt wording and instructions throughout the entire conversation lifecycle
- Use version control to track any system prompt changes across deployments
- Prevents confusion and contradictory responses by maintaining consistent context
- Ensures the AI maintains a reliable personality and behavioral pattern throughout interactions
- Don't overload the context—store everything, but retrieve selectively
- Implement a comprehensive storage system that maintains complete conversation histories in your database
- Develop intelligent retrieval algorithms that prioritize relevant context for API calls
- Use semantic search or embedding-based similarity to find pertinent historical messages
- Balance token usage by implementing smart pruning strategies for older messages
- Label stored messages clearly (role, content, timestamp) for future filtering or summarization
- Role: Carefully identify and tag message sources (system, user, or assistant) to maintain clear conversation flow and enable role-based filtering
- Content: Implement consistent formatting standards for message content, including handling of special characters and maintaining data integrity
- Timestamp: Add precise temporal metadata to enable sophisticated time-based operations like conversation segmentation and contextual relevance scoring
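These three labels can be captured in a small helper like the following sketch. The function names are illustrative, and the timestamp label is what enables the time-based filtering described above:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

def make_record(role: str, content: str) -> Dict:
    """Build a stored message carrying the three labels: role, content, timestamp."""
    return {
        "role": role,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def messages_since(history: List[Dict], cutoff: datetime) -> List[Dict]:
    """Time-based filtering made possible by the timestamp label."""
    return [m for m in history
            if datetime.fromisoformat(m["timestamp"]) >= cutoff]
```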
Long-term memory isn't native to OpenAI's chat models (yet), but with thoughtful engineering, you can create sophisticated memory systems. This limitation means that by default, the model treats each interaction as isolated, without any knowledge of previous conversations. However, developers can implement custom solutions to overcome this limitation.
Here's a detailed explanation of how it works: While the model itself doesn't maintain memories between conversations, your application acts as a memory manager. It stores each interaction in a structured format, typically using databases or file systems. This stored data includes not just the conversation content, but also metadata like timestamps, user information, and conversation topics. By carefully managing these stored conversations and selectively retrieving relevant pieces, you create an illusion of continuous memory - making the AI appear to "remember" previous interactions.
This memory system transforms your AI assistant in several powerful ways:
- Continuity: The assistant can reference past conversations and maintain context over extended periods, creating seamless interactions that build upon previous discussions. For example, if a user mentions their preference for Python programming in one conversation, the system can reference this in future interactions.
- Personality: Consistent response patterns and remembered preferences create a more distinct personality. This includes maintaining a consistent tone, remembering user preferences, and adapting communication styles based on past interactions.
- Understanding: By accessing historical context, responses become more informed and personalized. The system can recall specific details from previous conversations, making interactions feel more natural and contextually aware.
- Depth: The ability to build upon previous conversations enables more sophisticated interactions, allowing for complex problem-solving and long-term project support.
This approach scales remarkably well across different usage scenarios, from individual users to enterprise-level applications. Whether you're building a personal assistant for a single user or a system that serves thousands, the core principle remains the same: you're creating an intelligent memory layer that sits between the user and the API. This architectural approach provides several key capabilities:
- Grow: Continuously accumulate new interactions and learnings, building a rich history of user interactions and preferences over time. This growing knowledge base becomes increasingly valuable for personalizing responses.
- Summarize: Condense lengthy conversation histories into manageable contexts, using advanced techniques like semantic clustering and importance scoring to maintain the most relevant information.
- Adapt: Adjust its retrieval strategies based on conversation patterns, learning which types of historical context are most valuable for different types of interactions.
All while leveraging the same underlying OpenAI API for the actual language processing. This combination of structured memory management and powerful language processing creates a system that can maintain context and personality across multiple conversations while staying within the technical constraints of the API.
Personalized Responses
The system learns and adapts to individual users by maintaining a detailed profile of their interactions and preferences over time. This personalization happens on multiple levels:
- Communication Style: The system tracks how users express themselves by analyzing multiple aspects of their communication patterns:
- Formality level: Whether they use casual language ("hey there!") or formal address ("Dear Sir/Madam")
- Humor usage: Their tendency to use jokes, emojis, or playful language
- Conversation pace: If they prefer quick exchanges or detailed, lengthy discussions
- Vocabulary choices: Technical vs. simplified language
- Cultural references: Professional, academic, or pop culture references
For example, if a user consistently uses informal language like "hey" and "thanks!" with emojis, the system adapts by responding in a friendly, casual tone. Conversely, when interacting with business users who maintain formal language and professional terms, the system automatically adjusts to use appropriate business etiquette and industry-standard terminology.
This adaptive communication ensures more natural and effective interactions by matching each user's unique communication style and preferences.
- Technical Proficiency: By analyzing past interactions, the system gauges users' expertise levels in different domains. This allows it to automatically adjust its explanations based on demonstrated knowledge.
For instance, when discussing programming, the system might use advanced terminology like "polymorphism" and "dependency injection" with experienced developers, while offering simpler explanations using real-world analogies for beginners. The system continuously refines this assessment through ongoing interactions - if a user demonstrates increased understanding over time, the technical depth of explanations adjusts accordingly. This adaptive approach ensures that experts aren't slowed down by basic explanations while newcomers aren't overwhelmed by complex technical details.
- Historical Context: The system maintains comprehensive records of previous discussions, projects, and decisions, enabling it to reference past conversations with precision and relevance. This historical tracking operates on multiple levels:
- Conversation Threading: The system can follow the progression of specific topics across multiple sessions, understanding how discussions evolve and build upon each other.
- Project Milestones: Important decisions, agreements, and project updates are recorded and can be referenced to maintain consistency in future discussions.
- User Preferences Evolution: The system tracks how user preferences and requirements change over time, adapting its responses accordingly.
- Contextual References: When addressing current topics, the system can intelligently reference related past discussions to provide more informed and nuanced responses.
This sophisticated context management creates a seamless conversational experience where users feel understood and valued, as the system demonstrates awareness of their history and ongoing needs. For example, if a user previously discussed challenges with a specific programming framework, the system can reference those earlier conversations when providing new solutions or updates.
- Customization Preferences: The system maintains and applies detailed user preferences across sessions, including:
- Preferred language and regional variations
- Language selection (e.g., English, Spanish, Mandarin)
- Regional dialects and localizations
- Currency and measurement units
- Format preferences (bullet points vs. paragraphs)
- Document structure preferences (hierarchical vs. flat)
- Visual organization (lists, tables, or flowing text)
- Code formatting conventions when applicable
- Level of detail desired in responses
- Brief summaries vs. comprehensive explanations
- Technical depth of content
- Inclusion of examples and analogies
- Specific terminology or naming conventions
- Industry-specific vocabulary
- Preferred technical frameworks or methodologies
- Company-specific terminology
- Time zones and working hours
- Meeting scheduling preferences
- Notification timing preferences
- Availability windows for synchronous communication
This comprehensive approach to personalization helps create a more natural, efficient, and engaging interaction that feels tailored to each individual user's needs and preferences.
Example: Implementing Personalized Responses
Here's a comprehensive implementation of a personalization system that adapts to user communication styles:
import json
import os
from datetime import datetime
from typing import Dict, List, Optional

import openai

class UserProfile:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.communication_style = {
            "formality_level": 0.5,  # 0 = casual, 1 = formal
            "technical_level": 0.5,  # 0 = beginner, 1 = expert
            "verbosity": 0.5,        # 0 = concise, 1 = detailed
            "emoji_usage": False
        }
        self.preferences = {
            "language": "en",
            "timezone": "UTC",
            "topics_of_interest": []
        }
        self.interaction_history = []

class PersonalizedAIAssistant:
    def __init__(self, api_key: str):
        self.api_key = api_key
        openai.api_key = api_key
        self.profiles_dir = "user_profiles"
        os.makedirs(self.profiles_dir, exist_ok=True)
        self.users: Dict[str, UserProfile] = {}

    def _get_profile_path(self, user_id: str) -> str:
        return os.path.join(self.profiles_dir, f"{user_id}.json")

    def load_user_profile(self, user_id: str) -> UserProfile:
        if user_id in self.users:
            return self.users[user_id]

        profile_path = self._get_profile_path(user_id)
        if os.path.exists(profile_path):
            with open(profile_path, 'r') as f:
                data = json.load(f)
            profile = UserProfile(user_id)
            profile.communication_style = data.get('communication_style', profile.communication_style)
            profile.preferences = data.get('preferences', profile.preferences)
            profile.interaction_history = data.get('interaction_history', [])
        else:
            profile = UserProfile(user_id)

        self.users[user_id] = profile
        return profile

    def save_user_profile(self, profile: UserProfile):
        data = {
            'communication_style': profile.communication_style,
            'preferences': profile.preferences,
            'interaction_history': profile.interaction_history
        }
        with open(self._get_profile_path(profile.user_id), 'w') as f:
            json.dump(data, f, indent=2)

    def analyze_message(self, message: str) -> dict:
        """Analyze user message to update communication style metrics."""
        return {
            "formality_level": 0.8 if any(word in message.lower() for word in
                ['please', 'thank you', 'sir', 'madam']) else 0.2,
            "technical_level": 0.8 if any(word in message.lower() for word in
                ['api', 'function', 'implementation', 'code']) else 0.3,
            "emoji_usage": '😊' in message or '👍' in message
        }

    def generate_system_prompt(self, profile: UserProfile) -> str:
        """Create personalized system prompt based on user profile."""
        style = "formal" if profile.communication_style["formality_level"] > 0.5 else "casual"
        tech_level = "technical" if profile.communication_style["technical_level"] > 0.5 else "simple"
        emoji_rule = ("feel free to use emojis" if profile.communication_style["emoji_usage"]
                      else "avoid using emojis")
        verbosity = "in detail" if profile.communication_style["verbosity"] > 0.5 else "concisely"
        return (f"You are a helpful assistant that communicates in a {style} style. "
                f"Use {tech_level} language and {emoji_rule}. "
                f"Communicate {verbosity}.")

    async def get_response(self, user_id: str, message: str) -> str:
        profile = self.load_user_profile(user_id)

        # Analyze and update the user's communication style
        analysis = self.analyze_message(message)
        profile.communication_style.update(analysis)

        # Prepare conversation context
        messages = [
            {"role": "system", "content": self.generate_system_prompt(profile)},
            {"role": "user", "content": message}
        ]

        # Add relevant history if available, converted to chat-message format
        history_messages = []
        for past in profile.interaction_history[-3:]:  # Last 3 interactions
            history_messages.append({"role": "user", "content": past["user_message"]})
            history_messages.append({"role": "assistant", "content": past["assistant_response"]})
        messages[1:1] = history_messages

        # Get AI response
        response = await openai.ChatCompletion.acreate(
            model="gpt-4o",
            messages=messages,
            temperature=0.7,
            max_tokens=150
        )
        assistant_reply = response.choices[0].message.content

        # Store interaction
        profile.interaction_history.append({
            "timestamp": datetime.utcnow().isoformat(),
            "user_message": message,
            "assistant_response": assistant_reply
        })

        # Save updated profile
        self.save_user_profile(profile)
        return assistant_reply

# Usage example
async def main():
    assistant = PersonalizedAIAssistant("your-api-key-here")
    # Example interactions (get_response is a coroutine, so it must be awaited)
    for message in [
        "Hey there! Can you help me with Python? 😊",
        "Could you explain the technical implementation of APIs?",
        "Dear Sir, I require assistance with programming."
    ]:
        print(await assistant.get_response("user123", message))

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Code Breakdown:
- Class Structure:
  - UserProfile class maintains individual user information:
    - Communication style metrics (formality, technical level, etc.)
    - Personal preferences (language, timezone)
    - Interaction history
  - PersonalizedAIAssistant class handles the core functionality:
    - Profile management (loading/saving)
    - Message analysis
    - Response generation
- Key Features:
  - Persistent Storage: Profiles are saved as JSON files
  - Style Analysis: Examines messages for communication patterns
  - Dynamic Prompting: Generates customized system prompts
  - Context Management: Maintains conversation history
- Personalization Aspects:
  - Communication Style:
    - Formality level detection
    - Technical language adaptation
    - Emoji usage tracking
  - Response Adaptation:
    - Adjusts verbosity based on user preference
    - Maintains consistent style across interactions
    - Incorporates conversation history
This implementation demonstrates how to create an AI assistant that learns and adapts to each user's communication style while maintaining a persistent memory of interactions. The system continuously updates its understanding of user preferences and adjusts its responses accordingly.
Session Resumption
Users can return to conversations after breaks and have the AI understand the full context of previous discussions. This capability enables seamless conversation continuity, where the AI maintains awareness of prior interactions, user preferences, and established context. For example, if a user discusses a software bug on Monday and returns on Wednesday, the AI can recall the specific details of the bug, proposed solutions, and any attempted fixes without requiring the user to repeat information.
This feature is particularly valuable for complex tasks that span multiple sessions, like project planning or technical troubleshooting. During project planning, the AI can maintain records of previously agreed-upon milestones, resource allocations, and team responsibilities across multiple planning sessions. In technical troubleshooting scenarios, it can track the progression of debugging attempts, remember which solutions were already tried, and build upon previous diagnostic steps.
The AI can reference specific points from earlier conversations and maintain continuity across days or even weeks. This long-term context awareness enables the AI to make more informed suggestions, avoid redundant discussions, and provide more personalized assistance based on the user's historical interactions. For instance, if a user previously expressed a preference for certain programming frameworks or methodologies, the AI can incorporate these preferences into future recommendations without requiring explicit reminders.
Here's a practical implementation of session resumption:
from datetime import datetime
import json
import os
import openai
from typing import List, Dict, Optional

class SessionManager:
    def __init__(self, api_key: str):
        self.api_key = api_key
        openai.api_key = api_key
        self.sessions: Dict[str, Dict] = {}
        os.makedirs("sessions", exist_ok=True)  # Ensure the storage directory exists

    def save_session(self, user_id: str, session_data: dict):
        """Save session data to persistent storage."""
        with open(f"sessions/{user_id}.json", "w") as f:
            json.dump(session_data, f)

    def load_session(self, user_id: str) -> Optional[dict]:
        """Load session data from storage."""
        try:
            with open(f"sessions/{user_id}.json", "r") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

class ConversationManager:
    def __init__(self, session_manager: SessionManager):
        self.session_manager = session_manager
        self.current_context: List[Dict] = []

    def prepare_context(self, user_id: str, new_message: str) -> List[Dict]:
        """Prepare conversation context including session history."""
        # Load previous session if exists
        session = self.session_manager.load_session(user_id)

        # Initialize context with system message
        context = [{
            "role": "system",
            "content": "You are a helpful assistant with memory of past conversations."
        }]

        # Add relevant history from previous session
        if session and 'history' in session:
            # Add last 5 messages from previous session for context
            context.extend(session['history'][-5:])

        # Add new message
        context.append({
            "role": "user",
            "content": new_message
        })
        return context

    async def process_message(self, user_id: str, message: str) -> str:
        """Process new message with session context."""
        context = self.prepare_context(user_id, message)
        try:
            response = await openai.ChatCompletion.acreate(
                model="gpt-4o",
                messages=context,
                temperature=0.7,
                max_tokens=150
            )
            assistant_message = response.choices[0].message.content

            # Update session with the new interaction
            # (the system prompt is excluded so it isn't duplicated on reload)
            session_data = {
                'last_interaction': datetime.now().isoformat(),
                'history': [m for m in context if m["role"] != "system"] + [{
                    "role": "assistant",
                    "content": assistant_message
                }]
            }
            self.session_manager.save_session(user_id, session_data)
            return assistant_message
        except Exception as e:
            print(f"Error processing message: {e}")
            return "I apologize, but I encountered an error processing your message."

# Example usage
async def main():
    session_manager = SessionManager("your-api-key-here")
    conversation_manager = ConversationManager(session_manager)

    # First interaction
    response1 = await conversation_manager.process_message(
        "user123",
        "What's the weather like today?"
    )
    print("Response 1:", response1)

    # Later interaction (session resumption)
    response2 = await conversation_manager.process_message(
        "user123",
        "What did we discuss earlier?"
    )
    print("Response 2:", response2)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Code Breakdown:
- SessionManager Class:
  - Handles persistent storage of session data
  - Provides methods for saving and loading session information
  - Maintains user-specific session files
- ConversationManager Class:
  - Manages conversation context and history
  - Prepares context by combining previous session data with new messages
  - Handles interaction with the OpenAI API
- Key Features:
  - Asynchronous Processing: Uses async/await for efficient API calls
  - Context Management: Maintains relevant conversation history
  - Error Handling: Includes robust error management
  - Session Persistence: Saves conversations to disk for later retrieval
- Implementation Details:
  - Uses JSON for session storage
  - Limits context to last 5 messages for efficiency
  - Includes timestamp tracking for session management
  - Maintains conversation roles (system, user, assistant)
This example provides a robust foundation for managing multi-session conversations while maintaining context and user history. It's particularly useful for applications requiring persistent conversation memory across multiple interactions.
Persistent Knowledge
The system maintains a robust and comprehensive record of all significant information exchanged during conversations. This persistent knowledge architecture operates on multiple levels:
- Basic Information Management: The system captures and stores essential operational data in a structured manner. This includes comprehensive tracking of calendar entries such as meetings and appointments, with metadata like attendees and agendas. Project timelines are maintained with detailed milestone tracking, dependencies, and phase transitions.
The system records all deadlines systematically, from task-level due dates to major project deliverables. Regular updates are stored chronologically, including daily reports, status changes, and project modifications. This robust information architecture ensures that all scheduling and project-related data remains easily retrievable, supporting efficient project management and team coordination.
- User-Specific Data: The system maintains detailed profiles of individual users that encompass multiple aspects of their interactions:
  - Personal Preferences: Including preferred communication channels, response formats, and specific domain interests
  - Communication Styles: Tracking whether users prefer formal or casual language, technical or simplified explanations, and their typical response length preferences
  - Technical Expertise: Monitoring and adapting to users' demonstrated knowledge levels across different subjects and adjusting explanations accordingly
  - Historical Patterns: Recording timing of interactions, frequently discussed topics, and common questions or concerns
  - Language Patterns: Noting vocabulary usage, technical terminology familiarity, and preferred examples or analogies
  - Learning Progress: Tracking how users' understanding of various topics evolves over time
This comprehensive user profiling enables the system to deliver increasingly tailored responses that match each user's unique needs and preferences, creating a more effective and engaging interaction experience over time.
- Decision Recording: Critical decisions are systematically documented in a comprehensive manner that includes multiple key components:
  - Context: The full background situation, business environment, and constraints that framed the decision
  - Rationale: Detailed reasoning behind the choice, including:
    - Analysis of alternatives considered
    - Risk assessment and mitigation strategies
    - Expected outcomes and success metrics
  - Stakeholders: Complete documentation of:
    - Decision makers and their roles
    - Affected teams and departments
    - External parties involved
  - Implementation Plan:
    - Step-by-step execution strategy
    - Resource allocation details
    - Timeline and milestones
This systematic documentation process creates a detailed and auditable trail that enables teams to:
- Track the evolution of important decisions
- Understand the complete context of past choices
- Learn from previous experiences
- Make more informed decisions in the future
- Maintain accountability and transparency
- Task Management: The system implements a comprehensive task tracking system that monitors various aspects of project execution:
  - Assignment Tracking: Each task is linked to specific team members or departments responsible for its completion, ensuring clear ownership and responsibility
  - Timeline Management: Detailed due dates are maintained, including both final deadlines and intermediate milestones, allowing for better time management
  - Progress Monitoring: Regular status updates are recorded to track task progression, including completed work, current blockers, and remaining steps
  - Dependency Mapping: The system maintains a clear map of task dependencies, helping teams understand how delays or changes in one task might impact others
  - Resource Allocation: Tracks the distribution of work and resources across team members to prevent overload and ensure efficient project execution
- Project Details: The system maintains comprehensive documentation of technical aspects including:
  - Technical Specifications:
    - Detailed system architecture blueprints
    - Complete API documentation and endpoints
    - Database schemas and data models
    - Third-party integration specifications
  - Project Requirements:
    - Business requirements and objectives
    - Technical requirements and constraints
    - User stories and acceptance criteria
    - Scope definitions and boundaries
  - Challenges and Solutions:
    - Identified technical obstacles
    - Implemented workarounds
    - Performance optimization efforts
    - Security measures and updates
  - Implementation Records:
    - Code documentation and examples
    - Architecture decision records
    - Testing strategies and results
    - Deployment procedures
This comprehensive persistent knowledge base serves multiple purposes: it prevents information loss, enables accurate historical references, supports informed decision-making, and allows for seamless continuation of complex discussions across multiple sessions. The system can easily recall and contextualize information from past interactions, making each conversation more efficient and productive.
Here's a comprehensive implementation of Persistent Knowledge using OpenAI API:
from typing import Dict, List, Optional
import json
import openai
from datetime import datetime
import tiktoken

class PersistentKnowledge:
    def __init__(self, api_key: str):
        self.api_key = api_key
        openai.api_key = api_key
        self.encoding = tiktoken.encoding_for_model("gpt-4o")

    def count_tokens(self, text: str) -> int:
        """Count tokens in text using tiktoken"""
        return len(self.encoding.encode(text))

    def save_knowledge(self, user_id: str, category: str, data: Dict):
        """Save knowledge to persistent storage with categories"""
        filename = f"knowledge_{user_id}_{category}.json"
        timestamp = datetime.now().isoformat()

        data_with_metadata = {
            "timestamp": timestamp,
            "category": category,
            "content": data
        }

        try:
            with open(filename, "r") as f:
                existing_data = json.load(f)
        except FileNotFoundError:
            existing_data = []

        existing_data.append(data_with_metadata)
        with open(filename, "w") as f:
            json.dump(existing_data, f, indent=2)

    def retrieve_knowledge(self, user_id: str, category: str,
                           max_tokens: int = 2000) -> List[Dict]:
        """Retrieve knowledge with token limit"""
        filename = f"knowledge_{user_id}_{category}.json"
        try:
            with open(filename, "r") as f:
                all_data = json.load(f)
        except FileNotFoundError:
            return []

        # Retrieve most recent entries within token limit
        retrieved_data = []
        current_tokens = 0
        for entry in reversed(all_data):
            content_str = json.dumps(entry["content"])
            tokens = self.count_tokens(content_str)
            if current_tokens + tokens <= max_tokens:
                retrieved_data.append(entry)
                current_tokens += tokens
            else:
                break
        return list(reversed(retrieved_data))

    async def get_ai_response(self,
                              user_id: str,
                              current_input: str,
                              categories: Optional[List[str]] = None) -> str:
        """Generate AI response with context from persistent knowledge"""
        # Build context from stored knowledge
        context = []
        if categories:
            for category in categories:
                knowledge = self.retrieve_knowledge(user_id, category)
                if knowledge:
                    context.append(f"\nRelevant {category} knowledge:")
                    for entry in knowledge:
                        context.append(json.dumps(entry["content"]))

        # Prepare messages for API
        messages = [
            {
                "role": "system",
                "content": "You are an assistant with access to persistent knowledge. "
                           "Use this context to provide informed responses."
            },
            {
                "role": "user",
                "content": f"Context:\n{''.join(context)}\n\nCurrent query: {current_input}"
            }
        ]

        try:
            response = await openai.ChatCompletion.acreate(
                model="gpt-4o",
                messages=messages,
                temperature=0.7,
                max_tokens=500
            )
            return response.choices[0].message.content
        except Exception as e:
            return f"Error generating response: {str(e)}"

class KnowledgeManager:
    def __init__(self, api_key: str):
        self.knowledge = PersistentKnowledge(api_key)

    async def process_interaction(self,
                                  user_id: str,
                                  message: str,
                                  categories: Optional[List[str]] = None) -> str:
        """Process user interaction and maintain knowledge"""
        # Save user input
        self.knowledge.save_knowledge(
            user_id,
            "conversations",
            {"role": "user", "message": message}
        )

        # Get AI response with context
        response = await self.knowledge.get_ai_response(
            user_id,
            message,
            categories
        )

        # Save AI response
        self.knowledge.save_knowledge(
            user_id,
            "conversations",
            {"role": "assistant", "message": response}
        )
        return response

# Example usage
async def main():
    manager = KnowledgeManager("your-api-key")

    # First interaction
    response1 = await manager.process_interaction(
        "user123",
        "What's the best way to learn Python?",
        ["conversations", "preferences"]
    )
    print("Response 1:", response1)

    # Save user preference
    manager.knowledge.save_knowledge(
        "user123",
        "preferences",
        {"learning_style": "hands-on", "preferred_language": "Python"}
    )

    # Later interaction using stored knowledge
    response2 = await manager.process_interaction(
        "user123",
        "Can you suggest some projects based on my learning style?",
        ["conversations", "preferences"]
    )
    print("Response 2:", response2)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Code Breakdown:
- PersistentKnowledge Class:
  - Handles token counting using tiktoken for context management
  - Implements save_knowledge() for storing categorized information
  - Provides retrieve_knowledge() with token limits for context retrieval
  - Manages AI interactions through get_ai_response()
- KnowledgeManager Class:
  - Provides high-level interface for knowledge management
  - Processes user interactions and maintains conversation history
  - Handles saving both user inputs and AI responses
- Key Features:
  - Categorized Knowledge Storage: Organizes information by user and category
  - Token Management: Ensures context stays within model limits
  - Metadata Tracking: Includes timestamps and categories for all stored data
  - Error Handling: Robust error management for file operations and API calls
- Implementation Benefits:
  - Scalable: Handles multiple users and knowledge categories
  - Efficient: Uses token counting to optimize context usage
  - Flexible: Supports various knowledge types and categories
  - Maintainable: Well-structured code with clear separation of concerns
This implementation provides a solid foundation for AI applications requiring persistent knowledge across conversations. It excels at maintaining user preferences, conversation histories, and other ongoing data while efficiently managing token limits.
Conversation Summarization
By maintaining detailed conversation histories, the system can generate comprehensive summaries that capture key points, decisions, and action items. These summaries serve multiple critical functions:
- Context Refreshing: When participants begin new conversation sessions, the system provides concise yet comprehensive summaries of previous discussions. These summaries serve as efficient briefings that:
  - Quickly orient participants on key discussion points
  - Highlight important decisions and outcomes
  - Identify ongoing action items
  - Refresh memory on critical context

  This eliminates the time-consuming process of reviewing extensive conversation logs and ensures all participants can immediately engage productively in the current discussion with full context awareness.
- Progress Tracking: Regular summaries serve as a comprehensive tracking mechanism for ongoing discussions and projects. By maintaining detailed records of project evolution, teams can:
  - Monitor Development Phases
    - Track progression from initial concepts to implementation
    - Document iterative improvements and refinements
    - Record key turning points in project direction
  - Analyze Decision History
    - Capture the context behind important choices
    - Document alternative options considered
    - Track outcomes of implemented decisions
  - Identify Project Trends
    - Spot recurring challenges or bottlenecks
    - Recognize successful patterns to replicate
    - Monitor velocity and momentum
  - Facilitate Team Alignment
    - Maintain shared understanding of progress
    - Enable data-driven course corrections
    - Support informed resource allocation
- Knowledge Extraction: The system employs advanced parsing techniques to identify and extract critical information from conversations, including:
  - Key decisions and their rationale
    - Strategic choices made during discussions
    - Supporting evidence and justification
    - Alternative options considered
  - Action items and their owners
    - Specific tasks assigned to team members
    - Clear responsibility assignments
    - Follow-up requirements
  - Important deadlines and milestones
    - Project timeline markers
    - Critical delivery dates
    - Review and checkpoint schedules
  - Unresolved questions or concerns
    - Open technical issues
    - Pending decisions
    - Areas needing clarification
  - Agreements and commitments made
    - Formal decisions reached
    - Resource allocation agreements
    - Timeline commitments
- Report Generation: Summaries can be automatically compiled into various types of reports:
  - Executive briefings
    - High-level overviews for stakeholders
    - Key decisions and strategic implications
    - Resource allocation summaries
  - Meeting minutes
    - Detailed discussion points and outcomes
    - Action items and assignees
    - Timeline commitments made
  - Progress updates
    - Milestone achievements and delays
    - Current blockers and challenges
    - Next steps and priorities
  - Project status reports
    - Overall project health indicators
    - Resource utilization metrics
    - Risk assessments and mitigation strategies
The ability to condense and recall relevant information not only makes conversations more efficient and focused but also ensures that critical information is never lost and can be easily accessed when needed. This systematic approach to conversation summarization helps maintain clarity and continuity across long-term interactions, especially in complex projects or ongoing discussions involving multiple participants.
Example: Implementing Conversation Summarization
Here's a practical implementation of a conversation summarizer using OpenAI's API:
import json
import openai
from typing import List, Dict
from datetime import datetime

class ConversationSummarizer:
    def __init__(self, api_key: str):
        self.api_key = api_key
        openai.api_key = api_key

    def create_summary_prompt(self, messages: List[Dict]) -> str:
        """Create a prompt for summarization from messages"""
        conversation = "\n".join([
            f"{msg['role'].title()}: {msg['content']}"
            for msg in messages
        ])
        return f"""Please provide a concise summary of the following conversation,
highlighting key points, decisions, and action items:

{conversation}

Summary should include:
1. Main topics discussed
2. Key decisions made
3. Action items and owners
4. Unresolved questions
"""

    async def generate_summary(self, messages: List[Dict]) -> Dict:
        """Generate a structured summary of the conversation"""
        try:
            response = await openai.ChatCompletion.acreate(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": self.create_summary_prompt(messages)
                }],
                temperature=0.7,
                max_tokens=500
            )
            summary = response.choices[0].message.content
            return {
                "timestamp": datetime.now().isoformat(),
                "message_count": len(messages),
                "summary": summary
            }
        except Exception as e:
            return {
                "error": f"Failed to generate summary: {str(e)}",
                "timestamp": datetime.now().isoformat()
            }

    def save_summary(self, summary: Dict, filename: str = "summaries.json"):
        """Save summary to JSON file"""
        try:
            with open(filename, "r") as f:
                summaries = json.load(f)
        except FileNotFoundError:
            summaries = []

        summaries.append(summary)
        with open(filename, "w") as f:
            json.dump(summaries, f, indent=2)

# Example usage
async def main():
    summarizer = ConversationSummarizer("your-api-key")

    # Sample conversation
    messages = [
        {"role": "user", "content": "Let's discuss the new feature implementation."},
        {"role": "assistant", "content": "Sure! What specific aspects would you like to focus on?"},
        {"role": "user", "content": "We need to implement user authentication by next week."},
        {"role": "assistant", "content": "I understand. Let's break down the requirements and timeline."}
    ]

    # Generate and save summary
    summary = await summarizer.generate_summary(messages)
    summarizer.save_summary(summary)
    print("Summary:", summary.get("summary", summary.get("error")))

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Code Breakdown:
- Class Structure:
  - ConversationSummarizer class handles all summarization operations
  - Initialization with API key setup
  - Methods for prompt creation, summary generation, and storage
- Key Features:
  - Structured prompt generation for consistent summaries
  - Async API calls for better performance
  - Error handling and logging
  - Persistent storage of summaries
- Implementation Benefits:
  - Scalable: Handles conversations of varying lengths
  - Structured Output: Organized summaries with key information
  - Historical Tracking: Maintains summary history
  - Error Resilient: Robust error handling and logging
This example provides a reliable way to generate and maintain conversation summaries, making it easier to track discussion progress and key decisions over time.
7.3.2 Core Architecture
To create an effective memory system for AI interactions, three essential components work together in a coordinated manner to maintain context and ensure conversational continuity. These components form a sophisticated architecture that enables AI systems to access, process, and utilize historical information effectively during ongoing conversations.
The storage component preserves conversation history, the retrieval mechanism intelligently fetches relevant past interactions, and the injection system seamlessly incorporates this historical context into current conversations.
Together, these three pillars create a robust foundation that enables AI systems to maintain meaningful, context-aware dialogues across multiple interactions:
Storage
The foundation of memory management where all interactions are systematically saved and preserved for future use. This critical component serves as the backbone of any AI conversation system:
- Can utilize various storage solutions like JSON files, SQL databases, or cloud storage
  - JSON files offer simplicity and portability for smaller applications
  - SQL databases provide robust querying and indexing for larger datasets
  - Cloud storage enables scalable, distributed access across multiple services
- Should include metadata like timestamps and user identifiers
  - Timestamps enable chronological tracking and time-based filtering
  - User IDs maintain conversation threads and personalization
  - Additional metadata can track conversation topics and contexts
- Must be organized for efficient retrieval and scaling
  - Implement proper indexing for quick access to relevant data
  - Use data partitioning for improved performance
  - Consider compression strategies for long-term storage
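As a minimal sketch of these points, each stored message can be wrapped in an envelope that carries the metadata described above. The field names here are illustrative, not a fixed schema:

```python
import json
import uuid
from datetime import datetime, timezone

def make_record(user_id: str, role: str, content: str, topic: str = "") -> dict:
    """Wrap a chat message in a storage envelope with retrieval-friendly metadata."""
    return {
        "id": str(uuid.uuid4()),                              # unique key for indexing
        "user_id": user_id,                                   # keeps per-user threads separate
        "timestamp": datetime.now(timezone.utc).isoformat(),  # enables time-based filtering
        "topic": topic,                                       # optional context tag
        "message": {"role": role, "content": content},
    }

record = make_record("user123", "user", "How do I deploy the app?", topic="devops")
print(json.dumps(record, indent=2))
```

Because the envelope is plain JSON, it serializes unchanged to a flat file, a SQL row, or a cloud object, so the storage backend can be swapped without touching the rest of the system.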
Retrieval
The intelligent system for accessing and filtering relevant past conversations serves as a crucial component in managing conversation history:
- Implements search algorithms to find context-appropriate historical data
  - Uses semantic search to match similar topics and themes
  - Employs fuzzy matching for flexible text comparison
  - Indexes conversations for quick retrieval
- Uses parameters like recency, relevance, and conversation thread
  - Prioritizes recent interactions for immediate context
  - Weighs relevance based on topic similarity scores
  - Maintains thread continuity by tracking conversation flows
- Manages token limits by selecting the most important context
  - Implements smart truncation strategies
  - Prioritizes key information while removing redundant content
  - Dynamically adjusts context window based on model limitations
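One lightweight way to combine these parameters is to score each stored entry on recency and relevance, then fill a token budget with the best matches. The sketch below uses simple word overlap as a stand-in for true semantic search; the 50/50 weighting and the four-characters-per-token estimate are assumptions to tune for your data. Entries are assumed to carry a timezone-aware ISO "timestamp" and a "message" dict:

```python
from datetime import datetime, timezone

def score(entry: dict, query_words: set, now: datetime) -> float:
    """Blend recency and keyword overlap into a single ranking score."""
    age_hours = (now - datetime.fromisoformat(entry["timestamp"])).total_seconds() / 3600
    recency = 1.0 / (1.0 + max(age_hours, 0.0))        # newer entries score higher
    words = set(entry["message"]["content"].lower().split())
    relevance = len(words & query_words) / (len(query_words) or 1)
    return 0.5 * recency + 0.5 * relevance             # equal weighting (an assumption)

def select_context(history: list, query: str, token_budget: int = 500) -> list:
    """Pick the highest-scoring entries until the approximate token budget is spent."""
    now = datetime.now(timezone.utc)
    query_words = set(query.lower().split())
    ranked = sorted(history, key=lambda e: score(e, query_words, now), reverse=True)
    chosen, used = [], 0
    for entry in ranked:
        cost = len(entry["message"]["content"]) // 4   # rough chars-per-token estimate
        if used + cost <= token_budget:
            chosen.append(entry)
            used += cost
    return sorted(chosen, key=lambda e: e["timestamp"])  # restore chronological order
```

In production, the word-overlap score would typically be replaced by cosine similarity over text embeddings, but the budget-filling loop stays the same.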
Injection
The process of seamlessly incorporating historical context requires careful handling to maintain conversation coherence:
- Strategically places retrieved messages into the current conversation flow
  - Determines optimal insertion points for historical context
  - Filters and prioritizes relevant historical information
  - Balances new and historical content for natural flow
- Maintains proper message ordering and relationships
  - Preserves chronological sequence of interactions
  - Respects conversation threading and reply chains
  - Links related topics and discussions appropriately
- Ensures smooth context integration without disrupting the conversation
  - Avoids abrupt context switches or information overload
  - Uses natural transition phrases and references
  - Maintains consistent tone and conversation style
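Putting those rules into code, a small helper can assemble the final message list: system prompt first, historical messages in their original order, a brief transition marker, then the new user turn. The wording of the transition note is an illustrative choice, not a fixed convention:

```python
def inject_history(system_prompt: str, history: list, new_message: str) -> list:
    """Build an API-ready message list with historical context injected."""
    messages = [{"role": "system", "content": system_prompt}]
    # Preserve chronological order so reply chains stay intact
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    if history:
        # A short transition note marks the switch from past to present context
        messages.append({
            "role": "system",
            "content": "The messages above are from earlier sessions with this user."
        })
    messages.append({"role": "user", "content": new_message})
    return messages
```

When there is no history, the helper degrades gracefully to a plain two-message conversation, so the same code path serves both first-time and returning users.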
7.3.3 How to Store and Retrieve JSON-based Message Histories in Python
In this section, we'll explore a practical implementation that demonstrates how to create a persistent memory system using JSON files. This approach offers a straightforward way to maintain conversation context across multiple sessions while remaining scalable and easy to maintain.
Our implementation will focus on three key aspects: efficiently storing message histories in JSON format, retrieving relevant conversation context when needed, and managing the data structure to ensure optimal performance. This solution is particularly useful for developers building chatbots, virtual assistants, or any application requiring persistent conversation history.
Step 1: Build the Memory Manager
This module handles storing user interactions in JSON format and retrieving only the relevant ones.
import os
import json
from typing import List, Dict

MEMORY_DIR = "user_memory"
os.makedirs(MEMORY_DIR, exist_ok=True)

# Constants
MAX_HISTORY_MESSAGES = 5  # Truncate history to last 5 messages to manage tokens

def get_memory_path(user_id: str) -> str:
    return os.path.join(MEMORY_DIR, f"{user_id}.json")

def load_history(user_id: str) -> List[Dict[str, str]]:
    path = get_memory_path(user_id)
    if not os.path.exists(path):
        return []
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError:
        return []

def store_interaction(user_id: str, role: str, content: str) -> None:
    message = {"role": role, "content": content}
    path = get_memory_path(user_id)
    history = load_history(user_id)
    history.append(message)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f, indent=2)

def get_recent_history(user_id: str, limit: int = MAX_HISTORY_MESSAGES) -> List[Dict[str, str]]:
    history = load_history(user_id)
    return history[-limit:]
Let's break down this code:
1. Initial Setup
- Creates a "user_memory" directory to store conversation histories
- Sets a maximum limit of 5 messages for history management
2. Core Functions
- get_memory_path(user_id): Creates a unique JSON file path for each user
- load_history(user_id):
- Attempts to read the user's conversation history
- Returns an empty list if file doesn't exist or is corrupted
- store_interaction(user_id, role, content):
- Saves new messages to the user's history file
- Appends the message to existing history
- Stores in JSON format with proper indentation
- get_recent_history(user_id, limit):
- Retrieves the most recent messages
- Respects the MAX_HISTORY_MESSAGES limit (5 messages)
3. Key Features
- Persistent storage: Each user's conversations are saved in separate JSON files
- Scalability: System can handle multiple users with individual files
- Controlled context: Allows specific control over how much history to maintain
- Debug-friendly: JSON format makes it easy to inspect stored conversations
Step 2: Create the Chat Engine Using OpenAI API
Now let’s integrate this memory system with the OpenAI API. We’ll load previous messages, add the new prompt, query the model, and save the response.
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

SYSTEM_PROMPT = {
    "role": "system",
    "content": "You are a helpful assistant that remembers the user's previous messages."
}

def continue_conversation(user_id: str, user_input: str) -> str:
    # Validate input
    if not user_input.strip():
        return "Input cannot be empty."

    # Load memory and prepare messages
    recent_history = get_recent_history(user_id)
    messages = [SYSTEM_PROMPT] + recent_history + [{"role": "user", "content": user_input}]

    # Call OpenAI API
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            max_tokens=300,
            temperature=0.7
        )
        assistant_reply = response.choices[0].message.content

        # Store both messages
        store_interaction(user_id, "user", user_input)
        store_interaction(user_id, "assistant", assistant_reply)

        return assistant_reply
    except Exception as e:
        return f"Something went wrong: {e}"
Let’s break down this code:
1. Initial Setup
- Imports required libraries (openai and dotenv)
- Loads environment variables and sets up OpenAI API key
- Defines a system prompt that establishes the assistant's role
2. Main Function: continue_conversation
- Takes user_id and user_input as parameters
- Input validation to check for empty messages
- Loads conversation history using get_recent_history (defined in previous section)
- Constructs messages array combining:
- System prompt
- Recent conversation history
- Current user input
3. API Interaction
- Makes API call to OpenAI with parameters:
- Uses "gpt-4o" model
- Sets max_tokens to 300
- Uses temperature of 0.7 for balanced creativity
- Extracts assistant's reply from the response
4. Memory Management
- Stores both the user's input and assistant's reply using store_interaction
- Handles errors gracefully with try-except block
This portion creates a stateful conversation system that maintains context across multiple interactions while managing the conversation flow efficiently.
Step 3: Test the Conversation Memory
if __name__ == "__main__":
    user_id = "user_456"

    print("User: What are the best Python libraries for data science?")
    reply1 = continue_conversation(user_id, "What are the best Python libraries for data science?")
    print("Assistant:", reply1)

    print("\nUser: Could you remind me which ones you mentioned?")
    reply2 = continue_conversation(user_id, "Could you remind me which ones you mentioned?")
    print("Assistant:", reply2)
Let's break down this code example for testing conversation memory:
1. Entry Point Check
- Uses the standard Python if __name__ == "__main__": idiom to ensure this code only runs when the file is executed directly
2. User Setup
- Creates a test user with ID "user_456"
3. Test Conversation Flow
- Demonstrates a two-turn conversation where:
- First turn: Asks about Python libraries for data science
- Second turn: Asks for a reminder of previously mentioned libraries, testing the memory system
4. Implementation Details
- Each interaction uses the continue_conversation() function to:
- Process the user input
- Generate and store the response
- Print both the user input and assistant's response
This test code effectively demonstrates how the system maintains context between multiple interactions, allowing the assistant to reference previous responses when answering follow-up questions.
Benefits of This Approach
- Persistent: All conversations are stored locally by user, ensuring that no interaction history is lost. This means your application can maintain context across multiple sessions, even if the server restarts or the application closes.
- Scalable (to a point): By storing each user's conversation history in their own dedicated JSON file, the system can handle multiple users efficiently. This approach works well for small to medium-sized applications, though for very large deployments you might want to consider a database solution.
- Controllable context: The system gives you complete control over how much conversation history to include in each interaction. You can adjust the memory window size, filter by relevance, or implement custom logic for selecting which previous messages to include in the context.
- Readable: The JSON file format makes it simple to inspect, debug, and modify stored conversations. This is invaluable during development and maintenance, as you can easily view the conversation history in any text editor and validate the data structure.
Optional Enhancements
- Summarization: Instead of storing every message verbatim, implement periodic summarization of conversation history. This technique involves automatically generating concise summaries of longer conversation segments, which helps:
- Reduce token usage in API calls
- Maintain essential context while removing redundant details
- Create a more efficient memory structure
For example, multiple messages about a specific topic could be condensed into a single summary statement.
- Vector Search (Advanced): Transform messages into numerical vectors using embedding models, enabling sophisticated retrieval based on semantic meaning. This approach offers several advantages:
- Discover contextually relevant historical messages even if they use different words
- Prioritize messages based on their relevance to the current conversation
- Enable fast similarity searches across large conversation histories
This is particularly useful for long-running conversations or when specific context needs to be recalled.
- Token Budgeting: Implement smart token management strategies to optimize context window usage. This includes:
- Setting dynamic limits based on conversation importance
- Implementing intelligent pruning of older, less relevant messages
- Maintaining a balance between recent context and important historical information
This ensures you stay within API token limits while preserving the most valuable conversation context.
- Keep the system prompt consistent across interactions
- Maintain identical prompt wording and instructions throughout the entire conversation lifecycle
- Use version control to track any system prompt changes across deployments
- Prevents confusion and contradictory responses by maintaining consistent context
- Ensures the AI maintains a reliable personality and behavioral pattern throughout interactions
- Don't overload the context: store everything, but retrieve selectively
- Implement a comprehensive storage system that maintains complete conversation histories in your database
- Develop intelligent retrieval algorithms that prioritize relevant context for API calls
- Use semantic search or embedding-based similarity to find pertinent historical messages
- Balance token usage by implementing smart pruning strategies for older messages
- Label stored messages clearly (role, content, timestamp) for future filtering or summarization
- Role: Carefully identify and tag message sources (system, user, or assistant) to maintain clear conversation flow and enable role-based filtering
- Content: Implement consistent formatting standards for message content, including handling of special characters and maintaining data integrity
- Timestamp: Add precise temporal metadata to enable sophisticated time-based operations like conversation segmentation and contextual relevance scoring
Long-term memory isn't native to OpenAI's chat models (yet), but with thoughtful engineering, you can create sophisticated memory systems. This limitation means that by default, the model treats each interaction as isolated, without any knowledge of previous conversations. However, developers can implement custom solutions to overcome this limitation.
Here's a detailed explanation of how it works: While the model itself doesn't maintain memories between conversations, your application acts as a memory manager. It stores each interaction in a structured format, typically using databases or file systems. This stored data includes not just the conversation content, but also metadata like timestamps, user information, and conversation topics. By carefully managing these stored conversations and selectively retrieving relevant pieces, you create an illusion of continuous memory - making the AI appear to "remember" previous interactions.
This memory system transforms your AI assistant in several powerful ways:
- Continuity: The assistant can reference past conversations and maintain context over extended periods, creating seamless interactions that build upon previous discussions. For example, if a user mentions their preference for Python programming in one conversation, the system can reference this in future interactions.
- Personality: Consistent response patterns and remembered preferences create a more distinct personality. This includes maintaining a consistent tone, remembering user preferences, and adapting communication styles based on past interactions.
- Understanding: By accessing historical context, responses become more informed and personalized. The system can recall specific details from previous conversations, making interactions feel more natural and contextually aware.
- Depth: The ability to build upon previous conversations enables more sophisticated interactions, allowing for complex problem-solving and long-term project support.
This approach scales remarkably well across different usage scenarios, from individual users to enterprise-level applications. Whether you're building a personal assistant for a single user or a system that serves thousands, the core principle remains the same: you're creating an intelligent memory layer that sits between the user and the API. This architectural approach provides several key capabilities:
- Grow: Continuously accumulate new interactions and learnings, building a rich history of user interactions and preferences over time. This growing knowledge base becomes increasingly valuable for personalizing responses.
- Summarize: Condense lengthy conversation histories into manageable contexts, using advanced techniques like semantic clustering and importance scoring to maintain the most relevant information.
- Adapt: Adjust its retrieval strategies based on conversation patterns, learning which types of historical context are most valuable for different types of interactions.
All while leveraging the same underlying OpenAI API for the actual language processing. This combination of structured memory management and powerful language processing creates a system that can maintain context and personality across multiple conversations while staying within the technical constraints of the API.