commit 978d3794e8156687546deaf9f17af883756fd560
Author: Travis Vasceannie
Date:   Tue Aug 26 06:35:41 2025 -0400

    first

diff --git a/.qoder/quests/unknown-feature.md b/.qoder/quests/unknown-feature.md
new file mode 100644
index 0000000..7156bbd
--- /dev/null
+++ b/.qoder/quests/unknown-feature.md
@@ -0,0 +1,1802 @@
+# Discord Voice Chat Quote Bot - Technical Design
+
+## Overview
+
+An advanced AI-powered Discord bot that continuously records, analyzes, and curates memorable quotes from voice channel conversations using persistent 120-second audio clips. The system employs progressive speaker identification (starting with diarization and advancing to user-assisted recognition), quantitative humor analysis, and long-term memory to generate contextually aware commentary with configurable response thresholds and multi-provider AI support.
+
+### Core Value Proposition
+- **Consent-first recording**: Explicit user consent required before any recording begins
+- **Progressive speaker recognition**: Start with "Speaker 1/2/3" labels, evolve to user identification through assisted tagging and optional enrollment
+- **Persistent 120-second audio recording** with automatic clipping and batch processing
+- **Quantitative quote analysis** (funny, dark, silly, suspicious, asinine) with laughter detection
+- **Configurable response thresholds** (real-time, 6-hour rotation, daily)
+- **Long-term memory system** for personalized interactions
+- **Multi-AI provider support** (OpenAI, Anthropic, Groq, Ollama, etc.)
+- **Enterprise-grade infrastructure** with PostgreSQL and modern TTS services +- **Transparent and interactive** user experience with explanation features + +## Technology Stack & Dependencies + +### Core Framework +- **Discord.py**: Discord API wrapper (v2.3.0+) +- **discord-ext-voice-recv**: Voice data reception extension +- **Python 3.9+**: Runtime environment +- **PostgreSQL + asyncpg**: Production database with concurrent write support +- **Redis**: Caching, session management, and real-time queues +- **Qdrant**: Vector database for long-term memory and embeddings + +### AI & Processing +- **OpenAI SDK**: Whisper (transcription) + GPT models + TTS +- **Anthropic SDK**: Claude models for analysis +- **Groq SDK**: Fast inference capabilities +- **Ollama Client**: Local model support +- **OpenRouter SDK**: Multi-provider access +- **LMStudio API**: Local model server integration +- **ElevenLabs API**: High-quality, low-latency TTS (primary) +- **Azure Cognitive Services**: Speaker recognition as a service (optional) +- **Hume AI**: Advanced emotion detection from voice (optional) + +### Audio Processing & Recognition +- **pyannote.audio**: Speaker diarization (who spoke when) +- **librosa**: Advanced audio analysis +- **scipy**: Signal processing for laughter detection +- **webrtcvad**: Voice activity detection +- **ffmpeg-python**: Audio format conversion + +### Storage & Memory +- **sentence-transformers**: Text embeddings generation +- **numpy**: Numerical computations for audio processing +- **scikit-learn**: ML utilities for speaker clustering + +### Monitoring & Infrastructure +- **prometheus-client**: Metrics and monitoring +- **python-dotenv**: Environment variable management +- **asyncio**: Concurrent processing +- **aiohttp**: HTTP client for external APIs + +### Environment Configuration +``` +# Discord Configuration +DISCORD_TOKEN=your_discord_bot_token +GUILD_ID=your_test_server_id +SUMMARY_CHANNEL_ID=channel_for_daily_summaries + +# Database 
+POSTGRES_URL=postgresql://user:pass@postgres:5432/quotes_db +REDIS_URL=redis://redis:6379 +QDRANT_URL=http://qdrant:6333 + +# AI Providers +OPENAI_API_KEY=your_openai_api_key +ANTHROPIC_API_KEY=your_anthropic_api_key +GROQ_API_KEY=your_groq_api_key +OPENROUTER_API_KEY=your_openrouter_api_key +ELEVENLABS_API_KEY=your_elevenlabs_api_key +AZURE_SPEECH_KEY=your_azure_key +HUME_AI_API_KEY=your_hume_api_key + +# Local AI Services +OLLAMA_BASE_URL=http://ollama:11434 +LMSTUDIO_BASE_URL=http://lmstudio:1234 + +# Configuration +RECORDING_CLIP_DURATION=120 +QUOTE_THRESHOLD_REALTIME=8.5 +QUOTE_THRESHOLD_ROTATION=6.0 +QUOTE_THRESHOLD_DAILY=3.0 +DEFAULT_AI_PROVIDER=openai +DEFAULT_TTS_PROVIDER=elevenlabs +SPEAKER_RECOGNITION_PROVIDER=azure # azure/local/disabled +``` + +## Architecture + +### Project Structure +``` +/discord-quote-bot +├── main.py # Bot entry point and initialization +├── .env # Environment variables +├── requirements.txt # Python dependencies +├── docker-compose.yml # Container orchestration +├── Dockerfile # Bot container definition +├── alembic.ini # Database migration configuration +├── migrations/ # Database schema migrations +│ └── versions/ +├── config/ +│ ├── settings.py # Configuration management +│ ├── ai_providers.py # AI provider configurations +│ └── consent_templates.py # User consent messaging +├── core/ +│ ├── database.py # PostgreSQL abstraction layer +│ ├── ai_manager.py # Multi-provider AI interface +│ ├── memory_manager.py # Long-term memory system +│ ├── consent_manager.py # User consent and privacy +│ └── blocklist.py # User exclusion management +├── services/ +│ ├── audio_recorder.py # Persistent recording service +│ ├── speaker_diarization.py # Basic speaker separation +│ ├── speaker_recognition.py # Progressive user identification +│ ├── quote_analyzer.py # Quantitative analysis engine +│ ├── laughter_detector.py # Audio pattern recognition +│ ├── tts_service.py # Modern TTS integration +│ └── response_scheduler.py # Threshold-based 
responses +├── utils/ +│ ├── audio_processor.py # Audio conversion utilities +│ ├── prompts.py # AI prompt templates +│ ├── metrics.py # Performance monitoring +│ └── ui_components.py # Discord UI builders +├── cogs/ +│ ├── voice_cog.py # Voice channel management +│ ├── quotes_cog.py # Quote detection and commands +│ ├── consent_cog.py # Consent and privacy management +│ ├── admin_cog.py # Configuration and management +│ └── tasks_cog.py # Background tasks and scheduling +└── extensions/ + ├── ai_voice_chat.py # Future: AI voice interaction + ├── research_agents.py # Future: Information gathering + └── personality_engine.py # Future: Dynamic personality +``` + +### Container Architecture (docker-compose.yml) +```yaml +version: '3.8' +services: + bot: + build: . + environment: + - POSTGRES_URL=postgresql://quotes_user:secure_password@postgres:5432/quotes_db + - REDIS_URL=redis://redis:6379 + - QDRANT_URL=http://qdrant:6333 + depends_on: + - postgres + - redis + - qdrant + volumes: + - ./data:/app/data + - ./logs:/app/logs + - ./temp:/app/temp + restart: unless-stopped + + postgres: + image: postgres:15-alpine + environment: + - POSTGRES_DB=quotes_db + - POSTGRES_USER=quotes_user + - POSTGRES_PASSWORD=secure_password + ports: + - "5432:5432" + volumes: + - postgres_data:/var/lib/postgresql/data + - ./migrations:/docker-entrypoint-initdb.d + restart: unless-stopped + + redis: + image: redis:7-alpine + command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru + ports: + - "6379:6379" + volumes: + - redis_data:/data + restart: unless-stopped + + qdrant: + image: qdrant/qdrant:latest + ports: + - "6333:6333" + volumes: + - qdrant_data:/qdrant/storage + environment: + - QDRANT__SERVICE__HTTP_PORT=6333 + restart: unless-stopped + + ollama: + image: ollama/ollama:latest + ports: + - "11434:11434" + volumes: + - ollama_data:/root/.ollama + environment: + - OLLAMA_HOST=0.0.0.0 + restart: unless-stopped + deploy: + resources: + limits: + memory: 8G + +volumes: + 
postgres_data:
+  redis_data:
+  qdrant_data:
+  ollama_data:
+```
+
+### Component Architecture
+
+```mermaid
+graph TB
+    A[Discord Voice Channel] --> B[Persistent Audio Recorder]
+    B --> C[120s Clip Generator]
+    C --> D[Speaker Diarization]
+    D --> E[OpenAI Whisper API]
+    E --> F[Quote Analyzer]
+    F --> G[Quantitative Scorer]
+    G --> H{Score Analysis}
+    H -->|Score >= 8.5| I[Real-time Response]
+    H -->|Score >= 6.0| J[6-Hour Rotation Queue]
+    H -->|Score >= 3.0| K[Daily Summary Queue]
+    H -->|Score < 3.0| L[Discard]
+
+    M[Laughter Detector] --> G
+    N[Speaker Recognition] --> O[User Mapping Cache]
+    O --> P[Long-term Memory]
+    P --> Q[Qdrant Vector DB]
+
+    I --> R[Commentary Generator]
+    J --> S[Redis Queue]
+    K --> T[Daily Task Scheduler]
+    R --> U[Discord Text Channel]
+
+    V[Blocklist Manager] --> B
+    W[Multi-AI Provider] --> F
+    W --> R
+```
+
+## Data Models & Database Schema
+
+### Database Design (PostgreSQL)
+
+#### User Consent Table
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| user_id | BIGINT | PRIMARY KEY | Discord user ID |
+| guild_id | BIGINT | NOT NULL | Discord server ID |
+| consent_given | BOOLEAN | NOT NULL DEFAULT FALSE | Recording consent status |
+| consent_timestamp | TIMESTAMP | NULL | When consent was given |
+| global_opt_out | BOOLEAN | NOT NULL DEFAULT FALSE | Global opt-out across all servers |
+| first_name | VARCHAR(100) | NULL | User's preferred first name |
+| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | Record creation time |
+| updated_at | TIMESTAMP | NOT NULL DEFAULT NOW() | Last update time |
+
+#### Quotes Table
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| id | SERIAL | PRIMARY KEY | Unique quote identifier |
+| user_id | BIGINT | NULL | Discord user ID (null for unknown speakers) |
+| speaker_label | VARCHAR(100) | NOT NULL | "Speaker 1", "Speaker 2", or username |
+| username |
VARCHAR(100) | NULL | User display name at time of quote | +| quote | TEXT | NOT NULL | The actual quote text | +| timestamp | TIMESTAMP | NOT NULL | Quote occurrence time | +| guild_id | BIGINT | NOT NULL | Discord server ID | +| channel_id | BIGINT | NOT NULL | Voice channel where quote occurred | +| funny_score | DECIMAL(3,1) | DEFAULT 0.0 | Humor quantification (0-10) | +| dark_score | DECIMAL(3,1) | DEFAULT 0.0 | Dark humor rating (0-10) | +| silly_score | DECIMAL(3,1) | DEFAULT 0.0 | Silliness rating (0-10) | +| suspicious_score | DECIMAL(3,1) | DEFAULT 0.0 | Suspiciousness rating (0-10) | +| asinine_score | DECIMAL(3,1) | DEFAULT 0.0 | Asininity rating (0-10) | +| overall_score | DECIMAL(4,2) | NOT NULL | Combined weighted score | +| laughter_duration | DECIMAL(5,2) | DEFAULT 0.0 | Seconds of laughter detected | +| laughter_intensity | DECIMAL(3,2) | DEFAULT 0.0 | Average laughter volume (0-1) | +| response_type | VARCHAR(20) | NOT NULL | realtime/rotation/daily/none | +| audio_clip_path | VARCHAR(500) | NULL | Path to original audio clip | +| speaker_confidence | DECIMAL(3,2) | DEFAULT 0.0 | Speaker recognition confidence | +| user_feedback | INTEGER | NULL | User reaction (+1, -1, null) | +| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | Record creation time | + +#### Speaker Profiles Table +| Column | Type | Constraints | Description | +|--------|------|-------------|-------------| +| id | SERIAL | PRIMARY KEY | Profile identifier | +| user_id | BIGINT | NOT NULL UNIQUE | Discord user ID | +| voice_embedding | BYTEA | NULL | Speaker recognition embedding | +| enrollment_status | VARCHAR(20) | DEFAULT 'none' | none/pending/enrolled | +| enrollment_phrase | VARCHAR(200) | NULL | Phrase used for enrollment | +| personality_summary | TEXT | NULL | AI-generated personality notes | +| quote_count | INTEGER | DEFAULT 0 | Total quotes from user | +| avg_humor_score | DECIMAL(3,1) | DEFAULT 0.0 | Average humor rating | +| last_seen | TIMESTAMP | NULL | Last 
voice activity timestamp | +| training_samples | INTEGER | DEFAULT 0 | Number of training audio samples | +| recognition_accuracy | DECIMAL(3,2) | DEFAULT 0.0 | Historical accuracy rate | +| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | Profile creation time | +| updated_at | TIMESTAMP | NOT NULL DEFAULT NOW() | Last profile update | + +#### Quote Feedback Table +| Column | Type | Constraints | Description | +|--------|------|-------------|-------------| +| id | SERIAL | PRIMARY KEY | Feedback identifier | +| quote_id | INTEGER | FOREIGN KEY REFERENCES quotes(id) | Quote being tagged | +| user_id | BIGINT | NOT NULL | User providing feedback | +| feedback_type | VARCHAR(20) | NOT NULL | tag_speaker/rate_quote/correct_score | +| feedback_value | TEXT | NOT NULL | JSON feedback data | +| timestamp | TIMESTAMP | NOT NULL DEFAULT NOW() | Feedback time | + +### Database Manager Classes + +```python +class ConsentManager: + async def init_db() + async def request_consent(self, guild_id: int, channel_id: int) -> discord.Embed + async def grant_consent(self, user_id: int, guild_id: int) -> bool + async def revoke_consent(self, user_id: int, guild_id: int) -> bool + async def check_consent(self, user_id: int, guild_id: int) -> bool + async def set_global_opt_out(self, user_id: int, opt_out: bool) -> bool + async def get_consented_users(self, guild_id: int) -> List[int] + async def cleanup_non_consented_data(self, guild_id: int) + +class QuoteDatabase: + async def init_db() + async def save_quote(self, quote_data: QuoteData) + async def get_quotes_by_score(self, guild_id: int, min_score: float, limit: int) + async def get_unknown_speaker_quotes(self, guild_id: int, limit: int) + async def tag_quote_speaker(self, quote_id: int, user_id: int, tagger_id: int) + async def update_quote_scores(self, quote_id: int, scores: dict) + async def get_user_quotes(self, user_id: int, guild_id: int) + async def delete_user_quotes(self, user_id: int, guild_id: int) + async def 
get_quote_explanation(self, quote_id: int) -> dict
+
+class SpeakerDatabase:
+    async def store_speaker_profile(self, user_id: int, voice_embedding: bytes, metadata: dict)
+    async def recognize_speaker(self, voice_embedding: bytes, threshold: float = 0.8) -> Optional[int]
+    async def start_enrollment(self, user_id: int, phrase: str)
+    async def complete_enrollment(self, user_id: int, voice_embedding: bytes)
+    async def add_training_sample(self, user_id: int, voice_embedding: bytes)
+    async def update_speaker_stats(self, user_id: int, quote_scores: dict)
+    async def get_speaker_personality(self, user_id: int) -> Optional[str]
+    async def get_enrollment_candidates(self, guild_id: int) -> List[dict]
+```
+
+## Consent-First Recording & Audio Processing
+
+### User Consent Flow
+
+#### Initial Consent Request
+```python
+class ConsentManager:
+    async def request_recording_consent(self, guild_id: int, channel_id: int):
+        """Display consent request embed with interactive buttons"""
+        embed = discord.Embed(
+            title="🎤 Quote Bot Recording Request",
+            description=(
+                "I'd like to record this voice channel to capture memorable quotes.\n\n"
+                "**What I record:**\n"
+                "• 120-second audio clips for transcription\n"
+                "• Quotes are analyzed for humor and memorable content\n"
+                "• Audio files are deleted after 24 hours\n\n"
+                "**Your privacy:**\n"
+                "• Click 'Give Consent' to participate\n"
+                "• Use `/opt_out` anytime to stop recording\n"
+                "• Use `/delete_my_quotes` to remove your data\n\n"
+                "Only consenting users will be recorded."
+            ),
+            color=0x00ff00
+        )
+
+        # Resolve the target channel from the ID passed in
+        channel = self.bot.get_channel(channel_id)
+        view = ConsentView()
+        await channel.send(embed=embed, view=view)
+
+    async def announce_recording_start(self, voice_client):
+        """TTS announcement when recording begins"""
+        announcement = (
+            "Hello! Quote Bot is now recording this channel to capture memorable moments. "
+            "If you haven't consented or wish to opt out, please use the /opt_out command."
+        )
+        await self.tts_service.speak_in_channel(voice_client, announcement)
+```
+
+#### Interactive Consent Interface
+```python
+class ConsentView(discord.ui.View):
+    @discord.ui.button(label="Give Consent", style=discord.ButtonStyle.green, emoji="✅")
+    async def give_consent(self, interaction: discord.Interaction, button: discord.ui.Button):
+        await self.consent_manager.grant_consent(interaction.user.id, interaction.guild.id)
+        await interaction.response.send_message(
+            "✅ Consent granted! You'll now be included in recordings. Use `/opt_out` anytime to stop.",
+            ephemeral=True
+        )
+
+    @discord.ui.button(label="Learn More", style=discord.ButtonStyle.gray, emoji="ℹ️")
+    async def learn_more(self, interaction: discord.Interaction, button: discord.ui.Button):
+        # Embeds must be passed via the `embed` keyword, not as the content argument
+        await interaction.response.send_message(
+            embed=self.get_privacy_info_embed(),
+            ephemeral=True
+        )
+```
+
+### Progressive Speaker Recognition System
+
+#### Phase 1: Speaker Diarization Only
+```python
+class SpeakerDiarization:
+    def __init__(self):
+        self.diarization_pipeline = load_pyannote_pipeline()
+        self.speaker_labels = {}  # temp_id -> "Speaker 1", "Speaker 2", etc.
+ + async def process_audio_clip(self, audio_clip: AudioClip) -> List[SpeakerSegment]: + """Separate speakers without identification""" + diarization = self.diarization_pipeline(audio_clip.audio_data) + + segments = [] + for turn, _, speaker in diarization.itertracks(yield_label=True): + if speaker not in self.speaker_labels: + self.speaker_labels[speaker] = f"Speaker {len(self.speaker_labels) + 1}" + + segments.append(SpeakerSegment( + speaker_label=self.speaker_labels[speaker], + start_time=turn.start, + end_time=turn.end, + audio_segment=audio_clip.extract_segment(turn.start, turn.end) + )) + + return segments +``` + +#### Phase 2: User-Assisted Tagging +```python +class UserAssistedTagging: + async def post_unknown_speaker_quote(self, quote_data: QuoteData): + """Post quote with tagging buttons for user identification""" + embed = discord.Embed( + title=f"📝 Quote from {quote_data.speaker_label}", + description=f'"{quote_data.quote}"', + color=self.get_score_color(quote_data.overall_score) + ) + + embed.add_field( + name="Scores", + value=f"Funny: {quote_data.funny_score}/10 | " + f"Overall: {quote_data.overall_score}/10", + inline=False + ) + + # Get active voice channel members for tagging options + voice_members = self.get_voice_channel_members(quote_data.channel_id) + view = SpeakerTaggingView(quote_data.id, voice_members) + + await self.channel.send(embed=embed, view=view) + + async def process_speaker_tag(self, quote_id: int, tagged_user_id: int, tagger_id: int): + """Process user tagging and build speaker recognition database""" + quote = await self.db.get_quote(quote_id) + if quote and quote.audio_clip_path: + # Extract audio segment for this speaker + audio_segment = await self.extract_speaker_audio(quote.audio_clip_path, quote.speaker_label) + + # Add to training data + await self.speaker_db.add_training_sample(tagged_user_id, audio_segment) + + # Update quote with user ID + await self.db.update_quote_speaker(quote_id, tagged_user_id) + + # Check if we 
have enough samples to enable recognition + sample_count = await self.speaker_db.get_training_sample_count(tagged_user_id) + if sample_count >= 3: # Minimum samples for recognition + await self.enable_speaker_recognition(tagged_user_id) +``` + +#### Phase 3: Optional Active Enrollment +```python +class ActiveEnrollment: + def __init__(self): + self.enrollment_phrases = [ + "The quick brown fox jumps over the lazy dog", + "Hello, this is my voice for the quote bot", + "I am enrolling my voice for speaker recognition", + "Testing one two three for voice identification" + ] + + async def start_enrollment(self, user_id: int, voice_channel): + """Guide user through voice enrollment process""" + phrase = random.choice(self.enrollment_phrases) + + await self.speaker_db.start_enrollment(user_id, phrase) + + embed = discord.Embed( + title="🎙️ Voice Enrollment", + description=( + f"Please clearly say the following phrase:\n\n" + f"**\"{phrase}\"**\n\n" + f"Speak clearly and wait for the confirmation message." + ), + color=0x0099ff + ) + + await voice_channel.send(embed=embed) + + # Start recording for enrollment + enrollment_audio = await self.record_enrollment_sample(user_id, voice_channel) + + if enrollment_audio: + embedding = await self.generate_speaker_embedding(enrollment_audio) + await self.speaker_db.complete_enrollment(user_id, embedding) + + await voice_channel.send( + f"✅ <@{user_id}> Voice enrollment complete! I'll now recognize your voice automatically." + ) + else: + await voice_channel.send( + f"❌ <@{user_id}> Enrollment failed. Please try again with `/enroll_voice`." 
+ ) +``` + +### Modern TTS Integration + +#### Multi-Provider TTS Service +```python +class TTSService: + def __init__(self): + self.providers = { + 'elevenlabs': ElevenLabsTTS(), + 'openai': OpenAITTS(), + 'azure': AzureTTS() + } + self.default_provider = os.getenv('DEFAULT_TTS_PROVIDER', 'elevenlabs') + + async def generate_speech(self, text: str, voice_style: str = 'conversational') -> bytes: + """Generate high-quality speech audio""" + provider = self.providers[self.default_provider] + + try: + audio_data = await provider.text_to_speech( + text=text, + voice=voice_style, + stability=0.5, + clarity=0.8, + style=0.3 + ) + return audio_data + except Exception as e: + # Fallback to alternative provider + logger.warning(f"TTS provider {self.default_provider} failed: {e}") + fallback_provider = self.providers['openai'] + return await fallback_provider.text_to_speech(text) + + async def speak_in_channel(self, voice_client, text: str): + """Play TTS audio in voice channel""" + audio_data = await self.generate_speech(text) + + # Create temporary audio file + with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as temp_file: + temp_file.write(audio_data) + temp_path = temp_file.name + + try: + # Play audio in voice channel + audio_source = discord.FFmpegPCMAudio(temp_path) + voice_client.play(audio_source) + + # Wait for playback to complete + while voice_client.is_playing(): + await asyncio.sleep(0.1) + finally: + # Cleanup temporary file + os.unlink(temp_path) +``` + +### Audio Processing Pipeline + +```mermaid +sequenceDiagram + participant U as User + participant B as Bot + participant CM as Consent Manager + participant AR as Audio Recorder + participant SD as Speaker Diarization + participant TA as Transcription API + participant QA as Quote Analyzer + participant UT as User Tagging + + U->>B: /start_recording + B->>CM: Request consent from channel + CM->>U: Display consent embed + U->>CM: Click "Give Consent" + CM->>AR: Start recording consented users + 
AR->>SD: Process 120s clips
+    SD->>TA: Send speaker segments
+    TA->>QA: Return transcript with "Speaker 1", "Speaker 2"
+    QA->>B: Post quote with tagging buttons
+    U->>UT: Tag speaker as @User
+    UT->>AR: Build speaker training data
+```
+
+## Multi-Provider AI Integration
+
+### AI Provider Manager
+```python
+class AIProviderManager:
+    def __init__(self):
+        self.providers = {
+            'openai': OpenAIProvider(),
+            'anthropic': AnthropicProvider(),
+            'groq': GroqProvider(),
+            'openrouter': OpenRouterProvider(),
+            'ollama': OllamaProvider(),
+            'lmstudio': LMStudioProvider()
+        }
+        self.default_provider = os.getenv('DEFAULT_AI_PROVIDER', 'openai')
+
+    async def get_provider(self, provider_name: str = None) -> BaseAIProvider: ...
+    async def transcribe(self, audio_data: bytes, provider: str = None) -> str: ...
+    async def analyze_quote(self, transcript: str, context: dict, provider: str = None): ...
+    async def generate_commentary(self, quote_data: dict, provider: str = None) -> str: ...
+```
+
+### Transcription Service (Multi-Provider)
+```python
+class TranscriptionService:
+    async def transcribe_with_speakers(self, audio_clip: AudioClip) -> TranscriptResult:
+        # Primary: OpenAI Whisper for accuracy
+        # Fallback: Groq for speed, local models for privacy
+        ...
+
+    async def batch_transcribe(self, clips: List[AudioClip]) -> List[TranscriptResult]:
+        # Parallel processing with rate limit management
+        ...
+```
+
+### Quantitative Quote Analysis Engine
+
+#### Scoring Algorithm
+```python
+class QuoteAnalyzer:
+    def __init__(self, ai_provider: BaseAIProvider):
+        self.ai_provider = ai_provider
+        self.scoring_weights = {
+            'funny': 0.3,
+            'dark': 0.15,
+            'silly': 0.2,
+            'suspicious': 0.1,
+            'asinine': 0.25
+        }
+
+    async def analyze_quote(self, quote: str, context: dict) -> QuoteScores:
+        # Multi-dimensional analysis with AI scoring
+        ...
+
+    async def calculate_final_score(self, scores: QuoteScores, laughter_data: LaughterMetrics) -> float:
+        # Weighted combination with laughter boost
+        ...
+```
+
+#### Scoring Prompt Templates
+```python
+# Literal JSON braces are doubled so str.format() leaves them intact.
+QUOTE_ANALYSIS_PROMPT = """
+Analyze this quote from a Discord voice chat for the following dimensions (0-10 scale):
+
+Quote: "{quote}"
+Speaker: {speaker_name}
+Context: {conversation_context}
+Laughter Response: {laughter_duration}s duration, {laughter_intensity} intensity
+
+Score each dimension:
+1. FUNNY: How humorous/witty is this? (0=not funny, 10=hilarious)
+2. DARK: How dark/edgy is the humor? (0=light, 10=very dark)
+3. SILLY: How absurd/nonsensical? (0=serious, 10=completely ridiculous)
+4. SUSPICIOUS: How questionable/concerning? (0=innocent, 10=very sus)
+5. ASININE: How stupid/mindless? (0=thoughtful, 10=completely brain-dead)
+
+Respond in JSON format:
+{{
+    "funny": score,
+    "dark": score,
+    "silly": score,
+    "suspicious": score,
+    "asinine": score,
+    "reasoning": "Brief explanation"
+}}
+"""
+
+COMMENTARY_GENERATION_PROMPT = """
+Generate witty commentary for this quote:
+
+Quote: "{quote}" - {speaker_name}
+Scores: Funny({funny}/10), Dark({dark}/10), Silly({silly}/10)
+Personality Context: {speaker_personality}
+Recent Interactions: {recent_context}
+
+Style Guidelines:
+- Match the humor level to the quote's tone
+- Reference the speaker's personality/history when relevant
+- Keep under 200 characters for Discord
+- Be clever, not mean-spirited
+- Use appropriate emojis
+
+Generate a witty response:
+"""
+```
+
+### Long-Term Memory System
+
+#### Memory Storage (Qdrant Integration)
+```python
+class MemoryManager:
+    def __init__(self, qdrant_client):
+        self.qdrant = qdrant_client
+        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
+
+    async def store_interaction(self, user_id: int, content: str, context_type: str):
+        embedding = self.embedding_model.encode(content)
+        await self.qdrant.upsert(
+            collection_name="user_memories",
+            points=[{
+                "id": generate_id(),
+                "vector": embedding.tolist(),
+                "payload": {
+                    "user_id": user_id,
+                    "content": content,
+                    "context_type": context_type,
+                    "timestamp": datetime.utcnow().isoformat()
+                }
+            }]
+        )
+
+    async def retrieve_user_context(self, user_id: int, query: str, limit: int = 5) -> List[dict]:
+        query_embedding = self.embedding_model.encode(query)
+        results = await self.qdrant.search(
+            collection_name="user_memories",
+            query_vector=query_embedding.tolist(),
+            query_filter={
+                "must": [{"key": "user_id", "match": {"value": user_id}}]
+            },
+            limit=limit
+        )
+        return [hit.payload for hit in results]
+
+    async def update_personality_summary(self, user_id: int, recent_quotes: List[str]):
+        # AI-generated personality analysis based on quote history
+        ...
+```
+
+### Response Threshold System
+
+#### Threshold Configuration
+```python
+class ResponseScheduler:
+    def __init__(self):
+        self.thresholds = {
+            'realtime': float(os.getenv('QUOTE_THRESHOLD_REALTIME', 8.5)),
+            'rotation': float(os.getenv('QUOTE_THRESHOLD_ROTATION', 6.0)),
+            'daily': float(os.getenv('QUOTE_THRESHOLD_DAILY', 3.0))
+        }
+        self.rotation_interval = 6 * 3600  # 6 hours in seconds
+
+    async def process_quote_score(self, quote_data: QuoteData) -> ResponseType:
+        score = quote_data.overall_score
+
+        if score >= self.thresholds['realtime']:
+            await self.send_realtime_response(quote_data)
+            return ResponseType.REALTIME
+        elif score >= self.thresholds['rotation']:
+            await self.queue_for_rotation(quote_data)
+            return ResponseType.ROTATION
+        elif score >= self.thresholds['daily']:
+            await self.queue_for_daily(quote_data)
+            return ResponseType.DAILY
+        else:
+            return ResponseType.NONE
+
+    async def process_rotation_queue(self, guild_id: int):
+        # Send accumulated quotes every 6 hours
+        ...
+
+    async def process_daily_queue(self, guild_id: int):
+        # Generate daily summary
+        ...
+```
+
+## Command Interface (Slash Commands)
+
+### Voice Management Commands
+- `/start_recording [channel]` - Request consent and begin recording in voice channel
+- `/stop_recording` - Stop recording and process final clips
+- `/recording_status` - Show active recordings, consent status, and queue
metrics
+- `/join_voice` - Join user's current voice channel without recording
+- `/leave_voice` - Leave voice channel and clean up resources
+
+### Quote Retrieval & Interaction Commands
+- `/random_quote [user] [min_score] [category]` - Display random quote with filters
+- `/top_quotes [timeframe] [category] [limit]` - Show highest scoring quotes
+- `/user_quotes [limit] [category]` - Show quotes from a specific user
+- `/search_quotes [category] [min_score]` - Search quotes by content
+- `/quote_stats [user]` - Display comprehensive quote and scoring statistics
+- `/leaderboard [category] [timeframe]` - Show quote leaderboards by score
+- `/why_quote <quote_id>` - **Explain why a quote was selected with score breakdown**
+- `/my_personality` - **View your AI-generated personality summary**
+
+### Speaker Recognition & Enrollment
+- `/enroll_voice` - **Active voice enrollment for better recognition**
+- `/enrollment_status` - Check your voice enrollment progress
+- `/recognition_stats` - View speaker recognition accuracy metrics
+- `/update_first_name <name>` - Set your preferred first name for quotes
+
+### Privacy & Consent Commands
+- `/give_consent` - Explicitly consent to recording (alternative to button)
+- `/revoke_consent` - Revoke consent and stop recording your voice
+- `/opt_out [global]` - Exclude yourself from recording (server or global)
+- `/opt_in` - Re-enable recording for yourself
+- `/delete_my_quotes [timeframe]` - Remove your quotes from the database
+- `/privacy_info` - Display detailed privacy and data handling information
+- `/export_my_data` - Export your quotes and data (GDPR compliance)
+
+### Administrative Commands
+- `/admin_panel` - Display bot configuration dashboard
+- `/set_thresholds <realtime> <rotation> <daily>` - Configure response thresholds
+- `/set_summary_channel <channel>` - Set daily summary destination
+- `/set_ai_provider <provider>` - Change AI provider for different functions
+- `/configure_scoring <weights>` - Adjust scoring algorithm weights
+- `/blocklist_add <user>` - Add user to server blocklist (admin only)
+- `/blocklist_remove <user>` - Remove user from blocklist (admin only)
+- `/blocklist_show` - Display current server blocklist
+- `/system_health` - Show bot performance and service status
+- `/force_daily_summary` - Manually trigger daily summary generation
+
+### Memory & Context Commands
+- `/conversation_context [hours]` - Display recent conversation context
+- `/memory_search [user]` - Search the long-term memory system
+- `/personality_summary [user]` - Show AI-generated personality analysis
+- `/reset_memory [user]` - Clear memory data (admin only)
+
+### Data Management Commands
+- `/export_quotes [date_range] [format] [filter]` - Export quotes (JSON/CSV)
+- `/import_quotes <file>` - Import quotes from backup (admin only)
+- `/cleanup_old_data [days]` - Remove old audio files and temporary data
+- `/backup_database` - Create database backup (admin only)
+
+### Interactive Features
+
+#### Quote Explanation System
+```python
+class QuoteExplanation:
+    async def explain_quote(self, quote_id: int) -> discord.Embed:
+        quote = await self.db.get_quote_with_context(quote_id)
+
+        embed = discord.Embed(
+            title="🎯 Why this quote was selected",
+            description=f'"{quote.text}" - {quote.speaker_label}',
+            color=self.get_score_color(quote.overall_score)
+        )
+
+        # Score breakdown
+        scores_text = (
+            f"**Funny:** {quote.funny_score}/10\n"
+            f"**Dark:** {quote.dark_score}/10\n"
+            f"**Silly:** {quote.silly_score}/10\n"
+            f"**Suspicious:** {quote.suspicious_score}/10\n"
+            f"**Asinine:** {quote.asinine_score}/10\n\n"
+            f"**Overall Score:** {quote.overall_score}/10"
+        )
+        embed.add_field(name="📊 Score Breakdown", value=scores_text, inline=True)
+
+        # Laughter analysis
+        if quote.laughter_duration > 0:
+            laughter_text = (
+                f"**Duration:** {quote.laughter_duration:.1f} seconds\n"
+                f"**Intensity:** {quote.laughter_intensity:.2f}/1.0\n"
+                f"**Boost Applied:** +{self.calculate_laughter_boost(quote)}pts"
+            )
+            embed.add_field(name="😂 Laughter Analysis", value=laughter_text,
inline=True) + + # Context factors + context_factors = await self.get_context_factors(quote) + if context_factors: + embed.add_field(name="🎭 Context Factors", value=context_factors, inline=False) + + return embed +``` + +#### Interactive Quote Tagging +```python +class SpeakerTaggingView(discord.ui.View): + def __init__(self, quote_id: int, voice_members: List[discord.Member]): + super().__init__(timeout=300) # 5 minute timeout + self.quote_id = quote_id + + # Create buttons for each voice channel member + for member in voice_members[:5]: # Limit to 5 buttons + button = discord.ui.Button( + label=f"Tag {member.display_name}", + style=discord.ButtonStyle.primary, + custom_id=f"tag_{member.id}" + ) + button.callback = self.create_tag_callback(member.id) + self.add_item(button) + + # Add "Unknown Speaker" option + unknown_button = discord.ui.Button( + label="Keep as Unknown", + style=discord.ButtonStyle.gray, + custom_id="unknown" + ) + unknown_button.callback = self.keep_unknown + self.add_item(unknown_button) + + def create_tag_callback(self, user_id: int): + async def tag_callback(interaction: discord.Interaction): + await self.speaker_recognition.process_speaker_tag( + self.quote_id, user_id, interaction.user.id + ) + + embed = discord.Embed( + title="✅ Speaker Tagged", + description=f"Quote tagged as <@{user_id}>. 
This helps improve speaker recognition!", + color=0x00ff00 + ) + await interaction.response.edit_message(embed=embed, view=None) + + return tag_callback +``` + +#### Feedback Collection System +```python +class QuoteFeedbackView(discord.ui.View): + def __init__(self, quote_id: int): + super().__init__(timeout=None) # Persistent view + self.quote_id = quote_id + + @discord.ui.button(emoji="👍", style=discord.ButtonStyle.green, custom_id="upvote") + async def upvote(self, interaction: discord.Interaction, button: discord.ui.Button): + await self.record_feedback(interaction, 1) + + @discord.ui.button(emoji="👎", style=discord.ButtonStyle.red, custom_id="downvote") + async def downvote(self, interaction: discord.Interaction, button: discord.ui.Button): + await self.record_feedback(interaction, -1) + + async def record_feedback(self, interaction: discord.Interaction, value: int): + await self.db.record_quote_feedback(self.quote_id, interaction.user.id, value) + + # Update AI training data for RLHF + await self.ai_trainer.process_feedback(self.quote_id, value) + + feedback_text = "positive" if value > 0 else "negative" + await interaction.response.send_message( + f"Thanks for the {feedback_text} feedback! This helps improve my commentary.", + ephemeral=True + ) +``` + +## User Experience & Transparency Features + +### Progressive Speaker Recognition Strategy + +#### Implementation Phases + +**Phase 1: Foundation (MVP)** +- Start with pyannote.audio diarization only +- Label speakers as "Speaker 1", "Speaker 2", etc. 
+- Implement user-assisted tagging system +- Build consent and privacy framework + +**Phase 2: Learning (User Feedback)** +- Collect tagged audio samples from user feedback +- Implement basic speaker clustering +- Add enrollment command for voluntary participation +- Begin building speaker recognition confidence + +**Phase 3: Recognition (Advanced)** +- Enable automatic speaker identification with confidence thresholds +- Integrate external services (Azure Speaker Recognition) as optional upgrade +- Implement active learning from correction feedback +- Add voice change detection and adaptation + +#### Addressing Recognition Challenges + +```python +class ReliableSpeakerRecognition: + def __init__(self): + self.confidence_threshold = 0.8 # High threshold for reliability + self.fallback_to_diarization = True + self.learning_enabled = True + + async def identify_speaker(self, audio_segment: bytes, context: dict) -> SpeakerResult: + """Multi-stage speaker identification with fallbacks""" + + # Stage 1: Attempt recognition if sufficient training data exists + if self.has_sufficient_training_data(): + recognition_result = await self.attempt_recognition(audio_segment) + + if recognition_result.confidence >= self.confidence_threshold: + return SpeakerResult( + user_id=recognition_result.user_id, + confidence=recognition_result.confidence, + method="recognition" + ) + + # Stage 2: Fallback to diarization labeling + speaker_label = await self.diarization_fallback(audio_segment, context) + + return SpeakerResult( + speaker_label=speaker_label, + confidence=1.0, # Diarization is always confident about separation + method="diarization", + needs_tagging=True + ) + + async def handle_voice_changes(self, user_id: int, new_audio: bytes): + """Adapt to voice changes (illness, different mic, etc.)""" + current_embedding = await self.get_user_embedding(user_id) + new_embedding = await self.generate_embedding(new_audio) + + similarity = self.calculate_similarity(current_embedding, 
new_embedding) + + if similarity < 0.6: # Significant voice change detected + # Request re-enrollment or add as variation + await self.request_voice_update(user_id) + elif similarity < 0.8: # Minor change, adapt gradually + await self.update_embedding_with_adaptation(user_id, new_embedding) +``` + +### Transparency & Trust Building + +#### Quote Explanation System +```python +class TransparencyEngine: + async def explain_quote_selection(self, quote_id: int) -> dict: + """Provide detailed explanation of why quote was selected""" + quote = await self.db.get_quote_with_context(quote_id) + + explanation = { + "quote_text": quote.text, + "speaker": quote.speaker_label, + "overall_score": quote.overall_score, + "score_breakdown": { + "funny": { + "score": quote.funny_score, + "reasoning": await self.get_score_reasoning(quote, "funny") + }, + "dark": { + "score": quote.dark_score, + "reasoning": await self.get_score_reasoning(quote, "dark") + }, + # ... other categories + }, + "laughter_analysis": { + "detected": quote.laughter_duration > 0, + "duration": quote.laughter_duration, + "intensity": quote.laughter_intensity, + "boost_applied": self.calculate_laughter_boost(quote) + }, + "context_factors": await self.get_context_factors(quote), + "threshold_met": self.get_threshold_explanation(quote.overall_score) + } + + return explanation + + async def get_score_reasoning(self, quote: Quote, category: str) -> str: + """Get AI explanation for specific score category""" + prompt = f""" + Explain why this quote received a {category} score of {getattr(quote, f'{category}_score')}/10: + + Quote: "{quote.text}" + Context: {quote.conversation_context} + + Provide a brief, clear explanation in 1-2 sentences. 
+ """ + + return await self.ai_provider.generate_explanation(prompt) +``` + +### Enhanced User Interaction + +#### Smart Onboarding Flow +```python +class OnboardingManager: + async def start_server_onboarding(self, guild_id: int, admin_user_id: int): + """Comprehensive server setup process""" + + # Step 1: Privacy and consent explanation + privacy_embed = self.create_privacy_explanation_embed() + await self.send_admin_message(admin_user_id, privacy_embed) + + # Step 2: Configuration guidance + config_embed = self.create_configuration_guide_embed() + config_view = ConfigurationView(guild_id) + await self.send_admin_message(admin_user_id, config_embed, view=config_view) + + # Step 3: Test recording setup + await self.guide_test_recording(guild_id, admin_user_id) + + # Step 4: User education + user_guide = self.create_user_guide_embed() + await self.send_to_general_channel(guild_id, user_guide) + + def create_privacy_explanation_embed(self) -> discord.Embed: + return discord.Embed( + title="🔒 Privacy & Data Handling", + description=( + "**What the bot records:**\n" + "• 120-second audio clips for transcription only\n" + "• Text quotes extracted from conversations\n" + "• Humor and content analysis scores\n\n" + "**What the bot does NOT store:**\n" + "• Permanent audio recordings\n" + "• Full conversation history\n" + "• Personal information beyond Discord usernames\n\n" + "**Data retention:**\n" + "• Audio clips: Deleted after 24 hours\n" + "• Quotes: Stored indefinitely (user can delete)\n" + "• User embeddings: Only if voluntarily enrolled\n\n" + "**User controls:**\n" + "• Opt-out anytime with `/opt_out`\n" + "• Delete personal data with `/delete_my_quotes`\n" + "• Export data with `/export_my_data`\n" + ), + color=0x00ff00 + ) +``` + +#### Interactive Help System +```python +class InteractiveHelp: + @discord.slash_command() + async def help_interactive(self, ctx): + """Interactive help system with contextual guidance""" + embed = discord.Embed( + title="🤖 Quote 
Bot Help Center", + description="Choose a topic to learn more:", + color=0x0099ff + ) + + view = HelpNavigationView() + await ctx.respond(embed=embed, view=view) + + class HelpNavigationView(discord.ui.View): + @discord.ui.select( + placeholder="Choose a help topic...", + options=[ + discord.SelectOption(label="Getting Started", value="getting_started", emoji="🏁"), + discord.SelectOption(label="Privacy & Consent", value="privacy", emoji="🔒"), + discord.SelectOption(label="Speaker Recognition", value="speaker_rec", emoji="🎙️"), + discord.SelectOption(label="Quote Scoring", value="scoring", emoji="📊"), + discord.SelectOption(label="Commands Reference", value="commands", emoji="📝"), + discord.SelectOption(label="Troubleshooting", value="troubleshoot", emoji="🔧") + ] + ) + async def help_select(self, select, interaction): + topic = select.values[0] + embed = await self.get_help_embed(topic) + await interaction.response.edit_message(embed=embed) +``` + +### External Service Integration Strategy + +#### Azure Speaker Recognition Integration +```python +class AzureSpeakerService: + """Optional premium speaker recognition using Azure Cognitive Services""" + + def __init__(self): + self.enabled = bool(os.getenv('AZURE_SPEECH_KEY')) + self.client = None + if self.enabled: + self.client = SpeakerRecognitionClient() + + async def create_speaker_profile(self, user_id: int, audio_samples: List[bytes]) -> str: + """Create Azure speaker profile for user""" + if not self.enabled: + return None + + profile_id = await self.client.create_profile( + enrollment_audio=audio_samples, + metadata={"user_id": user_id} + ) + + await self.db.store_azure_profile_id(user_id, profile_id) + return profile_id + + async def identify_speaker(self, audio_segment: bytes, candidate_profiles: List[str]) -> Optional[dict]: + """Identify speaker using Azure service""" + if not self.enabled or not candidate_profiles: + return None + + result = await self.client.identify( + audio=audio_segment, + 
profile_ids=candidate_profiles + ) + + if result.confidence > 0.8: + user_id = await self.db.get_user_by_profile_id(result.profile_id) + return { + "user_id": user_id, + "confidence": result.confidence, + "method": "azure_speaker_recognition" + } + + return None +``` + +#### Hume AI Emotion Detection +```python +class EmotionAnalysis: + """Advanced emotion detection to enhance quote scoring""" + + def __init__(self): + self.enabled = bool(os.getenv('HUME_AI_API_KEY')) + self.client = HumeAIClient() if self.enabled else None + + async def analyze_emotional_context(self, audio_segment: bytes) -> dict: + """Analyze emotional context of quote""" + if not self.enabled: + return {"emotions": {}, "confidence": 0.0} + + emotions = await self.client.analyze_audio_emotions(audio_segment) + + # Map emotions to our scoring categories + emotion_boost = { + "funny": emotions.get("joy", 0) + emotions.get("amusement", 0), + "dark": emotions.get("sadness", 0) + emotions.get("anger", 0), + "silly": emotions.get("surprise", 0) + emotions.get("confusion", 0), + "suspicious": emotions.get("suspicion", 0) + emotions.get("contempt", 0) + } + + return { + "emotions": emotions, + "scoring_boost": emotion_boost, + "confidence": emotions.get("confidence", 0.0) + } +``` + +### Reinforcement Learning from Human Feedback (RLHF) + +```python +class RLHFTrainer: + """Improve AI commentary based on user feedback""" + + async def process_feedback(self, quote_id: int, feedback_value: int): + """Process user feedback to improve future commentary""" + quote = await self.db.get_quote_with_commentary(quote_id) + + training_sample = { + "quote_text": quote.text, + "scores": quote.get_score_dict(), + "commentary": quote.commentary, + "feedback": feedback_value, + "context": quote.conversation_context + } + + # Store for periodic model fine-tuning + await self.feedback_db.store_training_sample(training_sample) + + # Immediate prompt adjustment + if feedback_value < 0: + await 
self.adjust_commentary_style(quote.speaker_id, "less_aggressive")
+        elif feedback_value > 0:
+            await self.reinforce_commentary_style(quote.speaker_id, quote.commentary_style)
+
+    async def retrain_commentary_model(self):
+        """Periodic retraining based on accumulated feedback"""
+        feedback_samples = await self.feedback_db.get_recent_samples(limit=1000)
+
+        # Analyze feedback patterns
+        successful_patterns = [s for s in feedback_samples if s["feedback"] > 0]
+        failed_patterns = [s for s in feedback_samples if s["feedback"] < 0]
+
+        # Update prompt templates based on successful patterns
+        await self.update_prompt_templates(successful_patterns, failed_patterns)
+```
+
+## Error Handling & Resilience
+
+### API Failures
+- **OpenAI API**: Retry with exponential backoff, fallback to local processing
+- **Discord API**: Automatic reconnection, rate limit compliance
+- **Database**: Connection pooling, transaction rollback
+
+### Voice Processing Errors
+- Audio corruption handling
+- Network interruption recovery
+- Buffer overflow protection
+- Graceful degradation instead of silent failures
+
+### Logging Strategy
+```text
+# Structured logging levels
+DEBUG: Audio buffer operations
+INFO: Quote detections, command usage
+WARNING: API rate limits, connection issues
+ERROR: Critical failures, data loss prevention
+```
+
+## Security & Permissions
+
+### Discord Permissions Required
+- `Connect` - Join voice channels
+- `Speak` - Audio playback for TTS
+- `Use Slash Commands` - Command interface
+- `Send Messages` - Commentary posting
+- `Embed Links` - Rich message formatting
+- `Read Message History` - Context awareness
+
+### Data Privacy Measures
+- No long-term audio storage (clips deleted within 24 hours)
+- Quote anonymization options
+- User opt-out mechanisms
+- GDPR compliance for EU users
+
+### Rate Limiting Strategy
+- OpenAI API: 3 RPM for Whisper, 500 RPM for GPT
+- Discord API: Built-in rate limit handling
+- Database: Connection pooling and query optimization
+
+## Performance Optimization
+
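Both the API-failure handling and the rate-limiting strategy above reduce to the same primitive: retrying an awaitable with exponentially increasing delays. A minimal sketch of that helper follows; `call_with_backoff` and the retried exception type are illustrative placeholders, and a production version would also honor provider-specific rate-limit headers before falling back to a local model.

```python
import asyncio
import random

async def call_with_backoff(coro_factory, max_attempts=4, base_delay=0.5):
    """Retry an awaitable-producing callable with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return await coro_factory()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff: base, 2*base, 4*base, ... plus small jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            await asyncio.sleep(delay)

# Demo: a transcription call that fails twice with a transient error, then succeeds.
attempts = {"count": 0}

async def flaky_transcribe():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient API failure")
    return "transcript"

result = asyncio.run(call_with_backoff(flaky_transcribe, base_delay=0.01))
print(result)  # prints "transcript" after two retried failures
```

The jitter term keeps multiple concurrent recordings from retrying in lockstep against the same rate limit.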
+### Memory Management +- Circular audio buffers with size limits +- Garbage collection for processed audio +- Database connection pooling +- Async operation batching + +### Scalability Considerations +- Multi-server support with isolated data +- Horizontal scaling preparation +- Database indexing for query performance +- Audio processing queue management + +## Testing Strategy + +### Unit Testing +```python +# Core components to test +- AudioSink buffer management +- Quote detection accuracy +- Database operations +- API error handling +- Command validation +``` + +### Integration Testing +- End-to-end voice processing pipeline +- AI API integration reliability +- Discord bot command functionality +- Database transaction consistency + +### Load Testing +- Multiple simultaneous voice channels +- High-frequency quote detection +- Database performance under load +- Memory usage during extended operation + +## Deployment Configuration + +### Container Architecture + +#### Main Bot Container (Dockerfile) +```dockerfile +FROM python:3.11-slim + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + ffmpeg \ + portaudio19-dev \ + gcc \ + g++ \ + && rm -rf /var/lib/apt/lists/* + +# Install Python dependencies +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +# Install additional ML models +RUN python -c "import torch; torch.hub.download_url_to_file('https://github.com/pyannote/pyannote-audio/raw/develop/pyannote/audio/models/segmentation/0123456789abcdef.onnx', 'speaker_model.onnx')" + +# Copy application code +COPY . /app +WORKDIR /app + +# Create data directories +RUN mkdir -p /app/data /app/logs /app/temp + +# Health check +HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \ + CMD python -c "import requests; requests.get('http://localhost:8080/health')" + +CMD ["python", "main.py"] +``` + +#### Production Docker Compose +```yaml +version: '3.8' +services: + bot: + build: . 
+ environment: + - REDIS_URL=redis://redis:6379 + - QDRANT_URL=http://qdrant:6333 + - PROMETHEUS_PORT=8080 + depends_on: + - redis + - qdrant + volumes: + - ./data:/app/data + - ./logs:/app/logs + - ./temp:/app/temp + ports: + - "8080:8080" # Health check endpoint + restart: unless-stopped + deploy: + resources: + limits: + memory: 4G + cpus: '2' + reservations: + memory: 2G + cpus: '1' + + redis: + image: redis:7-alpine + command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru + ports: + - "6379:6379" + volumes: + - redis_data:/data + restart: unless-stopped + + qdrant: + image: qdrant/qdrant:latest + ports: + - "6333:6333" + volumes: + - qdrant_data:/qdrant/storage + environment: + - QDRANT__SERVICE__HTTP_PORT=6333 + restart: unless-stopped + + ollama: + image: ollama/ollama:latest + ports: + - "11434:11434" + volumes: + - ollama_data:/root/.ollama + environment: + - OLLAMA_HOST=0.0.0.0 + restart: unless-stopped + deploy: + resources: + limits: + memory: 8G + + prometheus: + image: prom/prometheus:latest + ports: + - "9090:9090" + volumes: + - ./config/prometheus.yml:/etc/prometheus/prometheus.yml + - prometheus_data:/prometheus + restart: unless-stopped + + grafana: + image: grafana/grafana:latest + ports: + - "3000:3000" + volumes: + - grafana_data:/var/lib/grafana + - ./config/grafana:/etc/grafana/provisioning + environment: + - GF_SECURITY_ADMIN_PASSWORD=admin + restart: unless-stopped + +volumes: + redis_data: + qdrant_data: + ollama_data: + prometheus_data: + grafana_data: +``` + +### Environment Configuration +```bash +# Discord Configuration +DISCORD_TOKEN=your_discord_bot_token +GUILD_ID=your_test_server_id +SUMMARY_CHANNEL_ID=channel_for_daily_summaries + +# AI Providers +OPENAI_API_KEY=your_openai_api_key +ANTHROPIC_API_KEY=your_anthropic_api_key +GROQ_API_KEY=your_groq_api_key +OPENROUTER_API_KEY=your_openrouter_api_key +OLLAMA_BASE_URL=http://ollama:11434 +LMSTUDIO_BASE_URL=http://localhost:1234 + +# Services 
+REDIS_URL=redis://redis:6379
+QDRANT_URL=http://qdrant:6333
+DATABASE_URL=postgresql://user:password@postgres:5432/quotes
+
+# Recording Configuration
+RECORDING_CLIP_DURATION=120
+MAX_CONCURRENT_RECORDINGS=5
+AUDIO_RETENTION_HOURS=24
+TEMP_AUDIO_PATH=/app/temp
+
+# Scoring Thresholds
+QUOTE_THRESHOLD_REALTIME=8.5
+QUOTE_THRESHOLD_ROTATION=6.0
+QUOTE_THRESHOLD_DAILY=3.0
+SCORING_WEIGHT_FUNNY=0.3
+SCORING_WEIGHT_DARK=0.15
+SCORING_WEIGHT_SILLY=0.2
+SCORING_WEIGHT_SUSPICIOUS=0.1
+SCORING_WEIGHT_ASININE=0.25
+
+# AI Configuration
+DEFAULT_AI_PROVIDER=openai
+TRANSCRIPTION_PROVIDER=openai
+ANALYSIS_PROVIDER=openai
+COMMENTARY_PROVIDER=anthropic
+FALLBACK_PROVIDER=groq
+
+# Performance
+MAX_MEMORY_USAGE_MB=4096
+MAX_AUDIO_BUFFER_SIZE=10485760
+CONCURRENT_TRANSCRIPTIONS=3
+API_RATE_LIMIT_RPM=100
+
+# Monitoring
+LOG_LEVEL=INFO
+PROMETHEUS_PORT=8080
+HEALTH_CHECK_INTERVAL=30
+METRICS_RETENTION_DAYS=30
+```
+
+### Health Monitoring
+
+#### Application Health Checks
+```python
+class HealthChecker:
+    async def check_database_health(self) -> bool: ...
+    async def check_redis_health(self) -> bool: ...
+    async def check_qdrant_health(self) -> bool: ...
+    async def check_ai_providers_health(self) -> Dict[str, bool]: ...
+    async def check_discord_connection(self) -> bool: ...
+    async def check_memory_usage(self) -> Dict[str, float]: ...
+    async def get_system_metrics(self) -> Dict[str, Any]: ...
+```
+
+#### Prometheus Metrics
+- `discord_quotes_detected_total` - Total quotes detected
+- `discord_audio_clips_processed_total` - Audio clips processed
+- `discord_transcription_duration_seconds` - Transcription latency
+- `discord_ai_provider_requests_total` - AI API requests by provider
+- `discord_memory_usage_bytes` - Memory consumption
+- `discord_active_recordings_gauge` - Current active recordings
+- `discord_quote_scores_histogram` - Distribution of quote scores
+
+### Backup & Recovery
+
+#### Database Backup Strategy
+```bash
+# Daily PostgreSQL backup (container, user, and database names are deployment-specific)
+docker exec postgres_container pg_dump -U user -d quotes > /app/data/backup_$(date +%Y%m%d).sql
+
+# Qdrant
collection backup +curl -X POST "http://qdrant:6333/collections/user_memories/snapshots" + +# Redis backup +docker exec redis_container redis-cli BGSAVE +``` + +#### Disaster Recovery +- Automated daily backups to external storage +- Database restoration procedures +- Configuration rollback mechanisms +- Service dependency recovery order + +## Extensibility Framework + +### Plugin Architecture + +#### Extension Interface +```python +class BaseExtension: + def __init__(self, bot, config): + self.bot = bot + self.config = config + + async def initialize(self) -> bool: + """Initialize extension resources""" + pass + + async def on_quote_detected(self, quote_data: QuoteData) -> None: + """Handle quote detection events""" + pass + + async def on_recording_start(self, channel_id: int) -> None: + """Handle recording start events""" + pass + + async def on_user_join_voice(self, user_id: int, channel_id: int) -> None: + """Handle user voice events""" + pass + + async def cleanup(self) -> None: + """Cleanup extension resources""" + pass +``` + +#### Extension Manager +```python +class ExtensionManager: + def __init__(self): + self.extensions = {} + self.hooks = defaultdict(list) + + async def load_extension(self, name: str, extension_class: Type[BaseExtension]): + """Dynamically load extension""" + + async def unload_extension(self, name: str): + """Unload and cleanup extension""" + + async def emit_event(self, event_name: str, *args, **kwargs): + """Emit event to all registered extensions""" +``` + +### Future Feature Implementations + +#### AI Voice Chat Extension (`extensions/ai_voice_chat.py`) +```python +class AIVoiceChatExtension(BaseExtension): + """Enable AI to participate in voice conversations""" + + def __init__(self, bot, config): + super().__init__(bot, config) + self.voice_synthesis = TTSEngine() + self.conversation_ai = ConversationAI() + self.participation_threshold = 7.0 # Score to trigger AI response + + async def on_quote_detected(self, quote_data: 
QuoteData): + if quote_data.overall_score >= self.participation_threshold: + response = await self.conversation_ai.generate_voice_response( + quote_data.quote, + quote_data.speaker_context, + quote_data.conversation_history + ) + audio_data = await self.voice_synthesis.text_to_speech(response) + await self.bot.play_audio_in_channel(quote_data.channel_id, audio_data) + + async def handle_direct_mention(self, user_id: int, message: str, channel_id: int): + """Respond when AI is mentioned in voice chat""" + context = await self.bot.memory_manager.retrieve_user_context(user_id, message) + response = await self.conversation_ai.generate_contextual_response(message, context) + audio_data = await self.voice_synthesis.text_to_speech(response) + await self.bot.play_audio_in_channel(channel_id, audio_data) +``` + +#### Research Agent Extension (`extensions/research_agents.py`) +```python +class ResearchAgentExtension(BaseExtension): + """Autonomous research and fact-checking""" + + def __init__(self, bot, config): + super().__init__(bot, config) + self.web_searcher = WebSearchAgent() + self.fact_checker = FactCheckingAgent() + self.research_triggers = ['fact', 'research', 'look up', 'what is'] + + async def on_quote_detected(self, quote_data: QuoteData): + if any(trigger in quote_data.quote.lower() for trigger in self.research_triggers): + research_query = self.extract_research_query(quote_data.quote) + if research_query: + results = await self.web_searcher.search(research_query) + summary = await self.fact_checker.analyze_and_summarize(results) + + await self.bot.send_message( + quote_data.channel_id, + f"🔍 Research Result for '{research_query}':\n{summary}" + ) + + def extract_research_query(self, quote: str) -> Optional[str]: + """Extract searchable query from quote using NLP""" + # Implementation for query extraction + pass +``` + +#### Personality Engine Extension (`extensions/personality_engine.py`) +```python +class PersonalityEngineExtension(BaseExtension): + 
"""Dynamic AI personality adaptation""" + + def __init__(self, bot, config): + super().__init__(bot, config) + self.personality_models = { + 'sarcastic': SarcasticPersonality(), + 'wholesome': WholesomePersonality(), + 'chaotic': ChaoticPersonality(), + 'intellectual': IntellectualPersonality() + } + self.current_personality = 'sarcastic' + self.adaptation_threshold = 10 # quotes before personality shift + + async def on_quote_detected(self, quote_data: QuoteData): + # Analyze conversation mood and adapt personality + mood_analysis = await self.analyze_conversation_mood(quote_data) + + if self.should_adapt_personality(mood_analysis): + new_personality = self.select_optimal_personality(mood_analysis) + await self.switch_personality(new_personality) + + async def switch_personality(self, personality_name: str): + """Dynamically switch AI personality""" + self.current_personality = personality_name + personality = self.personality_models[personality_name] + + # Update AI prompts and response patterns + await self.bot.ai_manager.update_personality_context(personality.get_context()) + + # Announce personality change + announcement = personality.get_switch_announcement() + await self.bot.send_message(self.bot.active_channel, announcement) +``` + +### Extension Configuration + +#### Extension Registry (`config/extensions.yml`) +```yaml +extensions: + ai_voice_chat: + enabled: false + config: + participation_threshold: 7.0 + voice_model: "elevenlabs" + response_delay: 2.0 + + research_agents: + enabled: true + config: + search_provider: "google" + fact_check_threshold: 0.8 + auto_research: true + + personality_engine: + enabled: true + config: + default_personality: "sarcastic" + adaptation_enabled: true + personality_switch_cooldown: 3600 + + custom_responses: + enabled: true + config: + response_templates_path: "./config/response_templates.json" + user_specific_responses: true +``` + +#### Dynamic Loading System +```python +class DynamicExtensionLoader: + async def 
hot_reload_extension(self, extension_name: str): + """Reload extension without restarting bot""" + await self.extension_manager.unload_extension(extension_name) + module = importlib.reload(importlib.import_module(f"extensions.{extension_name}")) + extension_class = getattr(module, f"{extension_name.title()}Extension") + await self.extension_manager.load_extension(extension_name, extension_class) + + async def install_extension_from_repo(self, repo_url: str, extension_name: str): + """Install extension from external repository""" + # Git clone, validation, and installation logic + pass +``` + +### API Hooks for External Integration + +#### Webhook System +```python +class WebhookManager: + def __init__(self): + self.webhooks = {} + + async def register_webhook(self, event_type: str, url: str, auth_token: str): + """Register external webhook for events""" + + async def emit_webhook(self, event_type: str, data: dict): + """Send webhook notification""" + for webhook in self.webhooks.get(event_type, []): + await self.send_webhook_request(webhook, data) +``` + +#### REST API Extension Points +```python +# External API endpoints for integration +@app.route('/api/v1/quotes', methods=['GET']) +async def get_quotes(guild_id: int, limit: int = 50): + """External API access to quotes""" + +@app.route('/api/v1/trigger-recording', methods=['POST']) +async def trigger_recording(guild_id: int, channel_id: int): + """External trigger for recording""" + +@app.route('/api/v1/personality', methods=['PUT']) +async def update_personality(personality_data: dict): + """External personality updates""" +``` \ No newline at end of file diff --git a/reqs.md b/reqs.md new file mode 100644 index 0000000..422efa0 --- /dev/null +++ b/reqs.md @@ -0,0 +1,168 @@ +An evaluation of the provided text has been completed. The core concepts and architecture are sound, but several technical inaccuracies and potential issues were identified that would prevent the bot from working optimally and securely. 
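Several of the issues found concern audio handling: Discord delivers 16-bit 48 kHz interleaved stereo PCM, which must be downmixed to mono before transcription. A minimal standard-library sketch of that downmix (the helper name `stereo_to_mono` is illustrative, not part of any library):

```python
import array

def stereo_to_mono(pcm: bytes) -> bytes:
    """Downmix 16-bit interleaved stereo PCM to mono by averaging L/R samples."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm)
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    return mono.tobytes()

# Two interleaved stereo frames: (L=100, R=200) and (L=-50, R=50)
stereo = array.array("h", [100, 200, -50, 50]).tobytes()
mono = stereo_to_mono(stereo)
print(array.array("h", mono).tolist())  # [150, 0]
```

Averaging the channels halves the payload sent to the transcription API while preserving the speech content.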
+ +### **Fact-Checking and Validation Summary:** + +* **Correct Libraries:** The use of `discord.py`, `discord-ext-voice-recv`, and `openai` is appropriate for this project's goals. +* **Incorrect Installation Commands:** `pip install sqlite3` is unnecessary, as `sqlite3` is a standard library built into Python. +* **Outdated API Usage:** The code uses a deprecated syntax for the OpenAI API (`openai.ChatCompletion.create`). The current `openai` library (v1.0.0 and newer) uses a client-based approach (`client.chat.completions.create`). +* **Fictional AI Model:** The model `gpt-5-nano` does not exist. A valid, existing model must be used. +* **Critical Blocking Code:** The Text-to-Speech (TTS) implementation uses `pyttsx3`'s `engine.runAndWait()`, which is a synchronous, blocking call. In an asynchronous application like a Discord bot, this would freeze the entire bot, making it unresponsive until the TTS generation is complete. +* **Audio Data Handling:** The code assumes mono audio at 48kHz. Discord voice channels provide stereo audio, which must be converted to mono for compatibility with most speech-to-text models, including Whisper. +* **Command Style:** The text uses prefix commands (`!command`). The modern and recommended standard for Discord bots is slash commands, which provide a better user experience and integration. + +The original text has been corrected, and its logic has been expanded upon to create the detailed, one-shot prompt below. This prompt addresses the identified issues by incorporating asynchronous database operations, correct API syntax, proper handling of blocking code, and modern Discord bot development practices. + +*** + +### **Prompt for Language Model Code Assistant** + +**Project Goal:** Develop a fully featured, optimal, and secure Discord Voice Chat Quote Bot using Python. + +**Core Features:** + +1. **Voice Channel Integration:** The bot must be able to join and leave the user's current voice channel. +2. 
**Live Audio Transcription:** It will continuously listen to all participants in the voice channel, capturing each user's audio separately. This audio will be transcribed into text in near real-time using the OpenAI Whisper API. +3. **AI-Powered Quote Detection:** Transcribed text will be analyzed by an AI model (e.g., GPT-4o-mini) to identify and extract "outlandish," funny, or memorable quotes. +4. **Live Commentary:** When a quote is detected, the bot will generate witty, humorous commentary about the quote and post it as a styled embed in the text channel where the interaction was initiated. +5. **Persistent Quote Storage:** All detected quotes, along with the user who said them and a timestamp, will be saved to a persistent SQLite database. +6. **Automated Daily Summaries:** The bot will automatically generate a "Daily Quote Compilation" every 24 hours, summarizing the day's best quotes using an AI model and posting it to a designated channel. +7. **(Optional) Text-to-Speech Commentary:** As an advanced feature, the bot can convert its generated commentary into speech and play it back in the voice channel. +8. **User Commands:** All interactions will be handled through modern, user-friendly slash commands (e.g., `/listen`, `/stop`, `/random_quote`). + +--- + +### **Technical Specifications and Implementation Details** + +#### **1. Project Structure** + +Organize the code into a modular structure using `discord.py` Cogs for maintainability. + +``` +/project_root +|-- main.py # Main bot runner, loads cogs +|-- .env # For storing secret keys +|-- requirements.txt # Project dependencies +|-- database.py # Handles all database interactions +|-- /cogs +| |-- voice_cog.py # Manages voice connection, listening, and audio processing +| |-- quotes_cog.py # Manages quote detection, commentary, and user commands +| |-- tasks_cog.py # Manages the scheduled daily summary +``` + +#### **2. 
Environment and Dependencies** + +Create a `.env` file for secure key storage: + +```ini +DISCORD_TOKEN=your_discord_bot_token +OPENAI_API_KEY=your_openai_api_key +``` + +Create a `requirements.txt` file with the following libraries. Use `aiosqlite` for non-blocking database operations suitable for an async environment. + +``` +discord.py>=2.3.0 +discord-ext-voice-recv +openai>=1.0.0 +python-dotenv +aiosqlite +pyttsx3 +ffmpeg-python +``` + +#### **3. Database Schema (`database.py`)** + +Use `aiosqlite` to create and interact with a `quotes.db` file. The database manager should be a class that handles all connections and queries asynchronously. + +```sql +CREATE TABLE IF NOT EXISTS quotes ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + user_id INTEGER NOT NULL, + username TEXT NOT NULL, + quote TEXT NOT NULL, + timestamp TEXT NOT NULL, + guild_id INTEGER NOT NULL +); +``` + +#### **4. Voice Cog (`cogs/voice_cog.py`)** + +This cog will handle the core audio processing pipeline. + +* **Slash Commands:** + * `/listen`: Makes the bot join the user's voice channel, connects a `voice_recv.VoiceRecvClient`, and starts listening with a custom `AudioSink`. + * `/stop`: Stops listening, disconnects the bot from the voice channel, and performs cleanup. +* **`AudioSink` Class:** + * This class will inherit from `voice_recv.AudioSink`. + * It should buffer incoming audio data separately for each user (`dict[user_id, audio_buffer]`). + * To avoid infinite buffering, implement a system to process audio chunks. A good strategy is to create a new processing task for a user after they have been silent for a short duration (e.g., 1.5 seconds). +* **Audio Processing Logic:** + 1. When a user's audio chunk is ready for processing, run the following in an `asyncio.Task` to avoid blocking. + 2. The raw PCM data from Discord is 16-bit 48kHz **stereo**. Convert it to **mono** as required by Whisper. + 3. 
Save the mono audio data to a temporary in-memory buffer (`io.BytesIO`) or a temporary file. + 4. Call the OpenAI Whisper API using the `client.audio.transcriptions.create` method with the audio file data. + 5. Pass the resulting transcript to the `quotes_cog` for analysis. + +#### **5. Quotes Cog (`cogs/quotes_cog.py`)** + +This cog handles the AI logic and user-facing quote commands. + +* **Quote Detection:** + * Create a method that receives a transcript and a user object. + * Use the `openai` client (`client.chat.completions.create`) with a model like `gpt-4o-mini`. + * **Prompt Engineering (Quote Detection):** + ``` + You are an AI that detects memorable quotes from a voice chat transcript. Analyze the following text: "{transcript}" + + If it contains a genuinely funny, outlandish, or witty statement worth saving, respond ONLY with: + QUOTE: [The exact quote] + + If it does not, respond ONLY with: + NO_QUOTE + ``` + * Parse the model's response. If a quote is found, proceed to save it and generate commentary. +* **Live Commentary Generation:** + * If a quote is detected, call the chat completions API again with a different prompt. + * **Prompt Engineering (Commentary):** + ``` + A user named {username} just said: "{quote}" + Generate a short, witty, and humorous commentary about this quote. The tone should be like a live sports commentator who is amused by the situation. Keep it under 150 characters. + ``` + * Format the commentary and the original quote into a `discord.Embed` and send it to the channel. +* **Database Interaction:** + * When a quote is confirmed, call the `database.py` manager to save the quote, user ID, username, timestamp, and server ID. +* **Slash Commands:** + * `/random_quote`: Fetches and displays a random quote from the database for the current server. + * `/user_quotes [user]`: Fetches and displays all saved quotes from a specific user. + +#### **6. Tasks Cog (`cogs/tasks_cog.py`)** + +This cog manages scheduled background tasks. 
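Before wiring this up to Discord, the data-shaping half of the daily summary (the 24-hour cutoff and the prompt assembly) can be sketched as plain, testable functions; the function names here are illustrative, not part of the design:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def summary_window_start(now: Optional[datetime] = None) -> str:
    """Return the ISO-8601 cutoff for a 'quotes from the last 24 hours' query."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(hours=24)).isoformat()


def build_summary_prompt(rows: list) -> str:
    """Assemble the daily-summary prompt from quote rows ({"username", "quote"} dicts)."""
    quote_list = "\n".join(f'- {r["username"]}: "{r["quote"]}"' for r in rows)
    return (
        'You are a Discord bot that creates a fun "end of day" summary. '
        "Here are the memorable quotes from today's voice chats:\n"
        f"{quote_list}\n\n"
        "Generate an entertaining summary of the day. Highlight the funniest "
        'moments, give out silly "awards" (e.g., "Quote of the Day"), and '
        "format it for a Discord embed using markdown and emojis."
    )
```

The cog's scheduled method (decorated with `@tasks.loop(time=...)`) would query the database with this cutoff, run `build_summary_prompt` over the returned rows, and post the model's reply to the designated channel.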

* **Daily Summary:**
  * Use the `discord.ext.tasks` loop with its `time=` parameter so the task fires once per day at a fixed time (e.g., midnight UTC).
  * The task will fetch all quotes from the last 24 hours from the database.
  * If quotes exist, format them into a single string and send them to a GPT model (`gpt-4o-mini` is suitable).
  * **Prompt Engineering (Daily Summary):**

    ```
    You are a Discord bot that creates a fun "end of day" summary. Here are the memorable quotes from today's voice chats:
    {quote_list}

    Generate an entertaining summary of the day. Highlight the funniest moments, give out silly "awards" (e.g., "Quote of the Day"), and format it for a Discord embed using markdown and emojis.
    ```

  * Post the resulting summary in a designated channel.

#### **7. Optional TTS Feature (`voice_cog.py`)**

* **Addressing Blocking Code:** The `pyttsx3` library is synchronous. To prevent it from freezing the bot, its blocking operations (`engine.save_to_file`, `engine.runAndWait`) **must** be run in a separate thread using `asyncio.to_thread`.
* **Implementation:**
  1. After generating text commentary, create a function `speak_commentary(text)`.
  2. Inside this function, use `await asyncio.to_thread(blocking_tts_function, text)` to generate the audio file (typically WAV; `pyttsx3` does not reliably emit MP3) without blocking the event loop.
  3. Once the file is ready, play it in the voice channel using `discord.FFmpegPCMAudio`.
  4. Ensure the temporary audio file is deleted after playback is complete.

#### **8. Permissions and Security**

* **Bot Permissions:** When generating the bot invite link, ensure the following permissions are requested: `Connect`, `Speak`, `Send Messages`, `Embed Links`, and `Read Message History`.
* **Intents:** Enable the required Privileged Gateway Intents in the Discord Developer Portal. Note that slash commands do not require the Message Content intent; it is only needed if legacy prefix commands are retained.
* **Error Handling:** Implement comprehensive `try...except` blocks for all API calls, file operations, and voice connections to prevent the bot from crashing. Log errors appropriately.
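
To make the audio-handling requirements from section 4 concrete, here is a sketch of the stereo-to-mono downmix and the transcription call. The helper names are my own, and the snippet assumes an `openai.AsyncOpenAI` client (the v1.0.0+ client-based API recommended above):

```python
import array
import io
import wave


def stereo_to_mono(pcm: bytes) -> bytes:
    """Downmix 16-bit interleaved stereo PCM (L R L R ...) by averaging channels."""
    # "h" = signed 16-bit in native byte order, which matches Discord's
    # little-endian PCM on common platforms.
    samples = array.array("h", pcm)
    mono = array.array(
        "h", ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2))
    )
    return mono.tobytes()


def pcm_to_wav(pcm: bytes, sample_rate: int = 48000) -> io.BytesIO:
    """Wrap raw mono 16-bit PCM in an in-memory WAV container for the API."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    buf.seek(0)
    buf.name = "chunk.wav"  # the OpenAI client infers the format from the name
    return buf


# Called from an asyncio.Task; `client` is an openai.AsyncOpenAI instance.
async def transcribe_chunk(client, stereo_pcm: bytes) -> str:
    wav = pcm_to_wav(stereo_to_mono(stereo_pcm))
    result = await client.audio.transcriptions.create(model="whisper-1", file=wav)
    return result.text
```

Averaging the two channels is the simplest correct downmix for this pipeline; shelling out to ffmpeg (`-ac 1`) is an alternative if resampling is needed at the same step.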