Files
disbord/reqs.md
2025-08-26 06:35:41 -04:00

10 KiB

An evaluation of the provided text has been completed. The core concepts and architecture are sound, but several technical inaccuracies and potential issues were identified that would prevent the bot from working optimally and securely.

Fact-Checking and Validation Summary:

  • Correct Libraries: The use of discord.py, discord-ext-voice-recv, and openai is appropriate for this project's goals.
  • Incorrect Installation Commands: pip install sqlite3 is unnecessary, as sqlite3 is a standard library built into Python.
  • Outdated API Usage: The code uses a deprecated syntax for the OpenAI API (openai.ChatCompletion.create). The current openai library (v1.0.0 and newer) uses a client-based approach (client.chat.completions.create).
  • Fictional AI Model: The model gpt-5-nano does not exist. A valid, existing model must be used.
  • Critical Blocking Code: The Text-to-Speech (TTS) implementation uses pyttsx3's engine.runAndWait(), which is a synchronous, blocking call. In an asynchronous application like a Discord bot, this would freeze the entire bot, making it unresponsive until the TTS generation is complete.
  • Audio Data Handling: The code assumes mono audio at 48kHz. Discord voice channels provide stereo audio, which must be converted to mono for compatibility with most speech-to-text models, including Whisper.
  • Command Style: The text uses prefix commands (!command). The modern and recommended standard for Discord bots is slash commands, which provide a better user experience and integration.

The original text has been corrected, and its logic has been expanded upon to create the detailed, one-shot prompt below. This prompt addresses the identified issues by incorporating asynchronous database operations, correct API syntax, proper handling of blocking code, and modern Discord bot development practices.


Prompt for Language Model Code Assistant

Project Goal: Develop a fully featured, optimal, and secure Discord Voice Chat Quote Bot using Python.

Core Features:

  1. Voice Channel Integration: The bot must be able to join and leave the user's current voice channel.
  2. Live Audio Transcription: It will continuously listen to all participants in the voice channel, capturing each user's audio separately. This audio will be transcribed into text in near real-time using the OpenAI Whisper API.
  3. AI-Powered Quote Detection: Transcribed text will be analyzed by an AI model (e.g., GPT-4o-mini) to identify and extract "outlandish," funny, or memorable quotes.
  4. Live Commentary: When a quote is detected, the bot will generate witty, humorous commentary about the quote and post it as a styled embed in the text channel where the interaction was initiated.
  5. Persistent Quote Storage: All detected quotes, along with the user who said them and a timestamp, will be saved to a persistent SQLite database.
  6. Automated Daily Summaries: The bot will automatically generate a "Daily Quote Compilation" every 24 hours, summarizing the day's best quotes using an AI model and posting it to a designated channel.
  7. (Optional) Text-to-Speech Commentary: As an advanced feature, the bot can convert its generated commentary into speech and play it back in the voice channel.
  8. User Commands: All interactions will be handled through modern, user-friendly slash commands (e.g., /listen, /stop, /random_quote).

Technical Specifications and Implementation Details

1. Project Structure

Organize the code into a modular structure using discord.py Cogs for maintainability.

/project_root
|-- main.py             # Main bot runner, loads cogs
|-- .env                # For storing secret keys
|-- requirements.txt    # Project dependencies
|-- database.py         # Handles all database interactions
|-- /cogs
|   |-- voice_cog.py    # Manages voice connection, listening, and audio processing
|   |-- quotes_cog.py   # Manages quote detection, commentary, and user commands
|   |-- tasks_cog.py    # Manages the scheduled daily summary

2. Environment and Dependencies

Create a .env file for secure key storage:

DISCORD_TOKEN=your_discord_bot_token
OPENAI_API_KEY=your_openai_api_key

Create a requirements.txt file with the following libraries. Use aiosqlite for non-blocking database operations suitable for an async environment.

discord.py>=2.3.0
discord-ext-voice-recv
openai>=1.0.0
python-dotenv
aiosqlite
pyttsx3
ffmpeg-python

3. Database Schema (database.py)

Use aiosqlite to create and interact with a quotes.db file. The database manager should be a class that handles all connections and queries asynchronously.

CREATE TABLE IF NOT EXISTS quotes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    username TEXT NOT NULL,
    quote TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    guild_id INTEGER NOT NULL
);

4. Voice Cog (cogs/voice_cog.py)

This cog will handle the core audio processing pipeline.

  • Slash Commands:
    • /listen: Makes the bot join the user's voice channel, connects a voice_recv.VoiceRecvClient, and starts listening with a custom AudioSink.
    • /stop: Stops listening, disconnects the bot from the voice channel, and performs cleanup.
  • AudioSink Class:
    • This class will inherit from voice_recv.AudioSink.
    • It should buffer incoming audio data separately for each user (dict[user_id, audio_buffer]).
    • To avoid infinite buffering, implement a system to process audio chunks. A good strategy is to create a new processing task for a user after they have been silent for a short duration (e.g., 1.5 seconds).
  • Audio Processing Logic:
    1. When a user's audio chunk is ready for processing, run the following in an asyncio.Task to avoid blocking.
    2. The raw PCM data from Discord is 16-bit 48kHz stereo. Convert it to mono as required by Whisper.
    3. Save the mono audio data to a temporary in-memory buffer (io.BytesIO) or a temporary file.
    4. Call the OpenAI Whisper API using the client.audio.transcriptions.create method with the audio file data.
    5. Pass the resulting transcript to the quotes_cog for analysis.

5. Quotes Cog (cogs/quotes_cog.py)

This cog handles the AI logic and user-facing quote commands.

  • Quote Detection:
    • Create a method that receives a transcript and a user object.
    • Use the openai client (client.chat.completions.create) with a model like gpt-4o-mini.
    • Prompt Engineering (Quote Detection):
      You are an AI that detects memorable quotes from a voice chat transcript. Analyze the following text: "{transcript}"
      
      If it contains a genuinely funny, outlandish, or witty statement worth saving, respond ONLY with:
      QUOTE: [The exact quote]
      
      If it does not, respond ONLY with:
      NO_QUOTE
      
    • Parse the model's response. If a quote is found, proceed to save it and generate commentary.
  • Live Commentary Generation:
    • If a quote is detected, call the chat completions API again with a different prompt.
    • Prompt Engineering (Commentary):
      A user named {username} just said: "{quote}"
      Generate a short, witty, and humorous commentary about this quote. The tone should be like a live sports commentator who is amused by the situation. Keep it under 150 characters.
      
    • Format the commentary and the original quote into a discord.Embed and send it to the channel.
  • Database Interaction:
    • When a quote is confirmed, call the database.py manager to save the quote, user ID, username, timestamp, and server ID.
  • Slash Commands:
    • /random_quote: Fetches and displays a random quote from the database for the current server.
    • /user_quotes [user]: Fetches and displays all saved quotes from a specific user.

6. Tasks Cog (cogs/tasks_cog.py)

This cog manages scheduled background tasks.

  • Daily Summary:
    • Use the discord.ext.tasks loop, configured to run once every 24 hours at a set time (e.g., midnight UTC).
    • The task will fetch all quotes from the last 24 hours from the database.
    • If quotes exist, format them into a single string and send them to a GPT model (gpt-4o-mini is suitable).
    • Prompt Engineering (Daily Summary):
      You are a Discord bot that creates a fun "end of day" summary. Here are the memorable quotes from today's voice chats:
      {quote_list}
      
      Generate an entertaining summary of the day. Highlight the funniest moments, give out silly "awards" (e.g., "Quote of the Day"), and format it for a Discord embed using markdown and emojis.
      
    • Post the resulting summary in a designated channel.

7. Optional TTS Feature (voice_cog.py)

  • Addressing Blocking Code: The pyttsx3 library is synchronous. To prevent it from freezing the bot, its blocking operations (engine.save_to_file, engine.runAndWait) must be run in a separate thread using asyncio.to_thread.
  • Implementation:
    1. After generating text commentary, create a function speak_commentary(text).
    2. Inside this function, use await asyncio.to_thread(blocking_tts_function, text) to generate the MP3 file without blocking the event loop.
    3. Once the file is ready, play it in the voice channel using discord.FFmpegPCMAudio.
    4. Ensure the temporary audio file is deleted after playback is complete.

8. Permissions and Security

  • Bot Permissions: When generating the bot invite link, ensure the following permissions are requested: Connect, Speak, Send Messages, Embed Links, and Read Message History.
  • Intents: Enable the Privileged Gateway Intents (especially Message Content) in the Discord Developer Portal.
  • Error Handling: Implement comprehensive try...except blocks for all API calls, file operations, and voice connections to prevent the bot from crashing. Log errors appropriately.