10 KiB
An evaluation of the provided text has been completed. The core concepts and architecture are sound, but several technical inaccuracies and potential issues were identified that would prevent the bot from working optimally and securely.
Fact-Checking and Validation Summary:
- Correct Libraries: The use of
discord.py,discord-ext-voice-recv, andopenaiis appropriate for this project's goals. - Incorrect Installation Commands:
pip install sqlite3is unnecessary, assqlite3is a standard library built into Python. - Outdated API Usage: The code uses a deprecated syntax for the OpenAI API (
openai.ChatCompletion.create). The currentopenailibrary (v1.0.0 and newer) uses a client-based approach (client.chat.completions.create). - Fictional AI Model: The model
gpt-5-nanodoes not exist. A valid, existing model must be used. - Critical Blocking Code: The Text-to-Speech (TTS) implementation uses
pyttsx3'sengine.runAndWait(), which is a synchronous, blocking call. In an asynchronous application like a Discord bot, this would freeze the entire bot, making it unresponsive until the TTS generation is complete. - Audio Data Handling: The code assumes mono audio at 48kHz. Discord voice channels provide stereo audio, which must be converted to mono for compatibility with most speech-to-text models, including Whisper.
- Command Style: The text uses prefix commands (
!command). The modern and recommended standard for Discord bots is slash commands, which provide a better user experience and integration.
The original text has been corrected, and its logic has been expanded upon to create the detailed, one-shot prompt below. This prompt addresses the identified issues by incorporating asynchronous database operations, correct API syntax, proper handling of blocking code, and modern Discord bot development practices.
Prompt for Language Model Code Assistant
Project Goal: Develop a fully featured, optimal, and secure Discord Voice Chat Quote Bot using Python.
Core Features:
- Voice Channel Integration: The bot must be able to join and leave the user's current voice channel.
- Live Audio Transcription: It will continuously listen to all participants in the voice channel, capturing each user's audio separately. This audio will be transcribed into text in near real-time using the OpenAI Whisper API.
- AI-Powered Quote Detection: Transcribed text will be analyzed by an AI model (e.g., GPT-4o-mini) to identify and extract "outlandish," funny, or memorable quotes.
- Live Commentary: When a quote is detected, the bot will generate witty, humorous commentary about the quote and post it as a styled embed in the text channel where the interaction was initiated.
- Persistent Quote Storage: All detected quotes, along with the user who said them and a timestamp, will be saved to a persistent SQLite database.
- Automated Daily Summaries: The bot will automatically generate a "Daily Quote Compilation" every 24 hours, summarizing the day's best quotes using an AI model and posting it to a designated channel.
- (Optional) Text-to-Speech Commentary: As an advanced feature, the bot can convert its generated commentary into speech and play it back in the voice channel.
- User Commands: All interactions will be handled through modern, user-friendly slash commands (e.g.,
/listen,/stop,/random_quote).
Technical Specifications and Implementation Details
1. Project Structure
Organize the code into a modular structure using discord.py Cogs for maintainability.
/project_root
|-- main.py # Main bot runner, loads cogs
|-- .env # For storing secret keys
|-- requirements.txt # Project dependencies
|-- database.py # Handles all database interactions
|-- /cogs
| |-- voice_cog.py # Manages voice connection, listening, and audio processing
| |-- quotes_cog.py # Manages quote detection, commentary, and user commands
| |-- tasks_cog.py # Manages the scheduled daily summary
2. Environment and Dependencies
Create a .env file for secure key storage:
DISCORD_TOKEN=your_discord_bot_token
OPENAI_API_KEY=your_openai_api_key
Create a requirements.txt file with the following libraries. Use aiosqlite for non-blocking database operations suitable for an async environment.
discord.py>=2.3.0
discord-ext-voice-recv
openai>=1.0.0
python-dotenv
aiosqlite
pyttsx3
ffmpeg-python
3. Database Schema (database.py)
Use aiosqlite to create and interact with a quotes.db file. The database manager should be a class that handles all connections and queries asynchronously.
CREATE TABLE IF NOT EXISTS quotes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER NOT NULL,
username TEXT NOT NULL,
quote TEXT NOT NULL,
timestamp TEXT NOT NULL,
guild_id INTEGER NOT NULL
);
4. Voice Cog (cogs/voice_cog.py)
This cog will handle the core audio processing pipeline.
- Slash Commands:
/listen: Makes the bot join the user's voice channel, connects avoice_recv.VoiceRecvClient, and starts listening with a customAudioSink./stop: Stops listening, disconnects the bot from the voice channel, and performs cleanup.
AudioSinkClass:- This class will inherit from
voice_recv.AudioSink. - It should buffer incoming audio data separately for each user (
dict[user_id, audio_buffer]). - To avoid infinite buffering, implement a system to process audio chunks. A good strategy is to create a new processing task for a user after they have been silent for a short duration (e.g., 1.5 seconds).
- This class will inherit from
- Audio Processing Logic:
- When a user's audio chunk is ready for processing, run the following in an
asyncio.Taskto avoid blocking. - The raw PCM data from Discord is 16-bit 48kHz stereo. Convert it to mono as required by Whisper.
- Save the mono audio data to a temporary in-memory buffer (
io.BytesIO) or a temporary file. - Call the OpenAI Whisper API using the
client.audio.transcriptions.createmethod with the audio file data. - Pass the resulting transcript to the
quotes_cogfor analysis.
- When a user's audio chunk is ready for processing, run the following in an
5. Quotes Cog (cogs/quotes_cog.py)
This cog handles the AI logic and user-facing quote commands.
- Quote Detection:
- Create a method that receives a transcript and a user object.
- Use the
openaiclient (client.chat.completions.create) with a model likegpt-4o-mini. - Prompt Engineering (Quote Detection):
You are an AI that detects memorable quotes from a voice chat transcript. Analyze the following text: "{transcript}" If it contains a genuinely funny, outlandish, or witty statement worth saving, respond ONLY with: QUOTE: [The exact quote] If it does not, respond ONLY with: NO_QUOTE - Parse the model's response. If a quote is found, proceed to save it and generate commentary.
- Live Commentary Generation:
- If a quote is detected, call the chat completions API again with a different prompt.
- Prompt Engineering (Commentary):
A user named {username} just said: "{quote}" Generate a short, witty, and humorous commentary about this quote. The tone should be like a live sports commentator who is amused by the situation. Keep it under 150 characters. - Format the commentary and the original quote into a
discord.Embedand send it to the channel.
- Database Interaction:
- When a quote is confirmed, call the
database.pymanager to save the quote, user ID, username, timestamp, and server ID.
- When a quote is confirmed, call the
- Slash Commands:
/random_quote: Fetches and displays a random quote from the database for the current server./user_quotes [user]: Fetches and displays all saved quotes from a specific user.
6. Tasks Cog (cogs/tasks_cog.py)
This cog manages scheduled background tasks.
- Daily Summary:
- Use the
discord.ext.tasksloop, configured to run once every 24 hours at a set time (e.g., midnight UTC). - The task will fetch all quotes from the last 24 hours from the database.
- If quotes exist, format them into a single string and send them to a GPT model (
gpt-4o-miniis suitable). - Prompt Engineering (Daily Summary):
You are a Discord bot that creates a fun "end of day" summary. Here are the memorable quotes from today's voice chats: {quote_list} Generate an entertaining summary of the day. Highlight the funniest moments, give out silly "awards" (e.g., "Quote of the Day"), and format it for a Discord embed using markdown and emojis. - Post the resulting summary in a designated channel.
- Use the
7. Optional TTS Feature (voice_cog.py)
- Addressing Blocking Code: The
pyttsx3library is synchronous. To prevent it from freezing the bot, its blocking operations (engine.save_to_file,engine.runAndWait) must be run in a separate thread usingasyncio.to_thread. - Implementation:
- After generating text commentary, create a function
speak_commentary(text). - Inside this function, use
await asyncio.to_thread(blocking_tts_function, text)to generate the MP3 file without blocking the event loop. - Once the file is ready, play it in the voice channel using
discord.FFmpegPCMAudio. - Ensure the temporary audio file is deleted after playback is complete.
- After generating text commentary, create a function
8. Permissions and Security
- Bot Permissions: When generating the bot invite link, ensure the following permissions are requested:
Connect,Speak,Send Messages,Embed Links, andRead Message History. - Intents: Enable the Privileged Gateway Intents (especially Message Content) in the Discord Developer Portal.
- Error Handling: Implement comprehensive
try...exceptblocks for all API calls, file operations, and voice connections to prevent the bot from crashing. Log errors appropriately.