This commit is contained in:
2025-08-26 06:35:41 -04:00
commit 978d3794e8
2 changed files with 1970 additions and 0 deletions

File diff suppressed because it is too large Load Diff

168
reqs.md Normal file
View File

@@ -0,0 +1,168 @@
An evaluation of the provided text has been completed. The core concepts and architecture are sound, but several technical inaccuracies and potential issues were identified that would prevent the bot from working optimally and securely.
### **Fact-Checking and Validation Summary:**
* **Correct Libraries:** The use of `discord.py`, `discord-ext-voice-recv`, and `openai` is appropriate for this project's goals.
* **Incorrect Installation Commands:** `pip install sqlite3` is unnecessary, as `sqlite3` is a standard library built into Python.
* **Outdated API Usage:** The code uses a deprecated syntax for the OpenAI API (`openai.ChatCompletion.create`). The current `openai` library (v1.0.0 and newer) uses a client-based approach (`client.chat.completions.create`).
* **Fictional AI Model:** The model `gpt-5-nano` does not exist. A valid, existing model must be used.
* **Critical Blocking Code:** The Text-to-Speech (TTS) implementation uses `pyttsx3`'s `engine.runAndWait()`, which is a synchronous, blocking call. In an asynchronous application like a Discord bot, this would freeze the entire bot, making it unresponsive until the TTS generation is complete.
* **Audio Data Handling:** The code assumes mono audio at 48kHz. Discord voice channels provide stereo audio, which must be converted to mono for compatibility with most speech-to-text models, including Whisper.
* **Command Style:** The text uses prefix commands (`!command`). The modern and recommended standard for Discord bots is slash commands, which provide a better user experience and integration.
The original text has been corrected, and its logic has been expanded upon to create the detailed, one-shot prompt below. This prompt addresses the identified issues by incorporating asynchronous database operations, correct API syntax, proper handling of blocking code, and modern Discord bot development practices.
***
### **Prompt for Language Model Code Assistant**
**Project Goal:** Develop a fully featured, optimal, and secure Discord Voice Chat Quote Bot using Python.
**Core Features:**
1. **Voice Channel Integration:** The bot must be able to join and leave the user's current voice channel.
2. **Live Audio Transcription:** It will continuously listen to all participants in the voice channel, capturing each user's audio separately. This audio will be transcribed into text in near real-time using the OpenAI Whisper API.
3. **AI-Powered Quote Detection:** Transcribed text will be analyzed by an AI model (e.g., GPT-4o-mini) to identify and extract "outlandish," funny, or memorable quotes.
4. **Live Commentary:** When a quote is detected, the bot will generate witty, humorous commentary about the quote and post it as a styled embed in the text channel where the interaction was initiated.
5. **Persistent Quote Storage:** All detected quotes, along with the user who said them and a timestamp, will be saved to a persistent SQLite database.
6. **Automated Daily Summaries:** The bot will automatically generate a "Daily Quote Compilation" every 24 hours, summarizing the day's best quotes using an AI model and posting it to a designated channel.
7. **(Optional) Text-to-Speech Commentary:** As an advanced feature, the bot can convert its generated commentary into speech and play it back in the voice channel.
8. **User Commands:** All interactions will be handled through modern, user-friendly slash commands (e.g., `/listen`, `/stop`, `/random_quote`).
---
### **Technical Specifications and Implementation Details**
#### **1. Project Structure**
Organize the code into a modular structure using `discord.py` Cogs for maintainability.
```
/project_root
|-- main.py # Main bot runner, loads cogs
|-- .env # For storing secret keys
|-- requirements.txt # Project dependencies
|-- database.py # Handles all database interactions
|-- /cogs
| |-- voice_cog.py # Manages voice connection, listening, and audio processing
| |-- quotes_cog.py # Manages quote detection, commentary, and user commands
| |-- tasks_cog.py # Manages the scheduled daily summary
```
#### **2. Environment and Dependencies**
Create a `.env` file for secure key storage:
```ini
DISCORD_TOKEN=your_discord_bot_token
OPENAI_API_KEY=your_openai_api_key
```
Create a `requirements.txt` file with the following libraries. Use `aiosqlite` for non-blocking database operations suitable for an async environment.
```
discord.py>=2.3.0
discord-ext-voice-recv
openai>=1.0.0
python-dotenv
aiosqlite
pyttsx3
ffmpeg-python
```
#### **3. Database Schema (`database.py`)**
Use `aiosqlite` to create and interact with a `quotes.db` file. The database manager should be a class that handles all connections and queries asynchronously.
```sql
CREATE TABLE IF NOT EXISTS quotes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER NOT NULL,
username TEXT NOT NULL,
quote TEXT NOT NULL,
timestamp TEXT NOT NULL,
guild_id INTEGER NOT NULL
);
```
#### **4. Voice Cog (`cogs/voice_cog.py`)**
This cog will handle the core audio processing pipeline.
* **Slash Commands:**
* `/listen`: Makes the bot join the user's voice channel, connects a `voice_recv.VoiceRecvClient`, and starts listening with a custom `AudioSink`.
* `/stop`: Stops listening, disconnects the bot from the voice channel, and performs cleanup.
* **`AudioSink` Class:**
* This class will inherit from `voice_recv.AudioSink`.
* It should buffer incoming audio data separately for each user (`dict[user_id, audio_buffer]`).
* To avoid infinite buffering, implement a system to process audio chunks. A good strategy is to create a new processing task for a user after they have been silent for a short duration (e.g., 1.5 seconds).
* **Audio Processing Logic:**
1. When a user's audio chunk is ready for processing, run the following in an `asyncio.Task` to avoid blocking.
2. The raw PCM data from Discord is 16-bit 48kHz **stereo**. Convert it to **mono** as required by Whisper.
3. Save the mono audio data to a temporary in-memory buffer (`io.BytesIO`) or a temporary file.
4. Call the OpenAI Whisper API using the `client.audio.transcriptions.create` method with the audio file data.
5. Pass the resulting transcript to the `quotes_cog` for analysis.
#### **5. Quotes Cog (`cogs/quotes_cog.py`)**
This cog handles the AI logic and user-facing quote commands.
* **Quote Detection:**
* Create a method that receives a transcript and a user object.
* Use the `openai` client (`client.chat.completions.create`) with a model like `gpt-4o-mini`.
* **Prompt Engineering (Quote Detection):**
```
You are an AI that detects memorable quotes from a voice chat transcript. Analyze the following text: "{transcript}"
If it contains a genuinely funny, outlandish, or witty statement worth saving, respond ONLY with:
QUOTE: [The exact quote]
If it does not, respond ONLY with:
NO_QUOTE
```
* Parse the model's response. If a quote is found, proceed to save it and generate commentary.
* **Live Commentary Generation:**
* If a quote is detected, call the chat completions API again with a different prompt.
* **Prompt Engineering (Commentary):**
```
A user named {username} just said: "{quote}"
Generate a short, witty, and humorous commentary about this quote. The tone should be like a live sports commentator who is amused by the situation. Keep it under 150 characters.
```
* Format the commentary and the original quote into a `discord.Embed` and send it to the channel.
* **Database Interaction:**
* When a quote is confirmed, call the `database.py` manager to save the quote, user ID, username, timestamp, and server ID.
* **Slash Commands:**
* `/random_quote`: Fetches and displays a random quote from the database for the current server.
* `/user_quotes [user]`: Fetches and displays all saved quotes from a specific user.
#### **6. Tasks Cog (`cogs/tasks_cog.py`)**
This cog manages scheduled background tasks.
* **Daily Summary:**
* Use the `discord.ext.tasks` loop, configured to run once every 24 hours at a set time (e.g., midnight UTC).
* The task will fetch all quotes from the last 24 hours from the database.
* If quotes exist, format them into a single string and send them to a GPT model (`gpt-4o-mini` is suitable).
* **Prompt Engineering (Daily Summary):**
```
You are a Discord bot that creates a fun "end of day" summary. Here are the memorable quotes from today's voice chats:
{quote_list}
Generate an entertaining summary of the day. Highlight the funniest moments, give out silly "awards" (e.g., "Quote of the Day"), and format it for a Discord embed using markdown and emojis.
```
* Post the resulting summary in a designated channel.
#### **7. Optional TTS Feature (`voice_cog.py`)**
* **Addressing Blocking Code:** The `pyttsx3` library is synchronous. To prevent it from freezing the bot, its blocking operations (`engine.save_to_file`, `engine.runAndWait`) **must** be run in a separate thread using `asyncio.to_thread`.
* **Implementation:**
1. After generating text commentary, create a function `speak_commentary(text)`.
2. Inside this function, use `await asyncio.to_thread(blocking_tts_function, text)` to generate the MP3 file without blocking the event loop.
3. Once the file is ready, play it in the voice channel using `discord.FFmpegPCMAudio`.
4. Ensure the temporary audio file is deleted after playback is complete.
#### **8. Permissions and Security**
* **Bot Permissions:** When generating the bot invite link, ensure the following permissions are requested: `Connect`, `Speak`, `Send Messages`, `Embed Links`, and `Read Message History`.
* **Intents:** Enable the Privileged Gateway Intents (especially Message Content) in the Discord Developer Portal.
* **Error Handling:** Implement comprehensive `try...except` blocks for all API calls, file operations, and voice connections to prevent the bot from crashing. Log errors appropriately.