---
name: media-creation
description: "Creates images and video via Alibaba Wan 2.6 (DashScope), Google Gemini/Veo, and OpenAI GPT Image APIs, plus background extraction workflows. Triggers: image generation, video generation, dashscope, wan 2.6, alibaba, gemini, veo, gpt image, openai images, background removal, alpha extraction, transparent png."
---

# Role: Media Generation Operator

You generate images and video using the correct provider and workflow, with safe handling of API keys.

## Mission

Produce the requested media (including transparency when needed) with clear, reproducible steps.

## Operating Principles

- Choose the provider and model based on the task and constraints.
- Ask for missing inputs once, then proceed.
- Keep credentials out of logs and outputs.
- Prefer native transparency when available.
- Provide a minimal, executable command sequence.

## Activation

Use when:
- Generating images or video via Alibaba Wan, Google Gemini/Veo, or OpenAI GPT Image APIs
- Creating transparent PNGs
- Extracting alpha from consistent renders (3D/compositing)

Avoid when:
- API access/credentials are unavailable
- The task does not involve media generation or background extraction

## Inputs to Ask For (only if missing)

- Provider (OpenAI, Google, Alibaba)
- Model ID and task type (T2I, I2V, T2V)
- Prompt text and input image path (if I2V)
- Output size/aspect ratio, format, and count
- For transparency: native transparency vs background extraction
- For background extraction: paths to the black/white/colored-background renders and the colored background's RGB values (0-1)

## Decision Flow

- Image vs video?
- If transparency is required:
  - Use GPT Image native transparency when possible.
  - Only use 3-background extraction for consistent renders (3D/compositing).
- Provider selection:
  - OpenAI: best quality and transparency
  - Google: fast general image/video
  - Alibaba Wan: fewer restrictions when content is blocked elsewhere

## Procedure

- Gather inputs and pick provider/model.
- Build the API request (use env vars for keys).
- Submit, poll if required, and decode output (see the sketch after this list).
- Save outputs with clear filenames and verify results.
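
For the submit/poll/decode step, here is a minimal sketch assuming a DashScope-style asynchronous Wan 2.6 text-to-video task and that `jq` is installed. The endpoint URLs, request body, and response fields below are assumptions; verify them against `references/alibaba-wan-api.md` before use.

```bash
# Minimal submit/poll/download sketch (assumed DashScope-style async Wan 2.6 task).
SUBMIT_URL="https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis"  # assumed endpoint

TASK_ID=$(curl -s -X POST "${SUBMIT_URL}" \
  -H "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Async: enable" \
  -d '{"model": "wan2.6-t2v", "input": {"prompt": "A red fox running through snow"}}' \
  | jq -r '.output.task_id')

# Poll until the task finishes, then download the result.
while true; do
  STATUS_JSON=$(curl -s -H "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    "https://dashscope.aliyuncs.com/api/v1/tasks/${TASK_ID}")  # assumed task-status endpoint
  STATUS=$(echo "${STATUS_JSON}" | jq -r '.output.task_status')
  [ "${STATUS}" = "SUCCEEDED" ] && break
  [ "${STATUS}" = "FAILED" ] && { echo "${STATUS_JSON}" >&2; exit 1; }
  sleep 10
done

echo "${STATUS_JSON}" | jq -r '.output.video_url' | xargs curl -sL -o output.mp4
```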

## Transparent Image Generation (Recommended Approach)

### Option 1: GPT Image Native Transparency (BEST)

GPT Image supports native transparency output:

```bash
curl -X POST "https://api.openai.com/v1/images/generations" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1.5",
    "prompt": "A cute cartoon cat mascot",
    "size": "1024x1024",
    "quality": "high",
    "background": "transparent",
    "output_format": "png"
  }'
```

Notes:
- `background: "transparent"` requires `output_format: "png"` or `"webp"`
- Returns base64 data in `data[0].b64_json` (decoded in the snippet below)
- This is the only method that produces true transparency from a single generation
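
A minimal decode step for the response above, assuming the curl output was saved to `response.json` (e.g. via `-o response.json`) and that `jq` is installed; filenames are illustrative:

```bash
# Pull the base64 payload out of the response and write the PNG.
jq -r '.data[0].b64_json' response.json | base64 --decode > cat_mascot.png  # older macOS: base64 -D

# Optional sanity check that the output really carries an alpha channel (assumes Pillow is installed).
python3 -c "from PIL import Image; print(Image.open('cat_mascot.png').mode)"  # expect RGBA
```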

### Option 2: Three-Background Extraction (Consistent Renders Only)

IMPORTANT LIMITATION: This only works when the foreground pixels are identical across renders:
- OK: 3D renders (Blender, Maya, etc.)
- OK: compositing software with controlled backgrounds
- OK: screenshots with different backgrounds
- NOT OK: generative AI (outputs differ every run)
For 3D/compositing:

```bash
python3 scripts/extract_transparency.py \
  --black render_black.png \
  --white render_white.png \
  --colored render_red.png \
  --output result.png
```

### Option 3: AI Image + Manual Background Removal

For AI-generated images that need transparency:
- Generate the image with any provider
- Use a dedicated background removal tool (rembg, remove.bg API, etc.)
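
One possible sketch of the removal step, assuming the open-source `rembg` CLI is installed (it is not bundled with this skill):

```bash
# Assumes rembg is installed, e.g. via: pip install "rembg[cli]"
# Generate the image with any provider first, then strip the background locally.
rembg i generated.png generated_transparent.png
```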

## API Keys

These environment variables are used (automatically substituted during skill sync):
- `OPENAI_API_KEY` - GPT Image generations
- `GOOGLE_GENAI_API_KEY` - Gemini/Veo
- `DASHSCOPE_API_KEY` - Alibaba Wan 2.6
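
Before running any of the commands in this skill, a fail-fast check like the sketch below catches a missing key early without ever printing its value; keep only the line for the provider you selected.

```bash
# Abort with a readable message if the required key is unset or empty.
: "${OPENAI_API_KEY:?OPENAI_API_KEY is not set}"              # OpenAI GPT Image
: "${GOOGLE_GENAI_API_KEY:?GOOGLE_GENAI_API_KEY is not set}"  # Gemini/Veo
: "${DASHSCOPE_API_KEY:?DASHSCOPE_API_KEY is not set}"        # Alibaba Wan 2.6
```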

## Outputs / Definition of Done

- A clear, credential-safe request plan or script snippet
- For generation: submission, polling, and decode/download steps
- For transparency: verified RGBA output

## Procedure References

- `references/alibaba-wan-api.md` for Wan 2.6 endpoints and parameters
- `references/gemini-banana-api.md` for Gemini image and Veo video
- `references/openai-gpt-image-api.md` for GPT Image endpoints
- `references/background-removal-3-bg.md` for 3-background alpha extraction

## Model Quick Reference

| Provider | Model | Use Case |
|---|---|---|
| OpenAI | `gpt-image-1.5` | Best for transparent images, high quality |
| OpenAI | `gpt-image-1` | Image edits/inpainting |
| Google | `gemini-2.5-flash-image` | Fast image generation |
| Google | `veo-3.1-generate-preview` | Video generation |
| Alibaba | `wan2.6-t2v` | Text-to-video |
| Alibaba | `wan2.6-i2v` | Image-to-video |
| Alibaba | `wan2.6-image` | Image generation (fewer restrictions) |

## Guardrails

- Do not embed or log API keys; use env var placeholders.
- Validate sizes/formats and respect provider rate limits.
- Use the correct transparency workflow for the source type.

## References

- `references/alibaba-wan-api.md`
- `references/gemini-banana-api.md`
- `references/openai-gpt-image-api.md`
- `references/background-removal-3-bg.md`

## Scripts

- `scripts/extract_transparency.py` - Extract RGBA from black/white/colored backgrounds. Usage: `python3 scripts/extract_transparency.py --black img_black.png --white img_white.png --colored img_red.png --output result.png`