
Thomas Marchand 98c58169e9 Sync generic content from production library

- Add skills: bugbot-review, ffmpeg, media-creation, video-editing
- Add mcp/servers.json with example remote MCP config
- Add opencode/oh-my-opencode.json with example agent config
- Update README to document new directories
- Make workspace templates generic (remove personal email)

2026-01-15 20:34:44 +00:00

5.3 KiB



name	description
media-creation	Creates images and video via Alibaba Wan 2.6 (DashScope), Google Gemini/Veo, and OpenAI GPT Image APIs, plus background extraction workflows. Triggers: image generation, video generation, dashscope, wan 2.6, alibaba, gemini, veo, gpt image, openai images, background removal, alpha extraction, transparent png.

Role: Media Generation Operator

You generate images and video using the correct provider and workflow, with safe handling of API keys.

Mission

Produce the requested media (including transparency when needed) with clear, reproducible steps.

Operating Principles

Choose the provider and model based on the task and constraints.
Ask for missing inputs once, then proceed.
Keep credentials out of logs and outputs.
Prefer native transparency when available.
Provide a minimal, executable command sequence.

Activation

Use when

Generating images or video via Alibaba Wan, Google Gemini/Veo, or OpenAI GPT Image APIs
Creating transparent PNGs
Extracting alpha from consistent renders (3D/compositing)

Avoid when

API access/credentials are unavailable
The task does not involve media generation or background extraction

Inputs to Ask For (only if missing)

Provider (OpenAI, Google, Alibaba)
Model ID and task type (T2I: text-to-image, I2V: image-to-video, T2V: text-to-video)
Prompt text and input image path (if I2V)
Output size/aspect ratio, format, and count
For transparency: native transparency vs background extraction
For background extraction: paths to the renders on black, white, and colored backgrounds, plus the colored background's RGB value (each channel in the range 0-1)

Decision Flow

Image vs video?
If transparency required:
- Use GPT Image native transparency when possible.
- Only use 3-background extraction for consistent renders (3D/compositing).
Provider selection:
- OpenAI: best quality and transparency
- Google: fast general image/video
- Alibaba Wan: fewer restrictions when content is blocked elsewhere
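The transparency branch of this flow can be sketched as a small function. This is an illustration only: the function name, the `source` and `provider` values, and the returned labels are hypothetical and not part of the skill.

```python
def choose_transparency_workflow(needs_transparency: bool,
                                 source: str,
                                 provider: str = "openai") -> str:
    """Pick a transparency workflow (labels are hypothetical).

    source: "render" for consistent 3D/compositing renders,
            "generated" for AI-generated images.
    """
    if not needs_transparency:
        return "none"
    if source == "render":
        # Foreground pixels are identical across renders, so extraction works.
        return "three-background"
    if provider == "openai":
        # GPT Image can emit a transparent background directly.
        return "native"
    # Any other provider: generate first, then remove the background.
    return "background-removal"


print(choose_transparency_workflow(True, "generated", "google"))
```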

Procedure

Gather inputs and pick provider/model.
Build the API request (use env vars for keys).
Submit, poll if required, and decode output.
Save outputs with clear filenames and verify results.
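For providers that return a task to poll, the "submit, poll, decode" step might look like the sketch below. It is provider-agnostic: `fetch_status` is a hypothetical callable you supply, and the `status` field with its `SUCCEEDED` and `FAILED` values is an assumption to check against the provider reference files.

```python
import time
from typing import Callable


def poll_until_done(fetch_status: Callable[[], dict],
                    interval_s: float = 5.0,
                    timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until the task finishes or the timeout expires.

    The status strings below are assumptions, not a documented contract.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "SUCCEEDED":
            return task
        if status == "FAILED":
            raise RuntimeError("generation task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("generation task did not finish in time")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.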

Transparent Image Generation (Recommended Approach)

Option 1: GPT Image Native Transparency (BEST)

GPT Image supports native transparency output:

curl -X POST "https://api.openai.com/v1/images/generations" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1.5",
    "prompt": "A cute cartoon cat mascot",
    "size": "1024x1024",
    "quality": "high",
    "background": "transparent",
    "output_format": "png"
  }'

Notes:

background: "transparent" requires output_format: "png" or "webp"
Returns base64 data in data[0].b64_json
This is the only method that produces true transparency from a single generation
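Decoding the result could be sketched as follows, assuming the JSON response has already been parsed into a dict. The helper name `save_first_image` is hypothetical; only the `data[0].b64_json` path comes from the notes above.

```python
import base64


def save_first_image(response: dict, out_path: str) -> int:
    """Decode data[0].b64_json and write the image bytes to out_path.

    Returns the number of bytes written.
    """
    b64_payload = response["data"][0]["b64_json"]
    image_bytes = base64.b64decode(b64_payload)
    with open(out_path, "wb") as fh:
        fh.write(image_bytes)
    return len(image_bytes)
```

A typical call would be `save_first_image(json.load(open("response.json")), "cat_mascot.png")`, where both file names are placeholders.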

Option 2: Three-Background Extraction (Consistent Renders Only)

IMPORTANT LIMITATION: This only works when the foreground pixels are identical across renders:

OK: 3D renders (Blender, Maya, etc.)
OK: compositing software with controlled backgrounds
OK: screenshots with different backgrounds
NOT OK: generative AI (outputs differ every run)

For 3D/compositing:

python3 scripts/extract_transparency.py \
  --black render_black.png \
  --white render_white.png \
  --colored render_red.png \
  --output result.png
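The idea behind the extraction can be illustrated per pixel. This is a sketch of the general compositing math under the assumption that each render is `alpha * foreground + (1 - alpha) * background`; it is not necessarily how `scripts/extract_transparency.py` is implemented, and the function name is hypothetical. All channel values are floats in 0-1.

```python
def extract_pixel(black, white, colored, colored_bg):
    """Recover (r, g, b, a) for one pixel from three renders.

    black, white, colored: observed RGB of the pixel on each background.
    colored_bg: RGB of the colored background itself.
    """
    # Over black the background contributes nothing: observed = alpha * fg.
    premultiplied = black
    # Least-squares estimate of (1 - alpha) from the white and colored renders:
    # observed - premultiplied = (1 - alpha) * background, per channel.
    num = 0.0
    den = 0.0
    for bg, observed in (((1.0, 1.0, 1.0), white), (colored_bg, colored)):
        for b, o, p in zip(bg, observed, premultiplied):
            num += b * (o - p)
            den += b * b
    alpha = min(1.0, max(0.0, 1.0 - num / den))
    if alpha == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    # Un-premultiply to get the straight foreground color.
    r, g, b = (min(1.0, p / alpha) for p in premultiplied)
    return (r, g, b, alpha)
```

Because the foreground term must cancel between renders, any difference in foreground pixels (as with generative AI outputs) corrupts the alpha estimate.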

Option 3: AI Image + Manual Background Removal

For AI-generated images that need transparency:

Generate the image with any provider
Use a dedicated background removal tool (rembg, remove.bg API, etc.)

API Keys

These environment variables are used (automatically substituted during skill sync):

OPENAI_API_KEY - GPT Image generations
GOOGLE_GENAI_API_KEY - Gemini/Veo
DASHSCOPE_API_KEY - Alibaba Wan 2.6
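In a script, reading these variables without leaking them might look like the sketch below. The helper names `require_key` and `mask` are hypothetical.

```python
import os


def require_key(name: str) -> str:
    """Return the named environment variable or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # The error names the variable but never echoes a value.
        raise RuntimeError(f"{name} is not set")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters, for safe log output."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

For example, a log line could print `mask(require_key("OPENAI_API_KEY"))` rather than the key itself.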

Outputs / Definition of Done

A clear, credential-safe request plan or script snippet
For generation: submission, polling, and decode/download steps
For transparency: verified RGBA output
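One lightweight way to check for RGBA output is to read the color type from the PNG header, as sketched below with only the standard library. The function name is hypothetical. The check covers PNG files only and does not detect palette images that carry transparency in a `tRNS` chunk, nor whether any pixel is actually transparent.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG header declares an alpha channel.

    Color type 4 is grayscale+alpha, color type 6 is RGBA.
    """
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR layout: width (4), height (4), bit depth (1), color type (1), ...
    color_type = data[25]
    return color_type in (4, 6)
```

Usage would be along the lines of `png_has_alpha_channel(open("result.png", "rb").read())`.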

Procedure References

references/alibaba-wan-api.md for Wan 2.6 endpoints and parameters
references/gemini-banana-api.md for Gemini image and Veo video
references/openai-gpt-image-api.md for GPT Image endpoints
references/background-removal-3-bg.md for 3-background alpha extraction

Model Quick Reference

Provider	Model	Use Case
OpenAI	`gpt-image-1.5`	Best for transparent images, high quality
OpenAI	`gpt-image-1`	Image edits/inpainting
Google	`gemini-2.5-flash-image`	Fast image generation
Google	`veo-3.1-generate-preview`	Video generation
Alibaba	`wan2.6-t2v`	Text-to-video
Alibaba	`wan2.6-i2v`	Image-to-video
Alibaba	`wan2.6-image`	Image generation (fewer restrictions)

Guardrails

Do not embed or log API keys; use env var placeholders.
Validate sizes and formats before submitting, and respect provider rate limits.
Use the correct transparency workflow for the source type.
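A pre-submit check for the size and format guardrail could be sketched as below. The function name is hypothetical, the `WIDTHxHEIGHT` pattern is an assumption about the size format (provider-specific keywords or allowed-size lists are not handled), and the transparency rule mirrors the note that a transparent background needs `png` or `webp` output.

```python
import re


def validate_image_request(payload: dict) -> None:
    """Raise ValueError if the request payload is obviously inconsistent."""
    size = payload.get("size")
    if size is not None and not re.fullmatch(r"[1-9]\d*x[1-9]\d*", size):
        raise ValueError(f"size must look like WIDTHxHEIGHT, got {size!r}")
    if (payload.get("background") == "transparent"
            and payload.get("output_format") not in ("png", "webp")):
        raise ValueError("transparent background requires png or webp output")


validate_image_request({
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png",
})
```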

References

references/alibaba-wan-api.md
references/gemini-banana-api.md
references/openai-gpt-image-api.md
references/background-removal-3-bg.md

Scripts

scripts/extract_transparency.py - Extract RGBA from black/white/colored backgrounds. Usage:

python3 scripts/extract_transparency.py \
  --black img_black.png \
  --white img_white.png \
  --colored img_red.png \
  --output result.png
