Skip to main content

Music Generation

The music_generate tool lets the agent create music or audio through the shared music-generation capability with configured providers such as Google, MiniMax, and workflow-configured ComfyUI. For shared provider-backed agent sessions, OpenClaw starts music generation as a background task, tracks it in the task ledger, then wakes the agent again when the track is ready so the agent can post the finished audio back into the original channel.
The built-in shared tool only appears when at least one music-generation provider is available. If you don’t see music_generate in your agent’s tools, configure agents.defaults.musicGenerationModel or set up a provider API key.

Quick start

Shared provider-backed generation

  1. Set an API key for at least one provider, for example GEMINI_API_KEY or MINIMAX_API_KEY.
  2. Optionally set your preferred model:
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
  1. Ask the agent: “Generate an upbeat synthpop track about a night drive through a neon city.”
The agent calls music_generate automatically. No tool allow-listing needed. For direct synchronous contexts without a session-backed agent run, the built-in tool still falls back to inline generation and returns the final media path in the tool result. Example prompts:
Generate a cinematic piano track with soft strings and no vocals.
Generate an energetic chiptune loop about launching a rocket at sunrise.

Workflow-driven Comfy generation

The bundled comfy plugin plugs into the shared music_generate tool through the music-generation provider registry.
  1. Configure models.providers.comfy.music with a workflow JSON and prompt/output nodes.
  2. If you use Comfy Cloud, set COMFY_API_KEY or COMFY_CLOUD_API_KEY.
  3. Ask the agent for music or call the tool directly.
Example:
/tool music_generate prompt="Warm ambient synth loop with soft tape texture"

Shared bundled provider support

ProviderDefault modelReference inputsSupported controlsAPI key
ComfyUIworkflowUp to 1 imageWorkflow-defined music or audioCOMFY_API_KEY, COMFY_CLOUD_API_KEY
Googlelyria-3-clip-previewUp to 10 imageslyrics, instrumental, formatGEMINI_API_KEY, GOOGLE_API_KEY
MiniMaxmusic-2.5+Nonelyrics, instrumental, durationSeconds, format=mp3MINIMAX_API_KEY
Use action: "list" to inspect available shared providers and models at runtime:
/tool music_generate action=list
Use action: "status" to inspect the active session-backed music task:
/tool music_generate action=status
Direct generation example:
/tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true

Built-in tool parameters

ParameterTypeDescription
promptstringMusic generation prompt (required for action: "generate")
actionstring"generate" (default), "status" for the current session task, or "list" to inspect providers
modelstringProvider/model override, e.g. google/lyria-3-pro-preview or comfy/workflow
lyricsstringOptional lyrics when the provider supports explicit lyric input
instrumentalbooleanRequest instrumental-only output when the provider supports it
imagestringSingle reference image path or URL
imagesstring[]Multiple reference images (up to 10)
durationSecondsnumberTarget duration in seconds when the provider supports duration hints
formatstringOutput format hint (mp3 or wav) when the provider supports it
filenamestringOutput filename hint
Not all providers support all parameters. OpenClaw still validates hard limits such as input counts before submission, but unsupported optional hints are ignored with a warning when the selected provider or model cannot honor them.

Async behavior for the shared provider-backed path

  • Session-backed agent runs: music_generate creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
  • Duplicate prevention: while that background task is still queued or running, later music_generate calls in the same session return task status instead of starting another generation.
  • Status lookup: use action: "status" to inspect the active session-backed music task without starting a new one.
  • Task tracking: use openclaw tasks list or openclaw tasks show <taskId> to inspect queued, running, and terminal status for the generation.
  • Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
  • Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight so the model does not blindly call music_generate again.
  • No-session fallback: direct/local contexts without a real agent session still run inline and return the final audio result in the same turn.

Configuration

Model selection

{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["minimax/music-2.5+"],
      },
    },
  },
}

Provider selection order

When generating music, OpenClaw tries providers in this order:
  1. model parameter from the tool call, if the agent specifies one
  2. musicGenerationModel.primary from config
  3. musicGenerationModel.fallbacks in order
  4. Auto-detection using auth-backed provider defaults only:
    • current default provider first
    • remaining registered music-generation providers in provider-id order
If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.

Provider notes

  • Google uses Lyria 3 batch generation. The current bundled flow supports prompt, optional lyrics text, and optional reference images.
  • MiniMax uses the batch music_generation endpoint. The current bundled flow supports prompt, optional lyrics, instrumental mode, duration steering, and mp3 output.
  • ComfyUI support is workflow-driven and depends on the configured graph plus node mapping for prompt/output fields.

Choosing the right path

  • Use the shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.
  • Use a plugin path such as ComfyUI when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.
  • If you are debugging ComfyUI-specific behavior, see ComfyUI. If you are debugging shared provider behavior, start with Google (Gemini) or MiniMax.

Live tests

Opt-in live coverage for the shared bundled providers:
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
Opt-in live coverage for the bundled ComfyUI music path:
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
The Comfy live file also covers comfy image and video workflows when those sections are configured.