Web fetch - OpenClaw

web_fetch does a plain HTTP GET and extracts readable content (HTML to markdown or text). It does not execute JavaScript. For JS-heavy sites or login-protected pages, use the Web Browser instead.

Quick start

Enabled by default, no configuration needed:

await web_fetch({ url: "https://example.com/article" });

Tool parameters

url (string, required): URL to fetch. http(s) only.

extractMode ('markdown' | 'text', default: "markdown"): Output format after main-content extraction.

maxChars (number): Truncate output to this many characters. Clamped to tools.web.fetch.maxCharsCap.
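The maxChars clamp can be pictured with a small sketch. This is illustrative only, not OpenClaw's implementation: resolveMaxChars and FetchLimits are hypothetical names, and the fallback to the configured default when maxChars is omitted is an assumption based on the "default output chars" comment in the Config section.

```typescript
// Hypothetical helper illustrating the clamp; not OpenClaw source code.
interface FetchLimits {
  maxChars: number; // tools.web.fetch.maxChars (default output chars)
  maxCharsCap: number; // tools.web.fetch.maxCharsCap (hard cap)
}

function resolveMaxChars(requested: number | undefined, limits: FetchLimits): number {
  // Assumption: an omitted maxChars falls back to the configured default.
  const wanted = requested ?? limits.maxChars;
  // The cap always wins, whether the value came from the call or the config.
  return Math.min(wanted, limits.maxCharsCap);
}

const limits: FetchLimits = { maxChars: 20000, maxCharsCap: 20000 };
console.log(resolveMaxChars(5000, limits)); // below the cap: kept
console.log(resolveMaxChars(50000, limits)); // above the cap: clamped
console.log(resolveMaxChars(undefined, limits)); // omitted: default, then clamped
```

In practice this means a call that asks for more than maxCharsCap still receives at most maxCharsCap characters; raise the cap in config if longer output is needed.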

How it works

Fetch

Sends an HTTP GET with a Chrome-like User-Agent and Accept-Language header. Blocks private/internal hostnames and re-checks redirects.

Extract

Runs Readability (main-content extraction) on the HTML response.

Fallback (optional)

If Readability fails and a fetch provider is available, retries through that provider (for example Firecrawl’s bot-circumvention mode).

Cache

Results are cached for 15 minutes (configurable) to reduce repeated fetches of the same URL.
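The cache step behaves like a time-to-live (TTL) cache. The sketch below is a minimal in-memory version under stated assumptions: TtlCache is a hypothetical class, and keying by URL alone is a simplification (the real cache key may include other parameters such as the extract mode).

```typescript
// Hypothetical in-memory TTL cache keyed by URL; not OpenClaw's actual cache.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // `now` is injectable so expiry can be demonstrated without waiting.
  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (now >= hit.expiresAt) {
      this.entries.delete(key); // expired: drop it so the caller refetches
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const cache = new TtlCache<string>(15 * 60 * 1000); // cacheTtlMinutes: 15
cache.set("https://example.com/article", "# Article", 0);
console.log(cache.get("https://example.com/article", 14 * 60 * 1000)); // still cached
console.log(cache.get("https://example.com/article", 15 * 60 * 1000)); // expired
```

A repeated fetch of the same URL inside the TTL window returns the stored result without touching the network; after the window the entry is discarded and the next call fetches again.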

Progress updates

web_fetch emits a public progress line only when the fetch is still pending after five seconds:

Fetching page content...

Fast cache hits and quick network responses finish before the timer fires, so they never show a progress line. Canceling the call clears the timer. The progress line is channel UI state only and never contains fetched page content.
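The delayed progress line follows a common timer pattern: schedule the message, then cancel it if the work finishes first. The sketch below is a hypothetical illustration (startProgress, Schedule, and fakeClock are not OpenClaw names); it injects the scheduler so the behavior can be shown without waiting five real seconds.

```typescript
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Real scheduler backed by setTimeout.
const realSchedule: Schedule = (fn, ms) => {
  const timer = setTimeout(fn, ms);
  return () => clearTimeout(timer);
};

// Hypothetical helper: arms the progress line and returns a cancel function.
function startProgress(
  emit: (line: string) => void,
  schedule: Schedule = realSchedule,
  delayMs: number = 5000, // "still pending after five seconds"
): Cancel {
  return schedule(() => emit("Fetching page content..."), delayMs);
}

// Fake scheduler for the demo: fires the pending callback once time reaches it.
function fakeClock() {
  let pending: { fn: () => void; at: number } | undefined;
  const schedule: Schedule = (fn, ms) => {
    pending = { fn, at: ms };
    return () => {
      pending = undefined;
    };
  };
  const advance = (toMs: number): void => {
    if (pending && toMs >= pending.at) {
      const due = pending;
      pending = undefined;
      due.fn();
    }
  };
  return { schedule, advance };
}

const lines: string[] = [];

const fast = fakeClock();
const done = startProgress((line) => lines.push(line), fast.schedule);
fast.advance(1200); // response arrived after 1.2 s
done(); // completion or cancellation clears the timer
fast.advance(6000); // nothing fires

const slow = fakeClock();
startProgress((line) => lines.push(line), slow.schedule);
slow.advance(5000); // still pending at 5 s: the progress line is emitted

console.log(lines);
```

Only the slow call contributes a line; the fast call cancels its timer before it can fire.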

Config

{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        provider: "firecrawl", // optional; omit for auto-detect
        maxChars: 20000, // default output chars; capped by maxCharsCap
        maxCharsCap: 20000, // hard cap for maxChars param
        maxResponseBytes: 750000, // max download size before truncation (32000-10000000)
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        useTrustedEnvProxy: false, // let a trusted HTTP(S) env proxy resolve DNS
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
        ssrfPolicy: {
          allowRfc2544BenchmarkRange: true, // opt-in for trusted fake-IP proxies using 198.18.0.0/15
          allowIpv6UniqueLocalRange: true, // opt-in for trusted fake-IP proxies using fc00::/7
        },
      },
    },
  },
}

Firecrawl fallback

If Readability extraction fails, web_fetch can fall back to Firecrawl for bot-circumvention and better extraction:

{
  tools: {
    web: {
      fetch: {
        provider: "firecrawl", // optional; omit for auto-detect from available credentials
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          webFetch: {
            // apiKey: "fc-...", // optional; omit for keyless starter access
            baseUrl: "https://api.firecrawl.dev",
            onlyMainContent: true,
            maxAgeMs: 172800000, // cache duration (2 days)
            timeoutSeconds: 60,
          },
        },
      },
    },
  },
}

plugins.entries.firecrawl.config.webFetch.apiKey is optional and supports SecretRef objects. Legacy tools.web.fetch.firecrawl.* config auto-migrates to plugins.entries.firecrawl.config.webFetch via openclaw doctor --fix.

If you configure a Firecrawl API-key SecretRef and it is unresolved with no FIRECRAWL_API_KEY env fallback, gateway startup fails fast.

Firecrawl baseUrl overrides are locked down: hosted traffic uses https://api.firecrawl.dev; self-hosted overrides must target private or internal endpoints, and http:// is accepted only for those private targets.

Current runtime behavior:

tools.web.fetch.provider selects the fetch fallback provider explicitly.
If provider is omitted, OpenClaw auto-detects the first ready web-fetch provider from configured credentials. Non-sandboxed web_fetch can use installed plugins that declare contracts.webFetchProviders and register a matching provider at runtime. The official Firecrawl plugin provides this fallback today.
Sandboxed web_fetch calls allow bundled providers plus installed providers whose official npm or ClawHub provenance is verified. Today that permits the official Firecrawl plugin; third-party external fetch plugins stay excluded.
If Readability is disabled, web_fetch skips straight to the selected provider fallback. If no provider is available, it fails closed.
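The rules above amount to a small decision procedure. The following sketch restates them in code; planFetch, Provider, and Plan are hypothetical names, and treating an explicitly selected provider that is not ready as unavailable is an assumption, not documented behavior.

```typescript
// Hypothetical decision sketch for the rules above; not OpenClaw source code.
interface Provider {
  id: string;
  ready: boolean;
}

type Plan =
  | { kind: "readability"; fallback: string | undefined }
  | { kind: "provider"; id: string }
  | { kind: "fail"; reason: string };

function planFetch(opts: {
  readability: boolean;
  provider?: string;
  available: Provider[];
}): Plan {
  // An explicit provider wins; otherwise take the first ready one (auto-detect).
  const chosen = opts.provider
    ? opts.available.find((p) => p.id === opts.provider && p.ready)
    : opts.available.find((p) => p.ready);

  // Readability on: extract locally, keep the provider as an optional fallback.
  if (opts.readability) return { kind: "readability", fallback: chosen?.id };
  // Readability off: go straight to the provider.
  if (chosen) return { kind: "provider", id: chosen.id };
  // Readability off and nothing to fall back to: fail closed.
  return { kind: "fail", reason: "Readability disabled and no fetch provider available" };
}

const firecrawl: Provider = { id: "firecrawl", ready: true };
console.log(planFetch({ readability: true, available: [firecrawl] }));
console.log(planFetch({ readability: false, available: [firecrawl] }));
console.log(planFetch({ readability: false, available: [] }));
```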

Trusted env proxy

If your deployment requires web_fetch to go through a trusted outbound HTTP(S) proxy, set tools.web.fetch.useTrustedEnvProxy: true. In this mode, OpenClaw still applies hostname-based SSRF checks before sending the request, but it lets the proxy resolve DNS instead of doing local DNS pinning. Enable this only when the proxy is operator-controlled and enforces outbound policy after DNS resolution.

If no HTTP(S) proxy env var is configured, or the target host is excluded by NO_PROXY, web_fetch falls back to the normal strict path with local DNS pinning.
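The choice between the proxy path and the strict path can be sketched as a pure function. Everything here is an assumption for illustration: dnsStrategy and excludedByNoProxy are hypothetical names, the environment variable names follow the common HTTP_PROXY / HTTPS_PROXY / NO_PROXY convention, and the NO_PROXY matching is deliberately simplified (exact host, domain suffix, or "*").

```typescript
type Env = Record<string, string | undefined>;

// Assumption: simplified NO_PROXY matching (exact host, domain suffix, or "*").
function excludedByNoProxy(host: string, noProxy: string | undefined): boolean {
  if (!noProxy) return false;
  const h = host.toLowerCase();
  return noProxy
    .split(",")
    .map((rule) => rule.trim().toLowerCase())
    .filter((rule) => rule.length > 0)
    .some((rule) => {
      if (rule === "*") return true;
      const bare = rule.startsWith(".") ? rule.slice(1) : rule;
      return h === bare || h.endsWith("." + bare);
    });
}

// Hypothetical helper: decides who resolves DNS for a given request.
function dnsStrategy(
  url: string,
  useTrustedEnvProxy: boolean,
  env: Env,
): "proxy-resolves-dns" | "local-dns-pinning" {
  if (!useTrustedEnvProxy) return "local-dns-pinning";
  const { protocol, hostname } = new URL(url);
  const proxy =
    protocol === "https:"
      ? env.HTTPS_PROXY ?? env.https_proxy
      : env.HTTP_PROXY ?? env.http_proxy;
  // No proxy configured, or the host is excluded: fall back to the strict path.
  if (!proxy || excludedByNoProxy(hostname, env.NO_PROXY ?? env.no_proxy)) {
    return "local-dns-pinning";
  }
  return "proxy-resolves-dns";
}

const env: Env = { HTTPS_PROXY: "http://proxy.corp.example:3128", NO_PROXY: "intranet.example" };
console.log(dnsStrategy("https://example.com/a", true, env));
console.log(dnsStrategy("https://intranet.example/a", true, env));
```

Hostname-based SSRF checks still run before either path; the sketch only covers which side performs DNS resolution.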

Limits and safety

maxChars is clamped to tools.web.fetch.maxCharsCap (default 20000)
Response body is capped at maxResponseBytes (default 750000, clamped to 32000-10000000) before parsing; oversized responses are truncated with a warning
Private/internal hostnames are blocked
tools.web.fetch.ssrfPolicy.allowRfc2544BenchmarkRange and tools.web.fetch.ssrfPolicy.allowIpv6UniqueLocalRange are narrow opt-ins for trusted fake-IP proxy stacks; leave them unset unless your proxy owns those synthetic ranges and enforces its own destination policy
Redirects are checked and limited by maxRedirects (default 3)
useTrustedEnvProxy is an explicit opt-in and should only be enabled for operator-controlled proxies that still enforce outbound policy after DNS resolution
web_fetch is best-effort — some sites need the Web Browser
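To make the private/internal blocking and the RFC 2544 opt-in concrete, here is a rough sketch of a hostname check. It is not OpenClaw's SSRF guard: isBlockedHost is a hypothetical function, the list of blocked names and IPv4 ranges is an assumed, partial selection, and IPv6 plus post-DNS address validation are omitted entirely.

```typescript
interface SsrfPolicy {
  allowRfc2544BenchmarkRange?: boolean; // opt-in for 198.18.0.0/15
}

// Parses a dotted-quad IPv4 literal; returns undefined for anything else.
function parseIpv4(host: string): readonly [number, number, number, number] | undefined {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return undefined;
  const octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])] as const;
  return octets.every((o) => o <= 255) ? octets : undefined;
}

// Hypothetical check; the name suffixes and ranges below are assumptions.
function isBlockedHost(hostname: string, policy: SsrfPolicy = {}): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, "");
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.endsWith(".local") || host.endsWith(".internal")) return true;

  const ip = parseIpv4(host);
  if (!ip) return false; // a real guard would also validate the resolved addresses
  const [a, b] = ip;

  if (a === 0 || a === 10 || a === 127) return true; // "this network", private, loopback
  if (a === 169 && b === 254) return true; // link-local
  if (a === 172 && b >= 16 && b <= 31) return true; // private 172.16.0.0/12
  if (a === 192 && b === 168) return true; // private 192.168.0.0/16
  if (a === 198 && (b === 18 || b === 19)) {
    // RFC 2544 benchmark range: blocked unless explicitly allowed.
    return !policy.allowRfc2544BenchmarkRange;
  }
  return false;
}

console.log(isBlockedHost("10.0.0.5"));
console.log(isBlockedHost("198.18.0.1"));
console.log(isBlockedHost("198.18.0.1", { allowRfc2544BenchmarkRange: true }));
console.log(isBlockedHost("example.com"));
```

The opt-in only widens one narrow range; every other private or internal target stays blocked regardless of the policy flags.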

Tool profiles

If you use tool profiles or allowlists, add web_fetch or group:web:

{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"]  (includes web_fetch, web_search, and x_search)
  },
}
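An allowlist check with group expansion might look like the sketch below. The membership of group:web comes from the comment in the config above; the isToolAllowed helper and the TOOL_GROUPS table are hypothetical and only illustrate the idea.

```typescript
// Group membership as stated in the config comment; the helper is hypothetical.
const TOOL_GROUPS: Record<string, string[]> = {
  "group:web": ["web_fetch", "web_search", "x_search"],
};

function isToolAllowed(tool: string, allow: string[]): boolean {
  // An entry matches either as the tool name itself or as a group containing it.
  return allow.some(
    (entry) => entry === tool || (TOOL_GROUPS[entry] ?? []).includes(tool),
  );
}

console.log(isToolAllowed("web_fetch", ["web_fetch"]));
console.log(isToolAllowed("web_fetch", ["group:web"]));
console.log(isToolAllowed("web_fetch", ["web_search"]));
```

Allowing the group is the broader choice: it also enables web_search and x_search, whereas listing web_fetch alone enables only this tool.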

Related

Web Search: search the web with multiple providers
Web Browser: full browser automation for JS-heavy sites
Firecrawl: Firecrawl search and scrape tools