
Web

Web search and single-page scraping, optimized for LLM grounding (Tavily/Exa/Brave · Firecrawl/Jina).

Overview

Base path: https://api.infrai.cc/v1/web
Auth header: Authorization: Bearer $INFRAI_API_KEY
bash
# Call any /v1/web capability over raw HTTP; no SDK to install.
# curl (every capability on this page is a POST with a JSON body):
curl -X POST https://api.infrai.cc/v1/web/... \
  -H "Authorization: Bearer $INFRAI_API_KEY" \
  -H "Content-Type: application/json"
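Because every capability shares that base path and bearer header, one request builder covers the whole module. A minimal Python sketch using only the standard library; `build_web_request` is a hypothetical helper, and it assumes each capability name maps to a path suffix the way web.scrape maps to /v1/web/scrape. Building the `Request` object sends nothing, so no credit is spent.

```python
import json
import os
from urllib import request

BASE_PATH = "https://api.infrai.cc/v1/web"


def build_web_request(action, body, api_key=None):
    """Build (but do not send) a JSON POST to /v1/web/<action>.

    Hypothetical helper: assumes web.<action> is served at /v1/web/<action>,
    as web.scrape and web.search are on this page.
    """
    key = api_key or os.environ.get("INFRAI_API_KEY", "")
    return request.Request(
        f"{BASE_PATH}/{action}",
        method="POST",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )


req = build_web_request("scrape", {"url": "https://infrai.cc"}, api_key="ifr_test")
print(req.full_url)      # https://api.infrai.cc/v1/web/scrape
print(req.get_method())  # POST
```

Passing the result to `urllib.request.urlopen(req)` would perform the real, billed call.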

Methods

web.scrape

POST /v1/web/scrape

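Fetch and read a single web page, returning clean text/markdown for an LLM to summarize. Read-only and honors robots.txt: it does not crawl whole sites or bypass paywalls. Vendor-backed (Firecrawl/Jina). Billable work-action.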

Parameters

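Name | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | URL of the single page to fetch and read.
format | "markdown" or "text" | No | Output format of the scraped content: markdown (default) or text.
idempotency_key | string | No | Deduplication key; a retry with the same key returns the same result.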
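Since a retry that repeats the same idempotency_key returns the same result, the key should be generated once per logical request and resent unchanged on every retry; a fresh key per attempt would defeat deduplication. A minimal sketch under that reading: `scrape_with_retry` and the injected `send(path, body)` callable are hypothetical, and the demo substitutes a fake transport so it runs offline.

```python
import time
import uuid


def scrape_with_retry(send, url, fmt="markdown", attempts=3, backoff=0.5):
    """POST /v1/web/scrape, reusing ONE idempotency_key across retries.

    `send(path, body)` is a hypothetical transport: it returns parsed JSON,
    or raises OSError (e.g. urllib.error.URLError) on a transient failure.
    """
    body = {"url": url, "format": fmt,
            "idempotency_key": str(uuid.uuid4())}  # generated once, not per attempt
    for attempt in range(attempts):
        try:
            return send("/v1/web/scrape", body)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(backoff * 2 ** attempt)  # exponential backoff


# Offline demo: a fake transport that fails once, then succeeds.
seen_keys = []


def flaky_send(path, body):
    seen_keys.append(body["idempotency_key"])
    if len(seen_keys) == 1:
        raise ConnectionError("simulated network blip")  # an OSError subclass
    return {"url": body["url"], "format": body["format"]}


result = scrape_with_retry(flaky_send, "https://infrai.cc", backoff=0)
print(len(seen_keys), seen_keys[0] == seen_keys[1])  # 2 True
```

Against the live API, `send` could be `lambda path, body: infrai("POST", path, body)` using the helper from the full example; that helper lets network errors (`urllib.error.URLError`, an `OSError` subclass) propagate, which is what triggers a retry here.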

Returns

ScrapeResult { url, title, content, format }
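For typed access in Python, ScrapeResult can be mirrored as a dataclass. A minimal sketch assuming the four fields arrive at the top level of the JSON body; `ScrapeResult.from_json` is a hypothetical helper, and the sample payload is made up for illustration, not a recorded response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeResult:
    """Typed view of ScrapeResult { url, title, content, format }."""
    url: str
    title: str
    content: str
    format: str  # "markdown" or "text"

    @classmethod
    def from_json(cls, data):
        # ASSUMPTION: the fields sit at the top level of the response body.
        # A missing field raises KeyError instead of silently becoming None.
        return cls(url=data["url"], title=data["title"],
                   content=data["content"], format=data["format"])


# Hypothetical sample payload, for illustration only.
sample = {"url": "https://example.com", "title": "Example Domain",
          "content": "# Example Domain\n\nThis domain is for use in examples.",
          "format": "markdown"}
page = ScrapeResult.from_json(sample)
print(page.title)  # Example Domain
```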

Examples

One-time setup (every example assumes this is done):

bash
# No SDK to install — every call is a plain HTTPS request.
# Get a project key by signing in at https://infrai.cc/login (Google/GitHub gives
# you $2 free credit; email sign-in starts at $0). On 402 INSUFFICIENT_CREDIT, add
# funds at https://infrai.cc/billing (or POST /v1/account/topup and open the
# returned checkout_url).
export INFRAI_API_KEY="ifr_..."
bash
curl -X POST https://api.infrai.cc/v1/web/scrape \
  -H "Authorization: Bearer $INFRAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "..."}'
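The one-time setup exports INFRAI_API_KEY for the shell. A Python caller can read the same variable and stop early while it is unset or still the `ifr_...` placeholder, instead of discovering the problem on the first request. `load_api_key` is a hypothetical helper; it does not validate the key against the API.

```python
import os

PLACEHOLDER = "ifr_..."  # placeholder value used in the setup snippet


def load_api_key(env=None):
    """Return INFRAI_API_KEY, failing fast if it is missing or a placeholder."""
    env = os.environ if env is None else env
    key = env.get("INFRAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "Set INFRAI_API_KEY first (get a key at https://infrai.cc/login).")
    return key


print(load_api_key({"INFRAI_API_KEY": "ifr_example"}))  # ifr_example
```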

All capabilities

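Every routed capability in this module, i.e. the complete public REST contract. The methods above are walkthrough examples; this table is the complete reference.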

Capability | Endpoint | Description
--- | --- | ---
web.scrape | POST /v1/web/scrape | Fetch and read a single web page, returning clean text/markdown for an LLM to summarize. Read-only and honors robots.txt: no crawling, no paywall bypass. Vendor-backed (Firecrawl/Jina). Billable work-action.
web.search | POST /v1/web/search | Search the live web and return ranked results (title, url, snippet) optimized for LLM grounding, adding real-time knowledge to an AI app. Vendor-backed (Tavily/Exa/Brave). Billable work-action.
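The two capabilities compose: search for candidate pages, then scrape the top-ranked ones for full text. This sketch only builds the web.scrape request bodies. It assumes the web.search response lists its ranked hits under a top-level `results` key, best first; the table above names the per-hit fields (title, url, snippet), but the `results` key is a guess to verify against a live response. `plan_scrapes` is hypothetical.

```python
def plan_scrapes(search_response, limit=2, fmt="markdown"):
    """Turn a web.search response into web.scrape request bodies.

    ASSUMPTION: hits live in search_response["results"], best-ranked first,
    each carrying a "url" field. Adjust the key if the live response differs.
    """
    bodies, seen = [], set()
    for hit in search_response.get("results", []):
        if len(bodies) >= limit:
            break
        url = hit.get("url")
        if not url or url in seen:
            continue  # skip hits without a URL and duplicate pages
        seen.add(url)
        bodies.append({"url": url, "format": fmt})
    return bodies


# Hypothetical response, for illustration only.
fake_search = {"results": [
    {"title": "A", "url": "https://a.example/rag", "snippet": "..."},
    {"title": "A (dup)", "url": "https://a.example/rag", "snippet": "..."},
    {"title": "B", "url": "https://b.example/rag", "snippet": "..."},
]}
print(plan_scrapes(fake_search))
# [{'url': 'https://a.example/rag', 'format': 'markdown'}, {'url': 'https://b.example/rag', 'format': 'markdown'}]
```

Each body can then be POSTed to /v1/web/scrape; since both calls are billed, a small `limit` keeps the cost of a grounding pass predictable.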

Full example

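A production-grade end-to-end example for this module: configure once, then run a business flow that exercises as much of the module's API as possible.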

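A single-file, runnable Python program (standard library only, no SDK). Copy it, set INFRAI_API_KEY, and run it to step through this module's core APIs in a realistic flow. Every step is a real, billed call that prints the live JSON response. The short `infrai()` helper is the entire integration.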

python
#!/usr/bin/env python3
"""Infrai · web — runnable real-app example (single file, zero deps).

Copy this file, set your key, run it: every step is a REAL call to
api.infrai.cc, billed at the real (tiny) per-call price, printing the
live JSON response. Get a key at https://infrai.cc/login (Google/
GitHub sign-in grants $2 free credit); add funds at
https://infrai.cc/billing. No SDK: the small infrai() helper below is the
entire integration."""
import json
import os
from urllib import error, request

KEY = os.environ.get("INFRAI_API_KEY") or "ifr_..."  # <- your key
BASE = "https://api.infrai.cc"


# Same raw HTTPS POST/GET as every per-method example on this page,
# wrapped once for reuse. There is nothing else to it: no SDK.
def infrai(method, path, body=None):
    req = request.Request(
        BASE + path, method=method,
        data=json.dumps(body).encode() if body is not None else None,
        headers={"Authorization": f"Bearer {KEY}",
                 "Content-Type": "application/json"})
    try:
        with request.urlopen(req, timeout=60) as r:
            return json.loads(r.read())
    except error.HTTPError as e:
        # Non-2xx replies (e.g. 402 INSUFFICIENT_CREDIT) still carry a body;
        # return it so the error is printed instead of crashing the run.
        raw = e.read()
        try:
            return json.loads(raw)
        except ValueError:  # non-JSON error page, e.g. from a proxy
            return {"http_status": e.code,
                    "body": raw.decode("utf-8", "replace")}


def show(label, resp):
    print(f"\n== {label} ==")
    print(json.dumps(resp, indent=2, ensure_ascii=False))
    return resp


# 1) web.search: POST /v1/web/search
#    Search the live web for ranked results (title, url, snippet). Billable.
r1 = show("web.search", infrai("POST", "/v1/web/search",
                               {"query": "what is retrieval-augmented generation",
                                "max_results": 3}))

# 2) web.scrape: POST /v1/web/scrape
#    Fetch one page as clean markdown for an LLM to summarize. Billable.
r2 = show("web.scrape", infrai("POST", "/v1/web/scrape",
                               {"url": "https://infrai.cc", "format": "markdown"}))
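To ground an LLM answer in these results, a common pattern is to number each source and instruct the model to cite it. A minimal sketch of that pattern, assuming each hit is a dict with title, url, and snippet (the fields web.search is described as returning) plus an optional `content` string taken from web.scrape; `build_grounded_prompt` and the prompt wording are hypothetical, and no LLM is called.

```python
def build_grounded_prompt(question, hits, max_chars=1500):
    """Assemble a citation-numbered prompt from search hits.

    Each hit is a dict with "title", "url" and "snippet", plus an optional
    "content" key (scraped page text) that takes precedence over the snippet.
    Source text is truncated to max_chars characters per source.
    """
    sources = []
    for i, hit in enumerate(hits, start=1):
        text = (hit.get("content") or hit.get("snippet") or "")[:max_chars]
        sources.append(f"[{i}] {hit.get('title', '')} ({hit.get('url', '')})\n{text}")
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        + "\n\n".join(sources)
        + f"\n\nQuestion: {question}"
    )


hits = [{"title": "RAG overview", "url": "https://a.example/rag",
         "snippet": "Retrieval-augmented generation combines search with an LLM."}]
prompt = build_grounded_prompt("What is RAG?", hits)
print(prompt)
```

Capping each source at `max_chars` keeps the prompt within a model's context budget; the markdown of one long scraped page could otherwise crowd out every other source.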