By the end of this guide, your orchestrator will accept AI inference jobs alongside transcoding.

Prerequisites

Before you begin:
  • go-livepeer is installed and running as a transcoding orchestrator on Arbitrum mainnet (see Install go-livepeer and Get Started)
  • Your orchestrator is in the Top 100 active set on the Livepeer network
  • Docker is installed with nvidia-container-toolkit enabled (GPU passthrough required for the AI runner containers)
  • Your GPU has at least 4GB of free VRAM, the minimum needed to run any AI pipeline (see the hardware check below)
  • Model weights pre-downloaded for the pipeline(s) you want to serve (see Download AI Models)
This guide adds AI inference to an existing transcoding node. If you are setting up from scratch, start with Install go-livepeer.

Check your hardware

AI inference runs in a separate Docker container alongside your transcoding process. If both share the same GPU, VRAM is split between them. Before configuring anything, confirm how much VRAM your GPU has available. Run this command to list your GPUs and their VRAM:
nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv
You should see output similar to:
index, name, memory.total [MiB], memory.free [MiB]
0, NVIDIA GeForce RTX 3090, 24576 MiB, 22000 MiB
Use the table below to see which pipelines you can run based on your available VRAM:
Pipeline           | Min VRAM | Notes
image-to-text      | 4GB      | Caption generation; lowest barrier to entry
segment-anything-2 | 6GB      | Object segmentation
LLM (llm)          | 8GB      | Requires Ollama runner; 7–8B quantised models
audio-to-text      | 12GB     | Speech transcription; Whisper-based
image-to-video     | 16GB+    | Animated video from image
image-to-image     | 20GB     | Style transfer, image manipulation
text-to-image      | 24GB     | Text-to-image generation (Stable Diffusion, SDXL)
upscale            |          | Image upscaling
text-to-speech     |          | Speech synthesis
For details on each pipeline, see Job Types.
If your GPU does not have enough free VRAM to run both transcoding and your chosen AI pipeline, AI runner containers will fail to start. Either select a lower-VRAM pipeline, dedicate a second GPU exclusively to AI, or stop transcoding on that GPU before enabling AI.
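The free-VRAM comparison can be scripted as a quick sanity check before you touch any configuration. The sketch below is illustrative (the `vram_fits` helper is not part of go-livepeer): it converts a pipeline's minimum from the table above into MiB and compares it against the free-memory figure that `nvidia-smi` reports.

```shell
# Illustrative helper: check whether a GPU's free VRAM covers a
# pipeline's minimum from the table above.
vram_fits() {
  # $1 = free VRAM in MiB (from nvidia-smi), $2 = pipeline minimum in GB
  local free_mib=$1
  local required_mib=$(( $2 * 1024 ))
  if [ "$free_mib" -ge "$required_mib" ]; then
    echo "ok"
  else
    echo "insufficient"
  fi
}

# Example: 22000 MiB free against the 8GB llm minimum
vram_fits 22000 8   # prints: ok
```

Remember that this checks free VRAM, not total: if transcoding is already running on the GPU, run the check while it is under load.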

Step 1 — Pull the AI runner image

The AI subnet uses a separate Docker image (livepeer/ai-runner) to run inference. Pull it before starting your node:
docker pull livepeer/ai-runner:latest
If you plan to run the segment-anything-2 pipeline, also pull its pipeline-specific image:
docker pull livepeer/ai-runner:segment-anything-2
Check the AI Pipelines documentation for any other pipeline-specific images.

Step 2 — Configure aiModels.json

The aiModels.json file tells your orchestrator which AI pipelines and models to serve, what to charge, and whether to keep models warm in VRAM. Create the file at ~/.lpData/aiModels.json:
touch ~/.lpData/aiModels.json
Add at least one pipeline entry. The example below configures a single text-to-image pipeline with a warm model — the minimal working configuration:
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  }
]
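A malformed aiModels.json is a common cause of AI startup failures, so it is worth validating the file before restarting the node. A minimal check, assuming python3 is available on the host (the `validate_ai_models` helper name is just for illustration):

```shell
# Illustrative check: parse the file with Python's json.tool and report
# whether it is well-formed JSON. Any parse error here means the
# orchestrator will fail to load the AI configuration.
validate_ai_models() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid"
  else
    echo "invalid"
  fi
}

validate_ai_models ~/.lpData/aiModels.json
```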

Field reference

Field              | Required | Description
pipeline           | Yes      | Pipeline name (e.g. "text-to-image", "audio-to-text", "llm")
model_id           | Yes      | HuggingFace model ID
price_per_unit     | Yes      | Price in wei per unit (integer), or USD string e.g. "0.5e-2USD"
warm               | No       | If true, model is preloaded into VRAM on startup
capacity           | No       | Max concurrent inference requests (default: 1)
optimization_flags | No       | Performance flags: SFAST (up to +25% speed) and/or DEEPCACHE (up to +50% speed)
url                | No       | For external containers only — URL of a separately managed runner
token              | No       | Bearer token for external container authentication
During Beta, only one warm model per GPU is supported. Set "warm": true for the model you want pre-loaded; additional models will load on demand when requested.
For recommended pricing per pipeline, see Job Types. For a full multi-pipeline example, see AI Pipeline Configuration.
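To illustrate how entries combine, a sketch of a two-pipeline file is shown below, keeping one model warm and letting the other load on demand, in line with the one-warm-model-per-GPU limit above. The second entry's model ID and price are placeholders — see AI Pipeline Configuration for real multi-pipeline values.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  },
  {
    "pipeline": "image-to-text",
    "model_id": "<MODEL_ID>",
    "price_per_unit": "<PRICE>",
    "warm": false
  }
]
```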

Step 3 — Update your startup command

Stop your current go-livepeer process, then restart it with the following additions. Three flags enable AI:
  • -aiWorker — enables the AI worker functionality
  • -aiModels — path to your aiModels.json file
  • -aiModelsDir — directory where model weights are stored on the host machine
Before (transcoding only):
livepeer \
  -network arbitrum-one-mainnet \
  -ethUrl <ETH_URL> \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -pricePerUnit <PRICE> \
  -serviceAddr <SERVICE_ADDR>
After (transcoding + AI):
livepeer \
  -network arbitrum-one-mainnet \
  -ethUrl <ETH_URL> \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -pricePerUnit <PRICE> \
  -serviceAddr <SERVICE_ADDR> \
  -aiWorker \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models
If you are running via Docker, mount the Docker socket so the orchestrator can manage ai-runner containers:
docker run \
  --name livepeer_orchestrator \
  -v ~/.lpData/:/root/.lpData/ \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --network host \
  --gpus all \
  livepeer/go-livepeer:master \
  -network arbitrum-one-mainnet \
  -ethUrl <ETH_URL> \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -pricePerUnit <PRICE> \
  -serviceAddr <SERVICE_ADDR> \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models
The -aiModelsDir path must be the host machine path, not the path inside the Docker container. The orchestrator uses docker-out-of-docker to start ai-runner containers, and passes this path directly to them.

Step 4 — Verify AI is active

Check the logs

Within a few seconds of startup, you should see a line like this for each model configured as warm:
2024/05/01 09:01:39 INFO Starting managed container gpu=0 name=text-to-image_ByteDance_SDXL-Lightning modelID=ByteDance/SDXL-Lightning
If you see the standard RPC ping without the managed container line, check that:
  • aiModels.json is valid JSON and at the path specified in -aiModels
  • The model weights are present in -aiModelsDir
  • The Docker socket is mounted (Docker mode only)
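If you are running in Docker mode, one way to check for the warm-model line is to filter the container logs. The `find_warm_starts` helper below is illustrative, not a go-livepeer command; it simply counts matching log lines, and you should expect one per warm model once startup completes.

```shell
# Illustrative helper: count "Starting managed container" lines in a
# log stream piped into it.
find_warm_starts() {
  grep -c "Starting managed container" || true
}

docker logs livepeer_orchestrator 2>&1 | find_warm_starts
```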

Test the AI runner directly

Once running, confirm the AI runner responds by sending a test inference request. Navigate to http://localhost:8000/docs in your browser to access the Swagger UI for the ai-runner container. Alternatively, use curl:
curl -X POST "http://localhost:8000/text-to-image" \
  -H "Content-Type: application/json" \
  -d '{"model_id": "ByteDance/SDXL-Lightning", "prompt": "A cool cat on the beach", "width": 512, "height": 512}'
A successful response returns a JSON object with an images array containing the generated image as a base64-encoded PNG data URL.
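To turn that response into a viewable file, the base64 payload can be decoded. The sketch below assumes the response is shaped like {"images": [{"url": ...}]}; the `save_first_image` helper is illustrative, so adjust the JSON path if your runner's response layout differs.

```shell
# Illustrative helper: decode the first image in a text-to-image
# response to a file. Handles both a bare base64 string and a data URL
# with a "data:image/png;base64," prefix.
save_first_image() {
  python3 -c '
import sys, json, base64
resp = json.load(sys.stdin)
data = resp["images"][0]["url"]
if "," in data:                       # strip a data-URL prefix if present
    data = data.split(",", 1)[1]
sys.stdout.buffer.write(base64.b64decode(data))
' > "$1"
}

# Usage: pipe the curl response from above into the helper, e.g.
# curl -s -X POST "http://localhost:8000/text-to-image" ... | save_first_image cat.png
```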

Confirm pipelines are advertised

Your AI pipelines will appear in the Livepeer Explorer on your orchestrator’s profile once on-chain capability advertisement is configured. See Publish Offerings for that step.

Choose your AI path

Your AI runner is active. The next step depends on which pipeline type you want to specialise in.

Set up batch AI inference

Configure image, audio, and video generation pipelines. Covers model downloads, pricing, and on-chain registration for batch inference.

Set up real-time AI (Cascade)

Configure ComfyStream for persistent video stream processing. Covers ComfyUI workflow deployment and GPU allocation.

  • Job Types — understand the difference between transcoding, batch AI, real-time AI, and LLM inference before choosing a path
  • AI Pipeline Configuration — advanced aiModels.json options, multi-GPU setup, external containers, and optimization flags
Last modified on April 7, 2026