🎨 Image generation

February 22, 2025, about 5 minutes

LocalAI supports generating images with Stable Diffusion, running on CPU using C++ and Python implementations.

Usage

OpenAI docs: https://platform.openai.com/docs/api-reference/images/create

To generate an image, send a POST request to the /v1/images/generations endpoint with the instruction as the request body:

# 512x512 is also supported
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter",
  "size": "256x256"
}'
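
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper name `build_generation_request` and the default base URL are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_generation_request(prompt, size="256x256",
                             base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for /v1/images/generations."""
    body = json.dumps({"prompt": prompt, "size": size}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generation_request("A cute baby sea otter")
# To send it against a running LocalAI instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping construction separate from sending makes it possible to inspect the URL and body before any network traffic happens.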

Available additional parameters: mode, step.

Note: To set a negative prompt, split the prompt with |, for example: a cute baby sea otter|malformed.

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
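  "prompt": "floating hair, portrait, ((loli)), ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",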
  "size": "256x256"
}'
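
Because the negative prompt travels inside the same prompt string, it can help to assemble that string in one place. A minimal Python sketch follows; `join_prompt` is a hypothetical helper, not part of LocalAI, and it rejects `|` inside either part on the assumption that `|` is reserved as the separator.

```python
def join_prompt(positive, negative=""):
    """Combine a positive and an optional negative prompt as 'positive|negative'."""
    for part in (positive, negative):
        if "|" in part:
            raise ValueError("prompt parts must not contain '|'")
    return f"{positive}|{negative}" if negative else positive


prompt = join_prompt("a cute baby sea otter", "malformed")
```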

Backends

stablediffusion-ggml

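This backend is based on stable-diffusion.cpp. LocalAI supports every model that stable-diffusion.cpp supports.

Setup

The model gallery already contains several models that can be installed and run with this backend. For example, to run flux, search for flux in the model gallery (flux.1-dev-ggml), or start LocalAI with run: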

local-ai run flux.1-dev-ggml

To use a custom model, follow these steps:

Create a model file stablediffusion.yaml in the models folder:

name: stablediffusion
backend: stablediffusion-ggml
parameters:
  model: gguf_model.gguf
step: 25
cfg_scale: 4.5
options:
- "clip_l_path:clip_l.safetensors"
- "clip_g_path:clip_g.safetensors"
- "t5xxl_path:t5xxl-Q5_0.gguf"
- "sampler:euler"

Download the required assets into the models folder
Start LocalAI

Diffusers

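Diffusers is a library of state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. LocalAI has a diffusers backend that enables image generation with the diffusers library.

(Generated with AnimagineXL)

Model setup

The models are downloaded automatically from huggingface the first time you use the backend.

Create a model configuration file in the models directory, for example, to use Linaqruf/animagine-xl on CPU: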

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers
f16: false
diffusers:
  cuda: false # Enable for GPU usage (CUDA)
  scheduler_type: euler_a

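Dependencies

This is an extra backend. It is already available in the container images, so no setup is needed. Do not use the core images (those ending in -core). If you are building manually, see the build instructions.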

Model setup (GPU)

To run Linaqruf/animagine-xl on a GPU instead, enable cuda and f16 in the model configuration file:

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers
cuda: true
f16: true
diffusers:
  scheduler_type: euler_a

Local models

You can also use local models, or modify some parameters such as clip_skip and scheduler_type:

name: stablediffusion
parameters:
  model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
cuda: true
diffusers:
  pipeline_type: StableDiffusionPipeline
  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
  scheduler_type: "k_dpmpp_sde"
  clip_skip: 11

cfg_scale: 8

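Configuration parameters

The following parameters are available in the configuration file:

Parameter	Description	Default
`f16`	Force the use of `float16` instead of `float32`	`false`
`step`	Number of steps to run the model for	`30`
`cuda`	Enable CUDA acceleration	`false`
`enable_parameters`	Parameters to enable for the model	`negative_prompt,num_inference_steps,clip_skip`
`scheduler_type`	Scheduler type	`k_dpmpp_sde`
`cfg_scale`	Classifier-free guidance (CFG) scale	`8`
`clip_skip`	Clip skip	None
`pipeline_type`	Pipeline type	`AutoPipelineForText2Image`
`lora_adapters`	A list of LoRA adapters to apply (file names relative to the model directory)	None
`lora_scales`	The LoRA scales to apply (floating-point numbers)	None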

Available scheduler types:

Scheduler	Description
`ddim`	DDIM
`pndm`	PNDM
`heun`	Heun
`unipc`	UniPC
`euler`	Euler
`euler_a`	Euler a
`lms`	LMS
`k_lms`	LMS Karras
`dpm_2`	DPM2
`k_dpm_2`	DPM2 Karras
`dpm_2_a`	DPM2 a
`k_dpm_2_a`	DPM2 a Karras
`dpmpp_2m`	DPM++ 2M
`k_dpmpp_2m`	DPM++ 2M Karras
`dpmpp_sde`	DPM++ SDE
`k_dpmpp_sde`	DPM++ SDE Karras
`dpmpp_2m_sde`	DPM++ 2M SDE
`k_dpmpp_2m_sde`	DPM++ 2M SDE Karras

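Available pipeline types:

Pipeline type	Description
`StableDiffusionPipeline`	Stable Diffusion pipeline
`StableDiffusionImg2ImgPipeline`	Stable Diffusion image-to-image pipeline
`StableDiffusionDepth2ImgPipeline`	Stable Diffusion depth-to-image pipeline
`DiffusionPipeline`	Diffusion pipeline
`StableDiffusionXLPipeline`	Stable Diffusion XL pipeline
`StableVideoDiffusionPipeline`	Stable Video Diffusion pipeline
`AutoPipelineForText2Image`	Automatic detection of the text-to-image pipeline
`VideoDiffusionPipeline`	Video diffusion pipeline
`StableDiffusion3Pipeline`	Stable Diffusion 3 pipeline
`FluxPipeline`	Flux pipeline
`FluxTransformer2DModel`	Flux Transformer 2D model
`SanaPipeline`	Sana pipeline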

Advanced: additional parameters

Arbitrary parameters can be specified in the options field as key/value pairs separated by `:`:

name: animagine-xl
# ...
options:
- "cfg_scale:6"

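Note: There is no complete list of parameters. Any parameter can be passed arbitrarily and is forwarded directly to the model as an argument to the pipeline. Different pipelines/implementations support different parameters.

The example above results in the following Python code when generating images: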

pipe(
    prompt="A cute baby sea otter", # option passed via the API
    size="256x256", # option passed via the API
    cfg_scale=6 # additional parameter passed via the configuration file
)
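
To make the step from options entries to pipeline arguments concrete, here is one way such entries could be turned into keyword arguments. This is a sketch only: `parse_options` is a hypothetical helper, and the numeric conversion rule is an assumption, not LocalAI's actual parsing code.

```python
def parse_options(options):
    """Split 'key:value' strings on the first ':' and convert numeric values."""
    parsed = {}
    for entry in options:
        key, sep, value = entry.partition(":")
        if not sep:
            continue  # skip entries that have no ':' separator
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                pass  # not this numeric type; keep the string for now
        parsed[key] = value
    return parsed


extra = parse_options(["cfg_scale:6"])
# extra could then be merged into the pipeline call as keyword arguments.
```

Splitting on the first `:` only keeps values that themselves contain a colon intact.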

Usage

Text to image

Use the image generation endpoint with the model name from the configuration file:

curl http://localhost:8080/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "<positive prompt>|<negative prompt>",
      "model": "animagine-xl", 
      "step": 51,
      "size": "1024x1024" 
    }'
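
Once the request returns, the generated images have to be picked out of the response. The sketch below assumes the response follows the OpenAI images format linked earlier on this page, that is, a `data` list whose items carry either `url` or `b64_json`; `image_references` is a hypothetical helper and the sample response (including its URL) is made up for illustration.

```python
def image_references(response):
    """Collect the 'url' or 'b64_json' value of every item in response['data']."""
    references = []
    for item in response.get("data", []):
        value = item.get("url") or item.get("b64_json")
        if value:
            references.append(value)
    return references


# Hypothetical response body, shaped like an OpenAI images API response.
sample = {"created": 0, "data": [{"url": "http://localhost:8080/example.png"}]}
urls = image_references(sample)
```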

Image to image

https://huggingface.co/docs/diffusers/using-diffusers/img2img

An example model (GPU):

name: stablediffusion-edit
parameters:
  model: nitrosocke/Ghibli-Diffusion
backend: diffusers
step: 25
cuda: true
f16: true
diffusers:
  pipeline_type: StableDiffusionImg2ImgPipeline
  enable_parameters: "negative_prompt,num_inference_steps,image"

IMAGE_PATH=/path/to/your/image
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations
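
The shell pipeline above base64-encodes the input image into the `file` field of the JSON body. A Python sketch of the same body construction follows; `build_img2img_body` is a hypothetical helper, and the placeholder bytes stand in for the contents of a real image file.

```python
import base64
import json


def build_img2img_body(image_bytes, prompt, model, size="512x512"):
    """Return the JSON body with the image base64-encoded in the 'file' field."""
    return json.dumps({
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
        "size": size,
        "model": model,
    })


# Placeholder bytes; in practice read them from the image file on disk.
body = build_img2img_body(b"not a real image", "a sky background",
                          "stablediffusion-edit")
```

The resulting string can be sent as the POST body to /v1/images/generations with the Content-Type: application/json header, as in the curl command.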

Depth to image

https://huggingface.co/docs/diffusers/using-diffusers/depth2img

name: stablediffusion-depth
parameters:
  model: stabilityai/stable-diffusion-2-depth
backend: diffusers
step: 50
# GPU usage; to force CPU usage, set f16 and cuda to false
f16: true
cuda: true
diffusers:
  pipeline_type: StableDiffusionDepth2ImgPipeline
  enable_parameters: "negative_prompt,num_inference_steps,image"

cfg_scale: 6

(echo -n '{"file": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations

img2vid

name: img2vid
parameters:
  model: stabilityai/stable-video-diffusion-img2vid
backend: diffusers
step: 25
# GPU usage; to force CPU usage, set f16 and cuda to false
f16: true
cuda: true
diffusers:
  pipeline_type: StableVideoDiffusionPipeline

(echo -n '{"file": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true","size": "512x512","model":"img2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations

txt2vid

name: txt2vid
parameters:
  model: damo-vilab/text-to-video-ms-1.7b
backend: diffusers
step: 25
# GPU usage; to force CPU usage, set f16 and cuda to false
f16: true
cuda: true
diffusers:
  pipeline_type: VideoDiffusionPipeline
  cuda: true

(echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations