Generating images with Stable Diffusion

しむどん： 2026-01-16

This time, we set up an environment on Google Colab that can run Stable Diffusion and use it to generate images.

Environment setup

Install the required libraries.

!pip install diffusers transformers accelerate ftfy pillow scipy
!pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install --upgrade peft
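
The later steps move the pipeline to "cuda", so it can help to confirm that the Colab runtime actually has a GPU attached. The following is a minimal sketch using only the standard library. The helper name has_nvidia_gpu is made up for illustration, and it assumes that a working nvidia-smi command is a good enough signal for a GPU runtime.

```python
import shutil
import subprocess


def has_nvidia_gpu() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The command is not on PATH, so this is most likely a CPU-only runtime.
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU it can see.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU runtime available:", has_nvidia_gpu())
```

If this prints False on Colab, switching the runtime type to a GPU before running the rest of the notebook is probably the first thing to try.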

Generating a simple image

Import the libraries and load the model. If the model has not been downloaded yet, calling StableDiffusionPipeline.from_pretrained() also downloads it.

import os
import torch

from diffusers import StableDiffusionPipeline

os.environ["TOKENIZERS_PARALLELISM"] = "true"

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

Pass a prompt and generate an image.

prompt = "a cute cat"
image = pipe(prompt).images[0]

Display the generated image.

display(image)

Installing diffusers from source

Next, let's install diffusers from source.

First, clone the repository.

git clone --depth 1 git@github.com:huggingface/diffusers.git

Move to the top-level directory of the repository.

cd ./diffusers

Install it. This time we do an editable install.

python3 -m pip install --no-build-isolation -v -e .

When the installation succeeds, output like the following appears at the end.

Successfully installed diffusers-0.37.0.dev0 importlib_metadata-8.7.1 regex-2026.1.15 safetensors-0.7.0 zipp-3.23.0

Confirm that it is installed.

python3 -c 'import diffusers; print(diffusers.__version__)'

0.37.0.dev0

It looks like the install worked.
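
The version string alone does not show whether Python is really importing the cloned checkout. As a rough sketch, the location of the imported package can be inspected as well. describe_install is a hypothetical helper written for this post, not part of diffusers or pip, and it only relies on the standard library.

```python
import importlib.util
from importlib import metadata


def describe_install(package: str) -> dict:
    """Report whether `package` is importable, where it lives, and its version."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"found": False, "location": None, "version": None}
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a stdlib module).
        version = None
    return {"found": True, "location": spec.origin, "version": version}


if __name__ == "__main__":
    print(describe_install("diffusers"))
```

With an editable install, the reported location would be expected to point inside the cloned diffusers directory rather than site-packages.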

Using DDPM

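Let's try DDPM. The model used here is distributed by Google and is very lightweight, which makes it well suited to smoke tests where the quality of the model itself does not matter.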

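We run the code in the Python interpreter and walk through everything from loading the model to inference. First, load the model. If the model is not in the local cache, it is downloaded at this point.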

import torch
from diffusers.pipelines.ddpm.pipeline_ddpm import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained("google/ddpm-cat-256", torch_dtype=torch.float16).to("mps")

Use this pipeline to run inference and generate an image. A plain DDPM implementation apparently needs around 1000 inference steps.

generator = torch.Generator(device="cpu")  # deliberately use cpu here (the reason is explained below)

result = pipeline(batch_size=1, generator=generator, num_inference_steps=10)  # run inference

Once the inference result comes back, display the image to check it.

result.images[0].show()  # show the image
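
To get a feel for how coarse num_inference_steps=10 is compared with roughly 1000 steps, here is a small sketch of how a short run might subsample the timesteps. It assumes the model was trained with 1000 timesteps and that the scheduler picks them at even intervals. The actual spacing depends on the scheduler configuration, and leading_timesteps is a hypothetical helper, not a diffusers API.

```python
def leading_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Pick evenly spaced timesteps, highest (noisiest) first."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    # Integer stride between the timesteps that are actually visited.
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio for i in range(num_inference_steps)][::-1]


if __name__ == "__main__":
    print(leading_timesteps(1000, 10))
    # [900, 800, 700, 600, 500, 400, 300, 200, 100, 0]
```

Under these assumptions each of the 10 steps has to stand in for 100 training timesteps, which is why such a short run is only good for checking that the pipeline works.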

If RuntimeError: Placeholder storage has not been allocated on MPS device! occurs, the devices are mismatched

The code above passes "cpu" as the device to torch.Generator(). When I passed "mps" instead, the following RuntimeError was raised.

RuntimeError: Placeholder storage has not been allocated on MPS device!

Tracing through the source code, the problem turned out to come from the following code in DDPMPipeline, defined in diffusers/src/diffusers/pipelines/ddpm/pipeline_ddpm.py.

        if self.device.type == "mps":
            # randn does not work reproducibly on mps
            image = randn_tensor(image_shape, generator=generator, dtype=self.unet.dtype)
            image = image.to(self.device)
        else:
            image = randn_tensor(image_shape, generator=generator, device=self.device, dtype=self.unet.dtype)

Excerpt from diffusers/src/diffusers/pipelines/ddpm/pipeline_ddpm.py

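Since self.device.type is "mps", execution enters the first if block, which carries a suspicious-looking comment: "randn does not work reproducibly on mps". Apparently the reproducibility or behavior of randn on mps can depend on the environment, so the random tensor is generated on cpu first and then moved to mps. Because this randn_tensor() call does not pass device, the default, cpu, is used. If the generator was created with device mps, the two disagree at this point, and the error is finally raised by the following line inside randn_tensor().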

        latents = torch.randn(shape, generator=generator, device=rand_device, dtype=dtype, layout=layout).to(device)

Excerpt from diffusers/utils/torch_utils.py

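In my environment I also tried patching the randn_tensor() call so that it passes device, and with that change no error occurred either. However, the absence of an error does not guarantee that the code behaves as intended. So I avoided the error by specifying cpu as the Generator's device instead.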
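
The workaround can be written down as a tiny helper so that the generator device is never chosen by hand. This is only a sketch of the rule described above: generator_device_for is a hypothetical name, and it assumes device strings of the usual "cpu", "mps", or "cuda:0" form.

```python
def generator_device_for(pipeline_device: str) -> str:
    """Return the device on which the torch.Generator should be created."""
    # Strip an optional index such as ":0" to get the device type.
    device_type = pipeline_device.split(":", 1)[0]
    if device_type == "mps":
        # On mps the DDPMPipeline excerpt samples the initial noise on cpu
        # and moves it afterwards, so the generator has to live on cpu too.
        return "cpu"
    return pipeline_device


if __name__ == "__main__":
    for device in ("cpu", "mps", "cuda:0"):
        print(device, "->", generator_device_for(device))
```

It would then be used as torch.Generator(device=generator_device_for("mps")), which gives the same cpu generator as in the DDPM example.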

Using Stable Diffusion Web UI

git clone --depth 1 git@github.com:AUTOMATIC1111/stable-diffusion-webui.git

cd stable-diffusion-webui

python3 -m pip install --no-build-isolation -v -e .


Memo

import os

import torch
from diffusers.pipelines.ddpm.pipeline_ddpm import DDPMPipeline

from src.utils import ExperimentalContext, options


def inference(pipeline, context: ExperimentalContext, batch_size, num_inference_steps=1000):
    # Run inference
    images = pipeline(
        batch_size=batch_size,
        generator=context.generator,
        num_inference_steps=num_inference_steps,
    ).images

    # Save the images
    for i, image in enumerate(images):
        context.save_image(image, 'uncond', f'i{i}_n{num_inference_steps}')


@options
def main(seed, device):
    batch_size = 1

    # Load the model
    pipeline = DDPMPipeline.from_pretrained('google/ddpm-cat-256', torch_dtype=torch.float16).to(device)

    context = ExperimentalContext(seed=seed, device=device, root_dir=os.path.join('out', 'ddpm_cat'))
    inference(pipeline=pipeline, context=context, batch_size=batch_size, num_inference_steps=1000)


if __name__ == '__main__':
    main()

RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package