Memorandum of a University of Tokyo Bioinformatics Undergraduate

I am a student at the University of Tokyo. I jot down daily observations, things I have learned, and places where I stumbled.


Stable Diffusion for M2 mac

Introduction

I will explain how to set up "Stable Diffusion" on an M2/M1 Mac. It took me several days to arrive at this method; it may not be the optimal solution, but I believe it is at least a local optimum.

This post is intended for readers with a reasonable level of PC knowledge who have already struggled and experimented on their own. I want to share the process that got me to this point.

For those with limited knowledge, or who do not use Lora, I recommend the article about DiffusionBee instead. It is a very popular app, it gained ".safetensors" support about two weeks ago, and Lora support may follow. If that development continues, there will be no need to go through the trouble of setting up this environment.

The Japanese article is available here:

ut-bioinformatic.hatenablog.jp

Environment

Candidates for Stable Diffusion Interface

There are three candidates: DiffusionBee (covered above), Automatic1111 run locally, and Google Colab. The rest of this section covers Automatic1111.

Automatic1111

Installation

Please search for "M2 Automatic1111 installation" on Google to find an easy installation guide.

Errors about xformers and torch

While Automatic1111 can be run locally, macOS does not support CUDA, so errors always occur around "torch.cuda.*". Workarounds for this issue are suggested in various resources:

Methods such as adding "--xformers" to "COMMANDLINE_ARGS", or replacing every occurrence of the string "cuda" with "mps", were suggested, but neither is an accurate fix. Moreover, an error message about xformers and torch kept appearing.

Despite trying various solutions to resolve this issue, I could not eliminate the error message.

However, when I checked the GPU usage history in macOS's Activity Monitor, I found that GPU utilization reached 100% while Stable Diffusion was generating images. In other words, the Mac's GPU is being used. So even if error messages about xformers and torch appear, it is not a significant concern as long as GPU usage stays close to 100%.

In the end, I removed "--xformers" from "COMMANDLINE_ARGS". Note that "--skip-torch-cuda-test" must still be specified.
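For reference, a minimal sketch of the relevant line in "webui-user.sh", assuming the stock Automatic1111 layout (only the flag named in this post is included; any further Mac-specific flags are left out):

```shell
# webui-user.sh (fragment): macOS has no CUDA, so skip the CUDA check.
# "--xformers" is deliberately absent, since xformers targets CUDA GPUs.
export COMMANDLINE_ARGS="--skip-torch-cuda-test"
```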

Using Automatic1111

Automatic1111 has many useful extensions, and you should take full advantage of them. ControlNet, Adetailer, Openpose-editor, and sd-dynamic-prompts are particularly useful. I recommend installing them and learning how to use them as soon as possible!
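As a taste of why sd-dynamic-prompts is worth installing: it expands variant syntax in the prompt at generation time, so one prompt yields varied images. An illustrative prompt is below (the subject itself is made up; the `{a|b|c}` syntax is the extension's):

```
a {red|blue|green} bicycle leaning against a wall, watercolor
```

Each generated image picks one of the alternatives, so a batch of images covers the variants without editing the prompt by hand.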

Training Lora (How to create Lora) with M2/M1 mac

M2

First, in Automatic1111, install the "sd-webui-train-tools" extension and start the training process as described in the following link:

遂に登場!!待望のWebUI上でLoRA学習可能な拡張機能を使いトレーニングする方法 | 経済的生活日誌

However, the training process was incredibly slow. This was expected: GPU utilization hovered around 30%, so the GPU was not being fully utilized. I looked for ways to make better use of it and found several references suggesting a solution like the one here:

Apple Silicon(M1 / M2)MacのGPUでStable DiffusionのLoRAを学習 - Planaria Work Log

It involves simply replacing every instance of "cuda" with "mps" in the "kohya_ss/library/train_util.py" file. However, this method has the drawbacks mentioned earlier, and since Lora training already demands a significant amount of time and compute, I gave up on training locally.
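For the record, the blind replacement described above amounts to a one-line text substitution. A sketch, demonstrated here on a scratch file rather than the real "kohya_ss/library/train_util.py" (keep a backup if you try it for real, since a global replace can also hit unrelated strings):

```shell
# Scratch file standing in for kohya_ss/library/train_util.py.
printf 'device = torch.device("cuda")\n' > /tmp/train_util_demo.py

# Portable substitution: write to a new file so the original survives.
sed 's/cuda/mps/g' /tmp/train_util_demo.py > /tmp/train_util_demo_mps.py

cat /tmp/train_util_demo_mps.py   # device = torch.device("mps")
```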

Colab

I recommend using Colab instead. You can quickly start training by referring to the following video:

  • youtu.be
  • The URL for the Colab notebook can be found in the following link:

https://github.com/Linaqruf/kohya-trainer/blob/main/README.md

However, the free version of Google Colab has resource limits, and long training sessions may be interrupted, so it is advisable to subscribe to Google Colab Pro. Expanding your Google Drive storage is also recommended. Google Colab does offer pay-as-you-go pricing where you pay for the resources used, but I found it tedious to work out the details, so I opted for the monthly subscription. (There is room for improvement here, but I have not used it for more than a month yet, so I am not sure what the ideal option is.)

Each Lora training session consumes around 6.0 compute units, and Colab Pro provides 100.0 compute units per month, which is roughly 16 runs (100 ÷ 6 ≈ 16). It is therefore not well suited to anyone planning to train Lora more often than that, say 20 times per month. In such cases, upgrading to Colab Pro+ or falling back on the "training Lora on M2 Mac" approach above are the only alternatives.
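A quick sanity check on that budget, using my own rough numbers from above (~6.0 units per run against Colab Pro's 100.0 units per month):

```shell
# Integer division is enough for a rough monthly run budget:
# 100 units / 6 units per Lora training run.
echo "runs per month: $((100 / 6))"   # runs per month: 16
```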

Conclusion

Here is the current answer based on the decision tree for M2 Mac Stable Diffusion:

M2 Mac StableDiffusion Decision Tree
If you don't use Lora, it is recommended to use DiffusionBee. If you use Lora for training frequently, it is advisable to use Colab Pro+. Otherwise, you can use Automatic1111 on your local machine or perform Lora training on Colab Pro.
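The decision tree above can be written out explicitly. A sketch (function and argument names are mine; the threshold for "frequently" is the ~16-20 runs/month budget discussed earlier):

```shell
# Encodes the conclusion's decision tree for M2 Mac Stable Diffusion.
choose_setup() {
  uses_lora=$1     # "yes" or "no"
  trains_often=$2  # "yes" if training Lora more than ~16-20 times/month
  if [ "$uses_lora" = "no" ]; then
    echo "DiffusionBee"
  elif [ "$trains_often" = "yes" ]; then
    echo "Colab Pro+"
  else
    echo "Automatic1111 locally + Lora training on Colab Pro"
  fi
}

choose_setup no no     # DiffusionBee
choose_setup yes yes   # Colab Pro+
```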

This is the current answer at this point in time.

Related article

DiffusionBee has been updated! It now has LoRA and ControlNet with a better UI!!

ut-bioinformatic.hatenablog.jp