[core] parallel loading of shards #12028

sayakpaul · 2025-07-31T07:53:18Z

What does this PR do?

Similar to huggingface/transformers#36835.

`main`: time: 8.162s
this branch: time: 5.663s
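
For reference, the two timings above can be turned into a relative speedup. This is just arithmetic on the reported numbers; the variable names are illustrative only.

```python
# Timings reported above, in seconds.
main_s = 8.162
branch_s = 5.663

# Ratio of load times and relative reduction in load time.
speedup = main_s / branch_s
reduction_pct = (main_s - branch_s) / main_s * 100

print(f"speedup: {speedup:.2f}x, load time reduced by {reduction_pct:.1f}%")
```

That is roughly a 1.44x speedup for a single run on this one checkpoint; other hardware and shard counts may behave differently.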

code

import time
t_ini = time.time()

import torch
import os
from diffusers import DiffusionPipeline, AutoModel
print(f"import time: {time.time() - t_ini:.3f}s")

# Opt in to parallel shard loading and set the number of worker threads.
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"
os.environ["HF_PARALLEL_LOADING_WORKERS"] = "12"
model_id = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"

# Initialize the CUDA context up front so it is not counted in the load time.
t0 = time.time()
torch.cuda.synchronize()
print(f"CUDA sync time: {time.time() - t0:.3f}s")

print("starting model load")
t1 = time.time()
transformer = AutoModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16, device_map="cuda"
)
# Wait for pending GPU work to finish before stopping the timer.
torch.cuda.synchronize()
t2 = time.time()

diff = t2 - t1
print(f"time: {diff:.3f}s")

sayakpaul · 2025-07-31T07:55:49Z

src/diffusers/models/model_loading_utils.py

@@ -310,6 +311,130 @@ def load_model_dict_into_meta(
    return offload_index, state_dict_index


+def check_support_param_buffer_assignment(model_to_load, state_dict, start_prefix=""):


Moved it here from modeling_utils.py.

sayakpaul · 2025-07-31T07:56:12Z

src/diffusers/models/model_loading_utils.py

+    return offload_index, state_dict_index, mismatched_keys, error_msgs
+
+
+def _find_mismatched_keys(


Same. Moved it out of modeling_utils.py.

sayakpaul · 2025-07-31T07:56:47Z

src/diffusers/models/modeling_utils.py

-        if len(resolved_model_file) > 1:
-            resolved_model_file = logging.tqdm(resolved_model_file, desc="Loading checkpoint shards")
-
-        mismatched_keys = []
-        assign_to_params_buffers = None
-        error_msgs = []
-
-        for shard_file in resolved_model_file:
-            state_dict = load_state_dict(shard_file, dduf_entries=dduf_entries)
-            mismatched_keys += _find_mismatched_keys(
-                state_dict, model_state_dict, loaded_keys, ignore_mismatched_sizes


This has been moved to load_shard_file().

HuggingFaceDocBuilderDev · 2025-07-31T08:27:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

DN6 · 2025-08-12T09:36:45Z

src/diffusers/models/model_loading_utils.py

+
+
+def load_shard_files_with_threadpool(args_list):
+    num_workers = int(os.environ.get("HF_PARALLEL_LOADING_WORKERS", "8"))


Would add HF_PARALLEL_LOADING_WORKERS as a constant at the top of the file for consistency.
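
To make the suggestion concrete, here is a minimal sketch of a thread-pool loader whose default worker count lives in a module-level constant. It is not the PR's implementation: `DEFAULT_NUM_WORKERS`, `fake_load_shard`, and `load_shards_in_parallel` are hypothetical names, and the per-shard work is replaced by a stand-in. Only the `HF_PARALLEL_LOADING_WORKERS` variable comes from the diff above.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

# Module-level default, as suggested in the review. The name is illustrative
# and not necessarily the one used in the PR.
DEFAULT_NUM_WORKERS = 8


def fake_load_shard(shard_file):
    """Hypothetical stand-in for the real per-shard loader.

    Returns (mismatched_keys, error_msgs) for one shard.
    """
    mismatched_keys = []
    if shard_file.endswith("2.safetensors"):
        mismatched_keys.append(f"{shard_file}:bad_key")
    return mismatched_keys, []


def load_shards_in_parallel(shard_files):
    num_workers = int(os.environ.get("HF_PARALLEL_LOADING_WORKERS", str(DEFAULT_NUM_WORKERS)))
    # Never spawn more threads than there are shards, and always at least one.
    num_workers = max(1, min(len(shard_files), num_workers))

    mismatched_keys, error_msgs = [], []
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = [executor.submit(fake_load_shard, f) for f in shard_files]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker thread.
            shard_mismatched, shard_errors = future.result()
            mismatched_keys += shard_mismatched
            error_msgs += shard_errors
    return mismatched_keys, error_msgs
```

Results are merged in the calling thread only, so the accumulator lists in this sketch need no locking.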

DN6 · 2025-08-12T10:19:08Z

src/diffusers/models/modeling_utils.py

+        args_list = [
+            (
+                model,
+                model_state_dict,
+                shard_file,
+                device_map,
+                dtype,
+                hf_quantizer,
+                keep_in_fp32_modules,
+                dduf_entries,
+                loaded_keys,
+                unexpected_keys,
+                offload_index,
+                offload_folder,
+                state_dict_index,
+                state_dict_folder,
+                ignore_mismatched_sizes,
+                low_cpu_mem_usage,
+            )
+            for shard_file in resolved_model_file
+        ]


Since the same arguments are used across the two loading functions, it's a good candidate for functools.partial:

load_fn = partial(
    load_shard_files_with_threadpool if is_parallel_loading_enabled else load_shard_file,
    model=model,
    model_state_dict=model_state_dict,
    device_map=device_map,
    dtype=dtype,
    hf_quantizer=hf_quantizer,
    keep_in_fp32_modules=keep_in_fp32_modules,
    dduf_entries=dduf_entries,
    loaded_keys=loaded_keys,
    unexpected_keys=unexpected_keys,
    offload_index=offload_index,
    offload_folder=offload_folder,
    state_dict_index=state_dict_index,
    state_dict_folder=state_dict_folder,
    ignore_mismatched_sizes=ignore_mismatched_sizes,
    low_cpu_mem_usage=low_cpu_mem_usage,
)

if is_parallel_loading_enabled:
    offload_index, state_dict_index, _mismatched_keys, _error_msgs = load_fn(
        resolved_model_file,
    )
    error_msgs += _error_msgs
    mismatched_keys += _mismatched_keys
else:
    shard_files = resolved_model_file
    if len(resolved_model_file) > 1:
        shard_files = logging.tqdm(resolved_model_file, desc="Loading checkpoint shards")

    # Iterate over shard_files so the tqdm progress bar is actually used.
    for shard_file in shard_files:
        offload_index, state_dict_index, _mismatched_keys, _error_msgs = load_fn(shard_file)
        error_msgs += _error_msgs
        mismatched_keys += _mismatched_keys
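
The pattern in this suggestion can be tried in isolation with a toy example. The sketch below is only an illustration: `load_one`, `load_many`, and `load_all` are hypothetical functions, not part of diffusers. It shows shared keyword arguments being bound once while the first positional argument varies per call.

```python
from functools import partial


def load_one(shard_file, *, dtype, device_map):
    """Hypothetical per-shard loader."""
    return f"{shard_file} -> {dtype} on {device_map}"


def load_many(shard_files, *, dtype, device_map):
    """Hypothetical batch loader that accepts the same keyword arguments."""
    return [load_one(f, dtype=dtype, device_map=device_map) for f in shard_files]


def load_all(shard_files, parallel):
    # Bind the shared keyword arguments once. Only the first positional
    # argument (one shard, or the whole list of shards) differs per call.
    load_fn = partial(load_many if parallel else load_one, dtype="bfloat16", device_map="cuda")
    if parallel:
        return load_fn(shard_files)
    return [load_fn(f) for f in shard_files]
```

Both branches return the same result here, which is the point of the refactor: the call sites differ only in what they pass as the first argument.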

sayakpaul · 2025-08-12T15:38:28Z

@stevhliu, could you help add docs for this PR (separate PR is fine)? I think we could have some guidance on how to load a DiffusionPipeline faster when the hardware allows for it (like directly loading all model components on the accelerator).

#11904 could also be mentioned in the document.

Then we're working on #12122

sayakpaul · 2025-08-12T15:47:53Z

@DN6 thanks a lot for your thoughtful suggestions. I have addressed them and added a test case as well.

LMK what you think.

DN6

Thanks!

DN6 · 2025-08-13T08:32:56Z

src/diffusers/utils/constants.py

@@ -43,6 +43,8 @@
 DIFFUSERS_REQUEST_TIMEOUT = 60
 DIFFUSERS_ATTN_BACKEND = os.getenv("DIFFUSERS_ATTN_BACKEND", "native")
 DIFFUSERS_ATTN_CHECKS = os.getenv("DIFFUSERS_ATTN_CHECKS", "0") in ENV_VARS_TRUE_VALUES
+DEFAULT_HF_PARALLEL_LOADING_WORKERS = 8
+HF_PARALLEL_LOADING_FLAG = "HF_ENABLE_PARALLEL_LOADING"


I meant to run the env check here

HF_ENABLE_PARALLEL_LOADING = os.environ.get("HF_ENABLE_PARALLEL_LOADING", "").upper() in ENV_VARS_TRUE_VALUES

Then import the constant into modeling_utils.
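
A minimal sketch of what an import-time check like this could look like on its own. The contents of `ENV_VARS_TRUE_VALUES` below are an assumption (the diff only shows the name), and `read_flag` is a hypothetical helper added so the logic can be exercised without touching the real environment.

```python
import os

# Assumed set of truthy strings; the actual values are not shown in the diff.
ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def read_flag(name, environ=os.environ):
    """Return True when the variable is set to a truthy string (case-insensitive)."""
    return environ.get(name, "").upper() in ENV_VARS_TRUE_VALUES


# Evaluated once, when this module is imported.
HF_ENABLE_PARALLEL_LOADING = read_flag("HF_ENABLE_PARALLEL_LOADING")
```

One consequence of this design: because the constant is computed at import time, the environment variable would have to be set before the module is imported. Setting it afterwards, as the benchmark script in the PR description does, would only work with a check performed at call time.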

sayakpaul added 5 commits July 10, 2025 11:06

checking.

af72ece

checking

d4e2976

checking

c9b680d

up

ab84d5a

up

536df5a

sayakpaul commented Jul 31, 2025

sayakpaul added 2 commits July 31, 2025 13:36

up

04cd5cc

up

cb0b3ed

sayakpaul mentioned this pull request Jul 31, 2025

Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag huggingface/transformers#36835

Merged

3 tasks

sayakpaul requested a review from a-r-r-o-w July 31, 2025 08:48

sayakpaul added 8 commits August 1, 2025 08:13

Merge branch 'main' into parallel-shards-loading

2fdc091

Merge branch 'main' into parallel-shards-loading

6d15594

Merge branch 'main' into parallel-shards-loading

d34f426

Merge branch 'main' into parallel-shards-loading

35e859b

Merge branch 'main' into parallel-shards-loading

2cc83b8

Merge branch 'main' into parallel-shards-loading

9844c10

Merge branch 'main' into parallel-shards-loading

73fb972

Merge branch 'main' into parallel-shards-loading

04bff1c

DN6 reviewed Aug 12, 2025

sayakpaul and others added 3 commits August 12, 2025 20:30

Apply suggestions from code review

cd13977

Co-authored-by: Dhruv Nair <[email protected]>

up

8968e2f

up

dca6388

sayakpaul marked this pull request as ready for review August 12, 2025 15:36

sayakpaul changed the title ~~[wip][core] parallel loading of shards~~ [core] parallel loading of shards Aug 12, 2025

sayakpaul added 2 commits August 12, 2025 21:13

Merge branch 'main' into parallel-shards-loading

e276f08

fix

ad2dd62

Merge branch 'main' into parallel-shards-loading

36c86d2

sayakpaul requested a review from DN6 August 13, 2025 02:32

stevhliu mentioned this pull request Aug 13, 2025

[docs] Parallel loading of shards #12135

Open

DN6 approved these changes Aug 13, 2025

sayakpaul added 2 commits August 13, 2025 09:44

review feedback.

ae2561b

Merge branch 'main' into parallel-shards-loading

f0eec0d

sayakpaul merged commit baa9b58 into main Aug 13, 2025
34 of 35 checks passed

sayakpaul deleted the parallel-shards-loading branch August 13, 2025 05:03

DN6 reviewed Aug 13, 2025

sayakpaul mentioned this pull request Aug 13, 2025

make parallel loading flag a part of constants. #12137

Open
