[ET-VK] Add mechanism to trigger command buffer re-encode only when necessary #13184

SS-JIA · 2025-08-07T16:04:59Z

Stack from ghstack (oldest at bottom):

Context

Models with dynamic shapes currently require the command buffer to be re-encoded on every inference, which adds significant overhead when running them.

In practice, a command buffer re-encode may not be needed on every inference. A re-encode is only needed when:

1. Shader dispatch parameters change, i.e. the new tensor sizes require a completely different compute shader, a new local work group size, or a new work group grid size (global work group size / local work group size)
2. Push constants containing tensor metadata need to be updated
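
The two conditions above can be sketched as a single predicate. This is only an illustrative sketch, not the code in this diff: `DispatchParams`, `div_up`, `grid_size`, and `needs_reencode` are hypothetical names, and rounding the grid size up with a ceiling division is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of what was recorded for one shader dispatch.
struct DispatchParams {
  std::string shader_name;              // which compute shader is bound
  std::array<uint32_t, 3> global_wg;    // global work group size
  std::array<uint32_t, 3> local_wg;     // local work group size
  std::vector<int32_t> push_constants;  // tensor metadata passed as push constants
};

// Ceiling division (assumption: the grid is rounded up to cover all invocations).
inline uint32_t div_up(uint32_t n, uint32_t d) {
  assert(d > 0);
  return (n + d - 1) / d;
}

// Work group grid size = global work group size / local work group size.
inline std::array<uint32_t, 3> grid_size(const DispatchParams& p) {
  return {div_up(p.global_wg[0], p.local_wg[0]),
          div_up(p.global_wg[1], p.local_wg[1]),
          div_up(p.global_wg[2], p.local_wg[2])};
}

// True when the previously encoded command buffer can no longer be reused.
inline bool needs_reencode(const DispatchParams& prev, const DispatchParams& next) {
  // Condition 1: shader, local work group size, or grid size changed.
  const bool dispatch_changed = prev.shader_name != next.shader_name ||
      prev.local_wg != next.local_wg || grid_size(prev) != grid_size(next);
  // Condition 2: push constants holding tensor metadata changed.
  const bool push_constants_changed = prev.push_constants != next.push_constants;
  return dispatch_changed || push_constants_changed;
}
```

Note that under these assumptions a change in global work group size alone does not force a re-encode as long as the resulting grid size stays the same.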

This diff aims to reduce the overhead of handling tensor shape changes by detecting when a command buffer re-encode is actually needed.

Changes

ComputeGraph:

- Introduce a `requires_reencode` flag to `ComputeGraph` to indicate when a command buffer re-encode is needed.
- Introduce a `std::set<ValueRef>` tracking which values were updated when propagating tensor sizes
  - "update" can be one of two things: 1) tensor sizes changed 2) symint value changed
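
A minimal sketch of this bookkeeping, assuming `ValueRef` is an integer index. The class name `GraphSketch` and all of its methods are hypothetical and only illustrate the idea; they are not the actual `ComputeGraph` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

using ValueRef = int32_t;  // assumption: values are referenced by integer index

// Hypothetical stand-in for the update tracking added to the graph.
class GraphSketch {
 public:
  // Records new tensor sizes; marks the value as updated only if they differ.
  bool set_tensor_sizes(ValueRef ref, const std::vector<int64_t>& sizes) {
    assert(ref >= 0);
    auto it = sizes_.find(ref);
    if (it != sizes_.end() && it->second == sizes) {
      return false;
    }
    sizes_[ref] = sizes;
    updated_values_.insert(ref);
    return true;
  }

  // Records a new symint value; marks the value as updated only if it differs.
  bool set_symint(ValueRef ref, int32_t value) {
    assert(ref >= 0);
    auto it = symints_.find(ref);
    if (it != symints_.end() && it->second == value) {
      return false;
    }
    symints_[ref] = value;
    updated_values_.insert(ref);
    return true;
  }

  bool was_updated(ValueRef ref) const {
    return updated_values_.count(ref) > 0;
  }

  void set_requires_reencode() { requires_reencode_ = true; }
  bool requires_reencode() const { return requires_reencode_; }

  // Called once the pending updates have been consumed.
  void clear_update_state() {
    updated_values_.clear();
    requires_reencode_ = false;
  }

 private:
  std::unordered_map<ValueRef, std::vector<int64_t>> sizes_;
  std::unordered_map<ValueRef, int32_t> symints_;
  std::set<ValueRef> updated_values_;
  bool requires_reencode_ = false;
};
```

In this sketch, setting a value to the sizes it already has leaves the update set untouched, which is what lets downstream nodes skip their work.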

DispatchNode:

- When propagating new tensor sizes, only execute the resize function if any of the values participating in the computation have been updated
- Mark `requires_reencode` if any push constants associated with tensor metadata need to be updated
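
The `DispatchNode` behavior described above might look roughly like the following. `GraphState`, `DispatchNodeSketch`, `any_updated`, and `push_constant_sources` are hypothetical names chosen for illustration, and the resize function is modeled as a callback that may mark further values as updated.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

using ValueRef = int32_t;  // assumption: values are referenced by integer index

// Hypothetical slice of graph state relevant to this sketch.
struct GraphState {
  std::set<ValueRef> updated_values;
  bool requires_reencode = false;
};

inline bool any_updated(const GraphState& graph, const std::vector<ValueRef>& refs) {
  for (ValueRef ref : refs) {
    if (graph.updated_values.count(ref) > 0) {
      return true;
    }
  }
  return false;
}

struct DispatchNodeSketch {
  std::vector<ValueRef> args;                   // values participating in the computation
  std::vector<ValueRef> push_constant_sources;  // values whose metadata is sent via push constants
  std::function<void(GraphState&)> resize_fn;   // recomputes output sizes

  void trigger_resize(GraphState& graph) const {
    assert(resize_fn && "node needs a resize function");
    // Skip the resize entirely when none of the participating values changed.
    if (!any_updated(graph, args)) {
      return;
    }
    resize_fn(graph);
    // Push constants are baked into the command buffer, so stale ones force a re-encode.
    if (any_updated(graph, push_constant_sources)) {
      graph.requires_reencode = true;
    }
  }
};
```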

DynamicDispatchNode:

- Only recompute compute shader dispatch params if any of the values participating in the computation have been updated
- Mark `requires_reencode` if 1) a new compute shader is required, 2) local work group size changed, 3) work group grid size changed
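
A sketch of the `DynamicDispatchNode` checks, under the same caveats: every name here (`DynamicDispatchNodeSketch`, `pick_params`, `grid_of`) is hypothetical, the parameter selection is modeled as one callback, and the grid size is assumed to be a ceiling division of global by local work group size. Push constant changes are handled separately and are left out of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using ValueRef = int32_t;  // assumption: values are referenced by integer index
using WorkGroup = std::array<uint32_t, 3>;

struct GraphState {
  std::set<ValueRef> updated_values;
  bool requires_reencode = false;
};

struct DispatchParams {
  std::string shader;
  WorkGroup global_wg;
  WorkGroup local_wg;
};

// Work group grid size, rounded up per axis (assumption).
inline WorkGroup grid_of(const DispatchParams& p) {
  WorkGroup grid{};
  for (size_t i = 0; i < 3; ++i) {
    assert(p.local_wg[i] > 0);
    grid[i] = (p.global_wg[i] + p.local_wg[i] - 1) / p.local_wg[i];
  }
  return grid;
}

class DynamicDispatchNodeSketch {
 public:
  DynamicDispatchNodeSketch(
      std::vector<ValueRef> args,
      std::function<DispatchParams()> pick_params)
      : args_(std::move(args)),
        pick_params_(std::move(pick_params)),
        current_(pick_params_()) {}

  void trigger_resize(GraphState& graph) {
    bool updated = false;
    for (ValueRef ref : args_) {
      if (graph.updated_values.count(ref) > 0) {
        updated = true;
        break;
      }
    }
    // Nothing this node depends on changed: keep the cached dispatch params.
    if (!updated) {
      return;
    }
    DispatchParams next = pick_params_();
    if (next.shader != current_.shader || next.local_wg != current_.local_wg ||
        grid_of(next) != grid_of(current_)) {
      graph.requires_reencode = true;
    }
    current_ = std::move(next);
  }

 private:
  std::vector<ValueRef> args_;
  std::function<DispatchParams()> pick_params_;
  DispatchParams current_;
};
```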

Differential Revision: D79813237

pytorch-bot · 2025-08-07T16:05:02Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13184

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 4 Unrelated Failures

As of commit e90389a with merge base b36d6b6:

NEW FAILURES - The following jobs have failed:

Build documentation / build (buck2) / Build doc (gh)
At least one of the pre-conditions you specified did not hold
pull / unittest-arm-backend-with-no-fvp (test_pytest_ops) / linux-job (gh)
RuntimeError: Command docker exec -t 2ada0333b3e9c39a1a8fe80899f98d0ce9e5c86dd793474be2c5f6dc446c700d /exec failed with exit code 1
pull / unittest-editable / macos / macos-job (gh)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_8a4w_recipe

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / linux / linux-job (gh) (trunk failure)
examples/models/llama/tests/test_ring_attention.py::TestRingAttention::test_single_token_processing_quantized
pull / unittest / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-arm-backend-with-no-fvp (test_pytest_models) / linux-job (gh) (trunk failure)
backends/arm/test/models/stable_diffusion/test_vae_AutoencoderKL.py::TestAutoencoderKL::test_AutoencoderKL_tosa_MI
pull / unittest-editable / linux / linux-job (gh) (trunk failure)
examples/models/llama/tests/test_ring_attention.py::TestRingAttention::test_single_token_processing_quantized

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-08-07T16:05:18Z

This pull request was exported from Phabricator. Differential Revision: D79813237

github-actions · 2025-08-07T16:05:52Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

backends/vulkan/runtime/graph/ComputeGraph.h

backends/vulkan/runtime/graph/containers/PushConstantData.h

backends/vulkan/runtime/graph/ops/ExecuteNode.h

…ecessary Pull Request resolved: #13184 ghstack-source-id: 302101273 @exported-using-ghexport Differential Revision: [D79813237](https://our.internmc.facebook.com/intern/diff/D79813237/)

facebook-github-bot · 2025-08-11T13:54:30Z

This pull request was exported from Phabricator. Differential Revision: D79813237

…ecessary Pull Request resolved: #13184 ghstack-source-id: 302596078 @exported-using-ghexport Differential Revision: [D79813237](https://our.internmc.facebook.com/intern/diff/D79813237/)

facebook-github-bot · 2025-08-13T01:18:41Z

This pull request was exported from Phabricator. Differential Revision: D79813237

…ecessary Pull Request resolved: #13184 ghstack-source-id: 302703876 @exported-using-ghexport Differential Revision: [D79813237](https://our.internmc.facebook.com/intern/diff/D79813237/)

facebook-github-bot · 2025-08-13T13:54:27Z

This pull request was exported from Phabricator. Differential Revision: D79813237

meta-cla bot added the CLA Signed label Aug 7, 2025

SS-JIA mentioned this pull request Aug 7, 2025

[ET-VK] Better work group sizes for matmul #13185

Merged

facebook-github-bot added the fb-exported label Aug 7, 2025

msluszniak reviewed Aug 7, 2025

backends/vulkan/runtime/graph/ComputeGraph.h (outdated)

backends/vulkan/runtime/graph/containers/PushConstantData.h (outdated)

backends/vulkan/runtime/graph/ops/ExecuteNode.h (outdated)

SS-JIA mentioned this pull request Aug 11, 2025

[Vulkan] Improve LLM Prefill Performance #12920

Open

andreanicastro approved these changes Aug 13, 2025

facebook-github-bot merged commit a64208e into gh/SS-JIA/271/base Aug 13, 2025
98 of 106 checks passed

facebook-github-bot deleted the gh/SS-JIA/271/head branch August 13, 2025 17:52

facebook-github-bot temporarily deployed to cherry-pick-bot August 13, 2025 17:52 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Aug 13, 2025

[ET-VK] Add mechanism to trigger command buffer re-encode only when necessary #13379

Merged
