
NXP Backend: Add infrastructure for pre-processing passes in edge dialect #13183


Draft: wants to merge 2 commits into main
Conversation

Contributor

@Pop-korn Pop-korn commented Aug 7, 2025

Summary

Add infrastructure for running pre-processing passes on edge dialect programs.
Add pre-processing pass to move view_copy nodes into their own QDQ clusters.

Test plan

Unit test provided.


pytorch-bot bot commented Aug 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13183

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 35c8809 with merge base c8a0706:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 7, 2025
Contributor Author

Pop-korn commented Aug 7, 2025

@pytorchbot label "release notes: nxp" module "nxp backend"


pytorch-bot bot commented Aug 7, 2025

Didn't find following labels among repository labels: module,nxp backend

@pytorch-bot pytorch-bot bot added the release notes: nxp Changes to the NXP Neutron backend delegate label Aug 7, 2025
@robert-kalmar robert-kalmar marked this pull request as draft August 9, 2025 20:17
partitioner=[partitioner],
compile_config=EdgeCompileConfig(_check_ir_validity=False),
)
edge_program_manager = edge_program_manager.to_backend(partitioner)
Collaborator
You have reintroduced the use of the export_to_edge and edge_program_manager.to_edge functions, which were replaced by the to_edge_transform_and_lower function. The passes we introduce (edge passes) shall run after the export to edge dialect but before partitioning. to_edge_transform_and_lower has an optional parameter transform_passes, which I suppose serves this purpose.
@Pop-korn, please update the code to use this call in executorch_pipeline.py and aot_neutron_compile.py if feasible. This shall also resolve the conflict you have with the main branch.
Also rebase onto main. Then please mark the PR as ready to merge.

Contributor Author

@Pop-korn Pop-korn Aug 11, 2025
I have resolved the conflict and rebased onto main.

The to_edge_transform_and_lower parameter transform_passes only runs a single iteration of the passes. Therefore, if a pass makes a change that would allow an earlier pass to make another change, that earlier pass never gets the chance to run again. The number of iterations is limited by the steps attribute of the PassManager, which cannot be set through the current interface. The only way to use a different steps value (that I know of) is to modify the class attribute directly (set PassManager.steps = 10 before calling to_edge_transform_and_lower), which is not good practice as it has a global effect.

I understand that the to_edge_transform_and_lower function is the preferred way to lower programs. So do we want to:

  1. Use to_edge_transform_and_lower and only run the passes once.
  2. Use to_edge_transform_and_lower and globally modify the steps class attribute to run multiple iterations of the passes.
  3. Keep the current implementation, i.e. manually export to edge, run the passes ourselves (using PassManager(steps=10)), and then explicitly lower to the backend (with partitioning).

Which of these solutions is preferred? Or is there a different solution I am not seeing? @digantdesai
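For context, here is a toy sketch in plain Python of why a single sweep over the pass list can miss rewrites. It does not use the ExecuTorch or torch.fx APIs; pass_a, pass_b, and run_passes are made-up names, and run_passes only mimics the fixed-point behaviour a PassManager with a higher steps value would provide.

```python
def pass_a(items):
    # Made-up pass: fold each adjacent pair ("x", "y") into "xy".
    out, changed, i = [], False, 0
    while i < len(items):
        if i + 1 < len(items) and items[i] == "x" and items[i + 1] == "y":
            out.append("xy")
            i += 2
            changed = True
        else:
            out.append(items[i])
            i += 1
    return out, changed

def pass_b(items):
    # Made-up pass: rewrite every "z" into "y".
    changed = "z" in items
    return ["y" if it == "z" else it for it in items], changed

def run_passes(items, passes, steps=1):
    # Mimics a pass manager: sweep the pass list up to `steps` times,
    # stopping early once a full sweep makes no change.
    for _ in range(steps):
        any_change = False
        for p in passes:
            items, changed = p(items)
            any_change = any_change or changed
        if not any_change:
            break
    return items

# With one sweep, pass_b enables a pass_a rewrite that never happens:
print(run_passes(["x", "z"], [pass_a, pass_b], steps=1))   # ['x', 'y']
# With enough sweeps, the pass list runs to a fixed point:
print(run_passes(["x", "z"], [pass_a, pass_b], steps=10))  # ['xy']
```

This is the difference between options 1 and 2/3 above: option 1 corresponds to steps=1, while the others allow the list to re-run until no pass reports a change.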


# The graph has now changed, and we shouldn't keep iterating through it. Return the new graph and the parent
# class will call this pass again.
return PassResult(graph_module, True)
Collaborator

@robert-kalmar robert-kalmar Aug 11, 2025
@Pop-korn, in principle this does not differ from the initial draft. Every time you make a modification, you return, and the caller EdgePassManager starts a new iteration. Can you do all the graph modifications and then return?

@digantdesai, @Pop-korn noticed that in some of the passes the code iterates over a changing graph. E.g. in https://github.com/pytorch/executorch/blob/main/backends/xnnpack/_passes/fuse_batch_norm.py#L41, graph.nodes gets modified in place in https://github.com/pytorch/executorch/blob/main/backends/xnnpack/_passes/fuse_batch_norm.py#L208 every time a match is found.
@Pop-korn, can you please confirm? @digantdesai, is this legit and intended?

Contributor Author

@Pop-korn Pop-korn Aug 11, 2025
Responding to the first paragraph:
We can safely do all the modifications at once and then return. We would be modifying the list of nodes while iterating over it, but it shouldn't have negative side effects, as we are only inserting nodes. The downside is that it is not good practice. Do you think that implementation would be preferred?

The second paragraph:
As far as I can tell, the XNNPACK batch_norm fusion pass does indeed modify the graph while iterating over it. When the pass removes a batch_norm node, the next iteration of the for loop will skip a node. If (somehow) the graph contained the sequence convolution -> batch_norm -> batch_norm, only the first batch_norm node would be fused, and the second one would be skipped.
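The skipping behaviour described above can be reproduced stand-alone with plain Python lists (not fx graph nodes): removing the current element during iteration makes the loop skip the element that slides into its place.

```python
nodes = ["conv", "batch_norm", "batch_norm", "relu"]
visited = []
for node in nodes:
    visited.append(node)
    if node == "batch_norm":
        nodes.remove(node)  # in-place removal shifts later elements left

print(visited)  # ['conv', 'batch_norm', 'relu'] -- the second batch_norm is never visited
print(nodes)    # ['conv', 'batch_norm', 'relu']
```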

…heir own QDQ clusters.

A PyTorch model can contain a `Linear` operator with 4D IO. After quantization, it gets its own QDQ cluster. But after lowering to edge, `view_copy` operators are added within the cluster, right before and after the `Linear` (now `addmm`/`mm`). This does not follow the QDQ schema and causes issues later in the pipeline. Therefore, pre-processing passes at the edge dialect level were implemented to move the `view_copy` nodes into their own QDQ clusters.
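As a rough illustration of the intended rewrite (operators as plain strings; split_view_copy_clusters is a hypothetical helper, not the actual pass from this PR), a cluster's operator list is split so that every `view_copy` ends up in a QDQ cluster of its own:

```python
def split_view_copy_clusters(cluster_ops):
    # Split one QDQ cluster's operator list into several clusters,
    # giving each "view_copy" a cluster of its own.
    clusters, current = [], []
    for op in cluster_ops:
        if op == "view_copy":
            if current:           # close the cluster built so far
                clusters.append(current)
                current = []
            clusters.append(["view_copy"])  # view_copy gets its own cluster
        else:
            current.append(op)
    if current:
        clusters.append(current)
    return clusters

# The addmm cluster with surrounding view_copy ops becomes three clusters:
print(split_view_copy_clusters(["view_copy", "addmm", "view_copy"]))
# [['view_copy'], ['addmm'], ['view_copy']]
```

Each resulting sub-list would then be wrapped in its own dequantize/quantize pair, restoring the QDQ schema for every operator.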
@Pop-korn Pop-korn force-pushed the upstream/main-nxp/EIEX-409-upstream-infrastructure-for-pre-processing-passes-in-edge-dialect branch from 5daaed4 to 35c8809 Compare August 11, 2025 08:11