Qualcomm AI Engine Direct - GA Static Phi-4-mini #13179

DannyYuyang-quic · 2025-08-07T09:25:02Z

Summary

- Support Phi-4-mini-instruct on the static llama path
- Add partial RoPE (P-ROPE) for Phi-4-mini
- Add the EOS token for Phi-4-mini

Test plan

python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s $DEVICE -m SM8750 --prompt "I would like to learn python, could you teach me with a simple example?" --temperature 0 --model_mode hybrid --prefill_ar_len 32 --max_seq_len 128 --ptq 16a8w --decoder_model phi_4_mini --num_sharding 4

cc: @haowhsu-quic, @shewu-quic, @winskuo-quic, @cccclai

pytorch-bot · 2025-08-07T09:25:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13179

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit d6dda6c with merge base 18098a4:

NEW FAILURE - The following job has failed:

Build documentation / build (buck2) / Build doc (gh)
At least one of the pre-conditions you specified did not hold

CANCELLED JOB - The following job was cancelled. Please retry:

pull / unittest-editable / macos / macos-job (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

DannyYuyang-quic · 2025-08-07T09:56:23Z

Hi @cccclai
This PR enables GA Phi-4-mini-instruct on the static llama path.
We will submit an optimized version ASAP.

Thanks!!

DannyYuyang-quic · 2025-08-07T10:02:44Z

examples/qualcomm/oss_scripts/llama/llama.py

+            # TODO: Encountered the following error during runtime, so switched behavior for now.
+            # Error: libc++abi: terminating due to uncaught exception of type std::runtime_error: invert=true is not supported for Split PreTokenizer. Only invert=false is supported.
+            data["pre_tokenizer"]["pretokenizers"][-2]["invert"] = False


Hi @cccclai,
Here is a temporary workaround for the tokenizer error:
invert=true is currently not supported for the Split PreTokenizer; only invert=false is allowed.
We would appreciate it if this could be fixed.

Thanks!

@DannyYuyang-quic is this specific to the Phi tokenizer?

Thanks for the follow-up!
I’m not sure if tokenizers from other GA models would encounter the same issue, but based on what I’ve seen so far, models like Qwen2, Qwen3, and Gemma3 don’t seem to have the invert kwarg in their tokenizer.
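The workaround discussed in this thread can be sketched as a small helper that operates on a loaded tokenizer.json dictionary. This is a minimal sketch, not the code in the PR: the PR patches one fixed entry (`[-2]`), whereas this hypothetical `disable_split_invert` helper scans every entry and also reports which ones it changed, which is one way to check whether another model's tokenizer uses `invert` at all. The example dictionary is made up for illustration, and the thread does not address whether tokenization output stays equivalent after the flip.

```python
import copy


def disable_split_invert(tokenizer_cfg):
    """Return (patched_copy, changed_indices) for a tokenizer.json dict.

    Every Split pre-tokenizer that has invert=True is set to invert=False.
    Hypothetical helper for illustration; the input is left untouched.
    """
    patched = copy.deepcopy(tokenizer_cfg)
    pre = patched.get("pre_tokenizer") or {}
    # A Sequence pre-tokenizer nests its steps under "pretokenizers";
    # otherwise treat the single pre-tokenizer dict as the only step.
    steps = pre.get("pretokenizers", [pre])
    changed = []
    for index, step in enumerate(steps):
        if step.get("type") == "Split" and step.get("invert") is True:
            step["invert"] = False
            changed.append(index)
    return patched, changed


# Made-up tokenizer config, shaped like the structure the PR indexes into.
example = {
    "pre_tokenizer": {
        "type": "Sequence",
        "pretokenizers": [
            {"type": "Split", "pattern": {"Regex": "\\s+"},
             "behavior": "Isolated", "invert": True},
            {"type": "ByteLevel", "add_prefix_space": False},
        ],
    }
}

patched, changed = disable_split_invert(example)
print("patched Split entries at indices:", changed)
```

An empty `changed` list would indicate that the tokenizer has no Split pre-tokenizer with `invert=True`, so the workaround would be a no-op for it.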

DannyYuyang-quic · 2025-08-07T10:28:31Z

@pytorchbot label "release notes: qualcomm"

cccclai · 2025-08-07T17:17:44Z

examples/qualcomm/oss_scripts/llama/model/static_llama.py

+
+
+APPLY_ROPE_EMBEDDING_FUNCTIONS = {
+    "phi_4_mini": apply_partial_rotary_emb_single,


Can this live in this folder instead: https://github.com/pytorch/executorch/tree/main/examples/models/phi_4_mini/config? Or is it QNN-specific?

It's not QNN-specific.
apply_partial_rotary_emb_single (partial RoPE embedding) is required whenever partial_rotary_factor < 1.0.

I've updated the code to check partial_rotary_factor < 1.0 explicitly, so partial RoPE is applied only in that case.
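For readers unfamiliar with partial RoPE, the idea can be sketched in plain Python: only the first `head_dim * partial_rotary_factor` elements of a head vector are rotated, and the remainder passes through unchanged. This is an illustrative sketch, not the PR's `apply_partial_rotary_emb_single`. The function names, the half-split pairing of element `i` with element `i + rotary_dim / 2`, and the base `theta=10000.0` are all assumptions made here for the example.

```python
import math


def rotate_half_rope(vec, position, theta=10000.0):
    """Apply RoPE to a whole vector, pairing element i with i + dim/2.

    The pairing convention and theta are assumptions for this sketch.
    """
    dim = len(vec)
    assert dim % 2 == 0, "rotary dimension must be even"
    half = dim // 2
    out = list(vec)
    for i in range(half):
        # Each pair rotates at its own frequency, scaled by the position.
        angle = position * theta ** (-2.0 * i / dim)
        cos, sin = math.cos(angle), math.sin(angle)
        a, b = vec[i], vec[i + half]
        out[i] = a * cos - b * sin
        out[i + half] = a * sin + b * cos
    return out


def partial_rope(vec, position, partial_rotary_factor, theta=10000.0):
    """Rotate only the leading fraction of vec; copy the tail unchanged."""
    rotary_dim = int(len(vec) * partial_rotary_factor)
    rotated = rotate_half_rope(vec[:rotary_dim], position, theta)
    return rotated + list(vec[rotary_dim:])


vec = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
out = partial_rope(vec, position=3, partial_rotary_factor=0.5)
print("rotated head:", out[:4])
print("untouched tail:", out[4:])
```

With `partial_rotary_factor=1.0` the sketch reduces to ordinary RoPE over the full vector, which is why a single `< 1.0` check is enough to choose between the two paths.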

oh wow...you're still awake...

Can we refer to the config in this folder #13086 instead of having phi specific logic inside static llama?

facebook-github-bot · 2025-08-07T19:05:41Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D79828234.

DannyYuyang-quic · 2025-08-08T01:59:28Z

examples/qualcomm/oss_scripts/llama/model/static_llama.py

+        if config.partial_rotary_factor < 1:
+            self.apply_rope_emb = apply_partial_rotary_emb_single
+        else:
+            self.apply_rope_emb = apply_rotary_emb_single


Hi @cccclai

Can we refer to the config in this folder #13086 instead of having phi specific logic inside static llama?

I've updated the condition here; static llama now depends only on the config file.
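The design change agreed on in this thread, selecting the RoPE function from a config value rather than from a model-name lookup, can be sketched as follows. The `RopeConfig` class and the two stub functions are hypothetical placeholders standing in for the real config and embedding functions, and defaulting a missing `partial_rotary_factor` to 1.0 is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RopeConfig:
    """Hypothetical stand-in for the model config; only one field is modeled."""
    partial_rotary_factor: float = 1.0


def full_rope_stub(x):
    """Placeholder for the full rotary embedding function."""
    return ("full", x)


def partial_rope_stub(x):
    """Placeholder for the partial rotary embedding function."""
    return ("partial", x)


def select_rope_fn(config):
    """Pick the RoPE function from the config alone, with no model names."""
    # Assumption: a config without the field behaves like full RoPE.
    factor = getattr(config, "partial_rotary_factor", 1.0)
    return partial_rope_stub if factor < 1 else full_rope_stub


for factor in (0.75, 1.0):
    fn = select_rope_fn(RopeConfig(partial_rotary_factor=factor))
    print(factor, "->", fn.__name__)
```

Keying the choice on the config value means any model whose config sets a factor below 1 takes the partial path without further changes to the selection logic.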

cccclai · 2025-08-08T18:20:24Z

Looks like there is a conflict; can you resolve it?

facebook-github-bot · 2025-08-09T17:32:05Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D79828234.

cccclai

Thank you for enabling phi4!

DannyYuyang-quic requested a review from cccclai as a code owner August 7, 2025 09:25

meta-cla bot added the CLA Signed label Aug 7, 2025

DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from d0fe026 to 76f4d50 August 7, 2025 09:55

DannyYuyang-quic commented Aug 7, 2025

DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from 76f4d50 to a14610e August 7, 2025 10:10

pytorch-bot bot added the release notes: qualcomm label Aug 7, 2025

DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from a14610e to 013d222 August 7, 2025 16:02

cccclai reviewed Aug 7, 2025

DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from 013d222 to 371c80a August 7, 2025 18:32

DannyYuyang-quic commented Aug 8, 2025

Qualcomm AI Engine Direct - GA Static Phi-4-mini

d6dda6c

Summary: - Support Phi-4-mini-instruct for static llama path - add P-ROPE for phi-4-mini - add EOS tok for Phi-4-mini

DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from 46f019a to d6dda6c August 9, 2025 07:06

cccclai approved these changes Aug 10, 2025

cccclai merged commit c8a0706 into pytorch:main Aug 10, 2025
101 of 103 checks passed


