
Qualcomm AI Engine Direct - GA Static Phi-4-mini #13179


Merged
1 commit merged on Aug 10, 2025


@DannyYuyang-quic (Collaborator) commented Aug 7, 2025

Summary

  • Support Phi-4-mini-instruct for the static llama path
  • Add partial RoPE (P-RoPE) for Phi-4-mini
  • Add the EOS token for Phi-4-mini
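On the EOS item: chat models often treat more than one token as end-of-sequence (for example an end-of-turn marker as well as an end-of-text token), so a runner that stops on a single id can keep generating past the end of a reply. The following is a minimal, hypothetical sketch of a decode loop that stops on any id in a set. `generate_until_eos`, `next_token_fn`, and the token ids are placeholders for illustration only; they are not the ExecuTorch runner API or Phi-4-mini's real vocabulary ids.

```python
from typing import Callable, Iterable, List


def generate_until_eos(
    prompt_ids: List[int],
    next_token_fn: Callable[[List[int]], int],
    eos_ids: Iterable[int],
    max_new_tokens: int,
) -> List[int]:
    """Decode loop that stops on ANY end-of-sequence id (hypothetical sketch)."""
    eos = set(eos_ids)  # set membership keeps the per-token check O(1)
    tokens = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(tokens)
        if tok in eos:
            break  # stop without emitting the EOS token itself
        tokens.append(tok)
        generated.append(tok)
    return generated


if __name__ == "__main__":
    # Toy "model" that replays a fixed script of placeholder token ids.
    script = iter([11, 12, 13, 900, 14])
    out = generate_until_eos([1, 2], lambda _: next(script), eos_ids={900, 901}, max_new_tokens=10)
    print(out)  # [11, 12, 13]
```

Taking the EOS ids as a set rather than a single integer lets a model-specific config supply however many stop tokens it needs without changing the loop.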

Test plan

python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s $DEVICE -m SM8750 --prompt "I would like to learn python, could you teach me with a simple example?" --temperature 0 --model_mode hybrid --prefill_ar_len 32 --max_seq_len 128 --ptq 16a8w --decoder_model phi_4_mini --num_sharding 4

cc: @haowhsu-quic, @shewu-quic, @winskuo-quic, @cccclai


pytorch-bot bot commented Aug 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13179

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit d6dda6c with merge base 18098a4:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Aug 7, 2025
@DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from d0fe026 to 76f4d50 on August 7, 2025 09:55
@DannyYuyang-quic
Collaborator Author

Hi @cccclai
This PR enables GA Phi-4-mini-instruct on the static llama path.
We will submit an optimized version ASAP.

Thanks!!

Comment on lines 1192 to 1187
# TODO: Encountered the following error during runtime, so switched behavior for now.
# Error: libc++abi: terminating due to uncaught exception of type std::runtime_error: invert=true is not supported for Split PreTokenizer. Only invert=false is supported.
data["pre_tokenizer"]["pretokenizers"][-2]["invert"] = False
Collaborator Author


Hi @cccclai,
Here is a temporary workaround for the tokenizer error:
the Split pre-tokenizer currently supports only invert=false, so a tokenizer that sets invert=true fails at runtime.
We would appreciate a fix for this.

Thanks!
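The workaround flips the second-to-last pre-tokenizer by index (`[-2]`). A slightly more defensive variant, sketched below, walks the whole pre-tokenizer tree and flips every inverted Split, returning the paths it changed. It assumes the Hugging Face `tokenizer.json` layout (a `Sequence` pre-tokenizer listing children under `pretokenizers`, and `Split` entries carrying a boolean `invert`); `disable_split_invert` and `patch_tokenizer_json` are hypothetical helpers, not code from this PR. Forcing invert=false changes how the Split pattern is applied, so like the original TODO this is only a stopgap. An empty return value also gives a quick way to check whether another model's tokenizer is affected at all.

```python
import json
from typing import Any, List


def disable_split_invert(node: Any, path: str = "pre_tokenizer") -> List[str]:
    """Set invert=False on every Split pre-tokenizer reachable from `node`.

    Hypothetical helper; assumes the Hugging Face tokenizer.json layout.
    Returns the JSON paths that were modified.
    """
    changed: List[str] = []
    if not isinstance(node, dict):
        return changed
    if node.get("type") == "Split" and node.get("invert") is True:
        node["invert"] = False  # stopgap: changes how the Split pattern is applied
        changed.append(path)
    # Recurse into Sequence pre-tokenizers (no-op for leaf entries).
    for i, child in enumerate(node.get("pretokenizers") or []):
        changed.extend(disable_split_invert(child, f"{path}.pretokenizers[{i}]"))
    return changed


def patch_tokenizer_json(src: str, dst: str) -> List[str]:
    """Load tokenizer.json from `src`, apply the workaround, and write it to `dst`."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    changed = disable_split_invert(data.get("pre_tokenizer"))
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    return changed


if __name__ == "__main__":
    data = {
        "pre_tokenizer": {
            "type": "Sequence",
            "pretokenizers": [
                {"type": "Split", "pattern": {"String": " "}, "behavior": "Removed", "invert": True},
                {"type": "ByteLevel", "add_prefix_space": False},
            ],
        }
    }
    print(disable_split_invert(data["pre_tokenizer"]))  # ['pre_tokenizer.pretokenizers[0]']
```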

Contributor


@DannyYuyang-quic is this specific to the Phi tokenizer?

Collaborator Author


Thanks for the follow-up!
I’m not sure whether tokenizers from other GA models would hit the same issue, but from what I’ve seen so far, models like Qwen2, Qwen3, and Gemma3 don’t seem to set the invert option in their tokenizer.

@DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from 76f4d50 to a14610e on August 7, 2025 10:10
@DannyYuyang-quic
Collaborator Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot bot added the release notes: qualcomm label Aug 7, 2025
@DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from a14610e to 013d222 on August 7, 2025 16:02


APPLY_ROPE_EMBEDDING_FUNCTIONS = {
    "phi_4_mini": apply_partial_rotary_emb_single,
Contributor


Collaborator Author

@DannyYuyang-quic Aug 7, 2025


It's not QNN-specific.
apply_partial_rotary_emb_single (partial RoPE) is required whenever partial_rotary_factor < 1.0.

I've updated the condition to check partial_rotary_factor < 1.0 explicitly, so partial RoPE is applied only when that condition holds.
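For readers new to partial RoPE: when partial_rotary_factor < 1.0, only the first rotary_dim = int(head_dim * partial_rotary_factor) channels of each query/key head are rotated and the remaining channels pass through unchanged (as pure arithmetic, a factor of 0.75 with head_dim 128 would rotate 96 channels and leave 32 untouched). The sketch below shows the idea on a single head vector in plain Python, using the common "rotate-half" channel pairing. It is an illustration, not the apply_partial_rotary_emb_single implementation from this PR; the helper names are hypothetical, and computing the frequencies over rotary_dim rather than head_dim is an assumed convention worth checking against the model's reference code.

```python
import math
from typing import List


def rope_angles(pos: int, rotary_dim: int, base: float = 10000.0) -> List[float]:
    # One angle per (i, i + rotary_dim // 2) channel pair: pos * base^(-2i / rotary_dim).
    return [pos * base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]


def apply_rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Full RoPE on one head vector using the "rotate-half" channel pairing."""
    d = len(x)
    if d % 2:
        raise ValueError("rotary dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i, theta in enumerate(rope_angles(pos, d, base)):
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s         # first half:  x1*cos - x2*sin
        out[i + half] = x2 * c + x1 * s  # second half: x2*cos + x1*sin
    return out


def apply_partial_rope(
    x: List[float], pos: int, partial_rotary_factor: float, base: float = 10000.0
) -> List[float]:
    """Rotate only the first int(head_dim * factor) channels; pass the rest through."""
    rotary_dim = int(len(x) * partial_rotary_factor)
    return apply_rope(x[:rotary_dim], pos, base) + list(x[rotary_dim:])


if __name__ == "__main__":
    head = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy head_dim = 8
    rotated = apply_partial_rope(head, pos=3, partial_rotary_factor=0.5)
    print(rotated[4:])  # [5.0, 6.0, 7.0, 8.0]: the pass-through channels are unchanged
```

With a factor of 1.0 the partial path reduces to full RoPE, which is why a single `< 1.0` check on the config value is enough to pick between the two functions.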

Contributor


oh wow...you're still awake...

Contributor


Can we refer to the config in this folder #13086 instead of having Phi-specific logic inside static llama?

@DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from 013d222 to 371c80a on August 7, 2025 18:32
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D79828234.

Comment on lines +81 to +84
if config.partial_rotary_factor < 1:
    self.apply_rope_emb = apply_partial_rotary_emb_single
else:
    self.apply_rope_emb = apply_rotary_emb_single
Collaborator Author


Hi @cccclai

> Can we refer to the config in this folder #13086 instead of having Phi-specific logic inside static llama?

I've updated the condition here; static llama now depends only on the config file.
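The reviewer's suggestion, keeping model differences in a per-model config rather than in model-name checks, can be sketched as below. This is a hypothetical illustration, not the config schema from #13086: `RopeConfig` and its fields are assumptions. `partial_rotary_factor` is optional and defaults to 1.0, so models whose config omits it keep the full-RoPE path, and only a config that sets it below 1.0 selects the partial path, mirroring the `< 1` condition in the snippet under review.

```python
import json
from dataclasses import dataclass


@dataclass
class RopeConfig:
    """Subset of a model config that matters for RoPE (hypothetical schema)."""

    head_dim: int
    partial_rotary_factor: float = 1.0  # absent from the config -> full RoPE

    def __post_init__(self) -> None:
        if not 0.0 < self.partial_rotary_factor <= 1.0:
            raise ValueError("partial_rotary_factor must be in (0, 1]")
        if self.rotary_dim % 2:
            raise ValueError("rotary_dim must be even so channels can be paired")

    @classmethod
    def from_json(cls, text: str) -> "RopeConfig":
        raw = json.loads(text)
        return cls(
            head_dim=int(raw["head_dim"]),
            partial_rotary_factor=float(raw.get("partial_rotary_factor", 1.0)),
        )

    @property
    def rotary_dim(self) -> int:
        return int(self.head_dim * self.partial_rotary_factor)

    @property
    def uses_partial_rope(self) -> bool:
        # Same rule as the reviewed code: partial RoPE only when the factor is < 1.
        return self.partial_rotary_factor < 1.0


if __name__ == "__main__":
    for text in ('{"head_dim": 128}', '{"head_dim": 128, "partial_rotary_factor": 0.75}'):
        cfg = RopeConfig.from_json(text)
        print(cfg.uses_partial_rope, cfg.rotary_dim)  # False 128, then True 96
```

Validating the factor when the config is loaded surfaces a bad value early, instead of as a shape mismatch deep inside attention.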

@cccclai
Contributor

cccclai commented Aug 8, 2025

Looks like there is a conflict; can you resolve it?

Summary:
 - Support Phi-4-mini-instruct for the static llama path
 - Add partial RoPE (P-RoPE) for Phi-4-mini
 - Add the EOS token for Phi-4-mini
@DannyYuyang-quic force-pushed the dev1/danny/GA_static_phi4-mini branch from 46f019a to d6dda6c on August 9, 2025 07:06
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D79828234.

Contributor

@cccclai left a comment


Thank you for enabling phi4!

@cccclai merged commit c8a0706 into pytorch:main Aug 10, 2025
101 of 103 checks passed
Labels: CLA Signed, release notes: qualcomm

3 participants