ONNX Runtime improvements (experimental native webgpu; fix iOS) #1231

fs-eire · 2025-03-13T20:33:48Z

This change allows using WebGPU in transformers.js with the ORT Node.js binding.

Still testing (the tests themselves depend on this change).

AdamStrojek · 2025-03-16T11:14:58Z

Wouldn't it be better to do the same thing that is done for ONNX Runtime Web?

    if (apis.IS_WEBGPU_AVAILABLE) {
        supportedDevices.push('webgpu');
    }

Electron applications can have WebGPU enabled even when Node in a terminal does not. Also, onnxruntime-node only provides backends for native modules, whereas onnxruntime-web has bindings for WebGPU, so just adding to the list of supported devices will not work without switching the runtime.

fs-eire · 2025-03-16T20:07:58Z

If I remember correctly, IS_WEBGPU_AVAILABLE is checked against navigator.gpu, which is only available in the browser.

For Electron, the renderer process is actually a "web" environment rather than a "node" one.
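As a rough illustration of the point under discussion, here is a minimal sketch of how a device list could be built when `navigator.gpu` only exists in web environments while the Node.js binding is assumed to ship a native WebGPU EP. `listSupportedDevices` and its option names are hypothetical and simplified; this is not the actual transformers.js source.

```javascript
// Hypothetical, simplified sketch (not the transformers.js implementation).
function listSupportedDevices({ isNode, hasNavigatorGpu }) {
  const devices = [];
  if (isNode) {
    // Assumption: onnxruntime-node provides CPU plus the native WebGPU EP.
    devices.push("cpu", "webgpu");
  } else {
    // In a web environment, WebGPU is only usable when navigator.gpu exists.
    if (hasNavigatorGpu) devices.push("webgpu");
    devices.push("wasm");
  }
  return devices;
}

// Detect the current environment without assuming either global exists.
const isNode = typeof process !== "undefined" && Boolean(process.versions?.node);
const hasNavigatorGpu = typeof navigator !== "undefined" && "gpu" in navigator;
console.log(listSupportedDevices({ isNode, hasNavigatorGpu }));
```

The design choice here is that the Node branch never consults `navigator.gpu`, since that check says nothing about what the native binding can do.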

AdamStrojek · 2025-03-16T20:44:10Z

Yes, you are correct, IS_WEBGPU_AVAILABLE is just a simple check against navigator.gpu. In theory, it is possible to install a third-party package for WebGPU support in Node, but that is a complicated topic. Still, my comment is valid; I copied my example from a few lines higher in the same source file.

I recently ran some tests. Unfortunately, transformers.js does not detect Electron applications correctly and marks them as Node applications, so it provides only CPU. I had a lot of trouble getting it running in an Electron app. Mostly, it was picky about the path and fs packages. If I changed the target platform to Node, it caused other problems. I'm preparing a new issue report for the developers with my findings.

I already did tests with your branch, and this simple change didn’t enable WebGPU in Electron apps.
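For readers unfamiliar with the Electron distinction, the following is a minimal sketch of how a runtime could be classified. It relies on `process.versions.electron` and `process.type`, which Electron exposes; the function name `classifyRuntime` and the returned labels are hypothetical and are not part of transformers.js.

```javascript
// Hypothetical sketch: classify the runtime from process-like and navigator-like objects.
function classifyRuntime(proc, nav) {
  if (proc?.versions?.electron) {
    // Electron sets process.type to "renderer" in renderer processes.
    return proc.type === "renderer" ? "electron-renderer" : "electron-main";
  }
  if (proc?.versions?.node) return "node";
  // Without a Node-style process object, assume a web environment if navigator exists.
  // A renderer with Node integration disabled would also land here.
  return nav ? "browser" : "unknown";
}

console.log(classifyRuntime(globalThis.process, globalThis.navigator));
```

Under this sketch, an Electron renderer would be treated as a web environment (where `navigator.gpu` applies), while the Electron main process would be treated like Node.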

fs-eire · 2025-04-18T23:29:48Z

Updated the version of onnxruntime-node to 1.22.0-dev.20250418-c19a49615b. This version supports WebGPU on Windows and macOS.

xenova · 2025-04-19T00:18:20Z

Wow thanks @fs-eire! Very exciting!!! Does the browser package https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b release also add anything of significance?

fs-eire · 2025-04-19T00:38:11Z

> Wow thanks @fs-eire! Very exciting!!! Does the browser package https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b release also add anything of significance?

No.

BTW, regarding WebGPU EP support in onnxruntime-web: there are still some perf issues when using the WebGPU EP in a WebAssembly build. If you only want to do conformance testing for the WebGPU EP (e.g. check correctness but not latency), I can offer you a private build of onnxruntime-web with the WebGPU EP.

xenova · 2025-04-19T01:14:17Z

That would be great! Feel free to send via slack perhaps? Eventually, we can hook this into the Transformers.js CI to ensure correctness across all supported architectures.

xenova · 2025-04-19T17:48:04Z

I've been testing the webgpu EP for some llama/qwen models, and running into a few correctness issues.

Here's some code to help test/debug:

    import { pipeline, TextStreamer } from "@huggingface/transformers";

    // Create a text generation pipeline
    const generator = await pipeline(
      "text-generation",
      "onnx-community/ZR1-1.5B-ONNX",
      { dtype: "q4f16", device: "webgpu" }, // device="cpu" works fine
    );

    // Define the list of messages
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Write me a poem about Machine Learning." },
    ];

    // Generate a response
    const output = await generator(messages, {
      max_new_tokens: 512,
      do_sample: false,
      streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);

xenova · 2025-04-19T18:25:07Z

I can confirm that q4 (instead of q4f16) works correctly, so it looks to be an issue with the f16 implementation.
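When comparing devices or dtypes for correctness rather than latency, one simple approach is to compare raw output values (for example logits) numerically instead of comparing generated text. Below is a minimal sketch of such a check. The helpers `maxAbsDiff` and `allClose` are hypothetical, and the default tolerance of `1e-3` is an arbitrary assumption that would need tuning per dtype.

```javascript
// Hypothetical helpers for a conformance check between two output buffers.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let max = 0;
  for (let i = 0; i < a.length; ++i) {
    const diff = Math.abs(a[i] - b[i]);
    if (Number.isNaN(diff)) return NaN; // a NaN in either input is always a failure
    if (diff > max) max = diff;
  }
  return max;
}

function allClose(a, b, atol = 1e-3) {
  // NaN <= atol is false, so NaN results are reported as not close.
  return maxAbsDiff(a, b) <= atol;
}

// Usage: values from a reference run (e.g. device "cpu") vs. a candidate run (e.g. "webgpu").
const reference = Float32Array.of(0.1, 0.2, 0.3);
const candidate = Float32Array.of(0.1, 0.2005, 0.3);
console.log(allClose(reference, candidate));
```

A check like this makes it easier to tell small precision drift apart from outright wrong results.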

guschmue · 2025-04-21T18:04:01Z

> I can confirm that q4 (instead of q4f16) works correctly, so it looks to be an issue with the f16 implementation.

For webgpu-ep / DeepSeek-R1-Distill-Qwen-1.5B, we know about an open issue when GQA takes the FA2 path.
It doesn't happen on all GPUs, but I can reproduce it on NVIDIA.

If ZR1-1.5B-ONNX is similar to DeepSeek-R1-Distill-Qwen-1.5B, it might be the same issue. I haven't tried DeepSeek-R1-Distill-Qwen-1.5B with fp32. Let me check on this.

guschmue · 2025-04-21T21:59:06Z

Looks like the same issue as DeepSeek when GQA uses FA2 with fp16. fp32 seems OK.
I'll put this high on my list to look at.

xenova · 2025-04-22T00:40:44Z

Great, thanks @guschmue!

xenova · 2025-04-25T22:44:38Z

I'm accumulating all these changes into https://github.com/huggingface/transformers.js/tree/ort-improvements to make development and testing a bit easier (many version bumps and ort-specific changes)

* ONNX Runtime improvements (experimental native webgpu; fix iOS) (#1231)
* customize the wasm paths
* update implementation
* allow using 'webgpu' in nodejs binding
* update version of onnxruntime-node
* Upgrade onnxruntime-web to same version as onnxruntime-node
* Update list of supported devices

---------

Co-authored-by: Joshua Lochner

* customize the wasm paths (#1250)
* customize the wasm paths
* update implementation
* [internal] Add is_decoder option to session retrieval for preferred output location
* Update tests
* Formatting
* Bump ort versions
* Bump onnxruntime-node version
* Bump versions
* Bump ORT versions
* Bump versions
* Only check webgpu fp16 for non-node environments
* Fix
* Assume node supports webgpu
* Update ORT node support comment
* Relax test strictness
* Update conversion script versions
* Downgrade onnxslim
* cleanup
* Update package-lock.json
* Update onnxruntime versions
* Update post-build script
* Use built-in session release function
* Call garbage collection after each tokenizer test
* Do not double-throw error
* Fix race-condition in build process with file removal
* Update versions
* Bump jinja version
* [version] Update to 3.6.3
* Bump jinja version to support new features
* [version] Update to 3.6.3
* Add support for LFM2 models (#1367)
* Use prefix in lfm2 output location (#1369)
* Update package-lock.json
* Run `npm audit fix`
* Add special tokens in text-generation pipeline if tokenizer requires (#1370)
* Add special tokens in text-generation pipeline if tokenizer requires
* Fix logits processors tests
* Update bundles.test.js
* Update comment
* Formatting
* Add support for ModernBERT Decoder (#1371)
* Use from/to buffer instead of string. Actually fixes #1343
* Add support for Voxtral (#1373)
* Support longform voxtral processing (#1375)
* [version] Update to 3.7.0
* Add support for Arcee (#1377)
* Optimize tensor.slice() (#1381)
* Optimize tensor.slice()

The performance of `tensor.slice()` is very poor, especially for the 'logits' tensor with large dimensions, e.g. `const logits = outputs.logits.slice(null, -1, null);`. This is because the current implementation of the `slice` method manually iterates through each element and calculates indices, which is very time-consuming if the tensor shape is large. Cases like `slice(null, -1, null)`, where the slicing operation is contiguous along certain dimensions, can be optimized with a bulk copy using `TypedArray.subarray()` and `TypedArray.set()`.

* nit
* Add a few more tensor slice unit tests

---------

Co-authored-by: Joshua Lochner

---------

Co-authored-by: Yulong Wang
Co-authored-by: Wanming Lin
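The bulk-copy idea in the `tensor.slice()` commit message above can be sketched as follows. This is a minimal illustration, assuming a row-major tensor of shape `[B, S, V]` and a contiguous range `[start, end)` along the second dimension; `sliceDim1` is a hypothetical helper and not the actual transformers.js implementation.

```javascript
// Hypothetical sketch: slice [start, end) along dim 1 of a row-major [B, S, V] tensor.
function sliceDim1(data, [B, S, V], start, end) {
  const rows = end - start;
  const blockSize = rows * V; // elements copied per batch entry
  const out = new data.constructor(B * blockSize);
  for (let b = 0; b < B; ++b) {
    // Within each batch entry, the selected rows form one contiguous block.
    const srcOffset = (b * S + start) * V;
    // subarray() creates a view without copying; set() performs the bulk copy.
    out.set(data.subarray(srcOffset, srcOffset + blockSize), b * blockSize);
  }
  return out;
}

// Usage: keep only the last position of a [2, 3, 4] tensor filled with 0..23.
const data = Float32Array.from({ length: 2 * 3 * 4 }, (_, i) => i);
const last = sliceDim1(data, [2, 3, 4], 2, 3);
console.log(Array.from(last)); // [8, 9, 10, 11, 20, 21, 22, 23]
```

Compared with computing an index per element, this performs one `set()` call per batch entry, which is where the speed-up for large `logits` tensors would come from.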

fs-eire added 3 commits March 21, 2025 10:30

customize the wasm paths

c39f3dc

update implementation

ea2b574

allow using 'webgpu' in nodejs binding

f15e632

fs-eire force-pushed the fs-eire/nodejs-support-native-webgpu-ep branch from a536b8d to 2dbde16 April 18, 2025 23:26

update version of onnxruntime-node

6cfeec3

fs-eire force-pushed the fs-eire/nodejs-support-native-webgpu-ep branch from 2dbde16 to 6cfeec3 April 18, 2025 23:26

Upgrade onnxruntime-web to same version as onnxruntime-node

0c3bc8d

AdamStrojek mentioned this pull request Apr 19, 2025

Allow to choose ONNX Runtime in Electron App #1240

Open

Update list of supported devices

751e702

xenova mentioned this pull request Apr 25, 2025

customize the wasm paths #1250

Merged

Merge branch 'pr/1250' into pr/1231

8f4cc0c

xenova changed the title ~~[WIP] allow using 'webgpu' in nodejs binding~~ ONNX Runtime improvements (experimental native webgpu; fix iOS) Apr 25, 2025

xenova changed the base branch from main to ort-improvements April 25, 2025 22:43

xenova marked this pull request as ready for review April 25, 2025 22:43

xenova merged commit 747a04d into huggingface:ort-improvements Apr 25, 2025
