Commit 1d08e91: Add support for LFM2 models (#1367)
1 parent: 467f59c

5 files changed: +101 additions, -10 deletions

README.md: 1 addition, 0 deletions

@@ -355,6 +355,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **JinaCLIP** (from Jina AI) released with the paper [Jina CLIP: Your CLIP Model Is Also Your Text Retriever](https://huggingface.co/papers/2405.20204) by Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han Xiao.
 1. **LiteWhisper** (from University of Washington, Kotoba Technologies) released with the paper [LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation](https://huggingface.co/papers/2502.20583) by Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci.
 1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://huggingface.co/papers/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
+1. **[LFM2](https://huggingface.co/docs/transformers/model_doc/lfm2)** (from Liquid AI) released with the blog post [Introducing LFM2: The Fastest On-Device Foundation Models on the Market](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models) by the Liquid AI Team.
 1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://huggingface.co/papers/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
 1. **[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushka rMishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing EllenTan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
 1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (from Microsoft Research & University of Wisconsin-Madison) released with the paper [Visual Instruction Tuning](https://huggingface.co/papers/2304.08485) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.

docs/snippets/6_supported-models.snippet: 1 addition, 0 deletions

@@ -69,6 +69,7 @@
 1. **JinaCLIP** (from Jina AI) released with the paper [Jina CLIP: Your CLIP Model Is Also Your Text Retriever](https://huggingface.co/papers/2405.20204) by Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han Xiao.
 1. **LiteWhisper** (from University of Washington, Kotoba Technologies) released with the paper [LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation](https://huggingface.co/papers/2502.20583) by Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci.
 1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://huggingface.co/papers/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
+1. **[LFM2](https://huggingface.co/docs/transformers/model_doc/lfm2)** (from Liquid AI) released with the blog post [Introducing LFM2: The Fastest On-Device Foundation Models on the Market](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models) by the Liquid AI Team.
 1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://huggingface.co/papers/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
 1. **[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushka rMishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing EllenTan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
 1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (from Microsoft Research & University of Wisconsin-Madison) released with the paper [Visual Instruction Tuning](https://huggingface.co/papers/2304.08485) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.

src/configs.js: 29 additions, 2 deletions

@@ -109,6 +109,7 @@ function getNormalizedConfig(config) {
             mapping['hidden_size'] = 'hidden_size';
             break;
         case 'llama':
+        case 'lfm2':
         case 'smollm3':
         case 'olmo':
         case 'olmo2':
@@ -261,9 +262,35 @@ function getNormalizedConfig(config) {
  * @param {PretrainedConfig} config
  * @returns {Record<string, number[]>}
  */
-export function getKeyValueShapes(config, {
+export function getCacheShapes(config, options) {
+    if (config.model_type === 'lfm2') {
+        // Custom caching mechanism for LFM2
+        /** @type {Record<string, number[]>} */
+        const cache_values = {};
+        // @ts-expect-error TS2339
+        const { layer_types, num_attention_heads, num_key_value_heads, hidden_size, conv_L_cache } = config;
+        const head_dim = hidden_size / num_attention_heads;
+        const batch_size = options?.batch_size ?? 1;
+        for (let i = 0; i < layer_types.length; ++i) {
+            if (layer_types[i] === 'full_attention') {
+                for (const kv of ['key', 'value']) {
+                    cache_values[`past_key_values.${i}.${kv}`] = [batch_size, num_key_value_heads, 0, head_dim];
+                }
+            } else if (layer_types[i] === 'conv') {
+                cache_values[`past_conv.${i}`] = [batch_size, hidden_size, conv_L_cache];
+            } else {
+                throw new Error(`Unsupported layer type: ${layer_types[i]}`);
+            }
+        }
+        return cache_values;
+    }
+    return getKeyValueShapes(config, options);
+}
+
+/** @type {typeof getKeyValueShapes} */
+function getKeyValueShapes(config, {
     prefix = 'past_key_values',
-    batch_size=1,
+    batch_size = 1,
 } = {}) {
     /** @type {Record<string, number[]>} */
     const decoderFeeds = {};
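
To make the hybrid cache layout concrete, here is a small sketch of what getCacheShapes produces for an LFM2-style config with one attention layer and one conv layer. This is not part of the commit, and the config values are invented for illustration:

// Hypothetical toy config, for illustration only.
const toyConfig = {
    model_type: 'lfm2',
    layer_types: ['full_attention', 'conv'],
    num_attention_heads: 4,
    num_key_value_heads: 2,
    hidden_size: 64, // head_dim = 64 / 4 = 16
    conv_L_cache: 3,
};

// getCacheShapes(toyConfig) would return:
// {
//     'past_key_values.0.key':   [1, 2, 0, 16], // [batch, kv_heads, seq_len, head_dim]
//     'past_key_values.0.value': [1, 2, 0, 16],
//     'past_conv.1':             [1, 64, 3],    // [batch, hidden_size, conv_L_cache]
// }

Attention layers start with an empty (seq_len = 0) KV cache, while conv layers carry a fixed-size buffer of length conv_L_cache from the very first step.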

src/models.js: 19 additions, 8 deletions

@@ -40,7 +40,7 @@

 import {
     AutoConfig,
-    getKeyValueShapes,
+    getCacheShapes,
 } from './configs.js';

 import {
@@ -318,7 +318,7 @@ async function getSession(pretrained_model_name_or_path, fileName, options) {
     }

     if (selectedDevice === 'webgpu') {
-        const shapes = getKeyValueShapes(options.config, {
+        const shapes = getCacheShapes(options.config, {
             prefix: 'present',
         });
         if (Object.keys(shapes).length > 0 && !isONNXProxy()) {
@@ -1960,7 +1960,9 @@ export class PreTrainedModel extends Callable {

         for (const name in decoderResults) {
             if (name.startsWith('present')) {
-                const newName = name.replace('present', 'past_key_values');
+                const newName = name
+                    .replace('present_conv', 'past_conv') // Hybrid cache architecture (e.g., LFM2)
+                    .replace('present', 'past_key_values');
                 const is_encoder_pkv = name.includes('encoder');
                 if (is_encoder_pkv && pastKeyValues) {
                     // Optimization introduced by optimum to reuse past key values.
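
Note the ordering of the two replacements above: 'present_conv' must be rewritten before the plain 'present' substring, otherwise a conv output such as 'present_conv.1' would be mangled into 'past_key_values_conv.1'. A quick sketch of the renaming, with illustrative names not taken from the commit:

// Mirrors the renaming logic above; the input names are examples.
const rename = (name) => name
    .replace('present_conv', 'past_conv')   // conv caches first
    .replace('present', 'past_key_values'); // then regular KV caches

rename('present_conv.1'); // -> 'past_conv.1'
rename('present.0.key');  // -> 'past_key_values.0.key'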
@@ -2017,14 +2019,14 @@
             Object.assign(decoderFeeds, pastKeyValues)
         } else {
             const session = this.sessions['decoder_model_merged'] ?? this.sessions['model'];
-            const dtype = session?.config?.kv_cache_dtype ?? 'float32';
-            const empty = (dtype === 'float16') ? new DataTypeMap.float16() : [];
-
             const batch_size = (decoderFeeds[this.main_input_name] ?? decoderFeeds.attention_mask)?.dims?.[0] ?? 1;
-            const shapes = getKeyValueShapes(this.config, { batch_size });

+            const dtype = session?.config?.kv_cache_dtype ?? 'float32';
+            const cls = (dtype === 'float16') ? DataTypeMap.float16 : DataTypeMap.float32;
+            const shapes = getCacheShapes(this.config, { batch_size });
             for (const name in shapes) {
-                decoderFeeds[name] = new Tensor(dtype, empty, shapes[name]);
+                const size = shapes[name].reduce((a, b) => a * b, 1);
+                decoderFeeds[name] = new Tensor(dtype, new cls(size), shapes[name]);
             }
         }
     }
@@ -4586,6 +4588,13 @@ export class LlamaModel extends LlamaPreTrainedModel { }
 export class LlamaForCausalLM extends LlamaPreTrainedModel { }
 //////////////////////////////////////////////////

+//////////////////////////////////////////////////
+// LFM2 models
+export class Lfm2PreTrainedModel extends PreTrainedModel { }
+export class Lfm2Model extends Lfm2PreTrainedModel { }
+export class Lfm2ForCausalLM extends Lfm2PreTrainedModel { }
+//////////////////////////////////////////////////
+
 //////////////////////////////////////////////////
 // SmolLM3 models
 export class SmolLM3PreTrainedModel extends PreTrainedModel { }
@@ -7803,6 +7812,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['gpt_neox', ['GPTNeoXModel', GPTNeoXModel]],
     ['codegen', ['CodeGenModel', CodeGenModel]],
     ['llama', ['LlamaModel', LlamaModel]],
+    ['lfm2', ['Lfm2Model', Lfm2Model]],
     ['smollm3', ['SmolLM3Model', SmolLM3Model]],
     ['exaone', ['ExaoneModel', ExaoneModel]],
     ['olmo', ['OlmoModel', OlmoModel]],
@@ -7908,6 +7918,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['gpt_neox', ['GPTNeoXForCausalLM', GPTNeoXForCausalLM]],
     ['codegen', ['CodeGenForCausalLM', CodeGenForCausalLM]],
     ['llama', ['LlamaForCausalLM', LlamaForCausalLM]],
+    ['lfm2', ['Lfm2ForCausalLM', Lfm2ForCausalLM]],
     ['smollm3', ['SmolLM3ForCausalLM', SmolLM3ForCausalLM]],
     ['exaone', ['ExaoneForCausalLM', ExaoneForCausalLM]],
     ['olmo', ['OlmoForCausalLM', OlmoForCausalLM]],
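
With the architecture registered in both maps, the Auto classes resolve LFM2 checkpoints without further changes. A usage sketch, not part of the commit; the model id is the tiny test checkpoint from the new test file below, and any LFM2 ONNX export should behave the same way:

import { AutoTokenizer, AutoModelForCausalLM } from '@huggingface/transformers';

// 'lfm2' now dispatches to Lfm2ForCausalLM via MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.
const model_id = 'onnx-internal-testing/tiny-random-Lfm2ForCausalLM';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModelForCausalLM.from_pretrained(model_id);

const inputs = tokenizer('hello');
const outputs = await model.generate({ ...inputs, max_length: 10 });
console.log(outputs.tolist());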
New test file: 51 additions, 0 deletions (the filename is not shown on the page; judging by its relative imports it lives under tests/models/lfm2/)

import { PreTrainedTokenizer, Lfm2ForCausalLM } from "../../../src/transformers.js";

import { MAX_MODEL_LOAD_TIME, MAX_TEST_EXECUTION_TIME, MAX_MODEL_DISPOSE_TIME, DEFAULT_MODEL_OPTIONS } from "../../init.js";

export default () => {
  describe("Lfm2ForCausalLM", () => {
    const model_id = "onnx-internal-testing/tiny-random-Lfm2ForCausalLM";
    /** @type {Lfm2ForCausalLM} */
    let model;
    /** @type {PreTrainedTokenizer} */
    let tokenizer;
    beforeAll(async () => {
      model = await Lfm2ForCausalLM.from_pretrained(model_id, DEFAULT_MODEL_OPTIONS);
      tokenizer = await PreTrainedTokenizer.from_pretrained(model_id);
      tokenizer.padding_side = "left";
    }, MAX_MODEL_LOAD_TIME);

    it(
      "batch_size=1",
      async () => {
        const inputs = tokenizer("hello");
        const outputs = await model.generate({
          ...inputs,
          max_length: 10,
        });
        expect(outputs.tolist()).toEqual([[1n, 52572n, 38892n, 6902n, 53329n, 33092n, 13656n, 49822n, 6902n, 52520n]]);
      },
      MAX_TEST_EXECUTION_TIME,
    );

    it(
      "batch_size>1",
      async () => {
        const inputs = tokenizer(["hello", "hello world"], { padding: true });
        const outputs = await model.generate({
          ...inputs,
          max_length: 10,
        });
        expect(outputs.tolist()).toEqual([
          [0n, 1n, 52572n, 60239n, 57205n, 6790n, 58292n, 30935n, 5959n, 6902n],
          [1n, 52572n, 2031n, 59572n, 43345n, 42427n, 31142n, 41100n, 5321n, 5816n],
        ]);
      },
      MAX_TEST_EXECUTION_TIME,
    );

    afterAll(async () => {
      await model?.dispose();
    }, MAX_MODEL_DISPOSE_TIME);
  });
};
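
The same checkpoint can also be driven through the high-level pipeline API; a brief sketch under the same assumptions as above (the generation parameters are chosen arbitrarily):

import { pipeline } from '@huggingface/transformers';

// Text generation with the tiny LFM2 test checkpoint.
const generator = await pipeline('text-generation', 'onnx-internal-testing/tiny-random-Lfm2ForCausalLM');
const result = await generator('hello', { max_new_tokens: 9 });
console.log(result);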
