Commit 9407ed8

Add support for ModernBERT Decoder (#1371)
1 parent 11fdae5 commit 9407ed8

4 files changed: 11 additions and 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -382,6 +382,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://huggingface.co/papers/2110.02178) by Sachin Mehta and Mohammad Rastegari.
 1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://huggingface.co/papers/2206.02680) by Sachin Mehta and Mohammad Rastegari.
 1. **[ModernBERT](https://huggingface.co/docs/transformers/model_doc/modernbert)** (from Answer.AI and LightOn) released with the paper [Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference](https://huggingface.co/papers/2412.13663) by Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli.
+1. **[ModernBERT Decoder](https://huggingface.co/docs/transformers/model_doc/modernbert-decoder)** (from Johns Hopkins University and LightOn) released with the paper [Seq vs Seq: An Open Suite of Paired Encoders and Decoders](https://huggingface.co/papers/2507.11412) by Orion Weller, Kathryn Ricci, Marc Marone, Antoine Chaffin, Dawn Lawrie, Benjamin Van Durme.
 1. **Moondream1** released in the repository [moondream](https://github.com/vikhyat/moondream) by vikhyat.
 1. **[Moonshine](https://huggingface.co/docs/transformers/model_doc/moonshine)** (from Useful Sensors) released with the paper [Moonshine: Speech Recognition for Live Transcription and Voice Commands](https://huggingface.co/papers/2410.15608) by Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden.
 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://huggingface.co/papers/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@
 1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://huggingface.co/papers/2110.02178) by Sachin Mehta and Mohammad Rastegari.
 1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://huggingface.co/papers/2206.02680) by Sachin Mehta and Mohammad Rastegari.
 1. **[ModernBERT](https://huggingface.co/docs/transformers/model_doc/modernbert)** (from Answer.AI and LightOn) released with the paper [Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference](https://huggingface.co/papers/2412.13663) by Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli.
+1. **[ModernBERT Decoder](https://huggingface.co/docs/transformers/model_doc/modernbert-decoder)** (from Johns Hopkins University and LightOn) released with the paper [Seq vs Seq: An Open Suite of Paired Encoders and Decoders](https://huggingface.co/papers/2507.11412) by Orion Weller, Kathryn Ricci, Marc Marone, Antoine Chaffin, Dawn Lawrie, Benjamin Van Durme.
 1. **Moondream1** released in the repository [moondream](https://github.com/vikhyat/moondream) by vikhyat.
 1. **[Moonshine](https://huggingface.co/docs/transformers/model_doc/moonshine)** (from Useful Sensors) released with the paper [Moonshine: Speech Recognition for Live Transcription and Voice Commands](https://huggingface.co/papers/2410.15608) by Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden.
 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://huggingface.co/papers/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.

src/configs.js

Lines changed: 1 addition & 0 deletions
@@ -104,6 +104,7 @@ function getNormalizedConfig(config) {
         case 'stablelm':
         case 'opt':
         case 'falcon':
+        case 'modernbert-decoder':
             mapping['num_heads'] = 'num_attention_heads';
             mapping['num_layers'] = 'num_hidden_layers';
             mapping['hidden_size'] = 'hidden_size';
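
The hunk above makes `'modernbert-decoder'` fall through to the same key mapping as `stablelm`, `opt`, and `falcon`. As a rough illustration of what that mapping implies, the sketch below applies the three visible entries to a config object. It is a simplified stand-in, not the code in `src/configs.js`: the helper `normalizeDecoderConfig` and the numeric values in `exampleConfig` are hypothetical, and the real function may map further keys that fall outside this hunk.

```javascript
// Hypothetical, simplified stand-in for the normalization step shown in the diff.
// Only the three mapping entries visible in the hunk are reproduced here.
function normalizeDecoderConfig(config) {
    // normalized key -> key expected in the model's own config
    const mapping = {
        num_heads: 'num_attention_heads',
        num_layers: 'num_hidden_layers',
        hidden_size: 'hidden_size',
    };

    const normalized = {};
    for (const [normalizedKey, sourceKey] of Object.entries(mapping)) {
        // Copy a value only when the source config actually defines it.
        if (sourceKey in config) {
            normalized[normalizedKey] = config[sourceKey];
        }
    }
    return normalized;
}

// Illustrative values only; not taken from any published checkpoint.
const exampleConfig = {
    model_type: 'modernbert-decoder',
    num_attention_heads: 12,
    num_hidden_layers: 22,
    hidden_size: 768,
};

const normalizedConfig = normalizeDecoderConfig(exampleConfig);
console.log(normalizedConfig);
```

Because the new `case` label only joins an existing fall-through group, no decoder-specific mapping logic is introduced by this commit; the model type simply reuses the attribute names that the other listed decoder-only architectures already share.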

src/models.js

Lines changed: 8 additions & 0 deletions
@@ -2230,6 +2230,12 @@ export class ModernBertForTokenClassification extends ModernBertPreTrainedModel
 }
 //////////////////////////////////////////////////
 
+//////////////////////////////////////////////////
+// ModernBERT Decoder models
+export class ModernBertDecoderPreTrainedModel extends PreTrainedModel { }
+export class ModernBertDecoderModel extends ModernBertDecoderPreTrainedModel { }
+export class ModernBertDecoderForCausalLM extends ModernBertDecoderPreTrainedModel { }
+//////////////////////////////////////////////////
 
 //////////////////////////////////////////////////
 // NomicBert models
@@ -7837,6 +7843,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['starcoder2', ['Starcoder2Model', Starcoder2Model]],
     ['falcon', ['FalconModel', FalconModel]],
     ['stablelm', ['StableLmModel', StableLmModel]],
+    ['modernbert-decoder', ['ModernBertDecoderModel', ModernBertDecoderModel]],
 ]);
 
 const MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES = new Map([
@@ -7945,6 +7952,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['falcon', ['FalconForCausalLM', FalconForCausalLM]],
     ['trocr', ['TrOCRForCausalLM', TrOCRForCausalLM]],
     ['stablelm', ['StableLmForCausalLM', StableLmForCausalLM]],
+    ['modernbert-decoder', ['ModernBertDecoderForCausalLM', ModernBertDecoderForCausalLM]],
 
     // Also image-text-to-text
     ['phi3_v', ['Phi3VForCausalLM', Phi3VForCausalLM]],
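
The two `Map` entries added in `src/models.js` register the new classes under the `'modernbert-decoder'` model type. The sketch below shows one plausible way such a registry can be consulted to pick a class from a config's `model_type`. Everything in it is a self-contained stand-in: the empty classes mirror the names in the diff, while `resolveModelClass` is a hypothetical helper and should not be read as the library's actual lookup logic.

```javascript
// Stand-in class hierarchy mirroring the names added in the diff.
class PreTrainedModel { }
class ModernBertDecoderPreTrainedModel extends PreTrainedModel { }
class ModernBertDecoderModel extends ModernBertDecoderPreTrainedModel { }
class ModernBertDecoderForCausalLM extends ModernBertDecoderPreTrainedModel { }

// Registries keyed by model type, each holding [class name, class] as in the diff.
const DECODER_ONLY_MAPPING = new Map([
    ['modernbert-decoder', ['ModernBertDecoderModel', ModernBertDecoderModel]],
]);
const CAUSAL_LM_MAPPING = new Map([
    ['modernbert-decoder', ['ModernBertDecoderForCausalLM', ModernBertDecoderForCausalLM]],
]);

// Hypothetical helper: return the first class registered for `modelType`
// across the given mappings, or throw if none of them knows the type.
function resolveModelClass(modelType, mappings) {
    for (const mapping of mappings) {
        const entry = mapping.get(modelType);
        if (entry) {
            const [, modelClass] = entry;
            return modelClass;
        }
    }
    throw new Error(`Unsupported model type: ${modelType}`);
}

const BaseClass = resolveModelClass('modernbert-decoder', [DECODER_ONLY_MAPPING]);
const CausalClass = resolveModelClass('modernbert-decoder', [CAUSAL_LM_MAPPING]);
console.log(BaseClass.name, CausalClass.name);
```

Under this reading, the class bodies can stay empty because behaviour is inherited from `PreTrainedModel`, and the registry entries are what make the model type discoverable for base-model and causal-LM loading.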
