
Release v1.0.16

@rwightman rwightman released this 26 Jun 18:44
7101adb

June 26, 2025

  • MobileNetV5 backbone (w/ encoder-only variant) for the Gemma 3n image encoder
  • Version 1.0.16 released

June 23, 2025

  • Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when many different image sizes are used (based on an example by https://github.com/stas-sl).
  • Further speed up patch embed resample by replacing vmap with matmul (based on a snippet by https://github.com/stas-sl).
  • Add 3 initial native-aspect NaFlexViT checkpoints created while testing, trained on ImageNet-1k with 3 different pos embed configs and otherwise identical hparams:
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |
  • Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs. Thanks to https://github.com/brianhou0208
  • Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as an option to the AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, and SGDW optimizers
  • Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
  • Fix cuda stream bug in prefetch loader
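The F.grid_sample based pos embed resize mentioned above can be sketched in plain PyTorch. This is an illustrative standalone function, not timm's actual implementation; the function name and signature are assumptions:

```python
import torch
import torch.nn.functional as F


def resize_pos_embed_grid_sample(pos_embed, old_hw, new_hw):
    """Resample a 2D pos embed (1, H*W, C) to a new grid size via F.grid_sample.

    Hypothetical helper sketching the approach; timm's internals may differ.
    """
    H, W = old_hw
    h, w = new_hw
    _, _, C = pos_embed.shape
    # (1, H*W, C) -> (1, C, H, W) for grid_sample
    pe = pos_embed.reshape(1, H, W, C).permute(0, 3, 1, 2)
    # Normalized sampling grid in [-1, 1]; grid[..., 0] is x, grid[..., 1] is y
    ys = torch.linspace(-1, 1, h, dtype=pe.dtype)
    xs = torch.linspace(-1, 1, w, dtype=pe.dtype)
    gy, gx = torch.meshgrid(ys, xs, indexing='ij')
    grid = torch.stack([gx, gy], dim=-1).unsqueeze(0)  # (1, h, w, 2)
    out = F.grid_sample(pe, grid, mode='bilinear', align_corners=True)
    # (1, C, h, w) -> (1, h*w, C)
    return out.permute(0, 2, 3, 1).reshape(1, h * w, C)
```

Because `grid_sample` accepts an arbitrary sampling grid in one call, resizing to many different target sizes avoids the per-size interpolation setup cost.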

June 5, 2025

  • Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
    1. Encapsulated embedding and position encoding in a single module
    2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
    3. Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
    4. Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
    5. Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
  • Existing ViT models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
    • Some native weights coming soon
  • A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
    • To enable in train.py and validate.py, add the --naflex-loader arg; it must be used with a NaFlexVit model
  • To evaluate an existing (classic) ViT loaded in the NaFlexVit model w/ the NaFlex data pipeline:
    • python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
  • Training has some extra args worth noting:
    • The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
    • The --naflex-max-seq-len argument sets the target sequence length for validation
    • Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per batch w/ interpolation
    • The --naflex-loss-scale arg changes the loss scaling mode per batch relative to the batch size; the timm NaFlex loader changes the batch size for each seq len
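Point 2 of the NaFlexVit feature list above (nn.Linear patch embedding on pre-patchified inputs) can be sketched in plain PyTorch. The `patchify` helper here is illustrative, not timm's actual API:

```python
import torch
import torch.nn as nn


def patchify(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Flatten an image batch (B, C, H, W) into patch tokens (B, N, C*P*P)."""
    B, C, H, W = images.shape
    P = patch_size
    assert H % P == 0 and W % P == 0, "image dims must be divisible by patch size"
    patches = images.unfold(2, P, P).unfold(3, P, P)  # (B, C, H//P, W//P, P, P)
    patches = patches.permute(0, 2, 3, 1, 4, 5)       # (B, H//P, W//P, C, P, P)
    return patches.reshape(B, -1, C * P * P)          # (B, N, C*P*P)


# An nn.Linear over pre-patchified tokens is equivalent to the usual Conv2d
# patch embedding, but it also works directly on variable-length patch
# sequences, which is what a dictionary-style NaFlex input provides.
embed = nn.Linear(3 * 16 * 16, 768)
tokens = embed(patchify(torch.randn(2, 3, 224, 224), 16))  # (2, 196, 768)
```

The linear layer's weight is just the Conv2d patch-embed kernel reshaped to (embed_dim, C*P*P), so the two formulations are interchangeable.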

May 28, 2025


Full Changelog: v1.0.15...v1.0.16