Eva02 weight_init #2565
-
I read in the paper that, in their experiments, the usual trunc_normal weight initialisation didn't work well and they used xavier_normal instead. I see that in the eva.py file, the _init_weights for linear layers is the same regardless of whether SwiGLU is used or not. The SwiGLU module also has its own weight initialisation function, but that one does not use xavier either. I am wondering what the reasoning for this is. Related topic:
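For context, here is a minimal sketch of the two schemes being contrasted, using PyTorch's torch.nn.init helpers (the helper names below are illustrative, not the exact functions in eva.py):

```python
import torch.nn as nn
from torch.nn.init import trunc_normal_, xavier_normal_

def init_linear_trunc_normal(m: nn.Module, std: float = 0.02) -> None:
    # Truncated-normal init on Linear weights (the usual timm-style default).
    if isinstance(m, nn.Linear):
        trunc_normal_(m.weight, std=std)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def init_linear_xavier_normal(m: nn.Module) -> None:
    # Xavier (Glorot) normal init, the scheme the EVA-02 paper reports working better.
    if isinstance(m, nn.Linear):
        xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Usage: model.apply(init_linear_xavier_normal) vs model.apply(init_linear_trunc_normal)
```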
-
@TonyCongqianWang I could add support for the other init approach, but the main reason it's not there is that it only matters for training from scratch, and I'm pretty sure the observation made in the EVA02 paper was for training from scratch with their combo of MIM + CLIP-style pretraining on very large datasets, which may or may not have an impact for other types of pretraining. All in all, given that most people are fine-tuning the EVA models from EVA weights, it's lower priority to add (and, more time-consumingly, test) the alternative init mode.
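If such an option were added, it could presumably look something like the sketch below (the `weight_init` argument and helper name are hypothetical, not timm's actual API):

```python
import torch.nn as nn
from torch.nn.init import trunc_normal_, xavier_normal_

def init_weights_eva(model: nn.Module, weight_init: str = 'trunc_normal') -> None:
    """Hypothetical opt-in init mode; this helper and its argument are
    illustrative names, not part of timm's current API."""
    def _init(m: nn.Module) -> None:
        if isinstance(m, nn.Linear):
            if weight_init == 'xavier_normal':
                xavier_normal_(m.weight)           # from-scratch scheme reported in the EVA-02 paper
            else:
                trunc_normal_(m.weight, std=0.02)  # existing default-style scheme
            if m.bias is not None:
                nn.init.zeros_(m.bias)
    model.apply(_init)
```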
-
No init fns are called in the training loop; different models have slightly different approaches to weight init due to the origins of the code and evolving ideas. It has been a back-burner TODO of mine to better unify this into a more consistent API that can be used with meta-device init to do a proper two-phase init process.
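A rough sketch of what such a two-phase, meta-device init could look like (an assumed pattern for illustration, not timm's current API):

```python
import torch
import torch.nn as nn
from torch.nn.init import trunc_normal_

def _init_weights(m: nn.Module) -> None:
    # The explicit init pass run in phase 2.
    if isinstance(m, nn.Linear):
        trunc_normal_(m.weight, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Phase 1: build the model on the meta device -- no memory is allocated and
# no constructor-time init actually runs (requires PyTorch >= 2.0).
with torch.device('meta'):
    model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

# Phase 2: materialize real (uninitialized) storage, then apply the init fn.
model = model.to_empty(device='cpu')
model.apply(_init_weights)
```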