Paper: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (arXiv 2203.05482)
How to use sraj/Merge_Drop_MARK_FastText with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="sraj/Merge_Drop_MARK_FastText")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("sraj/Merge_Drop_MARK_FastText")
model = AutoModelForMaskedLM.from_pretrained("sraj/Merge_Drop_MARK_FastText")
```

This is a merge of pre-trained language models created using mergekit.
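This model was merged using the Linear merge method, which takes a weighted average of the corresponding parameters of the input models, as in the model soups paper cited above.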
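For intuition, the linear method can be sketched in a few lines of Python. This is a minimal sketch, not mergekit's implementation: it assumes each model is a plain dict mapping parameter names to flat lists of floats (real checkpoints hold tensors), and the function name `linear_merge` is hypothetical. Each merged parameter is the weighted sum of the matching parameters, with the weights divided by their sum when normalization is enabled.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def linear_merge(
    models: Sequence[StateDict],
    weights: Sequence[float],
    normalize: bool = True,
) -> StateDict:
    """Element-wise weighted average of parameters that share a name."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights sum to zero; cannot normalize")
        weights = [w / total for w in weights]
    merged: StateDict = {}
    for name in models[0]:
        # Group the i-th element of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(w * x for w, x in zip(weights, col)) for col in columns]
    return merged


a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 6.0]}
print(linear_merge([a, b], [1.0, 1.0]))  # → {'w': [2.0, 4.0]}
```

Because the merged model has exactly the same parameter shapes as each input, inference cost is unchanged, which is the point made in the paper's title.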
The following models were included in the merge: sraj/CMB_FWEdu_V2_FastTxt_CX_LRD and sraj/CMB_WX_SYN_CX_LRD.
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: sraj/CMB_FWEdu_V2_FastTxt_CX_LRD
    parameters:
      weight: 1.0
  - model: sraj/CMB_WX_SYN_CX_LRD
    parameters:
      weight: 1.0
  # - model: sraj/CMB_SYN_QWEN35_122B_FP8_10K_SEED42_CX_LRD
  #   parameters:
  #     weight: 1.0
merge_method: linear
parameters:
  normalize: true
dtype: bfloat16
```
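To make the effect of this configuration concrete, the sketch below mirrors the YAML as a Python dict (a hypothetical stand-in; mergekit parses the YAML itself) and computes the per-model weights after `normalize: true`. It also shows an illustrative float-to-bfloat16 rounding for `dtype: bfloat16`. The helper names `effective_weights` and `to_bfloat16` are hypothetical, and the round-to-nearest-even conversion is a common scheme assumed here, not taken from mergekit's source.

```python
import struct

# Hypothetical Python mirror of the YAML config; the commented-out third
# model is left out because YAML comments are not part of the config.
config = {
    "models": [
        {"model": "sraj/CMB_FWEdu_V2_FastTxt_CX_LRD", "parameters": {"weight": 1.0}},
        {"model": "sraj/CMB_WX_SYN_CX_LRD", "parameters": {"weight": 1.0}},
    ],
    "merge_method": "linear",
    "parameters": {"normalize": True},
    "dtype": "bfloat16",
}


def effective_weights(cfg: dict) -> dict:
    """Per-model weights, divided by their sum when normalize is true."""
    raw = {m["model"]: m["parameters"]["weight"] for m in cfg["models"]}
    if cfg.get("parameters", {}).get("normalize", False):
        total = sum(raw.values())
        return {name: w / total for name, w in raw.items()}
    return raw


def to_bfloat16(x: float) -> float:
    """Round x (via float32) to the nearest bfloat16 value, ties to even.

    bfloat16 keeps the top 16 bits of a float32. NaN inputs are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties-to-even on the dropped 16 bits
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(effective_weights(config))
print(to_bfloat16(0.1))  # → 0.10009765625
```

With both weights at 1.0 and normalization on, each model contributes 0.5, so the result is a plain mean of the two checkpoints; uncommenting the third entry would make each weight 1/3.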