How to use answerdotai/ModernBERT-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
```
ModernBERT-base-chinese (#42)
Opened by ZBW
Any plans for Chinese support?
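I've trained a few CJK-capable versions myself, but none of them worked well; they were all worse than xlm-roberta. My guess is that the amount of training was too small or the data wasn't cleaned thoroughly enough.
Also, FA2 wouldn't run correctly for me, so I ended up using SDPA. I'm not sure whether that's a problem on my end or a bug.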
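On the FA2 problem: in Transformers the attention backend is selected with the `attn_implementation` argument of `from_pretrained` (for example `"flash_attention_2"` or `"sdpa"`). The sketch below is a hypothetical helper, `choose_attn_implementation`, that falls back to SDPA unless the `flash_attn` package is installed and a CUDA GPU is visible. This is a heuristic assumption, not an official compatibility check, and it cannot detect the case described above where FA2 is installed but misbehaves; in that case, passing `"sdpa"` explicitly is the simplest workaround.

```python
import importlib.util


def choose_attn_implementation() -> str:
    """Hypothetical helper: pick "flash_attention_2" only when it looks usable.

    Assumption: FlashAttention-2 needs the `flash_attn` package and a CUDA GPU;
    in every other case we fall back to PyTorch's scaled dot-product attention.
    """
    # No flash_attn package installed -> FA2 cannot be used.
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    # No torch installed -> nothing to run FA2 on.
    if importlib.util.find_spec("torch") is None:
        return "sdpa"
    import torch  # imported lazily so the helper works without torch installed

    # FA2 kernels only run on CUDA devices.
    if not torch.cuda.is_available():
        return "sdpa"
    return "flash_attention_2"
```

The result would then be passed as `AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base", attn_implementation=choose_attn_implementation())`.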
The official ModernBERT project probably has no plans for multilingual support: https://github.com/AnswerDotAI/ModernBERT/issues/143
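OK, thanks. If extending the Latin-script vocabulary and fine-tuning on top of it doesn't work, what about building a new vocabulary from Chinese corpora only? Have you tested how well that works?
I suspect cross-lingual fine-tuning won't work well either, but I don't have the compute to try it myself, sigh.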
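Of course; I extended the vocabulary first and then did continued pretraining.
I'd guess the approach itself is sound; the main issues are the amount of training and the amount of work.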
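The "extend the vocabulary first, then continue pretraining" step can be sketched as follows, assuming the goal is to add frequent Chinese characters that the current tokenizer splits into several pieces. `find_missing_cjk_tokens` and the byte-fallback `toy_tokenize` are hypothetical illustrations, not part of ModernBERT or Transformers, and the character check covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional


def is_cjk(ch: str) -> bool:
    # Basic CJK Unified Ideographs block only; extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def find_missing_cjk_tokens(
    corpus: Iterable[str],
    tokenize: Callable[[str], List],
    min_count: int = 2,
    max_new: Optional[int] = None,
) -> List[str]:
    """Return frequent CJK characters that `tokenize` splits into more than one piece.

    Sorted by descending corpus frequency (ties broken by code point), so the
    most useful candidates for vocabulary extension come first.
    """
    counts = Counter(ch for text in corpus for ch in text if is_cjk(ch))
    candidates = [
        (ch, n)
        for ch, n in counts.items()
        if n >= min_count and len(tokenize(ch)) > 1
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    tokens = [ch for ch, _ in candidates]
    return tokens if max_new is None else tokens[:max_new]


# Hypothetical stand-in for a byte-level tokenizer: known characters become one
# token, anything else falls back to its individual UTF-8 bytes.
toy_vocab = {"今"}


def toy_tokenize(text: str) -> List:
    return [text] if text in toy_vocab else list(text.encode("utf-8"))


corpus = ["今天天气很好", "今天我们训练模型", "模型"]
print(find_missing_cjk_tokens(corpus, toy_tokenize))  # ['天', '型', '模']
```

With a real Transformers tokenizer, one would pass `tokenizer.tokenize` as `tokenize`, then call `tokenizer.add_tokens(new_tokens)` and `model.resize_token_embeddings(len(tokenizer))` before continued pretraining. The new embedding rows are not yet trained, which is one reason this route needs a substantial amount of continued pretraining.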
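Compute isn't really the problem; it's just a matter of money. Data cleaning and filtering, though, take a lot of manual labor. After all, the essence of 人工智能 ("artificial intelligence", literally "human-labor intelligence") is that you only get as much intelligence as the human labor you put in.
I'll try again once I've sorted out this issue: https://github.com/AnswerDotAI/ModernBERT/issues/172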
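As a rough illustration of the part of data cleaning that can be automated, here is a first-pass filter that keeps lines that are long enough, consist mostly of Chinese characters, and are not exact duplicates. `clean_corpus`, `cjk_ratio`, and the thresholds (`min_chars=10`, `min_cjk_ratio=0.5`) are hypothetical choices for this sketch; real pipelines typically add near-duplicate detection and quality filtering on top, and those steps still need human review.

```python
import hashlib
from typing import Iterable, Iterator


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the basic CJK Unified Ideographs block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum("\u4e00" <= ch <= "\u9fff" for ch in chars) / len(chars)


def clean_corpus(
    lines: Iterable[str], min_chars: int = 10, min_cjk_ratio: float = 0.5
) -> Iterator[str]:
    """Yield stripped lines that are long enough, mostly Chinese, and not exact duplicates."""
    seen = set()  # hashes of lines already emitted (fixed-size entries per line)
    for line in lines:
        text = line.strip()
        # Drop short lines and lines that are mostly non-Chinese.
        if len(text) < min_chars or cjk_ratio(text) < min_cjk_ratio:
            continue
        # Exact-duplicate removal via a content hash.
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield text


raw = [
    "今天我们一起训练一个中文模型吧。",
    "今天我们一起训练一个中文模型吧。",  # exact duplicate, dropped
    "click here to subscribe now!!!",  # not Chinese, dropped
    "短句",  # too short, dropped
]
print(list(clean_corpus(raw)))  # ['今天我们一起训练一个中文模型吧。']
```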
You can try this model: https://huggingface.co/feynmanzhao/chinese-modernbert-large-wwm