---
library_name: transformers
tags:
- CodonTransformer
- Computational Biology
- Machine Learning
- Bioinformatics
- Synthetic Biology
- biology
license: apache-2.0
pipeline_tag: token-classification
---
**CodonTransformer** is a multispecies codon optimization tool. Given a protein sequence and a target organism, it generates a DNA sequence that encodes the protein with codon usage adapted to that organism. It is aimed at researchers and practitioners in genetic engineering. Built on the Transformer architecture (a BigBird model) and paired with a user-friendly Jupyter notebook, it reduces the complexity of codon optimization and saves time and effort.
## Authors

Adibvafa Fallahpour<sup>1,2</sup>\*, Vincent Gureghian<sup>3</sup>\*, Guillaume J. Filion<sup>2</sup>‡, Ariel B. Lindner<sup>3</sup>‡, Amir Pandi<sup>3</sup>‡

<sup>1</sup> Vector Institute for Artificial Intelligence, Toronto ON, Canada

<sup>2</sup> University of Toronto Scarborough; Department of Biological Science; Scarborough ON, Canada

<sup>3</sup> Université Paris Cité, INSERM U1284, Center for Research and Interdisciplinarity, F-75006 Paris, France

\* These authors contributed equally to this work.

‡ To whom correspondence should be addressed: <br>
guillaume.filion@utoronto.ca, ariel.lindner@inserm.fr, amir.pandi@cri-paris.org
## Use Case

**For an interactive demo, check out our [Google Colab Notebook](https://adibvafa.github.io/CodonTransformer/GoogleColab).**

After installing CodonTransformer (`pip install CodonTransformer`), you can run:
```python
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(device)

# Set your input data
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"

# Predict with CodonTransformer
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=device,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",
    deterministic=True,
)
print(format_model_output(output))
```
The output is:

```
-----------------------------
|         Organism          |
-----------------------------
Escherichia coli general

-----------------------------
|       Input Protein       |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG

-----------------------------
|      Processed Input      |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK

-----------------------------
|       Predicted DNA       |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```
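As a quick sanity check on output like the one above, the predicted DNA can be translated back with the standard genetic code and compared with the input protein. The sketch below is not part of the CodonTransformer package: `processed_input`, `translate`, and `encodes_protein` are hypothetical helpers, and the `_UNK` token format is inferred from the displayed "Processed Input" (one `<residue>_UNK` token per amino acid, plus `__UNK` for the stop codon), not taken from the library's tokenizer.

```python
import itertools

# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def processed_input(protein: str) -> str:
    """Hypothetical helper: rebuild the displayed 'Processed Input' line.
    Assumed format: '<residue>_UNK' per amino acid, then '__UNK' for the stop."""
    return " ".join(f"{aa}_UNK" for aa in protein + "_")


def translate(dna: str) -> str:
    """Hypothetical helper: translate DNA codon by codon ('*' = stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes_protein(dna: str, protein: str) -> bool:
    """True if dna encodes protein followed by exactly one terminal stop codon."""
    return translate(dna) == protein + "*"


# Input protein and predicted DNA copied from the example output above.
PROTEIN = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
PREDICTED_DNA = (
    "ATGGCTTTATGGATGCGTCTGCTGCCGCTG"
    "CTGGCGCTGCTGGCGCTGTGGGGCCCGGAC"
    "CCGGCGGCGGCGTTTGTGAATCAGCACCTG"
    "TGCGGCAGCCACCTGGTGGAAGCGCTGTAT"
    "CTGGTGTGCGGTGAGCGCGGCTTCTTCTAC"
    "ACGCCCAAAACCCGCCGCGAAGCGGAAGAT"
    "CTGCAGGTGGGCCAGGTGGAGCTGGGCGGC"
    "TAA"
)

print(processed_input(PROTEIN).split()[:3])  # ['M_UNK', 'A_UNK', 'L_UNK']
print(encodes_protein(PREDICTED_DNA, PROTEIN))  # True
```

The check assumes the standard genetic code; organisms or organelles that use an alternative code would need a different `AMINO_ACIDS` table.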
## Additional Resources

- **Project Website**: https://adibvafa.github.io/CodonTransformer/
- **GitHub Repository**: https://github.com/Adibvafa/CodonTransformer
- **Google Colab Demo**: https://adibvafa.github.io/CodonTransformer/GoogleColab
- **PyPI Package**: https://pypi.org/project/CodonTransformer/
- **Paper**: https://www.nature.com/articles/s41467-025-58588-7
## Citation

```
@article{Fallahpour_Gureghian_Filion_Lindner_Pandi_2025,
  title={CodonTransformer: a multispecies codon optimizer using context-aware neural networks},
  volume={16},
  ISSN={2041-1723},
  url={https://www.nature.com/articles/s41467-025-58588-7},
  DOI={10.1038/s41467-025-58588-7},
  number={1},
  journal={Nature Communications},
  author={Fallahpour, Adibvafa and Gureghian, Vincent and Filion, Guillaume J. and Lindner, Ariel B. and Pandi, Amir},
  year={2025},
  month=apr,
  pages={3205},
  language={en}
}
```