ChemSolubilityBERTa
Model Description
ChemSolubilityBERTa is a prototype model that predicts the aqueous solubility of chemical compounds from their SMILES representations. It is based on ChemBERTa, a BERT-like transformer architecture pre-trained on 77M SMILES strings for molecular property prediction. We adapted ChemBERTa to predict solubility values by fine-tuning it on the ESOL (Estimated SOLubility) dataset, a water-solubility prediction dataset of 1,128 samples. Given a SMILES string as input, the model outputs a predicted log solubility value (log mol/L). You can read the full paper here.
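Because the model reports solubility on a log scale, it can help to convert a prediction back into a concentration. The sketch below assumes the output is a base-10 logarithm of molar solubility (the usual convention for ESOL's log mol/L values); the helper names and the molar mass used are hypothetical and are not part of the model's API.

```python
# Assumption: predictions are log10 of molar solubility, as in ESOL's log mol/L values.

def log_solubility_to_mol_per_l(log_s: float) -> float:
    """Convert log10(S / (mol/L)) back to a molar solubility in mol/L."""
    return 10.0 ** log_s


def log_solubility_to_g_per_l(log_s: float, molar_mass_g_per_mol: float) -> float:
    """Convert a log10 molar solubility to a mass concentration in g/L."""
    return log_solubility_to_mol_per_l(log_s) * molar_mass_g_per_mol


# Hypothetical prediction of -2.0 for a compound with a molar mass of 180.16 g/mol
log_s = -2.0
print(log_solubility_to_mol_per_l(log_s))        # 0.01 mol/L
print(log_solubility_to_g_per_l(log_s, 180.16))  # about 1.8 g/L
```

Each whole unit of log solubility is a tenfold change in concentration, so small errors in the predicted value correspond to multiplicative, not additive, errors in mol/L.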
Fine-Tuning Details
- Pretrained model: seyonec/ChemBERTa-zinc-base-v1
- Dataset: ESOL (delaney-processed)
- Task: Aqueous solubility prediction (log mol/L)
- Number of training epochs: 3
- Batch size: 16
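To make the epoch and batch-size settings concrete, here is a plain-Python sketch of how 3 epochs at batch size 16 turn into mini-batch passes, with the data reshuffled at the start of every epoch. It is not the author's training code, and the train/validation split used for fine-tuning is not stated here, so the example assumes, purely for illustration, that all 1,128 ESOL samples are training examples. The function names are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

NUM_EPOCHS = 3   # from the fine-tuning details
BATCH_SIZE = 16  # from the fine-tuning details


def total_steps(n_train: int, batch_size: int = BATCH_SIZE, epochs: int = NUM_EPOCHS) -> int:
    """Optimizer steps over the whole run, keeping each epoch's final partial batch."""
    return math.ceil(n_train / batch_size) * epochs


def epoch_batches(
    items: Sequence[T], batch_size: int = BATCH_SIZE, epochs: int = NUM_EPOCHS, seed: int = 0
) -> Iterator[Tuple[int, List[T]]]:
    """Yield (epoch, batch) pairs, reshuffling a copy of the data at the start of each epoch."""
    rng = random.Random(seed)
    data = list(items)
    for epoch in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            yield epoch, data[start:start + batch_size]


# Illustration only: pretend every one of the 1,128 ESOL samples is a training example
n_train = 1128
print(total_steps(n_train))  # ceil(1128 / 16) * 3 = 71 * 3 = 213
batches = list(epoch_batches(range(n_train)))
print(len(batches), len(batches[-1][1]))  # 213 batches; each epoch ends with a batch of 8
```

If the final partial batch were dropped instead (for example with `dataloader_drop_last=True` in Hugging Face `TrainingArguments`), each epoch would have 70 steps rather than 71.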
How to Use
You can use the model to predict solubility for any molecule represented by a SMILES string:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and single-output regression model from the Hub
tokenizer = AutoTokenizer.from_pretrained("khanfs/ChemSolubilityBERTa")
model = AutoModelForSequenceClassification.from_pretrained("khanfs/ChemSolubilityBERTa")

smiles_string = "CCO"  # Example for ethanol
inputs = tokenizer(smiles_string, return_tensors="pt")
with torch.no_grad():  # inference only, no gradient tracking
    outputs = model(**inputs)
solubility = outputs.logits.item()  # predicted log solubility (log mol/L)
print(f"Predicted solubility: {solubility:.3f} log mol/L")
```
Citation and Usage
If you use ChemSolubilityBERTa in your research or projects, please cite the following:
```bibtex
@misc{ChemSolubilityBERTa,
  author = {Farooq Khan},
  title  = {ChemSolubilityBERTa: A Transformer-Based Model for Predicting Aqueous Solubility from SMILES},
  year   = {2024},
  url    = {https://huggingface.co/khanfs/ChemSolubilityBERTa}
}
```
License
This model is licensed under the MIT License.