---
license: mit
tags:
- biology
---
<div align="center">
<!-- TODO: Uncomment and set YOUR_IMAGE_URL -->
<!-- <img src="YOUR_IMAGE_URL" width="100%" alt="OneGenome-Rice (OGR)" /> -->
*(Banner / architecture figure: add URL, then uncomment the line above.)*
</div>
# OneGenome-Rice (OGR)
OGR is a generative genomic foundation model for AI-driven precision breeding and functional genomics in rice. It processes DNA sequences up to **1 million** base pairs in length, uses a **Mixture-of-Experts (MoE)** architecture with **1.25B** total parameters, and was pre-trained on a curated corpus of **422** rice genomes spanning cultivated and wild *Oryza* diversity.
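
The spec table below lists a vocabulary of 128 (padded), which is consistent with character-level DNA tokenization. The actual OGR tokenizer is not documented here; the sketch below is only an illustration of how a handful of real symbols can be mapped to ids and padded up to a round vocabulary size.

```python
# Illustrative character-level DNA tokenizer. Assumption: the real OGR
# tokenizer is not specified in this card; symbol set and specials here
# are hypothetical, chosen only to show the idea of a small padded vocab.
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]
BASES = list("ACGTN")  # N = ambiguous base

vocab = {tok: i for i, tok in enumerate(SPECIALS + BASES)}
UNK = vocab["<unk>"]
PAD_TO = 128  # embedding table padded to this size, as in the spec table


def encode(seq: str) -> list[int]:
    """Map a DNA string to token ids, one id per base."""
    return [vocab["<bos>"]] + [vocab.get(b, UNK) for b in seq.upper()] + [vocab["<eos>"]]


ids = encode("acgtn")  # lower-case input is normalized to upper-case
```

Only 9 of the 128 slots carry real symbols in this sketch; the rest exist so the embedding dimension is hardware-friendly.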
For instructions, details, and examples, see the project repository: *[TODO: GitHub or documentation URL](https://github.com/TODO/TODO)*.
The table below summarizes training scale and key hyperparameters. The **Trained Tokens** figure is broken down as described in the **Training Process** section (sequence curriculum and CPT).
<!-- If you ship multiple sizes (e.g. Small / Large), duplicate the table and add columns. -->
| Model Specification | OneGenome-Rice (OGR) |
| --- | --- |
| **Model Scale** | |
| Total Parameters | 1.25B |
| Activated Parameters | 0.33B |
| Trained Tokens | ~490B (sequence curriculum) + ~104B (CPT) |
| **Architecture** | |
| Architecture | MoE |
| Number of Experts | 8 |
| Selected Experts per Token | 2 |
| Number of Layers | 12 |
| Attention Hidden Dimension | 1024 |
| Number of Attention Heads | 16 (GQA, 8 KV groups) |
| MoE Hidden Dimension (per Expert) | 4096 |
| Vocabulary Size | 128 (padded) |
| Context Length | up to 1M |
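
The gap between total (1.25B) and activated (0.33B) parameters comes from the MoE layers: each token is routed to only 2 of the 8 expert FFNs, so most expert weights sit idle for any given token. A toy sketch of top-2-of-8 routing (dimensions are toy values, not the real 1024/4096; the actual router is not documented here):

```python
import numpy as np

# Toy top-2-of-8 MoE routing, matching the expert counts in the spec table.
# Assumption: a simple linear router with softmax over the selected experts;
# the real OGR gating function may differ.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

W_gate = rng.normal(size=(d_model, n_experts))
# Each "expert" is a single linear map standing in for a full FFN.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]


def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ W_gate                # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]  # indices of the 2 highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                       # softmax over the selected experts only
    # Only top_k of n_experts run per token: this is why the activated
    # parameter count (0.33B) is far below the total (1.25B).
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))


y = moe_layer(rng.normal(size=d_model))
```

With 2 of 8 experts active, per-token expert compute is roughly a quarter of what a dense model with the same total expert parameters would spend.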