xueyunlong committed on
Commit
ee218f1
·
verified ·
1 Parent(s): c03a40b

Create README.md

Files changed (1)
  1. README.md +38 -0
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ license: mit
+ tags:
+ - biology
+ ---
+
+ <div align="center">
+ <!-- TODO: Uncomment and set YOUR_IMAGE_URL -->
+ <!-- <img src="YOUR_IMAGE_URL" width="100%" alt="OneGenome-Rice (OGR)" /> -->
+ *(Banner / architecture figure: add URL, then uncomment the line above.)*
+ </div>
+
+ # OneGenome-Rice (OGR)
+
+ OGR is a generative genomic foundation model for AI-driven precision breeding and functional genomics in rice. It processes DNA sequences of up to **1 million** base pairs, has **1.25B** total parameters, and uses a **Mixture-of-Experts (MoE)** architecture. It was pre-trained on a curated corpus of **422** rice genomes spanning cultivated and wild *Oryza* diversity.
+
+ For instructions, details, and examples, see the project repository: *[TODO: GitHub or documentation URL](https://github.com/TODO/TODO)*.
+
+ The table below summarizes the training scale and key architecture hyperparameters. The **Trained Tokens** row follows the breakdown in the **Training Process** section (sequence curriculum and CPT).
+
+ <!-- If you ship multiple sizes (e.g. Small / Large), duplicate the table and add columns. -->
+
+ | Model Specification | OneGenome-Rice (OGR) |
+ | --- | --- |
+ | **Model Scale** | |
+ | Total Parameters | 1.25B |
+ | Activated Parameters | 0.33B |
+ | Trained Tokens | ~490B (sequence curriculum) + ~104B (CPT) |
+ | **Architecture** | |
+ | Architecture | MoE |
+ | Number of Experts | 8 |
+ | Selected Experts per Token | 2 |
+ | Number of Layers | 12 |
+ | Attention Hidden Dimension | 1024 |
+ | Number of Attention Heads | 16 (GQA, 8 KV groups) |
+ | MoE Hidden Dimension (per Expert) | 4096 |
+ | Vocabulary Size | 128 (padded) |
+ | Context Length | up to 1M |
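
A rough sanity check on the table: the listed hyperparameters can be turned into a back-of-the-envelope parameter count. The sketch below is not taken from the OGR codebase. It assumes a head dimension of `hidden / heads`, gated feed-forward experts with three weight matrices each, one linear router per layer, untied input and output embeddings, and no biases or normalization weights. The names `OGRSpec` and `estimate_params` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OGRSpec:
    """Values copied from the specification table in the README."""
    hidden: int = 1024         # attention hidden dimension
    layers: int = 12
    heads: int = 16            # query heads
    kv_groups: int = 8         # GQA key/value groups
    experts: int = 8
    active_experts: int = 2    # experts selected per token
    expert_hidden: int = 4096  # MoE hidden dimension per expert
    vocab: int = 128           # padded vocabulary size


def estimate_params(spec: OGRSpec, ffn_matrices: int = 3,
                    tied_embeddings: bool = False) -> tuple[int, int]:
    """Return (total, activated) parameter estimates.

    ASSUMPTIONS (not stated in the README): head_dim = hidden / heads,
    each expert is a gated FFN with `ffn_matrices` weight matrices,
    one linear router per layer, biases and norm weights are ignored.
    """
    head_dim = spec.hidden // spec.heads
    kv_dim = spec.kv_groups * head_dim
    # Q and O projections are hidden x hidden; K and V shrink to kv_dim under GQA.
    attn = 2 * spec.hidden * spec.hidden + 2 * spec.hidden * kv_dim
    expert = ffn_matrices * spec.hidden * spec.expert_hidden
    router = spec.hidden * spec.experts
    embed = spec.vocab * spec.hidden * (1 if tied_embeddings else 2)

    total = spec.layers * (attn + router + spec.experts * expert) + embed
    active = spec.layers * (attn + router + spec.active_experts * expert) + embed
    return total, active


if __name__ == "__main__":
    total, active = estimate_params(OGRSpec())
    print(f"total ~ {total / 1e9:.2f}B, activated ~ {active / 1e9:.2f}B")
```

Under these assumptions the estimate comes to about 1.25B total and about 0.34B activated parameters, which is close to the 1.25B and 0.33B listed in the table. The small gap in the activated figure presumably reflects counting details that the README does not specify.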