Oryza coarctata Oco Assembly & Annotation

Overview

Analysis Name: Oryza coarctata Oco Assembly & Annotation
Sequencing technology: PacBio
Assembly method: hifiasm v0.12
Release Date: 2023-08-15
Reference Publication(s):

Zhao H, Wang W, Yang Y, Wang Z, Sun J, Yuan K, Rabbi SMHA, Khanam M, Kabir MS, Seraj ZI, Rahman MS, Zhang Z. A high-quality chromosome-level wild rice genome of Oryza coarctata. Sci Data. 2023 Oct 14;10(1):701. doi: 10.1038/s41597-023-02594-1.

Abstract

Oryza coarctata (2n = 4X = 48, KKLL) is an allotetraploid, undomesticated relative of rice and the only species in the genus Oryza with tolerance to high salinity and submergence. Therefore, it contains important stress and tolerance genes/factors for rice. The initial draft genome published was limited by data and technical restrictions, leading to an incomplete and highly fragmented assembly. This study reports a new, highly contiguous chromosome-level genome assembly and annotation of O. coarctata. PacBio high-quality HiFi reads generated 460 contigs with a total length of 573.4 Mb and an N50 of 23.1 Mb, which were assembled into scaffolds with Hi-C data, anchoring 96.99% of the assembly onto 24 chromosomes. The genome assembly comprises 45,571 genes, and repetitive content contributes 25.5% of the genome. This study provides the novel identification of the KK and LL genome types of the genus Oryza, leading to valuable insights into rice genome evolution. The chromosome-level genome assembly of O. coarctata is a valuable resource for rice research and molecular breeding.

Assembly statistics

Genome size (bp): 573,362,877
GC content: 42.06%
Chromosomes sequence No.: 24
Genome sequence No.: 450
Maximum genome sequence length (bp): 37,520,647
Minimum genome sequence length (bp): 16,801
Average genome sequence length (bp): 1,274,140
Genome sequence N50 (bp): 23,112,565
Genome sequence N90 (bp): 16,161,634
Assembly level: Chromosome
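
For readers who want to re-derive figures like these, the sketch below shows one common way to compute N50 and N90 from a list of sequence lengths, and checks that the reported average is consistent with the reported genome size and sequence count. The function name `nx` and the toy lengths are illustrative assumptions; this is not necessarily how the values on this page were produced.

```python
def nx(lengths, fraction):
    """Return the Nx length: the shortest sequence in the smallest set of
    longest sequences that together cover at least `fraction` of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no sequence lengths given")


# Toy lengths for illustration only (not taken from the assembly).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)
print(toy_n50, toy_n90)

# Consistency check using the values reported in the statistics above:
# genome size divided by the number of genome sequences.
average_length = round(573_362_877 / 450)
print(average_length)  # 1274140
```

Running `nx` on the real per-sequence lengths of the assembly should, under this definition, give values comparable to the N50 and N90 listed above.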

Assembly

The Oryza coarctata Oco Assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) GWHCBHR00000000.genome.fasta.gz
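
A minimal sketch of how the downloaded assembly could be inspected, assuming the file is an ordinary gzip-compressed FASTA. It uses only the Python standard library; the function name `fasta_stats` is hypothetical, and GC content is computed here over A/C/G/T bases only, which may differ from how the GC figure on this page was derived.

```python
import gzip


def fasta_stats(path):
    """Return ({sequence_id: length}, GC fraction) for a gzipped FASTA file."""
    lengths = {}
    gc = 0
    acgt = 0
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the sequence ID.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                seq = line.upper()
                lengths[name] += len(seq)
                gc += seq.count("G") + seq.count("C")
                acgt += sum(seq.count(base) for base in "ACGT")
    return lengths, (gc / acgt if acgt else 0.0)


# Example call (assumes the file has been downloaded to the working directory):
# lengths, gc_fraction = fasta_stats("GWHCBHR00000000.genome.fasta.gz")
```

The resulting `lengths` dictionary can be summed or sorted to compare against the assembly statistics listed on this page.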

Gene Predictions

The Oryza coarctata Oco genome gene prediction files are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file) GWHCBHR00000000.gff.gz
CDS sequences (FASTA file) GWHCBHR00000000.RNA.fasta.gz
Protein sequences (FASTA file) GWHCBHR00000000.Protein.faa.gz
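
As a quick sanity check on the annotation download, the sketch below counts features by type in a gzipped GFF3 file. It assumes the standard nine-column, tab-separated GFF3 layout with the feature type in column 3; the function name `count_feature_types` is hypothetical, and which type names the file actually uses (for example `gene` or `mRNA`) is not stated on this page.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records per feature type (column 3) in a gzipped file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip directives, comments and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Only well-formed nine-column records are counted; this also
            # skips any embedded FASTA section at the end of the file.
            if len(fields) >= 9:
                counts[fields[2]] += 1
    return counts


# Example call (assumes the file has been downloaded to the working directory):
# counts = count_feature_types("GWHCBHR00000000.gff.gz")
# print(counts.most_common())
```

If the file labels gene models with the type `gene`, that count could be compared with the 45,571 genes reported in the abstract.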

Functional Analysis

Functional annotation for the Oryza coarctata Oco genome is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan Oryza_coarctata.Pfam.tsv.gz
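
The sketch below shows one way to list proteins carrying a given domain from the annotation table. It assumes the file follows the standard InterProScan TSV layout (column 1 protein accession, column 5 signature accession, column 6 signature description); the layout of this particular file is not documented on this page, so the column positions should be checked against the actual download. The function name `proteins_with_signature` is hypothetical.

```python
import gzip


def proteins_with_signature(path, keyword):
    """Return sorted protein IDs whose signature accession or description
    contains `keyword`, assuming standard InterProScan TSV columns."""
    hits = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # not a full annotation row
            protein_id, accession, description = fields[0], fields[4], fields[5]
            if keyword in accession or keyword in description:
                hits.add(protein_id)
    return sorted(hits)


# Example call (assumes the file has been downloaded to the working directory):
# duf247_proteins = proteins_with_signature("Oryza_coarctata.Pfam.tsv.gz", "DUF247")
```

Matching on a description keyword is a convenience; matching on an exact signature accession would be stricter.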

S genes

Summary

Query | Chromosome | Size (bp) | Coordinates | tBLASTn Hit | tBLASTn %ID | Domain
DUF247I-S1Ψ | GWHCBHR00000009 | 23765581 | 4727597-4728520 | Olongistaminata | 80 | DUF247
DUF247I-S2Ψ | GWHCBHR00000010 | 23414180 | 5599897-5599947 | Olongistaminata | 64 | DUF247
HPS10-S1 | GWHCBHR00000009 | 23765581 | 4723566-4723707, 4723782-4723888 | LpsS_chromosome1 | 42 | -
HPS10-S2 | GWHCBHR00000010 | 23414180 | 5602105-5602229, 5602322-5602394 | Olongistaminata | 47 | -
DUF247I-Z | GWHCBHR00000007 | 23112565 | 20674591-20676168 | LpZDUF247-I_chromosome2 | 61 | DUF247
DUF247II-Z | GWHCBHR00000007 | 23112565 | 20688482-20690182 | Scereale | 67 | DUF247
HPS10-Z1 | GWHCBHR00000007 | 23112565 | 20681595-20681757, 20681830-20681882 | Telongatum | 38 | -
HPS10-Z2 | GWHCBHR00000008 | 22654099 | 20165556-20165665, 20165760-20165922 | LpsZ_chromosome2 | 40 | -
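
The coordinate strings in the summary can be turned into segment lengths with a few lines of code. The sketch below parses a comma-separated list of `start-end` ranges and sums their lengths, assuming the coordinates are 1-based and inclusive (an assumption; the page does not state the convention). The helper names `parse_segments`, `total_length` and `within_chromosome` are hypothetical.

```python
def parse_segments(text):
    """Parse 'start-end, start-end' into a list of (start, end) integer pairs."""
    segments = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        start, end = (int(value) for value in part.split("-"))
        segments.append((start, end))
    return segments


def total_length(segments):
    """Sum segment lengths, treating coordinates as 1-based and inclusive."""
    return sum(end - start + 1 for start, end in segments)


def within_chromosome(segments, chromosome_size):
    """Check that every segment lies inside a chromosome of the given size."""
    return all(1 <= start <= end <= chromosome_size for start, end in segments)


# HPS10-S1 coordinates and chromosome size as listed in the summary.
hps10_s1 = parse_segments("4723566-4723707, 4723782-4723888")
print(total_length(hps10_s1))                 # 249
print(within_chromosome(hps10_s1, 23765581))  # True
```

Under the inclusive convention the two HPS10-S1 segments total 249 bp, which is a multiple of three.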

Nucleotide

Protein

© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences