Data Standards

Overview

Standardisation of data entities is of increasing value in ensuring that researchers and end-users are able to navigate the world of cannabis big data. The adoption of FAIR (Findable, Accessible, Interoperable and Reusable) data principles underlies the ability of humans, machines and cyborgs to make use of relevant data and to make meaningful connections. The Cannabis research community can build on existing initiatives and approaches taken elsewhere in the genomics and plant sciences communities, including the Gene Ontology and other controlled vocabularies. Standardisation of gene model nomenclature has been effective in many other genus-level genomic communities.

The ICGRC has identified a need to propose, develop and agree on common standards that facilitate the interchange and analysis of genetic, genomic and trait data. Specifically, for Cannabis, there is a need for standardisation of:

Plant sample IDs for genetic resources, linked to where the material was sourced
Terminology, with agreed usage of 'strain', 'cultivar', 'line', 'accession', 'plant' and 'genotype'
Chromosome numbering and orientation used within different genome assemblies and genetic maps
Genome annotation, the format of gene model nomenclature and harmonisation of 'reference' genomes
Registries or look-up tables for genes and variants that are mapped to different reference genomes
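
As an illustration of the last item, a look-up table for genes mapped to different reference genomes could be sketched roughly as follows. This is only a minimal sketch and not an ICGRC specification: the registry ID, assembly names, local gene IDs and coordinates are all hypothetical placeholders, and a real registry would need agreed identifiers and chromosome naming first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneLocation:
    """Where one gene sits in one reference assembly (all values hypothetical)."""
    assembly: str      # name of the reference assembly
    local_id: str      # gene model ID used by that assembly's annotation
    chromosome: str    # chromosome name as used in that assembly
    start: int         # 1-based start coordinate
    end: int           # 1-based end coordinate, inclusive
    strand: str        # "+" or "-"


# Hypothetical registry: one stable ID maps to a location per assembly.
# The same gene may sit on differently numbered or oriented chromosomes.
REGISTRY = {
    "EXAMPLE_GENE_1": [
        GeneLocation("assembly_A", "A.chr1.g00010", "chr1", 1000, 2500, "+"),
        GeneLocation("assembly_B", "B_07G000120", "chr7", 88000, 89500, "-"),
    ],
}


def lookup(registry_id: str, assembly: str) -> Optional[GeneLocation]:
    """Return the gene's location in the given assembly, or None if unmapped."""
    for location in REGISTRY.get(registry_id, []):
        if location.assembly == assembly:
            return location
    return None
```

Calling `lookup("EXAMPLE_GENE_1", "assembly_B")` returns the record for the second assembly, while an unknown gene or assembly yields `None`. Keeping chromosome name and strand in each record makes differences in numbering and orientation between assemblies explicit instead of implied.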