Data Annotation and Relations Modeling for Integrated Omics in Clinical Research
Abstract
Omics has massively permeated translational clinical research, with numerous diseases being covered by Omics studies from the genome to the metabolome level. Integrating disease-specific Omics tracks (cross-Omics) appears to be a logical next step for building the foundation of Systems Biology and Systems Medicine. Here, coherence of the individual Omics tracks regarding clinical hypothesis, sample and data set, and finally data handling and integration becomes pivotal. We present a data integration, annotation and relations modeling framework for embedding heterogeneous Omics data workflows. With molecular features at the center of all Omics, we link the cross-Omics annotation to a human molecular reference network, allowing for seamless integration and subsequent interpretation of screening results. Our concept rests on data structures for representing data objects specified by metadata and content. For handling diverse Omics tracks, a flexible structure for content is proposed, allowing data representation at different levels of granularity as demanded by the type of Omics and the specific type of data. Content on the molecular level includes deep annotation of molecular features on the gene and protein level. Based on this annotation, pair-wise relations between molecular objects are computed, transforming the molecular annotation into a network of relations (molecular feature graph). Such a relations network is also built on the Omics level, combining explicit relations derived from the study setup and implicit relations generated by mining metadata and content (Omics data graph). Finally, both graphs are merged utilizing the molecular feature level as common denominator, enabling a persistent integration and subsequent interpretation of cross-Omics data in the realm of a given clinical hypothesis.
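
For illustration only, the two graphs and their merge could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the framework's actual implementation: the names `DataObject`, `molecular_feature_graph`, `omics_data_graph` and `merge_graphs` are hypothetical, the rule "two features are related if they share an annotation term" is a simplified stand-in for the pair-wise relation computation, and "two data objects are implicitly related if they share a metadata key/value pair" is a simplified stand-in for metadata mining.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class DataObject:
    """Hypothetical Omics data object, specified by metadata and content."""
    name: str
    metadata: dict   # assumed: hashable values, e.g. {"sample": "plasma"}
    features: set    # content reduced to molecular feature identifiers


def molecular_feature_graph(annotation):
    """Relate two features when they share at least one annotation term.

    `annotation` maps a feature identifier to a set of annotation terms.
    The sharing rule is an assumption made for this sketch.
    """
    edges = set()
    for a, b in combinations(sorted(annotation), 2):
        if annotation[a] & annotation[b]:
            edges.add((a, b))
    return edges


def omics_data_graph(objects, explicit):
    """Combine explicit (study setup) and implicit (shared metadata) relations."""
    edges = {tuple(sorted(pair)) for pair in explicit}
    for x, y in combinations(objects, 2):
        if set(x.metadata.items()) & set(y.metadata.items()):
            edges.add(tuple(sorted((x.name, y.name))))
    return edges


def merge_graphs(objects, feature_edges, data_edges):
    """Merge both graphs, with molecular features as the common denominator.

    Edges are typed triples: feature-feature, data-data, and the
    'measures' links that connect a data object to its features.
    """
    merged = {("feature", a, b) for a, b in feature_edges}
    merged |= {("data", a, b) for a, b in data_edges}
    for obj in objects:
        for feature in obj.features:
            merged.add(("measures", obj.name, feature))
    return merged
```

In this sketch a toy call such as `merge_graphs(objs, molecular_feature_graph(annotation), omics_data_graph(objs, explicit=[]))` yields a single edge set in which data objects from different Omics tracks become connected through the features they measure, which is the role the molecular feature level plays in the merge described above.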

