
Similar to the concept of splicing graphs [Heber 2002], we employ a graph structure $G=(V,E)$ to represent the quantified reference transcriptome in a non-redundant data structure. Each edge $e=(tail,head,mode,T)$ represents a segment of an annotated pre-mRNA molecule by the genomic coordinates of its 5'-tail and 3'-head positions, by its type (exonic or intronic), and by the set $T$ of supporting transcripts (Definition 1).


Definition 1 (Segment Graph Properties): two adjacent edges $e \prec f$ in $G$ satisfy the following properties:

(i) they share the same intermediary splice site $s$ (adjacency): $head_e = tail_f = s$

(ii) together with the other edges at $s$, they describe the exon-intron structures of all transcripts spanning $s$ (completeness): $\bigcup_{i \in inedges(s)} T_i = \bigcup_{j \in outedges(s)} T_j$

(iii) they differ either in mode or in supporting transcripts (discrimination): $(mode_e \neq mode_f) \vee (T_e \neq T_f)$
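The three properties can be illustrated with a minimal Python sketch. It is not the implementation used here: the class `Edge`, the helper functions, the transcript names `t1` and `t2`, and all coordinates are hypothetical, and each splice site is simplified to a single integer coordinate.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for an edge e = (tail, head, mode, T)."""
    tail: int                     # genomic coordinate of the tail position
    head: int                     # genomic coordinate of the head position
    mode: str                     # "exonic" or "intronic"
    transcripts: FrozenSet[str]   # the set T of supporting transcripts


def is_adjacent(e: Edge, f: Edge) -> bool:
    # (i) adjacency: head_e = tail_f = s
    return e.head == f.tail


def is_complete(edges: Iterable[Edge], s: int) -> bool:
    # (ii) completeness: the transcripts entering s equal those leaving s
    edges = list(edges)
    t_in = frozenset().union(*(e.transcripts for e in edges if e.head == s))
    t_out = frozenset().union(*(e.transcripts for e in edges if e.tail == s))
    return t_in == t_out


def is_discriminated(e: Edge, f: Edge) -> bool:
    # (iii) discrimination: different mode or different supporting transcripts
    return e.mode != f.mode or e.transcripts != f.transcripts


# Toy locus (assumed data): t1 splices at 200, t2 extends the exon to 250.
both, t1, t2 = frozenset({"t1", "t2"}), frozenset({"t1"}), frozenset({"t2"})
a = Edge(100, 200, "exonic", both)
b = Edge(200, 300, "intronic", t1)
c = Edge(200, 250, "exonic", t2)
d = Edge(250, 300, "intronic", t2)
e = Edge(300, 400, "exonic", both)
edges = [a, b, c, d, e]
```

In this toy locus, `a` and `c` are both exonic but differ in their supporting transcripts, so discrimination holds for the pair. Completeness holds at the inner sites 200, 250 and 300, but not at 100 or 400, where one side has no edges.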


To ensure the properties of $G$ at the respective transcript ends, all transcription initiation sites are connected to an artificial source node, and all cleavage sites are connected to an artificial sink node [Sammeth 2008]. Once the segment graph $G$ has been constructed for a locus, the edge set $E$ describes the backbone of exonic segments and introns from the 5'-most transcription start to the 3'-most cleavage site, with additional introns and source and sink links that make it possible to navigate alternative transcripts (Fig. 1, panels A and B).
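As a rough illustration of this construction, the following Python sketch builds the edges of a toy locus from exon coordinates. It is a sketch under stated assumptions, not the actual construction procedure: the function `build_segment_graph`, the labels `SOURCE` and `SINK`, the mode name `"link"` for source and sink edges, and the example transcripts are all hypothetical. Strand is ignored, and each site is reduced to one integer coordinate.

```python
from collections import defaultdict

SOURCE, SINK = "source", "sink"  # hypothetical labels for the artificial nodes


def build_segment_graph(transcripts):
    """Build edges from a dict: transcript name -> sorted list of (start, end) exons.

    Returns a dict mapping (tail, head, mode) to the frozenset of
    supporting transcripts.
    """
    # Every exon boundary is a site at which exonic segments are split.
    sites = sorted({p for exons in transcripts.values() for iv in exons for p in iv})
    support = defaultdict(set)
    for name, exons in transcripts.items():
        # Link the transcription start to the source and the cleavage site to the sink.
        support[(SOURCE, exons[0][0], "link")].add(name)
        support[(exons[-1][1], SINK, "link")].add(name)
        # Split each exon at all sites that fall inside it.
        for start, end in exons:
            cuts = [p for p in sites if start <= p <= end]
            for tail, head in zip(cuts, cuts[1:]):
                support[(tail, head, "exonic")].add(name)
        # One intronic edge between each pair of consecutive exons.
        for (_, donor), (acceptor, _) in zip(exons, exons[1:]):
            support[(donor, acceptor, "intronic")].add(name)
    return {edge: frozenset(names) for edge, names in support.items()}


# Assumed toy locus: t2 uses a longer first exon than t1.
graph = build_segment_graph({
    "t1": [(100, 200), (300, 400)],
    "t2": [(100, 250), (300, 400)],
})
```

For this input the sketch yields seven edges: three exonic segments on the shared backbone and in the extended exon, two introns, and one link each to the source and the sink. With the two artificial links in place, the transcripts entering and leaving every site coincide, including the sites 100 and 400 at the transcript ends.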

...