Semantic hierarchies are increasingly present in, and necessary for, knowledge discovery and knowledge systems management. They are most visible as large, hand-coded, taxonomically structured controlled vocabularies, the pre-eminent examples of which are WordNet and the Gene Ontology (GO). But semantic hierarchies also appear as data typing hierarchies, class hierarchies in object-oriented meta-models, and verb typing hierarchies in computational linguistics. And while they play central roles as the "taxonomic cores" of ontologies, they also emerge as the results of other analytical processes, for example as the search spaces of hypotheses of multivariate models, the strongly connected hierarchical "backbones" of directed graphs, and concept lattice representations of relational databases. So for tasks such as ontology induction, interoperability, and ontological classification, it is essential that these hierarchical structures be treated as first-class data objects with coherent algorithms and efficient representations.
The proper mathematical grounding for these hierarchical representations is order theory, the theory of ordered sets and lattices. In this talk we will survey our research program in measures on directed acyclic graphs (DAGs), partially ordered sets (posets), and lattices, with applications primarily in computational biology using the GO. Our techniques take a metric approach to posets, with a corresponding reworking of such concepts as semantic similarity and evaluation measures (precision and recall), based on semi-modular functions on valuated posets. After introducing the mathematical foundations, we will outline their use in automated ontological protein function annotation, GO visualization, and the analysis of GO structure and annotation. We will conclude with a discussion of future work in ontology induction and alignment.
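As a rough illustration of what a metric approach to posets can look like, the sketch below treats a small hierarchy as a DAG and measures the distance between two nodes by the size of the symmetric difference of their ancestor sets (up-sets). This is a simple stand-in chosen for clarity, not the semi-modular measures presented in the talk; the toy hierarchy and the names `PARENTS`, `up_set`, and `distance` are hypothetical.

```python
# Hypothetical toy hierarchy (not real GO terms): node -> set of direct parents.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "kinase": {"catalysis"},
    "dna_kinase": {"dna_binding", "kinase"},  # multiple inheritance: a DAG, not a tree
}


def up_set(node, parents=PARENTS):
    """Return the node together with all of its ancestors in the DAG."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return frozenset(seen)


def distance(a, b, parents=PARENTS):
    """Size of the symmetric difference of the two up-sets.

    Distinct nodes of a poset have distinct up-sets, so this is a genuine
    metric (zero only for identical nodes, symmetric, triangle inequality).
    It is an assumption made for illustration, not the talk's measure.
    """
    return len(up_set(a, parents) ^ up_set(b, parents))


if __name__ == "__main__":
    print(distance("dna_binding", "kinase"))
    print(distance("dna_kinase", "dna_binding"))
```

Because the distance is computed from up-sets rather than from path lengths, it is unaffected by a node having several parents, which is one reason order-theoretic measures suit DAG-structured vocabularies.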
There are no site access restrictions to the UCB Faculty Club.