Coordinator: Christine Chichester
There is no shortage of experimental data and knowledge in the life sciences. However, it is siloed in databases, scientific literature, and the minds of scientists. Any locally performed reasoning process, either computationally by computers or conceptually by humans, will miss potentially relevant data, making serendipitous findings less likely. Numerous projects have tried to address this problem by data integration, but have only marginally succeeded. We are now able to realistically propose a fundamentally different system for enhanced knowledge management. Using Semantic Web technologies, we will generate an interoperable, interdisciplinary catalog of unique scientific assertions assembled from previously documented data.
The project involves the classic problems of dealing with heterogeneous data and making the entire collection interoperable, while ensuring that any annotation, which includes the recognition-and-reward system of scientific publishing, fits into a seamless end-to-end process. The challenge is to create a system that manages heterogeneity and provides interoperability using open and extensible standards and methodologies. The ultimate goal is to create a sustainable future for a large-scale, community-editable store of disambiguated scientific assertions, exemplifying a new paradigm in life sciences data accumulation. We will draw from the mental resources of an extended scientific community in an innovative and complex, yet ‘daily practice’, manner that promises a profound impact on our ability to use existing data to generate new knowledge with the maximum conceivable serendipity.
Initially, the Interoperability Task Force (ITF) will focus on participating in the development of the Open PHACTS project. This project will focus on developing an open source and open access platform via a semantic web-based approach. The semantic integration hub, named the Open Pharmacological Space (OPS), will concentrate on the task of building coherent services guided by well-defined research questions assembled from the participating researchers. As the ITF gains momentum, facilitating data sharing between the different NBIC task forces such as Next Generation Sequencing and Biobanking will be a primary goal.
To make this approach possible, the team consists of participants from the disciplines of biology, chemistry, and computer science. We will draw on the knowledge and the capacities of the consortium assembled by the Concept Web Alliance (CWA), the World Wide Web Consortium (W3C), and the NBIC as a whole to form a strong backbone for a broad range of work, including coordinating collaborations between diverse expert communities, facilitating knowledge transfer, building consensus on technical directions (metadata, Web services, component architecture), aligning interfaces with already existing community practices, and training user populations. The CWA was recently chartered to enhance existing information exchange by developing open-platform protocols, data formats, workflow tools, and semantic integration to overcome existing legacies and information bottlenecks. In this capacity, the CWA has demonstrated a leadership role in obtaining service- and application-level agreements for public data and has established consensus expertise in the area of semantic frameworks. Both aspects represent key factors necessary for establishing standards that will sustain the processes required for mediating large-scale data interoperability. Drawing on its diverse membership in academia and private enterprise, the CWA is uniquely positioned as a trusted agent to mediate this unprecedented confederation of existing public and private information. The CWA is firmly committed to the application of open source / open access principles to biomedical data through its trusted public-private partnerships.
The CWA includes many international groups and projects: NBIC, Stanford National Center for Biomedical Computing (NCBO), The Large Knowledge Collider (LarKC), Swiss Institute of Bioinformatics (SIB), Bireme, Indiana University, iCAPTURE, Leiden University Medical Center (LUMC), and many individual members from various institutions and companies. As participating members of the CWA, we are well aware of the many projects carried out by groups within the alliance that touch on aspects of interoperability. We do not want to reinvent the wheel, so we will strive to implement, together with the creators, the knowledge arising from their individual projects. The strong foundation of the CWA gives us a support system that extends well beyond what could be achieved individually.
An organizational set-up meeting with the principal investigators is scheduled for January 6, 2011. Other investigators who have not been explicitly contacted about this meeting but have some interest should contact Christine.
The ConceptWiki is a working prototype built on the WikiData software and originates from earlier Wiki predecessors. The ConceptWiki is a web-based system containing the biomedical terminology of the Unified Medical Language System (UMLS, levels 0 and 1), mapped where appropriate to the protein terminology from SwissProt. The ConceptWiki repository will be continually expanded with data from many collaborating groups, such as chemical terminology from ChemSpider. Each concept is annotated with one or more semantic types and basic information, such as a definition. Users can view and edit information through a uniform interface. The information in the system is stored and edited in a highly structured way, as triples (e.g. <concept A> <has synonym> <term B>). This structure is compatible with other information storage systems, which enables higher-level applications to easily query, summarize, and mine the knowledge. The WikiData backend has been designed to support the storage of concepts in a very generic form, so as to avoid, as far as possible, excluding potentially valuable information sources.
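The triple structure described above can be illustrated with a minimal in-memory sketch. It is not the WikiData backend: the `TripleStore` class, its `match` method, and the identifiers such as `concept:A` are hypothetical names chosen for this example only.

```python
from collections import namedtuple

# One assertion: <subject> <predicate> <object>.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    """Toy in-memory store of triples (hypothetical, for illustration only)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add("concept:A", "has synonym", "term B")
store.add("concept:A", "has semantic type", "Protein")
store.add("concept:C", "has synonym", "term D")

# All synonyms recorded for one concept.
synonyms_of_a = [t.obj for t in store.match(subject="concept:A", predicate="has synonym")]

# All concepts that have at least one synonym.
concepts_with_synonyms = sorted({t.subject for t in store.match(predicate="has synonym")})
```

Because every fact has the same three-part shape, a single wildcard query function is enough to answer both questions, which is the property that makes such data easy for higher-level applications to query and mine.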
WikiData keeps a complete history containing every change made to the concepts. Changes can be analyzed via the history page of each concept or the global chronological transaction log. Transactions can be rolled back partially or completely. Using the ConceptWiki interface, scientists with no background in programming can directly map, merge, or integrate individual concepts. The ConceptWiki supports the distinction between ‘authority’ and ‘community’ data and permits general editing only on the community branch of the data. This distinction is the highly innovative aspect that convinces authorities that it is prudent to donate and integrate their data into the system. Additionally, comparing the authority branches with the community branches allows users to make their own value judgments about the displayed data.
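The combination of a read-only authority branch, an editable community branch, and a rollback-capable change log can be sketched as follows. This is a simplified model under stated assumptions, not the WikiData implementation: the `ConceptHistory` class and its methods are hypothetical, and unlike a complete history this toy version discards log entries once they are rolled back.

```python
class ConceptHistory:
    """Toy change log for one concept with authority and community branches."""

    def __init__(self, authority_data):
        # The authority branch is never modified; the community branch starts as a copy.
        self._authority = dict(authority_data)
        self._community = dict(authority_data)
        self._log = []  # entries are (field, old_value, new_value)

    @property
    def authority(self):
        return dict(self._authority)

    @property
    def community(self):
        return dict(self._community)

    def edit(self, field, new_value):
        """Record an edit and apply it to the community branch only."""
        old_value = self._community.get(field)
        self._log.append((field, old_value, new_value))
        self._community[field] = new_value

    def rollback(self, steps=None):
        """Undo the last `steps` edits, or every edit if `steps` is None."""
        steps = len(self._log) if steps is None else min(steps, len(self._log))
        for _ in range(steps):
            field, old_value, _new_value = self._log.pop()
            if old_value is None:
                # The field did not exist before this edit (assumes None is never a real value).
                self._community.pop(field, None)
            else:
                self._community[field] = old_value

    def diff(self):
        """Map each differing field to its (authority, community) values."""
        keys = set(self._authority) | set(self._community)
        return {
            k: (self._authority.get(k), self._community.get(k))
            for k in keys
            if self._authority.get(k) != self._community.get(k)
        }


history = ConceptHistory({"definition": "original definition"})
history.edit("definition", "community revision")
history.edit("synonym", "term B")

history.rollback(steps=1)          # partial rollback: only the synonym edit is undone
after_partial = history.community
diff_partial = history.diff()

history.rollback()                 # complete rollback: community matches authority again
after_full = history.community
```

The `diff` method corresponds to the comparison between branches mentioned above: a reader can see exactly where community edits depart from the donated authority data.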
The prototype thesauri extractor is the application for freely downloading terminology systems for specific purposes or domains from the ConceptWiki. The downloaded thesauri can be used to identify concept-denoting tokens in text and databases so that individual indexers can be linked to the concepts to create a linked open data system.
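As a rough illustration of extracting a domain-specific thesaurus, the sketch below filters concepts by semantic type. The record layout, the `extract_thesaurus` function, and the sample identifiers are assumptions made for this example; the actual extractor's data format and options are not described here.

```python
def extract_thesaurus(concepts, semantic_types):
    """Return {concept_id: sorted terms} for concepts with any wanted semantic type."""
    wanted = set(semantic_types)
    return {
        concept_id: sorted(record["terms"])
        for concept_id, record in concepts.items()
        if wanted & set(record["semantic_types"])
    }


# Hypothetical concept records; real ConceptWiki data would be far richer.
concepts = {
    "concept:1": {"semantic_types": ["Protein"], "terms": ["term B", "term A"]},
    "concept:2": {"semantic_types": ["Disease"], "terms": ["term C"]},
    "concept:3": {"semantic_types": ["Protein", "Enzyme"], "terms": ["term D"]},
}

protein_thesaurus = extract_thesaurus(concepts, ["Protein"])
```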
The Knowledge Enhancer ITS is an application that provides a graphical interface for displaying the assertions that result from running text through the indexing system (Peregrine engine with the ConceptWiki thesaurus). Currently, the Knowledge Enhancer ITS exploits the ConceptWiki to recognize concepts on the fly in any website text. A variety of functions can be invoked when a highlighted concept-denoting term in the text is clicked. Each unambiguous term detected by the Knowledge Enhancer ITS is directly linked to the concept it denotes in the ConceptWiki, and therefore all information accessible in the ConceptWiki can be called upon in the popup. Presently, the NBICCentral site applies the Knowledge Enhancer technology to academic repositories of Dutch universities. Eventually, this application will also be developed as a browser plugin to allow scholars logged in through any university system to semantically browse text that is only available behind their firewalls.
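The idea of linking only unambiguous terms to concepts can be sketched with a simple dictionary-based tagger. This is a stand-in for illustration, not the Peregrine engine: the functions `build_term_index` and `tag_text`, the case-insensitive matching, and the sample thesaurus are all assumptions.

```python
import re


def build_term_index(thesaurus):
    """Map each lowercased term to the set of concept ids it can denote."""
    index = {}
    for concept_id, terms in thesaurus.items():
        for term in terms:
            index.setdefault(term.lower(), set()).add(concept_id)
    return index


def tag_text(text, thesaurus):
    """Return sorted (start, end, matched_text, concept_id) hits for unambiguous terms."""
    index = build_term_index(thesaurus)
    hits = []
    # Try longer terms first so multi-word terms win over the shorter terms they contain.
    for term in sorted(index, key=len, reverse=True):
        concept_ids = index[term]
        if len(concept_ids) != 1:
            continue  # ambiguous term: leave it unlinked
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            overlaps = any(m.start() < end and start < m.end() for start, end, _, _ in hits)
            if not overlaps:
                hits.append((m.start(), m.end(), m.group(0), next(iter(concept_ids))))
    return sorted(hits)


# Hypothetical thesaurus; "cat" is deliberately ambiguous between two concepts.
thesaurus = {
    "concept:insulin": ["insulin"],
    "concept:insr": ["insulin receptor"],
    "concept:x": ["CAT"],
    "concept:y": ["cat"],
}

hits = tag_text("The insulin receptor binds insulin in the cat.", thesaurus)
linked = [(matched, concept_id) for _, _, matched, concept_id in hits]
```

Each hit carries the character offsets of the matched term, which is what an interface would need in order to highlight the term and attach a popup that pulls information from the linked concept.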