Linker/Architecture

Introduction

Linker is a semantic tagging engine for HTML pages. The Popup application currently provides users with the following functionality:

  • Fetching the requested page and rewriting its URLs
  • Parsing, indexing and tagging the HTML with concepts from one or more ontologies
  • Displaying further information about a concept in a GUI

Overview

Retrieve Web Page

All content that can be indexed must be fetched through the reverse proxy mechanism. For example, a user request for a Google or PubMed search page is routed through the linker using nph-proxy. The proxy rewrites all URLs in the content, so any further links the user clicks also go through the linker's reverse proxy.

Popup Reverse Proxy.png
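
The link rewriting described above can be illustrated with a minimal Python sketch. It is not the nph-proxy implementation: the proxy prefix, the `url=` query parameter and the function name `rewrite_links` are all hypothetical, and a regular expression stands in for a real HTML parser.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical proxy entry point; the real nph-proxy URL scheme may differ.
PROXY_PREFIX = "https://linker.example.org/proxy?url="

# Matches href="..." and src="..." attributes (single or double quoted).
_LINK_ATTR = re.compile(r"""\b(href|src)=(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_links(html, page_url, proxy_prefix=PROXY_PREFIX):
    """Return html with every link pointing back through the proxy."""

    def _rewrite(match):
        attr, quote_char, target = match.groups()
        # Leave in-page anchors and non-HTTP schemes untouched.
        if target.startswith(("#", "javascript:", "mailto:")):
            return match.group(0)
        # Resolve relative links against the URL of the fetched page.
        absolute = urljoin(page_url, target)
        proxied = proxy_prefix + quote(absolute, safe="")
        return f"{attr}={quote_char}{proxied}{quote_char}"

    return _LINK_ATTR.sub(_rewrite, html)
```

Because every rewritten link carries the original target as an encoded parameter, a click on any link in the returned page produces another request to the proxy rather than to the original site.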

Parse and index web page (highlight concepts)

Popup Indexing.png

  1. JavaScript parses the content of the web page (see: text parsing) and divides it into chunks whose size is defined in the configuration files (see: configuration).
  2. These chunks are sent to the Python backend (HTTP POST).
  3. The Python layer forwards the request to the Peregrine Web App (HTTP POST).
  4. The Peregrine Web Service invokes the Peregrine indexer on these chunks (in memory).
  5. Peregrine returns the indexing information to the Web App.
  6. The Web App iterates over all the concepts and looks up their mapping information in its cache. If a concept is not found in the cache, the information is retrieved from the mapping service (RMI).
  7. The mapping service retrieves the information for the concept, and the Web App creates an XML response in a format understood by the linker (see fingerprint.xsd):
<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="concept">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="name"/>
                <xs:element ref="semantictypes"/>
                <xs:element ref="word" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:string" use="required"/>
            <xs:attribute name="rank" type="xs:double" use="required"/>
            <xs:attribute name="freq" type="xs:int" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="concepts">
        <xs:complexType>
            <xs:attribute name="count" type="xs:int" use="required"/>
            <xs:attribute name="clusters" type="xs:int" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="fingerprint">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="concepts"/>
                <xs:element ref="lineslist"/>
                <xs:element ref="concept" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="line">
        <xs:complexType>
            <xs:attribute name="startpos" use="required"/>
            <xs:attribute name="length" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="lineslist">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="line" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute name="count" type="xs:int" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="semantictype">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="xs:string">
                    <xs:attribute name="id" use="required"/>
                    <xs:attribute name="group" type="xs:string" use="required"/>
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
    </xs:element>
    <xs:element name="semantictypes">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="semantictype" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="word">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="xs:string">
                    <xs:attribute name="clid" type="xs:int" use="required"/>
                    <xs:attribute name="pos" type="xs:int" use="required"/>
                    <xs:attribute name="len" type="xs:int" use="required"/>
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
    </xs:element>
</xs:schema>
  8. This XML is passed back to the Python layer (HTTP).
  9. The Python layer simply passes the XML packet on to the browser (HTTP).
  10. The JavaScript running in the browser parses the XML, locates the text identified as concepts, and highlights it in the page.
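
The last step of this flow, reading a fingerprint response and marking the matched text, can be sketched as follows. In the linker this is done by JavaScript in the browser; Python is used here only for illustration. The sample response is invented to fit fingerprint.xsd (the concept ID, semantic type and positions are made-up values), and treating `pos` and `len` as a character offset and length into the submitted chunk is an assumption, since the schema does not say how they are measured. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Text of one chunk as it would have been sent for indexing (invented example).
SAMPLE_TEXT = "Aspirin reduces pain; low-dose aspirin helps."

# Invented response shaped after fingerprint.xsd; all values are illustrative.
SAMPLE_RESPONSE = """
<fingerprint>
  <concepts count="1" clusters="1"/>
  <lineslist count="1">
    <line startpos="0" length="45"/>
  </lineslist>
  <concept id="C0001" rank="0.75" freq="2">
    <name>aspirin</name>
    <semantictypes>
      <semantictype id="T121" group="CHEM">Pharmacologic Substance</semantictype>
    </semantictypes>
    <word clid="1" pos="0" len="7">Aspirin</word>
    <word clid="1" pos="31" len="7">aspirin</word>
  </concept>
</fingerprint>
"""


def parse_fingerprint(xml_text):
    """Return sorted (pos, len, concept_id) tuples, one per <word> element."""
    root = ET.fromstring(xml_text.strip())
    spans = []
    for concept in root.findall("concept"):
        concept_id = concept.get("id")
        for word in concept.findall("word"):
            spans.append((int(word.get("pos")), int(word.get("len")), concept_id))
    return sorted(spans)


def highlight(text, spans):
    """Wrap each matched span as [text|conceptId]; overlapping spans are skipped."""
    pieces = []
    cursor = 0
    for pos, length, concept_id in spans:
        if pos < cursor:
            continue  # overlaps a span that was already marked
        pieces.append(text[cursor:pos])
        pieces.append(f"[{text[pos:pos + length]}|{concept_id}]")
        cursor = pos + length
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Calling `highlight(SAMPLE_TEXT, parse_fingerprint(SAMPLE_RESPONSE))` marks both occurrences of the concept. A browser implementation would wrap the spans in DOM elements instead of brackets, but the position arithmetic would be the same under the stated assumption.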

Display concept information (show popup)

The concept information display is based on the information retrieved during indexing. Indexing provides the JavaScript code with a “conceptId”, which is mapped to the “conceptId” that the definition service understands. When the user clicks an identified concept, the popup initially shows only the definition. The information for the other popup headers is retrieved through AJAX calls when the corresponding header is clicked.
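
The loading behaviour described above (definition first, other headers on demand) can be sketched as a small model. The class name, the `id_map` dictionary, the `fetch_section` callable and the header name used in the usage note are all hypothetical stand-ins; the actual popup does this in JavaScript with AJAX calls.

```python
class ConceptPopup:
    """Holds popup content for one concept, fetching sections lazily."""

    def __init__(self, concept_id, id_map, fetch_section):
        # Map the indexing conceptId to the ID the definition service expects.
        self.service_id = id_map[concept_id]
        self._fetch = fetch_section
        # Only the definition is loaded when the popup is first shown.
        self._sections = {
            "definition": fetch_section(self.service_id, "definition")
        }

    def section(self, header):
        """Return the content for a header, fetching it on first access only."""
        if header not in self._sections:
            self._sections[header] = self._fetch(self.service_id, header)
        return self._sections[header]
```

With a `fetch_section` that performs the remote call, constructing the object triggers exactly one request (for the definition), and each further header, for example a hypothetical `"literature"` header, costs one request the first time it is opened and none afterwards.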

Structure

The folder structure of the popup application is:

Popup Folder structure.png

Architecture

An architecture overview:

Popup Interaction.png

Technologies

  1. PHP
  2. Python
  3. JavaScript (jQuery library)