March 2010 Hackathon

Before the NBIC Conference on 29/30 March, we organized a Hackathon on Friday 26 and Saturday 27 March. Results of the hackathon were presented in a lunch session on the second day of the conference. This presentation was filmed.

Pictures of the work during the hackathon

Pictures are available on Rob Hooft's photo site

Projects for the Hack-a-thon

Expose ConceptWiki to semantic web and show interconnection with other sources

Needs for task

  1. Content negotiation for a *simple* RDF-based resolution of ConceptWiki, so that results can be provided in RDF format
  2. New UI to view the concept information based on the RDF pages
  3. Mash-up of ConceptWiki RDF pages and Bio2RDF RDF pages
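
As an illustration of the first need, content negotiation could be sketched roughly as below. This is only a minimal sketch: the list of supported formats and the function names are assumptions made for the example, not part of the ConceptWiki code base, and partial wildcards such as text/* are ignored for brevity.

```python
# Hypothetical list of representations a concept page could be served in.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header, supported=SUPPORTED, default="text/html"):
    """Pick the representation to serve for the given Accept header."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

A browser would typically end up with the HTML view, while a semantic web client sending "Accept: application/rdf+xml" would get the RDF representation of the same concept URL.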

Responsibility

  • Scientific Lead: Andrew Gibson
  • Technical support: Kees Burger

Plans

First import the CWA format into the ConceptWiki, using only URLs represented in the ConceptWiki. The CWA format should be hosted on another site. Later, we should use Bio2RDF URLs and use the ConceptWiki to dereference those URLs.

Includes the following activities:

  • To be done before hackathon: Use an open source SQL2RDF tool, e.g., D2RQ, SquirrelRDF or OpenLink Virtuoso, to expose data.
  • In addition, we can make a small application showing how we resolve different URIs for the same concept via the ConceptWiki and perhaps the BridgeDb software from Maastricht
  • Marco is also interested in developing a workflow that creates the mapping between the RDF interface of the provider and the CWA format.
  • Demo for NBIC conference: If SPARQL allows us to pull very specific statements from the ConceptWiki triple store, we should show that some identifiers are each used to indicate several different concepts (ambiguous identifiers).
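
The ambiguous-identifier demo could be prototyped on the rows returned by such a query. The sketch below assumes the rows arrive as (identifier, concept) pairs; the sample data is made up and only stands in for real query results.

```python
from collections import defaultdict


def ambiguous_identifiers(mappings):
    """Return {identifier: sorted concepts} for identifiers linked to more than one concept."""
    concepts_by_id = defaultdict(set)
    for identifier, concept in mappings:
        concepts_by_id[identifier].add(concept)
    return {i: sorted(c) for i, c in concepts_by_id.items() if len(c) > 1}


# Made-up sample rows, standing in for the result of a query on the triple store.
sample = [
    ("ID:001", "concept/A"),
    ("ID:001", "concept/B"),
    ("ID:002", "concept/C"),
    ("ID:002", "concept/C"),
]
```

Here only ID:001 would be reported, because ID:002 points to the same concept twice.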

Action Points:

  1. Get D2RQ, SquirrelRDF and OpenLink Virtuoso into local code base
  2. Set up a sandbox DB server containing the latest ConceptWiki DB
  3. Get a workflow management software into local code base
  4. Select a few external data providers that support RDF, e.g., Bio2RDF
  5. Add OntoCAT (http://ontocat.sourceforge.net) wrapper for ConceptWiki using above (Morris, Despoina)
  6. Generate MOLGENIS frontend, including D2RQ interface, on top of ConceptWiki DB (Morris, Despoina)

In-text concept recognition with linking to ConceptWiki and triple stores

Responsibilities

In-text highlighting of disambiguated triples

Erik's tool supports highlighting of triples in text and shows the connectedness of concepts within a context. We would like to link the highlighted concepts to ConceptWiki and then link them to his triple store to determine if the identified triple is already present in the triple store or is a new triple.

Action Points (optional actions are given in square brackets)

  1. Erik's source code, APIs, documents have to be clear and ready in the code base.
  2. ConceptWiki needs to expose a clear API. An easier alternative here is a little demo triple store containing a set of (~100) protein-protein interactions identified using Herman's software.
  3. The concept IDs need to be consistent for Nigam's and Erik's tools (ConceptWiki UUID?), or, if BridgeDb is in place, Nigam and Erik can (or should, for demo purposes?) use different URIs for the same concept.

Actions

  • Develop a web service that receives the fingerprint of one page and computes co-occurrence pairs of concepts within a sentence; use the code developed for the CWA prototype as example code (Dmitry)
  • The same web service will use Knewco's SOLR database with health relations to check whether ambiguous concepts can be disambiguated and, subsequently, whether the co-occurrence pairs already exist in the SOLR database (triple store). Need to check whether the DB may be used in the hackathon context
  • Return a JSON message that contains triples and for each triple whether it is new or already defined in the triple store (and then provide information on the type of the relation and its weight) (Erik)
  • Extend the linker so that new relations/triples for a concept are shown in the linker, with a checkbox for each relation (Bharat)
  • Integrate concept recognition feature into MOLGENIS using OntoCAT as concept source (Despoina, Morris)
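
The first three actions could be sketched as follows. This is a rough sketch under stated assumptions: the fingerprint is taken to be a list of sentences, each a list of concept ids, and the known relations are passed in as a plain dictionary instead of being looked up in the SOLR database. The JSON field names are hypothetical.

```python
import json
from itertools import combinations


def cooccurrence_pairs(sentences):
    """Return the set of unordered concept pairs that share a sentence."""
    pairs = set()
    for concepts in sentences:
        # sorted(set(...)) removes repeats and gives each pair a fixed order.
        for a, b in combinations(sorted(set(concepts)), 2):
            pairs.add((a, b))
    return pairs


def annotate_pairs(pairs, known):
    """Build a JSON message marking each pair as new or already known.

    known maps an ordered pair to a dict with "type" and "weight"
    (a stand-in for a lookup in the triple store).
    """
    result = []
    for a, b in sorted(pairs):
        entry = {"subject": a, "object": b, "status": "new"}
        relation = known.get((a, b))
        if relation is not None:
            entry.update(status="known", type=relation["type"], weight=relation["weight"])
        result.append(entry)
    return json.dumps({"triples": result}, indent=2)
```

The linker could then show a checkbox only for the entries whose status is "new".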

Optional Steps: To be done if time allows

  • [Make available the thesaurus that has been used by Knewco for the health relations. Add to the thesaurus the protein concept identifiers used by Herman (Marc)]
  • [Add Herman's relations for proteins to the SOLR database (Erik/Herman)]
  • [If checked, relations can be submitted back to the SOLR service and added to the SOLR database/triple store (Bharat/Erik)]

In-text concept recognition (MSWord and Excel) of unique or new triples

Proposed by Barend, Erik, and Nigam. Show the triple store derived information in MS Word or Excel. Nigam indicated they have an Excel plug-in whose code base could be used. An MS Word plug-in may be ready by the time of the hackathon, which could also be used. If necessary, there is already an MS Word plug-in developed by Microsoft and UCSD that recognizes concepts using ontologies from NCBO (the concept recognition is done within Word and the output is smart tags markup in Word).

Use scenario

  • Two concepts are recognized in a text (or Excel spreadsheet).
  • It is indicated whether these concepts have a previously determined relationship (as contained in the DB) or whether the relationship is new.

Optional

  • User can indicate new relationships that should be added to the DB

Action Points

  1. Get the triple store (from Erik?) ready: source code, APIs, documents have to be clear and ready in the code base.
  2. Get a text editor, e.g., MS Word (or Excel), that supports highlighting and annotation popups. Get the relevant user manuals (if available) as well.

TO DO

Christine will send roughly 10 sentences that contain concepts for which triples exist in Erik's triple store. Erik (?) will send (or have ready at the hackathon) the service signature (preferably RESTful) that Excel or Word can POST to. Nigam will do a dry run of:

  • recognizing concepts (using the NCBO Annotator service) from sentences sent by Christine
  • composing a service request for "check for a triple b/w these concepts"
  • obtaining a response from the service and displaying it (in a browser or within Excel/Word).
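
Since the service signature still has to be provided, the request step of the dry run can only be sketched under assumptions. The URL and the JSON field names below are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real service signature is still to be provided.
SERVICE_URL = "http://localhost:8080/triples/check"


def build_triple_check_request(concept_a, concept_b, url=SERVICE_URL):
    """Build (but do not send) a POST request asking whether a triple exists."""
    body = json.dumps({"concepts": [concept_a, concept_b]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of passing the request to urllib.request.urlopen once a real endpoint is available.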

Enriching PDFs with the ConceptWiki

Responsible Persons

  • Scientific Lead: Steve Pettifer
  • Technical support: Leon, Dmitry

Problem Statement

Many open access resources provide full text articles in PDF format. It would be great to highlight and link the concepts in PDF files to the ConceptWiki. Steve Pettifer at OMIIUK has an incredible de-PDFying tool and we might be able to do this with PDFs. To carry this further, it would be useful if links in the de-PDFified documents could then show the popup from the Knowledge enhancer for better usability.

Steve's tool can convert PDFs to XML documents. We would then like to index these documents with Peregrine to identify the concepts. The concepts can then be linked to the ConceptWiki via the RDF, which will in turn connect the concepts in the PDFs to the semantic web via the ConceptWiki. Adding the Knowledge enhancer popup back to the highlighted, connected concepts in the PDFs would also give more functionality to the user.

Plans

  1. Prepare Peregrine, Knowledge enhancer and ConceptWiki developer documents
  2. Get "the de-PDFying tool" to a local code base.
  3. Identify a few data resources that provide full text articles in PDF format

Participants

(maximum 20)

Please indicate after your name whether you have registered and will be attending the NBIC conference

  • Kees Burger (NBIC) (confirmed, registered for the NBIC conference)
  • Rob Hooft (NBIC) (confirmed, Staying in Utrecht Friday night, conference:yes)
  • Dmitry Katsubo (NBIC) (confirmed)
  • Hailiang (Leon) Mei (NBIC, confirmed, registered for the NBIC conference)
  • Erik van Mulligen (confirmed Friday/most of Sat, EMC/Knewco)
  • Bharat Singh (NBIC) (confirmed, registered for conference: yes)
  • Nigam Shah (Stanford, will attend by skype)
  • Morris Swertz (RUG/Lifelines/NBIC)(confirmed, need a room Friday night)
  • Despoina Antonakaki (RUG/LifeLines/NBIC) (confirmed, friday only)
  • Valery Tkachenko (Chemspider) (will not attend but may participate remotely)
  • Marco Roos (LUMC)(confirmed)
  • Scott Marshall (Stanford/LUMC/W3C/HCLS)
  • Marc Weeber (Knewco)?
  • Steve Pettifer (OMIIUK, confirmed, staying in Utrecht)
  • Phil McDermott (OMIIUK, confirmed, staying in Utrecht)
  • Paul Groth (VU) (Friday, not at NBIC conference)
  • Matthias Samwald (will participate remotely / via Skype, if this is possible)
  • Andrew Gibson (UvA, confirmed, not currently registered for NBIC conference)
  • Andrew Su (GNF, apologies, will not be able to attend)
  • Martijn van Iersel (UNIMAAS) ?
  • Andra Waagmeester (UNIMAAS) (confirmed)
  • Herman van Haagen (LUMC, no hackathon, oral presentation on Monday during NBIC meeting)
  • Martijn Schuemie ?
  • Erik Schultes Duke & LUMC, registered and attending the NBIC conference

observers:

  • Barend Mons (LUMC/NBIC)
  • Frank van Harmelen ?
  • Anita de Waardt ?
  • Christine Chichester : (confirmed, need room for Friday and reg for NBIC conference)

Agenda & Location

Friday March 26

Location: SURF Foundation (next to Central Station) Google map

Hojel City Center, gebouw D (5th floor)
Graadt van Roggenweg 340
3531 AH Utrecht
Postbus 2290
3500 GG UTRECHT
T 030 234 66 00
F 030 233 29 60
E info@surf.nl

Time slot     Responsible   Type   Subject
9:30-9:45     Barend Mons   -      Welcome and purpose
9:45-10:15    Rob Hooft     -      Introductions
12:00         -                    Lunch
18:00-21:00   -             -      Dinner at Poort van Kleef

Address: Mariaplaats 7, 3511 LH Utrecht. Phone: +31 (0)30 231 80 84. Website: Poort van Kleef (Map). Reservation at 18:00 for 15 people under the name Van Beemen (if there are more or fewer, they need to know before noon on Friday).

Saturday March 27

Location: alternative location, across the road from SURF, Hogeschool Domstad, Koningsbergerstraat 9, Utrecht, Map


Time slot     Responsible   Type   Subject
9:30-9:45     Barend Mons   -      Welcome and purpose
12:00         -                    Lunch
16:00-17:00   -             -      Preparation for the conference session

Hotel information

Hotel rooms booked for:

Hotel information:
NH Hotel
Jaarbeursplein 24
3521AR Utrecht
Tel: +31.30.2977977
Fax: +31.30.2977999
E-mail: nhutrecht@nh-hotels.com
Website reservation number: please ask Jacintha Valk-van Beemen via office@nbic.nl

Attention

  • Try to get to know each other in advance, e.g., check CV, profiles, and technical level of each other.
  • Identify goals very precisely and communicate them to the attendees in advance. It is best if attendees can share their design and implementation ideas based on the identified goals prior to the Hackathon.
  • Make sure the HW/SW/Network are ready.

Final Checklist

  • location (done)
  • (wireless) network connection (available)
  • 3 additional laptops for the video skype sessions with remote participants
    1. I can bring one extra netbook with integrated wifi&webcam. --Leon 10:52, 21 March 2010 (CET)
    2. Dmitry is taking a second laptop. --Rob Hooft 09:11, 25 March 2010 (CET)
  • a sandbox environment/server (Rob Hooft/BioAssist). Done: I have a rackspacecloud account and will be able to create sandbox servers on demand in minutes with anywhere between 256MB and 16GB of memory and any of 15 different Linux systems installed. --Rob Hooft 21:23, 2 March 2010 (CET)
  • a local code base
  • drinks and food (done)
  • reservation hotel rooms (done)
  • determine who will stay on for NBIC conference (done)

Evaluation from the BioAssist Engineering Team

What Went Well

  • Exciting results
  • Great people
  • Dinner
  • Well planned division of tasks
  • Remaining freedom for project assignments
  • Focussed development

Take A Look At

  • Development capacity to follow up on the results
  • The end of the second day died out slowly
  • People that could not stay both full days
  • Taking up a Saturday
  • Preparations took a lot of unplanned time, putting our Agile cycle in jeopardy
  • Do not stick to the semantic focus next time: either more diverse subjects, or another focus.