Human Computation Hackathon project

From BioAssist

This page is used to identify and prepare the NBIC Human Computation tasks for the NBIC 2011 Hackathon.

Background

The goal of this project is to develop a task for Amazon's Mechanical Turk site that uses human intelligence to assign biological concepts to biological laboratory supplies.

Automatic methods have so far proved poor at assigning relevant concepts, such as "genomic imprinting", to products such as a "DNA methylation kit". The most efficient way to accomplish this task appears to be to draw on people's knowledge.

Existing software (Knowledge Enhancer) can show additional information when concepts are detected in a text. When a concept is detected, a popup window can be called that displays information pertinent to that concept. Showing very specific product information in this popup window is one possibility, but only if the product has first been assigned to the correct specific concept.
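To illustrate why the product-to-concept assignment matters, here is a minimal sketch of how a popup could look up products once a concept has been detected in a text. It is not Knowledge Enhancer's actual interface: the `PRODUCTS_BY_CONCEPT` mapping, the simple word-boundary matching and the function names are all hypothetical.

```python
import re

# Hypothetical assignments produced by the mturk task:
# concept term (lower case) -> products assigned to that concept.
PRODUCTS_BY_CONCEPT = {
    "genomic imprinting": ["DNA methylation kit"],
    "western blotting": ["filter paper"],
}


def detect_concepts(text, concepts):
    """Return the concepts whose term occurs in the text as a whole phrase, ignoring case."""
    lowered = text.lower()
    # \b restricts matches to word boundaries, so a term never matches inside a longer word
    return [c for c in concepts if re.search(r"\b" + re.escape(c) + r"\b", lowered)]


def popup_content(text):
    """Collect, for each detected concept, the products a popup could display."""
    return {c: PRODUCTS_BY_CONCEPT[c] for c in detect_concepts(text, PRODUCTS_BY_CONCEPT)}


print(popup_content("Loss of genomic imprinting was observed in the tumour samples."))
# → {'genomic imprinting': ['DNA methylation kit']}
```

Without the assignment in the mapping, a detected concept has nothing product-specific to show, which is the gap the human computation task is meant to fill.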

Participants

  • Christine Chichester
  • Andra Waagmeester
  • Kees Burger
  • Alec Tunbridge
  • Reinout van Schouwen
  • Bharat Singh
  • Jahn Saito

Preparatory Data Tasks

The different methods need slightly different preparatory tasks.

Pre-selected Concept List Method

  1. Get frequency of ConceptWiki concepts in PubMed
    1. Index PubMed with Peregrine and compute concept frequencies with the frequency scripts developed by Leon
  2. Decide on frequency thresholds for concepts
  3. Cluster/categorize concepts of desired frequency
    1. Group on semantic types
  4. Find concept categories that correspond to products
    1. Templates of semantic type relations are already defined in the UMLS Semantic Network
      1. "Molecular Biology Research Technique" "uses" "Research Device"; for example, "Western blotting" "uses" "filter paper"
  5. Find semantic types that represent products
  6. Limit the number of concepts in each category
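Steps 2 to 6 can be sketched as a single filtering pass, assuming the Peregrine and frequency-script output has already been reduced to (concept, semantic type, frequency) triples. The sample data, the use of both a lower and an upper frequency threshold, the per-category cap and the function name `preselect` are hypothetical; only the "uses" template comes from the UMLS Semantic Network example above.

```python
from collections import defaultdict

MBRT = "Molecular Biology Research Technique"

# Hypothetical (concept, semantic type, PubMed frequency) triples.
CONCEPTS = [
    ("Western blotting", MBRT, 52000),
    ("Southern blotting", MBRT, 18000),
    ("Northern blotting", MBRT, 11000),
    ("Polymerase chain reaction", MBRT, 310000),
    ("Filter paper", "Research Device", 4100),
    ("Myocardial infarction", "Disease or Syndrome", 90000),
    ("Obscure assay X", MBRT, 3),
]

# Only the single relation from the text is encoded here; a real run would load more.
RELATIONS = {(MBRT, "uses", "Research Device")}


def preselect(concepts, min_freq, max_freq, max_per_type):
    """Apply the frequency thresholds, keep product-related types, group and cap each group."""
    # Semantic types on either side of a product relation
    product_related = {t for (subj, _, obj) in RELATIONS for t in (subj, obj)}
    groups = defaultdict(list)
    for name, sem_type, freq in concepts:
        if min_freq <= freq <= max_freq and sem_type in product_related:
            groups[sem_type].append((freq, name))
    # Keep the most frequent concepts in each category
    return {
        sem_type: [name for _, name in sorted(items, reverse=True)[:max_per_type]]
        for sem_type, items in groups.items()
    }


print(preselect(CONCEPTS, min_freq=100, max_freq=100000, max_per_type=2))
# → {'Molecular Biology Research Technique': ['Western blotting', 'Southern blotting'],
#    'Research Device': ['Filter paper']}
```

The upper threshold reflects an assumption that very frequent concepts (here "Polymerase chain reaction") are too generic to be useful product labels; whether to use one is part of deciding the thresholds in step 2.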

Autocomplete Method

  1. Get frequency of ConceptWiki concepts in PubMed
    1. Index PubMed with Peregrine and compute concept frequencies with the frequency scripts developed by Leon
  2. Decide on frequency thresholds for concepts
  3. Generate a Lucene index for all (life science) concepts in ConceptWiki, using each concept's preferred term and all alternative terms.
  4. Build a search box with an autocomplete function for selecting from the concepts within the frequency threshold.
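The autocomplete behaviour can be sketched with a sorted, in-memory term list and binary search for prefix lookups. This swaps a plain Python structure in for Lucene purely for illustration; the sample concepts, the `Autocomplete` class and its parameters are hypothetical. It indexes preferred and alternative terms, drops concepts below the frequency threshold, lists each concept only once, and caps suggestions at 20 (following user story 2 in the backlog).

```python
import bisect

# Hypothetical ConceptWiki extract: ID -> (preferred term, alternative terms, PubMed frequency).
CONCEPTS = {
    "C1": ("DNA methylation", ["methylation of DNA"], 45000),
    "C2": ("DNA microarray", ["gene chip", "DNA chip"], 60000),
    "C3": ("DNA ligase", [], 12000),
    "C4": ("DNA polymerase zeta subunit X", [], 2),
}


class Autocomplete:
    """Prefix index over preferred and alternative terms (an in-memory stand-in for Lucene)."""

    def __init__(self, concepts, min_freq, max_results=20):
        self.max_results = max_results
        entries = []
        for cid, (preferred, alternatives, freq) in concepts.items():
            if freq < min_freq:
                continue  # outside the frequency threshold
            for term in [preferred] + alternatives:
                entries.append((term.lower(), preferred, cid))
        entries.sort()
        self.entries = entries
        self.keys = [e[0] for e in entries]

    def suggest(self, prefix):
        """Return up to max_results (matched term, preferred term, ID) tuples for a prefix."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self.keys, prefix)  # first term >= prefix
        results, seen = [], set()
        for key, preferred, cid in self.entries[start:]:
            if not key.startswith(prefix) or len(results) == self.max_results:
                break
            if cid in seen:
                continue  # a concept matched via several synonyms is listed once
            seen.add(cid)
            results.append((key, preferred, cid))
        return results


ac = Autocomplete(CONCEPTS, min_freq=100)
print(ac.suggest("dna "))
# → [('dna chip', 'DNA microarray', 'C2'), ('dna ligase', 'DNA ligase', 'C3'),
#    ('dna methylation', 'DNA methylation', 'C1')]
```

Showing the matched synonym next to the preferred term lets a worker who types "gene chip" see that it resolves to "DNA microarray".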

Product Backlog for Autocomplete method

User stories:

Story # | Story | Story points
1 | As an mturk worker, I want the autocomplete to work quickly, as fast as Google's autocomplete. |
2 | As an mturk worker, I want the autocomplete to show no more than about 20 options. |
3 | As an mturk worker, I want the instructions to be short enough that I do not have to scroll to get to the HIT. |
4 | As an mturk results reviewer, I want all work by a specific mturk worker grouped together. |
5 | As an mturk results reviewer, I want all work submitted without a concept to be rejected automatically, without review. |
6 | As an mturk results reviewer, I want work to be accepted and paid automatically when two workers submit the same concept for the same product. |
7 | As the project requester, I want each product to be submitted to at least two different workers. |
8 | As the project requester, I would like to have 2-3 different concepts per product. |
9 | As the project requester, I would like the task to have a game option in which two people need to match the concept. |
10 | As an mturk worker, I don't want to see the instructions every time I visit the HIT page. |
11 | As an mturk worker, I want the term that appears in the second box to be selected automatically if it is the only term. |
12 | As an mturk worker, I don't want to receive a HIT again after I have already rejected it. |
13 | As the project requester, I would like a concept to be removed from the first dropdown list once it has been used with 3 different products. |
14 | As an mturk results reviewer, I would like to be alerted when a concept has been used for 3 different products. |
15 | As an mturk worker, I would like a hint box to help me find terms I may not be aware of. |
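Stories 4, 5, 6, 13 and 14 describe review rules that could be automated once results are collected. A minimal sketch, assuming submissions have been downloaded as (worker ID, product, concept) tuples; the sample data, function names and status labels are hypothetical, and nothing here calls the Mechanical Turk API.

```python
from collections import defaultdict

# Hypothetical submissions: (worker ID, product, concept); "" means no concept was chosen.
SUBMISSIONS = [
    ("W1", "DNA methylation kit", "genomic imprinting"),
    ("W2", "DNA methylation kit", "genomic imprinting"),
    ("W3", "DNA methylation kit", ""),
    ("W1", "Filter paper", "western blotting"),
    ("W4", "Filter paper", "dot blot"),
]


def review(submissions):
    """Sort submissions into accepted, rejected and needs-review, grouped by worker (story 4)."""
    # Distinct workers per (product, concept) pair, so one worker repeating a choice does not count twice
    agreement = defaultdict(set)
    for worker, product, concept in submissions:
        if concept:
            agreement[(product, concept)].add(worker)
    by_worker = defaultdict(lambda: {"accepted": [], "rejected": [], "review": []})
    for worker, product, concept in submissions:
        if not concept:
            status = "rejected"  # story 5: no concept, rejected without review
        elif len(agreement[(product, concept)]) >= 2:
            status = "accepted"  # story 6: two workers agree
        else:
            status = "review"
        by_worker[worker][status].append((product, concept))
    return dict(by_worker)


def overused_concepts(submissions, limit=3):
    """Stories 13 and 14: concepts already used for `limit` or more different products."""
    products = defaultdict(set)
    for _, product, concept in submissions:
        if concept:
            products[concept].add(product)
    return sorted(c for c, p in products.items() if len(p) >= limit)


print(review(SUBMISSIONS)["W1"])
# → {'accepted': [('DNA methylation kit', 'genomic imprinting')], 'rejected': [],
#    'review': [('Filter paper', 'western blotting')]}
```

The output of `overused_concepts` could drive both the alert in story 14 and the removal from the dropdown list in story 13.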

How it should look

[Image: HumanComputationInterface.png, mock-up of the autocomplete worker interface]

How it could be designed

[Design diagram: work in progress, currently rev. 5]


Worker Tasks

  1. Generate mturk task: worker workflow for the pre-selected list of concepts or the autocomplete method
    1. Look at the product website
    2. Select the concept from the given list that best relates to the product
      1. The list of concepts would be either pre-selected or based on autocomplete, and the worker's interface would differ depending on the method. The autocomplete interface is shown above.
  2. Generate mturk task: worker workflow for free-text entry of concepts by the mturk worker
    1. Look at the product website
    2. List 10 concepts appropriate for the product that do not appear in the product description
      1. Compare the free-text list with the concepts of the desired frequency. This requires data processing after the mturk data has been collected.
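The post-collection comparison for the free-text workflow could look like the following sketch. It assumes a hypothetical lookup from lower-cased ConceptWiki terms to (concept ID, PubMed frequency); entries are normalised, matched exactly against that lookup, filtered by the frequency thresholds, and dropped if they already occur in the product description. Exact matching is a simplification: misspelled or reworded entries would be missed.

```python
import re

# Hypothetical term lookup: lower-cased preferred/alternative term -> (concept ID, PubMed frequency).
TERM_INDEX = {
    "genomic imprinting": ("C10", 9800),
    "imprinting": ("C10", 9800),
    "dna methylation": ("C1", 45000),
    "bisulfite sequencing": ("C11", 3200),
    "epigenetics": ("C12", 150000),
}


def normalise(term):
    """Lower-case, strip surrounding whitespace and collapse internal runs of whitespace."""
    return re.sub(r"\s+", " ", term.strip().lower())


def match_free_text(entries, product_description, min_freq, max_freq):
    """Map a worker's free-text entries to concepts of the desired frequency.

    Entries that match no known term, fall outside the thresholds, or already
    appear in the product description (plain substring test) are dropped.
    """
    description = normalise(product_description)
    matched = {}
    for entry in entries:
        term = normalise(entry)
        if term not in TERM_INDEX or term in description:
            continue
        concept_id, freq = TERM_INDEX[term]
        if min_freq <= freq <= max_freq:
            matched.setdefault(concept_id, term)  # keep the first term seen per concept
    return matched


entries = ["Genomic  imprinting", "DNA methylation", "bisulfite sequencing", "epigenetics", "kit"]
print(match_free_text(entries, "DNA methylation kit for bisulfite conversion", 100, 100000))
# → {'C10': 'genomic imprinting', 'C11': 'bisulfite sequencing'}
```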