Requirements DSP Spectral trees

From BioAssist

Spectral Trees

The spectral tree database in Leiden has several possible use cases:

  • Adding a spectrum to the database
  • Querying the database
  • Annotating the database (annotating fragments)
  • Browsing and visually inspecting the database

Adding a spectral tree

Adding a spectrum to the database consists of two steps. First, the raw data files are processed; then, after curation, the resulting spectral trees can be added.


1. Users upload MZXML files (raw data) and process them online. Processing parameters are given as input; some of these parameters are set by default. The output is one CML file for each MZXML file. The original MZXML file is stored for later (re)use.

2. The resulting CML files are manually inspected using a simple visualisation of the CML output. A person with the validator role (probably the uploader in most cases?) checks whether the CML is correct. If it is not, the processing can be repeated with different parameters. If it is, a second manual step follows: curation. A curator looks at the spectral tree and decides whether to upload it to the database. At the moment, the manual check of the CML file is based on a simple text-based representation of the trees. This should be replaced by a more intuitive visualisation of the spectral tree. Curation is (probably) also done on the basis of a visualisation of the tree.
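The two steps above can be read as a small state machine. The following is only a minimal sketch of that reading: the parameter names (`mz_tolerance`, `min_intensity`), the `Submission` class and the status names are hypothetical, and the actual conversion from MZXML to CML is replaced by a placeholder.

```python
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical defaults: the real parameter names are not given in this document.
DEFAULT_PARAMETERS = {"mz_tolerance": 0.01, "min_intensity": 1000.0}


class Status(Enum):
    UPLOADED = "uploaded"    # raw MZXML stored, not yet processed
    PROCESSED = "processed"  # CML produced, waiting for the validator
    VALIDATED = "validated"  # validator accepted the CML
    CURATED = "curated"      # curator approved adding the tree to the database


@dataclass
class Submission:
    mzxml_path: str  # the original file is kept for later (re)use
    parameters: dict = field(default_factory=lambda: dict(DEFAULT_PARAMETERS))
    status: Status = Status.UPLOADED
    cml_path: Optional[str] = None

    def process(self, **overrides):
        """(Re)run processing; user-supplied values override the defaults."""
        if self.status not in (Status.UPLOADED, Status.PROCESSED):
            raise ValueError("already validated; cannot reprocess")
        self.parameters = {**self.parameters, **overrides}
        # Placeholder for the real MZXML-to-CML conversion.
        self.cml_path = os.path.splitext(self.mzxml_path)[0] + ".cml"
        self.status = Status.PROCESSED

    def validate(self, accepted):
        """Validator decision; a rejected CML stays PROCESSED for a re-run."""
        if self.status is not Status.PROCESSED:
            raise ValueError("nothing to validate")
        if accepted:
            self.status = Status.VALIDATED

    def curate(self):
        """Curator decision to add the tree to the database."""
        if self.status is not Status.VALIDATED:
            raise ValueError("only validated trees can be curated")
        self.status = Status.CURATED
```

In this sketch a rejected validation leaves the submission in the `PROCESSED` state, so `process()` can simply be called again with different parameters, which mirrors the repeat loop described in step 2.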

Adding a tree to the database

The database of spectral trees contains RDF representations of the trees, with pointers to the CML files. The MZXML files are also stored. The process of adding a tree to the database has already been implemented.
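To illustrate what an RDF representation with pointers to the CML and MZXML files might look like, here is a minimal sketch that writes N-Triples lines by hand. The namespace `http://example.org/spectraltree/` and the predicate names (`cmlFile`, `rawData`, `hasFragment`) are assumptions made for this example; the vocabulary used by the existing implementation is not described here.

```python
# Hypothetical namespace and predicates; the real RDF vocabulary is not
# given in this document.
BASE = "http://example.org/spectraltree/"


def tree_to_ntriples(tree_id, cml_url, mzxml_url, edges):
    """Serialise one spectral tree as a list of N-Triples lines.

    edges is an iterable of (parent_node_id, child_node_id) pairs.
    """
    tree = f"<{BASE}tree/{tree_id}>"
    lines = [
        # Pointers from the tree to its CML file and its raw data.
        f"{tree} <{BASE}cmlFile> <{cml_url}> .",
        f"{tree} <{BASE}rawData> <{mzxml_url}> .",
    ]
    for parent, child in edges:
        subject = f"<{BASE}node/{tree_id}/{parent}>"
        obj = f"<{BASE}node/{tree_id}/{child}>"
        lines.append(f"{subject} <{BASE}hasFragment> {obj} .")
    return lines


if __name__ == "__main__":
    for line in tree_to_ntriples(
        "t1",
        "http://example.org/files/t1.cml",
        "http://example.org/files/t1.mzXML",
        [("n0", "n1"), ("n0", "n2")],
    ):
        print(line)
```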


Querying the database

Users can query the database in several ways:

  • On the basis of an MZXML file
  • On the basis of a CML file
  • By typing in a matrix-like query

When the input is an MZXML file, it must first be processed (see above) to produce a CML file. The matching algorithm is still a subject of research. The search returns a list of possible matches, ordered by some similarity measure. When no perfect match is found, partial matches are returned.
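Since the matching algorithm is still under research, the sketch below only illustrates the shape of the result: a ranked list in which partial matches appear below a perfect one. Jaccard similarity over rounded fragment m/z values is used purely as a stand-in measure, and the function names and the flat list-of-masses representation of a tree are assumptions for this example.

```python
def fragment_keys(mz_values, decimals=1):
    """Reduce a tree to the set of its fragment m/z values, rounded."""
    return {round(mz, decimals) for mz in mz_values}


def jaccard(a, b):
    """Stand-in similarity: shared fragments divided by all fragments."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search(query_mz, database, min_score=0.0):
    """Return (name, score) pairs, best match first.

    database maps a tree name to the list of its fragment m/z values.
    Entries scoring at or below min_score are dropped; everything else,
    including partial matches, is returned.
    """
    query = fragment_keys(query_mz)
    hits = []
    for name, mz_values in database.items():
        score = jaccard(query, fragment_keys(mz_values))
        if score > min_score:
            hits.append((name, score))
    # Highest score first; ties are broken by name for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    db = {"A": [180.1, 162.1, 144.1], "B": [180.1, 162.1], "C": [90.0]}
    for name, score in search([180.1, 162.1, 144.1], db):
        print(name, round(score, 2))
```

Here tree A would be a perfect match, B a partial match, and C would be left out because it shares no fragments with the query.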

Inspecting search results

To inspect the search results, the trees should be visualised intuitively. Users will basically look at the results in two ways. First, they will look at what is in the database and inspect only the top nodes. Second, they will get partial matches and want to inspect the returned trees for similarities and differences. This means that comparing trees with collapsible nodes should be supported. This visualisation part has not been implemented yet. Further requirements analysis is needed.
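As input for the requirements analysis, the sketch below shows one possible data model for the two viewing modes: a tree whose nodes can be collapsed, and a comparison that labels each node as present in both trees or in only one. The `Node` class, the m/z tolerance and the greedy child matching are all assumptions for this example, not a description of an existing component.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes are compared by identity, not by value
class Node:
    mz: float
    children: list = field(default_factory=list)
    collapsed: bool = False


def render(node, depth=0):
    """Return indented text lines; children of a collapsed node are hidden."""
    hidden = node.collapsed and bool(node.children)
    lines = ["  " * depth + f"{node.mz:.1f}" + (" [+]" if hidden else "")]
    if not node.collapsed:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines


def compare(a, b, tol=0.01, path=()):
    """Label nodes as 'both', 'only_a' or 'only_b'.

    The roots a and b are assumed to match. Children are paired greedily
    when their m/z values differ by at most tol.
    """
    path = path + (a.mz,)
    results = [(path, "both")]
    unmatched = list(b.children)
    for child_a in a.children:
        match = next(
            (c for c in unmatched if abs(c.mz - child_a.mz) <= tol), None
        )
        if match is None:
            results.append((path + (child_a.mz,), "only_a"))
        else:
            unmatched.remove(match)
            results.extend(compare(child_a, match, tol, path))
    for child_b in unmatched:
        results.append((path + (child_b.mz,), "only_b"))
    return results
```

Collapsing every node below the root gives the "top nodes only" view, while the labels from `compare` could drive the highlighting of similarities and differences between a query and a partial match.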

Annotating spectral trees

Experts are allowed to annotate the fragments of a spectral tree. When a spectral tree is added, only the identity of the top node of the tree is known. Annotation is done by adding structural information and other data to the subnodes of the tree. The expert also wants to see the spectra of the subnodes, which might involve use of the raw data (MZXML). The product Mass Frontier makes use of a structure editor, which could be instructive for this step.
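A minimal sketch of what annotating a subnode could mean in terms of data is given below. The `Fragment` class, the addressing of subnodes by child indices and the annotation keys (`structure`, `annotated_by`) are hypothetical; how structural information is actually encoded is still open.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    mz: float
    annotations: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find(root, path):
    """Follow child indices from the root; (0, 1) is the second child of the first child."""
    node = root
    for index in path:
        node = node.children[index]
    return node


def annotate(root, path, expert, **data):
    """Attach structural information or other data to one subnode."""
    node = find(root, path)
    node.annotations.update(data)
    node.annotations["annotated_by"] = expert  # record who made the annotation
    return node


if __name__ == "__main__":
    # Only the identity of the top node is known when the tree is added.
    tree = Fragment(180.1, {"name": "compound X"}, [Fragment(162.1)])
    annotate(tree, (0,), "expert1", structure="hypothetical structure string")
    print(tree.children[0].annotations)
```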

Browsing search results

See the section "Inspecting search results" above. Browsing is very close to inspecting search results, in the sense that visualisation tools are needed. Mass Frontier is a commercial product for spectral trees; the complaint about it is that it shows too much information. It can still be helpful to use Mass Frontier when gathering visualisation requirements and other, more detailed requirements.