NBIC Galaxy Hackathon project

From BioAssist

This page is used to identify and prepare the NBIC Galaxy tasks for the NBIC 2011 Hackathon.

Logistics

  • Dates

April 13-14, 2011 (Wednesday and Thursday)

  • Location

A farmhouse in Belgium, very near the Dutch border. We will arrange transport from Maastricht train station (the trip takes about 20 minutes).

  • Participants

Alex Bossers (WUR), Dannon Baker (Emory), Frans Paul Ruzius (NGS), Freddy de Bree (CVI, WUR), Freek de Bruijn (NBIC), Henk van den Toorn (UU), Ishtiaq Ahmad (RUG), Joachim Jacob (VIB - BITS), Martijn Vermaat (NGS), Nate Coraor (Penn State), Rob Hooft (NBIC), Wil Koetsier (UMCG)

  • Coordinator

Freek de Bruijn (NBIC)

  • Hardware
    • Network equipment
    • Coffee machine
    • Access to the bitbucket and trac galaxytools repositories
    • An NBIC Galaxy VM

Actual subjects

Loop over files in a directory

  • Requester: Pieter Neerincx
  • Developers: Dannon, Frans Paul, Freddy, Ishtiaq, Joachim, Martijn

This can be hard-coded in a tool, but it would be nicer to have it as a history/workflow feature.

It could be used, for example, to run QC over a number of files.

Comment from Alex: "I have this operational using bash and perl scripts that allow uploading multiple files in a tarball. The script makes a directory in temp, extracts the tarball, loops over all files and returns all results in a tgz ball. In addition, we have it working using the default repeat function to grab multiple files from the history and process them in a tool in one go. This is for a single tool only, so most likely you want to focus on doing these kinds of things using complete workflows, right?"
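The tarball approach Alex describes could look roughly like the sketch below. It is written in Python rather than bash/perl and is not Alex's actual script: the names process_tarball and count_lines are made up for illustration, and everything runs in memory instead of in a temp directory.

```python
import io
import tarfile


def process_tarball(archive_bytes, process):
    """Apply process (bytes -> bytes) to every regular file in a tar archive.

    Returns the bytes of a gzip-compressed tar archive that holds one
    "<name>.result" entry per input file.
    """
    out_buf = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as src:
        with tarfile.open(fileobj=out_buf, mode="w:gz") as dst:
            for member in src.getmembers():
                if not member.isfile():
                    continue  # skip directories, links, etc.
                result = process(src.extractfile(member).read())
                info = tarfile.TarInfo(name=member.name + ".result")
                info.size = len(result)
                dst.addfile(info, io.BytesIO(result))
    return out_buf.getvalue()


def count_lines(data):
    """Hypothetical stand-in for a real QC step: report the number of lines."""
    return str(len(data.splitlines())).encode() + b"\n"
```

A tool wrapper would call process_tarball(uploaded_bytes, count_lines) and hand the returned tgz back to the history as a single item.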

This is similar to Galaxy Central issue #53 Iterative execution in a workflow.

Comment from Kees van Bochove: Not exactly similar, that ticket is about iteration in workflows. I think looping over files would be 'batch execution': so apply the same workflow to many files. Ideally, Galaxy should have a 'Set' history item type which would consist of multiple files. It could look and be treated just like a single history item, but when a tool is run on it, it should be run for each file in the set and the result would also again be a set. This feature would make a lot of sense to me. In many cases (I'm thinking about transcriptomics, metabolomics) biologists would have a set of files and just apply one 'pipeline' or 'workflow' to all of them.

Hackathon: The workflow running interface was modified to allow for selecting multiple data items from the history as input for the workflow. For each selected input, a separate workflow invocation is created and the results are stored as separate history items. Also, the list for selecting data items can easily be filtered by free typing.

Galaxy-multi-input.png
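The behaviour of the modified workflow form (filter the history items by typing, then start one invocation per selected item) can be sketched as follows. This is only an illustration of the idea and not the code that was added to Galaxy; filter_items, run_for_each and the example file names are hypothetical.

```python
def filter_items(names, typed):
    """Keep the history item names that contain the typed text (case-insensitive)."""
    needle = typed.strip().lower()
    return [name for name in names if needle in name.lower()]


def run_for_each(workflow, selected):
    """Invoke workflow once per selected input and keep the results separate."""
    return [{"input": name, "result": workflow(name)} for name in selected]


# Made-up history contents and a dummy "workflow" for illustration.
history = ["sample_A.fastq", "sample_B.fastq", "reference.fasta"]
selected = filter_items(history, "sample")
invocations = run_for_each(lambda name: "QC report for " + name, selected)
for invocation in invocations:
    print(invocation["input"], "->", invocation["result"])
```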

Improving tool installation

  • Requester: Rob
  • Developers: Henk, Rob, Wil

All tools are listed in the tool_conf file. We want to make it modularized. --Freek de Bruijn 15:01, 16 March 2011 (CET)

Improve the way Galaxy works with the tool shed.

Currently, BITS has a prototype of an install tool that makes it easy to add scripts to a Galaxy instance (e.g. from the tool shed), developed by Michiel Bataillie (BITS). You upload a tar.gz file (containing the script and its XML file) and then restart Galaxy from the output of the install tool, which is a PHP script. --Joachim

Can we work on something with RPMs here? Refer to the discussions with VIB. --Leon 14:53, 17 February 2011 (CET)

Improving the link with the tool shed and using RPMs both sound like decent options to me. In addition, you may want to have a look at CDE or similar tools to make the command-line tools more portable. --Pneerincx 15:25, 16 March 2011 (CET)

This topic can be divided into two sub topics:

  • make it easier to add a tool to Galaxy (add it to tool_conf.xml, add additional datatypes if a tool requires them, add example/test data for that tool, etc.)
  • install the (command-line) tool itself (Galaxy-independent stuff)

Hackathon: we split up tool_conf.xml to make installing and uninstalling tools easier. Two Python scripts have been created. The first breaks up the existing tool_conf.xml into separate configuration files, one per tool (it needs to be run only once). The second script combines all these files into a generated tool_conf.xml (it should be rerun after each modification).
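A minimal sketch of the split/combine idea is given below. These are not the two hackathon scripts: the function names and the fragment file naming are assumptions, and the sketch only handles tool entries nested in section elements (labels and top-level tools are ignored).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_tool_conf(tool_conf_path, fragment_dir):
    """Write one small XML fragment per <tool>, wrapped in a copy of its <section>."""
    fragment_dir = Path(fragment_dir)
    fragment_dir.mkdir(parents=True, exist_ok=True)
    written = []
    root = ET.parse(str(tool_conf_path)).getroot()
    for section in root.findall("section"):
        for tool in section.findall("tool"):
            fragment = ET.Element("section", dict(section.attrib))
            ET.SubElement(fragment, "tool", dict(tool.attrib))
            # Assumed naming scheme: flatten the tool path into a file name.
            target = fragment_dir / tool.get("file").replace("/", "__")
            ET.ElementTree(fragment).write(str(target))
            written.append(target)
    return written


def combine_tool_conf(fragment_dir, tool_conf_path):
    """Merge all fragments into one generated tool_conf.xml, grouped by section id."""
    toolbox = ET.Element("toolbox")
    sections = {}
    for path in sorted(Path(fragment_dir).glob("*.xml")):
        fragment = ET.parse(str(path)).getroot()
        key = fragment.get("id")
        if key not in sections:
            sections[key] = ET.SubElement(toolbox, "section", dict(fragment.attrib))
        sections[key].extend(fragment.findall("tool"))
    ET.ElementTree(toolbox).write(str(tool_conf_path))
```

With this layout, uninstalling a tool amounts to deleting its fragment file and running combine_tool_conf again. Note that the generated file orders sections by fragment file name, not by their order in the original tool_conf.xml.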

Improving NBIC Galaxy UI

  • Requester: Leon Mei
  • Developers: Alex, Freek, Nate

We want to merge some of the default Galaxy tool sections and make the NBIC tool sections more visible (e.g. GAPSS, or a better name, should replace NGS: Snip Detection & NGS: Tools LUMC). Some experience can be obtained from Alex's version at CVI, for which the default menus have been modified drastically. To get an impression, see this screenshot of Galaxy@WUR, taken from a presentation given by Alex:

Galaxy at WUR.png


You can view the entire presentation here.

Comment from Alex: "A great addition would be one extra menu level in the tool list. This would avoid long lists. Another "hack" might be just to display the main menus and while hovering the links the submenus explode/collapse....."

Hackathon: see "Customization of the tool panel". You can edit the tool_conf.xml and use tags to improve the NBIC Galaxy UI.

Making the access to NGS/Proteomics pipelines easier

  • Requester: Leon Mei
  • Developer: Frans?

Somewhat related to the previous task. Workflows should be really easy to find, preferably without having to be logged in.

Customization of the tool panel

  • Requester: Leon Mei
  • Developers: Alex, Freek, Nate

Make it possible to have different tools visible to different users, including "user profiles" (proteomics users have a different view from the beginning).

https://bitbucket.org/galaxy/galaxy-central/issue/286/user-customization-in-the-galaxy-tool-pane

Hackathon: we've added support for tags, which are used to create groups of tools that you can view and search in. You can combine tags with the search box to filter the available tools. The state of the tool search is stored in the user preferences: whether the tool search is opened and which tags are selected.


Tools selection with tags.png
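Combining tags with the search box can be thought of as the filter sketched below. This is a hypothetical illustration, not the Galaxy implementation: the tool list is made up, and it assumes a tool is shown when it carries at least one of the selected tags and its name contains the search text.

```python
def visible_tools(tools, selected_tags, query=""):
    """Return the names of tools matching the selected tags and the search text.

    tools is a list of dicts with a "name" and a list of "tags".
    No selected tags means no tag filtering (assumption).
    """
    wanted = set(selected_tags)
    needle = query.strip().lower()
    return [
        tool["name"]
        for tool in tools
        if (not wanted or wanted & set(tool["tags"]))
        and needle in tool["name"].lower()
    ]


# Made-up tools and tags for illustration.
tools = [
    {"name": "BWA mapper", "tags": ["ngs"]},
    {"name": "Peptide search", "tags": ["proteomics"]},
    {"name": "SNP caller", "tags": ["ngs", "gapss"]},
]
print(visible_tools(tools, ["ngs"]))
print(visible_tools(tools, ["ngs"], "snp"))
```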

Future subjects

Dynamic installation of tools

  • Requester: Kostas
  • Developer: ?

Currently, the Galaxy server has to be restarted to add a new tool. Ideally, we would like to be able to add tools from within the Galaxy UI, without a manual restart.

Only specific users/roles should be able to do that. Similar work has been done by other groups, but it is not part of the Galaxy codebase.

This topic can be divided into two sub topics:

  • relate tools per user/role, i.e. not all tools will be available to a specific user
  • implement programmatic reloading of tool specifications

Hackathon: although we did not actually manage to work on this feature, we had a useful discussion. It turns out the Galaxy Team is already planning to work on improving tool installation this year and is certainly interested in patches in this area.

Call another workflow within a workflow

  • Requester: Pieter Neerincx, Jan van Haarst
    • we should have a clear use scenario here.--Freek de Bruijn 14:30, 16 March 2011 (CET)
  • Developer: ? + Dannon

Comment from Alex: 1) I think there are plenty of use cases here. Galaxy promotes breaking most tools down into elementary steps (rather than building do-it-all-in-one tools) to allow modular reuse. So small sub-workflows are used a lot in our case, e.g. alignment of genomes, making SNP reports, making alignment plots, converting alignment data to be viewed/analysed elsewhere...

2) The Galaxy API is about to be released... In addition, the work of Kostas on the Taverna workflow plug-in might give a basic start, since Galaxy workflows also have defined ins/outs and parameters, which should be addressable somehow :) by direct access (or a hack) or by using the experimental API.

Support interactive visualization

  • Requester: Pieter Neerincx
  • Developer: Rutger Brouwer?

Marcel is interested in supporting autorefresh of images in the Galaxy UI, for displaying the progress of certain processes. --Freek de Bruijn 14:40, 16 March 2011 (CET)

Generating static tables or a static picture is easy in Galaxy, but how do you plug in an interactive visualization tool? Is SVG+JavaScript an option? Is Cytoscape with Java Web Start an option?

Jeroen also likes this one.

Comment from Alex: "We made some "fixed" displays, which is easy, for instance using R. We are about to look at plugging in WebArtemis, but since it is about curation/annotation... I have no idea where the improved and updated result files should go..."

Charlie asked a question about Cytoscape Web Start on the Galaxy development mailing list (March 11th).

Excel upload

  • Requester: Kees van Bochove
  • Developer: ?

Biologists often use Excel to store their data. In my first real usage test of Galaxy, I tried to upload an Excel file and run a PCA on the data. This proved really hard. Excel upload is not supported. CSV export from a Dutch-localized Excel program results in semicolon-separated files, so that does not work either. So I had to go to tab-delimited (a biologist would have given up by now). But then I couldn't convince Galaxy that my first row actually was a header row, which would make sense to have. E.g. in a scatterplot I would like to choose to plot 'Age' vs 'Weight' instead of 'c3' vs 'c5' (less intuitive and more error-prone). Test set to use: https://trac.nbic.nl/gscf/raw-attachment/ticket/395/subjecttest.xls (contains fake data).

Galaxy badgrouping.png
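The two workarounds described above (semicolon-separated export, and column references without header names) could be handled by a small conversion step like the sketch below. It is an illustration only, not an existing Galaxy feature: the function names are hypothetical, the delimiter is guessed from the header line alone, and the example rows are made up (they are not taken from subjecttest.xls).

```python
import csv
import io


def detect_delimiter(header_line, candidates=(";", ",", "\t")):
    """Guess the delimiter as the candidate that occurs most often in the header."""
    return max(candidates, key=header_line.count)


def to_tab_delimited(text):
    """Return (header, tab-delimited text) for semicolon/comma/tab separated input."""
    lines = text.splitlines()
    rows = list(csv.reader(lines, delimiter=detect_delimiter(lines[0])))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return rows[0], out.getvalue()


def column_name(header, column_ref):
    """Translate a 1-based column reference such as 'c3' into its header name."""
    return header[int(column_ref.lstrip("c")) - 1]


# Made-up export from a Dutch-localized Excel (decimal commas, semicolons).
exported = "Subject;Gender;Age;Height;Weight\nS1;F;34;1,70;65,5\nS2;M;41;1,82;80,0\n"
header, tabular = to_tab_delimited(exported)
print(column_name(header, "c3"), "vs", column_name(header, "c5"))
```

Decimal commas are left untouched here; converting them to decimal points would be a separate, locale-dependent step.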

Open issues Galaxy Central

We want to make sure we are not fixing issues that have already been solved.

List of open issues for Galaxy Central on Bitbucket.

It would be very nice if our improvements could be included in the central Galaxy version (that is maintained by the Galaxy Team). We will contact the Galaxy Team to ask their advice.