TavernaGrid

Introduction

TavernaGrid workflow processing

TavernaGrid is a project by SARA and Eagle Genomics Ltd., funded by NBIC. The primary goal of the project is to allow users to compose Grid web services with Taverna.

News

2008-10-29

We just had the project kick-off meeting! Richard Holland was at SARA for three days. During many short sessions, we've managed to get a firm understanding of what needs to be done, and we agreed on the first timebox, which will run until Christmas. Some documents were drafted, which can be found in this Wiki.

2008-11-15

API documentation uploaded.

2008-12-05

Code-in-progress has been committed to SVN.

Done so far:

  • A working SOAP client that sends requests over HTTPS and includes custom headers.
  • A working SOAP server that receives requests over HTTPS and parses custom headers.
  • An API that can retrieve a certificate from MyProxy, modify it with VOMS attributes, and delegate it to WMProxy.
  • The API can also submit a job, but this is currently broken: the WMProxy server returns SSL errors. These arise not in our communication with WMProxy but in communication between WMProxy and some other internal server or piece of code. To be investigated next week.

Next week:

  • Visiting Manchester on Tuesday+Wednesday to meet the team, find out more about Taverna, and discuss the best way of implementing the new SOAP client code as a Taverna plugin.
  • Try to work out what's going on internally in WMProxy as to why it's giving SSL errors when submitting jobs. Might need a hand here.

Before end of sub-contract:

  • Produce plan of how to integrate this all into a Taverna plugin.
  • Make API able to fully submit jobs. Hopefully to include Machiel's modifications to be able to include InputSandbox.
  • Simplify API and example code where possible and make sure fully internally documented.
  • Possibly add API code for delegating user certificates to MyProxy.

2008-12-08

Code-in-progress has been committed to SVN.

Jobs can now be submitted and executed and output (or errors) retrieved. No-input, single-output jobs are possible. The client and server side of the service both work. All this is correctly achieved via delegated certificates from MyProxy locally modified within the service to have the VOMS attributes added. Certificates need to be manually delegated to MyProxy before calling the service.

As a result of visiting the team in Manchester, I've produced a plan for the Taverna integration which Alex and Stuart have both reviewed and agreed is sensible. The plan has been placed here. I believe this is detailed enough to serve as a template for the next unit of work to be planned, but please let me know if you need more detail. In terms of timescale for this I can start some simpler parts of it in January, but Tom will be presenting a code tutorial in Manchester in February which I plan to attend, at which point I hope to learn enough to get the more complicated parts of the Taverna plugin code written.

Tasks remaining to complete before end of sub-contract, in order of priority:

  • Fix dependency issues in ws-client.
  • Remove temporary code inside the API/ws-server/ws-client implementations and make them into neat and tidy deliverables.
  • Produce detailed usage docs on the wiki.
  • Javadoc everything in detail.
  • Implement single- and multiple-input file upload and multiple output file download for jobs.
  • Simplify API and implement MyProxy delegation methods for client side to remove need to manually delegate certificate.
  • Investigate possibility of long-running proxy delegation.

2008-12-09

Code-in-progress has been committed to SVN. API documentation amended and updated.

  • Dependency issues in ws-client have been fixed and the SOAP call code simplified.
  • Temporary code has been removed throughout.
  • MyProxy delegation on the client side has now been implemented and is demonstrated by ws-client.

Remaining to do in this sub-contract:

  • Correct and expand all JavaDocs.
  • Fully document system on Wiki.
  • Implement single- and multiple-input file upload and multiple output file download for jobs.
  • Implement asynchronous web service methods and demonstration of their use.
  • Investigate possibility of long-running proxy delegation.

2008-12-11

Code-in-progress has been committed to SVN.

  • All JavaDocs done.
  • Asynchronous calls implemented and demonstrated in ws-client.
  • API documentation complete.

Remaining to do in this sub-contract:

  • Implement single- and multiple-input file upload and multiple output file download for jobs.
  • Investigate possibility of long-running proxy delegation.

2008-12-15

Code-in-progress has been committed to SVN.

  • Single- and multiple-input file and multiple-output file tested and working.
  • API documentation updated.

Still to-do:

  • Finish planning and research for Taverna extension (mostly complete).
  • Long-running proxy delegation (optional).

2009-01-19

  • A demo Tomcat server has been set up.
  • A Maven repository has been set up.
  • The dependency modules and the api module have been added to the Maven repository.
  • The ws-client and ws-server are not suitable for putting into the repository as they are not library modules. They are designed to be templates for other projects instead.
  • The 'tavernag' user on ws1 hosts the Tomcat and Maven services.
  • The API documentation has been updated to refer to the new repository, and the demo service.

2009-02-03

  • An OMSSA server (omssa-server inside MyProxyWebServices) has been committed to Subversion but not yet tested as details of binaries and databases not yet provided. Like ws-server it is not available in the Maven repository as it is not a library module.
  • Details obtained on the general principles of how to make a T2 plugin. These are to be studied in advance of the Taverna workshop.
  • Revised plans for the best way of designing the T2 extension for grid to simplify user interaction. The revised plans have been sent but no comments have been received yet.
  • Email issues resolved and can now communicate properly with Pieter.
  • Missing contract issues resolved and signed contracts now received at both ends.
  • Will be attending Tom's T2 workshop, travelling to Manchester on the 17th and returning on the 19th.

2009-03-02

  • Attended T2 developers workshop in Manchester.
  • Committed template code for T2 plugin, including workflow component and central config panel. Doesn't work yet but template is in place to allow code to be completed rapidly.
  • Committed MyProxy delegation time modification to API.
  • Provisionally booked to visit Amsterdam 3rd/4th April.

2009-03-24

  • Working plugin. Need to work on usability now but functionally it is complete.
  • Committed all code to Subversion.
  • Amsterdam visit will take place 15th-17th April.

2009-04-07

2009-04-09

  • Fixed some bugs in the web server, API and plugin code.

2009-04-14

  • Clarifications to the OMSSA code.

2009-04-30

  • Copyright and Apache 2 licence notices added. Note that code copied from Manchester retains the LGPL licence. Code from Vincento regarding Base64 encoding retains Vincento's original copyright notice. Code from Sun for auto-completion retains Sun's original copyright notice.
  • Packages renamed from nl.sara to nl.nbic.
  • Default value provided in delegation field to prevent the message about an invalid time when applying rather than delegating.
  • Default profile provided with SARA lsgrid details.
  • Logging installed in API and Taverna UI plugin, all at DEBUG level.

2009-05-08

  • Demonstration command-line tool for delegating MyProxy certificates added as 'mpx-delegate' under the existing MyProxyWebService project. Documentation added to wiki for API docs.
  • Added SSL certificate auto-detection and user-acceptance into T2 plugin.

2009-05-12

  • Anonymous delegation implemented as per Machiel's outline but untestable due to bugs in WMS.

2009-05-14

  • Job collections and parametric jobs implemented in API.
  • Expiration problem: seems to be something inside one of the Jlite/Glite libraries. Diagnosis:
    • Submit job. MyProxy correctly delegated (we'll call this the 'original' one the very first time you do it after starting up the Tomcat container). VOMS proxy correctly constructed. VOMS correctly delegated to WMS with unique delegation ID. Job submits OK.
    • Keep checking status till returns COMPLETE.
    • Retrieve results. All OK.
    • Repeat first three steps multiple times until the 'original' MyProxy certificate delegation has expired, usually 1 hour later as this is the default setting in the API code.
    • Submit job. MyProxy correctly delegated. VOMS proxy correctly constructed. VOMS correctly delegated to WMS with unique delegation ID. Job submits OK. All seems fine and is same as before.
    • Keep checking status till returns COMPLETE. All seems fine and is same as before.
    • Retrieve results. Throws exception: "Authorisation failed / Expired credentials detected" from deep inside WMProxyAPI/GridFTPClient during UrlCopy process. All three classes mentioned are part of the Jlite/Glite libraries.
    • Can continue to submit jobs fine, and they all run fine on the grid, but output file retrieval fails every time until Tomcat container is restarted, when all becomes fine again.
    • Re-delegating MyProxy, re-creating VOMS, re-delegating to WMS using the same delegation ID, all have no effect.
    • Appears that Jlite/Glite API is somehow caching the first certificate delegated to WMS after Tomcat startup, and is refusing to change it for a newer one once it has been cached.
    • Conclusion: this bug is not with our API code but with the Jlite/Glite API code. Does not affect standalone applications which run only one complete job request/retrieval cycle then exit completely.

2009-05-21

  • Expiration problem definitively caused by SSLSocket session caching. The Java SSL API assumes that a connection to the same host should reuse the same credentials as last time. This can be overridden, but only by talking directly to the socket, which is hidden deep inside the AXIS client code and is therefore beyond our control.
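
For illustration only, the sketch below shows one general JSSE mechanism for discarding cached client sessions, so that the next connection to a host performs a full handshake with whatever credentials are current. It is not the fix used in this project. It assumes the calling code can reach the SSLContext in use, which was exactly what could not be done through the AXIS client here. The class and method names are hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSessionContext;

// Hypothetical helper, not part of the TavernaGrid API.
public class SslSessionReset {

    /**
     * Invalidates every cached client-side session held by the given context.
     * Returns the number of sessions that were invalidated.
     */
    public static int invalidateClientSessions(SSLContext context) {
        SSLSessionContext sessions = context.getClientSessionContext();
        if (sessions == null) {
            return 0; // some environments expose no client session cache
        }
        int invalidated = 0;
        // Copy the IDs first so the cache is not modified while enumerating it.
        for (byte[] id : Collections.list(sessions.getIds())) {
            SSLSession session = sessions.getSession(id);
            if (session != null) {
                session.invalidate();
                invalidated++;
            }
        }
        return invalidated;
    }

    /** Convenience wrapper around the JVM-wide default SSLContext. */
    public static int invalidateDefaultClientSessions() {
        try {
            return invalidateClientSessions(SSLContext.getDefault());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        int count = invalidateDefaultClientSessions();
        System.out.println("Invalidated " + count + " cached session(s)");
    }
}
```

A library that builds its own private SSLContext would not be affected by a call on the default context, which is consistent with the problem being out of reach from the API code.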

2009-06-15

  • Back from paternity leave!
  • Demo OMSSA service coded but still needs details of where to find BLAST dbs on grid before can deploy and demonstrate.
  • Expiration problem above is also a problem in that the service will lock itself into the first user that uses it, with all other users ending up using the wrong certificate to retrieve files, thus failing. Again this is because of the AXIS client code in the Grid libraries and is beyond our control. It would seem that the Grid libraries themselves are very poorly written (inappropriate use of static variables and system environment variables) and were not designed to allow access to multiple users (or even multiple concurrent tasks) from within the same Java program. This precludes any reasonable use of Grid libraries within the context of a web service where one Java program (the application container, e.g. Tomcat) is shared between multiple client requests.
  • Machiel advised to skip eToken work due to lack of interest/knowledge at SARA.
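
The multi-user lock-in described above can be illustrated with a small sketch. It does not reproduce the Grid library code. It only shows, under assumed names, why state held in a static variable ends up shared by every request inside one container, and how keying the state by user avoids that. All class names and the "proxy-of-" strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not code from the Grid libraries.
public class CredentialScopeDemo {

    /** Problematic pattern: one JVM-wide slot, filled by the first caller. */
    public static class StaticCredentialHolder {
        private static String credential;

        public static synchronized String credentialFor(String user) {
            if (credential == null) {
                credential = "proxy-of-" + user;
            }
            // Every later caller receives the first caller's credential.
            return credential;
        }
    }

    /** Alternative: state is keyed by user and owned by an instance. */
    public static class PerUserCredentialStore {
        private final Map<String, String> byUser = new HashMap<String, String>();

        public synchronized String credentialFor(String user) {
            String credential = byUser.get(user);
            if (credential == null) {
                credential = "proxy-of-" + user;
                byUser.put(user, credential);
            }
            return credential;
        }
    }

    public static void main(String[] args) {
        System.out.println(StaticCredentialHolder.credentialFor("alice"));
        System.out.println(StaticCredentialHolder.credentialFor("bob"));

        PerUserCredentialStore store = new PerUserCredentialStore();
        System.out.println(store.credentialFor("alice"));
        System.out.println(store.credentialFor("bob"));
    }
}
```

In a standalone program that handles one user and then exits, the static variant goes unnoticed, which matches the earlier observation that standalone applications were not affected.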

2009-06-22

  • Demo OMSSA service now using hardcoded database provided by Pieter. Service can be deployed when SARA is ready for testing it. Owing to expiration problem this will have to be coordinated closely with person testing it. Note that solution uses hardcoded databases which is not ideal - and it also has to download them to Tomcat then re-upload them to the grid. Until a map can be provided which converts database names to URLs/paths, and until databases can be made available on the grid by default via some kind of shared folder, this is the only way to go despite being very inefficient.
  • Contact made with Piter from VBrowser to discuss integration/reuse of code. Piter is also providing advice in order to attempt to fix the expiration problem.

2009-06-24

  • With the help of Piter, the expiration problem is now fixed. Solution was to copy+paste the broken JLite code directly into the API and edit it to correct it. The corrections required were to provide credentials to UrlCopy calls, and to turn off third-party copy.
  • The demo server on ws1 is now running the corrected code and is working properly.

2009-07-01

  • Created javadocs. Emailed bundle to Pieter/Machiel for installation on ws1 and in GForge.

2009-07-21

  • Full user (TavernaGrid plugin) and developer (TavernaGrid webservices/client and API) documentation.
  • Developed half-day hands-on workshop to demonstrate TavernaGrid plugin, building TavernaGrid web services, building clients for TavernaGrid web services, and working directly with the grid using the TavernaGrid API.
  • Arranged to present workshop in Utrecht on 24th July.

2009-07-24

  • Presented workshop in Utrecht.

2009-07-26

  • Made updates to code and docs based on problems/feedback from workshop in Utrecht.

Timebox 1 (T1) Work Package

Timebox Parameters

Start date: 2008-10-27
End date: 2008-12-23

Eagle Genomics hours

Period                      Days   Comments
2008-10-27 to 2008-10-29    3      Kick-off meeting
2008-11-01 to 2008-11-14    4      50% of 4 days/week
2008-11-15 to 2008-11-24    0      Richard unavailable
2008-11-26 to 2008-12-23    17     4 days/week
Total                       24     ≋ 192hrs ≋ € 9,792

MoSCoW Prioritized Actions and Acceptance Criteria

Do not edit the Acceptance Criteria without the consent of all team members!

Musts

  • An example WS-I web service (hereafter called TestWS) has been created and deployed.
  • The functionality of the TestWS must be testable. The trivial way to do this is to build a test client (which is listed as a SHOULD), but good alternatives are acceptable.
  • The TestWS accepts authentication and authorization information in SOAP headers:
    • MyProxy hostname, port, username, password;
    • VO-name, VOMS-server hostname and port, VOMS-server X.509 certificate DN.
  • Proof that the VOMS-proxy created by the TestWS will allow WMS job submission and SE data manipulation. This could be proven by having the TestWS actually perform these tasks (which is listed as a COULD), but good alternatives are acceptable. E.g., TestWS could create a PEM-formatted proxy file, that can then be tested for validity with command line utilities.
  • Investigate what exactly needs to be done to have Taverna properly call our TestWS, and produce a draft version of the Stage Plan for the next timebox.
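
As a rough sketch of what the credential block in the SOAP header might carry, the example below builds an XML fragment containing the MyProxy and VOMS fields listed above. The element names, the "cred" prefix and the namespace URI are assumptions made for illustration. The actual header layout is the one described in the API documentation, not this one.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: element names and namespace are not the project's real schema.
public class CredentialHeaderSketch {

    static final String NS = "urn:example:tavernagrid:credentials"; // assumed namespace

    public static String buildHeaderXml(String myProxyHost, int myProxyPort,
                                        String username, String password,
                                        String voName, String vomsHost,
                                        int vomsPort, String vomsCertDn) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element root = doc.createElementNS(NS, "cred:Credentials");
            doc.appendChild(root);

            // MyProxy hostname, port, username, password
            Element myProxy = child(doc, root, "MyProxy", null);
            child(doc, myProxy, "Hostname", myProxyHost);
            child(doc, myProxy, "Port", Integer.toString(myProxyPort));
            child(doc, myProxy, "Username", username);
            child(doc, myProxy, "Password", password);

            // VO name, VOMS server hostname and port, VOMS server certificate DN
            Element voms = child(doc, root, "VOMS", null);
            child(doc, voms, "VOName", voName);
            child(doc, voms, "Hostname", vomsHost);
            child(doc, voms, "Port", Integer.toString(vomsPort));
            child(doc, voms, "CertificateDN", vomsCertDn);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build credential header", e);
        }
    }

    private static Element child(Document doc, Element parent, String name, String text) {
        Element element = doc.createElementNS(NS, "cred:" + name);
        if (text != null) {
            element.setTextContent(text); // special characters are escaped on output
        }
        parent.appendChild(element);
        return element;
    }

    public static void main(String[] args) {
        // All values below are placeholders.
        System.out.println(buildHeaderXml("myproxy.example.org", 7512, "alice", "secret",
                "examplevo", "voms.example.org", 15000, "/O=example/CN=voms.example.org"));
    }
}
```

Because the header carries a MyProxy password, a block like this only makes sense over HTTPS, as the service already requires.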

Shoulds

  • Create an example client.
  • Proper JavaDoc documentation of all code.
  • Proper System documentation in this wiki. (What is where, how to deploy, etc.)
  • Code is in the GForge SCM.
  • Describe the extra credential XML in the SOAP header (SARA).
  • Draft a document that can be used for our meeting with the Taverna development team.
    • Things we don't understand (Richard)
    • Things we'd like to see in Taverna2 (All)
    • People we'd like to meet (Pieter, Marco)
  • Have a (two day) meeting in Manchester with the Taverna development team, somewhere at the end of this timebox.

Coulds

  • The TestWS can submit Grid jobs to a WMS and manipulate data on an SE.
  • Taverna extension/plugin development:
    • User interface for managing the configuration.
    • Storing the configuration in a proper configuration file.
    • Creating a Service Provider
      • First, create a "stub" Service Provider, that's just a copy of the (default?) WSDL/SOAP Service Provider, without any extra functionality.
      • Then, extend this Service Provider to submit Credential information in the SOAP header.
  • Create an XSD for the extra credential XML in the SOAP header (SARA).

Won'ts

  • Anonymous proxy delegation/retrieval to/from the MyProxy server for long lived jobs.

Reporting

  • Richard, Machiel and Pieter will have weekly Skype meetings. During these meetings:
    • all identified risks will be considered, and the Risk Log will be updated.
    • progress will be discussed on the basis of the prioritized Acceptance Criteria above.
  • The minutes of these meetings (first draft by SARA, corrections by Eagle Genomics) will serve as detailed progress reports. No further progress reports are expected from Eagle Genomics.

Risk Log

This risk log is incomplete.

  • We can't agree on a proper WS framework to start with.
  • SARA doesn't provide the necessary environment for Richard to deploy the TestWS.

Timebox 2 (T2) Work Package

Timebox Parameters

Start date: 2009-01-12
End date: 2009-04-30

Eagle Genomics hours

Period           Days   Comments
January 2009     6      3 days/week, 2 weeks (delayed start, decorating house)
February 2009    9      3 days/week, 3 weeks (travelling on other business)
March 2009       9      3 days/week, 3 weeks (moving house)
April 2009       12     3 days/week, 4 weeks
Total            36     ≋ 288hrs ≋ € 14,688

MoSCoW Prioritized Actions and Acceptance Criteria

Do not edit the Acceptance Criteria without the consent of all team members!

Musts

  • Publish the existing web service on a publicly testable Tomcat server at NBIC/SARA.
  • Secure the certificate storage in MyProxyUtils so that it doesn't leave them visible/accessible on disk. At times when disk visibility is necessary, use chmod to restrict them to user-read-only (400).
  • Allow users to install MyProxyUtils via Maven.
    • Install the jlite-deps dependencies into a public Maven repository hosted at NBIC/SARA, and point the MyProxyUtils pom.xml files to this new repository instead. This removes the need for the separate jlite-deps component of MyProxyUtils, thus allowing Windows users to interact with the service.
    • Install MyProxyUtils itself into a public Maven repository at NBIC/SARA.
  • Create VOMSWSDLActivity in Taverna (based on the document on the Manchester wiki).
    • Basis is an extension of WSDLActivity.
    • Pass through the headers required for MyProxyUtils-based web services.
    • Local config screen in activity to request the selection of a certificate from the Credential Manager for use as the user's public certificate.
    • Local config screen must allow entry of MyProxy server and VOMS server/DN/VO details. Each of these two sets of info must be auto-completable, with the auto-completion data stored in the central Taverna config bean.
    • On start of each VOMSWSDLActivity, request from user the Credential Manager login, and the amount of time they wish to delegate their certificate for. Load and delegate the certificate, then start executing the activity.
  • VOMSWSDLActivity is shown to work correctly via a reproducible set of testing instructions.
  • SARA proposes a particular test-case for the WS: one of our customers has a Mass-Spectrometry analysis tool (called OMSSA) that runs on the Grid. As a working example, we'd like to make that particular data analysis tool available as a web service, to be called from within Taverna2.
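
For the certificate storage requirement above, the following is a minimal sketch of writing a proxy file that ends up readable by its owner only (mode 400). It assumes a POSIX file system, so it would throw UnsupportedOperationException on Windows. The class and method names are hypothetical and this is not the MyProxyUtils implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical sketch, not the MyProxyUtils code.
public class ProxyFileSketch {

    /** Writes the bytes to a new temporary file and leaves it at mode 400. */
    public static Path writeOwnerReadOnly(byte[] pemBytes) {
        try {
            // Create the file as 600 so it is never visible to other users,
            // not even briefly before the content is written.
            Set<PosixFilePermission> ownerReadWrite =
                    PosixFilePermissions.fromString("rw-------");
            FileAttribute<Set<PosixFilePermission>> attribute =
                    PosixFilePermissions.asFileAttribute(ownerReadWrite);
            Path file = Files.createTempFile("proxy-", ".pem", attribute);

            Files.write(file, pemBytes);

            // Tighten to owner-read-only, the equivalent of chmod 400.
            Files.setPosixFilePermissions(file,
                    PosixFilePermissions.fromString("r--------"));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the permissions of a file in "rwxrwxrwx" notation. */
    public static String permissionString(Path file) {
        try {
            return PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeOwnerReadOnly("placeholder content".getBytes());
        System.out.println(file + " " + permissionString(file));
        file.toFile().delete(); // remove the file as soon as it is no longer needed
    }
}
```

Creating the file with restrictive permissions from the start, rather than calling chmod afterwards on a world-readable file, avoids a window in which other users could read the credential.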

Shoulds

  • The proposed work (see under MUSTS) depends on the availability of certain features within Taverna2 that haven't yet been (fully) implemented. SARA proposes some alternative tasks, should the Taverna2 development itself prove to become a bottleneck. In particular, SARA would like to have a simple portal framework for building REST Web-Apps that work on the Grid (as an alternative to SOAP Grid web-services).
  • Handle MyProxy expiry times better in the API.
    • Adapt the API specification to allow user to specify MyProxy delegation expiry time.
    • Adapt the API implementation code to check if a job cannot get a certificate for enough time, and exit gracefully with a meaningful message.
  • Design and implement a webapp which allows both synchronous and asynchronous interaction with a Grid service, with the page secured using the user's grid certificate (in the same way as wms.grid.sara.nl:9000 is secured).
  • Co-ordinate with NBIC/SARA delegation to make a second visit to Manchester in Jan/Feb.
  • Attend the Taverna coding workshop in Manchester in Feb.
  • Proper JavaDoc documentation of all code.
  • Proper System documentation in this wiki. (What is where, how to deploy, etc.)
  • Code is in the GForge SCM.
  • Produce a plan of action like this one for Timebox 3.
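
The expiry-handling item above (check whether a certificate is valid for long enough, and exit gracefully with a meaningful message) could look roughly like the sketch below. The exception type, method name and message wording are assumptions for illustration and do not reflect the actual API.

```java
import java.time.Duration;

// Hypothetical sketch of a lifetime check, not the TavernaGrid API.
public class DelegationLifetimeCheck {

    /** Thrown when the delegated credential will expire before the job can finish. */
    public static class InsufficientLifetimeException extends RuntimeException {
        public InsufficientLifetimeException(String message) {
            super(message);
        }
    }

    /**
     * Fails with a readable message if the remaining credential lifetime
     * is shorter than the lifetime the job is expected to need.
     */
    public static void requireLifetime(Duration remaining, Duration needed) {
        if (remaining.compareTo(needed) < 0) {
            throw new InsufficientLifetimeException(
                    "Delegated credential is valid for " + remaining.toMinutes()
                    + " more minute(s) but the job needs " + needed.toMinutes()
                    + " minute(s); delegate again with a longer expiry time.");
        }
    }

    public static void main(String[] args) {
        requireLifetime(Duration.ofHours(2), Duration.ofHours(1)); // passes silently
        try {
            requireLifetime(Duration.ofMinutes(30), Duration.ofHours(2));
        } catch (InsufficientLifetimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running the check before submission means the user is told about the problem up front, instead of discovering it as an expired-credentials error during output retrieval.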

Coulds

  • Implement workflow preprocessing in Taverna (unless they do it themselves in Manchester).
    • Make VOMSWSDLActivity aware of other instances of itself in the same workflow, so that it only needs to prompt for and delegate one set of credentials for each distinct service being used. This depends on the workflow preprocessing implementation being done first. (All services that share the same user certificate, MyProxy, and VOMS details can be grouped together and share the input from a single user prompt for Credential Manager and expiry details. This requires the ability to share information with other activities via the workflow preprocessor context state.)
  • Anonymous proxy delegation to MyProxy for long lived jobs.
  • Add config panel to Taverna central config window to allow management of auto-completion data from VOMSWSDLActivity.
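
The grouping idea behind the workflow preprocessing item above can be sketched independently of Taverna: activities that share the same user certificate, MyProxy and VOMS details fall into one group, and each group needs only one prompt. The ActivityConfig class and its fields are hypothetical stand-ins, not Taverna or TavernaGrid types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; ActivityConfig is not a Taverna class.
public class CredentialGrouping {

    public static class ActivityConfig {
        final String activityName;
        final String certificateAlias;
        final String myProxyServer;
        final String vomsDetails;

        public ActivityConfig(String activityName, String certificateAlias,
                              String myProxyServer, String vomsDetails) {
            this.activityName = activityName;
            this.certificateAlias = certificateAlias;
            this.myProxyServer = myProxyServer;
            this.vomsDetails = vomsDetails;
        }

        /** Activities with equal keys can share one credential prompt. */
        List<String> credentialKey() {
            return Arrays.asList(certificateAlias, myProxyServer, vomsDetails);
        }
    }

    /** Maps each distinct credential key to the names of the activities using it. */
    public static Map<List<String>, List<String>> groupByCredentials(
            List<ActivityConfig> activities) {
        Map<List<String>, List<String>> groups =
                new LinkedHashMap<List<String>, List<String>>();
        for (ActivityConfig activity : activities) {
            List<String> names = groups.get(activity.credentialKey());
            if (names == null) {
                names = new ArrayList<String>();
                groups.put(activity.credentialKey(), names);
            }
            names.add(activity.activityName);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<ActivityConfig> activities = Arrays.asList(
                new ActivityConfig("search", "alice-cert", "myproxy.example.org", "vo-a"),
                new ActivityConfig("align", "alice-cert", "myproxy.example.org", "vo-a"),
                new ActivityConfig("report", "alice-cert", "myproxy.example.org", "vo-b"));
        Map<List<String>, List<String>> groups = groupByCredentials(activities);
        System.out.println(groups.size() + " prompt(s) needed for "
                + activities.size() + " activities");
    }
}
```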

Won'ts

  • Allow passing of input/output files to the service by reference instead of physical copying.

Reporting

  • Richard, Machiel and Pieter will have weekly Skype meetings. During these meetings:
    • all identified risks will be considered, and the Risk Log will be updated.
    • progress will be discussed on the basis of the prioritized Acceptance Criteria above.
  • The minutes of these meetings (first draft by SARA, corrections by Eagle Genomics) will serve as detailed progress reports. No further progress reports are expected from Eagle Genomics.

Risk Log

This risk log is incomplete.

  • Taverna does not/cannot implement workflow preprocessing pass for workflows.
  • We want to implement workflow preprocessing ourselves, but the Taverna group does not allow us to.
  • Workflow preprocessing cannot support context state.
  • Taverna API is not flexible enough to allow the creation of the VOMSWSDLActivity extension.
  • My wife and I are moving house in March, and we're having our first baby in May - if any complications arise due to either of these situations I could find myself short of time for a while.
  • The NBIC/SARA visit to Manchester does not take place.
  • The coding workshop in Manchester is cancelled or rescheduled.
  • JLite/MyProxy APIs do not support anonymous proxy delegation.
  • NBIC/SARA does not provide a suitable public Tomcat server or set up a public Maven repository.

Timebox 3 (T3) Work Package

Timebox Parameters

Start date: 2009-05-04
End date: 2009-08-03

Eagle Genomics hours

Period         Days   Comments
May 2009       6      3 days/week, 2 weeks (baby due)
June 2009      6      3 days/week, 2 weeks (attending BOSC in Sweden)
July 2009      12     3 days/week, 4 weeks
August 2009    0      Contract expires 3rd August
Total          24     ≋ 192hrs ≋ € 9,792

MoSCoW Prioritized Actions and Acceptance Criteria

Do not edit the Acceptance Criteria without the consent of all team members!

Musts

  • Add code from Piter de Boer to the Taverna plugin that imports server certs into the Java keystore (maybe with a one-time confirmation dialog?).
  • Add the JavaDoc documentation to GForge, as a project file and as part of the project web pages.
  • Add WMS Job Collections and WMS Parametric Jobs to the developed API (if this isn't already possible).
  • If delegation time has been left blank and user hits 'Apply', a validation error occurs. Stop this error from occurring because it is irrelevant in the context of 'Apply'.
  • Full JavaDocs for all API elements on wiki with other docs. Possibly best as link from wiki to pages on ws1 instead, by sending tarballed docs to Pieter for upload.
  • Change package name from nl.sara to nl.nbic throughout all code, docs and references.
  • Add SARA copyright notice to all code.
  • Add Apache 2 licence declarations to all code.
  • Default profiles in MyProxy config panel in Taverna 2 plugin, using SARA profile plus any others supplied by Pieter.
  • Review all JavaDocs with Machiel to bring them to an acceptable quality.
  • Fully detailed logging in API and T2 plugin code, including logging job IDs, SOAP envelopes, exception background info, etc.
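
The 'Apply' validation item above amounts to making the delegation-time check depend on which button was pressed. A minimal sketch, assuming the field holds a whole number of hours (the unit is an assumption) and using hypothetical names rather than the plugin's own:

```java
// Hypothetical sketch, not the T2 plugin code.
public class DelegationFieldValidation {

    public enum Action { APPLY, DELEGATE }

    /**
     * Returns an error message, or null if the field is acceptable
     * for the given action.
     */
    public static String validate(String delegationTime, Action action) {
        if (action == Action.APPLY) {
            // 'Apply' only stores the profile, so the delegation time is not used.
            return null;
        }
        String trimmed = delegationTime == null ? "" : delegationTime.trim();
        try {
            int hours = Integer.parseInt(trimmed);
            return hours > 0 ? null : "Delegation time must be greater than zero.";
        } catch (NumberFormatException e) {
            return "Delegation time must be a whole number.";
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("", Action.APPLY));     // no error for 'Apply'
        System.out.println(validate("", Action.DELEGATE));  // error for 'Delegate'
        System.out.println(validate("12", Action.DELEGATE));
    }
}
```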

Shoulds

  • Anonymous delegation to/from MyProxy.
  • Automated method of importing SSL certificates into Taverna 2, to avoid user having to do this manually on the command line.
    • Test and confirm acceptance when user enters HTTPS-based VOMS or MyProxy in MyProxy profile config panel and hits either 'Apply' or 'Delegate'.
    • Test and confirm acceptance when user enters HTTPS-based WSDL for "WSDL with MyProxy..." activity.
  • Upload demonstration sync and async sayHello workflows to wiki.
  • Demonstration OMSSA service working using Pieter's shellscript for distributing OMSSA as a tarball before execution.
  • Check the 'expired credentials' server problem in the demo server.
  • Participate in one-day web service workshop at NBIC.
  • Turn the how-do-i-get-my-certificate-in-myproxy piece of the plugin into a small stand-alone Java application. It could be a command-line application for all we care. Most important is to have some (more) example code using the API.

Coulds

  • Both the VBrowser and "our" API abstract away from the gLite/JLite details. They both depend largely on the same external libraries. Maybe we can merge efforts? Or at least discuss this?
  • Using a certificate and key-pair from an "e-Token", i.e. with PKCS#11. To our (SARA) knowledge, this isn't trivial to do from within Java.

Won'ts

  • None defined as yet.

Reporting

  • As per Timebox 2 above.

Risk Log

  • External dependencies (e.g. MyProxy Java APIs) prevent anonymous delegation from being possible.
  • Taverna 2 might have bugs relating to custom looping in workflows and/or saving these to file.
  • Java itself might not permit the automation of certificate download and acceptance.
  • Machiel/Pieter may not be able to provide sufficient information to enable the tasks that mention them to work.