Program "Infrastructure" workshop NBIC2013
- 17 April 2013
- 13:15 until 14:30: only 75 minutes.
Goal of the workshop: what questions do you need to ask yourself when dealing with a 'big data' analysis, and which infrastructure solutions are available?
In this workshop we will discuss different viewpoints on running a relatively large computation and on making the results available to the world in the form of a web service. The question we want to answer is: what can you host yourself, what can be done at the university computer center, what would be interesting to run at SURFsara, and what could be handled by commercial providers of computing services? We will have short presentations based on our example use case, from two academic and two commercial providers, and will conclude the session with an open discussion.
The use case: a researcher at a UMC has 10 TB of data available on disk. Processing this data requires 5000 core hours of computation: 100 independent calculations of 50 hours each, potentially parallelizable to 15 hours on 4 cores, with each job needing 1 TB of data. The parameters of the calculations also need to be tuned, for which a number of single calculations will be run first. The result will be a 1 TB data set that the researcher wants to make available to the world via a reasonably simple web service.
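As a back-of-the-envelope check on these numbers, the sketch below recomputes the totals from the figures in the use case; the node count used for the wall-clock estimate is an illustrative assumption, not part of the use case:

```python
# Sizing sketch for the example use case. Job counts and per-job hours
# come from the use case; the node count is an assumed example.
jobs = 100            # independent calculations
hours_serial = 50     # wall-clock hours per job on 1 core
cores_parallel = 4    # cores per job when parallelized
hours_parallel = 15   # wall-clock hours per job on 4 cores

core_hours_serial = jobs * hours_serial                        # 5000 core hours
core_hours_parallel = jobs * cores_parallel * hours_parallel   # 6000: parallelization overhead

# Assuming, say, 25 four-core nodes, the jobs run in 4 waves:
nodes = 25
waves = -(-jobs // nodes)             # ceiling division
wall_clock = waves * hours_parallel   # hours end to end

print(core_hours_serial, core_hours_parallel, wall_clock)  # 5000 6000 60
```

Note that the 4-core variant costs 6000 core hours rather than 5000, so the parallel speedup trades some efficiency for shorter turnaround.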
- Introduction (Rob Hooft, 5 minutes)
- Talks by several infrastructure service providers (max 15 mins each)
- Discussion (min. 15 mins)
- How would you host an application with 10 TB of data? After hearing the hosting possibilities, would you still want to do this yourself?
- Do you enjoy doing these things yourself?
The presenters at the workshop are:
- SURFsara (Maurice Bouwhuis)
- CIT/UMCG University (Hans Gankema)
- BitBrains (Gjalt van Rutten)
- ProAct (Bertus Doppenberg) [Sponsor of the conference]
Each presenter has a maximum of 15 minutes to explain what they offer, how they work, how they think it compares with a local solution, and where they excel.
- Infrastructure, based on use cases like "NFU" data.
- What are the different needs?
- Hosting a web site with information
- Hosting industry-standard software (wiki / trac / etc)
- Host your software for the lab
- Host your software for the world (publication)
- Store local data (temporarily / long term)
- Local compute service. Dimensioning? Centralized overflow for extreme needs?
- Potential overlap with the BIUP session, where there is a lot of interest in "extreme needs" overflow space.
- What can be done locally? What can be done centrally? What can be done commercially?
- Security and legal aspects
- Helsinki experience
- What are you doing yourself? What should be left to others?
- Do you know about IaaS / PaaS / SaaS? Who runs the Helpdesk?
- Discussion: How is this kind of infrastructure budgeted for? Or is it an afterthought?
- Discussion: How many hours will you be spending on maintenance? Is that ideal?
- What does it cost? Compute? Storage? What does it cost when you do it yourself?
- Are you setting aside time for maintenance? Incident response?
- Find speakers