DbNPFeaturesImplementation

From BioAssist

This webpage is no longer maintained and only here for archive purposes. Please refer to https://trac.nbic.nl/gscf and http://dbnp.org for up to date information on this project.


Developer perspective

List of concrete dbNP features

The following is a list of features, extracted from the official specification at DbNPFeatures, that could serve as a first draft for feature / milestone planning. They are ordered from basic to advanced, so it would make sense to follow roughly this order for iteration planning.

  • First version of the database model: implement templating/modularization
  • First version of the web interface, allowing people to log in and create accounts
  • Input of studies and experiments
  • Possibility to query studies/experiments
  • Input of study organism using controlled vocabularies
  • Possibility to query on all available fields
  • Possibility to input the study design using the timeline (refinement of database schema to make it possible to store and retrieve study design, testing for end users: can we store all main types of study design in a consistent manner)
  • Possibility to specify details about the samples
  • Possibility to define which assays were performed on the samples
  • Possibility to store the protocols that were used in sample extraction and preparation (what should be queryable / stored in a structured way and what not?)

Implementation discussions

Event timeline

For adding events, a timeline should be shown where it is easy to apply a certain EventDescription to all subjects in a group at the same point in time in the study, leading to an instance of the Event class for each of these subjects. The Event class has the following properties:

  • Subject subject
  • EventDescription eventDescription
  • Date startTime
  • Date endTime

It's clear from these properties that the Event class links a certain event (e.g. a treatment with a compound, a diet, etc.) to concrete subjects in time.
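The Event class above could be sketched as a Grails domain class directly from the listed properties (the endTime validator is an illustrative assumption, not from the spec):

```groovy
class Event {
    Subject subject
    EventDescription eventDescription
    Date startTime
    Date endTime

    // assumption: an event should not end before it starts
    static constraints = {
        endTime(validator: { val, obj -> val == null || val >= obj.startTime })
    }
}
```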

Event view

The event view should give an overview of the event, which mainly boils down to a description of the EventDescription class. There are four properties:

  • String name
  • String description
  • Term classification
  • ProtocolInstance protocol

The first two are easy. Term can be mocked for now (and will be provided for the test data). So the hard nut to crack is ProtocolInstance. The main aim of the ProtocolInstance class is to describe a concrete application of a protocol: the Protocol class describes the actual parameters a protocol has, and the ProtocolInstance class fills in the values of these parameters. Some domain engineering is needed here too: the current description of the ProtocolInstance class does have a way to store the parameter values, but there is no mapping yet between the parameters and their values, so if you have more than one parameter of a certain type (string, number, etc.) you are in trouble. Hence the next section:


Protocol Parameter storage

On the one hand we have the specification of the protocols, saying: I am a DietIntervention protocol and I have a float parameter dose and a 'list of items' parameter diet with the terms Control and FishDiet. But on the other hand, in the actual ProtocolInstances, we need to store the values: PI 1 for subject 1 has dose .2 and diet Control, PI 2 for subject 2 has dose .2 and diet FishDiet etc.

Right now, what we have is:

class ProtocolInstance {

  Protocol protocol
  // TODO: check how the values can be indexed so that they can be mapped to their respective parameters (should we use maps here?)
  static hasMany = [stringParameterValues : String, numberParameterValues : double, listParameterValues: long]
  static constraints = {
  }

}

Suggestion: use ordering on the parameters

If we can store an ordering in the parameters, then we can map the values back to the parameters they belong to.

Suggestion: use dedicated classes linking to parameter specifications

Change ProtocolInstance to

class ProtocolInstance {

  Protocol protocol
  static hasMany = [stringParameterValues : ProtocolParameterValueString, ...]

}

and then the value class would be

class ProtocolParameterValueString {

  String value
  ProtocolParameter refParameter

}

and so on for the other parameter types.

Suggestion: use maps and use the parameter names keys

Just store the values in a Map:

class Event {

  Map parameterStringValues
  static hasMany = [parameterStringValues : String]

}

and use the parameter names as the keys and the strings (or floats/numbers/objects) as values.

The latter is chosen in the code now (as of rev. 139).
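A sketch of the chosen map-based approach: the class snippet follows the code above, while the usage lines are purely illustrative (the parameter names 'diet' and 'dose' are examples, not from the code base):

```groovy
class Event {
    Map parameterStringValues
    Map parameterFloatValues
    static hasMany = [parameterStringValues: String, parameterFloatValues: float]
}

// illustrative usage: parameter names are the map keys, parameter values the map values
def event = new Event(parameterStringValues: [:], parameterFloatValues: [:])
event.parameterStringValues['diet'] = 'Control'
event.parameterFloatValues['dose'] = 0.2f
```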

Query view

Simple query implementation

See DbNPFeaturesQueryImplementation for a first (relatively simple) query implementation that we will build before implementing the query flow from DbNPFeatures (which steps are in the next sections).

Select studies view

The first query page should just present you with a full-text query box. The results should be mapped back to studies, displayed just as in the study>list (Browse studies) view, but augmented with the child objects in which the string was found (such as subjects, events, etc.; see Study > hasMany = []). It would be nice if the found strings were highlighted. From this list, the user should be able to select the studies he wants to query on (with checkboxes).

Select samples view

The second query page should present the user with the possibility to choose samples to query on. It should display all samples that are in the previously chosen studies. The samples should be listed in a tabular format, as in the mockup: Study.name > Subject.name > SamplingEvent.description.name > Sample.name. We skip the timeline for now, and additionally display only SamplingEvent.startTime and Sample.material. It should be possible to group the samples, as in the mockup.

Select biomarkers/features view

In the third step, the user is presented with all biomarkers that are available for the assays that were done on the chosen samples. The user chooses which biomarkers' actual values should be included in the query result.

Grouping samples view

The user can group his samples in three ways:

  • <TO BE UPDATED> from subject groups: just take the names of all the groups in all the chosen studies, merge them into one big list and use that as a sample grouping (via the subjects): f(sample -> label) : Sample -> get source Subject -> get parent Study -> get child SubjectGroups -> find the SubjectGroup in which the Subject resides -> use SubjectGroup.name | 'Other' if the Subject is in 0 or more than 1 SubjectGroups
  • <TO BE UPDATED> from event descriptions:

f(sample -> label) : Sample -> get parent Event -> get EventDescription -> use EventDescription.name

  • from event startTimes:

f(sample -> label) : Sample -> get parent SamplingEvent and also get parent Study -> use SamplingEvent.startTime - Study.startTime as label (but fancy, based on biggest unit: '3 weeks' or '3 days')
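The startTime-based label could be computed roughly as follows. This is a sketch: relativeTimeLabel is a hypothetical helper, and the biggest-unit cut-offs (whole weeks, else days) are assumptions:

```groovy
// Sketch: derive a 'fancy' relative-time label from the sampling event's
// offset to the study start, using the biggest fitting unit.
String relativeTimeLabel(Date studyStart, Date eventStart) {
    long days = (eventStart.time - studyStart.time).intdiv(24L * 60 * 60 * 1000)
    if (days >= 7 && days % 7 == 0) {
        return "${days.intdiv(7)} weeks"   // e.g. '3 weeks'
    }
    return "${days} days"                  // e.g. '3 days'
}
```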

Query result

The query result is a downloadable table of the samples (in the rows) versus their properties (Sample.name, SamplingEvent.name, Subject.name) and the values of the selected biomarkers (in the columns).



Module communication with the CommunicationManager

The dbNP modules GSCF and SAM can run on separate servers. They communicate with each other in either of two ways: (1) Grails views, or (2) Rest resources.

Communication via Grails views simply means that one module provides a view that another module embeds (e.g. by a link or remote call).

Communication via Rest resources transfers actual data via a web service in a standardized way.

Both ways of communicating use built-in Grails features and a utility class called the CommunicationManager. We describe how this works below.


Communication via Grails views

When communicating via Grails views we call one side the client and the other side the server. The client refers to a Grails view on the server. The client thus needs the URL of the view. All views available to the client are registered in the client's CommunicationManager using

CommunicationManager.addViewWrapper( methodName, serverURL, viewName, params = [] )

This method registers the URL and makes it available through a static method methodName of the CommunicationManager. We call this kind of method a wrapper method, because it wraps around the URL of the server. The main advantage of the wrapper method is that it will automatically create a URL string from the parameters that the view might require.

Thus for example:

addViewWrapper( 'getAssayImportURL', 'nbx14.nugo.org/gscf', 'importer/pages', ['externalAssayID', 'externalStudyID'] )

This will create a method called getAssayImportURL(), which can be called like this:


def study = getMyStudy(...)
def assay = getMyAssay(...)
def url = CommunicationManager.getAssayImportURL( assay.externalAssayID, study.code )

The URL could then be something like

http://nbx14.nugo.org/gscf/rest/getAssayImportURL/nil?externalAssayID=7&externalStudyID=4711
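One way addViewWrapper could generate such wrapper methods is with Groovy metaprogramming. The sketch below is an assumption about the implementation, not the actual CommunicationManager code; it only shows the URL-building idea:

```groovy
import java.net.URLEncoder

class CommunicationManager {
    // Sketch: register a static wrapper method that builds the view URL
    // from positional arguments, in the order given by paramNames.
    static void addViewWrapper(String methodName, String serverURL,
                               String viewName, List paramNames = []) {
        CommunicationManager.metaClass.static."$methodName" = { Object[] args ->
            def query = [paramNames, args.toList()].transpose()
                .collect { name, value ->
                    "${name}=${URLEncoder.encode(value.toString(), 'UTF-8')}"
                }.join('&')
            "http://${serverURL}/${viewName}" + (query ? "?${query}" : '')
        }
    }
}
```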

Rest resources via CommunicationManager

Grails offers a simple way of providing Rest services (documented in the user guide). Two features make implementing a Rest service with Grails controllers easy: (1) Grails controllers can be used to pass arguments to the Rest resource, and (2) Grails renders the results as XML or JSON for you (we are using JSON). While Grails is great at providing such Rest services via its controllers, there is no standard way of consuming Rest resources in Grails. We accomplish this task with the CommunicationManager. In conclusion, we use Grails controllers on the Rest server side, and the CommunicationManager on the Rest client side.

Since documentation for the server side is available, we do not repeat it here. Instead, we present our own CommunicationManager next.

The main idea of using the CommunicationManager as a Rest client is that we want to conveniently generate and bundle wrapper methods for the Rest resources.

For instance, suppose there is a rest service on a Grails Rest server like this:

http://nbx14.nugo.org/gscf/rest/getSubjects

This service requires an argument called externalStudyID, so e.g. it is called like this:

http://nbx14.nugo.org/gscf/rest/getSubjects/nil?externalStudyID=8

It would be convenient to have a wrapper method on the client, so that one could simply call:

def resultOfRestResource = getSubjects( myStudyID )

The CommunicationManager will provide this method. In order to register that method, we simply call:

CommunicationManager.addRestWrapper( 'nbx14.nugo.org/gscf/rest', 'getSubjects', ['externalStudyID'] )

This will generate our wrapper function as a static member function. It can be called like this:

CommunicationManager.getSubjects( [externalStudyID: 8] )

Note that the arguments for the wrapper function are handed over as a map. The keys of the map are the names of the arguments of the Rest resource, and the values of the map are the actual arguments. The CommunicationManager also handles methods with multiple or no parameters: simply specify the names of all arguments in the list given as the last argument to addRestWrapper(). Internally, we use URLEncoder to properly escape argument strings.
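Internally, a Rest wrapper along these lines could build the URL from the argument map and parse the JSON response. This is a sketch (inside the CommunicationManager class) assuming Groovy's JsonSlurper for parsing; the real implementation may differ:

```groovy
import groovy.json.JsonSlurper
import java.net.URLEncoder

// Sketch: register a static wrapper that calls the Rest resource with
// the given named arguments and returns the parsed JSON result.
static void addRestWrapper(String serverURL, String resource, List argNames = []) {
    CommunicationManager.metaClass.static."$resource" = { Map args = [:] ->
        def query = args.collect { name, value ->
            "${name}=${URLEncoder.encode(value.toString(), 'UTF-8')}"
        }.join('&')
        def url = "http://${serverURL}/${resource}/nil" + (query ? "?${query}" : '')
        new JsonSlurper().parseText(new URL(url).text)
    }
}
```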

In conclusion, the CommunicationManager provides an easy-to-use means of registering Rest services on the client side. There is no hassle with writing wrapper methods by hand, escaping URL strings, or handling JSON objects. The server's Rest resource can simply be called on the client side as a method that returns the desired value. The CommunicationManager also bundles all wrapper functions in one place, is used in all modules, and makes changing Rest resources at run time easy. It can also be extended to non-Grails restful services.


Rest resources currently implemented

All Rest resources mentioned here are implemented by Grails controllers. Each module implements its Rest resources in a controller called RestController. All resources return JSON representations of Groovy objects. A list of restful resources currently available in dbNP modules can be found here.


Initializing the CommunicationManager

The CommunicationManager provides convenience methods that register wrapper methods for the Rest resources of the modules. The convenience methods require initialization of the server URLs used for the Rest resources, and should be called at startup in every module as follows:

import dbnp.rest.common.CommunicationManager
....
....
....
CommunicationManager.GSCFServerURL = "http://localhost/gscf"
CommunicationManager.SAMServerURL  = "http://localhost/sam"
CommunicationManager.registerRestWrapperMethodsSAMtoGSCF()  // registering methods for your module, in this case GSCF
....

By default the Rest servers are nbx5.nugo.org/gscf/rest for GSCF Rest resources and nbx14.nugo.org/sam/rest for SAM resources.

Simple Assay Module

Current state of the module

Situation as of week 28:

The Simple Assay Module (SAM) stores clinical measurement data (e.g., weight, BMI, or glucose levels) and makes it available to the GSCF module. SAM is a stand-alone data service implemented in Grails. The sources are available on NBIC's SAM project page.

Mockup for the assays list can be found here: File:Mockup simple assays list.ods.

Structured data

The data contained in this module is very simple and is structured into four kinds of simple objects: SimpleAssay, SimpleAssaySample, SimpleAssayMeasurement, SimpleAssayMeasurementType.

Communication with the SAM

The GSCF communicates with the SAM to retrieve, store and query data. The communication is realized as a Rest service using the communication method outlined in the documentation on the CommunicationManager. The Rest resources that GSCF provides and that SAM uses are:

* getStudies
* getSubjects
* getAssays
* getSamples

Each SimpleAssay domain object in SAM mirrors an Assay domain object in GSCF. The correspondence is organized using a unique shared long identifier (first generated by GSCF). Similarly, Study domain objects in GSCF are referred to from SAM by a string identifier (the "code" member of the Study domain object). In GSCF, it is defined on which Samples within the study an Assay is performed (an Assay is always bound to one particular Study). These Sample collections of the GSCF Assay are mirrored in the SimpleAssay SAM domain object. The Samples themselves are also mirrored, in the SimpleAssaySample class.

To summarize, an overview of the relevant classes with the most important member properties:

GSCF Study — the class that describes a study (basic entity of GSCF)
  • code
  • description
  • assays (1..* Assay)
  • samples (1..* Sample)

GSCF Assay — definition of an assay that was performed on samples in the study
  • name
  • module (1 AssayModule)
  • externalAssayId
  • samples (1..* Sample on which the assay was performed)

GSCF AssayModule — a description of an actual instance of a known dbNP clean data module (e.g. a SAM module)
  • name
  • url

GSCF Sample — definition of a Sample in the study
  • name
  • material
  • externalSampleId (= name right now)
  • parentSubject (1 Subject the sample was taken from)
  • parentEvent (1 SamplingEvent describing the sampling)

SAM SimpleAssay — a proxy copy of the GSCF Assay for which this SAM instance contains measurement data; describes which measurements were performed in this assay
  • name (mirrored)
  • externalAssayId (references the Assay in GSCF)
  • samples (1..* SimpleAssaySample, mirrored)
  • measurements (1..* SimpleAssayMeasurementType)

SAM SimpleAssaySample — a proxy copy of a GSCF Sample belonging to an assay (with some key properties cached/mirrored)
  • name (mirrored)
  • externalSampleId (references the Sample in GSCF)
  • material (mirrored)
  • subject (mirrored from Sample.parentSubject.name)
  • event (mirrored from Sample.parentEvent.template.name)
  • startTime (mirrored from Sample.parentEvent.startTime)

SAM SimpleAssayMeasurementDoubleData — the actual measurements, as a long table of doubles and ids
  • value
  • parentAssay (1 SimpleAssay)
  • parentMeasurementType (1 MeasurementType)
  • parentSample (1 SimpleAssaySample)

SAM SimpleAssayMeasurementType (in UI: 'sub assay') — a list of all possible measurements for which data can be stored in this SAM instance
  • name (name of the measured compound)
  • sop (protocol of measurement)
  • etc. (isDrug, see DbNPFeatures)

Discussion

For reference/archive reasons, here follows the rationale behind putting clinical data into a separate module.

From the beginning on, the SAM was thought of as a separate clean data module of dbNP, for several reasons. One of those reasons was that the SAM could be an easy first module which would demonstrate the feasibility of the overall design and the clean data layer of dbNP. Furthermore, having a separate module describing the clinical data would also allow for the storage of extra information about the assays, such as the protocols of the different measurements or assays.

However, the study capture data model has by now grown so advanced that it could easily store simple clinical data in the study capture module itself. One would just have to define a Sample template that covers the measurements of the assay, taking advantage of the many available datatypes in the GSCF Template structure, and it would be perfectly possible to fit this data into the GSCF Studies database. Also, the SAM module is probably the only module that would never be considered separately from the study capture module, and for which no use exists on its own (unlike e.g. metabolomics, where there are pipelines and tools for the data involved).

So, we have to take a clear approach concerning the SAM. Obviously, factoring out the code into a separate module is an extra effort. Is it worth it?

Pro modularization
  • Separate measurement data from study design
  • Demonstration of the module principle
  • Ability to store extra information on measurements and assays such as normal measurement ranges, links to compounds and so on
  • Ability to add extra functionality on the numerical data, such as statistics or graphs
  • One place where all the measurement data goes, one could easily create a separate interface for e.g. lab analysts
  • Use the module as a clear test case for implementation up to the point where everything actually runs
Con modularization
  • Saving clinical chemistry data can in principle already be achieved with Sample templates, so why bother?
  • Overhead of communication of GSCF and SAM via web
  • Extra effort for security / AAA
  • Maintenance
Possible approach: integration of SAM into GSCF

Using Sample templates to store the data would be a quick and dirty way to achieve the basic requirements for SAM (storage of clinical chemistry data). Also, it would eliminate the question whether a very simple assay such as Body weight should be stored in the GSCF itself or in a separate module. However, the requirements about measurement information (see DbNPFeatures#Clinical_chemistry_clean_data) cannot be met that way.

Possible approach: making SAM a stand-alone module

SAM could very well be the first working module for GSCF. It would only have to store simple measurement values, but we could extend that and add value as described above. For example, in practice, clinical chemistry information often resides in Excel sheets on all sorts of locations. We could make an Excel pastebox interface where we could interpret that tabular data just as we do in the Importer wizard in GSCF.

Possible approach: make SAM a module and factor out common functionality into a common module

We could also combine the best of both worlds into a common module. That way, the SAM module could profit from e.g. the extensive Template structure or the Excel importer wizard in GSCF, and vice versa. However, that would require separating out these functionalities into a stand-alone module. Grails allows for easy modularization, but it might take a bit to make the methods general enough for that.

Biologist perspective

Templating

Jildau: If I now have a study in which I apply two treatments (e.g. placebo and paracetamol 500 micrograms per day) and in addition two challenges (glucose 75 mg and lipid 50 mg), how do I store this data? The events are, I assume, the treatment and the challenge. Are the facts that one is a treatment and the other a challenge each an EventDescription? And are the amounts part of the EventDescription, or are they parameter strings?

Kees: The events were the part I struggled with most in arriving at an unambiguous data model. I am still toying with the idea of adding a simplified, non-time-bound entity that works exactly like a 'factor' in ISATAB. But that aside. For the examples you give, the current data model is actually quite convenient (I think).

An EventDescription in principle gives a generic description of an event, whereas an Event represents a concrete application of that EventDescription to a concrete Subject at a certain time, possibly with a number of details filled in. Filling in those details goes via the Protocol. The Protocol is in principle defined in the EventDescription, but an arbitrary number of ProtocolParameters can also be defined on the Protocol. The concrete values for these protocol parameters are also attached to the Event.

So in your case I propose the following:

EventDescription: treatment; Protocol: patient takes 1 pill each day, with ProtocolParameters:

  • drug: stringlist: placebo or paracetamol
  • dose: integer (mg)

EventDescription: challenge; Protocol: nutritional challenge, with ProtocolParameters:

  • drug: stringlist: glucose or lipid
  • dose: integer (mg)

Events: for each subject, one treatment event and one challenge event, each with the respective ProtocolParameter values.
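The proposal above could look roughly like this in domain objects. This is a hedged sketch: the constructor arguments, property names (type, listEntries) and the placeholders subject1, day0 and day28 are illustrative assumptions, not actual GSCF code:

```groovy
// Sketch: the generic description of the treatment...
def treatment = new EventDescription(
    name: 'treatment',
    protocol: new Protocol(name: 'patient takes 1 pill each day', parameters: [
        new ProtocolParameter(name: 'drug', type: 'stringlist',
                              listEntries: ['placebo', 'paracetamol']),
        new ProtocolParameter(name: 'dose', type: 'integer')  // in mg
    ]))

// ...and one concrete Event per subject, with the parameter values filled in
def event = new Event(subject: subject1, eventDescription: treatment,
                      startTime: day0, endTime: day28)
event.parameterStringValues = [drug: 'paracetamol']
event.parameterNumberValues = [dose: 500]
```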

Importer wizard

  • Choose template and input file (Excel file)
    • Create new template based on sheet
  • Define columns
    • Per column, select the entity it belongs to (Study, Subject, Sample, et cetera, or 'Don't import')
    • Per column, select the property it maps to (properties of an entity are fields like name, dob, bmi, etc.)
  • Validation
  • Check whether import is possible (e.g. 450/500 rows can be imported, 50 rows fail)
  • Overview (confirmation) of what is going to happen in the database (insert, update)

Main import logic

studyColumns = allIMC where entity == Study
if studyColumns.count == 0
  show study chooser + link to create a new study
else
  studyIDColumns = allIMC where entity == Study and isID == true
  user chooses whether to:
  - add to existing studies
  - create new studies
  check: we now have a mapping importer row -> Study object
end

subjectColumns = allIMC where entity == Subject
if subjectColumns.count > 0
  user chooses whether to:
  - add to existing subjects
  - create new subjects
endif

# Datamodel changes: MappingColumn lookup
# Feature: find unique studies

bool mapSamplesToSubject
sampleColumns = allIMC where entity == Sample
if sampleColumns.count > 0
  user chooses whether to:
  - add to existing samples
  - create new samples
  if subjectColumns.count > 0
    mapSamplesToSubject = true
  end
end

import( bool commitDate, IMmapping, MappingChoice, HashMap<String>, subjectColumns, ... )
- studies : "id-columns" + id (object in db)
- subjects
- samples
- rowErrors (not found, ...)

for each row (int rowNumber)
  Study rowStudy
  try {
    if studyColumns.count > 0
      if not studyCreate
        find studyIDfields in database
        if found
          rowStudy = found study
          update study
        else
          throw error (study not found)
      else
        create study
        fill all fields
        save
        rowStudy = created study
    else
      rowStudy = chosenStudy
    end

    if subjectColumns.count > 0
      if subjectCreate
        create Subject
        fill all fields (setField(name, value))
        rowSubject = save()
        rowStudy.addToSubjects(rowSubject)
      else
        find subject in study rowStudy
          with the subject ID columns -> id-column values
        if findCount > 1
          throw error (multiple occurrences found for ID column entries [...])
        elseif findCount == 1
          rowSubject = found subject
          update fields
          save
        else
          throw error (Subject with ID column values [...] not found)
      end
    end

    if sampleColumns.count > 0
      if sampleCreate
        create sample in study
        fill all fields
        rowSample = save()
        if mapSamplesToSubject
          rowSample.parentSubject = rowSubject
      else
        find samples in study with id-column values as in this row
        foundCount handled as for Subject
        if mapSamplesToSubject
          if parentSubject is defined
            check whether it is the same; if not, throw error
          else
            set parentSubject = rowSubject
        end
      end
    end
  } catch { rowErrors[rowNumber] = e.error }
end

Treatment of built-in columns

Right now, we only import TemplateEntity-extending entities, and use TemplateEntity.giveFields to give us the fields that the imported columns can be mapped to. The current implementation of giveFields returns the template fields of the underlying templates. However, some fields are not in the template, but are fixed for each entity (such as name and species for Subject). As a result, those standard fields cannot be imported at the moment.

Two probable solutions:

  • Move all fields into templates. This seems a drastic change, however, it makes the data model more flexible/customizable, which is a good idea, and also it makes it easier to set up user interfaces on top of the data model (simply iterate over all template fields to get all properties). Of course, specific fields such as links to parent objects still need to be in the super class. However, there are also drawbacks: apparently we will need to change some of our code, and you cannot force certain fields (such as name) to be there anymore. In the last example, it is convenient to have name to serve as a string identifier.
  • Extend the giveFields to also return fixed fields. This might be a little tricky to do, because you would have to know what the superclass is, or else fetch all properties from the super class and eliminate any non-fields (such as constraints, belongsTo, but also e.g. parentSubject), which is tricky as well. An additional problem is that the current giveFields returns a Set<TemplateField>, so either we would have to change that, or make mock TemplateField descriptions for the standard fields.

The first solution was optimized to the point where, in the BootStrap, certain 'system' template fields were dynamically added at the time of any Template object creation. The storage was thus done in the same way as for normal template fields, in the respective template field tables. This was implemented for the Event class in revision #390. However, it turned out that to get this to work and be able to specify domain class constraints on (in this case) Event startTime and endTime, we still had to implement getStartTime, setStartTime, etc. manually.

So, in revision #392, we switched back to the second solution, still storing the 'domain fields' such as Subject species in the domain class (table) itself, but giving the code an option to pull the complete list of domain+template fields with a giveFields() method in TemplateEntity.
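The second solution could be sketched as follows; giveFields() is named in the text above, but giveDomainFields() and the Template.fields property are assumptions for illustration:

```groovy
// Sketch: combine the fixed 'domain fields' with the template's fields
abstract class TemplateEntity {
    Template template

    // hypothetical: each subclass lists its fixed domain fields
    // (e.g. species for Subject)
    abstract List giveDomainFields()

    List giveFields() {
        giveDomainFields() + (template?.fields ?: [])
    }
}
```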

Ontology chooser

We could use the BioPortal Ontology Term Chooser Widget: http://www.bioontology.org/wiki/index.php/NCBO_Widgets#Term-selection_field_on_a_form. Update: this widget was modified by Jeroen and incorporated into the source.

Save strategy for a chosen term using the hidden fields:

  • use Ontology.shortName for the ontology-id
  • use Term.accession for concept-id
  • use Term.name for the displayed name

So, given a chosen ontology concept:

  • test if an Ontology with that ontology-id already exists
  • if not, insert it
  • check if there is a Term that belongs to that Ontology which has accession = concept-id
  • if not, create it
  • reference that Term

In this way we build a local cache of the ontology Terms that are used in our database.
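The find-or-create steps above could be sketched like this; cacheTerm is a hypothetical helper, and the GORM dynamic finders are assumed to match the Ontology and Term properties named above:

```groovy
// Sketch: given a chosen concept, cache its Ontology and Term locally
Term cacheTerm(String ontologyId, String conceptId, String displayName) {
    // test if an Ontology with that ontology-id exists; if not, insert it
    def ontology = Ontology.findByShortName(ontologyId) ?:
        new Ontology(shortName: ontologyId).save()
    // check for a Term in that Ontology with accession == concept-id; if not, create it
    def term = Term.findByOntologyAndAccession(ontology, conceptId) ?:
        new Term(ontology: ontology, accession: conceptId, name: displayName).save()
    return term
}
```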


Ontology showPopup

An Ontology selection popup must be implemented into the Study Capture Wizard, which uses both GSCF caching and the BioPortal Ontology services.

  • The user clicks on the Ontology input field and starts entering the first characters of the aimed Ontology
  • The cache is checked for any Ontologies matching the string
  • If so they are shown in the dropdown menu
  • If the user finds the targeted Ontology in the dropdown it can be selected
  • If the Ontology is not found, the user can click on an add button next to the dropdown
  • A popup appears with an input field, containing the string already entered
  • Based on this string, a search is performed against the BioPortal services (bioportal.bioontology.org)
  • The user selects the targeted Ontology and clicks submit/add
  • The Ontology is added to the cache and the value is returned to the main form


