Generalizing B-Fabric Towards an Infrastructure for Collaborative Research in Switzerland


Project Short Title: B-Fabric for Switzerland
Project funded by AAA/SWITCH programme
Start and End Date: 2009/6/1 – 2011/5/31
Responsible Institution: Functional Genomics Center Zurich (FGCZ), Prof. Ralph Schlapbach
Project leader of the University: Dr. Can Türker (FGCZ)
Project Participants of the University: Dr. Fuat Akal (FGCZ), Dieter Joho (FGCZ)
Other Participating Institutions and Staff:
Dr. Lars Malmström (Institute of Molecular Systems Biology, IMSB, ETH Zurich),
Dr. Marc Suter (EAWAG Swiss Federal Institute of Aquatic Science and Technology)

Project Description, Goals and Benefit

Modern biological and biomedical research uses advanced analysis technologies that describe complex biological systems at the molecular level and allow their comprehensive characterization. Analytical and bioinformatics technologies thereby make it possible to understand the underlying biological mechanisms and interactions. Such technologies are applied in a wide spectrum of projects, ranging from basic biology to clinical questions and from structure elucidation to the development of active substances. Answering such complex questions depends on an integrated view of the analytical data that goes beyond the sheer number of individual measurements at the molecular and technical levels.

To collaborate with the research groups of the two Zurich universities (University of Zurich and ETH Zurich), the Functional Genomics Center Zurich (FGCZ) maintains a state-of-the-art technology and bioinformatics infrastructure and makes it available, through its experts, to the research groups for collaborative projects. FGCZ has developed an innovative data management infrastructure, called B-Fabric, which provides a comprehensive repository for all data produced in the scientific projects carried out at the FGCZ. B-Fabric supports the integration, annotation, publication, and exchange of experimental data. In addition, it offers an open platform for analyses spanning several technologies (Genomics, Transcriptomics, Proteomics, Metabolomics). Together with the ProjectRequest tool, also developed at the FGCZ, B-Fabric provides functionality to organize scientific projects, i.e., it supports project application, reviewing, coaching, and member management. Both tools have been in successful operation at the FGCZ for two and three years, respectively.

Nearly all research groups, not only in the Life Sciences, need such tools to efficiently administer their research projects and the results these projects produce (raw data, analysis data, etc.). For inter-project analyses and for collaboration, it is indispensable that the results (data) obtained in different projects can be used transparently. B-Fabric provides the basis for these demands. Moreover, its flexibility allows deployment in more general settings wherever data annotation is essential, e.g., for general archiving purposes where files are archived together with their annotations.

With the present project, we would like to open up and generalize B-Fabric such that a broader research community in Switzerland can profit from it. There are two potential scenarios in which B-Fabric could be applied:
  1. Research groups with the necessary (technical and human) resources can deploy and administer B-Fabric locally. In this case, independent adaptations and extensions of B-Fabric could be carried out, from which the B-Fabric community would in turn benefit.
  2. Research groups without their own (technical and human) resources for the local deployment and maintenance of B-Fabric might use an instance hosted at FGCZ (in return for a small fee, still to be defined, covering hardware and maintenance costs).
In both cases, the research community in Switzerland could benefit from the FGCZ's six years of work and experience in managing and sharing scientific data.

The present focus of B-Fabric lies in the support and management of scientific projects carried out at the FGCZ. Consequently, B-Fabric so far integrates only data sources (instruments) available at the FGCZ. Likewise, access control and the supplied functionality are designed for the specific needs of FGCZ users. With the present project, we aim to extend and generalize B-Fabric, in collaboration with representative research groups (ETH IMSB and EAWAG), such that it can serve as a generic data repository and collaboration platform for a broader research community. This extension and generalization concern the following four aspects:
  • Fine-grained, dynamically adaptable access management: Currently, B-Fabric's access management follows the FGCZ policy, which demands that access to data be controlled on the basis of project membership, i.e., all members of a project have access to that project's data. Moreover, the FGCZ policy requires that all data of a project become public at the end of that project. When B-Fabric is applied in a more general environment, in particular one that supports ad-hoc cooperation and collaboration, a selective, user-defined provision of project data appears to be needed. Consequently, a central task of this project is to implement fine-grained access management that allows access rights to be changed dynamically within the frame of a given access policy.
  • Switch AAI/Shibboleth-based authorization and authentication: Currently, B-Fabric has its own user management, including authorization and authentication. In addition, B-Fabric should support authorization and authentication via Switch AAI/Shibboleth. Users would benefit from this in several ways. First, users who already have a Switch AAI/Shibboleth account will no longer need to register explicitly with B-Fabric and maintain an additional account with its own login name and password. Second, any user with an existing Switch AAI/Shibboleth account will automatically have access to data that is public for the B-Fabric community. In other words, all users with a Switch AAI/Shibboleth account (currently, practically all members of Swiss academic institutions) will implicitly become part of the B-Fabric community. Third, users may access several B-Fabric instances, possibly operated by different institutions, with the same login and password, which increases the potential for collaboration.
  • Ad-hoc coupling of data sources: Another central part of this project is to support the second scenario described above through dynamic coupling of external data resources with B-Fabric. In its current implementation, B-Fabric couples data sources via a static configuration, so adding a data source requires a complete rebuild of the system. The plan is to implement a more general concept that allows ad-hoc coupling of any external data source.
  • Advanced annotation management: In its current implementation, B-Fabric allows users to extend the vocabulary of certain attribute domains. In the background, an FGCZ employee is notified to review and release the new annotation. For a broader application of B-Fabric, where the vocabulary can no longer be reviewed manually by a small set of users, this concept must be generalized such that reviewing and releasing can be done by any user who has been granted the reviewing right. Granting and revoking such rights should be possible at run time. Another relevant issue in this context is the synchronization of the annotations with standard vocabularies, which should be done whenever possible. This requires extensions to set up and control a relationship with a public standard vocabulary repository.
The feasibility and usefulness of the generalized B-Fabric framework shall be demonstrated in collaboration with the following representative groups:
  • The IMSB group of Prof. Ruedi Aebersold and Dr. Lars Malmström will evaluate a locally deployed, generalized version of B-Fabric. For demonstration purposes, we will jointly select some representative IMSB data sources and couple them with a B-Fabric instance deployed at IMSB. Thus, the IMSB group will act as the representative partner for the first scenario described above.
  • Dr. Marc Suter from EAWAG will provide external data sources to be coupled with B-Fabric. For demonstration purposes, we will jointly select data sources available at EAWAG and couple them with the B-Fabric instance running at the FGCZ. Thus, the EAWAG group will act as the representative partner for the second scenario described above.
The sustainability of this project is ensured by the fact that FGCZ uses B-Fabric as a core strategic tool for collaboration. Thus, further use, maintenance, and development of the tool are guaranteed beyond the scope of this project.

Expected Number of Users

At present, more than 1000 users are registered with the B-Fabric instance deployed at FGCZ. We expect a similar number of new users once our collaborators use it in production.

Technical Description

B-Fabric is developed on the basis of open-source technologies, e.g., Apache Cocoon (web application development), PostgreSQL (data management), Apache Lucene (full-text search), and OpenSymphony (workflow management). As stated before, the current design and implementation of B-Fabric focus on the support and management of scientific projects as they are carried out at the FGCZ. So far, B-Fabric integrates only data sources (instruments) that are in use at the FGCZ. Similarly, data access management and data annotation management are designed according to the special needs of the FGCZ users.

The ad-hoc coupling of external data sources requires decoupling the corresponding functionality in the current code. Currently, the information about external data sources is statically configured in XML files, so attaching a new external data source to B-Fabric requires the system to be recompiled and restarted. In the future, it shall be possible to configure and couple external data sources dynamically, without recompiling or restarting the system.
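To make the intended difference concrete, the following Python sketch shows a registry in which external data sources are attached and detached at run time instead of being compiled in. It is not the B-Fabric implementation: the class names, the XML element and attribute names (datasources, source, name, protocol, location), and the example values are hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

import threading
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of one external data source."""
    name: str
    protocol: str   # illustrative values such as "ssh", "http", "file"
    location: str   # host/path or URL of the resource


class DataSourceRegistry:
    """Run-time registry: sources are added or removed while the system keeps running."""

    def __init__(self) -> None:
        self._sources: dict[str, DataSource] = {}
        self._lock = threading.Lock()  # serialize concurrent (un)registration

    def register_from_xml(self, xml_text: str) -> list[str]:
        """Parse a hypothetical <datasources> fragment and register each <source> element."""
        root = ET.fromstring(xml_text)
        new_sources = [
            DataSource(e.attrib["name"], e.attrib["protocol"], e.attrib["location"])
            for e in root.iter("source")
        ]
        # Parse everything first, then apply under the lock, so a malformed
        # fragment (ParseError or missing attribute) leaves the registry unchanged.
        with self._lock:
            for src in new_sources:
                self._sources[src.name] = src
        return [src.name for src in new_sources]

    def unregister(self, name: str) -> bool:
        """Detach a source; returns False if it was not registered."""
        with self._lock:
            return self._sources.pop(name, None) is not None

    def get(self, name: str) -> DataSource | None:
        with self._lock:
            return self._sources.get(name)

    def names(self) -> list[str]:
        with self._lock:
            return sorted(self._sources)
```

Because the registry is an ordinary run-time object, an administration page could, for example, call register_from_xml for a newly selected EAWAG resource while the server continues to serve requests. The parse-then-apply structure keeps a faulty configuration fragment from leaving the registry half-updated.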

The generalization of access management requires a concept for representing the currently hard-coded access rules. We plan to realize a role-based solution similar to access control in relational databases, where privileges are granted to and revoked from roles. For each B-Fabric application object (Project, Sample, Extract, Workunit, Data Resource, etc.), it shall be possible to grant and revoke access rights dynamically.
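A minimal Python sketch of such a role-based model is shown below, assuming that a privilege is a triple of object type, object id, and action. The class and method names and the action strings ("read", "write") are hypothetical and only illustrate how GRANT/REVOKE-style rules could replace hard-coded checks; they do not describe the planned B-Fabric code.

```python
from __future__ import annotations

# Hypothetical privilege: one action on one application object,
# e.g. ("Sample", 17, "read") or ("Project", 42, "write").
Privilege = tuple[str, int, str]


class AccessManager:
    """Role-based access control in the spirit of SQL GRANT/REVOKE (a sketch)."""

    def __init__(self) -> None:
        self._role_privileges: dict[str, set[Privilege]] = {}
        self._user_roles: dict[str, set[str]] = {}

    # Privileges are attached to roles ...
    def grant(self, role: str, object_type: str, object_id: int, action: str) -> None:
        self._role_privileges.setdefault(role, set()).add((object_type, object_id, action))

    def revoke(self, role: str, object_type: str, object_id: int, action: str) -> None:
        self._role_privileges.get(role, set()).discard((object_type, object_id, action))

    # ... and roles are attached to users, both changeable at run time.
    def assign_role(self, user: str, role: str) -> None:
        self._user_roles.setdefault(user, set()).add(role)

    def remove_role(self, user: str, role: str) -> None:
        self._user_roles.get(user, set()).discard(role)

    def is_allowed(self, user: str, object_type: str, object_id: int, action: str) -> bool:
        """True if any role held by the user carries the requested privilege."""
        wanted = (object_type, object_id, action)
        return any(
            wanted in self._role_privileges.get(role, set())
            for role in self._user_roles.get(user, set())
        )
```

In this model the current FGCZ policy becomes an ordinary grant to a per-project role (e.g., a hypothetical role "project-42-members" with ("Project", 42, "read")), while ad-hoc sharing of a single sample with an external collaborator is just one more role with one privilege, which can be revoked again without touching the code.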

Switch AAI/Shibboleth-based authorization and authentication requires extending and adapting the current B-Fabric architecture with respect to user management and service provision. This task is complicated by the fact that the user metadata usually provided by the identity provider is not complete enough to use all B-Fabric services; e.g., detailed address information is required for project requests, service billing, or ordering door keys (to physically access the FGCZ lab). Simply eliminating or completely replacing the current user management is therefore not possible, at least for the B-Fabric deployment at FGCZ, where some applications require detailed personal user information. In addition, the current B-Fabric deployment at FGCZ has more than 1200 registered users. To associate these users with their Switch AAI accounts (if they have any), additional functionality must be implemented that lets users establish this link themselves.
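The account-linking step can be sketched in Python as follows, assuming that the Shibboleth service provider passes the eduPersonPrincipalName attribute to the application and that an existing user proves ownership of a local account with its local password before the link is stored. All class and method names are hypothetical, and the in-memory dictionaries stand in for the user tables of a real deployment.

```python
from __future__ import annotations

import hashlib
import hmac
import os
from dataclasses import dataclass

# Assumption: the service provider exposes this eduPerson attribute to the application.
AAI_ID_ATTRIBUTE = "eduPersonPrincipalName"


def _hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class LocalUser:
    login: str
    salt: bytes
    password_hash: bytes
    postal_address: str | None = None  # detail typically not delivered by the IdP
    aai_id: str | None = None          # linked AAI identifier, if any


class AccountLinker:
    """Keeps local accounts and links them to AAI identities (hypothetical sketch)."""

    def __init__(self) -> None:
        self._users: dict[str, LocalUser] = {}
        self._login_by_aai_id: dict[str, str] = {}

    def add_local_user(self, login: str, password: str, postal_address: str | None = None) -> None:
        salt = os.urandom(16)
        self._users[login] = LocalUser(login, salt, _hash_password(password, salt), postal_address)

    def link(self, aai_attributes: dict[str, str], login: str, password: str) -> bool:
        """Link an AAI identity to an existing account after local password verification.
        Each AAI identity and each local account can be linked at most once."""
        aai_id = aai_attributes.get(AAI_ID_ATTRIBUTE)
        user = self._users.get(login)
        if aai_id is None or user is None:
            return False
        if not hmac.compare_digest(user.password_hash, _hash_password(password, user.salt)):
            return False
        if user.aai_id is not None or aai_id in self._login_by_aai_id:
            return False
        user.aai_id = aai_id
        self._login_by_aai_id[aai_id] = login
        return True

    def resolve(self, aai_attributes: dict[str, str]) -> LocalUser | None:
        """Map an AAI login to the linked local account; None means: offer linking or registration."""
        login = self._login_by_aai_id.get(aai_attributes.get(AAI_ID_ATTRIBUTE, ""))
        return self._users.get(login) if login is not None else None

    @staticmethod
    def needs_profile_completion(user: LocalUser) -> bool:
        """Services such as billing need details that the IdP does not provide."""
        return user.postal_address is None
```

The design keeps the local user record as the place for the detailed personal data that the identity provider does not deliver, so the AAI login only replaces the password check and does not replace the user management as a whole.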

The current concept of annotation management in B-Fabric allows the run-time extension of the given vocabulary of an annotation domain. However, the current implementation has some serious shortcomings. First, it does not support direct links to corresponding external vocabularies such as the NCBI or GO ontologies; the annotation concept must therefore be extended such that external links can be added to an annotation. Second, it is not possible to simply import an annotation source such as NCBI or GO as a base vocabulary. We therefore plan to include such import functionality and to try to provide a mechanism for synchronizing the B-Fabric vocabulary with such standard external repositories. Third, the annotation reviewing and releasing process in B-Fabric is bound to the concept of a coach, i.e., a user who is an FGCZ employee. This concept shall be generalized such that an annotation can be reviewed and released by any user who holds the corresponding right.
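The three extensions can be illustrated together in a short Python sketch: terms carry an optional external identifier, a base vocabulary can be imported as already released terms, and releasing a proposed term requires a reviewer right that is granted and revoked at run time. The class names and the simple label-to-identifier import format are hypothetical; the identifiers in the comment follow the public GO and NCBI Taxonomy conventions. As a simplification, re-importing a vocabulary overwrites terms with the same label, which is only a crude form of synchronization.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RELEASED = "released"


@dataclass
class Term:
    label: str
    status: Status
    external_ref: str | None = None  # e.g. "GO:0008150" or "NCBITaxon:9606"


class AnnotationDomain:
    """Vocabulary of one attribute domain with external links and grantable review rights."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.terms: dict[str, Term] = {}
        self._reviewers: set[str] = set()

    def import_base_vocabulary(self, entries: dict[str, str]) -> None:
        """Import (or re-import, to synchronize) label -> external id pairs as released terms."""
        for label, ref in entries.items():
            self.terms[label] = Term(label, Status.RELEASED, ref)

    def propose(self, label: str, external_ref: str | None = None) -> Term:
        """Any user may propose a term; it stays pending until a reviewer releases it."""
        return self.terms.setdefault(label, Term(label, Status.PENDING, external_ref))

    def grant_reviewer(self, user: str) -> None:
        self._reviewers.add(user)

    def revoke_reviewer(self, user: str) -> None:
        self._reviewers.discard(user)

    def release(self, user: str, label: str) -> None:
        """Release a pending term; only users holding the reviewer right may do so."""
        if user not in self._reviewers:
            raise PermissionError(f"{user} may not review terms in {self.name}")
        self.terms[label].status = Status.RELEASED

    def released_labels(self) -> list[str]:
        return sorted(label for label, t in self.terms.items() if t.status is Status.RELEASED)
```

Because the reviewer right is plain data attached to the domain, it no longer needs to be tied to the coach role; it could equally be expressed as one more privilege in a role-based access model.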

Deliverables
  1. Deliverable D1: Specification for Ad-hoc Coupling of External Data Stores
  2. Deliverable D2: Coupling of Some Selected EAWAG Resources with B-Fabric
  3. Deliverable D3: Specification of a Role-Based Access Model for B-Fabric
  4. Deliverable D4: Specification for Authentication via Switch AAI/Shibboleth
  5. Deliverable D5: Specification of the Generalized Annotation Management


Publications
  1. Can Türker, Fuat Akal, Ralph Schlapbach: Life Sciences Data and Application Integration with B-Fabric. International Symposium on Integrative Bioinformatics (IB 2011), 21-22 March, 2011, Wageningen, The Netherlands.
  2. Can Türker, Fuat Akal, Dieter Joho, Christian Panse, Simon Barkow-Oesterreicher, Hubert Rehrauer, Ralph Schlapbach: B-Fabric: On-the-Fly Integration of Life Sciences Applications (Demonstration). 7th International Conference on Data Integration in the Life Sciences (DILS 2010), 25-27 August, 2010, Gothenburg, Sweden.
  3. Can Türker, Fuat Akal, Dieter Joho, Christian Panse, Simon Barkow-Oesterreicher, Hubert Rehrauer, Ralph Schlapbach: B-Fabric: The Swiss Army Knife for Life Sciences (Demonstration). 13th International Conference on Extending Database Technology (EDBT 2010), 22-26 March, 2010, Lausanne, Switzerland.
Demos/Presentations
  1. General demonstration for the Electron Microscopy Center of ETH Zurich, 09.07.2010.
Meetings
  1. Public B-Fabric Day, Uni Irchel, 23.05.2011, 09:00-14:00. Agenda: 1) Overview and Demonstration of B-Fabric, 2) Discussions, 3) Apero.
  2. Technical Meeting with IMSB, ETH Hönggerberg, 25.08.2009, 09:30-13:00. Agenda: To deploy the current version of B-Fabric at IMSB and brainstorm on how to couple representative data resources of IMSB for testing purposes.
  3. Technical Meeting with EAWAG, EAWAG Dübendorf, 24.08.2009, 09:30-11:00. Agenda: To develop first ideas on which EAWAG data sources to couple with B-Fabric, and how.
  4. Project Meeting, FGCZ, Uni Irchel, 09.07.2009, 15:00-16:30. Agenda: 1) Quick overview of the project: goals, tasks, and milestones, 2) Discussion: open questions and potential schedule revisions, 3) Apero.
Acknowledgements

This project has been carried out as part of the “AAA/SWITCH – e-infrastructure for e-science” programme under the leadership of SWITCH, the Swiss National Research and Education Network, and has been supported by funds from the State Secretariat for Education and Research (SER) and ETH Board.