
Collaboratory for MS3D
Data Portal Enabling New Protein Structure Collaboration: Overview

Principal Investigator: Carmen Pancerella

Institutional Points-Of-Contact:

Sandia National Laboratories: Carmen Pancerella
University of Maryland, Baltimore County: Dan Fabris
University of California, San Francisco: Irwin Kuntz

We are building on an established data-centric collaboration infrastructure, the Knowledge Environment for Collaborative Science (KnECS), and adaptively adding new data management and analysis capabilities to support emerging research communities that are pursuing innovative approaches in biomedical science. The resulting infrastructure will incorporate emerging middleware standards for data/metadata management, security, application integration, and collaboration. In the longer term, the project will target a standards-based knowledge synthesis and management capability. This work is being carried out in direct collaboration with scientists leading the development of MS3D [1, 2], a new method that combines intramolecular chemical crosslinking with high-resolution mass spectrometry to glean structural information about proteins and other biological macromolecules. The 'Collaboratory for MS3D', or C-MS3D, will integrate the evaluation of the tools with the measurement of their impact on a newly developing community. The tools developed will be released as open source and made available as a 'collaboration tool kit' for other interested communities.

The specific aims of this project are:

  1. To build an extensible portal for sharing data and tools, supporting both public and private group collaborations of geographically distributed biologists.
  2. To enable information interoperability by creating new community data schemas and tools for sharing data in the MS3D domain, taking advantage of existing and emerging standards and technologies where possible.
  3. To modify existing tools that generate and analyze data to enable the creation and storage of new MS3D metadata in a format that allows interoperability with other tools and collaboratory functions; and to create new tools as the portal and data schemas mature.
  4. To research and develop methods for automating the capture of data provenance and workflow, towards the goal of a comprehensive knowledge management system.
  5. To demonstrate the impact and effectiveness of the portal to enable new science by piloting these developments with collaborating scientists in the MS3D community.
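
As an illustration of aim 4, automated provenance capture can be thought of as wrapping each analysis step so that the tool name, a fingerprint of its inputs, and a fingerprint of its output are recorded without any action by the scientist. The following is a minimal Python sketch under stated assumptions: `PROVENANCE_LOG`, `fingerprint`, `record_provenance`, and `pick_peaks` are hypothetical names invented for this example and are not part of KnECS, SAM, or any C-MS3D tool, and an in-memory list stands in for a metadata repository.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a shared metadata repository.
PROVENANCE_LOG = []


def fingerprint(value):
    """Return a short, stable hash of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def record_provenance(tool_name):
    """Decorator that logs one provenance record per call (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            PROVENANCE_LOG.append({
                "tool": tool_name,
                "step": func.__name__,
                "inputs": fingerprint({"args": args, "kwargs": kwargs}),
                "output": fingerprint(result),
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@record_provenance("peak-picker-demo")
def pick_peaks(intensities, threshold):
    """Toy analysis step: indices of intensities at or above the threshold."""
    return [i for i, v in enumerate(intensities) if v >= threshold]


peaks = pick_peaks([0.1, 5.2, 0.3, 7.9], threshold=1.0)
print(peaks)
print(PROVENANCE_LOG[0]["tool"], PROVENANCE_LOG[0]["step"])
```

Because the record is written by the wrapper rather than by the analysis function, existing tools would need little modification to participate; a real system would also have to record tool versions and links between records, which this sketch omits.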

This work is taking advantage of previous work by the Collaboratory for Multi-scale Chemical Science (CMCS) [http://cmcs.org/], a multi-institution project funded by the U.S. Department of Energy to develop and pilot an advanced collaborative community data system for chemical science. CMCS is an open, public resource supporting a systems approach to chemical science including sub-disciplines from quantum chemistry to reacting-flow simulations of chemical combustion. KnECS is the discipline-independent infrastructure that was built in the CMCS project. C-MS3D will inherit these KnECS technologies:

  • A collaboration infrastructure to enable real-time and asynchronous collaborative development of standards for data/metadata description, multi-discipline scientific communication, geographically distributed collaboration, and project management.
  • Repositories to store data and metadata in a way that preserves data integrity and allows web access.
  • Tools to browse, search and query metadata, and to retrieve, analyze, and visualize data across all scales, disciplines, and locations.
  • APIs to enable new and existing scientific tools to generate, access, and store data and metadata in the repositories.

KnECS is built on a web-based portal using the CompreHensive collaborativE Framework (CHEF) [http://www.chefproject.org/], which itself leverages the Apache Jetspeed portal framework. As a data and metadata management framework, KnECS employs the Scientific Annotation Middleware (SAM) to provide federated data/metadata access, extensible metadata annotation, and transformations of data and metadata [3]. This capability enables scientific knowledge management.

The initial design of C-MS3D capabilities has been guided by an overarching use case scenario. Based on this guiding use case, we are targeting several domain applications for C-MS3D portal integration. These include the following capabilities:

  • Automatic support for new mass spectra being stored directly in the shared data repository in an interoperable format.
  • Integration of developing crosslink assignment tools, with built-in data interfaces that acquire all facets of the input data, automatically in most cases, by taking advantage of detailed annotation of mass list data.
  • Integration of bio-molecular structure modeling tools. Extension of these tools to support the analysis of distance constraints, initially through consistency checks and later by incorporating these constraints into the structure optimization algorithms.
  • Tabular and graphical visualization tools for mass spectra, mass lists, assignment lists, and partial and total protein structures.
  • Interface for the development of a collaborative crosslinking chemistry knowledge base.
  • A workflow environment that integrates tools and data.
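
The distance-constraint consistency check mentioned in the list above can be sketched as follows. The idea is that an observed crosslink between two residues implies an upper bound on their separation in any candidate structure, so a model that places the two residues farther apart is inconsistent with the data. This is only a minimal Python sketch under stated assumptions: the `Crosslink` record, the `check_constraints` function, the coordinates, and the 12 angstrom bound are hypothetical values chosen for illustration, not the data model or thresholds of any C-MS3D tool.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Crosslink:
    """A crosslink between two residues (hypothetical record layout)."""
    residue_a: int
    residue_b: int
    max_distance: float  # assumed upper bound, in angstroms


def check_constraints(coords, crosslinks):
    """Return (crosslink, measured_distance) pairs that the model violates.

    coords maps a residue number to an (x, y, z) position in angstroms.
    """
    violations = []
    for link in crosslinks:
        measured = math.dist(coords[link.residue_a], coords[link.residue_b])
        if measured > link.max_distance:
            violations.append((link, measured))
    return violations


# Toy model: three residues with made-up coordinates.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.0, 4.0, 0.0),
    3: (0.0, 0.0, 30.0),
}
crosslinks = [Crosslink(1, 2, 12.0), Crosslink(1, 3, 12.0)]

violations = check_constraints(coords, crosslinks)
for link, measured in violations:
    print(f"residues {link.residue_a}-{link.residue_b}: "
          f"{measured:.1f} > {link.max_distance:.1f}")
```

A check of this kind only filters candidate structures after they are built. Incorporating the constraints into structure optimization, the later step named above, would instead add them as terms in the objective being minimized.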

Rapid development of this field dictates an adaptive strategy of reviewing requirements at regular intervals and directing efforts opportunistically. For example, it is anticipated that other types of low-resolution or otherwise qualified structural information (such as reactivity data, partial structures, sparse NMR data) would add significant value to the bio-molecular structure modeling process. An open source model for the integrated bio-molecular structure modeling tools might provide a forum for a growing community to contribute to the speed and accuracy of 3D structure discovery.

The C-MS3D project will open source its software following relevant NIH and Sandia National Laboratories guidelines, pending the approval of those organizations. Members of our team have experience with open sourcing project software through their current CMCS efforts. We anticipate that the C-MS3D project software (consisting of the informatics infrastructure, data interoperability technologies, annotation and provenance management software, workflow management tools, and those domain applications that C-MS3D gains the right to distribute) will be sufficiently mature to be open sourced by the mid-project timeframe. Once the software is open sourced, the C-MS3D team will manage the ongoing open-source project, direct community involvement and contributions, and provide incremental releases of the software suite.

  1. Young, M.M., et al., High-Throughput Structure Determination: Rapid Identification of Protein Folds Using Mass Spectrometry and Intramolecular Cross-linking. Proc Natl Acad Sci U S A, 2000. 97(11): p. 5802-6.
  2. Schilling, B., et al., MS2Assign, automated assignment and nomenclature of tandem mass spectra of chemically crosslinked peptides. J Am Soc Mass Spectrom, 2003. 14(8): p. 834-50.
  3. Myers, J.D., et al., Re-Integrating the Research Record, in IEEE Computing in Science and Engineering. 2003. p. 44-50.

Last Modified 06/17/08