A Unique Opportunity in Biological Information Object Standards

C.F. Dewey, Jr.¹,² and Shixin Zhang¹

¹Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

Introduction
Over the past several years, the explosive growth of biological data generated by new high-throughput instruments has begun to overwhelm the biological community. There is no established infrastructure to deal with these data in a consistent and successful fashion. This paper discusses the opportunity to develop a new informatics platform to handle a large subset of the experimental protocols that currently exist. A consistent data definition strategy is outlined that will handle gel electrophoresis, microarrays, fluorescence-activated cell sorting, mass spectrometry, and microscopy within a single coherent set of information object definitions.

Methods
Several important experimental techniques in contemporary biology have been used to create a single composite schema. The results bear a striking resemblance to the DICOM standard of 1993, which provides information object definitions for all of the major medical imaging modalities (MR, CT, US, XA, NM, VL, CR, and Waveforms). The de novo information object definition we developed for gel electrophoresis turned out to be very similar to the existing MAGE-OM information model for microarrays. Further investigation revealed that similar object definitions characterized other experimental biology methods as well.

Results
A first implementation of this work is called ExperiBase. It can store and query data generated by the leading experimental protocols used in biology within a single database. ExperiBase also has provisions to store derived data from analysis as part of an expanded definition of the information object. Transport of the raw data and analytical results between ExperiBase and external analysis packages uses web-based network technologies and an XML representation of the data. The information object model is used to define the form of the XML data document.
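
As a rough illustration of how a single information object definition can fix the form of the XML document, the sketch below assumes a hypothetical `InformationObject` class with fields for modality, sample, protocol parameters, raw data, and derived results. The class, the field names, and the XML element names are all invented for this example; they are not taken from ExperiBase, MAGE-OM, or DICOM.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class InformationObject:
    """Hypothetical information object: one experiment with raw and derived data."""
    modality: str                                  # e.g. "gel_electrophoresis"
    sample_id: str
    protocol: dict = field(default_factory=dict)   # acquisition parameters
    raw_data: list = field(default_factory=list)   # measured values
    derived: dict = field(default_factory=dict)    # results added by analysis tools


def to_xml(obj: InformationObject) -> str:
    """Serialize the object; the element layout mirrors the object definition."""
    root = ET.Element("InformationObject", modality=obj.modality)
    ET.SubElement(root, "Sample", id=obj.sample_id)
    protocol = ET.SubElement(root, "Protocol")
    for name, value in obj.protocol.items():
        ET.SubElement(protocol, "Parameter", name=name).text = str(value)
    raw = ET.SubElement(root, "RawData")
    for value in obj.raw_data:
        ET.SubElement(raw, "Value").text = str(value)
    derived = ET.SubElement(root, "DerivedData")
    for name, value in obj.derived.items():
        ET.SubElement(derived, "Result", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> InformationObject:
    """Rebuild the object from XML; parameters and results come back as strings."""
    root = ET.fromstring(text)
    return InformationObject(
        modality=root.get("modality"),
        sample_id=root.find("Sample").get("id"),
        protocol={p.get("name"): p.text for p in root.find("Protocol")},
        raw_data=[float(v.text) for v in root.find("RawData")],
        derived={r.get("name"): r.text for r in root.find("DerivedData")},
    )


# The same definition serves two different experimental modalities.
gel = InformationObject("gel_electrophoresis", "S-001",
                        {"voltage_V": "120"}, [0.12, 0.87], {"band_count": "2"})
array = InformationObject("microarray", "S-002",
                          {"channels": "2"}, [1.5, 2.25, 0.5], {})
gel_xml = to_xml(gel)
array_xml = to_xml(array)
```

Under these assumptions, an external analysis package could parse the document with `from_xml`, append its output to `derived`, and send the object back in the same format, so the raw data and the analytical results travel together.
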
Import and export of data in spreadsheet format is also supported. ExperiBase has been ported to three leading database platforms: Oracle, DB2, and Informix. There are no platform-specific dependencies.

Discussion
We have submitted this work to the Interoperable Informatics Infrastructure Consortium (the "I3C") to assist in developing approved methods and to promote international standards. Participation by standards organizations such as OMG is encouraged and anticipated.

Conclusion
The medical and biological communities are invited to participate in this effort to develop international standards to handle the massive data collections that are now being created in every pharmaceutical company and every academic biology laboratory. Having consistent formats for the information objects will greatly speed the development of analysis tools.

Acknowledgements
This research was supported by the Defense Advanced Research Projects Agency and the Pacific Northwest National Laboratories (Department of Energy).