CAPIM Data Aggregation

The completed CAPIM (Centre for Aquatic Pollution Identification and Management) data aggregation project was a proof of principle intended to provide a centralised research data store to CAPIM consortium members. The project developed a scheme, based on the commercial open-source document-management system Alfresco, for ingesting Excel data files (in XML and CSV format) into a PostgreSQL database. A research data domain model/database schema incorporating geo-tagged data was provided as part of this development. Microsoft Access forms were also provided to maintain compatibility with existing data-management practices.

The particular advantage of Alfresco in this scheme was its support for a virtual file system (VFS) that gave researchers a non-intrusive mechanism for managing data held in familiar file formats. In addition, metadata support by the VFS allowed not only file-tagging, but also file-relationship management intended to facilitate computer-aided control of the workflow from raw data, to analysis product, to research paper.

VeRSI has an opportunity to further engage with CAPIM and the eResearch Centre by leveraging its developing program under the strategic capability area of Research Data & Repositories.

 

Aims and objectives

The CAPIM Data Aggregation project was a proof-of-principle in research data management. Future work will test the ability of the design to ingest new forms of research data, integrate that data into a flexible data model and provide that data to external clients.

Specifically, the extension of the proof-of-principle system will take the existing data store in two separate directions:

  • Provision of geo-tagged sensor and processed image data for eventual incorporation into an interactive public display implemented as a Liferay Portlet.
  • Ingestion and organisation of near-realtime field data in the form of sensor output and feature-processed video files.

 

Outcomes

The project will facilitate the provision of near-realtime water-quality data to researchers within the context of the larger CAPIM/BEIP project. In turn, this will allow a more rapid response to water-pollution incidents, with the aim of establishing their causes and consequences.

 


Project details

ID number  CAPIM/P/XXX

Project title  CAPIM Data Aggregation Project Extension

Start date  November 2011

End date  February 2012

Lead institute  CAPIM

Principal investigator  Steve Marshall – Program Manager CAPIM/BEIP Project

Partner PIs and/or participating institutions  Richard Sinnott, Melbourne University eResearch Centre

VeRSI executive sponsor  Dr Ann Borda, VeRSI Executive Director

VeRSI project management  Jared Winton

Brief summary of project  The existing Alfresco data store will be modified to ingest data from field instruments and provide data to a GeoServer data store.

 

Keywords: CAPIM | Data | Aggregation | VeRSI | eResearch | Melbourne University | Pollution | Identification