TR-11-03 Geoanalytics

Jeff R. Heard. Technical Report TR-11-03, Geoanalytics, July 2011.

The pressures of producing science with global relevance and global impact have made understanding and using geographic data essential to a large portion of scientific research. Geographic information systems live at the heart of projects in public health, environmental science, policy and government, situational awareness, and other fields. The need for geographic data has also become a Big Data need: datasets essential to these projects are often terabytes in size, or they are rapidly evolving streams of complex data.

The tools available to professionals working with geographic data have not grown to meet the Big Data problem. Traditional GIS allows researchers to build custom databases with analytics. Google Maps allows them to publish data to the web. Various open source tools exist for specialized and general GIS purposes. As yet, there is no compelling infrastructure for integrating these tools. As a result, geographic solutions to scientific and social problems are often cobbled together, and the results are isolated silos that cannot easily be integrated or adapted to new and different data.

Traditional GIS solutions like ArcGIS and GRASS allow a user to import a number of maps and work with them as a project, doing complex analysis, but the results are offline, or at the very least relatively static. There exist “onlining” modules for these systems, but they are built on a pre-web paradigm whose thinking pervades the online experience. Modern users expect integration, mashups, and web-based application platforms that include data “at the bleeding edge of now.”

What we will call “second generation” solutions, like Google Maps and Google App Engine, allow a user to quickly create a map without prior training, requiring only a text editor, a web browser, and some patience. In their simplicity, these solutions abstract away functionality that is needed for serious scientific analysis. That analysis must be completed with other tools, and the data must be imported into formats tailored toward visual presentation that do not preserve it for analysis. This adds complexity to the scientific process and discourages the sharing of source data.