Geospatial Feature Conflation:
Conceptual, Statistical, and Optimization Approaches

Figures: Before Conflation; After Conflation; Real World Image

Funding for Geospatial Feature Conflation:
Conceptual, Statistical, and Optimization Approaches

A two-year research award to Mike Goodchild and Martin Raubal
10/1/2009–9/30/2011, with potential for renewal until 9/30/2014

This research proposes to: design a relational-algebra framework for conflating geospatial data from diverse sources; develop statistical and optimization approaches for multi-source data integration; and develop new methods of spatiotemporal reasoning. The research will extend across different time instants and different data standards. It pertains to NGA’s 2009 NURI solicitation Topic #4.3: “Harvesting and Using Data from Heterogeneous Digital Sources” and is also partially relevant to Topic #4.6: “Using Qualitative Descriptions of Spatio-Temporal Entities.”

Four research themes will be addressed in this project: (i) development of a relational-algebra framework and formalization for conflating heterogeneous geospatial data; (ii) statistical and optimization approaches for conflating multiple geospatial data sources; (iii) provenance characterization and uncertainty evaluation in geospatial data conflation; and (iv) conflation-based approaches to spatiotemporal reasoning.

Theme (i) describes and models the conflation process systematically and consistently; it is critical for understanding the major components of heterogeneous data conflation and for providing guidelines for choosing appropriate methods. Theme (ii) addresses the issues of non-optimality in existing conflation techniques and of increasing data accuracy using statistical and optimization approaches. Theme (iii) aims to represent uncertainty propagation and quality assessment in conflation through provenance characterization and modeling. Theme (iv) proposes to facilitate spatiotemporal reasoning by developing conflation-based approaches that draw on geospatial data from various sources to construct multiple constraints about geographic features in a multi-dimensional space-time context.

Project findings will be disseminated through: (i) presentations at academic conferences and NGA meetings, (ii) publications in relevant journals, (iii) development of open-source prototypes using the proposed approaches, (iv) development of a Web service for conflation that can be incorporated into service-oriented architectures, and (v) applications of these techniques and approaches to several real-world case studies.

The proposed research will provide a theoretical foundation for the integration of incompatible geospatial data, a problem currently addressed by ad hoc solutions using dataset-sensitive techniques. The project will develop novel approaches to implementing the conceptual framework, thus improving conflation results and enhancing spatiotemporal reasoning. The findings will not only meet the requirement of creating higher-accuracy data from multiple sources, but will also offer a new direction for utilizing rich yet incompatible geospatial data to facilitate spatial reasoning. The results will be of substantial benefit to NGA, scientific researchers, policy makers, and the general public.

The increasing and rapid development of remote sensing and other technologies as well as the growth of the Internet provide abundant opportunities to collect and access vast volumes of geospatial data. In addition to well-known datasets provided by government, such as US Census TIGER/Line files and free data services like Google Earth, large amounts of geospatial information are being generated daily by individuals worldwide, which creates an increasingly extensive net of volunteered geographic information. Large volumes of geospatial data have the potential to benefit scientific research, decision making, and everyday life. However, it is not always straightforward to take advantage of this abundance due to inconsistency, incompatibility, and heterogeneity among various datasets. Rather than a visual overlay of data from diverse sources, conflation of heterogeneous datasets provides a better solution since it opens possibilities for updating, averaging to obtain better estimates, and analysis and modeling. (Many terms are routinely used to describe different forms of geospatial data integration. Fusion is the accepted term when dealing with imagery, but here conflation is preferred as an umbrella term since the primary emphasis in this project will be on vector and mixed vector/raster integration.)
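As a rough illustration of "averaging to obtain better estimates," the sketch below combines two independent estimates of the same point feature using an inverse-variance weighted mean. It is not the project's method. The function name fuse_positions, the coordinate tuples, and the per-source error variances are all hypothetical, and the sketch assumes the two features have already been matched, share one coordinate system, and have independent, unbiased errors.

```python
def fuse_positions(p1, var1, p2, var2):
    """Combine two independent estimates of one point location.

    p1, p2     -- (x, y) coordinates from two sources (hypothetical inputs)
    var1, var2 -- positional error variance assumed for each source
    Returns the inverse-variance weighted mean and its variance.
    """
    if var1 <= 0 or var2 <= 0:
        raise ValueError("variances must be positive")
    # Weight each source by the reciprocal of its error variance,
    # so the more accurate source contributes more to the result.
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    total = w1 + w2
    fused = tuple((w1 * a + w2 * b) / total for a, b in zip(p1, p2))
    # Under the independence assumption, the fused variance is
    # smaller than either input variance.
    fused_var = 1.0 / total
    return fused, fused_var


# Example: a precise source (variance 1.0) and a coarser one (variance 3.0).
point, variance = fuse_positions((0.0, 0.0), 1.0, (4.0, 8.0), 3.0)
print(point, variance)
```

With these inputs the fused point lies about a quarter of the way from the more accurate source toward the less accurate one, and the fused variance (about 0.75) is lower than that of either source, which is the sense in which conflation can improve on a simple visual overlay.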

The difficulty of conflation depends on many factors, such as complexity of representation and the volume and accuracy of the datasets involved. Specifically, incompleteness and inaccuracy of the original datasets, different reference systems, distinct generalizations and representations of reality, semantic issues of terminology and classification, various scales, and different purposes, as well as various time frames all create challenges in the use of geospatial data from heterogeneous digital sources.

Although there are several ad hoc solutions for digital geospatial data integration designed for particular datasets (e.g., Saalfeld, 1988; Samal, Seth, and Cueto, 2004; Walter and Fritsch, 1999), geospatial data conflation has not been systematically and adequately studied as a general and fundamental problem in geographic information science. This project seeks to investigate the integration and assessment of incompatible geospatial data by developing a comprehensive framework for conflation from diverse sources, and by creating methods that can effectively and efficiently incorporate multiple-source data into a consistent structure. Specifically, we propose to address the following four research themes, and to extend to other research questions if time permits:

  • A general conceptual and theoretical framework for conflation
  • Statistical and optimization approaches to conflation
  • Provenance characterization and uncertainty evaluation
  • A conflation-based approach to spatiotemporal reasoning

For more information about this project, please see NGA research grant to Goodchild and Raubal.

Last modified: April 7, 2014