The Sustain project consists of several modules that work in tandem to give users an end-to-end understanding of urban data.
The Sustain project supports a rich set of services aligned with the needs of the urban sustainability community. We engage with two of the three major urban sustainability networks in the United States: UWIN and UReX. Broadly, services within Sustain include the following:
- Data federation
- Query specification including support for query predicate formulation
- Model validation
- Model building
Data Federation: Our data federation schemes reconcile data-induced challenges: heterogeneity, volume, and storage mechanisms. All operators work across contiguous, disparate, or overlapping spatiotemporal scales. Additionally, a fluent interface allows operators to be chained to formulate complex analyses over subsets of data. To ensure timeliness, we manage the speed differential across the memory hierarchy, disperse loads and avoid I/O hotspots, preserve data locality during processing, and avoid disk and CPU contention.
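To illustrate what chaining operators through a fluent interface can look like, here is a minimal sketch. The `Query` class, its method names, and the record fields (`x`, `y`, `t`) are hypothetical and chosen only for illustration; they are not the Sustain API.

```python
class Query:
    """Hypothetical fluent wrapper over in-memory records (illustration only)."""

    def __init__(self, records):
        self._records = list(records)

    def within_bbox(self, min_x, min_y, max_x, max_y):
        # Spatial constraint: keep records inside the bounding box.
        return Query(
            r for r in self._records
            if min_x <= r["x"] <= max_x and min_y <= r["y"] <= max_y
        )

    def between(self, t_start, t_end):
        # Temporal constraint: keep records with t in [t_start, t_end].
        return Query(r for r in self._records if t_start <= r["t"] <= t_end)

    def mean(self, field):
        # Terminal operator: average of one feature over the selected subset.
        values = [r[field] for r in self._records]
        return sum(values) / len(values) if values else None


records = [
    {"x": 1, "y": 1, "t": 1, "temp": 10.0},
    {"x": 1, "y": 1, "t": 5, "temp": 20.0},
    {"x": 9, "y": 9, "t": 1, "temp": 30.0},
    {"x": 2, "y": 2, "t": 2, "temp": 14.0},
]

# Each operator returns a new Query, so constraints compose left to right.
avg = Query(records).within_bbox(0, 0, 3, 3).between(0, 3).mean("temp")
```

Because every operator returns a new `Query`, spatial and temporal constraints can be combined in any order before a terminal operator such as `mean` is applied.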
A key feature we support is overlay of datasets. This involves fusion of datasets based on spatial and chronological attributes. An example of such an operation is to overlay topographical information such as roads or natural boundaries on observed phenomena – disease clusters, for instance. We provide support for feature class datasets (including ESRI shapefile format), such as city block polygons, roads, power lines, and rivers. Overlays will also be used to contrast regions at different points in time.
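The following is a minimal sketch of an overlay, assuming observations are `(x, y, t)` tuples and regions are simple polygons given as vertex lists. The function names and data layout are hypothetical. The ray-casting containment test is a standard technique swapped in for illustration and is not necessarily how Sustain performs the fusion.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting containment test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay(events, regions, t_start, t_end):
    """Group events by containing region, keeping only t in [t_start, t_end]."""
    result = {name: [] for name in regions}
    for x, y, t in events:
        if not (t_start <= t <= t_end):
            continue
        for name, polygon in regions.items():
            if point_in_polygon(x, y, polygon):
                result[name].append((x, y, t))
    return result


# Two adjacent city blocks (hypothetical coordinates).
regions = {
    "block_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "block_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
# Observed phenomena as (x, y, t), e.g. reported disease cases.
events = [(1, 1, 5), (3, 1, 5), (1, 1, 20), (5, 5, 5)]

fused = overlay(events, regions, t_start=0, t_end=10)
```

Running the same overlay with two different time windows gives one simple way to contrast a region at different points in time.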
Queries: Our queries allow a user to constrain the geographical area of interest and rank results based on proximity. The query geometry may be specified as a feature class, including points, lines, quadratic and cubic (Bezier) curves, and polygons. Our proximity queries will allow location-based ranking and search of data of interest. Queries begin at an anchor point, and incrementally larger annuli (donut-shaped geometries) radiate outward until a specified number of results is available. For example, researchers will be able to identify environmentally sensitive areas or water supply systems adjacent to a contaminated site or chemical spill. Another important application of these queries is assessing the risk to assets from sea level rise, hurricanes, or flooding.
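The expanding-annulus search described above can be sketched as follows. This is an illustrative brute-force version over planar coordinates. The function name, the `step` and `max_radius` parameters, and the use of Euclidean distance are assumptions, and a production system would consult a spatial index rather than scan every point.

```python
import math


def annulus_search(anchor, points, k, step=1.0, max_radius=100.0):
    """Return up to k points ranked by distance from anchor.

    The search scans outward ring by ring and stops as soon as at least
    k results have been collected or max_radius is reached.
    """
    ax, ay = anchor
    # Distance of every candidate from the anchor point.
    dists = [(math.hypot(x - ax, y - ay), (x, y)) for x, y in points]

    results = []
    inner = 0.0
    while len(results) < k and inner < max_radius:
        outer = inner + step
        # Candidates inside the current annulus: inner <= d < outer.
        ring = sorted(item for item in dists if inner <= item[0] < outer)
        results.extend(ring)
        inner = outer
    return [p for _, p in results[:k]]


# Hypothetical sites around a spill located at the origin.
sites = [(0.5, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
nearest_two = annulus_search((0.0, 0.0), sites, k=2)
nearest_three = annulus_search((0.0, 0.0), sites, k=3)
```

Because rings are visited in order of increasing radius and each ring is sorted internally, the results come out ranked by proximity to the anchor.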
Model validation and assessments will be launched by treating the models as black boxes; models may be packaged either as application containers (e.g., Docker, rkt) or as virtual appliances. Model assessments will be queued and scheduled as low-priority background jobs. By default, each model assessment workload will be encapsulated within a single VM. The sizing and prioritization of the pool of VMs will be based on load, to minimize interference with interactive explorations of the observational spaces.
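As a rough sketch of the queueing behavior only (not of VM provisioning or load-based sizing), the snippet below shows one way low-priority assessment jobs could yield to interactive work. The `JobQueue` class and its priority constants are hypothetical.

```python
import heapq
import itertools


class JobQueue:
    """Hypothetical priority queue: lower number means dispatched sooner."""

    INTERACTIVE = 0   # interactive explorations
    BACKGROUND = 10   # model assessments

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among jobs of equal priority.
        self._counter = itertools.count()

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self):
        # Return the name of the next job to run, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.submit("assess-model-1", JobQueue.BACKGROUND)
queue.submit("explore-query-1", JobQueue.INTERACTIVE)
queue.submit("assess-model-2", JobQueue.BACKGROUND)
queue.submit("explore-query-2", JobQueue.INTERACTIVE)

dispatch_order = [queue.next_job() for _ in range(4)]
```

Interactive jobs are dispatched first regardless of submission order, while assessments keep their relative order among themselves.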
The ingestion process refers to reconciling data formats, performing spatiotemporal data alignments, addressing issues of scale via sketching, and indexing the data so that it is amenable to querying and visualization. Notably, we can fuse datasets based on spatial and chronological attributes. We have ingested, or are in the process of ingesting, several spatiotemporal datasets relating to urban systems. These datasets come from NGOs and state/federal agencies in different formats (CSV, tab-separated formats, GeoTIFF, JPEG, JSON, GeoJSON, GRIB, XML, ESRI geodatabases, ESRI shapefiles). Furthermore, each dataset includes a large number of features and may be available at different granularities.