Ongoing work in the project targets two concurrent modeling efforts: model construction and model validation/refinement. Our construction efforts build models directly over the ingested datasets and sketches. These may be process-based models that need observational data to ensure completeness, or cases where researchers are interested in fitting models to the data using statistical or machine learning techniques.
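As a minimal illustration of the second case, the sketch below fits a simple linear model to a small set of observations via ordinary least squares; the data values are purely illustrative, and a real workflow would fit richer statistical or machine learning models over the ingested datasets.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b over paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sample covariance divided by the sample variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Illustrative observations lying on y = 2x + 1.
slope, intercept = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
# → slope 2.0, intercept 1.0
```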

The modeling efforts can combine process-based models and traditional analytics models. Consider one of our use cases where we are trying to model infrastructure vulnerability to post-fire flows. Our goal is to operationalize the model across the entire US West in Sustain, creating an unparalleled picture of infrastructure vulnerability across the region. Besides datasets relating to electrical infrastructure, gas pipelines, hospitals, and emergency services, we are also in the process of integrating data about watershed boundaries and streamflow paths from the National Hydrography Dataset, soil characteristics from the NRCS State Soil Geographic (STATSGO) Data Base, and the latest fire hazard map from the US Forest Service.
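The core of such an integration is associating infrastructure assets with the hazard zones that contain them. The sketch below shows this as a point-in-rectangle lookup over hypothetical records; all asset names, zone names, and coordinates are invented for illustration, and a production system would use the actual NHD and fire-hazard geometries with a spatial index rather than bounding boxes.

```python
# Hypothetical infrastructure points (name, (x, y)) and hazard zones
# (name, (x_min, y_min, x_max, y_max)); values are illustrative only.
assets = [("hospital-1", (10.0, 20.0)),
          ("substation-7", (40.0, 5.0))]
zones = [("high-hazard", (0.0, 0.0, 15.0, 25.0)),
         ("low-hazard", (30.0, 0.0, 50.0, 10.0))]

def zones_containing(point, zones):
    """Return the names of all zones whose bounding box contains the point."""
    x, y = point
    return [name for name, (x0, y0, x1, y1) in zones
            if x0 <= x <= x1 and y0 <= y <= y1]

# Map each asset to the hazard zones it falls within.
vulnerability = {name: zones_containing(pt, zones) for name, pt in assets}
# → {"hospital-1": ["high-hazard"], "substation-7": ["low-hazard"]}
```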

Model validation and assessment will be launched by treating the models as black boxes; models may be packaged either as application containers (e.g., Docker, rkt) or as a virtual appliance. Model assessments will be queued and scheduled as background jobs with low priority. Each model assessment workload will be encapsulated within a single VM or container (by default). The sizing and prioritization of the virtualized resource pool will be based on load to minimize interference with interactive explorations of the observational spaces.
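The prioritization described above can be sketched as a two-class priority queue in which interactive work always preempts queued model assessments, with FIFO ordering within each class. This is a minimal sketch; the job names are hypothetical, and a real deployment would dispatch each dequeued job to a container runtime or VM rather than return a string.

```python
import heapq
import itertools

INTERACTIVE, BACKGROUND = 0, 1  # lower value = higher scheduling priority

class JobQueue:
    """Priority queue that serves interactive explorations before
    low-priority background model assessments."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within a class

    def submit(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = JobQueue()
q.submit(BACKGROUND, "assess-model-A")
q.submit(INTERACTIVE, "explore-dataset")
q.submit(BACKGROUND, "assess-model-B")
order = [q.next_job(), q.next_job(), q.next_job()]
# The interactive job runs first; the assessments keep submission order.
```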