Overview – CyberWater

The CyberWater project aims to build an open-data, open-model framework for easy and incremental integration of heterogeneous data sources and diverse scientific models across disciplines in the broad water domain. The CyberWater framework extends an open-data, open-model framework called Meta-Scientific-Modeling (MSM), which provides a system-wide data and model integration platform. On top of MSM, CyberWater provides a set of toolkits and external system integration engines to further facilitate users’ scientific modeling and collaboration across disciplines. For example, the generic model agent toolkit enables users to integrate their computational models into CyberWater through graphical user interface configuration without coding, which further simplifies data and model integration and model coupling. CyberWater adopts a graphical scientific workflow system, VisTrails, to ensure data provenance and reproducible computing. CyberWater also supports on-demand access to high-performance computing resources for users’ computationally expensive model tasks.

Examples and Use Cases

CyberWater offers a wide variety of tools (called “modules”) that allow the user to create complicated hydrological workflows for the execution of various models. Some models, like the Variable Infiltration Capacity (VIC) model, have “Model Agents” included with CyberWater that allow the models to be executed easily without any user intervention. CyberWater also includes the “Generic Model Agent” and “Static Parameter Agent” toolkits, which allow users to integrate their own models without writing new code. Several of these workflows are examined below to demonstrate a few of the many use cases for CyberWater.

Native Model Agents: VIC4 and Routing

The power of a CyberWater workflow is well demonstrated by a simple water balance simulation of the “French Creek” basin in southeastern Pennsylvania, using both VIC4 and a routing model. Starting with just the Variable Infiltration Capacity component, all the “modules” needed to download forcing data, execute VIC4, and display the results are captured in the workflow below:

Here, we use the “TimeRange” and “SpaceRange” modules to define the spatiotemporal extent of the simulation, the “NCALDASAgent” and “PasswordDialog” modules to collect user information and download the complete forcing data, and the “msmUnitConversion” modules to convert the data into units usable by the model. The “VicAgent” module receives this prepared forcing data within the workflow, generates all the necessary parameter files, and executes the VIC4 model. The results of the model are displayed with the “msmShowChart” module, which can be seen below:

By coupling this workflow with a routing agent, we can compute the expected streamflow at an outlet and compare it against real-world data. The additional “modules” needed for this computation can be found below:

Now, we use the “DEMAgent” to download elevation data for the specified extent, which requires the “GISEngine” to hook into the GRASS GIS toolkit packaged with CyberWater. GRASS GIS is a powerful, open-source GIS toolkit that modules in CyberWater leverage for more complicated operations. This toolkit is also used by the “msmComputeExactOutput” module to calculate the estimated outlet location based on the digital elevation map. The “RoutingAgent” module takes the digital elevation map and outlet location as inputs, along with the outputs from the VicAgent, to compute the expected streamflow at the outlet. It handles parameter preparation and model execution automatically and returns the model output to the CyberWater workflow. Finally, we use CyberWater’s “UsgsAgent” to download the observed streamflow at the outlet and compare it against the predicted streamflow. Using the “msmShowChart” module again to plot the results, we can see how well this simple workflow predicted the streamflow at the outlet:

These results use default parameters without any calibration, but they show how easy it is to create a powerful and reproducible workflow that involves several sets of forcing data and model executions.
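The comparison above is made visually with the “msmShowChart” module. As a minimal sketch of how such a comparison could also be quantified, the following computes the Nash-Sutcliffe efficiency, a common hydrological skill score. This is an illustration only: the function name and the sample values are hypothetical, and nothing here is stated by the source to be part of CyberWater.

```python
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observation and simulation.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series is constant; the score is undefined")
    return 1.0 - residual / variance


# Hypothetical streamflow values (m^3/s), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [2.5, 3.5, 6.5, 7.5]
score = nash_sutcliffe(observed, simulated)
```

A score close to 1 would indicate that the uncalibrated run already tracks the observed series well; a low or negative score would suggest that calibration is needed.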

Generic Model Agent Toolkit and On-Demand High Performance Computing: VIC4

CyberWater offers the Generic Model Agent Toolkit both for integrating users’ computational models that are not natively available in CyberWater and for executing those models on High-Performance Computing (HPC) resources. The user does not need to write any code to integrate a wide variety of models, and on-demand access to HPC resources allows users to create reproducible workflows that leverage external computing power. Looking again at VIC4, we can modify and rearrange the previous workflow to recreate its functionality using only the generic toolkit and execute the model on an external HPC resource, as in the following workflow:

We replace the “VicAgent” with five modules: the “MainGenerator”, “AreaWiseParamGenerator”, “ForcingDataFileGenerator”, “InitialStateFileGenerator”, and “HPC” modules. The MainGenerator module receives the forcing data from the upstream modules and orchestrates the other toolkit modules. The AreaWiseParamGenerator module is responsible for organizing the parameter files necessary for execution. The ForcingDataFileGenerator module forms the folder and file structure required by the model for its forcing data. The InitialStateFileGenerator module is responsible for placing the initial state files in the right folder structure in the working directory. Finally, the HPC module packages the model and sends it to an HPC resource, and then receives and interprets the results. Instead of the HPC module, the user can also use the toolkit’s “RunModuleAgent” module to execute the VIC model locally. In summary, users can use the generic toolkit to integrate their models into CyberWater without writing model agent code, and can either run them on a local desktop or laptop computer with the “RunModuleAgent” module or offload them to a remote HPC resource with the “HPC” module. Note that users who want on-demand HPC access need to provide their own HPC facility account.
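To make the idea of forming a folder and file structure for forcing data concrete, here is a minimal sketch that writes one text file per grid cell into a forcing folder. It is not the ForcingDataFileGenerator implementation: the function name, the `forcing` folder, the `data_<lat>_<lon>` naming pattern, and the sample values are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Cell = Tuple[float, float]  # (latitude, longitude) of a grid cell center


def write_forcing_files(root: Path, forcing: Dict[Cell, List[Tuple[float, ...]]],
                        prefix: str = "data_") -> List[Path]:
    """Write one whitespace-delimited text file per cell under root/forcing."""
    folder = root / "forcing"
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for (lat, lon), rows in sorted(forcing.items()):
        # Assumed naming pattern: <prefix><lat>_<lon>, four decimal places each.
        path = folder / f"{prefix}{lat:.4f}_{lon:.4f}"
        lines = [" ".join(f"{value:.4f}" for value in row) for row in rows]
        path.write_text("\n".join(lines) + "\n")
        written.append(path)
    return written


# Hypothetical forcing: two cells, two time steps of (precipitation, temperature).
forcing = {
    (40.1875, -75.6875): [(0.0, 12.5), (1.2, 11.0)],
    (40.1875, -75.8125): [(0.0, 12.1), (0.8, 10.7)],
}
work_dir = Path(tempfile.mkdtemp())
paths = write_forcing_files(work_dir, forcing)
```

In the toolkit this layout is set through module configuration rather than code; the sketch only shows the kind of per-cell file structure that such a module has to produce.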

The flexibility of this toolkit allows the user to add many new models to CyberWater without programming an entirely new native model agent. It covers many of the common requirements of hydrological models while remaining flexible enough that, if a model requires special intervention, the user can still implement small pieces of code with CyberWater’s “PythonSource” module to add functionality while maintaining the simple, reproducible nature of the workflows. The HPC module also provides key functionality for models that would otherwise be far too computationally expensive to run on the user’s own device. The main modules of this toolkit are highlighted below, but their implementation is very flexible and can vary significantly.

Static Parameter Agent Toolkit: VIC5

To allow even more models to be integrated into CyberWater, the Static Parameter Agent toolkit offers many features that allow users to create new parameter files at runtime for their models. While the Generic Model Agent toolkit covers model execution, it requires the user to provide the necessary parameter files for their model. To allow for more automation, the Static Parameter Agent toolkit lets the user generate a wide variety of parameter file types. While parameter files are rarely shared between models, many parameter file formats are common to several models. By leveraging this shared structure, modules within the toolkit allow you to populate these files with new data. Below is an example where all the parameter files necessary to execute VIC5 are created using only the toolkit:

This workflow shares many of the features that have already been demonstrated, but it also incorporates many of the new features of the toolkit to produce and execute everything needed for the VIC5 model entirely within the workflow. The “ConstantSizeParamAgent” is used to create parameter files where each cell has the same number and types of parameters. “ParameterEntry” modules are used to specify which parameters are actually inserted into the file. A ParameterEntry module can create various kinds of parameters: it can insert the coordinates of the cell, execute Python functions to generate data, or simply fill the file with constants. Here, these modules are used to create the soil parameter file.
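The idea of a constant-size parameter file, where every cell gets the same columns and each column is a constant, a coordinate, or a computed value, can be sketched in a few lines. This is only an illustration of the concept: `build_rows`, the entry definitions, and the sample cells are hypothetical and do not reflect the actual ConstantSizeParamAgent or ParameterEntry interfaces.

```python
from typing import Callable, List, Sequence, Tuple, Union

Cell = Tuple[float, float]  # (latitude, longitude)
# An entry is either a constant or a function of the cell that returns a value.
Entry = Union[float, int, Callable[[Cell], float]]


def build_rows(cells: Sequence[Cell], entries: Sequence[Entry]) -> List[str]:
    """Return one text row per cell; every row has len(entries) columns."""
    rows = []
    for cell in cells:
        values = [entry(cell) if callable(entry) else entry for entry in entries]
        rows.append(" ".join(str(value) for value in values))
    return rows


# Hypothetical entries, listed in column order.
entries: List[Entry] = [
    1,                      # constant: the same value in every row
    lambda cell: cell[0],   # coordinate: latitude of the cell
    lambda cell: cell[1],   # coordinate: longitude of the cell
    lambda cell: round(0.1 + 0.01 * abs(cell[0] - 40.0), 4),  # computed value
]
cells = [(40.0, -75.75), (40.25, -75.75)]
rows = build_rows(cells, entries)
```

Because every row is built from the same list of entries, the resulting file has a fixed number of columns per cell, which is the property that distinguishes this case from the variable-size case.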

The “VariableSizeParamAgent” is used to handle input files with a non-structured format, where each cell can contain a different combination of the values of a categorical variable. Here, we use it to create the vegetation parameter file for the VIC5 model. It has several modes of use: the full computation mode demonstrated here requires a main map containing the modeling cells and a higher-resolution map holding the categories inside each cell. It uses these to create the necessary parameter file as specified in the module’s configuration.
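A minimal sketch of the underlying computation, assuming the higher-resolution map aligns with the modeling cells by an integer factor, is to count the categories that fall inside each cell and convert the counts to area fractions. The function names, the output layout, and the sample map below are hypothetical; the real module's modes and configuration options are not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def category_fractions(fine_map: List[List[int]],
                       factor: int) -> Dict[Tuple[int, int], Dict[int, float]]:
    """Map each coarse cell (row, col) to the area fraction of each category inside it."""
    if len(fine_map) % factor or len(fine_map[0]) % factor:
        raise ValueError("fine map dimensions must be multiples of factor")
    result = {}
    for top in range(0, len(fine_map), factor):
        for left in range(0, len(fine_map[0]), factor):
            block = [fine_map[r][c]
                     for r in range(top, top + factor)
                     for c in range(left, left + factor)]
            counts = Counter(block)
            result[(top // factor, left // factor)] = {
                category: count / len(block)
                for category, count in sorted(counts.items())
            }
    return result


def format_lines(fractions: Dict[Tuple[int, int], Dict[int, float]]) -> List[str]:
    """One header line per cell (row, col, category count), then one line per category."""
    lines = []
    for (row, col), shares in sorted(fractions.items()):
        lines.append(f"{row} {col} {len(shares)}")
        lines.extend(f"  {category} {share:.2f}" for category, share in shares.items())
    return lines


# Hypothetical 2x4 category map covering two modeling cells of 2x2 pixels each.
fine_map = [[1, 1, 2, 3],
            [1, 1, 2, 2]]
fractions = category_fractions(fine_map, factor=2)
lines = format_lines(fractions)
```

The first cell contains a single category and the second contains two, so the number of lines written per cell differs, which is why a constant-size layout cannot represent this kind of file.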

This workflow also demonstrates another kind of Static Parameter Agent included in the toolkit. The “RoutingParameterFile” module is built specifically for routing models and generates the network files needed for model execution. Modules like this cover very specific parameter files that are not easily generated in traditional ways, so the user keeps the flexibility of producing parameter files within the workflow while also meeting the needs of more complicated models.

Another notable element of this workflow is CyberWater’s grouping functionality. If the user has many modules that are used to complete one core task of the workflow, they may group these modules into a single module to make the workflow easier to follow. Here we group all the Generic Model Agent tools used to execute VIC5 into one module titled “VIC5 Agent using GT”. This makes the workflow much easier to view while still keeping modules as the fundamental building block of CyberWater.

Conclusions

These examples demonstrate only a small fraction of the complete functionality of CyberWater. Many tools and functionalities for building workflows in CyberWater, like data provenance and error tracing, are not explicit in these workflows. Ultimately, the goal of these features is to allow the user to create simple, reusable, and reproducible workflows that can be executed repeatedly and shared with little hassle. If you are interested in reading more about the modules featured here and other CyberWater tools, see our Documentation page!