Title: Advancing Science to Better Characterize Drought and Groundwater-Driven Low-Flow Conditions in NOAA and USGS National-Scale Models
This project is part of the multi-institutional project:
Advancing Science to Better Characterize Drought and Groundwater Driven Low-Flow Conditions in NOAA & USGS National-Scale Models
Project Lead: Dr. Donna Rizzo
University of Vermont Research Plan: We propose to develop state-of-the-art methods to improve national-level streamflow forecasts for low-flow conditions, where, in some regions, flows are dominated by baseflow from groundwater discharge rather than runoff from precipitation events. We will train machine learning algorithms on multiple observational datasets, including landcover/land-use, rainfall, soil moisture, and groundwater level data, to identify reaches influenced by groundwater. Once these reaches have been identified, we will develop methods to more accurately predict baseflow conditions in neighboring streams. We will refine and augment groundwater data with Earth observations to improve accuracy and compensate for sparse and missing groundwater data. We will leverage work being done with the USGS on Long Short-Term Memory (LSTM) models enhanced with feature selection. We will test Physics-Informed Neural Networks (PINNs) to ensure that our predictions are reasonable and can be used reliably in sparsely gaged basins. We will develop physical models to calibrate and improve our machine learning algorithms. We will use the resulting prediction methods to generate forcings and determine how to link these boundary conditions to the NextGen NWM using the HydroFabric framework. This work will improve the National Water Model (NWM) predictions for low-flow conditions, which to date have received less attention due to the emphasis on predicting extreme flooding events. These predictions will support operations where low-flow conditions are critical, including drought management, water supply, minimum flow rates for critical infrastructure, ecological sustainability, and river navigation. We will work with the USGS to ensure that our algorithms are compatible with their computational frameworks and projects and that the USGS can use our results.
Background: In a recent collaboration with the U.S. Geological Survey (USGS), the University of Vermont (UVM) has been leveraging the USGS low-flow machine learning (ML) modeling efforts being developed within their Water Resources Mission project, the Data-Driven Drought Prediction project. Specifically, the USGS has developed and is currently testing deep learning models, known as Long Short-Term Memory (LSTM) models, to improve daily estimates of low streamflow and to forecast streamflow drought at lead times ranging from 0 to 60 days. This UVM statement of work will leverage these ongoing collaborative efforts to assess and improve LSTM model performance, with a focus on streamflow forecasts under drought conditions. In recent decades, the duration and deficit volume of streamflow droughts, defined as abnormally low streamflow and the resulting lack of water in the hydrological system (Van Loon, 2015), have increased in the southern and western U.S. (Hammond et al., 2022). The proposed ML methods offer an approach to increase the accuracy of NWM predictions for low flows (i.e., streamflow drought forecasts) and to expand the spatial coverage of these forecasts, which to date have received less attention due to the emphasis on predicting extreme flooding events. Under low-flow conditions, groundwater (GW) contributions to baseflow become a critical forcing, so characterizing GW interactions with streamflow at a continental scale is essential.
Proposed Training/Testing Data: BYU has recently used ML tools that leverage Earth observational datasets to impute gaps in historical groundwater level records (S. Evans et al., 2020; S. W. Evans et al., 2020; Ramirez et al., 2022). The regional USGS LSTM models are being trained and tested using 40 years (1980-2020) of daily streamflow data from 425 streamgages within the Colorado River Basin and surrounding area. In addition to estimating low streamflows at the gages, nowcasting of low streamflows is being assessed at ungaged locations. The LSTM input features include a large set of static watershed attributes available for the National Hydrography NHDPlus V2.1 catchments (Wieczorek et al., 2018) as well as meteorological and remotely sensed dynamic forcing inputs that have been aggregated to basin averages.
Proposed Tasks: UVM will leverage the above big data and existing ML models and, in concert with BYU and UA, will work to improve low streamflow estimates as well as short- and medium-range streamflow drought forecasts. To date, UVM has been performing feature selection to rank the importance of the LSTM input features with respect to model performance. Both UVM and the USGS will present preliminary findings at the SEDHYD conference in St. Louis, MO in May 2023. USGS has publicly released daily streamflow percentiles and drought event datasets for gages spanning all of CONUS (Simeone, 2022) and will be preparing a data release of compiled model input features, so that all data are publicly available. As a result, we expect these data will be available to our BYU-UA-UVM research team as early as late Spring 2023. LSTM model performance is currently being assessed using a variety of performance metrics.
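The text does not name the performance metrics in use. As a minimal illustration only, the sketch below assumes two common hydrologic choices: the Nash-Sutcliffe efficiency (NSE) and the same score computed on log-transformed flows, which weights low-flow errors more heavily. The function names and the `eps` offset are hypothetical and are not taken from the USGS models.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    if not obs or len(obs) != len(sim):
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    # Sum of squared errors between observed and simulated flows.
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    # Variance (unnormalized) of the observations about their mean.
    den = sum((o - mean_obs) ** 2 for o in obs)
    if den == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - num / den


def log_nse(obs, sim, eps=0.01):
    """NSE on log-transformed flows; eps is an assumed offset that guards against log(0)."""
    log_obs = [math.log(o + eps) for o in obs]
    log_sim = [math.log(s + eps) for s in sim]
    return nse(log_obs, log_sim)


if __name__ == "__main__":
    # Made-up daily flows: the simulation matches high flows but misses the low flows.
    observed = [0.5, 0.6, 10.0, 12.0]
    simulated = [1.5, 1.6, 10.0, 12.0]
    print("NSE:    ", round(nse(observed, simulated), 3))
    print("log-NSE:", round(log_nse(observed, simulated), 3))
```

With the made-up series in the example, the log-transformed score falls well below the untransformed one because the only errors occur at low flows, which is why a log-based metric may be a useful complement when drought conditions are the target.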
Because BYU has shown that aquifer storage curves correlate closely with long-term baseflow patterns observed in nearby streams and rivers, we hypothesize that LSTM model performance metrics will be correlated with the degree of groundwater-surface water (GW-SW) interactions. Thus, streamflows that correlate well with groundwater levels (e.g., perennial streams) may help improve forecasts of baseflow under low-flow conditions. UVM proposes to expand the preliminary feature selection with an iterative clustering approach using new ML clustering tools to assess and improve LSTM model performance. Specifically, we will:
1) Cluster USGS gaged watersheds based on the LSTM model performance metrics and then perform feature selection on a clustered watershed basis. In year one, we will use the upcoming 2023 USGS data release. In year two, after groundwater baseflow data have been compiled by BYU, we will repeat the cluster-feature selection analysis (i.e., re-cluster watersheds based on their degree of GW-SW interactions and repeat feature selection).
2) First perform a feature importance analysis on a watershed-by-watershed basis using a model input that varies dynamically (e.g., meteorology, degree of GW-SW interaction, sensitivity to nearby groundwater levels), and then cluster the watersheds based on the importance/strength of the ranked input features. In this manner, we would investigate whether the LSTM models are able to capture GW-SW interactions.
3) Tasks 1) and 2) above can be repeated on a seasonal basis to leverage, and perhaps identify, those times of the year when intermittent streams might provide added predictive value.
4) Incorporate one (or more) GW-SW baseflow constraints into the loss/objective component of the USGS LSTM models. If time permits: (i) the USGS LSTM models could be re-trained using the input features selected in tasks above, and/or (ii) we might incorporate the BYU-UA groundwater baseflow data.
https://water.w3.uvm.edu/scripts/bib/journal_article/query2.php?id=13
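Task 4 does not specify the form of the GW-SW baseflow constraint. As one hypothetical illustration, the sketch below adds a soft penalty to a mean-squared-error loss whenever predicted streamflow drops below an independently estimated baseflow. The function name, the `weight` parameter, and the choice of a squared hinge penalty are assumptions made for this example and are not the USGS or UVM formulation.

```python
def baseflow_constrained_loss(pred, obs, baseflow, weight=1.0):
    """Mean squared error plus a penalty for predictions below estimated baseflow.

    pred, obs, baseflow: equal-length sequences of daily flows in the same units.
    weight: assumed tuning factor that scales the constraint term.
    """
    n = len(obs)
    if n == 0 or len(pred) != n or len(baseflow) != n:
        raise ValueError("pred, obs, and baseflow must be non-empty and of equal length")
    # Data-fit term: ordinary mean squared error against observed streamflow.
    mse = sum((p - o) ** 2 for p, o in zip(pred, obs)) / n
    # Constraint term: squared shortfall where the prediction is below baseflow,
    # zero wherever the prediction is at or above baseflow.
    penalty = sum(max(0.0, b - p) ** 2 for p, b in zip(pred, baseflow)) / n
    return mse + weight * penalty


if __name__ == "__main__":
    # Made-up values: the first prediction sits below the estimated baseflow.
    predicted = [0.5, 3.0]
    observed = [0.5, 3.0]
    estimated_baseflow = [1.0, 1.0]
    print(baseflow_constrained_loss(predicted, observed, estimated_baseflow, weight=2.0))
```

In an actual LSTM training loop the same expression would be written with the tensor operations of the deep learning framework in use so that gradients flow through both terms; the plain-Python version here only shows the arithmetic.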