temp_regression.Rmd
The goal of this analysis is to fill gaps where water temperature data are missing or the time series is incomplete, making the dataset more useful for SR JPE modeling. Temperature is an important covariate for understanding juvenile production, but the completeness of these data varies by location.
Currently, this analysis uses linear regression models and is performed for the Feather River (LFC and HFC sites) and the Yuba River. The resulting dataset, including predicted values, is saved and integrated into the development of a water temperature dataset.
Butte Creek temperature is used as the predictor in the regression models because its time series is complete and the data are high quality.
1. Prepare the datasets for regression analysis: rows with no missing data are used to train the model, and rows with missing data are predicted with it.
2. Fit and evaluate linear regression models for mean, min, and max temperatures.
3. Predict the missing data using the fitted models.
4. Combine the predictions with the actual measurements.
5. Visualize predicted and actual temperatures over time to assess trends in model performance.
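The first step above can be sketched as follows. This is a minimal sketch, assuming each stream's daily series is a data frame with a `date` column and one temperature statistic at a time; the names `temp` and `butte_temp` match the model formula in the output further down, while the data frames `target` and `butte` and their toy values are hypothetical.

```r
# Hypothetical toy inputs: target stream (with gaps) and Butte Creek (complete)
target <- data.frame(
  date = as.Date("2020-01-01") + 0:5,
  temp = c(8.1, NA, 8.4, NA, 9.0, 9.2)
)
butte <- data.frame(
  date = as.Date("2020-01-01") + 0:5,
  butte_temp = c(9.5, 9.7, 9.9, 10.1, 10.4, 10.6)
)

# Join the two series on date, keeping every target date
joined <- merge(target, butte, by = "date", all.x = TRUE)

# Rows with both values observed are used to fit (and test) the model
model_data <- joined[!is.na(joined$temp) & !is.na(joined$butte_temp), ]

# Rows where the target is missing but Butte Creek is observed can be predicted
to_predict <- joined[is.na(joined$temp) & !is.na(joined$butte_temp), ]
```

Here `model_data` holds the four complete dates and `to_predict` the two dates with a missing target temperature.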
Use only dates with no missing data in either dataset for regression modeling
Use the regression model to make predictions on the testing dataset and evaluate them
Use the model to predict the missing data
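The three steps above can be sketched with base R, using the formula shown in the model output (`temp ~ date + butte_temp`). The simulated data, the 80/20 random split, and the seed are assumptions for illustration only; the split used in the original analysis is not shown here.

```r
set.seed(42)

# Hypothetical simulated daily data standing in for the joined dataset
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 500)
butte_temp <- 12 + 5 * sin(2 * pi * as.numeric(dates) / 365.25) + rnorm(500, sd = 0.5)
temp <- 2 + 0.4 * butte_temp + rnorm(500, sd = 0.5)
temp[sample(500, 50)] <- NA  # introduce gaps in the target series
joined <- data.frame(date = dates, temp = temp, butte_temp = butte_temp)

# Step 1: keep complete rows only, then split into training and testing sets
complete <- joined[!is.na(joined$temp), ]
train_idx <- sample(nrow(complete), size = floor(0.8 * nrow(complete)))
train <- complete[train_idx, ]
test <- complete[-train_idx, ]

# Fit the regression; lm() uses the Date column as numeric days since 1970-01-01
fit <- lm(temp ~ date + butte_temp, data = train)

# Step 2: predict the held-out test set and compute MAPE (as a fraction)
test_pred <- predict(fit, newdata = test)
mape <- mean(abs((test$temp - test_pred) / test$temp))

# Step 3: predict the dates where the target temperature is missing
gaps <- joined[is.na(joined$temp), ]
gaps$temp_pred <- predict(fit, newdata = gaps)
```

`mean()` returns `NA` if any observed or predicted test value is missing, so a filter such as `!is.na(test$temp)` (or `na.rm = TRUE`) keeps the accuracy metric defined.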
Before developing any models, we explored the relationship between water temperatures at the two locations. Mean, min, and max water temperatures on Butte Creek and the Feather River LFC are linearly correlated. For example, the plot below suggests a strong linear relationship between the mean water temperatures of Butte Creek and the Feather River LFC: the positive slope of the trend line indicates that higher water temperatures in Butte Creek are associated with higher water temperatures in the Feather River LFC. These plots are consistent with the linear regression results, which found a statistically significant relationship between Butte Creek and Feather River LFC water temperatures for the mean, min, and max.
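A quick numerical complement to the scatter plot is the Pearson correlation together with the slope and p-value of a simple linear fit, which is what the trend line represents. This sketch uses simulated paired daily means; the values are hypothetical and are not the Butte Creek or Feather River records.

```r
set.seed(1)

# Hypothetical paired daily mean temperatures
butte_mean <- runif(200, min = 6, max = 20)
feather_mean <- 3 + 0.45 * butte_mean + rnorm(200, sd = 0.8)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(butte_mean, feather_mean)

# Simple linear trend line: its slope and the p-value for that slope
trend <- lm(feather_mean ~ butte_mean)
slope <- coef(trend)[["butte_mean"]]
p_value <- summary(trend)$coefficients["butte_mean", "Pr(>|t|)"]
```

A positive `slope` with a small `p_value` corresponds to the upward trend line described above; `abline(trend)` would draw that line on an existing scatter plot.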
Plot of mean temp for Feather River LFC and Butte Creek
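We built three regression models for Feather River LFC, one each for the mean, min, and max water temperatures. Each model regresses Feather River LFC temperature (`temp`) on Butte Creek temperature (`butte_temp`) and date, and was evaluated on the testing dataset using the Mean Absolute Percentage Error (MAPE). The model summaries and MAPE values are printed below. The models have R-squared values between 0.71 and 0.80. Two of the models have a MAPE of about 0.08 (roughly 8%), indicating good predictive accuracy; the MAPE for the second model printed as `NA`, which in R most likely means missing values were present in the observed or predicted test values, so this output does not assess that model's accuracy.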
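For reference, MAPE is commonly defined as follows, where $y_i$ is an observed test temperature, $\hat{y}_i$ the corresponding prediction, and $n$ the number of test observations. Assuming the printed values are on the fraction scale (the output does not label them), 0.0846 corresponds to an average absolute error of about 8.5% of the observed temperature.

$$
\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|
$$

Because each error is divided by the observed value, MAPE becomes large when observed temperatures are close to 0 °C, so it can overstate relative error in the coldest periods.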
##
## Call:
## lm(formula = temp ~ date + butte_temp, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9322 -0.8321 0.0508 0.7829 5.8784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.108e+00 9.996e-01 -7.111 1.69e-12 ***
## date 7.677e-04 5.306e-05 14.469 < 2e-16 ***
## butte_temp 4.171e-01 5.933e-03 70.298 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.363 on 1720 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7573, Adjusted R-squared: 0.757
## F-statistic: 2684 on 2 and 1720 DF, p-value: < 2.2e-16
## [1] 0.08457775
##
## Call:
## lm(formula = temp ~ date + butte_temp, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7646 -0.8095 0.0347 0.7484 6.1049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.752e+00 9.754e-01 -4.872 1.21e-06 ***
## date 6.408e-04 5.179e-05 12.373 < 2e-16 ***
## butte_temp 3.941e-01 6.345e-03 62.111 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.329 on 1721 degrees of freedom
## Multiple R-squared: 0.7071, Adjusted R-squared: 0.7067
## F-statistic: 2077 on 2 and 1721 DF, p-value: < 2.2e-16
## [1] NA
##
## Call:
## lm(formula = temp ~ date + butte_temp, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4480 -0.7760 0.0212 0.8377 5.8177
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.321e+00 1.039e+00 -8.006 2.16e-15 ***
## date 8.257e-04 5.514e-05 14.975 < 2e-16 ***
## butte_temp 4.413e-01 5.538e-03 79.692 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.423 on 1720 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7982, Adjusted R-squared: 0.7979
## F-statistic: 3401 on 2 and 1720 DF, p-value: < 2.2e-16
## [1] 0.08051571
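As a worked reading of the first model summary above, assuming `date` entered the model as an R `Date` (which `lm()` treats as days since 1970-01-01) and temperatures are in °C, the fitted equation and a prediction for a hypothetical Butte Creek temperature of 10 °C on 2000-01-01 (day $d = 10957$) are:

$$
\hat{T} = -7.108 + 7.677\times 10^{-4}\, d + 0.4171\, T_{\text{Butte}}
$$

$$
\hat{T} = -7.108 + 7.677\times 10^{-4}\,(10957) + 0.4171\,(10) \approx -7.108 + 8.412 + 4.171 = 5.475\ ^\circ\text{C}
$$

Holding Butte Creek temperature fixed, the date coefficient corresponds to about $7.677\times 10^{-4} \times 365.25 \approx 0.28$ °C per year, and each 1 °C increase in Butte Creek temperature corresponds to about 0.42 °C in the predicted temperature.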
The plot shows the predicted mean temperature for the Feather River LFC over time; the min and max predictions follow a similar pattern. The line traces the predicted mean temperature by date, which helps identify patterns or trends in mean water temperature over the period of record.
The plot below shows the mean, min, and max temperatures for the Feather River LFC over time. Predicted values (labeled `interpolated` in the `gage_agency` and `gage_number` columns) fill the dates where observed data are missing, producing a continuous temperature dataset. Each statistic (mean, min, max) is shown in a different color so the three temperature series and their fluctuations can be compared.
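Combining predictions with actual measurements amounts to keeping each observation where it exists and using the prediction otherwise, while recording which source each value came from. A minimal sketch with hypothetical columns `observed` and `predicted`; the `"observed"` label is a placeholder, since the real records would carry their own gage agency.

```r
# Hypothetical observed series with gaps, plus a model prediction for every date
combined <- data.frame(
  date = as.Date("2020-06-01") + 0:4,
  observed = c(14.2, NA, NA, 15.1, 15.3),
  predicted = c(14.0, 14.6, 14.8, 15.0, 15.4)
)

# Keep the observation where it exists; otherwise use the prediction
combined$value <- ifelse(is.na(combined$observed), combined$predicted, combined$observed)

# Flag the source of each value so filled dates remain identifiable
combined$gage_agency <- ifelse(is.na(combined$observed), "interpolated", "observed")
```

Only the two missing dates take predicted values; the three observed values are left unchanged.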
## Rows: 26,946
## Columns: 7
## Groups: stream, date, statistic, gage_agency, gage_number, site_group [26,946]
## $ date <date> 1999-12-31, 1999-12-31, 1999-12-31, 2000-01-01, 2000-01-0…
## $ stream <chr> "feather river", "feather river", "feather river", "feathe…
## $ site_group <chr> "upper feather lfc", "upper feather lfc", "upper feather l…
## $ gage_agency <chr> "interpolated", "interpolated", "interpolated", "interpola…
## $ gage_number <chr> "interpolated", "interpolated", "interpolated", "interpola…
## $ statistic <chr> "mean", "max", "min", "mean", "max", "min", "mean", "max",…
## $ value <dbl> 3.304975, 2.844276, 4.160482, 3.375258, 3.286411, 3.964067…
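The output above is in long format, with one row per date and statistic. A minimal base-R sketch of going from one column per statistic to that layout; the column names and stream/site labels are copied from the output, while the wide input table and its values are hypothetical.

```r
# Hypothetical wide table: one column per temperature statistic
wide <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-02")),
  mean = c(5.0, 5.2),
  max = c(6.1, 6.3),
  min = c(4.0, 4.1)
)

stats <- c("mean", "max", "min")

# Stack the statistic columns into (date, statistic, value) rows;
# unlist() reads the columns in order, matching rep(..., each = nrow(wide))
long <- data.frame(
  date = rep(wide$date, times = length(stats)),
  statistic = rep(stats, each = nrow(wide)),
  value = unlist(wide[stats], use.names = FALSE)
)

# Add the identifying columns seen in the glimpse() output
long$stream <- "feather river"
long$site_group <- "upper feather lfc"
long$gage_agency <- "interpolated"
long$gage_number <- "interpolated"

# Order rows by date, then by statistic (mean, max, min), as in the printed output
long <- long[order(long$date, match(long$statistic, stats)), ]
```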
Mean, min, and max water temperatures on Butte Creek and the Feather River HFC are linearly correlated. For example, the plot below suggests a strong linear relationship between the mean water temperatures of Butte Creek and the Feather River HFC: the positive slope of the trend line indicates that higher water temperatures in Butte Creek are associated with higher water temperatures in the Feather River HFC. These plots are consistent with the linear regression results, which found a statistically significant relationship between Butte Creek and Feather River HFC water temperatures for the mean, min, and max.
Plot of mean temp for Feather River HFC and Butte Creek
We built three regression models for Feather River HFC, one each for the mean, min, and max water temperatures, and evaluated them using the Mean Absolute Percentage Error (MAPE). The MAPE for all three models indicated good predictive accuracy.
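Because the same formula is fit once per statistic, the three models can be written as a loop. This sketch assumes the HFC models use the same formula as the LFC output (`temp ~ date + butte_temp`) and a hypothetical long-format training table with `statistic`, `date`, `temp`, and `butte_temp` columns, where `butte_temp` holds the matching Butte Creek statistic; the simulated values are placeholders.

```r
set.seed(7)

# Hypothetical long-format training data: one row per date and statistic
dates <- seq(as.Date("2018-01-01"), by = "day", length.out = 300)
stats <- c("mean", "min", "max")
offsets <- c(mean = 0, min = -2, max = 2)
train <- do.call(rbind, lapply(stats, function(s) {
  butte <- 12 + offsets[[s]] + 4 * sin(2 * pi * seq_along(dates) / 365) +
    rnorm(length(dates), sd = 0.3)
  data.frame(statistic = s, date = dates, butte_temp = butte,
             temp = 1 + 0.5 * butte + rnorm(length(dates), sd = 0.4))
}))

# Fit one lm() per statistic, each with the same formula
models <- lapply(split(train, train$statistic), function(d) {
  lm(temp ~ date + butte_temp, data = d)
})

# Collect R-squared for each statistic (a named vector)
r_squared <- sapply(models, function(m) summary(m)$r.squared)
```

Keeping the models in a named list makes it straightforward to run the same evaluation and prediction code on each statistic.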
The plot shows the predicted mean temperature for the Feather River HFC over time; the min and max predictions follow a similar pattern. The line traces the predicted mean temperature by date, which helps identify patterns or trends in mean water temperature over the period of record.
The plot below shows the mean, min, and max temperatures for the Feather River HFC over time. Predicted values (labeled `interpolated` in the `gage_agency` and `gage_number` columns) fill the dates where observed data are missing, producing a continuous temperature dataset. Each statistic (mean, min, max) is shown in a different color so the three temperature series and their fluctuations can be compared.
## Rows: 26,982
## Columns: 7
## Groups: stream, date, statistic, gage_agency, gage_number, site_group [26,982]
## $ date <date> 1999-12-31, 1999-12-31, 1999-12-31, 2000-01-01, 2000-01-0…
## $ stream <chr> "feather river", "feather river", "feather river", "feathe…
## $ site_group <chr> "upper feather hfc", "upper feather hfc", "upper feather h…
## $ gage_agency <chr> "interpolated", "interpolated", "interpolated", "interpola…
## $ gage_number <chr> "interpolated", "interpolated", "interpolated", "interpola…
## $ statistic <chr> "mean", "max", "min", "mean", "max", "min", "mean", "max",…
## $ value <dbl> 10.617213, 11.279177, 10.312868, 10.725223, 11.881711, 9.9…
Combine the datasets with and without missing data
Identify the gaps to predict
Use only dates with no missing data in either dataset for regression modeling
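The first two steps above can be sketched as: build a complete daily calendar, merge the observed records onto it so that absent dates become explicit `NA` rows, then use run-length encoding to list each gap's start, end, and length. The `obs` table and its values are hypothetical.

```r
# Hypothetical observed records: 2021-03-03 to 2021-03-05 are absent,
# and 2021-03-08 is present but missing its temperature
obs <- data.frame(
  date = as.Date(c("2021-03-01", "2021-03-02", "2021-03-06", "2021-03-07",
                   "2021-03-08", "2021-03-09")),
  temp = c(9.1, 9.3, 9.8, 10.0, NA, 10.2)
)

# Complete daily calendar over the observed range
calendar <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

# Left-join so that dates with no record become explicit NA rows
full <- merge(calendar, obs, by = "date", all.x = TRUE)

# Run-length encode the missingness flag to find contiguous gaps
runs <- rle(is.na(full$temp))
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# One row per gap: first date, last date, and number of missing days
gaps <- data.frame(
  start = full$date[starts[runs$values]],
  end = full$date[ends[runs$values]],
  n_days = runs$lengths[runs$values]
)
```

This yields two gaps: three days starting 2021-03-03 and one day on 2021-03-08.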
Mean, min, and max water temperatures on Butte Creek and the Yuba River are linearly correlated. For example, the plot below suggests a strong linear relationship between the mean water temperatures of Butte Creek and the Yuba River: the positive slope of the trend line indicates that higher water temperatures in Butte Creek are associated with higher water temperatures in the Yuba River. These plots are consistent with the linear regression results, which found a statistically significant relationship between Butte Creek and Yuba River water temperatures for the mean, min, and max.
Plot of mean temp for Yuba River and Butte Creek
We built three regression models for the Yuba River, one each for the mean, min, and max water temperatures, and evaluated them using the Mean Absolute Percentage Error (MAPE). The MAPE for all three models indicated good predictive accuracy.
The plot shows the predicted mean temperature for the Yuba River over time; the min and max predictions follow a similar pattern. The line traces the predicted mean temperature by date, which helps identify patterns or trends in mean water temperature over the period of record.
The plot below shows the mean, min, and max temperatures for the Yuba River over time. Predicted values (labeled `interpolated` in the `gage_agency` and `gage_number` columns) fill the dates where observed data are missing, producing a continuous temperature dataset. Each statistic (mean, min, max) is shown in a different color so the three temperature series and their fluctuations can be compared.
## Rows: 26,946
## Columns: 6
## Groups: stream, date, statistic, gage_agency, gage_number [26,946]
## $ date <date> 1999-12-31, 1999-12-31, 1999-12-31, 2000-01-01, 2000-01-0…
## $ stream <chr> "yuba river", "yuba river", "yuba river", "yuba river", "y…
## $ gage_agency <chr> "interpolated", "interpolated", "interpolated", "interpola…
## $ gage_number <chr> "interpolated", "interpolated", "interpolated", "interpola…
## $ statistic <chr> "mean", "max", "min", "mean", "max", "min", "mean", "max",…
## $ value <dbl> 15.98101, 14.79745, 16.52032, 16.07775, 15.43969, 16.25013…