1. Exploratory analysis

1.1 Data Inputs

NOAA SEFSC visual data go back to 1992, but as shown in the figure below, many predictor variables are only available starting in 2003. Earlier visual data are therefore excluded from further analyses for now.

Note: Future work could use monthly climatologies (averages) so that older sightings data could be used. However, some dynamic drivers, such as eddy and front locations, could not be represented with that approach.

Visual data predictor variable availability:

1.1.1 Splitting into testing and training sets

The data are split into training and testing sets. In this case, visual data from 2009 and acoustic data from 2013 were used only for testing. Only observations from 2003 or later were used for modeling due to covariate limitations.
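The split described above can be sketched as follows. This is a minimal illustration, not the original code: the data frame `obs` and its columns `year` and `platform` are hypothetical stand-ins for whatever structure the analysis actually uses.

```r
# Hypothetical observation table; column names are illustrative assumptions.
obs <- data.frame(
  year     = c(2001, 2004, 2009, 2011, 2012, 2013, 2009, 2005),
  platform = c("visual", "visual", "visual", "acoustic",
               "acoustic", "acoustic", "visual", "visual"),
  stringsAsFactors = FALSE
)

# Drop observations that predate covariate coverage (before 2003).
obs <- obs[obs$year >= 2003, ]

# Hold out visual data from 2009 and acoustic data from 2013 for testing.
is_test <- (obs$platform == "visual"   & obs$year == 2009) |
           (obs$platform == "acoustic" & obs$year == 2013)

obs_test  <- obs[is_test, ]
obs_train <- obs[!is_test, ]
```

Splitting by whole years, rather than at random, keeps temporally adjacent observations from the same survey or deployment out of both sets at once.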

1.1.2 Map of visual sightings data

The visual data selected for modeling are displayed on the map below. Data from 2009 were held back for testing. Blue markers indicate HARP (High-frequency Acoustic Recording Package) locations.

1.1.3 Time series of acoustic data

The plots below show time series of estimated densities from the passive acoustic data used for modeling. Densities were calculated following the methods detailed in [1]. Data from 2011 and 2012 were used for training, and data from 2013 were held back for testing.

Acoustic Timeseries:

1.2 Examination of covariates

1.2.1 Covariate distribution check

Distributions of covariates from acoustic observations (training data only):

Distributions of covariates from the visual observations (training data only):

Some of these covariates are interrelated to varying degrees. Correlations are examined in the figure below. The numbers above the diagonal are correlation coefficients, and values with magnitude closer to 1 indicate stronger correlation. If a pair of covariates is highly correlated, typically only one of the two should be used in the model.
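A screening step of this kind might look like the sketch below. The simulated covariates and the 0.7 cutoff are assumptions chosen for illustration; the source does not state which threshold, if any, was applied.

```r
# Simulated covariates; names and values are illustrative assumptions.
set.seed(1)
n <- 200
sst <- rnorm(n, mean = 25, sd = 2)
covars <- data.frame(
  SST = sst,
  SSH = 0.9 * sst + rnorm(n, sd = 0.3),  # built to correlate strongly with SST
  CHL = rnorm(n)                         # independent of the others
)

# Pairwise Pearson correlation matrix.
cor_mat <- cor(covars, use = "pairwise.complete.obs")

# List each pair once (upper triangle) whose |r| exceeds an assumed cutoff.
cutoff <- 0.7
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Only the upper triangle is scanned so that each pair is reported once and the diagonal is ignored.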

Covariate Correlations:

1.2.2 Transformation of predictor variables

Some variables, including chlorophyll, mixed layer depth, and distance to fronts, are highly skewed and were log-transformed for input to the GAMs.
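As a rough sketch of the transformation, the snippet below applies a base-10 logarithm and uses the `log10_` prefix that appears in the model summaries. The input table and its values are made up for illustration.

```r
# Illustrative covariate table; values are made up.
covars <- data.frame(
  CHL       = c(0.05, 0.1, 0.4, 2.5, 10),
  HYCOM_MLD = c(8, 15, 30, 75, 150)
)

# Log10-transform strictly positive, right-skewed variables.
skewed <- c("CHL", "HYCOM_MLD")
stopifnot(all(covars[skewed] > 0))  # log10 is undefined for values <= 0
for (v in skewed) {
  covars[[paste0("log10_", v)]] <- log10(covars[[v]])
}
```

Variables that can legitimately be zero would need an offset or a different transform; that case is not handled here.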

Below, the two sets of covariates have been combined and transformed:

1.2.3 Preliminary check of predictive power

To get an idea of the basic predictive power of these covariates, we can look at presence/absence relative to each variable. This also provides an opportunity to look at the range of values observed for each covariate in the visual and acoustic datasets. In the plots below, dotted lines indicate the distribution of each covariate when Stenella spp. were present, and solid lines indicate the distribution when Stenella spp. were absent. Note that these plots do not account for effort.
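One way to draw such a comparison for a single covariate is sketched below with base R's `density()`. The simulated SST values and presence flags are assumptions for illustration only.

```r
# Simulated example; the covariate and presence flag are illustrative assumptions.
set.seed(42)
sst      <- c(rnorm(150, mean = 24, sd = 2), rnorm(50, mean = 27, sd = 1.5))
presence <- c(rep(FALSE, 150), rep(TRUE, 50))

# Kernel density estimates over a shared range so the curves are comparable.
rng       <- range(sst)
d_absent  <- density(sst[!presence], from = rng[1], to = rng[2])
d_present <- density(sst[presence],  from = rng[1], to = rng[2])

# Solid line: absent; dotted line: present (matching the convention above).
plot(d_absent, lty = "solid", main = "SST", xlab = "SST",
     ylim = c(0, max(d_absent$y, d_present$y)))
lines(d_present, lty = "dotted")
```

Where the two curves separate clearly, the covariate is more likely to be informative; heavily overlapping curves suggest little marginal signal.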

Acoustic kernel densities:

Visual kernel densities:

1.2.4 Estimation of relative weights

To train the model, we need to know how much power the various data points have relative to one another. This is important because the duration, spatial coverage, and detection probabilities are quite different between the visual and acoustic data sets. If an animal is seen or heard, we know for certain that the species was present. However, if it was not seen or heard, then either it wasn’t present, or it was present but missed.

“Zero inflation”, an excess of false zeros, is more common in the visual survey data because each data point represents a 10 km transect segment traversed at survey speed (>10 knots), or approximately 30 minutes of observation effort. In contrast, the acoustic data are binned by day from a stationary instrument, so the probability of missing a group over the course of a day is lower.

For each data type, we estimated the probability of a missed detection to account for differences in zero-inflation, downweighting zeros according to the probability of recording a false negative.

The visual data represent whether or not Stenella spp. were seen during each transect segment. The probability of detecting Stenella spp., given that they were present, was estimated as

\[P_{V}(\mathrm{detect} \mid \mathrm{present}) = \mu_{det} \times g_{0} = 0.28 \times 0.94 \approx 0.26\]

where \(\mu_{det}\) is the mean detection probability as estimated by a model fit using the mrds package, and \(g_{0}\) is the probability of observing an animal on the transect line [2]. We assume that reported absences are true absences 26% of the time, so zeros are given a weight of 0.26 on a scale of [0,1].

The acoustic data represent presence or absence of Stenella spp. in one-day bins. Given that a group of animals is present near the sensor, the probability of detecting them in a 5-minute period within a 5 km range is estimated at 0.15, so the probability of missing them in a single period is 1 - 0.15 = 0.85 [1,3]. Given that animals were present, and treating the 288 5-minute periods in a day as independent, the probability of missing a group in every period is \(0.85^{288} \approx 0\). The probability of detecting the group at least once during a full day is therefore estimated as

\[P_{A}(\mathrm{detect} \mid \mathrm{present}) = 1 - (1-0.15)^{288} \approx 1\]

Therefore we assume that there are no false negative days in the passive acoustic timeseries, and all acoustic observations are given weight = 1.
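The arithmetic behind the two weights can be checked directly. The sketch below uses only the values quoted in this section; treating the 5-minute periods as independent is an assumption, and the variable names are illustrative.

```r
# Visual: mean detection probability times trackline detection probability g0.
mu_det   <- 0.28
g0       <- 0.94
p_visual <- mu_det * g0            # 0.2632, reported as 0.26

# Acoustic: probability of at least one detection across a day of
# 5-minute periods, treating periods as independent (an assumption).
p_period   <- 0.15
n_periods  <- 24 * 60 / 5          # 288 periods per day
p_acoustic <- 1 - (1 - p_period)^n_periods

round(c(visual = p_visual, acoustic = p_acoustic), 2)
```

Because the per-day miss probability shrinks geometrically with the number of periods, the acoustic daily detection probability is effectively 1 even though the per-period value is low.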

Best visual detection probability model for Stenella spp.:

2. Model Fitting

Models were fit using gam from the mgcv package in R [4].

2.1 Set up weighting
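The code for this step is not shown in the report. A minimal sketch of how observation weights could be assembled from the values in section 1.2.4 follows. The vectors `y` and `platform` are hypothetical, and giving every detection a weight of 1 is an assumption based on the statement that only zeros are downweighted.

```r
# Hypothetical joint training responses; names and values are assumptions.
y        <- c(0, 12.5, 0, 0, 3.1, 0)
platform <- c("visual", "visual", "visual",
              "acoustic", "acoustic", "acoustic")

# Detections keep full weight (assumed). Zeros are weighted by the estimated
# probability that an absence is a true absence for that platform.
zero_weight <- c(visual = 0.26, acoustic = 1)
w <- ifelse(y > 0, 1, zero_weight[platform])
w <- unname(w)
```

A vector built this way has one entry per observation and could be passed to the `weights` argument of `gam`, as the fitting code in the next section does with its own weight vectors.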

2.2 Run GAMs

Run GAMs on acoustic-only, visual-only, and joint acoustic/visual datasets.

library(mgcv)

# Build one shrinkage thin-plate smooth (bs = 'ts') per covariate.
# kVal (basis dimension) and the responses yAcOnly, yVisOnly, and y are
# assumed to be defined earlier in the analysis.
varNames <- names(transformedCovars_AcOnly.train)
varNamesFormula <- varNames[2:10]

# Acoustic-only model: tw() estimates the Tweedie power parameter during fitting.
formulaAcOnly_allVars <- paste('s(', varNamesFormula, ", bs = 'ts', k = kVal)", sep = "", collapse = ' + ')
formulaAcOnly_allVars <- as.formula(paste('yAcOnly ~', formulaAcOnly_allVars))
gam_full_AcOnly <- gam(formulaAcOnly_allVars,
                       data = transformedCovars_AcOnly.train,
                       na.action = na.omit,
                       family = tw())

# Visual-only model: Tweedie power fixed at p = 1.5, with observation weights.
formulaVisOnly_allVars <- paste('s(', varNamesFormula, ", bs = 'ts', k = kVal)", sep = "", collapse = ' + ')
formulaVisOnly_allVars <- as.formula(paste('yVisOnly ~', formulaVisOnly_allVars))
gam_full_VisOnly <- gam(formulaVisOnly_allVars,
                        data = transformedCovars_VisOnly.train,
                        na.action = na.omit,
                        family = Tweedie(p = 1.5, link = log),
                        weights = VisOnly.train_weightsG0)

# Joint model: same family as the visual-only model, fit to the combined data.
formulaJoint_allVars <- paste('s(', varNamesFormula, ", bs = 'ts', k = kVal)", sep = "", collapse = ' + ')
formulaJoint_allVars <- as.formula(paste('y ~', formulaJoint_allVars))
gam_full_Joint <- gam(formulaJoint_allVars,
                      data = transformedCovars.train,
                      weights = joint_train_weightsG0,
                      na.action = na.omit,
                      family = Tweedie(p = 1.5, link = log))

3. Model Summaries

The model summaries show that most smooth terms were significant at the 0.05 level. The exception was s(Neg_EddyDist) in the visual-only model (p = 0.084).

3.1 Acoustic-only model

Model summary:

## 
## Family: Tweedie(p=1.484) 
## Link function: log 
## 
## Formula:
## yAcOnly ~ s(SST, bs = "ts", k = kVal) + s(SSH, bs = "ts", k = kVal) + 
##     s(log10_CHL, bs = "ts", k = kVal) + s(log10_HYCOM_MLD, bs = "ts", 
##     k = kVal) + s(HYCOM_SALIN_0, bs = "ts", k = kVal) + s(log10_HYCOM_MAG_0, 
##     bs = "ts", k = kVal) + s(Neg_EddyDist, bs = "ts", k = kVal) + 
##     s(Pos_EddyDist, bs = "ts", k = kVal)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.37870    0.02261   237.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df      F  p-value    
## s(SST)               3.1833      4 12.023 1.25e-11 ***
## s(SSH)               3.7407      4 18.756  < 2e-16 ***
## s(log10_CHL)         3.6937      4 44.634  < 2e-16 ***
## s(log10_HYCOM_MLD)   2.2507      4  9.123 1.60e-09 ***
## s(HYCOM_SALIN_0)     3.7781      4 15.012 5.20e-13 ***
## s(log10_HYCOM_MAG_0) 0.9042      4  1.927  0.00312 ** 
## s(Neg_EddyDist)      1.0717      4  4.887 5.92e-06 ***
## s(Pos_EddyDist)      2.5568      4  9.105 2.34e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.17   Deviance explained = 15.1%
## -REML =  16292  Scale est. = 23.199    n = 2937

Smooths of predictor variables:

Plot of residuals:

3.2 Visual-only model

Model summary:

## 
## Family: Tweedie(1.5) 
## Link function: log 
## 
## Formula:
## yVisOnly ~ s(SST, bs = "ts", k = kVal) + s(SSH, bs = "ts", k = kVal) + 
##     s(log10_CHL, bs = "ts", k = kVal) + s(log10_HYCOM_MLD, bs = "ts", 
##     k = kVal) + s(log10_HYCOM_MAG_0, bs = "ts", k = kVal) + s(Neg_EddyDist, 
##     bs = "ts", k = kVal)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.01313    0.06113   82.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df      F  p-value    
## s(SST)               2.9144      4  3.315  0.00187 ** 
## s(SSH)               3.4899      4  6.106 1.32e-05 ***
## s(log10_CHL)         3.8588      4  3.952  0.00235 ** 
## s(log10_HYCOM_MLD)   3.0049      4 10.819 1.20e-10 ***
## s(log10_HYCOM_MAG_0) 3.0812      4  2.448  0.01550 *  
## s(Neg_EddyDist)      0.8277      4  0.571  0.08435 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.105   Deviance explained = 12.9%
## GCV = 16.564  Scale est. = 27.75     n = 1956

Smooths of predictor variables:

Plot of residuals:

3.3 Joint model

Model summary:

## 
## Family: Tweedie(1.5) 
## Link function: log 
## 
## Formula:
## y ~ s(SST, bs = "ts", k = kVal) + s(SSH, bs = "ts", k = kVal) + 
##     s(log10_CHL, bs = "ts", k = kVal) + s(log10_HYCOM_MLD, bs = "ts", 
##     k = kVal) + s(HYCOM_SALIN_0, bs = "ts", k = kVal) + s(log10_HYCOM_MAG_0, 
##     bs = "ts", k = kVal) + s(Neg_EddyDist, bs = "ts", k = kVal) + 
##     s(Pos_EddyDist, bs = "ts", k = kVal)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.37507    0.02243   239.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df      F  p-value    
## s(SST)               3.796      4 14.982 4.28e-13 ***
## s(SSH)               3.374      4  7.673 2.83e-07 ***
## s(log10_CHL)         3.044      4 35.772  < 2e-16 ***
## s(log10_HYCOM_MLD)   1.878      4 11.609 2.02e-12 ***
## s(HYCOM_SALIN_0)     3.922      4 16.490 7.57e-14 ***
## s(log10_HYCOM_MAG_0) 3.432      4  8.997 3.13e-08 ***
## s(Neg_EddyDist)      1.047      4  5.178 3.25e-06 ***
## s(Pos_EddyDist)      3.321      4  6.604 3.66e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.117   Deviance explained = 10.6%
## GCV = 19.866  Scale est. = 20.003    n = 4895

Smooths of predictor variables:

Plot of residuals:

3.4 Model prediction on joint dataset

## [1] "Mean absolute error scores (lower is better)"

##                            Mean Absolute Error
## Acoustic - Test Joint Data             284.865
## Visual - Test Joint Data               228.261
## Joint - Test Joint Data                255.005
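
For reference, mean absolute error is the average of the absolute differences between observed and predicted values. The helper below is a minimal sketch with a hypothetical function name, and the numbers are made up rather than taken from this analysis.

```r
# Mean absolute error between observed and predicted values.
mean_abs_error <- function(observed, predicted) {
  mean(abs(observed - predicted), na.rm = TRUE)
}

# Made-up numbers for illustration only; not results from this analysis.
observed  <- c(0, 120, 300, 0, 45)
predicted <- c(10, 100, 250, 20, 60)
mae <- mean_abs_error(observed, predicted)
```

The score is in the same units as the response, so values are only comparable between models evaluated on the same test set.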

4. Model Predictions

4.1 Temporal predictions

Predictions were made on the acoustic test dataset and compared with actual observations for 2013. The predicted density of animals at each site was compared with the estimated daily density from the passive acoustic record.

4.1.1 Acoustic-only prediction

Predicted and observed densities at passive acoustic monitoring sites using the acoustic-only model:

4.1.2 Visual-only prediction

Predicted and observed densities at passive acoustic monitoring sites using the visual-only model:

4.1.3 Joint prediction

Predicted and observed densities at passive acoustic monitoring sites using the joint model:

4.2 Spatial predictions

Models were evaluated for summer (July 2009) and winter (January 2009) across the entire Gulf of Mexico (US EEZ beyond the 200 m contour).

Note: As an alternative to training joint models, an average surface was computed across the acoustic-only and visual-only predictions.
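A cell-by-cell average of two prediction surfaces could be computed as in the sketch below. The grids are toy matrices, and both the equal weighting of the two models and the handling of missing cells are assumptions, since the report does not specify them.

```r
# Toy prediction grids (rows x columns of cells); values are made up.
pred_acoustic <- matrix(c(10, 20, NA, 40), nrow = 2)
pred_visual   <- matrix(c(30, 10, 50, NA), nrow = 2)

# Cell-by-cell mean of the two surfaces. A cell is NA only when both
# models lack a prediction there; otherwise the available value(s) are used.
stacked  <- array(c(pred_acoustic, pred_visual), dim = c(dim(pred_acoustic), 2))
pred_avg <- apply(stacked, c(1, 2), mean, na.rm = TRUE)
pred_avg[is.nan(pred_avg)] <- NA
```

Both grids must share the same extent and resolution for the stacking step to be meaningful.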

Each grid cell in the following maps represents a 10x10 km square. Densities are therefore shown as estimated number of animals per 100 km².

4.2.1 Summer 2009 predictions

Summer 2009 predicted distribution and test sightings:

4.2.2 Winter 2009 predictions

Winter 2009 predicted distribution:

5. Monthly model predictions

Spatial model predictions were generated using climatological means of oceanographic variables, averaged by month between 2003 and 2015.
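
The climatological averaging can be illustrated for a single location as follows. The simulated SST series and the column names are assumptions; in practice the same grouping would be applied to each grid cell and each oceanographic variable.

```r
# Simulated monthly SST series for 2003-2015; values are made up.
set.seed(7)
dates <- seq(as.Date("2003-01-01"), as.Date("2015-12-01"), by = "month")
sst   <- 25 + 4 * sin(2 * pi * (as.integer(format(dates, "%m")) - 4) / 12) +
         rnorm(length(dates), sd = 0.5)
ts_df <- data.frame(month = as.integer(format(dates, "%m")), SST = sst)

# Climatological mean: average each calendar month across all years.
clim <- aggregate(SST ~ month, data = ts_df, FUN = mean)
```

The result has one row per calendar month, which removes interannual variability and keeps only the mean seasonal cycle.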

References

1. Frasier KE, Wiggins SM, Harris D, Marques TA, Thomas L, Hildebrand JA. Delphinid echolocation click detection probability on near-seafloor sensors. The Journal of the Acoustical Society of America. 2016;140: 1918–1930. doi:10.1121/1.4962279

2. Palka DL. Summer abundance estimates of cetaceans in US North Atlantic Navy Operating Areas. Northeast Fish Sci Cent Ref Doc. 2006; 06–03.

3. Hildebrand J, Baumann-Pickering S, Frasier K, Trickey J, Merkens K, Wiggins S, et al. Passive acoustic monitoring of beaked whale densities in the Gulf of Mexico during and after the Deepwater Horizon oil spill. Scientific Reports. 2015;5: 16343.

4. Wood S. Generalized additive models: An introduction with R. CRC Press; 2006.

GoMx Stenella spp. habitat models: Density with GAMs

Kaitlin Frasier

25 September 2019
