Unidentified thornyheads - what to do with missing values? #14
-
I will ask Mark Scheuerell about his thoughts on smoother functions for proportion data. I still think a smoother would be a good way to handle some of the extra variability in the ratio that is unlikely to reflect changes in actual relative population sizes. Using the interpolation for now seems acceptable to me, though.
-
@JaneSullivan-NOAA I talked to Mark Scheuerell just now about this. He recommended transforming the data into logit space and then fitting a time-series-style smoothing model there, since the data should be normal in logit space. The only caveat is that logit(0) and logit(1) are both undefined, and since we have ratios that are exactly 1.0, that could cause problems. If we want to deal with it in logit space, I think we could get away with slightly modifying the 1.0 ratio data points to something like 0.99999, so they stay close to their original value but are defined in logit space. Since we aren't worried specifically about the fit, but about the smoothing properties, I think that could be acceptable.
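To make the logit idea concrete, here is a minimal sketch in Python (standard library only). The ratio values are made up for illustration, and `clamp_ratio`, `logit`, `inv_logit`, and the `eps` default are hypothetical names and choices, not anything from the assessment code. It only shows the nudge-then-transform step, not the smoothing model itself.

```python
import math

def clamp_ratio(p, eps=1e-5):
    """Pull a proportion away from exactly 0 or 1 so its logit is finite.

    eps is an assumed tolerance; 1e-5 turns 1.0 into 0.99999.
    """
    return min(max(p, eps), 1.0 - eps)

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Back-transform from logit space to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative ratios only (not real data); two of them are exactly 1.0.
ratios = [0.62, 1.0, 0.85, 1.0]

# Nudge boundary values, then transform; a smoother would be fit to these.
logit_ratios = [logit(clamp_ratio(p)) for p in ratios]

# Back-transform to check that we recover the (nudged) proportions.
back = [inv_logit(x) for x in logit_ratios]
```

One thing to keep in mind: the logit grows without bound as the nudged value approaches 1, so the choice of `eps` controls how far those points sit from the rest in logit space, and it may be worth checking how sensitive the smoothed series is to it.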
-
OK - just a note that I am now following this thread. Happy to discuss this later. Thanks @Ovec8hkin for the update and let's see what @JaneSullivan-NOAA thinks after this feedback from Mark!
-
@haleyoleynik @pyhernvann @shipmadison @Ovec8hkin @adamlhayes - Thanks for everyone's participation today! You all are awesome.
Today we discussed options for how to deal with missing values for the ratio of shortspine thornyhead to total thornyheads, which is the metric we use to split unidentified catch.
Here is a comparison of the two options we discussed today, a random walk model and linear interpolation. As @adamlhayes suggested, the observations in the random walk model are weighted by the unidentified catch in that year/fleet combination (i.e., the larger catches are assigned lower CVs, the smaller catches are assigned higher CVs). The linear interpolation method is more in line with what @pyhernvann suggested, where we use the observed values in years where they exist, and interpolated values for NAs only.
After going through this exercise and reconsidering the conversation from today, I think @pyhernvann was right. I'm switching my vote to the linear interpolation method. It's simpler and will require less explanation/justification in the text. Additionally, the simple random walk model would require more development because its assumptions are violated (the data are not normal because they are proportions).
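For reference, the linear interpolation option can be sketched roughly as below, in Python with the standard library only. The years, ratios, and catches are invented for illustration, and `interpolate_missing` and `split_unidentified` are hypothetical helpers, not functions from our code. Leaving leading and trailing gaps unfilled is an assumption of this sketch; we have not decided how to treat those.

```python
def interpolate_missing(years, ratios):
    """Fill missing ratios (None) by linear interpolation between the
    nearest observed years. Observed values are returned unchanged.

    Assumes years are sorted ascending. Gaps before the first or after
    the last observation are left as None (an assumption of this sketch).
    """
    observed = [(y, r) for y, r in zip(years, ratios) if r is not None]
    filled = []
    for y, r in zip(years, ratios):
        if r is not None:
            filled.append(r)  # keep the observed value as-is
            continue
        before = [(yo, ro) for yo, ro in observed if yo < y]
        after = [(yo, ro) for yo, ro in observed if yo > y]
        if not before or not after:
            filled.append(None)  # no neighbour on one side
            continue
        y0, r0 = before[-1]
        y1, r1 = after[0]
        filled.append(r0 + (r1 - r0) * (y - y0) / (y1 - y0))
    return filled

def split_unidentified(ratio, unidentified_catch):
    """Split unidentified catch into (shortspine, other) using the ratio
    of shortspine thornyhead to total thornyheads."""
    shortspine = ratio * unidentified_catch
    return shortspine, unidentified_catch - shortspine

# Illustrative values only (not real data).
years = [2000, 2001, 2002, 2003]
ratios = [0.8, None, None, 0.5]
unidentified = [10.0, 20.0, 5.0, 8.0]

filled = interpolate_missing(years, ratios)
splits = [split_unidentified(r, c) for r, c in zip(filled, unidentified)]
```

Here the two missing years are filled along the straight line between the 2000 and 2003 observations, and the filled ratio is then applied to each year's unidentified catch.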