Reported impressive reduction in accidents due to ADS-B In

dmspilot said:
Where are the p-values?
Appears they used a binomial test based on estimated total operations and reported accidents. They say they only report statistically significant results but don’t give details on the p-level (presumably 5%) or how they corrected for multiple comparisons.

Nonetheless, on first read and without those details, the analysis seems valid. The effect size here is large (surprisingly so, actually), and they have estimated a fairly large number of operations, so a significant effect would not be surprising.

Could there be other methodological issues? It seems like a very large effect.
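For anyone who wants to see the shape of such a test, here is a minimal sketch in Python. All of the counts are hypothetical placeholders, since the report does not give the actual operation totals; only the structure of the binomial test is the point.

```python
# Minimal sketch of a binomial test of equipped-aircraft accidents against the
# baseline rate. Every number here is a hypothetical placeholder.
from scipy.stats import binomtest

unequipped_accidents = 200        # hypothetical
unequipped_ops = 10_000_000       # hypothetical estimated operations
baseline_rate = unequipped_accidents / unequipped_ops

equipped_accidents = 60           # hypothetical
equipped_ops = 5_000_000          # hypothetical estimated operations

# Is the equipped count lower than expected at the baseline accident rate?
result = binomtest(equipped_accidents, equipped_ops, baseline_rate,
                   alternative="less")
print(f"observed rate {equipped_accidents / equipped_ops:.2e} vs "
      f"baseline {baseline_rate:.2e}, p = {result.pvalue:.3g}")
```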
 
asicer said:
Possible self-selecting dataset?
I think this is potentially the big confound. Maybe people who took the trouble to equip with ADS-B during this timeframe are also more conscientious pilots.

It seems like that could be addressed within the dataset, because the NTSB reports usually include estimates of pilot experience and currency.
 
dmspilot said:
No, it isn't. Maybe you didn't see my edit.

Let's say there are an average of 10 mid-airs per year. Let's also say that 5% of pilots are named Charlie. Let's further say that during a particular year, zero pilots named Charlie were involved in mid-air collisions. Would that surprise you? Would you think that being named Charlie helped you prevent mid-air collisions?
I think the binomial test should account for this. It is basically a test for a difference in ratios: in this case, the fraction of equipped aircraft that have accidents vs. the fraction of unequipped aircraft that have accidents.

They did not give the total operations they are using for the denominators, though I gather those are in a more general report.

I agree and would be surprised by the result in your example. However, the numbers they are using are different from that: 5% of the total number of GA operations under consideration is still a fairly large number. What would be needed to double-check them are the values of the denominators in the ratios from the report they reference.
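To make the difference-in-ratios idea concrete, a two-proportion z-test along these lines would do once the real denominators are known. The counts below are made up purely for illustration.

```python
# Sketch of a two-proportion (difference in ratios) comparison.
# Accident and operation counts are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

equipped_accidents, equipped_ops = 60, 5_000_000        # hypothetical
unequipped_accidents, unequipped_ops = 200, 10_000_000  # hypothetical

stat, pvalue = proportions_ztest(
    count=[equipped_accidents, unequipped_accidents],
    nobs=[equipped_ops, unequipped_ops],
    alternative="two-sided",
)
print(f"z = {stat:.2f}, p = {pvalue:.3g}")
```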
 
Let'sgoflying! said:
Reading the link now....first thing a medical reviewer is going to look at when perusing research is its source in order to possibly detect bias. This says Regulus Group did the research then goes on to say how they contract with the FAA etc...does RG benefit from an infusion of cash into the ADS-B system full stop.
I agree there are a lot of possible biases that could cause the FAA or its contractors to want to over-inflate the benefits of ADS-B.

This type of argument, though, is a subtle form of attack on the source or speaker. Technically, such bias does not indicate whether the argument is correct or not, because even fools can occasionally speak the truth.

That is one reason I don’t entirely agree with the current policy of requiring disclosures of conflicts in medical journals. I think we should look for better ways of improving reviews and the accuracy and replicability of papers rather than trying to ferret out biases.

Disclosure of potential conflicts is required now though. OTOH, I know of no reviewers who make that a first concern. I usually find it more productive to focus on the methods and results and see if they are correct.
 
Capt. Geoffrey Thorpe said:
Reducing consumption of Margarine reduces the divorce rate
It is a good point that correlation does not imply causality. But that does not mean that all such correlational studies should simply be discarded.

Rather, other indicators need to be present to suggest causality.

One such indicator is a plausible mechanism. Is it plausible that having information about the location of other aircraft and about weather in the cockpit helps pilots avoid collisions and fatalities? It seems at least plausible to me.

Another is the lack of other potential causative factors. Is there anything else happening in GA in these 4 years that could have caused this near-halving of accidents in the planes equipped with ADS-B? I will leave it to others more knowledgeable than I am to suggest those.
 
dmspilot said:
I didn't read the report cover to cover. But I see now you're right, they did check for statistical significance. And you're wrong, there was none as far as mid-airs are concerned.

View attachment 73527
Good point about the mid-airs lacking statistical significance. That may be because a change in an already low rate is difficult to detect, i.e., a lack of power in this sample, as previously suggested. There is a non-significant trend in the direction of fewer mid-airs with ADS-B installed.

Per the classic theory of hypothesis testing, the failure to find a significant result does not say there is an absence of an effect; rather, it just says there is no statistically significant evidence of an effect.

It is a puzzling pattern in the results, I think. The mechanism for ADS-B reducing mid-air accidents seems more plausible than for some of the other categories. Strange as that seems, though, the lack of power may explain it.

It would be nice to see the numbers of operations they are using for the denominators, as well as their method of correction for multiple comparisons. Perhaps a fuller report will deal with that.
 
dmspilot said:
Do all 3 of the categories where there is a significant effect involve, directly or indirectly, weather?

Could it be that the main significant effects are due to weather information being available to the pilot in the cockpit?

Or maybe even the CFIT reduction is related to GPS position being displayed?
 
Capt. Geoffrey Thorpe said:
How do they come up with a "Overall reduction"? They don't. They come up with a reduction in accidents in very limited categories, not overall
Good question. They did not say how they are weighting the categories to make the overall claims in the headlines.

Look at the pre-ADS-B accidents per year for Cessna 150's for an example of how much things can change over a few years that would be totally unrelated to any new technology:
View attachment 73528
There are a lot of better statistical models they could have constructed, even with this dataset, to try to control for an overall trend, etc. Definitely not up to peer-review standards at this point.
 
dmspilot said:
What trend? You just agreed there is no statistically significant difference between the two populations, and then you say there's a trend of lower accidents in one? There is no evidence of that. You can't say there is a trend.
This is a phrase often used in scientific discussion to mean that the averages go a certain way, but there is no significant difference. In this case, the average rate of mid-air collisions is lower for ADS-B equipped aircraft than for those without such equipment, though this difference is not statistically significant. It is not a statement that would normally be used in a publication, but it is often used in less formal discussions when simply trying to understand what the data are saying.

I think of the discussion here as such an informal discussion trying to understand the data, versus say a formal scientific presentation or debate trying to prove one person right or wrong.

What do you mean by "an effect"? There's no statistically significant difference between the two populations.
A statistically significant effect would be one where the presence or absence of ADS-B equipment, the independent factor, had an effect on the outcome, the likelihood of having a mid-air collision. A statistically significant effect is lacking here.

There is no statistically significant difference in the rate of mid-airs between ADS-B equipped and non-equipped aircraft. You can't infer or speculate from the data that ADS-B caused a reduction in mid-airs when there wasn't a reduction.
Actually, I don’t believe I ever stated there was a statistically significant effect of ADS-B equipage on the rate of mid-air collisions.

And yes, I think it is reasonable to speculate on possible causes and on the reasons for a possible lack of significance, such as inadequate statistical power, provided that it is clearly labeled as such. That is how reasonable people proceed when they want to understand what the data mean in an objective manner. For example, such non-significant trends can indicate areas where further observations would be fruitful.

I would say overall there is some evidence here to suggest that mid-airs may be reduced by ADS-B, but it is not convincing at this point. In a 3-alternative forced-choice situation, in other words, if I had to pick between ADS-B reducing mid-airs, having no effect, or increasing them, I would choose reduces. But the level of certainty on that point is low enough that there is room for reasonable people to disagree. Further data could easily prove my choice wrong.
 
kyleb said:
Here's the problem. You are paying $3-$7k or thereabouts for a device that is way down on the priority list to improve safety. Most light aircraft would get far more safety benefit from a $4k single axis autopilot than they will from ADS-B.
This is an excellent point. Even if the statistics were to be borne out by further study and analysis, that does not imply it is a good use of funds for safety. Sadly, regulatory agencies often fail to perform this type of cost-benefit analysis. They tend to adopt an attitude of “our mission is safety, and we will always err on the side of safety, no matter the cost.”
 
chemgeek said:
necessarily mean there is a correlation, much less causation. A p-value of 0.05 still means there is a 5% chance the correlation is entirely random. There is even a higher chance a p<0.05 correlation is meaningless if you go correlation-hunting, that is, looking for various things that correlate and then coming up with a hypothesis rather than the other way around.
Yes, this is where they should have disclosed how they corrected for multiple comparisons. They also don’t give the actual p-values, which would permit a better understanding.

The description of the methods suggests they classified the accidents and then tested each category separately, not omitting any, which would argue in favor of their results. So I don’t see any evidence of cherry picking the tests in their description. But it would be much better to see that issue explicitly addressed.

I gather this will be presented in more detail in September, so hopefully we will learn more then.
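If and when the per-category p-values are published, checking a correction for multiple comparisons is straightforward. Here is a minimal sketch with made-up p-values and illustrative category labels, using a Holm correction as one common choice (the report does not say which correction, if any, was used):

```python
# Hypothetical per-category p-values; the report does not publish them.
from statsmodels.stats.multitest import multipletests

categories = ["mid-air", "CFIT", "weather-related", "other"]  # illustrative
raw_pvalues = [0.08, 0.01, 0.003, 0.20]                       # made up

reject, adjusted, _, _ = multipletests(raw_pvalues, alpha=0.05, method="holm")
for cat, p_raw, p_adj, sig in zip(categories, raw_pvalues, adjusted, reject):
    print(f"{cat:16s} raw p = {p_raw:.3f}  adjusted p = {p_adj:.3f}  "
          f"significant after correction: {sig}")
```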
 
dmspilot said:
I'm asking where you see a trend. I don't see any trends.
Table 2 shows a total of 0 mid-airs for aircraft equipped with ADS-B In. It also shows 49 mid-airs for non-equipped aircraft.

Thus, Table 3 shows an observed decrease in the rate of mid-airs for ADS-B equipped aircraft.

That is a trend in the common use of the term: “trend /trend/ noun 1. a general direction in which something is developing or changing.” It is not significant in this dataset, however, and thus is a “non-significant trend”.
 
dmspilot said:
No, the table does not show that. The "Reduction in rate for ADS-B" is omitted for mid-airs, replaced with "NSS" for not statistically significant. Meaning there is no difference.
Please see the second and third columns from the left, first non-label row. The rate for unequipped aircraft is 0.3 and the rate for equipped is 0. Thus there is a reduction in the observed rates.

The fractional reduction in the 4th column is replaced by NSS, as noted, to indicate the failure of the test to produce a statistically significant result. By the way they are computing the fractions, it would be 100%, but this is apparently how the authors chose to indicate their significance results. Not the clearest way to do so. Normally in scientific papers all the fractions would be given and the significant ones marked with a separate symbol, or the actual p-values from the tests would be given in a separate column or in parentheses.
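To illustrate how the 0-vs-49 comparison would actually be tested against the denominators, a Fisher exact test is one simple option. The operation counts below are placeholders I made up (chosen so the equipped fleet is much smaller and the unequipped rate comes out near the 0.3 in the table, assuming that rate is per 100,000 operations); the conclusion depends entirely on the real denominators.

```python
# Fisher exact test of 0 mid-airs (equipped) vs. 49 (unequipped) from Table 2.
# The operation totals are hypothetical placeholders, not the report's numbers.
from scipy.stats import fisher_exact

equipped_midairs, equipped_ops = 0, 400_000           # ops count is made up
unequipped_midairs, unequipped_ops = 49, 16_000_000   # ops count is made up

table = [
    [equipped_midairs, equipped_ops - equipped_midairs],
    [unequipped_midairs, unequipped_ops - unequipped_midairs],
]
odds_ratio, pvalue = fisher_exact(table)
print(f"p = {pvalue:.3g}")  # with these placeholder denominators, p is not < 0.05
```

With these particular placeholder numbers the test does not come out significant, which is consistent with the NSS in the table; with the real operation counts it could of course come out differently.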

A trend is something that occurs over time. We're comparing two populations against each other, not looking at what one population, sample, or variable does over time.
In the discussions I have been in, in the context of possible effects of independent variables, it can also refer to the effect on a putative dependent variable as the independent variable changes in a particular direction, such as an increase or decrease. It is sort of a generalization of time as the independent variable. But I would agree that is a bit different from daily usage and is a shorthand.

"Not significant" and "not statistically significant" are two different things and you keep using them interchangeably.
Again a shorthand. In this case we have been discussing primarily statistical significance and I had assumed it was clear from context.

But in any case, if the difference is real, namely a 100% reduction in the rate of mid-airs, I think most people would regard that as important or significant in the non-statistical sense.

Now some might argue that, given the relative infrequency of mid-airs, this is still not practically that important or significant, not worth the cost, etc. Those are good points for discussion, but they are beyond the questions regarding the statistical interpretation of the data in this report.
 
dmspilot said:
Any perceived reduction is due to the much smaller population of ADS-B equipped aircraft and random chance. That's what statistically insignificant means.
This is where the question of the meaning of a failure to reject the null hypothesis enters in. In the classic interpretation of hypothesis testing, that is all it is, a failure to reject. One is not supposed to ascribe meaning to that per se.

The failure to find a statistically significant difference (SSD) can be caused by at least two things: either the observations are due to random chance, or there is simply not enough power (not enough observations) to detect a difference. Either one may be true.

So I would contend that it is not correct to assert that it is necessarily due to random chance, as stated above, when the other alternative, low power, is a real possibility.

To understand this distinction a bit more, consider the case where there is no data at all. Clearly there will be no SSD. Does that mean there is no difference? Or that there can’t be one? Or that it is nonsense to discuss the possible existence of one? The best explanation in such a case is that there is no data.

Similarly in this case: given the low rates of mid-airs, there may not be enough data to detect these small differences. Given there are two possible explanations for a failure to obtain an SSD, I don’t think it is reasonable to assume it is due to no actual difference, especially since estimation theory tells us that the best estimates of the rates, given this data, are 0.3 for unequipped and 0.0 for equipped.

One way to resolve this would be to compute the power of the test to detect a change in the rate of mid-airs, given the sample size. If that power is high, I would then agree that it becomes more likely there is no real difference. The study may have been poorly designed in that it has low power to detect an SSD given the low rate of mid-airs. I suspect that is what happened here, but since the denominator numbers are not given, it is not possible to compute the power. The estimated difference in rate is low enough, and the likely number of operations high enough, that my intuition is not very good; the power could go either way. It really would have to be computed. The numbers could be found in the references if one is sufficiently interested in arguing that point.

Practically speaking, we should be skeptical that there is a real difference here, given the lack of a statistically significant difference, but if we want to understand the effect of ADS-B equipment on mid-airs more clearly, it may be fruitful to collect more data.
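As a sketch of what that power computation could look like, one approach is a simple simulation: take the observed rates as the "true" ones, draw many replicate datasets of the same size, and count how often the test comes out significant. All of the operation counts below are placeholders I made up, since the real denominators are not given.

```python
# Monte Carlo estimate of the power of a two-proportion test to detect the
# observed drop in the mid-air rate. Operation counts are hypothetical.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)

rate_unequipped = 49 / 16_000_000   # ~0.3 per 100,000, per the placeholder ops
rate_equipped = 0.0                 # observed rate for equipped aircraft
ops_unequipped = 16_000_000         # hypothetical denominator
ops_equipped = 400_000              # hypothetical denominator
alpha = 0.05
n_sims = 2_000

n_significant = 0
for _ in range(n_sims):
    k_eq = rng.binomial(ops_equipped, rate_equipped)
    k_uneq = rng.binomial(ops_unequipped, rate_unequipped)
    _, p = proportions_ztest([k_eq, k_uneq], [ops_equipped, ops_unequipped],
                             alternative="smaller")
    n_significant += p < alpha

print(f"estimated power ≈ {n_significant / n_sims:.2f}")
```

With these particular placeholder numbers the estimated power comes out very low, which is one reason I would not read much into the lack of significance; with the real denominators it could of course come out differently.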
 
dmspilot said:
Yet that's exactly what you're doing — ascribing meaning to a non-statistically significant correlation (which is an oxymoron, there is no such thing) between ADS-B and mid-airs.
I beg to differ. What I am saying is that the best estimates of the rates are 0.3 and 0.0, and that may be interesting. I believe I have always been careful to note that this difference is not statistically significant. The observation that the rates are different does have some meaning. Indeed, estimation theory tells us the means of the observations are our best estimates of the actual values of the rates (by actual, I mean what is really true and would presumably be reflected in a very large sample). I revised my prior post while you were responding -- it may be informative to review what I say about the power of a test there. That is really the way to get to the bottom of this.

Now, where you might be confused is that you think I'm trying to say that ADS-B has no effect on mid-air collisions in general, or that it won't several years from now. But I'm not saying that. I'm talking about the data in THIS study, the one you started the thread about. The data show no effect.
I agree there is definitely some confusion. It may arise from exchanges like the following:

PeterNSteinmetz said:
The mechanism for ADS-B reducing mid-air accidents seems more plausible than some of the other categories.

DmsPilot said:
There is no statistically significant difference in the rate of mid-airs between ADS-B equipped and non-equipped aircraft. You can't infer or speculate from the data that ADS-B caused a reduction in mid-airs when there wasn't a reduction.

You see, there was an observed reduction, and it was shown in Tables 2 and 3 (apparently previously ignored). Yet this post says "when there wasn't a reduction." Confusing. And instead of assuming the other person is "wrong," it might be best to try to clarify the confusion.

To reduce confusion here, it is likely useful to review what I revised into my prior post regarding statistical power. If one is really interested in ascertaining how best to interpret the failure to obtain a statistically significant result in this test (there are two possible explanations), versus, say, just having an argument, it is probably best to go determine what numbers of operations they were using as denominators and compute the power of the test. That would be my suggestion.
 
dmspilot said:
I did not ignore the table, I explained with an example a 10-year old could understand why it is statistically meaningless.
Glad to hear it was not ignored, but that was confusing. Perhaps that is not surprising, because it turns out the proper interpretation of a failure to obtain a statistically significant result was the subject of a long debate among professional statisticians, mathematicians, and scientists. It sort of resolved into several schools of thought.

I think it would be fair to say that one is dealing with two types of probability in any statistical test: the significance level, that is, the likelihood of obtaining the results by chance when there is no real difference; and the power of the test, that is, the likelihood of finding a significant result when the difference is real. Both are important aspects when considering the meaning of the data and the results of the tests.

So my point is that one should pay attention to both the significance level and the power when considering this result.

It is not always easy to know the answers in a particular case because the application of statistics can be subtle. For example, there is likely an error in the interpretation of two successive statistical tests in the field of single-neuron recording in neuroscience, which has affected a large part of the publications in the field by professional scientists. See, for example, Steinmetz PN, Thorp CK (2013) Testing for effects of different stimuli on neuronal firing relative to background activity. Journal of Neural Engineering, 10: 056019.
 
Palmpilot said:
What would be a valid use for a non-statistically-significant best estimate?
Good question. For one, it can suggest areas of potentially fruitful additional data gathering. It also can be the basis of a hunch about a scientific hypothesis to be further investigated.

Or in this case, it can suggest an area where it would be useful to compute the power of the test in order to better interpret the results and decide whether it is more likely that a test failed to be significant because the test lacked power or because there is no real effect.

Finally, if you are in a betting game, bet on the mean expectation of outcomes, even if you don't have enough data to know for sure (like the rest of the cards in the pack).
 
Capt. Geoffrey Thorpe said:
Number in column C may be less than the number in column B. At first glance it would appear that C is less than B. But because of the normal variation one needs to apply statistics to determine if we have some confidence that the results for column C are, in fact, real and not just the result of random variation. When it is determined that the difference between B and C is not significant, it means that it's possible that the "real" underlying average for "C" could actually be equal or higher than the "real" average value for B.

Now, it could be that ADS-B would result in fewer mid air events. And it might make sense that it is a likely outcome. But, given the available data there is no way to demonstrate that.

You flip a coin 6 times. You get 4 heads and 2 tails. Does that mean that coins are more likely to come up heads?
Nice explanation. As I explained in my other post above, for ADS-B it would be useful to determine the other important statistical aspect of the test, its power to detect a difference. That would let us better judge whether there is more likely to be no real effect or if we just don't have enough observations.

And yes, if betting on the coin, the best bet would be that it is unfair, but not by much. The best estimate of the probability of getting heads on that coin is 4/6, but it is not statistically significantly different from a fair coin's 1/2, so I wouldn't put too much money on it!
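The arithmetic on that coin example is quick to check (a sketch using scipy):

```python
# Two-sided exact binomial test: 4 heads in 6 flips vs. a fair coin (p = 0.5).
from scipy.stats import binomtest

result = binomtest(4, 6, p=0.5, alternative="two-sided")
print(f"best estimate of P(heads) = {4 / 6:.3f}, p-value = {result.pvalue:.4f}")
# p-value = 0.6875, so no evidence the coin is unfair despite the 4/6 estimate
```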
 