What could be a good way to interpret this neurophysiological data?

Cross Validated Asked by Øystein Dunker on November 28, 2020

We are looking at different combinations of neurophysiological tests and trying to find out which combinations of variables are best suited for clinical use (diagnosing diabetic polyneuropathy). To combine the data, we compute Z-scores, combine these into aggregated scores, and classify scores >2 SD as abnormal.
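As a minimal sketch of this step: the test values, the reference means and SDs, and the choice of averaging the Z-scores into one aggregated score are all made up for illustration, and are not necessarily how the real pipeline aggregates.

```python
import numpy as np

# Hypothetical reference means and SDs, one per test
# (in practice these would come from the healthy reference dataset).
ref_mean = np.array([50.0, 10.0, 4.0])
ref_sd = np.array([5.0, 2.0, 0.5])

# Hypothetical raw values: rows are participants, columns are tests.
raw = np.array([
    [50.0, 10.0, 4.0],
    [65.0, 16.0, 5.5],
    [55.0, 12.0, 4.5],
])

# Z-score per test, relative to the reference population.
z = (raw - ref_mean) / ref_sd

# Assumed aggregation: plain mean of the Z-scores for each participant.
aggregate = z.mean(axis=1)

# Classify an aggregated score above 2 SD as abnormal.
abnormal = aggregate > 2.0
print(abnormal)
```

Only the second hypothetical participant, whose three tests each lie 3 SD above the reference mean, is flagged as abnormal here.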

Ideally we would just look at sensitivity/specificity/ROC. The problem is that we do not have gold standard data, i.e. we do not have access to the medical files of the participants to check at a later date whether they were actually diagnosed with the disease. We still want to use this data as well as we can to inform future studies. Since we do not know the true prevalence in the groups, we need to find some sort of proxy for sensitivity/specificity that makes sense.

One group consists of diabetics (n=68), most of whom are likely to have the disease, and the other consists of fairly healthy controls (n=38), likely to contain no or very few cases. The standard deviations used are based on another dataset with even stricter inclusion criteria (very healthy people, n ≈ 600), and are adjusted for known covariates (age, sex and height), so we think that the >2 SD limits should be pretty good.

So what we end up with are "prevalence rates" (or at least counts of positives/negatives) of the disease in both groups. We do not know exactly which rate is "true": it seems to be somewhere around 75% for the diabetics, based on an apparent ceiling effect, and between 5% and 23% for the controls (these could be actual findings or false positives). So far we have looked at the following ways of interpreting the data, but we are not sure which makes the most sense, or whether something else would be much better suited:

  • Abnormality ratio: prevalence of disease in the diabetic group / prevalence of disease in the control group. This ratio is affected a lot by the prevalence in the control group.
  • "Accuracy ratio" (might go under another name?): prevalence of disease in the diabetic group / proportion of healthy (test-negative) controls. This one varies much less and is preferred by one of the researchers, but it feels a little weird that presumed false positives in the control group increase the ratio.
  • "Accuracy": (n positive dia + n negative healthy) / total n included.
  • Make assumptions about the data, i.e. that the ceiling of 75% is the true prevalence in the diabetic group, and that the control group is completely healthy.
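To make the first three candidates concrete, here is a rough sketch that computes them from counts of test-positives. The group sizes (68 and 38) are ours; the positive counts (51 diabetics, and 2 or 9 controls, roughly matching the 75% and 5% to 23% figures) are hypothetical, and the function name `summarize` is just for illustration.

```python
def summarize(pos_dia, n_dia, pos_ctrl, n_ctrl):
    """Compute the three candidate summaries from test-positive counts."""
    rate_dia = pos_dia / n_dia        # "prevalence" in the diabetic group
    rate_ctrl = pos_ctrl / n_ctrl     # "prevalence" in the control group
    healthy_ctrl = 1.0 - rate_ctrl    # proportion of test-negative controls
    return {
        # Undefined when no control is positive; report infinity in that case.
        "abnormality_ratio": rate_dia / rate_ctrl if rate_ctrl > 0 else float("inf"),
        # Undefined when every control is positive; report infinity in that case.
        "accuracy_ratio": rate_dia / healthy_ctrl if healthy_ctrl > 0 else float("inf"),
        # Positives among diabetics plus negatives among controls, over everyone.
        "accuracy": (pos_dia + (n_ctrl - pos_ctrl)) / (n_dia + n_ctrl),
    }

# Hypothetical counts: 51/68 diabetics positive, with few or many positive controls.
low = summarize(pos_dia=51, n_dia=68, pos_ctrl=2, n_ctrl=38)
high = summarize(pos_dia=51, n_dia=68, pos_ctrl=9, n_ctrl=38)

for label, result in (("2/38 controls positive", low), ("9/38 controls positive", high)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

With these made-up counts, going from 2 to 9 positive controls drops the abnormality ratio from about 14 to about 3, while the "accuracy ratio" rises from about 0.79 to about 0.98, which illustrates both concerns raised in the list above.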
