I have a problem with my dataset. It has a column containing patients' age ranges, but the ranges are inconsistent. For example, there's a value '25-29' but also '25-44' and even '25-99'.
Is it okay if I take the midpoint of each range and assign it to each record? I want to know exactly what age my patients are, or at least where they fall within a set of bins (for example: <17, 17-25, 26-35, etc.)
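To make the midpoint idea concrete, here is a rough sketch of what I mean. It assumes every value looks like 'low-high' with whole numbers; the function names are just made up for illustration, and it says nothing about whether the midpoint is a sensible estimate.

```python
import re


def parse_age_range(value):
    """Parse a string such as '25-29' into (low, high) integer bounds."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised age range: {value!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {value!r}")
    return low, high


def midpoint(value):
    """Return the midpoint of an age range string (assumed 'low-high')."""
    low, high = parse_age_range(value)
    return (low + high) / 2


# The three example values from my column.
for raw in ["25-29", "25-44", "25-99"]:
    print(raw, midpoint(raw))
```

The wider the range, the less the midpoint tells me: '25-99' lands on 62, which is far from most of the ages that range could contain.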
I think that will really depend on what you are doing. How are you obtaining the data that you do have? And how would you go about getting the precise age?
So after reading through the repo linked, if I were trying to do any sort of analysis with that data, I would probably not use that column, at least not without significant transformation. As a general rule, you should only impute data where you have a reasonable expectation that the imputed value approximates the true value. This would seem to be even more true for an epidemiological dataset, where I would expect the information value of age to be high.
I think that if you knew more about how this data was collected, that would probably be most informative in your decision to include or exclude the column.
I have no idea how they collected the data. I was only searching through and playing with datasets and this one came up. I'm still learning how to find usable datasets.
If I were going to use a method of analysis that allowed for weighted data, I'd be tempted to duplicate each record once for every year of its possible age within the age range, and give each copy a weight so that the row's total weight contribution in my dataset stays at 1. This would allow for the possibility of detecting some signal from noise, whilst avoiding introducing undue bias, I think. Curious to know what others would make of that approach.
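A minimal sketch of that duplication idea, in plain Python. It assumes the range bounds are inclusive and that every year in the range is equally likely (so each copy gets the same weight), which may well not hold for real patients. The field names 'patient_id', 'outcome', 'age' and 'weight' are hypothetical, not taken from the dataset.

```python
def expand_record(record, low, high):
    """Return one weighted copy of `record` per integer age in [low, high].

    Each copy gets weight 1 / (number of years), so the copies of one
    original row sum to a total weight of 1.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    n_years = high - low + 1
    weight = 1.0 / n_years
    return [
        {**record, "age": age, "weight": weight}
        for age in range(low, high + 1)
    ]


# Hypothetical record whose age is only known to be in the range 25-29.
rows = expand_record({"patient_id": 1, "outcome": "recovered"}, 25, 29)
for row in rows:
    print(row)

# The copies together still count as a single patient.
total_weight = sum(row["weight"] for row in rows)
print(total_weight)
```

A row with the range '25-99' would become 75 copies of weight 1/75 each, so it contributes very little at any single age, which is the point: wide ranges carry less information per year.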
Hey, I'm really curious about the technical details of the method you mentioned.
I'm not a statistician and I would like to learn more about weighted data. Which analyses allow weighted data and which don't? Can you please give me a link on how to learn to do this? Thank you ><