I have a dataset of patients who were hospitalized for moderate or severe infection and given one of two antibiotics. The number of days each patient was hospitalized before discharge was also recorded. The analysis is divided into 2 parts:

Part 1: I have to compute if there is a significant difference between the average days of hospitalization on the two antibiotics.

Part 2: I have to compute if there is a significant difference between average days of hospitalization on the two antibiotics, after adjusting for age and severity of infection.

A sample from the dataset is given below.

> head(antibiotic)
     trt age Infection.Severity Male Days.Hospitalized
1 Old Ab  41             severe    1                22
2 Old Ab  56           moderate    0                 2
3 Old Ab  56             severe    1                15
4 Old Ab  66           moderate    1                15
5 Old Ab  45           moderate    1                 3
6 Old Ab  41             severe    0                19

My question is: what type of test can I use for this analysis?
Note that the treatment and infection severity variables are not numerical.

I'm sorry, but I have to disagree with the previous answer.

The number of days in the hospital is a count variable, so it cannot be exactly normal. Unless the sample size is large (and the assumption of homoscedasticity is justifiable), t.test is not really appropriate here, in my opinion.

If you can check whether the data for each treatment are Poisson (perhaps with a chi-square goodness-of-fit test?), then you can test whether the rates of the two distributions are equal. You may use the poisson.test function provided in the stats package.
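The rate comparison suggested above could be sketched roughly as follows. This is only an illustration on simulated data: the "New Ab" label, the group sizes, and the Poisson means are assumptions, not values from the real dataset. The total days per treatment are used as the event counts and the number of patients as the time base.

```r
# Simulated stand-in for the real data: the "New Ab" label, the group sizes,
# and the Poisson means are assumptions made only for illustration.
set.seed(1)
antibiotic_sim <- data.frame(
  trt = rep(c("Old Ab", "New Ab"), each = 50),
  Days.Hospitalized = c(rpois(50, lambda = 12), rpois(50, lambda = 9))
)

# Total days (event count) and number of patients (exposure) per treatment.
# tapply() orders the groups alphabetically: "New Ab" first, then "Old Ab".
total_days <- tapply(antibiotic_sim$Days.Hospitalized, antibiotic_sim$trt, sum)
n_patients <- tapply(antibiotic_sim$Days.Hospitalized, antibiotic_sim$trt, length)

# Crude Poisson check: variance / mean should be close to 1 in each group.
dispersion <- tapply(antibiotic_sim$Days.Hospitalized, antibiotic_sim$trt, var) /
  tapply(antibiotic_sim$Days.Hospitalized, antibiotic_sim$trt, mean)

# Two-sample comparison of Poisson rates; H0: the rate ratio equals 1.
rate_test <- poisson.test(x = as.vector(total_days), T = as.vector(n_patients))
rate_test$estimate  # ratio of mean days, first group over second group
rate_test$p.value
```

If the variance-to-mean ratio is well above 1, the Poisson assumption is doubtful and the p-value from this test is likely too optimistic.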

But I'll admit that if you also want to take the severity of infection into account, I really have no idea.

Hi Yarnabrina, I agree. I don't think a t-test will be suitable.

The analysis is divided into 2 parts:

Part 1: I have to compute if there is a significant difference between average days of hospitalization on the two antibiotics.

Part 2: I have to compute if there is a significant difference between average days of hospitalization on the two antibiotics, after adjusting for age and severity of infection.

I was thinking of using linear regression on part 1 and multiple regression on part 2 but I am not sure. I think Kaplan-Meier survival analysis also does not apply to this analysis, does it?
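For reference, the regression idea can be written as one model per part, with treatment entered as a predictor. The sketch below uses simulated data, so the "New Ab" label, the sample size, and the effect sizes are all assumptions. It also fits a Poisson regression as a count-data alternative, which is one possible choice and not necessarily the right model for the real data.

```r
# Simulated stand-in for the real data: the "New Ab" label, sample size, and
# effect sizes below are assumptions made only for illustration.
set.seed(42)
n <- 200
sim <- data.frame(
  trt = factor(sample(c("Old Ab", "New Ab"), n, replace = TRUE),
               levels = c("Old Ab", "New Ab")),
  age = round(runif(n, 20, 80)),
  Infection.Severity = factor(sample(c("moderate", "severe"), n, replace = TRUE))
)
log_mean <- 1.5 + 0.01 * sim$age +
  0.8 * (sim$Infection.Severity == "severe") -
  0.3 * (sim$trt == "New Ab")
sim$Days.Hospitalized <- rpois(n, lambda = exp(log_mean))

# Part 1: unadjusted comparison. With a single two-level factor as predictor,
# the coefficient test from lm() matches the equal-variance two-sample t-test.
fit1 <- lm(Days.Hospitalized ~ trt, data = sim)

# Part 2: the same comparison adjusted for age and severity. R dummy-codes
# factors automatically, so non-numeric predictors are not a problem.
fit2 <- lm(Days.Hospitalized ~ trt + age + Infection.Severity, data = sim)

# Count-data alternative for Part 2: Poisson regression with a log link.
fit2_pois <- glm(Days.Hospitalized ~ trt + age + Infection.Severity,
                 family = poisson, data = sim)

summary(fit1)$coefficients
summary(fit2)$coefficients
summary(fit2_pois)$coefficients
```

In each coefficient table, the row for the treatment term (here "trtNew Ab") carries the estimated treatment difference and its p-value; in the Poisson fit the estimate is on the log scale.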

Sorry, but I know nothing about survival analysis. Actually, it's being offered this very semester, but I didn't opt for it.

Regarding your idea about regression, I don't quite follow you. In the 1st case, I don't understand how you are planning to use it. In the 2nd case, I suppose you can fit two multiple regressions and expect that the residuals are free from the effects of the predictors, but then what? How do you plan to compare the two sets of residuals? And, though I don't really know the model assumptions for the error part in Poisson regression (I do get confused by glm), won't you assume the same model for both sets of residuals? Then, is it fair to compare their means later?

You may have understood by now that I don't really know much about these things. So, I'll refrain from making further wild guesses and leave people with more expertise to help you. But in case you solve the problem yourself, can you please share the solution? That'll be beneficial for other people who may face this problem later (such as myself).

head(antibiotic)
     trt age Infection.Severity Male Days.Hospitalized
1 Old Ab  41             severe    1                22
2 Old Ab  56           moderate    0                 2
3 Old Ab  56             severe    1                15
4 Old Ab  66           moderate    1                15
5 Old Ab  45           moderate    1                 3
6 Old Ab  41             severe    0                19

I am having trouble with the datapasta command because the dataset is long:

datapasta::dp_set_max_rows(num_rows = 200)
datapasta::tribble_paste(antibiotic)
Supplied large input_table (>= 200 rows). Was this a mistake? Use dp_set_max_rows(n) to increase the limit.
NULL
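One common workaround when the full table is too large to paste is to share only a small slice with base R's dput(). A minimal sketch, using a hand-typed copy of the six rows printed above as a stand-in for the real data frame (the name antibiotic_head is made up for this example):

```r
# Stand-in for the real `antibiotic` data frame: the six rows printed above.
antibiotic_head <- data.frame(
  trt = rep("Old Ab", 6),
  age = c(41, 56, 56, 66, 45, 41),
  Infection.Severity = c("severe", "moderate", "severe",
                         "moderate", "moderate", "severe"),
  Male = c(1, 0, 1, 1, 1, 0),
  Days.Hospitalized = c(22, 2, 15, 15, 3, 19)
)

# dput() prints R code that rebuilds the object; others can paste it back.
dput(antibiotic_head)

# Round-trip check: evaluating the dput() text recreates the data frame.
dput_text <- paste(capture.output(dput(antibiotic_head)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, antibiotic_head)
```

With the real data, something like dput(head(antibiotic, 20)) would give a copy-paste friendly sample without hitting the row limit.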

I am sorry I cannot provide the dataset in a copy-paste friendly format, but here is how I analyzed the data, in two parts: