logistic regression

From: James Bailey Date: September 18, 2001 technical Source: cognigencorp.com
From: "James Bailey" <James_Bailey@EmoryHealthCare.org>
Subject: logistic regression
Date: Tue, 18 Sep 2001 16:24:59 -0500

I believe the difficulty with logistic regression for sparse dichotomous data can be well appreciated by considering the case of binary data (for example, loss of responsiveness with an intravenous anesthetic) with one data point per patient. The probability of a positive drug effect is given by

    P = C**gamma/(C**gamma + C50**gamma)    (1)

This is equivalent to a model which postulates an underlying continuous drug effect E given by

    E = gamma*ln(C/C50) + epsilon    (2)

where epsilon is a random variable with a logistic distribution. It is further postulated that a positive binary drug effect is observed if E > 0. The probability of a positive binary drug effect is then equal to the probability that epsilon is greater than -gamma*ln(C/C50), and using the definition of the logistic distribution one can easily derive equation (1).

Now consider interpatient variability and assume that

    ln(C50) = ln(<C50>) + eta

where <C50> is the "typical value" and eta is normally distributed with mean zero. Then

    E = gamma*ln(C/<C50>) - gamma*eta + epsilon

In this case the probability of a positive binary drug effect is equal to the probability that the random variable epsilon - gamma*eta is greater than -gamma*ln(C/<C50>).

However, consider the situation where epsilon conforms to a normal distribution instead of a logistic distribution. Then epsilon - gamma*eta also has a normal distribution, and it is impossible to determine the relative contributions of eta and epsilon to the overall variance. In this situation it is impossible to do a complete analysis of binary data with one data point per patient. This, of course, corresponds to probit analysis, but it makes the difficulty apparent.

The normal and logistic distributions are not that different. A population analysis of sparse binary data depends on the ability to distinguish between the two distributions, and so will be almost impossible.
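
The argument above can be checked numerically. The following Python sketch is not part of the original post: the function names, the example parameter values, and the 1.702 scaling constant (a commonly used factor for matching a normal CDF to the standard logistic CDF) are illustrative assumptions. It checks that the latent-variable model with logistic epsilon reproduces equation (1), that in the all-normal (probit) case different splits of the same total variance between eta and epsilon give the same response probability, and that the logistic and suitably scaled normal CDFs stay close to each other.

```python
import math

def p_hill(c, c50, gamma):
    """Equation (1): P = C**gamma / (C**gamma + C50**gamma)."""
    return c**gamma / (c**gamma + c50**gamma)

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x, sd=1.0):
    """CDF of a zero-mean normal distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))

def p_latent_logistic(c, c50, gamma):
    """P(E > 0) for E = gamma*ln(C/C50) + epsilon, epsilon standard logistic."""
    threshold = -gamma * math.log(c / c50)
    return 1.0 - logistic_cdf(threshold)  # P(epsilon > threshold)

def p_probit_population(c, c50_typ, gamma, sd_eta, sd_eps):
    """P(E > 0) when eta and epsilon are both zero-mean normal.

    The sum of the two random terms is normal with variance
    (gamma*sd_eta)**2 + sd_eps**2, so only that total enters the result.
    """
    total_sd = math.sqrt((gamma * sd_eta) ** 2 + sd_eps ** 2)
    return normal_cdf(gamma * math.log(c / c50_typ), sd=total_sd)

# Illustrative values only (assumptions, not taken from the post).
c, c50, gamma = 3.0, 2.0, 4.0

# 1. The latent-variable form and equation (1) agree.
print(p_hill(c, c50, gamma), p_latent_logistic(c, c50, gamma))

# 2. Probit case: two different variance splits with the same total variance.
p_split_a = p_probit_population(c, c50, 2.0, sd_eta=0.4, sd_eps=0.6)
p_split_b = p_probit_population(c, c50, 2.0, sd_eta=0.3, sd_eps=0.8)
print(p_split_a, p_split_b)

# 3. Largest gap between the logistic CDF and a scaled normal CDF on a grid.
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(logistic_cdf(x) - normal_cdf(x, sd=1.702)) for x in grid)
print(max_gap)
```

With one observation per patient, the data depend on eta and epsilon only through the distribution of their combination, which is why the two variance splits in step 2 cannot be told apart. Step 3 gives a sense of how little the logistic assumption changes the shape of the response curve.
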
Furthermore, such an analysis rests on the assumption of an underlying logistic distribution for the intrapatient variability (in epsilon), and there is little basis for this assumption. My colleague Wei Lu and I have done some simulations, and our results indicate that 5 to 10 data points per patient are necessary to estimate <C50> or gamma with any degree of reliability.

Jim Bailey
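
For readers who want to experiment with the number of data points per patient, the following Python sketch generates binary responses under the latent-variable model described in the post. It is not the simulation performed by Bailey and Lu, whose design is not given here. The function name, the concentration design (log-uniform around <C50>), and all parameter values are hypothetical choices made for illustration. The sketch covers data generation only; fitting a population model to the output is left to whatever estimation software is in use.

```python
import math
import random

def simulate_binary_responses(n_patients, n_obs, c50_typ, gamma, sd_eta, seed=0):
    """Simulate binary responses under the latent-variable model of the post.

    Returns a list of (patient_id, concentration, response) tuples.
    """
    rng = random.Random(seed)
    records = []
    for patient in range(n_patients):
        # Interpatient variability: ln(C50_i) = ln(<C50>) + eta_i, eta_i normal.
        eta = rng.gauss(0.0, sd_eta)
        c50 = c50_typ * math.exp(eta)
        for _ in range(n_obs):
            # Hypothetical design: log-uniform on [<C50>/4, 4*<C50>].
            conc = c50_typ * 4.0 ** rng.uniform(-1.0, 1.0)
            # Standard logistic epsilon via the inverse CDF,
            # with u clamped away from 0 and 1 to keep the log finite.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            epsilon = math.log(u / (1.0 - u))
            # Equation (2); a positive binary effect is observed if E > 0.
            effect = gamma * math.log(conc / c50) + epsilon
            records.append((patient, conc, 1 if effect > 0 else 0))
    return records

# One observation per patient versus several (values are assumptions).
sparse = simulate_binary_responses(n_patients=200, n_obs=1,
                                   c50_typ=2.0, gamma=4.0, sd_eta=0.3)
richer = simulate_binary_responses(n_patients=200, n_obs=8,
                                   c50_typ=2.0, gamma=4.0, sd_eta=0.3)
print(len(sparse), len(richer))
```

Varying `n_obs` while holding the other arguments fixed gives data sets that differ only in how many observations each patient contributes, which is the quantity the post identifies as critical.
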