Failure to arrive at expected parameter estimates

1 message, 1 person. Latest: May 19, 2016
Dear all,

We have just identified the cause of the problem: model misspecification. A logarithmic function fits poorly for small values of x near zero. We solved the problem by shifting the x-axis:

    C=THETA(1)
    B=THETA(2)
    S=THETA(3)
    F=C+B*LOG(FACTOR1+S)

Thanks!
Matthew
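To illustrate why the shift helps, here is a minimal Python sketch (not part of the original control stream). It compares F = C + B*LOG(x) with F = C + B*LOG(x + S) as x approaches zero. C and B are the curve-estimation values quoted in the original post; S = 1.0 is a purely hypothetical shift, since the fitted THETA(3) is not reported here.

```python
import math

# Curve-estimation values quoted in the original post.
C = -0.4465
B = 1.0266
# Hypothetical shift: the fitted THETA(3) is not reported in the thread.
S = 1.0


def log_model(x, c=C, b=B):
    """Original structural model: F = C + B*LOG(x)."""
    return c + b * math.log(x)


def shifted_log_model(x, c=C, b=B, s=S):
    """Shifted structural model: F = C + B*LOG(x + S)."""
    return c + b * math.log(x + s)


# The unshifted prediction drops without bound as x approaches 0,
# while the shifted prediction stays finite (C + B*LOG(S) at x = 0).
for x in (1e-6, 0.01, 1.0, 10.0, 100.0):
    print(f"x={x:<8g} unshifted={log_model(x):9.3f} "
          f"shifted={shifted_log_model(x):9.3f}")
```

Because most of the data sit at small x, a model that heads toward minus infinity there can pull the estimates away from the values that describe the rest of the curve. The shifted form removes that singularity at x = 0.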
Quoted reply history
From: HUI, Ka Ho
Sent: Thursday, May 19, 2016 4:18 PM
To: [email protected]
Subject: Failure to arrive at expected parameter estimates

Dear all,

I have some data x (input) and y (output) with 'inverse' heteroscedasticity, where the variance is greater for smaller x. The data file is attached (data.txt).

After filtering out all data with FILTER1=1 and FILTER2=1, the binned data plot looks like this (Question.jpg). Most data points are at small x: 43.3% are between 0-10, 12.9% between 10-20, 9% between 20-30, and the remaining 34.8% are spread more sparsely over larger x.

Blue points are the means; red and purple points show the 5th and 95th percentiles in each bin. Green points are the SDs in each bin. Curve estimation has been done: the equation for the means is shown as equation (1), and the equation for the SDs is shown at the bottom.

Here is a template for our first control stream, written according to the results of curve estimation for the means:

    $INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV
    $DATA data.txt IGNORE=@ IGNORE=(FILTER1.EQ.1,FILTER2.EQ.1)
    $PRED
    C=THETA(1)
    B=THETA(2)
    F=C+B*LOG(FACTOR1) ;Relationship as shown in equation (1)
    Y=F+EPS(1)
    DUMMY=ETA(1)
    $THETA
    (-20, -0.5, 20) ;C, curve estimation result is -0.4465
    (-20, 1, 20) ;B, curve estimation result is 1.0266
    $OMEGA 0 FIXED
    $SIGMA 2
    $EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1
    $COV
    $TABLE ...

The fitted parameters are illustrated by equation (3), which is clearly biased downward for x > 100. The bias was also observed in residual plots.
To account for the heteroscedasticity as well, we tried another control stream, written according to the results of curve estimation for the SDs:

    $INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV
    $DATA data.txt IGNORE=@ IGNORE=(FILTER1.EQ.1,FILTER2.EQ.1)
    $PRED
    C=THETA(1)
    B=THETA(2)
    C_SD=THETA(3)
    B_SD=THETA(4)
    W=C_SD*B_SD**FACTOR1 ;Relationship as shown in the equation at the bottom
    F=C+B*LOG(FACTOR1)
    Y=F+(W*EPS(1)) ;Variance depends on FACTOR1
    DUMMY=ETA(1)
    $THETA
    (-20, -0.5, 20) ;C, curve estimation result is -0.4465
    (-20, 1, 20) ;B, curve estimation result is 1.0266
    (-20, 0.72, 20) ;C_SD, curve estimation result is 0.7529
    (-20, 1, 20) ;B_SD, curve estimation result is 0.9962
    $OMEGA 0 FIXED
    $SIGMA 1 FIXED
    $EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1
    $COV
    $TABLE ...

The fitted parameters are illustrated by equation (2), which is still biased.

Most data points are concentrated at small x, which may have contributed to the bias at large x. However, looking at the fitted parameters (equations (2) and (3)), we note that both equations in fact overestimate the means even at small x, so we have no idea why these estimates resulted. We tried different initial estimates, but in vain.

It would be great if someone could give any advice!

Thanks!
Matthew
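As a rough illustration of the residual error model in the second control stream, the Python sketch below evaluates W = C_SD*B_SD**FACTOR1 at a few x values. It uses the curve-estimation results quoted in the comments (0.7529 and 0.9962), not fitted NONMEM estimates, and the x values are arbitrary. With $SIGMA fixed at 1, W is the residual SD at a given x.

```python
# Curve-estimation values quoted in the control stream comments.
C_SD = 0.7529
B_SD = 0.9962


def residual_sd(x, c_sd=C_SD, b_sd=B_SD):
    """Residual SD model: W = C_SD * B_SD**x (SIGMA fixed at 1)."""
    return c_sd * b_sd ** x


# Because B_SD < 1, W shrinks as x grows, which matches the
# 'inverse' heteroscedasticity described in the post.
for x in (0, 10, 50, 100, 200):
    w = residual_sd(x)
    print(f"x={x:<4d} W={w:.4f} variance={w * w:.4f}")
```

Note that this only describes the variance. The structural part, F=C+B*LOG(FACTOR1), is unchanged in this control stream, so any misfit of the logarithm near x = 0 is still present.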