RE: Failure to arrive at expected parameter estimates
Dear all,
We have identified the cause of the problem as model misspecification: LOG(x)
diverges as x approaches zero, so the logarithmic model is misspecified for small
values of x near zero. We solved the problem by shifting the x-axis with an
additional parameter S:
C=THETA(1) ;intercept
B=THETA(2) ;slope on the log scale
S=THETA(3) ;shift added to FACTOR1 so that LOG is evaluated away from zero
F=C+B*LOG(FACTOR1+S)
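For anyone wanting to check the shifted model outside NONMEM, here is a minimal Python sketch, assuming plain least squares. It swaps in a different technique from NONMEM's joint estimation: for a fixed shift S, C and B have a closed-form least-squares solution, so S can be profiled over a grid. The function names and the grid are hypothetical illustrations, not part of our control stream.

```python
import math

def fit_log_model(xs, ys, s):
    """Closed-form least-squares fit of y = C + B*ln(x + s) for a fixed shift s."""
    zs = [math.log(x + s) for x in xs]
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    szz = sum((z - z_mean) ** 2 for z in zs)
    szy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    b = szy / szz
    c = y_mean - b * z_mean
    sse = sum((y - (c + b * z)) ** 2 for z, y in zip(zs, ys))
    return c, b, sse

def profile_shift(xs, ys, shifts):
    """Return (sse, s, c, b) for the grid shift with the smallest residual sum of squares."""
    best = None
    for s in shifts:
        if min(xs) + s <= 0:
            continue  # ln(x + s) is undefined, so this shift is skipped
        c, b, sse = fit_log_model(xs, ys, s)
        if best is None or sse < best[0]:
            best = (sse, s, c, b)
    return best  # None if every shift on the grid was invalid
```

With x = 0 present in the data, an unshifted LOG(FACTOR1) cannot even be evaluated, while any positive S keeps every term finite; the grid profile then picks the S that best straightens the relationship.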
Thanks!
Matthew
Quoted reply history
From: HUI, Ka Ho
Sent: Thursday, May 19, 2016 4:18 PM
To: [email protected]
Subject: Failure to arrive at expected parameter estimates
Dear all,
I have some data x (input) and y (output), with 'inverse' heteroscedasticity,
where variance is greater for smaller x.
The data file is attached (data.txt).
After filtering out all records with FILTER1=1 or FILTER2=1, the binned data
plot looks like this (Question.jpg).
Most data points are at small x: 43.3% lie between 0 and 10, 12.9% between 10
and 20, 9% between 20 and 30, and the remaining 34.8% above 30, where the data
become sparser.
Blue points are the bin means, red and purple points are the 5th and 95th
percentiles of each bin, and green points are the bin SDs. We performed curve
estimation; the fitted equation for the means is shown as equation (1), and the
one for the SDs is shown at the bottom.
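A minimal Python sketch of how such binned summaries could be reproduced is below. The bin width of 10 matches the ranges quoted above but is an assumption, the function name is hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which may not match the software that produced Question.jpg.

```python
import statistics

def bin_summaries(xs, ys, width=10.0):
    """Group (x, y) pairs into bins of the given width and summarise y in each bin."""
    bins = {}
    for x, y in zip(xs, ys):
        bins.setdefault(int(x // width), []).append(y)
    rows = []
    for k in sorted(bins):
        vals = bins[k]
        if len(vals) < 2:
            continue  # SD and percentiles need at least two points
        cuts = statistics.quantiles(vals, n=20)  # 19 cut points: 5th, 10th, ..., 95th
        rows.append({
            "lower": k * width,               # lower edge of the bin
            "n": len(vals),
            "mean": statistics.mean(vals),
            "sd": statistics.stdev(vals),     # sample SD (n - 1 denominator)
            "p5": cuts[0],
            "p95": cuts[-1],
        })
    return rows
```

Fitting a curve to the "mean" column versus bin position gives something like equation (1), and fitting the "sd" column gives the SD relationship.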
Here is a template of our first control stream, based on the curve-estimation
result for the means:
$INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV
$DATA data.txt IGNORE=@ IGNORE=(FILTER1.EQ.1,FILTER2.EQ.1)
$PRED
C=THETA(1)
B=THETA(2)
F=C+B*LOG(FACTOR1) ;Relationship as shown in equation (1)
Y=F+EPS(1)
DUMMY=ETA(1)
$THETA
(-20, -0.5, 20) ;C, curve estimation result is -0.4465
(-20, 1, 20) ;B, curve estimation result is 1.0266
$OMEGA
0 FIXED
$SIGMA
2
$EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1
$COV
$TABLE ...
The resulting fit, shown as equation (3), clearly underpredicts the data for
x > 100. The bias is also visible in the residual plots.
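In the first control stream OMEGA is fixed to 0 and the error is additive with a single estimated SIGMA, so the maximum-likelihood estimates of C and B should coincide with ordinary least squares on ln(x). The Python sketch below (hypothetical helper names) computes them in closed form and, to mirror the residual check, averages the residuals above a cutoff such as x = 100; a clearly positive mean residual there corresponds to the fit being biased low.

```python
import math

def fit_ols_log(xs, ys):
    """OLS (= ML under constant-variance normal error) fit of y = C + B*ln(x)."""
    if min(xs) <= 0:
        raise ValueError("ln(x) requires x > 0")
    zs = [math.log(x) for x in xs]
    n = len(zs)
    zbar = sum(zs) / n
    ybar = sum(ys) / n
    b = (sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
         / sum((z - zbar) ** 2 for z in zs))
    c = ybar - b * zbar
    residuals = [y - (c + b * z) for z, y in zip(zs, ys)]
    sigma2 = sum(r * r for r in residuals) / n  # ML estimate of the residual variance
    return c, b, sigma2, residuals

def mean_residual_above(xs, residuals, cutoff):
    """Mean residual for x > cutoff; positive means the fit lies below the data there."""
    sel = [r for x, r in zip(xs, residuals) if x > cutoff]
    return sum(sel) / len(sel) if sel else float("nan")
```

Note that the unshifted model raises an error for any x = 0 record, just as LOG(FACTOR1) cannot be evaluated there.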
To account for the heteroscedasticity as well, we tried a second control stream,
based on the curve-estimation result for the SD:
$INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV
$DATA data.txt IGNORE=@ IGNORE=(FILTER1.EQ.1,FILTER2.EQ.1)
$PRED
C=THETA(1)
B=THETA(2)
C_SD=THETA(3)
B_SD=THETA(4)
W=C_SD*B_SD**FACTOR1 ;Relationship as shown in the equation at the bottom
F=C+B*LOG(FACTOR1)
Y=F+(W*EPS(1)) ;Variance depends on FACTOR1
DUMMY=ETA(1)
$THETA
(-20, -0.5, 20) ;C, curve estimation result is -0.4465
(-20, 1, 20) ;B, curve estimation result is 1.0266
(-20, 0.72, 20) ;C_SD, curve estimation result is 0.7529
(-20, 1, 20) ;B_SD, curve estimation result is 0.9962
$OMEGA
0 FIXED
$SIGMA
1 FIXED
$EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1
$COV
$TABLE ...
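In the second control stream SIGMA is fixed to 1, so each observation has SD |W| = |C_SD*B_SD**FACTOR1|, and the objective being minimized should be, up to an additive constant, the -2 log-likelihood evaluated below. This Python sketch (function name hypothetical) can be used to compare the estimates NONMEM returned against the curve-estimation values by hand. It returns infinity for B_SD <= 0, because a negative base raised to a non-integer FACTOR1 is not a real number; the $THETA bounds (-20, 1, 20) would in principle allow such values.

```python
import math

def minus2ll(thetas, xs, ys):
    """-2 log-likelihood, dropping the n*ln(2*pi) constant, for
    y = C + B*ln(x) + W*eps with W = C_SD * B_SD**x and eps ~ N(0, 1)."""
    c, b, c_sd, b_sd = thetas
    if b_sd <= 0:
        return math.inf  # B_SD**x is not real for non-integer x when B_SD < 0
    total = 0.0
    for x, y in zip(xs, ys):
        f = c + b * math.log(x)   # structural model, as in F=C+B*LOG(FACTOR1)
        w = c_sd * b_sd ** x      # residual SD, as in W=C_SD*B_SD**FACTOR1
        if w == 0:
            return math.inf       # zero variance is not a valid likelihood
        total += math.log(w * w) + ((y - f) / w) ** 2
    return total
```

A lower value for one set of THETAs than another means that set fits better under this error model, which makes it easy to see whether the reported estimates really beat the starting values.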
The resulting fit, shown as equation (2), is still biased.
Although most data points are concentrated at small x, which may have
contributed to the bias at large x, we looked at the fitted equations (2) and (3)
and noticed that both actually overestimate the means even at small x, so we do
not understand why these estimates resulted. We tried different initial
estimates, to no avail.
It would be great if someone could give any advice!
Thanks!
Matthew