Data checking macro

2 messages 2 people Latest: Sep 13, 2007

Data checking macro

From: Mark Sale Date: September 13, 2007 technical
Colleagues,

I suspect I'm not the only one who has, over the years, had the experience of spending a week (or more) on an analysis only to find important errors in the data set. I'm hoping for some feedback on what people do to try to find these errors (short of spending a week on an incorrect data set).

To start the discussion, I've put on the Next Level web site (www.NextLevelSolns.com/downloads) an Excel macro that I've used, with some success, to find errors. My experience is that most errors, at least those that are hard to find, are in the dosing specification. This macro makes histograms of:

- Each covariate
- DVs
- Dose Amts
- Dose Times (after expanding the ADDL doses)
- Interdose interval (after expanding the ADDL doses): time from each dose to the previous dose
- Dose to DV time (after expanding the ADDL doses): time from each observation to the previous dose

Currently this macro is limited to 12 covariates; this could be increased easily if there is interest. It also isn't CMT specific: all doses are just listed without regard to CMT, and the same goes for DVs. I might fix this someday.

As usual, this is entirely in my own self-interest, looking for better ways to find problems in data sets, so please give feedback or ideas.

Mark
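For readers without the macro at hand, the dosing checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the macro's actual logic: it assumes each record is a dict with NONMEM-style keys (ID, TIME, AMT, ADDL, II, DV), that a record is a dose whenever AMT is greater than zero, and that ADDL = n with interval II means n extra doses after the initial one. The function names are hypothetical, and like the macro it ignores CMT.

```python
from collections import Counter

def expand_doses(records):
    """Expand dose records into explicit (id, time, amt) dose events.

    Assumption: a record is a dose when AMT > 0, and ADDL = n with
    interval II stands for the initial dose plus n additional doses.
    """
    doses = []
    for r in records:
        if r.get("AMT", 0) <= 0:
            continue  # observation record, not a dose
        addl = int(r.get("ADDL", 0))
        ii = r.get("II", 0)
        for k in range(addl + 1):
            doses.append((r["ID"], r["TIME"] + k * ii, r["AMT"]))
    return sorted(doses)  # ordered by subject, then time

def interdose_intervals(doses):
    """Time from each dose to the previous dose in the same subject."""
    return [cur[1] - prev[1]
            for prev, cur in zip(doses, doses[1:])
            if prev[0] == cur[0]]

def dose_to_dv_times(records, doses):
    """Time from each observation to the most recent prior dose."""
    out = []
    for r in records:
        if r.get("AMT", 0) > 0:
            continue  # skip dose records
        prior = [t for (i, t, _) in doses if i == r["ID"] and t <= r["TIME"]]
        if prior:
            out.append(r["TIME"] - max(prior))
    return out

def histogram(values, width):
    """Count values per bin of the given width, keyed by lower bin edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Made-up example: one subject, 100 units every 24 h for three doses,
# one observation at 50 h, then a further dose at 96 h.
records = [
    {"ID": 1, "TIME": 0, "AMT": 100, "ADDL": 2, "II": 24},
    {"ID": 1, "TIME": 50, "DV": 3.2},
    {"ID": 1, "TIME": 96, "AMT": 100},
]
doses = expand_doses(records)
intervals = interdose_intervals(doses)
print(histogram(intervals, 12))
print(dose_to_dv_times(records, doses))
```

In this made-up data set the expanded dose times are 0, 24, 48 and 96, so the interval histogram shows two 24 h gaps and a single 48 h gap. An isolated bin like that is the kind of thing worth checking against the source dosing records.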

RE: Data checking macro

From: William Bachman Date: September 13, 2007 technical
PDx-Pop 2.2 (and below) has an Excel macro that imports the data sets and automatically gives the following plots: DV vs ID, AMT (dose) vs ID, TIME vs ID, and DV vs TIME. The idea is to graphically look for outliers in these plots.

PDx-Pop 3.0 (not yet released) also gives you the option to import the data into R or S-Plus (your choice) and automatically create the above plots and, additionally, individual plots of DV vs TIME by ID.

Bill