

I think this technically violates Asimov's zeroth law...

First, AI was tasked with dealing with the pesky Reviewer #2 problem of the scientific peer review process (ok, the Evise feature is just a search-and-match function, not really AI). Now, AI is here to handle the messy business of actually writing your scientific manuscript for you.

SciNote has a new magic AI plug-in (sarcasm intended) that will purportedly take the results of your analyses and links to relevant literature and "magically" turn them into a scientific manuscript. From the product page:

"This is where the magic happens. Once your data is nicely organized in sciNote, Manuscript Writer can do its job! Based on the data you have in sciNote, it will generate a draft of your manuscript."

Oof. Insert side-eye emoji here.

This only perpetuates the problems with paper mills and their publishers (which thankfully get exposed, using a fake manuscript generator, no less). At least they didn't launch this new product at 2:14 a.m. Eastern time, on August…
Recent posts

Precision in Stata

In this post, I explore how to deal with precision issues in Stata.
First, create some data for the example:
. clear
. set obs 1000
obs was 0, now 1000
. g x = 1.1
. list in 1/5, noobs

  +-----+
  |   x |
  |-----|
  | 1.1 |
  | 1.1 |
  | 1.1 |
  | 1.1 |
  | 1.1 |
  +-----+

. count if x == 1.1   // zero matches!!
  0

Precision of Stata storage formats

Stata isn't wrong; it's just that you stored the variable x with too little precision (some decimal numbers have no exact finite-digit binary representation in computing). If we change the comparison to float precision or store the variable in double format, the issue is fixed. Note below how x is represented in hexadecimal and binary IEEE format vs. Stata general (16g) and fixed (f) formats.
. count if x == float(1.1)
1000

**formats
. di %21x x              // hex
+1.19999a0000000X+000
. di %16L x              // IEEE precision
000000a09999f13f
. di %16.0g round(x, .1)
1.1
. di %4.2f round(x, .1)
1.10
. di %23.18f round(x, .1)
1.100000000000000089

Double formats
Storing the …
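The same storage-precision behavior is easy to reproduce outside Stata. Here is a minimal Python sketch (using only the standard struct module, nothing Stata-specific) that mimics storing 1.1 in single precision, the analogue of Stata's default float storage type, and then comparing it to the double-precision literal 1.1:

```python
import struct

def as_float32(x):
    """Round a Python float (IEEE double) to single precision,
    mimicking Stata's default 'float' storage type."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = as_float32(1.1)          # stored with float precision
print(x == 1.1)              # False: the literal 1.1 is a double
print(x == as_float32(1.1))  # True: compare at matching precision, like float(1.1)
print(f"{x:.18f}")           # shows the single-precision representation error
```

This is exactly why `count if x == 1.1` finds zero matches while `count if x == float(1.1)` finds all of them: the comparison has to be done at the precision the value was stored with.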

I did a thing....

In 2009, New Mexico adopted more rigorous high school graduation requirements. I (finally) completed the last of my remaining REL studies, which examined the changes in New Mexico high school students' advanced course completion rates under these new requirements. We're hosting a webinar, and you can join in and listen to the results of the study if this topic interests you. See the webinar announcement (at the newly minted Gibson blog) for registration details.

The study that will be presented:
Booth, E., Shields, J., & Carle, J. (2017). Advanced course completion rates among New Mexico high school students following changes in graduation requirements (REL 2018–278). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest.
Accessible at

Whitewashing your standard errors

A great quote from Gary King, warning about the dangers of the all-too-common ", robust" (or I guess it's ", vce(robust)" now) solution for whitewashing the standard errors in your model:

"[...] if robust and classical standard errors diverge—which means the author acknowledges that one part of his or her model is wrong—then why should readers believe that all the other parts of the model that have not been examined are correctly specified? We normally prefer theories that come with measures of many validated observable implications; when one is shown to be inconsistent with the evidence, the validity of the whole theory is normally given more scrutiny, if not rejected (King, Keohane, and Verba 1994). Statistical modeling works the same way: each of the standard diagnostic tests evaluates an observable implication of the statistical model. The more these observable implications are evaluated, the better, since each one makes the theory vulnerable to being proven wrong. This is ho…
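To make King's point concrete: the comparison he describes is between the classical OLS standard error (which assumes constant error variance) and a heteroskedasticity-consistent "sandwich" estimator such as HC0. Here is a minimal pure-Python sketch for a simple regression, with made-up illustrative data (not from any study):

```python
import math

# Hypothetical data, roughly y = 2 + 3x with noise that grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5.1, 8.3, 10.6, 14.9, 16.2, 21.8, 22.1, 28.4]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Classical SE of the slope: assumes a single error variance for all observations
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_classical = math.sqrt(s2 / sxx)

# HC0 "robust" (sandwich) SE: lets each observation carry its own squared residual
se_robust = math.sqrt(sum((e * (xi - xbar)) ** 2 for e, xi in zip(resid, x)) / sxx ** 2)

print(se_classical, se_robust)
```

When the two numbers diverge, that divergence is itself a diagnostic: it says the homoskedasticity assumption baked into the classical formula is wrong, which is precisely the moment King argues you should revisit the model rather than just report the robust number.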

Unabbreviate Macro Lists in Stata

This Statalist thread from a few months ago, started by Nick Mosely, asked about working with hundreds of macros and eventually turned to the topic of expanding or unabbreviating macro lists (see -help unab- for the varlist version of this idea). Based on my posts in that thread, I recently posted -mac_unab- to the SSC Archives to help with this problem.
-mac_unab- is still a bit of a kludge solution, but I haven't figured out a better approach (nor did anyone suggest a better approach).  The biggest issues with mac_unab, which I hope to find better solutions for, include:
1.  When you run mac_unab, it will print all the contents of the -macro list- command in the Results window.  This might be desirable for some, but I'd like to be able to toggle it on/off.  Currently, the way I've gathered the macros is via a log, so there's no way to avoid printing the -mac list- output each time -mac_unab- is run.
2.  Currently, the program will only match macros with the pattern st…
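The core idea behind -mac_unab- (expanding an abbreviated name pattern against the full list of defined macros) can be sketched in a few lines of Python using the standard fnmatch module; the macro names below are hypothetical, and this is an illustration of the matching step only, not the package's actual implementation:

```python
import fnmatch

# Hypothetical set of defined macro names, as -macro list- might report them
defined = ["state1", "state2", "state3", "stub_a", "totalN", "pct1"]

def unab_macros(pattern, names):
    """Expand a shell-style wildcard pattern (e.g. 'state*') into the
    matching macro names, analogous to what -unab- does for varlists."""
    return [n for n in names if fnmatch.fnmatch(n, pattern)]

print(unab_macros("state*", defined))  # ['state1', 'state2', 'state3']
```

The kludgy part in Stata is issue 1 above: getting the list of defined names in the first place requires capturing -macro list- output, whereas here the list is simply passed in.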

For Computers, understanding natural language is sometimes hard ...

This paper by Chloé Kiddon and Yuriy Brun at the University of Washington describes a Bayes classifier that can be used to find accidental double entendres or "potential innuendos" (called "That's what she said" or TWSS jokes) in sentences. Here's the Ruby script to run this classifier to identify so-called "low brow comedy" (their words, not mine) in natural, human language.
Hopefully, this foreshadows the great things we can expect from our computers' auto-complete functionality in the near future. This article from Wired on detecting humor with computer software is also relevant. Andrew Gelman, a Bayesian scholar and co-author of the great zombie survey paper^, linked to this article on his blog after I recently mentioned it to him.

^ This paper contains a Technical Note, describing the authors' rationale for using LaTeX, that includes one of my all-time favorite quotes:
"We originally wrote this article in Word, but then we converted it to Lat…
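For a sense of the general machinery behind classifiers like the one in the paper, here is a toy naive Bayes text classifier in Python with add-one smoothing. The training sentences and labels are entirely made up for illustration; the actual TWSS classifier is far more sophisticated than this sketch:

```python
from collections import Counter
import math

# Tiny hypothetical training corpus: 1 = innuendo-prone, 0 = innocuous
docs = [("that is really hard", 1), ("put it in slowly", 1),
        ("the regression converged", 0), ("standard errors are robust", 0)]

def train(docs):
    """Count words per class and class frequencies (naive Bayes fitting)."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(label for _, label in docs)
    for text, label in docs:
        counts[label].update(text.split())
    return counts, priors

def score(text, counts, priors, label):
    """Log-probability of a class under naive Bayes with add-one smoothing."""
    vocab = set(w for c in counts.values() for w in c)
    total = sum(counts[label].values())
    lp = math.log(priors[label] / sum(priors.values()))
    for w in text.split():
        lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
    return lp

counts, priors = train(docs)
text = "that is hard"
label = max((0, 1), key=lambda c: score(text, counts, priors, c))
print(label)  # 1: the toy model flags this sentence
```

The whole trick is word-frequency evidence: words seen mostly in one class pull a new sentence toward that class, which is why even this crude approach can pick up "low brow" phrasing.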

A new SSC package to convert numbers to text (-num2words-)

-num2words- has been posted to the SSC Archives. It is a Stata module to convert numbers to text. It can convert integers, fractional numbers, and ordinal numbers (e.g., 8 to 8th). The idea for this program originated from a LaTeX report I was creating that had code to write the text version of numbers into sentences, including writing the proper-case text for a number when it started a sentence. So, the LaTeX file (written via -texdoc- from SSC) had some code like:
****texdoc example
sum x, meanonly
loc totalN "`=_N'"
loc pct1  "`=myvar[1]'"
loc totalN "`r(N)'"
if `totalN'>`lastN' loc change1 "increase"
****texdoc text written:
tex  `totalN' respondents took the survey this month.
tex  There was a `pct1' percent `change1' in respondents who reported using incentive payment dollars....and so on

where the macros are defined as:
`totalN' - the total number of relevant respondents (so, loc totalN "`=_N…
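The underlying conversion is straightforward to sketch. Here is a minimal Python illustration of the kind of integer-to-words logic -num2words- performs, limited to 0-999; the function name and coverage here are illustrative, not the Stata package's actual syntax:

```python
# Word tables for a minimal 0-999 number-to-words conversion
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def num2words(n):
    """Convert an integer in [0, 999] to English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else "")
    rest = num2words(n % 100) if n % 100 else ""
    return ONES[n // 100] + " hundred" + (" " + rest if rest else "")

print(num2words(42))               # forty-two
print(num2words(215))              # two hundred fifteen
print(num2words(42).capitalize())  # Forty-two, for sentence-initial use
```

The sentence-initial case in the last line is exactly the "proper case" wrinkle that motivated the Stata program: `42 respondents...` should become `Forty-two respondents...` when the number opens the sentence.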