

Beyond the bubble: the role of affect and cross-communication in dismantling biases

School discipline and behavioral research and evaluation, including my own, consistently battles the critique that we aren't considering (or measuring) the presence, dynamics, frequency, and quality of parents' interactions with teachers and of parental participation at school. Part of this critique is clearly about measuring how families, home life, and parents shape behavior at school (and sometimes this is treated as a scapegoat for teachers' limited ability to head off discipline issues at school). However, a second piece is the often understudied role of parent-teacher communication in improving student behavioral and other outcomes. This second line of inquiry raises myriad questions about how, and through exactly which mechanism, increased parental involvement or communication would ameliorate student discipline issues. In some evaluations, we have tried to answer these questions by proxying parental involvement/interest with a set of student, parent, and staff surv…
Recent posts

Working around issues with long IDs in Stata (and also more about -markstat-)


In this example, I discuss some potential issues with using long IDs in Stata and how to avoid them. The problems presented below are separate from precision issues with large numeric values (that is, large in terms of number of digits, including precise decimal representations). I previously discussed those precision issues, which are alleviated by using double or string storage, in an earlier post.
Long IDs and -levelsof-

First, let’s create some fake data for this session:

. clear
. input campus

        campus
  1. 11990
  2. 119902041
  3. 243905112
  4. 243905129
  5. 243905131
  6. 244903001
  7. 00244903041
  8. end

. g x = cond(_n<4, 1, 0, .)
. desc

Contains data
  obs:     7
 vars:     2
 size:   112
──────────────────────────────────────────────────────────────────────────────────────────────────────…
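As a quick illustration of the separate precision issue mentioned above (not the -levelsof- problem this post walks through), here is a minimal sketch that stores a few of the fake campus IDs three ways. It assumes the IDs are entered as strings first; the variable names campus_s, campus_f, and campus_d are hypothetical. A default float holds only about 7 significant digits (integers are exact only up to 16,777,216), so 9-digit IDs can be silently rounded, while a double keeps them exact and a string additionally keeps the leading zeros in 00244903041.

```stata
* Sketch: how storage type affects long IDs (variable names are hypothetical)
clear
input str11 campus_s
"11990"
"119902041"
"243905112"
"00244903041"
end

* real() converts the string to a number; leading zeros are dropped
generate float  campus_f = real(campus_s)   // float: integers exact only up to 16,777,216
generate double campus_d = real(campus_s)   // double: exact for IDs of this length

* Show all digits instead of the default %9.0g display
format campus_f campus_d %12.0f
list, noobs

* Flag IDs that the float copy no longer reproduces
list campus_s campus_f campus_d if campus_f != campus_d, noobs
```

If the IDs are only ever used for matching or grouping, keeping them as strings is the simplest way to sidestep both the rounding and the lost leading zeros.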

Happy holidays - I made you this Stata tree...

Two Stata-related problems/issues I've been working on involve building woven, literate documents via -markstat- (from SSC) and post-processing Stata graphs via gr_edit commands.

gr_edit commands are not documented but are helpful for adding elements across many graphs that you'd otherwise have to add manually (or by creating a graph recording). The biggest downside to gr_edit commands (or applying graph recordings) is that it's a slow process when you have many hundreds/thousands of graphs to edit.

In the code example below, I made you a holiday tree with 100k lights. I know, this is very thoughtful of me. You are welcome.

Update (Dec 25): First, here's a new version of a holiday tree using -tw scatter-/-tw bar-. I like this version much better than the approach I originally included when I first put up this post. Those previous attempts using -scatteri- and gr_edit are still included below:

**************!
clear
set obs 2000
g z = 10
g obs = _n
g zero = 0
g …

Slack: Alchemizing FOMO into neurosis

At my research firm, we’ve finally caught up to c. 2013 and adopted Slack for project communications and management. My initial (<2 weeks) impression is that I like it; the command line, programmable bots, and API parts, in particular, are appealing (and I'm guessing that their utility will increase over time).

There is a lot of conflicting advice out there about how best to use Slack, so in this post I add to the noise with some guiding principles we’re following (at least initially).

The tl;dr version of this post:
- more channels with fewer conversations > fewer channels with more conversations (noise)
- mute your notifications! Slack will let you know when you are needed (as long as others follow standard @-mentioning (tagging) conventions)
- threading = good (but use it deliberately)
- reply with purpose; otherwise, just react
- don’t be a Luddite: learn and use the Slack /slash commands
Below are more details/notes/tips on how we are using Slack and some areas where w…

Unexpected results with conditions and functions in Stata

Just dropped in to see what condition my condition was in... Using Stata functions like those found in -help functions- and -help egen- can be tricky when combined with conditions. By "conditions" here I mean anything that changes the range over which a function operates: [if] and [in] qualifiers, conditions inside a function such as gen count = sum(gender == 1 & age <= 18), etc. The examples in this post serve as a cautionary tale about what to watch for when combining functions and conditions.

The advice in this post falls into three categories/topics: (1) avoid subscripting with -egen- functions; (2) be wary of conditions inside functions (as well as nested functions); and (3) there are better ways to use conditions with functions to avoid problems.
Note: a subset of this discussion appeared on Statalist in this thread:…
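To make the "range" point concrete, here is a small sketch (not one of the post's own examples) that applies the gen count = sum(gender == 1 & age <= 18) pattern to five made-up observations. It contrasts three places the condition can go: inside sum(), which gives a running count defined on every observation; inside -egen-'s total(), which gives the overall count repeated on every observation; and in an [if] qualifier, which restricts the observations used and leaves the rest missing. The variable names id, n_kids, and age_total are hypothetical.

```stata
* Sketch: where the condition goes changes what the function returns
clear
input gender age
1 10
2 12
1 20
1 15
2 17
end
generate id = _n   // hypothetical row identifier, used to restore order below

* Condition inside sum(): a running count of qualifying obs, defined on all obs
generate count = sum(gender == 1 & age <= 18)

* Condition inside egen total(): the overall count, repeated on every obs
egen n_kids = total(gender == 1 & age <= 18)

* Condition as an [if] qualifier: only qualifying obs are used,
* and the new variable is missing everywhere else
egen age_total = total(age) if gender == 1 & age <= 18

* Put the rows back in entry order (in case any step re-sorted them) and compare
sort id
list, noobs
```

Note that count depends on the current sort order, while n_kids and age_total do not; that order dependence is one reason running-sum style functions deserve extra care when conditions are involved.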

Creating example datasets for collaboration with other Stata users

Robert Picard and Nick Cox developed a (better) program called -dataex- that was uploaded to SSC and, as of Stata 15.1, is officially included to help users share example data. The major difference is that -writeinput- writes a .do file with an -input- statement, while -dataex- is Statalist-centric (even producing the enclosing [CODE] tags unless the 'elsewhere' option is specified) and produces the data example in the Results window. (Updated Nov 2017)


I'm lucky to be in a research environment where most of my colleagues and students use Stata. Also, I regularly participate on Statalist. Both of these have helped push me to periodically refine my habits when it comes to communicating about Stata.

When it comes to asking questions on Statalist, I've tried to stick closely to the Statalist FAQ and other tips mentioned by William Gould on the Stata NEC Blog. However, for answering questions on Statalist, I find Maarten Buis's page on his Statalist postings espec…
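For readers who haven't tried -dataex- yet, here is a minimal sketch using Stata's bundled auto dataset (my choice of example data, not something from the post). The first call prints a Statalist-ready data example wrapped in [CODE] tags; adding the elsewhere option drops those tags, which is handier when sharing with a colleague outside Statalist.

```stata
* Sketch: sharing a small slice of data with -dataex- (official as of Stata 15.1)
sysuse auto, clear

* Statalist-ready output, enclosed in [CODE] tags, for the first 5 observations
dataex make price mpg foreign in 1/5

* The same data example without the [CODE] tags, for use outside Statalist
dataex make price mpg foreign in 1/5, elsewhere
```

Either way, the output is an -input- listing (plus any value labels) that the recipient can paste into a do-file to rebuild the same observations.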

Tukey, not TuRkey (that's for tomorrow), on outliers and multiple comparisons

John Tukey contributed many things to statistics and data visualization (box plots!). I also cannot help thinking "TuRkey" every time I read or say his name. In my (biased, fallible human) mind, the frequency with which I think to use Tukey methods in my own work and the potential for an embarrassing Tukey-TuRkey switch during regular communication both increase across the year, looking something like the figure to the right.

Why am I reading this?
In this post, I discuss a few ways to use some of Tukey's contributions to help with analyses I've recently been running in Stata. I use (a simulated version of) data from student survey responses, discipline involvement, and exam performance to do Tukey-related things like assessing outliers and testing for differences across multiple groups. This post includes some basic code run on fake data, but in addition to what's presented below, you could also consider using Stata programs (some from SSC) such as: -extreme…
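As a rough sketch of the two Tukey ideas mentioned here: Tukey's fences flag values more than 1.5 × IQR below the first quartile or above the third (the same rule behind box-plot whiskers), and Tukey's HSD adjusts pairwise comparisons of group means. This uses Stata's auto data as a stand-in for the simulated survey data (an assumption; the post's own data and code aren't reproduced), the variable name tukey_out is hypothetical, and the HSD step relies on the official -pwmean- command with mcompare(tukey).

```stata
* Sketch: Tukey's fences for outliers, then Tukey-adjusted pairwise comparisons
sysuse auto, clear

* Quartiles from -summarize, detail-
quietly summarize price, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'

* Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
generate byte tukey_out = !inrange(price, `q1' - 1.5*`iqr', `q3' + 1.5*`iqr') if !missing(price)
tabulate tukey_out

* Pairwise differences in mean mpg across repair records, Tukey HSD adjustment
pwmean mpg, over(rep78) mcompare(tukey) effects
```

Quartile definitions vary slightly across software, so the exact fence values (and which borderline cases get flagged) may differ a little from other tools.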