Why ( •_•)>⌐■-■ / (⌐■_■) when you can 😐➡️😎 ? Using Emoji (and other unicode chars) in Stata

The other day I was importing email and text messages on my Mac in Stata from the .db files that they are stored in so that I could do some analysis on how often I text/email certain people. I've never been able to get odbc to work on my Mac in Stata, so I cheated with Sqlite3 and imported with:
* Export each table to CSV with one sqlite3 call per table
* (each "!" line starts a new shell, so sqlite3 dot-commands cannot carry over between lines)
foreach tbl in chat chat_msg_join message handle chat_handle_join {
    ! sqlite3 -header -csv ~/Library/messages/chat.db "SELECT * FROM `tbl';" > /users/ebooth/`tbl'.csv
}

to get my texts.  Merging tables from chat.db is another (very long) story.  

I opened these csv files in Excel to inspect them and noticed that Emoji from the texts were missing (I assumed they'd be converted into some sort of unicode or hex equivalent), but when I imported these messages into Stata I was surprised to see something like the picture to the right where Emoji were being included in Stata (in the results window and data browser…
Recent posts

Happy π day! Estimating Pi by graphing random numbers with Stata.

√-1 2^3 Σ π and it was delicious (because it was of the pizza variety).

My obligatory March 14 Pi post involves estimating Pi by finding the proportion of randomly plotted points inside a square that are also inside the circle inscribed in it. That proportion approaches Pi/4, so multiplying it by 4 gets closer to Pi as the number of plotted points grows (sort of Pi estimation by simulation in Stata; I know, I know, it should have been programmed in Python).
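The idea can be sketched in a few lines of a do-file. This is only a minimal illustration, not necessarily the code from the full post: the seed, the number of points, and the variable names (x, y, inside, pi_hat) are all assumptions. Points are drawn uniformly on the square [-1, 1] x [-1, 1], and the share that lands inside the inscribed unit circle is multiplied by 4.

```stata
clear
set seed 31415                         // arbitrary seed, only for reproducibility
set obs 100000                         // number of random points (assumed)

* Uniform draws on [-1, 1] for both coordinates
generate double x = 2*runiform() - 1
generate double y = 2*runiform() - 1

* 1 if the point falls inside the inscribed unit circle, 0 otherwise
generate byte inside = (x^2 + y^2 <= 1)

* The share of points inside the circle estimates Pi/4
quietly summarize inside, meanonly
scalar pi_hat = 4 * r(mean)

display "Estimated Pi: " pi_hat
display "Stata's _pi:  " _pi
```

Raising the value passed to -set obs- should tighten the estimate, though only slowly, since the simulation error shrinks with the square root of the number of points.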

You can approximate a Pi calculation via boring methods like pressing the π on your TI-83 calculator, calculating fractions like 22/7 or 355/113, calculating log(6)^log(5)^log(4)^log(3)^log(2), or whatnot, e.g.,

. di  22/7

. di  355/113

. di  log(6)^(log(5)^(log(4)^(log(3)^log(2))))

or just cheat and plot  circles via -tw- functions or -graph pie-, etc in Stata.

However  (assuming you aren't Yasumasa Kanada and Daisuke Takahashi from the U of Tokyo with server cycles to spare)  another approach for calculating Pi is to _draw …

Implications of NCAA march madness brackets with multiplier scoring (a Stata example)

This year at our office we are again helping contribute to the great March Madness economic productivity drain. However, we are also toying with the idea of switching to a Fibonacci sequence of bracket scoring and using multiplier scoring to weight up more risky selections (particularly in later rounds). 

In prior years, we did this whole thing on paper / manually, so to keep things simple we followed the standard ESPN/Yahoo scoring format (e.g., 1-2-4-8-16-32 points across the 6 rounds).
This year most people in our office seem supportive of a scoring scheme that incentivizes risk-taking.  So, there are essentially two changes to our scoring scheme.
First, we are changing to a 2-3-5-8-13-21 progression. With the more traditional 1-2-4-8-16-32 system, the championship game is worth 32x as much as any first-round game (which in effect makes the first-round games almost useless). With Fibonacci scoring, the last round game is worth …
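The weight of each round relative to a first-round game can be tabulated directly from the two progressions quoted above. This is just a sketch for comparing the schemes; the variable names (round_no, doubling, fibonacci, rel_doubling, rel_fibonacci) are made up for illustration, and it ignores the multiplier part of the scoring.

```stata
clear
input round_no doubling fibonacci
1  1  2
2  2  3
3  4  5
4  8  8
5 16 13
6 32 21
end

* Points per game in each round, relative to a first-round game
generate double rel_doubling  = doubling  / doubling[1]
generate double rel_fibonacci = fibonacci / fibonacci[1]

list, noobs clean
```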

Beyond the bubble: the role of affect and cross-communication in dismantling biases

School discipline and behavioral research and evaluation, including my own, consistently battles the critique that we aren't considering (or measuring) the presence/dynamics/frequency/quality of parental interactions with teachers and parental participation at school. Part of this is clearly about measuring the influence of how families/homelife/parents shape behavior while at school (and sometimes this is treated as a scapegoat for teachers' limited ability to thwart discipline issues at school). However, a second piece of this is the often understudied nature of parental-teacher communication in improving student behavioral and other outcomes. This second line of inquiry spurs myriad questions about how and about the exact mechanism through which increased parental involvement/communication would ameliorate student discipline issues. In some evaluations, we have tried to answer these questions by proxying parental involvement/interest with a set of student, parent, and staff surv…

Working around issues with long IDs in Stata (and also more about -markstat-)

Working with long IDs in Stata (and also more about -markstat-)

In this example, I discuss some potential issues with using long IDs in Stata and how to avoid them. The problems presented below are separate from precision issues with large numeric values (that is, large in terms of number of places, including precise decimal representations). I previously discussed precision issues that are alleviated by using double or string format at this link.
Long IDs and -levelsof-

First, let’s create some fake data for this session:

. clear
. input campus
        campus
  1. 11990
  2. 119902041
  3. 243905112
  4. 243905129
  5. 243905131
  6. 244903001
  7. 00244903041
  8. end
. g x = cond(_n<4, 1, 0, .)
. desc
Contains data
  obs: 7
  vars: 2
  size: 112
────────────────────────────────────────…

Happy holidays - I made you this Stata tree...

Two Stata-related problems/issues I've been working on involve building weaved, literate documents via -markstat- (from SSC) and post-processing Stata graphs via gr_edit commands.

gr_edit commands are not documented but are helpful for adding elements across many graphs that you'd otherwise have to add manually (or by creating a graph recording). The biggest downside to gr_edit commands (or applying graph recordings) is that it's a slow process when you have many hundreds/thousands of graphs to edit.

In the code example below, I made you a holiday tree with 100k lights. I know, this is very thoughtful of me. You are welcome.

Update (Dec 25): First, here's a new version of a holiday tree using -tw scatter-/-tw bar-. I like this version much better than the approach I had originally included when I first put up this post. The previous attempts using -scatteri- and gr_edit are still included below:

**************!
clear
set obs 2000
g z = 10
g obs = _n
g zero = 0
g …

Slack: Alchemizing FOMO into neurosis

At my research firm, we’ve finally caught up to c.2013 and adopted Slack for project communications and management. My initial (<2 weeks) impression is that I like it -- the command line, programmable bots, and API parts, in particular, are appealing (and I'm guessing that the utility of those will increase over time.)

There is a lot of conflicting advice out there about how to best use Slack, so in this post I add to the noise with some guiding principles we’re following (at least initially).

The tl;dr version of this post is that:
- more channels with fewer conversations > fewer channels with more conversations (noise)
- mute your notifications! Slack will let you know when you are needed (as long as others follow standard @ mentioning (tagging) conventions)
- threading = good (but use it deliberately)
- Reply with purpose, otherwise just React
- don’t be a luddite: learn & use the Slack /slash commands.
Below are more details/notes/tips on how we are using Slack and some areas where w…