

Showing posts from 2018

Why ( •_•)>⌐■-■ / (⌐■_■) when you can 😐➡️😎 ? Using Emoji (and other unicode chars) in Stata

The other day I was importing email and text messages on my Mac into Stata from the .db files they are stored in, so that I could analyze how often I text/email certain people. I've never been able to get odbc to work on my Mac in Stata, so I cheated with sqlite3 and imported with:
foreach tbl in chat chat_msg_join message handle chat_handle_join {
    * each -!- line starts a fresh shell, so the sqlite3 dot-commands are passed as command-line flags in a single call
    ! sqlite3 -header -csv ~/Library/messages/chat.db "SELECT * FROM `tbl';" > /users/ebooth/`tbl'.csv
}

to get my texts.  Merging tables from chat.db is another (very long) story.  

I opened these csv files in Excel to inspect them and noticed that Emoji from the texts were missing (I assumed they'd be converted into some sort of unicode or hex equivalent), but when I imported these messages into Stata I was surprised to see something like the picture to the right where Emoji were being included in Stata (in the results window and data browser…

Happy π day! Estimating Pi by graphing random numbers with Stata.

√-1 2^3 Σ π and it was delicious (because it was of the pizza variety).

My obligatory March 14 Pi post involves estimating Pi from the proportion of randomly plotted points inside a square that also fall inside the circle inscribed in it. That proportion approaches Pi/4, so four times the proportion tends to get closer to Pi as we plot more points (sort of Pi estimation by simulation in Stata; I know, I know, it should have been programmed in Python).
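A minimal sketch of that idea, assuming points drawn uniformly on the square [-1, 1] x [-1, 1] with the unit circle inscribed in it. The seed, the number of points, and the variable names are illustrative choices, not taken from the original post:

```stata
clear
set seed 314159                       // arbitrary seed, only for reproducibility
set obs 100000                        // number of random points (assumed)

* uniform points on the square [-1, 1] x [-1, 1]
generate double x = 2*runiform() - 1
generate double y = 2*runiform() - 1

* flag the points that land inside the inscribed unit circle
generate byte inside = (x^2 + y^2 <= 1)

* share inside is about (area of circle)/(area of square) = pi/4
summarize inside, meanonly
scalar pi_hat = 4 * r(mean)
display "Estimated pi: " pi_hat
```

Raising the value in -set obs- shrinks the simulation error, though slowly: the error falls roughly with the square root of the number of points.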

You can approximate Pi via boring methods like:
pressing the π key on your TI-83 calculator, calculating fractions like 22/7 or 355/113, calculating log(6)^(log(5)^(log(4)^(log(3)^log(2)))), or whatnot, e.g.,

. di  22/7

. di  355/113

. di  log(6)^(log(5)^(log(4)^(log(3)^log(2))))

or just cheat and plot circles via -tw- functions or -graph pie-, etc., in Stata.

However  (assuming you aren't Yasumasa Kanada and Daisuke Takahashi from the U of Tokyo with server cycles to spare)  another approach for calculating Pi is to _draw …

Implications of NCAA March Madness brackets with multiplier scoring (a Stata example)

This year at our office we are again helping contribute to the great March Madness economic productivity drain. However, we are also toying with the idea of switching to a Fibonacci sequence of bracket scoring and using multiplier scoring to give more weight to riskier selections (particularly in later rounds).

In prior years, we did this whole thing manually on paper, so to keep things simple we followed the standard ESPN/Yahoo points-per-round format (e.g., 1-2-4-8-16-32 points across the 6 rounds).
This year most people in our office seem supportive of a scoring scheme that incentivizes risk-taking. So, there are essentially two changes to our scoring scheme.
First, we are changing to a 2-3-5-8-13-21 progression. With the more traditional 1-2-4-8-16-32 system, the championship game is worth 32x as much as any first-round game (which in effect makes the first-round games almost useless). With Fibonacci scoring, the last round game is worth …
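The two progressions can be compared side by side with a short sketch. It assumes the usual 64-team bracket (32, 16, 8, 4, 2, and 1 games across the 6 rounds), uses illustrative variable names, and leaves the multiplier part of the scoring out entirely:

```stata
clear
* rnd = round number; games = games played in that round (64-team bracket assumed)
input byte rnd int games byte standard byte fibonacci
1 32  1  2
2 16  2  3
3  8  4  5
4  4  8  8
5  2 16 13
6  1 32 21
end

* total points available in each round under each scheme
generate std_total = games * standard
generate fib_total = games * fibonacci

* value of one game relative to a first-round game
generate std_ratio = standard / standard[1]
generate fib_ratio = fibonacci / fibonacci[1]

list rnd games std_total fib_total std_ratio fib_ratio, noobs clean
```

In the listing, the standard scheme puts the same 32 points on the table in every round, while the Fibonacci totals fall from 64 points in the first round to 21 in the championship, where a single game counts 10.5 times a first-round game.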

Beyond the bubble: the role of affect and cross-communication in dismantling biases

School discipline and behavioral research and evaluation, including my own, consistently battles the critique that we aren't considering (or measuring) the presence/dynamics/frequency/quality of parental interactions with teachers and parental participation at school. Part of this is clearly about measuring how families/home life/parents shape behavior at school (and sometimes this is treated as a scapegoat for teachers' limited ability to thwart discipline issues at school). However, a second piece of this is the often understudied role of parent-teacher communication in improving student behavioral and other outcomes. This second line of inquiry spurs myriad questions about how, and through what exact mechanism, increased parental involvement/communication would ameliorate student discipline issues. In some evaluations, we have tried to answer these questions by proxying parental involvement/interest with a set of student, parent, and staff surv…

Working around issues with long IDs in Stata (and also more about -markstat-)

Working with long IDs in Stata (and also more about -markstat-)

In this example, I discuss some potential issues with using long IDs in Stata and how to avoid them. The problems presented below are separate from precision issues with large numeric values (that is, large in terms of number of places, including precise decimal representations). I previously discussed precision issues that are alleviated by using double or string format at this link.
Long IDs and -levelsof-

First, let’s create some fake data for this session:

. clear
. input campus
        campus
  1. 11990
  2. 119902041
  3. 243905112
  4. 243905129
  5. 243905131
  6. 244903001
  7. 00244903041
  8. end
. g x = cond(_n<4, 1, 0, .)
. desc
Contains data
  obs: 7
 vars: 2
 size: 112
──────────…