

Showing posts from 2018

Reading list: some recent Ed evaluation and Stata articles/links of interest

Image from @AcademiaObscura #EdEval #EdResearch #Stata A quick roundup of recent links to education evaluation/research and/or Stata-related things I've been reading (or have bookmarked to read soon). >> From the latest DeptEd NVCS 'data point' report: "Students who reported [that] repetition and power imbalance were components of the bullying they experienced were also more likely to agree that bullying had an impact on various aspects of their lives" >> Matt Welch (AIR) says "Need to align #edeval criteria to standards for student learning #teachereval" >> de Chaisemartin and D'Haultfœuille have a new paper about relaxing the treatment effect homogeneity assumption in difference-in-differences analyses (dubbed Fuzzy DID). Ungated version here:

Why ( •_•)>⌐■-■ / (⌐■_■) when you can 😐➡️😎 ? Using Emoji (and other unicode chars) in Stata

The other day I was importing email and text messages on my Mac into Stata from the .db files they are stored in, so that I could do some analysis on how often I text/email certain people. I've never been able to get odbc to work on my Mac in Stata, so I cheated with sqlite3 and exported each table to csv with:

foreach tbl in chat chat_msg_join message handle chat_handle_join {
    ! sqlite3 -header -csv ~/Library/Messages/chat.db "SELECT * FROM `tbl';" > /users/ebooth/`tbl'.csv
}

to get my texts. Merging tables from chat.db is another (very long) story. I opened these csv files in Excel to inspect them and noticed that the Emoji from the texts were missing (I had assumed they'd be converted into some sort of unicode or hex equivalent), but when I imported these messages into Stata I was surprised to see something like the picture to the right, where Emoji were being included in Stata
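If you'd rather skip the shell round-trip entirely, the same table-to-csv export can be sketched with Python's standard library. This is a self-contained toy (it builds its own tiny stand-in chat.db with one made-up table, since the real ~/Library/Messages/chat.db schema isn't reproduced here), not the actual Messages schema:

```python
import csv
import sqlite3

db_path = "chat.db"  # stand-in for ~/Library/Messages/chat.db

# Build a tiny stand-in database so the sketch runs anywhere.
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE IF NOT EXISTS message (rowid_ INTEGER, text TEXT)")
con.execute("DELETE FROM message")
con.executemany("INSERT INTO message VALUES (?, ?)",
                [(1, "hi \N{SMILING FACE WITH SUNGLASSES}"), (2, "ok")])
con.commit()

# Dump each table to csv with a header row, like `sqlite3 -header -csv`.
for tbl in ["message"]:
    cur = con.execute(f"SELECT * FROM {tbl}")
    with open(f"{tbl}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cur.description])  # column names
        writer.writerows(cur)
con.close()
```

Writing the csv with an explicit utf-8 encoding is also one way to keep the Emoji intact on the way into Stata.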

Happy π day! Estimating Pi by graphing random numbers with Stata.

√-1 2^3 Σ π and it was delicious (because it was of the pizza variety). My obligatory March 14 Pi post involves estimating Pi by finding the proportion of randomly plotted points inside a square that also fall inside an inscribed circle. The larger the number of points we plot using this method, the closer to Pi we get (a sort of Pi estimation by simulation in Stata (I know, I know, it should have been programmed in Python)). You can approximate a Pi calculation via boring methods like: pressing the π key on your TI-83 calculator, calculating fractions like 22/7 or 355/113, calculating log(6)^log(5)^log(4)^log(3)^log(2), or whatnot. e.g.,

. di 22/7
3.1428571
. di 355/113
3.1415929
. di log(6)^(log(5)^(log(4)^(log(3)^log(2))))
3.1415774

or just cheat and plot circles via -tw- functions or -graph pie-, etc. in Stata. However (assuming you aren't Yasumasa Kanada and Daisuke Takahashi from the U of Tokyo with server cycles to spare) anothe
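Since the post concedes it "should have been programmed in Python," here is a minimal Monte Carlo sketch of the same idea: draw uniform points in the unit square and count the share that land inside the inscribed quarter-circle of radius 1; that share converges to π/4 (function name and seed are my own choices, not from the post):

```python
import random

def estimate_pi(n_points: int, seed: int = 314) -> float:
    """Estimate pi from the share of random points in the unit square
    that fall inside the inscribed quarter-circle of radius 1."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_points)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_points

print(estimate_pi(1_000_000))  # drifts toward 3.14159... as n grows
```

With a million draws the estimate is typically within a couple of thousandths of π; the error shrinks only at the usual 1/√n Monte Carlo rate, which is exactly why the plotted version in the post needs lots of points.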

Implications of NCAA march madness brackets with multiplier scoring (a Stata example)

This year at our office we are again contributing to the great March Madness economic productivity drain. However, we are also toying with the idea of switching to a Fibonacci sequence for bracket scoring and using multipliers to weight up riskier selections (particularly in later rounds). In prior years we did this whole thing on paper/manually, so to keep things simple we followed a standard points-per-round regime in the usual espn/yahoo format (e.g., 1-2-4-8-16-32 points across the 6 rounds). This year most people in our office seem supportive of a scoring scheme that incentivizes risk-taking. So there are essentially two changes to our scoring scheme. First, we are changing to a 2-3-5-8-13-21 progression. With the more traditional 1-2-4-8-16-32 system, the championship game is worth 32x as much as any first-round game (which in effect makes the first-round games almost useless). With Fibonacci scoring, the last
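To see why the flat doubling scheme drowns out the early rounds, a quick sketch comparing the total points at stake per round under each progression. The per-game values are from the post; the games-per-round counts assume a standard 64-team bracket (32-16-8-4-2-1), and the comparison code is mine:

```python
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]   # standard 64-team bracket
STANDARD = [1, 2, 4, 8, 16, 32]          # espn/yahoo doubling scheme
FIBONACCI = [2, 3, 5, 8, 13, 21]         # proposed progression

def round_totals(points_per_game):
    """Total points at stake in each of the six rounds."""
    return [g * p for g, p in zip(GAMES_PER_ROUND, points_per_game)]

print(round_totals(STANDARD))   # every round carries the same total weight
print(round_totals(FIBONACCI))  # early rounds carry more total weight
```

Under doubling, each round is worth 32 points in total, so one championship pick can swing as much as all 32 first-round picks combined; under the Fibonacci progression the first round's 64 total points dominate the championship's 21, which is what makes risky early upsets worth taking.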

Beyond the bubble: the role of affect and cross-communication in dismantling biases

School discipline and behavioral research and evaluation, including my own, consistently battles the critique that we aren't considering (or measuring) the presence/dynamics/frequency/quality of parental interactions with teachers and parental participation at school. Part of this is clearly about measuring how families/home life/parents shape behavior at school (and sometimes this is treated as a scapegoat for teachers' limited ability to thwart discipline issues at school). However, a second piece of this is the often understudied role of parent-teacher communication in improving student behavioral and other outcomes. This second line of inquiry spurs myriad questions about how, and through exactly what mechanism, increased parental involvement/communication would ameliorate student discipline issues. In some evaluations, we have tried to answer these questions by proxying parental involvement/interest with a set of student, parent, and staff surv

Working around issues with long IDs in Stata (and also more about -markstat-)

In this example, I discuss some potential issues with using long IDs in Stata and how to avoid them. The problems presented below are separate from precision issues with large numeric values (that is, large in terms of number of places, including precise decimal representations). I previously discussed precision issues that are alleviated by using double or string format at this link.

Long IDs and -levelsof-

First, let's create some fake data for this session:

. clear
. input campus

         campus
  1. 11990
  2. 119902041
  3. 243905112
  4. 243905129
  5. 243905131
  6. 244903001
  7. 00244903041
  8. end

. g x = cond(_n<4, 1, 0, .)
. desc

Contains data
  obs:     7
 vars:     2
 size:   112
──────────────────────────────────────────────────────────────────────────────────────
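The same pitfalls are easy to demonstrate outside Stata. A hedged Python sketch (the campus values are the fake IDs from the session above; the round-trip helper is my own stand-in for Stata's default 4-byte -float- storage type) showing two reasons long IDs belong in strings: numeric storage silently drops leading zeros, and a single-precision float cannot represent 9+ digit integers exactly:

```python
import struct

def stata_float(x: int) -> int:
    """Round-trip an integer through a 4-byte IEEE float,
    mimicking Stata's default -float- storage type."""
    return int(struct.unpack("f", struct.pack("f", x))[0])

ids = ["11990", "119902041", "243905112", "00244903041"]
for s in ids:
    as_int = int(s)  # note: leading zeros are silently dropped here
    stored = stata_float(as_int)
    print(f"{s:>12} -> {stored:<12} "
          f"{'exact' if stored == as_int else 'ROUNDED'}")
```

Only the short ID survives the float round-trip intact; the nine-digit campuses come back rounded, and "00244903041" has already lost its leading zeros before storage even matters, which is why string IDs (or at minimum doubles) are the safer habit.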