

Showing posts from March, 2018

Why ( •_•)>⌐■-■ / (⌐■_■) when you can 😐➡️😎 ? Using Emoji (and other unicode chars) in Stata

The other day I was importing email and text messages into Stata on my Mac from the .db files they are stored in, so that I could analyze how often I text/email certain people. I've never been able to get odbc to work in Stata on my Mac, so I cheated with sqlite3 and imported with:
foreach tbl in chat chat_msg_join message handle chat_handle_join {
    // one sqlite3 call per table: -header adds column names, -csv sets CSV output
    ! sqlite3 -header -csv ~/Library/messages/chat.db "SELECT * FROM `tbl';" > /users/ebooth/`tbl'.csv
}

to get my texts.  Merging tables from chat.db is another (very long) story.  
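For anyone who would rather skip the shell escapes, here is a rough sketch of the same export in Python rather than Stata, using only the standard library. The function name `export_tables` and the output layout (one `<table>.csv` per table) are my own assumptions for illustration, not part of the original workflow.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables(db_path, tables, out_dir):
    """Write each table in `tables` to <out_dir>/<table>.csv with a header row."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        for tbl in tables:
            # Table names cannot be bound as SQL parameters, so only accept
            # plain identifiers before interpolating them into the query.
            if not tbl.isidentifier():
                raise ValueError(f"unexpected table name: {tbl!r}")
            cur = conn.execute(f'SELECT * FROM "{tbl}"')
            header = [col[0] for col in cur.description]
            out_path = out_dir / f"{tbl}.csv"
            # UTF-8 keeps emoji and other non-ASCII text intact in the CSV.
            with open(out_path, "w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)
                writer.writerows(cur)
            written.append(out_path)
    finally:
        conn.close()
    return written


# Example call (paths are placeholders; adjust to your own machine):
# export_tables("chat.db", ["chat", "message", "handle"], "exported_csv")
```

Writing the files as UTF-8 is the design choice that matters here: the emoji are stored as ordinary unicode text, so they should survive the round trip as long as nothing downstream re-encodes the file.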

I opened these csv files in Excel to inspect them and noticed that the Emoji from the texts were missing (I had assumed they'd be converted into some sort of unicode or hex equivalent). But when I imported these messages into Stata, I was surprised to see something like the picture to the right, where Emoji were being included in Stata (in the results window and data browser…

Happy π day! Estimating Pi by graphing random numbers with Stata.

√-1 2^3 Σ π and it was delicious (because it was of the pizza variety).

My obligatory March 14 Pi post involves estimating Pi by finding the proportion of randomly plotted points inside a square that are also inside an inscribed circle. That proportion approaches the ratio of the two areas, Pi/4, so multiplying it by 4 gives the estimate. The larger the number of points we plot using this method, the closer to Pi we tend to get. It's sort of Pi estimation by simulation in Stata (I know, I know, it should have been programmed in Python).
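Since I brought up Python, here is a minimal sketch of that idea in Python rather than Stata. The function name `estimate_pi`, the `seed` argument, and the choice of the square [-1, 1] x [-1, 1] with a unit circle inside it are illustrative assumptions, not the code from the post.

```python
import random


def estimate_pi(n_points, seed=None):
    """Estimate Pi from the share of random points in a square that land in the inscribed circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        # Draw a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The inscribed circle has radius 1 and is centered at the origin.
        if x * x + y * y <= 1.0:
            inside += 1
    # (circle area) / (square area) = Pi / 4, so scale the proportion by 4.
    return 4.0 * inside / n_points


# Try a few sample sizes; the estimates should tend to settle down as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n, seed=314))
```

The convergence is slow: the sampling error shrinks roughly with the square root of the number of points, so each extra digit costs about 100 times more points.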

You can approximate Pi via boring methods like:
pressing the π key on your TI-83 calculator, calculating fractions like 22/7 or 355/113, calculating log(6)^(log(5)^(log(4)^(log(3)^log(2)))) (Stata's log() is the natural log), or whatnot, e.g.,

. di  22/7

. di  355/113

. di  log(6)^(log(5)^(log(4)^(log(3)^log(2))))

or just cheat and plot  circles via -tw- functions or -graph pie-, etc in Stata.
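To see how far off each of these boring approximations is, here is a small Python sketch that compares them with `math.pi`. The dictionary keys and the helper name `approximation_errors` are made up for illustration; `math.log` is the natural log, matching Stata's log().

```python
import math

# The three closed-form approximations mentioned above.
approximations = {
    "22/7": 22 / 7,
    "355/113": 355 / 113,
    "nested logs": math.log(6) ** (math.log(5) ** (math.log(4) ** (math.log(3) ** math.log(2)))),
}


def approximation_errors():
    """Return the absolute error of each approximation relative to math.pi."""
    return {name: abs(value - math.pi) for name, value in approximations.items()}


for name, err in approximation_errors().items():
    print(f"{name:12s} value={approximations[name]:.10f} abs error={err:.2e}")
```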

However  (assuming you aren't Yasumasa Kanada and Daisuke Takahashi from the U of Tokyo with server cycles to spare)  another approach for calculating Pi is to _draw …

Implications of NCAA march madness brackets with multiplier scoring (a Stata example)

This year at our office we are again helping contribute to the great March Madness economic productivity drain. However, we are also toying with the idea of switching to a Fibonacci sequence for bracket scoring and using multiplier scoring to give riskier selections more weight (particularly in later rounds).

In prior years, we did this whole thing manually on paper, so to keep things simple we followed the standard ESPN/Yahoo points-per-round format (e.g., 1-2-4-8-16-32 points across the 6 rounds).
This year most people in our office seem supportive of a scoring scheme that incentivizes risk-taking.  So, there are essentially two changes to our scoring scheme.
First, we are changing to a 2-3-5-8-13-21 progression. With the more traditional 1-2-4-8-16-32 system, the championship game is worth 32x as much as any first-round game (which in effect makes the first-round games almost useless). With Fibonacci scoring, the last round game is worth …
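The two progressions can be compared with a few lines of arithmetic. This is a rough Python sketch, not the Stata code from the post: the helper names are hypothetical, the games-per-round counts assume a 64-team bracket (32, 16, 8, 4, 2, 1), and the multiplier part of the scheme is left out entirely because it is not spelled out above.

```python
STANDARD = [1, 2, 4, 8, 16, 32]     # points per correct pick, rounds 1-6
FIBONACCI = [2, 3, 5, 8, 13, 21]    # proposed progression, rounds 1-6
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # assumption: 64-team bracket


def final_to_first_ratio(points):
    """How many first-round games one championship game is worth."""
    return points[-1] / points[0]


def points_available(points):
    """Total points on offer in each round (points per game times games played)."""
    return [p * g for p, g in zip(points, GAMES_PER_ROUND)]


for label, scheme in (("standard", STANDARD), ("fibonacci", FIBONACCI)):
    per_round = points_available(scheme)
    print(label, "final/first ratio:", final_to_first_ratio(scheme))
    print(label, "points per round:", per_round, "total:", sum(per_round))
```

Under these assumptions the standard scheme puts the same number of points on offer in every round, while the Fibonacci progression shifts relatively more of the total toward the early rounds.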