Thursday, December 30, 2010

Finding your way around Stata

One of the things my students first get stuck on is how to find things (e.g. files, directories, variables with particular labels or notes) in Stata.
Stata has many commands for finding files and datasets, directories, help documentation, user-written commands (ado-files), variables, values, notes and chars, and so on. Some commands find only one of these things, some find several, and most of these things can be found by more than one command. It can be overwhelming, and I've found that students who fall behind early in a class using Stata often get stuck on finding these things, particularly directories and command ado/help files.

Of course, good use of a search engine is a key resource, but the table below gives an overview of the commands I use to find these things in Stata (this table can also be found in my Module 1 Lecture for PHPM 672). Undoubtedly, there are other commands that will do these tasks, but these are the ones that stuck with me after I started using them.
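
As a rough illustration of the kinds of commands involved, here is a short do-file sketch. It uses only built-in Stata commands and the auto dataset that ships with Stata; it is not a copy of the table, and the particular commands and search terms are my own picks for the example.

```stata
* Directories and files
pwd
dir
sysdir
findfile auto.dta

* Commands and their documentation
which tabulate
help tabulate
search blackjack

* Variables, labels, notes, and chars in the data in memory
sysuse auto, clear
describe
ds, has(type string)
notes
char list
lookfor mileage
```

-lookfor- searches both variable names and variable labels, which is why "mileage" turns up the variable mpg here.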

Sunday, December 26, 2010

Fun with Stata: Games for Stata Edition

Over at Mitch's "Stata Daily" blog, he describes a "hangman" game sent to him by Marek Hlavac.  I'm a sucker for non-standard uses of Stata (e.g., [1] [2] [3]), so I played with it for a while.  This also convinced me to make public one of my earliest attempts at writing a Stata ado-file/program:  -blackjack-.

The game is played by typing -blackjack- into the command window and then the game prompts the user for the amount she wants to bet (default is $500 which replenishes after you lose it all or you exit Stata), and whether to hit or stay.  It doesn't accurately represent all the rules and scenarios of a real game of blackjack (e.g., no doubling down), so don't use it to prep for your run at taking down a Vegas casino.

Fair warning: -blackjack- is visually quite ugly. The cards tend to misalign; I could have come up with a better design for face cards than a "{Stata}" center; and, because I was learning about Stata chars at the time, I used some ASCII symbols for suits instead of something simple like K, Q, J, A. I've also run into the occasional bug that I haven't taken the time to investigate and fix.
One thing I like about Hlavac's -hangman- is how he uses subprograms to define and display the stages of building the hangman.  I wish I had thought about this for displaying my cards; it probably would have saved a lot of copying/pasting of -if- blocks displaying the various card configurations.
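
To show what I mean by using a subprogram for display, here is a minimal sketch. It is not the code from -hangman- or -blackjack-; the program names -show_card- and -show_hand- and the "rank:suit" argument format are made up for this example.

```stata
* Subprogram: draw one card from a rank and a suit
capture program drop show_card
program define show_card
    args rank suit
    display as text "+-----+"
    display as text "|" as result "`rank'" as text _col(7) "|"
    display as text "|  " as result "`suit'" as text _col(7) "|"
    display as text "+-----+"
end

* Main program: loop over the cards and let the subprogram do the drawing
capture program drop show_hand
program define show_hand
    * each argument is a rank and a suit joined by a colon, e.g. K:S
    foreach card of local 0 {
        gettoken rank suit : card, parse(":")
        local suit : subinstr local suit ":" ""
        show_card `rank' `suit'
    }
end

show_hand K:S 10:H
```

The drawing logic lives in one place, so a change to the card layout only has to be made once rather than in every -if- block.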

Writing/tinkering with the ado-file for this game probably provided more amusement for me than actually playing it. It's a great mindless activity to do if you're doing some Stata coding and need a break.    Check out -blackjack- here.

At the Stata Daily blog, Nick J. Cox comments about some other Stata games/simulations/etc available at SSC:  -chaos- and -irrepro-. Also, I mention similar programs -dice-, -cards- (which I cannot get to work on Stata 11), and -heads- from UCLA's Stata page, see:
net install dice, ///
from( ///
replace all
All these are fun (and possibly instructive) programs for Stata.

Monday, December 20, 2010

Creating example datasets for collaboration with other Stata users

I'm lucky to be in a research environment where most of my colleagues and students use Stata.  Also, I regularly participate on Statalist.  Both of these have helped push me to periodically refine my habits when it comes to communicating about Stata.

When it comes to asking questions on Statalist, I've tried to stick closely to the Statalist FAQ and other tips mentioned by William Gould on the Stata NEC Blog.  However, for answering questions on Statalist, I find Maarten Buis's page on his Statalist postings especially helpful.

I've learned a lot from Maarten's FAQ about
(1) the types of questions that are not obvious to others on Statalist (and this tends to translate over to my students & colleagues as well) and
(2) ways to minimize this confusion by doing things as simple as creating clearly marked, self-contained working examples of code, or using comments to create a roadmap for the code in an example and to avoid issues with wrapping of code.

When it comes to creating clearly marked, self-contained examples for others, there are a couple of standard tools:
  • Using a canned Stata dataset for the example (as Maarten mentions)
  • Creating a fake dataset using a variety of -generate-, -replace-, or random data functions.  See my previous post about adding a random, fake string function (-ralpha-) to this set of tools.
  • Finally, if you cannot easily get the structure you need for an example from a canned or easily -generate-d dataset, you can always create a data example using -input-.
The idea behind -input- is that I can insert a working example into a do-file or Statalist posting that is self-contained.  Running the code below will -input- this data example into Stata's memory:

* start from an empty dataset so the example is self-contained
clear
input str14(state) pop str2(state2) divorce region marriage pop65p
"Alabama" 3893888 "AL" 26745 3 49018 440015
"Alaska" 401851 "AK" 3517 4 5361 11547
"Arizona" 2718215 "AZ" 19908 4 30223 307362
"Arkansas" 2286435 "AR" 15882 3 26513 312477
"California" 23667902 "CA" 133541 4 210864 2414250
"Georgia" 5463105 "GA" 34743 3 70638 516731
"Hawaii" 964691 "HI" 4438 4 11856 76150
"Idaho" 943935 "ID" 6596 4 13428 93680
"Illinois" 11426518 "IL" 50997 2 109823 1261885
"Indiana" 5490224 "IN" 40006 2 57853 585384
"Iowa" 2913808 "IA" 11854 2 27474 387584
"Kansas" 2363679 "KS" 13410 2 24847 306263
"Kentucky" 3660777 "KY" 16731 3 32727 409828
"Louisiana" 4205900 "LA" 18108 3 43460 404279
"Maine" 1124660 "ME" 6205 1 12040 140918
"Maryland" 4216975 "MD" 17494 3 46278 395609
"Massachusetts" 5737037 "MA" 17873 1 46273 726531
"Michigan" 9262078 "MI" 45047 2 86898 912258
end

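For the first two options in the list above (a canned dataset or a -generate-d one), a minimal sketch might look like the following. The variable names, the seed, and the model are arbitrary choices for illustration; -rnormal()- and -runiform()- assume a reasonably recent Stata.

```stata
* Option 1: a canned dataset that ships with Stata
sysuse auto, clear
summarize price mpg

* Option 2: a small fake dataset built with -generate-
clear
set seed 12345
set obs 100
generate id = _n
generate x = rnormal()
* integer group codes 1, 2, or 3
generate group = 1 + floor(3 * runiform())
generate y = 1 + 2*x + rnormal()
regress y x
```

Setting the seed means anyone who runs the example gets the same fake data, which makes it easier to compare results in a Statalist thread.
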
Sunday, December 19, 2010

Statistics Software Showdown: Google Ngram

Using Google's Ngram Viewer, here's the breakdown of Stata vs. SAS vs. SPSS.  

Stata didn't do as well as I hoped, but in taking a closer look there are at least a couple of reasons to be optimistic about Stata's prospects.
(1)  SAS is benefiting from lots of books written about the British Special Air Service (SAS).  
(2) As of yet, there doesn't appear to be a way to refine these searches with boolean search parameters.  If there were, we could have searched for "SAS -British" or "SPSS | PASW", etc.
(3) I couldn't find a way to search for the software 'R' using Ngram.
(4)  Stata seems to have as much, if not more, web presence / resources as the other software packages.  
Using a regular Google search:

"Stata" + statistical software  22.3 million pages
"SPSS" + statistical software  28.2 million pages
"SAS" + statistical software  17.6 million pages
"PASW" + statistical software  53K pages
"R" + statistical software  7.7 million pages

has Stata ahead of SPSS by a count of 35 million to 13 million pages  (besides leaving out the "statistical software" part, I'm not sure what else explains the difference)

(5)  Finally, while more books are being published in modern times (especially with the increasing output of the Stata Press), this graph at least shows an uptick in Stata presence in Google Books since 1990:

Other fun Google NGram Viewer comparisons include:

Probit v. Logit (is that a CDF?)

Sunday, December 12, 2010

Fun with Stata: Running Stata from your iPhone

There are literally tens of people out there in the world who have at some point or another thought "I really wish I could run something in Stata right now on my iPhone." Well, I recently killed some time making that a possibility.

In order to get results from Stata on your iPhone anywhere/anytime, this process requires 5 components:
(1) Stata (10 or later) installed on a Mac running OS X (10.5 or later) that is always connected to the internet
(2) A Dropbox account linked to your Mac that has Stata installed
(3) iStata.scpt Applescript file to manage files put in Dropbox
(4) iStata.ado to run the file, log the output, and put it back in Dropbox
(5) the free application Plaintext for iPhone (or some equivalent) to write and view .do files written and run from your iPhone

You'll need to save the iStata.scpt and iStata.ado files into the folders referenced in these scripts in your Dropbox folders on your Mac. Really, you can place these files/folders anywhere in your Dropbox that you'd like, but you need to change the paths in these scripts to point to the proper location.

The basic idea of the workflow is illustrated below: create the file "torun.txt" in the Plaintext app on your iPhone and save it.  The scripts do all the work and send back a new file called "Results.txt" containing the log of what you ran in Stata.
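
To make the Stata side of this concrete, here is a minimal sketch of the run-and-log step. It is not the actual iStata.ado: the program name -istata_sketch- and its folder argument are hypothetical, and the AppleScript piece that watches Dropbox and calls Stata is not shown.

```stata
capture program drop istata_sketch
program define istata_sketch
    * folder: hypothetical path to the Dropbox folder holding torun.txt
    args folder
    * close any leftover log from a previous run
    capture log close istata
    * log everything that follows to a plain-text Results.txt
    log using "`folder'/Results.txt", text replace name(istata)
    * run the commands; -capture noisily- keeps going to -log close-
    * even if torun.txt contains an error
    capture noisily do "`folder'/torun.txt"
    log close istata
end
```

Calling it with the folder that holds "torun.txt", for example -istata_sketch "~/Dropbox/iStata"- (a made-up path), would leave "Results.txt" in the same folder for Dropbox to sync back to the phone.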