Saturday, May 7, 2011

Unabbreviate Macro Lists in Stata

This Statalist thread from a few months ago, started by Nick Mosely, asked about working with hundreds of macros and eventually turned to the topic of expanding, or unabbreviating, macro lists (see -help unab- for the varlist version of this idea).  Based on my posts in that thread, I recently posted -mac_unab- to the SSC Archives to help with this problem.
-mac_unab- is still a bit of a kludge solution, but I haven't figured out a better approach (nor did anyone suggest a better approach).  The biggest issues with mac_unab, which I hope to find better solutions for, include:
1.  When you run mac_unab, it will print all the contents of the -macro list- command in the Results window.  This might be desirable for some, but I'd like to be able to toggle it on/off.  Currently, the way I've gathered the macros is via a log, so there's no way to avoid printing the -mac list- output each time -mac_unab- is run.
2.  Currently, the program will only match macros with the pattern stub*: you specify what the macro names begin with, followed by an asterisk to indicate that you want to match any characters after that prefix.  I'd like to expand those capabilities to match macros based on more complex matching rules like those in -help varlist-, such as *mymacro*, my?macro, my~macro, etc.  Regardless, the names of your macros will need to be systematic to take advantage of mac_unab, but I'd like to relax the formatting requirements needed to match macro names.
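For comparison, here is a minimal sketch of how official -unab- handles the varlist patterns mentioned in item 2. It is only an illustration of those matching rules: it uses the auto dataset that ships with Stata, it does not call -mac_unab-, and the local macro names (pre, one, abbr, mid) are arbitrary choices for this example.

```stata
// Illustration only: varlist wildcards with -unab- (run from a do-file)
sysuse auto, clear

// prefix match (stub*), the one pattern style described above for mac_unab
unab pre : m*

// ? stands for exactly one character
unab one : m?g

// ~ matches a single variable that starts with gear and ends with o
unab abbr : gear~o

// asterisks on both sides match any variable name containing an e
unab mid : *e*

display "`pre'"
display "`one'"
display "`abbr'"
display "`mid'"
```

Each call stores the expanded variable names in the local macro named before the colon, which is the behavior -mac_unab- aims to mirror for macro names.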
The syntax closely follows that of -unab- for unabbreviating varlists.
Here's an example:

//Create some data//
    . clear
    . set obs 10
    . g x = round(runiform()*100, .05)
    . g x2 = int(runiform()*100)
    . replace x = -2.5 in 1
//Convert Numbers to Text//
    . num2words x, g(x_converted)
    . num2words x, g(x_rounded) round
    . replace x_converted = proper(x_rounded)
    . num2words x, g(x2_ordinal) ordinal
//Use Converted Text in Graph//
    . egen mx = mean(x)
    . num2words mx, round
    . gr bar x , over(x2_ordinal, sort(1)) ///
        note( X for Obs 2 is `=x_rounded[2]') ///
        text(60 20 `"Mean = `=mx2'"', box )

Sunday, May 1, 2011

For computers, understanding natural language is sometimes hard ...

This paper by Chloé Kiddon and Yuriy Brun at U of Washington describes a Bayes classifier that can be used to find accidental double entendres or "potential innuendos" (called "That's what she said" or TWSS jokes) in sentences.  Here's the Ruby script to run this classifier to identify so-called "low brow comedy" (their words, not mine) in natural, human language.
Hopefully, this foreshadows the great things we can expect from our computers' auto-complete functionality in the near future. This article from Wired on detecting humor with computer software is also relevant. Andrew Gelman, a Bayesian scholar and co-author of the great zombie survey paper^, linked to this article on his blog after I mentioned it to him.

^ This paper contains a Technical Note, describing the authors' rationale for using LaTeX, that is one of my all-time favorite quotes: 
"We originally wrote this article in Word, but then we converted it to Latex to make it look more like science."