Showing posts from December, 2009

A Note on Posting Code in Blogger

As with a lot of free blogging sites, posting code on Blogger can be a pain.  Google Sites and WordPress have fixed the issue, but Blogger (also Google-owned) has not.  Many users have posted solutions that involve uploading code to your own server and downloading a "syntax highlighter" file, but you have to paste in the CSS tags before every code posting, and I'm not willing to mess with that solution (I'm hoping that Blogger will add this functionality soon).
So, for the purposes of this site, I am posting all code (Stata or otherwise) in a grey, fixed-width font between a starting and ending set of asterisks.  I'll also use row continuation flags (/* and */) to indicate text wrapping.  I copied and pasted the code from my previous posting into the Stata do-file editor, and it ran without issues.
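To make the convention concrete, here is a small sketch of what a posted snippet might look like. The closing "END CODE" marker is my assumption about the ending set of asterisks, and the commands are only placeholders that use Stata's built-in auto dataset.

```stata
*-----------------------BEGIN CODE
* load the example dataset that ships with Stata
sysuse auto, clear
* the /* at the end of a line and the */ at the start of the next
* comment out the line break, so Stata reads this as one command
twoway scatter mpg price, mlabel(make) /*
    */ || lfitci mpg price
*-----------------------END CODE
```

Because the continuation flags are ordinary Stata comments, a block like this should paste into the do-file editor and run without any editing.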

Automatically Generating Reports with Stata

At PPRI, we get a lot of data in waves, so we end up recreating various versions of the same report with updated data.  This becomes tiresome when you have to recreate, re-copy/paste, and reformat lots of tables and figures.  Thankfully, Roger Newson, from Imperial College London, has created a set of ado-files, called -rtfutil-, that help automate the insertion of these elements into an .rtf document.
Here's a snippet showing how I've adapted his example code to automatically generate some reports with new data:

*-----------------------BEGIN CODE
clear*
//RTF UTILITY FOR INSERTING GRAPHICS & TABLES//
local sf "`pwd'"
//SETUP
sysuse auto, clear
twoway scatter mpg price, mlabel(make) || lfitci mpg price
graph export "`sf'myplot1.eps", replace
twoway scatter price mpg, mlabel(make) by(for)
graph export "`sf'myplot2.eps", replace
**
tempname handle1
//RTFUTIL
rtfopen `handle1' using "`sf'mydoc1.rtf", replace
file…

Data Cleaning with Stata

During the course of my research, I come across a lot of messy data.

This includes everything from human errors like misspelled words, data entry mistakes, or inconsistent variable/value labels to machine/software issues like weird characters in the data, strange data shapes from external programs, or data embedded in something like HTML.

Stata has a lot of powerful tools to help automate data cleaning (one of my favorites is -filefilter-), but when all else fails, sometimes you've just got to sit down and clean the data by hand (or get a research assistant to do it for you).
For example, when you are trying to join data from multiple sources that label their records with similar but unsystematically different conventions, it can be tricky to get the data to link up.  A recent Statalist posting asked about this issue in particular (note: I'll try to tackle some of these other 'data cleaning' areas in future posts).
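As a rough illustration of this kind of mismatch, here is a minimal sketch of one common approach: build a normalized key in each dataset before merging. This is not necessarily what the Statalist thread settled on. The company names and the variables name_a, name_b, sales, employees, and key are all made up for the example, and the -merge 1:1- syntax assumes Stata 11 or later.

```stata
* --- source A: names with punctuation and extra internal spaces ---
clear
input str20 name_a sales
"Acme Corp." 100
"Beta  LLC"  250
end
* lowercase, trim the ends, and collapse repeated internal blanks
gen key = lower(itrim(trim(name_a)))
* drop periods so "corp." and "corp" match
replace key = subinstr(key, ".", "", .)
tempfile a
save `a'

* --- source B: same firms, different capitalization and spacing ---
clear
input str20 name_b employees
"  ACME corp" 12
"beta llc"    40
end
gen key = lower(itrim(trim(name_b)))
replace key = subinstr(key, ".", "", .)

* merge on the cleaned key and inspect what matched
merge 1:1 key using `a'
list name_a name_b key _merge
```

Rules like these only catch systematic differences such as case, spacing, and punctuation. Records that still show up unmatched in _merge are the ones that usually need to be fixed by hand.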

The question asked how to merge datasets w…