Sunday, February 21, 2010

Update: Encoding Variables and Labeling Values Consistently (and Efficiently) in Stata

On 2010-02-09,  I posted this example on strategies for labeling values for lots of variables efficiently in Stata.

Today I discovered a function in NJC's -egenmore- extension of -egen- that I think is easier to use and, in many cases, faster to implement than the combination of -labutil- and -multencode- that I suggested in my Feb 9 posting. To extend my previous example, here is how we could label those variables using -egen- and the function "ston()" (scroll to the bottom to see the UPDATED code):


*-------------------------------------------------BEGIN CODE
clear
**this first block will create a fake dataset, run it all together**
input str12 region regioncode str20 quest1 str20 quest2 str20 quest3
"Southwest" 1 "Strongly Agree" "Strongly Disagree" "Disagree"
"West" 2 "Agree" "Neutral" "Agree"
"North" 3 "Disagree" "Disagree" "Strongly Disagree"
"Northwest" 5 "Disagree" "Agree" "Strongly Agree"
"East" 4 "Strongly Disagree" "Strongly Agree" "Agree"
"South" 9 "Neutral" "Agree" "Agreee"
end
//1. Create labeled REGION variable
/*
If we -encode- region, the codes would not line up with regioncode
because -encode- assigns codes in alphabetical order, for example:
*/
ssc install fre   //<-- -fre- is user-written (from SSC)
encode region, gen(region2) label(region2)
fre region2   //<-- these values don't match regioncode
drop region2


/* 
INSTEAD, we use -labmask- to quickly attach the strings in 
region as value labels for the regioncodes
*/
ssc install labutil
labmask regioncode, values(region)
fre regioncode


//2. Creating comparable survey question scales
/*
We want all the survey questions to be on the same scale 
so that we can compare them in a model or table.
-encode- can help us here with quest1 and quest2 because they
have the same categories, but quest3 has different categories 
(it's missing "Neutral" and "Agree" is misspelled in one row), so we could
either (1) use -replace- to define the numeric categories for these 
survey questions and then label them with -label define- and
-label values-, or (2) use -multencode- after fixing the misspelled 
"Agree" value in quest3 
*/
replace quest3 = "Agree" if quest3=="Agreee"
**
ssc install multencode
multencode quest1-quest3, gen(e_quest1-e_quest3)
label li
fre e_*
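/* 
The categories are labeled properly, but the scale isn't in
order--we want it to increase in agreement as it moves from
1 to 5. -labvalch- changes only the value label, not the data,
so we -recode- the variables first to keep values and labels in sync
*/
recode e_quest1-e_quest3 (1=4) (4=5) (5=1)
 //-labvalch- is also from -labutil-
labvalch quest1, f(1 2 3 4 5) t(4 2 3 5 1)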
label li
fre e_*
***UPDATE***
**using egenmore & the "ston()" function:
ssc install egenmore
forval n = 1/3 {
 egen ee_quest`n' = ston(quest`n'), to(1/5) /*
 */ from("Strongly Disagree" Disagree Neutral Agree "Strongly Agree")
 label val ee_quest`n' quest1
 **note: val label "quest1" already defined, if not, 
 **you'll need to define the value labels
 }
li quest1 ee_quest1 
*-------------------------------------------------END CODE
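
The note inside the loop says you'll need to define the value labels yourself if "quest1" doesn't already exist. Here is a minimal, self-contained sketch of that case, assuming -egenmore- is installed and that you run it from a do-file. The label name "agree5" and the n_ prefix are just names I made up for this illustration:

```stata
* Sketch: define the value label explicitly, then map strings with ston()
* Assumes -egenmore- is already installed (ssc install egenmore)
clear
input str20 quest1 str20 quest2
"Strongly Agree" "Strongly Disagree"
"Agree" "Neutral"
"Disagree" "Disagree"
end

* "agree5" is a hypothetical label name chosen for this sketch
label define agree5 1 "Strongly Disagree" 2 "Disagree" 3 "Neutral" 4 "Agree" 5 "Strongly Agree"

foreach v of varlist quest1 quest2 {
    * map each string to its position on the 1-5 scale
    egen n_`v' = ston(`v'), to(1/5) ///
        from("Strongly Disagree" Disagree Neutral Agree "Strongly Agree")
    * attach the shared value label to the new numeric variable
    label values n_`v' agree5
}
list quest1 n_quest1 quest2 n_quest2
```

Because the codes and the label are both written out in the same order, the numeric values and their labels can't drift apart the way they can when you reorder a label after the fact.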

Thursday, February 18, 2010

Merging international / cross-national data using Stata

Giulia Catini, Ugo Panizza, & Carol Saade have published "Macro Data 4 Stata", which provides the codes to link up many popular international and cross-national economic datasets.
This is similar to a Stata ado-file on SSC called -kountry-, which cross-links other international political science or economic datasets. Taken together, these provide a great resource for working with international datasets.

Monday, February 15, 2010

"Aero Snap" with OSX

A colleague's new favorite critique of the Mac is that it doesn't have the "aero snap" feature that MS has been touting in their recent ads (where you can drag a window to the left or right of the screen and it will become a half window on that side of the screen). I admit this feature does look handy (in lieu of simply dragging the corner to resize the window)...here's a freeware app that will do what aero snap does on the Mac (and it will do it for the top half of the screen and the bottom half, in addition to the snap left and snap right features in Win7): TwoUp

The only downside (kind of) is that it operates on keyboard shortcuts. There is a paid version of this software, called Cinch, that will let you simply drag the window for the snap feature (at $7 it's probably a bargain, but I like to stick to freeware apps).

I like using shortcuts, and my hands are at the keyboard more than the mouse, so this works for me, but it might not work for others. (To get mouse gestures like aero snap, you could use this with BTT.) Speaking of BTT, another great app that I'm using a lot is called "secondbar", which puts a second menu bar on an extra monitor so that you don't have to go back to the main screen to click the menus.

Tuesday, February 9, 2010

Encoding Variables and Labeling Values Consistently (and Efficiently) in Stata

In PHPM 672 this week, we are working with -label define-, -label variable-, and -label values- to label data. You can use these (in combination with commands like -recode- and -replace-) to get a consistently labeled dataset, but there are three utilities that can get you there more efficiently: -encode-, -multencode- (from SSC), and a suite of utilities in -labutil- (from SSC).
The goal is to get to a common value assignment across all similar variables (example: many variables that use the same categorical scale of responses). You can simply use -encode- if you are sure that the string variables you are using to build categorical variables with value labels all have the same categories, all spelled exactly the same way. If not, you will need to use -multencode- to deal with missing categories and -replace- or -subinstr()- to correct any spelling deviations. Finally, once you've labeled the categorical variables consistently, you want to put them in an order that makes sense (e.g., lowest to highest) using -labvalch-.

Here's an example of this process with a sample dataset. Copy and paste this snippet into a do-file & run it to see how this process works:


*-------------------------------------------------BEGIN CODE
clear
**this first block will create a fake dataset, run it all together**
input str12 region regioncode str20 quest1 str20 quest2 str20 quest3
"Southwest" 1 "Strongly Agree" "Strongly Disagree" "Disagree"
"West" 2 "Agree" "Neutral" "Agree"
"North" 3 "Disagree" "Disagree" "Strongly Disagree"
"Northwest" 5 "Disagree" "Agree" "Strongly Agree"
"East" 4 "Strongly Disagree" "Strongly Agree" "Agree"
"South" 9 "Neutral" "Agree" "Agreee"
end
//1. Create labeled REGION variable
/*
If we -encode- region, the codes would not line up with regioncode
because -encode- assigns codes in alphabetical order, for example:
*/
ssc install fre   //<-- -fre- is user-written (from SSC)
encode region, gen(region2) label(region2)
fre region2   //<-- these values don't match regioncode
drop region2


/* 
INSTEAD, we use -labmask- to quickly attach the strings in 
region as value labels for the regioncodes
*/
ssc install labutil
labmask regioncode, values(region)
fre regioncode


//2. Creating comparable survey question scales
/*
We want all the survey questions to be on the same scale 
so that we can compare them in a model or table.
-encode- can help us here with quest1 and quest2 because they
have the same categories, but quest3 has different categories 
(it's missing "Neutral" and "Agree" is misspelled in one row), so we could
either (1) use -replace- to define the numeric categories for these 
survey questions and then label them with -label define- and
-label values-, or (2) use -multencode- after fixing the misspelled 
"Agree" value in quest3 
*/
replace quest3 = "Agree" if quest3=="Agreee"
**
ssc install multencode
multencode quest1-quest3, gen(e_quest1-e_quest3)
label li
fre e_*
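/* 
The categories are labeled properly, but the scale isn't in
order--we want it to increase in agreement as it moves from
1 to 5. -labvalch- changes only the value label, not the data,
so we -recode- the variables first to keep values and labels in sync
*/
recode e_quest1-e_quest3 (1=4) (4=5) (5=1)
 //-labvalch- is also from -labutil-
labvalch quest1, f(1 2 3 4 5) t(4 2 3 5 1)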
label li
fre e_*
*-------------------------------------------------END CODE
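
For comparison, option (1) from the comment in step 2 (the manual route with -replace-, -label define-, and -label values-) might look something like the sketch below for a single variable. It uses only built-in Stata commands; the names "m_quest3" and "agree5" are hypothetical and chosen just for this illustration:

```stata
* Sketch of the manual alternative: build the numeric scale by hand
clear
input str20 quest3
"Disagree"
"Agree"
"Strongly Disagree"
"Strongly Agree"
"Agree"
"Agreee"
end

* fix the misspelled category first
replace quest3 = "Agree" if quest3 == "Agreee"

* assign numeric codes by hand, one category at a time
gen byte m_quest3 = .
replace m_quest3 = 1 if quest3 == "Strongly Disagree"
replace m_quest3 = 2 if quest3 == "Disagree"
replace m_quest3 = 3 if quest3 == "Neutral"
replace m_quest3 = 4 if quest3 == "Agree"
replace m_quest3 = 5 if quest3 == "Strongly Agree"

* define the label once and attach it ("agree5" is a made-up name)
label define agree5 1 "Strongly Disagree" 2 "Disagree" 3 "Neutral" 4 "Agree" 5 "Strongly Agree"
label values m_quest3 agree5
tabulate m_quest3, missing
```

This works fine for one variable, but you would have to repeat the -replace- lines for every question, which is why the SSC utilities above are more efficient when there are many variables.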



Sunday, February 7, 2010

Quick Links

1. This study demonstrates how reducing information/effort obstacles for the college application process increases the college attendance rate.
At PPRI we're seeing some of this same pattern in Texas high schools where we've helped evaluate one high school reform grant or another.  Campus counselors or other staff are sitting down with HS seniors to help them fill out FAFSA and other application materials, and many of them think that this is helping to improve their college attendance rates.  However, the problem here is knowing what really happens when that senior exits high school & if they do start college, what their likelihood is of actually completing a degree.

2. Yet another reason to learn Bayesian Modeling / Analysis 

3. PhD Comics takedown of news media poll reporting

4. Is the spurious regression problem spurious? [1] [2]

3D Mac Desktop via "bumptop"

I like the way bumptop organizes file stacks (similar to "stacks" in the dock, but more flexible)--though the jury is still out about whether it will improve my productivity or not.

Below is my 3D desktop using bumptop. The stacks of photos are there for illustration, but the video at the bumptop webpage shows how you can sort and scrub through stacks of documents/photos.  Notice how you can post links, pending documents, and sticky notes on the 4 walls (the back/bottom wall is not visible unless you navigate to it).  You can also switch to 2D mode on the fly.


Thursday, February 4, 2010