Sunday, February 21, 2010

Update: Encoding Variables and Labeling Values Consistently (and Efficiently) in Stata

On 2010-02-09, I posted this example of strategies for efficiently labeling the values of many variables in Stata.

Today I discovered a function in NJC's -egenmore- extension of -egen- that I think is easier to use and, in many cases, faster to implement than the combination of -labutil- and -multencode- that I suggested in my Feb 9 posting. To extend my previous example, here is how we could label those variables using -egen- and the function "ston()" (scroll to the bottom to see the UPDATED code):

*-------------------------------------------------BEGIN CODE
**this first block will create a fake dataset, run it all together**
clear
input str12 region regioncode str20 quest1 str20 quest2 str20 quest3
"Southwest" 1 "Strongly Agree" "Strongly Disagree" "Disagree"
"West" 2 "Agree" "Neutral" "Agree"
"North" 3 "Disagree" "Disagree" "Strongly Disagree"
"Northwest" 5 "Disagree" "Agree" "Strongly Agree"
"East" 4 "Strongly Disagree" "Strongly Agree" "Agree"
"South" 9 "Neutral" "Agree" "Agreee"
end
//1. Create labeled REGION variable
**If we -encode- region, the codes will not line up with regioncode
**because -encode- numbers the categories in alphabetical order. For example:
encode region, gen(region2) label(region2)
**-fre- is a user-written command (ssc install fre)
fre region2   //<-- these values don't match regioncode
drop region2

**INSTEAD, we use -labmask- (from -labutil-) to quickly attach the strings
**in region as value labels for the matching regioncode values
ssc install labutil
labmask regioncode, values(region)
fre regioncode

//2. Creating comparable survey question scales
**We want all the survey questions to be on the same scale
**so that we can compare them in a model or table.
**-encode- can help us here with quest1 and quest2 because they
**have the same categories, but quest3 has different categories
**(it's missing "Neutral" and "Agree" is misspelled). So we could
**either (1) use -replace- to define the numeric categories for these
**survey questions and then relabel them with -label define- and
**-label values-, or (2) use -multencode- after fixing the misspelled
**"Agree" value in quest3:
replace quest3 = "Agree" if quest3=="Agreee"
ssc install multencode
multencode quest1-quest3, gen(e_quest1-e_quest3)
label li
fre e_*
**The categories are labeled properly, but the scale isn't in
**order; we want it to increase in agreement as it moves from
**1 to 5
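//-labvalch- is also from -labutil-
**-labvalch- only renumbers the value label, so recode the data to match
recode e_quest1-e_quest3 (1=4) (4=5) (5=1)
labvalch quest1, f(1 2 3 4 5) t(4 2 3 5 1)
label li
fre e_*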
**using egenmore & the "ston()" function:
ssc install egenmore
forval n = 1/3 {
 egen ee_quest`n' = ston(quest`n'), to(1/5) /*
 */ from("Strongly Disagree" Disagree Neutral Agree "Strongly Agree")
 label val ee_quest`n' quest1
 **note: val label "quest1" is already defined; if it were not,
 **you would need to define the value labels first
}
li quest1 ee_quest1
*-------------------------------------------------END CODE
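
The note inside the loop above says that the value labels have to be defined yourself if they do not already exist. Here is a minimal sketch of that case, run on a small stand-alone dataset. The label name "agree5" and the variable names n_quest1, n_quest2 and check1 are made up for this illustration, and the sketch assumes -egenmore- is already installed:

```stata
* Minimal sketch: define the 1-5 scale by hand, then map strings with ston()
* Assumes -egenmore- is installed (ssc install egenmore)
clear
input str20 quest1 str20 quest2
"Strongly Agree" "Strongly Disagree"
"Agree" "Neutral"
"Disagree" "Disagree"
end

* "agree5" is a hypothetical label name chosen for this sketch
label define agree5 1 "Strongly Disagree" 2 "Disagree" 3 "Neutral" /*
 */ 4 "Agree" 5 "Strongly Agree"

foreach v of varlist quest1 quest2 {
 * map each string category to its position on the 1-5 scale
 egen n_`v' = ston(`v'), to(1/5) /*
 */ from("Strongly Disagree" Disagree Neutral Agree "Strongly Agree")
 label values n_`v' agree5
}

* decode the new variable and confirm it matches the original strings
decode n_quest1, gen(check1)
assert check1 == quest1
li quest1 n_quest1 n_quest2
```

Because the order of the strings in from() sets the numeric order, the scale comes out in the intended direction without a separate renumbering step.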

