In PHPM 672 this week, we are working with -label define-, -label variable-, and -label values- to label data. You can use these (in combination with commands like -recode- and -replace-) to get a consistently labeled dataset, but three utilities can get you there more efficiently: -encode-, -multencode- (from SSC), and the suite of utilities in -labutil- (from SSC).
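For reference, here is a minimal sketch of the manual approach with the three -label- commands. The variable name satis, the label name satis_lbl, and the three categories are made up for illustration only.

```stata
clear
* Hypothetical toy variable, already stored as numbers
input byte satis
1
2
3
end

* 1. Define a value label (a mapping from numbers to text)
label define satis_lbl 1 "Low" 2 "Medium" 3 "High"
* 2. Describe the variable itself
label variable satis "Satisfaction (hypothetical example)"
* 3. Attach the value label to the variable
label values satis satis_lbl

tabulate satis
```

This works well for one or two variables, but every mapping has to be typed by hand, which is where the utilities above save time.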
The goal is a common value assignment across all similar variables (for example, many variables that share the same categorical response scale). You can simply use -encode- if you are sure that the string variables you are converting all have the same categories, spelled exactly the same way. If not, you will need -multencode- to deal with missing categories, and -replace- or -subinstr()- to correct any spelling deviations. Finally, once the categorical variables are labeled consistently, you will want to put the categories in an order that makes sense (e.g., lowest to highest), using -recode- for the stored values and -labvalch- for the labels.
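To see why the categories have to match, here is a small sketch with two made-up string variables (q1 and q2 are hypothetical names). Because -encode- numbers the categories alphabetically within each variable, a category missing from one variable shifts the codes for the rest.

```stata
clear
input str20 q1 str20 q2
"Agree"    "Agree"
"Disagree" "Neutral"
"Neutral"  "Neutral"
end

* q1 contains Agree, Disagree, Neutral; q2 contains only Agree and Neutral
encode q1, generate(n1)
encode q2, generate(n2)

* By default each new variable gets its own value label with the same name
label list n1 n2

* "Neutral" is stored as 3 in n1 but as 2 in n2
list q1 n1 q2 n2, nolabel
```

The two numeric variables look fine when tabulated one at a time, but their underlying codes cannot be compared, which is the problem -multencode- is meant to avoid.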
Here's an example of this process with a sample dataset. Copy and paste this snippet into a do-file and run it to see how the process works:
*-------------------------------------------------BEGIN CODE
clear

** This first block creates a fake dataset; run it all together **
input str12 region regioncode str20 quest1 str20 quest2 str20 quest3
"Southwest" 1 "Strongly Agree"    "Strongly Disagree" "Disagree"
"West"      2 "Agree"             "Neutral"           "Agree"
"North"     3 "Disagree"          "Disagree"          "Strongly Disagree"
"Northwest" 5 "Disagree"          "Agree"             "Strongly Agree"
"East"      4 "Strongly Disagree" "Strongly Agree"    "Agree"
"South"     9 "Neutral"           "Agree"             "Agreee"
end

//1. Create labeled REGION variable
/* If we -encode- region, the codes would not line up with regioncode
   because -encode- assigns codes in alphabetical order. For example: */
** ssc install fre
encode region, gen(region2) label(region2)
fre region2   //<-- these values don't match regioncode
drop region2

/* INSTEAD, we use -labmask- to quickly assign the values in region
   as the labels for regioncode */
ssc install labutil
labmask regioncode, values(region)
fre regioncode

//2. Create comparable survey question scales
/* We want all the survey questions to be on the same scale so that we
   can compare them in a model or table. -encode- can help us here with
   quest1 and quest2 because they have the same categories, but quest3
   has different categories (it's missing "Neutral", and "Agree" is
   misspelled in one row). So we could either
   (1) use -replace- to define the numeric categories for these survey
       questions and then relabel them with -label define- and
       -label values-, or
   (2) use -multencode- after fixing the misspelled "Agree" value in
       quest3 */
replace quest3 = "Agree" if quest3=="Agreee"
** ssc install multencode
multencode quest1-quest3, gen(e_quest1-e_quest3)
label li
fre e_*

/* The categories are labeled properly, but the scale isn't in order;
   we want agreement to increase as it moves from 1 to 5.
   -labvalch- (also from -labutil-) changes only the value label, not
   the stored values, so we -recode- the variables with the same mapping
   first. The value label is named quest1 here; check the -label li-
   output above to confirm the name on your system. */
recode e_quest1-e_quest3 (1=4) (4=5) (5=1)
labvalch quest1, f(1 2 3 4 5) t(4 2 3 5 1)
label li
fre e_*
*-------------------------------------------------END CODE
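If you already know the order you want, a different route (not the one used in the snippet above) is to define the ordered value label yourself and have the built-in -encode- reuse it through its label() option. The sketch below is one way to do that under made-up assumptions: the label name agree5 and the o_ prefix are hypothetical, and the toy data contain only correctly spelled categories.

```stata
clear
input str20 quest1 str20 quest3
"Strongly Agree"    "Disagree"
"Neutral"           "Agree"
"Strongly Disagree" "Strongly Agree"
end

* Define the scale once, in the order we want (1 = lowest agreement)
label define agree5 1 "Strongly Disagree" 2 "Disagree" 3 "Neutral" ///
                    4 "Agree" 5 "Strongly Agree"

* Encode every question against the same predefined label.
* noextend tells -encode- not to add new categories to agree5.
foreach v of varlist quest1 quest3 {
    encode `v', generate(o_`v') label(agree5) noextend
}

label list agree5
list, nolabel
```

With noextend, a value that is not in the label (such as the misspelled "Agreee") should make -encode- stop with an error instead of quietly creating a sixth category, so it also works as a spelling check. The trade-off is that you have to type the full scale yourself.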