Monday, March 28, 2011

Some -statplot- examples, Part 2 (wrapping long labels)

...continued from Part 1...
Part 1 of this post covered some advanced examples of -statplot-, focusing on combinations of the over() and by() options.
In Part 2, I examine some strategies for using -statplot- with really long variable and/or value labels. Recently, while using -statplot- to create some graphs for a paper in which some of the labels had to be the (longish) question and answer-choice text, I discovered how much of a pain long labels can be in graphs. This is a problem for any graph in Stata, whether the labels are in the legend or on the axis; however, I prefer long labels (up to a limit) on the axis.
So, the examples below show how to use -statplot- options to create wrapped labels. I hope to add an option to -statplot- that does this at some point, but for now, the code below is a good template for automating label wrapping. The same approach can be extended to other graph commands.
Continuing from the last post, we're using the built-in "nlsw88" dataset. Let's look at plots with long variable labels first, and then at long value labels (which are a bit more complicated).
Note: Please make sure you update -statplot- to the latest version, since an earlier version of the program fails when suboptions contain double quotes, as they do in the examples below.
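If you installed -statplot- from SSC, a minimal way to get the current version is to reinstall it with the replace option. This sketch assumes your copy came from SSC and that Stata can reach the internet.

```stata
* reinstall -statplot- from SSC, overwriting the older copy
ssc install statplot, replace

* confirm which ado-file Stata will now use
which statplot
```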


1. Wrapping Long Variable Labels
Figure 9 (below) shows what happens when we have really long variable labels for grade, tenure, and wage.
*********************************begin
sysuse nlsw88, clear
**varlabels**
lab var grade "Really Long Variable Label for the Variable GRADE that will cutoff at 80 chars"
lab var tenure "Another Really Long Var Label for the Variable TENURE that will cutoff at 80 chars"
lab var wage "Long Variable Label, this time for the Variable WAGE that will cutoff at 80 chars"
d grade tenure wage
****************!beginFig9
statplot grade tenure wage,  ///
    tit("Long Variable Labels", size(small))
****************!endFig9
*********************************end

Fig. 9
Obviously, this is not ideal. The most straightforward way to wrap these labels is the relabel() suboption of over(), which -statplot- accepts through varopts(). So, let's do this for the first variable's (grade) label only in Fig. 10:
***********************!beginFig10
statplot grade tenure wage,  ///
    tit("Wrapping Long Variable Labels - Manually", size(small)) ///
    varopts( relabel(1 `" "Really Long Variable Label" "for the Variable GRADE" "that will cutoff at 80 chars" "'))
***********************!endFig10
Fig. 10
This is better. However, relabeling variables by hand is a pain: you have to count characters and insert double quotes so that each line of every label has roughly equal length. Automating the process (1) saves time and (2) lets us use the extended macro function piece (see -help extended_functions-) to create lines of roughly equal length without breaking in the middle of words.
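To see what piece does on its own, here is a small sketch using the long grade label from Fig. 9. Each call returns one line of at most 25 characters, broken between words; asking for a piece number past the end of the string should give an empty string, which is what makes a "keep going until empty" loop possible.

```stata
sysuse nlsw88, clear
label variable grade "Really Long Variable Label for the Variable GRADE that will cutoff at 80 chars"

local varlab : variable label grade
* first and second lines of at most 25 characters each
local p1 : piece 1 25 of `"`varlab'"'
local p2 : piece 2 25 of `"`varlab'"'
display `"piece 1: [`p1']"'
display `"piece 2: [`p2']"'

* a piece number beyond the end of the label returns nothing
local p9 : piece 9 25 of `"`varlab'"'
display `"piece 9: [`p9']"'
```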
The loop below is a bit long, but it wraps variable labels automatically. Change the local `ll' to the maximum line length you prefer. Here is the code for Fig. 11:
***********************!beginFig11
loc vars grade tenure wage
loc ll 25 //sets the maximum length of each line of the label
loc relabel "" //collects the relabel() argument for all variables
loc j = 1
foreach u of local vars {
    loc label ""
    loc len = length(`"`:var l `u''"')
    //number of lines to request from -piece-
    loc pieces = int(`len'/`ll') + 1
    di `"`len' :: `pieces'"'
    if `pieces' == 1 loc label `"`:var l `u''"'
    if `pieces' != 1 {
        forval n = 1/`pieces' {
            //-piece- returns line n of the label, breaking between words
            loc label `"`label' `"`:piece `n' `ll' of `"`:var l `u''"''"'"'
        } //n
    }
    loc relabel `"`relabel' `j' `"`label'"'"'
    loc j = `j' + 1
}

di `"`relabel'"'

****statplot
statplot `vars',  ///
    name(g1, replace) ///
    varopts( relabel( `relabel' ))
statplot `vars',  ///
    name(g2, replace)  over(race)   ///
    varopts(label(labsize(vsmall)) relabel(`relabel') ) bargap( 5 )
gr combine g1 g2, title("Wrapping Long Variable Labels - Automatically")
graph export "fig11.png", as(png) replace
***********************!endFig11
Fig. 11
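One caveat about the Fig. 11 loop: it asks -piece- for int(len/ll)+1 lines, but because -piece- breaks between words, lines are often shorter than ll, so in principle that estimate can come up short and the last words of a label get dropped. A minimal sketch that avoids estimating the count is to keep requesting lines until -piece- returns an empty string. The program name wraplabel and its text() and width() options are hypothetical, made up for this example; they are not part of -statplot-.

```stata
capture program drop wraplabel
program define wraplabel, rclass
    * hypothetical helper (not part of -statplot-): split text into quoted
    * lines of at most width() characters, ready for relabel()
    syntax , Text(string) Width(integer)
    local out ""
    local n = 1
    local chunk : piece `n' `width' of `"`text'"'
    * keep requesting lines until -piece- returns an empty string, so the
    * number of lines never has to be estimated in advance
    while `"`chunk'"' != "" {
        local out `"`out' `"`chunk'"'"'
        local ++n
        local chunk : piece `n' `width' of `"`text'"'
    }
    return local wrapped `"`out'"'
    return scalar lines = `n' - 1
end

* usage: build the relabel() argument for the variables in Fig. 11
sysuse nlsw88, clear
label variable grade "Really Long Variable Label for the Variable GRADE that will cutoff at 80 chars"
local relabel ""
local j = 1
foreach u in grade tenure wage {
    local varlab : variable label `u'
    wraplabel, text(`"`varlab'"') width(25)
    local relabel `"`relabel' `j' `"`r(wrapped)'"'"'
    local ++j
}
display `"`relabel'"'
```

The resulting `relabel' can go into varopts(relabel()) exactly as in Fig. 11.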


Nick Cox wrote a good alternative to the code above as one of the examples in the -statplot- help file (-help statplot-). He uses -separate- with the veryshortlabel option to create short variable labels automatically before plotting the data:
**********from -statplot- help file
sysuse nlsw88, clear

 statplot wage, over(race) over(union)
 separate wage, by(race) veryshortlabel
 statplot wage?, over(union)
**********
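A quick way to see what this help-file example hands to -statplot- is to describe the variables that -separate- creates. This sketch assumes the default naming, in which the suffixes are the values of race (1, 2, and 3 in nlsw88); with veryshortlabel, their variable labels should reduce to little more than the race categories.

```stata
sysuse nlsw88, clear
* one new variable per category of race: wage1, wage2, wage3
separate wage, by(race) veryshortlabel
* these short variable labels are what -statplot- puts on the axis
describe wage1 wage2 wage3
```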


2. Wrapping Long Value Labels
Value labels can be much longer than variable labels (up to 32,000 vs. 80 characters; see -help limits-). As above, we can wrap the value labels of the over() variable (race) with the relabel() suboption, as in Fig. 12:
*********************************begin
lab def racelbl 1 "White:  Really Long label for Race goes here" ///
2 "Black:  Another really long label " ///
3 "Other:  Another really long label", modify
lab val race racelbl
**************!beginFig12
statplot grade tenure wage,  ///
    name(g2, replace) tit("Long VALUE Labels") ///
    over(race, relabel(1 `" "White:  Really Long" "label for Race goes here" "')) ///
    varopts( relabel( `relabel' ))
**************!endFig12
********************************end
Fig. 12
However, it's a lot more efficient (and fun) to write code that automates the wrapping of long value labels for us.
The code below wraps long value labels, though it might be worth adding code to also truncate ridiculously long value labels at some point. The loop is similar to the one for wrapping variable labels, but it adds a step that uses -levelsof- to detect the values of the over() variable.
***********************!beginFig13
loc ll 20 //sets the maximum length of each line of the value labels
loc relabelinglabels ""
loc j = 1
foreach u in race {
    //name of the value label attached to the variable
    loc z : value label `u'
    di "`z'"
    **detect levels of each var**
    levelsof `u', loc(`u'levels)
    foreach uu in ``u'levels' {
        loc vallab : label `z' `uu'
        loc wrapped ""
        loc n = 1
        loc chunk : piece `n' `ll' of `"`vallab'"', nobreak
        //request lines until -piece- returns nothing: no words are dropped,
        //and labels shorter than ll stay on one line instead of going blank
        while `"`chunk'"' != "" {
            loc wrapped `"`wrapped' `"`chunk'"'"'
            loc ++n
            loc chunk : piece `n' `ll' of `"`vallab'"', nobreak
        } //while
        loc relabelinglabels `"`relabelinglabels' `j' `"`wrapped'"'"'
        loc ++j
    } //levelsof
} //foreach
di `"`relabelinglabels'"'
****statplot
statplot grade tenure wage,  ///
    name(g2, replace) tit("Long VALUE Labels") ///
    over(race, relabel( `relabelinglabels' )) ///
    varopts( relabel( `relabel' ))
***********************!endFig13
Fig. 13
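As mentioned before Fig. 13, it may also be worth truncating ridiculously long value labels. A minimal sketch, assuming a cap of two lines per label (the local maxlines and the ellipsis line are my own choices, not -statplot- options): keep at most maxlines lines from -piece- and add a "..." line whenever text remains.

```stata
sysuse nlsw88, clear
label define racelbl 1 "White:  Really Long label for Race goes here" ///
    2 "Black:  Another really long label " ///
    3 "Other:  Another really long label", modify
label values race racelbl

local ll 20        // maximum characters per line
local maxlines 2   // assumed cap on the number of lines per label
local ntrunc 0     // counts how many labels were truncated
local relabelinglabels ""
local z : value label race
levelsof race, local(levels)
local j = 1
foreach v of local levels {
    local vallab : label `z' `v'
    local wrapped ""
    forvalues n = 1/`maxlines' {
        local chunk : piece `n' `ll' of `"`vallab'"'
        if `"`chunk'"' != "" local wrapped `"`wrapped' `"`chunk'"'"'
    }
    // if -piece- still has text after the last kept line, add an ellipsis line
    local rest : piece `=`maxlines' + 1' `ll' of `"`vallab'"'
    if `"`rest'"' != "" {
        local wrapped `"`wrapped' `"..."'"'
        local ++ntrunc
    }
    local relabelinglabels `"`relabelinglabels' `j' `"`wrapped'"'"'
    local ++j
}
display `"`relabelinglabels'"'
display "labels truncated: `ntrunc'"
```

The result plugs into over(race, relabel()) the same way as in Fig. 13.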

There's a lot packed into this graph, which somewhat defeats the purpose of a graph, but hopefully you won't need to pack this much into the nested labels of your own -statplot- graphs. You could also add bar gaps and SMCL formatting to these labels to make them stand apart a bit more. Finally, you may not want your value and variable labels changed permanently, so you can put -preserve- and -restore- around the code above to make these changes temporary.
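A minimal sketch of the preserve/restore idea: relabel inside a -preserve- block, draw the graphs, and -restore- puts the original variable and value labels back. The graph step is left as a comment here.

```stata
sysuse nlsw88, clear
local varbefore : variable label grade
local valbefore : label (race) 1

preserve
// temporary changes: these last only until -restore-
label variable grade "Really Long Variable Label for the Variable GRADE that will cutoff at 80 chars"
label define racelbl 1 "White:  Really Long label for Race goes here", modify
label values race racelbl
// ...build the relabel() macros and draw the -statplot- graphs here...
restore

local varafter : variable label grade
local valafter : label (race) 1
display `"grade label after restore: `varafter'"'
display `"race = 1 label after restore: `valafter'"'
```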

As always, watch for line-wrapping issues when copying the code snippets above. Click to download the do-file for these figures.