
Happy holidays - I made you this Stata tree...








Two Stata-related problems I've been working on involve building woven, literate documents with -markstat- (from SSC) and post-processing Stata graphs with gr_edit commands.

gr_edit commands are not documented, but they are helpful for adding elements across many graphs that you'd otherwise have to add manually (or through a graph recording). The biggest downside to gr_edit commands (or to applying graph recordings) is that the process is slow, which becomes a real problem when you have hundreds or thousands of graphs to edit.

In the code example below, I made you a holiday tree with 100k lights. I know, this is very thoughtful of me. You are welcome.

Update (Dec 25): First, here's a new version of a holiday tree using -tw scatter- and -tw bar-. I like this version much better than the approach in the original post. The earlier attempts using -scatteri- and gr_edit are still included below:

**************!


clear
set obs 2000
g z = 10
g obs = _n      // row position of each horizontal bar
g zero = 0      // vertical position of the star (just above the top row)

* bar widths: random jitter + a sawtooth that restarts every 500 rows (mod)
* + a step that grows every 100 rows (round), which stacks the canopy into tiers
g a = runiform()*10
replace a = a+mod(obs, 500)+round(obs, 100)

* mirror each bar to the left of zero, and flip the rows so the widest tier is at the bottom
g b = a
replace b = -b
replace obs = -obs

* flags (lights, lights2) and x positions (lights3-lights8) for the lights
g lights  = 1 if runiform()<.33
g lights2 = 1 if runiform()<.13
g lights3 = a + round(obs, runiform()*10000)
g lights4 = b - round(obs, runiform()*10000)
g lights5 = a + round(obs, runiform()*10000)
g lights6 = b - round(obs, runiform()*20000)
g lights7 = a + round(obs, runiform()*10000)
g lights8 = b - round(obs, runiform()*10000)

* trunk: append 496 rows below the tree, each holding a fixed-width bar
g zzz = .
g zzz2 = .
forval n = 5(1)500 {
    set obs `=_N+1'
    replace zzz  = 150 in l
    replace zzz2 = -150 in l
    replace obs = obs[_n-1]-1 in l
}

* branches (bars), lights (scatter), trunk (bars), and star (diamond markers)
tw (bar a obs, horiz color(forest_green%66)) ///
   (bar b obs, horiz color(forest_green%66)) ///
   (scatter obs a if lights==1, color(red%20)) ///
   (scatter obs b if lights==1, color(red%2)) ///
   (scatter obs a if lights2==1, color(gold%20)) ///
   (scatter obs b if lights2==1, color(gold%2)) ///
   (bar zzz obs, horiz color(brown)) ///
   (bar zzz2 obs, horiz color(brown)) ///
   (scatter obs lights3, color(gold%20)) ///
   (scatter obs lights4, color(gold%5)) ///
   (scatter obs lights5, color(red%20)) ///
   (scatter obs lights6, color(red%25)) ///
   (scatter obs lights7 if runiform()<.01, msize(large) msym(circle_hollow) color(red%43)) ///
   (scatter obs lights8 if runiform()<.01, msize(large) msym(circle_hollow) color(blue%33)) ///
   (scatter zero obs in 1, msangle(forty_five) msize(huge) mcolor(gold) msym(Dh)) ///
   (scatter zero obs in 1, msangle(stdarrow) msize(huge) mcolor(gold) msym(Dh)) ///
   (scatter zero obs in 1, msangle(two_seventy) msize(vlarge) mcolor(gold%50) msym(D)) ///
   (scatter zero obs in 1, msize(vhuge) mcolor(gold%60) msym(D)) ///
   , yscale(off) xscale(off) legend(off)
**************!
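The core idea in the bar version above is a pair of mirrored horizontal bars in every row. One bar extends right by some width, the same width negated extends left, and mod() makes the widths restart so the canopy stacks into tiers. Here is a stripped-down sketch of just that idea. It drops the lights, trunk, and star, and the tier size (100 rows) and width formula are hypothetical values chosen only for illustration:

```stata
clear
set obs 300

* Row position of each bar; negative so row 1 sits at the top of the graph
generate row = -_n

* Half-width of the tree in each row: mod() restarts the width every
* 100 rows (one tier), and floor() makes each lower tier a bit wider
generate half = mod(_n, 100)/10 + floor(_n/100)*2

* Mirror image of the same width, drawn to the left of zero
generate neghalf = -half

twoway (bar half row, horizontal color(forest_green%66)) ///
       (bar neghalf row, horizontal color(forest_green%66)), ///
       yscale(off) xscale(off) legend(off)
```

Swapping the fixed widths for the post's runiform() jitter and larger mod()/round() steps produces the ragged, multi-tier look of the full version.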

When I was trying to make a tree, I first tried doing this with -scatteri- and failed (unless my goal was to create a sad, lopsided tree that Charlie Brown would envy), but I left that attempt in the post below. You can see that the problem is with how I'm creating random points that should stay inside the triangular boundary. Rather than try (too hard) to figure out what was going wrong, I moved on to the helpful -triplot- (from SSC, Nick Cox) triangular plotting environment to make my holiday tree a little more tree-shaped. Here’s the Stata code and toy example, woven to HTML with the new -markstat- (from SSC). First, pick some starting values, plot the starting values for the triangle, and then calculate the other vertices of the triangle:
. clear all    

. loc a = 2

. loc b = 1

. loc c = 7
. tw (scatteri `a' `b' `a' `c', mcolor(forest_green) msymbol(triangle))

. graph export firstscatter.png, replace as(png) width(300)
(file firstscatter.png written in PNG format)

. graph export firstscatter.pdf, replace as(pdf)
(file /Users/ebooth/Desktop/firstscatter.pdf written in PDF format)
Simple plot of triangle coordinates
. *****************************************    
. //calculate top point
. loc flatten = `a'*.33 // flatten top factor

. loc xmid = (`a'+`c')/2
. loc ytop = sqrt((`a'-`c')^2 - (`xmid'-`a')^2+`flatten')

. tw (scatteri `a' `b' `a' `c' `a' `xmid' `xmid' `ytop', mcolor(forest_green) msize(large) msymbol(triangle))

. graph export secondscatter.png, replace as(png) width(300)
(file secondscatter.png written in PNG format)

. graph export secondscatter.pdf, replace as(pdf)
(file /Users/ebooth/Desktop/secondscatter.pdf written in PDF format)
Simple plot of triangle vertices
. *forlater*
. loc amid = (`a'+`ytop')/2

. loc bmid = (`ytop'+`c')/2
. *****************************************    
. //shift main triangles
. loc set `"recast(connected) lcolor(forest_green) lwidth(thin) lpattern(tight_dot)  cmissing(y) mcolor(forest_green) msize(vsmall) msymbol(triangle_hollow)"'
. forval sh = 2(2)10  {
  2. loc a`sh' = `a' + (2*.`sh')  
  3. loc b`sh' = `b' + (.`sh'+.3) 
  4. loc c`sh' = `c' - (.`sh'+.30)
  5. loc xmid`sh' = (`a`sh''+`c`sh'')/3 -  mod(.`sh',2)
  6. loc ytop`sh' = sqrt((`a`sh''-`c`sh'')^2 - (`xmid`sh''-`a`sh'')^2+`flatten') + (.`sh'*5)
  7. if `ytop`sh''>`ytop' loc ytop`sh' = `ytop' //set lim
  8. loc incl`sh' `" (scatteri `a`sh'' `b`sh'' `a`sh'' `c`sh'', `set')  (scatteri `a`sh'' `b`sh'' `xmid`sh'' `ytop`sh'', `set') (scatteri `xmid`sh'' `ytop`sh'' `a`sh'' `c`sh'', `set') "'
  9. loc inn `"`inn' `incl`sh'' "'
 10. }
. *****************************************    
Create some fake data for scattering the triangles and lights on the tree:
. **scatter
. set obs 10000
number of observations (_N) was 0, now 10,000
. g x = `a'+runiform()*9

. g y = `a'+runiform()*9

. g inside = 0

. replace inside = 1 if inrange(x, `b', `ytop'-.25) & inrange(y, `a', `bmid'-.25)
(930 real changes made)

. replace inside = 0 if !inrange(x, `a', `amid'-.25) & inside==1 & !inrange(y, `amid'+.25, `xmid'+.25)
(313 real changes made)
. g red = 1 if runiform()<.33 & inside==1
(9,774 missing values generated)

. g blue = 1 if runiform()<.23 & inside==1 & red!=1
(9,918 missing values generated)

. g tree = 1 if runiform()<.33 & inside==1 & !inlist(1, red , blue)
(9,897 missing values generated)
. //plot
. loc set2 `"recast(connected) lcolor(forest_green) lwidth(vthick) lpattern(tight_dot)  cmissing(y) mcolor(forest_green) msize(large) msymbol(triangle)"'
. tw ///
> (scatter x y if inside==1 & red==1, mcolor(red%5) msym(circle) ) ///
> (scatter x y if inside==1 & blue==1, mcolor(gold%10) msym(circle) ) ///
> (scatter x y if inside==1 & tree==1, mcolor(forest_green%50) msymbol(smtriangle_hollow) ) ///
> (scatteri `a' `b' `a' `c', `set2') ///
> (scatteri `a' `b' `xmid' `ytop', `set2') ///
> (scatteri `xmid' `ytop' `a' `c', `set2') ///
> `inn' /// 
> (scatteri 1 3.75 1 4.75, recast(connected) lcolor(brown) mcolor(brown) msy(square) lwidth(vthick) lpattern(solid) ) ///
> (scatteri 1 3.75 2 3.75 , recast(connected) lcolor(brown) mcolor(brown) msy(square) lwidth(vthick) lpattern(solid) ) ///
> (scatteri 1 4.75 2 4.75 , recast(connected) lcolor(brown) mcolor(brown) msy(square) lwidth(vthick) lpattern(solid) ) ///
> , yscale(range(0 5) off) xscale(range(0 6) off) ylabel(, nogrid) legend(off) ///
> title("{bf:Happy Holidays!}" "{bf:2017}", size(large) just(right) pos(1) color(red%33))
. graph export treescatter.png, replace as(png) width(300)
(file treescatter.png written in PNG format)

. graph export treescatter.pdf, replace as(pdf)
(file /Users/ebooth/Desktop/treescatter.pdf written in PDF format)
Scatterplot with random, somewhat triangular data
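One reason the spread above is uneven is that the inrange() conditions are axis-aligned. They can only cut rectangular pieces out of the square of random points, never the slanted sides of a triangle. A standard technique, which is not what this post uses, is to draw two uniforms u and v, reflect any pair with u + v > 1, and map the result onto the triangle's three vertices. This gives points spread uniformly inside the triangle. Below is a minimal sketch, assuming a hypothetical triangle with base corners (1, 2) and (7, 2) and apex (4, 6) in (x, y) coordinates:

```stata
clear
set seed 2017
set obs 5000

* Hypothetical triangle vertices in (x, y): base left, base right, apex
local x1 = 1
local y1 = 2
local x2 = 7
local y2 = 2
local x3 = 4
local y3 = 6

* Two independent uniform draws per point
generate double u = runiform()
generate double v = runiform()

* Pairs with u + v > 1 land in the other half of the parallelogram
* spanned by the two edges; reflecting them folds them back inside
generate byte flip = (u + v > 1)
replace u = 1 - u if flip
replace v = 1 - v if flip

* Map (u, v) onto the triangle: P = V1 + u*(V2 - V1) + v*(V3 - V1)
generate double x = `x1' + u*(`x2' - `x1') + v*(`x3' - `x1')
generate double y = `y1' + u*(`y2' - `y1') + v*(`y3' - `y1')

* Points plus the triangle outline (scatteri takes y x pairs)
twoway (scatter y x, msymbol(point) mcolor(forest_green%40)) ///
       (scatteri `y1' `x1' `y2' `x2' `y3' `x3' `y1' `x1', recast(line) lcolor(forest_green)), ///
       legend(off) aspectratio(1)
```

For a symmetric tree, the apex sits above the middle of the base, so its x coordinate is the average of the base corners' x values ((1 + 7)/2 = 4 here). In the post's first -scatteri- call, the pairs are y then x, so the base runs from x = b to x = c at height y = a. The corresponding midpoint would therefore be (b + c)/2.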
This looks like garbage (not too far from how it looks in real life when I decorate a tree). Mostly this is because I don’t know how to spread the random numbers evenly inside the triangle's bounds. So next I try again with triplot (from SSC). I get a little closer here. The result is heavily skewed toward the bottom-left vertex (or 0,0), but it’s good enough to get the point across:
. *****************************************    
. **triplot tree**
. discard

. clear all

. set obs 100000
number of observations (_N) was 0, now 100,000
. g Happy = runiform()*100

. g Holidays = runiform()*100

. g y2017 = runiform()*110

. replace Happy = 40+runiform()*50 if Happy<50 & Holidays<50
(24,993 real changes made)

. replace Holidays = 40+runiform()*50 if Happy<50 & Holidays<50
(5,053 real changes made)

. replace y2017 = 60+runiform()*30 if Happy<50 & Holidays<50
(1,034 real changes made)

. replace y2017 = runiform()*55  if Happy+Holidays+y2017>100
(94,482 real changes made)

. replace Happy = runiform()*50 if Happy+Holidays+y2017>100
(85,599 real changes made)

. replace Holidays = runiform()*50 if Happy+Holidays+y2017>100
(52,360 real changes made)

. lab var Happy "{bf:HAPPY}"

. lab var Holidays "{bf:HOLIDAYS!}"

. lab var y2017  "{bf:2017}"

. g x = 90+runiform()*10

. recode y2017 (0/60 = `=x')  //adjust
(y2017: 100000 changes made)

. g color = cond(runiform()<.33, 2, 1)

. replace color = 0 if runiform()<.25
(24,926 real changes made)

. fre color

color
──────────────┬────────────────────────────────────────────
              │      Freq.    Percent      Valid       Cum.
──────────────┼────────────────────────────────────────────
Valid   0     │      24926      24.93      24.93      24.93
        1     │      50293      50.29      50.29      75.22
        2     │      24781      24.78      24.78     100.00
        Total │     100000     100.00     100.00           
──────────────┴────────────────────────────────────────────
. su 

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
       Happy │    100,000    28.11309    18.01648   .0007678   97.25909
    Holidays │    100,000    28.35786    18.56113   .0000872   97.60745
       y2017 │    100,000    92.95022           0   92.95022   92.95022
           x │    100,000    94.98757    2.879511   90.00002   99.99966
       color │    100,000      .99855    .7050339          0          2
. triplot  Happy Holidays y2017, separate(color) max(100) legend(pos(2) ring(0) col(1)) ms(Th point point) mcolor(forest_green%40 red%80 gold%73)  msize(large medlarge medlarge) legend(off)  label(nolabels)  y( lcolor(forest_green%90) lpat(tight_dot)) frame(lpat(tight_dot) lwidth(thick) lcolor(forest_green)) text(mlabsize(medlarge) mlabcolor(green%76))
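triplot expects three variables that sum to a constant, which is set with max(). To spread points evenly over the whole triangle instead of piling them toward one vertex, a common approach is to normalize three independent exponential draws so each row sums to 100. This gives a uniform (flat Dirichlet) draw on the triangle. This is not the approach used in this post. The sketch below assumes triplot can be installed from SSC and reuses the post's variable names purely for illustration:

```stata
* Requires triplot (Nick Cox, SSC); install it if it is missing
capture which triplot
if _rc ssc install triplot

clear
set seed 2017
set obs 10000

* Three independent exponential(1) draws per observation
forvalues k = 1/3 {
    generate double e`k' = -ln(runiform())
}

* Dividing by the row total places each point uniformly on the simplex
* (a flat Dirichlet(1,1,1) draw); multiplying by 100 matches max(100)
generate double total    = e1 + e2 + e3
generate double Happy    = 100 * e1 / total
generate double Holidays = 100 * e2 / total
generate double y2017    = 100 * e3 / total

triplot Happy Holidays y2017, max(100) msymbol(point) mcolor(forest_green%40)
```

Because every row sums to exactly 100 (up to rounding), each point lands inside the triplot triangle, and no vertex is favored.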

As far as I can tell, -triplot- has no addplot() option. So one reason I'm posting this is to show how to hack around that with the (undocumented) gr_edit commands to draw the tree trunk. The GIF is included to show how Stata (painfully) redraws the graph each time it performs one of these manual edits to the triplot figure:
. *****************************************    
.  //manual graph edit hack//
.  gr_edit   .plotregion1.AddLine added_lines editor .426560596115965 -.0141328237294942 .426560596115965 -.0961003663966345

.  gr_edit   .plotregion1.added_lines_new = 1

.  gr_edit   .plotregion1.added_lines_rec = 1

.  gr_edit   .plotregion1.added_lines[1].style.editstyle  linestyle( width(thin) color(black) pattern(solid) align(inside)) headstyle( symbol(circle) linestyle( width(thin) color(black) pattern(solid) align(inside)) fillcolor(black) size(medium) angle(stdarrow) symangle(zero) backsymbol(none) backline( width(thin) color(black) pattern(solid) align(inside)) backcolor(black) backsize(zero) backangle(stdarrow) backsymangle(zero)) headpos(neither) editcopy

.  gr_edit   .plotregion1.AddLine added_lines editor .5717903884510243 -.0120836351628156 .5717903884510243 -.0940511778299559

.  gr_edit   .plotregion1.added_lines_new = 2

.  gr_edit   .plotregion1.added_lines_rec = 2

.  gr_edit   .plotregion1.added_lines[2].style.editstyle  linestyle( width(thin) color(black) pattern(solid) align(inside)) headstyle( symbol(circle) linestyle( width(thin) color(black) pattern(solid) align(inside)) fillcolor(black) size(medium) angle(stdarrow) symangle(zero) backsymbol(none) backline( width(thin) color(black) pattern(solid) align(inside)) backcolor(black) backsize(zero) backangle(stdarrow) backsymangle(zero)) headpos(neither) editcopy

.  gr_edit   .plotregion1.added_lines[2].style.editstyle linestyle(color(brown)) editcopy

.  gr_edit   .plotregion1.added_lines[2].style.editstyle linestyle(width(vthick)) editcopy

.  gr_edit   .plotregion1.added_lines[2].style.editstyle linestyle(pattern(solid)) editcopy

.  gr_edit   .plotregion1.added_lines[1].style.editstyle linestyle(color(brown)) editcopy

.  gr_edit   .plotregion1.added_lines[1].style.editstyle linestyle(width(vthick)) editcopy

.  gr_edit   .plotregion1.added_lines[1].style.editstyle linestyle(pattern(solid)) editcopy
. graph export treeplot.png, replace as(png) width(500)
(file treeplot.png written in PNG format)

. graph export treeplot.pdf, replace as(pdf)     
(file /Users/ebooth/Desktop/treeplot.pdf written in PDF format)
A random forest starts with a single tree...
. *****************************************    
It’s not exactly what I was going for, but it’s close enough to move on to something else. (Wacław Sierpiński might approve of those inner triangles, though. Also, this tree was obviously decorated with bias… by a short, left-handed person who likes the 0,0 vertex.) Happy Holidays - see you next year.
GIF showing how Stata redraws the graph to add the trunk
