
Why ( •_•)>⌐■-■ / (⌐■_■) when you can 😐➡️😎 ? Using Emoji (and other unicode chars) in Stata

The other day I was importing my email and text messages into Stata on my Mac, straight from the .db files they are stored in, so that I could analyze how often I text/email certain people. I've never been able to get odbc to work in Stata on my Mac, so I cheated with sqlite3 and exported each table to csv with:

  
foreach tbl in chat chat_msg_join message handle chat_handle_join {
    * each ! line runs in its own shell, so do the whole export in one sqlite3 call:
    * -header adds column names, -csv sets csv output, > redirects to the file
    ! sqlite3 -header -csv ~/Library/messages/chat.db "SELECT * FROM `tbl';" > /users/ebooth/`tbl'.csv
}

to get my texts.  Merging tables from chat.db is another (very long) story.  


I opened these csv files in Excel to inspect them and noticed that the Emoji from the texts were missing (I had assumed they'd be converted into some sort of unicode or hex equivalent). But when I imported the messages into Stata, I was surprised to see something like the picture to the right: the Emoji showed up in Stata, in both the results window and the data browser!
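
For reference, the import step might look roughly like the sketch below. It assumes the csv path from the export loop above, that the file is UTF-8 encoded, and that the message table has a column named text; check the variable names after importing, since none of that is shown in the original session.

```stata
* Sketch: read one exported table back into Stata with the Emoji intact.
* Assumptions: the path matches the export loop, the file is UTF-8,
* and the message table has a column called text.
import delimited using "/users/ebooth/message.csv", ///
    varnames(1) encoding("UTF-8") bindquotes(strict) clear

* strlen() counts bytes, ustrlen() counts Unicode characters,
* so the two differ for any message that contains Emoji
generate nbytes = strlen(text)
generate nchars = ustrlen(text)
list text nbytes nchars in 1/5
```

The bindquotes(strict) option is there because text messages can contain line breaks inside a quoted field.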

Emoji are nearly 20 years old this year (!) (they were first created by Shigetaka Kurita at NTT DoCoMo while working on the i-mode mobile Internet platform) and I'm finally getting around to using them in texts/emails. Hence the recent phenomenon of them popping up while (entertaining my weird obsession with) using Stata to analyze my own e-communication (as well as fitness tracking, work productivity, etc.) behaviors.


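From what I understand, Stata's ability to show/use Emoji is a Mac OSX-specific feature (Mac OSX understands Emoji unicode, other OSs do not), though Stata 14 and later store strings as UTF-8 on every platform, so this may come down to font support rather than Stata itself. Either way, this is still potentially useful (to me), particularly for inserting symbols into graphs.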

The examples below explore some uses (and limitations) of Emoji on the Mac version of Stata.


First, let’s create a dataset with Emoji.  I live in Texas and we produce a lot of oil, so let's try to make Stata output some data in the shape of Texas using the Emoji oil pump (note the string reshape using `sxpose` to get this to work):

. set scheme plottig

. clear

. set obs 2
number of observations (_N) was 0, now 2

. g x = "💭"

. g t0  = `" ⛽️⛽️⛽️                     "'

. g t1  = `" ⛽️⛽️⛽️                     "'

. g t2  = `" ⛽️⛽️⛽️⛽️                   "'

. g t3  = `"   ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽           "'

. g t4  = `"⛽⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽️      "'

. g t5  = `"  ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽     "'

. g t6  = `"  ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽   "'

. g t7  = `"  ⛽️ ⛽️⛽️⭐️⛽️⛽️⛽          "'   

. g t8  = `"    ⛽️⛽️⛽️⛽️⛽            "'

. g t9  = `"    ⛽️⛽️⛽               "'

. g t10  = `"    ⛽️⛽                "'

. g t11  = `"     ⛽               "'

. sxpose, clear
number of observations (_N) was 2, now 13

. l  , noobs clean nohead 
                                           💭                                          💭  
             ⛽️⛽️⛽️                                  ⛽️⛽️⛽️                       
             ⛽️⛽️⛽️                                  ⛽️⛽️⛽️                       
             ⛽️⛽️⛽️⛽️                                ⛽️⛽️⛽️⛽️                     
            ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽                      ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽             
    ⛽⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽️         ⛽⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽️        
       ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽           ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽       
        ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽          ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽     
               ⛽️ ⛽️⛽️⭐️⛽️⛽️⛽                        ⛽️ ⛽️⛽️⭐️⛽️⛽️⛽            
                  ⛽️⛽️⛽️⛽️⛽                            ⛽️⛽️⛽️⛽️⛽              
                    ⛽️⛽️⛽                                 ⛽️⛽️⛽                 
                      ⛽️⛽                                   ⛽️⛽                  
                        ⛽                                     ⛽               

This looks pretty good in the -markstat- output above, but here's the screenshot of the Texas Emoji-art in the results window, do-file editor, and browser.  

Not too shabby ( click on the image to embiggen ):








At first I thought that maybe there were extended ASCII character equivalents for Emoji that Stata could understand, so I tried using -charlist- to inspect variables that contained Emoji. This fails in the sense that -charlist- reports values like 141, 150, and 166 as the characters in the variable, but those values don't show up as anything readable when you -display- them with char() in Stata.


. clear

. set obs 10
number of observations (_N) was 0, now 10

. cap which charlist

. if _rc ssc install charlist

. g t = "🍷🍺🍖🍸🍸🍺🍸🍺🍦🍸"

. charlist t
��������

. foreach a in chars sepchars ascii {
  2.     di `"`r(`a')'"' //inspect
  3.     }
��������
� � � � � � � � 
141 150 159 166 183 184 186 240 

So, this doesn't work for characters above the regular ASCII range of values:

. di `"`=char(166)'"'
�

. di `"`=char(186)'"'
�

. g x = `"`=char(141)'"'

. l x

     ┌───┐
     │ x │
     ├───┤
  1. │ � │
  2. │ � │
  3. │ � │
  4. │ � │
  5. │ � │
     ├───┤
  6. │ � │
  7. │ � │
  8. │ � │
  9. │ � │
 10. │ � │
     └───┘

. forval n = 120/160 {
  2.     di as err `"`n' :: "' in green `"`=char(`n')'"'
  3.     }
120 :: x
121 :: y
122 :: z
123 :: {
124 :: |
125 :: }
126 :: ~
127 :: 
128 :: �
129 :: �
130 :: �
131 :: �
132 :: �
133 :: �
134 :: �
135 :: �
136 :: �
137 :: �
138 :: �
139 :: �
140 :: �
141 :: �
142 :: �
143 :: �
144 :: �
145 :: �
146 :: �
147 :: �
148 :: �
149 :: �
150 :: �
151 :: �
152 :: �
153 :: �
154 :: �
155 :: �
156 :: �
157 :: �
158 :: �
159 :: �
160 :: �
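
A likely explanation: Stata 14 and later store strings as UTF-8, where each Emoji is a single character made up of four bytes. -charlist- is reporting those individual bytes (240 159 141 183 is the wine glass), and a lone byte above 127 is not a valid character on its own, hence the �. Here is a minimal sketch using Stata's Unicode string functions; it is an illustration added here, not part of the session above.

```stata
* Sketch: one Emoji is one Unicode character but four UTF-8 bytes.
local wine = uchar(127863)        // U+1F377 (wine glass), built from its decimal code point
display "`wine'"
display strlen("`wine'")          // length in bytes: 4
display ustrlen("`wine'")         // length in Unicode characters: 1
display tobytes("`wine'")         // lists the individual byte values

* char() returns a single byte, so the four bytes have to be put back together;
* this should reassemble the same Emoji rather than four � symbols
display char(240) + char(159) + char(141) + char(183)
```

So char() is still useful for plain ASCII, but for anything above 127 the u-prefixed functions (uchar(), ustrlen(), usubstr(), and friends) are the ones to reach for.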


Ok - fine. Emoji are probably most useful in graphs anyways. 


You can use the function ustrunescape() to directly interpret Unicode escape sequences, similar to using char() to interpret ASCII characters. You can get more Unicode code points for Emoji and other symbols from http://www.unicode.org/charts/PDF/U1F600.pdf

. di `" `=ustrunescape("\u263A")'    "'    //works
 ☺    
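
One caveat, sketched here on the assumption that you're on Stata 14 or later: \u takes exactly four hex digits, so it only reaches characters up to U+FFFF. The Emoji in the chart linked above sit beyond that range and need the eight-digit \U form, or uchar() with the decimal code point.

```stata
* Sketch: two escape forms plus the code-point route.
display ustrunescape("\u263A")        // four hex digits: white smiling face
display ustrunescape("\U0001F600")    // eight hex digits: grinning face, U+1F600
display uchar(128512)                 // same character; 128512 is 0x1F600 in decimal
```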

I’ve used unicode in graphs before, and here’s an example from this Statalist post where I discussed some ideas about plotting large unicode characters (braces) on graphs to help highlight portions of the data: https://www.statalist.org/forums/forum/general-stata-discussion/general/1425537-how-to-add-braces-to-graphs-in-stata

. sysuse auto, clear
(1978 Automobile Data)
. su price, d

.  g x1 = price if price == `r(p10)'
(73 missing values generated)

. g x2 = price if price == `r(p95)'
(73 missing values generated)

. g y1 = "}"  if !mi(x1)|!mi(x2)
(72 missing values generated)

Plot this with:
. twoway (scatter price mpg)  ///
> (scatter x1 mpg, msymbol(point) mlabel(y1) mlabsize(vhuge) mlabcolor(blue) mlabposition(0)) ///
> (scatter x2 mpg, msymbol(point) mlabel(y1) mlabsize(vhuge) mlabcolor(green) mlabgap(4) mlabposition(0)) ///
> , text(4200 37  "}" , size(vhuge) color(red) ) ///
> text( 10000 27  "`=ustrunescape("\u23AB")'" "`=ustrunescape("\u23AC")'"  "`=ustrunescape("\u23AD")'" , size(vhuge) color(purple)) ///
> text( 1500 18 "`=ustrunescape("\u2B45")' No Data here!", size(large) color(ply1)) ///
> text( 500 18 `"     `=ustrunescape("\u2B45")' Here's some info"', size(large) color(ply1)) ///
> text( 5300 34.5 `"  Median is: `r(p50)'"'  , size(medlarge) color(orange))  ///
> text( 4800 30 "`=ustrunescape("\u21AF")'"  , size(huge) color(orange))    legend(off)

.  gr export ex.png, replace
(file ex.png written in PNG format)

Which produces:


In my day-job, I work with school data (particularly in Texas), so in the next example, I import some Texas Education Agency data of school locations (from the GIS directory site) and then plot schools.  

. clear

. cap rm addr.csv

. copy https://opendata.arcgis.com/datasets/059432fd0dcb4a208974c235e837c94f_0.csv addr.csv, replace public

. insheet using addr.csv
(35 vars, 8,701 obs)

. g nn = _n

. global coord `" xlab(-107(2)-92) ylab(26(2)36) "'

.  scatter y X, mcolor(gs5%20) msize(large) ${coord}

.  gr export one.png, replace
(file one.png written in PNG format)
. tw ///
> ( scatter y X if mi(chart) & magnet == "No", mcolor(gs3%40) msym(Oh)) ///
> ( scatter y X if !mi(chart),   mlabsize(tiny) mcolor(red%70) ) ///
> ( scatter y X if magnet=="Yes",   mlabsize(tiny) mcolor(ebblue%80) ) ///
> ,   ${coord} legend(order(1 "Regular schools" 2 "Charter" 3 "Magnet") )

.   gr export two.png, replace
(file two.png written in PNG format)

This first scatterplot creates an overview of the location of schools in Texas (I don't go through the extra effort to import the data that would enable us to use -grmap- to produce the shape/polygons for Texas here; instead I just use the X Y coordinates of schools):




In the second graph above, I differentiate charter and magnet schools (using Stata 15's new transparency feature).  As you can see from this example, charter and magnet schools are mostly in the metro areas:


I think this looks great  ( •_•)>⌐■-■ / (⌐■_■)  
But let's add some Emoji to the map to see how else we might present this information.  Importantly, I tried adding transparency to the Emoji to allow for the overlay and it didn't work as I had hoped. 


. g mlab = "📘" if magnet == "Yes"
(8,438 missing values generated)

. g clab3 = "🚌" if  index(chart, "OPEN")
(8,051 missing values generated)

. g clab2 = "🎓" if  index(chart, "COLLEGE")
(8,669 missing values generated)

. g clab1 = "πŸ“" if  index(chart, "CAMPUS")
(8,624 missing values generated)

. ta chart, g(c_)

               CHART_TYPE │      Freq.     Percent        Cum.
──────────────────────────┼───────────────────────────────────
           CAMPUS CHARTER │         77       10.14       10.14
COLLEGE/UNIVERSITY CHARTE │         32        4.22       14.36
  OPEN ENROLLMENT CHARTER │        650       85.64      100.00
──────────────────────────┼───────────────────────────────────
                    Total │        759      100.00

Plot this with:
. tw ///
> ( scatter y X if mi(chart) & magnet == "No", mcolor(gs3%40) msym(Oh) msize(vsmall)) ///
> ( scatter y X if c_2==1,   mlabsize(tiny) msymbol(none) mcolor(none) mlab(clab2) mlabsize(large) mlabcolor(%10) ) ///
> ( scatter y X if c_1==1,     mlabsize(tiny) msymbol(none) mcolor(none) mlab(clab1) mlabsize(large) mlabcolor(%10)) ///
> ( scatter y X if magnet=="Yes",     mlabsize(tiny) msymbol(none) mcolor(none) mlab(mlab) mlabsize(large) mlabcolor(%10) ) ///
> ,   ${coord} legend(order(1 " •   Regular schools" 2 "🎓 College charter" 3 "🚌 Open charter" 4 "📘 Magnet")  symxsize(0) forcesize 
> )

.   gr export three.png, replace
(file three.png written in PNG format)



This produces the figure below, where you can see the locations of the different types of schools (edit: I was in a hurry and left the open charter schools out of the graph code ((but it's still in the legend)), but you get the idea). The lesson here is that Emoji definitely work in Stata graphs and can be useful, but I'd reserve them for sparse or non-overlapping data; presenting info on discrete categories (like Likert data) would perhaps be most useful.


Notice that I got rid of the default keys/symbols in the legend and embedded the symbols by using: legend(order(1 " • Regular schools" 2 "🎓 College charter" 3 "🚌 Open charter" 4 "📘 Magnet") symxsize(0) forcesize)
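
For completeness, here is a rough sketch of the same plot with the open charter layer put back in. It assumes c_3 is the OPEN ENROLLMENT CHARTER dummy created by -ta chart, g(c_)- (the third category in the table above) and that clab1-clab3, mlab, and ${coord} are defined as in the session; the mlabposition(0) option and the local macro are additions for illustration, and this is untested against the actual data.

```stata
* Sketch only: adds a layer for open-enrollment charters (c_3 / clab3).
* Layers: 1 regular, 2 college, 3 open, 4 campus, 5 magnet,
* so the legend keys are renumbered to match.
local lab msymbol(none) mlabsize(large) mlabposition(0)
twoway ///
    (scatter y X if mi(chart) & magnet == "No", mcolor(gs3%40) msym(Oh) msize(vsmall)) ///
    (scatter y X if c_2 == 1, `lab' mlabel(clab2)) ///
    (scatter y X if c_3 == 1, `lab' mlabel(clab3)) ///
    (scatter y X if c_1 == 1, `lab' mlabel(clab1)) ///
    (scatter y X if magnet == "Yes", `lab' mlabel(mlab)) ///
    , ${coord} legend(order(1 " •   Regular schools" 2 "🎓 College charter" ///
    3 "🚌 Open charter" 5 "📘 Magnet") symxsize(0) forcesize)
```

With 650 open charter schools the bus Emoji will overlap heavily in the metro areas, which is the same sparse-data caveat as above.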









