The other day I was importing my email and text messages into Stata on my Mac, straight from the .db files they are stored in, so that I could analyze how often I text/email certain people. I've never been able to get odbc to work in Stata on my Mac, so I cheated with sqlite3 and imported with:
foreach tbl in message {
    // "message" is only an example table name; list whichever chat.db tables you need
    // (to see them, run: ! sqlite3 ~/Library/messages/chat.db .tables)
    ! sqlite3 -header -csv ~/Library/messages/chat.db "SELECT * FROM `tbl';" > /users/ebooth/`tbl'.csv
}
to get my texts. Merging tables from chat.db is another (very long) story.
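For anyone who would rather skip the sqlite3 command line, here is a minimal Python sketch of the same export step, using only the standard library. The function names `export_table` and `list_tables` are mine (hypothetical helpers, not part of any package), and the paths in the usage note are simply the ones from this post. Writing the CSV explicitly as UTF-8 is what keeps the Emoji intact.

```python
import csv
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite file, sorted."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [r[0] for r in rows]


def export_table(db_path, table, out_dir):
    """Write every row of `table` to <out_dir>/<table>.csv with a header row."""
    out_path = Path(out_dir) / f"{table}.csv"
    con = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cur = con.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cur.description]
        # Explicit UTF-8 output so Emoji survive the round trip.
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(cur)
    finally:
        con.close()
    return out_path
```

Something like `export_table(Path.home() / "Library/messages/chat.db", "message", "/users/ebooth")` would then write one CSV per call, and looping over `list_tables(...)` would cover every table.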
I opened these csv files in Excel to inspect them and noticed that Emoji from the texts were missing (I assumed they'd be converted into some sort of unicode or hex equivalent), but when I imported these messages into Stata I was surprised to see something like the picture to the right where Emoji were being included in Stata (in the results window and data browser!).
From what I understand, Stata's ability to show/use Emoji depends on the operating system (macOS renders Emoji unicode natively; other OSs may not), but this is still potentially useful (to me), particularly for inserting symbols into graphs.
The examples below explore some uses (and limitations) of Emoji on the Mac version of Stata.
. set scheme plottig . clear . set obs 2 number of observations (_N) was 0, now 2 . g x = "💭" . g t0 = `" ⛽️⛽️⛽️ "' . g t1 = `" ⛽️⛽️⛽️ "' . g t2 = `" ⛽️⛽️⛽️⛽️ "' . g t3 = `" ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽ "' . g t4 = `"⛽⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽️ "' . g t5 = `" ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽ "' . g t6 = `" ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽ "' . g t7 = `" ⛽️ ⛽️⛽️⭐️⛽️⛽️⛽ "' . g t8 = `" ⛽️⛽️⛽️⛽️⛽ "' . g t9 = `" ⛽️⛽️⛽ "' . g t10 = `" ⛽️⛽ "' . g t11 = `" ⛽ "' . sxpose, clear number of observations (_N) was 2, now 13 . l , noobs clean nohead 💭 💭 ⛽️⛽️⛽️ ⛽️⛽️⛽️ ⛽️⛽️⛽️ ⛽️⛽️⛽️ ⛽️⛽️⛽️⛽️ ⛽️⛽️⛽️⛽️ ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽ ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽ ⛽⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽️ ⛽⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽️ ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽ ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽⛽ ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽ ⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽️⛽⛽⛽⛽ ⛽️ ⛽️⛽️⭐️⛽️⛽️⛽ ⛽️ ⛽️⛽️⭐️⛽️⛽️⛽ ⛽️⛽️⛽️⛽️⛽ ⛽️⛽️⛽️⛽️⛽ ⛽️⛽️⛽ ⛽️⛽️⛽ ⛽️⛽ ⛽️⛽ ⛽ ⛽
This looks pretty good in the -markstat- output above, but here's the screenshot of the Texas Emoji-art in the results window, do-file editor, and browser.
Not too shabby ( click on the image to embiggen ):
At first I thought that maybe there were extended ASCII character equivalents for Emoji that Stata could understand, so I tried using -charlist- to inspect variables that contained Emoji. This fails: -charlist- reports values like 141, 150, and 166 as if they were legal characters, but these are the individual bytes of each Emoji's multi-byte UTF-8 encoding rather than standalone characters, so they don't show up when you -display- them in Stata.
. clear
. set obs 10
number of observations (_N) was 0, now 10
. cap which charlist
. if _rc ssc install charlist
. g t = "🍷🍺🍖🍸🍸🍺🍸🍺🍦🍸"
. charlist t
��������
. foreach a in chars sepchars ascii {
  2. di `"`r(`a')'"' //inspect
  3. }
��������
� � � � � � � �
141 150 159 166 183 184 186 240
So this doesn't work for values above the regular ASCII range (0-127):
. di `"`=char(166)'"'
�
. di `"`=char(186)'"'
�
. g x = `"`=char(141)'"'
. l x
     ┌───┐
     │ x │
     ├───┤
  1. │ � │
  2. │ � │
  3. │ � │
  4. │ � │
  5. │ � │
     ├───┤
  6. │ � │
  7. │ � │
  8. │ � │
  9. │ � │
 10. │ � │
     └───┘
. forval n = 120/160 {
  2. di as err `"`n' :: "' in green `"`=char(`n')'"'
  3. }
120 :: x
121 :: y
122 :: z
123 :: {
124 :: |
125 :: }
126 :: ~
127 ::
128 :: �
129 :: �
130 :: �
131 :: �
132 :: �
133 :: �
134 :: �
135 :: �
136 :: �
137 :: �
138 :: �
139 :: �
140 :: �
141 :: �
142 :: �
143 :: �
144 :: �
145 :: �
146 :: �
147 :: �
148 :: �
149 :: �
150 :: �
151 :: �
152 :: �
153 :: �
154 :: �
155 :: �
156 :: �
157 :: �
158 :: �
159 :: �
160 :: �
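To see where those numbers come from, here is a small Python sketch (Python purely as an illustration; nothing here is Stata code) that encodes the same ten Emoji as UTF-8 and lists the distinct byte values. The helper names `utf8_bytes` and `distinct_bytes` are made up for this example.

```python
# The ten Emoji from the variable t above, written as code point escapes:
# wine, beer, meat, cocktail, cocktail, beer, cocktail, beer, ice cream, cocktail
text = (
    "\U0001F377\U0001F37A\U0001F356\U0001F378\U0001F378"
    "\U0001F37A\U0001F378\U0001F37A\U0001F366\U0001F378"
)


def utf8_bytes(s):
    """Return the UTF-8 encoding of s as a list of byte values (0-255)."""
    return list(s.encode("utf-8"))


def distinct_bytes(s):
    """Return the sorted distinct byte values in the UTF-8 encoding of s."""
    return sorted(set(s.encode("utf-8")))


print(len(text))                  # 10 characters (code points)
print(len(text.encode("utf-8")))  # 40 bytes: each Emoji takes 4 bytes
print(utf8_bytes("\U0001F377"))   # [240, 159, 141, 183]
print(distinct_bytes(text))       # [141, 150, 159, 166, 183, 184, 186, 240]

# A single byte such as 166 is only a fragment of a multi-byte sequence,
# so decoding it alone yields the replacement character U+FFFD.
print(bytes([166]).decode("utf-8", errors="replace") == "\ufffd")  # True
```

The eight distinct values match the -charlist- output exactly, which is why char(166) or char(141) on its own displays as � rather than as anything Emoji-like.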
Ok - fine. Emoji are probably most useful in graphs anyways.
You can use the function ustrunescape() to directly interpret unicode characters, similar to using char() to interpret ascii characters. You can get more unicode codes for Emoji and other symbols from http://www.unicode.org/charts/PDF/U1F600.pdf.
. di `" `=ustrunescape("\u263A")' "' //works
 ☺
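As a rough illustration of what unescaping a code point means, here is a Python sketch. The helper `unescape` is hypothetical and only mimics the idea; it is not Stata's implementation. One thing it makes visible: code points in the U1F600 chart linked above lie beyond U+FFFF, so they need the eight-digit \U form rather than the four-digit \u form (as far as I know, ustrunescape() accepts that form too, but check the Stata documentation).

```python
import unicodedata


def unescape(s):
    """Turn \\uXXXX and \\UXXXXXXXX escapes in an ASCII string into characters."""
    return s.encode("ascii").decode("unicode_escape")


smiley = unescape(r"\u263A")     # four hex digits: code points up to U+FFFF
grin = unescape(r"\U0001F600")   # eight hex digits: needed beyond U+FFFF

print(smiley, unicodedata.name(smiley))  # ☺ WHITE SMILING FACE
print(grin, unicodedata.name(grin))      # 😀 GRINNING FACE
print(len(smiley.encode("utf-8")))       # 3 bytes in UTF-8
print(len(grin.encode("utf-8")))         # 4 bytes in UTF-8
```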
I’ve used unicode in graphs before, and here’s an example from this Statalist post where I discussed some ideas about plotting large unicode characters (braces) on graphs to help highlight portions of the data: https://www.statalist.org/forums/forum/general-stata-discussion/general/1425537-how-to-add-braces-to-graphs-in-stata
. sysuse auto, clear
(1978 Automobile Data)
. su price, d
. g x1 = price if price == `r(p10)'
(73 missing values generated)
. g x2 = price if price == `r(p95)'
(73 missing values generated)
. g y1 = "}" if !mi(x1)|!mi(x2)
(72 missing values generated)
Plot this with:
. twoway (scatter price mpg) ///
> (scatter x1 mpg, msymbol(point) mlabel(y1) mlabsize(vhuge) mlabcolor(blue) mlabposition(0)) ///
> (scatter x2 mpg, msymbol(point) mlabel(y1) mlabsize(vhuge) mlabcolor(green) mlabgap(4) mlabposition(0)) ///
> , text(4200 37 "}" , size(vhuge) color(red) ) ///
> text( 10000 27 "`=ustrunescape("\u23AB")'" "`=ustrunescape("\u23AC")'" "`=ustrunescape("\u23AD")'" , size(vhuge) color(purple)) ///
> text( 1500 18 "`=ustrunescape("\u2B45")' No Data here!", size(large) color(ply1)) ///
> text( 500 18 `" `=ustrunescape("\u2B45")' Here's some info"', size(large) color(ply1)) ///
> text( 5300 34.5 `" Median is: `r(p50)'"' , size(medlarge) color(orange)) ///
> text( 4800 30 "`=ustrunescape("\u21AF")'" , size(huge) color(orange)) legend(off)
. gr export ex.png, replace
(file ex.png written in PNG format)
Which produces:
In my day-job, I work with school data (particularly in Texas), so in the next example, I import some Texas Education Agency data of school locations (from the GIS directory site) and then plot schools.
. clear
. cap rm addr
. copy https://opendata.arcgis.com/datasets/059432fd0dcb4a208974c235e837c94f_0.csv addr.csv, replace public
. insheet using addr.csv
(35 vars, 8,701 obs)
. g nn = _n
. global coord `" xlab(-107(2)-92) ylab(26(2)36) "'
. scatter y X, mcolor(gs5%20) msize(large) ${coord}
. gr export one.png, replace
(file one.png written in PNG format)
. tw ///
> ( scatter y X if mi(chart) & magnet == "No", mcolor(gs3%40) msym(Oh)) ///
> ( scatter y X if !mi(chart), mlabsize(tiny) mcolor(red%70) ) ///
> ( scatter y X if magnet=="Yes", mlabsize(tiny) mcolor(ebblue%80) ) ///
> , ${coord} legend(order(1 "Regular schools" 2 "Charter" 3 "Magnet") )
. gr export two.png, replace
(file two.png written in PNG format)
This first scatterplot gives an overview of where schools are located in Texas (I don't go through the extra effort of importing the data that would let us use -grmap- to draw the shape/polygons for Texas here; instead I just use the X Y coordinates of the schools):
In the second graph above, I differentiate charter and magnet schools (using Stata 15's new transparency feature). As you can see from this example, charter and magnet schools are mostly in the metro areas:
I think this looks great ( •_•)>⌐■-■ / (⌐■_■)
But let's add some Emoji to the map to see how else we might present this information. Importantly, I tried adding transparency to the Emoji to allow for the overlay, and it didn't work as I had hoped.
. g mlab = "📘" if magnet == "Yes"
(8,438 missing values generated)
. g clab3 = "🚌" if index(chart, "OPEN")
(8,051 missing values generated)
. g clab2 = "🎓" if index(chart, "COLLEGE")
(8,669 missing values generated)
. g clab1 = "📍" if index(chart, "CAMPUS")
(8,624 missing values generated)
. ta chart, g(c_)

               CHART_TYPE │      Freq.     Percent        Cum.
──────────────────────────┼───────────────────────────────────
           CAMPUS CHARTER │         77       10.14       10.14
COLLEGE/UNIVERSITY CHARTE │         32        4.22       14.36
  OPEN ENROLLMENT CHARTER │        650       85.64      100.00
──────────────────────────┼───────────────────────────────────
                    Total │        759      100.00
Plot this with:
. tw ///
> ( scatter y X if mi(chart) & magnet == "No", mcolor(gs3%40) msym(Oh) msize(vsmall)) ///
> ( scatter y X if c_2==1, mlabsize(tiny) msymbol(none) mcolor(none) mlab(clab2) mlabsize(large) mlabcolor(%10) ) ///
> ( scatter y X if c_1==1, mlabsize(tiny) msymbol(none) mcolor(none) mlab(clab1) mlabsize(large) mlabcolor(%10)) ///
> ( scatter y X if magnet=="Yes", mlabsize(tiny) msymbol(none) mcolor(none) mlab(mlab) mlabsize(large) mlabcolor(%10) ) ///
> , ${coord} legend(order(1 " • Regular schools" 2 "🎓 College charter" 3 "🚌 Open charter" 4 "📘 Magnet") symxsize(0) forcesize)
. gr export three.png, replace
(file three.png written in PNG format)
This produces the figure below, where you can see where the different types of schools are located (edit: I was in a hurry and left the open charter schools out of the graph code, although they are still in the legend, but you get the idea). The lesson here is that Emoji definitely work in Stata graphs and can be useful, but I'd reserve them for sparse or non-overlapping data; presenting info on discrete categories (like Likert data) is perhaps where they would be most useful.