There are several solutions to this question, but more importantly, it got me thinking more about referencing cells using brackets (called "explicit subscripting" or using "_variables" in the Stata manual).
Before I go into subscripting, here's a recap of how to label numeric variables from a list contained in a string variable based on the Statalist exchange:
So based on this, I started doing some more exploration of subscripting, especially in the context of grouped observations. I run into a lot of situations where I want to grab the largest/smallest/average/etc. value in each group, and this can easily be accomplished via some -egen- command in combination with -bys-; however, extracting specific values from a group is made easier by explicit subscripting. A second advantage of subscripting is that it can reduce your typing of complicated "if" statements when generating new variables.
Here are some examples:
One last thing that I found interesting...you can use subscripting to help fill in missing observations when a spreadsheet uses "super rows". This is particularly frustrating when working with timeseries datasets from sources like WDI or state health data that I use frequently.
Often, these spreadsheets have a super row identifier every couple of rows, and the rows in between are meant to be a part of that same group. These subrows might be years of data for a country or counties in a state, etc. Of course, Stata sees this as missing data.
Here's an example of how to work with data in that shape:
This is pretty simple ((and probably obvious to most Stata programmers)), but I didn't realize it until long after I had spent many grueling hours during grad school cleaning up these types of datasets by dragging down the columns in Excel or writing clunky Excel 'if' statements. ugh.