Generating Random/Fake String Data in Stata

When posting to Statalist, I usually try to illustrate my question or answer with the built-in "auto.dta" dataset, with a dataset created manually via the -input- command, or with fake, random data generated by Stata functions. To create fake, random numeric data, you can use any of the random-number generators detailed in -help random_numbers- (such as runiform), but there is no random generator for alphabetic characters (A-Z or a-z).

Sometimes it's useful to show Statalist readers or students in class how to manipulate a dataset that includes a string variable, for example to identify panels or to illustrate how to -encode- variables. (Or maybe you just want a random string generator because you lost your dice for playing Scattergories.)

-ralpha- generates random string characters for Stata. In many cases you could generate a numeric variable and -tostring- it, but if you need alphabetic (string) characters, this package offers an easy way to obtain them.


ralpha [newvarname] [, Loweronly Upperonly Range(string)]


upperonly - random alpha from uppercase letters only
loweronly - default; random alpha from lowercase letters only
range()   - examples include: A/Z, a/z, A/z (uppercase is first), a/c, A/G
          - numerical range stored in `r(num_range)'
If [newvarname] is left blank, the variable "ralpha" is created (if it doesn't already exist).


which ralpha                       //<--  see instructions

//Example 1 //
set obs 20
ralpha                           //nothing specified-new var named "ralpha" by default
ralpha lowerdefault,             //no options specified - default is lowercase
ralpha upper, upperonly
ralpha lower, low

//Example 2: Using the range() option //
**Note: a mixed-case range goes from A/z (A to z); uppercase comes first
set obs 20
ralpha somerange, range(A/z)
ralpha, range(B/g)
    di in white "`r(num_range)'"      //Here's numerical range equiv. of "B/g"

//Example 3: create random words/strings in a loop //
set obs 50
g newword = ""
loc lnewword 5                  //how many letters in new word?
forval n = 1/`lnewword' {
    ralpha new, upp                 //draw one random uppercase letter per observation
    replace newword = newword + new //append it to the word built so far
    drop new
}

**make newword proper**
replace newword = proper(newword)
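
//Sketch: random letters with built-in functions only //
**This is a minimal sketch, not ralpha's own code. It assumes plain ASCII
**codes (65-90 = A-Z, 97-122 = a-z) and uses only char(), floor() and runiform().
**The variable names and the seed below are arbitrary choices for illustration.
clear
set obs 20
set seed 12345                                        //any seed; makes the draw reproducible
g str1 lowletter = char(97 + floor(26*runiform()))    //one random letter from a-z
g str1 upletter = char(65 + floor(26*runiform()))     //one random letter from A-Z
list in 1/5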