
Precision in Stata

In this post, I explore how to deal with precision issues in Stata.

First, create some example data.



. clear
. set obs 1000
obs was 0, now 1000
. g x = 1.1
. list in 1/5, noobs
  +-----+
  |   x |
  |-----|
  | 1.1 |
  | 1.1 |
  | 1.1 |
  | 1.1 |
  | 1.1 |
  +-----+
. count if x == 1.1 // zero matches!!
    0

Precision of Stata storage formats

Stata isn't wrong. By default, x is stored as a float (about 7 significant digits), while the literal 1.1 in the comparison is evaluated in double precision. Like many decimal fractions, 1.1 has no exact finite binary representation, so its float and double approximations differ and the comparison fails. Rounding the literal to float precision with float(), or storing the variable as a double, fixes the issue. Note below how x is represented in hexadecimal and IEEE binary formats vs. Stata's general (%16.0g) and fixed (%f) formats.
. 
. count if x == float(1.1)
 1000
. **formats 
. di %21x x //hex
+1.19999a0000000X+000
. di %16L x //IEEE double, low-to-high byte order
000000a09999f13f
. di %16.0g round(x, .1)
             1.1
. di %4.2f round(x, .1)
1.10
. di %23.18f round(x, .1)
   1.100000000000000089
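As a minimal sketch of the mismatch described above (not part of the original log), the following do-file fragment compares the float and double versions of 1.1 directly. The tolerance of 1e-6 in the last line is an arbitrary value chosen for illustration, not a recommendation.

```stata
clear
set obs 5
generate float x = 1.1           // float: about 7 significant decimal digits

* The literal 1.1 is evaluated in double precision, so the two sides differ
assert x != 1.1
assert x == float(1.1)           // round the literal to float before comparing

* The same two values in hexadecimal: the float carries fewer significant bits
display %21x 1.1
display %21x float(1.1)

* A tolerance-based comparison is another option (1e-6 is an arbitrary choice)
count if abs(x - 1.1) < 1e-6
```

Wrapping the literal in float() keeps the comparison exact, while the tolerance version is looser and depends on picking a sensible threshold for the data at hand.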

Double formats

Storing the variable (now y) as a double fixes this issue. You could even change the default storage type for all new variables to double (set type double); however, that would bloat your dataset and is usually unnecessary. You really only need doubles for variables that require full precision or are being displayed in a table/graph.
. g double y = 1.1
. count if y ==1.1 //works now.
 1000
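One caveat worth sketching: widening an existing float variable with recast does not bring back digits that were already rounded away when the value was first stored. The fragment below is an illustration under that assumption; the variable name x2 is hypothetical and not part of the original example.

```stata
clear
set obs 5
generate float x2 = 1.1          // value is rounded to float when stored

recast double x2                 // wider storage type, same float-rounded value
assert x2 != 1.1

replace x2 = 1.1                 // reassign so the full double value is stored
assert x2 == 1.1
```

In other words, the storage type has to be double at the moment the value is computed or assigned, which is why the post generates y as a double from the start.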

Solutions

Let's look at how to deal with stored results on the fly. The hackish/kludgy solution we have used previously was to convert the value to a string and take a substring to truncate it. This is not ideal.
. g   z = 999/_n
.  qui su z, d
.  di `"`r(mean)'"'
7.477985390007496
. di `"`=round(`r(mean)', 1.1)'"' //nearest multiple of 1.1, with floating-point residue
7.700000000000001
. di `"`=substr(`"`=round(`r(mean)', .01)'"', 1, 4)'"' //kludge using str
7.48

Instead, we should use one of the solutions below: use the extended macro function 'display' to format and/or round these values (SOLUTION 1), or create variables with a proper display format (SOLUTION 2). Think of a display format as a 'mask' over the true (and accurate) stored value.
. **SOLUTION 1: use extended function format**
.  qui su z, d
.  di `"`r(mean)'"'
7.477985390007496
. local r:display  %3.2f `r(mean)'
. di `"`r'"'  //use stored result
7.48
. local r:display  %3.2f `=round(`r(mean)',.01)'
. di `"`r'"' //use calculated/rounded result
7.48
. g mean = `r(mean)'
. local r: display %3.2f `=mean'
. di `"`r' vs. `=mean'"' //use stored variables
7.48 vs. 7.477985382080078
. **SOLUTION 2: create precise, formatted variable or scalar**
.  qui su z, d
. g  double p1 = `r(mean)'
. di %3.2f `=p1[1]'  //display without macro extension
7.48
. l p1 in 1
     +-----------+
     |        p1 |
     |-----------|
  1. | 7.4779854 |
     +-----------+
. *fix display format:
. format p1 %3.2f
. l p1 in 1 //fixed
     +------+
     |   p1 |
     |------|
  1. | 7.48 |
     +------+
. 
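A related option, not used in the log above, is the string() function, which takes a number and a display format and returns the formatted text in one step. This is only a sketch; the macro name m is hypothetical.

```stata
clear
set obs 1000
generate z = 999/_n
quietly summarize z, detail

* string(n, fmt) returns the number as text in the requested display format
local m = string(r(mean), "%4.2f")
display "`m'"

* The stored result itself is untouched; only the text copy is rounded
display r(mean)
```

Like the display extended function, this rounds only the text that ends up in the macro, so it suits labels and table cells rather than further arithmetic.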
Instead of macros or variables, we can also work with lightweight -scalar-s to get the same result.
. *note:
. scalar s1 = `r(mean)'
. di  %3.2f  s1
7.48
. di s1
7.4779854
.  assert `=s1' == p1    //true
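The scalar can also be assigned from r(mean) directly rather than from the expanded macro `r(mean)', which avoids converting the number to text and back. A minimal sketch, with the scalar name s2 being hypothetical:

```stata
clear
set obs 1000
generate z = 999/_n
quietly summarize z, detail

scalar s2 = r(mean)              // no macro expansion, so no trip through text
assert s2 == r(mean)

display %3.2f s2                 // formatted for reporting
display %21x s2                  // hexadecimal view of the exact stored value
```

Scalars share a namespace with variables in expressions, so picking a name that does not match (or abbreviate) a variable in the dataset avoids ambiguity.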

Further Reading

For more information on storage precision, check out these items written by William Gould of StataCorp: HERE and also HERE
