
Data cleaning with Google Refine

There's a lot to be said for the data- and text-cleaning abilities of programs like R [1] [2] and Stata [3] [4] [5]. But when it comes to cleaning up data full of spelling errors, different forms of the same string, abbreviations, acronyms, etc., or when you have to hand the task to a student worker whose skill set barely includes M$ Excel, Google Refine (formerly called Freebase Gridworks) is a great tool for the job.
Here's the Google Code page, and below is a video on its data cleaning tools. Google Refine can also transform data and pull in external data (such as JSON) from other websites, but I've found it most useful for data cleaning.
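Refine handles "different forms of the same string" with its clustering feature. One of its methods is key collision: each value is normalized into a "fingerprint" key, and values that share a key are grouped as likely variants. Below is a minimal Python sketch of that idea. It is not Refine's actual code. The `fingerprint` and `cluster` functions are hypothetical names, and the normalization steps only approximate Refine's fingerprint method. For example, this version replaces punctuation with spaces, which may not match what Refine does.

```python
import string
import unicodedata
from collections import defaultdict

# Maps every ASCII punctuation character to a space (a choice made for this sketch).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def fingerprint(value: str) -> str:
    """Normalize a value so that variant spellings of the same name share one key."""
    # Decompose accented characters, then drop anything that isn't ASCII
    # (this removes the combining accent marks, e.g. "Café" -> "Cafe").
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase, and turn punctuation into spaces.
    value = value.strip().lower().translate(_PUNCT_TO_SPACE)
    # Deduplicate and sort the tokens so word order no longer matters.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys shared by 2+ distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


if __name__ == "__main__":
    raw = [
        "University of Michigan",
        "university of michigan ",
        "Michigan, University of",
        "Univ. of Michigan",
        "Ohio State",
    ]
    for key, variants in cluster(raw).items():
        print(key, "->", variants)
```

The first three spellings collapse to the key `michigan of university`. "Univ. of Michigan" gets a different key (`michigan of univ`), so abbreviations still need other methods or a human eye.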



