Cleaning data with OpenRefine | Xpdient CIC
Cleaning data for CiviCRM
on 11 Oct 2016 4:42 PM

Getting data into the right shape and cleaning it up is one of most important - and occasionally painstaking - tasks to do before you can move it into a CRM.

What is dirty data?

You might be in an organisation where everyone keeps the same sort of data in different spreadsheets, for example addresses. Or you might keep one sort of data in one spreadsheet - volunteers names and addresses for example - and a timetable for the same volunteers in another.

This sort of arrangement leads to a lot of repetition and a lot of errors. Spreadsheets do not really have any error checking mechanisms built into them.

One of the biggest bugbears is address data. It didn’t use to matter when we communicated using land mail, and spreadsheets and Word were great for printing out address labels.

But now we have geo-coding and address lookups, it’s really important that the right data goes in the right place. So postal codes are always in the same column, city names in the same column, countries in the same column. And all spelled correctly.

CRMs are picky about the data they import

Most CRMs - and definitely with CiviCRM - will not accept badly formatted data, so getting it clean is really important.

Xpdient do this stuff all the time an one program we use more than any other is Open Refine, formerly Google Refine.

Open Refine to the rescue

It’s free to use - you just download it and run it in your default browser. You can then load up any Excel or text file and bigin sorting out your data, merging it, correcting it and so on. It takes a little bit of learning but not much to achieve really good results.

And the really good thing is that you can save all things you did and then use it on the same spreadsheet if you need to red- it again later, when there is extra data.