Many stake holders of the project such as business users, project managers, business analysts really care about the data conversion scripts and the quality of the conversion. Since this conversion is business entity related and matters a lot as future business/functionality depends on the data being logically equivalent to the legacy system. Using proper Data migrations specialists is important here!
1. Start data conversion work earlier
We have found that converting data earlier in the project life cycle is very helpful even when our data model is not stable yet. This is useful in many ways
- It helps by forcing the developers to think and account for data in the legacy database that may not be covered by specs provided by the business analysts
- Business analysts can communicate with the business users using real data from the legacy system that the business is used to seeing and understands the data.
- Business users when using the new system before it goes live can understand how their new system behaves with legacy data and business entities such as Customers, Products etc. Giving the business users familiarity with the system and easing their transition to the new system.
- When its time to go-live we have converted the data so many times that its no longer a surprise, as all the bugs, data weirdness have been found and dealt with.
2. Automated compare of data
We compare the data from the legacy database with the new application database that is being developed. This comparison can be automated using sql that creates logically equivalent objects from both databases.
- I blogged about usingAutomated Data Compare of comparing data.
- We could also useDiffKit which is an open source framework that lets you compare databases, excel spreadsheets, flat files or custom formats on the legacy database side with databases, excel spreadsheets, flat files or custom formats with the new databases. Or use frameworks that export the database into xml, yml etc formats such as yaml_db
3. Dealing with duplicates
In legacy databases, as the system gets used over many years some business entities may get duplicated and during data conversion will get merged/collapsed into a single entity. So while comparing data we should remember to do UNIQUE or DISTINCT on the SQL we use to compare the data. In some cases we may end up normalizing some of the data and end up with multiple rows for something that was represented using one row in the legacy database.
4. Dealing with magic values
Systems use magic values to represent data or state of certain business process, such as NULL, 0, N/A, Nil and other strings. When converting we cannot convert these values as-is, we need to understand what do they mean, are these values being converted to mean something else on the application front end? are we using the same logic to convert the data?