Friday, October 11, 2013

What is database normalization? Good/bad? Yes...

There are a lot of people who talk about how important database normalization is for primary functionality tables and make it a talking point when evaluating the technical sides of an ERP. That is all well and good but I think the actual question should be around the developers of the solution and if they took normalization into account with the solution. If a table or set of functionality is normalized, it doesn't really tell me anything either good or bad. It only throws a red flag and a bunch of questions before determining anything else.

Hopefully this will show up on the first page of Bing/Google when someone types in something to the tune of 'AX 2012 database normalization'. <- 'Gratuitous line for search engines'

What is database normalization?

Database normalization (n) definition - (WIKIPEDIA) - Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships. 

Sounds awesome right? Can be, not always. There are pros and cons to it. Sometimes a denormalization structure is best. I don't think anyone will agree that normalization is the answer 100%. While there are more points to the below, I'll only cover a few basic ones.

  • Updating tables can be faster (fewer fields)
  • Can be less risk of database locking
  • More flexibility to expand functionality of solution
  • Removes or reduces redundancy
  • Increase data retrieval times for single tasks (if designed correctly)
  • Enforce relationships between data in joined tables (referential integrity)
  • Can split up a table with many fields that will be quite large into multiple database servers
  • Indexing strategies not as effective as data is spread out across multiple tables
  • If a developer doesn't know what they are doing (or requirements are incorrectly identified), it can create a mess very quickly
  • don't have to worry about joins in a denormalized structure. 
  • reporting and integration can be an easier task than white boarding out a network of tables.
  • Indexing can be more effective
  • Easier to maintain and understand schema
  • Good approach to consider when CPU is an issue
  • Updating tables might be slower
  • Updating development on one large table with many fields
  • Retrieving records may be slower
Basically, the big take away is that this conversation is very situational. Different scenarios make situations unique.  It's like asking a financial adviser how much the person is supposed to sock away for retirement by age XX. Having 50k saved and 10k in credit card debt is worse than 40k saved and 0k in credit card debt. Many pieces in the equation to consider.

The ultimate answer is to know both normalized and denormalized DB strategies. Through thorough planning of a solution, you can determine the best approach. As for asking about database normalization, I think the best initial questions to ask up front are:
    a) are the developers of the solution competent
    b) did they take normalization into account when designing the solution

For more information, I strongly encourage you to do some more research into these concepts. It can only help!

No comments:

Post a Comment