Hi Neil,
Well, I guess I should respond....;-)
"I think the 3NF is indefensible."
Well, I agree just plain 3NF for a DW is indefensible and in a number of cases of NCR implementations I've had the opportunity to see the TV bit was somehow completely left off such important entities as customer/account.
If you include 3NF+TV+SA I'd disagree.
There are some absolutely compelling reasons why this model is a useful and valuable model, as I have shared here. Especially in the case where the questions to be posed could not be reasonably known prior to the design of the DW.
Dimensional models have some limitations on them and recording every value of every field in every operational system that stands a chance of being valuable in the future and still be fast/easy to query and understand is one of those limitations.
"First of all, there is no detail data that shouldn't be queried, and if we move the detail into the marts, why did we need the 3NF in the first place?"
What you are saying is that 'all detailed data should be queried'.
Who says all data should be queried? Some data may never be queried.
If a company cannot cost justify keeping all data forever some data will be lost and perhaps never queried.
The designer must decide what data will NOT be kept and will therefore not be queryable by the user or anyone else. The designer must decide the retention time of data, and if data is to be archived, the archive mechanism, the costs associated with it, the delays associated with it and balance that against a reasonable cost benefit analysis.
In this kind of model the transactions need to be in 'both places'.
Also, just by the way, in 1994 I learned the lesson (the hard way)of putting the real keys into the fact tables so that a fact table could participate in both the archive layer as a 3NF table as well as in the base layer of the dimensional model. So transactions, and even snapshot fact tables, can be designed to participate in both layers without needing two copies of the data. (A tip for those on the list because I see many DWs where the transactions are needlessly replicated in
both....;- )
Indeed, this point is one of the few very minor points of disagreement Ralph and I had over the years. I believe there is value in keeping the real keys on the facts and Ralph has recommended that real keys are taken off the transactions. (So what...We are both entitled to our opinions, and he's a much better writer than me anyway..;-) )
"Also, if it's hard to understand and query, doesn't that make the ETL job (both into and out of) more complicated?"
Yes. ETL into and out of the archive layer is complicated, time consuming to write, time consuming to run, and difficult to change. In a word. It is EXPENSIVE. Really, really expensive. The whole concept of an archive layer as well as an analytical layer has been the MOST EXPENSIVE EDW proposition for years now.....I've never argued that it wasn't.
This is one reason why many companies don't do it. Because they cannot see the cost benefit of doing it. Very reasonable.
"3NF was designed for one purpose, transaction processing, it has absolutely nothing to do with being neutral or flexible."
Well, I wasn't there, but that's not what I read when I read Date/Codd.
I read 3NF was a design technique aimed at reducing redundancy of data.
One side effect was to make updates more efficient.
Remember, at the time 3NF was developed disk was around USD300,000 per GB!!!! For Codd/Date to make relational sound like a good idea minimising the amount of disk was important.
Codd/Date were competing internally in IBM with IMS which did not require the duplication of keys in different tables, and that made IMS faster and cheaper. A relational model requires more disk/processor than a hierarchical model and 3NF was an effort to narrow the gap.
Since then, people have claimed all sorts of things of 3NF. From 'application neutral' to supporting the 'natural order of things which is why it is called the 'normal' formal'. You are correct. It is all crap. It was about promoting the relational model to IBM.
All the history aside....
The bottom line is many organisations around the world have found great value in archiving information without a clear view of exactly what they were going to do with it.
If an organisation is in the category where they really can justify the archiving of information without a clear idea of what they will do with it what other options does one have?
Building a (non-IWS) dimensional model is not going to enable the company to achieve what it is they want to achieve.
(Or at least, in 4 years of thinking and trying I was unable to get a dimensional model to support both query requirements and archive requirements. And a number of the 'best and brightest' put their minds to this very problem archive data in a dimensional model because we knew the dimensional model had to stay. We failed. Simple as that. Which is why the archive layer has become a widely used implementation.
The 'analogy' used is like taking a photograph.
We don't know what questions we might ask about the organisation (or the external environment) so every so often (usually daily) we take an 'electronic photograph' of the company as it is at the end of the day.
(Notice that in most cases we don't try to take multiple photographs per day, though I did in one case, every 8 hours.)
At some point in the future someone asks a question about the state of the company at some point in time in the past. Since you know you have a 'photograph' at the end of the day you know you can answer the question of the company status at the end of the day.
If the user then wants some kind of trend analysis on this new question you know you can do that too because you know you have a photograph at the end of every day. So you can build daily/weekly/whatever trend analysis.
Remember, the 'architectural object' I am talking about is the answer to the question: 'Give me the answer to any question I might ask, even though I don't know what the question is.'
A dimensional model, necessarily, is the architectural object that is the answer to the question 'Give me the answer to any question I might ask from the set of information of this groups of facts and dimensions'.
And generally we limit the number of facts and dimensions.
I agree, and I would argue 95% if not more of ALL business questions can be answered from a well designed dimensional data warehouse with a broad coverage of data.
It is easy to make the case the other 5% doesn't matter. Much easier than it is to make the case of spending the same amount of money again to get the other 5% of answers.
But I challenge anyone to answer the 'unknown' question from a dimensional model that has a defined set of facts and dimensions (ie.
lossy) that can only be answered from the 'lost' data. It cannot be done.
To be able to answer the 'unknown' question one requires 'non-lossy'
data models which are at least understandable by the technical people who build them. To provide great performance of queries to end users one
requires dimensional models or a ton of processing power (teradata).
Now Neil, you may disagree, you may call this all BS and you have every right to your opinion.
I'm just putting forth the fact I've seen these kinds of models, I've built them, I've seen them answer questions that the best designer with the best intentions would not have covered in the dimensional model.
I've seen answers generated by such databases lead to multi-millions of dollars in profit for large organisations.
From the purely practical financial point of view of a company it hardly matters what 'modelling technique' is used if the company gets a multi-million dollar kick. And I am very practical about these
things....;-)
Actually, it pains me to see on this forum so much discussion on 'modelling' and so little discussion on 'business value', despite my best efforts to get such discussion started.
I am guessing only 2 people from this group signed up for the 'business discussion only' yahoo group I started.
And with such little interest in the >>REAL VALUE<< of BI on this list my opinion is we as a group deserve the poor reputation we currently have in the business community at large.
Best Regards
Peter Nolan
Data Warehousing Consultant
Mobile: +353 879 581 732
Homepage: