Tuesday, February 07, 2012
 The compelling business case for an archival non-lossy
 
 11/16/2006 7:57:36 PM
PeterNolan



Hi Jim,

Comments in the email....

 

Hi Peter -

Thanks for the thought-provoking post.

>> Thank You...I'd like to see more 'thought provoking posts' and less technology focused posts, but that's just me...;-)

It seems to me that the "ODS" has been a pretty malleable notion over the years. When I first read about it, I remember it as being a focused point of "as needed" integration for somewhat limited reporting. Later it was presented as a recipient of pre-integrated data (note later CIF diagrams depict the ODS as being fed FROM the monolithic ETL). And now it is described as a real-time application integration device. OK - but let's kick this latest set of tires for just a moment:

 

>> ODS evolved as a very malleable concept from many different places for many different uses. There were many people working on something that was 'integrated, subject-oriented, transaction oriented, and volatile' for a variety of different purposes....I was doing them for call centers, some people were doing them for things like fast loan approvals, some for operational system integration etc......

>> Bill picked up on this very early in the piece, looked around at a number of places building similar architectural objects and came up with the name 'Operational Data Store'....and now you get all these different opinions about what an 'operational data store' is......Bill defined it well. It's in the book. What a person wants to use it for is up to them..............there are no 'hard and fast' rules for construction/use of an ODS as far as I am aware and nor should there be.

>> On this forum of late it seems people are getting quite confused between architecture and implementation....in large systems design these things have always been worlds apart.....I recall SNA, SAA etc. Though they were eventually implemented into products by IBM, the architectures first defined at a high level were very flexible as to how one might move forward with implementation and what one might use them for....Bill tends to work more at this level and leave out the details of how one might actually implement portions of CIF....I personally believe this is an excellent way forward because it ensures designers do not become 'closed minded' and follow a set of 'rules' unthinkingly......but some people will do this anyway... ;-)

>> With the things that Bill has published he leaves plenty of room for a designer to interpret what to do in his/her own specific situation for the best result within an architectural framework. I like this. It means I do not have to 'obey some rule' to be 'compliant' when it's a rule that will compromise the application in some form or other. In the end, we still have to 'think about it' in quite some detail when we work with CIF.

 

I submit that these responsibilities are strikingly murky in ODS literature that I've encountered. This literature instead tends to focus on the ODS's near real time reporting capabilities (type 1-4) - which I submit are far more simply and powerfully met through the Kimball realtime partition anyway. If the ODS is now realtime Dimension Authority-meets-EAI, this role is not well supported by its literature, nor is its interaction with the rest of the enterprise, or even with the data warehouse ETL.

>> The data integration issues of ODS are (pretty much) the same as EDW....which is why, when I build both, I build the ODS first and send data thru the ODS to the EDW if the data needs to be in the ODS.

>> I would submit these issues are not 'murky' in ODS literature...as far as I am aware they are not written about at all......and why? They are the same issues we have been dealing with for 15 years now...getting a bit old..;-)

>> I have just completed my first fully dimensional ODS. It is not yet 'near-real-time' but it will be when we move forward with Ascential's RTI engine to expose our existing jobs as 'always on' jobs and feed them with transactions rather than batch files....

>> And it will be integrated into Webmethods for EAI access to the ODS...Webmethods will decide which system it goes to for data, ODS or source system, and the application will be none the wiser. I have seen very few dimensional ODSs. As I said, we worry about getting the updates applied fast enough.....The ONLY ones I've seen with my own eyes are all Sybase IWS.

>> I'd be interested to hear if anyone else has managed to build a large scale ODS (10M+ customers, 20M+ transactions per day) based on a dimensional model and make it work in real time......Anyone got one? No need to name who.

>> As I said here, I believe dimensional models (IWS-style) will become dominant in the ODS/EDW area over the next 5 years.....

>> Just like in 1993 I used to tell people, "this star schema thing is going to be THE way to build analytical applications in 5 years time".....(I was a little too optimistic...star schemas took a lot longer to catch on than I thought....)

>> In fact, all the 'close mindedness' I hear from dimensional modellers here and elsewhere closely reflects the attitudes I encountered 10 years ago from 3NF people.....

 

Now, as far as 3NF being somehow more academically defensible as the "capture everything" database: I must respectfully express concerns about this as well. It assumes that a "correct" 3NF data model somehow is "application independent".

>> Well, I didn't say 3NF was more academically defensible as the capture everything database. I said I typically use (and a lot of other people do too) 3NF + Time Variance (and less so nowadays stability analysis) to be the archival database when I build one. I didn't say why I did that.

>> I did say I believe a 3NF model like that is more able to capture all versions of data than a type 2 dimension. And the reason I say that is because a type 2 dimension, where the table will be queried (a basic notion of dimensional modelling), will typically store data from multiple different source files and be somewhat denormalised.....hence tracking changes to every field in the table creates a larger volume of redundant data than capturing the same information in a 3NF TV+SA model.

>> And in fact, about 8 years ago I discovered that exactly the same code could be used to build a type 2 dimension as a 3NF TV+SA model if you are just careful enough about getting data into and out of the staging area...instead of combining data to go onto the type 2 dimension you split it by volatility (yes, I even put integer keys onto my TV models but I only use them as internal keys..).....so discussions about using 3NF TV+SA models vs type 2 dimensions seem a waste of time to me because a 3NF TV+SA entity is a type 2 dimension...it's just that it is split based on data volatility and 3NF design rather than bringing together disparate data elements to improve query-ability of the tables.....after all a 3NF TV+SA model is not meant to be queried by the user, so it does not need to be easy to query or easy to understand....
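
As a rough illustration of the volume argument above (this is not the code referred to in the post; the field names, the sequence of changes, and the "count of stored values" measure are all assumptions made for this sketch, and surrogate keys and effective dates are ignored), the following Python compares tracking the same changes in one denormalised type 2 dimension versus entities split by volatility:

```python
# Hypothetical customer attributes grouped by volatility (assumed for illustration).
STABLE = ["name", "date_of_birth"]        # rarely change
VOLATILE = ["address", "credit_rating"]   # change often


def type2_cells(changes):
    """Denormalised type 2 dimension: every change writes a full new row."""
    width = len(STABLE) + len(VOLATILE)
    versions = 1 + len(changes)           # initial row plus one row per change
    return versions * width


def split_cells(changes):
    """Volatility-split time-variant entities: a change writes a new row
    only in the entity that owns the changed field."""
    stable_versions = 1 + sum(1 for field in changes if field in STABLE)
    volatile_versions = 1 + sum(1 for field in changes if field in VOLATILE)
    return stable_versions * len(STABLE) + volatile_versions * len(VOLATILE)


# One customer's history: each entry names the field that changed.
changes = ["address", "credit_rating", "address", "name", "address"]
print(type2_cells(changes), split_cells(changes))  # prints: 24 14
```

In this toy case the same row-versioning logic is applied in both functions; only the grouping of fields differs, which is the sense in which the split entities behave like narrow type 2 dimensions.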

 

 

Candidly, I've never seen an application independent database - even in Len Silverston's terrific and powerful Universal Data Models. Put a bunch of stellar 3NF modelers around an application-independent concept and you will NOT see identical models emerge.

>> It's a shame....this is what I was selling 15 years ago!!! We lost to packages..

 

 

In practice, corporations shoe-horning mountains of information into some "field of dreams" 3NF enterprise data models have been some of the largest money and time wasters that I've encountered.

 

>> I 'explode with agreement' as one lister said recently.

>> IT people in general, and data modellers in particular, are in love with the idea that theirs is the one best way of doing anything. So, if you put 5 of any of them in a room you can get no agreement and nothing happens....The best systems are developed in teams that are pretty much a dictatorship. I recall early in my IBM career being told how the project I was on worked...the lead architect explained to me as follows:

"This project is a benevolent dictatorship on the days I feel benevolent. On the other days, it's just a dictatorship."

>> It was the most successful development project I ever worked on. It is a model that I tend to emulate on my projects. Either I am the dictator or I am a spectator, pretty simple really...though I do say I am 'benevolent' on more days than my erstwhile predecessor... ;-)

>> As far as I am concerned the only way to cut through all the 'discussion' between multiple people which will go on forever is to have one person as the Data Warehouse Architect and make sure there is an 'apprentice' around in case the DWA gets hit by a Bus (no pun intended).

>> And you simply hold the DWA responsible for making the whole thing work. Only problem is finding such people....there are not many around....

 

My 2cents worth....

Best Regards

Peter

 

Other voices welcomed!

--Jim Stagnitto

 

 

 

 

On Jul 23, 2004, at 5:42 PM, Peter Nolan wrote:

> http://www.DataWarehousing.com is sponsored by DataMirror, a leading

> provider of real-time data integration and resiliency solutions.

> Please visit our sponsor today at http://www.datamirror.com to access

> data warehousing white papers and best practices.

>

> For help with list commands, send a message to

> <mailto:dwlist-request@datawarehousing.com> with the word "help" in

> the body of the message.

>

> From: "Peter Nolan" <peter@peternolan.com>

>

>

>

>

> Hi Jim/Nick/All,

>

> 1. It is true that a CIF will cost much more than a purely dimensional

> BUS architecture DW as proposed by Ralph. No Question. Perhaps even

> double.

>

> 2. However, there are a couple of classes of questions and information

> which I do not believe are well covered by the BUS

> architecture/Dimensional models.

>

>

> I will explain each in detail:

>

> They are:

> 1. The increasing need for integrated near real time information aka.

> Operational Data Store. (Just look at how many people are buying

> middleware believing they can integrate information 'on the fly' with

> no ODS.)

>

> 2. The need to support questions which are not known and cannot be

> reasonably expected to be known at the time of construction of the DW.

>

>

> 1. ODS.

> With the advent of the call center in the mid 90s there has been an

> ever increasing need for a 'single view' of customer information in

> order to

> service the customer. Further, with the drive for things like

> 'loan/credit card approval over the phone' there has been an increasing

> need to integrate data more and more rapidly in order to support

> tactical operational decisions like 'do I approve this loan' or 'do I

> cross sell this product to this customer on the phone'? (I mean the

> customer had better not already own the product that I am cross

> selling.) These are compellingly important business processes.

>

> The ODS actually evolved after the DW when numerous DW projects were

> hijacked by the call center. I worked on a number of projects to pull

> parts of the DW out so that the call center could use the integrated

> customer data and the analysts could use the DW again. Then Bill

> published his ODS book and I thought 'So that's what we've been building,

> now we have a name for it, great!!!'

>

> The BUS Architecture, as I interpret it, does not do so much for me in

> integrating information for operational tactical decision making and

> customer service. Indeed, I believe (unless using IWS) it is not

> possible to support the update requirements of the ODS without using

> 3NF to achieve the transaction rates on current hardware in large

> organisations. The star join, by its very nature quite time consuming

> and expensive to perform, makes it a worrying prospect to try and build

> one that will be updated by transactions.

>

> This is also the stated opinion of Gartner and a number of the large

> consulting organisations. Doug Laney might let us know what Meta's

> current position is on the concept of building an updatable ODS using

> dimensional techniques.

>

>

>

> 2. Undefined and impossible to predict questions.

> It may come as a surprise to many on this list that if you spend a lot

> of time with senior managers there are a LOT of questions that come up

> that no-one had asked before. Have not even thought to ask before.

> Have not even considered if they might even be important before.

>

> Dimensional modelling, by its very nature, tends to focus and limit

> the amount of data (in terms of number of fields) to those items that

> are necessary to run the business on an ongoing basis. That is, the

> information put into a dimensional DW is typically information where

> there is substantial reason to believe that the information will be of

> value. I've used this side-effect of dimensional modelling often. It

> helps get the project done and get value out of the project. This

> 'side-effect' has also led to the ridiculous situation where everyone

> believes and many vendors present the concept that it is possible to

> build a 'data warehouse' in 90 days. I must confess to being a part

> of this for some time.

>

> However, this approach does not discount the value of answering the

> 'unknown question'. Indeed, the 'Business Requirement' from the

> (incredibly smart) Director of Marketing for my very first DW was, and

> I quote:

> 'I want the answer to any question I care to ask, and I want the

> answer before I forget why I asked the question.' (And I was a pup SE at IBM

> at the time.)

>

> The only way to make sure that you can answer any question that might

> come up at any point in the future is to capture all the data that

> occurs within the organisation. ALL OF IT. Now, that's a bit much so

> some decisions need to be made about what to cull. However, such

> archives can be of tremendous value.

>

> Example 1. Actuarial analysis. Setting the price of life insurance

> policies is tremendously difficult. Keeping excellent records of all

> customer/policy information, as well as the general population, over

> an extended period of time is extremely valuable. It can very well make

> or break a life insurance company over an extended period. That's why the

> actuaries are so powerful in those companies.

>

> Example 2. Workplace location.

> Asbestos. It's a big deal in Australia. Many people have sued

> companies for many millions of dollars claiming they were working in an

> environment where asbestos was present 20-30 years ago. Now they come

> forward with 4-5 witnesses to say 'yes the person was where he claims

> he was' and the companies have had little defense. Had they captured

> the work locations and durations at work locations of workers they

> might have a better chance to defend themselves. One might say, 30

> years ago we didn't have the computers etc. That ignores the fact that in 20

> years' time there will be something else, and companies that have not

> kept records of where employees were will suffer the same fate as many

> Australian companies over the last few years.

>

> Example 3. The dreaded consultants.

> How many organisations have had some consulting group come through and perform some 'assessment' followed by some recommendation about what to do next?

>

> Most Fortune 1000 sized companies.

>

> When McKinseys came through one of my clients what did they do? They had about 200 questions of various measures they wanted the company to produce, and their plan was to 'burn money' while they were waiting for the answers. Interestingly, they got one of the consultants, sat her and me down together, and the deal was she read out the questions and I answered them using DIS/Metaphor. We did the lot in about a week. This was for a $A10B funds under management conglomerate of insurance companies. No small outfit. McKinseys were a little miffed at being given all their answers so quickly. The point is, when we designed the DW, there was NO WAY we could have known the questions McKinseys were going to come in with 2 years later. If we only had dimensional models and not an archive we could not have done the job. The numbers produced would be used to decide the fate of the company. It doesn't get any more important. The question at the time was whether the owners of the insurance company should get rid of it and move on. Minor business decision.

>

>

> Example 4. The board meeting.

> Who on this list has been a regular attendee at board meetings of big

> companies? For those that have not been this is what happens. Someone

> proposes spending umpteen gizzillion dollars on the latest and

> greatest 'whatever' called project X and the board tries to decide yes or no.

>

> It is the job of board members to ask questions of the directors/senior managers of the company to ensure that project X is in the interests of the shareholders. To do this they ask questions. Rarely, if ever, are all the questions answered at one meeting. So the decision is delayed to the next board meeting (usually 2 months) so that the questions can be answered. And what happens at the next board meeting after these left over questions are answered? Surprise, surprise, more questions.

>

> And in big companies it's not just Project X, they have a whole

> alphabet of projects to decide on, on an ongoing basis.

>

> A dimensional model, which is extremely difficult to build in a

> non-lossy manner because of performance degradation, does not support

> this so well.

>

> At one client (many years ago), we put a DIS/Metaphor terminal in the

> board room and we sat a marketing analyst at it (not an IT person) and

> his job was to answer any question that came up in the board meeting

> in a maximum of FIVE MINUTES.

>

> The benefit. The board could make decisions in an hour or two that

> used to take 4-6 maybe 8 months. That very reduction in the time taken for

> the management decision making cycle has another name, 'competitive

> advantage'. If a company can 'change its mind' in 2 months rather than

> 6-8 months it hardly matters when they make the wrong decision because

> they can change it quickly enough.

>

> These cases are real cases I have worked on and I have seen many other

> similar cases that others have worked on.

>

> So, though the non-lossy archival data store is expensive, complicated, difficult to understand, and impossible to query with (most) tools, the value of building one is compelling to some organisations, particularly Insurance companies and Banks.

>

> They are so hugely expensive that I tried personally for 4 years to

> figure out a way to do away with the separate archive and just keep a

> dimensional model. I failed. Some of my colleagues, people who are the

> 'best and brightest' also tried and failed.

>

> Sybase IWS succeeds. IWS makes it possible to store archival data in a non-lossy form inside a dimensional model without the need for separation of archive and analytical data.

>

> Further IWS, as far as I am aware, is the first dimensional model that

> goes so fast it is possible to build the ODS on top of it. I know.

> I've done it in a company with 18 million customer records. (Though still

> waiting to go into production last I heard.)

>

> As Neil points out, as hardware goes faster and faster there will

> likely be a collapsing of these separate physical databases. I believe

> that in the 5 year time frame the ODS/EDW/Analytical Layers will be able to be

> collapsed into one single model based on IWS design techniques.

>

>

> So, all you 'DW' folks.

>

> The acid question of your DW efforts for you to ask yourself.

>

> If a DW is partly defined as 'supporting the management decision

> making process', would you put the tools you have selected into the board

> meetings of your company and guarantee that you will answer any

> question the board cares to ask inside 5 minutes?

>

> If the answer is 'no', who cares what modelling technique was used?

>

> That is 'supporting the management decision making process' at work.

>

> I wish more companies would do this. We might have a few less

> Enrons/Worldcoms etc.

>

> Best Regards

>  

> Peter Nolan

> Data Warehousing Consultant

> Mobile: +353 879 581 732

> Homepage: http://www.peternolan.com

>

>

> -----Original Message-----

> From: owner-dwlist@datawarehousing.com

> [mailto:owner-dwlist@datawarehousing.com] On Behalf Of jim stagnitto

> Sent: 22 July 2004 05:49

> To: dwlist@datawarehousing.com

> Subject: Re: dwlist: Outback Challenge

>

> From: jim stagnitto <jimstag@comcast.net>

>

>

>

>

>

> Hi Nicholas -

>

> As always, a thoughtful letter, thanks.

>

> My issue with the CIF has never been that it is incapable of meeting

> business needs, just that it is so woefully inefficient in doing so.

> Not "right versus wrong" or "possible versus impossible" - rather

> "efficient versus wasteful". And this distinction, I submit, is

> neither "academic" nor "subtle" in nature. Or in business. The

> evolutionary ash heap is filled with interesting and sometimes

> beautiful creatures that were [key word] just a little less efficient

> than their competition.

>

> But the CIF - let's face it - is "dramatically" inefficient. Consumes

> more, delivers less. From a ruthless Darwinian perspective: doomed I

> fear. It stubbornly persists [IMHO] largely because there still exist

> dwindling and isolated pockets within large corporations whose

> priorities, for whatever fleeting reasons, are artificially skewed

> from those of survival of the fittest. Places where one might perceive -

> ultimately incorrectly - that money and time are something less vital

> / real than oxygen and water.

>

> How many members of this community would knowingly implement a CIF

> architecture if building a data warehouse for a self-funded startup?

> Very few - I suspect. And these CIF-enabled startups would then be at

> a significant disadvantage to their better informed, more nimble, and

> more liquid competitors. Right versus wrong, possible versus

> impossible - interesting avenues of discussion certainly - but

> ultimately less relevant to what must survive, no?

>

> In the interest of lively discourse,

>

> --Jim Stagnitto

>

>

>

>


Copyright 2002-2010 Peter Nolan