Search
Tuesday, February 07, 2012 ..:: Forums ::.. Register  Login
 Peter's Forums Minimize
SearchForum Home
     
  Public Postings to Forums  IT Toolbox Data Warehousing Postings  How to best est...
 How to best estimate the number of ELT jobs needed in a project
 
 11/16/2006 8:44:43 PM
User is offlinePeterNolan
380 posts
3rd


How to best estimate the number of ELT jobs needed in a project

Hi David,

My recommendation to companies that do not know much about BI is to hire some people who do.....it is much cheaper as long as you get some good people....if you are in Brisbane I an point you to some colleagues in Sydney who can assist..

That said....

The old way of estimating, which most of the world still uses, is to determine number of files, number of fields, whether you will have STG, ODS, EDW and then start counting how many times files will be read/written....

Whether mappings will be simple, medium, hard is a matter for what you are trying to do and it varies.

In terms of time, when I was a PSM for Ardent we used 2 dimension table processing programs per day and one fact table per day as an estimation basis and it served us well...I believe it is still in use at IBM.

Some clients I know allow 2 days per dimension table and up to 4 days per fact table.....they actually take that long but I have no idea why...

I say 'old way' because through some breakthrough thinking we have improved these times dramatically....on my last DataStage project we were able to write 15-20 dimension processing jobs per day and around 5 fact table jobs per day.....including a staging area, 30 fact tables, 80 dimension tables and 4 weeks spend writing the processing templates and worrying about the mistakes we might have missed we built the whole thing in 8 person weeks....not bad.....about 3x faster than my previous best effort....

Before anyone asks...no....we do not release the information as to how we are able to wrote 20 dimension table processing jobs in a day.....either a company hires us to do it or the hire someone else who will write 2-3 per day......

With our software, of course, there is no ETL write time at all....through breakthrough thinking we have 'disappeared' ETL coding.....Landmark Grads will recognize the language... ;-)

We are now finding most of the problem is 'understanding the data'....and we are working on tools to cut the 'understanding the data'

time...

Best Regards

Peter

www.peternolan.com

 

-----Original Message-----

From: David_Brisbane via dw-select

[mailto:dw-select@Groups.ITtoolbox.com]

Sent: 10 August 2006 03:20

To: pete...

Subject: RE: [dw-select] How to best estimate the number of ELT jobs needed in a project

 

Thanks Vincent for the swift reply. I have already used the procedure

you supplied to make some estimates.

 

 

-----Original Message-----

From: Vincent McBurney via dw-select

[mailto:dw-select@Groups.ITtoolbox.com]

Sent: 09 August 2006 12:51

To: David_Brisbane

Subject: RE: [dw-select] How to best estimate the number of ELT jobs

needed in a project

 

 

I just blogged about this in

http://blogs.ittoolbox.com/bi/websphere/archives/my-wiki-wiki-ways-estim

ating-etl-development-time-10940

A lot depends on your vertical banding. Are you carrying data through

from source to target in one job or are you breaking into bands such as

extract, prepare, load. Three jobs with data staged to file or dataset

of table between. The banding gives you a rollback strategy. Too many

bands makes the process too time consuming and development too complex.

No bands makes rollback and recovery difficult and makes jobs larger and

more complex.

In estimating I first map sources to targets at the table/file level.

This gives you a rough count of the number of jobs required:

- 1 extract job for each source.

- 1 prepare job for each source with possible union joins of some

sources.

- 1 insert job for each target.

- 1 update job for each target.

- Potentially one delete job, or one augment job, or one unaugment job

for each target.

Sometimes sources consolidate together during the prepare stage. Eg. A

header and child table may be merged into one stream of data for one

target table.

The insert/update/load/augment jobs can be shared. You should only need

one per target table regardless of how many sources you have.

Regarding simple/medium/complex I tried to put measures of these into

the blog. Extract and load jobs are almost always simple.

Prepare/transform/consolidate jobs are medium to complex. Fact jobs are

almost always complex. Very high volumes can push a job up a category,

a db update job becomes medium to complex for daily terabyte loads for

example.

The maturity of the source data is another factor on job complexity.

Data with poor metadata definition and lack of supporting documentation

pushes all your estimates out by a weighting factor. Poor source data

can double or triple your overall ETL effort. A round of interviews and

data investigation by business analysts can remove this overhead.

Regards

vincent

-----Original Message-----

From: David_Brisbane via dw-select

[mailto:dw-select@Groups.ITtoolbox.com]

Sent: Wednesday, August 09, 2006 10:14 AM

To: McBurney, Vincent

Subject: [dw-select] How to best estimate the number of ELT jobs needed

in a project

 

 

I have discovered a few estimation tools that compute the amount of ETL

effort required to deliver a DW project based on the number of simple,

medium and complex ETL jobs that have to be designed, developed and

tested.

However, I have not found an estimation tool, paper, or rational for how

to estimate the number of simple, medium and complex ETL jobs in the

first place.

I was wondering what methods the people in this community typically use?

Is it based on the number of tables in the DW or based on something

else?

  Public Postings to Forums  IT Toolbox Data Warehousing Postings  How to best est...
Search  Forum Home       

Copyright 2002-2010 Peter Nolan   Terms Of Use  Privacy Statement
DotNetNuke® is copyright 2002-2012 by DotNetNuke Corporation