Hi David,
My recommendation to companies that do not know much about BI is to hire some people who do.....it is much cheaper as long as you get some good people....if you are in Brisbane I an point you to some colleagues in Sydney who can assist..
That said....
The old way of estimating, which most of the world still uses, is to determine number of files, number of fields, whether you will have STG, ODS, EDW and then start counting how many times files will be read/written....
Whether mappings will be simple, medium, hard is a matter for what you are trying to do and it varies.
In terms of time, when I was a PSM for Ardent we used 2 dimension table processing programs per day and one fact table per day as an estimation basis and it served us well...I believe it is still in use at IBM.
Some clients I know allow 2 days per dimension table and up to 4 days per fact table.....they actually take that long but I have no idea why...
I say 'old way' because through some breakthrough thinking we have improved these times dramatically....on my last DataStage project we were able to write 15-20 dimension processing jobs per day and around 5 fact table jobs per day.....including a staging area, 30 fact tables, 80 dimension tables and 4 weeks spend writing the processing templates and worrying about the mistakes we might have missed we built the whole thing in 8 person weeks....not bad.....about 3x faster than my previous best effort....
Before anyone asks...no....we do not release the information as to how we are able to wrote 20 dimension table processing jobs in a day.....either a company hires us to do it or the hire someone else who will write 2-3 per day......
With our software, of course, there is no ETL write time at all....through breakthrough thinking we have 'disappeared' ETL coding.....Landmark Grads will recognize the language... ;-)
We are now finding most of the problem is 'understanding the data'....and we are working on tools to cut the 'understanding the data'
time...
Best Regards
Peter
-----Original Message-----
From: David_Brisbane via dw-select
[
Sent: 10 August 2006 03:20
To: pete...
Subject: RE: [dw-select] How to best estimate the number of ELT jobs needed in a project
Thanks Vincent for the swift reply. I have already used the procedure
you supplied to make some estimates.
From: Vincent McBurney via dw-select
Sent: 09 August 2006 12:51
To: David_Brisbane
Subject: RE: [dw-select] How to best estimate the number of ELT jobs
needed in a project
I just blogged about this in
ating-etl-development-time-10940
A lot depends on your vertical banding. Are you carrying data through
from source to target in one job or are you breaking into bands such as
extract, prepare, load. Three jobs with data staged to file or dataset
of table between. The banding gives you a rollback strategy. Too many
bands makes the process too time consuming and development too complex.
No bands makes rollback and recovery difficult and makes jobs larger and
more complex.
In estimating I first map sources to targets at the table/file level.
This gives you a rough count of the number of jobs required:
- 1 extract job for each source.
- 1 prepare job for each source with possible union joins of some
sources.
- 1 insert job for each target.
- 1 update job for each target.
- Potentially one delete job, or one augment job, or one unaugment job
for each target.
Sometimes sources consolidate together during the prepare stage. Eg. A
header and child table may be merged into one stream of data for one
target table.
The insert/update/load/augment jobs can be shared. You should only need
one per target table regardless of how many sources you have.
Regarding simple/medium/complex I tried to put measures of these into
the blog. Extract and load jobs are almost always simple.
Prepare/transform/consolidate jobs are medium to complex. Fact jobs are
almost always complex. Very high volumes can push a job up a category,
a db update job becomes medium to complex for daily terabyte loads for
example.
The maturity of the source data is another factor on job complexity.
Data with poor metadata definition and lack of supporting documentation
pushes all your estimates out by a weighting factor. Poor source data
can double or triple your overall ETL effort. A round of interviews and
data investigation by business analysts can remove this overhead.
Regards
vincent
Sent: Wednesday, August 09, 2006 10:14 AM
To: McBurney, Vincent
Subject: [dw-select] How to best estimate the number of ELT jobs needed
in a project
I have discovered a few estimation tools that compute the amount of ETL
effort required to deliver a DW project based on the number of simple,
medium and complex ETL jobs that have to be designed, developed and
tested.
However, I have not found an estimation tool, paper, or rational for how
to estimate the number of simple, medium and complex ETL jobs in the
first place.
I was wondering what methods the people in this community typically use?
Is it based on the number of tables in the DW or based on something
else?