ibi4u.blogspot.com: September 2008

Sunday, September 28, 2008

How to open an existing project from another machine

How do I open an existing project from another user's machine, and how do I save the changes?
How do I run the jobs daily? Please explain in detail.

ANS:
You can open another user's project from your machine by checking out that project (using AIR commands); this gets you the latest version of the project.
After that you can modify that version if you need to, and then check the project back in from your machine.
You can run the graphs on a daily basis by scheduling the job with crontab.
I think this is the right answer.
If it is wrong, anyone is welcome to post the correct answer.

Wednesday, September 10, 2008

Hindi video songs: Hrithik Roshan songs

CODE

http://www.qshare.com/get/335984/Drona.Title.Song-By.Subhash.avi.html

http://rapidshare.com/files/102311308/aao_suno_pyar_ki_ek_kahani-Krish.avi.html
http://rapidshare.com/files/102311305/Aap_Mujhe_ache_Lagne_Lage.mpg.html
http://rapidshare.com/files/102311307/Aaye_Dil_Dil_ki_Dniya_Mein-Yaadein.avi.html
http://rapidshare.com/files/102311304/agar_mein_kaho.avi.html
http://rapidshare.com/files/102311306/An_Dekhi_Anjani_si-Mujhe_dosti_Karoge.avi.html
http://rapidshare.com/files/102311445/Aye_Mere_Dil_tu_ga_jaa-Kaho_Na_Pyar_Hai.avi.html
http://rapidshare.com/files/102311607/Bhoomoro_Bhoomoro-Mission_Kashmir.avi.html
http://rapidshare.com/files/102311684/Bole_Churiyan-kabhi_Khushi_kabhi_Ghum.avi.html
http://rapidshare.com/files/102311689/Chamakati_Shaam_hai-Yaadein.avi.html
http://rapidshare.com/files/102311710/Chand_Sitre-Kaho_Na_Pyar_Hai.avi.html
http://rapidshare.com/files/102311934/Chanda_Taare-Yaadein.avi.html
http://rapidshare.com/files/102312044/chori_chori-Krish.avi.html
http://rapidshare.com/files/102312079/Chupke_se_sun-mission_Kashmir.avi.html
http://rapidshare.com/files/102312209/crazy_kiya_re-Dhoom_2.avi.html
http://rapidshare.com/files/102312271/Deewana_hai_dekho-kabhi_Khushi_kabhi_Ghum.avi.html
http://rapidshare.com/files/102312274/Dhoom.avi.html
http://rapidshare.com/files/102312368/Dil_dushman_ke_hilte_hain-Lakshya.avi.html
http://rapidshare.com/files/102312458/Dil_ko_Kiya_Samjaho.avi.html
http://rapidshare.com/files/102312503/Dil_Laga-Dhoom_2.avi.html
http://rapidshare.com/files/102312556/dil_na_diya-krish.avi.html
http://rapidshare.com/files/102312576/Dil_Ne_Dil_Ko_Pukara_Kaho_Na_Pyar_Hai.avi.html
http://rapidshare.com/files/102312624/haye_allah-Koi_Mil_gaya.avi.html
http://rapidshare.com/files/102312680/I_am_In_love-Kaho_Na_Pyar_Hai.avi.html
http://rapidshare.com/files/102312967/in_penchio_ko_dekh_kar-Koi_Mil_Gaya.avi.html
http://rapidshare.com/files/102313000/indhar_chala_mein-Koi_Mil_Gaya.avi.html
http://rapidshare.com/files/102313005/Ja_Sanam_Tujhko-.mpg.html
http://rapidshare.com/files/102313154/Jaaneman_Jaaneman-Kaho_na_Pyar_Hai.avi.html
http://rapidshare.com/files/102313391/Jab_Dil_mile-Yaadein.avi.html
http://rapidshare.com/files/102313448/Jane_Dil_Mein_Kab_se_hai_Tu-Mujhe_dosti_Karoge.avi.html
http://rapidshare.com/files/102313513/judo_judo-Koi_Mil_Gaya.avi.html
http://rapidshare.com/files/102313585/Kahe_Do_ke_Tum_mujhe_se_dosti_Karoge-Mujhe_dosti_Karoge.avi.html
http://rapidshare.com/files/102313655/Kaho_Na_Pyar_Hai.avi.html
http://rapidshare.com/files/102313804/koi_mil_gaya.avi.html
http://rapidshare.com/files/102313914/Koi_tum_sa_nahin-Krish.avi.html
http://rapidshare.com/files/102313957/Kuch_Saal_Phele_Doosto_yeh_Baat__howi_thi-Yaadein.avi.html
http://rapidshare.com/files/102313972/Main_aisa_Kyun_hoo.avi.html
http://rapidshare.com/files/102314026/Melody-Mujhe_dosti_Karoge.avi.html
http://rapidshare.com/files/102314187/Oh_my_darling_I_love_u-Mujhe_dosti_Karoge.avi.html
http://rapidshare.com/files/102314211/pardeh_ke_uthe_hi-Koi_Mil_Gaya.avi.html
http://rapidshare.com/files/102314322/Pyar_Ki_kisthi_Mein-Kaho_Naa_Pyar_hai.avi.html
http://rapidshare.com/files/102314543/Sad_Version_Of_Dil_Ne_Dil_Ko_Pukara-Kaho_Na_Pyar_Hai.avi.html
http://rapidshare.com/files/102314616/Shawnveli_si_ek_larki-Mujhe_dosti_Karoge.avi.html
http://rapidshare.com/files/102314735/Yaadein_Yaad_Aati_hain.avi.html
http://rapidshare.com/files/102314878/Yeh_Rasta_hai_Tera-Lakshya.avi.html
http://rapidshare.com/files/102314924/You_are_My_Sonia-kabhi_Khushi_kabhi_Ghum.avi.html

Hindi video songs

http://rapidshare.com/files/140103993/Lover_Boy.avi
http://rapidshare.com/files/140105214/Mausam_Achanak.avi
http://rapidshare.com/files/140102987/Meelon_Ka.avi
http://rapidshare.com/files/140101325/Meelon_Ka_V2.avi
http://rapidshare.com/files/140101961/Milo_Na_Milo.avi
http://rapidshare.com/files/140100860/Sach_Kehna.avi

Gangster:

Bheegi.Bheegi
http://rapidshare.com/files/132174015/Bheegi.Bheegi-By.Subhash.mkv

Lamha.Lamha
http://rapidshare.com/files/132180029/Lamha.Lamha-By.Subhash.mkv

Mujhe.Mat.Roko
http://rapidshare.com/files/132172100/Mujhe.Mat.Roko-By.Subhash.mkv

Tu.Hi.Meri.Shab.Hai
http://rapidshare.com/files/132174569/Tu.Hi.Meri.Shab.Hai-By.Subhash.mkv

Ya.Ali
http://rapidshare.com/files/132174874/Ya.Ali-By.Subhash.mkv

dhoom2:
Crazy.Kiya.Re
http://rapidshare.com/files/132183345/Crazy.Kiya.Re-By.Subhash.avi

Crazy.Kiya-Remix
http://rapidshare.com/files/132182332/Crazy.Kiya-Remix-By.Subhash.avi

Dhoom.Again-1
http://rapidshare.com/files/132183168/Dhoom.Again-1-By.Subhash.avi

Dhoom-Again-2
http://rapidshare.com/files/132181717/Dhoom-Again-2-By.Subhash.avi

Dhoom.Again-Deleted
http://rapidshare.com/files/132183324/Dhoom.Again-Deleted-By.Subhash.avi

Dil.Laga.Na
http://rapidshare.com/files/132182902/Dil.Laga.Na-By.Subhash.avi

Dont.Touch.Me
http://rapidshare.com/files/132181947/Dont.Touch.Me-By.Subhash.avi

My.Name.Is.Ali
http://rapidshare.com/files/132181806/My.Name.Is.Ali-By.Subhash.avi

Friday, September 05, 2008

Abinitio links

http://www.bi-dw.info/abinitio.htm

Abinitio links

WELCOME TO ABINITIO ON FLY

Thursday, September 04, 2008

CAT Material

Testing web sites

http://site24x7.com/index.html?resellerid=AE8Bje8F&gclid=CI-cvvipwpUCFQoNewodRQyLQA
http://www.soft.com/eValid/Technology/White.Papers/website.testing.html
http://www.softwareqatest.com/qatweb1.html

Data Modelling

http://www.embarcadero.com/jive/kbcategory.jspa?categoryID=3

Data Modelling

Wednesday, September 03, 2008

Happy Vinayaka Chavithi to all my friends

Hi,
Good morning. Today is a special day for everyone.

Start the new things you want in your life. Before that, you can do Ganesh pooja.

Here are some links. Go through them and you can achieve your goals.

http://www.telugucomedyclub.com/vinayaka-chavithi-vratha-kalpam-downloads
http://www.teluguone.com/splevents/ganesh/vinayakaPuja.jsp
http://www.euroandhra.com/ganesh/OnlinePooja.html
http://www.teluguwebsite.com/Telugu_Pandugalu.html
http://www.harsamay.com/

Tuesday, September 02, 2008

What is Abinitio

Ab Initio is a Latin phrase meaning "from the beginning".

Ab Initio software helps you build large-scale data processing applications and run them in parallel environments. Ab Initio software consists of two main programs:

· Co>Operating System, which your system administrator installs on a host UNIX or Windows NT Server, as well as on processing nodes. (The host is also referred to as the control node.)

· Graphical Development Environment (GDE), which you install on your PC (client node) and configure to communicate with the host (control node).

BASIC TERMS in Ab Initio

What is Dataset ?

In simple terms, a dataset is a file. It can be a mainframe file or any fixed-width or delimited file. There are various types of datasets:
FIXED EBCDIC
FIXED ASCII
DELIMITED ASCII
SAS dataset, etc.

You can also think of dataset as a table in database world.

What is Component?

A component is an Ab Initio program.
There are various components, such as SELECT, FILTER, SORT, JOIN, MERGE, DEDUP, ROLLUP, SCAN, USER DEFINED, etc.

What is Port?

A port is a connection point for the input or output to a component.

What is Flow?

A flow carries a stream of data between components in a graph. Flows connect components via ports. Ab Initio supplies four kinds of flows with different patterns: straight, fan-in, fan-out, and all-to-all. We will discuss various kinds of flows as we go through this training.

What is Graph ?

A graph is a diagram that defines the various processing stages of a task and the streams of data as they move from one stage to another. Visually, stages are represented by components and streams are represented by flows. The collection of components and flows comprise an Ab Initio graph.

What is Field ?
A field is equivalent to a column of a table in the database world. A field is also called a variable, which holds a value.

What is Key ?

Keys are used in many places in Ab Initio development: we use a key to sort data, join two files, roll up data, and so on.
See the graph below, which illustrates these basic terms.

Ab Initio DMLs

In the Ab Initio world, DML is an acronym for Data Manipulation Language. It is the Ab Initio programming language you use to define record formats. DML plays the role that DDL plays in traditional databases: it tells Ab Initio how to interpret your data.

The following are various types of DMLs:

Delimited.dml

record
decimal('') cust_id;
string('') last_name;
string('') first_name;
string('') street_addr;
string('') state;
decimal('') zip;
string('') gender;
decimal('\n') income;
end

Example of data :
297457AlexNeil Steven149 Inkwell St.KY40541M0073900
901288AndrusTom165 Eboli St.WY60313M0492500
662197BannonJeffrey C21 Compuserve St.CO70307M0140200

denorm.dml

record
decimal(5) custid;
decimal(3) num_trans;
record
date("YYYYMMDD") dt;
decimal(6.2) amount;
end transactions[num_trans];
string(1) newline;
end

Example Of Data: -

12345 219970204 5.9519970209125.05
14521 119970205 15.50
12341 0
12341 319970202 9.9019970206 12.2319970210 62.75

ebcdic.dml

record
ebcdic decimal(6) cust_id;
ebcdic string(18) last_name;
ebcdic string(16) first_name;
ebcdic string(26) street_addr;
ebcdic string(2) state;
ebcdic decimal(5) zip;
ebcdic string(1) gender;
ebcdic decimal(7) income;
ebcdic string(1) newline;
end
// Rather than using the "ebcdic" keyword in every text field,
// this example uses the "ebcdic" keyword in front of the "record"
// keyword, affecting all text fields.

ebcdic record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
string(2) state;
decimal(5) zip;
string(1) gender;
decimal(7) income;
string(1) newline;
end

Example Of data
òù÷ôõ÷Á“…§@@@@@@@@@@@@@@Õ…‰“@â£…¥…•@@@@@ñôù@É•’¦…““@â£K@@@@@@@@@@@ÒèôðõôñÔðð÷óùðð%ùðñòøøÁ•„™¤¢@@@@@@@@@@@@ã–”@@@@@@@@@@@@@ñöõ@Å‚–“‰@

fixed.dml

record
decimal(6) cust_id; // Customer ID
string(18) last_name; // Last name
string(16) first_name; // First name
string(26) street_addr; // Street address
string(2) state; // State
decimal(5) zip; // Zipcode
string(1) gender; // Gender (M = male; F = female)
decimal(7) income; // Income (in dollars)
string(1) newline;
end

Example Of data

297457Alex Neil Steven 149 Inkwell St. KY40541M0073900
901288Andrus Tom 165 Eboli St. WY60313M0492500
662197Bannon Jeffrey C 21 Compuserve St. CO70307M0140200

unix-text.dml

string("\n")

Example Of data

This is text as you might
find it on a computer running a

win-text.dml

string("\r\n")

Refer to the graph Types.mp to define DMLs and view the data.

FILTER BY EXPRESSION

The following graph is our first Ab Initio graph. It processes a file to produce the customers whose income is greater than $5000.

INPUT DML: -
record
decimal(6) cust_id; // Customer ID
string(18) last_name; // Last name
string(16) first_name; // First name
string(26) street_addr; // Street address
string(2) state; // State
decimal(5) zip; // Zipcode
string(1) gender; // Gender (M = male; F = female)
decimal(7) income; // Income (in dollars)
string(1) newline;
end

See the Attached Graph.

Code: -
This Graph contains no code.
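
Since the graph itself is not shown here, the idea can be sketched outside Ab Initio. The Python below is only an illustration, not Ab Initio code: it slices fixed-width lines using the field widths from the input DML and splits them on income > 5000, roughly the way Filter by Expression separates selected and deselected records. The helper names, and the third sample customer, are made up for this sketch.

```python
# Field widths taken from the fixed-width customer DML above.
FIELDS = [
    ("cust_id", 6), ("last_name", 18), ("first_name", 16),
    ("street_addr", 26), ("state", 2), ("zip", 5),
    ("gender", 1), ("income", 7),
]

def parse_record(line):
    """Slice one fixed-width line into a dict of stripped field values."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

def filter_by_expression(records, select_expr):
    """Split records into (selected, deselected) lists."""
    selected, deselected = [], []
    for rec in records:
        (selected if select_expr(rec) else deselected).append(rec)
    return selected, deselected

def make_line(cust_id, last, first, street, state, zip_code, gender, income):
    """Build a padded fixed-width line (sample data for this sketch only)."""
    return (f"{cust_id:06d}{last:<18}{first:<16}{street:<26}"
            f"{state:<2}{zip_code:05d}{gender}{income:07d}")

lines = [
    make_line(297457, "Alex", "Neil Steven", "149 Inkwell St.", "KY", 40541, "M", 73900),
    make_line(901288, "Andrus", "Tom", "165 Eboli St.", "WY", 60313, "M", 492500),
    # Hypothetical low-income customer, added so the filter has something to reject.
    make_line(111111, "Sample", "Low", "1 Hypothetical Rd.", "CO", 70307, "F", 4200),
]

records = [parse_record(line) for line in lines]
high, low = filter_by_expression(records, lambda r: int(r["income"]) > 5000)
print([r["cust_id"] for r in high])
print([r["cust_id"] for r in low])
```

In the real graph the expression is typed into the component, so no code file is needed; the sketch only shows what happens to each record.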

Performance / Interview Question?
Avoid Filter by Expression where you can. Most components have an embedded filter called a select expression; use the embedded select instead of Filter by Expression whenever possible to improve performance.

Exercise: Use Filter by Expression to split the above customer information into two files, one with males and one with females.

Transformation with Reformat (XFR)

What is XFR?
You write your code (logic) in XFR. The code file extension is .xfr. All the transform components use XFR to run.

REFORMAT COMPONENT: -

As the name suggests, Reformat changes the format of the input file to produce the desired output file. For example, if you have 10 fields in the input file and you want an output file with only 5 of them, you use the Reformat component.

With reformat you can derive new fields. See the examples below.

Example 1: Customer info Dml

record
decimal(6) cust_id; // Customer ID
string(18) last_name; // Last name
string(16) first_name; // First name
string(26) street_addr; // Street address
string(2) state; // State
decimal(5) zip; // Zipcode
string(1) gender; // Gender (M = male; F = female)
decimal(7) income; // Income (in dollars)
string(1) newline;
end

INPUT DATA
297457AlexNeil Steven149 Inkwell St.KY40541M0073900
901288AndrusTom165 Eboli St.WY60313M0492500
662197BannonJeffrey C21 Compuserve St.CO70307M0140200

Reformat above DML like this

record
decimal(6) cust_id; // Customer ID
string(18) last_name; // Last name
string(16) first_name; // First name
string(1) gender; // Gender (M = male; F = female)
decimal(7) income; // Income (in dollars)
string(1) newline;
end

YOUR OUTPUT DATA LOOKS LIKE this

297457AlexNeil StevenM0073900
901288AndrusTomM0492500
662197BannonJeffrey CM0140200

XFR Code For Above Example: -

/*Reformat operation*/
out::reformat(in) =
begin
out.cust_id :: in.cust_id;
out.last_name :: in.last_name;
out.first_name :: in.first_name;
out.gender :: in.gender;
out.income :: in.income;
out.newline :: in.newline;
end;
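
For readers who do not have the GDE at hand, here is a rough Python equivalent of the reformat rules above. It is only a sketch of the field-by-field copy, with records modeled as dicts (an assumption of this example, not how Ab Initio stores them).

```python
def reformat(in_rec):
    """Copy the selected input fields to the output record, one rule per field."""
    return {
        "cust_id": in_rec["cust_id"],
        "last_name": in_rec["last_name"],
        "first_name": in_rec["first_name"],
        "gender": in_rec["gender"],
        "income": in_rec["income"],
        "newline": in_rec["newline"],
    }

customer = {
    "cust_id": "297457", "last_name": "Alex", "first_name": "Neil Steven",
    "street_addr": "149 Inkwell St.", "state": "KY", "zip": "40541",
    "gender": "M", "income": "0073900", "newline": "\n",
}

out = reformat(customer)
print(sorted(out))
```

The address fields (street_addr, state, zip) are simply not copied, which is how the output ends up with fewer fields than the input.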

Reformat Example With Derived Field :-

Reformat the customer DML like this. We are deriving a new field called Full_address, which is the concatenation of street_addr, state, and zip into one line.

record
decimal(6) cust_id; // Customer ID
string(18) last_name; // Last name
string(16) first_name; // First name
string(33) Full_address; // Derived field
string(1) gender; // Gender (M = male; F = female)
decimal(7) income; // Income (in dollars)
string(1) newline;
end

XFR Code For Above Example: -

/*Reformat operation*/
out::reformat(in) =
begin
out.cust_id :: in.cust_id;
out.last_name :: in.last_name;
out.first_name :: in.first_name;
out.Full_address :1: string_concat ( in.street_addr,in.state,in.zip);
out.Full_address :2: "NO Address Found";
out.gender :: in.gender;
out.income :: in.income;
out.newline :: in.newline;
end;

In the above code, string_concat is an Ab Initio built-in function. Read the help for all the built-in functions; they are similar to C library functions. Also note that priority assignments 1 and 2 work like CASE statements in SQL: if assignment 1 succeeds its value is used, otherwise assignment 2 is used.
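
The "try 1, else use 2" behaviour can be pictured with a small Python sketch. This is not how Ab Initio evaluates rules internally; here a rule is assumed to "fail" when it raises an error or returns None, and first_success and full_address are names invented for the illustration.

```python
def first_success(*rules):
    """Return the value of the first rule that neither raises nor yields None."""
    for rule in rules:
        try:
            value = rule()
        except (KeyError, TypeError, ValueError):
            continue  # this rule failed, fall through to the next priority
        if value is not None:
            return value
    raise ValueError("no rule produced a value")

def full_address(in_rec):
    """Priority 1: concatenate the address fields. Priority 2: a fixed default."""
    return first_success(
        lambda: in_rec["street_addr"] + in_rec["state"] + in_rec["zip"],
        lambda: "NO Address Found",
    )

good = {"street_addr": "149 Inkwell St.", "state": "KY", "zip": "40541"}
bad = {"street_addr": None, "state": "KY", "zip": "40541"}
print(full_address(good))
print(full_address(bad))
```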

Caution: - AB INITIO DML names are case sensitive.

Generate Records: -

Generate Records generates a specified number of data records with fields of specified lengths and types.

You can let Generate Records generate random values within the specified length and type for each field, or you can control various aspects of the generated values using the command-line options of the Generate Records component. Typically, the output of Generate Records is used for testing a graph.

Example: -

Input DML:-

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender="M";
decimal(7) income;
date("MM/DD/YYYY") dob;
string(1) newline="\n";
end

Set the num_records option to 10000.

Set the command line option as follows:-

-sequential cust_id 350000 -minimum state_code 1 -maximum state_code 50 -minimum income 100 -maximum income 100000 -default gender -default newline

The above settings tell the Generate Records component to generate 10,000 records, generate cust_ids sequentially starting from 350,000, set state_code between 1 and 50 and income between 100 and 100000, and keep the default values for gender and newline.
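
To get a feel for what such test data looks like without running the component, here is a minimal Python sketch. It only imitates the options discussed above (sequential cust_id, minimum/maximum ranges, default values); the remaining fields are left out, and the function name and the seed parameter are assumptions of this example.

```python
import random

def generate_records(num_records, seed=0):
    """Imitate the Generate Records options used above (sketch only)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    records = []
    for i in range(num_records):
        records.append({
            "cust_id": 350000 + i,               # -sequential cust_id 350000
            "state_code": rng.randint(1, 50),    # -minimum/-maximum state_code
            "income": rng.randint(100, 100000),  # -minimum/-maximum income
            "gender": "M",                       # -default gender
            "newline": "\n",                     # -default newline
        })
    return records

records = generate_records(10000)
print(len(records), records[0]["cust_id"], records[-1]["cust_id"])
```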
Exercises for reformat: -

Generate 50000 records in /data/abwork/your_dir/in_file1.dat using the following command line and DML:

-sequential cust_id 350000 -minimum state_code 1 -maximum state_code 50 -minimum income 100 -maximum income 100000 -default gender -default newline

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender="M";
decimal(7) income;
date("MM/DD/YYYY") dob;
string(1) newline="\n";
end

Generate 100000 records in /data/abwork/your_dir/in_file2.dat using the following command line and DML:

-sequential cust_id 350000 -minimum state_code 1 -maximum state_code 50 -minimum income 100 -maximum income 100000 -default gender -default newline

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender="F";
decimal(7) income;
date("MM/DD/YYYY") dob;
string(1) newline="\n";
end

Develop the following graphs.

Exercise 1 (GRAPH 1): -

Use the Unix cat command to combine the generated data into one file, like this:

cd /data/abwork/your_dir
cat in_file1.dat in_file2.dat >> in_file.dat

Use the following input DML to map in_file.dat.

INPUT DML :-

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender;
decimal(7) income;
date("MM/DD/YYYY") dob;
string(1) newline="\n";
end

Set your output file to file:/data/abwork/your_dir/reform_ex1.out and define it with the following DML.

OUTPUT DML :-

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender;
decimal(7) income;
date("MM/DD/YYYY") dob;
decimal(3) age;
string(1) minor_flag;
string(1) newline="\n";
end

1) Derive a field called age from the customer's dob. Use the following expression to get age:
((date("MM/DD/YYYY"))"03/17/2003" - (date("MM/DD/YYYY"))in0.dob);
2) Derive a field called minor_flag. Set this flag to "Y" if age is less than 18, or to "N" if age is >= 18.

Exercise 2 (GRAPH 2): -

INPUT DML (same as the above example)

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender;
decimal(7) income;
date("MM/DD/YYYY") dob;
string(1) newline="\n";
end

OUTPUT DML

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender;
decimal(7) income;
date("MM/DD/YYYY") dob;
decimal(5) score;
string(1) newline="\n";
end

Derive a field called score based on the customer's gender. The business logic to derive score is:
if (in.gender == "M") score = income / 2000;
if (in.gender == "F") score = income / 2000 + 500;

Exercise 3 (GRAPH 3): -

INPUT DML (same as the above example)

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender;
decimal(7) income;
date("MM/DD/YYYY") dob;
string(1) newline="\n";
end

OUTPUT DML :-

record
decimal(6) cust_id;
string(18) last_name;
string(16) first_name;
string(26) street_addr;
decimal(2) state_code;
decimal(5) zip;
string(1) gender;
decimal(7) income;
date("MM/DD/YYYY") dob;
decimal(2) dayb;
decimal(2) monthb;
decimal(2) yearb;
string(1) newline="\n";
end

Use date functions to find the day, month, and year of birth of the above customers and store them in the derived fields dayb, monthb, and yearb respectively.

ROLLUP
What is rollup?

Rollup summarizes groups of data records. It is like the GROUP BY operation in SQL. The best way to understand rollup is with an example.

Let us say you have an input dataset with records of this format:

record
string(" ") cust_name;
decimal(" ") purchase;
decimal(" ") age;
string("\n") coupon;
end;

A group of records like this:

Cust_name purchase age coupon
----- -------- --- ------
Steve 100 13 Y
Steve 200 34 N
Kathy 200 38 N
Kathy 400 70 N

We would like to rollup these records by the key field to produce Records of this format:

record
string(" ") cust_name;
decimal(" ") total_purchases;
string("\n") ever_used_coupon;
end;

We want to see the output like this

Cust_name total_purchases ever_used_coupon

Steve 300 Y
Kathy 600 N

The total purchases field will contain the sum of all of the purchase field values for all records with the same key. The ever_used_coupon field will be "Y" if the customer uses a coupon in any transaction, or "N" otherwise.

See the Graph below:-

In this graph we use the Sort component, which is required before Rollup. Sort requires a key to sort on; in this example we set the key to cust_name. We connect a flow from Sort to the Rollup component. Rollup requires a key to group the records; in this example we set the Rollup key to cust_name and write the transformation code as follows:
// While we are doing the rollup for each unique key,
// we keep the following information around:

type temporary_type =
record
decimal("\266") total_purchases;
string("\266") ever_used_coupon;
end;

// The initialize function sets up the initial temporary record.

temp :: initialize(in) =
begin
temp.total_purchases :: 0;
temp.ever_used_coupon :: "N";
end;

// The rollup function does the work for each record in the group
// with the same key.

temp :: rollup(temp, in) =
begin

temp.total_purchases :: temp.total_purchases + in.purchase;
temp.ever_used_coupon :1: if ( temp.ever_used_coupon == "Y") "Y";
temp.ever_used_coupon :2: in.coupon;

end;

// The finalize function produces the output record from the temporary
// record and the last input record in the group.

out :: finalize(temp, in) =
begin

out.cust_name :: in.cust_name;
out.ever_used_coupon :: temp.ever_used_coupon;
out.total_purchases :: temp.total_purchases;

end;

The Rollup component reads one record at a time from Sort and compares the current cust_name with the next one. While they are the same, the rollup function in the above XFR does the work for each record in the group. The important thing to understand is that rollup operates on each group: every record in the group passes through the rollup function in the above XFR.
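
The sort-then-rollup flow described above can be tried out in plain Python. This is only a sketch of the initialize / rollup / finalize pattern using the sample data from this section, with records modeled as dicts and the function names chosen for the example; it is not Ab Initio code.

```python
from itertools import groupby
from operator import itemgetter

def initialize():
    """Set up the temporary record for a new group."""
    return {"total_purchases": 0, "ever_used_coupon": "N"}

def rollup(temp, in_rec):
    """Fold one input record of the group into the temporary record."""
    temp["total_purchases"] += in_rec["purchase"]
    if temp["ever_used_coupon"] != "Y":
        temp["ever_used_coupon"] = in_rec["coupon"]
    return temp

def finalize(temp, last_rec):
    """Build the output record from the temporary record and the last input record."""
    return {
        "cust_name": last_rec["cust_name"],
        "total_purchases": temp["total_purchases"],
        "ever_used_coupon": temp["ever_used_coupon"],
    }

def sort_and_rollup(records, key="cust_name"):
    """Sort on the key, then roll up each run of records sharing that key."""
    output = []
    ordered = sorted(records, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        temp, last = initialize(), None
        for rec in group:
            temp, last = rollup(temp, rec), rec
        output.append(finalize(temp, last))
    return output

purchases = [
    {"cust_name": "Steve", "purchase": 100, "age": 13, "coupon": "Y"},
    {"cust_name": "Steve", "purchase": 200, "age": 34, "coupon": "N"},
    {"cust_name": "Kathy", "purchase": 200, "age": 38, "coupon": "N"},
    {"cust_name": "Kathy", "purchase": 400, "age": 70, "coupon": "N"},
]

result = sort_and_rollup(purchases)
for row in result:
    print(row["cust_name"], row["total_purchases"], row["ever_used_coupon"])
```

Because the records are sorted first, the groups come out in key order (Kathy before Steve), with the same totals as the expected output shown earlier.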

JOIN
Join performs inner and outer joins on multiple input datasets.

There are 3 types of join in Ab Initio:

Inner join, which is the default
Explicit join, which is further divided into left outer join and right outer join
Full outer join

All joins require a key. If you are joining two tables, the name of the joining key must be the same in both DMLs; if not, you have to use the override key option inside the Join.
If there is no key, it is called a Cartesian join; you can do this by setting the key value to {}.

You can use the unused ports to achieve A MINUS B and B MINUS A on any two files.

Joins can be done in memory by setting the In-Memory option of Join. When you use the In-Memory option you have to set the driving table. Join loads all the tables into memory except the driving table and performs the join in memory.
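
A small Python sketch may help picture the in-memory case and the unused ports. It is an illustration under simplifying assumptions (two inputs, one key, records as dicts), and the function and variable names are invented: the non-driving input is loaded into a lookup table, the driving input is read one record at a time, and whatever finds no partner is collected as "unused".

```python
def join_in_memory(driving, other, key):
    """Inner join with the non-driving input held in memory.

    Returns (matched, unused_driving, unused_other).
    """
    lookup = {}
    for rec in other:  # load the non-driving input into memory
        lookup.setdefault(rec[key], []).append(rec)

    matched, unused_driving, seen = [], [], set()
    for rec in driving:  # read the driving input record by record
        partners = lookup.get(rec[key])
        if partners:
            seen.add(rec[key])
            matched.extend({**partner, **rec} for partner in partners)
        else:
            unused_driving.append(rec)  # A MINUS B

    unused_other = [rec for rec in other if rec[key] not in seen]  # B MINUS A
    return matched, unused_driving, unused_other

file_a = [{"id": 1, "name": "Alex"}, {"id": 2, "name": "Andrus"}, {"id": 3, "name": "Bannon"}]
file_b = [{"id": 2, "state": "WY"}, {"id": 3, "state": "CO"}, {"id": 4, "state": "KY"}]

matched, a_minus_b, b_minus_a = join_in_memory(file_a, file_b, "id")
print([r["id"] for r in matched])
print([r["id"] for r in a_minus_b])
print([r["id"] for r in b_minus_a])
```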

See the example graph below and notice the various components.

MFS AND Parallelism

Parallelism:-

There are 3 types:
Component
Pipeline
Data

Component
When different components such as Filter, Rollup, and Join run simultaneously on separate data in the same phase of a graph, it is called component parallelism.

Pipeline
Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components. Since a downstream component can process records previously written by an upstream component, both components can operate in parallel.

Component and pipeline parallelism are on by default in Ab Initio; the programmer has no control over them.

Data

Data Parallelism is achieved using Multi File System (MFS). A multifile is a parallel file that is composed of individual files on different disks and/or nodes. The individual files are partitions of the multifile. Each multifile contains one control partition and one or more data partitions. Control partition will have pointers to data partition.

If there are 4 data partitions, the MFS is called a 4-way MFS.
If there are 8 data partitions, it is called an 8-way MFS, and so on.

A Multi File System (MFS) is created using the m_mkfs command and deleted using m_rmfs.

The following commands outline how to create an MFS:

m_mkfs //razzle/data/abwork/traing/b1/my_4way \
//razzle/data/abwork/traing/b1/d1 \
//razzle/data/abwork/traing/b1/d2 \
//razzle/data/abwork/traing/b1/d3 \
//razzle/data/abwork/traing/b1/d4

cd /data/abwork/traing/b1/
chmod 777 my_4way
chmod 777 my_4way/.WORK

m_touch my_4way/x.txt

The first path given to m_mkfs is the control partition. You write all your files to the control partition by specifying your output file name in the Output File component.

To send a single file to Multifile we have to use partitioning components. There are various partitioning components.

Partition by Key :- distributes data records to its output flow partitions according to key values.
Partition by Expression :- distributes data records to its output flow partitions according to a specified DML expression.
Partition by Percentage:- distributes a specified percentage of the total number of input data records to each output flow.
Partition by Range :- distributes data records to its output flow partitions according to the ranges of key values specified for each partition.
Partition by Round-robin :- distributes data records evenly to each output flow in round-robin fashion.
Partition with Load Balance:- distributes data records to its output flow partitions, writing more records to the flow partitions that consume records faster.
Broadcast :- can act like Replicate, but it does more. Broadcast can be used to send a single file into an MFS without splitting it, i.e. if you broadcast a small file of 10 records into a 4-way MFS, Broadcast sends one copy of all 10 records to each of the 4 data partitions.
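
Three of the components above are easy to imitate in Python, which can help when reasoning about where a record will land. This is only a sketch: partitions are plain lists, the function names are made up, and the crc32 hash is a stand-in chosen for this example (the hash Ab Initio actually uses for Partition by Key is not described here).

```python
import zlib

def partition_round_robin(records, n):
    """Deal records out to n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts

def partition_by_key(records, n, key):
    """Send records with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # crc32 is an assumed stand-in hash, used because it is deterministic.
        parts[zlib.crc32(str(rec[key]).encode()) % n].append(rec)
    return parts

def broadcast(records, n):
    """Give every partition its own full copy of the records."""
    return [list(records) for _ in range(n)]

small_file = [{"cust_id": 350000 + i, "state_code": i % 3} for i in range(10)]

rr = partition_round_robin(small_file, 4)
by_key = partition_by_key(small_file, 4, "state_code")
copies = broadcast(small_file, 4)
print([len(p) for p in rr])
print([len(p) for p in copies])
```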

To convert Multifile into single file we have to use Departitioning components. There are various Departition components.

Concatenate : appends multiple flow partitions of data records one after another.
Gather: combines data records from multiple flow partitions arbitrarily.
Interleave : combines blocks of data records from multiple flow partitions in round-robin fashion.
Merge : combines data records from multiple flow partitions that have been sorted according to the same key specifier and maintains the sort order.
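
The ordered departition components can be sketched the same way in Python (Gather is left out because its output order is arbitrary). Again this is only an illustration with partitions as plain lists; the function names and the block_size parameter are assumptions of this example.

```python
import heapq
from itertools import chain

def concatenate(parts):
    """Append the partitions one after another."""
    return list(chain.from_iterable(parts))

def interleave(parts, block_size=1):
    """Take a block of records from each partition in round-robin order."""
    out, offset = [], 0
    longest = max((len(p) for p in parts), default=0)
    while offset < longest:
        for p in parts:
            out.extend(p[offset:offset + block_size])
        offset += block_size
    return out

def merge(parts, key=None):
    """Combine partitions that are already sorted on the key, keeping the order."""
    return list(heapq.merge(*parts, key=key))

parts = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(concatenate(parts))
print(interleave(parts))
print(merge(parts))
```

Note that Merge only gives a sorted result if every partition was sorted on the same key beforehand, which is the condition stated in the list above.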

Abinitio Glossary

AB INITIO GLOSSARY

Ad-hoc Multifile: A parallel dataset created by naming a set of serial files as its partitions. These partitions can be named by explicitly listing the serial files, or by using a shell expression that expands at runtime to a list of serial files.

Co>Operating System: The Co>Operating System is core software that unites a network of computing resources-CPUs, storage disks, programs, datasets-into a production-quality data processing system with scalable performance and mainframe reliability. The Co>Operating System is layered on top of the native operating systems of a collection of servers. It provides a distributed model for process execution, file management, process monitoring, checkpointing, and debugging.

Component parallelism: A graph with multiple processes running simultaneously on separate data uses component parallelism.

Checkpoint is a phase that acts as an intermediate stopping point in a graph to safeguard against failures. By assigning phases with checkpoints to a graph, you can recover completed stages of the graph if failure occurs.

Control partition is the file in a multifile that contains the locations (URLs) of the multifile's data partitions.

Data parallelism: A graph that deals with data divided into segments and operates on each segment simultaneously uses data parallelism. Nearly all commercial data processing tasks can use data parallelism. To support this form of parallelism, Ab Initio provides Partition components to segment data, and Departition components to merge segmented data back together.

Deadlock occurs when a program cannot progress. It depends on the patterns of the data and typically occurs in graphs with data flows that split and then join.

A graph carries a potential deadlock when flows diverge and converge within the same phase. If the flows converge at a component that reads its input flows in a particular order, that component may wait for records to arrive on one flow even as the unread data accumulates on others because components have a limited buffering capacity.

DML is an acronym for Data Manipulation Language. It is the Ab Initio programming language you use to define record formats (which are kinds of types), expressions, transform functions, and key specifiers. Expressions include a large number of built-in functions. Files with .dml extensions contain record format definitions.

Export: You can export or pass on properties (parameters, layouts, or ports) from components to graphs or subgraphs. This creates a graph level parameter that is referenced by the original parameter in the component.

If you export properties from two different components and give them the same name, the properties will have the same value. A typical use is to export a key parameter from a Partition by Key and a Sort, so they have the same value.

Fan-in flows connect components with a large number of partitions to components with a smaller number of partitions. The most common use of fan-in is to connect flows to Departition components. This flow pattern is used to merge data divided into many segments back into a single segment, so other programs can access the data.

When you connect a component running in parallel to any component via a fan-in flow, the number of partitions of the original component must be a multiple of the number of partitions of the component receiving the fan-in flow. For example, you can connect a component running 9 ways parallel to a component running 3 ways parallel, but not to a component running 4 ways parallel.

To deal with the latter case, Repartition the data by inserting one of the Partition components between the components. This turns the fan-in-flow into an All-to-all flow, allowing a record from any partition of one component to flow into any partition of the other component.

Fan-out flows connect components with a small number of partitions to components with a larger number of partitions. The most common use of fan-out is to connect flows from partition components. This flow pattern is used to divide data into many segments for performance improvements.

When you connect a Partition component running in parallel to another component running in parallel via a fan-out flow, the number of partitions of the component receiving the fan-out flow must be a multiple of the number of partitions of the Partition component. For example, you can connect a Partition component with 3 partitions via a fan-out flow to a component with 9 partitions, but not to a component with 10 partitions.

To deal with the latter case, repartition the data by inserting a Departition component after the Partition component. This turns the fan-out flow into an All-to-all flow, allowing a record from any partition of the Partition component to flow into any partition of the target component.
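
The "must be a multiple" rule for fan-in and fan-out flows can be written down as a small check. The Python function below is a simplified illustration of the rule as stated in this glossary, not part of any Ab Initio tool; flow_pattern is a made-up name, and "all-to-all" here just means the data has to be repartitioned.

```python
def flow_pattern(src_partitions, dst_partitions):
    """Pick the flow pattern that fits two components' partition counts."""
    if src_partitions == dst_partitions:
        return "straight"
    if src_partitions > dst_partitions and src_partitions % dst_partitions == 0:
        return "fan-in"
    if dst_partitions > src_partitions and dst_partitions % src_partitions == 0:
        return "fan-out"
    return "all-to-all"  # counts do not divide evenly, so repartitioning is needed

for src, dst in [(9, 3), (9, 4), (3, 9), (3, 10), (4, 4)]:
    print(src, dst, flow_pattern(src, dst))
```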

Flow: A flow carries a stream of data between components in a graph. Flows connect components via ports. Ab Initio supplies four kinds of flows with different patterns: straight, fan-in, fan-out, and all-to-all.

Graph is a diagram that defines the various processing stages of a task and the streams of data as they move from one stage to another. Visually, stages are represented by components and streams are represented by flows. The collection of components and flows comprise an Ab Initio graph. You can create graphs from the main menu of the GDE.

GDE: The Graphical Development Environment (GDE) provides a graphical user interface into the services of the Co>Operating System.

Layout: A layout is a list of host and directory locations, usually given by the URL of a multifile. If the locations are not in a multifile, the layout is a list of URLs called a custom layout.

A program component's layout lists the hosts and directories in which the component runs. A dataset component's layout lists the hosts and directories in which the data resides. Layouts are set on the Properties Layout tab.

The layout defines the level of parallelism. Parallelism is achieved by partitioning data and computation across processors.

Ab Initio uses layout markers to show the level of parallelism on components. If the GDE can determine the level of parallelism, it uses that level as the marker. For example, non-parallel files have a marker of 1. If the GDE cannot determine the level of parallelism, it abbreviates layouts as L1, L2, and so on. An asterisk next to a layout marker (L1*) indicates a propagated layout.

.mdc file: A file with an .mdc file extension represents a Dataset or custom dataset component.

Multifile is a parallel file that is composed of individual files on different disks and/or nodes. The individual files are partitions of the multifile. Each multifile contains one control partition and one or more data partitions. Multifiles are stored in distributed directories called multidirectories.

The data in a multifile is usually divided across partitions by one of these methods:
Random or round-robin partitioning,
Partitioning based on ranges or functions, or
Replication or broadcast, in which each partition is an identical copy of the serial data.

A partition is a file that is a portion of a multifile. A partition is a segment of a parallel computation.

Phase is a stage of a graph that runs to completion before the start of the next stage. By dividing a graph into phases, you can save resources, avoid deadlock, and safeguard against failures.

If a graph has deadlock potential, for example in a merge-split situation, use different phases in the upstream and downstream components to stagger the reading and writing to disk.

To protect a graph, all phases are checkpoints by default. A checkpoint is a special kind of phase that saves status information and allows you to recover from failures.

Pipeline Parallelism: A graph with multiple components running simultaneously on the same data uses pipeline parallelism.

Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components. Since a downstream component can process records previously written by an upstream component, both components can operate in parallel.

Record format is either a DML file or a DML string that describes data. You set record formats on the Properties Parameters tab of the Properties dialog box. A record format is a type applied to a port.

Repartition: To repartition data is to change the degree of parallelism or the grouping of partitioned data. For instance, if you have divided skewed data, you can repartition using a Partition by Key connected to a Gather with an All-to-all flow.

Sandbox: A sandbox is a collection of graphs and related files that are stored in a single directory tree, and treated as a group for purposes of version control, navigation, and migration. A sandbox can be a file system copy of a repository project.

Skew refers to lopsided data storage or program execution. Causes of skew range from uneven input data to different loads on different processors. Repartitioning often alleviates skewed input data. Modifying the layouts often repairs skewed program execution. Skew for a data storage partition is defined as:
(N - AVERAGE) / MAX
where:
N is the number of bytes in that partition,
AVERAGE is the total number of bytes in all the partitions divided by the number of partitions, and
MAX is the number of bytes in the partition with the most bytes.

Skew for program execution is similar, using CPU seconds instead of number of bytes.
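As a worked example of the skew formula above, here is a minimal Python sketch. The function name and the sample byte counts are made up for illustration.

```python
def partition_skew(sizes):
    """Return the skew of each partition, computed as (N - AVERAGE) / MAX."""
    if not sizes or max(sizes) == 0:
        raise ValueError("need at least one non-empty partition")
    average = sum(sizes) / len(sizes)   # AVERAGE: total bytes / number of partitions
    largest = max(sizes)                # MAX: bytes in the biggest partition
    return [(n - average) / largest for n in sizes]


# Hypothetical byte counts for a 4-way multifile.
print(partition_skew([100, 200, 300, 400]))
# A perfectly balanced multifile has zero skew in every partition.
print(partition_skew([250, 250, 250, 250]))
```

Partitions holding more than the average get a positive value and partitions holding less get a negative one. For program execution, the same calculation would take CPU seconds instead of bytes.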

Straight flows: connect components that have the same number of partitions. Partitions of components connected with straight flows have a one-to-one correspondence. Straight flows are the most common flow pattern.

Subgraph is a graph fragment. Just like graphs, subgraphs can contain components and flows. Subgraphs are useful for grouping a graph into subtasks and reusing them.

Watcher: A watcher lets you view the data that has passed through a Flow.
To use a watcher, do the following:
Turn on debugging mode.
Add a watcher on a flow.
Run the graph.
View the data.

Abi

1.In my sandbox I have 10 graphs, and I checked those graphs in to the EME. I then checked out a graph again and made modifications, but found that the modifications were wrong. What do I have to do to get the original graph back?

Here is my understanding of your problem:

Say the original version number of the graph is "100", and after you made the first set of modifications and checked in, the graph gets a version of, say, "102".
Now you checked out the latest version of the graph, i.e. version 102 and did another set of modifications. After checking in (say new version number 105) you realise that the changes were incorrect.

In such a case the correct version is 102 on which you have to make the second set of changes again.

To achieve this, check out version 102 (select the appropriate version number in the check-out wizard) and check it in again without any modification, with the "force overwrite" option turned on. This will create a new version of the graph, say 108, and this version will be the same as version 102.

So now you have version 102 as the latest version with a new version number 108, you can lock and make the correct modifications on it.

Another way is to branch out, but in your scenario it doesn't appear to be the right option.

I have used version numbers in the explanation, which can be replaced by "tag names".
2.What is the difference between partition, re-partition and departition?
Dividing a single flow of records(data) into multiple flows is known as partitioning.

Dividing an x-way flow of records (data) into a y-way flow is known as re-partitioning.
e.g. a 2-way flow into a 4-way flow

Combining multiple flows into a single flow of records(data) is known as departitioning.
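The three terms can be illustrated with a small Python sketch that models a flow as a list of records. This is not Ab Initio code: the function names are hypothetical, and round-robin is just one of several possible partitioning methods.

```python
def partition_round_robin(records, ways):
    """Partition: deal a single flow of records into `ways` flows."""
    flows = [[] for _ in range(ways)]
    for i, record in enumerate(records):
        flows[i % ways].append(record)
    return flows


def repartition(flows, ways):
    """Re-partition: route the records of an x-way flow into a y-way flow."""
    new_flows = [[] for _ in range(ways)]
    i = 0
    for flow in flows:
        for record in flow:
            new_flows[i % ways].append(record)
            i += 1
    return new_flows


def departition(flows):
    """Departition: combine multiple flows into a single flow."""
    return [record for flow in flows for record in flow]


records = list(range(8))
two_way = partition_round_robin(records, 2)    # 1 flow  -> 2 flows
four_way = repartition(two_way, 4)             # 2 flows -> 4 flows
single = departition(four_way)                 # 4 flows -> 1 flow
print(two_way, four_way, single)
```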
3.How do we merge graphs in AbInitio?
We cannot merge two Ab Initio graphs, but you can copy the contents of one graph and paste them into the other.
I don't understand why you want to merge two graphs.
4.When do we get an error like 'Bad-Straight-flow'?
Layout problems in components, e.g. a straight flow connecting a component that has a serial layout with a component that has a multifile layout.
4.What is the difference between preemptive scheduling and time slicing?
Under preemptive scheduling, the highest priority task executes until it enters the waiting or dead states or a higher priority task comes into existence. Under time slicing, a task executes for a predefined slice of time and then reenters the pool of ready tasks. The scheduler then determines which task should execute next, based on priority and other factors.
1. Pre-emptive Scheduling

Ways for a thread to leave running state -

· It can cease to be ready to execute ( by calling a blocking i/o method)

· It can get pre-empted by a high-priority thread, which becomes ready to execute.

· It can explicitly call a thread-scheduling method such as wait or suspend.

· Solaris JVMs are pre-emptive.

· Windows JVMs were pre-emptive until Java 1.0.2

2. Time-sliced or Round Robin Scheduling

· A thread is only allowed to execute for a certain amount of time. After that, it has to contend for the CPU (virtual CPU, JVM) time with other threads.

· This prevents a high-priority thread from monopolizing the CPU.

· The drawback with this scheduling is – it creates a non-deterministic system – at any point in time, you cannot tell which thread is running and how long it may continue to run.
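Time slicing can be sketched with a toy simulation. This is a simplified Python model, not how a JVM scheduler is implemented: it ignores priorities and blocking, and the task names and durations are invented.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate time-sliced scheduling.

    tasks maps a task name to the time units it still needs.
    Returns the execution order as (name, units_run) pairs.
    """
    ready = deque(tasks.items())          # pool of ready tasks
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(time_slice, remaining)  # run for at most one slice
        timeline.append((name, run))
        if remaining > run:
            # Not finished: the task re-enters the pool of ready tasks.
            ready.append((name, remaining - run))
    return timeline


print(round_robin({"A": 5, "B": 2, "C": 3}, time_slice=2))
```

Task A needs the most time, yet it cannot hold the CPU: after each slice it goes to the back of the queue behind B and C.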

5.How do we run scheduled tasks/jobs on the Unix platform?
ans. The two available options are
1. CRONTAB
2. AT (One time scheduling)

Abinitio Performance

Ab Initio PERFORMANCE

How To Improve Performance: -

1. Go parallel as soon as possible using Ab Initio partitioning techniques.
2. Once data is partitioned, do not bring it back to serial and then to parallel again. Repartition instead.
3. For small processing jobs, serial may be better than parallel.
4. Do not access large files across NFS; use the FTP component.
5. Use an ad hoc MFS to read many serial files in parallel, and use the Concatenate component.
· Don’t use Filter by Expression where you can avoid it. Most components have an embedded filter called the select expression; use the embedded select instead of Filter by Expression if possible to improve performance.

[Diagram: an Ad Hoc MFS (80 files) feeding a CONCATENATE component]

1. Using phase breaks lets you allocate more memory to individual components and makes your graph run faster.
2. Use a checkpoint after the sort rather than landing data on disk.
3. Use the in-memory feature of Join and Rollup.
4. Best performance is gained when components can work in memory, within MAX-CORE.
5. MAX-CORE for Sort is calculated from the size of the input data file.
6. For an in-memory join, the memory needed equals the non-driving data size plus overhead.
7. If an in-memory join cannot fit its non-driving inputs in the provided MAX-CORE, it drops all the inputs to disk, and in-memory no longer makes sense.
8. Use Rollup and Filter by Expression as early as possible to reduce the number of records.
9. When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to the MFS using the Broadcast component, or to use the small file as a lookup.

Reducing the number of components may save startup costs.
Don’t use MFS if you have small datasets.
Use the select filter inside a component rather than a separate Filter by Expression component.
6. Monitor UNIX CPU usage using vmstat, and disk usage using iostat.
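The "use the small file as a lookup" tip can be pictured with a generic Python sketch. This is not Ab Initio code, and the dataset contents and function name are invented. The point is that the small dataset is loaded into memory once, while the large dataset is streamed past it without being sorted.

```python
def lookup_join(large_records, small_records, key):
    """Join each large record to its match in the small dataset via a lookup."""
    # Build the in-memory lookup once from the small dataset.
    lookup = {row[key]: row for row in small_records}
    joined = []
    for record in large_records:          # stream the large dataset
        match = lookup.get(record[key])
        if match is not None:             # inner join: drop unmatched records
            joined.append({**match, **record})
    return joined


countries = [{"code": "IN", "name": "India"},
             {"code": "US", "name": "United States"}]
orders = [{"order_id": 1, "code": "IN"},
          {"order_id": 2, "code": "FR"},
          {"order_id": 3, "code": "US"}]
print(lookup_join(orders, countries, "code"))
```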

Shell scripting for componentwise

1.Shell programming syntax on compress

To call the Compress component in the Shell Development Environment (sde), use the mp compress command.

Following is the command syntax:

mp compress label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.

2.Shell programming syntax on uncompress

To call the Uncompress component in the SDE, use the mp uncompress command.

Following is the command syntax:

mp uncompress label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.

3.Shell programming syntax on Gather Logs as a miscellaneous component

To call the Gather Logs component in the SDE, use the mp logger command.

Following is the command syntax:

mp logger label log_filename start_text end_text -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
log_filename: See log file parameter.
start_text: See start text parameter.
end_text: See end text parameter.
-layout layout_name: Name of a layout object.

4.Shell programming syntax on Redefine Format as a miscellaneous component

To call the Redefine Format component in the SDE, use the mp copy command.

Following is the command syntax:

mp copy label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.

5.Shell programming syntax on Replicate as a miscellaneous component

To call the Replicate component in the SDE, use the mp broadcast command.

Following is the command syntax:

mp broadcast label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.
6.Shell programming syntax on Run Program as a miscellaneous component

To call the Run Program component in the SDE, use the mp filter command.

Following is the command syntax:

mp filter label command_line -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
command_line: See commandline parameter. Ex: /bin/grep -i Smith
-layout layout_name: Name of a layout object.
7.Shell programming syntax on Trash as a miscellaneous component

To call the Trash component in the SDE, use the mp broadcast command, and do not attach a flow to the out port.

Following is the command syntax:

mp broadcast label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.

8.Shell programming syntax on broadcast component as a partition component

To call the Broadcast component in the SDE, use the mp broadcast command.

Following is the command syntax:

mp broadcast label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.

9.Shell programming syntax on partition by expression as a partition component

To call the Partition by Expression component in the SDE, use the mp function-partition command.

Following is the command syntax:

mp function-partition label DML_expression -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
DML_expression: See function parameter.
-layout layout_name: Name of a layout object.
10.Shell programming syntax on partition by key component as a partition component

To call the Partition by Key component in the SDE, use the mp hash-partition command.

Following is the command syntax:

mp hash-partition label key_specifier -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
key_specifier: See key parameter.
-layout layout_name: Name of a layout object.
11.Shell programming syntax on partition by percentage as a partition component

To call the Partition by Percentage component in the SDE, use the mp percentage-partition command.

Following is the command syntax:

mp percentage-partition label [percentage1 percentage2 ...] -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
percentage1 percentage2 ...: List of percentages expressed as integers from 1 to 100, separated by spaces.
-layout layout_name: Name of a layout object.
12.Shell programming syntax on partition by range as a partition component

To call the Partition by Range component in the SDE, use the mp range-partition command.

Following is the command syntax:

mp range-partition label key_specifier -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
key_specifier: See key parameter.
-layout layout_name: Name of a layout object.
13. Shell programming syntax on partition by round-robin as a partition component

To call the Partition by Round-robin component in the SDE, use the mp roundrobin-partition command.

Following is the command syntax:

mp roundrobin-partition label number_records -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
number_records: See block size parameter.
-layout layout_name: Name of a layout object.
14. Shell programming syntax on partition with load balance as a partition component

To call the Partition with Load Balance component in the SDE, use the mp load-level-partition command.

Following is the command syntax:

mp load-level-partition label -layout layout_name

Following is a list of the arguments to the command, with a brief description of each:

label: Name you want to assign to this particular instance of this component.
-layout layout_name: Name of a layout object.

abinitio script

http://datawarehouse.ittoolbox.com/groups/technical-functional/abinitio-l/start-script-eof-problem-2330973

Data ware housing Links

Data warehousing links:
SearchDatabase.com: Lots of information about databases and data warehousing.
The Data Warehousing Information Center: Provides information on tools and techniques to design, build, maintain, and retrieve information from a data warehouse.
The OLAP Report: Information on OLAP - products, market share, technology, and trends.
The Data Warehousing Institute: Provider of in-depth conferences, education, and training in the data warehousing and business intelligence industry.
DM Review: Is a collection of material about data warehousing written by various authors.
Datawarehousingonline.com: Data warehousing insight portal.
ITtoolbox Portal for Data Warehousing: Content, community, and service for Data Warehousing professionals. Providing technical discussion, job postings, an integrated directory, news, and much more.
Data Warehousing: Wilson Mar's data warehousing site.
Evaltech: Useful site for data warehousing tool selection.
ETL Tools Info: Provides information about different business intelligence aspects, especially focusing on the Datastage ETL tool.
Product links:
The Local Cube Information Center: This site provide information on local cubes, and promotes the use of local cubes. Includes a 30-day free trial for the OLAP Client Management System product by SDG Computing, Inc.
Useful Resources:
PHP Tutorial: Help beginners learn the essential building blocks of PHP.
Techology Tips: My blogging site focused on the tips and how-to's on technology, software, and building a website.
Free Essential Software: Lists free software essential for those who are budget conscious.
Books for Home Buyers: List of books for those looking to buy a house.
Find your baseball cards: Search and buy baseball cards of your favorite players.
The Beijing Folio: A one-page page including the top links introducing you to the best of Beijing, the site of the 2008 Olympic Games.

Monday, September 01, 2008

Glossary

Aggregation: One way of speeding up query performance. Facts are summed up for selected dimensions from the original fact table. The resulting aggregate table will have fewer rows, thus making queries that can use them go faster.
Attribute: Attributes represent a single type of information in a dimension. For example, year is an attribute in the Time dimension.
Conformed Dimension: A dimension that has exactly the same meaning and content when being referred from different fact tables.
Data Mart: Data marts have the same definition as the data warehouse (see below), but data marts have a more limited audience and/or data content.
Data Warehouse: A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process (as defined by Bill Inmon).
Data Warehousing: The process of designing, building, and maintaining a data warehouse system.
Dimension: The same category of information. For example, year, month, day, and week are all part of the Time Dimension.
Dimensional Model: A type of data modeling suited for data warehousing. In a dimensional model, there are two types of tables: dimensional tables and fact tables. A dimensional table records information on each dimension, and a fact table records all the "facts", or measures.
Dimensional Table: Dimension tables store records related to this particular dimension. No facts are stored in a dimensional table.
Drill Across: Data analysis across dimensions.
Drill Down: Data analysis to a child attribute.
Drill Through: Data analysis that goes from an OLAP cube into the relational database.
Drill Up: Data analysis to a parent attribute.
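The Aggregation entry above can be illustrated with a short Python sketch. The fact rows, column names and function name are invented for the example. Facts are summed for the selected dimensions, and the resulting aggregate table has fewer rows than the original fact table.

```python
from collections import defaultdict


def aggregate(fact_rows, dimensions, measure):
    """Sum `measure` over the fact rows, grouped by the selected dimensions."""
    totals = defaultdict(int)
    for row in fact_rows:
        group = tuple(row[d] for d in dimensions)
        totals[group] += row[measure]
    return dict(totals)


# Hypothetical fact table: one row per year, month and store.
fact_rows = [
    {"year": 2007, "month": 1, "store": "A", "sales": 10},
    {"year": 2007, "month": 2, "store": "A", "sales": 15},
    {"year": 2007, "month": 1, "store": "B", "sales": 7},
    {"year": 2008, "month": 1, "store": "A", "sales": 20},
]

yearly = aggregate(fact_rows, ["year"], "sales")
by_year_store = aggregate(fact_rows, ["year", "store"], "sales")
print(yearly)
print(by_year_store)
```

Going from the month-level rows to the yearly totals is also a small example of drilling up from a child attribute (month) to a parent attribute (year).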

Data ware housing Books

Below is a list of 5 most recently-published books related to data warehousing. You can also view the books according to the following subject areas:
ETL Books
OLAP Books
Business Intelligence Books
General Books
Data Modeling Books
Vendor-Specific Books
Data Modeling Fundamentals: A Practical Guide for IT Professionals. By Paulraj Ponniah. Category: Data Modeling. Published Date: 2007-07-20
Introduction to Business Intelligence. By Jorg Hartenauer. Category: BI. Published Date: 2007-06-19
Foundations of SQL Server 2005 Business Intelligence. By Lynn Langit. Category: Vendor. Published Date: 2007-04-24
Business Intelligence: A Capability Maturity Model. By Dorothy Miller. Category: BI. Published Date: 2007-04-12
Business Intelligence. By Efraim Turban, Ramesh Sharda, Jay Aronson, David King. Category: BI. Published Date: 2007-04-04

Business Intelligence Softwares

As the old Chinese saying goes, "To accomplish a goal, make sure the proper tools are selected." This is especially true when the goal is to achieve business intelligence. Given the complexity of the data warehousing system and the cross-departmental implications of the project, it is easy to see why the proper selection of business intelligence software and personnel is very important. This section will talk about such selections. They are grouped into the following:
General Considerations
Database/Hardware
ETL Tools
OLAP Tools
Reporting Tools
Metadata Tools
Data Warehouse Team Personnel
Please note that this site is vendor neutral. Some business intelligence vendor names will be mentioned, but it should not be considered as an endorsement from this site.

Business Intelligence

Business intelligence is a term commonly associated with data warehousing. In fact, many of the tool vendors position their products as business intelligence software rather than data warehousing software. There are other occasions where the two terms are used interchangeably. So, exactly what is business intelligence?
Business intelligence usually refers to the information that is available for the enterprise to make decisions on. A data warehousing (or data mart) system is the backend, or the infrastructural, component for achieving business intelligence. Business intelligence also includes the insight gained from doing data mining analysis, as well as unstructured data (thus the need for content management systems). For our purposes here, we will discuss business intelligence in the context of using a data warehouse infrastructure.
This section includes the following:
Business intelligence tools: Tools commonly used for business intelligence.
Business intelligence uses: Different forms of business intelligence.
Business intelligence news: News in the business intelligence area

Data ware housing concepts

Several concepts are of particular importance to data warehousing. They are discussed in detail in this section.
Dimensional Data Model: Dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique.
Slowly Changing Dimension: This is a common issue facing data warehousing practitioners. This section explains the problem, and describes the three ways of handling this problem with examples.
Conceptual, Logical, and Physical Data Model: Different levels of abstraction for a data model. This section explains their differences and lists the steps for constructing each.
What is OLAP: Definition of OLAP.
MOLAP, ROLAP, and HOLAP: What are these different types of OLAP technology? This section discusses how they differ from one another, and the advantages and disadvantages of each.
Bill Inmon vs. Ralph Kimball: These two data warehousing heavyweights have a different view of the role between data warehouse and data mart.

Data Ware Housing Steps

After the tools and team personnel selections are made, the data warehouse project can begin. The following are the typical processes involved in the data warehousing project cycle.
Requirement Gathering
Physical Environment Setup
Data Modeling
ETL
OLAP Cube Design
Front End Development
Performance Tuning
Quality Assurance
Rolling out to Production
Production Maintenance
Incremental Enhancements
Each page listed below represents a typical data warehouse phase, and has several sections:
Task Description: This section describes what typically needs to be accomplished during this particular data warehouse phase.
Time Requirement: A rough estimate of the amount of time this particular data warehouse task takes.
Deliverables: Typically at the end of each data warehouse task, one or more documents are produced that fully describe the steps and results of that particular task. This is especially important for consultants to communicate their results to the clients.
Possible Pitfalls: Things to watch out for. Some of them obvious, some of them not so obvious. However, all of them are real.

Data Ware Housing Home

Tools: The selection of business intelligence tools and the selection of the data warehousing team. Tools covered are:
Database, Hardware
ETL (Extraction, Transformation, and Loading)
OLAP
Reporting
Metadata
- Steps: This section contains the typical milestones for a data warehousing project, from requirement gathering to production rollout and beyond. I also offer my observations on the data warehousing field.
- Business Intelligence: Business intelligence is closely related to data warehousing. This section discusses business intelligence, as well as the relationship between business intelligence and data warehousing.
- Concepts: This section discusses several concepts particular to the data warehousing field. Topics include:
Dimensional Data Model
Slowly Changing Dimension
Conceptual, Logical, and Physical Data Model
What is OLAP
MOLAP, ROLAP, and HOLAP
Bill Inmon vs. Ralph Kimball
- Business Intelligence Conferences: Lists upcoming conferences in the business intelligence / data warehousing industry.
- Glossary: A glossary of common data warehousing terms.