Wednesday, October 01, 2008

FAQ FROM ALL && COOL INTERVIEWS.com

1. How do we handle a DML that changes dynamically?
Ans:
Use a Reformat component.
It can also be handled in the start-up script: create the SQL
dynamically and generate a dynamic DML, so that there is no
need to change the component afterwards.

Two types of DML can be used: conditional DML and dynamic
DML.

Conditional DML is used whenever the output record
flow stays the same based on conditional parameters.

2. If I delete one partition (in an 8-partition multifile) and run
the graph, will the graph run successfully? If not, what
error will I get?
Ans: Yes, the graph will run. Only the records in the deleted
partition will not be processed.

3. What is the AB_LOCAL expression, and where do you use it in
Ab Initio?
Ans: It is used for parallel unloads.

We use AB_LOCAL(expression) to increase SQL query
performance by supplying the name of the large table in the
expression. This makes it the driving table.

4. Name the air commands in Ab Initio.
Ans: Here are a few of the commands we use:

1) air object ls /Projects/edf/.. --- Lists the
objects in a directory inside the project.

2) air object rm /Projects/edf/.. --- Removes an object
from the repository. Please be careful with this.

3) air object cat /Projects/edf/.. --- Shows the object
which is present in the EME.

4) air object versions -verbose /Projects/edf/.. --- Gives the version history of the
object.

5) air project show /Projects/edf/.. --- Gives the full information about the
project, such as which types of files can be checked in.

6) air project modify /Projects/edf/.. -extension <extension in single quotes> --- Modifies the
project settings. Ex: If you need to check *.java files
in to the EME, you may need to add the extension first.

7) air lock show -project /Projects/edf/.. --- Shows all the files that are locked
in the given project.

8) air lock show -user <user name> --- Shows all the
files locked by a user in various projects.

9) air sandbox status <file name> ---
Shows the status of a file in the sandbox with respect to
the EME (Current, Stale and Modified are a few of the statuses).

5. What is $mpjret? Where is it used in Ab Initio?
Ans: $mpjret is a variable that holds the return value of the
graph execution. It can be used in the end script of a graph.
Ideally the value of $mpjret should be 0 (zero).

It is very similar to $? in UNIX.

It returns the value 0 or 1:
0 - success
1 - failure

6. How to get DML using utilities in UNIX?
Ans: m_db gendml will help you get the DML from a database.
cobol-to-dml and xml-to-dml are other command-line utilities
we can use to get DMLs.

7. What is the output of Sort and Dedup Sorted with a NULL key?
Ans: Whenever we sort a set of records with a NULL key, the
component treats all the records as one group, so the output
follows the order of the input records. Dedup Sorted also
treats the records as one group, and its output is the first
record only.
e.g. input records:
1,XYZ,100;
2,ABC,700;
5,JJJ,400;
7,KKK,500;
With a NULL key the Sort component gives the output:
1,XYZ,100;
2,ABC,700;
5,JJJ,400;
7,KKK,500;
Dedup Sorted gives the output:
1,XYZ,100;

Another respondent's view: in the above case Dedup Sorted gives
all the records in the output, even if we sort on any of the
three fields.
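
The behaviour described in this answer can be imitated outside Ab Initio. The following is only a sketch in Python, not Ab Initio code: it assumes that a NULL key behaves like an empty key that makes every record compare equal, that the sort is stable, and that the dedup keeps the first record of each group. The function names are hypothetical.

```python
records = [
    (1, "XYZ", 100),
    (2, "ABC", 700),
    (5, "JJJ", 400),
    (7, "KKK", 500),
]

def null_key(record):
    """An empty key: every record compares equal, so all form one group."""
    return ()

def dedup_keep_first(rows, key):
    """Keep the first record of each run of consecutive equal keys."""
    out = []
    previous = object()  # sentinel that equals no real key
    for row in rows:
        current = key(row)
        if current != previous:
            out.append(row)
            previous = current
    return out

# Python's sort is stable, so records with equal keys keep their input order.
sorted_null = sorted(records, key=null_key)
deduped_null = dedup_keep_first(sorted_null, null_key)

# Dedup on a real field: every name is unique, so nothing is dropped.
by_name = lambda record: record[1]
deduped_by_name = dedup_keep_first(sorted(records, key=by_name), by_name)

print(sorted_null)
print(deduped_null)
print(len(deduped_by_name))
```

With the empty key the sorted output keeps the input order and the dedup keeps one record; with a key on a field whose values are all unique, all four records survive, which matches the second respondent's remark.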

8. What kinds of layouts does Ab Initio support?
Ans: A layout can be given as:
A URL that specifies the location of a serial file
A URL that specifies the location of the control partition
of a multifile
A list of URLs that specifies the locations of:
  The partitions of an ad hoc multifile
  The working directories of a program component

Basically, Ab Initio supports serial and parallel layouts.
A graph can have both at the same time. The
parallel one depends on the degree of data parallelism. If
the multifile system is 4-way parallel, then a component in
a graph can run 4-way parallel if its layout is defined
to match that degree of parallelism.

9. What is the syntax of the m_dump command?
Ans: m_dump answers the following question.

How to read a feed file from the UNIX prompt without opening
the Ab Initio GDE?

Follow the steps below:
Create the DML for reading the feed file and save it
with a .dml extension.
Place the DML file in the .dml folder of the respective
sandbox being used (this step can be skipped).
Execute the following command at the UNIX prompt:
m_dump <.dml file (absolute path)> <feed file (absolute path)>
Ex:
m_dump /export/home/read_feed.dml /export/home/feedfile.dat

For more options on m_dump, type m_dump help at the UNIX prompt.

10. What is the difference between a DB config and a CFG file?
Ans: Both .dbc and .cfg files are database configuration files.

.cfg -> Database table configuration files. This is the older
format, for use with 2.1 database components.

.dbc -> Database configuration files


12. How to run a graph without the GDE?
Ans: 1. Deploy the graph as a script and execute the script in
the UNIX environment

or

2. Use the air sandbox run command to run the graph directly
from the command line

13. What does layout mean in terms of Ab Initio?
Answer

Layout is where the program (component) runs.
Based on the layout given, Ab Initio tries to run the component
at that physical location.

Ways to give a layout:
From a neighboring component.
URL: for example $AI_SERIAL (a mount point
location such as /opt/apps/ppl/serial).
Database: where the database runs.


14. What is the difference between a sandbox and the EME? Can we
perform check-in and check-out through the sandbox?

Ans: A sandbox is a working copy of your project, and you can do
check-in and check-out through the sandbox.

The EME is the central repository, and the sandbox is the private
area where you bring an object (by doing an object-level
check-out) from the repository for editing.
Once you have finished editing, you can check the object back
in to the EME (object-level check-in) from the sandbox.

15. How do we extract data from a client machine?
Answer
If you have to extract data from a database system (A) which
is on a server (B), provide the username, encrypted password,
connection method, server name, etc. for connecting to
server (B) in your .abinitiorc file, and provide all the
database details in the .dbc file of the component used for
the extraction (the Input Table component).

17. What is skew, and how is it measured?
Answer
Skew measures the relative imbalance in parallel loading.
Uneven load balancing causes skew.
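
The answer above does not give a formula. As a rough sketch only, the Python below uses a formula that is often quoted for partition skew, namely (partition size - average partition size) / largest partition size * 100. Treat that formula, the function name and the sample sizes as assumptions, not as something stated in this FAQ.

```python
def partition_skew(sizes):
    """Return the skew percentage of each partition.

    Assumed formula: (size - average size) / largest size * 100.
    """
    largest = max(sizes)
    if largest == 0:
        # All partitions are empty, so there is no imbalance to report.
        return [0.0 for _ in sizes]
    average = sum(sizes) / len(sizes)
    return [(size - average) / largest * 100 for size in sizes]

# Hypothetical record counts for a 4-way multifile.
balanced = partition_skew([250, 250, 250, 250])
uneven = partition_skew([700, 100, 100, 100])

print([round(value, 2) for value in balanced])
print([round(value, 2) for value in uneven])
```

Under this assumed formula an evenly loaded multifile shows zero skew in every partition, while the overloaded partition shows a large positive value and the others show negative values.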

18. What is .abinitiorc and what does it contain?
Answer
The .abinitiorc file is used to provide parameters for remote
connectivity. You can access Ab Initio resources (e.g., the EME)
on a different server by providing the connection method
and authentication details in the .abinitiorc file.

For example, you specify telnet or ftp ports by setting the
configuration variables AB_TELNET_PORT or AB_FTP_PORT in
your .abinitiorc file.

.abinitiorc can be placed in two locations:
1. In the $HOME directory of each user
2. In the config directory of the Co>Op

If both exist, the first one (in the $HOME directory) takes
precedence over the second (in config).

19. What is meant by repartitioning, and in how many ways can it be done?
Answer
Repartitioning in Ab Initio means applying a departition
to a data set and then applying a partition to the
transformed data.
It is used when there is a requirement such as changing a
2-way MFS into a 4-way MFS.
I do not know of any other way of achieving repartitioning.
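
The departition-then-partition idea can be sketched in Python with plain lists standing in for multifile partitions. This is an illustration only: the function names are hypothetical, concatenation stands in for the departition step, and round robin is picked as the partitioning method just for the example.

```python
def departition(partitions):
    """Concatenate all partitions into one serial list of records."""
    return [record for partition in partitions for record in partition]

def partition_round_robin(records, ways):
    """Deal the records out to `ways` partitions in turn."""
    parts = [[] for _ in range(ways)]
    for position, record in enumerate(records):
        parts[position % ways].append(record)
    return parts

# A hypothetical 2-way data set turned into a 4-way data set.
two_way = [[1, 2, 3, 4], [5, 6, 7, 8]]
four_way = partition_round_robin(departition(two_way), 4)

print(four_way)
```

No record is lost or duplicated along the way; only the number of partitions changes.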

20. What is a ramp limit?
Answer
When the Reject Threshold is set to ramp/limit, we
need to give the ramp and limit values.

Ramp - A real number that defines the tolerated rate of reject
events relative to the number of records processed.

Limit - A number representing the reject events that can be
tolerated.
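
How the two values combine is not spelled out above. The sketch below assumes the commonly described rule that processing stops once the number of rejects exceeds limit + ramp * (records processed so far). That rule, the function names and the sample data are assumptions made for illustration.

```python
def exceeds_threshold(rejects, processed, limit, ramp):
    """True when the reject count passes limit + ramp * processed."""
    return rejects > limit + ramp * processed

def first_abort_point(reject_flags, limit, ramp):
    """Return the 1-based record number where processing would stop, or None."""
    rejects = 0
    for processed, is_reject in enumerate(reject_flags, start=1):
        if is_reject:
            rejects += 1
            if exceeds_threshold(rejects, processed, limit, ramp):
                return processed
    return None

# 100 hypothetical records; records 10 and 20 are rejected.
flags = [number in (10, 20) for number in range(1, 101)]

strict = first_abort_point(flags, limit=1, ramp=0.01)
lenient = first_abort_point(flags, limit=1, ramp=0.1)

print(strict, lenient)
```

With limit 1 and ramp 0.01, the second reject at record 20 pushes the count past 1 + 0.2, so the run stops there. With ramp 0.1 the allowance at record 20 is 1 + 2, so the run continues to the end.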


21. Describe how you would monitor and control database physical file size and growth.
Answer
wc

Connect as DBA to the database and run a query such as
show db_parameter_file; on an Oracle database (I am not sure
of the exact query; you can search for a similar one).


22. Have you ever encountered an error called "depth not equal"?
Answer

It occurs when two components connected with a straight
flow do not have the same depth or layout.

When two components are linked together and their layouts
do not match, this problem can occur during the
compilation of the graph. A solution is
to use a partitioning component in between if there is a
change in layout.


23. How to improve the performance of graphs in Ab Initio?
Answer
There are many ways to improve the performance of a
graph. It also depends on the particular graph and the
components used in it.
In general the following tips can be used for improving
performance:
1> Try to use partitioning in the graph.
2> Try to minimise the number of components.
3> Maintain lookups for better efficiency.
4> Components like Join/Rollup should have the option
Input must be sorted selected if they are placed after a Sort
component.
5> If a component has the In memory: Input need not be sorted
option selected, use the MAX_CORE parameter value
efficiently.
6> Use phasing of a graph efficiently.
7> Ensure that in all the graphs where RDBMS tables are used
as input, the join condition is on indexed columns.
8> Try to perform the sort or aggregation of data
in the source tables at the database server itself, instead
of doing it in Ab Initio.


24. Describe which system or process elements you would review when troubleshooting general server slowness.
Answer
One suggestion: if you are working in a UNIX environment, check
the space availability on the mount points.
Another suggestion is to use the "top" command to check
process usage.

It is better to check with the UNIX admins if the
transformation is taking a lot of time; if the loading is
taking time, consult the DBA to get the exact reason.


25. What are Cartesian joins?
Answer
Joining two tables without a join condition:
select * from emp,dept;

emp has 10 records
dept has 5 records

output = 10 x 5 = 50 records
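
The row count can be checked with a small Python sketch, where two lists with made-up row names stand in for the emp and dept tables. Every emp row is paired with every dept row because there is no join condition.

```python
from itertools import product

# Hypothetical stand-ins for the two tables.
emp = [f"emp_{number}" for number in range(1, 11)]    # 10 rows
dept = [f"dept_{number}" for number in range(1, 6)]   # 5 rows

# No join condition: pair every emp row with every dept row.
cartesian = list(product(emp, dept))

print(len(cartesian))
```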


26. What is meant by the Co>Operating System, and why is it special for Ab Initio?
Answer

The Co>Operating System is Ab Initio's core software. It
unites a network of computing resources such as CPUs,
storage disks, programs, and datasets into a
production-quality data-processing system. It provides a distributed
model for process execution, file management, process
monitoring, checkpointing, and debugging. You can interact
with the Co>Operating System through its graphical user
interface: the Graphical Development Environment, or GDE.


27. How might you quantitatively measure an improvement made to a query?
Answer

Check the execution plan before and after the change.



28. Do we really work with actual data in the development phase?
Answer
Generally you will find development, QA and production
phases. If all three phases are there, then in the development
phase you will never use actual data, but before moving the
graph to production it needs to pass through QA, where you
have to test against the actual data.

29. What is the difference between partitioning with key and round robin?
Answer

1) Partition by key needs a key, whereas round robin does not.
2) Round robin always tries to distribute the records
equally, whereas partition by key does not.

In partitioning with key the partitioning is based on a key,
whereas round robin is not based on a key.

Partitioning by key distributes the data into the various
multifile partitions depending on the fields present in
the input (the key), while partitioning by round robin distributes
data equally among the partitions irrespective of the key
field. Round robin ensures equitable distribution of data
among the partitions, while partitioning by key may lead to
inequitable distribution.
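
The contrast can be sketched in Python with lists standing in for partitions. This is an illustration under assumptions: a CRC32 hash of the key stands in for whatever hashing the real component uses, and the function names and sample records are made up.

```python
from zlib import crc32

def partition_by_key(records, key, ways):
    """Send each record to the partition chosen by a hash of its key."""
    parts = [[] for _ in range(ways)]
    for record in records:
        index = crc32(str(key(record)).encode("utf-8")) % ways
        parts[index].append(record)
    return parts

def partition_round_robin(records, ways):
    """Deal the records out to the partitions in turn, ignoring any key."""
    parts = [[] for _ in range(ways)]
    for position, record in enumerate(records):
        parts[position % ways].append(record)
    return parts

# Hypothetical input: key "A" is far more frequent than "B" or "C".
records = [("A", n) for n in range(6)] + [("B", n) for n in range(2)] + [("C", 0)]

by_key = partition_by_key(records, key=lambda record: record[0], ways=3)
round_robin = partition_round_robin(records, ways=3)

print([len(part) for part in by_key])
print([len(part) for part in round_robin])
```

Round robin gives every partition the same number of records. Partitioning by key keeps all records of one key together, so the partition that receives "A" holds at least six of the nine records.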


30. What is the relation between the EME, the GDE and the Co>Operating System?

Ans: The EME contains old as well as new versions of graphs,
while the GDE (Graphical Development Environment) contains
only the latest version. The Co>Operating System
is installed on the client machine, and it interacts
with the GDE.


The GDE is the environment for developing Ab Initio graphs.

The EME is the repository for all the Ab Initio files and is
used for version control.

The Co>Operating System runs the Ab Initio graphs.



The Co>Operating System is the core system. It is the Ab Initio
server. All the graphs which are made in the GDE are deployed
and run on the Co>Operating System. It is installed on UNIX.

EME stands for Enterprise Meta>Environment. It is a
repository which holds all the projects,
metadata and transformations. It performs
operations like version control, statistical
analysis, dependency analysis and metadata management.

GDE stands for Graphical Development Environment and is
just like a canvas on which we create our graphs with the
help of various components. It simply provides a graphical
interface for editing and executing Ab Initio programs.




The GDE is the graphical development environment in which we
work. We create graphs in the GDE and save them with the .mp
extension. The GDE connects to the Co>Operating System through
the host settings. We have to find the path from the AB_HOME
and AB_AIR_ROOT variables, which are set on the UNIX machine
in the Ab Initio directory.

The Co>Operating System is like a server which interprets
the graph created in the GDE and connects to the OS.

The EME is the Enterprise Meta>Environment, which holds
the read-only copies of what we have written. We load a local
copy and work on that, and the change delta is kept in the EME.

The sandbox is the local working environment in Ab Initio where
we make changes to our code.



31. How to create a surrogate key using Ab Initio?
Answer

By using the next_in_sequence() function in a transform.

You can also use the Assign Keys component.
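
The idea behind a sequence-based surrogate key can be shown with a small Python sketch. It is only a stand-in for next_in_sequence(), not Ab Initio code: a running counter is attached to each record in arrival order. The function name, the start value of 1 and the sample rows are assumptions.

```python
from itertools import count

def assign_surrogate_keys(records, start=1):
    """Prefix each record with the next value of a running sequence."""
    sequence = count(start)
    return [(next(sequence), *record) for record in records]

# Hypothetical input rows without any key of their own.
rows = [("XYZ", 100), ("ABC", 700), ("JJJ", 400)]
keyed = assign_surrogate_keys(rows)

print(keyed)
```

The key carries no business meaning; it only identifies each record uniquely within the run.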

32. How does MAXCORE work?
Answer
It is the maximum memory usage in bytes before the component
spills data to disk.
The default max-core for a Sort component is 10485760.

The MAXCORE parameter is used when we select the In memory: Input need
not be sorted option. It is the maximum memory in bytes
that the component uses per partition. This parameter can
be seen in Join, Sort and Rollup.



33. What does dependency analysis mean in Ab Initio?
Answer
It is the analysis of the dependencies within the graphs.
It traces how data is
transformed and transferred, field by field, from component
to component.

It allows us to identify the related objects to be migrated
when we migrate code from one environment to another, i.e. from
development to QA to production.


