
Sunday, May 29, 2011

Dimensional Data Modeling :

Dimensional modeling is a modeling technique used in data warehousing systems. It differs from the ER modeling technique used in OLTP systems. In dimensional modeling, tables are organized with the aim of optimizing query performance for decision support systems in relational databases.

Before going further into the details of dimensional data modeling, we should be aware of its key concepts.

Dimension : A dimension can be considered a category of information, for example the geographic dimension. It provides a way to slice and dice the data in the data warehouse.

Attribute : An attribute is a characteristic of a dimension. For example, region is an attribute of the geographic dimension. A dimension attribute is a column in the dimension table.

Hierarchy : A hierarchy represents the relationships between the attributes of a dimension and defines the different levels within it. For the geographic dimension it looks like:

Continent -> Country -> Region -> State -> City -> Street

Fact Table : The fact table holds the measures of interest; for an organization, product revenue is a measure of interest. The level of granularity is an important factor in designing fact tables, for example revenue by continent versus revenue by country. In this example the fact table will have three columns: Product, Geographical region, and Revenue.

Lookup Table : The lookup table represents a dimension and holds the attributes for that dimension. The lookup table for product will contain all available products.
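To make the fact table / lookup (dimension) table split concrete, here is a minimal star-schema sketch in SQL; the table and column names (dim_geography, fact_revenue, geo_key and so on) are illustrative, not part of the example above.

-- Dimension (lookup) table: one row per geographic region
CREATE TABLE dim_geography (
    geo_key     NUMBER       PRIMARY KEY,   -- surrogate key
    continent   VARCHAR2(30),
    country     VARCHAR2(50),
    region      VARCHAR2(50)
);

-- Fact table: one row per product per region, at the chosen grain
CREATE TABLE fact_revenue (
    product_key NUMBER NOT NULL,            -- would reference a product dimension (not shown)
    geo_key     NUMBER NOT NULL REFERENCES dim_geography (geo_key),
    revenue     NUMBER(18,2)
);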

Dimensional Modelling vs ER Modelling :
In dimensional modelling the emphasis is on optimising decision-support query performance. ER models, on the other hand, focus on:
  1. Removing redundancy from the data model
  2. Optimising OLTP performance
  3. Retrieval of single records

We will continue with dimensional data modelling in the next post.

High-water mark

The high-water mark (HWM) indicates the point up to which a table or index has ever contained data. Let me explain. Suppose we create an empty table; the high-water mark would be at the start of the table segment.


Now if we insert some data into the table segment, the HWM moves to the position up to which data exists in the segment.


If we insert more data into the table segment, the HWM moves further, again pointing to the position up to which data exists in the segment.

Now if we delete the data from the table, see the new position of the HWM.

As we can see above, on deleting the data the HWM does not move. The drawback of this is that Oracle always reads the blocks up to the high-water mark during a full table scan. You may have noticed that a count(*) on an emptied table can still take time to return zero rows; the reason is that the HWM is still set at the higher position.

Now you must be thinking: how do we set the high-water mark back to a lower position?
The simplest way to reset the HWM is to truncate the table.

Let us see how truncate sets the HWM.
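A quick way to observe this, as a minimal sketch; the table name hwm_demo is illustrative:

-- Create and fill a demo table, then delete everything
CREATE TABLE hwm_demo AS SELECT * FROM all_objects;
DELETE FROM hwm_demo;
COMMIT;

-- A full scan still reads every block below the HWM, so this is slow for an "empty" table
SELECT COUNT(*) FROM hwm_demo;

-- TRUNCATE deallocates the blocks and resets the HWM to the start of the segment
TRUNCATE TABLE hwm_demo;
SELECT COUNT(*) FROM hwm_demo;   -- now returns 0 quickly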

Oracle Sequences

An Oracle sequence is a database object used to generate unique numbers; it is mainly used for primary key values.
Syntax
The syntax for creating a sequence is:
CREATE SEQUENCE sequence_name
INCREMENT BY number
START WITH number
MAXVALUE number
MINVALUE number
CYCLE | NOCYCLE
CACHE number | NOCACHE
ORDER | NOORDER;
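For example, a sequence feeding a surrogate key might be created and used like this (the sequence name customer_seq and the customer table are illustrative):

CREATE SEQUENCE customer_seq
  START WITH 1
  INCREMENT BY 1
  NOCYCLE
  CACHE 20;

-- NEXTVAL fetches a new value, typically for a primary key
INSERT INTO customer (customer_id, customer_name)
VALUES (customer_seq.NEXTVAL, 'ACME Corp');

-- CURRVAL returns the last value fetched by this session
SELECT customer_seq.CURRVAL FROM dual;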

Important options that need to be considered

CYCLE/NOCYCLE : If we specify the CYCLE option while creating the sequence, it will continue to generate values even after reaching its maximum value; it starts again from the minimum value. NOCYCLE means it will not generate values after reaching the maximum value. The default is NOCYCLE.

CACHE/NOCACHE : CACHE specifies how many values Oracle pre-allocates and keeps in memory for fast access. NOCACHE specifies that no values are pre-allocated. If we don't specify either option, the default is CACHE 20.

The flip side of using a sequence with a cache is that if a system failure occurs, all cached values that were not used are lost. For example, if a sequence was cached with 100 values and Oracle had used only 30 of them, you would lose 70 sequence values.

Things to remember while using sequences

1. In a session, sequence.NEXTVAL must be referenced before sequence.CURRVAL. If you try to use CURRVAL before NEXTVAL in a session you will get the following error: ORA-08002: sequence SEQ_TEST.CURRVAL is not yet defined in this session.

2. Suppose you have inserted 20 records using a sequence and then issue a rollback; the sequence will not be rolled back, and next time Oracle will use the 21st value. Oracle has wisely chosen this behaviour so that multiple users can use a sequence without any concern about duplicate sequence values.

3. Creating sequences with NOCACHE has a performance impact.

4. We cannot use a sequence in SELECT statements combined with the UNION operator.

Enhancement in 11G for sequences

In 11g we can assign a sequence value directly to a variable:
V_seq_11g := test_SEQ_11G.NEXTVAL;
To do the same in 10g and earlier we have to use: SELECT TEST_SEQ_10G.NEXTVAL INTO v_seq_10g FROM DUAL;
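Put together, a minimal PL/SQL sketch of both styles (it compiles on 11g or later because of the direct assignment; the sequence names follow the post and are assumed to exist):

DECLARE
  v_seq_11g  NUMBER;
  v_seq_10g  NUMBER;
BEGIN
  -- 11g and later: direct assignment
  v_seq_11g := test_seq_11g.NEXTVAL;

  -- 10g and earlier: SELECT ... INTO from DUAL
  SELECT test_seq_10g.NEXTVAL INTO v_seq_10g FROM dual;

  DBMS_OUTPUT.PUT_LINE(v_seq_11g || ' ' || v_seq_10g);
END;
/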

How to transpose data in Informatica

Suppose the input data comes in as

Firstname1
Ph1
Address1

Firstname2
Ph2
Address2

Firstname3
Ph3
Address3

You want the output like this, i.e. you want to convert the 9 rows into 3 rows with the related values grouped together

Firstname1 Ph1 Address1
Firstname2 Ph2 Address2
Firstname3 Ph3 Address3

You can create a mapping by following these steps (a rough SQL equivalent of the same pivot is sketched after the list)
  1. Create a Source Qualifier (SQ) with one input field, Field1

  2. Add an Expression transformation after the SQ

  3. Link Field1 from the SQ to the Expression transformation

  4. Add a new output port: grp = SUBSTR(FIELD1, LENGTH(FIELD1), 1)

  5. Add an Aggregator transformation after the Expression and drag the Field1 and grp ports into it

  6. Set grp as the group-by field and add three new output ports:
     OP_FLD1 = LAST(FIELD1, SUBSTR(FIELD1,1,1)='F')
     OP_FLD2 = LAST(FIELD1, SUBSTR(FIELD1,1,1)='P')
     OP_FLD3 = LAST(FIELD1, SUBSTR(FIELD1,1,1)='A')
  7. Add OP_FLD1, OP_FLD2 and OP_FLD3 to the target
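For reference, the same pivot can be sketched in plain Oracle SQL, assuming the nine values sit in a one-column staging table; the table name stg_contact and column field1 are illustrative, not part of the mapping above:

-- Group on the trailing digit, then pick each value by its leading letter
SELECT MAX(CASE WHEN SUBSTR(field1, 1, 1) = 'F' THEN field1 END) AS firstname,
       MAX(CASE WHEN SUBSTR(field1, 1, 1) = 'P' THEN field1 END) AS phone,
       MAX(CASE WHEN SUBSTR(field1, 1, 1) = 'A' THEN field1 END) AS address
FROM   stg_contact
GROUP  BY SUBSTR(field1, LENGTH(field1), 1)
ORDER  BY SUBSTR(field1, LENGTH(field1), 1);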

Tuesday, May 24, 2011

Interview Questions

HCL

  1. Where did you use Unix shell scripts in Informatica projects?
  2. How do you generate surrogate keys for tables with more than 2 billion records (the surrogate key is a primary key field)?
  3. How do you propagate a date column to a flat file if you need the format to be DD-MON-YYYY?
  4. If a lookup returns multiple matching rows, how will an unconnected lookup and a connected lookup behave? (Not using the "use last value" policy)
  5. If you use the Sorted Input option in an Aggregator but give it unsorted input, what will happen?
  6. If I have 100 rows given as input to an Aggregator and want 100 rows as output, how can you achieve that? (None of the columns is a primary key)
  7. If I have 100 rows given as input to an Aggregator and want just the 100th row as output, how can you achieve that?
  8. What kinds of stored procedures are there in Informatica?
  9. If I use a pre-session stored procedure, which other field do I need to activate?
  10. What conditions (=, Not Between, ...) do you have in Lookup and in Joiner?
  11. What is the use of a reusable Sequence Generator (not in the reusability sense), any other?
  12. If I have a flat file, can I override the SQL in the Source Qualifier or Lookup?
  13. If I have a flat file target, what properties do I get when I click it in the Workflow Manager?
  14. I have to use ORDER BY on a table with columns A, B, C and I need ORDER BY B, C. Do I need to add anything more to "select * from TABLE_NAME order by B, C"?
  15. What is the use of the Return port and the Output port in a Lookup transformation?
  16. If I have used two Update Strategies in my mapping, one for insert and the other for delete, and then change the target option in the session properties from Data Driven to Delete, how is my mapping going to perform? (All deletes, or insert and delete?)

Answers:

  1. To concatenate two or more flat files, workflow scheduling, file-watcher scripts.
  2. Don't use a Sequence Generator; instead use an Expression variable increment and a Lookup transformation to get the last value used from the target.
  3. Use the TO_CHAR function (e.g. TO_CHAR(date_port, 'DD-MON-YYYY')) before loading to the flat file.
  4. The session fails with the error message 'Expecting keys to be ascending'.
  5. The Aggregator is an active transformation, so you can't expect it to give exactly one output row for every input row you give it.
  6. If you don't select any group-by port in the Ports tab of the Aggregator transformation, Informatica will give only the last row as output for all the records given to it as input.
  7. Lookup (=, !=, >=, <=, >, <), Joiner (= only).
  8. You can never override the SQL query when you are working with flat files.
  9. File writer, merge partitioned files, merge file name, merge file directory, output file name, output file directory, reject file name, reject file directory.
  10. By default all the ports in a Lookup transformation are lookup and output ports. Return ports are only used in an unconnected lookup, and you need at least one output port in a connected lookup. (Can't explain in detail here...)
  11. The workflow succeeds, but you get an error message in the logs saying the target did not permit the insert, and all the records marked for insert are loaded into the bad file.

NESS Technologies

How would you rate yourself on Oracle, UNIX, DWH concepts, and Informatica?

Why do you need a surrogate key instead of the OLTP primary key? Give a scenario where it's mandatory to use a surrogate key.

Ans: If the production key is numeric and the OLTP people later decide to go alphanumeric, we may have to change the data types everywhere in the warehouse where that key is involved. So it is best to keep it away from business-defined values. It is also needed to track SCDs.

Scenario for using a dynamic lookup

Ans: Loading data from a flat file to a table where the data in the file would violate the primary key constraint of the table due to duplicates.

How would you tune performance for a given scenario? (Scenario explained by the interviewer.)

Scenario where you developed some procedures

Scenario where you developed some UNIX scripts

Scenario where you encountered the toughest time on the job

In your complete job profile, what documentation have you produced?

What's the default and max size of the data cache?

Ans: 20 MB, 2 GB

What is a materialized view, and how is it going to improve performance?

Ans: Unlike a normal view it stores the query result physically, so queries read precomputed data instead of re-executing the base query.

When do you use bitmap indexes?

Ans: When the cardinality is low, i.e. the column has few distinct values.

What are implicit cursors and explicit cursors in Oracle?

Ans: An implicit cursor is defined for every query executed; its attributes can be retrieved after executing the query using the SQL prefix, e.g. SQL%ROWCOUNT. An explicit cursor is a cursor that we define manually.

What are the cursor attributes?

Ans: %FOUND
%NOTFOUND
%ROWCOUNT

---------
What procedures do you follow for good performance while doing TDD?

What were your roles and responsibilities in your previous project?

What are all the phases in a DWH project, from bidding to delivery?

How do you communicate any mistakes to team members, so that they won't be repeated?

How do you ensure all the business rules defined are implemented in development? (Traceability matrix)

How did you conduct code reviews, and what are all the things you check?

At what levels in your project do you perform testing?

How did you implement defect tracking in your project?

At what point in the project do you perform performance tuning?

Did you face any situation where you coded a mapping wrong and found it only in a later stage? If yes, how did you rectify it?

Do you have a checklist for the TDD doc, or is it just your experience that helps?

Tell me the process you followed in developing a mapping, and the documents and instructions you got.

Did you have any team members directly reporting to you?

How is the work assigned to you in your project, and who takes care of assigning it?

Were you involved in status updates of your project to the client? If not, who does it?

--------------------------
RBS
How would you stop a session if the value of a particular column matches a given value?

Ans: Use the ABORT function of Informatica in an Expression transformation.

What about the ERROR function in Informatica?

Ans: It logs the error message you define into the session log if a particular condition is satisfied.

What levels of logging can you maintain in Informatica sessions?

Ans: Terse, Normal, Verbose Initialization, and Verbose Data.

How do you run sequential sessions only if the previous session loads at least one row into the target?

Ans: Define a link condition between the sessions with TgtSuccessRows > 0.

What is the difference between the MAX and GREATEST functions in Oracle?

Ans: GREATEST returns the largest of the values in a list of expressions within a single row; MAX is an aggregate function that returns the maximum value across all rows.

How do you get the number of rows returned by a lookup when a condition matches?

Ans: Define a new column and use a SQL override with COUNT(*).

How can you configure an Informatica workflow to run only on the first working day of the month?

Ans: Define a calendar and drive the workflow with a session or script.

When do you go for SQL overrides?

Ans: When you want to push some processing load to the database, when it is more effective there.

What are the types of triggers, and what is the difference between a row-level and a statement-level trigger?

Ans: AFTER/BEFORE, statement-level/row-level, INSERT/DELETE/UPDATE. A row-level trigger fires once for each row affected by the triggering statement, while a statement-level trigger fires once for the whole statement.

How would you ask questions if you were on the interview panel?

What is the advantage of packages in PL/SQL

Ans: Modular approach, Security, Integrity

If you have an error in a procedure of a package, how would you know which procedure it is?

Ans:

Can you join 2 tables using SQL override in Lookup transformation

Ans: Yes

If a session fails, what steps do you follow to resolve it?

Ans: Check the session logs, rerunning with higher tracing levels if needed.

What’s your most difficult situation in development

How do you filter records in TOAD?

Ans: Use the filter icon and add rules on columns.

What is a persistent cache in a Lookup transformation?

Ans: The cache file remains on disk even after the session runs; it is useful when the lookup source data does not change between session runs.

How did you implement incremental extraction?

Ans: Mostly using SETVARIABLE, some way of truncating stage tables, or parameter files.

Tell me about the SETVARIABLE function in Informatica.

Ans: It sets a value into a mapping variable depending on the last row processed.

If you change properties in the session, which takes precedence: the mapping level or the session level?

Ans: Session level.

What is the syntax for defining a parameter in a parameter file?

Ans: The section heading follows the hierarchy folder, workflow, session (or mapplet), followed by parameter=value pairs; see the sketch below.
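A minimal parameter file sketch; the folder, workflow, session and parameter names here are illustrative:

[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
$$LoadDate=2011-05-24
$DBConnection_Source=ORA_SRC
$InputFile1=/data/in/sales.dat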

How did you move your code from development to UAT or production?

Did you take a backup of your repository at any time? If yes, how and where is it stored?

Did you install Informatica anywhere?

Explain the architecture of Informatica.

How would you identify why a record was rejected? (D, O, N, T indicators)

Production Support Issues - Informatica

If you have a flat file as a source and it is FTPed from other systems, then if the FTP fails your jobs also fail.

If the source data is not correct, i.e. not in the correct format.

Database deadlock (one or more processes accessing the same data at the same time).

Database server down.
Taking care of the mapping parameter and mapping variable file names.

While importing from the test environment into production, some of the reusable maps get invalidated.

Most issues come back to the quality of the design, the quality of the code and the complexity of the solution.

How good were your test cases? Did you do peer code reviews?

How many disparate systems are you pulling from? etc.

Errors due to dependencies between jobs running simultaneously.

Develop new or maintain existing Informatica mappings and workflows based on specifications.

Familiarity with administrative tasks within Informatica platform such as stopping/starting Informatica Repository and Server processes, performing code migration, backup and restore of Informatica Repository, and security administration.

Participate in cross-functional efforts to support other teams, such as ETL and database tuning to support Business Objects reporting performance.

Sharing knowledge by effectively documenting work.

Session Recovery Procedures.

Ability to take ownership of projects and critical issues and to drive those to successful resolution

Provide daily production support of batches and system processes.

Produce weekly status reports and participate in team meetings and conference calls with other regions.

Why staging area is needed?

Unlike OLTP systems that create their own data through a user interface, data warehouses source their data from other systems.

There is physical data movement from source database to data warehouse database.

Staging area is primarily designed to serve as intermediate resting place for data before it is processed and integrated into the target data warehouse.

This staging area serves many purposes above and beyond its primary function:

a. The data is most consistent with the source.
It is devoid of any transformation or has only minor format changes.

b. The staging area in a relational database can be read/scanned/queried using SQL without the need to log into the source system or read files (text/xml/binary).

c. It is a prime location for validating data quality from the source and for auditing and tracking down data issues.

d. The staging area acts as a repository for historical data if it is not truncated.

Real-time complex mappings or complex transformations in Informatica

The most complex logic we use is denormalization.

We don't have a Denormalizer transformation in Informatica.

So we have to use an Aggregator followed by an Expression.

Apart from this, most of the complexity sits in Expression transformations involving lots of nested IIFs and DECODE statements; the Union transformation and the Joiner are other examples.


What is a Materialized View? What is the significance of Materialized Views in data warehousing?


Materialized views are schema objects that can be used to summarize, precompute, replicate and distribute data.

They store summarized data.

They can be used to improve query performance in very large databases.
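A minimal sketch of a summary materialized view, assuming a sales table with country and revenue columns (the object and column names are illustrative):

CREATE MATERIALIZED VIEW mv_revenue_by_country
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT country, SUM(revenue) AS total_revenue   -- summary data is stored physically
FROM   sales
GROUP  BY country;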

What is a staging area? Do we need it? What is the purpose of a staging area?

A staging area is a place where you hold temporary tables on the data warehouse server.

Staging tables are connected to the work area or fact tables.

We basically need a staging area to hold the data and perform data cleansing and merging before loading it into the warehouse.

In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which can severely hamper the performance of the OLTP system.

This is the primary reason for the existence of a staging area. In addition, it also offers a platform for carrying out data cleansing.

Why do we use the Stored Procedure transformation?

A Stored Procedure transformation is an important tool for populating and maintaining databases.

Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements.

Stored procedures can be used to do the following tasks (a small sketch follows the list):

  • Check the status of a target database before loading data into it.
  • Determine if enough space exists in a database.
  • Perform a specialised calculation.
  • Drop and recreate indexes.
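As a sketch of the last task, a procedure like this could be called from a Stored Procedure transformation after the load; the procedure and index names are illustrative:

CREATE OR REPLACE PROCEDURE rebuild_sales_index AS
BEGIN
  -- Rebuild an index after a bulk load
  EXECUTE IMMEDIATE 'ALTER INDEX idx_sales_geo REBUILD';
END;
/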

Rules and Guidelines for Dynamic Caches

· The Lookup transformation must be a connected transformation.

· You can use a persistent or a non-persistent cache.
· If the dynamic cache is not persistent, the Informatica Server always rebuilds the cache from the database, even if you do not enable Recache from Database.

· You cannot share the cache between a dynamic Lookup transformation and static Lookup transformation in the same target load order group.

· You can only create an equality lookup condition. You cannot look up a range of data.

· Associate each lookup port (that is not in the lookup condition) with an input port or a sequence ID.

· Only connect lookup/output ports to the target table instead of input/output ports. When you do this, the Informatica Server writes the same values to the lookup cache and the target table, keeping them synchronized.

· When you use a lookup SQL override, make sure you map the correct columns to the appropriate targets for lookup.

· When you add a WHERE clause to the lookup SQL override, use a Filter transformation before the Lookup transformation. This ensures the Informatica Server only inserts rows in the dynamic cache and target table that match the WHERE clause. For details, see Using the WHERE Clause with a Dynamic Cache.

· When you configure a reusable Lookup transformation to use a dynamic cache, you cannot edit the condition or disable the Dynamic Lookup Cache property in a mapping.

· Use Update Strategy transformations after the Lookup transformation to flag the rows for insert or update for the target.

· Use an Update Strategy transformation before the Lookup transformation to define some or all rows as update if you want to use the Update Else Insert property in the Lookup transformation.

· Set the row type to Data Driven in the session properties.
· Select Insert and Update as Update for the target table options in the session properties.

Performance Tuning in Informatica

The goal of performance tuning is to optimize session performance so sessions run during the available load window for the Informatica Server. You can increase session performance in the following ways. The performance of the Informatica Server is related to network connections: data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize them.

Flat files: If your flat files are stored on a machine other than the Informatica Server, move those files to the machine the Informatica Server runs on. Relational data sources: Minimize the connections to sources, targets and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.

Staging areas: If you use staging areas you force the Informatica Server to perform multiple data passes; removing staging areas may improve session performance.
You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.

We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this go to the Server Manager and choose Server Configure Database Connections. If your target has key constraints and indexes, they slow the loading of data; to improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.

Running parallel sessions by using concurrent batches also reduces the data-loading time, so concurrent batches may increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance. Avoid transformation errors to improve session performance. If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache. If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier. Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before processing it; to improve session performance in this case, use the Sorted Ports option.

Unit Testing

The objective of unit testing involves testing of business transformation rules, error conditions, and mapping fields at the staging and core levels. Unit testing involves the following:

1. Check the mapping of fields present at the staging level.

2. Check for duplication of values generated using the Sequence Generator (a sample check is sketched after this list).

3. Check the correctness of surrogate keys, which uniquely identify rows in the database.

4. Check the data type constraints of the fields present at the staging and core levels.

5. Check the population of status and error messages into the target table.

6. Check that string columns are left and right trimmed.

7. Check that every mapping implements the process-abort mapplet, which is invoked if the number of records read from the source is not equal to the trailer count.

8. Check that every object, transformation, source and target has proper metadata. Check visually in the data warehouse designer tool that every transformation has a meaningful description.
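As a sketch of checks 2 and 3, a duplicate-surrogate-key query might look like this (the table and column names dim_customer and customer_key are illustrative):

-- Any row returned here indicates a duplicated surrogate key in the target
SELECT customer_key, COUNT(*)
FROM   dim_customer
GROUP  BY customer_key
HAVING COUNT(*) > 1;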

Display odd/even numbered records

Odd-numbered records:
select * from emp where (rowid, 1) in (select rowid, mod(rownum, 2) from emp);
Output:
1
3
5
Even-numbered records:
select * from emp where (rowid, 0) in (select rowid, mod(rownum, 2) from emp);
Output:
2
4
6

Another way, for even-numbered records:

select * from emp where rowid in (select decode(mod(rownum, 2), 0, rowid) from emp);

and for odd-numbered records:

select * from emp where rowid in (select decode(mod(rownum, 2), 1, rowid) from emp);

Can a primary key contain more than one column?

A PRIMARY KEY is usually defined on one column, but if a PRIMARY KEY is defined on more than one column it is called a COMPOSITE PRIMARY KEY. For example, if you have a table with course id and student id, where one student can apply for many courses, you can define the PRIMARY KEY on both columns, COURSEID and STUDENTID (see the sketch below).
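A minimal sketch of such a composite key; the table name enrollment is illustrative, the columns follow the example above:

CREATE TABLE enrollment (
    course_id   NUMBER,
    student_id  NUMBER,
    enrolled_on DATE,
    CONSTRAINT pk_enrollment PRIMARY KEY (course_id, student_id)  -- composite primary key
);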

What is a view?


A view is a logical table based on one or more tables or on another view. The table(s) on which a view is created are called base table(s). A view does not have its own data; rather, it shows the data from the base table(s).

A view is a virtual table based on the result set of a SQL statement. It contains rows and columns just like a real table.
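A minimal sketch, assuming an emp base table with a status column for illustration:

-- The view stores no data; each query against it reads the base table emp
CREATE OR REPLACE VIEW v_active_emp AS
SELECT empno, ename, deptno
FROM   emp
WHERE  status = 'ACTIVE';

SELECT * FROM v_active_emp WHERE deptno = 10;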

What is a cursor? (Details)

Question : What is a cursor?

Oracle uses a work area to execute SQL statements and store processing information. A PL/SQL construct called a cursor lets you name a work area and access its stored information. A cursor is a mechanism used to fetch more than one row in a PL/SQL block.

A cursor is a variable-like construct, similar in spirit to a 2D array, used for processing multiple rows and for storing data temporarily.
A cursor is a private SQL area. There are two types: implicit and explicit. Implicit cursors are predefined by Oracle; explicit cursors are defined by the programmer.

Question : What is the use of a cursor variable? How is it defined?

A cursor variable can be associated with different statements at run time, so it can hold different result sets at run time, whereas a static cursor can only be associated with one query. A cursor variable is a reference type (like a pointer in C).
Declaring a cursor variable:
TYPE type_name IS REF CURSOR RETURN return_type;
where type_name is the name of the reference type and return_type is a record type describing the select list that will eventually be returned by the cursor variable.
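A minimal sketch of declaring and using a cursor variable (the emp table and its columns are assumed for illustration):

DECLARE
  TYPE emp_ref_cur IS REF CURSOR RETURN emp%ROWTYPE;  -- strongly typed cursor variable
  cv_emp  emp_ref_cur;
  v_row   emp%ROWTYPE;
BEGIN
  OPEN cv_emp FOR SELECT * FROM emp WHERE deptno = 10;
  LOOP
    FETCH cv_emp INTO v_row;
    EXIT WHEN cv_emp%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE(v_row.ename);
  END LOOP;
  CLOSE cv_emp;
END;
/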



Question : Difference between an implicit and an explicit cursor.

PL/SQL declares a cursor implicitly for all SQL data manipulation statements, including queries that return only one row. For queries that return more than one row, however, you must declare an explicit cursor or use a cursor FOR loop. An explicit cursor is a cursor whose name is explicitly associated with a SELECT statement via the CURSOR ... IS statement; it is processed with DECLARE, OPEN, FETCH and CLOSE. Explicit cursors are used to process multi-row SELECT statements, while an implicit cursor is used to process INSERT, UPDATE, DELETE and single-row SELECT ... INTO statements.


The implicit cursor is used to process INSERT, UPDATE, DELETE, and SELECT INTO statements. During the processing of an implicit cursor, Oracle automatically performs the OPEN, FETCH, and CLOSE operations.

With explicit cursors, on the other hand, the work is done in four steps: DECLARE a cursor, OPEN the cursor, FETCH from the cursor and CLOSE the cursor.

IMPLICIT CURSOR:- Provided automatically by Oracle for DML statements and for queries that return only one row.

EXPLICIT CURSOR:- Defined by the user, for queries that return more than one row.

Explicit cursor:- NO_DATA_FOUND is not raised; you check %NOTFOUND instead.

Implicit cursor:- A SELECT ... INTO that returns no rows raises NO_DATA_FOUND, which you can handle in an exception block.

In PL/SQL terms, the work area where data is temporarily held is called a cursor.
The implicit cursor is created by the Oracle engine itself, while an explicit cursor is created by the programmer by means of program statements.
The NO_DATA_FOUND exception can arise and be handled with an implicit cursor, whereas an explicit cursor does not raise it.


The cursor attributes are:


%ISOPEN - checks whether the cursor is open.
%FOUND - true if a row was found.
%NOTFOUND - true if no row was found.
%ROWCOUNT - the number of rows affected by the statement.


There are four steps in working with a cursor: DECLARE the cursor, OPEN it, FETCH data from it, and CLOSE it.
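A minimal sketch of those four steps with an explicit cursor (the emp table is assumed for illustration):

DECLARE
  CURSOR c_emp IS
    SELECT empno, ename FROM emp WHERE deptno = 20;    -- DECLARE
  v_empno emp.empno%TYPE;
  v_ename emp.ename%TYPE;
BEGIN
  OPEN c_emp;                                          -- OPEN
  LOOP
    FETCH c_emp INTO v_empno, v_ename;                 -- FETCH
    EXIT WHEN c_emp%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE(v_empno || ' ' || v_ename);
  END LOOP;
  CLOSE c_emp;                                         -- CLOSE
END;
/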

Informatica PowerCenter

Informatica Repository Manager


A. Informatica PowerCenter includes the following types of repositories:
  • Standalone Repository : A repository that functions individually; it is unrelated to any other repository.
  • Global Repository : This is the centralized repository in a domain. It can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
  • Local Repository : A local repository is within a domain and is not the global repository. A local repository can connect to the global repository using global shortcuts and can use the objects in its shared folders.
  • Versioned Repository : This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test and deploy metadata into the production environment.

A. A code page contains the encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text then a code page that supports Japanese text should be selected.
When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.


A. PowerCenter Server on Windows can connect to following databases:
  • IBM DB2
  • Informix
  • Microsoft Access
  • Microsoft Excel
  • Microsoft SQL Server
  • Oracle
  • Sybase
  • Teradata

A. PowerCenter Server on UNIX can connect to following databases:
  • IBM DB2
  • Informix
  • Oracle
  • Sybase
  • Teradata
Informatica Mapping Designer


A. The Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. In the SP transformation the PL/SQL procedure name can be specified. Whenever the session is executed, it calls the PL/SQL procedure.


A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Below are the various transformations available in Informatica:
  • Aggregator
  • Application Source Qualifier
  • Custom
  • Expression
  • External Procedure
  • Filter
  • Input
  • Joiner
  • Lookup
  • Normalizer
  • Output
  • Rank
  • Router
  • Sequence Generator
  • Sorter
  • Source Qualifier
  • Stored Procedure
  • Transaction Control
  • Union
  • Update Strategy
  • XML Generator
  • XML Parser
  • XML Source Qualifier

A. The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation when it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query by changing the default settings of the transformation properties, but the list of selected ports and the order in which they appear in the default query should not be changed in the overridden query.


A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike Expression Transformation, the Aggregator transformation can only be used to perform calculations on groups. The Expression transformation permits calculations on a row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data. While grouping the data, the aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.


A. Whenever a session is created for a mapping Aggregate Transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.


A. The union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like UNION ALL statement in SQL, that is used to combine result set of two SELECT statements.


A. Yes, joiner transformation can be used to join data from two flat file sources.


A. This transformation is used to lookup data in a flat file or a relational table, view or synonym. It compares lookup transformation ports (input ports) to the source column values based on the lookup condition. Later returned values can be passed to other transformations.

A. Yes.


A. A connected lookup takes input values directly from other transformations in the pipeline.
An unconnected lookup does not take inputs directly from any other transformation; it can be called from any transformation (such as an Expression) and is invoked as a function using the :LKP expression. So an unconnected lookup can be called multiple times in a mapping.


A. A mapplet is a reusable object that is created using mapplet designer. The mapplet contains set of transformations and it allows us to reuse that transformation logic in multiple mappings.


A. Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored as a metadata separate from any other mapping that uses the transformation. Whenever any changes to a reusable transformation are made, all the mappings where the transformation is used will be invalidated.


A. Informatica processes the source data row by row. By default every row is marked to be inserted into the target table. If the row has to be updated or inserted based on some logic, the Update Strategy transformation is used; the condition can be specified in the Update Strategy to mark the processed row for update or insert (a sample expression is sketched after this list).
Following options are available for the update strategy :
  • DD_INSERT : If this is used the Update Strategy flags the row for insertion. Equivalent numeric value of DD_INSERT is 0.
  • DD_UPDATE : If this is used the Update Strategy flags the row for update. Equivalent numeric value of DD_UPDATE is 1.
  • DD_DELETE : If this is used the Update Strategy flags the row for deletion. Equivalent numeric value of DD_DELETE is 2.
  • DD_REJECT : If this is used the Update Strategy flags the row for rejection. Equivalent numeric value of DD_REJECT is 3.
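A typical Update Strategy expression, sketched with an illustrative lookup port name (lkp_customer_key is not from the text above), flags a row for insert when the lookup finds no match and for update otherwise:

IIF(ISNULL(lkp_customer_key), DD_INSERT, DD_UPDATE)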

Update strategy transformation Use

Update strategy transformation can be used in two levels.

1)Mapping level
2)Session level

Mapping level:- Within a mapping you can use the Update Strategy transformation to flag rows for


Insert
Update
Delete
Reject


Session level:- In the session properties you have the option
Treat source rows as: Data Driven (the default),
or you can choose Insert, Update or Delete.

For target tables you can set the following update strategy options.


Insert
Delete
Update
-Update as update
-update as insert
-update else insert.
Truncate table.

Do's and Don'ts while using Sorted Input in the Aggregator transformation

In general, follow this checklist to ensure that you are handling an Aggregator with sorted input correctly:
1. Do not use sorted input if any of the following conditions are true:


a. The aggregate expression uses nested aggregate functions.
b. The session uses incremental aggregation.
c. The input data is data driven:
you select Data Driven for the Treat Source Rows As session property, or an Update Strategy transformation appears before the Aggregator transformation in the mapping.
If you use sorted input under these circumstances, the Informatica Server reverts to the default aggregate behavior, reading all values before performing aggregate calculations.

Remember, when you use sorted input, you must pass sorted data through the Aggregator.
Data must be sorted as follows:
1. By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
2. Using the same sort order configured for the session.

If data is not in strict ascending or descending order based on the session sort order, the Informatica Server fails the session.

Aggregate Transformation

The Aggregate transformation applies aggregate functions, such as Average, to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
You configure the Aggregate transformation at the transformation, output, and column levels.
  • At the transformation level, you configure the Aggregate transformation for performance by specifying the following values:
    • The number of groups that are expected to result from a Group by operation.
    • The number of distinct values that are expected to result from a Count distinct operation.
    • The percentage by which memory can be extended during the aggregation.
    The Aggregate transformation can also be configured to generate a warning instead of failing when the value of a divisor is zero.

  • At the output level, you configure the Aggregate transformation for performance by specifying the number of groups that are expected to result from a Group by operation. The Aggregate transformation supports multiple outputs, and each can be configured differently.

  • At the column level, you specify the following values:
    • The aggregation that the column performs.
    • The comparison options of the aggregation.
You can also configure the Aggregate transformation for performance by specifying these values:
  • The number of groups that are expected to result from a Group by operation on the column.
  • The number of distinct values that are expected to result from a Count distinct operation on the column.
You can also identify columns as IsBig if a column contains large numeric values or numeric values with high precision.

The Aggregate transformation is asynchronous, which means that it does not consume and publish data row by row. Instead it consumes the whole rowset, performs its groupings and aggregations, and then publishes the results.
This transformation does not pass through any columns, but creates new columns in the data flow for the data it publishes. Only the input columns to which aggregate functions apply or the input columns the transformation uses for grouping are copied to the transformation output. For example, an Aggregate transformation input might have three columns: CountryRegion, City, and Population. The transformation groups by the CountryRegion column and applies the Sum function to the Population column. Therefore the output does not include the City column.

You can also add multiple outputs to the Aggregate transformation and direct each aggregation to a different output. For example, if the Aggregate transformation applies the Sum and the Average functions, each aggregation can be directed to a different output.

You can apply multiple aggregations to a single input column. For example, if you want the sum and average values for an input column named Sales, you can configure the transformation to apply both the Sum and Average functions to the Sales column.

The Aggregate transformation has one input and one or more outputs. It does not support an error output.
Operations

The Aggregate transformation supports the following operations:

  • Group by - Divides datasets into groups. Columns of any data type can be used for grouping. For more information, see GROUP BY (Transact-SQL).
  • Sum - Sums the values in a column. Only columns with numeric data types can be summed. For more information, see SUM (Transact-SQL).
  • Average - Returns the average of the values in a column. Only columns with numeric data types can be averaged. For more information, see AVG (Transact-SQL).
  • Count - Returns the number of items in a group. For more information, see COUNT (Transact-SQL).
  • Count distinct - Returns the number of unique nonnull values in a group. For more information, see Eliminating Duplicates with DISTINCT.
  • Minimum - Returns the minimum value in a group. For more information, see MIN (Transact-SQL). In contrast to the Transact-SQL MIN function, this operation can be used only with numeric, date, and time data types.
  • Maximum - Returns the maximum value in a group. For more information, see MAX (Transact-SQL). In contrast to the Transact-SQL MAX function, this operation can be used only with numeric, date, and time data types.
The Aggregate transformation handles null values in the same way as the SQL Server relational database engine. The behavior is defined in the SQL-92 standard. The following rules apply:
  • In a GROUP BY clause, nulls are treated like other column values. If the grouping column contains more than one null value, the null values are put into a single group.
  • In the COUNT (column name) and COUNT (DISTINCT column name) functions, nulls are ignored and the result excludes rows that contain null values in the named column.
  • In the COUNT (*) function, all rows are counted, including rows with null values.