Tuesday, May 24, 2011

Informatica Questions and Answers + Concepts

Which transformation should we use to normalize the COBOL and relational sources?

Normalizer transformation. When we drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.

What is the difference between a static cache and a dynamic cache?

With a dynamic cache, when a new row arrives the server looks in the lookup cache to see whether the row already exists. If it does not, the row is inserted into the target and into the cache as well. With a static cache, the server checks the cache and writes to the target, but it does not write to the cache.

If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts or updates rows in the cache during the session. When you cache the target table as the lookup, you can look up values in the target and insert them if they do not exist, or update them if they do.
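
The difference can be sketched in a few lines of Python. This is only an illustration, not Informatica code: the function `run_session`, the dictionary used as the cache, and the sample rows are all hypothetical.

```python
def run_session(source_rows, cache, dynamic):
    """Return the rows that would be written to the target as inserts.

    cache is a dict keyed by the lookup key. With dynamic=True the cache is
    updated as rows are inserted; with dynamic=False it stays as it was built.
    """
    inserts = []
    for key, value in source_rows:
        if key in cache:
            continue  # the row is already known, so it is not a new insert
        inserts.append((key, value))
        if dynamic:
            cache[key] = value  # keep the cache in step with the target
    return inserts


# Key 1 already exists in the target; key 2 arrives twice in the same run.
source = [(1, "a"), (2, "b"), (2, "b")]
static_inserts = run_session(source, {1: "a"}, dynamic=False)
dynamic_inserts = run_session(source, {1: "a"}, dynamic=True)
print(static_inserts)   # [(2, 'b'), (2, 'b')]
print(dynamic_inserts)  # [(2, 'b')]
```

In this sketch the static cache lets the repeated key through twice, because the cache never learns about the first insert, while the dynamic cache catches the repeat.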

What are the join types in the Joiner transformation?

The join types are: Normal, Master Outer, Detail Outer, and Full Outer.
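
As a rough sketch of what each join type keeps, here is a small Python function. It assumes the usual PowerCenter meaning of the names (Master Outer keeps all detail rows, Detail Outer keeps all master rows); the function `joiner` and the sample data are hypothetical, and unmatched rows simply lack the other side's columns instead of carrying NULLs.

```python
def joiner(master, detail, key, join_type="normal"):
    """Join two lists of dicts on `key`.

    normal       : only matching rows
    master_outer : all detail rows plus matching master rows
    detail_outer : all master rows plus matching detail rows
    full_outer   : all rows from both sides
    """
    result = []
    matched_master = set()
    for d in detail:
        hits = [i for i, m in enumerate(master) if m[key] == d[key]]
        for i in hits:
            matched_master.add(i)
            result.append({**master[i], **d})
        if not hits and join_type in ("master_outer", "full_outer"):
            result.append(dict(d))  # detail row with no master match
    if join_type in ("detail_outer", "full_outer"):
        for i, m in enumerate(master):
            if i not in matched_master:
                result.append(dict(m))  # master row with no detail match
    return result


master = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
detail = [{"id": 2, "qty": 5}, {"id": 3, "qty": 7}]
for jt in ("normal", "master_outer", "detail_outer", "full_outer"):
    print(jt, len(joiner(master, detail, "id", jt)))
```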

In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?

We cannot use a Joiner transformation in the following conditions:

1. Both input pipelines originate from the same Source Qualifier transformation.
2. Both input pipelines originate from the same Normalizer transformation.
3. Both input pipelines originate from the same Joiner transformation.
4. Either input pipeline contains an Update Strategy transformation.
5. We connect a Sequence Generator transformation directly before the Joiner transformation.

What is the Lookup transformation?

It is used to look up data in a relational table or view.
Lookup is a passive transformation used to look up data in a flat file or a relational table.

What are the differences between the Joiner transformation and the Source Qualifier transformation?
1. The Source Qualifier operates only with relational sources within the same schema. The Joiner can have either heterogeneous sources or relational sources in different schemas.
2. The Source Qualifier requires at least one matching column to perform a join. The Joiner joins based on matching ports.
3. Additionally, the Joiner requires two separate input pipelines and should not have an Update Strategy or Sequence Generator (this is no longer true from Informatica 7.2).

1) The Joiner can join relational sources that come from different data sources, whereas in the Source Qualifier the relational sources should come from the same data source.
2) We need matching keys to join two relational sources in the Source Qualifier transformation, whereas the Joiner does not need matching keys to join two sources.

Why use the Lookup transformation?

Lookup is a transformation that looks up values from a relational table/view or a flat file. The developer defines the lookup match criteria. There are two types of lookup in PowerCenter Designer: 1) Connected Lookup 2) Unconnected Lookup. Different caches can also be used with a lookup: static, dynamic, persistent, and shared (the dynamic cache cannot be used with an unconnected lookup).
The Lookup transformation is passive and can be either connected or unconnected. It is used to look up data in a relational table, view, or synonym.
A lookup is used to perform one of the following tasks:
- to get a related value
- to perform a calculation
- to update a slowly changing dimension table
- to check whether the record already exists in the table


What is source qualifier transformation?


The Source Qualifier (SQ) is an active transformation. It performs one of the following tasks:
- to join data from the same source database
- to filter rows when PowerCenter reads the source data
- to perform an outer join
- to select only distinct values from the source

In a Source Qualifier transformation a user can define join conditions, filter the data, and eliminate duplicates. The default Source Qualifier query can be overwritten with these options; this is known as SQL Override.

The Source Qualifier represents the records that the Informatica server reads when it runs a session.

When we add a relational or a flat file source definition to a mapping, we need to connect it to a source qualifier transformation. The source qualifier transformation represents the records that the Informatica server reads when it runs a session.


How does the Informatica server increase session performance by partitioning the source?


Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.


What are the rank caches?


The Informatica server stores group information in an index cache and row data in a data cache.

When the server runs a session with a Rank transformation, it compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row.

During the session, the Informatica server compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an index cache and row data in a data cache.
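
The replace-if-out-ranked behaviour described above can be sketched as a top-N-per-group routine. This is a simplified illustration, not the server's actual implementation: the function `rank_top_n` is hypothetical, the dictionary keys stand in for the index cache (group information), and the per-group heaps stand in for the data cache (row data).

```python
import heapq


def rank_top_n(rows, n):
    """Keep the n highest values per group from (group, value) pairs."""
    data_cache = {}  # group -> min-heap of the best n values seen so far
    for group, value in rows:
        heap = data_cache.setdefault(group, [])
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # The input row out-ranks the lowest stored row, so replace it.
            heapq.heapreplace(heap, value)
    return {g: sorted(h, reverse=True) for g, h in data_cache.items()}


sales = [("east", 10), ("east", 30), ("west", 5),
         ("east", 20), ("west", 8), ("east", 40)]
top2 = rank_top_n(sales, 2)
print(top2)  # {'east': [40, 30], 'west': [8, 5]}
```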

What is Code Page Compatibility?


When two code pages are compatible, the characters encoded in the two code pages are virtually identical.

Compatibility between code pages is used for accurate data movement when the Informatica Server runs in the Unicode data movement mode. If the code pages are identical, there will not be any data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page.
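
The subset/superset idea can be checked mechanically. The sketch below uses Python's built-in codecs as stand-ins for Informatica code pages, which is an assumption made purely for illustration; the helper `is_superset` is hypothetical and only handles single-byte source code pages.

```python
def is_superset(target_codec, source_codec):
    """True if every character of a single-byte source code page
    can also be encoded in the target code page."""
    for b in range(256):
        try:
            ch = bytes([b]).decode(source_codec)
        except UnicodeDecodeError:
            continue  # this byte is not defined in the source code page
        try:
            ch.encode(target_codec)
        except UnicodeEncodeError:
            return False  # the target cannot represent this character
    return True


print(is_superset("latin-1", "ascii"))  # True: Latin-1 covers all of ASCII
print(is_superset("ascii", "latin-1"))  # False: data loss would be possible
```

Moving data from an ASCII source to a Latin-1 target is safe in this sense, while the reverse direction can lose characters.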

How can you create or import a flat file definition into the Warehouse Designer?

By giving the server connection path.

Create the file in the Warehouse Designer, import the file from the location where it exists, or modify the source if the structure is one and the same.

First create it in the Source Analyzer, then drag it into the Warehouse Designer. You can't create a flat file target definition directly.

There is no way to import a target definition as a file in Informatica Designer. So when the target definition for a file is created in the Warehouse Designer, it is created as if it were a table, and then in the session properties of that mapping it is specified as a file.

You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer, then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server runs the session, it creates and loads the flat file.


What is aggregate cache in aggregator transformation?

Aggregate values are stored in the data cache, and group-by column values are stored in the index cache.

The PowerCenter server stores data in the aggregate cache until it completes the aggregate calculations.

The Aggregator transformation has two caches, the data cache and the index cache. The data cache holds the aggregate values or detail records; the index cache holds the group-by column values, that is, the unique values of the records.

When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in the aggregate cache until it completes the aggregation calculation.

The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
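
A minimal sketch of the two caches, assuming a simple SUM grouped by one column. The names `aggregate`, `index_cache`, and `data_cache` are illustrative only, and the overflow to cache files is not modelled.

```python
def aggregate(rows):
    """Sum the amount per group for (group_key, amount) pairs."""
    index_cache = {}  # group-by value -> position in the data cache
    data_cache = []   # running aggregate value for each group
    for key, amount in rows:
        slot = index_cache.get(key)
        if slot is None:
            slot = len(data_cache)
            index_cache[key] = slot
            data_cache.append(0)
        data_cache[slot] += amount
    # Results are only complete once every input row has been read.
    return {key: data_cache[slot] for key, slot in index_cache.items()}


totals = aggregate([("pens", 3), ("ink", 2), ("pens", 4)])
print(totals)  # {'pens': 7, 'ink': 2}
```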


How can you recognize whether or not the newly added rows in the source get inserted into the target?


In a Type 2 mapping we have three options for recognizing the newly added rows: i) version number ii) flag value iii) effective date range.
From the session, SrcSuccessRows can be compared with TgtSuccessRows.
Check the session log or check the target table.
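
The three Type 2 options can be shown together in one small sketch. Real mappings normally pick one of them; the function `apply_type2`, the column names, and the sample customer are hypothetical.

```python
from datetime import date


def apply_type2(dimension, key, attrs, today):
    """Add a new version of `key` to the dimension if its attributes changed."""
    current = next((r for r in dimension
                    if r["key"] == key and r["current_flag"] == 1), None)
    if current is not None and current["attrs"] == attrs:
        return "unchanged"
    version = 1
    if current is not None:
        current["current_flag"] = 0   # flag value: old row is no longer current
        current["end_date"] = today   # effective date range: close the old row
        version = current["version"] + 1
    dimension.append({"key": key, "attrs": attrs, "version": version,
                      "current_flag": 1, "begin_date": today, "end_date": None})
    return "inserted" if version == 1 else "new_version"


dim = []
first = apply_type2(dim, 101, {"city": "Pune"}, date(2011, 5, 1))
second = apply_type2(dim, 101, {"city": "Delhi"}, date(2011, 5, 24))
third = apply_type2(dim, 101, {"city": "Delhi"}, date(2011, 5, 25))
print(first, second, third)  # inserted new_version unchanged
```

Newly added source rows show up as version 1 rows with the current flag set and an open-ended date range.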


What are the types of lookup?

The first three are the main ones.
Based on connection: 1. Connected 2. Unconnected
Based on source type: 1. Flat file 2. Relational
Based on cache: 1. Cached 2. Uncached
Based on cache type: 1. Static 2. Dynamic
Based on reuse: 1. Persistent 2. Non-persistent
Based on input: 1. Sorted 2. Unsorted

Connected, unconnected.

There are mainly two types of lookup: 1. static lookup 2. dynamic lookup. A static lookup can be either connected or unconnected. A connected lookup is linked into the pipeline; an unconnected lookup is called from an expression condition.


What are the types of metadata stored in the repository?

Database connections, global objects, sources, targets, mappings, mapplets, sessions, shortcuts, transformations.

The repository stores metadata that describes how to transform and load source and target data.

Data about data

Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.

The following types of metadata are stored in the repository: database connections, global objects, mappings, mapplets, multidimensional metadata, reusable transformations, sessions and batches, shortcuts, source definitions, target definitions, and transformations.


What happens if Informatica server doesn't find the session parameter in the parameter file?

Workflow will fail.


Can you access a repository created in a previous version of Informatica?

We have to migrate the repository from the older version to newer version. Then you can use that repository.


Without using an ETL tool, can you build and maintain a data warehouse?

Yes, we can do that using PL/SQL or stored procedures when all the data is in the same database. If the sources are flat files, you can't do it through PL/SQL or stored procedures.


How do you identify the changed records in operational data?

In my project the source system itself sends us the new records and the records changed in the last 24 hours.


Why couldn't you go for a snowflake schema?

A snowflake schema performs worse than a star schema, because retrieving the data requires multiple joins.
Snowflake is preferred in two cases:
- If you want to load the data into more hierarchical levels of information, for example yearly, quarterly, monthly, daily, hourly, and by-the-minute information, prefer snowflake.
- Whenever the input data contains many low-cardinality elements, prefer a snowflake schema. Low-cardinality examples: sex, marital status, etc. Low cardinality means the number of distinct values is very small compared to the total number of records.


Name some measures in your fact table?

Sales amount.


How many dimension tables did you have in your project? Name some dimensions (columns).

Product Dimension: Product Key, Product Id, Product Type, Product Name, Batch Number.
Distributor Dimension: Distributor Key, Distributor Id, Distributor Location.
Customer Dimension: Customer Key, Customer Id, CName, Age, Status, Address, Contact.
Account Dimension: Account Key, Acct Id, Acct Type, Location, Balance.


How many Fact and Dimension tables are there in your project?

In my module (Sales) we have 4 Dimensions and 1 fact table.


How many Data marts are there in your project?

There are 4 Data marts, Sales, Marketing, Finance and HR. In my module we are handling only sales data mart.


What is the daily data volume (in GB/records)? What is the size of the data extracted in the extraction process?


On average, approximately 40k records per file per day. Every day we get 8 files from 8 source systems.


What is the size of the database in your project?

It depends on the client's database; it might be in GBs.


What is meant by clustering?

Clustering stores two (or more) tables together in a single buffer so that the data can be retrieved easily.


How is it determined whether or not a session can be considered to have heterogeneous targets?


It is considered to have heterogeneous targets when there is no primary key and foreign key relationship between the targets.

Under what circumstances can a target definition be edited from the Mapping Designer, within the mapping where that target definition is being used?


We can't edit the target definition in the Mapping Designer. We can edit the target in the Warehouse Designer only. But in our projects we haven't edited any of the targets. If any change is required to a target definition, we ask the DBA to make the change and then we import it again. We don't have permission to edit the source and target tables.


Can a Source Qualifier be used to perform an outer join when joining two databases?


No, we can't join two different databases in a SQL override.


If your source is a flat file with a delimiter, where can you change the delimiter the next time you want to change it?


In the session properties, go to the Mapping tab, click the target instance, and click Set File Properties; the delimiter option can be changed there.


If the index cache file capacity is 2 MB and the data cache is 1 MB, and you enter data that needs 3 MB for the index and 2 MB for the data, what will happen?


Nothing will happen. We can change the cache sizes based on the buffer size available on the server. The maximum cache size is 2 GB.


What is the difference between the NEXTVAL and CURRVAL ports in the Sequence Generator, assuming that both are connected to the input of another transformation?


CURRVAL is NEXTVAL plus the Increment By value. With the default increment of 1, the first row gets NEXTVAL 1 and CURRVAL 2.


How does a dynamic cache handle duplicate rows?

With a dynamic cache, the server flags each row as it passes through the lookup (the NewLookupRow port). A row that causes no change to the cache is flagged "0", a new record inserted into the cache is flagged "1", and a record that updates an existing cache entry is flagged "2".


How will you find out whether your mapping is correct without connecting a session?


Through debugging option.


If you are using an Aggregator transformation in your mapping, does your source contain a dimension or a fact?


We can use the Aggregator transformation according to the requirements. There is no limitation for the Aggregator. The source can be either a dimension or a fact.


My input is Oracle and my target is a flat file. Can I load it? How?


Yes. Create a flat file definition in the Warehouse Designer whose structure matches the Oracle table, then develop the mapping according to the requirement and map it to that target flat file. The target file is created in the TgtFiles directory on the server system.


For a session, can I use 3 mappings?


No, one session can have only one mapping. We have to create a separate session for each mapping.


What are the types of loading procedures?


At the Informatica level there are two types of load procedure: 1) normal load 2) bulk load. At the project level, load procedures depend on the project requirement, for example daily loads or weekly loads.


Are you involved in high-level or low-level design? What is meant by high-level design and low-level design?


Low-Level Design:
The requirements should be in Excel format, describing field-to-field validations and the business logic that needs to be present. Mostly the onsite team does this low-level design.
High-Level Design:
Describes the Informatica flow chart from Source Qualifier to target; simply put, the flow chart of the Informatica mapping. The developer produces this design document.


What are the dimension load methods?


Daily loads or weekly loads, based on the project requirement.


Where do we use a lookup: between source and stage, or between stage and target?


It depends on the requirement. There is no rule that it must be used in one particular stage only.


How will you do SQL tuning?


We can do SQL tuning using the Oracle optimizer and the TOAD software.


Did you use any tools for scheduling other than Workflow Manager or pmcmd?

Yes, third-party tools such as "Control-M".


What is SQL mass updating?


A)
UPDATE (
    SELECT hs1.col1 AS hs1_col1,
           hs1.col2 AS hs1_col2,
           hs1.col3 AS hs1_col3,
           hs2.col1 AS hs2_col1,
           hs2.col2 AS hs2_col2,
           hs2.col3 AS hs2_col3
    FROM hs1, hs2
    WHERE hs1.sno = hs2.sno
)
SET hs1_col1 = hs2_col1,
    hs1_col2 = hs2_col2,
    hs1_col3 = hs2_col3;
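
Not every database accepts an UPDATE against a joined subquery like the one above. As a hedged sketch of the same mass update on an engine without that syntax, the Python snippet below uses the standard `sqlite3` module with correlated subqueries instead. The table layout and sample rows are assumptions made only so the example can run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hs1 (sno INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
CREATE TABLE hs2 (sno INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
INSERT INTO hs1 VALUES (1, 'a', 'b', 'c');
INSERT INTO hs1 VALUES (2, 'd', 'e', 'f');
INSERT INTO hs2 VALUES (1, 'x', 'y', 'z');
""")

# Update every hs1 row that has a matching sno in hs2, in one statement.
conn.execute("""
UPDATE hs1
SET col1 = (SELECT col1 FROM hs2 WHERE hs2.sno = hs1.sno),
    col2 = (SELECT col2 FROM hs2 WHERE hs2.sno = hs1.sno),
    col3 = (SELECT col3 FROM hs2 WHERE hs2.sno = hs1.sno)
WHERE EXISTS (SELECT 1 FROM hs2 WHERE hs2.sno = hs1.sno)
""")

rows = conn.execute("SELECT * FROM hs1 ORDER BY sno").fetchall()
print(rows)  # [(1, 'x', 'y', 'z'), (2, 'd', 'e', 'f')]
```

The WHERE EXISTS clause keeps rows without a match (sno 2 here) untouched instead of setting them to NULL.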


What is the unbound field exception in the Source Qualifier?


A) Problem description:

When running a session, the session fails with the following error:

TE_7020 Unbound field in Source Qualifier

Solution:

This error occurs when there is an inconsistency between the Source Qualifier and the source table.

Either there is a field in the Source Qualifier that is not in the physical table, or there is a column of the source object that has no link to the corresponding port in the Source Qualifier.

To resolve this, re-import the source definition into the Source Analyzer in Designer.

Bring the new source definition into the mapping. This also re-creates the Source Qualifier.

Connect the new Source Qualifier to the rest of the mapping as before.


Using an unconnected lookup, how do you remove nulls and duplicates?


We can't handle nulls and duplicates in an unconnected lookup. We can handle them in a dynamic connected lookup.


I have 20 lookups, 10 joiners, and 1 normalizer. How will you improve the session performance?


We have to calculate the lookup and joiner cache sizes.


What is version controlling?


It is the method used to differentiate the old build from the new build after changes are made to the existing code. The old code is v001, and each time you change it you increase the version number, for example to v002. In my last company we didn't use any version control. We just deleted the old build and replaced it with the new code.
We don't maintain version control in Informatica. We maintain the code in VSS (Visual SourceSafe), which is software that maintains the code with versioning. Whenever a client change request comes in after production has started, we have to create another build.


How is the Sequence Generator transformation different from other transformations?


The Sequence Generator is unique among all transformations because we cannot add, edit, or delete its default ports (NEXTVAL and CURRVAL).

Unlike other transformations, we cannot override the Sequence Generator transformation properties at the session level. This protects the integrity of the sequence values generated.


What are the advantages of Sequence generator? Is it necessary, if so why?


We can make a Sequence Generator reusable, and use it in multiple mappings. We might reuse a Sequence Generator when we perform multiple loads to a single target.

For example, if we have a large input file that we separate into three sessions running in parallel, we can use a Sequence Generator to generate primary key values. If we use different Sequence Generators, the Informatica Server might accidentally generate duplicate key values. Instead, we can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row.


What is a Data warehouse?


A Data warehouse is a denormalized database, which stores historical data in summary level format. It is specifically meant for heavy duty querying and analysis.


What is cube?


Cube is a multidimensional representation of data. It is used for analysis purpose. A cube gives multiple views of data.


What is drill-down and drill-up?


Both drill-down and drill-up are used to explore different levels of dimensionally modeled data. Drill-down allows users to view a lower (i.e. more detailed) level of data, and drill-up allows users to view a higher (i.e. more summarized) level of data.

What is the need for building a data warehouse?


A data warehouse is needed because it acts as a store for a large amount of data. It also provides end users with access to a wide variety of data, and helps in analyzing data more effectively and in generating reports. It acts as a huge repository of integrated information.


What is Data Modeling? What are the different types of Data Modeling?


Data modeling is the process of creating data models. In other words, it is structuring and organizing data in a uniform manner where constraints are placed within the structure. The data structures formed are maintained in a database management system. The different types of data modeling are: 1. Dimensional modeling 2. E-R modeling


What are the different types of OLAP technology?


Online analytical processing (OLAP) is of three types: MOLAP, HOLAP, and ROLAP.
MOLAP: Multidimensional online analytical processing. It is used for fast retrieval of data and also for slicing and dicing operations. It plays a vital role in easing complex calculations.
ROLAP: Relational online analytical processing. It has the ability to handle large amounts of data.
HOLAP: Hybrid online analytical processing. It is a combination of both ROLAP and MOLAP.


What is the difference between a Database and a Datawarehouse?


A database is a place where data is stored so that it can be accessed, retrieved, and loaded, whereas a data warehouse is a place where application data is managed for analysis and reporting services. A database stores data in the form of tables and columns. In a data warehouse, by contrast, data is subject oriented and stored in the form of dimensions and packages, which are used for analysis. In short, a database is used for running an enterprise, but a data warehouse helps in deciding how to run an enterprise.


What is the use of tracing levels in transformation?


In Informatica, the tracing level specifies the level of detail recorded in the session log file while the workflow executes.

Four tracing levels are supported. By default, the tracing level for every transformation is Normal.

1. Normal: logs initialization and status information, a summary of the successful rows and target rows, and information about rows skipped because of transformation errors.

2. Terse: logs less than Normal: initialization information, error messages, and notification of rejected data.

3. Verbose Initialization: in addition to Normal tracing, logs the location of the data cache files and index cache files that are created, and detailed transformation statistics for each transformation within the mapping.

4. Verbose Data: in addition to Verbose Initialization, logs each and every record processed by the Informatica server.

For better mapping performance, the tracing level should be set to Terse.
Verbose Initialization and Verbose Data are used for debugging.


Tracing levels store information about mappings and transformations.

What are the various types of transformation?


Various types of transformation are: Aggregator Transformation, Expression Transformation, Filter Transformation, Joiner Transformation, Lookup Transformation, Normalizer Transformation, Rank Transformation, Router Transformation, Sequence Generator Transformation, Stored Procedure Transformation, Sorter Transformation, Update Strategy Transformation, XML Source Qualifier Transformation, Advanced External Procedure Transformation, External Transformation.


What is the difference between active transformation and passive transformation?


An active transformation can change the number of rows that pass through it, but a passive transformation cannot change the number of rows that pass through it.
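
To make the distinction concrete, here is a small Python sketch that uses a filter as the active example and an expression as the passive one. The function names and the sample orders are hypothetical and are not tied to any Informatica API.

```python
def filter_transformation(rows, predicate):
    """Active: rows that fail the condition are dropped."""
    return [r for r in rows if predicate(r)]


def expression_transformation(rows, func):
    """Passive: exactly one output row for every input row."""
    return [func(r) for r in rows]


orders = [{"amount": 50}, {"amount": 150}, {"amount": 300}]
large = filter_transformation(orders, lambda r: r["amount"] > 100)
taxed = expression_transformation(
    orders, lambda r: {**r, "tax": r["amount"] // 10})
print(len(orders), len(large), len(taxed))  # 3 2 3
```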


What is the use of control break statements?


They execute a block of code between the loop and endloop statements.


What are the types of loading in Informatica?


There are two types of loading: normal loading and bulk loading. Normal loading loads record by record and writes a log for each, so it takes comparatively longer to load data to the target. Bulk loading loads a number of records at a time to the target database, so it takes less time to load data to the target.


What is the difference between source qualifier transformation and application source qualifier transformation?


Source qualifier transformation extracts data from RDBMS or from a single flat file system. Application source qualifier transformation extracts data from application sources like ERP.


How do we create a primary key using only odd numbers?


To create the primary key, we use a Sequence Generator with an odd start value (such as 1) and set its 'Increment By' property to 2.
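
The effect of that setting can be imitated with a short generator. This is only a sketch: it assumes the sequence starts at 1, and `sequence_generator` with its parameter names is a hypothetical stand-in for the transformation's properties.

```python
from itertools import count, islice


def sequence_generator(start_value=1, increment_by=2):
    """Yield start_value, start_value + increment_by, and so on."""
    return count(start_value, increment_by)


keys = list(islice(sequence_generator(), 5))
print(keys)  # [1, 3, 5, 7, 9]
```

An even start value with the same increment would produce only even keys, so the start value matters as much as the increment.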


What is authenticator?


It validates the user name and password used to access the PowerCenter repository.


What is the use of auxiliary mapping?


Auxiliary mapping reflects change in one table whenever there is a change in the other table.

What is a mapplet?


A mapplet is a reusable set of transformations.


Which ETL tool is more preferable Informatica or Data Stage and why?


The preference for an ETL tool depends on affordability and functionality. It is mostly a tradeoff between price and features. While Informatica has been a market leader for many years, DataStage is beginning to pick up momentum.

What is worklet?


Worklet is an object that represents a set of tasks.


What is workflow?


A workflow is a set of instructions that tells the Informatica server how to execute the tasks.


What is session?


A session is a set of instructions to move data from sources to targets.


Why do we need SQL overrides in Lookup transformations?


In order to look up more than one value from one table, we go for SQL overrides in lookups.
