Wednesday, September 7, 2011

How to concatenate values from multiple rows in Informatica

In Informatica, we often face scenarios where we have to concatenate values from two or more rows into a single row.

Consider a source that contains the following records:
ID Name
1 A
2 R
3 B
1 D
2 E
2 O

Suppose we want output like this:
ID Name
1 D,A
2 O,E,R
3 B

The above scenario can be implemented in Informatica as follows:



Steps:
1) Create a source definition that contains the columns ID and NAME.
2) After the Source Qualifier, place a Sorter transformation. The sort key is ID and the sort order is ascending.



3) After the Sorter, place an Expression transformation. In the Expression, create the following ports, in this order, in addition to the ports coming from the Sorter:
v_NAME (variable port) = IIF(ID = v_ID, NAME || ',' || v_NAME, NAME)
v_ID (variable port) = ID
out_NAME (output port) = v_NAME



4) After the Expression, place an Aggregator transformation and group by ID.



5) Connect the ID and out_NAME ports to the target.
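
Why this works: the Integration Service evaluates ports top to bottom, so when v_NAME is computed for a row, v_ID still holds the previous row's ID. Variable ports start from a default initial value (0 for numeric ports), so the first comparison fails and v_NAME starts fresh. A walk-through against the sample data, assuming the Sorter emits the rows within each ID in the order shown above:

Row (ID, NAME)   v_ID before the row   v_NAME after evaluation
(1, A)           0 (initial value)     A
(1, D)           1                     D,A
(2, R)           1                     R
(2, E)           2                     E,R
(2, O)           2                     O,E,R
(3, B)           2                     B

Because the input is sorted on ID, the last row of each group carries the complete list, and an Aggregator that groups by ID with no aggregate function returns exactly that last row for each group.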

Initial loading and delta loading

Initial loading and delta loading are basic ETL concepts used by every tool in the market.

Initial loading: the load that you run for the first time, into empty target tables.


Delta loading: as the volume in the target table grows over time, you load only the changed records from the source to the target table rather than all the data from the source. This type of load is called delta loading.

Options like "Delete data before loading" and "Auto correct load" should not be confused with the types of loading mentioned above.


"Delete data before loading" can be used when you want to ignore all the existing data in the target table and load from scratch from source table.

"Auto correct load" is the option used when you want no duplicate records to be loaded into the target.

Friday, September 2, 2011

PowerCenter Glossary of Terms


A

active source
An active source is an active transformation the Integration Service uses to generate rows.
adaptive dispatch mode
A dispatch mode in which the Load Balancer dispatches tasks to the node with the most available CPUs.
application services
A group of services that represent PowerCenter server-based functionality. You configure each application service based on your environment requirements. Application services include the Integration Service, Repository Service, SAP BW Service, Metadata Manager Service, Reporting Service, and Web Services Hub.
attachment view
View created in a web service source or target definition for a WSDL that contains a MIME attachment. The attachment view has an n:1 relationship with the envelope view.
available resource
Any PowerCenter resource that is configured to be available to a node.

B

backup node
Any node that is configured to run a service process, but is not configured as a primary node.
blocking
The suspension of the data flow into an input group of a multiple input group transformation.
blurring
A masking rule that limits the range of numeric output values to a fixed or percent variance from the value of the source data. The Data Masking transformation returns numeric data that is close to the value of the source data.
bounds
A masking rule that limits the range of numeric output to a range of values. The Data Masking transformation returns numeric data between the minimum and maximum bounds.
buffer block
A block of memory that the Integration Service uses to move rows of data from the source to the target. The number of rows in a block depends on the size of the row data, the configured buffer block size, and the configured buffer memory size.
buffer block size
The size of the blocks of data used to move rows of data from the source to the target. You specify the buffer block size in bytes or as percentage of total memory. Configure this setting in the session properties.
buffer memory
Buffer memory allocated to a session. The Integration Service uses buffer memory to move data from sources to targets. The Integration Service divides buffer memory into buffer blocks.
buffer memory size
Total buffer memory allocated to a session specified in bytes or as a percentage of total memory.
built-in parameters and variables
Predefined parameters and variables that do not vary according to task type. They return system or run-time information such as session start time or workflow name. Built-in variables appear under the Built-in node in the Designer or Workflow Manager Expression Editor.

C

cache partitioning
A caching process that the Integration Service uses to create a separate cache for each partition. Each partition works with only the rows needed by that partition. The Integration Service can partition caches for the Aggregator, Joiner, Lookup, and Rank transformations.
child dependency
A dependent relationship between two objects in which the child object is used by the parent object.
child object
A dependent object used by another object, the parent object.
cold start
A start mode that restarts a task or workflow without recovery.
commit number
A number in a target recovery table that indicates the number of messages that the Integration Service loaded to the target. During recovery, the Integration Service uses the commit number to determine if it wrote messages to all targets.
commit source
An active source that generates commits for a target in a source-based commit session.
compatible version
An earlier version of a client application or a local repository that you can use to access the latest version repository.
composite object
An object that contains a parent object and its child objects. For example, a mapping parent object contains child objects including sources, targets, and transformations.
concurrent workflow
A workflow configured to run multiple instances at the same time. When the Integration Service runs a concurrent workflow, you can view the instance in the Workflow Monitor by the workflow name, instance name, or run ID.
coupled group
An input group and output group that share ports in a transformation.
CPU profile
An index that ranks the computing throughput of each CPU and bus architecture in a grid. In adaptive dispatch mode, nodes with higher CPU profiles get precedence for dispatch.
custom role
A role that you can create, edit, and delete. See also role.
Custom transformation
A transformation that you bind to a procedure developed outside of the Designer interface to extend PowerCenter functionality. You can create Custom transformations with multiple input and output groups.
Custom transformation procedure
A C procedure you create using the functions provided with PowerCenter that defines the transformation logic of a Custom transformation.
Custom XML view
An XML view that you define instead of allowing the XML Wizard to choose the default root and columns in the view. You can create custom views using the XML Editor and the XML Wizard in the Designer.

D

data masking
A process that creates realistic test data from production source data. The format of the original columns and relationships between the rows are preserved in the masked data.
Data Masking transformation
A passive transformation that replaces sensitive columns in source data with realistic test data. Each masked column retains the datatype and the format of the original data.
default permissions
The permissions that each user and group receives when added to the user list of a folder or global object. Default permissions are controlled by the permissions of the default group, “Other.”
denormalized view
An XML view that contains more than one multiple-occurring element.
dependent object
An object used by another object. A dependent object is a child object.
deployment group
A global object that contains references to other objects from multiple folders across the repository. You can copy the objects referenced in a deployment group to multiple target folders in another repository. When you copy objects in a deployment group, the target repository creates new versions of the objects. You can create a static or dynamic deployment group.
Design Objects privilege group
A group of privileges that define user actions on the following repository objects: business components, mapping parameters and variables, mappings, mapplets, transformations, and user-defined functions.
deterministic output
Source or transformation output that does not change between session runs when the input data is consistent between runs.
dispatch mode
A mode used by the Load Balancer to dispatch tasks to nodes in a grid.
domain
See PowerCenter domain. See also security domain.
DTM
The Data Transformation Manager process that reads, writes, and transforms data.
DTM buffer memory
See buffer memory.
dynamic deployment group
A deployment group that is associated with an object query. When you copy a dynamic deployment group, the source repository runs the query and then copies the results to the target repository.
dynamic partitioning
The ability to scale the number of partitions without manually adding partitions in the session properties. Based on the session configuration, the Integration Service determines the number of partitions when it runs the session.

E

effective Transaction Control transformation
A Transaction Control transformation that does not have a downstream transformation that drops transaction boundaries.
effective transaction generator
A transaction generator that does not have a downstream transformation that drops transaction boundaries.
element view
A view created in a web service source or target definition for a multiple occurring element in the input or output message. The element view has an n:1 relationship with the envelope view.
envelope view
A main view in a web service source or target definition that contains a primary key and the columns for the input or output message.
exclusive mode
An operating mode for the Repository Service. When you run the Repository Service in exclusive mode, you allow only one user to access the repository to perform administrative tasks that require a single user to access the repository and update the configuration.

F

failover
The migration of a service or task to another node when the node running the service process becomes unavailable.
fault view
A view created in a web service target definition if a fault message is defined for the operation. The fault view has an n:1 relationship with the envelope view.
flush latency
A session condition that determines how often the Integration Service flushes data from the source.

G

gateway
See master gateway node.
gateway node
Receives service requests from clients and routes them to the appropriate service and node. A gateway node can run application services. In the Administration Console, you can configure any node to serve as a gateway for a PowerCenter domain. A domain can have multiple gateway nodes. See also master gateway node.
global object
An object that exists at repository level and contains properties you can apply to multiple objects in the repository. Object queries, deployment groups, labels, and connection objects are global objects.
grid object
An alias assigned to a group of nodes to run sessions and workflows.
group
A set of ports that defines a row of incoming or outgoing data. A group is analogous to a table in a relational source or target definition.

H

high availability
A PowerCenter option that eliminates a single point of failure in a domain and provides minimal service interruption in the event of failure.
High Group History List
A file that the United States Social Security Administration provides that lists the Social Security Numbers it issues each month. The Data Masking transformation accesses this file when you mask Social Security Numbers.

I

impacted object
An object that has been marked as impacted by the PowerCenter Client. The PowerCenter Client marks objects as impacted when a child object changes in such a way that the parent object may not be able to run.
incompatible object
An object that a compatible client application cannot access in the latest version repository.
ineffective Transaction Control transformation
A Transaction Control transformation that has a downstream transformation that drops transaction boundaries, such as an Aggregator transformation with Transaction transformation scope.
ineffective transaction generator
A transaction generator that has a downstream transformation that drops transaction boundaries, such as an Aggregator transformation with Transaction transformation scope.
Informatica Services
The name of the service or daemon that runs on each node. When you start Informatica Services on a node, you start the Service Manager on that node.
input group
See group.
Integration Service
An application service that runs data integration workflows and loads metadata into the Metadata Manager warehouse.
Integration Service process
A process that accepts requests from the PowerCenter Client and from pmcmd. The Integration Service process manages workflow scheduling, locks and reads workflows, and starts DTM processes.
invalid object
An object that has been marked as invalid by the PowerCenter Client. When you validate or save a repository object, the PowerCenter Client verifies that the data can flow from all sources in a target load order group to the targets without the Integration Service blocking all sources.

K

key masking
A type of data masking that produces repeatable results for the same source data and masking rules. The Data Masking transformation requires a seed value for the port when you configure it for key masking.

L

label
A user-defined object that you can associate with any versioned object or group of versioned objects in the repository.
latency
A period of time from when source data changes on a source to when a session writes the data to a target.
Lightweight Directory Access Protocol (LDAP) authentication
One of the authentication methods used to authenticate users logging in to PowerCenter applications. In LDAP authentication, the Service Manager imports groups and user accounts from the LDAP directory service into an LDAP security domain in the domain configuration database. The Service Manager stores the group and user account information in the domain configuration database but passes authentication to the LDAP server.
limit on resilience timeout
An amount of time a service waits for a client to connect or reconnect to the service. The limit can override the client resilience timeout configured for a client. See resilience timeout.
linked domain
A domain that you link to when you need to access the repository metadata in that domain.
Load Balancer
A component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks across nodes in a grid.
local domain
A PowerCenter domain that you create when you install PowerCenter. This is the domain you access when you log in to the Administration Console.
Log Agent
A Service Manager function that provides accumulated log events from sessions and workflows. You can view session and workflow logs in the Workflow Monitor. The Log Agent runs on the nodes where the Integration Service process runs. See also Log Manager.
Log Manager
A Service Manager function that provides accumulated log events from each service in the domain. You can view logs in the Administration Console. The Log Manager runs on the master gateway node. See also Log Agent.

M

mapping
A set of source and target definitions linked by transformation objects that define the rules for data transformation.
Mapping Architect for Visio
A PowerCenter Client tool that you use to create mapping templates that you can import into the PowerCenter Designer.
masking algorithm
Sets of characters and rotors of numbers that provide the logic to mask data. A masking algorithm provides random results that ensure that the original data cannot be disclosed from the masked data.
master gateway node
The entry point to a PowerCenter domain. When you configure multiple gateway nodes, one node acts as the master gateway node. If the master gateway node becomes unavailable, the Service Manager on other gateway nodes elects another master gateway node. See also gateway node.
master service process
An Integration Service process that runs the workflow and workflow tasks. When you run workflows and sessions on a grid, the Integration Service designates one Integration Service process as the master service process. The master service process can distribute the Session, Command, and predefined Event-Wait tasks to worker service processes. The master service process monitors all Integration Service processes.
metadata explosion
The expansion of referenced or multiple-occurring elements in an XML definition. The relationship model you choose for an XML definition determines if metadata is limited or exploded to multiple areas within the definition. Limited data explosion reduces data redundancy.
Metadata Manager Service
An application service that runs the Metadata Manager application in a PowerCenter domain. It manages access to metadata in the Metadata Manager warehouse.
metric-based dispatch mode
A dispatch mode in which the Load Balancer checks current computing load against the resource provision thresholds and then dispatches tasks in a round-robin fashion to nodes where the thresholds are not exceeded.

N

native authentication
One of the authentication methods used to authenticate users logging in to PowerCenter applications. In native authentication, you create and manage users and groups in the Administration Console. The Service Manager stores group and user account information and performs authentication in the domain configuration database.
node
A logical representation of a machine or a blade. Each node runs a Service Manager that performs domain operations on that node.
normal mode
An operating mode for an Integration Service or Repository Service. Run the Integration Service in normal mode during daily Integration Service operations. Run the Repository Service in normal mode to allow multiple users to access the repository and update content.
normalized view
An XML view that contains no more than one multiple-occurring element. Normalized XML views reduce data redundancy.

O

object query
A user-defined object you use to search for versioned objects that meet specific conditions.
one-way mapping
A mapping that uses a web service client for the source. The Integration Service loads data to a target, often triggered by a real-time event through a web service request.
open transaction
A set of rows that are not bound by commit or rollback rows.
operating mode
The mode for an Integration Service or Repository Service. An Integration Service runs in normal or safe mode. A Repository Service runs in normal or exclusive mode.
operating system flush
A process that the Integration Service uses to flush messages that are in the operating system buffer to the recovery file. The operating system flush ensures that all messages are stored in the recovery file even if the Integration Service is not able to write the recovery information to the recovery file during message processing.
operating system profile
A level of security that the Integration Service uses to run workflows. The operating system profile contains the operating system user name, service process variables, and environment variables. The Integration Service runs the workflow with the system permissions of the operating system user and settings defined in the operating system profile.
output group
See group.

P

parent dependency
A dependent relationship between two objects in which the parent object uses the child object.
parent object
An object that uses a dependent object, the child object.
permission
The level of access a user has to an object. Even if a user has the privilege to perform certain actions, the user may also require permission to perform the action on a particular object.
pipeline
See source pipeline.
pipeline branch
A segment of a pipeline between any two mapping objects.
pipeline stage
The section of a pipeline executed between any two partition points.
pmdtm process
The Data Transformation Manager process. See DTM.
pmserver process
The Integration Service process. See Integration Service process.
port dependency
The relationship between an output or input/output port and one or more input or input/output ports.
PowerCenter application
A component that you log in to so that you can access underlying PowerCenter functionality. PowerCenter applications include the Administration Console, PowerCenter Client, Metadata Manager, and Data Analyzer.
PowerCenter domain
A collection of nodes and services that define the PowerCenter environment. You group nodes and services in a domain based on administration ownership.
PowerCenter resource
Any resource that may be required to run a task. PowerCenter has predefined resources and user-defined resources.
PowerCenter services
The services available in the PowerCenter domain. These consist of the Service Manager and the application services. The application services include the Repository Service, Integration Service, Reference Table Manager Service, Reporting Service, Metadata Manager Service, Web Services Hub, and SAP BW Service.
predefined parameters and variables
Parameters and variables whose values are set by the Integration Service. You cannot define values for predefined parameter and variable values in a parameter file. Predefined parameters and variables include built-in parameters and variables, email variables, and task-specific workflow variables.
predefined resource
An available built-in resource. This can include the operating system or any resource installed by the PowerCenter installation, such as a plug-in or a connection object.
primary node
A node that is configured as the default node to run a service process. By default, the Service Manager starts the service process on the primary node and uses a backup node if the primary node fails.
privilege
An action that a user can perform in PowerCenter applications. You assign privileges to users and groups for the PowerCenter domain, the Repository Service, the Metadata Manager Service, and the Reporting Service. See also permission.
privilege group
An organization of privileges that defines common user actions.
pushdown group
A group of transformations containing transformation logic that is pushed to the database during a session configured for pushdown optimization. The Integration Service creates one or more SQL statements based on the number of partitions in the pipeline.
pushdown optimization
A session option that allows you to push transformation logic to the source or target database.

R

random masking
A type of masking that produces random, non-repeatable results.
real-time data
Data that originates from a real-time source. Real-time data includes messages and message queues, web services messages, and change data from a PowerExchange change data capture source.
real-time processing
On-demand processing of data from operational data sources, databases, and data warehouses. Real-time processing reads, processes, and writes data to targets continuously.
real-time session
A session in which the Integration Service generates a real-time flush based on the flush latency configuration and all transformations propagate the flush to the targets.
real-time source
The origin of real-time data. Real-time sources include JMS, WebSphere MQ, TIBCO, webMethods, MSMQ, SAP, and web services.
recovery
The automatic or manual completion of tasks after a service is interrupted. Automatic recovery is available for Integration Service and Repository Service tasks. You can also manually recover Integration Service workflows.
recovery ignore list
A list of message IDs that the Integration Service uses to prevent data duplication for JMS and WebSphere MQ sessions. The Integration Service writes recovery information to the list if there is a chance that the source did not receive an acknowledgement.
reference file
A Microsoft Excel or flat file that contains reference data. Use the Reference Table Manager to import data from reference files into reference tables.
reference table
A table that contains reference data such as default, valid, and cross-reference values. You can create, edit, import, and export reference data with the Reference Table Manager.
Reference Table Manager repository
A relational database that stores reference table metadata and information about users and user connections.
Reference Table Manager Service
An application service that runs the Reference Table Manager application in the PowerCenter domain.
Reference Table Manager
A web application used to manage reference tables. You can also manage user connections, and view user information and audit trail logs.
reference table staging area
A relational database that stores the reference tables. All reference tables that you create or import using the Reference Table Manager are stored within the staging area. You can create and manage multiple staging areas to restrict access to the reference tables.
repeatable data
A source or transformation output that is in the same order between session runs when the order of the input data is consistent.
Reporting Service
An application service that runs the Data Analyzer application in a PowerCenter domain. When you log in to Data Analyzer, you can create and run reports on data in a relational database or run the following PowerCenter reports: PowerCenter Repository Reports, Data Analyzer Data Profiling Reports, or Metadata Manager Reports.
repository client
Any PowerCenter component that connects to the repository. This includes the PowerCenter Client, Integration Service, pmcmd, pmrep, and MX SDK.
repository domain
A group of linked repositories consisting of one global repository and one or more local repositories.
Repository Service
An application service that manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.
request-response mapping
A mapping that uses a web service source and target. When you create a request-response mapping, you use source and target definitions imported from the same WSDL file.
required resource
A PowerCenter resource that is required to run a task. A task fails if it cannot find any node where the required resource is available.
reserved words file
A file named reswords.txt that you create and maintain in the Integration Service installation directory. The Integration Service searches this file and places quotes around reserved words when it executes SQL against source, target, and lookup databases.
resilience
The ability for PowerCenter services to tolerate transient network failures until either the resilience timeout expires or the external system failure is fixed.
resilience timeout
The amount of time a client attempts to connect or reconnect to a service. A limit on resilience timeout can override the resilience timeout. See also limit on resilience timeout.
resource provision thresholds
Computing thresholds defined for a node that determine whether the Load Balancer can dispatch tasks to the node. The Load Balancer checks different thresholds depending on the dispatch mode.
role
A collection of privileges. If groups of users perform similar actions, you can create and assign roles to more efficiently grant privileges to users. You assign roles to users and groups for the PowerCenter domain, the Repository Service, the Metadata Manager Service, and the Reporting Service.
round-robin dispatch mode
A dispatch mode in which the Load Balancer dispatches tasks to available nodes in a round-robin fashion up to the Maximum Processes resource provision threshold.
Run-time Objects privilege group
A group of privileges that define user actions on the following repository objects: session configuration objects, tasks, workflows, and worklets.

S

safe mode
An operating mode for the Integration Service. When you run the Integration Service in safe mode, only users with privilege to administer the Integration Service can run and get information about sessions and workflows. A subset of the high availability features are available in safe mode.
SAP BW Service
An application service that listens for RFC requests from SAP NetWeaver BI and initiates workflows to extract from or load to SAP NetWeaver BI.
security domain
A collection of user accounts and groups in a PowerCenter domain. Native authentication uses the Native security domain which contains the users and groups created and managed in the Administration Console. LDAP authentication uses LDAP security domains which contain users and groups imported from the LDAP directory service. You can define multiple security domains for LDAP authentication. See also native authentication and Lightweight Directory Access Protocol (LDAP) authentication.
seed
A random number required by key masking to generate non-colliding repeatable masked output.
service level
A domain property that establishes priority among tasks that are waiting to be dispatched. When multiple tasks are waiting in the dispatch queue, the Load Balancer checks the service level of the associated workflow so that it dispatches high priority tasks before low priority tasks.
Service Manager
A service that manages all domain operations. It runs on all nodes in the domain to support the application services and the domain. When you start Informatica Services, you start the Service Manager. If the Service Manager is not running, the node is not available.
service mapping
A mapping that processes web service requests. A service mapping can contain source or target definitions imported from a Web Services Description Language (WSDL) file containing a web service operation. It can also contain flat file or XML source or target definitions.
service process
A run-time representation of a service running on a node.
service workflow
A workflow that contains exactly one web service input message source and at most one type of web service output message target. Configure service properties in the service workflow.
session
A task in a workflow that tells the Integration Service how to move data from sources to targets. A session corresponds to one mapping.
session recovery
The process that the Integration Service uses to complete failed sessions. When the Integration Service runs a recovery session that writes to a relational target in normal mode, it resumes writing to the target database table at the point at which the previous session failed. For other target types, the Integration Service performs the entire writer run again.
source pipeline
A source qualifier and all of the transformations and target instances that receive data from that source qualifier.
Sources and Targets privilege group
A group of privileges that define user actions on the following repository objects: cubes, dimensions, source definitions, and target definitions.
stage
See pipeline stage.
state of operation
Workflow and session information the Integration Service stores in a shared location for recovery. The state of operation includes task status, workflow variable values, and processing checkpoints.
static deployment group
A deployment group that you must manually add objects to. See also deployment group.
system-defined role
A role that you cannot edit or delete. The Administrator role is a system-defined role. See also role.

T

target connection group
A group of targets that the Integration Service uses to determine commits and loading. When the Integration Service performs a database transaction such as a commit, it performs the transaction for all targets in a target connection group.
target load order groups
A collection of source qualifiers, transformations, and targets linked together in a mapping.
task release
A process that the Workflow Monitor uses to remove older tasks from memory so you can monitor an Integration Service in online mode without exceeding memory limits.
task-specific workflow variables
Predefined workflow variables that vary according to task type. They appear under the task name in the Workflow Manager Expression Editor. $Dec_TaskStatus.PrevTaskStatus and $s_MySession.ErrorMsg are examples of task-specific workflow variables.
terminating condition
A condition that determines when the Integration Service stops reading messages from a real-time source and ends the session.
transaction
A set of rows bound by commit or rollback rows.
transaction boundary
A row, such as a commit or rollback row, that defines the rows in a transaction. Transaction boundaries originate from transaction control points.
transaction control
The ability to define commit and rollback points through an expression in the Transaction Control transformation and session properties.
transaction control point
A transformation that defines or redefines the transaction boundary by dropping any incoming transaction boundary and generating new transaction boundaries.
Transaction Control transformation
A transformation used to define conditions to commit and rollback transactions from relational, XML, and dynamic WebSphere MQ targets.
transaction control unit
A group of targets connected to an active source that generates commits or an effective transaction generator. A transaction control unit may contain multiple target connection groups.
transaction generator
A transformation that generates both commit and rollback rows. Transaction generators drop incoming transaction boundaries and generate new transaction boundaries downstream. Transaction generators are Transaction Control transformations and Custom transformations configured to generate commits.
type view
A view created in a web service source or target definition for a complex type element in the input or output message. The type view has an n:1 relationship with the envelope view.

U

user-defined commit
A commit strategy that the Integration Service uses to commit and roll back transactions defined in a Transaction Control transformation or a Custom transformation configured to generate commits.
user-defined parameters and variables
Parameters and variables whose values can be set by the user. You normally define user-defined parameters and variables in a parameter file. However, some user-defined variables, such as user-defined workflow variables, can have initial values that you specify or are determined by their datatypes. You must specify values for user-defined session parameters.
user-defined property
A user-defined property is metadata that you define, such as PowerCenter metadata extensions. You can create user-defined properties in business intelligence, data modeling, or OLAP tools, such as IBM DB2 Cube Views or PowerCenter, and exchange the metadata between tools using the Metadata Export Wizard and Metadata Import Wizard.
user-defined resource
A PowerCenter resource that you define, such as a file directory or a shared library you want to make available to services.

V

version
An incremental change of an object saved in the repository. The repository uses version numbers to differentiate versions.
versioned object
An object for which you can create multiple versions in a repository. The repository must be enabled for version control.
view root
The element in an XML view that is a parent to all the other elements in the view.
view row
The column in an XML view that triggers the Integration Service to generate a row of data for the view in a session.

W

Web Services Hub
An application service in the PowerCenter domain that uses the SOAP standard to receive requests and send responses to web service clients. It acts as a web service gateway to provide client applications access to PowerCenter functionality using web service standards and protocols.
Web Services Provider
The provider entity of the PowerCenter web service framework that makes PowerCenter workflows and data integration functionality accessible to external clients through web services.
worker node
Any node not configured to serve as a gateway. A worker node can run application services but cannot serve as a master gateway node.
workflow
A set of instructions that tells the Integration Service how to run tasks such as sessions, email notifications, and shell commands.
workflow instance
The representation of a workflow. You can choose to run one or more workflow instances associated with a concurrent workflow. When you run a concurrent workflow, you can run one instance multiple times concurrently, or you can run multiple instances concurrently.
workflow instance name
The name of a workflow instance for a concurrent workflow. You can create workflow instance names, or you can use the default instance name.
workflow run ID
A number that identifies a workflow instance that has run.
Workflow Wizard
A wizard that creates a workflow with a Start task and sequential Session tasks based on the mappings you choose.
worklet
A worklet is an object representing a set of tasks created to reuse a set of workflow logic in multiple workflows.

X

XML group
A set of ports in an XML definition that defines a row of incoming or outgoing data. An XML view becomes a group in a PowerCenter definition.
XML view
A portion of any arbitrary hierarchy in an XML definition. An XML view contains columns that are references to the elements and attributes in the hierarchy. In a web service definition, the XML view represents the elements and attributes defined in the input and output messages.  

Wednesday, June 29, 2011

Workflow Variables

Workflow Variables Overview:

You can create and use variables in a workflow to reference values and record information. For example, use a variable in a Decision task to determine whether the previous task ran properly; if it did, you can run the next task, and if not, you can stop the workflow.

Use the following types of workflow variables:

  • Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.
  • User-defined workflow variables. You create user-defined workflow variables when you create a workflow. Use workflow variables when you configure the following types of tasks:
  • Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
  • Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
  • Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.
  • Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.

Use the following keywords to write expressions for user-defined and predefined workflow variables:

  • AND
  • OR
  • NOT
  • TRUE
  • FALSE
  • NULL
  • SYSDATE
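
For example, a link condition that combines these keywords with predefined task variables might look like this ($s_LoadOrders and $Dec_CheckCount are hypothetical task names):

-- Follow this link only when the session succeeded and the decision was true
$s_LoadOrders.Status = SUCCEEDED AND $Dec_CheckCount.Condition = TRUE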

Predefined Workflow Variables:

Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:

  • Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
  • Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service Name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.

Task-Specific Variables:

Condition
Task types: Decision. Datatype: Integer.
Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null.
Sample syntax:
$Dec_TaskStatus.Condition =

EndTime
Task types: All tasks. Datatype: Date/Time.
Date and time the associated task ended. Precision is to the second.
Sample syntax:
$s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')

ErrorCode
Task types: All tasks. Datatype: Integer.
Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes.
Sample syntax:
$s_item_summary.ErrorCode = 24013
Note: You might use this variable when a task consistently fails with this final error message.

ErrorMsg
Task types: All tasks. Datatype: Nstring.
Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax:
$s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'
Note: You might use this variable when a task consistently fails with this final error message.

FirstErrorCode
Task types: Session. Datatype: Integer.
Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes.
Sample syntax:
$s_item_summary.FirstErrorCode = 7086

FirstErrorMsg
Task types: Session. Datatype: Nstring.
First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax:
$s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'

PrevTaskStatus
Task types: All tasks. Datatype: Integer.
Status of the previous task in the workflow that the Integration Service ran. Statuses include:
- ABORTED
- FAILED
- STOPPED
- SUCCEEDED
Use these keywords when writing expressions to evaluate the status of the previous task.
Sample syntax:
$Dec_TaskStatus.PrevTaskStatus = FAILED

SrcFailedRows
Task types: Session. Datatype: Integer.
Total number of rows the Integration Service failed to read from the source.
Sample syntax:
$s_dist_loc.SrcFailedRows = 0

SrcSuccessRows
Task types: Session. Datatype: Integer.
Total number of rows successfully read from the sources.
Sample syntax:
$s_dist_loc.SrcSuccessRows > 2500

StartTime
Task types: All tasks. Datatype: Date/Time.
Date and time the associated task started. Precision is to the second.
Sample syntax:
$s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')

Status
Task types: All tasks. Datatype: Integer.
Status of the previous task in the workflow. Statuses include:
- ABORTED
- DISABLED
- FAILED
- NOTSTARTED
- STARTED
- STOPPED
- SUCCEEDED
Use these keywords when writing expressions to evaluate the status of the current task.
Sample syntax:
$s_dist_loc.Status = SUCCEEDED

TgtFailedRows
Task types: Session. Datatype: Integer.
Total number of rows the Integration Service failed to write to the target.
Sample syntax:
$s_dist_loc.TgtFailedRows = 0

TgtSuccessRows
Task types: Session. Datatype: Integer.
Total number of rows successfully written to the target.
Sample syntax:
$s_dist_loc.TgtSuccessRows > 0

TotalTransErrors
Task types: Session. Datatype: Integer.
Total number of transformation errors.
Sample syntax:
$s_dist_loc.TotalTransErrors = 5

User-Defined Workflow Variables:

You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables.

Use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters.

Use a user-defined variable to determine when to run the session that updates the orders database at headquarters.

To configure user-defined workflow variables, complete the following steps:

1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.

2. Add a Start task and both sessions to the workflow.

3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this.

4. Create an Assignment task to increment the $$WorkflowCount variable by one.

5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false. When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
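
A sketch of the two expressions this design needs, using the variable from the example:

Decision task condition, true on every tenth run:
MOD($$WorkflowCount, 10) = 0

Assignment task expression for $$WorkflowCount:
$$WorkflowCount + 1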

Creating User-Defined Workflow Variables:

You can create workflow variables for a workflow in the workflow properties.

To create a workflow variable:

1. In the Workflow Designer, create a new workflow or edit an existing one.

2. Select the Variables tab.

3. Click Add.

4. Enter the information in the following table and click OK:

Name

Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.

Data type

Data type of the variable. You can select from the following data types:
- Date/Time
- Double
- Integer
- Nstring

Persistent

Whether the variable is persistent. Enable this option if you want the value of the variable
retained from one execution of the workflow to the next.

Default Value

Default value of the variable. The Integration Service uses this value for the variable during
sessions if you do not set a value for the variable in the parameter file and there is no value
stored in the repository.
Variables of type Date/Time can have the following formats:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.MS
- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and
space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values
for year or the “HH12” format for hour.
Variables of type Nstring can have a maximum length of 600 characters.

Is Null

Whether the default value of the variable is null. If the default value is null, enable this option.

Description

Description associated with the variable.

5. To validate the default value of the new workflow variable, click the Validate button.

6. Click Apply to save the new workflow variable.

7. Click OK.
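
You can also set the variable's value at run time in a parameter file, which takes precedence over the value stored in the repository. A minimal sketch, assuming a hypothetical folder MyFolder and workflow wf_LoadOrders:

[MyFolder.WF:wf_LoadOrders]
$$WorkflowCount=0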


Grid Processing

Grid Processing Overview:

When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.
You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid.

The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:

  • Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
  • Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.

Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.

Running Workflows on a Grid:

When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows.

The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.

For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.

Running Sessions on a Grid:

When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes.

When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:

  • Node availability. The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
  • Resource availability. If the Integration Service is configured to check resources, it identifies nodes that have resources required by mapping objects in the session.
  • Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.

You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.

Grid Connectivity and Recovery

When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:

  • High availability option. When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.
  • Recovery strategy. You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
  • Shutdown mode. When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
  • Running mode. If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.
  • Operating mode. If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.

Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.