Wednesday, September 7, 2011

How to concatenate values from multiple rows in Informatica

In Informatica, we often face scenarios where we have to concatenate values from two or more rows into a single row.

Consider a source that contains the following records:
ID Name
1 A
2 R
3 B
1 D
2 E
2 O

Suppose we want output like this:
ID Name
1 D,A
2 O,E,R
3 B

The above scenario can be implemented in Informatica as follows:



Steps:
1) Create a source definition that contains the columns ID and NAME.
2) After the Source Qualifier, place a Sorter transformation. The sort key is ID and the sort order is ascending.



3) After the Sorter, place an Expression transformation. In the Expression, create the following ports, in this order, in addition to the ports coming from the Sorter:
v_NAME (variable port) = IIF(ID = v_ID, NAME || ',' || v_NAME, NAME)
v_ID (variable port) = ID
out_NAME (output port) = v_NAME



4) After the Expression, place an Aggregator transformation and group by ID.



5) Connect the ID and out_NAME ports to the target.
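
Why this works: the Integration Service evaluates ports top to bottom, so when v_NAME is computed for a row, v_ID still holds the previous row's ID. Variable ports start from a default initial value (0 for numeric ports), so the first comparison fails and v_NAME starts fresh. A walk-through against the sample data, assuming the Sorter emits the rows within each ID in the order shown above:

Row (ID, NAME)   v_ID before the row   v_NAME after evaluation
(1, A)           0 (initial value)     A
(1, D)           1                     D,A
(2, R)           1                     R
(2, E)           2                     E,R
(2, O)           2                     O,E,R
(3, B)           2                     B

Because the input is sorted on ID, the last row of each group carries the complete list, and an Aggregator that groups by ID with no aggregate function returns exactly that last row for each group.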

Initial loading and delta loading

Initial loading and delta loading are basic ETL concepts used by every tool in the market.

Initial loading: the load that you run for the first time, into empty target tables.


Delta loading: as the volume in the target table grows over time, you load only the changed records from the source to the target table rather than all the data from the source. This type of load is called delta loading.

Options like "Delete data before loading" and "Auto correct load" should not be confused with the types of loading mentioned above.


"Delete data before loading" can be used when you want to ignore all the existing data in the target table and load from scratch from source table.

"Auto correct load" is the option used when you want no duplicate records to be loaded into the target.

Friday, September 2, 2011

PowerCenter Glossary of Terms


A

active source
An active source is an active transformation the Integration Service uses to generate rows.
adaptive dispatch mode
A dispatch mode in which the Load Balancer dispatches tasks to the node with the most available CPUs.
application services
A group of services that represent PowerCenter server-based functionality. You configure each application service based on your environment requirements. Application services include the Integration Service, Repository Service, SAP BW Service, Metadata Manager Service, Reporting Service, and Web Services Hub.
attachment view
View created in a web service source or target definition for a WSDL that contains a MIME attachment. The attachment view has an n:1 relationship with the envelope view.
available resource
Any PowerCenter resource that is configured to be available to a node.

B

backup node
Any node that is configured to run a service process, but is not configured as a primary node.
blocking
The suspension of the data flow into an input group of a multiple input group transformation.
blurring
A masking rule that limits the range of numeric output values to a fixed or percent variance from the value of the source data. The Data Masking transformation returns numeric data that is close to the value of the source data.
bounds
A masking rule that limits the range of numeric output to a range of values. The Data Masking transformation returns numeric data between the minimum and maximum bounds.
buffer block
A block of memory that the Integration Service uses to move rows of data from the source to the target. The number of rows in a block depends on the size of the row data, the configured buffer block size, and the configured buffer memory size.
buffer block size
The size of the blocks of data used to move rows of data from the source to the target. You specify the buffer block size in bytes or as percentage of total memory. Configure this setting in the session properties.
buffer memory
Buffer memory allocated to a session. The Integration Service uses buffer memory to move data from sources to targets. The Integration Service divides buffer memory into buffer blocks.
buffer memory size
Total buffer memory allocated to a session specified in bytes or as a percentage of total memory.
built-in parameters and variables
Predefined parameters and variables that do not vary according to task type. They return system or run-time information such as session start time or workflow name. Built-in variables appear under the Built-in node in the Designer or Workflow Manager Expression Editor.

C

cache partitioning
A caching process that the Integration Service uses to create a separate cache for each partition. Each partition works with only the rows needed by that partition. The Integration Service can partition caches for the Aggregator, Joiner, Lookup, and Rank transformations.
child dependency
A dependent relationship between two objects in which the child object is used by the parent object.
child object
A dependent object used by another object, the parent object.
cold start
A start mode that restarts a task or workflow without recovery.
commit number
A number in a target recovery table that indicates the number of messages that the Integration Service loaded to the target. During recovery, the Integration Service uses the commit number to determine if it wrote messages to all targets.
commit source
An active source that generates commits for a target in a source-based commit session.
compatible version
An earlier version of a client application or a local repository that you can use to access the latest version repository.
composite object
An object that contains a parent object and its child objects. For example, a mapping parent object contains child objects including sources, targets, and transformations.
concurrent workflow
A workflow configured to run multiple instances at the same time. When the Integration Service runs a concurrent workflow, you can view the instance in the Workflow Monitor by the workflow name, instance name, or run ID.
coupled group
An input group and output group that share ports in a transformation.
CPU profile
An index that ranks the computing throughput of each CPU and bus architecture in a grid. In adaptive dispatch mode, nodes with higher CPU profiles get precedence for dispatch.
custom role
A role that you can create, edit, and delete. See also role.
Custom transformation
A transformation that you bind to a procedure developed outside of the Designer interface to extend PowerCenter functionality. You can create Custom transformations with multiple input and output groups.
Custom transformation procedure
A C procedure you create using the functions provided with PowerCenter that defines the transformation logic of a Custom transformation.
Custom XML view
An XML view that you define instead of allowing the XML Wizard to choose the default root and columns in the view. You can create custom views using the XML Editor and the XML Wizard in the Designer.

D

data masking
A process that creates realistic test data from production source data. The format of the original columns and relationships between the rows are preserved in the masked data.
Data Masking transformation
A passive transformation that replaces sensitive columns in source data with realistic test data. Each masked column retains the datatype and the format of the original data.
default permissions
The permissions that each user and group receives when added to the user list of a folder or global object. Default permissions are controlled by the permissions of the default group, “Other.”
denormalized view
An XML view that contains more than one multiple-occurring element.
dependent object
An object used by another object. A dependent object is a child object.
deployment group
A global object that contains references to other objects from multiple folders across the repository. You can copy the objects referenced in a deployment group to multiple target folders in another repository. When you copy objects in a deployment group, the target repository creates new versions of the objects. You can create a static or dynamic deployment group.
Design Objects privilege group
A group of privileges that define user actions on the following repository objects: business components, mapping parameters and variables, mappings, mapplets, transformations, and user-defined functions.
deterministic output
Source or transformation output that does not change between session runs when the input data is consistent between runs.
dispatch mode
A mode used by the Load Balancer to dispatch tasks to nodes in a grid.
domain
See PowerCenter domain. See also security domain.
DTM
The Data Transformation Manager process that reads, writes, and transforms data.
DTM buffer memory
See buffer memory.
dynamic deployment group
A deployment group that is associated with an object query. When you copy a dynamic deployment group, the source repository runs the query and then copies the results to the target repository.
dynamic partitioning
The ability to scale the number of partitions without manually adding partitions in the session properties. Based on the session configuration, the Integration Service determines the number of partitions when it runs the session.

E

effective Transaction Control transformation
A Transaction Control transformation that does not have a downstream transformation that drops transaction boundaries.
effective transaction generator
A transaction generator that does not have a downstream transformation that drops transaction boundaries.
element view
A view created in a web service source or target definition for a multiple occurring element in the input or output message. The element view has an n:1 relationship with the envelope view.
envelope view
A main view in a web service source or target definition that contains a primary key and the columns for the input or output message.
exclusive mode
An operating mode for the Repository Service. When you run the Repository Service in exclusive mode, you allow only one user to access the repository to perform administrative tasks that require a single user to access the repository and update the configuration.

F

failover
The migration of a service or task to another node when the node running the service process becomes unavailable.
fault view
A view created in a web service target definition if a fault message is defined for the operation. The fault view has an n:1 relationship with the envelope view.
flush latency
A session condition that determines how often the Integration Service flushes data from the source.

G

gateway
See master gateway node.
gateway node
Receives service requests from clients and routes them to the appropriate service and node. A gateway node can run application services. In the Administration Console, you can configure any node to serve as a gateway for a PowerCenter domain. A domain can have multiple gateway nodes. See also master gateway node.
global object
An object that exists at repository level and contains properties you can apply to multiple objects in the repository. Object queries, deployment groups, labels, and connection objects are global objects.
grid object
An alias assigned to a group of nodes to run sessions and workflows.
group
A set of ports that defines a row of incoming or outgoing data. A group is analogous to a table in a relational source or target definition.

H

high availability
A PowerCenter option that eliminates a single point of failure in a domain and provides minimal service interruption in the event of failure.
High Group History List
A file that the United States Social Security Administration provides that lists the Social Security Numbers it issues each month. The Data Masking transformation accesses this file when you mask Social Security Numbers.

I

impacted object
An object that has been marked as impacted by the PowerCenter Client. The PowerCenter Client marks objects as impacted when a child object changes in such a way that the parent object may not be able to run.
incompatible object
An object that a compatible client application cannot access in the latest version repository.
ineffective Transaction Control transformation
A Transaction Control transformation that has a downstream transformation that drops transaction boundaries, such as an Aggregator transformation with Transaction transformation scope.
ineffective transaction generator
A transaction generator that has a downstream transformation that drops transaction boundaries, such as an Aggregator transformation with Transaction transformation scope.
Informatica Services
The name of the service or daemon that runs on each node. When you start Informatica Services on a node, you start the Service Manager on that node.
input group
See group.
Integration Service
An application service that runs data integration workflows and loads metadata into the Metadata Manager warehouse.
Integration Service process
A process that accepts requests from the PowerCenter Client and from pmcmd. The Integration Service process manages workflow scheduling, locks and reads workflows, and starts DTM processes.
invalid object
An object that has been marked as invalid by the PowerCenter Client. When you validate or save a repository object, the PowerCenter Client verifies that the data can flow from all sources in a target load order group to the targets without the Integration Service blocking all sources.

K

key masking
A type of data masking that produces repeatable results for the same source data and masking rules. The Data Masking transformation requires a seed value for the port when you configure it for key masking.

L

label
A user-defined object that you can associate with any versioned object or group of versioned objects in the repository.
latency
A period of time from when source data changes on a source to when a session writes the data to a target.
Lightweight Directory Access Protocol (LDAP) authentication
One of the authentication methods used to authenticate users logging in to PowerCenter applications. In LDAP authentication, the Service Manager imports groups and user accounts from the LDAP directory service into an LDAP security domain in the domain configuration database. The Service Manager stores the group and user account information in the domain configuration database but passes authentication to the LDAP server.
limit on resilience timeout
An amount of time a service waits for a client to connect or reconnect to the service. The limit can override the client resilience timeout configured for a client. See resilience timeout.
linked domain
A domain that you link to when you need to access the repository metadata in that domain.
Load Balancer
A component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks across nodes in a grid.
local domain
A PowerCenter domain that you create when you install PowerCenter. This is the domain you access when you log in to the Administration Console.
Log Agent
A Service Manager function that provides accumulated log events from sessions and workflows. You can view session and workflow logs in the Workflow Monitor. The Log Agent runs on the nodes where the Integration Service process runs. See also Log Manager.
Log Manager
A Service Manager function that provides accumulated log events from each service in the domain. You can view logs in the Administration Console. The Log Manager runs on the master gateway node. See also Log Agent.

M

mapping
A set of source and target definitions linked by transformation objects that define the rules for data transformation.
Mapping Architect for Visio
A PowerCenter Client tool that you use to create mapping templates that you can import into the PowerCenter Designer.
masking algorithm
Sets of characters and rotors of numbers that provide the logic to mask data. A masking algorithm provides random results that ensure that the original data cannot be disclosed from the masked data.
master gateway node
The entry point to a PowerCenter domain. When you configure multiple gateway nodes, one node acts as the master gateway node. If the master gateway node becomes unavailable, the Service Manager on other gateway nodes elects another master gateway node. See also gateway node.
master service process
An Integration Service process that runs the workflow and workflow tasks. When you run workflows and sessions on a grid, the Integration Service designates one Integration Service process as the master service process. The master service process can distribute the Session, Command, and predefined Event-Wait tasks to worker service processes. The master service process monitors all Integration Service processes.
metadata explosion
The expansion of referenced or multiple-occurring elements in an XML definition. The relationship model you choose for an XML definition determines if metadata is limited or exploded to multiple areas within the definition. Limited data explosion reduces data redundancy.
Metadata Manager Service
An application service that runs the Metadata Manager application in a PowerCenter domain. It manages access to metadata in the Metadata Manager warehouse.
metric-based dispatch mode
A dispatch mode in which the Load Balancer checks current computing load against the resource provision thresholds and then dispatches tasks in a round-robin fashion to nodes where the thresholds are not exceeded.

N

native authentication
One of the authentication methods used to authenticate users logging in to PowerCenter applications. In native authentication, you create and manage users and groups in the Administration Console. The Service Manager stores group and user account information and performs authentication in the domain configuration database.
node
A logical representation of a machine or a blade. Each node runs a Service Manager that performs domain operations on that node.
normal mode
An operating mode for an Integration Service or Repository Service. Run the Integration Service in normal mode during daily Integration Service operations. Run the Repository Service in normal mode to allow multiple users to access the repository and update content.
normalized view
An XML view that contains no more than one multiple-occurring element. Normalized XML views reduce data redundancy.

O

object query
A user-defined object you use to search for versioned objects that meet specific conditions.
one-way mapping
A mapping that uses a web service client for the source. The Integration Service loads data to a target, often triggered by a real-time event through a web service request.
open transaction
A set of rows that are not bound by commit or rollback rows.
operating mode
The mode for an Integration Service or Repository Service. An Integration Service runs in normal or safe mode. A Repository Service runs in normal or exclusive mode.
operating system flush
A process that the Integration Service uses to flush messages that are in the operating system buffer to the recovery file. The operating system flush ensures that all messages are stored in the recovery file even if the Integration Service is not able to write the recovery information to the recovery file during message processing.
operating system profile
A level of security that the Integration Service uses to run workflows. The operating system profile contains the operating system user name, service process variables, and environment variables. The Integration Service runs the workflow with the system permissions of the operating system user and settings defined in the operating system profile.
output group
See group.

P

parent dependency
A dependent relationship between two objects in which the parent object uses the child object.
parent object
An object that uses a dependent object, the child object.
permission
The level of access a user has to an object. Even if a user has the privilege to perform certain actions, the user may also require permission to perform the action on a particular object.
pipeline
See source pipeline.
pipeline branch
A segment of a pipeline between any two mapping objects.
pipeline stage
The section of a pipeline executed between any two partition points.
pmdtm process
The Data Transformation Manager process. See DTM.
pmserver process
The Integration Service process. See Integration Service process.
port dependency
The relationship between an output or input/output port and one or more input or input/output ports.
PowerCenter application
A component that you log in to so that you can access underlying PowerCenter functionality. PowerCenter applications include the Administration Console, PowerCenter Client, Metadata Manager, and Data Analyzer.
PowerCenter domain
A collection of nodes and services that define the PowerCenter environment. You group nodes and services in a domain based on administration ownership.
PowerCenter resource
Any resource that may be required to run a task. PowerCenter has predefined resources and user-defined resources.
PowerCenter services
The services available in the PowerCenter domain. These consist of the Service Manager and the application services. The application services include the Repository Service, Integration Service, Reference Table Manager Service, Reporting Service, Metadata Manager Service, Web Services Hub, and SAP BW Service.
predefined parameters and variables
Parameters and variables whose values are set by the Integration Service. You cannot define values for predefined parameter and variable values in a parameter file. Predefined parameters and variables include built-in parameters and variables, email variables, and task-specific workflow variables.
predefined resource
An available built-in resource. This can include the operating system or any resource installed by the PowerCenter installation, such as a plug-in or a connection object.
primary node
A node that is configured as the default node to run a service process. By default, the Service Manager starts the service process on the primary node and uses a backup node if the primary node fails.
privilege
An action that a user can perform in PowerCenter applications. You assign privileges to users and groups for the PowerCenter domain, the Repository Service, the Metadata Manager Service, and the Reporting Service. See also permission.
privilege group
An organization of privileges that defines common user actions.
pushdown group
A group of transformations containing transformation logic that is pushed to the database during a session configured for pushdown optimization. The Integration Service creates one or more SQL statements based on the number of partitions in the pipeline.
pushdown optimization
A session option that allows you to push transformation logic to the source or target database.

R

random masking
A type of masking that produces random, non-repeatable results.
real-time data
Data that originates from a real-time source. Real-time data includes messages and message queues, web services messages, and change data from a PowerExchange change data capture source.
real-time processing
On-demand processing of data from operational data sources, databases, and data warehouses. Real-time processing reads, processes, and writes data to targets continuously.
real-time session
A session in which the Integration Service generates a real-time flush based on the flush latency configuration and all transformations propagate the flush to the targets.
real-time source
The origin of real-time data. Real-time sources include JMS, WebSphere MQ, TIBCO, webMethods, MSMQ, SAP, and web services.
recovery
The automatic or manual completion of tasks after a service is interrupted. Automatic recovery is available for Integration Service and Repository Service tasks. You can also manually recover Integration Service workflows.
recovery ignore list
A list of message IDs that the Integration Service uses to prevent data duplication for JMS and WebSphere MQ sessions. The Integration Service writes recovery information to the list if there is a chance that the source did not receive an acknowledgement.
reference file
A Microsoft Excel or flat file that contains reference data. Use the Reference Table Manager to import data from reference files into reference tables.
reference table
A table that contains reference data such as default, valid, and cross-reference values. You can create, edit, import, and export reference data with the Reference Table Manager.
Reference Table Manager repository
A relational database that stores reference table metadata and information about users and user connections.
Reference Table Manager Service
An application service that runs the Reference Table Manager application in the PowerCenter domain.
Reference Table Manager
A web application used to manage reference tables. You can also manage user connections, and view user information and audit trail logs.
reference table staging area
A relational database that stores the reference tables. All reference tables that you create or import using the Reference Table Manager are stored within the staging area. You can create and manage multiple staging areas to restrict access to the reference tables.
repeatable data
A source or transformation output that is in the same order between session runs when the order of the input data is consistent.
Reporting Service
An application service that runs the Data Analyzer application in a PowerCenter domain. When you log in to Data Analyzer, you can create and run reports on data in a relational database or run the following PowerCenter reports: PowerCenter Repository Reports, Data Analyzer Data Profiling Reports, or Metadata Manager Reports.
repository client
Any PowerCenter component that connects to the repository. This includes the PowerCenter Client, Integration Service, pmcmd, pmrep, and MX SDK.
repository domain
A group of linked repositories consisting of one global repository and one or more local repositories.
Repository Service
An application service that manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.
request-response mapping
A mapping that uses a web service source and target. When you create a request-response mapping, you use source and target definitions imported from the same WSDL file.
required resource
A PowerCenter resource that is required to run a task. A task fails if it cannot find any node where the required resource is available.
reserved words file
A file named reswords.txt that you create and maintain in the Integration Service installation directory. The Integration Service searches this file and places quotes around reserved words when it executes SQL against source, target, and lookup databases.
resilience
The ability for PowerCenter services to tolerate transient network failures until either the resilience timeout expires or the external system failure is fixed.
resilience timeout
The amount of time a client attempts to connect or reconnect to a service. A limit on resilience timeout can override the resilience timeout. See also limit on resilience timeout.
resource provision thresholds
Computing thresholds defined for a node that determine whether the Load Balancer can dispatch tasks to the node. The Load Balancer checks different thresholds depending on the dispatch mode.
role
A collection of privileges. If groups of users perform similar actions, you can create and assign roles to more efficiently grant privileges to users. You assign roles to users and groups for the PowerCenter domain, the Repository Service, the Metadata Manager Service, and the Reporting Service.
round-robin dispatch mode
A dispatch mode in which the Load Balancer dispatches tasks to available nodes in a round-robin fashion up to the Maximum Processes resource provision threshold.
Run-time Objects privilege group
A group of privileges that define user actions on the following repository objects: session configuration objects, tasks, workflows, and worklets.

S

safe mode
An operating mode for the Integration Service. When you run the Integration Service in safe mode, only users with privilege to administer the Integration Service can run and get information about sessions and workflows. A subset of the high availability features are available in safe mode.
SAP BW Service
An application service that listens for RFC requests from SAP NetWeaver BI and initiates workflows to extract from or load to SAP NetWeaver BI.
security domain
A collection of user accounts and groups in a PowerCenter domain. Native authentication uses the Native security domain which contains the users and groups created and managed in the Administration Console. LDAP authentication uses LDAP security domains which contain users and groups imported from the LDAP directory service. You can define multiple security domains for LDAP authentication. See also native authentication and Lightweight Directory Access Protocol (LDAP) authentication.
seed
A random number required by key masking to generate non-colliding repeatable masked output.
service level
A domain property that establishes priority among tasks that are waiting to be dispatched. When multiple tasks are waiting in the dispatch queue, the Load Balancer checks the service level of the associated workflow so that it dispatches high priority tasks before low priority tasks.
Service Manager
A service that manages all domain operations. It runs on all nodes in the domain to support the application services and the domain. When you start Informatica Services, you start the Service Manager. If the Service Manager is not running, the node is not available.
service mapping
A mapping that processes web service requests. A service mapping can contain source or target definitions imported from a Web Services Description Language (WSDL) file containing a web service operation. It can also contain flat file or XML source or target definitions.
service process
A run-time representation of a service running on a node.
service workflow
A workflow that contains exactly one web service input message source and at most one type of web service output message target. Configure service properties in the service workflow.
session
A task in a workflow that tells the Integration Service how to move data from sources to targets. A session corresponds to one mapping.
session recovery
The process that the Integration Service uses to complete failed sessions. When the Integration Service runs a recovery session that writes to a relational target in normal mode, it resumes writing to the target database table at the point at which the previous session failed. For other target types, the Integration Service performs the entire writer run again.
source pipeline
A source qualifier and all of the transformations and target instances that receive data from that source qualifier.
Sources and Targets privilege group
A group of privileges that define user actions on the following repository objects: cubes, dimensions, source definitions, and target definitions.
stage
See pipeline stage.
state of operation
Workflow and session information the Integration Service stores in a shared location for recovery. The state of operation includes task status, workflow variable values, and processing checkpoints.
static deployment group
A deployment group that you must manually add objects to. See also deployment group.
system-defined role
A role that you cannot edit or delete. The Administrator role is a system-defined role. See also role.

T

target connection group
A group of targets that the Integration Service uses to determine commits and loading. When the Integration Service performs a database transaction such as a commit, it performs the transaction for all targets in a target connection group.
target load order groups
A collection of source qualifiers, transformations, and targets linked together in a mapping.
task release
A process that the Workflow Monitor uses to remove older tasks from memory so you can monitor an Integration Service in online mode without exceeding memory limits.
task-specific workflow variables
Predefined workflow variables that vary according to task type. They appear under the task name in the Workflow Manager Expression Editor. $Dec_TaskStatus.PrevTaskStatus and $s_MySession.ErrorMsg are examples of task-specific workflow variables.
terminating condition
A condition that determines when the Integration Service stops reading messages from a real-time source and ends the session.
transaction
A set of rows bound by commit or rollback rows.
transaction boundary
A row, such as a commit or rollback row, that defines the rows in a transaction. Transaction boundaries originate from transaction control points.
transaction control
The ability to define commit and rollback points through an expression in the Transaction Control transformation and session properties.
transaction control point
A transformation that defines or redefines the transaction boundary by dropping any incoming transaction boundary and generating new transaction boundaries.
Transaction Control transformation
A transformation used to define conditions to commit and rollback transactions from relational, XML, and dynamic WebSphere MQ targets.
transaction control unit
A group of targets connected to an active source that generates commits or an effective transaction generator. A transaction control unit may contain multiple target connection groups.
transaction generator
A transformation that generates both commit and rollback rows. Transaction generators drop incoming transaction boundaries and generate new transaction boundaries downstream. Transaction generators are Transaction Control transformations and Custom transformations configured to generate commits.
type view
A view created in a web service source or target definition for a complex type element in the input or output message. The type view has an n:1 relationship with the envelope view.

U

user-defined commit
A commit strategy that the Integration Service uses to commit and roll back transactions defined in a Transaction Control transformation or a Custom transformation configured to generate commits.
user-defined parameters and variables
Parameters and variables whose values can be set by the user. You normally define user-defined parameters and variables in a parameter file. However, some user-defined variables, such as user-defined workflow variables, can have initial values that you specify or are determined by their datatypes. You must specify values for user-defined session parameters.
user-defined property
A user-defined property is metadata that you define, such as PowerCenter metadata extensions. You can create user-defined properties in business intelligence, data modeling, or OLAP tools, such as IBM DB2 Cube Views or PowerCenter, and exchange the metadata between tools using the Metadata Export Wizard and Metadata Import Wizard.
user-defined resource
A PowerCenter resource that you define, such as a file directory or a shared library you want to make available to services.

V

version
An incremental change of an object saved in the repository. The repository uses version numbers to differentiate versions.
versioned object
An object for which you can create multiple versions in a repository. The repository must be enabled for version control.
view root
The element in an XML view that is a parent to all the other elements in the view.
view row
The column in an XML view that triggers the Integration Service to generate a row of data for the view in a session.

W

Web Services Hub
An application service in the PowerCenter domain that uses the SOAP standard to receive requests and send responses to web service clients. It acts as a web service gateway to provide client applications access to PowerCenter functionality using web service standards and protocols.
Web Services Provider
The provider entity of the PowerCenter web service framework that makes PowerCenter workflows and data integration functionality accessible to external clients through web services.
worker node
Any node not configured to serve as a gateway. A worker node can run application services but cannot serve as a master gateway node.
workflow
A set of instructions that tells the Integration Service how to run tasks such as sessions, email notifications, and shell commands.
workflow instance
The representation of a workflow. You can choose to run one or more workflow instances associated with a concurrent workflow. When you run a concurrent workflow, you can run one instance multiple times concurrently, or you can run multiple instances concurrently.
workflow instance name
The name of a workflow instance for a concurrent workflow. You can create workflow instance names, or you can use the default instance name.
workflow run ID
A number that identifies a workflow instance that has run.
Workflow Wizard
A wizard that creates a workflow with a Start task and sequential Session tasks based on the mappings you choose.
worklet
A worklet is an object representing a set of tasks created to reuse a set of workflow logic in multiple workflows.

X

XML group
A set of ports in an XML definition that defines a row of incoming or outgoing data. An XML view becomes a group in a PowerCenter definition.
XML view
A portion of any arbitrary hierarchy in an XML definition. An XML view contains columns that are references to the elements and attributes in the hierarchy. In a web service definition, the XML view represents the elements and attributes defined in the input and output messages.  

Wednesday, June 29, 2011

Workflow Variables

Workflow Variables Overview:

You can create and use variables in a workflow to reference values and record information. For example, use a variable in a Decision task to determine whether the previous task ran properly; if it did, you can run the next task, and if not, you can stop the workflow.

Use the following types of workflow variables:

  • Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.
  • User-defined workflow variables. You create user-defined workflow variables when you create a workflow. Use workflow variables when you configure the following types of tasks:
  • Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
  • Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
  • Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.
  • Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.

Use the following keywords to write expressions for user-defined and predefined workflow variables:

  • AND
  • OR
  • NOT
  • TRUE
  • FALSE
  • NULL
  • SYSDATE
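
For example, a link condition that combines these keywords with predefined task variables might look like this ($s_LoadOrders and $Dec_CheckCount are hypothetical task names):

-- Follow this link only when the session succeeded and the decision was true
$s_LoadOrders.Status = SUCCEEDED AND $Dec_CheckCount.Condition = TRUE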

Predefined Workflow Variables:

Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:

  • Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
  • Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service Name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.

Task-Specific Variables:

Condition
Task types: Decision. Datatype: Integer.
Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null.
Sample syntax:
$Dec_TaskStatus.Condition =

EndTime
Task types: All tasks. Datatype: Date/Time.
Date and time the associated task ended. Precision is to the second.
Sample syntax:
$s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')

ErrorCode
Task types: All tasks. Datatype: Integer.
Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes.
Sample syntax:
$s_item_summary.ErrorCode = 24013
Note: You might use this variable when a task consistently fails with this final error message.

ErrorMsg
Task types: All tasks. Datatype: Nstring.
Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax:
$s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'
Note: You might use this variable when a task consistently fails with this final error message.

FirstErrorCode
Task types: Session. Datatype: Integer.
Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes.
Sample syntax:
$s_item_summary.FirstErrorCode = 7086

FirstErrorMsg
Task types: Session. Datatype: Nstring.
First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax:
$s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'

PrevTaskStatus
Task types: All tasks. Datatype: Integer.
Status of the previous task in the workflow that the Integration Service ran. Statuses include:
- ABORTED
- FAILED
- STOPPED
- SUCCEEDED
Use these keywords when writing expressions to evaluate the status of the previous task.
Sample syntax:
$Dec_TaskStatus.PrevTaskStatus = FAILED

SrcFailedRows
Task types: Session. Datatype: Integer.
Total number of rows the Integration Service failed to read from the source.
Sample syntax:
$s_dist_loc.SrcFailedRows = 0

SrcSuccessRows
Task types: Session. Datatype: Integer.
Total number of rows successfully read from the sources.
Sample syntax:
$s_dist_loc.SrcSuccessRows > 2500

StartTime
Task types: All tasks. Datatype: Date/Time.
Date and time the associated task started. Precision is to the second.
Sample syntax:
$s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')

Status
Task types: All tasks. Datatype: Integer.
Status of the previous task in the workflow. Statuses include:
- ABORTED
- DISABLED
- FAILED
- NOTSTARTED
- STARTED
- STOPPED
- SUCCEEDED
Use these keywords when writing expressions to evaluate the status of the current task.
Sample syntax:
$s_dist_loc.Status = SUCCEEDED

TgtFailedRows
Task types: Session. Datatype: Integer.
Total number of rows the Integration Service failed to write to the target.
Sample syntax:
$s_dist_loc.TgtFailedRows = 0

TgtSuccessRows
Task types: Session. Datatype: Integer.
Total number of rows successfully written to the target.
Sample syntax:
$s_dist_loc.TgtSuccessRows > 0

TotalTransErrors
Task types: Session. Datatype: Integer.
Total number of transformation errors.
Sample syntax:
$s_dist_loc.TotalTransErrors = 5

User-Defined Workflow Variables:

You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables.

Use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters.

Use a user-defined variable to determine when to run the session that updates the orders database at headquarters.

To configure user-defined workflow variables, complete the following steps:

1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.

2. Add a Start task and both sessions to the workflow.

3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this.

4. Create an Assignment task to increment the $$WorkflowCount variable by one.

5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false. When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
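
A sketch of the two expressions this design needs, using the variable from the example:

Decision task condition, true on every tenth run:
MOD($$WorkflowCount, 10) = 0

Assignment task expression for $$WorkflowCount:
$$WorkflowCount + 1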

Creating User-Defined Workflow Variables:

You can create workflow variables for a workflow in the workflow properties.

To create a workflow variable:

1. In the Workflow Designer, create a new workflow or edit an existing one.

2. Select the Variables tab.

3. Click Add.

4. Enter the information in the following table and click OK:

Name

Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.

Data type

Data type of the variable. You can select from the following data types:
- Date/Time
- Double
- Integer
- Nstring

Persistent

Whether the variable is persistent. Enable this option if you want the value of the variable
retained from one execution of the workflow to the next.

Default Value

Default value of the variable. The Integration Service uses this value for the variable during
sessions if you do not set a value for the variable in the parameter file and there is no value
stored in the repository.
Variables of type Date/Time can have the following formats:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.MS
- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and
space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values
for year or the “HH12” format for hour.
Variables of type Nstring can have a maximum length of 600 characters.

Is Null

Whether the default value of the variable is null. If the default value is null, enable this option.

Description

Description associated with the variable.

5. To validate the default value of the new workflow variable, click the Validate button.

6. Click Apply to save the new workflow variable.

7. Click OK.
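
You can also set the variable's value at run time in a parameter file, which takes precedence over the value stored in the repository. A minimal sketch, assuming a hypothetical folder MyFolder and workflow wf_LoadOrders:

[MyFolder.WF:wf_LoadOrders]
$$WorkflowCount=0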


Grid Processing

Grid Processing Overview:

When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.
You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid.

The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:

  • Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
  • Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.

Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.

Running Workflows on a Grid:

When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows.

The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.

For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.

Running Sessions on a Grid:

When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes.

When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:

  • Node availability. The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
  • Resource availability. If the Integration Service is configured to check resources, it identifies nodes that have resources required by mapping objects in the session.
  • Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.

You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.

Grid Connectivity and Recovery

When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:

  • High availability option. When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.
  • Recovery strategy. You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
  • Shutdown mode. When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
  • Running mode. If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.
  • Operating mode. If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.

Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.