What is Normalization, First Normal Form, Second Normal Form , Third
1.Normalization is process for assigning attributes to entities–Reducesdata redundancies–Helps eliminate data anomalies–Produces controlledredundancies to link tables
2.Normalization is the analysis of functional dependency between attributes / data items of user views?It reduces a complex user view to a set of small and stable subgroups of fields / relations
1NF: Repeating groups must be eliminated, Dependencies can be identified, All key attributes defined, No repeating groups in table
2NF: The Table is already in1NF,Includes no partial dependencies–No attribute dependent on a portionof primary key, Still possible to exhibit transitive dependency, Attributes may be functionally dependent on non-key attributes
3NF: The Table is already in 2NF, Contains no transitive dependencies.
What is Fact table?
Fact Table contains the measurements or metrics or facts of business process. If your business process is “Sales” , then a measurement of this business process such as “monthly sales number” is captured in the Fact table. Fact table also contains the foreign keys for the dimension tables.
What are conformed dimensions?
Answer1:
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex: Date Dimensions is connected all facts like Sales facts, Inventory facts..etc
Answer2:
Conformed dimensions are dimensions which are common to the cubes.(cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the Facts and Dimensions here D1,D2 are the Conformed Dimensions
Explain why and where do we exactly use the lookup transformations.
You can use the Lookup transformation to perform many tasks, including:
o Get a related value. For example, your source includes employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
o Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
o Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.
How do you tell aggregator stage that input data is already sorted?
By enabling sorted input property in Aggregator Properties
What are push and pull ETL strategies?
Push and Pull strategies determine how data comes from source system to ETL server.
Push: In this case the Source system pushes data i.e.(sends data) to the ETL server.
Pull: In this case the ETL server pulls data i.e.(gets data) from the source system.
What is the Difference between a ODS and Staging Area?
ODS:- Operational Data Store which contains data .
ODS comes after the staging area
e.g.:-
In our e.g. lets consider that we have day level Granularity in the OLTP & Year level Granularity in the Data warehouse.
If the business (manager) asks for week level Granularity then we have to go to the oltp and summarize the day level to the week level which would be pain taking. So what we do is that we maintain week level Granularity in the ods for the data, for about 30 to 90 days.
Note : Ods information would contain cleansed data only. ie after staging area
Staging Area :-
It comes after the etl has finished.Staging Area consists of
1.Meta Data .
2.The work area where we apply our complex business rules.
3.Hold the data and do calculations.
In other words we can say that its a temp work area.
No comments:
Post a Comment