Data Flow Task:
Used to move data from a source to a destination and to perform intermediate operations on it.
Transformations:
These are intermediate operations performed between source and destination.
Ex: concatenation, addition, sorting, merging, etc.
Sort: used to sort the data in the specified order, i.e. ascending or descending. It can also sort on multiple columns.
There is a feature called "remove rows with duplicate sort values" that helps us display unique rows by eliminating duplicates during the sort.
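The behaviour of Sort with duplicate removal can be sketched in plain Python (the rows and column names here are made up for illustration; SSIS does this inside the data flow engine):

```python
# Sketch of SSIS Sort: order rows by multiple columns, optionally
# dropping rows whose sort-key values are duplicates.
rows = [
    {"Name": "Ravi", "Income": 500},
    {"Name": "Anil", "Income": 300},
    {"Name": "Ravi", "Income": 500},   # duplicate sort key
]

def sort_rows(rows, keys, remove_duplicates=False):
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys))
    if not remove_duplicates:
        return ordered
    seen, unique = set(), []
    for r in ordered:
        key = tuple(r[k] for k in keys)
        if key not in seen:            # keep the first row per sort key
            seen.add(key)
            unique.append(r)
    return unique

result = sort_rows(rows, ["Name", "Income"], remove_duplicates=True)
```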
Derived Column:
It performs operations row by row. It does different calculations, concatenations, type conversions, etc. on the columns of each row.
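A Derived Column's row-by-row logic can be sketched like this (the column names PartyIncome, Tax and IncomeText are illustrative, not from any real package):

```python
# Sketch of a Derived Column: compute new columns row by row.
rows = [{"PartyIncome": 1000}, {"PartyIncome": 2000}]

def derive(row):
    out = dict(row)
    out["Tax"] = row["PartyIncome"] * 0.1         # calculated column
    out["IncomeText"] = str(row["PartyIncome"])   # type conversion
    return out

derived = [derive(r) for r in rows]
```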
Variable: It is a named value that can change within the package.
A variable holds memory and stores a value in it.
System Variables:
Hold system information. These variables are stored under the System namespace.
Ex: System::<VariableName>
User Defined Variables: created by the user; these are stored under the User namespace.
User::<VariableName>
Package Level: the variable can be used anywhere within the package.
Task Level: the variable can be used only within that task.
Data Conversions:
Data conversions are done in 3 ways:
1. By using the Data Conversion transformation
2. By using type-cast operators in expressions
<Type Cast Operator>(Column Name)
(DT_I4)(PartyIncome)
(DT_DBDATE)"2010-10-10"
3. Directly at the flat file source itself:
Right-click the Flat File Source → Show Advanced Editor → Input and Output Properties → Flat File Source Output → Output Columns → PartyIncome → Data Type Properties → Data Type → four-byte signed integer [DT_I4]
Note: If we are using the same column a number of times downstream, convert it once at the flat file source itself instead of converting it multiple times.
Aggregate:
It performs aggregate operations such as average, min, sum, count, count distinct, etc.
Flat File Source → Data Conversion → Aggregate → Flat File Destination
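The group-and-aggregate step in that pipeline can be sketched in Python (cities and incomes are made-up sample data):

```python
from collections import defaultdict

# Sketch of the Aggregate transformation: group rows by a column
# and apply sum/count/min aggregates per group.
rows = [
    {"City": "Hyd", "Income": 100},
    {"City": "Hyd", "Income": 300},
    {"City": "Pune", "Income": 200},
]

groups = defaultdict(list)
for r in rows:
    groups[r["City"]].append(r["Income"])

aggregated = [
    {"City": city, "Sum": sum(v), "Count": len(v), "Min": min(v)}
    for city, v in groups.items()
]
```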
Multi Cast:
It creates multiple copies of the same source, so that instead of doing multiple operations on the same source in multiple packages, we can do them in a single package by using Multicast.
It improves performance because it reads the data from the source only once. From a performance point of view this component is very important.
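The idea of one read feeding several independent outputs can be sketched as (the rows are illustrative):

```python
import copy

# Sketch of Multicast: the source is read once, and each output
# path gets its own copy of the same rows.
rows = [{"Id": 1}, {"Id": 2}]          # read from the source once

output_a = copy.deepcopy(rows)          # e.g. feeds an Aggregate
output_b = copy.deepcopy(rows)          # e.g. feeds a destination

output_a[0]["Id"] = 99   # changing one copy leaves the other intact
```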
Merge:
It merges multiple input data sources. The restriction here is that the inputs should be in sorted order, so that the output is also in sorted order.
Note: it accepts only 2 inputs.
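Merging exactly two pre-sorted inputs into one sorted output can be sketched with the standard-library merge (the values are illustrative):

```python
import heapq

# Sketch of Merge: two pre-sorted inputs combined into one sorted
# output; heapq.merge preserves the sort order of its inputs.
input_one = [1, 3, 5]
input_two = [2, 4, 6]

merged = list(heapq.merge(input_one, input_two))
```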
Conditional Split:
It splits the data based on a condition. Two types of output come from this transformation:
a. Condition-matched output
b. Condition-unmatched output (default output)
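The split into matched and default outputs can be sketched as (the condition and rows are made up):

```python
# Sketch of Conditional Split: each row goes to the matched output
# or to the default (unmatched) output, based on a condition.
rows = [{"Income": 100}, {"Income": 900}]

matched, default = [], []
for r in rows:
    (matched if r["Income"] > 500 else default).append(r)
```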
Union All:
It merges multiple input sources, i.e. 2 or more.
There is no need for the inputs to be in sorted order, so the output will also have unsorted data.
Limitation: input source structures should be the same (i.e. number of columns and data types of the columns, in order).
Note: It doesn't bother about column names.
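Union All behaves like positional concatenation, which can be sketched as (sample tuples only):

```python
# Sketch of Union All: inputs with the same structure are simply
# concatenated; no sorting is required or produced.
source_a = [(1, "Ravi"), (3, "Kiran")]
source_b = [(2, "Anil")]

union = source_a + source_b   # output keeps input order, not sorted
```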
Merge Join: It performs a merge operation along with a join. It supports only inner, left outer and full outer joins.
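The inner-join case of a merge join over two sorted inputs can be sketched like this (the data and join key are illustrative):

```python
# Sketch of a Merge Join (inner join): both inputs are sorted on
# the join key (the first field), so one forward pass suffices.
left = [(1, "Ravi"), (2, "Anil"), (4, "Kiran")]
right = [(1, "Hyd"), (2, "Pune"), (3, "Delhi")]

joined, i, j = [], 0, 0
while i < len(left) and j < len(right):
    if left[i][0] == right[j][0]:            # keys match: emit a row
        joined.append((left[i][0], left[i][1], right[j][1]))
        i += 1
        j += 1
    elif left[i][0] < right[j][0]:           # advance the smaller side
        i += 1
    else:
        j += 1
```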
Look Up: It looks up the required value in a reference table and fetches the relevant result (an exact match).
Advantages:
1. To fetch relevant data
2. While working with SCDs (Slowly Changing Dimensions)
3. To get an exact match with the destination and to make retrieval fast (it uses a cache)
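The cached exact-match behaviour can be sketched as (the reference table and column names are made up):

```python
# Sketch of a Lookup: the reference data is cached in a dict, and
# each input row fetches its exact match from that cache.
reference = {101: "Gold", 102: "Silver"}    # cached reference table

rows = [{"CustId": 101}, {"CustId": 102}]
for r in rows:
    r["Tier"] = reference[r["CustId"]]      # exact-match fetch
```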
Fuzzy Look Up:
Used to get the result from the reference table based on similarity, not an exact match.
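Similarity matching can be sketched with the standard library's difflib (the names and the 0.6 threshold are illustrative, not SSIS defaults):

```python
import difflib

# Sketch of a Fuzzy Lookup: match by string similarity instead of
# an exact key.
reference = ["Hyderabad", "Bangalore", "Chennai"]

def fuzzy_lookup(value, threshold=0.6):
    best = difflib.get_close_matches(value, reference,
                                     n=1, cutoff=threshold)
    return best[0] if best else None

match = fuzzy_lookup("Hyderbad")   # misspelled input still matches
```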
For Loop Container:
It executes the underlying tasks a specified number of times.
For Each Loop Container:
Used to loop over a group of similar objects whose count is unknown.
Note: A For Loop has a condition, so we know the count, whereas a For Each Loop does not.
Ex: loading similar files from a folder into a table, etc.
It uses an enumerator; the supported enumerators include:
For Each File enumerator
For Each Item enumerator
For Each ADO enumerator
Note: Enumerator values are not changeable within a package.
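The File enumerator idea, iterating over files whose count is unknown in advance, can be sketched as (the folder, file names and pattern are made up for illustration):

```python
import pathlib
import tempfile

# Sketch of a For Each Loop with a File enumerator: iterate over
# matching files in a folder without knowing their count up front.
folder = pathlib.Path(tempfile.mkdtemp())
for name in ("a.csv", "b.csv", "notes.txt"):
    (folder / name).write_text("dummy")

loaded = []
for f in sorted(folder.glob("*.csv")):   # enumerate matching files
    loaded.append(f.name)                # stand-in for "load to table"
```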
File System Task:
It performs file and folder operations such as copying, moving, deleting, creating, etc.
Bulk Insert Task:
It loads bulk data into tables at maximum speed.
We cannot perform any intermediate operations with the Bulk Insert Task.
Note: It loads files only, and the destination table must already exist before loading.
Send Mail Task:
Used to send e-mails to the corresponding recipients.
It requires an SMTP server.
Maintenance Cleanup Task: it removes files left over from a maintenance plan.
Notify Operator Task: it sends an e-mail message to a SQL Server Agent operator.
Rebuild Index Task: used to reorganize data in indexes by rebuilding them. It improves the performance of index scans and seeks.
Reorganize Index Task: it defragments and compacts clustered and non-clustered indexes on tables and views.
Shrink Database Task:
It reduces the disk space consumed by database and log files by removing empty data and log pages.
Update Statistics Task: it updates statistics of objects whose statistics have already been collected.
Character Map: It applies string operations such as lowercase to uppercase and vice versa, etc.
Copy Column: It creates multiple copies of columns.
Import Column: it imports data from files into columns.
Ex: loading images from files into table rows.
Fuzzy Grouping:
It groups similar rows.
OLE DB Command: it executes a SQL command for each row in a data set.
Row Count: It counts the number of rows in a data set.
Term Extraction:
It extracts terms from text in a column.
Term Lookup:
It counts the frequency with which the terms in a reference table appear in a data set.
Check Point:
A checkpoint provides better restartability of a package.
Checkpoint configuration helps us resume from the last failed task in the package: if a package contains multiple tasks and any task fails, the failure point is stored in the checkpoint file. When we restart the package, the checkpoint helps it start from the point recorded in the checkpoint file. Once the package succeeds, the checkpoint file is deleted.
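That restart behaviour can be sketched as follows (the task names, file name and JSON format are invented for illustration; SSIS uses its own checkpoint file format):

```python
import json
import pathlib
import tempfile

# Sketch of checkpoint restart: record the failed task in a
# checkpoint file, resume from it, delete the file on success.
checkpoint = pathlib.Path(tempfile.mkdtemp()) / "pkg.chk"
tasks = ["extract", "transform", "load"]

def run_package(fail_on=None):
    start = 0
    if checkpoint.exists():                  # resume from failure point
        start = tasks.index(json.loads(checkpoint.read_text())["task"])
    done = []
    for task in tasks[start:]:
        if task == fail_on:                  # simulate a task failure
            checkpoint.write_text(json.dumps({"task": task}))
            return done
        done.append(task)
    if checkpoint.exists():
        checkpoint.unlink()                  # success deletes the file
    return done

first = run_package(fail_on="transform")     # fails, checkpoint written
second = run_package()                       # restarts at "transform"
```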
Error Output Options:
In case of errors or truncated values coming from the source, we can use any of the options below:
Ignore Failure: in case of an error or truncation, it ignores the failure and the row continues on.
Redirect Row: in case of an error or truncation, it redirects the row to the error output (another destination).
Fail Component: in case of an error or truncation, it simply fails the component.
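The three options can be sketched around a conversion that may fail (the values and the NULL-on-ignore behaviour shown are illustrative):

```python
# Sketch of the three error-output options applied to a numeric
# conversion that can fail on bad input.
rows = ["100", "abc", "250"]

def process(rows, option):
    good, redirected = [], []
    for value in rows:
        try:
            good.append(int(value))
        except ValueError:
            if option == "ignore":
                good.append(None)          # Ignore Failure: keep going
            elif option == "redirect":
                redirected.append(value)   # Redirect Row: error output
            else:
                raise                      # Fail Component
    return good, redirected

good, bad = process(rows, "redirect")
```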