Monday, 2 May 2016

SSIS Terminology Quick Review


Data Flow Task: used to move data from a source to a destination and to perform intermediate operations along the way

Transformations: these are intermediate operations performed between the source and destination

Ex: concatenation, addition, sorting, merging, etc.

Sort: used to sort the data in a specified order, i.e., ascending or descending. It can also sort on multiple columns.
There is an option called "Remove rows with duplicate sort values" that helps us output unique rows by eliminating duplicates while sorting.

Derived Column:


It performs operations row by row: calculations, concatenations, conversions, etc. on the columns in each row.
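For example (the column names here are illustrative, not from any particular table), a Derived Column expression can concatenate two string columns or cast a numeric one:

(DT_WSTR,50)FirstName + " " + (DT_WSTR,50)LastName
(DT_I4)PartyIncome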

Variable: a named value that can change within the package.

A variable occupies memory and stores a value in it.

System Variables:

Hold system information. These variables are stored under the System namespace:
System::<variable name>

User-Defined Variables: created by the user and stored under the User namespace:
User::<variable name>

Package Level: the variable can be used anywhere within the package.

Task Level: the variable can be used only within that task.
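For example, in an SSIS expression a variable is referenced with its namespace (the user variable name below is illustrative; PackageName is a real system variable):

@[User::FileName]
@[System::PackageName]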

Data Conversions:

Data conversions can be done in 3 ways:

1. By using the Data Conversion transformation

2. By using type-cast operators in expressions:
(<type cast operator>)(<column name>)
(DT_I4)(PartyIncome)
(DT_DBDATE)"2010-10-10"

3. By doing it directly at the flat file source itself

Right-click the Flat File Source → Show Advanced Editor → Input and Output Properties → Flat File Source Output → Output Columns → Party Income → Data Type Properties → Data Type → four-byte signed integer [DT_I4]
Note: if the same column is needed multiple times downstream, convert it once at the flat file source itself instead of converting it multiple times in the middle of the flow.

Aggregate:
                 It performs aggregate operations such as Average, Min, Sum, Count, Count Distinct, etc.
FlatFileSource → Data Conversion → Aggregate → FlatFileDestination

Multicast:

It creates multiple copies of the same source, so that instead of performing multiple operations on the same source in multiple packages, we can do it in a single package using Multicast.
It improves performance because it reads the data from the source only once.
From a performance point of view, this component is very important.

Merge:

It merges multiple input data sources. The restriction here is that the inputs must be in sorted order, so that the output will also be in sorted order.
Note: it accepts only 2 inputs.

Conditional Split:

It splits the data based on conditions. Two types of output come from this transformation (see the sketch after this list):
a. Condition-matched output
b. Condition-unmatched output, or default output
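As a sketch (the column name and threshold are assumptions for illustration), a Conditional Split condition is just a boolean SSIS expression; rows where it evaluates to true go to the matched output, and all other rows go to the default output:

PartyIncome >= 50000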

Union All:

It merges multiple input sources, i.e., 2 or more.
The inputs do not need to be sorted, so the output will also be unsorted.

Limitation: the input source structures should be the same (i.e., number of columns, and order and data types of columns).

Note: it doesn't bother about column names.

Merge Join: it performs a merge operation along with joins. It supports only Inner, Left, and Full joins.

Lookup: it looks up the required value in the reference table and fetches the matching result (exact match).
Adv:

1. To fetch relevant data
2. While working with SCDs (Slowly Changing Dimensions)
3. To get an exact match with the destination and to make retrieval fast (it uses a cache)

Fuzzy Lookup:

Used to get results from the reference table based on similarity rather than an exact match.

For Loop Container:

It executes the tasks it contains the specified number of times.
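As a minimal sketch, assuming an Int32 user variable named Counter (the variable name is illustrative), the three For Loop expression properties for a loop that runs 10 times would look like:

InitExpression:   @[User::Counter] = 0
EvalExpression:   @[User::Counter] < 10
AssignExpression: @[User::Counter] = @[User::Counter] + 1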

Foreach Loop Container:

Used to load a group of similar objects whose count is unknown.

Note: a For Loop has a condition, so we know the count, whereas a Foreach Loop does not.
Ex: loading similar files from a folder into a table (see the sketch after the enumerator list below).

It uses enumerators; the supported enumerators include:
                                Foreach File Enumerator
                                Foreach Item Enumerator
                                Foreach ADO Enumerator
                               
Note: enumerator values are not changeable within a package.
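As a sketch of the file-loading scenario above (the folder, file pattern, and variable name are illustrative assumptions): configure the Foreach File Enumerator, map the current file to a variable, and use that variable in the Flat File Connection Manager's ConnectionString expression:

Enumerator:       Foreach File Enumerator
Folder:           C:\SourceFiles
Files:            *.csv
Variable Mapping: User::FileName (Index 0)
Flat File Connection Manager → Expressions → ConnectionString = @[User::FileName]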

File System Task:
It performs file and folder operations such as copying, moving, deleting, creating, etc.

Bulk Insert Task:
It loads bulk data into tables at maximum speed.
We cannot perform any intermediate operations (transformations) with the Bulk Insert Task.

Note: it loads from files only, and the destination table must already exist before loading.
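A minimal sketch of the Bulk Insert Task settings (the table and file names are illustrative assumptions):

Destination Connection:   OLE DB connection to the target SQL Server database
DestinationTable:         dbo.PartyIncome
Source Connection → File: C:\SourceFiles\party.txt
Format:                   column delimiter = comma, row delimiter = {CR}{LF}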

Send Mail Task:
                 Used to send e-mails to the corresponding recipients. It requires an SMTP server.
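A minimal sketch of the Send Mail Task settings (the server and addresses are illustrative assumptions):

SmtpConnection:    SMTP Connection Manager pointing at smtp.example.com
From:              etl@example.com
To:                dba@example.com
Subject:           Package completed
MessageSourceType: Direct Input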

Maintenance Cleanup Task: it removes files left over from a maintenance plan.

Notify Operator Task: it sends an e-mail message to SQL Server Agent operators.

Rebuild Index Task: used to reorganize data in indexes by rebuilding them. It improves the performance of index scans and seeks.

Reorganize Index Task: it defragments and compacts clustered and non-clustered indexes on tables and views.

Shrink Database Task:
It reduces the disk space consumed by database and log files by removing empty data and log pages.

Update Statistics Task: it updates the statistics of objects whose statistics have already been collected.

Character Map: it applies string operations such as lowercase to uppercase and vice versa.

Copy Column: It creates multiple copies of the columns

Import Column: it imports data from files into columns.
Ex: loading images from files into table rows.

Fuzzy Grouping:
                 It groups similar rows

OLEDB Command: it executes a SQL command for each row in a data set.

Row Count: it counts the number of rows in a data set and stores the result in a variable.

Term Extraction: it extracts terms from the text in a column.

Term Look Up:
                It counts the frequency with which terms in a reference table appear in a data set.

Checkpoint:

A checkpoint provides better restartability for a package.
Checkpoint configuration helps us resume from the last failed task in the package: if a package contains multiple tasks and any task fails, the failure point is stored in the checkpoint file; when we restart the package, the checkpoint lets execution start from the point recorded in the file. Once the package succeeds, the checkpoint file is deleted.
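A minimal sketch of the properties involved (the file path is an illustrative assumption): set the three properties below at the package level, and set FailPackageOnFailure = True on each task that should act as a restart point:

CheckpointFileName: C:\SSIS\MyPackage.chk
CheckpointUsage:    IfExists
SaveCheckpoints:    True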

Error Output Options:

If errors or truncated values come from the source, we can use any one of the options below:

Ignore Failure: in case of an error or truncation, it ignores the failure and continues.

Redirect Row: in case of an error or truncation, it redirects the row to a separate error destination.

Fail Component: in case of an error or truncation, it simply fails the component.
