Sorter Transformation in Informatica

Sorter Transformation In Informatica

The sorter transformation in Informatica is an active, connected transformation. It allows us to sort data within our mapping by a specific port or ports. The sorter transformation’s core functionality is very similar to the ORDER BY clause within SQL. For example, if we are working with a PERSON table and we want to get SELECT all records but sort descending by LAST_NAME then ascending by FIRST_NAME, our SQL might look something like this:

SELECT *
FROM PERSON
ORDER BY LAST_NAME DESC, FIRST_NAME ASC

We can accomplish these same results with a data pipeline in Informatica through a Sorter Transformation. Lets take a look at what this looks like within a mapping.

Sorter Transofrmation in Informatica Mapping

Sorter Transofrmation in Informatica Mapping

Looking at the above sample mapping, we have an Oracle PERSON source table. We also have a corresponding source qualifier pulling in all data from the PERSON table. Next we pass our non audit fields into an expression transformation where we might perform some data transformation. Finally, we pass these same four ports into our sorter transformation.

Lets double click the sorter transformation and see how we have configured it to behave like our SQL statement above.

Sorter Transformation Ports Tab

Sorter Transformation in Informatica Ports Tab

Sorter Transformation in Informatica Ports Tab

Notice how the ports tab is configured. The key column checkboxes next to the LAST_NAME and FIRST_NAME ports are checked. Additionally, we have selected specific values in the direction column for these same ports, descending for LAST_NAME and ascending for FIRST_NAME. These selections ensure we first sort by LAST_NAME from Z to A, then by FIRST_NAME from A to Z. The LAST_NAME port comes before the FIRST_NAME port in the ports tab. This ensures LAST_NAME has priority in the sort, then FIRST_NAME.

Lets take a look at some real data…

Assume these values flow into our SRT_NAME sorter transformation:

PERSON_ID LAST_NAME FIRST_NAME MIDDLE_NAME
3 JOHNSON SAM MATTHEW
5 KELLER TIM ALLEN
2 BAILY GEORGE FRANCES
4 SINGER SIMON THOMAS
1 JOHNSON ADAM CHRIS

After passing through the SRT_NAME sorter transformation our record order will look like this:

PERSON_ID LAST_NAME FIRST_NAME MIDDLE_NAME
4 SINGER SIMON THOMAS
5 KELLER TIM ALLEN
1 JOHNSON ADAM CHRIS
3 JOHNSON SAM MATTHEW
2 BAILY GEORGE FRANCES

Lets checkout the properties tab to see what additional functionality the sorter transformation in Informatica provides.

Sorter Transformation Properties Tab

Sorter Transformation in Informatica Properties Tab`

Sorter Transformation in Informatica Properties Tab

Lets take a look at each transformation attribute one at a time.

Sorter Cache Size

Sorter cache size allows us to control the amount of physical memory dedicated to the sorter transformation to perform the sort operation. “Auto” is the default attribute value which the Integration Service sets to 16 MB (16,777,216 bytes). 64-bit Integration Service must be used if the total session configured session cache is 2GB or greater.

Cache files are used in the working directory if the amount of data exceeds the allotted sorter cache size. The Integration Service working directory needs at least twice the disc space of the incoming data perform the sort. Before performing the sort, the Integration Service allocates the memory needed and will fail if it cannot find enough. Check the session log with Normal tracing to view the row size and memory use by the the sorter transformation. This will give you a better idea of what memory is being allocated and if more is needed.

Case Sensitive

The case sensitive property forces the sorter transformation to evaluate upper and lower case character with different sort priority. Upper case characters have a higher sort priority than lower case when enabled. Otherwise, if left un-checked, both lower and upper case characters have the same sort priority.

Distinct Output Rows

Checking distinct output rows will ensure all output records corresponding to output ports are distinct. This property is very much like a SELECT DISTINCT within a SQL statement. This property is what makes the sorter transformation an active transformation, removing duplicate rows that may pass through. After checking this checkbox, the sorter transformation will enable every port as a “key” to sort on. So in addition to removing duplicates, every row will be used in the sort with the first port being considered first, second port second, etc…

Tracing Level

Lets look at the sorter transformation in Informatica’s tracing level. The default, Normal tracing level will log the size of the row and amount of memory allocated for sort operation. Important for performance evaluation, the Integration Service will also write the time and date the first and last input rows to the sorter transformation.

Using our SRT_NAME transformation in our example above, here is a partial session log with Normal tracing on:

Severity Timestamp Node Thread Message Code Message
INFO 09/01/2014 3:06:03 PM node_ul1234 MAPPING SORT_40419 For Transformation [SRT_NAME], memory used is [16777216] bytes and row size is 172 bytes.
INFO 09/01/2014 3:06:03 PM node_ul1234 TRANSF_1_1_1 SORT_40420 SORT_40420 Start of input for Transformation [SRT_NAME]. : (Mon Sep 01 15:06:03 2014)
INFO 09/01/2014 3:06:03 PM node_ul1234 TRANSF_1_1_1 SORT_40421 SORT_40421 End of input for Transformation [SRT_NAME]. : (Mon Sep 01 15:06:03 2014)
INFO 09/01/2014 3:06:03 PM node_ul1234 TRANSF_1_1_1 SORT_40422 SORT_40422 End of output from Sorter Transformation [SRT_NAME]. Processed 5 rows (176 input bytes; 0 temp I/O bytes). : (Mon Sep 01 15:06:03 2014)
INFO 09/01/2014 3:06:03 PM node_ul1234 TRANSF_1_1_1 SORT_40423 SORT_40423 End of sort for Sorter Transformation [SRT_NAME]. : (Mon Sep 01 15:06:03 2014)

With a tracing level of Verbose Data instead of Normal, the time the Sorter transformation finishes passing all data to the next transformation in the pipeline is also recorded in the session log by the Integration Service. Additionally, memory resource release time and the time temporary files were removed from the working directory.

Here is an example of a partial verbose session log for the Sorter transformation.

Severity Timestamp Node Thread Message Code Message
INFO 09/01/2014 3:28:32 PM node_ul1234 MAPPING SORT_40419 For Transformation [SRT_NAME], memory used is [16777216] bytes and row size is 172 bytes.
INFO 09/01/2014 3:28:32 PM node_ul1234 TRANSF_1_1_1 CMN_1053 SRT_NAME:INPUT row pushed from EXP_SETUP_DATA: Control=[None] SRT_NAME:INPUT row pushed from EXP_SETUP_DATA: Rowdata: ( RowType=0(insert) Src Rowid=1 Targ Rowid=1
PERSON_ID (PERSON_ID:Double:): “4.000000000000000”
LAST_NAME (LAST_NAME:Char.50:): “SINGER”
FIRST_NAME (FIRST_NAME:Char.50:): “SIMON”
MIDDLE_NAME (MIDDLE_NAME:Char.50:): “THOMAS”
)
SRT_NAME:INPUT row pushed from EXP_SETUP_DATA: Rowdata: ( RowType=0(insert) Src Rowid=2 Targ Rowid=2
PERSON_ID (PERSON_ID:Double:): “5.000000000000000”
LAST_NAME (LAST_NAME:Char.50:): “KELLER”
FIRST_NAME (FIRST_NAME:Char.50:): “TIM”
MIDDLE_NAME (MIDDLE_NAME:Char.50:): “ALLEN”
)
SRT_NAME:INPUT row pushed from EXP_SETUP_DATA: Rowdata: ( RowType=0(insert) Src Rowid=3 Targ Rowid=3
PERSON_ID (PERSON_ID:Double:): “1.000000000000000”
LAST_NAME (LAST_NAME:Char.50:): “JOHNSON”
FIRST_NAME (FIRST_NAME:Char.50:): “ADAM”
MIDDLE_NAME (MIDDLE_NAME:Char.50:): “CHRIS”
)
SRT_NAME:INPUT row pushed from EXP_SETUP_DATA: Rowdata: ( RowType=0(insert) Src Rowid=4 Targ Rowid=4
PERSON_ID (PERSON_ID:Double:): “3.000000000000000”
LAST_NAME (LAST_NAME:Char.50:): “JOHNSON”
FIRST_NAME (FIRST_NAME:Char.50:): “SAM”
MIDDLE_NAME (MIDDLE_NAME:Char.50:): “MATTHEW”
)
SRT_NAME:INPUT row pushed from EXP_SETUP_DATA: Rowdata: ( RowType=0(insert) Src Rowid=5 Targ Rowid=5
PERSON_ID (PERSON_ID:Double:): “2.000000000000000”
LAST_NAME (LAST_NAME:Char.50:): “BAILY”
FIRST_NAME (FIRST_NAME:Char.50:): “GEORGE”
MIDDLE_NAME (MIDDLE_NAME:Char.50:): “FRANCES”
)
INFO 09/01/2014 3:28:32 PM node_ul1234 TRANSF_1_1_1 SORT_40420 SORT_40420 Start of input for Transformation [SRT_NAME]. : (Mon Sep 01 15:28:32 2014)
INFO 09/01/2014 3:28:32 PM node_ul1234 TRANSF_1_1_1 SORT_40421 SORT_40421 End of input for Transformation [SRT_NAME]. : (Mon Sep 01 15:28:32 2014)
INFO 09/01/2014 3:28:32 PM node_ul1234 TRANSF_1_1_1 CMN_1053 SRT_NAME: Send OUTPUT row to FF_Generic: Control=[None] SRT_NAME: Send OUTPUT row to FF_Generic: Rowdata: ( RowType=0(insert) Src Rowid=1 Targ Rowid=1
PERSON_ID (NEWFIELD:Char.200:): “4.000000000000000”
LAST_NAME (NEWFIELD1:Char.200:): “SINGER”
FIRST_NAME (NEWFIELD2:Char.200:): “SIMON”
MIDDLE_NAME (NEWFIELD3:Char.200:): “THOMAS”
)
SRT_NAME: Send OUTPUT row to FF_Generic: Rowdata: ( RowType=0(insert) Src Rowid=2 Targ Rowid=2
PERSON_ID (NEWFIELD:Char.200:): “5.000000000000000”
LAST_NAME (NEWFIELD1:Char.200:): “KELLER”
FIRST_NAME (NEWFIELD2:Char.200:): “TIM”
MIDDLE_NAME (NEWFIELD3:Char.200:): “ALLEN”
)
SRT_NAME: Send OUTPUT row to FF_Generic: Rowdata: ( RowType=0(insert) Src Rowid=3 Targ Rowid=3
PERSON_ID (NEWFIELD:Char.200:): “1.000000000000000”
LAST_NAME (NEWFIELD1:Char.200:): “JOHNSON”
FIRST_NAME (NEWFIELD2:Char.200:): “ADAM”
MIDDLE_NAME (NEWFIELD3:Char.200:): “CHRIS”
)
SRT_NAME: Send OUTPUT row to FF_Generic: Rowdata: ( RowType=0(insert) Src Rowid=4 Targ Rowid=4
PERSON_ID (NEWFIELD:Char.200:): “3.000000000000000”
LAST_NAME (NEWFIELD1:Char.200:): “JOHNSON”
FIRST_NAME (NEWFIELD2:Char.200:): “SAM”
MIDDLE_NAME (NEWFIELD3:Char.200:): “MATTHEW”
)
SRT_NAME: Send OUTPUT row to FF_Generic: Rowdata: ( RowType=0(insert) Src Rowid=5 Targ Rowid=5
PERSON_ID (NEWFIELD:Char.200:): “2.000000000000000”
LAST_NAME (NEWFIELD1:Char.200:): “BAILY”
FIRST_NAME (NEWFIELD2:Char.200:): “GEORGE”
MIDDLE_NAME (NEWFIELD3:Char.200:): “FRANCES”
)
INFO 09/01/2014 3:28:32 PM node_ul1234 TRANSF_1_1_1 SORT_40422 SORT_40422 End of output from Sorter Transformation [SRT_NAME]. Processed 5 rows (176 input bytes; 0 temp I/O bytes). : (Mon Sep 01 15:28:32 2014)
INFO 09/01/2014 3:28:32 PM node_ul1234 TRANSF_1_1_1 SORT_40423 SORT_40423 End of sort for Sorter Transformation [SRT_NAME]. : (Mon Sep 01 15:28:32 2014)

Null Treated Low

This property gives us control over how NULL values are sorted within a key port. If enabled or checked, NULL values in a key port will be considered lower than any other value in the sort. If disabled or unchecked, NULL values in a key port will be considered higher than any other value in the sort.

Transformation Scope

Transformation scope is a property that determines the scope of the transformation logic. If set to All Input sorter transformation logic will apply to all incoming rows. However, if Transaction is selected, sorter transformation logic only applies to all rows within the same transaction.

Velecity Best Practice

Velocity recommends the following naming standards for the Sorter transformation in Informatica.

SRT_{FUNCTION}

Example: SRT_NAME

Final Summary

The Sorter Transformation in Informatica is a great tool for sorting data within an Informatica mapping. One should even consider sorting with the sorter transformation instead of through an ORDER BY statement within SQL. If the database is not configured appropriately, the sorter transformation may perform even better than SQL. This active, connected transformation has several properties that enhance its functionality and usability. Make sure to understand this common Informatica transformation and good luck using the Sorter Transformation in all your Informatica data integration projects!

Aaron Gendle

Aaron Gendle

Senior Consultant & Data Architect at Eccella
As a data architect, I have the privilege of solving challenging data problems on a daily basis.

I hold a B.S. in Mathematics and M.S. in System Engineering. I have also earned a Big Data and Social Analytics Certification from MIT.

Eccella is a high end, boutique style consulting firm focused, on creating data driven companies.

Find Aaron Gendle on LinkedIn -
Aaron Gendle

Leave a Reply

Your email address will not be published. Required fields are marked *