1

SUBSTR in Informatica with Examples

SUBSTR in Informatica Mapping Designer

SUBSTR in Informatica is a function that returns a subset of characters from a larger string. We can use this data as part of some additional mapping logic or map it to a target table to be consumed by business. SUBSTR is used primarily within the Expression Transformation in Informatica. This function works perfectly with pattern based string values like zip codes or phone numbers.

Lets take a look at a quick SUBSTR in Informatica example.

Phone Number Example

Lets say we have the below phone numbers passing through our mapping into an expression transformation:

209-555-1234

714-555-5678

515-555-9123

Assume we want to populate a PHONE table along with AREA_CODE and MAIN_LINE fields. SUBSTR in Informatica works perfectly for extracting these pieces of data out of the full phone number.

Lets take a quick look at the sytax we must use:

SUBSTR( string, start [,length] )

Our first two parameters are required, the third is optional.

1. “string” is defined as the character/string that we want to search. Generally we would pass an expression string variable or input port.

2. “start”, defined by an integer, is merely the starting position to begin counting. We can pass a positive or negative value here. If we pass a positive value, we count left to right for our starting position. Conversely, if we pass a negative value, we count right to left for our starting position. The integration service considers a 0 equal to 1, the first character in our string.

3. “length” is an optional parameter. If entered, it must be an integer greater than 0. It tells the integration service how many characters of the string we want to return based on our starting position. If left blank, the entire string will be returned from the start location specified.

Ok now that we understand the SUBSTR in Informatica syntax, lets continue our phone number example.

Area Code

Using the below SUBSTR in Inforamtica parameter values, we can return the first three characters from our PHONE_NUMBER data:

SUBSTR(PHONE_NUMBER, 1, 3)

SUBSTR Informatica Area Code

SUBSTR Informatica Area Code

PHONE_NUMBER AREA_CODE
209-555-1234 209
714-555-5678 714
515-555-9123 515

I named this expression output port OUT_AREA_CODE.

Lets add another expression output port, OUT_MAIN_LINE. We will define it with the below SUBSTR statement. We start at the 5th character of our PHONE_NUMBER and return the next 8 characters.

SUBSTR(PHONE_NUMBER, 5, 8)

SUBSTR Informatica Main Line

SUBSTR Informatica Main Line

PHONE_NUMBER MAIN_LINE
209-555-1234 555-1234
714-555-5678 555-5678
515-555-9123 555-9123

Putting it all together, our expression transformation will produce the following:

PHONE_NUMBER AREA_CODE MAIN_LINE
209-555-1234 209 555-1234
714-555-5678 714 555-5678
515-555-9123 515 555-9123

Below is a snapshot of our expression transformation ports tab. I defined our new fields using SUBSTR as OUT_AREA_CODE and OUT_MAIN_LINE.

SUBSTR Informatica Expression Setup

SUBSTR Informatica Expression Setup

Common Questions

Question 1 – What will the SUSTR in Informatica fuction return when my “string” value is NULL?

A. When the string value is NULL, SUBSTR will return NULL.

Question 2 – What if my “string” does not follow a character length pattern. How would I return the domain names in an email address for example?

A. Many times our data is not simple. It may follow a pattern of some kind, but perhaps not as straight forward as our PHONE_NUMBER example.

In these situations, we need to use the INSTR function to determine either our start position, length of characters to return or both.

In the case of an email domain, we would need to do something like the below…

SUBSTR(EMAIL_ADDRESS, INSTR(EMAIL_ADDRESS, ‘@’))

We passed the EMAIL_ADDRESS port into our SUBSTR string value parameter. Since we cannot predict the starting position for every email address ahead of time, I used the INSTR function to get my start position. I passed the same EMAIL_ADDRESS port into INSTR as the string to search in, and then the @ symbol as the character to search for.

The INSTR function in Informatica will then return the start postion of the first occurrence of the @ symbol. Since I do not know how long any domain will be, I left the SUBSTR length optional parameter empty so the entire domain will be returned.

Using some real data, our results might look something like this:

EMAIL_ADDRESS DOMAIN
12345Go@gmail.com gmail.com
hello@hotmail.com hotmail.com
dataintegration@yahoo.com yahoo.com

Summary

SUBSTR in Informatica is a very useful function. It helps us extract specific characters from a string that might be useful on their own. The phone number use case is a perfect example of how SUBSTR can be used on strings with simple, consistent patterns. For more complex patterns we might use the INSTR function in Informatica to compliment SUBSTR.

Let me know how you have used the SUBSTR function in Informatica?

Enjoy!

Aaron Gendle

Aaron Gendle

Senior Consultant & Data Architect at Eccella
As a data architect, I have the privilege of solving challenging data problems on a daily basis.

I hold a B.S. in Mathematics and M.S. in System Engineering. I have also earned a Big Data and Social Analytics Certification from MIT.

Eccella is a high end, boutique style consulting firm focused, on creating data driven companies.

Find Aaron Gendle on LinkedIn -
Aaron Gendle

Comments 1

  1. Nicely written explanation of Substr in Informatica! This is very helpful for students and developers new to using Informatica , step by step explanation with examples is great at helping with learning faster. Thanks for sharing this!

Leave a Reply

Your email address will not be published. Required fields are marked *