Automation Action: Extract Field
Parse and extract data from the incoming message and assign the result to a field name.
Find and extract data from the incoming message body or a variable and assign the extracted data to a field name. The extracted data can then be used on any other Automation action setting using %fieldname% replacement (see: Variable Replacements).
Enter a Name for the field.
Helper Message
When creating your first Extract Field action it is a good idea to paste a sample copy of the message body you are extracting data from into the Helper Message box. ThinkAutomation will then highlight the data it will extract for each field as you specify the extraction properties. The Helper text will be saved with the Automation - so you only need to paste it once. Each set of extract field actions with different Extract Field From values can have their own Helper Message text. For Database message source types the Helper Message text will be auto populated when you use the Test query option.
Extract Field From
Select the variable that you want to extract data from. This defaults to %Msg_Body% - which means extract data from the plaintext body of the incoming message. You can change this to %Msg_Subject% to extract from the subject or %Msg_Headers% to extract from the headers etc. You can also select any of your own %variables% to extract data from text created previously in your Automation.
You have four options for setting the field value:
- Find & Extract : To find and extract distinct values from the Extract Field From data by looking for markers.
- Extract From Json : To extract a specific Json path if the Extract Field From contains Json data.
- Extract Using Text Range : To extract a text block from the Extract Field From data at specific left, top, right, bottom character coordinates.
- Extract Built-In Variable : To set the field value to a message variable, system variable or Solution constant.
Find & Extract
Select the Find & Extract option to find and extract distinct values from the Extract From data.
Start From Last Extract Point
Normally ThinkAutomation moves the 'extraction point' as it moves through the Extract From value extracting data for each Extract Field action. Disable this option if you want ThinkAutomation to start from the beginning of the data when it looks for this field. Once a field is extracted the extraction point will be set to the end of the extracted data for the next Extract Field action. The next field extraction starts from this point unless you disable this option.
Case Sensitive
By default ThinkAutomation ignores case when it looks for fields. So 'order number' and 'Order Number' will both match when searching for 'Order number'. Enable this option if you want to perform a case sensitive search.
Is Repeating Block
This option sets the field as a repeating block. If set ThinkAutomation will enter a loop and repeat the extraction of this field until it finds the next field to extract (or the end of the message). You can Call another Automation with the results of each block loop to perform further actions/extraction.
Look For
Enter the text that ThinkAutomation should look for when searching the data for this field. This should be the text that is the same for each message and uniquely identifies the field.
Then Look For
Enter an additional text string that ThinkAutomation looks for AFTER it has found the above text. This is optional, but is useful when data is formatted in the message using an unknown number of tabs or spaces. Consider the following line:
Customer code : ABC
We would search for 'Customer code' and then ':' because we don't know how many spaces are between 'Customer code' and ':'. The field extraction would then start after the ':'.
In both the Look For and Then Look For entries you can make use of Regular Expressions to assist with searching. You can also use %variable% replacements using Fields/Variables previously extracted or set.
Find Next None-Whitespace
You can set the Look For or Then Look For to a single star '*' - a single star means find the next none-whitespace character.
This can be useful when searching for data. For example, suppose the text contains:
Your serial number is:
1234-5678
If we wanted to extract the Serial Number we would be to look for 'Your serial number is:' and then look for '*' - which would effectively look for any none whitespace after 'Your serial number is:'.
Find Blank Line
You can set the Look For or Then Look For to '<BlankLine>' or you can include it, eg: 'Customer: <BlankLine>'. The <BlankLine> marker searches for a single blank line. A 'blank line' is LF + LF or CRLF + CRLF. To find multiple blank lines use '<BlankLine><BlankLine>'
Extract From Json
Select the Extract From Json option if the Extract Field From contains Json data. You can easily find and extract a specific Json path. Enter or select the Json Path. If Json sample data has been pasted into the Helper Message then the paths list will be populated automatically.
Note: Paths use JsonPath notation.
Extracting From Json Arrays
If the selected Path is an array then you can extract a single array value or all values. Enable the Extract All Array Values To CSV to extract all array values to a CSV string. For example, consider the following Json:
{ "cars":[ "Ford", "BMW", "Fiat" ] }
Setting the path to 'cars[0]' would extract the single value "Ford". If Extract All Array Values To CSV is enabled then the extracted value would be "Ford,BMW,Fiat".
If the Json array contains no enclosing object (eg: ["Ford","BMW","Fiat"]) then use paths: array[0], array[1] etc.
If the Extract Field From contained:
{
"cars": [
{
"Name": "Ford",
"Color": "Green",
"Model": "Focus"
},
{
"Name": "BMW",
"Color": "Blue",
"Model": "3 Series"
}
]
}
Setting the path to 'cars' or 'cars[0]' and enabling Extract All Array Values To CSV would extract:
Ford,Green,Focus
BMW,Blue,3 Series
Setting the path to 'cars[0].name' and enabling Extract All Array Values To CSV would extract "Ford,BMW"
Whilst setting the path to 'cars[1].name' and not enabling Extract All Array Values To CSV would extract 'BMW'.
You can use the Parse CSV Line action to further process CSV data.
Extract Using Text Range
Select the Extract Using Text Range option to extract a block of text from the Extract Field From data at specific Left, Top, Right & Bottom character coordinates.
This option is useful where data is always in the same place - but has no specific markers to easily find and extract. An example would be an 'address' block on a invoice/quote etc where the text has been converted from a PDF document.
For example, if the Extract Field From contained:
1 Purchase Order No. PO0000000051040
2 Date 1/12/2023
3
4 PURCHASE ORDER
5
6 Ship To:
7 Test Customer Name Inc Test Customer
8 4767 New Broad Street 1100 Pontiac Ct
9 Baldwin Park
10 Orlando FL 32814 Export PA 15632-9066
11
If we wanted to extract the customer address we would set the coordinates to: left=9, top=7, right=60, bottom=11. We would extend the right & bottom values to ensure all values are extracted. If the Clean And Trim Blanks option is enabled then the extracted value would have any blank lines above or below removed after extraction and each line trimmed.
When extracting postal address blocks you can use the Extract Address Parts action to then extract specific address data.
When defining the coordinates you can select the text you want to extract in the Helper Message - the coordinates will then be automatically set to the selected range. Ranges can be on a single line or span multiple lines.
Extract Built-In Variable
You can also set the field value to any of the built-in message variables, system variables or solution constants. Select the Extract Built-In Variable option and select the variable to use.
Extracting Data
Click the Extract Data tab to define how ThinkAutomation will extract data for this field once it has found it.
For the Find & Extract option, there are a number of options you can use to extract data (all options start the extraction after the Look For, and optionally Then Look For text):
Until End Of Line
Extract all data up to the end of the line (or the end of the data if there are no more lines).
Until End Of Message
Extract all data up to the end of the data.
Until Any Of These Characters
Extract data until any of the following characters are found. You can then specify a list of characters to search for. If any one of the characters are found then extracting will stop. You can include the following markers:
Marker | Details | Example |
---|---|---|
<CR> |
A carriage return character | " ,<cr>" - look for space, comma or carriage return. |
<LF> |
A line feed character | " :<lf>" - look for : or line feed. |
<TAB> |
A tab character | " <tab>" - look for a space or a tab. |
<ESC> |
An escape character | " <tab><esc><cr>" - look for a space, tab, esc or carriage return. |
<CRLF> |
A carriage return + line feed | "<tab><crlf>" - look for a tab or carriage return + line feed. |
<NEWLINE> |
Either CRLF or LF | "<newline>" - look for any line ending. |
<BLANKLINE> |
A blank line | " <blankline>" - look for a space or a blank line. |
<END> |
End of data | " ,<cr><end>" - look for space, comma, carriage return or end of data. |
Until These Characters
Extract data until specific words or characters are found. You can then specify characters, words or phases to search for. Extraction will stop when the words are found. Regular expressions permitted. You can use the <BLANKLINE>
marker to search up to the next blank line.
Until These Many Characters
You can manually specify a number of characters to extract.
Until End Tag
Select this option if you are extracting HTML or XML tags. If the Look For value is a tag, for example: <mytag>
then ThinkAutomation will extract up to the end tag </mytag>
. This option will be automatically selected on new fields if you enter a tag in the Look For entry.
Use The Look For Expression
Select this option to extract the field INCLUDING the Look For expression. This is useful when you want to find AND extract using a regular expression. The data would be extracted starting from the Look For value. For example, if the 'Look For is set to the regular expression: [a-zA-Z0-9._-]+@[a-zA-Z0-9_-]+\.[a-zA-Z.]+
(which is the regular expression for an email address) and the Use The Look For Expression option is enabled.. then the first Email address will be found AND extracted. If the Use The Look For Expression is not enabled then the first email address will be found and extraction will start AFTER the end of the email address.
The Extract 'Until' options do not apply if you are using the Extract From Json option.
Clean And Trim Blanks
Enable this option if you want the extracted data to be cleaned and trimmed. This will remove any leading or trailing spaces, tabs and carriage return/line feed/control characters from the field data.
Remove First/Last
You can also select to remove a number of characters from the beginning and end of the extracted data.
Additional Attributes
Select the Attributes tab to define optional additional attributes for the field.
Field Data Type
Select the Field Data Type from the list. After the field is extracted the value will be converted to the appropriate type.
For the Boolean type field the value will be set to 'True' or 'False' only. The field value will be set to 'True' if the extracted data is any of 'true','yes','on','y','1' (case insensitive) and 'False' otherwise.
If the any of the numeric types are selected then the numeric value of the extracted value will be assigned. If the extracted value cannot be converted to a number of the selected data type then no data will be assigned.
For Decimal, Single and Double types you can also set the Decimal Places. The extracted value will be rounded up to the specified number of decimal places. If Decimal Places is zero then no rounding will be applied.
For the Date type the date will be extracted and assigned in yyyy-MM-dd
format or yyyy-MM-dd HH:mm:ss
format for the DateTime type.
If you are using Update A Database Using Extracted Fields action you should select the correct type matching the column in your database.
Max Length
You can optionally specify the maximum allowed field length for the field. ThinkAutomation will truncate the field if the extracted data is greater than the maximum length. Use this option to avoid database errors that will be raised when field data is inserted into your database that is greater than the defined length. This option applies to String and Text field types only. Set to zero if you do not want ThinkAutomation to truncate the field.
Default Value
Enter a value that will be assigned to the field if no data is found or the extracted value is blank.
Case
This option allows you to change the case of extracted value or to apply Word Capitalization.
Validate
In this section you can define validation rules for the extracted field and you can define what action ThinkAutomation should take if the extracted data is invalid.
Select the Validate option to enable validation for this field. Select Cannot Be Blank Or Zero option if the field must be a value (or be non-zero in the case of numeric fields). For numeric fields you can also select a valid Numeric Range. The Must Be In List option allows you to define a list of valid values. In the Choices entry specify the list of valid values for the field.
If Data Is Invalid
Here you specify what ThinkAutomation should do if the extracted field data is invalid. There are two options:
Set Field To Default Value - select this option if you want ThinkAutomation to replace the extracted data with the field's default value (or blank if no default it specified).
Throw Error - select this option if you want ThinkAutomation to cancel execution of the Automation for the current message.
Mapping To Database Columns
If you are going to use the Update A Database Using Extracted Fields action in your Automation, then you can map extracted fields to tables & columns in the database you want to update. ThinkAutomation then builds the database update commands automatically.
You can also use the Update A Database Using Custom SQL action type which allows finer control over database updates.
Select the Database Update tab to map the extracted field to a table & column in your database that you want ThinkAutomation to update.
Enter the Update Table Name that the field will be updated on. This table must already exist in your database. Multiple fields can use different table names if required - but the tables must be part of the same database.
Enter the Update Column Name in the table that the ThinkAutomation field will be mapped to.
Key Field
Enable this option if this field is a Key Field. Key fields allow you to control how your database is updated. If you create one or more key fields, ThinkAutomation will first check if a record exists in your database using the key field values. If a record already exists then the existing record will be UPDATED otherwise a new record will be INSERTED.
Repeating Blocks
Often messages contain repeating sections. Most of the time you need to run a process or update a database for each individual block of the repeating section. ThinkAutomation allows you to do this by defining a field as a Repeating Block. When a field is defined as a repeating block, ThinkAutomation will enter a loop and extract each block of the repeating section in turn. Another Automation can then be called with the value of each block.
For example:
Suppose you receive the following message:
Name : Howard Williams
Company : PSL
Order Ref: 1234
Product : WHO1
Qty : 1
Product : WHO2
Qty : 2
The Product and Qty fields can repeat any number of times depending on what the customer has ordered.
We can define the Product & Qty fields as a single field and set them as a repeating block. The repeating block can then be passed to another Automation for processing on it's own.
We would define the extraction of the above fields as follows:
- Name = Extract Field From %Msg_Body% Look For "Name" Then ":"
- Company = Extract Field From %Msg_Body% Look For "Company" Then ":"
- Order = Extract Field From %Msg_Body% Look For "Order Ref" Then ":"
- Block = Extract Field Repeating Block From %MsgBody% Look For "*" Call Process Order Lines
The Name, Company and Order fields we extract by looking for the Name:, Company: and Order Ref: field headers and extract until the end of the line.
For the Block field we set the Look For to '*' - which means start from the next character after the last extract point:
We set the Extract Data option to Until These Characters - and set this to '<BlankLine>' which means 'the next blank line'.
You then enable the Is Repeating Block option on this field. ThinkAutomation will then repeat the field extraction until it either finds the next field in the extracted fields list or the end of the message.
We then create another Automation. In this case - called 'Process Order Lines'. This Automation will receive each block of the repeating section as a new message. The 'Process Order Lines' Automation can then extract the Product and Qty fields and perform further processing.
When you enable the Is Repeating Block option the Call tab becomes visible on the Extract Field properties.
On the Call tab of the repeating block field we select the Automation to call. The Body Text should be set to the value that you want this Automation to receive for each block. The Body Text should contain the %ExtractedValue% replacement field - this will be replaced with the current block text on each call. You can add additional variable replacements if you also want to pass previously extracted fields. For example:
Name : %Name%
Company : %Company%
Order Ref: %Order%
%ExtractedValue%
For each block the 'Process Order Lines' Automation would be called with messages set to:
Name : Howard Williams
Company : PSL
Order Ref: 1234
Product : WHO1
Qty : 1
and then..
Name : Howard Williams
Company : PSL
Order Ref: 1234
Product : WHO2
Qty : 2