Automation Action: Tokenize

Tokenize any text and assign the comma-separated tokens to a variable.

The Tokenize automation action returns a comma-separated list of tokens (words) for any text. The tokens can then be used in later steps of your automation workflow.

Enter the text or HTML to tokenize. HTML is converted to plain text before it is tokenized.
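To make the HTML step concrete, here is a minimal Python sketch of an HTML to plain text conversion. It is an illustration only, not the action's own converter: the helper name `html_to_text` is hypothetical, and skipping the contents of script and style elements is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip tags and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())


print(html_to_text("<p>Hello <b>world</b></p><script>x=1</script>"))
```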

Options:

  • Remove Common Words : Removes all common words ('and', 'the', 'a' etc.) from the tokens list.
  • Remove Email Addresses & URLs : Removes any email addresses and URLs from the tokens list.
  • Include Numeric Tokens : Includes tokens containing numbers and dates in the tokens list.
  • Normalize : Expands common contractions (e.g. 'what's' to 'what is') and common abbreviations (e.g. 'hi' to 'hello', 'nov' to 'november', 'ur' to 'your', 'bday' to 'birthday', '2day' to 'today', 'plz' to 'please', 'thx' to 'thanks' etc.)
  • Stem Words : Reduces words to their root form (English only). For example, the words 'ask', 'asking' and 'asked' all stem to 'ask'.
  • Unique : Removes duplicates from the tokens list.
  • Include Count : Appends the frequency to each token (only if Unique is enabled).
  • Sort By : Sorts the tokens by None, Frequency or Word (only if Unique is enabled).
  • Top : Returns only the top x tokens when the list is sorted (only if Unique is enabled).
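
The way several of these options combine can be sketched in Python. This is a rough approximation under stated assumptions, not the action's actual implementation: the function name `tokenize`, its parameters, the tiny `COMMON_WORDS` list and the `word:count` output format are all hypothetical, and numeric tokens are simply dropped, as if Include Numeric Tokens were off.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); the real list is not documented here.
COMMON_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def tokenize(text, remove_common=False, unique=False,
             include_count=False, sort_by=None, top=None):
    """Return a comma-separated token string (illustrative sketch only)."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_common:
        tokens = [t for t in tokens if t not in COMMON_WORDS]
    if not unique:
        # Count, sort and top only apply when Unique is enabled.
        return ",".join(tokens)

    items = list(Counter(tokens).items())  # (word, count), first-seen order
    if sort_by == "frequency":
        items.sort(key=lambda kv: -kv[1])  # stable: ties keep first-seen order
    elif sort_by == "word":
        items.sort(key=lambda kv: kv[0])
    if top is not None and sort_by is not None:
        items = items[:top]
    if include_count:
        return ",".join(f"{word}:{count}" for word, count in items)  # format is an assumption
    return ",".join(word for word, _ in items)


print(tokenize("The cat and the dog chased the cat",
               remove_common=True, unique=True,
               include_count=True, sort_by="frequency", top=2))
# cat:2,dog:1
```

Note how Include Count, Sort By and Top have no effect in the sketch unless `unique` is set, mirroring the "only if Unique is enabled" notes above.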

The tokens can be assigned to a variable. They are returned as a single comma-separated string.
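
If a later step passes the variable to a script, the string can be split back into a list. The following Python sketch assumes the value is a plain comma-separated list without counts; `parse_tokens` is a hypothetical helper name.

```python
def parse_tokens(value):
    """Split a comma-separated token string into a list, dropping empty entries."""
    return [token.strip() for token in value.split(",") if token.strip()]


print(parse_tokens("invoice,payment,overdue"))
# ['invoice', 'payment', 'overdue']
```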