Automation Action: Tokenize

Tokenize any text and assign the comma-separated tokens to a variable.

Built-In Action

Gets a list of comma-separated tokens (words) for any text.

Enter the Text/HTML to tokenize. If the text is HTML, it is converted to plain text before tokenizing.
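
The HTML-to-plain-text step can be pictured with a short Python sketch. This is only an illustration under assumptions: the action's actual conversion rules are not documented here, and the names `html_to_text` and `_TextExtractor` are hypothetical. The sketch keeps text nodes, skips script and style content, and collapses whitespace.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Plain text passes through unchanged apart from whitespace collapsing, so the same helper could be applied whether or not the input is HTML.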

Options:

  • Remove Common Words : Removes all common words (and, the, a, etc.) from the tokens list.
  • Remove Email Addresses & Urls : Removes any email addresses and URLs from the tokens list.
  • Include Numeric Tokens : Includes tokens containing numbers and dates in the tokens list.
  • Normalize : Normalizes common contractions (e.g. 'what's' to 'what is') and common abbreviations (e.g. 'hi' to 'hello', 'nov' to 'november', 'ur' to 'your', 'bday' to 'birthday', '2day' to 'today', 'plz' to 'please', 'thx' to 'thanks', etc.).
  • Stem Words : Reduces words to their root form (English only). For example, the words 'ask', 'asking' and 'asked' would all stem to 'ask'.
  • Unique : Removes duplicates from the tokens list.
  • Include Count : Appends the frequency to each token (requires Unique to be enabled).
  • Sort By : Sorts the tokens by frequency or by word, or leaves them unsorted (None). Requires Unique to be enabled.
  • Top : Returns only the top x words when the list is sorted (requires Unique to be enabled).
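
To show how several of these options interact, here is a minimal Python sketch. It is not the action's implementation: the function name `tokenize`, its parameter names, the tiny `COMMON_WORDS` list and the `word:count` output format are all assumptions made for illustration. Normalize, Stem Words and email/URL removal are left out to keep the example short.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the action's real list is not documented here.
COMMON_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def tokenize(text, remove_common=False, include_numeric=False, unique=False,
             include_count=False, sort_by=None, top=None):
    """Return a comma-separated token string (illustrative sketch only)."""
    # Lower-case words, optionally with one internal apostrophe part (e.g. "what's").
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    if not include_numeric:
        tokens = [t for t in tokens if not any(c.isdigit() for c in t)]
    if remove_common:
        tokens = [t for t in tokens if t not in COMMON_WORDS]
    if unique:
        # Counter preserves first-seen order, so "None" sorting keeps document order.
        items = list(Counter(tokens).items())
        if sort_by == "frequency":
            items.sort(key=lambda kv: -kv[1])  # most frequent first, ties keep order
        elif sort_by == "word":
            items.sort(key=lambda kv: kv[0])   # alphabetical
        if top is not None and sort_by is not None:
            items = items[:top]
        if include_count:
            # The "word:count" format is an assumption, not the documented output.
            tokens = [f"{word}:{count}" for word, count in items]
        else:
            tokens = [word for word, _ in items]
    return ",".join(tokens)
```

In this sketch, Include Count, Sort By and Top only take effect inside the `unique` branch, which mirrors the "requires Unique" notes in the list above.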

The tokens can be assigned to a variable. They are returned as a single comma-separated string.
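
If the variable is later consumed by a script, the comma-separated string has to be split back into individual tokens. The Python sketch below shows one way to do that; `parse_tokens` is a hypothetical helper name, and the handling of an empty value is an assumption about what a caller would want.

```python
def parse_tokens(value):
    """Split a comma-separated token string into a list of tokens.

    Empty entries are dropped, so an empty string yields an empty list
    rather than [""] (which is what a plain str.split(",") would give).
    """
    return [token.strip() for token in value.split(",") if token.strip()]
```

For example, `parse_tokens("ask,cat,dog")` gives a three-item list, while `parse_tokens("")` gives an empty list.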