Automation Action: Tokenize
Tokenize any text into a list of words (tokens) and assign them to a variable as a comma-separated string.
Enter the text or HTML to tokenize. If the input is HTML, it is converted to plain text before tokenizing.
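The HTML handling described above can be sketched as follows. This is a minimal illustration, not the action's implementation: it assumes HTML is detected with a simple tag-like regex, uses Python's standard `html.parser` to keep only text nodes (skipping `<script>` and `<style>`), and applies a hypothetical token rule (runs of letters and digits, allowing inner apostrophes). The helper names `to_plain_text` and `tokenize` are illustrative only.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def to_plain_text(text: str) -> str:
    # Assumption: treat the input as HTML only if it contains something tag-like.
    if not re.search(r"</?[a-zA-Z][^>]*>", text):
        return text
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join with spaces so text from adjacent elements does not run together.
    return " ".join(parser.parts)


def tokenize(text: str) -> list[str]:
    # Hypothetical token rule: lowercase letters/digits, allowing inner apostrophes.
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


html = "<p>Hello <b>World</b>!</p><script>track()</script>"
print(",".join(tokenize(to_plain_text(html))))  # hello,world
```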
Options:
- Remove Common Words : Removes common words (e.g. and, the, a) from the token list.
- Remove Email Addresses & Urls : Removes email addresses and URLs from the token list.
- Include Numeric Tokens : Includes tokens that contain numbers or dates in the token list.
- Normalize : Expands common contractions (e.g. 'what's' to 'what is') and normalizes common abbreviations and informal spellings (e.g. 'hi' to 'hello', 'nov' to 'november', 'ur' to 'your', 'bday' to 'birthday', '2day' to 'today', 'plz' to 'please', 'thx' to 'thanks').
- Stem Words : Reduces words to their root form (English only). For example, 'ask', 'asking' and 'asked' all stem to 'ask'.
- Unique : Removes duplicate tokens from the token list.
- Include Count : Appends each token's frequency to the token (requires Unique).
- Sort By : Sorts the tokens by None, Frequency or Word (requires Unique).
- Top : Returns only the top x tokens when a sort order is selected (requires Unique).
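The options above can be approximated by a single pipeline. This is a sketch under stated assumptions, not the action's actual algorithm: the common-word list, the normalization map (built from the examples listed above), the email/URL regex, the order in which options are applied, and the `word:count` format used for Include Count are all hypothetical. `crude_stem` is a naive suffix stripper standing in for a real English stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; the action's real lists are not documented.
COMMON_WORDS = {"a", "an", "and", "the", "i", "you", "are", "is", "to", "of", "in", "it"}
NORMALIZE = {
    "what's": "what is", "hi": "hello", "nov": "november", "ur": "your",
    "bday": "birthday", "2day": "today", "plz": "please", "thx": "thanks",
}
# Assumed patterns: anything with an @ and a dot, or anything starting http(s):// or www.
EMAIL_OR_URL = re.compile(r"\S+@\S+\.\S+|https?://\S+|www\.\S+", re.IGNORECASE)


def crude_stem(word):
    # Naive suffix stripping, NOT a real stemmer (e.g. Porter); enough for ask/asking/asked.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, remove_common=False, remove_emails_urls=False,
             include_numeric=False, normalize=False, stem=False, unique=False,
             include_count=False, sort_by=None, top=None):
    if remove_emails_urls:
        text = EMAIL_OR_URL.sub(" ", text)
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
    if normalize:
        # An expansion may be several words ("what is"), so split it back into tokens.
        tokens = [part for t in tokens for part in NORMALIZE.get(t, t).split()]
    if not include_numeric:
        tokens = [t for t in tokens if not any(c.isdigit() for c in t)]
    if remove_common:
        tokens = [t for t in tokens if t not in COMMON_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    if not unique:
        # Include Count, Sort By and Top only apply when Unique is enabled.
        return ",".join(tokens)
    counts = Counter(tokens)  # keys keep first-seen order
    words = list(counts)
    if sort_by == "frequency":
        words.sort(key=lambda w: -counts[w])  # stable sort: ties keep first-seen order
    elif sort_by == "word":
        words.sort()
    if sort_by and top:
        words = words[:top]
    if include_count:
        # Hypothetical "word:count" format; the action's real format is not documented.
        words = [f"{w}:{counts[w]}" for w in words]
    return ",".join(words)


text = "Hi! I asked, you are asking... ask the team 2day, thx. Mail bob@example.com"
print(tokenize(text, remove_common=True, remove_emails_urls=True, normalize=True,
               stem=True, unique=True, include_count=True, sort_by="frequency", top=3))
# ask:3,hello:1,team:1
```

Normalization runs before the numeric filter in this sketch so that '2day' becomes 'today' instead of being dropped as a numeric token; the real action may apply the options in a different order.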
The tokens are returned as a comma-separated string, which can be assigned to a variable.
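Because the result is a single comma-separated string, a later step that needs individual tokens has to split it again. A minimal Python sketch, assuming the variable's value is available as a plain string; `parse_tokens` is a hypothetical helper, not part of the product:

```python
def parse_tokens(value: str) -> list[str]:
    """Split a comma-separated token string back into a list of tokens."""
    # Strip stray whitespace and drop empty entries, so "" gives [] rather than [""].
    return [token.strip() for token in value.split(",") if token.strip()]


print(parse_tokens("hello,world,today"))  # ['hello', 'world', 'today']
```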