Tips and Tricks Strings
Björn Lindqvist edited this page May 27, 2016 · 2 revisions
Sequence handling tips and tricks:
Splitting a string on whitespace is an incredibly common task, but how to do it in Factor might not be obvious. Python has a handy split() method on strings that you can use:
u"foo \u00a0bar\u205Fmeh".split()
[u'foo', u'bar', u'meh']
Factor has a split word, found in the splitting vocabulary, that seems similar. However, it requires you to specify which delimiters to split on, so you might use it like this:
IN: "foo \u0000a0bar\u00205Fmeh" " \n\t" split harvest .
{ "foo" " bar meh" }
As the output shows, only the ordinary space was treated as a delimiter. To make this work properly you would need to list every possible whitespace character to split on, which is tedious because the Unicode standard defines so many of them. It is better to use blank? from unicode.categories:
IN: "foo \u0000a0bar\u00205Fmeh" [ blank? ] split-when harvest .
{ "foo" "bar" "meh" }
Factor doesn't have default function arguments, which is why this is longer than the Python equivalent. You might also use the split word from the pcre vocab, which offers a succinct alternative:
IN: QUALIFIED: pcre
IN: "foo \u0000a0bar\u00205Fmeh" "\\s" pcre:split .
{ "foo" "bar" "meh" }
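If you split on whitespace often, the [ blank? ] split-when harvest phrase shown earlier can be wrapped in a word of your own, which gives you roughly the convenience of Python's argument-less split(). The following is a minimal sketch meant to be pasted into the listener. The name split-blank is hypothetical and not part of Factor's standard library, and the USING: line assumes blank? lives in unicode.categories as described above; depending on your Factor version it may be found in a different vocabulary.

```factor
USING: prettyprint sequences splitting unicode.categories ;

! Hypothetical helper, not part of the standard library:
! split a string wherever blank? is true (Unicode-aware whitespace)
! and drop the empty pieces that adjacent blanks leave behind.
: split-blank ( str -- seq )
    [ blank? ] split-when harvest ;

! Same input as in the examples above.
"foo \u0000a0bar\u00205Fmeh" split-blank .
```

Since the word is just the earlier phrase given a name, it should print the same result as the split-when example above. Keeping harvest inside the definition means callers never see the empty strings produced by runs of consecutive whitespace.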