Bug 1060 : split( String, String ) with brackets in delimiter causes regex.PatternSyntaxException
Last modified: 2008-11-29 11:36




Status:
RESOLVED
Resolution:
FIXED
Priority:
P2
Severity:
normal

 

Reporter:
fjen
Assigned To:
REAS


Description:   Opened: 2008-11-25 10:42
String foo = "12.6 ( done )";
String[] fooArr = split( foo, " ( " );

will generate:

Exception in thread "Animation Thread" java.util.regex.PatternSyntaxException: Unclosed group near index 2
(
^

i know why it happens but it might be strange for other people since the reference does not
mention regular expressions.
Additional Comment #1 From fry 2008-11-25 10:47
split() uses regex, so you have to put escapes around things that are used
by regex (such as parens).

the correct syntax is:
String[] fooArr = split( foo, " \\( " );

we may just need to make a note of it in the reference.
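
For illustration, here is a minimal plain-Java sketch of the same behavior. It assumes split() hands the delimiter to the regex engine the way String.split() does; the class name EscapeDemo and the helper failsAsRegex are made up for this example and are not part of Processing.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // Hypothetical helper: reports whether a delimiter fails to compile as a regex.
    static boolean failsAsRegex(String delim) {
        try {
            Pattern.compile(delim);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String foo = "12.6 ( done )";

        // The unescaped paren opens a group that is never closed.
        System.out.println(failsAsRegex(" ( "));   // prints true

        // Escaping by hand, as in the comment above.
        String[] escaped = foo.split(" \\( ");
        System.out.println(Arrays.toString(escaped));   // prints [12.6, done )]

        // Pattern.quote() escapes the whole delimiter for you.
        String[] quoted = foo.split(Pattern.quote(" ( "));
        System.out.println(Arrays.toString(quoted));    // prints [12.6, done )]
    }
}
```

Pattern.quote() is a standard java.util.regex method, so it may be an easier thing to point people at than listing every character that needs a backslash.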
Additional Comment #2 From REAS 2008-11-28 15:08
Good point. I added this example:

String men = "Chernenko ] Andropov ] Brezhnev";
String[] list = split(men, " \\] ");
// list[0] is now Chernenko, list[1] is Andropov, ...

And this text:

This function uses regular expressions to determine how the <b>delim</b>
parameter divides the <b>str</b> parameter. Therefore, if you use
characters such as parentheses and brackets that are used with regular
expressions as a part of the <b>delim</b> parameter, you'll need to put two
backslashes (\\) in front of the character (see example above). You can
read more about <a
href="http://en.wikipedia.org/wiki/Regular_expression">regular
expressions</a> and <a
href="http://en.wikipedia.org/wiki/Escape_character">escape characters</a>
on Wikipedia.
Additional Comment #3 From fry 2008-11-29 06:17
I think this should just be added to the old text--the old text was correct
(and more commonly, the way that people will use split()) it's just that
the delimiter also parses regular expressions, which can be a positive or
negative.

Though on second thought, maybe it shouldn't be parsing regexps at all. The
more I think about it, this seems really confusing. And if people want
regexps, they can use String.split() instead.
Additional Comment #4 From fjen 2008-11-29 06:23
on one hand i think it's not that bad to have regexes but on the other hand it's really bad to
have to escape things. maybe we could have a matchReplace(String, String) instead and
split(String, String) being a wrapper around that that automatically escapes things?
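
A rough sketch of what such an auto-escaping wrapper could look like in plain Java. The name splitLiteral and the class LiteralSplit are hypothetical, and this only covers the split() half of the suggestion, not matchReplace().

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Hypothetical wrapper: quotes the delimiter so the regex engine
    // treats every character in it as literal text.
    static String[] splitLiteral(String str, String delim) {
        // The -1 limit keeps trailing empty strings instead of dropping them.
        return str.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        String foo = "12.6 ( done )";
        System.out.println(Arrays.toString(splitLiteral(foo, " ( ")));
        // prints [12.6, done )]

        String men = "Chernenko ] Andropov ] Brezhnev";
        System.out.println(splitLiteral(men, " ] ").length);
        // prints 3
    }
}
```

With this approach the caller never escapes anything, at the cost of still running the regex engine underneath.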
Additional Comment #5 From REAS 2008-11-29 07:24
(In reply to comment #3)
> I think this should just be added to the old text--the old text was correct
> (and more commonly, the way that people will use split()) it's just that
> the delimiter also parses regular expressions, which can be a positive or
> negative.

That's what I did, I added that paragraph to the end of the reference. Am I
missing your point?

I think the function works well "as is", but I don't use it often. I defer
to your info-vis knowledge.
Additional Comment #6 From fry 2008-11-29 11:36
After thinking about this more, I think we shouldn't do the regexp
thing--people can use String.split() if they want that. It's more in line
with split(char) and in the same family with splitTokens().

I've commented out the portion in the reference and written new code for
0136 (1.0.1).
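
For readers who want to see what a split without regular expressions can look like, here is a minimal sketch built on String.indexOf(). It is not the code that went into 0136; the class name PlainSplit and the handling of an empty delimiter are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlainSplit {
    // Splits str on every literal occurrence of delim, with no regex involved.
    static String[] split(String str, String delim) {
        // Assumption: an empty delimiter returns the input unchanged.
        if (delim.isEmpty()) {
            return new String[] { str };
        }
        List<String> pieces = new ArrayList<>();
        int start = 0;
        int index;
        while ((index = str.indexOf(delim, start)) != -1) {
            pieces.add(str.substring(start, index));
            start = index + delim.length();
        }
        // Whatever follows the last delimiter is the final piece.
        pieces.add(str.substring(start));
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String foo = "12.6 ( done )";
        System.out.println(Arrays.toString(split(foo, " ( ")));
        // prints [12.6, done )]
    }
}
```

Because the delimiter is only ever compared as literal text, parentheses and brackets need no escaping here.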