Alessandro Lacava

on Designing and Developing Software. In love with Functional Programming.

How to Force One or More Metacharacters to Be Treated as Ordinary Characters In a Java Regular Expression (RegEx)

When using regular expressions (RegEx) in Java, you may need to treat one or more metacharacters as ordinary characters. As a reminder, the metacharacters in a Java RegEx are:

([{\^$|)?*+.

If you want to treat them as ordinary characters, you have two options:

  1. Escape the metacharacter with a backslash (\).
  2. Enclose the whole string that contains metacharacters within \Q and \E.

\Q means “quote all characters until \E”, while \E ends the quote.
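As a minimal sketch of the two options, the class below (QuotingOptions and the sample text are made up for illustration) builds two patterns that both match the literal text (1+1)*2?, which contains five metacharacters. Each backslash is doubled because a Java string literal needs \\ to produce a single \ in the pattern.

```java
import java.util.regex.Pattern;

class QuotingOptions {
    // Option 1: escape each metacharacter with a backslash.
    // The literal "\\(" becomes the two characters \( in the pattern.
    static final String ESCAPED = "\\(1\\+1\\)\\*2\\?";

    // Option 2: quote everything between \Q and \E.
    static final String QUOTED = "\\Q(1+1)*2?\\E";

    public static void main(String[] args) {
        String input = "(1+1)*2?";
        System.out.println(Pattern.matches(ESCAPED, input)); // true
        System.out.println(Pattern.matches(QUOTED, input));  // true
        // Left unescaped, the same text is a different regex
        // (a group of 1s, repeated, then an optional 2), so it does not match.
        System.out.println(Pattern.matches("(1+1)*2?", input)); // false
    }
}
```

With one metacharacter the escaped form stays readable; with several, the \Q...\E form is easier to get right.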

The following example should help clarify the subject:

String test = "I want to replace the . with the ,";
String replaced = test.replaceAll(".", ",");
System.out.println(replaced);

What do you expect the code above to do? Do you think it will print the following string?

I want to replace the , with the ,

If so, you might be surprised to find that what you actually get is:

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

The problem is that the replaceAll method of the String class accepts a regular expression as its first parameter. Since . matches any character, test.replaceAll(".", ","); means: “Replace EVERY character of the test string with a comma”. As I said previously, you can fix that in two ways: either escape the . with a backslash (\) or enclose it within \Q and \E. What I didn’t say is that the backslash is also the escape character in Java string literals, so you need to escape it too, writing \\ in your source code to put a single \ into the regular expression. :-)
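To see both points in isolation, here is a small sketch (the class and the countMatches helper are hypothetical, written only for this illustration) that counts how many times a pattern matches the test string and prints the length of the literal "\\.".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotMatchesAnything {
    // Counts how many separate matches the regex finds in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String test = "I want to replace the . with the ,";
        // An unescaped dot matches any character (line terminators excepted),
        // so it matches once per character: 34 times for this string.
        System.out.println(countMatches(".", test));   // 34
        // The escaped dot matches only the literal '.'.
        System.out.println(countMatches("\\.", test)); // 1
        // The Java literal "\\." is the two-character string \. seen by the regex engine.
        System.out.println("\\.".length());            // 2
    }
}
```

The 34 matches are exactly the 34 commas in the surprising output above.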

Translated into Java, this gives:

test.replaceAll("\\.", ",");      // escape the dot with a backslash
test.replaceAll("\\Q.\\E", ",");  // quote the dot between \Q and \E
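Putting it together, a runnable sketch of both fixes applied to the original test string (the class and method names are illustrative only). Note that strings are immutable, so replaceAll returns a new String that has to be stored or printed; the one-line snippets above discard it.

```java
class ReplaceLiteralDot {
    static final String TEST = "I want to replace the . with the ,";

    // Option 1: "\\." in Java source is \. in the regex: a literal dot.
    static String escapeWithBackslash(String s) {
        return s.replaceAll("\\.", ",");
    }

    // Option 2: "\\Q.\\E" in Java source is \Q.\E in the regex: a quoted dot.
    static String quoteWithQE(String s) {
        return s.replaceAll("\\Q.\\E", ",");
    }

    public static void main(String[] args) {
        // Both lines print: I want to replace the , with the ,
        System.out.println(escapeWithBackslash(TEST));
        System.out.println(quoteWithQE(TEST));
    }
}
```

When no regex features are needed at all, String.replace(CharSequence, CharSequence) is another option: test.replace(".", ",") replaces every occurrence literally without going through a regular expression.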

I prefer the first method when there is just one metacharacter. However, when there are several metacharacters, or when I don’t know at compile time what the string will be, I use the second method.
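For the run-time case, one caveat applies to building "\\Q" + target + "\\E" by hand: it breaks if the target itself contains \E. The standard library method Pattern.quote produces the \Q...\E form and handles that case. The sketch below uses it in a hypothetical replaceLiteral helper; it also uses Matcher.quoteReplacement, because $ and \ are special in the replacement argument of replaceAll as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAtRuntime {
    // Hypothetical helper: replaces every literal occurrence of target,
    // where target is only known at run time (e.g. user input).
    static String replaceLiteral(String input, String target, String replacement) {
        // Pattern.quote wraps target in \Q...\E and also copes with an
        // embedded \E, which a hand-built "\\Q" + target + "\\E" would not.
        String regex = Pattern.quote(target);
        // quoteReplacement escapes '$' and '\' in the replacement string.
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Several metacharacters in the target: ( . )
        System.out.println(replaceLiteral("cost: $5.00 (approx.)", "(approx.)", "[est.]"));
        // prints: cost: $5.00 [est.]

        // Target containing \E, replacement containing $
        System.out.println(replaceLiteral("a\\Eb", "\\E", "$"));
        // prints: a$b
    }
}
```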
