Find/Replace using Regexes in Java

2007.12.02 11:53
EDIT: String.replaceAll(String, String) is a better solution for this task.

I needed to strip markup from a string. For example, I needed to replace <i>this</i> with this.

This is a paraphrased version of the method I wrote (with help from colleagues) to complete the task.

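import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static final String stripMarkup(String content) {
    // Matches simple opening and closing tags without attributes, such as <i> and </i>.
    final Pattern tagPattern = Pattern.compile("</?\\w+>");
    final StringBuffer replacement = new StringBuffer();
    final Matcher matcher = tagPattern.matcher(content);
    while (matcher.find()) {
        // Copy the text before the match, then substitute an empty string for the tag.
        matcher.appendReplacement(replacement, "");
    }
    // Copy whatever follows the last match.
    matcher.appendTail(replacement);
    return replacement.toString();
}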
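
For comparison, here is a minimal sketch of the String.replaceAll approach mentioned in the EDIT note at the top. It reuses the same regex as the method above; the class name StripMarkupExample and the sample input are only for illustration.

```java
class StripMarkupExample {

    // Removes simple tags such as <i> and </i> in a single call.
    // String.replaceAll compiles the regex on every call, so for heavy use
    // a precompiled Pattern may still be preferable.
    static String stripMarkup(String content) {
        return content.replaceAll("</?\\w+>", "");
    }

    public static void main(String[] args) {
        // prints "replace this with this"
        System.out.println(stripMarkup("replace <i>this</i> with this"));
    }
}
```

Like the original method, this only matches tags without attributes, so an opening tag such as <a href="..."> is left in place.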
