Wednesday, March 06, 2013

 

.NET Regex: Character Class Subtraction

You’ve heard of a positive match character group [] and a negative match character group [^]. But did you know there is also character class subtraction? I didn’t. .NET supports it, but most other regex flavours do not.

A character class subtraction expression has the following form:

          [base_group - [excluded_group]]

The square brackets ([]) and hyphen (-) are mandatory. The base_group is a positive or negative character group as described in the Character Class Syntax table. The excluded_group component is another positive or negative character group, or another character class subtraction expression (that is, you can nest character class subtraction expressions).

For example, suppose you have a base group that consists of the character range from "a" through "z". To define the set of characters that consists of the base group except for the character "m", use [a-z-[m]]. To define the set of characters that consists of the base group except for the set of characters "d", "j", and "p", use [a-z-[djp]]. To define the set of characters that consists of the base group except for the character range from "m" through "p", use [a-z-[m-p]].
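To see those examples in action, here is a minimal C# sketch. The class name SubtractionDemo and the helper Keep are my own names, made up for illustration; only Regex.Matches comes from .NET. It runs the alphabet through each character class and prints the characters that survive, with one nested subtraction added at the end.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class SubtractionDemo
{
    // Hypothetical helper: returns the characters of input that the
    // given character class matches, in their original order.
    public static string Keep(string characterClass, string input)
    {
        var kept = new StringBuilder();
        foreach (Match m in Regex.Matches(input, characterClass))
        {
            kept.Append(m.Value);
        }
        return kept.ToString();
    }

    static void Main()
    {
        const string alphabet = "abcdefghijklmnopqrstuvwxyz";

        Console.WriteLine(Keep("[a-z-[m]]", alphabet));         // abcdefghijklnopqrstuvwxyz
        Console.WriteLine(Keep("[a-z-[djp]]", alphabet));       // abcefghiklmnoqrstuvwxyz
        Console.WriteLine(Keep("[a-z-[m-p]]", alphabet));       // abcdefghijklqrstuvwxyz

        // Nested: remove d-w from a-z, but first put m-o back into play.
        Console.WriteLine(Keep("[a-z-[d-w-[m-o]]]", alphabet)); // abcmnoxyz
    }
}
```

In the nested case the inner subtraction is evaluated first, so the excluded group is "d through w, except m through o", which is why m, n, and o survive.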

Using this format, the pattern

^[\w-[v123]]$

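can be used in .NET to match a string made up of exactly one word character (\w covers letters, digits, and the underscore) that is not the letter v or one of the digits 1, 2, or 3. To match a whole string of such characters rather than a single one, add a quantifier: ^[\w-[v123]]+$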
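A quick way to check what the pattern accepts is to run it over a few inputs. This is only a sketch: the class name WordSubtractionDemo, the constant names, and the sample strings are mine, and the second pattern (with a + quantifier) is a variant I added to show matching a whole string instead of a single character.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordSubtractionDemo
{
    // The pattern from the post: exactly one word character, minus v, 1, 2, 3.
    public const string SingleChar = @"^[\w-[v123]]$";

    // Assumed variant: one or more such characters.
    public const string WholeString = @"^[\w-[v123]]+$";

    static void Main()
    {
        foreach (var input in new[] { "a", "v", "V", "1", "4", "_", "ab" })
        {
            Console.WriteLine("{0,-3} {1}", input, Regex.IsMatch(input, SingleChar));
        }

        foreach (var input in new[] { "hello", "have", "abc456", "abc123" })
        {
            Console.WriteLine("{0,-7} {1}", input, Regex.IsMatch(input, WholeString));
        }
    }
}
```

Note that the match is case-sensitive by default, so an uppercase V is still accepted, and that the underscore counts as a word character.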


The MSDN page for the .NET Regex definitions doesn’t seem to rank high in search results, so I’m bookmarking it here for future reference: Character Classes in Regular Expressions


This is useful for comparing Regex capabilities in different languages: regex flavor comparison chart



    
