Inside Technique : Easy Cross-Browser Form Validation Using Regular Expressions
By Karen Gayda

Introduction

Validating user input is the bane of every software developer’s existence. When you are developing cross-browser web applications (for IE and Netscape v4.0 or higher), this task becomes even less enjoyable because JavaScript lacks useful intrinsic validation functions. Fortunately, JavaScript 1.2 has incorporated regular expressions. In this article I will present a brief tutorial on the basics of regular expressions and then give some examples of how they can be used to simplify data validation. A demonstration page and a code library of common validation functions have been included to supplement the examples in the article.

Regular Expressions and Patterns

Regular expressions are very powerful tools for performing pattern matches. PERL programmers and UNIX shell programmers have enjoyed the benefits of regular expressions for years. Once you master the pattern language, most validation tasks become trivial. You can perform complex tasks that once required lengthy procedures with just a few lines of code using regular expressions.

So how are regular expressions implemented in JavaScript? There are two intrinsic objects associated with programming regular expressions: the RegExp object and the Regular Expression object. The RegExp object is the parent to the Regular Expression object. RegExp has a constructor function that instantiates a Regular Expression object, much as the Date constructor instantiates a new date. If you wanted to create a Regular Expression object, you would use the following syntax:

var RegularExpression = new RegExp("pattern", ["switch"]);

JavaScript has an alternate syntax for creating Regular Expression objects that implicitly calls the RegExp constructor function. The syntax for that method is the following:

var RegularExpression = /pattern/[switch]
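The two forms can be compared side by side. The following is a minimal sketch with illustrative variable names that are not part of the article's code library. One practical difference is worth noting: the constructor takes the pattern as an ordinary string, so each backslash has to be doubled, while the literal form does not need this.

```javascript
// Both objects describe the same pattern: one or more digits, case ignored.
var fromConstructor = new RegExp("\\d+", "i"); // backslash doubled inside a string
var fromLiteral = /\d+/i;                      // literal syntax, no doubling needed

// The source property holds the pattern text of each object.
console.log(fromConstructor.source === fromLiteral.source); // true
console.log(fromConstructor.test("Room 101"));              // true
console.log(fromLiteral.test("no digits here"));            // false
```

The constructor form is mainly useful when the pattern is built at run time from other strings; for a fixed pattern the literal form is usually easier to read.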

To use the Regular Expression object to validate the user input you must be able to define a pattern string that represents the search criteria. Patterns are defined using string literal characters and metacharacters. For example, to determine if a string contained a valid US zip code you would use the following search pattern:

/(^\d{5}$)|(^\d{5}-\d{4}$)/

At first glance this looks like a comic strip version of something I might say when my code won’t run. It is actually a pattern that you can use to confirm that a string contains a valid 5-digit zip code or zip+4 zip code. The pattern is divided into two parts. Regular expressions use parentheses for grouping and precedence like mathematical expressions. The part in the first set of parentheses matches a 5-digit zip code. The pipe symbol in between denotes an OR operation. The part contained in the second set of parentheses matches a zip+4 zip code.

For simplicity, let’s deconstruct just the first part of the pattern, ^\d{5}$.

  • ^ indicates the beginning of the string. Using a ^ metacharacter requires that the match start at the beginning.
  • \d indicates a digit character and the {5} following it means that there must be 5 consecutive digit characters.
  • $ indicates the end of the string. Using a $ metacharacter requires that the match end at the end of the string.

Translated to English, this pattern states: “Starting at the beginning of the string there must be nothing other than 5 digits. There must also be nothing following those 5 digits.”
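As a sketch of how this pattern might be applied, the hypothetical helper below wraps it in a function. The name isValidZip is illustrative and is not taken from the code library. It relies on the test method of the regular expression object, which returns true when the string matches the pattern.

```javascript
// Hypothetical helper: true for a 5-digit zip code or a zip+4 code.
function isValidZip(value) {
  var zipPattern = /(^\d{5}$)|(^\d{5}-\d{4}$)/;
  return zipPattern.test(value);
}

console.log(isValidZip("90210"));      // true
console.log(isValidZip("90210-1234")); // true
console.log(isValidZip("9021"));       // false: only 4 digits
console.log(isValidZip("90210-12"));   // false: incomplete +4 part
console.log(isValidZip("x90210"));     // false: ^ rejects the leading "x"
```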

Categories Of Regular Expression Pattern Characters

Pattern-matching characters can be grouped into several categories. The following are categorized tables explaining the use of the pattern-matching characters.

Position Matching

Symbol Function
^

Only matches the beginning of a string.

"^P" matches first "P" in "Paul Peterson, President."

$

Only matches the ending of a string.

"t$" matches the last "t" in "A cat in the hat"

\b

Matches any word boundary (test characters must exist at the beginning or end of a word within the string)

"ly\b" matches "ly" in "regular expressions are really cool."

\B

Matches any non-word boundary

“\Bor” matches the “or” in normal but not the one in origami.
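The position-matching examples above can be tried directly. This is a small sketch that assumes the test method of the regular expression object and the search method of strings, which returns the index of the first match.

```javascript
// ^ anchors the match to the start of the string.
console.log(/^P/.test("Paul Peterson, President.")); // true
console.log(/^P/.test("Mr. Paul Peterson"));         // false

// $ anchors the match to the end of the string.
console.log("A cat in the hat".search(/t$/));        // 15, the final "t"

// \b requires a word boundary; \B requires that there be none.
console.log(/ly\b/.test("really cool")); // true: "ly" ends the word "really"
console.log(/\Bor/.test("normal"));      // true: "or" sits inside the word
console.log(/\Bor/.test("origami"));     // false: "or" starts the word
```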

Literals

Symbol Function
Alphanumeric

Matches alphabetical and numerical characters literally.

“2 days” matches “2 days”

\n

Matches a new line character

\f

Matches a form feed character

\r

Matches carriage return character

\t

Matches a horizontal tab character

\v

Matches a vertical tab character

\?

Matches ?

\*

Matches *

\+

Matches +

\.

Matches .

\|

Matches |

\{

Matches {

\}

Matches }

\\

Matches \

\[

Matches [

\]

Matches ]

\(

Matches (

\)

Matches )

\xxx

Matches the ASCII character expressed by the octal number xxx.

"\50" matches the left parenthesis character "("

\xdd

Matches the ASCII character expressed by the hex number dd.

"\x28" matches the left parenthesis character "("

\uxxxx

Matches the Unicode character expressed by the four-digit hex number xxxx.

"\u00A3" matches "£".

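Forgetting to escape a metacharacter is a common source of validation bugs, so a short illustration may help. The sketch below uses a made-up price value to contrast an unescaped period, which is a metacharacter, with an escaped one, and then shows the hex and Unicode escapes from the table.

```javascript
// An unescaped "." is a metacharacter, so this pattern is too permissive.
console.log(/^3.50$/.test("3x50"));  // true

// Escaping the period makes it match only a literal ".".
console.log(/^3\.50$/.test("3x50")); // false
console.log(/^3\.50$/.test("3.50")); // true

// Hex and Unicode escapes name a character by its code.
console.log(/\x28/.test("(555)"));   // true: \x28 is "("
console.log(/\u00A3/.test("£20"));   // true: \u00A3 is the pound sign
```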
Character Classes

Symbol Function
[xyz]

Match any one character enclosed in the character set.

"/[AN]BC/" matches ABC and NBC but not BBC since the leading “B” is not in the set.

[^xyz]

Match any one character not enclosed in the character set. The caret indicates that none of the characters in the set may match.

"/[^AN]BC/" matches BBC but not ABC or NBC.

NOTE: the caret used within a character class is not to be confused with the caret that denotes the beginning of a string. Negation is only performed within the square brackets.

.

Match any single character except a newline.

"/b.t/" matches bat, bit, but, bet.

\w

Match any single word (non-punctuation or non-whitespace) character. Equivalent to [a-zA-Z_0-9].

\W

Match any single non-word character. Equivalent to [^a-zA-Z_0-9].

\d

Match any single digit. Equivalent to [0-9].

\D

Match any non-digit. Equivalent to [^0-9].

\s

Match any single space character. Equivalent to [ \t\r\n\v\f].

\S

Match any single non-space character. Equivalent to [^ \t\r\n\v\f].
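A few of the character classes above, sketched in code. The input strings are arbitrary examples chosen for illustration.

```javascript
// A set matches one character from the list; a leading caret negates the set.
console.log(/[AN]BC/.test("NBC"));  // true
console.log(/[AN]BC/.test("BBC"));  // false
console.log(/[^AN]BC/.test("BBC")); // true
console.log(/[^AN]BC/.test("ABC")); // false

// \D finds any non-digit, which is handy for rejecting numeric fields.
console.log(/\D/.test("12345"));    // false: no non-digit found
console.log(/\D/.test("12a45"));    // true: "a" is a non-digit

// \S finds any non-space character, so it can detect a blank field.
console.log(/\S/.test("   "));      // false: the string is all spaces
console.log(/\S/.test("  x "));     // true
```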

Repetition

Symbol Function
{x}

Match exactly x occurrences of a regular expression.

"\d{5}" matches 5 digits.

{x,}

Match x or more occurrences of a regular expression.

"\s{2,}" matches at least 2 space characters.

{x,y}

Match x to y occurrences of a regular expression.

"\d{2,3}" matches at least 2 but no more than 3 digits.

?

Match zero or one occurrences. Equivalent to {0,1}.

"a\s?b" matches "ab" or "a b".

*

Match zero or more occurrences. Equivalent to {0,}.

+

Match one or more occurrences. Equivalent to {1,}.
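The repetition symbols are easiest to compare when the pattern is anchored at both ends, so that the counts apply to the whole string. In this sketch the user name rule (3 to 8 word characters) is a made-up requirement used only for illustration.

```javascript
// {x,y}: a hypothetical rule that a user name has 3 to 8 word characters.
var userName = /^\w{3,8}$/;
console.log(userName.test("karen"));         // true
console.log(userName.test("kg"));            // false: too short
console.log(userName.test("averylongname")); // false: too long

// ?: the space between the letters is optional, but only one is allowed.
console.log(/^a\s?b$/.test("ab"));   // true
console.log(/^a\s?b$/.test("a b"));  // true
console.log(/^a\s?b$/.test("a  b")); // false: two spaces

// + requires at least one occurrence; * also accepts none.
console.log(/^\d+$/.test(""));       // false
console.log(/^\d*$/.test(""));       // true
```

The last two lines show why + is usually the safer choice for a required field: a pattern built with * also accepts an empty string.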

Alternation & Grouping

Symbol Function
()

Groups characters into a clause that is treated as a single unit. May be nested.

"(abc)+(def)" matches one or more occurrences of "abc" followed by one occurrence of "def".

|

Alternation combines clauses into one regular expression and then matches any of the individual clauses.

"(ab)|(cd)|(ef)" matches "ab" or "cd" or "ef".

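Grouping and alternation interact with the position anchors in a way that is easy to get wrong, so a brief sketch may be useful. The list of allowed answers below is a made-up example.

```javascript
// Grouping lets a repetition symbol apply to a whole clause.
console.log(/^(abc)+(def)$/.test("abcabcdef")); // true
console.log(/^(abc)+(def)$/.test("def"));       // false: needs at least one "abc"

// Alternation inside a group: the whole string must be one of three words.
var answer = /^(yes|no|maybe)$/;
console.log(answer.test("no"));        // true
console.log(answer.test("yesterday")); // false

// Without the parentheses, ^ binds only to "yes" and $ only to "no".
var ungrouped = /^yes|no$/;
console.log(ungrouped.test("yesterday")); // true: the string starts with "yes"
```

Because the pipe has the lowest precedence of the pattern symbols, wrapping the alternatives in parentheses is what makes the anchors apply to every alternative.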
Backreferences

Symbol Function
()\n

Matches the same text that was matched by an earlier parenthesized clause in the pattern string. n is the number of the clause, counting sets of parentheses from the left.

"(\w+)\s+\1" matches any word that occurs twice in a row, such as "hubba hubba." The \1 denotes that the text after the space must be identical to the portion of the string matched by the first set of parentheses. If there were more than one set of parentheses in the pattern string, you would use \2 or \3 to refer to the appropriate grouping to the left of the backreference. Up to 9 backreferences can be used in a pattern string.

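The doubled-word pattern can be sketched as follows. The second pattern, with \b added at both ends, is a variation introduced here for illustration and is not part of the original example. It avoids a false match when the end of one word happens to equal the next word.

```javascript
// The pattern from the table: a run of word characters, spaces, the same run again.
var doubled = /(\w+)\s+\1/;
console.log(doubled.test("hubba hubba")); // true
console.log(doubled.test("hubba bubba")); // false

// Caveat: the group may match only the tail of a word.
console.log(doubled.test("this is"));     // true: "is" (from "this"), space, "is"

// Variation (an assumption, not from the article): require whole words with \b.
var doubledWord = /\b(\w+)\s+\1\b/;
console.log(doubledWord.test("this is"));     // false
console.log(doubledWord.test("hubba hubba")); // true
```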
Pattern Switches

In addition to the pattern-matching characters, you can use switches to make the match global or case-insensitive or both. The following is an example of a pattern string definition that uses a switch:

/\s/g

This pattern and switch combination matches every whitespace character in a string, not just the first one, because it uses the global switch. Below is a table of pattern switches.

Switches

Property Description
i

Ignore the case of characters.

g

Global search for all occurrences of a pattern

gi

Global search, ignore case.
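The effect of each switch is easiest to see with the string methods replace and match, which accept a regular expression. This is a minimal sketch that uses arbitrary sample strings.

```javascript
// Without g, replace changes only the first match; with g, every match.
console.log("a b c".replace(/\s/, ""));  // "ab c"
console.log("a b c".replace(/\s/g, "")); // "abc"

// i makes the comparison ignore case.
console.log(/^yes$/.test("YES"));        // false
console.log(/^yes$/i.test("YES"));       // true

// gi combines both: match returns every occurrence, in any case.
var found = "Cat cat CAT".match(/cat/gi);
console.log(found.length);               // 3
```

Stripping whitespace with the g switch before validating is a common first step, since users often type stray spaces into form fields.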

