Regular Expressions In PHP

The PHP programming language supports two types of regular expressions: POSIX Extended and Perl Compatible (PCRE). The Perl Compatible ones are, as far as is possible, compatible with Perl 5 regex syntax. PCRE regexes support a range of features that the POSIX Extended ones do not, are safe for use with binary data and are often faster. The POSIX Extended regexes are supported by a range of functions whose names start with "ereg". The PCRE ones are supported by a range of functions with the prefix "preg_". Examples of both are given throughout this tutorial.

Matching A String Against A Pattern

To match a pattern against a string using POSIX Extended regexes, use the "ereg" function. The first parameter is a string (or string literal) containing the pattern; the second parameter is the string we're checking against the pattern. The function returns true if there is a match, and false otherwise.
$mypets = "2 cats 1 monkey";
ereg("monkeys?", $mypets); // Returns true
ereg("lizards?", $mypets); // Returns false
To match a pattern against a string using Perl Compatible regexes, use "preg_match" or "preg_match_all". The first parameter is the pattern and the second is the string to match the pattern against. Both return the number of times the pattern matched; however, "preg_match" stops after the first match, so its return value will only ever be 1 (matched) or 0 (didn't match). Note that the pattern must be surrounded by a suitable delimiter, e.g. "/pattern/"; most non-alphanumeric characters other than the backslash are allowed, and as in Perl you can use a matched pair of brackets, such as "<pattern>".
// String we'll do the examples on.
$friends = 'Fred Jack Sam Francis';
// Match any name beginning with an F.
$numMatches = preg_match("/\bF\w+\b/", $friends);
echo $numMatches; // Prints 1
$numMatches = preg_match_all("/\bF\w+\b/", $friends);
echo $numMatches; // Prints 2
// Any friends whose name starts with T?
$numMatches = preg_match("/\bT\w+\b/", $friends);
echo $numMatches; // Prints 0 - i.e. no matches.
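The choice of delimiter matters most when the pattern itself contains slashes. The following is a minimal sketch of the same pattern written with three different delimiters; the $path string is made up for illustration and is not part of the examples above.

```php
<?php
// Assumed sample data for illustration only.
$path = '/usr/local/bin/php';

// With slash delimiters, every literal slash must be escaped.
$withSlashes = preg_match('/^\/usr\/local\//', $path);

// With # as the delimiter, the slashes need no escaping.
$withHashes = preg_match('#^/usr/local/#', $path);

// A matched pair of brackets also works as a delimiter.
$withBraces = preg_match('{^/usr/local/}', $path);

echo $withSlashes, $withHashes, $withBraces, "\n"; // Prints 111
```

All three calls test for the same thing; the second and third are simply easier to read.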


Extracting Matches

The POSIX Extended match functions take an optional third parameter that will become an array containing the string that matched and any captures, provided the pattern matches. The first capture will be the 1st element of the array, the second capture the 2nd element, etc. The 0th element is the entire substring that the pattern matched.
$mypets = "2 cats 1 monkey";
// POSIX Extended has no \d or \s shorthands, so bracket expressions are used.
ereg("([0-9]+)[[:space:]]+monkeys?", $mypets, $matches); // Returns true
echo $matches[0]; // Prints 1 monkey
echo $matches[1]; // Prints 1
The Perl Compatible regex functions also take a third parameter. Again, the 0th element is the entire substring matched, the 1st element is the first capture, etc. When using preg_match, extracting the matches is no different from the ereg functions.
// String we'll do the examples on.
$friends = 'Fred Jack Sam Francis';
// Extract the first name beginning with an F.
$numMatches = preg_match("/\b(F\w+)\b/", $friends, $matches);
echo $matches[1]; // Prints Fred
When using preg_match_all, all of the matches are returned in an array of arrays, which you may like to think of as a two dimensional array. The 0th element in the matches array this time is an array that contains all of the substrings that match; the 1st element is an array that contains all of the first captures from each of those substrings, and so on.
// String we'll do the examples on.
$friends = 'Fred Jack Sam Francis';
// Extract all names beginning with an F. This code prints:
// Fred
// Francis
$numMatches = preg_match_all("/\b(F\w+)\b/", $friends, $matches);
for ($i = 0; $i < $numMatches; $i++) {
    echo $matches[1][$i] . "\n";
}
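To make the two dimensional layout more concrete, here is a small sketch with two captures per match. The $people string and its "name:age" format are invented for this illustration.

```php
<?php
// Assumed sample data: "name:age" pairs, made up for this sketch.
$people = 'Fred:31 Jack:27 Francis:45';

// Two captures: the name and the age.
$numMatches = preg_match_all('/\b(F\w+):(\d+)/', $people, $matches);

// $matches[0] holds the full matches, $matches[1] the first captures
// and $matches[2] the second captures.
for ($i = 0; $i < $numMatches; $i++) {
    echo $matches[1][$i], ' is ', $matches[2][$i], "\n";
}
// Prints:
// Fred is 31
// Francis is 45
```

The first index selects the capture and the second index selects which match it came from.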


Modifiers

Modifiers are extra parameters that we give the regex engine to tell it how we want it to carry out the match. POSIX Extended regexes don't have modifiers; however, case insensitive variants of the ereg functions are available, namely "eregi", "eregi_replace" and "spliti".
$language = "php";
ereg("PHP", $language); // returns false
eregi("PHP", $language); // returns true
The Perl Compatible regex functions support modifiers in the same way Perl does - by adding letters after the trailing / of the regex. Some of these include:
  • i for doing a case insensitive match.
  • x for making whitespace in the pattern not count as part of the pattern; you can spread the pattern over multiple lines and even add comments. Note that whitespace is still significant inside character classes.
  • s makes the . metacharacter match newline characters; it doesn't by default.
  • m makes ^ and $ match the start and end of a line rather than the start and end of a string.
Here are some examples.
// The difference made by the i modifier.
$language = "php";
preg_match("/PHP/", $language); // returns 0
preg_match("/PHP/i", $language); // returns 1
// We can space stuff out.
preg_match("/P H P/ix", $language); // returns 1 - spaces don't matter
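The s, m and x modifiers from the list above can be sketched in the same way. The two-line $text string below is an assumption made for this illustration.

```php
<?php
// Assumed sample text spanning two lines.
$text = "first line\nsecond line";

// Without s, . does not match the newline, so this fails.
$plain = preg_match('/first.*second/', $text);
// With s, . matches the newline too.
$dotAll = preg_match('/first.*second/s', $text);

// Without m, ^ only matches at the very start of the string.
$startOnly = preg_match('/^second/', $text);
// With m, ^ also matches just after each newline.
$multiLine = preg_match('/^second/m', $text);

// With x, the pattern can be laid out over several lines and commented.
$commented = preg_match('/
    ^ (\w+)   # a word at the start of a line
    \s+ line  # followed by whitespace and the word "line"
    $         # at the end of that line
/mx', $text, $matches);

echo $plain, $dotAll, $startOnly, $multiLine, $commented, "\n"; // Prints 01011
echo $matches[1], "\n"; // Prints first
```

Take care not to use the delimiter character inside an x-mode comment, since it would end the pattern early.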


Substitutions

The ereg_replace and case insensitive "eregi_replace" functions do substitutions for POSIX Extended regexes. The first parameter is the pattern to match, the second is what to replace the matched substring with, and the third is the string that we're doing the match on. Unlike "ereg", they do not return true or false, but rather return the modified string. The original string is left unchanged. If the pattern doesn't match, the unmodified string is returned; if you need to know if anything was substituted then you should keep a copy of the old string around to compare with the potentially modified one. It is possible to substitute captures from the pattern into the replacement using \\n notation, e.g. for the first capture, write \\1, for the second write \\2, etc. \\0 stores the entire string matched. Here are some examples.
// A simple text replacement.
$request = "I want an ant.";
// POSIX Extended has no \b, so we match the space before the word instead.
$newrequest = ereg_replace(" ant", " elephant", $request);
echo $request; // $request unchanged - prints I want an ant.
echo $newrequest; // Modified - prints I want an elephant.
// If you don't need to keep the unmodified string...
$request = "I want an ant.";
$request = ereg_replace(" ant", " elephant", $request);
echo $request; // Prints I want an elephant.
// Using matches in the replacement.
$nofood = "I have 2 thin cows.";
$withfood = ereg_replace(" ([0-9]+) thin ([[:alnum:]]+)", " \\1 fat \\2", $nofood);
echo $withfood; // prints I have 2 fat cows.
The Perl Compatible regex function for doing substitutions is preg_replace. Like ereg_replace, the first parameter is the pattern to find (remember to put it between delimiters, e.g. slashes), the second is what to replace it with (no slashes needed here) and the third is the string that we want to substitute our replacements into. As with ereg_replace, the original string is not touched and a new string with the replacements is returned.
// A simple text replacement.
$request = "I want an ant.";
$newrequest = preg_replace("/\b ant/x", "elephant", $request);
echo $request; // $request unchanged - prints I want an ant.
echo $newrequest; // Modified - prints I want an elephant.
// If you don't need to keep the unmodified string...
$request = "I want an ant.";
$request = preg_replace("/\b ant/x", "elephant", $request);
echo $request; // Prints I want an elephant.
Using the matches in the replacement is similar again - the \\n notation can be used. There is another notation, $n, which can also be written ${n}. This is useful if you have a literal number that you want to put in your replacement that immediately follows a capture.
// Using matches in the replacement.
$nofood = "I have 2 thin cows.";
$withfood = preg_replace("/ (\d+) thin (\w+)/", " \\1 fat \\2", $nofood);
echo $withfood; // prints I have 2 fat cows.
// This regex not only makes cows fat, but makes more of them!
$nomagic = "I have 2 thin cows.";
// Single quotes stop PHP from treating ${1} as a variable.
$withmagic = preg_replace('/ (\d+) thin (\w+)/', ' ${1}0 fat ${2}', $nomagic);
echo $withmagic; // prints I have 20 fat cows.
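With preg_replace there is an alternative to comparing the old and new strings when you need to know whether anything was substituted: the function accepts an optional fourth parameter (a limit on the number of replacements) and an optional fifth parameter that receives the number of replacements made. The sketch below uses a made-up sentence to show this.

```php
<?php
// Assumed sample sentence for illustration only.
$request = 'I want an ant, and another ant.';

// -1 means "no limit"; $count receives the number of replacements made.
$newrequest = preg_replace('/\bant\b/', 'elephant', $request, -1, $count);

echo $newrequest, "\n"; // Prints I want an elephant, and another elephant.
echo $count, "\n";      // Prints 2
```

A $count of 0 means the pattern did not match and the string came back unmodified.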


Interpolating Variables

Sometimes it is useful to be able to use a variable in a pattern. You need to be careful here, as variables that are used to make up a pattern are interpreted as part of the pattern, not just as literals. That means that if they contain metacharacters, then things may not work quite as you expect. You can get around this by using the preg_quote function, which takes a string and returns a copy of that string with all metacharacters escaped, so using it as part of a pattern will be safe. preg_quote also accepts the pattern delimiter as an optional second argument, so that any occurrence of the delimiter in the string is escaped too.
// This example will find if a word exists in some text.
$text = 'This monkey will cost $20, please.';
$word = '$20'; // Note the $ sign, which is a metacharacter.
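$safeword = preg_quote($word, '/'); // The second argument escapes the delimiter too.
echo $safeword; // Prints \$20 - the metacharacter is escaped
// No \b before the word: a space followed by $ is not a word boundary.
$numMatches = preg_match_all('/' . $safeword . '\b/', $text);
echo $numMatches; // Prints 1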
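To see what goes wrong without preg_quote, compare the quoted and unquoted versions of the same search. This is only a sketch; the $text and $word values are invented for the purpose.

```php
<?php
// Assumed sample data for illustration only.
$text = 'The file is called abc, not a.c';
$word = 'a.c'; // The . is a metacharacter.

// Unquoted, the . matches any character, so "abc" matches as well.
$unsafe = preg_match_all('/' . $word . '/', $text);

// Quoted, only the literal text a.c matches.
$safe = preg_match_all('/' . preg_quote($word, '/') . '/', $text);

echo $unsafe, ' ', $safe, "\n"; // Prints 2 1
```

The unquoted pattern silently matches more than intended, which is why quoting matters for any string that comes from a variable.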


Further Reading

The PHP documentation contains a great deal more detail than this tutorial. See the sections on POSIX Extended Regexes and Perl Compatible Regexes.

© 1996-2008 Community Networks Ltd. All rights reserved.