REGEX in JavaScript

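This article introduces the basic concepts and usage of regular expressions and summarizes the regex-related methods commonly used in JS, for easy memorization and review later.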

Reference tutorials

Topic	Original tutorial
Learn regex the easy way	Learn regex the easy way
Regular expression tutorial (Runoob)	Runoob regular expression tutorial
Regular expressions in JS	MDN-RegExp

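Basic concepts of regular expressions

Metacharacters

Metacharacter	Description
`.`	Matches any single character except a line terminator.
`[ ]`	Character class. Matches any one of the characters inside the brackets.
`[^ ]`	Negated character class. Matches any one character that is not inside the brackets.
`*`	Matches 0 or more repetitions of the preceding item.
`+`	Matches 1 or more repetitions of the preceding item.
`?`	Makes the preceding item optional (0 or 1 occurrence).
`{n,m}`	Matches `num` repetitions of the preceding character, class, or group (`n <= num <= m`).
`(xyz)`	Capturing group. Matches the characters `xyz` in exactly that order.
`|`	Alternation. Matches either the expression before or the expression after the symbol.
`\`	Escape character. Used to match reserved characters literally: `[ ] ( ) { } . * + ? ^ $ \ |`
`^`	Matches at the start of the input (or at the start of each line with the `m` flag).
`$`	Matches at the end of the input (or at the end of each line with the `m` flag).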

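Inside a character class a period is literal: [.] matches only the character ., so the expression ar[.] matches the string ar.
() forms a capturing group that stores the current match; it can be referenced later in the pattern with backreferences such as \1 and \2.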
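
To make the table more concrete, here is a minimal sketch that exercises several of the metacharacters above. The sample strings are made up for illustration and are not taken from the referenced tutorials.

```javascript
// All sample strings below are illustrative.
const dot = /c.t/.test("cat");                          // true: "." stands for any one character
const classMatches = "gray grey".match(/gr[ae]y/g);     // [ "gray", "grey" ]
const nonDigits = "a1b2".match(/[^0-9]/g);              // [ "a", "b" ]
const twoOrThreeDigits = /^\d{2,3}$/.test("1234");      // false: four digits is too many
const pets = "I like cats and dogs".match(/cat|dog/g);  // [ "cat", "dog" ]
const literalDot = /ar[.]/.test("art");                 // false: [.] is a literal period
const doubled = /(\w)\1/.exec("hello");                 // \1 repeats what group 1 captured

console.log(dot, classMatches, nonDigits, twoOrThreeDigits, pets, literalDot);
console.log(doubled[0], doubled[1]); // "ll" and "l"
```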

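Shorthand character classes

Regular expressions provide shorthands for some commonly used character classes:

Shorthand	Description
`.`	Any character except a line terminator
`\w`	Matches any letter, digit, or underscore; equivalent to `[a-zA-Z0-9_]`
`\W`	Matches any non-word character (anything that is not a letter, digit, or underscore); equivalent to `[^\w]`
`\d`	Matches a digit: `[0-9]`
`\D`	Matches a non-digit: `[^\d]`
`\s`	Matches any whitespace character (space, tab, line terminators, and other Unicode spaces); roughly `[\t\n\v\f\r\p{Z}]`
`\S`	Matches any non-whitespace character: `[^\s]`
`\f`	Matches a form feed
`\n`	Matches a line feed
`\r`	Matches a carriage return
`\t`	Matches a tab
`\v`	Matches a vertical tab
`\p{…}`	Unicode property escape such as `\p{L}`; requires the `u` flag. JavaScript has no shorthand for `CR/LF`: write `\r\n` to match a DOS line terminator.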
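
The following sketch shows a few of these shorthands at work on an invented string. It also illustrates how JavaScript treats `\p`: with the `u` flag it introduces a Unicode property escape, and without it the sequence is simply an escaped letter p.

```javascript
// The sample text is illustrative.
const text = "Order 42\tshipped_on 2024-01-05";

const digits = text.match(/\d+/g); // [ "42", "2024", "01", "05" ]
const words = text.match(/\w+/g);  // [ "Order", "42", "shipped_on", "2024", "01", "05" ]
const fields = text.split(/\s+/);  // [ "Order", "42", "shipped_on", "2024-01-05" ]

// \p{...} is a Unicode property escape and needs the u flag.
const han = "正则 regex 表达式".match(/\p{Script=Han}+/gu); // [ "正则", "表达式" ]

// Without the u flag, \p is just an escaped letter p.
const plainP = /\p/.test("p"); // true

// A DOS line terminator is matched by spelling out \r\n.
const lines = "a\r\nb".split(/\r\n/); // [ "a", "b" ]

console.log(digits, words, fields, han, plainP, lines);
```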

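Zero-width assertions

Symbol	Description	Example
`?=`	Positive lookahead (must be present)	`exp1(?=exp2)`: finds `exp1` that is followed by `exp2`.
`?!`	Negative lookahead (must be absent)	`exp1(?!exp2)`: finds `exp1` that is not followed by `exp2`.
`?<=`	Positive lookbehind (must be present)	`(?<=exp2)exp1`: finds `exp1` that is preceded by `exp2`.
`?<!`	Negative lookbehind (must be absent)	`(?<!exp2)exp1`: finds `exp1` that is not preceded by `exp2`.

?:, ?=, and ?! are all non-capturing: the group they introduce does not store its match. All zero-width assertions, including ?<= and ?<!, are non-capturing.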
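
As a rough illustration of the four assertions, the sketch below runs each one against an invented price list. The `\b` anchors in the negative cases are an addition of this example: without them the engine can backtrack to a shorter run of digits and still satisfy the assertion.

```javascript
// The price list is illustrative.
const prices = "apple $30, pear 45 EUR, plum $7";

// Positive lookbehind: digits that follow a dollar sign (the "$" is not part of the match).
const dollars = prices.match(/(?<=\$)\d+/g);        // [ "30", "7" ]

// Positive lookahead: digits that are followed by " EUR".
const euros = prices.match(/\d+(?= EUR)/g);         // [ "45" ]

// Negative lookahead: whole numbers that are NOT followed by " EUR".
const notEuros = prices.match(/\b\d+\b(?! EUR)/g);  // [ "30", "7" ]

// Negative lookbehind: whole numbers that are NOT preceded by "$".
const notDollars = prices.match(/\b(?<!\$)\d+\b/g); // [ "45" ]

console.log(dollars, euros, notEuros, notDollars);
```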

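Flags

Flag	Description
`i`	Ignore case.
`g`	Global search. Returns all matches instead of only the first.
`m`	Multiline: the anchors `^` and `$` match at the start and end of each line, not only at the start and end of the whole input.
`s`	dotAll: the dot `.` also matches line terminators such as `\n`.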
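
The effect of each flag is easiest to see side by side. The following is a minimal sketch on an invented two-line string:

```javascript
// The sample string is illustrative; "\n" separates its two lines.
const poem = "Roses are red\nviolets are blue";

const caseInsensitive = /ROSES/i.test(poem);  // true: i ignores case
const firstOnly = poem.match(/are/);          // one match, with index 6
const allMatches = poem.match(/are/g);        // [ "are", "are" ]
const lineStarts = poem.match(/^\w+/gm);      // [ "Roses", "violets" ]: ^ matches at each line start
const inputStart = poem.match(/^\w+/g);       // [ "Roses" ]: without m, ^ matches only at the input start
const dotDefault = /red.violets/.test(poem);  // false: "." does not match "\n"
const dotAll = /red.violets/s.test(poem);     // true: with s, "." matches "\n" too

console.log(caseInsensitive, firstOnly.index, allMatches, lineStarts, inputStart, dotDefault, dotAll);
```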

TIPS

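\b versus \B
In regular expressions, \b and \B are special metacharacters that test for word boundaries. A "word character" here means anything matched by \w: a letter, a digit, or an underscore.
- \b (word boundary) behaves as follows in three situations:
  1. Start of a word: \b matches the position before a word character when that character is preceded by a non-word character or by the start of the string.
  2. End of a word: \b matches the position after a word character when that character is followed by a non-word character or by the end of the string.
  3. Inside a word: between two consecutive word characters there is no word boundary, so \b does not match there.
Some examples that use \b:
- The regular expression \bword\b matches the whole word "word", but does not match inside "words" or "sword".
- The regular expression \b\d+\b matches a standalone number such as "123", but finds nothing in "abc123".
- The regular expression \b[A-Z]+\b matches a complete uppercase word such as "HELLO"; in "HELLO WORLD" it matches "HELLO" and "WORLD" separately, never the whole phrase.
Note that \b is a zero-width assertion: it consumes no characters and matches only a position. To match actual characters, use other characters or character classes instead of \b.
- \B (non-word boundary) matches every position that is not a word boundary. Specifically:
  1. Inside a word: \B matches the position between two consecutive word characters.
  2. Outside words: \B matches the position between two consecutive non-word characters, and the start or end of the string when the adjacent character is a non-word character.
Some examples that use \B:
- The regular expression \Bword\B matches the "word" in "swords", but does not match in "word" or "sword".
- The regular expression \B\d+\B matches the "123" in "abc123def".
- The regular expression \B[A-Z]+\B matches "ELL" and "ORL" in "HELLO WORLD".
Like \b, \B is a zero-width assertion: it matches a position, not actual characters.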
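
The boundary examples above can be checked directly. This is a small sketch; the last line uses `replace` to make the zero-width positions visible, which is an illustration added here rather than part of the original notes.

```javascript
// \b: whole words only.
const wholeWord = "word words sword".match(/\bword\b/g);       // [ "word" ]
const standaloneNumber = "abc123".match(/\b\d+\b/);            // null: no boundary between "c" and "1"
const upperWords = "HELLO WORLD".match(/\b[A-Z]+\b/g);         // [ "HELLO", "WORLD" ]

// \B: positions that are not boundaries.
const innerWordAt = "word words sword swords".search(/\Bword\B/); // 18, inside "swords"
const innerNumber = "abc123def".match(/\B\d+\B/)[0];              // "123"
const innerUpper = "HELLO WORLD".match(/\B[A-Z]+\B/g);            // [ "ELL", "ORL" ]

// \b consumes nothing: replacing it only inserts text at the boundary positions.
const marked = "a b".replace(/\b/g, "|"); // "|a| |b|"

console.log(wholeWord, standaloneNumber, upperWords, innerWordAt, innerNumber, innerUpper, marked);
```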
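Capture group concepts

When a regular expression uses capture groups, the text matched by each subexpression is saved so that it can be referenced later. This is very useful for complex text matching and replacement. The related concepts are explained in detail below.
1. Ordinary capture groups:
  - Ordinary capture groups are numbered in the order in which their opening parentheses appear.
  - Counting from the left of the regular expression, each opening parenthesis ( starts a new group, and numbering starts at 1. Number 0 stands for the whole expression.
  - For example, for the date string 2017-04-25, the following regular expression has 4 opening parentheses and therefore 4 groups: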
    const pattern = /(\d{4})-((\d{2})-(\d{2}))/;
    console.log("2017-04-25".match(pattern));
    // {
    //   "0": "2017-04-25",
    //   "1": "2017",
    //   "2": "04-25",
    //   "3": "04",
    //   "4": "25",
    //   "index": 0,
    //   "input": "2017-04-25",
    //   "groups": undefined
    // }
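    - Number 0: (\d{4})-((\d{2})-(\d{2})) matches the whole date string 2017-04-25
    - Number 1: (\d{4}) matches the year 2017
    - Number 2: ((\d{2})-(\d{2})) matches the month and day 04-25
    - Number 3: (\d{2}) matches the month 04
    - Number 4: (\d{2}) matches the day 25
2. Named capture groups:
  - Named capture groups give each group a name so that it is easier to reference later.
  - Each named group starts with an opening parenthesis immediately followed by ?<name>, and only then the subexpression: (?<name>Expression).
  - For example, for the date string 2017-04-25, the following regular expression has 4 named capture groups (plus one unnamed group, so 5 numbered groups in total):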
    const pattern = /(?<year>\d{4})-(?<md>((?<month>\d{2})-(?<date>\d{2})))/;
    console.log("2017-04-25".match(pattern));
    // {
    //   "0": "2017-04-25",
    //   "1": "2017",
    //   "2": "04-25",
    //   "3": "04-25",
    //   "4": "04",
    //   "5": "25",
    //   "index": 0,
    //   "input": "2017-04-25",
    //   "groups": {
    //     "year": "2017",
    //     "md": "04-25",
    //     "month": "04",
    //     "date": "25"
    //   }
    // }
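    - Name year: (?<year>\d{4}) matches the year 2017
    - Name md: (?<md>((?<month>\d{2})-(?<date>\d{2}))) matches the month and day 04-25
    - Name month: (?<month>\d{2}) matches the month 04
    - Name date: (?<date>\d{2}) matches the day 25
3. Non-capturing groups:
  - A group can be made non-capturing by starting it with ?:, ?=, or ?!. A group written as (?:Expression) is a plain non-capturing group; (?=Expression) and (?!Expression) are lookahead assertions, which are also non-capturing.
  - A non-capturing group does not save its matched text, so it gets no group number and cannot be the target of a backreference.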
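
To see how a non-capturing group changes the numbering, compare the two patterns in this small sketch, which reuses the date string from the examples above. The destructuring at the end is just one convenient way to read named groups.

```javascript
const date = "2017-04-25";

// Three capturing groups: the result holds the full match plus three captures.
const capturing = date.match(/(\d{4})-(\d{2})-(\d{2})/);
// capturing.length is 4 and capturing[1] is "2017"

// The first group is now non-capturing, so the remaining groups shift down by one.
const nonCapturing = date.match(/(?:\d{4})-(\d{2})-(\d{2})/);
// nonCapturing.length is 3 and nonCapturing[1] is "04"

// Named groups can be read by name through the groups property.
const { year, month } = date.match(/(?<year>\d{4})-(?<month>\d{2})/).groups;

console.log(capturing.length, nonCapturing.length, year, month);
```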

Backreferences to capture groups

const str = "Is is the cost of of gasoline going up up";
const patt1 = /\b([a-z]+) \1\b/gim;
console.log(str.match(patt1));
// Result: [ "Is is", "of of", "up up" ]
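
A captured group can also be referenced from a replacement string. The sketch below uses the same kind of doubled-word pattern to collapse each repeated word into one; the named-group variant with `\k<word>` is shown as an alternative spelling, and the group name `word` is chosen here only for illustration.

```javascript
const sentence = "Is is the cost of of gasoline going up up";

// \1 refers to group 1 inside the pattern; $1 refers to it in the replacement string.
const deduped = sentence.replace(/\b([a-z]+) \1\b/gi, "$1");
// "Is the cost of gasoline going up"

// The same idea with a named group: \k<word> in the pattern, $<word> in the replacement.
const dedupedNamed = sentence.replace(/\b(?<word>[a-z]+) \k<word>\b/gi, "$<word>");

console.log(deduped);
console.log(deduped === dedupedNamed); // true
```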

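Regular expressions in JS

Methods of the RegExp object

exec: searches a string for a match using the current regular expression object; with the g flag it can be called in a loop to walk through all matches

If the match fails, exec() returns null and resets the regular expression's lastIndex to 0.

If the match succeeds, exec() returns an array and, when the g or y flag is set, updates the lastIndex property of the regular expression object. The fully matched text is the first item of the returned array, and each subsequent item corresponds to one capture group.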
```
const regex1 = RegExp("foo*", "g");
const str1 = "table football, foosball";
let array1;

while ((array1 = regex1.exec(str1)) !== null) {
  console.log(`Found ${array1[0]}. Next starts at ${regex1.lastIndex}.`);
  // Expected output: "Found foo. Next starts at 9."
  // Expected output: "Found foo. Next starts at 19."
}
```
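When the g flag is set, exec can be called repeatedly to find successive matches in the same string. Each search starts at the position given by the regular expression's lastIndex property (test() also updates lastIndex). Note that lastIndex is not reset when the next search runs against a different string: the search still starts from the recorded lastIndex.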
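
The lastIndex behavior described above is a common source of surprises when one global regex is reused across strings. A minimal sketch of the pitfall, with invented input strings:

```javascript
// One global regex reused on two different strings.
const re = /foo/g;

const first = re.test("foo bar");      // true
const indexAfterFirst = re.lastIndex;  // 3: the next search starts after "foo"

// The second string is different, but the search still starts at index 3,
// so the "foo" at index 0 is missed. The failed search resets lastIndex to 0.
const second = re.test("foo baz");     // false
const indexAfterSecond = re.lastIndex; // 0

const third = re.test("foo baz");      // true: the search starts from 0 again

console.log(first, indexAfterFirst, second, indexAfterSecond, third);
```

Resetting `re.lastIndex = 0` before each new string, or using a regex without the g flag for simple yes/no checks, avoids the problem.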

test: the test() method executes a search to check whether a regular expression matches a given string. It returns true or false.

const str = "table football";

const regex = new RegExp("foo*");
const globalRegex = new RegExp("foo*", "g");

console.log(regex.test(str));
// Expected output: true

console.log(globalRegex.lastIndex);
// Expected output: 0

console.log(globalRegex.test(str));
// Expected output: true

console.log(globalRegex.lastIndex);
// Expected output: 9

console.log(globalRegex.test(str));
// Expected output: false

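Methods of the String object

match: the match() method retrieves the result of matching a string against a regular expression.

Return value: an Array whose contents depend on whether the global (g) flag is present, or null if there is no match.

If the g flag is used, all results matching the complete regular expression are returned, but capture groups are not.
If the g flag is not used, only the first complete match and its capture groups are returned. In this case, match() returns the same result as RegExp.prototype.exec() (an array with some extra properties).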

const str = "For more information, see Chapter 3.4.5.1";
const re = /see (chapter \d+(\.\d)*)/i;
const found = str.match(re);

console.log(found);
// [
//   'see Chapter 3.4.5.1',
//   'Chapter 3.4.5.1',
//   '.1',
//   index: 22,
//   input: 'For more information, see Chapter 3.4.5.1',
//   groups: undefined
// ]
const str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const regexp = /[A-E]/gi;
const matches = str.match(regexp);

console.log(matches);
// ['A', 'B', 'C', 'D', 'E', 'a', 'b', 'c', 'd', 'e']

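matchAll: the matchAll() method returns an iterator of all results of matching a string against a regular expression, including capture groups.

Note that if the argument is a regular expression, it must have the global (g) flag; otherwise a TypeError is thrown.
Return value: an iterable iterator object of match results (it cannot be restarted). Each match result is an array with the same shape as the return value of RegExp.prototype.exec().

Note that matchAll clones the RegExp object internally, so it does not change the lastIndex property of the original RegExp object.

Compared with match, matchAll makes it easier to obtain capture groups.

// The exec method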
const regexp = /foo[a-z]*/g;
const str = "table football, foosball";
let match;

while ((match = regexp.exec(str)) !== null) {
  console.log(
    `Found ${match[0]} start=${match.index} end=${regexp.lastIndex}.`
  );
}
// Found football start=6 end=14.
// Found foosball start=16 end=24.

// The matchAll method
const regexp = /foo[a-z]*/g;
const str = "table football, foosball";
const matches = str.matchAll(regexp);

for (const match of matches) {
  console.log(
    `Found ${match[0]} start=${match.index} end=${
      match.index + match[0].length
    }.`
  );
}
// Found football start=6 end=14.
// Found foosball start=16 end=24.

// The match iterator is exhausted after the for...of iteration
// Call matchAll again to create a new iterator
Array.from(str.matchAll(regexp), (m) => m[0]);
// [ "football", "foosball" ]

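// matchAll makes an internal copy of the regexp, so unlike regexp.exec(), lastIndex does not change as the string is scanned. However, this also means that, unlike using regexp.exec() in a loop, you cannot mutate lastIndex to make the regex advance or rewind.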

const regexp = /[a-c]/g;
regexp.lastIndex = 1;
const str = "abc";
Array.from(str.matchAll(regexp), (m) => `${regexp.lastIndex} ${m[0]}`);
// [ "1 b", "1 c" ]

// matchAll gives better access to capture groups
const regexp = /t(e)(st(\d?))/g;
const str = "test1test2";
const array = [...str.matchAll(regexp)];
array[0];
// ['test1', 'e', 'st1', '1', index: 0, input: 'test1test2', length: 4]
array[1];
// ['test2', 'e', 'st2', '2', index: 5, input: 'test1test2', length: 4]
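
Two points from this section can be checked quickly: the TypeError raised for a non-global regex, and per-match access to groups. This is a sketch only; the key=value input and the group names `key` and `value` are invented for the example.

```javascript
// A regex without the g flag makes matchAll throw.
let errorName = null;
try {
  "test1test2".matchAll(/t(e)(st(\d?))/);
} catch (err) {
  errorName = err.name; // "TypeError"
}

// With the g flag, every match exposes its own capture groups (named ones via .groups).
const pairs = Array.from(
  "k1=v1;k2=v2".matchAll(/(?<key>\w+)=(?<value>\w+)/g),
  (m) => [m.groups.key, m.groups.value]
);
// [ [ "k1", "v1" ], [ "k2", "v2" ] ]

console.log(errorName, pairs);
```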

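replace: the replace() method returns a new string in which one, some, or all matches of a pattern are replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function called for each match.

This method does not change the string it is called on; it returns a new string.
A string pattern is replaced only once. To perform a global search and replace, use a regular expression with the g flag, or use replaceAll().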

const re = /(\w+)\s(\w+)/;
const str = "Maria Cruz";
const newstr = str.replace(re, "$2, $1");
console.log(newstr); // Cruz, Maria

const p =
  "The quick brown fox jumps over the lazy dog. If the dog reacted, was it really lazy?";

console.log(p.replace("dog", "monkey"));
// Expected output: "The quick brown fox jumps over the lazy monkey. If the dog reacted, was it really lazy?"

const regex = /Dog/i;
console.log(p.replace(regex, "ferret"));
// Expected output: "The quick brown fox jumps over the lazy ferret. If the dog reacted, was it really lazy?"
// Using a function as the replacement
function replacer(match, p1, p2, p3, offset, string) {
  // p1 is non-digits, p2 digits, and p3 non-alphanumerics
  return [p1, p2, p3].join(" - ");
}
const newString = "abc12345#$*%".replace(/([^\d]*)(\d*)([^\w]*)/, replacer);
console.log(newString); // abc - 12345 - #$*%
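
Besides `$1` and `$2`, replacement strings accept a few other special patterns. The following sketch shows `$<name>` and `$&`, plus a replacement function combined with the g flag; the inputs are illustrative.

```javascript
// $<name> inserts the text captured by a named group.
const reordered = "2017-04-25".replace(
  /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
  "$<d>/$<m>/$<y>"
); // "25/04/2017"

// $& inserts the whole match.
const bracketed = "a1b22".replace(/\d+/g, "[$&]"); // "a[1]b[22]"

// With the g flag, a replacement function runs once per match.
const doubledNumbers = "a1b22".replace(/\d+/g, (match) => String(Number(match) * 2)); // "a2b44"

console.log(reordered, bracketed, doubledNumbers);
```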

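replaceAll: takes essentially the same arguments as replace and acts as the global version of replace.

Using a string as the pattern behaves differently from using a RegExp object: a string pattern is matched literally, while a RegExp pattern is interpreted as a regular expression and must carry the g flag.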

function unsafeRedactName(text, name) {
  return text.replace(new RegExp(name, "g"), "[REDACTED]");
}
function safeRedactName(text, name) {
  return text.replaceAll(name, "[REDACTED]");
}

const report =
  "A hacker called ha.*er used special characters in their name to breach the system.";

console.log(unsafeRedactName(report, "ha.*er")); // "A [REDACTED]s in their name to breach the system."
console.log(safeRedactName(report, "ha.*er")); // "A hacker called [REDACTED] used special characters in their name to breach the system."
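
The difference can also be seen on a smaller input. In the sketch below, `escapeRegExp` is a hypothetical helper (not a built-in) that escapes regex metacharacters so that user-supplied text can be used safely inside `new RegExp`.

```javascript
// A regex passed to replaceAll must have the g flag, otherwise a TypeError is thrown.
let replaceAllError = null;
try {
  "a.b.c".replaceAll(/\./, "-");
} catch (err) {
  replaceAllError = err.name; // "TypeError"
}

const withRegex = "a.b.c".replaceAll(/\./g, "-"); // "a-b-c"
const withString = "a.b.c".replaceAll(".", "-");  // "a-b-c": the string is matched literally
const replaceOnce = "a.b.c".replace(".", "-");    // "a-b.c": replace stops after the first match

// Hypothetical helper: escape metacharacters before building a RegExp from text.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const redacted = "x ha.*er y hacker".replace(
  new RegExp(escapeRegExp("ha.*er"), "g"),
  "[REDACTED]"
); // "x [REDACTED] y hacker"

console.log(replaceAllError, withRegex, withString, replaceOnce, redacted);
```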

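search: the search() method runs a regular expression search on a String object and looks for a match.

If a match is found, it returns the index of the first match in the string; otherwise it returns -1.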

const paragraph =
  "The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?";

// Any character that is not a word character or whitespace
const regex = /[^\w\s]/g;

console.log(paragraph.search(regex));
// Expected output: 43

console.log(paragraph[paragraph.search(regex)]);
// Expected output: "."

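split: splits a string using a regular expression as the separator.

If the regular expression contains capture groups, the captured text is included in the returned array.
The second parameter, limit, caps the length of the returned array.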

const names =
  "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ";

console.log(names);

const re = /\s*(?:;|$)\s*/;
const nameList = names.split(re);

console.log(nameList);

// Output
// Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand
// [ "Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand", "" ]
const myString = "Hello 1 word. Sentence number 2.";
const splits = myString.split(/(\d)/);

console.log(splits);
// [ "Hello ", "1", " word. Sentence number ", "2", "." ]

// Limit the length of the returned array
const myString = "Hello World. How are you doing?";
const splits = myString.split(" ", 3);
console.log(splits); // [ "Hello", "World.", "How" ]