std::regex_token_iterator

From cppreference.net

Defined in header `<regex>`
template< class BidirIt, class CharT = typename std::iterator_traits<BidirIt>::value_type, class Traits = std::regex_traits<CharT> > class regex_token_iterator;		(since C++11)

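std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual submatches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).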

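On construction, it constructs a std::regex_iterator. On every increment it steps through the requested submatches of the current match, and when it is incremented away from the last requested submatch, it increments the underlying std::regex_iterator.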

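The default-constructed std::regex_token_iterator is the end-of-sequence iterator. When a valid std::regex_token_iterator is incremented after reaching the last submatch of the last match, it becomes equal to the end-of-sequence iterator. Dereferencing or incrementing it further invokes undefined behavior.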

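Just before becoming the end-of-sequence iterator, a std::regex_token_iterator may become a suffix iterator, if the index -1 (non-matched fragment) appears in the list of requested submatch indices. Such an iterator, if dereferenced, returns a std::sub_match corresponding to the sequence of characters between the last match and the end of the sequence.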

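A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector<int>) of the requested submatch indices, an internal counter equal to the index of the current submatch, a pointer to std::sub_match pointing at the current submatch of the current match, and a std::sub_match object containing the last non-matched character sequence (used in tokenizer mode).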

Specializations

Defined in header `<regex>`
Type	Definition
`std::cregex_token_iterator`	std::regex_token_iterator<const char*>
`std::wcregex_token_iterator`	std::regex_token_iterator<const wchar_t*>
`std::sregex_token_iterator`	std::regex_token_iterator<std::string::const_iterator>
`std::wsregex_token_iterator`	std::regex_token_iterator<std::wstring::const_iterator>

Member types

Member type	Definition
`value_type`	std::sub_match<BidirIt>
`difference_type`	std::ptrdiff_t
`pointer`	const value_type*
`reference`	const value_type&
`iterator_category`	std::forward_iterator_tag
`iterator_concept` (C++20)	std::input_iterator_tag
`regex_type`	std::basic_regex<CharT, Traits>

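Member functions

(constructor)	constructs a new `regex_token_iterator` (public member function)
(destructor) (implicitly declared)	destructs a `regex_token_iterator`, including the cached value (public member function)
operator=	assigns contents (public member function)
operator== operator!= (removed in C++20)	compares two `regex_token_iterator`s (public member function)
operator* operator->	accesses the current submatch (public member function)
operator++ operator++ (int)	advances the iterator to the next submatch (public member function)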

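Notes

It is the programmer's responsibility to ensure that the std::basic_regex object passed to the iterator's constructor outlives the iterator. Because the iterator stores a std::regex_iterator, which in turn stores a pointer to the regex, incrementing the iterator after the regex has been destroyed results in undefined behavior.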

Example

#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
int main()
{
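    // Tokenization (non-matched fragments)
    // Note that the regex is matched only two times; when the third value is obtained,
    // the iterator is a suffix iterator.
    const std::string text = "Quick brown fox.";
    const std::regex ws_re("\\s+"); // whitespace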
    std::copy(std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    std::cout << '\n';
    // Iterating over the first submatch of each match
    const std::string html = R"(<p><a href="http://google.com">google</a> )"
                             R"(< a HREF ="http://cppreference.net">cppreference</a>\n</p>)";
    const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);
    std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

Output:

Quick
brown
fox.
http://google.com
http://cppreference.net

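Defect reports

The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

DR	Applied to	Behavior as published	Correct behavior
LWG 3698 (P2770R0)	C++20	`regex_token_iterator` was a `forward_iterator` while being a stashing iterator	made it `input_iterator` [1]

[1] iterator_category was unchanged by the resolution, because changing it to std::input_iterator_tag might break too much existing code.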
