Module Re.Str
Module Str: regular expressions and high-level string processing
Regular expressions
val regexp : string -> regexp
Compile a regular expression. The syntax for regular expressions is the same as in GNU Emacs. The special characters are $^.*+?[]. The following constructs are recognized:
  .       matches any character except newline
  *       (postfix) matches the previous expression zero, one or several times
  +       (postfix) matches the previous expression one or several times
  ?       (postfix) matches the previous expression once or not at all
  [..]    character set; ranges are denoted with -, as in [a-z]; an initial ^, as in [^0-9], complements the set
  ^       matches at beginning of line
  $       matches at end of line
  \|      (infix) alternative between two expressions
  \(..\)  grouping and naming of the enclosed expression
  \1      the text matched by the first \(...\) expression (\2 for the second expression, etc)
  \b      matches word boundaries
  \       quotes special characters.
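As a rough illustration of the constructs listed above, the sketch below compiles a pattern with two groups and reads back what each group captured. It assumes the re library is linked into the program; pair_re and parse_pair are hypothetical names chosen for this example.

```ocaml
(* Assumes the re library is available (for example, (libraries re) in dune). *)

(* One or more lowercase letters, "=", then one or more digits.
   \( ... \) delimits the two numbered groups. *)
let pair_re = Re.Str.regexp {|\([a-z]+\)=\([0-9]+\)|}

(* Hypothetical helper: Some (key, value) when s starts with such a pair. *)
let parse_pair s =
  if Re.Str.string_match pair_re s 0 then
    Some (Re.Str.matched_group 1 s, Re.Str.matched_group 2 s)
  else None
```

The quoted-string literal {|...|} avoids doubling every backslash, which an ordinary OCaml string literal would require.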
val regexp_case_fold : string -> regexp
Same as regexp, but the compiled expression will match text in a case-insensitive way: uppercase and lowercase letters will be considered equivalent.
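A minimal sketch contrasting the two compilers, assuming the re library is linked; the names hello_cs, hello_ci and starts_with are hypothetical.

```ocaml
(* Assumes the re library is available. *)
let hello_cs = Re.Str.regexp "hello"            (* case-sensitive *)
let hello_ci = Re.Str.regexp_case_fold "hello"  (* case-insensitive *)

(* Hypothetical helper: does s match re at position 0? *)
let starts_with re s = Re.Str.string_match re s 0
```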
String matching and searching
val string_match : regexp -> string -> int -> bool
string_match r s start tests whether the characters in s starting at position start match the regular expression r. The first character of a string has position 0, as usual.
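The following sketch shows the effect of the start argument: the match is attempted at that exact position, and it does not have to extend to the end of the string. It assumes the re library is linked; digits and digits_at are hypothetical names.

```ocaml
(* Assumes the re library is available. *)
let digits = Re.Str.regexp "[0-9]+"

(* Hypothetical helper: is there a run of digits beginning exactly at pos? *)
let digits_at s pos = Re.Str.string_match digits s pos
```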
val search_forward : regexp -> string -> int -> int
search_forward r s start searches the string s for a substring matching the regular expression r. The search starts at position start and proceeds towards the end of the string. Return the position of the first character of the matched substring, or raise Not_found if no substring matches.
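Because search_forward signals failure with an exception, callers often wrap it. One possible wrapper, assuming the re library is linked (find_first is a hypothetical name):

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: position of the first match of re in s, if any. *)
let find_first re s =
  try Some (Re.Str.search_forward re s 0)
  with Not_found -> None
```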
val search_backward : regexp -> string -> int -> int
Same as search_forward, but the search proceeds towards the beginning of the string.
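A sketch of a backward search that begins at the end of the string, assuming the re library is linked and that a start position equal to the string length is accepted, as it is in the standard Str module. The name find_last is hypothetical.

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: the rightmost position at which a match of re begins. *)
let find_last re s =
  try Some (Re.Str.search_backward re s (String.length s))
  with Not_found -> None
```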
val string_partial_match : regexp -> string -> int -> bool
Similar to string_match, but succeeds whenever the argument string is a prefix of a string that matches. This includes the case of a true complete match.
val matched_string : string -> string
matched_string s returns the substring of s that was matched by the latest string_match, search_forward or search_backward. The user must make sure that the parameter s is the same string that was passed to the matching or searching function.
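The sketch below pairs a search with matched_string and passes the same string to both calls, as the description requires. It assumes the re library is linked; first_word is a hypothetical name.

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: the first run of lowercase letters in s, if any. *)
let first_word s =
  let re = Re.Str.regexp "[a-z]+" in
  match Re.Str.search_forward re s 0 with
  | _ -> Some (Re.Str.matched_string s)  (* same s as in the search *)
  | exception Not_found -> None
```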
val match_beginning : unit -> int
val match_end : unit -> int
match_beginning () returns the position of the first character of the substring that was matched by string_match, search_forward or search_backward. match_end () returns the position of the character following the last character of the matched substring.
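To make the half-open interval concrete, here is a small sketch that returns both positions after a search. It assumes the re library is linked; match_span is a hypothetical name.

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: (start, stop) of the first match, where stop is the
   position just past the last matched character. *)
let match_span re s =
  match Re.Str.search_forward re s 0 with
  | _ -> Some (Re.Str.match_beginning (), Re.Str.match_end ())
  | exception Not_found -> None
```

With this convention, stop - start is the length of the matched substring.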
val matched_group : int -> string -> string
matched_group n s returns the substring of s that was matched by the nth group \(...\) of the regular expression during the latest string_match, search_forward or search_backward. The user must make sure that the parameter s is the same string that was passed to the matching or searching function.
matched_group n s raises Not_found if the nth group of the regular expression was not matched. This can happen with groups inside alternatives \|, options ? or repetitions *. For instance, the empty string will match \(a\)*, but matched_group 1 "" will raise Not_found because the first group itself was not matched.
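The Not_found case can be handled by turning it into an option. The sketch below uses an alternative, so only one of the two groups takes part in any given match. It assumes the re library is linked; a_or_b, group_opt and groups_for are hypothetical names.

```ocaml
(* Assumes the re library is available. *)

(* Group 1 captures "a", group 2 captures "b"; only one side can match. *)
let a_or_b = Re.Str.regexp {|\(a\)\|\(b\)|}

(* Hypothetical helper: None when group n did not take part in the match. *)
let group_opt n s =
  try Some (Re.Str.matched_group n s) with Not_found -> None

(* Hypothetical helper: both groups after matching s at position 0. *)
let groups_for s =
  if Re.Str.string_match a_or_b s 0 then Some (group_opt 1 s, group_opt 2 s)
  else None
```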
val group_beginning : int -> int
val group_end : int -> int
group_beginning n returns the position of the first character of the substring that was matched by the nth group of the regular expression. group_end n returns the position of the character following the last character of the matched substring. Both functions raise Not_found if the nth group of the regular expression was not matched.
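A sketch that reports the positions of two groups after a successful match, assuming the re library is linked. The pattern and the names kv_re and kv_spans are hypothetical.

```ocaml
(* Assumes the re library is available. *)
let kv_re = Re.Str.regexp {|\([a-z]+\)=\([0-9]+\)|}

(* Hypothetical helper: (start, stop) of group 1 and of group 2. *)
let kv_spans s =
  if Re.Str.string_match kv_re s 0 then
    Some ((Re.Str.group_beginning 1, Re.Str.group_end 1),
          (Re.Str.group_beginning 2, Re.Str.group_end 2))
  else None
```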
Replacement
val global_replace : regexp -> string -> string -> string
global_replace regexp templ s returns a string identical to s, except that all substrings of s that match regexp have been replaced by templ. The replacement template templ can contain \1, \2, etc; these sequences will be replaced by the text matched by the corresponding group in the regular expression. \0 stands for the text matched by the whole regular expression.
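Two small sketches of replacement templates, one using numbered groups and one using \0. They assume the re library is linked; swap_pairs and bracket_numbers are hypothetical names.

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: rewrite every key=value as value=key. *)
let swap_pairs s =
  Re.Str.global_replace
    (Re.Str.regexp {|\([a-z]+\)=\([0-9]+\)|}) {|\2=\1|} s

(* Hypothetical helper: wrap every run of digits in angle brackets. *)
let bracket_numbers s =
  Re.Str.global_replace (Re.Str.regexp "[0-9]+") {|<\0>|} s
```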
val replace_first : regexp -> string -> string -> string
Same as global_replace, except that only the first substring matching the regular expression is replaced.
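For comparison, a sketch applying both functions with the same pattern and template, assuming the re library is linked. The names num, mask_first and mask_all are hypothetical.

```ocaml
(* Assumes the re library is available. *)
let num = Re.Str.regexp "[0-9]+"

let mask_first s = Re.Str.replace_first num "#" s   (* first match only *)
let mask_all s = Re.Str.global_replace num "#" s    (* every match *)
```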
val global_substitute : regexp -> (string -> string) -> string -> string
global_substitute regexp subst s returns a string identical to s, except that all substrings of s that match regexp have been replaced by the result of function subst. The function subst is called once for each matching substring, and receives s (the whole text) as argument.
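Since subst receives the whole text rather than the matched piece, it typically calls matched_string on its argument to recover the current match. A sketch under the assumption that the re library is linked (upcase_words is a hypothetical name):

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: uppercase every run of lowercase letters. *)
let upcase_words s =
  Re.Str.global_substitute
    (Re.Str.regexp "[a-z]+")
    (fun whole ->
       (* whole is the full input; matched_string extracts the current match *)
       String.uppercase_ascii (Re.Str.matched_string whole))
    s
```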
val substitute_first : regexp -> (string -> string) -> string -> string
Same as global_substitute, except that only the first substring matching the regular expression is replaced.
val replace_matched : string -> string -> string
replace_matched repl s returns the replacement text repl in which \1, \2, etc. have been replaced by the text matched by the corresponding groups in the most recent matching operation. s must be the same string that was matched during this matching operation.
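Note that the result is the expanded template only, not the input with the match spliced in. A sketch, assuming the re library is linked (two_words and reorder are hypothetical names):

```ocaml
(* Assumes the re library is available. *)
let two_words = Re.Str.regexp {|\([a-z]+\) \([a-z]+\)|}

(* Hypothetical helper: expand the template "\2 \1" against a match on s. *)
let reorder s =
  if Re.Str.string_match two_words s 0 then
    Some (Re.Str.replace_matched {|\2 \1|} s)
  else None
```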
Splitting
val split : regexp -> string -> string list
split r s splits s into substrings, taking as delimiters the substrings that match r, and returns the list of substrings. For instance, split (regexp "[ \t]+") s splits s into blank-separated words. An occurrence of the delimiter at the beginning and at the end of the string is ignored.
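The blank-separated example from the description can be written out as follows, assuming the re library is linked (words is a hypothetical name):

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: split s on runs of spaces and tabs. *)
let words s = Re.Str.split (Re.Str.regexp "[ \t]+") s
```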
val bounded_split : regexp -> string -> int -> string list
Same as split, but splits into at most n substrings, where n is the extra integer parameter.
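A sketch of splitting off only the first field and keeping the remainder intact, assuming the re library is linked (head_and_rest is a hypothetical name):

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: at most two pieces, split at the first comma. *)
let head_and_rest s = Re.Str.bounded_split (Re.Str.regexp ",") s 2
```

Once the limit is reached, the rest of the string is returned as the last element, delimiters included.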
val split_delim : regexp -> string -> string list
val bounded_split_delim : regexp -> string -> int -> string list
Same as split and bounded_split, but occurrences of the delimiter at the beginning and at the end of the string are recognized and returned as empty strings in the result. For instance, split_delim (regexp " ") " abc " returns [""; "abc"; ""], while split with the same arguments returns ["abc"].
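The example from the description, written as a pair of helpers so the two behaviours can be compared side by side. It assumes the re library is linked; space, fields and tokens are hypothetical names.

```ocaml
(* Assumes the re library is available. *)
let space = Re.Str.regexp " "

let fields s = Re.Str.split_delim space s  (* keeps leading/trailing empties *)
let tokens s = Re.Str.split space s        (* ignores them *)
```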
val full_split : regexp -> string -> split_result list
val bounded_full_split : regexp -> string -> int -> split_result list
Same as split_delim and bounded_split_delim, but returns the delimiters as well as the substrings contained between delimiters. The former are tagged Delim in the result list; the latter are tagged Text. For instance, full_split (regexp "[{}]") "{ab}" returns [Delim "{"; Text "ab"; Delim "}"].
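A sketch that pattern-matches on the tagged result, assuming the re library is linked and that the Delim and Text constructors are exposed as Re.Str.Delim and Re.Str.Text. The names braces_split and delimiters are hypothetical.

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: split on braces, keeping the braces in the result. *)
let braces_split s = Re.Str.full_split (Re.Str.regexp "[{}]") s

(* Hypothetical helper: keep only the delimiters, in order of appearance. *)
let delimiters s =
  List.filter_map
    (function Re.Str.Delim d -> Some d | Re.Str.Text _ -> None)
    (braces_split s)
```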
Extracting substrings
val string_before : string -> int -> string
string_before s n returns the substring of all characters of s that precede position n (excluding the character at position n).
val string_after : string -> int -> string
string_after s n returns the substring of all characters of s that follow position n (including the character at position n).
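Taken together, string_before and string_after cut a string into two pieces at the same position, and concatenating the pieces gives back the original. A sketch, assuming the re library is linked (split_at is a hypothetical name):

```ocaml
(* Assumes the re library is available. *)

(* Hypothetical helper: cut s at position n; the character at n goes to the
   second piece. *)
let split_at s n = (Re.Str.string_before s n, Re.Str.string_after s n)
```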