Module Str
Regular expressions and high-level string processing
Regular expressions
val regexp : string -> regexp
Compile a regular expression. The following constructs are recognized:
  .        Matches any character except newline.
  *        (postfix) Matches the preceding expression zero, one or several times.
  +        (postfix) Matches the preceding expression one or several times.
  ?        (postfix) Matches the preceding expression once or not at all.
  [..]     Character set. Ranges are denoted with -, as in [a-z]. An initial ^, as in [^0-9], complements the set. To include a ] character in a set, make it the first character of the set. To include a - character in a set, make it the first or the last character of the set.
  ^        Matches at beginning of line: either at the beginning of the matched string, or just after a '\n' character.
  $        Matches at end of line: either at the end of the matched string, or just before a '\n' character.
  \|       (infix) Alternative between two expressions.
  \(..\)   Grouping and naming of the enclosed expression.
  \1       The text matched by the first \(...\) expression (\2 for the second expression, and so on up to \9).
  \b       Matches word boundaries.
  \        Quotes special characters. The special characters are $^\.*+?[].
Note: the argument to regexp is usually a string literal. In this case, any backslash character in the regular expression must be doubled to make it past the OCaml string parser. For example, the following expression:
  let r = Str.regexp "hello \\([A-Za-z]+\\)" in
  Str.replace_first r "\\1" "hello world"
returns the string "world".
In particular, if you want a regular expression that matches a single backslash character, you need to quote it in the argument to regexp (according to the last item of the list above) by adding a second backslash. Then you need to quote both backslashes (according to the syntax of string constants in OCaml) by doubling them again, so you need to write four backslash characters: Str.regexp "\\\\".
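The escaping rule above can be checked with a small sketch. It assumes OCaml 4.02 or later for the quoted string literal syntax, which performs no escape processing and so avoids the doubling. All binding names are illustrative.

```ocaml
(* Needs the str library at link time,
   e.g. ocamlfind ocamlopt -package str -linkpkg *)

(* Four backslashes in the source give the two-character regexp \\,
   which matches one literal backslash. *)
let backslash = Str.regexp "\\\\"

(* "a\\b" is the three-character string a, backslash, b. *)
let s = "a\\b"
let backslash_pos = Str.search_forward backslash s 0

(* The same regexp written as an ordinary and as a quoted string literal. *)
let r_escaped = Str.regexp "hello \\([A-Za-z]+\\)"
let r_quoted = Str.regexp {|hello \([A-Za-z]+\)|}
let from_escaped = Str.replace_first r_escaped "\\1" "hello world"
let from_quoted = Str.replace_first r_quoted {|\1|} "hello world"
```

Both replacement calls should produce the same result, since the two literals denote the same regexp string.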
val regexp_case_fold : string -> regexp
Same as regexp, but the compiled expression will match text in a case-insensitive way: uppercase and lowercase letters will be considered equivalent.
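A minimal sketch contrasting the case-sensitive and case-insensitive compilers on the same pattern (binding names are illustrative):

```ocaml
let ci = Str.regexp_case_fold "hello"
let cs = Str.regexp "hello"

let ci_upper = Str.string_match ci "HELLO world" 0   (* case-insensitive: matches *)
let ci_mixed = Str.string_match ci "hElLo" 0         (* matches as well *)
let cs_upper = Str.string_match cs "HELLO world" 0   (* case-sensitive: no match *)
```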
val quote : string -> string
Str.quote s returns a regexp string that matches exactly s and nothing else.
val regexp_string : string -> regexp
Str.regexp_string s returns a regular expression that matches exactly s and nothing else.
val regexp_string_case_fold : string -> regexp
Str.regexp_string_case_fold is similar to Str.regexp_string, but the regexp matches in a case-insensitive way.
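The sketch below illustrates why literal text should be quoted before compilation. The sample strings and binding names are made up for the example.

```ocaml
let text = "1+1=2"

(* Unquoted, + is a postfix operator: the regexp means one or more 1, then 1=2. *)
let unquoted = Str.regexp text
let unquoted_on_text = Str.string_match unquoted text 0       (* false *)
let unquoted_on_other = Str.string_match unquoted "111=2" 0   (* true *)

(* Str.quote escapes the special characters, so the text matches itself. *)
let quoted = Str.regexp (Str.quote text)
let quoted_on_text = Str.string_match quoted text 0           (* true *)

(* Str.regexp_string compiles a literal string directly. *)
let literal = Str.regexp_string "a.b"
let literal_pos = Str.search_forward literal "axb a.b" 0      (* 4, not 0 *)

let literal_ci = Str.regexp_string_case_fold "a.b"
let literal_ci_ok = Str.string_match literal_ci "A.B" 0       (* true *)
```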
String matching and searching
val string_match : regexp -> string -> int -> bool
string_match r s start tests whether a substring of s that starts at position start matches the regular expression r. The first character of a string has position 0, as usual.
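As a sketch of the anchoring behaviour: string_match only tries the given start position, and the match may stop before the end of the string. The helper matches_whole is hypothetical, not part of Str. It is a simple check that works for greedy patterns like the one here, but it could reject strings when the first match found is shorter than another possible match.

```ocaml
let digits = Str.regexp "[0-9]+"

(* The match is anchored at the start position; it is not a search. *)
let at_0 = Str.string_match digits "abc123" 0   (* false: position 0 holds a letter *)
let at_3 = Str.string_match digits "abc123" 3   (* true *)

(* The match need not extend to the end of the string. *)
let prefix_ok = Str.string_match digits "123abc" 0
let prefix_end = Str.match_end ()               (* 3 *)

(* Hypothetical helper: accept only if the match found covers all of s. *)
let matches_whole r s =
  Str.string_match r s 0 && Str.match_end () = String.length s
```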
val search_forward : regexp -> string -> int -> int
search_forward r s start searches the string s for a substring matching the regular expression r. The search starts at position start and proceeds towards the end of the string. Returns the position of the first character of the matched substring.
- raises Not_found if no substring matches.
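One common use of search_forward is collecting every match in a string. The function find_all below is a hypothetical helper, not part of Str. It is a sketch that handles Not_found with the exception pattern syntax of OCaml 4.02 or later, and advances by at least one character so that an empty match cannot loop forever.

```ocaml
(* Hypothetical helper: list of (position, matched text) for all matches. *)
let find_all r s =
  let rec loop start acc =
    if start > String.length s then List.rev acc
    else
      match Str.search_forward r s start with
      | exception Not_found -> List.rev acc
      | pos ->
        let m = Str.matched_string s in
        (* Resume after the match, moving forward by at least one character. *)
        let next = max (Str.match_end ()) (pos + 1) in
        loop next ((pos, m) :: acc)
  in
  loop 0 []

let numbers = find_all (Str.regexp "[0-9]+") "a1bb22c333"
```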
val search_backward : regexp -> string -> int -> int
search_backward r s last searches the string s for a substring matching the regular expression r. The search first considers substrings that start at position last and proceeds towards the beginning of the string. Returns the position of the first character of the matched substring.
- raises Not_found if no substring matches.
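A sketch of searching from the end of a string, with made-up sample data. Because the search considers start positions from last downwards, the result is the match with the latest start position, which is not necessarily the longest run.

```ocaml
let path = "usr/local/bin"
let slash = Str.regexp_string "/"

(* Start from the end of the string and scan towards the beginning. *)
let last_slash = Str.search_backward slash path (String.length path)
let basename = Str.string_after path (last_slash + 1)

(* The match found starts as late as possible, so only one digit is matched. *)
let digits = Str.regexp "[0-9]+"
let s = "a1bb22c333"
let pos = Str.search_backward digits s (String.length s)
let found = Str.matched_string s

(* Not_found is raised when nothing matches. *)
let missing =
  match Str.search_backward slash "no-slash" 7 with
  | exception Not_found -> None
  | p -> Some p
```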
val string_partial_match : regexp -> string -> int -> bool
Similar to Str.string_match, but also returns true if the argument string is a prefix of a string that matches. This includes the case of a true complete match.
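A partial match can be useful, for example, to check input that is still being typed. The following sketch shows the difference from string_match on a fixed pattern (binding names are illustrative):

```ocaml
let r = Str.regexp "abc"

let full = Str.string_match r "ab" 0                   (* false: "ab" is too short *)
let partial = Str.string_partial_match r "ab" 0        (* true: "ab" is a prefix of "abc" *)
let complete = Str.string_partial_match r "abcdef" 0   (* true: complete match of "abc" *)
let mismatch = Str.string_partial_match r "abd" 0      (* false: nothing starting with "abd" matches *)
```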
val matched_string : string -> string
matched_string s returns the substring of s that was matched by the last call to one of the following matching or searching functions:
- Str.string_match
- Str.search_forward
- Str.search_backward
- Str.string_partial_match
- Str.global_substitute
- Str.substitute_first
provided that none of the following functions was called in between:
- Str.global_replace
- Str.replace_first
- Str.split
- Str.bounded_split
- Str.split_delim
- Str.bounded_split_delim
- Str.full_split
- Str.bounded_full_split
Note: in the case of global_substitute and substitute_first, a call to matched_string is only valid within the subst argument, not after global_substitute or substitute_first returns.
The user must make sure that the parameter s is the same string that was passed to the matching or searching function.
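Given the restrictions above, a cautious pattern is to read the match immediately after the search and pass the same string. The helper first_word is hypothetical and only sketches that pattern.

```ocaml
let word = Str.regexp "[A-Za-z]+"

(* Hypothetical helper: the first alphabetic word of s, if any. *)
let first_word s =
  match Str.search_forward word s 0 with
  | exception Not_found -> None
  | _ ->
    (* Read the match right away, passing the same string s, before any
       other Str function can overwrite the last-match state. *)
    Some (Str.matched_string s)
```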
val match_beginning : unit -> int
match_beginning () returns the position of the first character of the substring that was matched by the last call to a matching or searching function (see Str.matched_string for details).
val match_end : unit -> int
match_end () returns the position of the character following the last character of the substring that was matched by the last call to a matching or searching function (see Str.matched_string for details).
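The two positions delimit a half-open interval, so the matched text has length match_end () - match_beginning (). A sketch with a hypothetical helper span:

```ocaml
(* Hypothetical helper: start, end and text of the first match of r in s. *)
let span r s =
  match Str.search_forward r s 0 with
  | exception Not_found -> None
  | _ ->
    let b = Str.match_beginning () in
    let e = Str.match_end () in
    Some (b, e, String.sub s b (e - b))

let digits_span = span (Str.regexp "[0-9]+") "abc123def"
```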
val matched_group : int -> string -> string
matched_group n s returns the substring of s that was matched by the nth group \(...\) of the regular expression that was matched by the last call to a matching or searching function (see Str.matched_string for details). The user must make sure that the parameter s is the same string that was passed to the matching or searching function.
- raises Not_found if the nth group of the regular expression was not matched. This can happen with groups inside alternatives \|, options ? or repetitions *. For instance, the empty string will match \(a\)*, but matched_group 1 "" will raise Not_found because the first group itself was not matched.
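The sketch below extracts groups from a made-up key=value format and shows one way to handle a group that did not take part in the match. The helpers parse_pair and group_opt are hypothetical, not part of Str.

```ocaml
let pair = Str.regexp "\\([a-z]+\\)=\\([0-9]*\\)"

(* Hypothetical helper: split "key=digits" into its two groups. *)
let parse_pair s =
  if Str.string_match pair s 0
  then Some (Str.matched_group 1 s, Str.matched_group 2 s)
  else None

(* Hypothetical helper: turn the Not_found of an unmatched group into None. *)
let group_opt n s =
  match Str.matched_group n s with
  | exception Not_found -> None
  | g -> Some g

(* With an alternative, only the group on the branch taken is matched. *)
let alt = Str.regexp "\\(a\\)\\|\\(b\\)"
let alt_groups =
  if Str.string_match alt "b" 0
  then Some (group_opt 1 "b", group_opt 2 "b")
  else None
```

Note that a group matching the empty string, as in "width=", is still a matched group and yields "" rather than Not_found.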
val group_beginning : int -> int
group_beginning n returns the position of the first character of the substring that was matched by the nth group of the regular expression that was matched by the last call to a matching or searching function (see Str.matched_string for details).
- raises Not_found if the nth group of the regular expression was not matched.
- raises Invalid_argument if there are fewer than n groups in the regular expression.
val group_end : int -> int
group_end n returns the position of the character following the last character of the substring that was matched by the nth group of the regular expression that was matched by the last call to a matching or searching function (see Str.matched_string for details).
- raises Not_found if the nth group of the regular expression was not matched.
- raises Invalid_argument if there are fewer than n groups in the regular expression.
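A sketch of reading group positions after a search, using a made-up input string. The last binding probes a group number that the regular expression does not have, which the text above says raises Invalid_argument.

```ocaml
let pair = Str.regexp "\\([a-z]+\\)=\\([0-9]+\\)"
let s = "x width=80"

(* Half-open position ranges of group 1 (the key) and group 2 (the value). *)
let spans =
  match Str.search_forward pair s 0 with
  | exception Not_found -> None
  | _ ->
    let key = (Str.group_beginning 1, Str.group_end 1) in
    let value = (Str.group_beginning 2, Str.group_end 2) in
    Some (key, value)

(* The regexp has only two groups, so asking for a third is an error. *)
let no_third_group =
  match Str.group_beginning 3 with
  | exception Invalid_argument _ -> true
  | exception Not_found -> false
  | _ -> false
```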
Replacement
val global_replace : regexp -> string -> string -> string
global_replace regexp templ s returns a string identical to s, except that all substrings of s that match regexp have been replaced by templ. The replacement template templ can contain \1, \2, etc; these sequences will be replaced by the text matched by the corresponding group in the regular expression. \0 stands for the text matched by the whole regular expression.
val replace_first : regexp -> string -> string -> string
Same as Str.global_replace, except that only the first substring matching the regular expression is replaced.
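A sketch of template-based replacement on made-up text, reordering the groups of a date and using \0 for the whole match. As with regexps, backslashes in the template are doubled inside an ordinary string literal.

```ocaml
let iso_date = Str.regexp "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)"
let text = "from 2024-01-15 to 2024-02-01"

(* Rewrite year-month-day as day/month/year. *)
let all_dates = Str.global_replace iso_date "\\3/\\2/\\1" text
let first_date = Str.replace_first iso_date "\\3/\\2/\\1" text

(* \0 stands for the whole match. *)
let bracketed = Str.global_replace (Str.regexp "[0-9]+") "<\\0>" "a1b22"
```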
val global_substitute : regexp -> (string -> string) -> string -> string
global_substitute regexp subst s returns a string identical to s, except that all substrings of s that match regexp have been replaced by the result of function subst. The function subst is called once for each matching substring, and receives s (the whole text) as argument.
val substitute_first : regexp -> (string -> string) -> string -> string
Same as Str.global_substitute, except that only the first substring matching the regular expression is replaced.
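Since subst receives the whole text rather than the match, it typically calls Str.matched_string on its argument to obtain the matched substring. A sketch with a hypothetical function double:

```ocaml
let number = Str.regexp "[0-9]+"

(* Hypothetical substitution function: s is the whole text, so the current
   match is recovered with Str.matched_string. *)
let double s = string_of_int (2 * int_of_string (Str.matched_string s))

let text = "3 apples and 10 pears"
let all_doubled = Str.global_substitute number double text
let first_doubled = Str.substitute_first number double text
```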
val replace_matched : string -> string -> string
replace_matched repl s returns the replacement text repl in which \1, \2, etc. have been replaced by the text matched by the corresponding groups in the regular expression that was matched by the last call to a matching or searching function (see Str.matched_string for details). s must be the same string that was passed to the matching or searching function.
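Note that replace_matched returns only the expanded replacement text, not a copy of s with the match replaced. The sketch below shows both the bare result and one way to rebuild the full string; the sample data and binding names are illustrative.

```ocaml
let pair = Str.regexp "\\([a-z]+\\)=\\([0-9]+\\)"
let s = "x width=80;"

(* Only the expanded template is returned. *)
let swapped =
  match Str.search_forward pair s 0 with
  | exception Not_found -> None
  | _ -> Some (Str.replace_matched "\\2=\\1" s)

(* Splice the expanded template back between the unmatched parts of s. *)
let rebuilt =
  match Str.search_forward pair s 0 with
  | exception Not_found -> s
  | _ ->
    let b = Str.match_beginning () in
    let e = Str.match_end () in
    Str.string_before s b ^ Str.replace_matched "\\2=\\1" s ^ Str.string_after s e
```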
Splitting
val split : regexp -> string -> string list
split r s splits s into substrings, taking as delimiters the substrings that match r, and returns the list of substrings. For instance, split (regexp "[ \t]+") s splits s into blank-separated words. An occurrence of the delimiter at the beginning or at the end of the string is ignored.
val bounded_split : regexp -> string -> int -> string list
Same as Str.split, but splits into at most n substrings, where n is the extra integer parameter.
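A sketch of both functions on made-up input. With bounded_split, the last element keeps the rest of the string unsplit once the limit is reached.

```ocaml
let blanks = Str.regexp "[ \t]+"

(* Leading and trailing blanks are ignored. *)
let words = Str.split blanks "  the quick\tfox "

(* At most two substrings: the remainder stays in the last one. *)
let two = Str.bounded_split (Str.regexp ",") "a,b,c,d" 2
```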
val split_delim : regexp -> string -> string list
Same as Str.split but occurrences of the delimiter at the beginning and at the end of the string are recognized and returned as empty strings in the result. For instance, split_delim (regexp " ") " abc " returns [""; "abc"; ""], while split with the same arguments returns ["abc"].
val bounded_split_delim : regexp -> string -> int -> string list
Same as Str.bounded_split, but occurrences of the delimiter at the beginning and at the end of the string are recognized and returned as empty strings in the result.
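The difference matters for formats where empty fields are significant, such as comma-separated records. The sketch below uses made-up records, and the expected values in the comments follow from the descriptions above together with the behaviour of current Str implementations as far as I am aware.

```ocaml
let comma = Str.regexp ","

(* The trailing delimiter yields a final empty field. *)
let fields = Str.split_delim comma "a,,b,"            (* ["a"; ""; "b"; ""] *)

(* split drops only the delimiter at the end; the inner empty field stays. *)
let fields_split = Str.split comma "a,,b,"            (* ["a"; ""; "b"] *)

(* The leading delimiter counts, and the limit leaves the rest unsplit. *)
let bounded = Str.bounded_split_delim comma ",a,b," 2  (* [""; "a,b,"] *)
```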
val full_split : regexp -> string -> split_result list
Same as Str.split_delim, but returns the delimiters as well as the substrings contained between delimiters. The former are tagged Delim in the result list; the latter are tagged Text. For instance, full_split (regexp "[{}]") "{ab}" returns [Delim "{"; Text "ab"; Delim "}"].
val bounded_full_split : regexp -> string -> int -> split_result list
Same as Str.bounded_split_delim, but returns the delimiters as well as the substrings contained between delimiters. The former are tagged Delim in the result list; the latter are tagged Text.
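Keeping the delimiters makes full_split usable as a very simple tokenizer. The sketch below assumes the constructors Text and Delim of the Str.split_result type, whose definition is not shown in this section. The function describe is a hypothetical helper.

```ocaml
let ops = Str.regexp "[+*]"

(* Operands come back as Text, operators as Delim, in source order. *)
let tokens = Str.full_split ops "1+2*3"

(* Hypothetical helper: render each token for display. *)
let describe = function
  | Str.Text t -> "operand " ^ t
  | Str.Delim d -> "operator " ^ d

let described = List.map describe tokens
```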
Extracting substrings
val string_before : string -> int -> string
string_before s n returns the substring of all characters of s that precede position n (excluding the character at position n).
val string_after : string -> int -> string
string_after s n returns the substring of all characters of s that follow position n (including the character at position n).
val first_chars : string -> int -> string
first_chars s n returns the first n characters of s. This is the same function as Str.string_before.
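These functions combine naturally with the positions returned by the searching functions. The helper split_at_first below is hypothetical and sketches cutting a string at the first match of a regexp.

```ocaml
let s = "hello world"
let before = Str.string_before s 5    (* "hello" *)
let after = Str.string_after s 6      (* "world" *)
let first = Str.first_chars s 5       (* same as string_before *)

(* Hypothetical helper: the text before and after the first match of r. *)
let split_at_first r str =
  match Str.search_forward r str 0 with
  | exception Not_found -> None
  | pos ->
    Some (Str.string_before str pos, Str.string_after str (Str.match_end ()))

let kv = split_at_first (Str.regexp_string "=") "k=v=w"
```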