Fare Finite-state automaton with regular expression operations.

Class invariants:

  • An automaton is represented either explicitly (with State and Transition objects) or as a singleton string (see the Singleton property and the ExpandSingleton() method) when the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.)
  • Automata are always reduced (see the Reduce() method) and have no transitions to dead states (see the RemoveDeadTransitions() method).
  • If an automaton is nondeterministic, then the IsDeterministic property returns false (but the converse is not required).
  • Automata provided as input to operations are generally assumed to be disjoint.

If the states or transitions are manipulated manually, the RestoreInvariant() and SetDeterministic(bool) methods should be used afterwards to restore the representation invariants that are assumed by the built-in automata operations.
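To make the "reduced" invariant concrete, here is a minimal Python sketch of combining overlapping and adjacent character intervals that lead to the same destination. It is an illustration only, not the library's code; the `(lo, hi, dest)` tuple layout and the name `reduce_transitions` are assumptions made for this example.

```python
def reduce_transitions(transitions):
    """Merge overlapping or adjacent (lo, hi, dest) intervals that share a destination.

    lo and hi are inclusive integer code points; dest is any sortable state label.
    """
    merged = []
    # Group by destination, then walk each group in order of interval start.
    for lo, hi, dest in sorted(transitions, key=lambda t: (t[2], t[0], t[1])):
        if merged and merged[-1][2] == dest and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            prev_lo, prev_hi, _ = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi), dest)
        else:
            merged.append((lo, hi, dest))
    return merged
```

For instance, the intervals a-c and d-f to the same state collapse into a single interval a-f, while an interval to a different state is left alone.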
Minimize using Huffman's O(n^2) algorithm. This is the standard textbook algorithm.
Minimize using Brzozowski's O(2^n) algorithm. This algorithm uses the reverse-determinize-reverse-determinize trick, which has bad worst-case behavior but often works very well in practice (even better than Hopcroft's!).
Minimize using Hopcroft's O(n log n) algorithm. This is regarded as one of the most generally efficient algorithms that exist.
Selects whether operations may modify the input automata (default: false).
Minimize always flag.
The hash code.
The initial state.
Initializes a new instance of the class that accepts the empty language. Using this constructor, automata can be constructed manually from State and Transition objects.
Gets the minimization algorithm (default: MINIMIZE_HOPCROFT).
Gets or sets a value indicating whether operations may modify the input automata (default: false). true if mutation is allowed; otherwise, false.
Gets or sets a value indicating whether this automaton is definitely deterministic (i.e., there are no choices for any run, but a run may crash). true if this automaton is definitely deterministic; otherwise, false.
Gets or sets the initial state of this automaton. The initial state of this automaton.
Gets or sets the singleton string for this automaton. An automaton that accepts exactly one string may be represented in singleton mode. In that case, this property may be used to obtain the string. The singleton string, null if this automaton is not in singleton mode.
Gets or sets a value indicating whether this instance is singleton. true if this instance is singleton; otherwise, false.
Gets or sets a value indicating whether this instance is debug. true if this instance is debug; otherwise, false.
Gets or sets a value indicating whether IsEmpty.
Gets the number of states in this automaton. Returns the number of states in this automaton.
Gets the number of transitions in this automaton.
This number is counted as the total number of edges, where one edge may be a character interval.
Sets or resets the allow mutate flag. If this flag is set, then all automata operations may modify automata given as input; otherwise, operations will always leave input automata languages unmodified. By default, the flag is not set. If set to true, all automata operations may modify automata given as input; otherwise, operations will always leave input automata languages unmodified. The previous value of the flag.
Sets or resets the minimize always flag. If this flag is set, then Minimize() will automatically be invoked after all operations that otherwise may produce non-minimal automata. By default, the flag is not set. If true, the flag is set.
Assigns consecutive numbers to the given states. The states.
The check minimize always.
The clear hash code.
Creates a shallow copy of the current Automaton. A shallow copy of the current Automaton.
A clone of this automaton, expands if singleton. Returns a clone of this automaton, expands if singleton.
A clone of this automaton unless allowMutation is set, expands if singleton. Returns a clone of this automaton unless allowMutation is set, expands if singleton.
Returns a clone of this automaton, or this automaton itself if the allow_mutation flag is set. A clone of this automaton, or this automaton itself if the allow_mutation flag is set.
Expands singleton representation to normal representation. Does nothing if not in singleton representation.
The set of reachable accept states. Returns the set of reachable accept states.
Returns the set of live states. A state is "live" if an accept state is reachable from it.
The sorted array of all interval start points. Returns the sorted array of all interval start points.
Gets the set of states that are reachable from the initial state. The set of states that are reachable from the initial state.
The minimize.
Recomputes the hash code.
The automaton must be minimal when this operation is performed.
Reduces this automaton. An automaton is "reduced" by combining overlapping and adjacent edge intervals with the same destination.
Removes transitions to dead states and calls Reduce() and ClearHashCode(). (A state is "dead" if no accept state is reachable from it.)
Adds transitions to an explicit crash state to ensure that the transition function is total.
Returns a new (deterministic) automaton that accepts any single character. A new (deterministic) automaton that accepts any single character.
Returns a new (deterministic) automaton that accepts all strings. A new (deterministic) automaton that accepts all strings.
Returns a new (deterministic) automaton that accepts a single character of the given value. The c. A new (deterministic) automaton that accepts a single character of the given value.
Returns a new (deterministic) automaton that accepts a single char whose value is in the given interval (including both end points). The min. The max. A new (deterministic) automaton that accepts a single char whose value is in the given interval (including both end points).
Returns a new (deterministic) automaton with the empty language. A new (deterministic) automaton with the empty language.
Returns a new (deterministic) automaton that accepts only the empty string. A new (deterministic) automaton that accepts only the empty string.
Returns a new automaton that accepts strings representing decimal non-negative integers in the given interval. The minimum value of the interval. The maximum value of the interval (both end points are included in the interval). If f > 0, use a fixed number of digits (strings must be prefixed by 0's to obtain the right length); otherwise, the number of digits is not fixed. A new automaton that accepts strings representing decimal non-negative integers in the given interval.
Returns a new (deterministic) automaton that accepts the single given string. The string.
A new (deterministic) automaton that accepts the single given string.
Constructs sub-automaton corresponding to decimal numbers of length x.Substring(n).Length. The x. The n.
Constructs sub-automaton corresponding to decimal numbers of value at least x.Substring(n) and length x.Substring(n).Length. The x. The n. The initials. If set to true [zeros].
Constructs sub-automaton corresponding to decimal numbers of value at most x.Substring(n) and length x.Substring(n).Length. The x. The n.
Constructs sub-automaton corresponding to decimal numbers of value between x.Substring(n) and y.Substring(n) and of length x.Substring(n).Length (which must be equal to y.Substring(n).Length). The x. The y. The n. The initials. If set to true [zeros].
Returns a new (deterministic) automaton that accepts a single character in the given set. The set.
Returns a new (deterministic and minimal) automaton that accepts the union of the given set of strings. The input character sequences are internally sorted in-place, so the input array is modified. See StringUnionOperations. The strings.
Constructs an automaton that accepts strings representing non-negative integers that are not larger than the given value. The n string representation of maximum value.
Constructs an automaton that accepts strings representing non-negative integers that are not less than the given value. The n string representation of minimum value.
Constructs an automaton that accepts strings representing decimal numbers that can be written with at most the given number of digits. Surrounding whitespace is permitted. The i max number of necessary digits.
Constructs an automaton that accepts strings representing decimal numbers that can be written with at most the given number of digits in the fraction part. Surrounding whitespace is permitted. The i max number of necessary fraction digits.
Constructs an automaton that accepts strings representing the given integer. Surrounding whitespace is permitted. The value string representation of integer.
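As a rough illustration of an automaton that accepts the union of a set of strings, the following Python sketch builds a trie. This is a simplification and an assumption of this example: a trie is deterministic but generally not minimal, so it does not reproduce the "deterministic and minimal" result described above, and the names `build_trie` and `run` are hypothetical.

```python
def build_trie(strings):
    """Return (transitions, accepting) for a trie automaton; state 0 is initial."""
    transitions = [{}]  # transitions[state][char] -> next state
    accepting = set()
    for s in strings:
        state = 0
        for ch in s:
            if ch not in transitions[state]:
                # Allocate a fresh state for an unseen prefix.
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def run(transitions, accepting, s):
    """Follow one transition per character; reject as soon as none matches."""
    state = 0
    for ch in s:
        state = transitions[state].get(ch)
        if state is None:
            return False
    return state in accepting
```

With the inputs "car", "cat", and "dog", the shared prefix "ca" is stored once, while the suffixes still use separate states that a minimization pass would merge.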
Constructs an automaton that accepts strings representing the given decimal number. Surrounding whitespace is permitted. The value string representation of decimal number.
Constructs a deterministic automaton that matches strings that contain the given substring. The s.
Adds epsilon transitions to the given automaton. This method adds extra character interval transitions that are equivalent to the given set of epsilon transitions. The automaton. A collection of objects representing pairs of source/destination states where epsilon transitions should be added.
Returns an automaton that accepts the union of the languages of the given automata. The l. An automaton that accepts the union of the languages of the given automata. Complexity: linear in number of states.
Returns a (deterministic) automaton that accepts the complement of the language of the given automaton. The automaton. A (deterministic) automaton that accepts the complement of the language of the given automaton. Complexity: linear in number of states (if already deterministic).
Determinizes the specified automaton. Complexity: exponential in number of states. The automaton.
Determinizes the given automaton using the given set of initial states. The automaton. The initial states.
Determines whether the given automaton accepts no strings. The automaton. true if the given automaton accepts no strings; otherwise, false.
Determines whether the given automaton accepts the empty string and nothing else. The automaton. true if the given automaton accepts the empty string and nothing else; otherwise, false.
Returns an automaton that accepts the intersection of the languages of the given automata. Never modifies the input automata languages. The a1. The a2.
Returns an automaton that accepts the union of the empty string and the language of the given automaton. The automaton. Complexity: linear in number of states. An automaton that accepts the union of the empty string and the language of the given automaton.
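Determinization from a set of initial states is commonly done by subset construction, and running it twice with a reversal in between gives the reverse-determinize-reverse-determinize form of Brzozowski's minimization. The Python sketch below shows that idea on dictionary-based automata. It is not the library's implementation: the data layout (`state -> char -> set of states`) and all function names are assumptions for illustration, and single characters stand in for character intervals.

```python
def reverse(trans, initials, accepting):
    """Flip every edge and swap the roles of initial and accepting states."""
    rev = {}
    for src, by_char in trans.items():
        for ch, dests in by_char.items():
            for dst in dests:
                rev.setdefault(dst, {}).setdefault(ch, set()).add(src)
    return rev, set(accepting), set(initials)


def determinize(trans, initials, accepting):
    """Subset construction starting from a set of initial states."""
    accepting = set(accepting)
    start = frozenset(initials)
    dfa, worklist = {}, [start]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        labels = {ch for s in current for ch in trans.get(s, {})}
        for ch in labels:
            target = frozenset(
                d for s in current for d in trans.get(s, {}).get(ch, ())
            )
            dfa[current][ch] = {target}  # singleton set keeps one uniform layout
            worklist.append(target)
    return dfa, {start}, {s for s in dfa if s & accepting}


def brzozowski(trans, initials, accepting):
    """Reverse, determinize, reverse, determinize."""
    first_pass = determinize(*reverse(trans, initials, accepting))
    return determinize(*reverse(*first_pass))


def accepts(trans, initials, accepting, word):
    """Simulate the automaton on word, tracking the set of current states."""
    current = set(initials)
    for ch in word:
        current = {d for s in current for d in trans.get(s, {}).get(ch, ())}
    return bool(current & set(accepting))


# Accepts "ab" or "cb", deliberately built with redundant states 1/2 and 3/4.
TRANS = {0: {"a": {1}, "c": {2}}, 1: {"b": {3}}, 2: {"b": {4}}}
MIN_TRANS, MIN_INIT, MIN_ACC = brzozowski(TRANS, {0}, {3, 4})
```

The example automaton starts with five states; after the two passes the redundant pairs are merged and three states remain, while the accepted language is unchanged. The exponential worst case comes from the subset construction, since each deterministic state is a set of original states.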
Accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton. Never modifies the input automaton language. The automaton. An automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton. Never modifies the input automaton language. Complexity: linear in number of states.
Accepts min or more concatenated repetitions of the language of the given automaton. The automaton. The minimum concatenated repetitions of the language of the given automaton. Returns an automaton that accepts min or more concatenated repetitions of the language of the given automaton. Complexity: linear in number of states and in min.
Accepts between min and max (including both) concatenated repetitions of the language of the given automaton. The automaton. The minimum concatenated repetitions of the language of the given automaton. The maximum concatenated repetitions of the language of the given automaton. Returns an automaton that accepts between min and max (including both) concatenated repetitions of the language of the given automaton. Complexity: linear in number of states and in min and max.
Returns true if the given string is accepted by the automaton. The automaton. The string. Complexity: linear in the length of the string. For full performance, use the RunAutomaton class.
Implements the operator ==. The left. The right. The result of the operator.
Implements the operator !=. The left. The right. The result of the operator.
Minimizes (and determinizes if not already deterministic) the given automaton. The automaton.
Minimizes the given automaton using Brzozowski's algorithm. The automaton.
Minimizes the given automaton using Huffman's algorithm. The automaton.
Regular Expression extension to Automaton.
Prevents a default instance of the class from being created.
Initializes a new instance of the class from a string. A string with the regular expression.
Initializes a new instance of the class from a string. A string with the regular expression. Boolean 'or' of optional syntax constructs to be enabled.
Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).
Constructs new Automaton from this RegExp. Same as toAutomaton(null,minimize) (empty automaton map). If set to true [minimize].
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states. The provider of automata for named identifiers.
Constructs new Automaton from this RegExp. The constructed automaton has no transitions to dead states. The provider of automata for named identifiers. If set to true, the automaton is minimized and determinized.
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states. The map from automaton identifiers to automata.
Constructs new Automaton from this RegExp. The constructed automaton has no transitions to dead states. The map from automaton identifiers to automata. If set to true, the automaton is minimized and determinized.
Sets or resets the allow mutate flag. If this flag is set, then automata construction uses mutable automata, which is slightly faster but not thread safe. If set to true, the flag is set. The previous value of the flag.
Returns the set of automaton identifiers that occur in this regular expression. The set of automaton identifiers that occur in this regular expression.
Uses case-insensitive matching.
Use single-line mode, where the period matches every character, instead of every character except \n.
Use multiline mode, where ^ and $ match the beginning and end of each line, instead of the beginning and end of the input string.
Do not capture unnamed groups.
Exclude unescaped white space from the pattern and enable comments after a hash sign #.
Enables intersection.
Enables complement.
Enables empty language.
Enables anystring.
Enables named automata.
Enables numerical intervals.
Enables all optional regexp syntax.
Special automata operations.
Reverses the language of the given (non-singleton) automaton while returning the set of new initial states. The automaton.
Returns an automaton that accepts the overlap of strings that in more than one way can be split into a left part being accepted by a1 and a right part being accepted by a2. The a1. The a2.
Returns an automaton that accepts the single chars that occur in strings that are accepted by the given automaton. Never modifies the input automaton. The automaton.
Returns an automaton that accepts the trimmed language of the given automaton. The resulting automaton is constructed as follows: 1) Whenever a c character is allowed in the original automaton, one or more set characters are allowed in the new automaton. 2) The automaton is prefixed and postfixed with any number of set characters. The automaton. The set of characters to be trimmed. The canonical trim character (assumed to be in set).
Returns an automaton that accepts the compressed language of the given automaton. Whenever a c character is allowed in the original automaton, one or more set characters are allowed in the new automaton. The automaton. The set of characters to be compressed. The canonical compress character (assumed to be in set).
Returns an automaton where all transition labels have been substituted.

Each transition labeled c is changed to a set of transitions, one for each character in map(c). If map(c) is null, then the transition is unchanged.

The automaton. The dictionary from characters to sets of characters (where characters are char objects).
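The substitution rule can be sketched in a few lines of Python. This is an illustrative simplification, not the library's code: transitions are modeled as hypothetical `(source, char, destination)` triples with single characters instead of character intervals, and `substitute` is a name chosen for this example.

```python
def substitute(transitions, mapping):
    """Replace each transition on c by one transition per character in mapping[c].

    Transitions whose label has no entry in mapping (or maps to None) are kept.
    """
    result = []
    for src, ch, dst in transitions:
        replacements = mapping.get(ch)
        if replacements is None:
            result.append((src, ch, dst))  # unchanged, as for a null map entry
        else:
            result.extend((src, r, dst) for r in sorted(replacements))
    return result
```

For example, mapping 'a' to the set {'x', 'y'} turns one transition on 'a' into two parallel transitions on 'x' and 'y' between the same pair of states.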
Finds the largest entry whose value is less than or equal to c, or 0 if there is no such entry. The c. The points.
Returns an automaton where all transitions of the given char are replaced by a string. The automaton. The c. The s. A new automaton.
Returns an automaton accepting the homomorphic image of the given automaton using the given function.

This method maps each transition label to a new value. source and dest are assumed to be arrays of same length, and source must be sorted in increasing order and contain no duplicates. source defines the starting points of char intervals, and the corresponding entries in dest define the starting points of corresponding new intervals.

The automaton. The source. The dest.
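One plausible reading of the source/dest arrays is that a character keeps its offset within its source interval when it is moved to the corresponding dest interval. The Python sketch below shows that reading for a single character. Both the offset-preserving rule and the helper names `find_index` and `map_char` are assumptions of this example, not confirmed behavior of the library.

```python
from bisect import bisect_right


def find_index(c, points):
    """Index of the largest entry <= c in the sorted list points, or 0 if none."""
    return max(bisect_right(points, c) - 1, 0)


def map_char(ch, source, dest):
    """Map ch to the dest interval matching its source interval, keeping the offset.

    source holds sorted, duplicate-free interval start points (code points);
    dest holds the start points of the corresponding new intervals.
    """
    code = ord(ch)
    i = find_index(code, source)
    return chr(dest[i] + code - source[i])
```

With source start points at 'a' and 'n' and dest start points at 'A' and '0', the character 'c' (offset 2 in the first interval) would map to 'C', and 'p' (offset 2 in the second interval) would map to '2'.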
Returns an automaton with projected alphabet. The new automaton accepts all strings that are projections of strings accepted by the given automaton onto the given characters (represented by Character). If null is in the set, it abbreviates the intervals \u0000-\uDFFF and \uF900-\uFFFF (i.e., the non-private code points). It is assumed that all other characters from chars are in the interval \uE000-\uF8FF. The automaton. The chars.
Returns true if the language of this automaton is finite. The automaton. true if the language of the specified automaton is finite; otherwise, false.
Checks whether there is a loop containing s. (This is sufficient since there are never transitions to dead states.) The s. The path. The visited. true if the specified s is finite; otherwise, false.
Returns the set of accepted strings of the given length. The automaton. The length.
Returns the set of accepted strings, assuming this automaton has a finite language. If the language is not finite, null is returned. The automaton.
Returns the set of accepted strings, assuming that at most limit strings are accepted. If more than limit strings are accepted, null is returned. If limit<0, then this method works like GetFiniteStrings(Automaton). The automaton. The limit.
Returns the strings that can be produced from the given state, or false if more than limit strings are found. limit<0 means "infinite". The s. The path states. The strings. The path. The limit.
Returns the longest string that is a prefix of all accepted strings and visits each state at most once. The automaton. A common prefix.
Prefix closes the given automaton. The automaton.
Constructs an automaton that accepts the same strings as the given automaton but ignores upper/lower case of A-F. The automaton. An automaton.
Constructs an automaton that accepts 0x20, 0x9, 0xa, and 0xd in place of each 0x20 transition in the given automaton. The automaton. An automaton.
Automaton state.
Initializes a new instance of the class.
Initially, the new state is a reject state.
Gets the id.
Gets or sets a value indicating whether this State is Accept.
Gets or sets this State Number.
Gets or sets this State Transitions.
Implements the operator ==. The left. The right. The result of the operator.
Implements the operator !=. The left. The right. The result of the operator.
Adds an outgoing transition. The transition.
Performs lookup in transitions, assuming determinism. The character to look up. The destination state, null if no matching outgoing transition.
Performs lookup in transitions, allowing nondeterminism. The character to look up. The collection where destination states are stored.
Gets the transitions sorted by (min, reverse max, to) or (to, min, reverse max). If set to true [to first]. The transitions sorted by (min, reverse max, to) or (to, min, reverse max).
Determines whether the specified objects are equal. The first object to compare. The second object to compare. true if the specified objects are equal; otherwise, false.
Returns a hash code for this instance. The obj. A hash code for this instance, suitable for use in hashing algorithms and data structures like a hash table. The type of obj is a reference type and obj is null.
Pair of states.
Initializes a new instance of the class. The s. The s1. The s2.
Initializes a new instance of the class. The first state. The second state.
Gets or sets the first component of this pair. The first state.
Gets or sets the second component of this pair. The second state.
Implements the operator ==. The left. The right. The result of the operator.
Implements the operator !=. The left. The right. The result of the operator.
Automaton transition.

A transition, which belongs to a source state, consists of a Unicode character interval and a destination state.

Initializes a new instance of the class. (Constructs a new singleton interval transition.) The transition character. The destination state.
Initializes a new instance of the class. (Both end points are included in the interval.) The transition interval minimum. The transition interval maximum. The destination state.
Gets the minimum of this transition interval.
Gets the maximum of this transition interval.
Gets the destination of this transition.
Implements the operator ==. The left. The right. The result of the operator.
Implements the operator !=. The left. The right. The result of the operator.
Initializes a new instance of the class. If set to true [to first].
Compares by (min, reverse max, to) or (to, min, reverse max). The first Transition. The second Transition.
An object that will generate text from a regular expression. In a way, it's the opposite of a regular expression matcher: an instance of this class will produce text that is guaranteed to match the regular expression passed in.
Initializes a new instance of the class. The regex. The random.
Initializes a new instance of the class. The regex.
Generates a random String that is guaranteed to match the regular expression passed to the constructor.
Generates a random number within the given bounds. The minimum number (inclusive). The maximum number (inclusive). The object used as the randomizer. A random number in the given range.
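One simple way to produce text that is guaranteed to match is a random walk over a deterministic automaton that has no dead states: at each state, pick an outgoing transition at random, and in an accepting state treat "stop here" as one more option. The Python sketch below illustrates this under stated assumptions. The dictionary layout, the name `generate`, and the example automaton are hypothetical, single characters stand in for character intervals, and this is not claimed to be how the generator class described above is implemented.

```python
import random


def generate(dfa, start, accepting, rng):
    """Random walk over a DFA (state -> char -> state) until it stops in an accept state."""
    state, out = start, []
    while True:
        edges = sorted(dfa.get(state, {}).items())
        # An accepting state contributes one extra option: stop and return.
        options = len(edges) + (1 if state in accepting else 0)
        if options == 0:
            raise ValueError("dead state reached: no transition and not accepting")
        pick = rng.randrange(options)
        if pick == len(edges):
            return "".join(out)
        ch, state = edges[pick]
        out.append(ch)


# Hypothetical automaton for the language a b* c.
EXAMPLE_DFA = {0: {"a": 1}, 1: {"b": 1, "c": 2}}
EXAMPLE_ACCEPT = {2}
```

Calling `generate(EXAMPLE_DFA, 0, EXAMPLE_ACCEPT, random.Random())` yields strings such as "ac" or "abbc". Passing the random source in as a parameter keeps runs reproducible when a seed is supplied. Termination is probabilistic: a loop such as the one on 'b' can in principle be taken many times.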