Documentation

Mathlib.Data.List.EditDistance.Estimator

Estimator for Levenshtein distance. #

The usual algorithm for computing Levenshtein distances provides successively better lower bounds for the Levenshtein distance as it runs, as proved in Mathlib.Data.List.EditDistance.Bounds.

In this file we package that fact as an instance of

Estimator (Thunk.mk fun _ => levenshtein C xs ys) (LevenshteinEstimator C xs ys)

allowing us to use the Estimator framework for Levenshtein distances.

This is then used in the implementation of rewrite_search to avoid needing the entire edit distance calculation in unlikely search paths.

structure LevenshteinEstimator' {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

Data showing that the Levenshtein distance from xs to ys is bounded below by the minimum Levenshtein distance between some suffix of xs and a particular suffix of ys.

This bound is (non-strict) monotone as we take longer suffixes of ys.

This is an auxiliary definition for the later LevenshteinEstimator: this variant constructs a lower bound for the pair consisting of the Levenshtein distance from xs to ys, along with the length of ys.

  • pre_rev : List β

    The prefix of ys that is not is not involved in the bound, in reverse order.

  • suff : List β

    The suffix of ys, such that the distance from xs to ys is bounded below by the minimum distance from any suffix of xs to this suffix.

  • split : List.reverse self.pre_rev ++ self.suff = ys

    Witness that ys has been decomposed into a prefix and suffix.

  • distances : { r : List δ // 0 < List.length r }

    The distances from each suffix of xs to suff.

  • distances_eq : self.distances = suffixLevenshtein C xs self.suff

    Witness that distances are correct.

  • bound : δ ×

    The current bound on the pair (distance from xs to ys, length of ys).

  • bound_eq : self.bound = match self.pre_rev, (_ : List.reverse self.pre_rev ++ self.suff = ys) with | [], split => ((self.distances)[0], List.length ys) | x, split => (List.minimum_of_length_pos (_ : 0 < List.length self.distances), List.length self.suff)

    Predicate describing the current bound.

Instances For
    instance estimator' {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
    Estimator { fn := fun (x : Unit) => (levenshtein C xs ys, List.length ys) } (LevenshteinEstimator' C xs ys)
    Equations
    • One or more equations did not get rendered due to their size.
    def LevenshteinEstimator {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

    An estimator for Levenshtein distances.

    Equations
    Instances For
      Equations
      • One or more equations did not get rendered due to their size.
      instance instBotLevenshteinEstimator {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

      The initial estimator for Levenshtein distances.

      Equations
      • One or more equations did not get rendered due to their size.