项目作者: tulz-app

项目描述 :
Myers diff algorithm in Scala
高级语言: Scala
项目地址: git://github.com/tulz-app/stringdiff.git
创建时间: 2020-12-20T02:10:56Z
项目社区:https://github.com/tulz-app/stringdiff

开源协议:MIT License

下载


Maven Central

app.tulz.diff

Myers diff algorithm in Scala.

  1. "app.tulz" %%% "stringdiff" % "0.3.4"

Overview

The core algorithm is a Scala translation of the Python implementation described here:
https://blog.robertelder.org/diff-algorithm/

Additionally, the following is implemented on top of it:

  • interpretation of the algorithm’s output
  • customizable formatting of the interpreted result into a string (with a few provided out of the box formatters)
  • additional transformations for when the input is expected to be a sequence of tokens (TokenDiff)

Interpretation

The core algorithm outputs a set of instructions to get from SeqA to SeqB, something like this:

  1. in order to get from s1="bcdefgzio" to s2="abcxyfgi":
  2. Insert a from s2 before position 0 into s1.
  3. Delete d from s1 at position 2 in s1.
  4. Insert x from s2 before position 3 into s1.
  5. Insert y from s2 before position 3 into s1.
  6. Delete e from s1 at position 3 in s1.
  7. Delete z from s1 at position 6 in s1.
  8. Delete o from s1 at position 8 in s1.

It’s hard to work with it. Also, it is somewhat tricky at first to understand how to follow these instructions (try it! :) ).

MyersInterpret object parses the raw output into a List[DiffElement]:

  1. trait DiffElement[Repr]
  2. object DiffElement {
  3. final case class InBoth[Repr](v: Repr) extends DiffElement[Repr]
  4. final case class InFirst[Repr](v: Repr) extends DiffElement[Repr]
  5. final case class InSecond[Repr](v: Repr) extends DiffElement[Repr]
  6. final case class Diff[Repr](x: Repr, y: Repr) extends DiffElement[Repr]
  7. }

The result of the interpretation (without any transformations applied) for the same example:

  1. StringDiff
  2. .raw(
  3. "bcdefgzio",
  4. "abcxyfgi",
  5. collapse = false
  6. ).mkString("[\n ", "\n ", "\n]")
  1. [
  2. InSecond(a)
  3. InBoth(bc)
  4. InFirst(d)
  5. InSecond(x)
  6. InSecond(y)
  7. InFirst(e)
  8. InBoth(fg)
  9. InFirst(z)
  10. InBoth(i)
  11. InFirst(o)
  12. ]

Collapsing

By default, diff functions will collapse the diff:

  1. StringDiff
  2. .raw(
  3. "bcdefgzio",
  4. "abcxyfgi",
  5. collapse = true // default is true
  6. ).mkString("[\n ", "\n ", "\n]")
  1. [
  2. InSecond(a)
  3. InBoth(bc)
  4. Diff(de,xy)
  5. InBoth(fg)
  6. InFirst(z)
  7. InBoth(i)
  8. InFirst(o)
  9. ]

Here, the following list of DiffElements:

  1. [
  2. InFirst(d)
  3. InSecond(x)
  4. InSecond(y)
  5. InFirst(e)
  6. ]

got collapsed into a single one:

  1. [
  2. Diff(de,xy)
  3. ]

In a nutshell, collapsing removes empty elements and joins same or otherwise “join-able” subsequent DiffElements.

Examples:

  • any InFirst, InLast, Diff or InBoth gets removed if the element is empty
  • Diff becomes InFirst or InSecond if one the elements is empty
  • InFirst(a)+InFirst(b) -> InFirst(ab)
  • InFirst(a)+InSecond(b) -> Diff(a,b)
  • InSecond(a)+InDiff(b,c) -> Diff(ab,c)
  • etc

Diff’ing sequences:

  1. SeqDiff
  2. .seq(
  3. Seq(1, 2, 3, 4, 5).toIndexedSeq,
  4. Seq(1, 2, 8, 3, 8, 4, 5, 0).toIndexedSeq
  5. ).mkString("[\n ", "\n ", "\n]")
  1. [
  2. InBoth(Vector(1, 2))
  3. InSecond(Vector(8))
  4. InBoth(Vector(3))
  5. InSecond(Vector(8))
  6. InBoth(Vector(4, 5))
  7. InSecond(Vector(0))
  8. ]

Diff’ing strings:

Raw diff AST:
  1. println(
  2. StringDiff.diff(
  3. "bcdefgzio",
  4. "abcxyfgi"
  5. )
  6. )
  1. [
  2. InSecond(a)
  3. InBoth(bc)
  4. Diff(de,xy)
  5. InBoth(fg)
  6. InFirst(z)
  7. InBoth(i)
  8. InFirst(o)
  9. ]
Text output:
  1. println(
  2. StringDiff.text(
  3. "bcdefgzio",
  4. "abcxyfgi"
  5. )
  6. )
  1. [∅|a]]bc][de|xy]]fg][z|∅]]i][o|∅]
ANSI color output:
  1. println(
  2. StringDiff.ansi(
  3. "bcdefgzio",
  4. "abcxyfgi"
  5. )
  6. )

screenshot1

(default formatters highlight missing text with yellow, extraneous text — with red, and matching text is underlined)

Diff’ing tokens

When the inputs are strings that are expected to contain whitespace-separated tokens, TokenDiff
will try to make the diff more comprehensible in terms of tokens (while preserving the accuracy).

  1. println(
  2. TokenDiff.ansi(
  3. "match-1 match-2 diff-1 diff-2 match-3 match-4 diff-1 diff-2 match-1 match-2 diff-1 match-3 match-4 diff-1 match-1 match-2 diff-1 match-3 match-4 suffix-1",
  4. "prefix-1 match-1 match-2 diff-3 match-3 match-4 match-1 match-2 diff-2 diff-3 match-3 match-4 diff-2 match-1 match-2 match-3 match-4"
  5. )
  6. )

screenshot3

With a StringDiff the output would look like the following:

  1. println(
  2. StringDiff.ansi(
  3. "match-1 match-2 diff-1 diff-2 match-3 match-4 diff-1 diff-2 match-1 match-2 diff-1 match-3 match-4 diff-1 match-1 match-2 diff-1 match-3 match-4 suffix-1",
  4. "prefix-1 match-1 match-2 diff-3 match-3 match-4 match-1 match-2 diff-2 diff-3 match-3 match-4 diff-2 match-1 match-2 match-3 match-4"
  5. )
  6. )

screenshot4

Inline diffs for both strings
  1. println(
  2. TokenDiff.ansiBoth(
  3. "match-1 match-2 diff-1 diff-2 match-3 match-4 diff-1 diff-2 match-1 match-2 diff-1 match-3 match-4 diff-1 match-1 match-2 diff-1 match-3 match-4 suffix-1",
  4. "prefix-1 match-1 match-2 diff-3 match-3 match-4 match-1 match-2 diff-2 diff-3 match-3 match-4 diff-2 match-1 match-2 match-3 match-4"
  5. )
  6. )

screenshot3

Usage

Diff’ing Seqs:
  1. SeqDiff.seq(
  2. Seq(1, 2, 3),
  3. Seq(2, 3, 4)
  4. )
Diff’ing Stringss:
  1. StringDiff.ansi("abc", "acb")
  2. // OR
  3. StringDiff("abc", "acb")
  4. StringDiff.ansiBoth("abc", "acb")
  5. StringDiff.text("abc", "acb")
  6. StringDiff.diff("abc", "acb")
  7. StringDiff.raw("abc", "acb")
Diff’ing Stringss with tokens:
  1. TokenDiff.ansi("abc", "acb")
  2. // OR
  3. TokenDiff("abc", "acb")
  4. TokenDiff.ansiBoth("abc", "acb")
  5. TokenDiff.text("abc", "acb")
  6. TokenDiff.diff("abc", "acb")
  7. TokenDiff.raw("abc", "acb")

Custom formatters

Formatters are instances of the DiffFormat[Out] trait:

  1. trait DiffFormat[Out] {
  2. def apply(diff: List[DiffElement[String]]): Out
  3. }
  1. object MyFormat extends DiffFormat[MyDiffOutput] { ... }
  2. val diff: MyDiffOutput = MyFormat(StringDiff("abc", "acb"))

For example, the TextDiffFormat is implemented like this:

  1. object TextDiffFormat extends DiffFormat[String] {
  2. import DiffElement._
  3. def apply(diff: List[DiffElement[String]]): String = {
  4. val sb = new StringBuilder
  5. diff.foreach {
  6. case InBoth(both) =>
  7. sb.append("]")
  8. sb.appendAll(both)
  9. sb.append("]")
  10. case InSecond(second) =>
  11. sb.append("[∅|")
  12. sb.appendAll(second)
  13. sb.append("]")
  14. case InFirst(first) =>
  15. sb.append("[")
  16. sb.appendAll(first)
  17. sb.append("|∅]")
  18. case Diff(first, second) =>
  19. sb.append("[")
  20. sb.appendAll(first)
  21. sb.append("|")
  22. sb.appendAll(second)
  23. sb.append("]")
  24. case _ =>
  25. }
  26. sb.toString()
  27. }

Author

Iurii Malchenko – @yurique

License

stringdiff is provided under the MIT license.