Project author: atextor

Project description:
Java library for pretty printing RDF/Turtle documents
Primary language: Java
Repository: git://github.com/atextor/turtle-formatter.git
Created: 2021-02-03T05:07:11Z
Project community: https://github.com/atextor/turtle-formatter

License: GNU Lesser General Public License v3.0



turtle-formatter


turtle-formatter is a Java library for pretty printing
RDF/Turtle documents in a configurable and reproducible way.

It takes as input a formatting style and an Apache Jena Model and
produces as output a pretty-printed RDF/Turtle document.

Starting from version 1.2.0, turtle-formatter is licensed under Apache 2.0. The
current version is 1.2.16.

Current Status: The library is feature-complete.

Why?

Reproducible Formatting

Every RDF library comes with its own serializers. For example, an Apache Jena Model can be written
in multiple ways, the easiest being to call the write method on the model itself:
model.write(System.out, "TURTLE"). However, due to the nature of RDF, the outgoing edges of a node
in the graph have no order, so there are multiple valid ways to serialize a model. For example,
the following two models are identical:

    @prefix : <http://example.com/> .

    :test
      :blorb "blorb" ;
      :floop "floop" .

and

    @prefix : <http://example.com/> .

    :test
      :floop "floop" ;
      :blorb "blorb" .

Therefore, when a model is serialized, the result can be any one of many different (valid)
serializations. This is a problem when different versions of a model file are compared, for
example when they are kept as artifacts in a git repository. Additionally, serialized files are
often formatted in one style that is hardcoded in the respective library. So while Apache Jena
and, for example, libraptor2 both write valid RDF/Turtle, the files are formatted differently.
You would not want the code of a project formatted differently in different files, would you?

turtle-formatter addresses these problems by taking care of serialization order and providing a
way to customize the formatting style.
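
To make the ordering problem concrete, here is a minimal sketch in plain Java, without Jena. The Triple record and the writeAsIs and canonicalize helpers are hypothetical names used only for illustration; they are not part of turtle-formatter, whose actual ordering rules are richer (see the options further down). The sketch only shows that two statement lists in different order describe the same set, and that sorting before writing yields one stable text.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

public class CanonicalOrderSketch {
    // Hypothetical stand-in for an RDF statement; not a Jena or turtle-formatter type.
    record Triple(String subject, String predicate, String object) {}

    // Writes statements in whatever order they arrive, like a naive serializer.
    static String writeAsIs(List<Triple> triples) {
        return triples.stream()
            .map(t -> t.subject() + " " + t.predicate() + " " + t.object() + " .")
            .collect(Collectors.joining("\n"));
    }

    // Sorts by subject, predicate and object before writing, so the output
    // depends only on the set of statements and not on their incoming order.
    static String canonicalize(List<Triple> triples) {
        List<Triple> sorted = triples.stream()
            .sorted(Comparator.comparing(Triple::subject)
                .thenComparing(Triple::predicate)
                .thenComparing(Triple::object))
            .toList();
        return writeAsIs(sorted);
    }

    public static void main(String[] args) {
        List<Triple> first = List.of(
            new Triple(":test", ":blorb", "\"blorb\""),
            new Triple(":test", ":floop", "\"floop\""));
        List<Triple> second = List.of(first.get(1), first.get(0));

        // Same graph: the sets of statements are equal.
        System.out.println(new HashSet<>(first).equals(new HashSet<>(second)));
        // The naive output differs between the two lists, which causes noisy diffs.
        System.out.println(writeAsIs(first).equals(writeAsIs(second)));
        // The sorted output is identical for both lists.
        System.out.println(canonicalize(first).equals(canonicalize(second)));
    }
}
```

With a stable order like this, a diff between two versions of a file only shows statements that were really added or removed.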

Nice and Configurable Formatting

Most serializers, while creating valid RDF/Turtle, create ugly formatting. Obviously, what is ugly
and what isn’t is highly subjective, so this should be configurable. turtle-formatter addresses
this by making the formatting style configurable, e.g. how alignment should be done, where extra
spaces should be inserted and even whether indentation uses tabs or spaces. A default style is
provided that reflects sane settings (i.e., the author’s opinion). An RDF document formatted using
the default style could look like this:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . # ①
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix : <http://example.com/relations#> .

    :Male a owl:Class ; # ②
      owl:disjointWith :Female ; # ③
      owl:equivalentClass [ # ④
        a owl:Restriction ;
        owl:hasSelf true ; # ⑤
        owl:onProperty :isMale ;
      ] ;
      rdfs:subClassOf :Person .

    :hasBrother a owl:ObjectProperty ;
      owl:propertyChainAxiom ( :hasSibling :isMale ) ; # ⑥
      rdfs:range :Male .

    :hasUncle a owl:ObjectProperty, owl:IrreflexiveProperty ;
      owl:propertyChainAxiom ( :hasParent :hasSibling :hasHusband ) ; # ⑦
      owl:propertyChainAxiom ( :hasParent :hasBrother ) ;
      rdfs:range :Male .

  • ① Prefixes are sorted by common, then custom. They are not aligned on the colon because that
    looks bad when one prefix string is much longer than the others.
  • ② rdf:type is always written as a. It is always the first predicate and written in the same
    line as the subject.
  • ③ Indentation is done using a fixed size, like in any other format or language. Predicates are not
    aligned to subjects with an arbitrary length.
  • ④ Anonymous nodes are written using the [ ] notation whenever possible.
  • ⑤ Literal shortcuts are used where possible (e.g. no "true"^^xsd:boolean).
  • ⑥ RDF Lists are always written using the ( ) notation, no blank node IDs or
    rdf:rest/rdf:first seen here.
  • ⑦ The same predicates on the same subjects are repeated rather than using the , notation,
    because the , notation is difficult to read, especially when the objects are longer (nested
    anonymous nodes). The exception to this rule is for different rdf:types.

Usage

Usage as a CLI (command line interface)

turtle-formatter itself is only a library and thus intended to be used programmatically, which is
explained in the following sections. However, the sibling project owl-cli uses turtle-formatter
and offers a command line interface to pretty-print any OWL or RDF document. See owl-cli’s
Getting Started to get the tool, and the write command documentation to see which command line
switches are available to adjust the formatting.

Usage as a library

Add the following dependency to your Maven pom.xml:

    <dependency>
      <groupId>de.atextor</groupId>
      <artifactId>turtle-formatter</artifactId>
      <version>1.2.16</version>
    </dependency>

Gradle/Groovy: implementation 'de.atextor:turtle-formatter:1.2.16'

Gradle/Kotlin: implementation("de.atextor:turtle-formatter:1.2.16")

Calling the formatter

    import java.io.FileInputStream;
    import de.atextor.turtle.formatter.FormattingStyle;
    import de.atextor.turtle.formatter.TurtleFormatter;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    // ...

    // Determine formatting style
    FormattingStyle style = FormattingStyle.DEFAULT;
    TurtleFormatter formatter = new TurtleFormatter(style);

    // Build or load a Jena Model.
    // Use the style's base URI for loading the model.
    Model model = ModelFactory.createDefaultModel();
    model.read(new FileInputStream("data.ttl"), style.emptyRdfBase, "TURTLE");

    // Either create a string...
    String prettyPrintedModel = formatter.apply(model);

    // ...or write directly to an OutputStream
    formatter.accept(model, System.out);

Customizing the style

Instead of passing FormattingStyle.DEFAULT, you can create a custom FormattingStyle object.

    FormattingStyle style = FormattingStyle.builder(). ... .build();

The following options can be set on the FormattingStyle builder. Each entry below gives the
option name, its description and its default value.


emptyRdfBase

Set the URI that should be left out in formatting. If you don’t care about this, don’t change it
and use the FormattingStyle’s emptyRdfBase field as the base URI when loading/creating the model
that will be formatted; see “Calling the formatter”.

Default: urn:turtleformatter:internal


alignPrefixes

Boolean. Example:

    # true
    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix example: <http://example.com/> .

    # false
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix example: <http://example.com/> .

Default: false
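
As a rough illustration of what aligning prefixes means, the following sketch pads each prefix name to the width of the longest one. The renderPrefixes helper is a hypothetical name invented for this example, written in plain Java; it is not how turtle-formatter is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AlignPrefixesSketch {
    // Hypothetical helper, not the library's code: renders @prefix lines and,
    // if align is true, pads each "prefix:" to the width of the longest one.
    static List<String> renderPrefixes(Map<String, String> prefixes, boolean align) {
        int longest = prefixes.keySet().stream().mapToInt(String::length).max().orElse(0);
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixes.entrySet()) {
            StringBuilder name = new StringBuilder(entry.getKey()).append(':');
            // The +1 accounts for the colon that follows every prefix.
            while (align && name.length() < longest + 1) {
                name.append(' ');
            }
            lines.add("@prefix " + name + " <" + entry.getValue() + "> .");
        }
        return lines;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the prefixes.
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        prefixes.put("example", "http://example.com/");

        renderPrefixes(prefixes, true).forEach(System.out::println);
        renderPrefixes(prefixes, false).forEach(System.out::println);
    }
}
```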


alignPredicates, firstPredicateInNewLine

Boolean. Example:

    # firstPredicateInNewLine false
    # alignPredicates true
    :test a rdf:Resource ;
          :blorb "blorb" ;
          :floop "floop" .

    # firstPredicateInNewLine false
    # alignPredicates false
    :test a rdf:Resource ;
      :blorb "blorb" ;
      :floop "floop" .

    # firstPredicateInNewLine true
    # alignPredicates does not matter
    :test
      a rdf:Resource ;
      :blorb "blorb" ;
      :floop "floop" .

Default: false (for both)


alignObjects

Boolean. Example:

    # alignObjects true
    :test
      a           rdf:Resource ;
      :blorb      "blorb" ;
      :floopfloop "floopfloop" .

    # alignObjects false
    :test
      a rdf:Resource ;
      :blorb "blorb" ;
      :floopfloop "floopfloop" .

Default: false


charset*

One of LATIN1, UTF_16_BE, UTF_16_LE, UTF_8, UTF_8_BOM.

Default: UTF_8


doubleFormat

A NumberFormat that describes how xsd:double literals are formatted if enableDoubleFormatting is
true.

Default: 0.####E0


enableDoubleFormatting

Enables formatting of xsd:double values (see doubleFormat option).

Default: false
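
To get a feeling for what the default pattern 0.####E0 produces, here is a small sketch that uses only the JDK’s java.text.DecimalFormat (a NumberFormat subclass). The class and method names are made up for this example. Using Locale.US is an assumption taken from the release notes for 1.2.12; how exactly the library builds its default format is not shown here.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

public class DoubleFormatSketch {
    // Builds a NumberFormat from the default pattern listed above.
    // Locale.US keeps '.' as the decimal separator regardless of the system locale.
    static NumberFormat defaultDoubleFormat() {
        return new DecimalFormat("0.####E0", DecimalFormatSymbols.getInstance(Locale.US));
    }

    public static void main(String[] args) {
        NumberFormat format = defaultDoubleFormat();
        System.out.println(format.format(1234.0));    // 1.234E3
        System.out.println(format.format(12345.678)); // 1.2346E4
        System.out.println(format.format(0.5));       // 5E-1
    }
}
```

The pattern keeps one integer digit and at most four fraction digits in the mantissa, so longer values are rounded.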



endOfLine*

One of LF, CR, CRLF. If unsure, please see Newline.

Default: LF


indentStyle*

SPACE or TAB. Note that when choosing TAB, alignPredicates and alignObjects are automatically
treated as false.

Default: SPACE


quoteStyle

ALWAYS_SINGLE_QUOTES, TRIPLE_QUOTES_FOR_MULTILINE or ALWAYS_TRIPLE_QUOTES. Determines which quotes
should be used for literals. Triple-quoted strings can contain literal quotes and line breaks.

Default: TRIPLE_QUOTES_FOR_MULTILINE


indentSize*

Integer. When using indentStyle SPACE, defines the indentation size.

Default: 2


insertFinalNewLine*

Boolean. Determines whether there is a line break after the last line.

Default: true


useAForRdfType

Boolean. Determines whether rdf:type is written as a or as rdf:type.

Default: true


keepUnusedPrefixes

Boolean. If true, keeps prefixes that are not part of any statement.

Default: false


useCommaByDefault

Boolean. Determines whether to use commas for identical predicates. Example:

    # useCommaByDefault false
    :test a rdf:Resource ;
      :blorb "someBlorb" ;
      :blorb "anotherBlorb" .

    # useCommaByDefault true
    :test a rdf:Resource ;
      :blorb "someBlorb", "anotherBlorb" .

Default: false


commaForPredicate

A set of predicates that, when used multiple times, are separated by commas, even when
useCommaByDefault is false. Example:

    # useCommaByDefault false, commaForPredicate contains
    # 'rdf:type', firstPredicateInNewLine true
    :test
      a ex:something, owl:NamedIndividual ;
      :blorb "someBlorb" ;
      :blorb "anotherBlorb" .

    # useCommaByDefault false, commaForPredicate is empty,
    # firstPredicateInNewLine false
    :test a ex:something ;
      a owl:NamedIndividual ;
      :blorb "someBlorb" ;
      :blorb "anotherBlorb" .

Default: Set.of(rdf:type)



noCommaForPredicate

Analogous to commaForPredicate: A set of predicates that, when used multiple times, are not
separated by commas, even when useCommaByDefault is true.

Default: Empty
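
The interplay of useCommaByDefault, commaForPredicate and noCommaForPredicate can be summarized in a small decision function. This is a sketch of the rule as described above, not the library’s code. The useComma helper is a hypothetical name, predicates are plain strings here, and the sketch assumes that the two sets do not overlap.

```java
import java.util.Set;

public class CommaRuleSketch {
    // Hypothetical helper, not part of turtle-formatter: decides whether repeated
    // uses of a predicate are joined with ',' under the three options above.
    static boolean useComma(String predicate, boolean useCommaByDefault,
                            Set<String> commaForPredicate, Set<String> noCommaForPredicate) {
        if (useCommaByDefault) {
            // Commas everywhere, except for explicitly excluded predicates.
            return !noCommaForPredicate.contains(predicate);
        }
        // No commas by default, except for explicitly included predicates.
        return commaForPredicate.contains(predicate);
    }

    public static void main(String[] args) {
        // Mirrors the defaults: useCommaByDefault false, commaForPredicate = {rdf:type}.
        Set<String> commaFor = Set.of("rdf:type");
        Set<String> noCommaFor = Set.of();

        System.out.println(useComma("rdf:type", false, commaFor, noCommaFor)); // true
        System.out.println(useComma(":blorb", false, commaFor, noCommaFor));   // false
    }
}
```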


prefixOrder

A list of namespace prefixes that defines the order of @prefix directives. Namespaces from the
list always appear first (in this order), every other prefix will appear afterwards,
lexicographically sorted. Example:

    # prefixOrder contains "rdf" and "owl" (in this order), so
    # they will appear in this order at the top (when the model
    # contains them!), followed by all other namespaces
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix example: <http://example.com/> .

Default: List.of(rdf, rdfs, xsd, owl)



subjectOrder

A list of resources that determines the order in which subjects appear. For a subject s there must
exist a statement s rdf:type t in the model and an entry for t in the subjectOrder list for
the element to be considered in the ordering, i.e., when subjectOrder contains :Foo and :Bar
in that order, the pretty-printed model will show first all :Foos, then all :Bars, then
everything else lexicographically sorted.

Default: List.of(rdfs:Class, owl:Ontology, owl:Class, rdf:Property, owl:ObjectProperty,
owl:DatatypeProperty, owl:AnnotationProperty, owl:NamedIndividual, owl:AllDifferent,
owl:Axiom)



predicateOrder

A list of properties that determines the order in which predicates appear for a subject. First all
properties that are in the list are shown in that order, then everything else lexicographically
sorted. For example, when predicateOrder contains :z, :y, :x in that order and the subject
has statements for the properties :a, :x and :z:

    :test
      :z "z" ;
      :x "x" ;
      :a "a" .

Default: List.of(rdf:type, rdfs:label, rdfs:comment, dcterms:description)
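
The “listed items first, everything else lexicographically” rule can be sketched with a plain Java comparator. The sortListedFirst helper below is a hypothetical name for illustration and works on strings instead of Jena properties; it reproduces the :z, :x, :a example from the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ListedFirstOrderSketch {
    // Hypothetical helper: items contained in 'order' come first, in the order
    // given there; all remaining items follow in lexicographic order.
    static List<String> sortListedFirst(List<String> items, List<String> order) {
        Comparator<String> byRank = Comparator.comparingInt(item -> {
            int index = order.indexOf(item);
            // Unlisted items share the highest rank and are sorted by name afterwards.
            return index >= 0 ? index : Integer.MAX_VALUE;
        });
        List<String> result = new ArrayList<>(items);
        result.sort(byRank.thenComparing(Comparator.naturalOrder()));
        return result;
    }

    public static void main(String[] args) {
        List<String> predicateOrder = List.of(":z", ":y", ":x");
        List<String> used = List.of(":a", ":x", ":z");
        System.out.println(sortListedFirst(used, predicateOrder)); // [:z, :x, :a]
    }
}
```

The descriptions of prefixOrder and objectOrder state the same rule, so the same kind of comparator would apply there.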



objectOrder

A list of RDFNodes (i.e. resources or literals) that determines the order in which objects appear
for a predicate, when there are multiple statements with the same subject and the same predicate.
First all objects that are in the list are shown in that order, then everything else
lexicographically sorted. For example, when objectOrder contains :Foo and :Bar in that order:

    :test a :Foo, :Bar .

Default: List.of(owl:NamedIndividual, owl:ObjectProperty, owl:DatatypeProperty,
owl:AnnotationProperty, owl:FunctionalProperty, owl:InverseFunctionalProperty,
owl:TransitiveProperty, owl:SymmetricProperty, owl:AsymmetricProperty, owl:ReflexiveProperty,
owl:IrreflexiveProperty)



anonymousNodeIdGenerator

A BiFunction that takes a resource (blank node) and an integer (counter) and determines the name
for a blank node in the formatted output, if it needs to be locally named. Consider the following
model:

    :test :foo _:b0 .
    :test2 :bar _:b0 .

There is no way to serialize this model in RDF/Turtle while using the inline blank node syntax [ ]
for the anonymous node _:b0. If, as in this example, the node in question already has a label, the
label is re-used. Otherwise, the anonymousNodeIdGenerator is used to generate it.

Default: (r, i) -> "gen" + i
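
The following sketch shows how a generator of this shape could be applied. It is plain Java and makes two assumptions that are not taken from the library: Object stands in for the Jena resource type, and the counter starts at 0 and increases by one per newly named node. The nameNodes helper is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class BlankNodeNamingSketch {
    // Assigns a generated name to each distinct node exactly once, using a counter.
    // Object stands in for a Jena Resource to keep the sketch dependency-free.
    static Map<Object, String> nameNodes(Iterable<?> nodes,
                                         BiFunction<Object, Integer, String> generator) {
        Map<Object, String> names = new LinkedHashMap<>();
        int counter = 0;
        for (Object node : nodes) {
            if (!names.containsKey(node)) {
                names.put(node, generator.apply(node, counter++));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Same shape as the default listed above.
        BiFunction<Object, Integer, String> generator = (r, i) -> "gen" + i;

        Object first = new Object();
        Object second = new Object();
        // 'first' is referenced twice but receives only one name.
        Map<Object, String> names = nameNodes(List.of(first, second, first), generator);
        System.out.println(names.values()); // [gen0, gen1]
    }
}
```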



{after,before}{Opening,Closing}{Parenthesis,SquareBrackets},
{after,before}{Comma,Dot,Semicolon}

NEWLINE, NOTHING or SPACE. Various options for formatting gaps and line breaks, for example
afterOpeningParenthesis or beforeDot. It is not recommended to change these, as the default style
already represents the commonly accepted best practices for formatting Turtle.

Default: Varied



wrapListItems

ALWAYS, NEVER or FOR_LONG_LINES. Controls how line breaks are added after elements in RDF lists.

Default: FOR_LONG_LINES

* Adapted from EditorConfig

Release Notes

  • Unreleased
    • Replace platform-dependent line terminators in multiline strings with Unix-style newlines (\n, \r, and \r\n -> \n)
  • 1.2.16:
    • Bugfix: Empty RDF lists are formatted as an empty pair of parentheses again
  • 1.2.15:
    • Bugfix: RDF list nodes containing other properties than rdf:rest and
      rdf:first are formatted correctly
  • 1.2.14:
    • Bugfix: xsd:double numbers are correctly typed even when lexically
      equivalent to decimals
  • 1.2.13:
    • Feature: Skip double formatting
  • 1.2.12:
    • Bugfix: Handle RDF lists that start with a non-anonymous node
    • Bugfix: Handle blank node cycles
    • Bugfix: Ensure constant blank node ordering
    • Bugfix: Set Locale for NumberFormat to US
    • Change default subjectOrder to show rdfs:Class after owl:Ontology
  • 1.2.11:
    • Bugfix: rdf:type is not printed as a when used as an object
    • Update all dependencies, including Apache Jena to 4.10.0
  • 1.2.10:
    • Configured endOfLine style is honored in prefix formatting
  • 1.2.9:
    • The dummy base URI is now configurable in the formatting style. Its default
      value was changed (to urn:turtleformatter:internal) to make it a valid URI.
  • 1.2.8:
    • Bugfix: Quotes that are the last character in a triple-quoted string are
      escaped correctly
    • New style switch: FormattingStyle.quoteStyle
  • 1.2.7:
    • Bugfix: URIs and local names are formatted using Jena RIOT; no invalid local
      names are printed any longer
  • 1.2.6:
    • Fix typo in FormattingStyle property (indentPredicates)
    • Fix alignment of repeated identical predicates
  • 1.2.5:
    • Dashes, underscores and full stops in the name part of local names are not
      escaped any more. Technically not a bug fix since both forms are valid, but it’s
      nicer to read
  • 1.2.4:
    • Bugfix: Dashes in prefixes of local names are not escaped any more
  • 1.2.3:
    • Bugfix: Special characters in local names (CURIEs) and literals are properly escaped
  • 1.2.2:
    • Enable writing URIs with an empty base: use TurtleFormatter.EMPTY_BASE as
      value for “base” when reading a model using Jena’s model.read()
    • Update build to Java 17
  • 1.2.1:
    • Improve formatting for blank nodes nested in lists
    • Use triple quotes for literals containing line breaks
    • Use Jena’s mechanisms for escaping special characters in literals
  • 1.2.0:
    • Add wrapListItems configuration option
    • Change license from LGPL 3.0 to Apache 2.0
  • 1.1.1:
    • Make fields of FormattingStyle public, so that DEFAULT config is readable
  • 1.1.0:
    • Bugfix: Subjects with a rdf:type not in subjectOrder are rendered correctly
    • Adjust default subjectOrder and predicateOrder
    • Add new option keepUnusedPrefixes and by default render only used prefixes
  • 1.0.1: Fix POM so that dependency can be used as jar
  • 1.0.0: First version

Contact

turtle-formatter is developed by Andreas Textor <mail@atextor.de>.