项目作者: luc-tielen

项目描述 :
Haskell bindings for the Souffle datalog language
高级语言: C++
项目地址: git://github.com/luc-tielen/souffle-haskell.git
创建时间: 2019-11-23T16:31:59Z
项目社区:https://github.com/luc-tielen/souffle-haskell

开源协议:MIT License

下载


Souffle-haskell

License: MIT
CI
Hackage

This repo provides Haskell bindings for performing analyses with the
Souffle Datalog language.

Fun fact: this library combines both functional programming (Haskell),
logic programming (Datalog / Souffle) and imperative / OO programming (C / C++).

Motivating example

Let’s first write a datalog program that can check if one point
is reachable from another:

  1. // We define 2 data types:
  2. .decl edge(n: symbol, m: symbol)
  3. .decl reachable(n: symbol, m: symbol)
  4. // We indicate we are interested in "reachable" facts.
  5. // NOTE: If you forget to add outputs, the souffle compiler will
  6. // try to be smart and remove most generated code!
  7. .output reachable
  8. // We write down some pre-defined facts on the datalog side.
  9. edge("a", "b").
  10. edge("b", "c").
  11. edge("c", "e").
  12. edge("e", "f").
  13. edge("c", "d").
  14. // And we tell datalog how to check if 1 point is reachable from another.
  15. reachable(x, y) :- edge(x, y). // base rule
  16. reachable(x, z) :- edge(x, y), reachable(y, z). // inductive rule

Now that we have the datalog code, we can generate a path.cpp from it
using souffle -g path.cpp path.dl. souffle-haskell can bind to this program
in the following way:

  1. -- Enable some necessary extensions:
  2. {-# LANGUAGE DeriveGeneric, DeriveAnyClass, DerivingVia, DataKinds, UndecidableInstances #-}
  3. -- NOTE: The usage of "deriving stock", "deriving anyclass" and "deriving via" in the
  4. -- examples below matters in order for the library to work correctly!
  5. module Main ( main ) where
  6. import Data.Foldable ( traverse_ )
  7. import Control.Monad.IO.Class
  8. import GHC.Generics
  9. import Data.Vector
  10. import qualified Language.Souffle.Compiled as Souffle
  11. -- First, we define a data type representing our datalog program.
  12. data Path = Path
  13. -- By making Path an instance of Program, we provide Haskell with information
  14. -- about the datalog program. It uses this to perform compile-time checks to
  15. -- limit the amount of possible programmer errors to a minimum.
  16. deriving Souffle.Program
  17. via Souffle.ProgramOptions Path "path" '[Edge, Reachable]
  18. -- Facts are represented in Haskell as simple product types,
  19. -- Numbers map to Int32, unsigned to Word32, floats to Float,
  20. -- symbols to Strings / Text.
  21. data Edge = Edge String String
  22. deriving stock (Eq, Show, Generic)
  23. -- For simple product types, we can automatically generate the
  24. -- marshalling/unmarshalling code of data between Haskell and datalog.
  25. deriving anyclass Souffle.Marshal
  26. -- By making a data type an instance of Fact, we give Haskell the
  27. -- necessary information to bind to the datalog fact.
  28. deriving Souffle.Fact
  29. via Souffle.FactOptions Edge "edge" 'Souffle.Input
  30. data Reachable = Reachable String String
  31. deriving stock (Eq, Show, Generic)
  32. deriving anyclass Souffle.Marshal
  33. deriving Souffle.Fact
  34. via Souffle.FactOptions Reachable "reachable" 'Souffle.Output
  35. main :: IO ()
  36. main = Souffle.runSouffle Path $ \maybeProgram -> do -- Initializes the Souffle program.
  37. case maybeProgram of
  38. Nothing -> liftIO $ putStrLn "Failed to load program."
  39. Just prog -> do
  40. Souffle.addFact prog $ Edge "d" "i" -- Adding a single fact from Haskell side
  41. Souffle.addFacts prog [ Edge "e" "f" -- Adding multiple facts
  42. , Edge "f" "g"
  43. , Edge "f" "g"
  44. , Edge "f" "h"
  45. , Edge "g" "i"
  46. ]
  47. Souffle.run prog -- Run the Souffle program
  48. -- NOTE: You can change type param to fetch different relations
  49. -- Here it requires an annotation since we directly print it
  50. -- to stdout, but if passed to another function, it can infer
  51. -- the correct type automatically.
  52. -- A list of facts can also be returned here.
  53. results :: Vector Reachable <- Souffle.getFacts prog
  54. liftIO $ traverse_ print results
  55. -- We can also look for a specific fact:
  56. maybeFact <- Souffle.findFact prog $ Reachable "a" "c"
  57. liftIO $ print $ maybeFact

For more examples of how to use the top level API, you can also take a look at
the tests.

Getting started

This library assumes that the Souffle include paths are properly set.
This is needed in order for the C++ code to be compiled correctly.
The easiest way to do this (that I know of) is via Nix.
Add souffle to the build inputs of your derivation and everything will
be set correctly.
Without Nix, you will have to follow the manual install instructions
on the Souffle website.

In your package.yaml or .cabal file, make sure to add the following options
(assuming package.yaml here):

  1. # ...
  2. cxx-options:
  3. - -D__EMBEDDED_SOUFFLE__
  4. cxx-sources:
  5. - /path/to/FILE.cpp # be sure to change this according to what you need!
  6. # ...

This will instruct the Souffle compiler to compile the C++ in such a way that
it can be linked with other languages (including Haskell!).

For an example, take a look at the configuration for the
test suite of this project.

If you run into C++ compilation issues when using stack, this might be because
the -std=c++17 flag is not being used correctly when compiling souffle-haskell.
To fix this, you can add the following to your stack.yaml:

  1. ghc-options:
  2. souffle-haskell: -optcxx-std=c++17

Souffle EDSL

This package previously contained a Haskell EDSL for writing Souffle code
directly in Haskell. This has now been moved to a separate
package.

Documentation

The documentation for the library can be found on
Hackage.
Language.Souffle.Class is a good starting point for getting an overview of
the top level API.

Supported modes

Souffle programs can be run in 2 ways. They can either run in interpreted mode
(using the souffle CLI command), or they can be compiled to C++-code and
called from a host program for improved efficiency. This library supports both
modes (since version 0.2.0). The two variants have only a few minor differences
and can be swapped fairly easily.

Interpreted mode

This is probably the mode you want to start out with if you are developing a
program that uses Datalog for computing certain relations. Interpreted mode
offers quick development iterations (no compiling of C++ code each time you
change your Datalog code). However because the Souffle code is interpreted,
it can’t offer the same speed as in compiled mode.

If you want to use interpreted Souffle, you need to import the
Language.Souffle.Interpreted module.

Interpreter configuration

The interpreter uses CSV files to read or write facts. The configuration
allows specifiying where the fact directory is located. With the default
configuration, it will try to lookup DATALOG_DIR in the environment and
fall back to the current directory (or .).

You can also configure which souffle executable will be used. By default,
it will first look at the SOUFFLE_BIN environment variable. If this is
not set, it will try to find the executable using the which shell-command.
If it also can’t find the executable this way, then it will fail to
initialize the interpreter.

For more information regarding configuration, take a look at the
runSouffleWith function.

The separators in the CSV fact files cannot be configured at the moment.
A tab character ('\t') is used to separate the different columns.

Compiled mode

Once the prototyping phase of the Datalog algorithm is over, it is advised
to switch over to the compiled mode. It offers much improved performance
compared to the interpreted mode, at the cost of having to recompile your
Datalog algorithm each time it changes.

The main differences with interpreted mode are the following:

  1. Compile the Datalog code with souffle -g.
  2. Import Language.Souffle.Compiled

The motivating example is a complete example for the compiled mode.

Contributing

TLDR: Nix-based project; the Makefile contains the most commonly used commands.

Long version:

The project makes use of Nix to setup the development environment.
Setup your environment by entering the following command:

  1. $ cachix use luctielen # Optional (improves setup time *significantly*)
  2. $ nix-shell

After this command, you can build the project:

  1. $ make configure # configures the project
  2. $ make build # builds the haskell code
  3. $ make lint # runs the linter
  4. $ make hoogle # starts a local hoogle webserver

Issues

Found an issue or missing a piece of functionality?
Please open an issue with a description of the problem.