Project author: MPLLang

Project description:
The MaPLe compiler for Parallel ML
Language: Standard ML
Project address: git://github.com/MPLLang/mpl.git
Created: 2013-11-21T19:45:10Z
Project community: https://github.com/MPLLang/mpl

License: Other



# MaPLe (MPL)

MaPLe is a functional language for provably efficient and safe multicore
parallelism.

Features:

* Support for the full Standard ML programming language, extended with
  task-parallel and data-parallel primitives.
* Native performance on both x86 and Arm architectures.
* Whole-program compilation based on [MLton](http://mlton.org), with
  aggressive optimizations to achieve performance competitive with
  languages such as C/C++.
* Efficient memory representations, including:
  * Untagged and unboxed native integers and floating-point numbers.
  * Flattened tuples and records.
  * Native arrays with contiguous unboxed elements.
* Simple and fast foreign-function calls into C, based on the MLton FFI.
* Support for both regular and irregular fine-grained parallelism,
  with provably efficient automatic parallelism management [7]
  to control the overheads of task creation.
* Provably efficient parallel garbage collection based on
  hierarchical memory management and disentanglement
  [1,2,3,4,5,6].
* Support for large core counts and large memory sizes. MPL scales to
  hundreds of cores and can efficiently handle heap sizes of 1TB or more.

MPL is being actively developed. If you are interested in contributing to
the project, PRs are welcome!

If you are interested in using MPL, consider checking
out the tutorial.
You might also be interested in exploring
mpllib
(a library for MPL) and the
Parallel ML benchmark suite.

## References

[7]
Automatic Parallelism Management.
Sam Westrick, Matthew Fluet, Mike Rainey, and Umut A. Acar.
POPL 2024.

[6]
Efficient Parallel Functional Programming with Effects.
Jatin Arora, Sam Westrick, and Umut A. Acar.
PLDI 2023.

[5]
Entanglement Detection with Near-Zero Cost.
Sam Westrick, Jatin Arora, and Umut A. Acar.
ICFP 2022.

[4]
Provably Space-Efficient Parallel Functional Programming.
Jatin Arora, Sam Westrick, and Umut A. Acar.
POPL 2021.

[3]
Disentanglement in Nested-Parallel Programs.
Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar.
POPL 2020.

[2]
Hierarchical Memory Management for Mutable State.
Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet.
PPoPP 2018.

[1]
Hierarchical Memory Management for Parallel Programs.
Ram Raghunathan, Stefan K. Muller, Umut A. Acar, and Guy Blelloch.
ICFP 2016.

## Try It Out

Instructions for installing MPL natively (on Linux or Mac) are further below.

If you want to quickly try out using MPL, you can download the Docker image and
run one of the examples.

```
$ docker pull shwestrick/mpl
$ docker run -it shwestrick/mpl /bin/bash
...# examples/bin/primes @mpl procs 4 --
```

To write and compile your own code, we recommend
mounting a local directory inside the container. For example, here’s how you
can use MPL to compile and run your own main.mlb in the current directory.
(To mount some other directory, replace $(pwd -P) with a different path.)

```
$ ls
main.mlb
$ docker run -it -v $(pwd -P):/root/mycode shwestrick/mpl /bin/bash
...# cd /root/mycode
...# mpl main.mlb
...# ./main @mpl procs 4 --
```

## Benchmark Suite

The Parallel ML benchmark suite
provides many examples of sophisticated parallel algorithms and
applications in MPL, as well as cross-language performance comparisons with
C++, Go, Java,
and multicore OCaml.

## Libraries and Projects

We recommend using the smlpkg package
manager. MaPLe supports the full SML language, so existing libraries for
SML can be used.

A number of libraries and projects make use of MaPLe for parallelism, for
example mpllib and the Parallel ML benchmark suite mentioned above.

## Parallel and Concurrent Extensions

MaPLe extends SML with a number of primitives for parallelism and concurrency.
Take a look at examples/ to see these primitives in action.

### The ForkJoin Structure

```sml
val par: (unit -> 'a) * (unit -> 'b) -> 'a * 'b
val parfor: int -> (int * int) -> (int -> unit) -> unit
val alloc: int -> 'a array
val parform: (int * int) -> (int -> unit) -> unit
val reducem: ('a * 'a -> 'a) -> 'a -> (int * int) -> (int -> 'a) -> 'a
```

The par primitive takes two functions to execute in parallel and
returns their results.
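
For illustration, here is a sketch of a naive parallel Fibonacci written with
`ForkJoin.par`; the sequential cutoff of 20 is an arbitrary choice for
granularity control, not part of the library:

```sml
(* Sketch: naive parallel Fibonacci using ForkJoin.par.
 * The sequential cutoff (n < 20) is an arbitrary illustrative choice. *)
fun fib n =
  if n < 2 then n
  else if n < 20 then
    fib (n - 1) + fib (n - 2)
  else
    let
      (* evaluate both recursive calls in parallel *)
      val (a, b) = ForkJoin.par (fn () => fib (n - 1), fn () => fib (n - 2))
    in
      a + b
    end
```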

The parfor primitive is a “parallel for loop”. It takes a grain-size g, a
range (i, j), and a function f, and executes f(k) in parallel for each
i <= k < j. The grain-size g enables manual granularity control: parfor
splits the input range into approximately (j-i)/g subranges, each of size at
most g, and processes each subrange sequentially. The grain-size must be at
least 1; with a grain-size of 1, the loop is “fully parallel”.
Note: this function should only be used when a reasonable grain-size can be
passed as argument, which is often cumbersome to determine.
In general, we recommend using parform instead (described below).
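
For example, an in-place update over an array can be sketched with parfor as
follows (the function name and the grain-size of 1000 are illustrative
choices only):

```sml
(* Sketch: double every element of arr in parallel.
 * The grain-size 1000 is an arbitrary illustrative choice. *)
fun doubleAll (arr: int array) =
  ForkJoin.parfor 1000 (0, Array.length arr)
    (fn i => Array.update (arr, i, 2 * Array.sub (arr, i)))
```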

The alloc primitive takes a length and returns a fresh, uninitialized array
of that size. Warning: To guarantee no errors, the programmer must be
careful to initialize the array before reading from it. alloc is intended to
be used as a low-level primitive in the efficient implementation of
high-performance libraries. It is integrated with the scheduler and memory
management system to perform allocation in parallel and be safe-for-GC.
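
For instance, a parallel tabulate can be sketched on top of alloc and parfor;
note that every slot is written before the array is read, as the warning
above requires (the function name and grain-size are illustrative):

```sml
(* Sketch: parallel tabulate built from alloc + parfor. Every slot is
 * initialized before the array is returned, as required for safety.
 * The grain-size 1000 is an arbitrary illustrative choice. *)
fun tabulate (n: int) (f: int -> 'a) : 'a array =
  let
    val a = ForkJoin.alloc n
  in
    ForkJoin.parfor 1000 (0, n) (fn i => Array.update (a, i, f i));
    a
  end
```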

The parform primitive is a parallel for loop, similar to parfor above,
except with no grain parameter. The parallelism of the loop is automatically
managed. In general, we recommend using parform instead of parfor.
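
For example, a parallel loop over an array can be sketched with parform,
with no grain-size argument (the function name is illustrative):

```sml
(* Sketch: increment every element of arr; loop granularity is
 * managed automatically, so no grain-size is needed. *)
fun incrementAll (arr: int array) =
  ForkJoin.parform (0, Array.length arr)
    (fn i => Array.update (arr, i, 1 + Array.sub (arr, i)))
```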

The reducem primitive performs an automatically managed parallel reduction.
It takes a “sum” function c, a “zero” element z, a range (i,j), and
a function f, and computes the “sum” of [f(i), ..., f(j-1)] in parallel
with respect to c. For example:

* `reducem op+ 0 (0, Array.length a) (fn i => Array.sub (a, i))` computes the
  sum of an array `a`.
* `reducem Real.max Real.negInf (0, n) (fn i => f (Real.fromInt i / Real.fromInt n))`
  samples a function `f: real -> real` at `n` evenly-spaced locations in
  the range [0.0, 1.0] to find the maximum value.

### The MLton.Parallel Structure

```sml
val compareAndSwap: 'a ref -> ('a * 'a) -> 'a
val arrayCompareAndSwap: ('a array * int) -> ('a * 'a) -> 'a
val fetchAndAdd: int ref -> int -> int
val arrayFetchAndAdd: int array * int -> int -> int
```

compareAndSwap r (x, y) performs an atomic
CAS
which attempts to atomically swap the contents of r from x to y,
returning the original value stored in r before the CAS.
Polymorphic equality is determined
in the same way as MLton.eq, which is a
standard equality check for simple types (char, int, word, etc.) and
a pointer equality check for other types (array, string, tuples, datatypes,
etc.). The semantics are a bit murky.
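
As an illustration, here is a sketch of an atomic increment on an int ref
built from a compareAndSwap retry loop (the helper name is hypothetical):

```sml
(* Sketch: atomic increment via a CAS retry loop. *)
fun atomicIncrement (r: int ref) : unit =
  let
    val old = !r
  in
    if MLton.Parallel.compareAndSwap r (old, old + 1) = old
    then ()                  (* CAS succeeded: r went from old to old + 1 *)
    else atomicIncrement r   (* another thread interfered: reread and retry *)
  end
```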

arrayCompareAndSwap (a, i) (x, y) behaves the same as compareAndSwap but
on arrays instead of references. This performs a CAS at index i of array
a, and does not read or write at any other locations of the array.

fetchAndAdd r d performs an atomic fetch-and-add
which atomically retrieves the value of the specified memory cell r,
adds d to it, and returns the original value stored in r before the update.

arrayFetchAndAdd (a, i) d behaves the same as fetchAndAdd except on
arrays instead of references. It performs a fetch-and-add at index i of
array a, and does not read or write at any other locations of the array.
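
To illustrate fetchAndAdd, here is a sketch that counts the elements of an
array satisfying a predicate by accumulating into a shared counter (the
helper name and the grain-size of 1000 are illustrative):

```sml
(* Sketch: parallel count using a shared fetch-and-add counter. *)
fun countIf (p: 'a -> bool) (arr: 'a array) : int =
  let
    val count = ref 0
  in
    ForkJoin.parfor 1000 (0, Array.length arr)
      (fn i => if p (Array.sub (arr, i))
               then ignore (MLton.Parallel.fetchAndAdd count 1)
               else ());
    !count
  end
```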

## Using MPL

MPL uses .mlb files (ML Basis) to describe
source files for compilation. A typical .mlb file for MPL is shown
below. The first three lines of this file respectively load:

* The SML Basis Library
* The `ForkJoin` structure, as described above
* The `MLton` structure, which includes the MPL extension
  `MLton.Parallel` as described above, as well as various
  MLton-specific features. Not all MLton
  features are supported (see “Unsupported MLton Features” below).

```
(* libraries *)
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/basis/fork-join.mlb
$(SML_LIB)/basis/mlton.mlb

(* your source files… *)
A.sml
B.sml
```

### Compiling a Program

The command to compile a `.mlb` is as follows. By default, MPL
produces an executable with the same base name as the source file, i.e.
this would create an executable named `foo`:

```
$ mpl [compile-time options…] foo.mlb
```

MPL has a number of compile-time options derived from MLton, which are
documented [here](http://mlton.org/CompileTimeOptions). Note that MPL only
supports the C codegen and does not support profiling.

Some useful compile-time options are:

* `-output <NAME>` Give a specific name to the produced executable.
* `-default-type int64 -default-type word64` Use 64-bit integers and words
  by default.
* `-debug true -debug-runtime true -keep g` For debugging: keeps the generated
  C files and uses the debug version of the runtime (with assertions enabled).
  The resulting executable is somewhat peruse-able with tools like `gdb`.

For example:

```
$ mpl -default-type int64 -output foo sources.mlb
```

### Running a Program

MPL executables can take options at the command line that control the run-time
system. The syntax is

```
$ <program> [@mpl [run-time options…] --] [program args…]
```

The runtime arguments must begin with `@mpl` and end with `--`, and these are
not visible to the program via
[CommandLine.arguments](http://sml-family.org/Basis/command-line.html).

Some useful run-time options are:

* `procs <N>` Use `N` worker threads to run the program.
* `set-affinity` Pin worker threads to processors. Can be used in combination
  with `affinity-base <B>` and `affinity-stride <S>` to pin thread `i` to
  processor number `B + S*i`.
* `block-size <X>` Set the heap block size to `X` bytes. This can be
  written with suffixes K, M, and G, e.g. `64K` is 64 kilobytes. The block-size
  must be a multiple of the system page size (typically 4K). By default it is
  set to one page.

For example, the following runs a program `foo` with a single command-line
argument `bar` using 4 pinned processors.

```
$ foo @mpl procs 4 set-affinity -- bar
```

## Bugs and Known Issues

### Basis Library

The basis library is inherited from (sequential) SML. It has not yet been
thoroughly scrubbed, and some functions may not be safe for parallelism
([#41](https://github.com/MPLLang/mpl/issues/41)).

### Garbage Collection

* ([#115](https://github.com/MPLLang/mpl/issues/115)) The GC is currently
  disabled at the "top level" (outside any calls to `ForkJoin.par`).
  For highly parallel programs, this has generally not been a problem so far,
  but it can cause a memory explosion for programs that are mostly (or
  entirely) sequential.
## Unsupported MLton Features

Many [MLton-specific features](http://mlton.org/MLtonStructure) are
unsupported, including (but not limited to):

* `share`
* `shareAll`
* `size`
* `Finalizable`
* `Profile`
* `Signal`
* `Thread` (partially supported but not documented)
* `Cont` (partially supported but not documented)
* `Weak`
* `World`
## Build and Install (from source)

### Requirements

MPL can be installed natively on Linux (x86-64) or Mac (x86-64 or Arm),
including Apple Silicon chips (M1, M2, M3, etc.).
The following software is required.

* [GCC](http://gcc.gnu.org)
* [GMP](http://gmplib.org) (GNU Multiple Precision arithmetic library)
* [GNU Make](http://savannah.gnu.org/projects/make), [GNU Bash](http://www.gnu.org/software/bash/)
* binutils (`ar`, `ranlib`, `strip`, ...)
* miscellaneous Unix utilities (`diff`, `find`, `grep`, `gzip`, `patch`, `sed`, `tar`, `xargs`, ...)
* Standard ML compiler and tools:
  - Recommended: [MLton](http://mlton.org) (`mlton`, `mllex`, and `mlyacc`). Pre-built binary packages for MLton can be installed via an OS package manager or (for select platforms) obtained from http://mlton.org.
  - Supported but not recommended: [SML/NJ](http://www.smlnj.org) (`sml`, `ml-lex`, `ml-yacc`).
* (If using [`mpl-switch`](https://github.com/mpllang/mpl-switch)): Python 3, and `git`.
### Installation with `mpl-switch` (Mac and Linux)

The [`mpl-switch`](https://github.com/mpllang/mpl-switch) utility makes it
easy to install multiple versions of MPL on the same system and switch
between them:

```bash
$ git clone https://github.com/mpllang/mpl-switch
# ... add ./mpl-switch/ to your PATH ...
$ mpl-switch init
$ mpl-switch install v0.5.3
$ mpl-switch select v0.5.3

# MPL is now installed at ~/.mpl/bin/mpl
# final step: add ~/.mpl/bin/ to your PATH
```

You can use any commit hash or tag name from the MPL repo to pick a
particular version of MPL. Installed versions are stored in ~/.mpl/; this
folder is safe to delete at any moment, as it can always be regenerated. To
see what versions of MPL are currently installed, do:

```
$ mpl-switch list
```

### Manual Instructions (Linux)

You can manually build mpl by cloning this repo and then performing the following.

Build the executable. This produces an executable at build/bin/mpl:

```
$ make
```

Put it where you want it. After building, MPL can then be installed
to a custom directory with the PREFIX option:

```
$ make PREFIX=/opt/mpl install
```

Note: At the moment, we do not recommend doing make install without
setting PREFIX=..., because this can clobber an existing installation of
MLton. (See issue #170.)

### Manual Instructions (Mac)

You can manually build mpl by cloning this repo and then performing the following.

Make sure you have GNU make and GMP installed; you can install these
with Homebrew as follows. You’ll also need the other dependencies
listed above (e.g., mlton).

```
$ brew install mlton make gmp
```

Build the executable. Make sure you are using GNU make, which should be
available as gmake after doing brew install make above. We also need to
tell the Makefile about where gmp is installed.

```
$ gmake WITH_GMP_DIR=$(brew --prefix gmp)
```