Project author: magabriel

Project description:
Classifies files according to certain rules.
Language: Kotlin
Repository: git://github.com/magabriel/librarian.git
Created: 2016-07-25T09:49:18Z
Project community: https://github.com/magabriel/librarian

License: Other

Librarian

librarian is a rule-based file classifier.

Description

librarian reads files from certain filesystem folders and moves/copies them to some other configured destination folders.

The configuration is written in an easy-to-customize YAML file.

Motivation

I wanted a way to automatically move my downloaded files to certain folders. This project replaces a bash script
I hacked together quite some time ago with something more elegant and efficient (I hope). Also, it is a good excuse
to teach myself Java (and Kotlin).

Features

  • YAML configuration file.
  • Input files can be any type (regex driven on filename).
  • TV show episodes are recognized via a special regex and classified by TV show name and season.
  • Optional logging.
  • Optional RSS feed.

Changelog

0.8

  • Replaced the copy+delete operation with a move.

0.7

  • Added Dagger2.
  • Ported to Kotlin.

0.5 and before

  • Work in progress.

Documentation

Requirements

  • Java 8.

Installation

  • From source:

    Clone this repository and run:

        ./gradlew java
  • Or download the latest distribution jar.

Usage

java -jar /path/to/librarian.jar --help will print all command line arguments.

The librarian.yml file

Running java -jar /path/to/librarian.jar in an empty folder will produce an error message complaining that no
librarian.yml can be found.

Create a default librarian.yml

To create a default librarian.yml, run java -jar /path/to/librarian.jar --create-config and the default
configuration file will be created in the current folder.

    config:
      extensions:
        - video: [avi, mpeg, mpg, mov, wmv, mp4, m4v, mkv, srt, sub]
        - audio: [mp3, ogg]
        - book: [pdf, epub, fb2, mobi, azw]
      filters:
        - tvshow:
            # "name.S01E02.title.avi"
            - "(?<name>.+)S(?<season>[0-9]{1,2})E(?<episode>[0-9]{1,3})(?<rest>.*)"
            # "name.1x02.title.avi"
            - "(?<name>.+(?:[^\\p{Alnum}]))(?<season>[0-9]{1,2})x(?<episode>[0-9]{1,3})(?<rest>.*)"
            # "name.102.title.avi" (avoid matching movies with year)
            - "(?<name>.+(?:[^\\p{Alnum}\\(]))(?<season>[0-9]{1})(?<episode>[0-9]{2})(?:(?<rest>[^0-9].*)|\\z)"
      content_classes:
        - tvshows:
            extension: video
            filter: tvshow
        - videos:
            extension: video
        - music:
            extension: audio
        - books:
            extension: book
      tvshows:
        numbering_schema: "S{season:2}E{episode:2}"
        season_schema: "Season_{season:2}"
        words_separator:
          show: "_"
          file: "_"
      errors:
        unknown_files:
          action: move # ignore, move, delete
          move_path: /my/errors/folder/unknown
        duplicate_files:
          action: move # ignore, move, delete
          move_path: /my/errors/folder/duplicates
        error_files:
          action: move # ignore, move, delete
          move_path: /my/errors/folder/errors
      execute:
        success: "success_script.sh"
        error: "error_script.sh"
    input:
      folders:
        - /my/input/folder1
        - /my/input/folder2
    output:
      folders:
        -
          path: /my/output/folder/tvshows
          contents: tvshows
        -
          path: /my/output/folder/tvshows2
          contents: tvshows
        -
          path: /my/output/folder/movies
          contents: videos
        -
          path: /my/output/folder/music
          contents: music
        -
          path: /my/output/folder/books
          contents: books
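As a rough illustration of how the extensions, filters and content_classes entries of the default configuration might work together, the Java sketch below assigns a file name to the first content class whose extension type and optional filter both match. It is not taken from librarian's source: the names ClassifyDemo, classify and matchesAny are hypothetical, the extension lists are abridged, only the first tvshow pattern is included, and the first-match-wins order and whole-name matching are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

final class ClassifyDemo {
    // Extension type -> extensions, abridged from config.extensions.
    static final Map<String, List<String>> EXTENSIONS = new LinkedHashMap<>();
    // Filter name -> regular expressions (only the first tvshow pattern here).
    static final Map<String, List<Pattern>> FILTERS = new LinkedHashMap<>();
    // Content class -> {extension type, filter name or null}, in declaration order.
    static final Map<String, String[]> CONTENT_CLASSES = new LinkedHashMap<>();

    static {
        EXTENSIONS.put("video", Arrays.asList("avi", "mp4", "mkv"));
        EXTENSIONS.put("audio", Arrays.asList("mp3", "ogg"));
        EXTENSIONS.put("book", Arrays.asList("pdf", "epub"));

        FILTERS.put("tvshow", Arrays.asList(Pattern.compile(
            "(?<name>.+)S(?<season>[0-9]{1,2})E(?<episode>[0-9]{1,3})(?<rest>.*)")));

        CONTENT_CLASSES.put("tvshows", new String[] {"video", "tvshow"});
        CONTENT_CLASSES.put("videos", new String[] {"video", null});
        CONTENT_CLASSES.put("music", new String[] {"audio", null});
        CONTENT_CLASSES.put("books", new String[] {"book", null});
    }

    /** Returns the first matching content class name, or null for an unknown file. */
    static String classify(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String[]> entry : CONTENT_CLASSES.entrySet()) {
            String extensionType = entry.getValue()[0];
            String filterName = entry.getValue()[1];
            if (!EXTENSIONS.get(extensionType).contains(ext)) {
                continue; // wrong kind of file for this class
            }
            if (filterName == null || matchesAny(FILTERS.get(filterName), filename)) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** True when at least one pattern matches the whole file name (an assumption). */
    static boolean matchesAny(List<Pattern> patterns, String filename) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(filename).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(classify("Show.S01E02.avi")); // tvshows
        System.out.println(classify("Holiday.mp4"));     // videos
        System.out.println(classify("notes.txt"));       // null
    }
}
```

Under these assumptions the order of the content classes matters: tvshows has to come before videos, otherwise every episode would be classified as a plain video.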

Customize librarian.yml

  • config.extensions: A list of extension types, in the form extension_type: list_of_extensions. Each extension type
    can be used later to define a content class.

  • config.filters: A list of filter definitions in the form filter_name: list_of_regular_expressions. Feel free to use
    your own names, except for the special tvshows and music entries, which may also require a special format
    (see below). Filters can be used to define content classes.

  • config.content_classes: A list of content classes, each one having one extension type and, optionally, one filter
    name. The name of the content class can be used later when defining output folders.

  • config.tvshows.numbering_schema: The numbering schema to use for output TV show episode files. The file will be
    renamed using this pattern. You can use {season:N} and {episode:N} placeholders for season and episode numbers,
    where N is the zero-padded width of the number.

  • config.tvshows.season_schema: The naming schema to use for TV show season folders. The folders will be created
    using this pattern. {season:N} is available as explained above.

  • config.tvshows.words_separator: Contains the characters that will be used to replace word separators in show folders
    and the episode file itself.

  • config.errors.unknown_files: Define what to do with unrecognized files. The default action is ignore, so they
    will be left in the input folder. Action move will move them to the move_path while action delete will delete
    them.

  • config.errors.duplicate_files: Same as unknown_files, for files that already exist in the destination (duplicates).

  • config.errors.error_files: Same as unknown_files for files with processing errors.

  • config.execute.success: A command or script to execute for each successfully processed file. Example:

        #!/bin/bash
        # Positional arguments received for each successfully processed file.
        INPUTFOLDER=$1
        INPUTFILENAME=$2
        OUTPUTFOLDER=$3
        OUTPUTFILENAME=$4
        CLASS=$5
        ACTION=$6
        echo "SUCCESS: $INPUTFOLDER; $INPUTFILENAME; $OUTPUTFOLDER; $OUTPUTFILENAME; $ACTION; $CLASS"
  • config.execute.error: A command or script to execute for each errored file. Example:

        #!/bin/bash
        # Positional arguments received for each file that failed.
        INPUTFOLDER=$1
        INPUTFILENAME=$2
        OUTPUTFOLDER=$3
        OUTPUTFILENAME=$4
        ACTION=$5
        echo "ERROR: $INPUTFOLDER; $INPUTFILENAME; $OUTPUTFOLDER; $OUTPUTFILENAME; $ACTION"
  • input.folders: A list of paths to one or more input folders (i.e. where the input files are read from).

  • output.folders: A list of output folder definitions (where the files will be moved or copied to). See below for the
    format.
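To make the {season:N} and {episode:N} placeholders of numbering_schema and season_schema more concrete, here is a minimal Java sketch of one way they could be expanded. It is an illustration only, not librarian's actual code: the SchemaDemo class and its expand method are hypothetical, and it assumes that numbers longer than N digits are left untruncated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SchemaDemo {
    // Matches placeholders such as {season:2} or {episode:3}.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\{(season|episode):([0-9]+)\\}");

    /** Replaces every placeholder with the zero-padded season or episode number. */
    static String expand(String schema, int season, int episode) {
        Matcher m = PLACEHOLDER.matcher(schema);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            int value = m.group(1).equals("season") ? season : episode;
            int width = Integer.parseInt(m.group(2));
            // A width of 0 is treated as "no padding" (assumption).
            String padded = width > 0
                ? String.format("%0" + width + "d", value)
                : Integer.toString(value);
            m.appendReplacement(out, Matcher.quoteReplacement(padded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("S{season:2}E{episode:2}", 1, 2)); // S01E02
        System.out.println(expand("Season_{season:2}", 1, 2));       // Season_01
    }
}
```

With the default schemas, season 1 episode 2 would therefore be numbered S01E02 and placed in a folder named Season_01.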

TV Shows

TV shows are special because we need to capture the name of the show and the season and episode numbers.

    filters:
      - tvshow:
          # "name.S01E02.title.avi"
          - "(?<name>.+)S(?<season>[0-9]{1,2})E(?<episode>[0-9]{1,3})(?<rest>.*)"
          # "name.1x02.title.avi"
          - "(?<name>.+(?:[^\\p{Alnum}]))(?<season>[0-9]{1,2})x(?<episode>[0-9]{1,3})(?<rest>.*)"
          # "name.102.title.avi" (avoid matching movies with year)
          - "(?<name>.+(?:[^\\p{Alnum}\\(]))(?<season>[0-9]{1,2})(?<episode>[0-9]{2})(?<rest>[^0-9].*)?"

These definitions will match files of the form My tv show name S01E02 whatever.*, My tv show name 01x02 whatever.* and
My tv show name 102 whatever.*.

Things to remember:

  • There can be several definitions with the same name.
  • Each of the regexes must have the following capture groups:
    • name: the name of the TV show.
    • season: the season number.
    • episode: the episode number.
    • rest: any other information left in the filename.

Finally, the content class name for tvshows must be tvshows.
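The named capture groups can be tried out directly with java.util.regex. The sketch below is only an illustration: TvShowFilterDemo and parse are hypothetical names, the three patterns are copied from the default configuration file (with the YAML escapes resolved into Java string literals), and trying them in order against the whole file name is an assumption about how librarian applies them.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TvShowFilterDemo {
    // Patterns of the default tvshow filter, in their configured order.
    static final Pattern[] PATTERNS = {
        // "name.S01E02.title.avi"
        Pattern.compile("(?<name>.+)S(?<season>[0-9]{1,2})E(?<episode>[0-9]{1,3})(?<rest>.*)"),
        // "name.1x02.title.avi"
        Pattern.compile("(?<name>.+(?:[^\\p{Alnum}]))(?<season>[0-9]{1,2})x(?<episode>[0-9]{1,3})(?<rest>.*)"),
        // "name.102.title.avi" (avoid matching movies with year)
        Pattern.compile("(?<name>.+(?:[^\\p{Alnum}\\(]))(?<season>[0-9]{1})(?<episode>[0-9]{2})(?:(?<rest>[^0-9].*)|\\z)")
    };

    /**
     * Returns {name, season, episode, rest} from the first pattern that matches
     * the whole file name, or null when none does. The rest entry may be null
     * when the third pattern matches at the very end of the name.
     */
    static String[] parse(String filename) {
        for (Pattern pattern : PATTERNS) {
            Matcher m = pattern.matcher(filename);
            if (m.matches()) {
                return new String[] {
                    m.group("name"), m.group("season"), m.group("episode"), m.group("rest")
                };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "My.Show.S01E02.Pilot.avi",
            "My.Show.1x02.Pilot.avi",
            "My.Show.102.Pilot.avi",
            "Some.Movie.2010.avi" // a year, not an episode number: no match expected
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + Arrays.toString(parse(sample)));
        }
    }
}
```

Note that the name group keeps the trailing separator (for example "My.Show."), so a real implementation would still have to clean it up, presumably using the configured words_separator.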

Music albums

Files matched by music content class are assumed to be individual tracks in an album if they are inside a subfolder.
The subfolder containing the files will be copied as is.

Output folders definitions

output.folders is a list of output folder definitions, each one of the form:

  • path: The absolute or relative path of that folder.
  • contents: The name of one of the content classes defined in config.content_classes.

When several output folders share the tvshows contents, the last one will be used as the default for new TV shows.

Execute the process

java -jar /path/to/librarian.jar will read the librarian.yml file in the current directory and act accordingly.

Both a log file and an RSS file are written to the current directory, describing what has been done.

Several aspects can be customized using the provided command-line arguments: java -jar /path/to/librarian.jar --help