Project author: mhahsler

Project description:
Mining Association Rules and Frequent Itemsets with R
Language: C
Repository: git://github.com/mhahsler/arules.git
Created: 2015-10-12T03:35:42Z
Project page: https://github.com/mhahsler/arules

License: GNU General Public License v3.0


R package arules - Mining Association Rules and Frequent Itemsets


Introduction

The arules package family for R provides the infrastructure for
representing, manipulating and analyzing transaction data and patterns
using frequent itemsets and association rules. The package also provides
a wide range of interest measures and mining algorithms, including the
code of Christian Borgelt’s popular and efficient C implementations of
the association mining algorithms Apriori and Eclat. In addition, the
following mining algorithms are available via fim4r:

  • Apriori
  • Eclat
  • Carpenter
  • FPgrowth
  • IsTa
  • RElim
  • SaM

Code examples can be found in Chapter 5 of the web book R Companion for
Introduction to Data Mining.
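To make the terms "frequent itemset" and "interest measure" concrete, here is a minimal base-R sketch that computes support, confidence and lift by hand for one rule. It deliberately does not use arules; the toy `baskets` data and the helper `itemset_support()` are made up for illustration and are not part of the package.

```r
# Toy market-basket data (hypothetical): each list element is one transaction.
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Support of an itemset: fraction of transactions that contain all its items.
itemset_support <- function(items, transactions) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

# Evaluate the rule {bread} => {milk}.
supp_rule <- itemset_support(c("bread", "milk"), baskets)
supp_lhs  <- itemset_support("bread", baskets)
supp_rhs  <- itemset_support("milk", baskets)

confidence <- supp_rule / supp_lhs   # estimate of P(milk | bread)
lift       <- confidence / supp_rhs  # values above 1 suggest a positive association

c(support = supp_rule, confidence = confidence, lift = lift)
```

In this toy data the rule has a lift slightly below 1, so buying bread does not make milk more likely than it already is. arules computes the same quantities on sparse transaction data at a much larger scale.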

To cite package ‘arules’ in publications use:

Hahsler M, Gruen B, Hornik K (2005). “arules - A Computational
Environment for Mining Association Rules and Frequent Item Sets.”
Journal of Statistical Software, 14(15), 1-25. ISSN 1548-7660,
https://doi.org/10.18637/jss.v014.i15.

    @Article{,
      title = {arules -- {A} Computational Environment for Mining Association Rules and Frequent Item Sets},
      author = {Michael Hahsler and Bettina Gruen and Kurt Hornik},
      year = {2005},
      journal = {Journal of Statistical Software},
      volume = {14},
      number = {15},
      pages = {1--25},
      doi = {10.18637/jss.v014.i15},
      month = {October},
      issn = {1548-7660},
    }

Packages

arules core packages

  • arules: arules base
    package with data structures, mining algorithms (APRIORI and ECLAT),
    interest measures.
  • arulesViz: Visualization of
    association rules.
  • arulesCBA: Classification
    algorithms based on association rules (includes CBA).
  • arulesSequences:
    Mining frequent sequences (cSPADE).

Additional mining algorithms

  • arulesNBMiner: Mining
    NB-frequent itemsets and NB-precise rules.
  • fim4r: Provides fast implementations
    for several mining algorithms. An interface function called fim4r()
    is provided in arules.
  • opusminer: OPUS Miner
    algorithm for finding the top-k productive, non-redundant itemsets.
    Call opus() with format = 'itemsets'.
  • RKEEL: Interface to KEEL’s
    association rule mining algorithm.
  • RSarules: Mining
    algorithm which randomly samples association rules with one pre-chosen
    item as the consequent from a transaction dataset.

In-database analytics

  • ibmdbR: IBM in-database
    analytics for R can calculate association rules from a database table.
  • rfml: Mine frequent
    itemsets or association rules using a MarkLogic server.

Interface

  • rattle: Provides a
    graphical user interface for association rule mining.
  • pmml: Generates PMML
    (predictive model markup language) for association rules.

Classification

  • arc: Alternative CBA
    implementation.
  • inTrees: Interpret Tree
    Ensembles provides functions for: extracting, measuring and pruning
    rules; selecting a compact rule set; summarizing rules into a learner.
  • rCBA: Alternative CBA
    implementation.
  • qCBA: Quantitative
    Classification by Association Rules.
  • sbrl: Scalable Bayesian
    rule lists algorithm for classification.

Outlier Detection

Recommendation/Prediction

  • recommenderlab: Supports
    creating predictions using association rules.

The following R packages use arules:
aPEAR, arc, arulesCBA, arulesNBMiner, arulesSequences, arulesViz,
clickstream, CLONETv2, CRE, ctsem, discnorm, fcaR, fdm2id, GroupBN,
ibmdbR, immcp, inTrees, opusminer, pmml, qCBA, RareComb, rattle, rCBA,
recommenderlab, rgnoisefilt, RKEEL, sbrl, SurvivalTests, TELP

Installation

Stable CRAN version: Install from within R with

    install.packages("arules")

Current development version: Install from r-universe.

    install.packages("arules",
      repos = c("https://mhahsler.r-universe.dev",
                "https://cloud.r-project.org/"))

Usage

Load package and mine some association rules.

    library("arules")
    data("IncomeESL")
    trans <- transactions(IncomeESL)
    trans

    ## transactions in sparse format with
    ##  8993 transactions (rows) and
    ##  84 items (columns)

    rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")

    ## Apriori
    ##
    ## Parameter specification:
    ##  confidence minval smax arem  aval originalSupport maxtime support minlen
    ##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
    ##  maxlen target  ext
    ##      10  rules TRUE
    ##
    ## Algorithmic control:
    ##  filter tree heap memopt load sort verbose
    ##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
    ##
    ## Absolute minimum support count: 899
    ##
    ## set item appearances ...[0 item(s)] done [0.00s].
    ## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
    ## sorting and recoding items ... [42 item(s)] done [0.00s].
    ## creating transaction tree ... done [0.00s].
    ## checking subsets of size 1 2 3 4 5 6 done [0.02s].
    ## writing ... [457 rule(s)] done [0.00s].
    ## creating S4 object  ... done [0.00s].
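Frequent itemsets, the other pattern type named in the introduction, can be mined from the same transactions. The following is a minimal sketch using `eclat()` with the same minimum support, followed by `ruleInduction()` to derive rules from the itemsets afterwards. The variable names `itemsets` and `rules_from_itemsets` are chosen here for illustration, and no particular number of results is implied.

```r
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)

# Mine frequent itemsets (not rules) with Eclat at 10% minimum support.
itemsets <- eclat(trans, parameter = list(supp = 0.1),
                  control = list(verbose = FALSE))

# Show the three itemsets contained in the most transactions.
inspect(head(itemsets, n = 3, by = "support"))

# Rules can still be derived from the itemsets in a second step.
rules_from_itemsets <- ruleInduction(itemsets, trans, confidence = 0.9)
length(rules_from_itemsets)
```

Splitting itemset mining from rule generation is useful when the itemsets themselves are of interest, or when several confidence thresholds are to be tried without repeating the mining step.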

Inspect the rules with the highest lift.

    inspect(head(rules, n = 3, by = "lift"))

    ##     lhs                           rhs                      support confidence coverage lift count
    ## [1] {dual incomes=no,
    ##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
    ## [2] {years in bay area=>10,
    ##      dual incomes=yes,
    ##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
    ## [3] {dual incomes=yes,
    ##      householder status=own,
    ##      type of home=house,
    ##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988
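The quality columns printed by inspect() can be checked and extended. The sketch below first recomputes lift as confidence divided by the relative frequency of the consequent item, then adds two further measures with `interestMeasure()`. The choice of "leverage" and "conviction" is only an example, and the names `top`, `rhs_items`, `manual_lift` and `extra` are introduced here for illustration. It assumes each rule has a single-item consequent, which is what apriori() produces.

```r
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules",
                 control = list(verbose = FALSE))

top <- head(rules, n = 3, by = "lift")

# Lift = confidence / support of the consequent (one item per rule here).
rhs_items   <- unlist(LIST(rhs(top)))
rhs_support <- unname(itemFrequency(trans)[rhs_items])
manual_lift <- quality(top)$confidence / rhs_support
all.equal(manual_lift, quality(top)$lift, tolerance = 1e-6)

# Append two additional interest measures to the rules' quality table.
extra <- interestMeasure(top, measure = c("leverage", "conviction"),
                         transactions = trans)
quality(top) <- cbind(quality(top), extra)
inspect(top)
```

A lift of about 2.6 means the consequent appears roughly 2.6 times as often among transactions matching the antecedent as it does in the data overall.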

Using arules with tidyverse

arules works seamlessly with tidyverse.
For example:

  • dplyr can be used for cleaning and preparing the transactions.
  • transactions() and other functions accept tibble as input.
  • Functions in arules can be connected with the pipe operator |>.
  • arulesViz provides
    visualizations based on ggplot2.

For example, we can remove the ethnic information column before creating
transactions and then mine and inspect rules.

    library("tidyverse")
    library("arules")
    data("IncomeESL")

    trans <- IncomeESL |>
      select(-`ethnic classification`) |>
      transactions()

    rules <- trans |>
      apriori(supp = 0.1, conf = 0.9, target = "rules", control = list(verbose = FALSE))

    rules |>
      head(3, by = "lift") |>
      as("data.frame") |>
      tibble()

    ## # A tibble: 3 × 6
    ##   rules                                  support confidence coverage  lift count
    ##   <chr>                                    <dbl>      <dbl>    <dbl> <dbl> <int>
    ## 1 {dual incomes=no,householder status=o…   0.102      0.971    0.105  2.62   914
    ## 2 {years in bay area=>10,dual incomes=y…   0.100      0.961    0.104  2.59   902
    ## 3 {dual incomes=yes,householder status=…   0.110      0.960    0.114  2.59   988

Using arules from Python

arules and arulesViz can now be used directly from Python with the
Python package arulespy, available from PyPI.

Support

Please report bugs here on GitHub. Questions should be posted on
stackoverflow and tagged with arules.

References