Mining Association Rules and Frequent Itemsets with R
The arules package family for R provides the infrastructure for
representing, manipulating and analyzing transaction data and patterns
using frequent itemsets and association rules. The package also provides
a wide range of interest measures and mining algorithms, including
Christian Borgelt’s popular and efficient C implementations of the
association mining algorithms Apriori and Eclat. Additional mining
algorithms are available via fim4r.
Code examples can be found in Chapter 5 of the web book R Companion for
Introduction to Data Mining.
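As a minimal sketch of the Eclat interface mentioned above, the following mines frequent itemsets from a small hand-made set of baskets and then derives rules from them. The basket contents, the object names (`baskets`, `trans_toy`, `itemsets`, `rules_toy`) and the thresholds are assumptions chosen for illustration only; they are not part of the package or its documentation.

```r
library("arules")

# Toy market-basket data (hypothetical, for illustration only)
baskets <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)
trans_toy <- transactions(baskets)

# Mine frequent itemsets with Eclat (minimum support of 50%)
itemsets <- eclat(trans_toy,
  parameter = list(support = 0.5),
  control = list(verbose = FALSE))

# Derive rules from the frequent itemsets (minimum confidence of 70%)
rules_toy <- ruleInduction(itemsets, trans_toy, confidence = 0.7)

inspect(sort(itemsets, by = "support"))
inspect(rules_toy)
```

Eclat only returns frequent itemsets, so rule generation is a separate step here, whereas `apriori()` with `target = "rules"` does both in one call.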
To cite package ‘arules’ in publications use:
Hahsler M, Gruen B, Hornik K (2005). “arules - A Computational
Environment for Mining Association Rules and Frequent Item Sets.”
Journal of Statistical Software, 14(15), 1-25. ISSN 1548-7660,
doi:10.18637/jss.v014.i15.
@Article{,
title = {arules -- {A} Computational Environment for Mining Association Rules and Frequent Item Sets},
author = {Michael Hahsler and Bettina Gruen and Kurt Hornik},
year = {2005},
journal = {Journal of Statistical Software},
volume = {14},
number = {15},
pages = {1--25},
doi = {10.18637/jss.v014.i15},
month = {October},
issn = {1548-7660},
}
Additional mining algorithms
fim4r() (provided in arules)
opus() with format = 'itemsets'
In-database analytics
Interface
Classification
Outlier Detection
Recommendation/Prediction
The following R packages use arules: aPEAR, arc, arulesCBA, arulesNBMiner,
arulesSequences, arulesViz, clickstream, CLONETv2, CRE, ctsem, discnorm,
fcaR, fdm2id, GroupBN, ibmdbR, immcp, inTrees, opusminer, pmml, qCBA,
RareComb, rattle, rCBA, recommenderlab, rgnoisefilt, RKEEL, sbrl,
SurvivalTests, TELP
Stable CRAN version: Install from within R with
install.packages("arules")
Current development version: Install from
r-universe.
install.packages("arules",
repos = c("https://mhahsler.r-universe.dev",
"https://cloud.r-project.org/"))
Load package and mine some association rules.
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)
trans
## transactions in sparse format with
## 8993 transactions (rows) and
## 84 items (columns)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 899
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Inspect the rules with the highest lift.
inspect(head(rules, n = 3, by = "lift"))
## lhs rhs support confidence coverage lift count
## [1] {dual incomes=no,
## householder status=own} => {marital status=married} 0.10 0.97 0.10 2.6 914
## [2] {years in bay area=>10,
## dual incomes=yes,
## type of home=house} => {marital status=married} 0.10 0.96 0.10 2.6 902
## [3] {dual incomes=yes,
## householder status=own,
## type of home=house,
## language in home=english} => {marital status=married} 0.11 0.96 0.11 2.6 988
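The lift values reported by `inspect()` can be checked by hand, since lift(X => Y) = confidence(X => Y) / support(Y). The sketch below recomputes lift for the three top rules and adds two further interest measures with `interestMeasure()`. The object names `top`, `rhs_support`, `manual_lift` and `extra`, and the choice of leverage and conviction, are assumptions made for this illustration.

```r
library("arules")

data("IncomeESL")
trans <- transactions(IncomeESL)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules",
  control = list(verbose = FALSE))

# The three rules with the highest lift
top <- head(rules, n = 3, by = "lift")

# Recompute lift as confidence divided by the support of the right-hand side
rhs_support <- support(rhs(top), trans)
manual_lift <- quality(top)$confidence / rhs_support

# Compute additional interest measures for the same rules
extra <- interestMeasure(top, measure = c("leverage", "conviction"),
  transactions = trans)

# Show reported and recomputed values side by side
cbind(quality(top)[, c("confidence", "lift")], manual_lift, extra)
```

A lift well above 1 indicates that the left-hand side and right-hand side occur together more often than expected if they were independent.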
arules works seamlessly with tidyverse. For example, dplyr can be used
for cleaning and preparing the transactions, transactions() and other
functions accept a tibble as input, calls can be chained with the pipe
operator |>, and results can be visualized with ggplot2.
For example, we can remove the ethnic information column before creating
transactions and then mine and inspect rules.
library("tidyverse")
library("arules")
data("IncomeESL")
trans <- IncomeESL |>
select(-`ethnic classification`) |>
transactions()
rules <- trans |>
apriori(supp = 0.1, conf = 0.9, target = "rules", control = list(verbose = FALSE))
rules |>
head(3, by = "lift") |>
as("data.frame") |>
tibble()
## # A tibble: 3 × 6
## rules support confidence coverage lift count
## <chr> <dbl> <dbl> <dbl> <dbl> <int>
## 1 {dual incomes=no,householder status=o… 0.102 0.971 0.105 2.62 914
## 2 {years in bay area=>10,dual incomes=y… 0.100 0.961 0.104 2.59 902
## 3 {dual incomes=yes,householder status=… 0.110 0.960 0.114 2.59 988
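Once rules are in a data frame, ordinary dplyr verbs apply. The following sketch assumes that `DATAFRAME()` with `separate = TRUE` is used to split each rule into left-hand side and right-hand side columns; the lift threshold of 2 and the name `rules_tbl` are arbitrary choices for illustration.

```r
library("dplyr")
library("arules")

data("IncomeESL")
trans <- transactions(IncomeESL)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules",
  control = list(verbose = FALSE))

# Convert the rules to a tibble, then filter and sort with dplyr
rules_tbl <- rules |>
  DATAFRAME(separate = TRUE) |>
  as_tibble() |>
  filter(lift > 2) |>
  arrange(desc(lift))

rules_tbl
```

Filtering after conversion is convenient for reporting, but the result is a plain tibble and no longer a rules object; to keep working with arules functions, `subset()` on the rules object is the alternative.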
arules and arulesViz can now be used directly from Python with the
Python package arulespy, available from PyPI.
Please report bugs here on GitHub. Questions should be posted on
Stack Overflow and tagged with arules.