Accuracy-Weighted Ensembles, 129 , 209
AccuracyUpdatedEnsemble, 130 , 209
AccuracyWeightedEnsemble, 130 , 209
Fixed Uncertainty Strategy, 119
in MOA, 211
Random Strategy, 119
Uncertainty Strategy with Randomization, 121
Variable Uncertainty Strategy, 119
ActiveClassifier, 211
Adaboost, 135
AdaGraphMiner algorithm, 179 , 189
AdaHoeffdingOptionTree, 209
ADAMS project, 190
adaptive bagging, see ADWIN Bagging
Adaptive Random Forests, 137
Adaptive-Size Hoeffding Trees, 138 , 209
AddNoiseFilter, 206
ADWIN Bagging, 17 , 133 , 200 , 209
ADWIN sketch, 79 , 82 , 108 , 179 , 211
AgrawalGenerator, 206
Agresti-Coull bound, 39
AMS (Alon-Matias-Szegedy) sketch, 57
Android operating system, 190
Apex, 196
approximation, 36
absolute, 36
( 𝜖 , δ )-approximation, 36 , 37 , 62 , 64
relative, 36
Area under the curve (AUC), 90
ArffFileStream, 204
ARL, Average Run Length, 75
attributes, 85
AUC, 90
Bayes’ theorem, 95
Bernstein’s inequality, 39
bias (in classifiers), 94
BICO algorithm, 154
Big Data, 3
challenges, 6
hidden, 7
Three V’s, 3
BIRCH algorithms, 152
Bloom filter, 43
boosting, 135
bootstrap, 133
C++ language, 195
CART, 101
centers (clustering), 149
centroids (clustering), 149
CF trees, 153
change in data streams, see drift
Chebyshev’s inequality, 38 , 46 , 62 , 92
comparing classifiers, 92
concept evolution, 121
CVFDT, 105
decision stump, 208
delayed feedback, 13
ensembles, 71 , 82 , see also ensembles
evaluation, 86
Hoeffding Adaptive Tree, 108
Hoeffding Tree, 102
lazy learning, see k -NN (nearest neighbors)
Majority Class classifier, 94
missing feedback, 13
multi-label, 115
Multinomial Naive Bayes, 98
Naive Bayes, 95
No-change classifier, 94
perceptron, 113
UFFT, 107
VFDT, 104
VFDTc, 107
closed pattern, 169
CloseGraph algorithm, 170 , 179 , 182
cluster mapping measure (CMM), 151
BICO, 154
BIRCH, 152
centroids or centers, 149
CluStream, 154
ClusTree, 156
CobWeb, 212
cost functions, 149
DBSCAN, 155
Den-Stream, 155
density-based, 155
distance function, 149
distributed, 200
evaluation, 150
k -means++, 152
microclusters, 152
other methods, 159
similarity, 149
surveys, 159
CluStream algorithm, 154 , 212 , 213
CM-sketch, see Count-Min sketch
CMM (cluster mapping measure), 151
CobWeb algorithm, 212
cohesion measure (clustering), 150
communities, 18
comparing classifiers, 92
concentration inequalities, 37 , 101
concept drift, see drift
concept evolution, 121
ConceptDriftRealStream, 205
ConceptDriftStream, 204
confusion matrix, 91
coresets
coreset tree, 158
in clustering, 158
in pattern mining, 172 , 178 , 182
cost measures, 93
Count-Min sketch, 51 , 60 , 81 , 82
counting
distinct or unique items, 40 , 42 , 48
items, 41
CountSketch, 54
distributed, 88
data streams, 35
adversarial vs. stochastic, 35 , 69
change, see drift
frequency moments, 56
in disaster management, 9
in e-commerce, 9
in healthcare, 9
in marketing, 9
in social media, 9 , 189 , 190
in utilities, 9
items, 36
Markovian, 69
dataset shift, 68
DBSCAN algorithm, 155
DDM, Drift Detection Method, 78 , 82 , 83 , 107 , 211
decay factor, 73
Decision Stump classifier, 208
split criteria, 101
delayed feedback, 13
δ , confidence parameter, 37
Den-Stream algorithm, 155 , 212
density-based clustering, 155
distinct items, see counting
distributed evaluation, 88
drift, 67
gradual, 69
shift, 69
simulating in MOA, 22 , 25 , 204
strategies to manage, 70
types of, 69
Accuracy-Weighted, 129
Adaboost, 135
Adaptive Random Forests, 137
Adaptive Size Hoeffding Tree, 138
boosting, 135
exponentiated gradient, 132
Hoeffding Option Tree, 136
in MOA, 209
Leveraging Bagging, 134
Online Bagging, 133
Online Boosting, 135
random forests, 136
Weighted Majority, 130
𝜖 , accuracy parameter, 36
Equal-frequency discretization, 109
Equal-width discretization, 109
error-correcting output codes, 134
estimators, 72
AUC, 90
cross-validation, see cross-validation
distributed, see distributed evaluation
holdout, see holdout evaluation
in clustering, 150
interleaved chunks, see interleaved chunks evaluation
prequential, see prequential evaluation
statistical significance, 92
test-then-train, see test-then-train evaluation
EWMA estimator, 73 , 82 , 151 , 211
exhaustive binary tree, 110 , 146
Exponential Histograms, 57 , 61 , 64 , 73 , 80
exponentiated gradient algorithm, 132
Facebook graph, 48
fading factor, 73
Fayyad and Irani’s discretization, 109
feature extraction, 10
features, see attributes
FilteredStream, 205
FIMT-DD, 146
Flajolet-Martin counter, 45 , 60
FP-Growth algorithm, 19 , 168 , 175
FP-Stream algorithm, 175
FP-Tree, 168
frequency moments (in streams), 56
frequency problems, 48
frequent elements, see heavy hitters
frequent pattern, see pattern mining
Frequent sketch, 49
FrugalStreaming sketch, 54
Gaussian distribution, 38 , 111
Gini impurity index, 101
gnuplot, 219
GPU computing, 137
graphical models, 94
GraphX, 6
families of random, 61
fully independent, 61
in practice, 62
pairwise independent, 61
HDFS, 5
by sampling, 49
in itemset mining, 174
in pattern mining, 174
surveys, 49
Hoeffding Adaptive Tree classifier, 17 , 108 , 195 , 209
Hoeffding Option Tree classifier, 136 , 146 , 209
Hoeffding Tree classifier, 16 , 102 , 190 , 208
multi-label, 117
vertical, 200
Hoeffding’s bound, 38 , 46 , 63 , 65 , 81 , 82 , 92 , 101 , 102 , 172 , 177
holdout evaluation, 14 , 87 , 204
Huawei, 195
HyperANF counter, 47
HyperplaneGenerator, 206
hypothesis testing, see statistical tests
iceberg queries, 49
IncMine algorithm, 19 , 176 , 183 , 189
information gain, 101 , 117
interleaved chunks evaluation, 88 , 204
items, 36
itemset, 165
Java language, 187 , 188 , 195 , 196 , 221 , 227
good practices, 238
Kalman filter estimator, 74
Kappa architecture, 6
Kappa M statistic, 90
Kappa statistic, 90
Kappa temporal statistic, 91
k -grams, counting, 42
k -means++ algorithm, 152
k -NN (nearest neighbors), 15 , 190
for regression, 145
Lambda architecture, 6
large-deviation bounds, see concentration inequalities
lazy learning, see k -NN (nearest neighbors)
learning rate, 114
LEDGenerator, 206
LEDGeneratorDrift, 207
linear estimator, 73
linear regression, 143
Lossy Counting sketch, 49 , 174
Mahout, 6
Majority Class classifier, 15 , 94 , 210
Markov’s inequality, 38 , 53 , 92
maximal pattern, 169
McDiarmid’s inequality, 39 , 101
McNemar’s test, 93
MDL, Minimum Description Length, 109
MDR, Missed Detection Rate, 75
MEKA project, 193
Mergeability, 60
Merging sketches, 60
microclusters, 18 , 152 , 154 , 200
Milgram’s degrees of separation, 48
Misra-Gries counter, 49
missing data, 10
missing feedback, 13
MLIB, 6
adding classes to, 227
API, 221
clustering, 160
Command Line Interface (CLI), 29 , 217
compiling code for, 237
discretization, 190
distributed, see SAMOA
evaluation, 22–31 , 203 , 218
extensions, 189
for Android, 190
for social media analysis, 189 , 190 , 192
for video processing, 193
generators, 160 , 204 , 212
good programming practices, 237
Hadoop, 196
modifying the behavior of, 227
multi-target learning, 188
outlier detection, 188
programming applications that use, 221
recent developments, 188
recommender systems, 189
running tasks, 22 , 123 , 201 , 217
SAMOA, 196
Spark, 195
visualization, 212
MOA-TweetReader, 189
MOAReduction, 190
Moment algorithm, 19 , 174 , 189
moment computation, 56
Morris’s counter, 41 , 61 , 63
motif discovery, 10
MTD, Mean Time to Detection, 75
MTFA, Mean Time between False Alarms, 75
multi-label classification, 115
BR method, 115
in MOA, 193
LC method, 115
multi-label Hoeffding Tree, 116
PW method, 116
multi-target learning, 188
Multinomial Naive Bayes classifier, 98 , 208
Naive Bayes
Multinomial, see Multinomial Naive Bayes classifier
Naive Bayes classifier, 16 , 95 , 105 , 208
neighborhood function (in graphs), 47
No-change classifier, 15 , 94 , 210
normal approximation, 38 , 92 , 172
in MOA, 190
OCBoost, 209
Online Bagging, 133
Online Bagging algorithm, 209
Online Boosting algorithm, 135 , 209
OpenML project, 194
outliers, 70 , 81 , 109 , 113 , 188
overfitting (in classifiers), 94
PAC-learning, 37
Page-Hinkley test, 76 , 82 , 83 , 146 , 211
pattern mining, 11 , 18 , 165 , 167
AdaGraphMiner, 179
Apriori, 168
association rules, 182
candidate pattern, 168
closed pattern, 169 , 182 , 183
Eclat, 169
FP-Growth, 168
FP-Stream, 175
generic algorithm on streams, 170
maximal pattern, 169
Moment, 174
pattern size, 167
subpattern, 166
superpattern, 166
support, 166
surveys, 181
WinGraphMiner, 179
for regression, 145
stacking on Hoeffding Trees, 137 , 210
perceptron
for classification, 113
Poisson distribution, 133 , 134
prequential evaluation, 14 , 88 , 90 , 204
Probabilistic counter, see Flajolet-Martin counter
purity measure (clustering), 150
Python language, 195
quantiles, 54
FrugalStreaming sketch, 54
Greenwald and Khanna’s sketch, 111 , 190
in MOA, 190
RAM-hour, 94
random forests, 136
randomized algorithm, 36
RandomRBFGenerator, 207
RandomRBFGeneratorDrift, 207
RandomSEAGenerator, 207
RandomTreeGenerator, 207
ranking / learning to rank, 10
real-time analytics, see data streams
recurrent concepts, 10 , 69 , 139
regression, 143
AMRules, 147
error measures, 144
FIMT-DD, 146
IBLStreams, 145
k -NN, 145
linear regression, 143
Perceptron, 145
Spegasos, 148
stochastic gradient descent, 148
reservoir sampling, 40
SAMOA, 196
for heavy hitters, 49
reservoir, see reservoir sampling
Samza, 196
semi-supervised learning, 13
SGD, 210
silhouette coefficient, 150
six degrees of separation, 48
AMS (Alon-Matias-Szegedy), 57
Cohen’s counter, 44
Count-Min, 51
CountSketch, 54
Exponential Histograms, 57 , 73
Flajolet-Martin counter, 45
for linear algebra, 63
for massive graphs, 48
Frequent, 49
FrugalStreaming, 54
HyperLogLog counter, 46
Linear counting, 43
merging, 60
Misra-Gries, 49
Morris’s counter, 41
other sketches, 63
range-sum queries, 53
reservoir sampling, 40
Space Saving, 50 , 64 , 82 , 174 , 183
Sticky Sampling, 49
Stream-Summary, 51
skip counting, 41
sliding windows, 58 , 73 , 79 , 83 , 178
Space Saving sketch, 50 , 61 , 64 , 82 , 174 , 183
split criteria, 101
split-validation, 89
SSQ measure (clustering), 150
stacking, 132
Perceptron on Hoeffding Trees, 137 , 210
STAGGERGenerator, 208
statistical significance, 92
McNemar’s test, 92
Sticky Sampling sketch, 49
stochastic averaging, 46
stochastic gradient descent, 114 , 148 , 210
Storm, 196
stream cross-validation, 90
Stream-Summary structure, 51
StreamDM-C++ project, 195
streaming, see data streams
StreamKM++ algorithm, 158 , 212
Streams project, 196
subpattern, see pattern mining
summaries, see sketches
superpattern, see pattern mining
support (of a pattern), 166
support vector machines (SVM), see kernel methods
TemporallyAugmentedClassifier, 95 , 210
TensorFlow, 6
test-then-train evaluation, 14 , 87 , 204
time series, 68
Twitter, 15 , 85 , 96 , 99 , 121 , 189 , 192
unique items, see counting
unsupervised learning, 11 , 149 , 165
Vertical Hoeffding Tree, 200
VFML, 110
video processing, 193
WaveformGenerator, 208
WaveformGeneratorDrift, 208
Weighted Majority algorithm, 130
WEKA, 10 , 22 , 190 , 193 , 203
WinGraphMiner algorithm, 179