"Machine learning is automatic programming!" — Me

# Mean Absolute Errors

## 2022-05-12

Mean absolute errors are the averages of the absolute values of errors.
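As a minimal sketch, a mean absolute error could be computed with NumPy as follows. The function name `mean_absolute_error` and the sample values are illustrative assumptions.

```python
import numpy

def mean_absolute_error(answers, results):
    """
    Averages the absolute values of the errors.
    """

    answers = numpy.asarray(answers, dtype = float)
    results = numpy.asarray(results, dtype = float)

    return numpy.abs(results - answers).mean()

# The errors are 1, -1 and 2, so the mean absolute error is 4 / 3.
print(mean_absolute_error([1, 2, 3], [2, 1, 5]))
```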

# Root Mean Square Errors

## 2022-05-12

Root mean square errors are the square roots of the averages of squared residuals.
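A short sketch of the calculation, assuming the answers and the model results are given as equally long sequences. The function name `root_mean_square_error` is hypothetical.

```python
import numpy

def root_mean_square_error(answers, results):
    """
    Finds the square root of the average of the squared residuals.
    """

    answers   = numpy.asarray(answers, dtype = float)
    results   = numpy.asarray(results, dtype = float)
    residuals = answers - results

    return numpy.sqrt((residuals ** 2).mean())

# The residuals are -3 and -4, so the result is sqrt((9 + 16) / 2).
print(root_mean_square_error([0, 0], [3, 4]))
```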

# Features

## 2022-04-22

Features are properties of machine learning inputs.

# Loss Functions

## 2022-04-21

Loss functions define model result errors.

# Bidirectional Recurrent Neural Networks

## 2022-04-04

Bidirectional recurrent neural networks combine the results of two recurrent neural networks where one receives the inputs in reverse order.

# Recurrent Neural Networks

## 2022-04-04

Recurrent neural networks are stateful artificial neural networks where inputs modify states. Results depend on inputs as well as states.
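The following is only a sketch of the idea, assuming a simple update in which the new state is the hyperbolic tangent of a weighted sum of the old state and the input, and in which the state doubles as the result. The names `rnn_step` and `run`, the sizes and the random weights are all illustrative assumptions.

```python
import numpy

rng     = numpy.random.default_rng(0)
w_state = rng.normal(size = (3, 3))  # weights applied to the previous state
w_input = rng.normal(size = (3, 2))  # weights applied to the current input
bias    = numpy.zeros(3)

def rnn_step(state, input_):
    """
    Finds the new state from the previous state and the current input.
    """

    return numpy.tanh(numpy.matmul(w_state, state) +
                      numpy.matmul(w_input, input_) +
                      bias)

def run(inputs):
    """
    Feeds inputs one at a time, carrying the state along.
    """

    state = numpy.zeros(3)
    for input_ in inputs:
        state = rnn_step(state, input_)

    return state

inputs = [numpy.array([1.0, 0.0]), numpy.array([0.0, 1.0])]
# The same inputs in reverse order generally give a different result
# because each step depends on the state left by the earlier inputs.
print(run(inputs))
print(run(inputs[::-1]))
```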

# Thresholds

## 2022-03-25

Many classification models depend on whether certain probabilities are above certain minima. These minima are referred to as thresholds.
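For illustration, here is a sketch of how a threshold turns probabilities into classes. The function name `classify` and the default threshold of 0.5 are assumptions.

```python
import numpy

def classify(probabilities, threshold = 0.5):
    """
    Returns 1 where a probability reaches the threshold and 0 elsewhere.
    """

    return (numpy.asarray(probabilities) >= threshold).astype(int)

probabilities = [0.2, 0.55, 0.9]
print(classify(probabilities))       # [0 1 1]
print(classify(probabilities, 0.7))  # [0 0 1]
```

Raising the threshold trades false positives for false negatives.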

# Batch Normalization

## 2022-03-15

It is often useful to modify backpropagation method layer inputs so that each input element has an average of zero and a variance of one per training data subset. This process is referred to as batch normalization.
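A minimal sketch of the normalization step, assuming each row of `batch` is one layer input and each column is one input element. The learned scale and shift parameters used in practice are omitted, and `epsilon` is an assumed small constant that avoids division by zero.

```python
import numpy

def batch_normalize(batch, epsilon = 1e-5):
    """
    Gives each column of a batch an average of zero and a variance of one.
    """

    average  = batch.mean(axis = 0)
    variance = batch.var(axis = 0)

    return (batch - average) / numpy.sqrt(variance + epsilon)

batch      = numpy.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
normalized = batch_normalize(batch)
print(normalized.mean(axis = 0))  # approximately zero per column
print(normalized.var(axis = 0))   # approximately one per column
```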

# Backpropagation Momenta

## 2022-03-15

It is often useful in backpropagation method updates to include fractions of previous updates. These fractions are referred to as momenta.
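As a sketch, assuming a single weight and a given sequence of partial derivatives, the updates with momentum could be calculated as follows. The function name `momentum_updates` and the default values are hypothetical.

```python
def momentum_updates(gradients, learn_rate = 0.1, momentum = 0.9):
    """
    Finds the updates of one weight for a sequence of partial derivatives.
    """

    updates = []
    update  = 0.0
    for gradient in gradients:
        # fraction of the previous update plus the usual update
        update = momentum * update - learn_rate * gradient
        updates.append(update)

    return updates

# With constant partial derivatives the updates grow: -0.1, -0.19, -0.271
print(momentum_updates([1.0, 1.0, 1.0]))
```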

# Specificity

## 2022-03-15

Sensitivity (recall) is the fraction of actual positive values that are predicted to be positive. Specificity is the fraction of actual negative values that are predicted to be negative.
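A small sketch of both calculations, assuming binary answers and results encoded as 1 (positive) and 0 (negative). The function name and the sample data are illustrative.

```python
def sensitivity_and_specificity(answers, results):
    """
    Finds the sensitivity and specificity of binary results.
    """

    pairs = list(zip(answers, results))
    tp    = sum(1 for a, r in pairs if a == 1 and r == 1)  # true  positives
    fn    = sum(1 for a, r in pairs if a == 1 and r == 0)  # false negatives
    tn    = sum(1 for a, r in pairs if a == 0 and r == 0)  # true  negatives
    fp    = sum(1 for a, r in pairs if a == 0 and r == 1)  # false positives

    return tp / (tp + fn), tn / (tn + fp)

answers = [1, 1, 1, 0, 0, 0, 0, 0]
results = [1, 1, 0, 0, 0, 0, 0, 1]
# 2 of 3 positives and 4 of 5 negatives are predicted correctly.
print(sensitivity_and_specificity(answers, results))
```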

# Accuracy

## 2022-03-14

The fraction of values that lie along the diagonal of a confusion matrix corresponds to accuracy.

# Correlation Coefficients And Matrices

## 2022-03-11

Correlation coefficients denote the linearity in relations between two variables. Correlation matrices contain multiple correlation coefficients.

# Scatter Plot Matrices

## 2022-03-11

Scatter plot matrices contain multiple scatter plots.

# Dropout

## 2022-03-11

Dropout is a backpropagation method regularization technique. It involves ignoring randomly selected function values.
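Here is a sketch of the idea, assuming the common "inverted dropout" variant in which the kept function values are scaled up so that their expected values are unchanged. The function name `dropout` and the parameter `drop_prob` are assumptions.

```python
import numpy

def dropout(func_outs, drop_prob, rng):
    """
    Zeroes randomly selected function values and rescales the rest.
    """

    keep = rng.random(func_outs.shape) >= drop_prob

    return func_outs * keep / (1 - drop_prob)

rng       = numpy.random.default_rng(0)
func_outs = numpy.ones(10)
# Every value is either ignored (0) or kept and scaled (2).
print(dropout(func_outs, 0.5, rng))
```

Dropout is applied only while learning. Models use all function values during inference.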

# Imputation

## 2022-03-10

Imputation is the process of replacing missing values in datasets by inferences from the present values.
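One of the simplest inferences is the average of the present values. The sketch below assumes missing values are marked with NaN and replaces each one with its column average. The function name `impute_means` is hypothetical.

```python
import numpy

def impute_means(data):
    """
    Replaces NaNs with the averages of the present values per column.
    """

    data                = numpy.array(data, dtype = float)
    averages            = numpy.nanmean(data, axis = 0)
    rows, columns       = numpy.where(numpy.isnan(data))
    data[rows, columns] = averages[columns]

    return data

data = [[1.0, 10.0], [numpy.nan, 20.0], [3.0, numpy.nan]]
# The missing values become 2 and 15.
print(impute_means(data))
```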

# Backpropagation Method Batches

## 2022-03-05

Backpropagation method updates can be applied per training subset. These training subsets are referred to as batches. Smaller batch sizes typically lead to greater model accuracies.

# Random Forest Methods

## 2022-02-25

Random forest methods are decision tree methods that use bootstrapping.

# Decision Tree Methods

## 2022-02-25

Decision tree methods create flowchart models.

# Box Cox Transformations

## 2022-02-17

Box Cox transformations, (x^k - 1) / k, make frequency distributions closer to normal distributions for 0 < k < 1. They are equivalent to log(x) in the limit as k approaches 0. These transformations are also referred to as power transformations.
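A short sketch using the usual form (x^k - 1) / k for positive x, with the logarithm as the k = 0 case. The function name `box_cox` is an assumption.

```python
import numpy

def box_cox(x, k):
    """
    Finds the Box Cox transformations of positive numbers.
    """

    x = numpy.asarray(x, dtype = float)
    if k == 0:
        return numpy.log(x)

    return (x ** k - 1) / k

x = numpy.array([0.5, 1.0, 4.0, 10.0])
print(box_cox(x, 0.5))
# For small k the results approach log(x).
print(box_cox(x, 1e-6) - numpy.log(x))
```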

# Bootstrapping

## 2022-02-17

Bootstrapping is an ensemble method using models built from randomly selected subsets of input output pairs. Input output pairs may be present in multiple subsets.
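The subset selection can be sketched as follows, assuming the common choice of selecting as many input output pairs as the data contain, with replacement. The function name `bootstrap_subsets` is hypothetical, and building a model from each subset is left out.

```python
import numpy

def bootstrap_subsets(data, n_subsets, rng):
    """
    Selects rows at random, with replacement, once per subset.
    """

    n = data.shape[0]

    return [data[rng.integers(0, n, size = n)] for _ in range(n_subsets)]

data    = numpy.arange(10).reshape(5, 2)
subsets = bootstrap_subsets(data, 3, numpy.random.default_rng(0))
for subset in subsets:
    print(subset[:, 0])  # rows may repeat within and across subsets
```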

# Logistic Regression

## 2022-02-17

Logistic regression is regression with the logistic function to build classification models.

# Softmax Function

## 2022-02-16

The softmax function turns vectors into ones with components that add up to one. It is often used in machine learning models where results are probabilities.

# Pie Charts And Stacked Bar Charts

## 2022-02-15

Pie charts denote compositions of entities. Stacked bar charts denote compositions of multiple entities.

# Scatter Plots And Bubble Charts

## 2022-02-15

Scatter plots denote relations between two variables. Bubble charts denote relations between three variables.

# Histograms And Bar Charts

## 2022-02-15

Histograms denote range frequencies. Bar charts denote variable values.

# Box Plots

## 2022-02-15

Box plots divide data into quarters and denote medians.

# Confidence Scores

## 2022-02-11

Confidence scores are probabilities of model results being correct.

# Incremental Learning

## 2022-02-11

Incremental learning is building new supervised learning models from existing ones using new training data.

# Object Standardization

## 2022-02-10

Object standardization is the process of replacing words, in texts, that are not found in dictionaries. These include slang, abbreviations and acronyms.

# Oversampling And Undersampling

## 2022-02-10

Both oversampling and undersampling are processes that modify sets to alter proportions of element types. Oversampling methods may involve adding exact or modified copies of elements that are randomly selected. Undersampling methods may involve removing elements that are randomly selected or selected using k means clustering.

# Sensitivity And True Positive Rate

## 2022-02-10

Both sensitivity and true positive rate are other terms for recall.

# Stop Words

## 2022-02-09

Stop words are common words that are ignored in machine learning.

# Type 1 And Type 2 Errors

## 2022-02-09

Type 1 errors are false positives. Type 2 errors are false negatives.

# Stemming And Lemmatization

## 2022-02-09

Both stemming and lemmatization are processes that remove word variations in texts. Stemming provides greater performance by ignoring word contexts. Lemmatization provides greater accuracy by analyzing word contexts.

# Residuals

## 2022-02-08

Residuals are regression model prediction errors.

# Variances And Standard Deviations

## 2022-02-08

Variances are averages of squared deviations from averages. Standard deviations are square roots of variances.
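A worked example with assumed sample data. Note that this is the population variance, which divides by the number of elements.

```python
import numpy

data     = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
average  = data.mean()                       # 40 / 8 = 5
variance = ((data - average) ** 2).mean()    # 32 / 8 = 4
std_dev  = numpy.sqrt(variance)              # 2

print(average, variance, std_dev)
```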

# Word2vec

## 2022-02-06

Word2vec is a set of techniques to convert words into vectors such that closeness corresponds to semantic similarity. Word2vec relies on the observation that semantically similar words tend to occur near the same other words in texts.

# K Means Clustering

## 2022-02-02

K means clustering divides vector sets into subsets based on distance. Consider distances from corresponding subset averages. K means clustering minimizes the sums of the squares of these distances.
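The minimization is commonly approximated with Lloyd's algorithm, which alternates between assigning vectors to the nearest average and recalculating the averages. The sketch below assumes that algorithm, a fixed number of steps and random initial averages taken from the vectors. It finds a local minimum, not necessarily the global one. The function name `k_means` is hypothetical.

```python
import numpy

def k_means(vectors, k, n_steps, rng):
    """
    Divides vectors into k subsets and finds the subset averages.
    """

    indices  = rng.choice(vectors.shape[0], size = k, replace = False)
    averages = vectors[indices]  # indexing with an array makes a copy
    for _ in range(n_steps):
        # distances from every vector to every average
        differences = vectors[:, None, :] - averages[None, :, :]
        distances   = numpy.linalg.norm(differences, axis = 2)
        labels      = distances.argmin(axis = 1)
        for j in range(k):
            if numpy.any(labels == j):  # empty subsets keep their average
                averages[j] = vectors[labels == j].mean(axis = 0)

    return labels, averages

vectors = numpy.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, averages = k_means(vectors, 2, 10, numpy.random.default_rng(0))
print(labels)
print(averages)
```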

# F1 Scores And Macro F1 Scores

## 2022-01-30

F1 scores are the harmonic means of precision and recall values. F1 scores are equal to twice the ratios of products and sums. Macro F1 scores are F1 score averages.
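In code, with hypothetical function names and assumed sample values:

```python
def f1_score(precision, recall):
    """
    Finds the harmonic mean of a precision and a recall.
    """

    return 2 * precision * recall / (precision + recall)

def macro_f1_score(f1_scores):
    """
    Averages the F1 scores of all the classes.
    """

    return sum(f1_scores) / len(f1_scores)

# 2 * (0.5 * 1.0) / (0.5 + 1.0) = 2 / 3
print(f1_score(0.5, 1.0))
print(macro_f1_score([f1_score(0.5, 1.0), f1_score(1.0, 1.0)]))
```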

# Precision And Recall

## 2022-01-30

The fractions of values that are correct, in rows and columns of confusion matrices, correspond to precision and recall. Precision depends on the number of false positives. Recall depends on the number of false negatives.
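The sketch below assumes a confusion matrix whose rows are predicted classes and whose columns are actual classes. Some libraries, such as scikit-learn, use the opposite orientation, in which case the roles of rows and columns swap. The matrix values are made up for illustration.

```python
import numpy

# rows: predicted classes, columns: actual classes (an assumed orientation)
confusion = numpy.array([[8, 2],
                         [1, 9]])

correct   = numpy.diag(confusion)
precision = correct / confusion.sum(axis = 1)  # correct fraction per row
recall    = correct / confusion.sum(axis = 0)  # correct fraction per column
accuracy  = correct.sum() / confusion.sum()    # correct fraction overall

print(precision)  # 8 / 10 and 9 / 10
print(recall)     # 8 / 9  and 9 / 11
print(accuracy)   # 17 / 20
```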

# Confusion Matrices

## 2022-01-29

Confusion matrices give information about classification testing results.

# Cross Validation

## 2022-01-15

Comparing supervised learning training and testing results from different data subsets is referred to as cross validation.

# Term Frequency-Inverse Document Frequencies

## 2022-01-12

Term frequency-inverse document frequencies estimate the importance of words in documents. These estimates grow with word frequencies in documents but decrease with word frequencies in other documents. Estimates are often presented in matrices where rows and columns correspond to documents and terms respectively.
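There are several variants of the calculation. The sketch below assumes one common variant: the relative frequency of a term in a document multiplied by the logarithm of the number of documents divided by the number of documents containing the term. The function name `tf_idf` and the sample documents are illustrative.

```python
import math

def tf_idf(documents):
    """
    Returns the terms and a matrix with a row per document and a column
    per term.
    """

    terms  = sorted({word for doc in documents for word in doc.split()})
    matrix = []
    for doc in documents:
        words = doc.split()
        row   = []
        for term in terms:
            tf = words.count(term) / len(words)
            df = sum(1 for other in documents if term in other.split())
            row.append(tf * math.log(len(documents) / df))
        matrix.append(row)

    return terms, matrix

documents     = ["the cat sat", "the dog sat", "the cat ran"]
terms, matrix = tf_idf(documents)
print(terms)
for row in matrix:
    print([round(e, 3) for e in row])
```

Here "the" occurs in every document and so gets an estimate of zero, while "dog" occurs in only one document and gets the largest estimate in that document.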

# Z-Scores And Standardization

## 2022-01-12

The number of standard deviations an element of a set differs from the average is referred to as its z-score. An alternative to min max normalization is to replace numbers with z-scores. This is referred to as standardization.
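A brief sketch of standardization with assumed sample data. The function name `standardize` is hypothetical.

```python
import numpy

def standardize(data):
    """
    Replaces numbers with their z-scores.
    """

    data = numpy.asarray(data, dtype = float)

    return (data - data.mean()) / data.std()

# The average is 5 and the standard deviation is 2,
# so the z-scores are -1.5, -0.5, -0.5, -0.5, 0, 0, 1 and 2.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```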

# Cartesian Products

## 2022-01-12

The Cartesian product of sets A and B is {(x, y) : x ∈ A and y ∈ B}.
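In Python, the standard library function `itertools.product` generates these pairs. The sets below are made up for illustration.

```python
import itertools

a       = {1, 2}
b       = {"x", "y"}
product = set(itertools.product(a, b))

# Every element of a is paired with every element of b.
print(sorted(product))  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```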

# Bags Of Words And N-Grams

## 2022-01-12

Lists of words and word frequencies in text data are referred to as bags of words. Lists of phrases, up to some maximum word count, in text data are referred to as n-grams.
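Both can be sketched with the standard library, assuming words are separated by whitespace and case is ignored. The function names are hypothetical.

```python
import collections

def bag_of_words(text):
    """
    Counts the words in a text.
    """

    return collections.Counter(text.lower().split())

def n_grams(text, max_n):
    """
    Lists the phrases of up to max_n consecutive words in a text.
    """

    words = text.lower().split()

    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

text = "the cat saw the dog"
print(bag_of_words(text))
print(n_grams(text, 2))
```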

# One Hot Encodings

## 2022-01-09

Converting a set of strings to a set of integers establishes an ordering. Converting a set of strings to a set of perpendicular unit vectors does not establish an ordering. This is referred to as one hot encoding.
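A minimal sketch, assuming the strings are first mapped to indices in sorted order and then to rows of an identity matrix. The names `indices` and `one_hot` are illustrative.

```python
import numpy

strings = ["red", "green", "blue"]
indices = {s: i for i, s in enumerate(sorted(set(strings)))}

def one_hot(string):
    """
    Finds the unit vector corresponding to a string.
    """

    return numpy.identity(len(indices))[indices[string]]

print(one_hot("red"))  # [0. 0. 1.]
# Different strings give perpendicular vectors: the dot product is zero.
print(numpy.dot(one_hot("red"), one_hot("blue")))
```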

# Inference

## 2022-01-08

Inference is the process of using supervised learning models.

# Validation And Testing Data

## 2021-03-22

Validation data are used to evaluate supervised learning method variations. Testing data are used to evaluate supervised learning models. Often training, validation and testing data are 80%, 10% and 10% respectively of all the data.
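A sketch of an 80 / 10 / 10 split, assuming the data are rows of a NumPy array that should be shuffled first. The function name `split_data` and its parameters are hypothetical.

```python
import numpy

def split_data(data, rng, train = 80, validation = 10):
    """
    Shuffles data and creates the training, validation and testing data.
    """

    data = rng.permutation(data)  # shuffled copy of the rows
    i    = int((train / 100) * data.shape[0])
    j    = i + int((validation / 100) * data.shape[0])

    return data[:i], data[i:j], data[j:]

data = numpy.arange(100).reshape(50, 2)
train_data, validation_data, test_data = split_data(data,
                                                    numpy.random.default_rng(0))
print(train_data.shape, validation_data.shape, test_data.shape)
```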

# AutoML

## 2021-03-18

AutoML is a Google service which creates programs using machine learning methods. For example, it can create an image classifier given a large set of categorized images.

# Regularization

## 2021-02-22

Regularization is the use of techniques to avoid overfitting in supervised learning. A common backpropagation method regularization technique is to adjust the error function to penalize large weights and biases. L1 (Lasso) regularization adds a term containing the sum of the absolute values of all the weights and biases. L2 (Ridge) regularization adds a term containing the sum of the squares of all the weights and biases.
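The adjusted error function can be sketched as follows, assuming lists of weight matrices and bias vectors. The function name `penalized_error` and the coefficients `l1` and `l2`, which control the strengths of the penalties, are assumptions.

```python
import numpy

def penalized_error(error, weights, biases, l1 = 0.0, l2 = 0.0):
    """
    Adds L1 and L2 penalty terms to an error.
    """

    params = numpy.concatenate([p.ravel() for p in weights + biases])

    return (error +
            l1 * numpy.abs(params).sum() +  # L1 (Lasso) term
            l2 * (params ** 2).sum())       # L2 (Ridge) term

weights = [numpy.array([[1.0, -2.0]])]
biases  = [numpy.array([3.0])]
print(penalized_error(1.0, weights, biases, l1 = 0.1))  # 1 + 0.1 * 6
print(penalized_error(1.0, weights, biases, l2 = 0.1))  # 1 + 0.1 * 14
```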

# Backpropagation Method

## 2020-10-30

The backpropagation method is an extension of the perceptron method for acyclic artificial neural networks. Acyclic artificial neural networks are defined in terms of the following:

functions $f_1, f_2, f_3, \ldots, f_N$

weight matrices $W_1, W_2, W_3, \ldots, W_N$

bias vectors $b_1, b_2, b_3, \ldots, b_N$

such that the result for an input vector $i$ involves:

$o_0 = i$

$o_j = (f_j(a_{j1}), f_j(a_{j2}), f_j(a_{j3}), \ldots, f_j(a_{jN}))$ for $j = 1, 2, 3, \ldots, N$

$a_j = W_j o_{j-1} + b_j$ for $j = 1, 2, 3, \ldots, N$

where $o_N$ is the result.

In the backpropagation method, each weight matrix and bias vector is updated for each input output vector pair $(i, o)$ by subtracting a small fraction of the corresponding partial derivative of the error function $E_o = (o - o_N)^2 / 2$. The small fraction is referred to as the learning rate. For a derivation of the formulas to calculate these partial derivatives, click here.

Here is sample Python backpropagation method code:

```
#!/usr/bin/env python3

"""
Implements the backpropagation method.

Usage:
    ./backprop <data file>                        \
               <data split>                       \
               <number of hidden layers>          \
               <number of hidden layer functions> \
               <number of categories>             \
               <learning rate>                    \
               <number of epochs>

Data files must be space delimited with one input output pair per line.

Every hidden layer has the same number of functions.

The hidden layer functions are rectified linear unit functions.

The outer layer functions are identity functions.

initialization steps:
    The input output pairs are shuffled and the inputs min max normalized.
    The weights and biases are set to random values.

Requires NumPy.
"""

import numpy
import sys

def min_max(data):
    """
    Finds the min max normalizations of data.
    """

    return (data - numpy.min(data)) / (numpy.max(data) - numpy.min(data))

def init_data(data_file, data_split, n_cat):
    """
    Creates the training and testing data.
    """

    # The loading step is reconstructed; loadtxt splits on whitespace.
    data         = numpy.loadtxt(data_file)
    numpy.random.shuffle(data)
    data[:, :-1] = min_max(data[:, :-1])
    # Replace the category numbers with one hot encodings.
    outputs      = numpy.identity(n_cat)[data[:, -1].astype("int")]
    data         = numpy.hstack((data[:, :-1], outputs))
    data_split   = int((data_split / 100) * data.shape[0])

    return data[:data_split, :], data[data_split:, :]

def accuracy(data, weights, biases, n_cat):
    """
    Calculates the accuracies of models.
    """

    results = model(data[:, :-n_cat], weights, biases)
    # Reconstructed: the answers are the positions of the ones in the
    # one hot encodings.
    answers = numpy.argmax(data[:, -n_cat:], 1)

    return 100 * (results == answers).astype(int).mean()

def model_(inputs, weights, biases, relu = True):
    """
    model helper function
    """

    outputs = numpy.matmul(weights, inputs.T).T + biases
    if relu:
        outputs = numpy.maximum(outputs, 0)

    return outputs

def model(inputs, weights, biases):
    """
    Finds the model results.
    """

    output = model_(inputs, weights[0], biases[0])
    for e in zip(weights[1:-1], biases[1:-1]):
        output = model_(output, e[0], e[1])
    output = model_(output, weights[-1], biases[-1], False)
    output = numpy.argmax(output, 1)

    return output

def adjust(weights, biases, input_, output, func_inps, func_outs, learn_rate):
    """
    Updates the weights and biases for one input output pair.
    """

    # Partial derivatives of the error with respect to the function inputs
    # (d_e_f_i) and the weights (d_e_w), from the last layer to the first.
    d_e_f_i = [func_outs[-1] - output]
    d_e_w   = [numpy.outer(d_e_f_i[-1], func_outs[-2])]
    for i in reversed(range(len(weights) - 1)):
        func_deriv = numpy.clip(numpy.sign(func_inps[i]), 0, 1)
        vector     = numpy.matmul(weights[i + 1].T, d_e_f_i[-1])
        func_out   = func_outs[i - 1] if i else input_
        d_e_f_i.append(numpy.multiply(vector, func_deriv))
        d_e_w.append(numpy.outer(d_e_f_i[-1], func_out))
    for i, e in enumerate(reversed(list(zip(d_e_w, d_e_f_i)))):
        weights[i] -= learn_rate * e[0]
        biases[i]  -= learn_rate * e[1]

def learn(train_data, n_hls, n_hl_funcs, n_cat, learn_rate, n_epochs):
    """
    Learns the weights and biases from the training data.
    """

    weights = [numpy.random.randn(n_hl_funcs, train_data.shape[1] - n_cat)]
    for i in range(n_hls - 1):
        weights.append(numpy.random.randn(n_hl_funcs, n_hl_funcs))
    weights.append(numpy.random.randn(n_cat, n_hl_funcs))
    # The shape index is reconstructed; scaling by rows is an assumption.
    weights = [e / numpy.sqrt(e.shape[0]) for e in weights]
    biases  = [numpy.random.randn(n_hl_funcs) for i in range(n_hls)]
    biases.append(numpy.random.randn(n_cat))
    biases  = [e / numpy.sqrt(e.shape[0]) for e in biases]
    for i in range(n_epochs):
        for e in train_data:
            input_    = e[:-n_cat]
            func_inps = []
            func_outs = []
            # Forward pass: record every layer's function inputs and outputs.
            for l in range(n_hls + 1):
                input__   = func_outs[l - 1] if l else input_
                func_inp  = numpy.matmul(weights[l], input__)
                func_inp += biases[l]
                relu      = numpy.maximum(func_inp, 0)
                func_out  = relu if l != n_hls else func_inp
                func_inps.append(func_inp)
                func_outs.append(func_out)
            adjust(weights,
                   biases,
                   e[:-n_cat],
                   e[-n_cat:],
                   func_inps,
                   func_outs,
                   learn_rate)

    return weights, biases

n_cat                 = int(sys.argv[5])
train_data, test_data = init_data(sys.argv[1], float(sys.argv[2]), n_cat)
weights, biases       = learn(train_data,
                              int(sys.argv[3]),
                              int(sys.argv[4]),
                              n_cat,
                              float(sys.argv[6]),
                              int(sys.argv[7]))
print(f"weights and biases:     {weights}, {biases}")
accuracy_             = accuracy(train_data, weights, biases, n_cat)
print(f"training data accuracy: {accuracy_:.2f}%")
accuracy_             = accuracy(test_data,  weights, biases, n_cat)
print(f"testing  data accuracy: {accuracy_:.2f}%")
```

Here are sample results for the MNIST dataset (Modified National Institute Of Standards And Technology dataset) available from many sources such as Kaggle:

```
% ./backprop MNIST_dataset 80 2 64 10 0.001 100
weights and biases:     [array([[ 0.10866304,  0.0041912 , -0.23560872, ...,  0.03364987,
-0.19519161, -0.00068468],
[ 0.12745399,  0.12268858, -0.13698254, ...,  0.19508343,
0.20920324,  0.1970561 ],

...

-0.24605896,  0.02329749, -0.16363297, -0.24085487, -0.14819292,
-0.19237153, -0.21772553, -0.19817858,  0.50966376,  0.14384857,
0.10621777,  0.64537735,  0.77337279,  0.01737619]), array([0.03938714, 0.0574965 , 0.16544762, 0.13164358, 0.04927753,
0.12365563, 0.0401857 , 0.18105514, 0.10016533, 0.11111991])]
training data accuracy: 97.96%
testing  data accuracy: 96.59%
```

Here is a plot of the accuracy versus the number of epochs for a data split of 80 / 20, two hidden layers, 64 functions per hidden layer, 10 categories and a learning rate of 0.001. Blue denotes the training data accuracy and orange denotes the testing data accuracy.

# Feedforward Artificial Neural Networks

## 2020-10-30

Feedforward artificial neural networks are acyclic artificial neural networks.

# Rectified Linear Unit

## 2020-10-25

The rectified linear unit is a popular artificial neural network function. It is widely used in deep learning and is also referred to as the ramp function.

# Deep Learning

## 2020-10-24

Deep learning methods are artificial neural network supervised learning methods involving large numbers of compositions of functions.

# Samples, Targets, Classes And Categories

## 2020-10-21

Supervised learning inputs are also referred to as samples.
Supervised learning outputs are also referred to as targets, classes and categories.

# Old Hype

## 2020-10-18

"The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." — New York Times, July 8, 1958

# Perceptron Method

## 2020-10-07

The perceptron method is one of the earliest and simplest artificial neural network supervised learning methods. It involves the single function H(w · i + b) where H is the Heaviside step function and i is the input. w and b are referred to as the weights and the bias. For every input output pair (i, o), a scaled version of i is added to w and the scale factor of i is added to b. The scale factor for every i is γ(o - H(w · i + b)) for some small γ referred to as the learning rate.

Here is sample Python perceptron method code:

```
#!/usr/bin/env python3

"""
Implements the perceptron method.

Usage:
    ./perceptron <data file> <data split> <learning rate> <number of epochs>

Data files must be space delimited with one input output pair per line.

initialization steps:
    Input output pairs are shuffled.
    Inputs             are min max normalized.
    Weights            are set to random values.

Requires NumPy.
"""

import numpy
import sys

def min_max(data):
    """
    Finds the min max normalizations of data.
    """

    return (data - numpy.min(data)) / (numpy.max(data) - numpy.min(data))

def init_data(data_file, data_split):
    """
    Creates the training and testing data.
    """

    # The loading step is reconstructed; loadtxt splits on whitespace.
    data         = numpy.loadtxt(data_file)
    numpy.random.shuffle(data)
    data[:, :-1] = min_max(data[:, :-1])
    # A column of ones lets the last weight act as the bias.
    ones         = numpy.ones(data.shape[0])[None].T
    data         = numpy.hstack((data[:, :-1], ones, data[:, -1][None].T))
    data_split   = int((data_split / 100) * data.shape[0])

    return data[:data_split, :], data[data_split:, :]

def accuracy(data, weights):
    """
    Calculates the accuracies of models.
    """

    model_ = model(data[:, :-1], weights)

    return 100 * (model_ == data[:, -1]).astype(int).mean()

def model(inputs, weights):
    """
    Finds the model results.
    """

    return (numpy.matmul(inputs, weights) > 0).astype(int)

def learn(data, learn_rate, n_epochs):
    """
    Learns the weights from data.
    """

    weights = numpy.random.rand(data.shape[1] - 1) / (data.shape[1] - 1)
    for i in range(n_epochs):
        for e in data:
            model_   = model(e[:-1], weights)
            # Add the input scaled by the learning rate and the error.
            weights += learn_rate * (e[-1] - model_) * e[:-1]

    return weights

train_data, test_data = init_data(sys.argv[1], int(sys.argv[2]))
weights               = learn(train_data, float(sys.argv[3]), int(sys.argv[4]))
print(f"weights and bias:       {weights}")
print(f"training data accuracy: {accuracy(train_data, weights):.2f}%")
print(f"testing  data accuracy: {accuracy(test_data,  weights):.2f}%")
```

Here are sample results for a subset of the popular MNIST dataset (Modified National Institute Of Standards And Technology dataset) available from many sources such as Kaggle. Results denote whether the inputs correspond to the number eight or not:

```
% ./perceptron MNIST_subset_dataset 80 0.000001 100
weights and bias:       [ 9.60835270e-04  7.78817831e-04  9.09208513e-04  1.23811178e-04
1.24167654e-03  6.78889421e-04  5.61003207e-04  6.95360517e-04
7.41301570e-04  7.99198618e-04  5.40027576e-04  1.53847709e-05
6.85229222e-04  7.34466515e-04  1.10555270e-03  3.54355472e-04

...

3.49190104e-04  9.08839645e-04  2.15854858e-04  7.85936614e-04
2.48270482e-04  7.91941436e-04  5.33470893e-04  4.43331643e-04
9.53736704e-04  2.42570411e-04  9.22297554e-04  9.67634113e-04
-1.70084762e-03]
training data accuracy: 89.75%
testing  data accuracy: 86.32%
```

Here is a plot of the accuracy versus the number of epochs for a data split of 80 / 20 and a learning rate of 0.000001. Blue denotes the training data accuracy and orange denotes the testing data accuracy.

# Hyperparameters

## 2020-10-07

Hyperparameters specify machine learning method variations.

# Artificial Neural Networks

## 2020-10-07

Artificial neural networks (ANNs) are built from functions which correspond to idealized neurons. These functions are referred to as activation functions and are organized into sets referred to as layers based on their compositions.

# Ensemble Methods

## 2020-10-05

Ensemble methods involve multiple machine learning methods.

# Min Max Normalization

## 2020-10-05

Min max normalizations transform sets of numbers into ones with the extrema zero and one. Let m and M denote the minimum and maximum of a set of numbers. The min max normalization of that set replaces every element x with (x - m) / (M - m).

# Models

## 2020-10-05

Models are function approximations created with supervised learning methods.

# Epochs

## 2020-10-05

Epochs are supervised learning steps which process all of the input output pairs.

# Classification And Regression

## 2020-10-05

Using supervised learning methods to approximate piecewise constant functions is referred to as classification. Using supervised learning methods to approximate continuous functions is referred to as regression.

# Training Data

## 2020-10-04

Training data are supervised learning input output pair sets.

# Modes

## 2020-10-04

Modes are the most frequent elements of sets.

# K Nearest Neighbors Method

## 2020-10-04

The k nearest neighbors method is one of the simplest supervised learning methods. It involves finding the most similar inputs in the set of input output pairs.

Here is sample Python code to determine the accuracy of the k nearest neighbors method on data:

```
#!/usr/bin/env python3

"""
Determines the accuracy of the k nearest neighbors method on data.

Usage:
    ./k_nn <data file> <data split> <number of nearest neighbors>

Data files must be space delimited with one input output pair per line.

initialization steps:
    Input output pairs are shuffled.
    Inputs             are min max normalized.

Requires SciPy and NumPy.
"""

import scipy.stats
import numpy
import sys

def min_max(data):
    """
    Finds the min max normalizations of data.
    """

    return (data - numpy.min(data)) / (numpy.max(data) - numpy.min(data))

def init_data(data_file, data_split):
    """
    Creates the model and testing data.
    """

    # The loading step is reconstructed; loadtxt splits on whitespace.
    data         = numpy.loadtxt(data_file)
    numpy.random.shuffle(data)
    data[:, :-1] = min_max(data[:, :-1])
    data_split   = int((data_split / 100) * data.shape[0])

    return data[:data_split, :], data[data_split:, :]

def accuracy(model_data, test_data, n_nn):
    """
    Calculates the accuracies of models.
    """

    model_ = model(test_data[:, :-1], model_data, n_nn)

    return 100 * (model_ == test_data[:, -1]).astype(int).mean()

def model_(input_, model_data, n_nn):
    """
    model helper function
    """

    # Squared distances suffice to order the neighbors.
    squares   = (input_ - model_data[:, :-1]) ** 2
    indices   = numpy.sum(squares, 1).argsort()[:n_nn]
    neighbors = numpy.take(model_data[:, -1], indices)

    # atleast_1d handles SciPy versions that return scalar or array modes.
    return numpy.atleast_1d(scipy.stats.mode(neighbors).mode)[0]

def model(inputs, model_data, n_nn):
    """
    Finds the model results.
    """

    return numpy.apply_along_axis(lambda e : model_(e, model_data, n_nn),
                                  1,
                                  inputs)

model_data, test_data = init_data(sys.argv[1], float(sys.argv[2]))
n_nn                  = int(sys.argv[3])
print(f"testing data accuracy: {accuracy(model_data, test_data, n_nn):.2f}%")
```

Here are sample results for the popular Iris flower dataset available from many sources such as Scikit-learn:

```
% ./k_nn Iris_flower_dataset 80 1
testing data accuracy: 96.67%

% ./k_nn Iris_flower_dataset 80 2
testing data accuracy: 93.33%
```

# Symbols And Minds

## 2020-10-03

Symbols are physical objects that represent mental ideas. Symbol manipulation can correspond to thinking. Therefore, computers can correspond to minds and replace humans at some mental tasks.

# Underfitting And Overfitting

## 2020-10-03

Underfitting is the creation of supervised learning continuous function approximations that are too simple. Overfitting is the creation of supervised learning continuous function approximations that are too complex. Both decrease accuracy.

# NumPy Arrays

## 2020-10-02

NumPy arrays are a fundamental data structure of machine learning with Python. Pandas, SciPy, scikit-learn and many other libraries use NumPy arrays.

# Labeled

## 2020-10-02

Labeled means categorized. Supervised learning methods require labeled inputs.

# Statistics > Data Science

## 2020-09-30

Statistics is a superset of what is often referred to as data science. The field of statistics predates computers.

# Supervised Learning Methods

## 2020-09-30

Supervised learning methods automate the creation of programs that approximate functions of any number of variables. They require many input output pairs.

# The Problem With The Term Intelligence

## 2020-09-28

Intelligence does not have a rigorous definition. Rather than trying to define intelligence, Alan Turing suggested trying to create devices that act as if they have intelligence. Quality can be measured by their ability to fool people.

# Computers Are Symbol Manipulation Machines

## 2020-09-28

Computers are symbol manipulation machines.

# Machine Learning Is Automatic Programming

## 2020-09-28

Machine learning methods automatically create programs. They are useful when creating programs is inconvenient, impractical or even impossible for humans. Many inventions surpass humans in limited ways. Cars move faster than humans. Cranes lift more than humans. Calculators calculate better than humans. Now an invention surpasses humans at programming!