5. Common Terminology in AI: Decoding the Jargon

May 28, 2023

AI Glossary

Each term is defined clearly with examples and applications provided to give you a better understanding of how they work in practice.

Whether you’re just starting out or looking to expand your knowledge in the field of AI, this glossary will be an invaluable resource to help you navigate through the complex terminology and concepts associated with artificial intelligence.

Activation Function – In neural networks, a function that defines the output of a neuron in terms of its inputs, introducing non-linearity into the model.
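Example: a minimal sketch of the Activation Function entry, using ReLU and sigmoid as two common choices. The helper name `neuron` and the sample weights are illustrative assumptions, not part of any particular library.

```python
import math

def relu(x: float) -> float:
    # ReLU passes positive values through and clips negatives to zero.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Sigmoid squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus a bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical inputs and weights, chosen only for illustration.
out_relu = neuron([1.0, 2.0], [0.5, -1.0], 0.0, relu)
out_sigmoid = neuron([1.0, 2.0], [0.5, -1.0], 0.0, sigmoid)
```

Without the activation, stacking such neurons would only ever produce a linear function of the inputs.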
Adversarial Attack – A technique used to deceive machine learning models by providing input data that are designed to cause the model to make a mistake.
Algorithm – A set of rules or instructions that are followed to perform a task or solve a problem.
Anomaly Detection – The identification of rare items, events, or observations which raise suspicions by differing significantly from the majority of the data.
Artificial Intelligence (AI) – The simulation of human intelligence processes by machines, especially computer systems.
Artificial Neural Network (ANN) – A computational model based on the structure and functions of biological neural networks, used for approximating functions that can depend on a large number of inputs.
Attention Mechanism – A component of neural networks that allows them to focus on specific parts of the input data for making predictions.
Autoencoder – A type of artificial neural network used to learn efficient data codings in an unsupervised manner.
Backpropagation – The primary algorithm for computing the gradients that gradient descent uses to train neural networks, applied to minimize the error in the network’s predictions.
Bag of Words (BoW) – A representation of text that describes the occurrence of words within a document, disregarding grammar and word order.
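Example: a rough sketch of the Bag of Words entry, assuming a deliberately naive tokenizer (lowercase, split on whitespace). The function name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Count how often each token occurs; word order is discarded.
    return Counter(text.lower().split())

a = bag_of_words("the cat chased the dog")
b = bag_of_words("the dog chased the cat")

# The two sentences mean different things but share the same bag of words.
same_bag = (a == b)
```

Here `same_bag` is `True`, which shows what the representation loses by disregarding word order.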
Batch Normalization – A technique to improve the training of deep neural networks by normalizing the inputs of each layer over a mini-batch so that they have a mean of zero and a standard deviation of one.
Bayesian Network – A statistical model that represents a set of variables and their conditional dependencies via a directed acyclic graph.
Beam Search – An algorithm used in many NLP and speech recognition models for efficiently searching through a large space of potential solutions.
Bias – The simplifying assumptions made by the model to make the target function easier to approximate.
Bias-Variance Tradeoff – The balance between the error due to bias (underfitting) and the error due to variance (overfitting).
Big Data – Extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations.
Capsule Networks (CapsNet) – A type of artificial neural network designed to improve the capabilities of neural networks in recognizing patterns in complex data, such as images.
Chatbot – A software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.
Classification – A type of supervised learning in which the goal is to predict the categorical class labels of new instances, based on past observations.
Clustering – The task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
Confusion Matrix – A table used to evaluate the performance of a classification algorithm, showing the true and false positives and negatives.
Convolutional Neural Network (CNN) – A class of deep neural networks often used in image processing, where each neuron in a convolutional layer responds to a small, overlapping region of the input and successive layers build up increasingly abstract features.
Cross-Validation – A technique for assessing how a statistical analysis generalizes to an independent dataset, commonly used in machine learning to validate models.
Data Augmentation – The technique of increasing the amount of training data by applying transformations such as rotations, scaling, flipping, etc., to create new training examples.
Data Mining – The process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Data Wrangling – The process of cleaning, structuring, and enriching raw data into a desired format for better decision-making in less time. Often used in the context of preparing data for machine learning.
Dataset – A collection of data, usually presented in tabular form, used to train or evaluate machine learning models.
Deep Learning – A subset of machine learning that is based on neural networks containing more than one hidden layer.
Denoising Autoencoder – A type of autoencoder designed to learn efficient data representations by reconstructing the input from a corrupted version, thereby denoising the input.
Decision Tree – A decision support tool that uses a tree-like model of decisions and their possible consequences.
Dimensionality Reduction – The process of reducing the number of random variables under consideration, by obtaining a set of principal variables.
Discriminative Models – Models used in classification that focus on determining the boundaries between different classes in the dataset.
Embedding Layer – A layer in neural networks that learns to map discrete inputs, such as word indices, to dense vectors, often used to represent high-dimensional text data in a lower-dimensional space.
Ensemble Methods – Machine learning techniques that combine several base models in order to produce one optimal predictive model.
Evolutionary Algorithms – A family of population-based metaheuristic optimization algorithms, a subset of evolutionary computation, used for finding approximate solutions to optimization and search problems.
Exploratory Data Analysis (EDA) – An approach to analyzing datasets to summarize their main characteristics, often with visual methods.
F1 Score – The harmonic mean of precision and recall, used as a metric for the accuracy of a classification system.
Feature – An individual measurable property or characteristic of a phenomenon being observed.
Feature Engineering – The process of using domain knowledge of the data to create features that make machine learning algorithms work.
Feature Scaling – A method used to standardize the range of independent variables or features of data.
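Example: one common form of feature scaling is min-max scaling, which maps values into the range [0, 1]. This is a sketch of that one variant only; the function name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    # Map the smallest value to 0.0 and the largest to 1.0.
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30])  # [0.0, 0.5, 1.0]
```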
Federated Learning – A machine learning approach where a model is trained across multiple decentralized devices or servers holding local data samples, without exchanging the data itself.
GANs (Generative Adversarial Networks) – A class of machine learning systems where two neural networks, a generator and a discriminator, contest with each other to create new, synthetic instances of data that can pass for real data.
Genetic Algorithm – A search heuristic in computing used to find approximate solutions to optimization and search problems by mimicking the process of natural selection.
Generative Models – A type of model in machine learning that is capable of generating new data that is similar in statistics to the training set.
Graph Neural Networks (GNN) – Neural networks that are specialized in processing data structured as graphs, making use of the connections between nodes.
Grid Search – A traditional way to perform hyperparameter optimization, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm.
Hyperparameter – A parameter whose value is set before the learning process begins, and determines how a machine learning model is trained.
Hyperparameter Optimization – The process of finding the optimal set of hyperparameters (configuration) that produces the most accurate predictions for a given machine learning algorithm.
Imputation – The process of replacing missing data with substituted values.
K-Nearest Neighbors (KNN) – A simple algorithm that stores all available cases and classifies new cases by a majority vote of their k nearest neighbours.
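Example: a small sketch of the K-Nearest Neighbors entry, assuming Euclidean distance and a toy two-class dataset invented for illustration. The function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train is a list of (point, label) pairs; every stored case is kept.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote among the labels of the k closest points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters labelled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (5.0, 5.1))
```

An odd k is a common choice for two-class problems because it avoids tied votes.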
Kernel Methods – A class of algorithms for pattern analysis, whose best-known element is the support vector machine (SVM).
Latent Dirichlet Allocation (LDA) – A generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
Linear Regression – A linear approach to modelling the relationship between a dependent variable and one or more independent variables.
Logistic Regression – A regression analysis that is used to predict the probability of a categorical dependent variable.
Long Short-Term Memory (LSTM) – A special kind of RNN, capable of learning long-term dependencies, often used in sequence prediction problems.
Loss Function – A method of evaluating how well a specific algorithm models the given data.
Machine Learning (ML) – The scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions.
Markov Decision Process (MDP) – A mathematical framework for modelling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker.
Multilabel Classification – A classification problem where multiple target labels can be assigned to each observation.
Multitask Learning – A type of machine learning where one model is trained to solve multiple tasks simultaneously, often improving the model’s performance on each task.
Naive Bayes – A classification technique based on Bayes’ Theorem with an assumption of independence among predictors.
Natural Language Processing (NLP) – A field of AI that gives machines the ability to read, understand, and derive meaning from human languages.
Neuron – A single node in a neural network, typically taking in multiple input values and producing one output value.
Normal Distribution (Gaussian distribution) – A continuous probability distribution for a real-valued random variable, symmetric and bell-shaped, and fully described by its mean and standard deviation.
Object Detection – A computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos.
One-Hot Encoding – A process by which categorical variables are converted into binary vectors, with a single 1 marking the category and 0s elsewhere, so that they can be provided to ML algorithms.
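Example: a minimal sketch of one-hot encoding, assuming the common scheme of one binary column per category. The function name `one_hot` and the colour values are illustrative.

```python
def one_hot(values):
    # Fix a column order by sorting the distinct categories.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per row
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot(["red", "green", "blue", "green"])
# categories is ["blue", "green", "red"], so "red" becomes [0, 0, 1].
```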
Outliers – Data points that are significantly different from other observations and may arise due to variability in the data or errors.
Overfitting – A modelling error that occurs when a function is too closely fit to a limited set of data points.
Perceptron – A type of artificial neuron which takes several binary inputs and produces a single binary output.
Precision and Recall – Precision is the number of true positives divided by the number of true positives plus the number of false positives, while recall is the number of true positives divided by the number of true positives plus the number of false negatives.
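Example: the precision and recall formulas above, together with the F1 Score and the four Confusion Matrix counts, can be sketched as follows. The labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Count true positives, false positives, false negatives, true negatives.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With these labels there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all come out at 0.75.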
Principal Component Analysis (PCA) – A technique used to emphasize variation and bring out strong patterns in a dataset, often used to make data easy to explore and visualize.
Q-Learning – A reinforcement learning algorithm which seeks to learn the best action to take, given a certain state.
Random Forest – A meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Recurrent Neural Network (RNN) – A class of artificial neural networks where connections between nodes form a directed graph along a sequence, allowing them to maintain a state across inputs.
Reinforcement Learning – A type of machine learning where an agent learns to behave in an environment by performing actions and observing the rewards of those actions.
Regularization – A technique used to prevent overfitting in a machine learning model by adding additional information or constraints to the model.
Relevance Vector Machine (RVM) – A machine learning model similar to support vector machines, but uses a different approach that provides probabilistic classification.
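Root Mean Square Error (RMSE) – A measure of the differences between the values predicted by a model and the actual values, computed as the square root of the mean of the squared differences.
Scaling – The process of transforming the values in a dataset so that they fall within a particular range or share a common scale.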
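Example: a short sketch of RMSE as the square root of the mean squared difference between predicted and actual values. The function name `rmse` and the sample numbers are illustrative.

```python
import math

def rmse(actual, predicted):
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Mean of the squared errors, then the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors are 1, 0 and -2, so the result is sqrt((1 + 0 + 4) / 3).
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Because the errors are squared before averaging, large misses weigh more heavily than small ones.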
Sequential Pattern Mining – A data mining technique used to identify patterns in a sequence of events, such as user behaviour in weblogs or sequences of genes in bioinformatics.
Spatial Transformer Networks (STN) – A type of neural network module that allows the spatial manipulation of data within the network, making the network invariant to scale, rotation, and other transformations.
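Stochastic Gradient Descent (SGD) – A version of the gradient descent optimization algorithm used to minimize an objective function that is written as a sum of differentiable functions, updating the parameters using the gradient from a single randomly chosen example or a small batch rather than the full dataset.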
Support Vector Machine (SVM) – A classification algorithm that seeks to maximize the margin between two classes by choosing the optimal hyperplane that maximally separates the data points from each class.
T-SNE (t-Distributed Stochastic Neighbor Embedding) – A dimensionality reduction technique particularly well-suited for the visualization of high-dimensional datasets.
TensorFlow – An open-source software library developed by Google for dataflow programming, particularly well-suited for deep learning applications.
Time Series Analysis – A set of statistical techniques for analyzing data points ordered in time, for example to identify trends and seasonal patterns or to forecast future values.
Transfer Learning – A research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
Underfitting – A modelling error that occurs when a function is too simple to accurately represent the data.
Unsupervised Learning – A type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses.
Variational Autoencoders (VAE) – A type of autoencoder that aims at learning a probabilistic mapping between the data space and the latent space, which is useful for generating new data that’s similar to the training data.
Word Embeddings – A representation of text where words that have a similar meaning have a similar representation.
Word2Vec – A group of related models that are used to produce word embeddings.
XGBoost – An open-source software library which provides a gradient-boosting framework for languages such as C++, Python, R, and Java.
Zero-Shot Learning – A type of machine learning where the model is able to correctly make predictions on data that it has not seen during training, i.e., unseen classes.
Z-Score – The number of standard deviations from the mean a data point is. It’s used to find outliers and for normalization.
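Example: a sketch of computing z-scores and flagging outliers, assuming the population standard deviation. The cutoff of 1.75 is an arbitrary value for this toy data, not a recommended standard.

```python
import statistics

def z_scores(values):
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    # Distance from the mean, measured in standard deviations.
    return [(v - mean) / std for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
scores = z_scores(data)

# Flag points whose z-score magnitude exceeds the illustrative cutoff.
outliers = [v for v, z in zip(data, scores) if abs(z) > 1.75]
```

For this data, only the value 9 lies more than 1.75 standard deviations from the mean.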
k-Fold Cross-Validation – A cross-validation technique where the original sample is randomly partitioned into k equal-sized subsamples, and a single subsample is retained as the validation data for testing the model, while the remaining k-1 subsamples are used as training data. The process is repeated k times, so that each subsample serves as the validation data exactly once.
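Example: a rough sketch of how k-fold splits could be generated from sample indices. The function name `k_fold_indices` and the fixed seed are assumptions for illustration. The folds are exactly equal in size only when the number of samples is divisible by k.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    # Shuffle the indices once, then deal them into k folds.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        # Fold i is held out for validation; the other folds form the training set.
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(10, 5))
for training, validation in splits:
    # A model would be fit on `training` and scored on `validation` here.
    pass
```

Averaging the k validation scores gives a single performance estimate in which every sample has been used for validation exactly once.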