cross validation

In machine learning (ML), cross validation is a method by which data scientists evaluate an ML model's performance on unseen data, i.e. data which the ML model has not seen during training. In cross validation, the available dataset is split into multiple subsets. One ... Read more

data lineage

Data lineage is part of responsible AI. It refers to all procedures related to tracking the origin, transformation, and usage of data throughout its lifecycle in machine learning. This traceability supports a Privacy by Design approach to ML models.

design patterns

Design patterns are reusable solutions applied in the design of software and/or infrastructure systems, aiming to optimize the systems' functional characteristics (operations and application or infrastructure features and services) and non-functional characteristics (e.g. performance, reliability, redundancy, security). Design patterns can be divided into the following major categories: Software design patterns. This category includes DevOps design ... Read more

Dioptra

Dioptra is a software test platform for assessing the trustworthy characteristics of artificial intelligence (AI). Dioptra excels in the area of Adversarial Machine Learning (AML). More details can be found in the official project web site at: https://pages.nist.gov/dioptra/index.html.

DR

1) In machine learning, DR stands for dimensionality reduction, a feature engineering technique in which a large number of features in a dataset is reduced to a smaller number of features. It is important to ensure that the remaining features are meaningful and representative of the dataset and that ... Read more

entanglement

Entanglement in machine learning (ML) pipelines refers to the fact that a change made in one step of the ML pipeline affects other steps. For example, a fundamental change in the data preparation and feature engineering phases of the ML pipeline can have a drastic effect on subsequent ... Read more

ETL

ETL stands for extract, transform, and load. It refers to a data science and machine learning procedure in which data is collected (extracted) from data sources, then transformed, and finally loaded into a target system, such as a data store from which an ML model is trained or served.
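
The three ETL stages can be sketched with the Python standard library. This is a minimal illustration, not a production pipeline: the CSV content, the column names, the `features` table, and the helper functions `extract`, `transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for whatever target system a real pipeline would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; the second row has a missing value.
RAW_CSV = "height_cm,weight_kg\n180,75\n165,\n172,68\n"


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert units and types."""
    cleaned = []
    for row in rows:
        if not all(row.values()):
            continue  # skip rows with an empty field
        height_m = float(row["height_cm"]) / 100.0
        cleaned.append((height_m, float(row["weight_kg"])))
    return cleaned


def load(records, connection):
    """Load: write the transformed records into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS features (height_m REAL, weight_kg REAL)"
    )
    connection.executemany("INSERT INTO features VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
row_count = connection.execute("SELECT COUNT(*) FROM features").fetchone()[0]
```

Keeping each stage in its own function makes it possible to test or replace one stage without touching the others.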

F1 score

F1 score is the harmonic mean of precision and recall. It is calculated by the following formula: F1 score = (2 × Precision × Recall) / (Precision + Recall). For a given average of precision and recall, the more the two metrics deviate from each other, the lower their harmonic mean (i.e. the F1 score).
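
The formula can be checked with a few lines of Python. This is a minimal sketch; the function name `f1_score` and the sample precision and recall values are chosen for illustration only. Both pairs below have the same arithmetic mean (0.8), but the pair whose values deviate from each other yields a lower F1 score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention used here to avoid division by zero
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)  # precision and recall agree
skewed = f1_score(1.0, 0.6)    # same arithmetic mean, larger gap
```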

fitting

Fitting, also called training, is the process by which a machine learning model learns from input data. Besides the ideal good fit (best fit), a model can become overfitted when overfitting occurs or underfitted when underfitting occurs.

GAN

GAN stands for generative adversarial network. It is a type of neural network architecture in which two neural networks compete against each other. In a GAN architecture there are typically one or more generator neural networks and one or more discriminator neural networks. The generator network performs continuous iterations, ... Read more

good fit

A "good fit" or "best fit" or "sweet spot" is when a machine learning (ML) model can predict values for a system with the minimum error, ideally that error being zero. In this case, the ML model is said to have a good fit on the data. The good fit sits between the underfitting and ... Read more

GPU

Graphics Processing Units (GPUs) have traditionally been used in computing systems for image processing (graphics cards). Their usage has since expanded to AI and ML tasks, where they are widely used to accelerate the training and inference of neural networks.

gradient descent

Gradient descent is a method of minimizing the cost function of a machine learning model, for example in linear regression models. In the gradient descent method, the (internal) parameters of the ML model are tuned over several training iterations by taking gradual "steps" down a slope in the function graph, aiming towards a minimum error value. In gradient descent, ... Read more
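
The idea of stepping down the slope can be sketched for a one-feature linear regression. This is a minimal illustration under stated assumptions: mean squared error is used as the cost function, and the function name `gradient_descent`, the learning rate, the iteration count, and the toy data are all chosen for the example rather than taken from any particular library.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = w * x + b by minimizing the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Take one step against the gradient (downhill).
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
```

On this toy data the parameters approach w = 2 and b = 1. A learning rate that is too large makes the steps overshoot and diverge, while one that is too small needs many more iterations.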

ground truth

Ground truth is a term commonly used in statistics and machine learning. It signifies the correct or “true” answer to a specific problem or question. Each ML model makes predictions of values or boolean classifications about a problem. Each time the ML model's prediction or classification is compared to the ground truth, which is what ... Read more

harms modeling

Harms modeling is part of responsible AI and it is a method which can identify and mitigate risks or harms which may be caused by deploying AI solutions in production.

holdout

In machine learning, holdout validation is a data sampling method in which the dataset is split into two: the training dataset and the test dataset. The split ratio varies; a common choice is to train on 70-80% of the dataset and test on the remaining 20-30%, although an equal 50/50 split is also possible. Holdout validation is not recommended in ... Read more
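
A holdout split can be sketched in a few lines of Python. This is a minimal illustration; `holdout_split`, `test_fraction`, and `seed` are hypothetical names, and the samples are shuffled before splitting on the assumption that their order carries no meaning. A `test_fraction` of 0.5 gives an equal split, while smaller values such as 0.2 keep more data for training.

```python
import random


def holdout_split(samples, test_fraction=0.5, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(10), test_fraction=0.5)
train_80, test_20 = holdout_split(range(10), test_fraction=0.2)
```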

hyperparameter

In machine learning, a hyperparameter is a parameter external to the ML model, which controls the learning process. A hyperparameter is not related to the internal workings of the ML model but rather indirectly affects the model's internal parameters via the ML model training (fitting) process. The usage of hyperparameter types varies depending on the ... Read more
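
The difference between a hyperparameter and an internal parameter can be sketched as follows. This is a minimal illustration with hypothetical names: `learning_rate` and `epochs` are hyperparameters set from outside before training, while `weight` is the internal parameter that the training loop learns. The toy data and the mean squared error cost are assumptions made for the example.

```python
def train(data, learning_rate, epochs):
    """Learn a single weight for y = weight * x; return the learned weight."""
    weight = 0.0  # internal parameter, updated by training
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # generated from y = 3x

# Same data and model, different hyperparameter values.
slow = train(data, learning_rate=0.001, epochs=20)
fast = train(data, learning_rate=0.05, epochs=20)
```

With the larger learning rate the weight gets close to 3 within 20 epochs, while the smaller one is still far from it. The hyperparameter never appears in the trained model, yet it determines what the internal parameter ends up being.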

incident database

In machine learning systems, an incident database is a database which collects and stores information about incidents where AI systems have caused or contributed to negative outcomes or harms, such as accidents, errors, biases, discrimination, or violations.

k-fold cross validation

In machine learning, k-fold cross validation is a type of cross validation in which the dataset is split into k subsets (folds) and the validation process is repeated k times. Each time, k-1 subsets are used for training the ML model, while the remaining subset is used for testing/validation. The final validation generalization capability/performance of the ... Read more
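
The way the folds rotate can be sketched with plain index lists. This is a minimal illustration; `k_fold_indices` is a hypothetical helper, it does not shuffle the data, and it only produces the index splits, leaving model training and scoring to the caller.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Folds differ in size by at most one when n_samples % k != 0.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                 # one fold for testing
        train = indices[:start] + indices[start + size:]   # the other k-1 folds
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
```

Each sample index appears in exactly one test fold, so every sample is used for testing once and for training k-1 times.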

Keras

Keras is one of the most recognized deep learning APIs. Keras is written in the Python programming language and can run on top of JAX, TensorFlow, or PyTorch. More details about Keras can be found on its official website at: https://keras.io/about/.