
Seasonality

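Seasonality is the property of time series data by which values follow a pattern that recurs at regular intervals. A forecasting model that accounts for seasonality treats the series as following such a recurring pattern.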

Self-learning

Self-learning is an approach to semi-supervised learning in which a model is trained on the small labeled portion of a dataset so that it can generate pseudo-labels for the unlabeled portion. The pseudo-labeled examples can then be used to train an improved ML model.
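As a rough illustration of the pseudo-labeling loop described above, the following sketch uses a one-dimensional nearest-centroid classifier. The classifier choice and the names fit_centroids, predict and self_train are assumptions made for this example only; real pipelines usually keep only high-confidence pseudo-labels and repeat the loop several times.

```python
def fit_centroids(xs, ys):
    """Fit a nearest-centroid model: one mean value per class label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def self_train(labeled_x, labeled_y, unlabeled_x):
    """One round of self-learning (hypothetical helper)."""
    # 1. Train on the small labeled portion.
    model = fit_centroids(labeled_x, labeled_y)
    # 2. Generate pseudo-labels for the unlabeled portion.
    pseudo_y = [predict(model, x) for x in unlabeled_x]
    # 3. Retrain on labeled plus pseudo-labeled examples.
    return fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)


model = self_train([0.0, 10.0], ["low", "high"], [1.0, 2.0, 9.0, 8.0])
print(model)
```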

sigmoid kernel

The sigmoid kernel is a kernel trick method which uses a hyperbolic tangent function (tanh) to create an equivalent of a perceptron neural network.
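The sigmoid kernel is commonly written as K(x, y) = tanh(γ·(x·y) + c). A minimal sketch of that formula follows; the function name sigmoid_kernel and the default values of gamma and coef0 are assumptions chosen for illustration, not values taken from this glossary.

```python
import math


def sigmoid_kernel(x, y, gamma=0.01, coef0=0.0):
    """Sigmoid (hyperbolic tangent) kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))  # inner product of the two vectors
    return math.tanh(gamma * dot + coef0)


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]
print(sigmoid_kernel(x, y))  # tanh(0.01 * 4.5)
```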

silhouette analysis

Silhouette analysis is a method of calculating how well a particular data example fits within a cluster as compared to its neighboring clusters.
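The usual silhouette coefficient for one example is s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below computes it with Euclidean distance; the function name silhouette and the toy data are assumptions for illustration, and at least two clusters are assumed to exist.

```python
import math


def silhouette(i, points, labels):
    """Silhouette coefficient of points[i]; assumes at least two clusters."""
    distances = {}  # cluster label -> distances from points[i] to its members
    for j, (point, label) in enumerate(zip(points, labels)):
        if j != i:
            distances.setdefault(label, []).append(math.dist(points[i], point))
    own = labels[i]
    if own not in distances:
        return 0.0  # convention for a cluster that contains only this point
    a = sum(distances[own]) / len(distances[own])
    b = min(sum(d) / len(d) for label, d in distances.items() if label != own)
    denom = max(a, b)
    return (b - a) / denom if denom else 0.0


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
print(round(silhouette(0, points, labels), 3))  # 0.905
```

Values close to 1 indicate that the example sits well inside its cluster, values near 0 that it lies between clusters, and negative values that it may fit a neighboring cluster better.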

skillful

The term skillful is used to describe an AI model which is useful for its intended task. There are degrees of skill; some models are more useful than others.

SOC

SoC stands for one of two things: (1) System on a Chip, an integrated circuit design and manufacturing method in which all components of an autonomous electronic system are integrated into a single IC; (2) Security Operations Center, the overall platform comprising services and tools which offer an end-to-end solution for proactive cybersecurity protection and management.

soft-margin classification

Soft-margin classification is an approach to classification with SVMs which keeps the distance between the margins as large as possible while minimizing the number of examples that end up inside the margins.
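The trade-off described above is usually expressed as the standard soft-margin objective, ½‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). The sketch below only evaluates that objective for given parameters; it is not a solver, and the function name soft_margin_objective, the parameter C and the toy data are assumptions for illustration.

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Evaluate 0.5*||w||^2 + C * sum of hinge losses (labels must be +1/-1)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)  # smaller norm = wider margin
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # Non-zero only for examples inside the margin or misclassified.
        hinge_total += max(0.0, 1.0 - yi * score)
    return margin_term + C * hinge_total


X = [[2.0], [-2.0], [0.5]]
y = [1, -1, 1]
print(soft_margin_objective([1.0], 0.0, X, y))  # 1.0
```

A larger C penalizes margin violations more heavily, while a smaller C favors a wider margin.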

specificity

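The specificity metric in classification problems is defined by the following formula.

Specificity = (True Negative) / (True Negative + False Positive)

Specificity is the metric to watch when we need to minimize false positives. It is the counterpart of the recall metric: recall measures the fraction of actual positives that are correctly identified, whereas specificity measures the fraction of actual negatives that are correctly identified.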
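A short sketch of the specificity formula applied to lists of true and predicted labels follows. The function name specificity and the choice of 0 as the negative label are assumptions for illustration.

```python
def specificity(y_true, y_pred, negative_label=0):
    """Specificity = TN / (TN + FP), computed over the actual negatives."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == negative_label and p == negative_label)
    fp = sum(1 for t, p in pairs if t == negative_label and p != negative_label)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without actual negatives")
    return tn / (tn + fp)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(specificity(y_true, y_pred))  # 0.75 (3 true negatives, 1 false positive)
```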

spectrogram

In audio data processing and analysis, a spectrogram is a type of plot which depicts the time, frequency, and amplitude of an audio signal.

Standard deviation

A standard deviation (or σ) is a statistical metric (measure) of how dispersed the data is in relation to the mean. It is frequently used in machine learning data preparation and in machine learning model training.
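As a worked example, the population standard deviation is the square root of the mean squared deviation from the mean. The function name population_std and the sample values below are assumptions for illustration; Python's statistics.pstdev computes the same quantity.

```python
import math


def population_std(values):
    """Population standard deviation: sqrt(mean((x - mean)^2))."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))  # 2.0 (mean is 5, variance is 4)
```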

standardization

In machine learning, standardization is a feature engineering technique by which the dataset features are re-scaled to achieve zero-mean value (μ=0) and unit standard deviation value (σ=1). Each x value in the dataset gets a corresponding x' standardized value, which is calculated as follows.

x' = (x - μ) / σ

where μ is the mean of the x variable and σ is its standard deviation.
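A minimal sketch of standardizing a single feature with the usual z-score transform, (x - μ) / σ, follows. The function name standardize and the sample values are assumptions for illustration, and the population standard deviation is used.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]


feature = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = standardize(feature)
print(scaled[0])  # -1.5, since (2 - 5) / 2
```

In practice μ and σ are computed on the training set only and then reused to transform validation and test data.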

Stationarity

Stationarity in machine learning is a property of a time series by which the statistical attributes of a variable, such as mean, variance and covariance, remain constant instead of varying over time. Many forecasting models assume that the series they are trained on is stationary.

stemming

Stemming in machine learning and natural language processing is the process of removing the affixes of a word in order to retrieve the word stem. It is a common preprocessing step when training an ML model on text written in a human natural language, because it maps different forms of a word to the same stem.
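To make the idea concrete, here is a deliberately naive suffix-stripping stemmer. The suffix list, the minimum stem length and the name naive_stem are assumptions for illustration; production stemmers such as the Porter stemmer apply far more elaborate rules.

```python
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # checked in this order


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["jumping", "jumped", "jumps", "quickly", "sing"]:
    print(w, "->", naive_stem(w))
```

The three forms of "jump" collapse to the same stem, while "sing" is left alone because stripping "ing" would leave too short a stem.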

Stochastic

Stochastic in data science and machine learning describes a randomly determined process whose individual events or data points cannot be predicted exactly, but which shows a general pattern common to the entire set of data. In data science and physics, the term stochastic refers to events which occur without a formally set ...

stop word

A stop word in machine learning text processing is any word which carries little content of its own, such as simple and common words (and, to, so, by, etc.).

Stop word

A stop word is a word in a text document which is very common and is therefore typically removed when the text is processed. As a result, stop words are usually excluded from the training set of machine learning models in natural language processing scenarios.
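A small sketch of stop word removal follows. The stop word set here is a tiny hand-picked assumption for illustration (NLP libraries ship much longer, language-specific lists), and remove_stop_words is a hypothetical helper name.

```python
STOP_WORDS = {"and", "to", "so", "by", "the", "a", "of", "is"}  # illustrative only


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop tokens found in STOP_WORDS."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


print(remove_stop_words("The cat went to the park and slept"))
# ['cat', 'went', 'park', 'slept']
```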

stratified cross validation

Stratified cross validation is a data validation technique used when splitting the ML dataset into k subsets, of which k-1 subsets are used as training subsets (folds) and one is used as the test subset (fold). This process is repeated k times. Stratified cross validation uses stratified sampling in the dataset, in order to ...

stratified k-fold cross-validation

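Stratified k-fold cross-validation is a k-fold cross-validation method in which each fold contains a representative sample of the data, that is, approximately the same class proportions as the full dataset. It is particularly useful for datasets which exhibit class imbalance.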
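One simple way to build such folds is to group example indices by class and deal each group out to the k folds in round-robin order. The sketch below does exactly that; the name stratified_k_fold is a hypothetical helper, and shuffling within each class, which real implementations usually offer, is omitted to keep the output deterministic.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Return k lists of example indices with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal this class's examples out to the folds one at a time.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


labels = ["neg"] * 8 + ["pos"] * 2  # imbalanced: 80% negative, 20% positive
for test_fold in stratified_k_fold(labels, k=2):
    print(sorted(test_fold), [labels[i] for i in sorted(test_fold)].count("pos"))
```

Each fold serves once as the test fold while the remaining k-1 folds form the training set; here both folds keep the 4:1 class ratio of the full dataset.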

stride

Stride in Convolutional Neural Networks (CNNs) is the step size by which a filter moves across an image between successive positions as it scans the image during a convolution. A larger stride produces a smaller output.
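The effect of stride is easiest to see in one dimension. The sketch below slides a filter over a signal without padding, so the output length is floor((n - k) / stride) + 1 for a signal of length n and a filter of length k. The function name conv1d and the sample values are assumptions for illustration, and, as is usual in CNNs, the filter is not flipped.

```python
def conv1d(signal, kernel, stride=1):
    """Slide kernel over signal (no padding), moving `stride` steps each time."""
    n, k = len(signal), len(kernel)
    output = []
    for start in range(0, n - k + 1, stride):
        window = signal[start:start + k]
        output.append(sum(s * w for s, w in zip(window, kernel)))
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(conv1d(signal, kernel, stride=1))  # 5 outputs
print(conv1d(signal, kernel, stride=2))  # 3 outputs
```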

structured data

Data found in data sources (virtual machines, virtual containers, storage accounts, databases, data warehouses, data lakes, data marts and data hubs) can be classified into three (3) major categories with regard to the level of structure they present. Unstructured data, i.e. data which is in a format that makes it difficult to search, filter, or ...