skillful

The term skillful is used to describe an AI model which is useful for its intended task. There are degrees of skill; some models are more useful than others.

SOC

SOC (also written SoC) can stand for either of the following. System on a Chip: an integrated circuit design and manufacturing method in which all components of an autonomous electronic system are integrated into a single IC. Security Operations Center: the overall platform of services and tools which offers an end-to-end solution for proactive cybersecurity protection and management. A SoC ...

soft-margin classification

Soft-margin classification is an approach to classification with SVMs which keeps the distance between the margins as large as possible while minimizing the number of examples that end up inside the margins.
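The trade-off described above can be sketched numerically. The snippet below assumes the common hinge-loss formulation of the soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − y·(w·x + b)), which penalizes how far examples intrude into the margin rather than literally counting them. The function name, the parameter C, and the toy data are illustrative assumptions, not part of the original entry.

```python
def soft_margin_objective(w, b, points, labels, C=1.0):
    """Soft-margin SVM objective (hinge-loss form, assumed here).

    A small ||w|| means a wide margin; the hinge term grows for every
    example that falls inside the margin or on the wrong side of it.
    """
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge_total += max(0.0, 1.0 - y * score)
    return margin_term + C * hinge_total


# Toy data: the third point lies inside the margin (score 0.5 < 1).
points = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, 1]
print(soft_margin_objective((1.0, 0.0), 0.0, points, labels, C=1.0))  # 0.5 + 0.5 = 1.0
```

A larger C makes margin violations more expensive, so the fitted margin tends to be narrower; a smaller C tolerates more violations in exchange for a wider margin.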

specificity

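The specificity metric in classification problems is defined as Specificity = (True Negative) / (True Negative + False Positive), i.e. the fraction of actual negatives that the model correctly identifies (the true negative rate). Specificity is a suitable metric when false positives must be minimized. It is the counterpart of the recall metric, which measures the fraction of actual positives that are correctly identified.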
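As a minimal sketch of the formula, the function below counts true negatives and false positives from two label lists. The function name, the `negative_label` parameter, and the sample labels are assumptions made for illustration.

```python
def specificity(y_true, y_pred, negative_label=0):
    """Return TN / (TN + FP) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == negative_label and p == negative_label)
    fp = sum(1 for t, p in pairs if t == negative_label and p != negative_label)
    if tn + fp == 0:
        raise ValueError("y_true contains no negative examples")
    return tn / (tn + fp)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(specificity(y_true, y_pred))  # 3 TN and 1 FP -> 0.75
```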

spectrogram

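In audio data processing and analysis, a spectrogram is a type of plot that depicts the time, frequency, and amplitude of an audio signal, typically with time on the horizontal axis, frequency on the vertical axis, and amplitude shown as color intensity.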

Standard deviation

A standard deviation (or σ) is a statistical metric (measure) of how dispersed the data is in relation to the mean. It is frequently used in machine learning data preparation and in machine learning model training.
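A short sketch of the calculation, assuming the population form of the standard deviation (dividing by n rather than n − 1). The helper name and the sample data are illustrative; `statistics.pstdev` from the Python standard library computes the same quantity.

```python
import math
import statistics


def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))     # mean is 5, variance is 4 -> 2.0
print(statistics.pstdev(data))  # same quantity from the standard library
```

When the data is a sample rather than the whole population, `statistics.stdev` (which divides by n − 1) is usually the more appropriate choice.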

standardization

In machine learning, standardization is a feature engineering technique by which the dataset features are re-scaled to achieve zero-mean value (μ=0) and unit standard deviation value (σ=1). Each x value in the dataset gets a corresponding x' standardized value, which is calculated as follows: x' = (x − μ) / σ, where μ is the x variable mean and σ is ...
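A minimal sketch of standardization for a single feature, assuming the usual z-score form x' = (x − μ) / σ with σ taken as the population standard deviation. The function name and the sample values are illustrative.

```python
import statistics


def standardize(values):
    """Re-scale values to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In practice μ and σ are estimated on the training set only and then reused to transform validation and test data, so that no information leaks from the held-out data.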

Stationarity

Stationarity in machine learning is a property of a time series, assumed by many forecasting models, by which the statistical attributes of a variable, such as mean, variance and covariance, remain constant instead of varying over time.
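One rough way to eyeball this property is to compare the mean and variance of consecutive windows of a series. The sketch below is only an informal check, not a statistical test; the function name, window size, and the two toy series are assumptions for illustration.

```python
import statistics


def window_stats(series, window):
    """Return (mean, population variance) for consecutive non-overlapping windows."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats


flat = [1, 2, 1, 2, 1, 2, 1, 2]   # no trend: window statistics stay the same
trend = [1, 2, 3, 4, 5, 6, 7, 8]  # upward trend: the window mean drifts
print(window_stats(flat, 4))   # [(1.5, 0.25), (1.5, 0.25)]
print(window_stats(trend, 4))  # [(2.5, 1.25), (6.5, 1.25)]
```

The drifting mean of the second series is the kind of behavior that violates stationarity.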

stemming

Stemming in machine learning and natural language processing is the process of removing the affix of a word in order to retrieve the word stem. It is a common preprocessing step when training an ML model on text written in a human natural language, because it maps different forms of the same word to a single token.
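To illustrate the idea, here is a deliberately naive suffix-stripping sketch. It is not the Porter stemmer or any library's algorithm; the suffix list, the minimum stem length, and the function name are assumptions chosen for the example.

```python
# Hypothetical suffix list, checked in order; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, provided the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ["jumping", "jumped", "jumps", "quickly", "sing"]])
# ['jump', 'jump', 'jump', 'quick', 'sing']
```

The minimum stem length keeps short words such as "sing" from being reduced to a meaningless fragment.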

Stochastic

Stochastic in data science and machine learning refers to a property by which a randomly determined process cannot perfectly estimate individual events or data points but can demonstrate a general pattern common to the entire set of data. In data science and physics, the term stochastic refers to events which occur without a formally set ...

stop word

A stop word in machine learning text processing refers to any word which carries little or no content on its own, such as simple and common words (and, to, so, by, etc.).

Stop word

A stop word is a word in a text document which is very common and is therefore typically removed when the text is processed. As a result, stop words are usually excluded from the training set of machine learning models in natural language processing scenarios.
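Stop word removal can be sketched as a simple filter over tokens. The stop word set below is a small hand-picked assumption for illustration (real lists are much longer), and the whitespace tokenization is equally simplified.

```python
# Small illustrative stop word set; an assumption, not a standard list.
STOP_WORDS = {"and", "to", "so", "by", "the", "a", "is", "of"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


print(remove_stop_words("The model is trained by the team and deployed to production"))
# ['model', 'trained', 'team', 'deployed', 'production']
```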

stratified cross validation

Stratified cross validation is a data validation technique used when splitting the ML dataset into k subsets, of which k-1 subsets are used as training subsets (folds) and one (1) is used as the test subset (fold). This process is repeated k times. Stratified cross validation uses stratified sampling in the dataset, in order to ...

stratified k-fold cross-validation

Stratified k-fold cross-validation is a k-fold cross-validation method in which each fold contains a representative sample of the data, i.e. approximately the same class proportions as the full dataset. It is especially useful for datasets which exhibit class imbalance.
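The splitting logic can be sketched in plain Python by dealing the indices of each class across the folds in round-robin order. This is a simplified illustration without shuffling; the function name and the toy labels are assumptions, and libraries such as scikit-learn provide a more complete implementation.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's indices across the k folds, one at a time.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Imbalanced toy dataset: 8 negatives and 4 positives (2:1 ratio).
labels = ["neg"] * 8 + ["pos"] * 4
for train, test in stratified_k_fold(labels, k=4):
    print(test, [labels[i] for i in test])
```

With these labels every test fold holds two negatives and one positive, matching the 2:1 ratio of the full dataset.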

stride

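Stride in Convolutional Neural Networks (CNN) is the step size, i.e. the number of positions by which a filter moves between successive applications as it scans an image in a convolution. A larger stride produces a smaller output feature map.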
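The effect of stride is easiest to see in one dimension. The sketch below slides a kernel over a list without padding; the function name, the signal, and the kernel are illustrative assumptions.

```python
def conv1d(signal, kernel, stride=1):
    """1-D sliding-window product sum (no padding), moving `stride` positions per step."""
    output = []
    for start in range(0, len(signal) - len(kernel) + 1, stride):
        window = signal[start:start + len(kernel)]
        output.append(sum(s * k for s, k in zip(window, kernel)))
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
print(conv1d(signal, kernel, stride=1))  # [6, 9, 12, 15, 18]
print(conv1d(signal, kernel, stride=2))  # [6, 12, 18]
```

Without padding, the output length is floor((n − k) / stride) + 1 for a signal of length n and a kernel of length k, which gives 5 and 3 for the two calls above.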

structured data

Data found in data sources (virtual machines, virtual containers, storage accounts, databases, data warehouses, data lakes, data marts and data hubs) can be classified into three (3) major categories with regard to the level of structure they present. Unstructured data, i.e. data which is in a format that makes it difficult to search, filter, or ...

supervised learning

Machine learning models and algorithms can be classified into three (3) major categories: Supervised learning, i.e. a type of machine learning in which known label values are provided as input so that a model can estimate these values in future datasets. Examples of supervised learning algorithms are regression and classification algorithms, such as linear regression ...

SVM

SVM stands for Support Vector Machine. SVM is a well-known family of supervised learning algorithms which are used in regression and classification machine learning problems and which, in the classification case, separate data values using a hyperplane. Soft-margin SVM algorithms can tolerate a limited number of outliers in the model training data.

tanh function

The tanh function, also known as the hyperbolic tangent function, is an activation function in artificial neural networks whose output values lie strictly between −1 and 1. The following screenshot provides a graph of the function f(x)=tanh(x), as output from the GeoGebra free online calculator.
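A small numerical sketch of the function, using the standard definition tanh(x) = (e^x − e^−x) / (e^x + e^−x) and comparing it with `math.tanh` from the Python standard library. The helper name and the sample inputs are illustrative.

```python
import math


def tanh_from_definition(x):
    """Hyperbolic tangent computed directly from exponentials."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    # The two columns should agree; outputs approach -1 and 1 at the extremes.
    print(x, tanh_from_definition(x), math.tanh(x))
```

The direct formula overflows for very large |x|, so `math.tanh` is the safer choice in real code.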

target function

In machine learning, the target function is the mathematical relationship between the input variables and the output variables of a problem. It is usually unknown, and a machine learning model is trained to approximate it as closely as possible in order to produce the desired outcome.