Data Normalization Techniques in Machine Learning with Examples

Nikhil Joshi
7 min read · May 6, 2024


If you are reading this article, I suppose you are new to the field of ML or just brushing up your knowledge. This article will help you understand different data normalization techniques and their uses. In the realm of machine learning, data normalization stands as a cornerstone technique, serving to enhance the efficiency and accuracy of predictive models. It’s akin to preparing ingredients before cooking, ensuring that each component contributes harmoniously to the final dish.

First, what is Data Normalization?

Normalization transforms data columns into a uniform scale, aligning their ranges and distributions. This process is imperative when dealing with disparate features, as it promotes fair treatment and equitable influence across all aspects of the dataset. In this article, we’ll delve into the various normalization techniques employed in machine learning, elucidating their significance through illustrative examples. By understanding these techniques, practitioners can harness the full potential of their data and unlock insights that drive transformative outcomes.

Why Is Data Normalization Necessary?

It is clear from the intro that through data normalization, the information is made consistent and brought into a similar format so that it’s easier to interpret and use.

Let’s understand this with an example of housing price prediction: housing price is based on various features, such as square footage, number of bedrooms, and distance to the supermarket. The dataset contains diverse features with varying scales, such as:

· Square footage: Ranges from 500 to 5000 square feet

· Number of bedrooms: Ranges from 1 to 5

· Distance to supermarket: Ranges from 0.1 to 10 miles

An ML algorithm might give more weight to features with larger scales, such as square footage. During training, the algorithm assumes that a change in total square footage will have a significant impact on housing prices, and it might overlook the nuances of features with relatively small ranges, such as the number of bedrooms and distance to the supermarket.
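
To make this concrete, here is a minimal sketch (using scikit-learn’s MinMaxScaler, one of the techniques covered below; the three rows of housing data are made up for illustration) showing how unevenly the raw features are scaled and how normalization puts them on an equal footing:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative rows: [square footage, bedrooms, distance to supermarket (miles)]
X = np.array([
    [ 500.0, 1.0,  0.1],
    [2500.0, 3.0,  4.0],
    [5000.0, 5.0, 10.0],
])

# Raw feature ranges: square footage dwarfs the other two features
print(X.max(axis=0) - X.min(axis=0))                  # [4500.    4.    9.9]

# After min-max scaling, every feature spans the same [0, 1] range
X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled.max(axis=0) - X_scaled.min(axis=0))    # [1. 1. 1.]
```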

Now let’s dive deeper and discuss a few technical aspects that make normalization a necessary pre-processing step. Data normalization is essential in machine learning for several reasons:

Equal Treatment of Features: When features in a dataset have different scales and ranges, some features may dominate the learning process simply because they have larger values, as explained through the example above. Normalization ensures that all features are treated equally by putting them on the same scale, preventing bias toward features with larger magnitudes.

Improved Convergence: Many (though not all) machine learning algorithms, such as gradient descent-based optimization algorithms, converge faster when features are on similar scales. Normalizing the data can speed up the convergence process and reduce training time.

Enhanced Model Performance: Normalization can lead to better model performance by improving the stability and robustness of the learning process.

Better Handling of Outliers: Some normalization techniques, such as robust scaling, which we will discuss in detail later, are robust to outliers in the data. Normalizing the data can help mitigate the impact of outliers and prevent them from unduly influencing the learning process.

Improved Interpretability: Normalization can make the data more interpretable and easier to understand. By putting all features on the same scale, it becomes easier to see the relationships between different variables and make meaningful comparisons.

Normalization Techniques in Machine Learning

Each normalization technique has its advantages and is chosen based on the specific characteristics of the dataset and the requirements of the machine learning algorithm being used. Now we will discuss the different techniques and understand when to use them with the help of examples:

Min-Max Scaling:

Min-max scaling is very often simply called ‘normalization.’ This technique scales the data to a fixed range, usually between 0 and 1.

Min-Max Scaling formula: X_scaled = (X - X_min) / (X_max - X_min)

Min-Max scaling is sensitive to outliers and to the range of the data. It is suitable for algorithms where the absolute values and their relations are important (e.g., k-nearest neighbors, neural networks), and it can lead to faster convergence, especially in algorithms that rely on gradient descent. Min-max scaling is a good choice when the approximate upper and lower bounds of the dataset are known, the dataset has few or no outliers, and maintaining the distribution’s original shape is essential.

Example: Suppose you have a dataset of temperatures ranging from 20°C to 40°C. After applying Min-Max scaling, 20°C becomes 0 and 40°C becomes 1.
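
As a quick sketch, here is how that temperature example looks with scikit-learn’s MinMaxScaler (the individual readings are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical temperature readings between 20°C and 40°C
temps = np.array([[20.0], [25.0], [30.0], [40.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
print(scaler.fit_transform(temps).ravel())  # [0.   0.25 0.5  1.  ]
```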

Standardization (Z-score Normalization)

Standardization assumes a Gaussian distribution of the data and transforms features to have zero mean and unit standard deviation. In other words, if we calculate the mean (µ) and standard deviation (σ) of the standardized scores, they will be 0 and 1 respectively.

Standardization formula: z = (X - µ) / σ

In contrast to Min-Max scaling, standardization is less sensitive to outliers because it relies on the mean and standard deviation rather than the minimum and maximum. It is particularly useful for algorithms that assume normally distributed data, such as linear regression and support vector machines.

Example: If you have a dataset of exam scores with a mean of 75 and a standard deviation of 10, then after applying standardization a score of 85 becomes 1, since (85 - 75) / 10 = 1.
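
Here is a small sketch of the same idea with scikit-learn’s StandardScaler; the sample scores are chosen so that the fitted mean and standard deviation come out to 75 and 10, matching the example above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical exam scores with mean 75 and (population) standard deviation 10
scores = np.array([[60.0], [70.0], [75.0], [80.0], [90.0]])

scaler = StandardScaler()
scaler.fit(scores)
print(scaler.mean_, scaler.scale_)   # [75.] [10.]

# A score of 85 maps to a z-score of (85 - 75) / 10 = 1
print(scaler.transform([[85.0]]))    # [[1.]]
```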

Robust Scaling:

This technique scales the data based on percentiles and is robust to outliers. Standardization can become skewed or biased if the input variable contains outlier values. To overcome this, the median and interquartile range (IQR) can be used when scaling numerical input variables, an approach generally referred to as robust scaling.

Robust Scaling formula: X_scaled = (X - median) / IQR

Since robust scaling is resilient to the influence of outliers, it is suitable for datasets with skewed or anomalous values.

Example: In a dataset of salaries where the median salary is €50,000 and the interquartile range (IQR) is €20,000, a salary of €60,000 becomes (60,000 - 50,000) / 20,000 = 0.5.
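
A minimal sketch with scikit-learn’s RobustScaler; the salary values are made up so that the median is €50,000 and the IQR is €20,000, and the €250,000 entry shows that an outlier barely moves those statistics:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical salaries in €; the last value is an outlier
salaries = np.array([[30000.0], [40000.0], [50000.0], [60000.0], [250000.0]])

scaler = RobustScaler()  # centers on the median, scales by the IQR
scaler.fit(salaries)
print(scaler.center_, scaler.scale_)  # [50000.] [20000.]

# (60,000 - 50,000) / 20,000 = 0.5, as in the example above
print(scaler.transform([[60000.0]]))  # [[0.5]]
```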

Min-Max scaling, Z-score normalization (standardization), and robust scaling are the most common normalization techniques used in ML. Let’s now discuss two more normalization techniques that are used more rarely in the realm of ML, along with when and how they can be applied, with the help of examples.

Decimal scaling Normalization:

Decimal scaling normalization is a technique that involves shifting the decimal point of the values in each feature to make them fall within a specific range. The objective of decimal scaling normalization is to scale the feature values by a power of 10, ensuring that the largest absolute value in each feature becomes less than 1. The scaling factor used is determined by the maximum absolute value in each feature.

Decimal Scaling formula: X_scaled = X / 10^d

Where X is the original feature value, and d is the smallest integer such that the largest absolute value in the feature becomes less than 1.

It is useful when the range of values in a dataset is known, but the range varies across features. Decimal scaling normalization is advantageous when dealing with datasets where the absolute magnitude of values matters more than their specific scale.

Example: Suppose we have a dataset containing two features: age and income. The age values range from 20 to 50, while the income values range from $20,000 to $80,000. Applying the formula to each feature separately, the maximum absolute age is 50, so its scaling factor is d = 2, and the maximum absolute income is 80,000, so its scaling factor is d = 5. If a person’s age is 30 and their income is $60,000, then:

· Normalized age: 30 / 10^2 = 0.3

· Normalized income: 60,000 / 10^5 = 0.6
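
scikit-learn has no built-in decimal scaling transformer, so here is a small hand-rolled sketch of the formula above applied per feature (the helper name decimal_scale is ours, and it assumes the feature has at least one non-zero value):

```python
import numpy as np

def decimal_scale(x):
    """Divide a feature by 10^d, where d is the smallest integer that
    makes the largest absolute value fall below 1."""
    d = int(np.floor(np.log10(np.max(np.abs(x))))) + 1
    return x / (10 ** d), d

age = np.array([20.0, 30.0, 50.0])
income = np.array([20_000.0, 60_000.0, 80_000.0])

age_scaled, d_age = decimal_scale(age)        # d = 2, so 30 -> 0.3
income_scaled, d_inc = decimal_scale(income)  # d = 5, so 60,000 -> 0.6
print(age_scaled, income_scaled)
```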

Log Scaling:

Log scaling normalization converts data onto a logarithmic scale by taking the log of each data point. This is useful when dealing with data that spans a wide range of values, because it helps to reduce the variation in the data.

Log Scaling formula: X_scaled = log(X)

This technique is also valuable when dealing with data that has outliers, because it helps to reduce their impact. Log scaling comes in handy with data that follows an exponential growth or decay pattern: it compresses the scale of the dataset, making it easier for models to capture patterns and relationships in the data.

Example: Population size over the years is a good example of a dataset where some features exhibit exponential growth. Log scaling normalization can make these features more amenable to modelling.
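
A short sketch of log scaling on a made-up, roughly exponentially growing population series (np.log1p computes log(1 + x), so zero counts would not cause problems):

```python
import numpy as np

# Hypothetical yearly population counts growing roughly tenfold each step
population = np.array([1_000.0, 10_000.0, 100_000.0, 1_000_000.0])

# log(1 + x) compresses the 1,000x spread into roughly even steps
log_scaled = np.log1p(population)
print(log_scaled)  # ~[ 6.91  9.21 11.51 13.82]
```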

We’ve covered some of the most beneficial normalization techniques, but there’s a plethora of others to consider. Choosing the right one requires a deep understanding of the dataset. During data preprocessing, it’s crucial to experiment with various normalization methods and evaluate their impact on model performance. Through experimentation, you can observe how each method influences the learning process.

Normalizing data can be challenging, especially with sparse data where many feature values are zero. Directly applying standard normalization techniques may lead to unintended consequences. Fortunately, there are variations of the discussed normalization techniques tailored for sparse data, such as ‘sparse min-max scaling.’
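
Scikit-learn does not ship a transformer literally named “sparse min-max scaling,” but its MaxAbsScaler is the closest standard option: it divides each feature by its maximum absolute value, so zeros stay zero and the sparse structure is preserved. A minimal sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# A mostly-zero feature matrix stored in sparse CSR format
X = csr_matrix(np.array([
    [0.0, 500.0, 0.0],
    [2.0,   0.0, 0.0],
    [0.0, 250.0, 8.0],
]))

# Each column is divided by its maximum absolute value; zeros are untouched
scaled = MaxAbsScaler().fit_transform(X)
print(scaled.toarray())
```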

Scikit-learn is a versatile Python library designed to simplify machine learning complexities. It offers a comprehensive suite of tools for data preprocessing, feature selection, dimensionality reduction, model building, training, evaluation, hyperparameter tuning, serialization, pipeline construction, and more. Its modular architecture encourages exploration and experimentation, enabling users to seamlessly transition from basic concepts to advanced methodologies.
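
As a rough sketch of the experimentation mentioned above, the scalers discussed in this article can be dropped into a scikit-learn pipeline and compared with cross-validation (the synthetic dataset and Ridge model here are just placeholders):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Putting the scaler inside the pipeline ensures it is fit only on each training fold
for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
    model = make_pipeline(scaler, Ridge())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(scaler).__name__, round(score, 3))
```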

Throughout this article, we’ve explored min-max scaling, Z-normalization, robust normalization, decimal scaling, and log scaling normalization techniques. Each method has demonstrated its unique strengths and applicability. We hope this article has provided insight into normalization techniques.
