Python’s fit(), transform(), and fit_transform() Methods
Scikit-learn, often known as sklearn, is one of Python’s most significant and widely used machine learning libraries. It offers a comprehensive set of ready-to-train algorithms and modelling methodologies, as well as utilities for preprocessing, training, and evaluating models.
Transformer, one of the most fundamental classes in sklearn, supports three distinct methods: fit(), transform(), and fit_transform(). We shall study the distinction between them.
Introduction
Before proceeding, let’s examine the procedure used for a data science project; there are particular steps required to design any such project. We will discuss them briefly here:
- Exploratory data analysis (EDA), where we evaluate the datasets and reveal their crucial characteristics.
- Feature engineering, the procedure of extracting features from raw data using some domain expertise.
- Feature selection, where we decide which features will significantly influence the model.
- Model building, where we build a machine learning model using the appropriate techniques.
- Deployment, where we publish our machine learning model online.
If we look at the first three stages, data preprocessing clearly plays a large role in model creation and training. Hence, whenever we plan to create a machine learning application, preprocessing is a key step in the process.
Transformer In Sklearn
Transformers are among the most frequently used objects in Scikit-learn. They conduct feature transformation, which is part of data preprocessing; for model training, by contrast, we need objects known as models, such as linear regression or classifiers. StandardScaler, PCA, Imputer, and MinMaxScaler are some examples of transformer objects used for feature transformation. We utilise these tools to conduct pre-processing on the raw data, such as modifying the input data format and scaling the features, and the transformed data is then utilised for model training.
Consider standardisation, which transforms a feature F into F’ using the formula F’ = (F − μ) / σ, where μ and σ are the mean and standard deviation of F. If a dataset has independent features f1, f2, and f3 and a dependent feature f4, the standardisation formula is applied to each independent feature. The transformation of an input feature F into another feature F’ is provided by three separate methods. These methods are:
- fit()
- transform()
- fit_transform()
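Before turning to the Transformer methods themselves, the standardisation formula F’ = (F − μ) / σ can be verified by hand with plain NumPy (a minimal sketch; the feature values here are made up for illustration):

```python
import numpy as np

# A single feature with made-up values
f = np.array([2.0, 4.0, 6.0, 8.0])

# Standardisation: subtract the mean, divide by the standard deviation
mu = f.mean()
sigma = f.std()
f_scaled = (f - mu) / sigma

print(f_scaled)         # the standardised feature F'
print(f_scaled.mean())  # approximately 0 after standardisation
print(f_scaled.std())   # 1 after standardisation
```

This is exactly the calculation StandardScaler performs for every feature column.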
fit() Method
With the fit() method, we compute the statistics required by the chosen formula from the input feature we want to modify; no transformation takes place yet. The fit() method is called on the transformer object.

If a StandardScaler object sc is constructed, its fit() method calculates the mean (μ) and standard deviation (σ) of the specified feature F. These parameters may be used later for scaling.
Using the StandardScaler pre-processing transformer as an example, let’s imagine we need to scale the features of a self-created dataset. In the following code, the example dataset is built using the np.arange function and then split into training and testing datasets. Next, we create a StandardScaler instance and fit it on the training features to calculate the mean and standard deviation to be used for future scaling.
Before using any pre-processing procedure, such as scaling, it is essential to divide the dataset into train and test datasets. Test data points represent data from the real world. Consequently, we must call fit() only on the training features, so that our model does not receive any information from the test set. Code:
# Python program to show how to use the fit() method of the Transformer class of scikit-learn.
# We will use the fit() method with the feature scaling tool known as StandardScaler,
# which scales the features using standardization.

# Importing the required modules
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Creating a dataset with features X and target Y
X, Y = np.arange(20).reshape((10, 2)), range(10)

# Segregating data into training and testing datasets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=1)

# Printing the training dataset
print("Training dataset: \n", X_train)

# Printing the testing dataset
print("Testing dataset: \n", X_test)

# Calculating the standardizing parameters, i.e., the mean and standard deviation of X_train
standard_scaler = StandardScaler()
standard_scaler.fit(X_train)
print("Parameters of the fit method: \n", standard_scaler.get_params())
Output:
Training dataset: 
 [[ 8  9]
 [ 0  1]
 [ 6  7]
 [ 2  3]
 [14 15]
 [16 17]
 [10 11]]
Testing dataset: 
 [[ 4  5]
 [18 19]
 [12 13]]
Parameters of the fit method: 
 {'copy': True, 'with_mean': True, 'with_std': True}
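Note that get_params() only returns the scaler’s configuration. The statistics actually learned by fit() are stored in the fitted attributes mean_ and scale_ (a small sketch on the same training data as above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same X_train as produced by the split above
X_train = np.array([[8, 9], [0, 1], [6, 7], [2, 3],
                    [14, 15], [16, 17], [10, 11]])

scaler = StandardScaler()
scaler.fit(X_train)

# Per-feature statistics learned by fit(), used later by transform()
print(scaler.mean_)   # per-column mean: [8. 9.]
print(scaler.scale_)  # per-column standard deviation (about 5.45 for both columns)
```

These attributes exist only after fit() has been called; accessing them on an unfitted scaler raises an error.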
transform() Method
To actually modify the data, we use the transform() method, which applies the formula to each value in feature F using the parameters calculated by fit(). In other words, transform() must be called on a transformer object that has already been fitted.

Continuing the preceding example, we fit the StandardScaler object on X_train and then call transform() on the fitted object. With the transform() and fit_transform() methods, the scale of the data points is altered, and the result is a NumPy array (or a sparse matrix, if the input was sparse). Code:
# Python program to show how to use the transform() method of the Transformer class of scikit-learn.
# We will use the transform() method with the feature scaling tool known as StandardScaler.

# Importing the required modules
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Creating a dataset with features X and target Y
X, Y = np.arange(20).reshape((10, 2)), range(10)

# Segregating data into training and testing datasets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=1)

# Printing original X_train
print(X_train)

# Calculating the standardizing parameters and transforming the dataset
standard_scaler = StandardScaler()
fitted = standard_scaler.fit(X_train)
X_train = fitted.transform(X_train)

# Printing X_train after transforming the data
print(X_train)
Output:
[[ 8  9]
 [ 0  1]
 [ 6  7]
 [ 2  3]
 [14 15]
 [16 17]
 [10 11]]
[[ 0.          0.        ]
 [-1.46759877 -1.46759877]
 [-0.36689969 -0.36689969]
 [-1.10069908 -1.10069908]
 [ 1.10069908  1.10069908]
 [ 1.46759877  1.46759877]
 [ 0.36689969  0.36689969]]
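The same fitted scaler should then be reused to transform the test features, so the test set is scaled with the training set’s mean and standard deviation rather than its own. A sketch continuing the example above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same train/test features as produced by the split above
X_train = np.array([[8, 9], [0, 1], [6, 7], [2, 3],
                    [14, 15], [16, 17], [10, 11]])
X_test = np.array([[4, 5], [18, 19], [12, 13]])

scaler = StandardScaler()
scaler.fit(X_train)  # learn mean and std from the training data only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics

print(X_test_scaled)
```

Calling fit() again on X_test would overwrite the training statistics and leak information about the test set into the preprocessing step.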
fit_transform() Method
fit_transform() is applied to the training data to establish the scaling parameters and scale the training data in a single step. In this instance, the scaler determines the mean and standard deviation of the training set’s features and immediately transforms them.

With fit(), we determine the mean and standard deviation of every feature in our data; transform() then converts all the features using those means and standard deviations. fit_transform() simply combines the two calls.

We also want scaling applied to our test data, but we do not want our model to be biased by it. We expect the test data to be completely new and unseen by the model. In that circumstance, we call only transform() on the test set, using the parameters learned from the training set. Code:
# Python program to show how to use the fit_transform() method of the Transformer class of scikit-learn.
# We will use the fit_transform() method with the feature scaling tool known as StandardScaler.

# Importing the required modules
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Creating a dataset with features X and target Y
X, Y = np.arange(20).reshape((10, 2)), range(10)

# Segregating data into training and testing datasets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=1)

# Printing original X_train
print(X_train)

# Directly fitting and transforming the X_train dataset in one step
standard_scaler = StandardScaler()
X_train = standard_scaler.fit_transform(X_train)

# Printing X_train after transforming the data
print(X_train)
Output:
[[ 8  9]
 [ 0  1]
 [ 6  7]
 [ 2  3]
 [14 15]
 [16 17]
 [10 11]]
[[ 0.          0.        ]
 [-1.46759877 -1.46759877]
 [-0.36689969 -0.36689969]
 [-1.10069908 -1.10069908]
 [ 1.10069908  1.10069908]
 [ 1.46759877  1.46759877]
 [ 0.36689969  0.36689969]]
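As the identical output suggests, fit_transform(X_train) is equivalent to calling fit(X_train) followed by transform(X_train). A quick sketch confirming this on the same data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same X_train as produced by the split above
X_train = np.array([[8, 9], [0, 1], [6, 7], [2, 3],
                    [14, 15], [16, 17], [10, 11]])

# One-step approach: fit and transform together
one_step = StandardScaler().fit_transform(X_train)

# Two-step approach: fit first, then transform
scaler = StandardScaler()
scaler.fit(X_train)
two_step = scaler.transform(X_train)

print(np.allclose(one_step, two_step))  # True: both approaches give the same result
```

The practical rule, then: fit_transform() on the training set, transform() alone on the test set.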