Convert a Pandas DataFrame Column from String to Datetime Format
When working with data in a Pandas DataFrame, it is common to come across time-series data. Pandas is a powerful Python library for managing time-series data, but the dates in a dataset are often stored as strings, so we may first need to convert them to the datetime format.
In this lesson, we will learn how to convert a DataFrame column that stores dates as strings (for example, “12/05/2021”) into the datetime format. As long as the dates are plain strings, we cannot perform time-series operations on them, such as extracting the year or computing the number of days between two dates. To address this, we need to convert the dates into a proper datetime type.
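To make the problem concrete, here is a small sketch (the Series below is our own illustration, not part of the tutorial's dataset) showing that the .dt accessor, which pandas uses for date-based operations, raises an error on a string column but works once the column is converted. We pass an explicit format here as our own choice, so that the parsing does not depend on pandas guessing the layout.

```python
import pandas as pnd

# Hypothetical sample: dates stored as plain strings (dtype "object")
dates = pnd.Series(['12/05/2021', '11/21/2018', '01/12/2020'])

# The .dt accessor only works on datetime-like values, so this raises
try:
    dates.dt.year
    string_dt_failed = False
except AttributeError:
    string_dt_failed = True

# After conversion, date components such as the year are available
converted = pnd.to_datetime(dates, format='%m/%d/%Y')
years = converted.dt.year.tolist()

print(string_dt_failed, years)  # True [2021, 2018, 2020]
```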
Different Approaches for Converting a String Column to Datetime in Pandas:
In this part, we will go through several strategies for converting the datatype of a Pandas DataFrame column from string to datetime.
Approach 1: Using pandas.to_datetime() Function
In this method, we convert the column's datatype with the “pandas.to_datetime()” function. First, we create a DataFrame and check the datatype of its “Date” column. Example:
import pandas as pnd

# Creating the DataFrame
data_frame = pnd.DataFrame({
    'Date': ['12/05/2021', '11/21/2018', '01/12/2020'],
    'Event': ['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
    'Cost': [15400, 7000, 25000]})

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Check the data type of the 'Date' column
data_frame.info()
Output:
The data is:
Date Event Cost
0 12/05/2021 Music- Dance 15400
1 11/21/2018 Poetry- Songs 7000
2 01/12/2020 Theatre- Drama 25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 3 non-null object
1 Event 3 non-null object
2 Cost 3 non-null int64
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
The output shows that the “Date” column has “object” as its datatype, which indicates that its values are strings. Now that we know the datatype, we will use the “pnd.to_datetime()” function to convert the column into datetime format:
import pandas as pnd

# Creating the DataFrame
data_frame = pnd.DataFrame({
    'Date': ['12/05/2021', '11/21/2018', '01/12/2020'],
    'Event': ['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
    'Cost': [15400, 7000, 25000]})

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Convert the 'Date' column of the DataFrame into datetime format
data_frame['Date'] = pnd.to_datetime(data_frame['Date'])

# Check the data type of the 'Date' column
data_frame.info()
Output:
The data is:
Date Event Cost
0 12/05/2021 Music- Dance 15400
1 11/21/2018 Poetry- Songs 7000
2 01/12/2020 Theatre- Drama 25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 3 non-null datetime64[ns]
1 Event 3 non-null object
2 Cost 3 non-null int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes
The output now shows that the “Date” column of the DataFrame has been converted to the datetime format (datetime64[ns]).
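One caution about the example above: a string such as “12/05/2021” is ambiguous. By default, pnd.to_datetime() reads such strings month-first, which is why “11/21/2018” parses without error. If your data is actually day-first (dd/mm/yyyy), it is safer to pass an explicit format (or dayfirst=True). The sketch below, using a hypothetical two-value Series, shows how the two formats read the same strings differently.

```python
import pandas as pnd

# Hypothetical sample of ambiguous date strings
raw = pnd.Series(['12/05/2021', '01/12/2020'])

# Month first: 12/05/2021 -> 5 December 2021
month_first = pnd.to_datetime(raw, format='%m/%d/%Y')

# Day first: 12/05/2021 -> 12 May 2021
day_first = pnd.to_datetime(raw, format='%d/%m/%Y')

month_first_text = month_first.dt.strftime('%Y-%m-%d').tolist()
day_first_text = day_first.dt.strftime('%Y-%m-%d').tolist()

print(month_first_text)  # ['2021-12-05', '2020-01-12']
print(day_first_text)    # ['2021-05-12', '2020-12-01']
```

Spelling out the format also documents, in the code itself, which convention the data follows.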
Approach 2: Using the astype() Function
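In this method, we convert the datatype of the DataFrame column with the “astype()” function, which is available on both DataFrame and Series objects; here we call it on the “Date” column. As before, we first create the DataFrame and check its datatypes. Example: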
import pandas as pnd

# Creating the DataFrame
data_frame = pnd.DataFrame({
    'Date': ['12/05/2021', '11/21/2018', '01/12/2020'],
    'Event': ['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
    'Cost': [15400, 7000, 25000]})

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Check the data type of the 'Date' column
data_frame.info()
Output:
The data is:
Date Event Cost
0 12/05/2021 Music- Dance 15400
1 11/21/2018 Poetry- Songs 7000
2 01/12/2020 Theatre- Drama 25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 3 non-null object
1 Event 3 non-null object
2 Cost 3 non-null int64
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
The output shows that the “Date” column has “object” as its datatype, which indicates that its values are strings. Now that we know the datatype, we will call “astype('datetime64[ns]')” on the column to convert it into datetime format:
import pandas as pnd

# Creating the DataFrame
data_frame = pnd.DataFrame({
    'Date': ['12/05/2021', '11/21/2018', '01/12/2020'],
    'Event': ['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
    'Cost': [15400, 7000, 25000]})

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Convert the 'Date' column of the DataFrame into datetime format
data_frame['Date'] = data_frame['Date'].astype('datetime64[ns]')

# Check the data type of the 'Date' column
data_frame.info()
Output:
The data is:
Date Event Cost
0 12/05/2021 Music- Dance 15400
1 11/21/2018 Poetry- Songs 7000
2 01/12/2020 Theatre- Drama 25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 3 non-null datetime64[ns]
1 Event 3 non-null object
2 Cost 3 non-null int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes
The output shows that the “Date” column has been converted to the datetime format. This was accomplished with data_frame['Date'].astype('datetime64[ns]').
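A practical difference between the first two approaches shows up with bad data. astype() offers no way to specify the input format or to replace individual bad values, so a single unparseable string makes it raise an error, whereas pnd.to_datetime() accepts errors='coerce', which turns unparseable values into NaT (the missing-value marker pandas uses for datetimes). A minimal sketch, assuming a hypothetical Series with one invalid entry:

```python
import pandas as pnd

# Hypothetical column with one value that is not a date
raw = pnd.Series(['2021-03-02', 'not a date'])

# astype() tries to parse every value and raises on the bad one
try:
    raw.astype('datetime64[ns]')
    astype_failed = False
except ValueError:
    astype_failed = True

# to_datetime() with errors='coerce' replaces the bad value with NaT
coerced = pnd.to_datetime(raw, errors='coerce')

print(astype_failed)            # True
print(coerced.isna().tolist())  # [False, True]
```

For clean data both approaches give the same result; to_datetime() is the more flexible choice when the input may contain malformed entries.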
Approach 3: Using pandas.to_datetime() with a Format String
Suppose a DataFrame column stores dates as strings in the “yymmdd” format, and we need to convert them from strings to datetime. Example:
import pandas as pnd

# Initialize the nested list with the dataset
play_list = [['210302', 67000], ['210901', 62000], ['210706', 61900],
             ['210402', 59000], ['210802', 74000],
             ['210804', 54050], ['210109', 57650], ['210509', 67300], ['210209', 76600]]

# Creating a pandas DataFrame
data_frame = pnd.DataFrame(play_list, columns=['Date', 'Patient Number'])

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Check the data type of the 'Date' column
print(data_frame.dtypes)
Output:
The data is:
Date Patient Number
0 210302 67000
1 210901 62000
2 210706 61900
3 210402 59000
4 210802 74000
5 210804 54050
6 210109 57650
7 210509 67300
8 210209 76600
Date object
Patient Number int64
dtype: object
In the output, we can see that the datatype of the “Date” column is “object”, which indicates that its values are strings. Because the strings follow the “yymmdd” layout, we pass a matching format string, in which %y is a two-digit year, %m is the month, and %d is the day, using the expression data_frame['Date'] = pnd.to_datetime(data_frame['Date'], format='%y%m%d'):
import pandas as pnd

# Initialize the nested list with the dataset
play_list = [['210302', 67000], ['210901', 62000], ['210706', 61900],
             ['210402', 59000], ['210802', 74000],
             ['210804', 54050], ['210109', 57650], ['210509', 67300], ['210209', 76600]]

# Creating a pandas DataFrame
data_frame = pnd.DataFrame(play_list, columns=['Date', 'Patient Number'])

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Convert the 'Date' column of the DataFrame into datetime format
data_frame['Date'] = pnd.to_datetime(data_frame['Date'], format='%y%m%d')

# Check the data type of the 'Date' column
print(data_frame.dtypes)
Output:
The data is:
Date Patient Number
0 210302 67000
1 210901 62000
2 210706 61900
3 210402 59000
4 210802 74000
5 210804 54050
6 210109 57650
7 210509 67300
8 210209 76600
Date datetime64[ns]
Patient Number int64
dtype: object
By using the expression pnd.to_datetime(data_frame['Date'], format='%y%m%d') in the preceding code, we converted the datatype of the “Date” column from “object” to “datetime64[ns]”.
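Note that a datetime column has no display format of its own; pandas shows plain dates in ISO form (for example, 2021-03-02). If you need the dates back as text in a particular layout, such as dd/mm/yy, .dt.strftime() can format them, keeping in mind that the result is a string column again. A short sketch using a few of the “yymmdd” values from the example above:

```python
import pandas as pnd

# A few of the "yymmdd" strings used in Approach 3
raw = pnd.Series(['210302', '210901', '210706'])

# Parse: %y = two-digit year, %m = month, %d = day
dates = pnd.to_datetime(raw, format='%y%m%d')

# Format back to text in dd/mm/yy form (the result holds strings again)
as_text = dates.dt.strftime('%d/%m/%y')

print(as_text.tolist())  # ['02/03/21', '01/09/21', '06/07/21']
```

Keep the datetime column for calculations and produce the formatted text only when the dates need to be displayed or exported.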
Approach 4: Converting Multiple Columns with pandas.to_datetime()
The “pandas.to_datetime()” function can also convert several columns, here two columns whose dates are stored as strings in the “YYYYMMDD” format. Example:
import pandas as pnd

# Initialize the nested list with the dataset
Dataset_list = [['20210612', 54000, '20210812'],
                ['20210814', 65000, '20210614'],
                ['20210316', 71500, '20210316'],
                ['20210519', 45000, '20210119'],
                ['20210221', 98000, '20210221'],
                ['20210124', 23000, '20210724'],
                ['20210929', 12000, '20210924']]

# Creating a pandas DataFrame
data_frame = pnd.DataFrame(
    Dataset_list, columns=['Treatment_starting_Date',
                           'Patients Number',
                           'Treatment_ending_Date'])

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Check the data types of the columns
print(data_frame.dtypes)
Output:
The data is:
Treatment_starting_Date Patients Number Treatment_ending_Date
0 20210612 54000 20210812
1 20210814 65000 20210614
2 20210316 71500 20210316
3 20210519 45000 20210119
4 20210221 98000 20210221
5 20210124 23000 20210724
6 20210929 12000 20210924
Treatment_starting_Date object
Patients Number int64
Treatment_ending_Date object
dtype: object
The output shows that the “Treatment_starting_Date” and “Treatment_ending_Date” columns have “object” as their datatype, which indicates that their values are strings. We will now pass each of them to pnd.to_datetime() with format='%Y%m%d'. Note the uppercase %Y, which matches a four-digit year (the lowercase %y used in Approach 3 matches a two-digit year):
import pandas as pnd

# Initialize the nested list with the dataset
Dataset_list = [['20210612', 54000, '20210812'],
                ['20210814', 65000, '20210614'],
                ['20210316', 71500, '20210316'],
                ['20210519', 45000, '20210119'],
                ['20210221', 98000, '20210221'],
                ['20210124', 23000, '20210724'],
                ['20210929', 12000, '20210924']]

# Creating a pandas DataFrame
data_frame = pnd.DataFrame(
    Dataset_list, columns=['Treatment_starting_Date',
                           'Patients Number',
                           'Treatment_ending_Date'])

# Print the DataFrame
print("The data is: ")
print(data_frame)

# Convert multiple columns of the DataFrame into datetime format
data_frame['Treatment_starting_Date'] = pnd.to_datetime(
    data_frame['Treatment_starting_Date'],
    format='%Y%m%d'
)
data_frame['Treatment_ending_Date'] = pnd.to_datetime(
    data_frame['Treatment_ending_Date'],
    format='%Y%m%d'
)

# Check the data types of the columns
print(data_frame.dtypes)
Output:
The data is:
Treatment_starting_Date Patients Number Treatment_ending_Date
0 20210612 54000 20210812
1 20210814 65000 20210614
2 20210316 71500 20210316
3 20210519 45000 20210119
4 20210221 98000 20210221
5 20210124 23000 20210724
6 20210929 12000 20210924
Treatment_starting_Date datetime64[ns]
Patients Number int64
Treatment_ending_Date datetime64[ns]
dtype: object
The output shows that pnd.to_datetime() has converted the datatypes of “Treatment_starting_Date” and “Treatment_ending_Date” to datetime64[ns].
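When several columns share the same format, they can also be converted in one step by selecting them together and calling DataFrame.apply() with pnd.to_datetime, which forwards extra keyword arguments such as format to the function. Once both columns are datetimes, date arithmetic works directly; for example, subtracting them gives the treatment length. A minimal sketch using the first three rows of the dataset above (the duration column name Treatment_days is our own, hypothetical choice):

```python
import pandas as pnd

# First three rows of the Approach 4 dataset
data_frame = pnd.DataFrame({
    'Treatment_starting_Date': ['20210612', '20210814', '20210316'],
    'Patients Number': [54000, 65000, 71500],
    'Treatment_ending_Date': ['20210812', '20210614', '20210316'],
})

date_columns = ['Treatment_starting_Date', 'Treatment_ending_Date']

# apply() calls pnd.to_datetime on each selected column, passing format along
data_frame[date_columns] = data_frame[date_columns].apply(pnd.to_datetime, format='%Y%m%d')

# Date arithmetic now works: ending minus starting date, in whole days
# (Treatment_days is a hypothetical column name used only for this sketch)
data_frame['Treatment_days'] = (
    data_frame['Treatment_ending_Date'] - data_frame['Treatment_starting_Date']
).dt.days

print(data_frame.dtypes)
print(data_frame['Treatment_days'].tolist())  # [61, -61, 0]
```

The negative value in the second row simply reflects that the source data lists an ending date earlier than the starting date for that record.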