by Arround The Web

Pandas Convert Column to Int

Pandas is a free and open-source Python library that provides fast, flexible, and expressive data structures that make working with scientific data easy.

Pandas is one of Python’s most valuable data analysis and manipulation packages.

It offers custom data structures, such as the Series and the DataFrame, which are built on top of NumPy arrays.

This article will discuss converting a column from one data type to an int type within a Pandas DataFrame.

Setting Up Pandas

Before diving into how to perform the conversion operation, we need to set up Pandas in our Python environment.

If you are using the base environment of the Anaconda distribution, chances are you already have Pandas installed.

However, on a native Python install, you will need to install it manually.

You can do that by running the command:

$ pip install pandas

On Linux, where pip for Python 3 is often named pip3, run:

$ pip3 install pandas

In Anaconda or Miniconda environments, install Pandas with conda:

$ conda install pandas

Pandas Create Sample DataFrame

Let us set up a sample DataFrame for illustration purposes in this tutorial. You can copy the code below or use your own DataFrame.

import pandas as pd
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke', 'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})

Once the DataFrame is created, we can check the data.

Pandas Show Column Type

Before converting a column to an int, it is good to know whether its existing values can be cast to an int.

For example, a column containing names cannot be converted to an int, and attempting to do so raises an error.
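One way to test a column before converting it is sketched below. The helper name can_cast_to_int is hypothetical and not part of Pandas; it relies on pd.to_numeric() with errors='coerce', which turns unparseable values into NaN.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'name': ['Marja', 'Alexios', 'Mohan']})

def can_cast_to_int(series):
    """Hypothetical helper: True if every value parses as a whole number."""
    # errors='coerce' replaces values that cannot be parsed with NaN
    numeric = pd.to_numeric(series, errors='coerce')
    # Castable only if nothing became NaN and no value has a fractional part
    return bool(numeric.notna().all() and (numeric % 1 == 0).all())

id_ok = can_cast_to_int(df['id'])      # digits only
name_ok = can_cast_to_int(df['name'])  # plain text
print(id_ok, name_ok)
```

A check like this is only a convenience. Calling astype(int) directly and handling the error works just as well.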

We can view the type of each column in a DataFrame using the dtypes property.

Use the syntax:

DataFrame.dtypes

In our sample DataFrame, we can get the column types as:

df.dtypes
id        object
name      object
points    object
dtype: object

We can see from the output above that none of the columns holds an int type. Every column is of the object type because its values are stored as strings.

Pandas Convert Column From String to Int

To convert a single column to an int, we use the astype() function and pass the target data type as the parameter.

The function syntax:

DataFrame.astype(dtype, copy=True, errors='raise')
  1. dtype – specifies the Python type or NumPy dtype to which the object is converted.
  2. copy – when True (the default), returns a copy of the object rather than modifying the original.
  3. errors – specifies what happens when the conversion fails. The default, 'raise', raises an exception; 'ignore' suppresses it and returns the original object.
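The default behavior of these parameters can be illustrated with a small sketch. The Series names and ids below are made up for the example and are not part of the sample DataFrame.

```python
import pandas as pd

names = pd.Series(['Marja Jérôme', 'Alexios Shiva'])

# errors='raise' (the default): a failed cast raises an exception
try:
    names.astype(int)
    raised = False
except ValueError as exc:
    raised = True
    print(f"Conversion failed: {exc}")

# astype() returns a new object, so the original Series keeps its type
ids = pd.Series(['1', '2', '3'])
ids_as_int = ids.astype(int)
print(ids.dtype, ids_as_int.dtype)
```

Because the original is left untouched, the result has to be assigned back to the column for the change to persist.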

In our sample DataFrame, we can convert the id column to int type using the astype() function as shown in the code below:

df['id'] = df['id'].astype(int)

The code above specifies the ‘id’ column as the target object. We then pass an int as the type to the astype() function.

We can check the new data type for each column in the DataFrame:

df.dtypes
id         int32
name      object
points    object
dtype: object

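The id column has been converted to an int while the rest remain unchanged. The output above shows int32; on other platforms or NumPy versions, the default integer type may be int64 instead.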
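If the exact integer width matters, it can be named explicitly instead of passing the plain Python int. The following is a minimal sketch using a cut-down copy of the id column:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '4', '5']})

# Plain int maps to the platform's default integer type
default_ints = df['id'].astype(int)

# Naming the width gives the same result on every platform
ids_32 = df['id'].astype('int32')
ids_64 = df['id'].astype('int64')
print(default_ints.dtype, ids_32.dtype, ids_64.dtype)
```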

Pandas Convert Multiple Columns to Int

The astype() function also allows us to convert more than one column to a specific type at once.

For example, we can run the following code to convert the id and points columns to int type.

df[['id', 'points']] = df[['id', 'points']].astype(int)

Here, we are specifying multiple columns using the square bracket notation. This allows us to convert the columns to the data type specified in the astype() function.

If we check the column types, we should see the following output:

df.dtypes
id         int32
name      object
points     int32
dtype: object

We can now see that the id and points columns have been converted to the int32 type.
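A quick way to confirm that the conversion took effect is to do arithmetic on the result. In the sketch below, the list name int_columns and the shortened data are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2'],
                   'name': ['Mohan Famke', 'Steffen Angus'],
                   'points': ['70000', '110000']})

# Keep the column names in a list so the selection is written only once
int_columns = ['id', 'points']
df[int_columns] = df[int_columns].astype('int64')

# Summing now adds numbers instead of joining strings
total_points = df['points'].sum()
print(total_points)
```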

Pandas Convert Multiple Columns to Multiple Types

The astype() function also accepts a dictionary that maps each column name to its target type.

Assume that we want to convert the id column to int32 and the points column to float64.

We can run the following code:

convert_to = {"id": int, "points": float}
df = df.astype(convert_to)

In the code above, we start by defining a dictionary holding the target column as the key and the target type as the value.

We then use the astype() function to convert the columns in the dictionary to the set types.

Checking the column types should return:

df.dtypes
id          int32
name       object
points    float64
dtype: object

Note that the id column is of the int32 type and the points column is of the float64 type.
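The dictionary form can also spell out the widths explicitly. The sketch below uses shortened sample data and a made-up column name, score, to show that every key must match an existing column.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke'],
                   'points': ['50000', '70899', '70000']})

# Explicit dtype names avoid relying on the platform default
convert_to = {'id': 'int32', 'points': 'float64'}
converted = df.astype(convert_to)
print(converted.dtypes)

# A key that is not a column name is rejected with a KeyError
try:
    df.astype({'score': 'int32'})  # 'score' is a hypothetical column
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```

Columns that are not listed in the dictionary, such as name here, keep their existing type.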

Pandas Convert Column to Int – to_numeric()

Pandas also provides us with the to_numeric() function. This function allows us to convert a column to a numeric type.

The function syntax is as shown:

pandas.to_numeric(arg, errors='raise', downcast=None)

For example, to convert the id column to numeric in our sample DataFrame, we can run:

df['id'] = pd.to_numeric(df['id'])

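The code takes the id column and converts it to a numeric type. Since every value in the column is a whole number, the result is an integer type; values with decimals or missing entries would produce a float type instead.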
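The errors and downcast parameters from the syntax above are worth a short sketch. The Series raw is invented for the example and deliberately contains one value that cannot be parsed.

```python
import pandas as pd

raw = pd.Series(['1', '2', 'three', '4'])

# errors='coerce' replaces unparseable values with NaN,
# so the result is a float column rather than an int column
coerced = pd.to_numeric(raw, errors='coerce')

# downcast='integer' picks the smallest integer type that fits the data
small = pd.to_numeric(pd.Series(['1', '2', '3']), downcast='integer')
print(coerced.dtype, small.dtype)
```

Coercing is useful for messy input, but the missing values need to be filled or dropped before the column can be cast to a plain int.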

Pandas Convert DataFrame to Best Possible Data Type

The convert_dtypes() function in Pandas allows us to convert an entire DataFrame to the best possible types that support missing values (pd.NA).

The function syntax is as shown:

DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)

You can check the docs in the resource below:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.convert_dtypes.html

For example, to convert our sample DataFrame to the nearest possible type, we can run:

df = df.convert_dtypes()

If we check the type:

df.dtypes
id         Int32
name      string
points     Int64
dtype: object

You will notice that each column has been converted to a nullable extension type, which is why the names are capitalized. The id column was already int32 from the earlier conversions, so it becomes Int32.

Likewise, the name column is converted to the string type as it holds string values.

Finally, the points column was float64 holding only whole numbers, so it is converted to the Int64 type.
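One caveat is that convert_dtypes() works from the types the values already have and does not parse digits out of text. The sketch below uses a small made-up DataFrame, including a hypothetical ratio column, to show the difference.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],              # digits stored as text
                   'points': [50000.0, 70899.0, 70000.0],  # whole-number floats
                   'ratio': [0.5, 1.5, 2.0]})              # true fractions

converted = df.convert_dtypes()
print(converted.dtypes)

# id remains a string column, points becomes Int64, ratio becomes Float64
```

For text columns that hold numbers, astype() or to_numeric() is still needed before or instead of convert_dtypes().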

Conclusion

In this article, we covered several methods, with examples, for converting the columns of a Pandas DataFrame to an int type: astype(), to_numeric(), and convert_dtypes().

Source: linuxhint.com