by Arround The Web

Pandas to Array

The most common way to convert a pandas DataFrame into an array is the to_numpy() method, which returns a NumPy ndarray. In NumPy, the dimensions of an array are called axes. Note that a NumPy ndarray (created, for example, with numpy.array()) is distinct from the array.array class in Python's standard library. Besides to_numpy(), pandas also offers the values attribute and the to_records() method, and all three approaches are covered in this article.
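As a quick illustration of axes, and of the difference between an ndarray and the standard-library array.array, here is a minimal sketch. The small DataFrame df is a hypothetical example and not part of the original tutorial.

```python
import array

import numpy
import pandas

# Hypothetical 3x2 DataFrame used only for this illustration
df = pandas.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

arr = df.to_numpy()
print(type(arr))        # <class 'numpy.ndarray'>
print(arr.ndim)         # 2 -> axis 0 runs down the rows, axis 1 across the columns
print(arr.shape)        # (3, 2)
print(arr.sum(axis=0))  # summing along axis 0 gives per-column totals: [ 6 15]

# Python's built-in array.array is a separate, 1-D typed container
builtin = array.array("i", [1, 2, 3])
print(isinstance(builtin, numpy.ndarray))  # False
```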

Method 1: Using to_numpy()

When called on a pandas DataFrame, the to_numpy() method returns a NumPy ndarray. Because a DataFrame has rows and columns, the result is a 2-dimensional array. Let's look at the method's syntax before working through some examples.

Syntax:
DataFrame_object.to_numpy(dtype=None, copy=False, na_value=NoDefault.no_default)

Parameters:

  1. dtype: str or numpy.dtype, optional. The data type to pass to numpy.asarray().
  2. copy: bool, False by default. Whether to ensure that the returned value is not a view on another array. Note that copy=False does not guarantee that to_numpy() avoids a copy; rather, copy=True ensures that a copy is made, even if it is not strictly necessary.
  3. na_value: Any, optional. The value to use for missing values. By default, it depends on the dtypes of the DataFrame's columns.
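A minimal sketch of the three parameters in action, assuming a hypothetical all-float DataFrame with one missing value (the na_value parameter requires pandas 1.1 or later):

```python
import numpy
import pandas

# Hypothetical all-float DataFrame with one missing value
df = pandas.DataFrame({"x": [1.0, 2.0, numpy.nan], "y": [4.0, 5.0, 6.0]})

# dtype: choose the element type of the returned array
as_float32 = df.to_numpy(dtype="float32")
print(as_float32.dtype)  # float32

# na_value: substitute a value for missing entries in the output
filled = df.to_numpy(na_value=0.0)
print(filled)

# copy=True: the result does not share memory with the DataFrame,
# so changing the array leaves the DataFrame untouched
copied = df.to_numpy(copy=True)
copied[0, 0] = 100.0
print(df.loc[0, "x"])  # 1.0
```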

Example 1:
Let's create a DataFrame with 5 rows and 3 columns and convert it to a NumPy array using the to_numpy() method.

import pandas
import numpy

# Consider the pandas DataFrame
actual=pandas.DataFrame([[1,"cooking",200],
                            [2,"music",3004],
                            [3,"hand loom",1000],
                            [4,"hand loom",2000],
                            [5,"dressing",3000]],
                           columns = ['id','work','wages'],
                           index=['person 1','person 2','person 3','person 4','person 5'])

# Display the original DataFrame
print(actual,"\n")

# Convert the DataFrame to a NumPy array
converted=actual.to_numpy()

# Display the type of the converted object
print(type(converted),"\n")

# Display the converted array
print(converted)

Output:

          id       work  wages
person 1   1    cooking    200
person 2   2      music   3004
person 3   3  hand loom   1000
person 4   4  hand loom   2000
person 5   5   dressing   3000

<class 'numpy.ndarray'>

[[1 'cooking' 200]
 [2 'music' 3004]
 [3 'hand loom' 1000]
 [4 'hand loom' 2000]
 [5 'dressing' 3000]]

Explanation:
After the conversion, the type() function confirms that the result is a numpy.ndarray. All 5 rows are stored in the array, while the index labels (person 1 to person 5) and the column names are dropped.
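The following sketch, reusing the DataFrame from Example 1, inspects what the conversion produced. Because the columns mix integers and strings, NumPy falls back to the generic object dtype; selecting only the numeric columns gives an integer array instead.

```python
import pandas

# The DataFrame from Example 1
actual = pandas.DataFrame(
    [[1, "cooking", 200], [2, "music", 3004], [3, "hand loom", 1000],
     [4, "hand loom", 2000], [5, "dressing", 3000]],
    columns=["id", "work", "wages"],
    index=["person 1", "person 2", "person 3", "person 4", "person 5"],
)

converted = actual.to_numpy()
print(converted.dtype)  # object: ints and strings share one generic dtype
print(converted.shape)  # (5, 3): rows along axis 0, columns along axis 1
print(converted[0])     # [1 'cooking' 200]: first row, without its index label

# Only numeric columns -> a numeric array instead of an object array
numeric = actual[["id", "wages"]].to_numpy()
print(numeric.dtype.kind)  # i -> a signed integer dtype, typically int64
```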

Example 2:
To convert only some of the columns, first select them by passing a list of column names to the DataFrame, then call to_numpy() on the result. Here, only the “work” and “wages” columns are converted.

import pandas
import numpy

# Consider the pandas DataFrame
actual=pandas.DataFrame([[1,"cooking",200],
                            [2,"music",3004],
                            [3,"hand loom",1000],
                            [4,"hand loom",2000],
                            [5,"dressing",3000]],
                           columns = ['id','work','wages'],
                           index=['person 1','person 2','person 3','person 4','person 5'])

# Convert only 'work' and 'wages' columns to numpy array
print(actual[['work','wages']].to_numpy())

Output:

[['cooking' 200]
 ['music' 3004]
 ['hand loom' 1000]
 ['hand loom' 2000]
 ['dressing' 3000]]

Explanation:
Only the two selected columns, “work” and “wages”, are converted to the NumPy array.
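The order of the names in the list also determines the column order of the array. As a sketch, the snippet below reverses the order, and also shows a label-based slice with loc as an alternative way to pick adjacent columns.

```python
import pandas

# The DataFrame from Example 1
actual = pandas.DataFrame(
    [[1, "cooking", 200], [2, "music", 3004], [3, "hand loom", 1000],
     [4, "hand loom", 2000], [5, "dressing", 3000]],
    columns=["id", "work", "wages"],
    index=["person 1", "person 2", "person 3", "person 4", "person 5"],
)

# The list order sets the column order of the array: wages first, then id
reordered = actual[["wages", "id"]].to_numpy()
print(reordered[:2].tolist())  # [[200, 1], [3004, 2]]

# Alternative: a label-based slice of adjacent columns with loc
sliced = actual.loc[:, "work":"wages"].to_numpy()
print(sliced[0].tolist())  # ['cooking', 200]
```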

Method 2: Using the Values Attribute

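values is an attribute (not a method, so no parentheses are needed) that returns the DataFrame's data as a NumPy array. The pandas documentation recommends using to_numpy() instead, but values is still common in existing code.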
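As a quick check, this sketch (reusing the DataFrame from the earlier examples) confirms that, for this DataFrame, values and to_numpy() return arrays with the same elements.

```python
import numpy
import pandas

# The DataFrame from the earlier examples
actual = pandas.DataFrame(
    [[1, "cooking", 200], [2, "music", 3004], [3, "hand loom", 1000],
     [4, "hand loom", 2000], [5, "dressing", 3000]],
    columns=["id", "work", "wages"],
    index=["person 1", "person 2", "person 3", "person 4", "person 5"],
)

# values is an attribute, so it is accessed without parentheses
via_values = actual.values
via_method = actual.to_numpy()

print(type(via_values))                           # <class 'numpy.ndarray'>
print(numpy.array_equal(via_values, via_method))  # True for this DataFrame
```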

Syntax:
DataFrame_object.values

Example 1: Convert the Entire DataFrame to NumPy Array
Consider the previous DataFrame and convert it to a NumPy array using the values attribute.

import pandas
import numpy

# Consider the pandas DataFrame
actual=pandas.DataFrame([[1,"cooking",200],
                            [2,"music",3004],
                            [3,"hand loom",1000],
                            [4,"hand loom",2000],
                            [5,"dressing",3000]],
                           columns = ['id','work','wages'],
                           index=['person 1','person 2','person 3','person 4','person 5'])

# Use values attribute to convert the above DataFrame to numpy array.
print(actual.values)

print(type(actual.values))

Output:

[[1 'cooking' 200]
 [2 'music' 3004]
 [3 'hand loom' 1000]
 [4 'hand loom' 2000]
 [5 'dressing' 3000]]
<class 'numpy.ndarray'>

Explanation:
All the columns of the DataFrame are converted to the NumPy array, and the result is a numpy.ndarray, just as with to_numpy().

Example 2: Convert Some Columns to NumPy Array
Convert only two columns of the DataFrame to a NumPy array using the values attribute. As before, specify the column names to be converted in a list.

import pandas
import numpy

# Consider the pandas DataFrame
actual=pandas.DataFrame([[1,"cooking",200],
                            [2,"music",3004],
                            [3,"hand loom",1000],
                            [4,"hand loom",2000],
                            [5,"dressing",3000]],
                           columns = ['id','work','wages'],
                           index=['person 1','person 2','person 3','person 4','person 5'])

print(actual[['work','wages']].values)

Output:

[['cooking' 200]
 ['music' 3004]
 ['hand loom' 1000]
 ['hand loom' 2000]
 ['dressing' 3000]]

Only the two selected columns, “work” and “wages”, are converted to the NumPy array.
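A related detail, sketched below with the same DataFrame: selecting a single column with single brackets returns a Series, and its values attribute is a 1-D array, whereas double brackets keep a one-column DataFrame and therefore a 2-D result.

```python
import pandas

# The DataFrame from the earlier examples
actual = pandas.DataFrame(
    [[1, "cooking", 200], [2, "music", 3004], [3, "hand loom", 1000],
     [4, "hand loom", 2000], [5, "dressing", 3000]],
    columns=["id", "work", "wages"],
    index=["person 1", "person 2", "person 3", "person 4", "person 5"],
)

# Single brackets select a Series, whose values attribute is a 1-D array
wages = actual["wages"].values
print(wages.shape)     # (5,)
print(wages.tolist())  # [200, 3004, 1000, 2000, 3000]

# Double brackets keep a one-column DataFrame, so the array stays 2-D
wages_2d = actual[["wages"]].values
print(wages_2d.shape)  # (5, 1)
```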

Method 3: Using to_records()

The to_records() method converts the DataFrame to a NumPy record array (numpy.recarray). Unlike to_numpy() and values, it keeps the index: by default, each record starts with the row's index label, followed by the column values.

Syntax:
DataFrame_object.to_records()
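to_records() also accepts an index parameter, which is True by default. A minimal sketch, reusing the same DataFrame, shows the field names with and without the index; when the index has no name, pandas calls that field "index".

```python
import pandas

# The DataFrame from the earlier examples
actual = pandas.DataFrame(
    [[1, "cooking", 200], [2, "music", 3004], [3, "hand loom", 1000],
     [4, "hand loom", 2000], [5, "dressing", 3000]],
    columns=["id", "work", "wages"],
    index=["person 1", "person 2", "person 3", "person 4", "person 5"],
)

# Default index=True: the unnamed index becomes the first field, called 'index'
with_index = actual.to_records()
print(with_index.dtype.names)  # ('index', 'id', 'work', 'wages')

# index=False leaves the index out of each record
without_index = actual.to_records(index=False)
print(without_index.dtype.names)  # ('id', 'work', 'wages')
print(without_index[0].tolist())  # (1, 'cooking', 200)
```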

Example 1: Convert the Entire DataFrame to NumPy Array
Consider the previous DataFrame and convert it to a NumPy array using the to_records() method.

import pandas
import numpy

# Consider the pandas DataFrame
actual=pandas.DataFrame([[1,"cooking",200],
                            [2,"music",3004],
                            [3,"hand loom",1000],
                            [4,"hand loom",2000],
                            [5,"dressing",3000]],
                           columns = ['id','work','wages'],
                           index=['person 1','person 2','person 3','person 4','person 5'])

# Use to_records() to convert the above DataFrame to numpy array.
print(actual.to_records(),"\n")

# Get the data type
print(type(actual.to_records()))

Output:

[('person 1', 1, 'cooking',  200) ('person 2', 2, 'music', 3004)
 ('person 3', 3, 'hand loom', 1000) ('person 4', 4, 'hand loom', 2000)
 ('person 5', 5, 'dressing', 3000)]

<class 'numpy.recarray'>

Explanation:
All the columns of the DataFrame are converted, and the returned object is a NumPy record array (numpy.recarray). Each record also includes the row's index label.
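Because the result is a record array, individual fields can be read by name, either with square brackets or as attributes. A short sketch, reusing the same DataFrame:

```python
import pandas

# The DataFrame from the earlier examples
actual = pandas.DataFrame(
    [[1, "cooking", 200], [2, "music", 3004], [3, "hand loom", 1000],
     [4, "hand loom", 2000], [5, "dressing", 3000]],
    columns=["id", "work", "wages"],
    index=["person 1", "person 2", "person 3", "person 4", "person 5"],
)

records = actual.to_records()

# Read a whole field by name with square brackets...
print(records["wages"].tolist())  # [200, 3004, 1000, 2000, 3000]
print(records["wages"].sum())     # 9204

# ...or as an attribute, which numpy.recarray also supports
print(records.work[0])            # cooking

# The index labels are stored in the 'index' field
print(records["index"][0])        # person 1
```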

Example 2: Convert Some Columns to NumPy Array
Use the to_records() method to convert the first 2 columns in the DataFrame to a NumPy array.

import pandas
import numpy

# Consider the pandas DataFrame
actual=pandas.DataFrame([[1,"cooking",200],
                            [2,"music",3004],
                            [3,"hand loom",1000],
                            [4,"hand loom",2000],
                            [5,"dressing",3000]],
                           columns = ['id','work','wages'],
                           index=['person 1','person 2','person 3','person 4','person 5'])

# Use to_records() to convert the first 2 columns in the DataFrame to a numpy array.
print(actual[['id','work']].to_records(),"\n")

Output:

[('person 1', 1, 'cooking') ('person 2', 2, 'music')
 ('person 3', 3, 'hand loom') ('person 4', 4, 'hand loom')
 ('person 5', 5, 'dressing')]

The first two columns, “id” and “work”, are converted to a record array, and each record again starts with the index label.

Conclusion

We discussed how a pandas DataFrame can be converted to a NumPy array using three approaches: the to_numpy() method, the values attribute, and the to_records() method. With each approach, we converted either the entire DataFrame or only specific columns. to_numpy() and values return an ndarray without the index labels, while to_records() returns a record array that includes the index in each record.


Source: linuxhint.com
