Pandas & NumPy

Series – A one-dimensional, labeled, array-like structure with homogeneous data: every element shares a single dtype. It can hold any data type, such as integers, floats, and strings; if values of different types are mixed, pandas stores them under the generic object dtype rather than rejecting them. The values are mutable, but the length of a Series is generally treated as fixed. The pd.Series constructor converts a list, tuple, or dictionary into a Series. A Series holds a single column of data.
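
As a minimal sketch of the Series description above, the following builds a Series from a list, a tuple, and a dictionary. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# Build a Series from a list, a tuple, and a dict (sample values are made up)
from_list = pd.Series([10, 20, 30])
from_tuple = pd.Series((1.5, 2.5, 3.5), index=["a", "b", "c"])
from_dict = pd.Series({"x": 1, "y": 2})  # dict keys become the index labels

# Values are mutable: assign through a label
from_dict["x"] = 100

# Mixing types does not fail; the Series uses the generic object dtype
mixed = pd.Series([1, "two", 3.0])

print(from_tuple)
print(from_dict)
print(mixed.dtype)
```
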
DataFrame – A two-dimensional, tabular structure with heterogeneous data: each column can have its own data type. Rows and columns are both labeled, by the row index and the column index respectively. Both the size and the values of a DataFrame are mutable.

import pandas as pd

# General form of the constructor; every argument is optional
dataframe = pd.DataFrame(data=None, index=None, columns=None, dtype=None)

  - data – The input data, which can take various forms such as a Series, dict, ndarray, or list.
  - index – Optional argument that supplies the row labels.
  - columns – Optional argument that supplies the column labels.
  - dtype – Optional argument that forces a data type; only a single dtype is allowed for the whole frame.
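
The constructor arguments listed above can be sketched as follows. The sample data, labels, and variable names are hypothetical and only serve to show each argument in use.

```python
import pandas as pd

# Hypothetical sample data: a dict of equal-length lists, one entry per column
data = {"name": ["Ann", "Bob", "Cy"], "age": [28, 34, 41]}

# index supplies the row labels; columns selects and orders the dict keys
df = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["name", "age"])

# dtype forces one type on the whole frame, so it suits all-numeric data
scores = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"], dtype="float64")

# Both size and values are mutable
df["city"] = ["Oslo", "Rome", "Lima"]  # insert a column
df.loc["r1", "age"] = 29               # change a single value
del df["city"]                         # delete the column again

print(df)
print(scores.dtypes)
```
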

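Panel – Older pandas releases had a third data structure known as Panel, a 3D container capable of storing heterogeneous data. It was never widely used, was deprecated, and has been removed from current pandas releases; a DataFrame with a MultiIndex covers the same use cases.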

Significant features of the pandas Library

Fast and efficient DataFrame object with default and customized indexing.
High-performance merging and joining of data.
Data alignment and integrated handling of missing data.
Label-based slicing, indexing, and subsetting of large data sets.
Reshaping and pivoting of data sets.
Tools for loading data into in-memory data objects from different file formats.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
Time Series functionality.
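
A few of the features listed above (merging, group-by aggregation, pivoting, and label alignment with missing data) can be illustrated with a short sketch. The tables and column names are invented for the example.

```python
import pandas as pd

# Hypothetical sample tables
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "units": [10, 12, 7, 9],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Rome"]})

# Merging/joining: attach the city to every sales row
merged = sales.merge(stores, on="store", how="left")

# Group by + aggregation: total units per store
totals = merged.groupby("store")["units"].sum()

# Reshaping/pivoting: one row per store, one column per month
wide = sales.pivot(index="store", columns="month", values="units")

# Data alignment: arithmetic matches on labels; unmatched labels give NaN
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([10, 20], index=["b", "c"])
aligned = s1 + s2                  # only label "b" exists on both sides
filled = s1.add(s2, fill_value=0)  # treat a missing label as 0 instead

print(totals)
print(wide)
print(aligned)
```
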

Time Series in Pandas

A time series is an organized collection of data that depicts the evolution of a quantity through time.

Supported by pandas:

Analyzing time-series data from a variety of sources and formats.
Creating time and date sequences with preset frequencies.
Manipulating and converting dates and times with timezone information.
Resampling or converting a time series to a specific frequency.
Calculating dates and times using absolute or relative time increments.
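
The capabilities above can be sketched in a few lines. The dates, values, and the chosen timezone are arbitrary assumptions made for the example.

```python
import pandas as pd

# Parse date strings into timestamps
stamps = pd.to_datetime(["2024-01-01", "2024-01-02"])

# A date sequence with a preset (daily) frequency, starting on a Monday
idx = pd.date_range("2024-01-01", periods=14, freq="D")
ts = pd.Series(range(14), index=idx)

# Resample the daily series to weekly totals (weeks end on Sunday)
weekly = ts.resample("W").sum()

# Attach a timezone, then convert to another one
ts_utc = ts.tz_localize("UTC")
ts_ny = ts_utc.tz_convert("America/New_York")

# Date arithmetic with an absolute and a relative (calendar) increment
three_days_later = idx[0] + pd.Timedelta(days=3)
next_month = idx[0] + pd.DateOffset(months=1)

print(weekly)
print(three_days_later, next_month)
```
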

MultiIndexing in Pandas

MultiIndexing (hierarchical indexing) places two or more index levels on a single axis. It is essential for data analysis and manipulation with higher-dimensional data, because it lets us store and manipulate data with an arbitrary number of dimensions in lower-dimensional structures such as Series (1D) and DataFrame (2D).
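
As a rough sketch of the idea, the Series below is one-dimensional but carries two index levels, so it effectively stores two-dimensional data. The cities, years, and figures are hypothetical.

```python
import pandas as pd

# Hypothetical data: two index levels (city, year) on a one-dimensional Series
index = pd.MultiIndex.from_tuples(
    [("Oslo", 2022), ("Oslo", 2023), ("Rome", 2022), ("Rome", 2023)],
    names=["city", "year"],
)
population = pd.Series([700, 710, 2750, 2760], index=index)

# Select by the outer level, or by a full (city, year) key
oslo = population.loc["Oslo"]               # Series indexed by year
rome_2023 = population.loc[("Rome", 2023)]  # a single value

# unstack() pivots the inner level into columns, giving a 2-D DataFrame
table = population.unstack("year")

print(oslo)
print(table)
```
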

Copy of the series in Pandas

We can create a copy of a Series with the following syntax: Series.copy(deep=True)

The deep parameter defaults to True, which creates a new object with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy are not reflected in the original object.

With deep=False, a new object is created without copying the calling object’s data or index; only references to the data and index are copied. Changes made to the data of the original are then reflected in the shallow copy and vice versa. This sharing describes classic pandas behavior; with Copy-on-Write enabled (the default from pandas 3.0), changes are no longer propagated between the original and the shallow copy.
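
A minimal sketch of the two copy modes follows. Only the deep-copy result is stated, because what a shallow copy shares with the original depends on the pandas version and on whether Copy-on-Write is active. The variable names are illustrative.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

deep = original.copy()               # deep=True is the default
shallow = original.copy(deep=False)  # new object; data is not copied eagerly

# Changing the deep copy leaves the original untouched
deep["a"] = 100

print(original["a"])  # 1
print(deep["a"])      # 100
```
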

Categorical data in Pandas

Categorical data takes its values from a limited, usually fixed, set of possible categories. The values need not be numerical; they can be textual. Examples are gender, social class, blood type, country affiliation, and observation time. Whether a column should be treated as categorical is a judgment call that relies on domain knowledge of the data set.
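
In pandas, such data can be stored with the category dtype. The sketch below uses made-up blood-type and social-class values; the ordering of the class levels is an assumption chosen for the example.

```python
import pandas as pd

# Hypothetical sample: blood types stored as plain strings
blood = pd.Series(["A", "O", "B", "O", "AB", "O"])

# Convert to the category dtype; the distinct values become the categories
blood_cat = blood.astype("category")
print(list(blood_cat.cat.categories))  # ['A', 'AB', 'B', 'O']

# Ordered categories support comparisons such as min() and max()
levels = pd.Categorical(
    ["low", "high", "middle", "low"],
    categories=["low", "middle", "high"],
    ordered=True,
)
print(levels.max())  # high
```
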

Reindexing in Pandas

Reindexing alters the row labels and column labels of a DataFrame. More precisely, it conforms a DataFrame to a new index, with optional filling logic. Labels in the new index that have no counterpart in the original are given NA/NaN values by the reindex() method. A new object is returned unless the new index is equivalent to the current one and copy=False.
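
The behavior described above can be sketched with a small, made-up DataFrame: one call reindexes the rows, one adds filling logic, and one reindexes the columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Rows: "w" is not in the original index, so its values become NaN
rows = df.reindex(["z", "y", "w"])

# Optional filling logic: supply a value for new labels instead of NaN
filled = df.reindex(["z", "y", "w"], fill_value=0)

# Columns: reorder and add a new, empty column "c"
cols = df.reindex(columns=["b", "a", "c"])

print(rows)
print(filled)
print(cols)
```
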
