NumPy

  • NumPy (sometimes written Numpy) is Python's core library for numerical arrays and linear algebra
  • Almost all libraries in the PyData ecosystem rely on NumPy as one of their main building blocks
  • NumPy is fast because its core operations are implemented in compiled C code
  • NumPy arrays can have any number of dimensions; the two most common shapes are vectors and matrices
  • Vectors are 1D arrays and matrices are 2D arrays
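
As a small illustration of the array shapes described above, the sketch below builds a 1D and a 2D array and applies vectorised arithmetic. The variable names and values are made up for the example.

```python
import numpy as np

# A 1D array (a "vector") and a 2D array (a "matrix").
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(vector.ndim, vector.shape)   # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (2, 3)

# Element-wise arithmetic runs in compiled code, not in a Python loop.
doubled = vector * 2

# Matrix-vector product: shape (2, 3) @ shape (3,) gives shape (2,).
product = matrix @ vector
print(doubled, product)            # [2 4 6] [14 32]
```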

Pandas

  • Pandas is an open source library built on top of NumPy
  • It allows for fast analysis and data cleaning and preparation
  • It excels in performance and productivity
  • It also has built-in visualisation features.
  • It can work with data from a wide variety of sources.

Data Structures in Pandas

Simple and flexible data structures make it fast and efficient.

  • Series – A one-dimensional, labelled, array-like structure. Each Series has a single dtype, so its values are stored as one data type such as integers, floats, or strings; mixed values are held under the generic object dtype. Its values are mutable, but its size is fixed. The pd.Series constructor converts a list, tuple, or dictionary into a Series. A Series cannot contain multiple columns.
  • DataFrame – A two-dimensional, tabular structure with heterogeneous data: each column can hold a different data type. Rows and columns are both labelled, by the row index and the column index respectively. Both the size and the values of a DataFrame are mutable.

                  import pandas as pd

                  dataframe = pd.DataFrame(data, index, columns, dtype)

      • data – The input data, which can be an ndarray, a Series, a list, a dict, and so on.
      • index – Optional argument giving the row labels.
      • columns – Optional argument giving the column labels.
      • dtype – Optional argument that forces a single data type for the whole DataFrame; when omitted, the type of each column is inferred.
  • Panel – Older versions of pandas had a third data structure, Panel, a 3D container for heterogeneous data. It was never widely used, and it has since been deprecated and removed; a DataFrame with a MultiIndex covers the same use cases.
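
To make the constructors above concrete, here is a minimal sketch that builds Series objects from a list, a tuple, and a dict, then a DataFrame with explicit row and column labels. All names and values are invented for illustration.

```python
import pandas as pd

# Build a Series from a list, a tuple, and a dict.
# With a dict, the keys become the index labels.
from_list = pd.Series([10, 20, 30])
from_tuple = pd.Series((1.5, 2.5))
from_dict = pd.Series({"a": 1, "b": 2})

# Values are mutable in place.
from_dict["a"] = 100

# A DataFrame with explicit row labels (index) and column labels.
data = [["Ann", 31], ["Bob", 27]]
frame = pd.DataFrame(data, index=["r1", "r2"], columns=["name", "age"])

# Each column carries its own dtype.
print(frame.dtypes)
print(frame.loc["r2", "age"])   # 27
```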

Significant features of the pandas Library

  1. Fast and efficient DataFrame object with default and customized indexing.
  2. High-performance merging and joining of data.
  3. Data alignment and integrated handling of missing data.
  4. Label-based slicing, indexing, and subsetting of large data sets.
  5. Reshaping and pivoting of data sets.
  6. Tools for loading data into in-memory data objects from different file formats.
  7. Columns from a data structure can be deleted or inserted.
  8. Group by data for aggregation and transformations.
  9. Time Series functionality.
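
A few of the features listed above can be sketched in one short script. The sales and regions tables are hypothetical, and the example covers only merging, handling of missing data, grouping, pivoting, and inserting or deleting a column.

```python
import pandas as pd

# Hypothetical sales and region tables, invented for this sketch.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, 120.0, 80.0, None],   # one missing value
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Merging / joining on a shared key.
merged = sales.merge(regions, on="store", how="left")

# Group by with aggregation; missing values are skipped by default.
totals = merged.groupby("region")["revenue"].sum()

# Reshaping: pivot the long table into a store-by-quarter table.
wide = sales.pivot(index="store", columns="quarter", values="revenue")

# Inserting and then deleting a column.
merged["revenue_k"] = merged["revenue"] / 1000
merged = merged.drop(columns=["revenue_k"])

print(totals)
print(wide)
```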

Time Series in Pandas

A time series is an organized collection of data that depicts the evolution of a quantity through time. 

pandas supports:

  • Parsing and analyzing time-series data from a variety of sources and formats.
  • Creating time and date sequences with preset frequencies.
  • Manipulating and converting dates and times, including timezone information.
  • Resampling or converting a time series to a specific frequency.
  • Calculating dates and times using absolute or relative time increments.
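
The capabilities above might look like this in practice. This is a sketch with made-up dates and values; it uses a fixed UTC+1 offset for the timezone conversion so that it does not depend on a timezone database, though a named zone such as "Europe/Berlin" can be passed the same way where one is available.

```python
from datetime import timedelta, timezone

import pandas as pd

# A date sequence with a preset frequency: six consecutive days.
days = pd.date_range("2024-01-01", periods=6, freq="D")
series = pd.Series([1, 2, 3, 4, 5, 6], index=days)

# Resample to 2-day bins and sum each bin.
two_day = series.resample("2D").sum()
print(two_day.tolist())   # [3, 7, 11]

# Attach a timezone, then convert to a fixed UTC+1 offset.
utc = series.tz_localize("UTC")
plus_one = utc.tz_convert(timezone(timedelta(hours=1)))

# Relative increment: move a timestamp forward by a fixed duration.
later = days[0] + pd.Timedelta(days=10)

# Parse strings in a known day/month/year format into timestamps.
parsed = pd.to_datetime(["01/02/2024", "15/02/2024"], format="%d/%m/%Y")
```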

MultiIndexing in Pandas

Multi-level (hierarchical) indexing is an essential tool for data analysis and manipulation, especially when working with higher-dimensional data. It lets us store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures like Series and DataFrame.
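
As a minimal sketch of the idea, the Series below stores two-dimensional data (city by year) under a two-level index. The cities, years, and temperatures are invented for illustration.

```python
import pandas as pd

# Hypothetical measurements keyed by (city, year).
index = pd.MultiIndex.from_tuples(
    [("Oslo", 2022), ("Oslo", 2023), ("Rome", 2022), ("Rome", 2023)],
    names=["city", "year"],
)
temps = pd.Series([5.1, 5.4, 15.2, 15.9], index=index)

# Select everything under one outer label.
oslo = temps.loc["Oslo"]

# Select a single value by its full key.
rome_2023 = temps.loc[("Rome", 2023)]

# Move the inner level into columns, giving a 2D city-by-year table.
table = temps.unstack("year")
print(table)
```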

Copy of the series in Pandas

We can create a copy of a Series with the following syntax: Series.copy(deep=True). The deep parameter defaults to True, in which case a new object is created with its own copy of the calling object's data and indices; modifications to the data or indices of the copy are not reflected in the original object. With deep=False, a new object is created without copying the calling object's data or index, i.e. only references to the data and index are copied. Changes made to the data of the original object are then reflected in the shallow copy and vice versa, although recent pandas versions with Copy-on-Write enabled no longer propagate such changes.
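
The difference between the two kinds of copy can be checked by asking NumPy whether the underlying buffers overlap. This is a sketch only: whether a later write through a shallow copy shows up in the original depends on the pandas version and its Copy-on-Write setting, so the example inspects memory sharing rather than relying on that behaviour.

```python
import numpy as np
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

deep = original.copy()                # deep=True is the default
shallow = original.copy(deep=False)

# The deep copy owns its data; the shallow copy initially points at
# the same underlying buffer as the original.
deep_shares = np.shares_memory(original.to_numpy(), deep.to_numpy())
shallow_shares = np.shares_memory(original.to_numpy(), shallow.to_numpy())
print(deep_shares, shallow_shares)

# Changing the deep copy leaves the original untouched.
deep["a"] = 99
print(original["a"], deep["a"])       # 1 99
```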

Categorical data in Pandas

Categorical data takes its values from a fixed, discrete set of possible categories. The values need not be numerical; they can be textual. Examples are gender, social class, blood type, country affiliation, observation time, etc. Deciding whether a column should be treated as categorical takes domain knowledge of the data set.
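
A short sketch of how such data can be represented in pandas follows, using the category dtype for blood types and an ordered categorical for social class. The labels and their ordering are assumptions made for the example.

```python
import pandas as pd

# Blood types stored as a categorical with a fixed set of values.
blood = pd.Series(["A", "O", "B", "O", "AB"], dtype="category")
categories = list(blood.cat.categories)   # sorted unique labels
codes = blood.cat.codes.tolist()          # integer code for each value
print(categories)   # ['A', 'AB', 'B', 'O']
print(codes)        # [0, 3, 2, 3, 1]

# An ordered categorical supports comparisons that follow the given order.
levels = pd.CategoricalDtype(["low", "middle", "high"], ordered=True)
social = pd.Series(["middle", "low", "high"], dtype=levels)
above_low = (social > "low").tolist()
print(above_low)    # [True, False, True]
```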

Reindexing in Pandas

Reindexing changes the row labels and column labels of a DataFrame. More precisely, it conforms a DataFrame to a new index with optional filling logic. Labels in the new index that have no counterpart in the original are given NA/NaN values by the reindex() method. A new object is returned unless the new index is equivalent to the current one and copy=False.
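
The behaviour described above can be sketched with a small DataFrame whose labels and values are made up for the example. It reindexes the rows, then the columns, and shows the optional fill_value argument.

```python
import pandas as pd

frame = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Conform to a new row index: "d" was not present, so its row is NaN.
rows = frame.reindex(["a", "c", "d"])

# Same labels, but new rows get a chosen value instead of NaN.
filled = frame.reindex(["a", "c", "d"], fill_value=0)

# Reindex the columns: "z" is new and is filled with NaN.
cols = frame.reindex(columns=["y", "z"])

print(rows)
print(filled)
print(cols)
```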
