Week 2 - BALT 4396 - Chapter 3

 

Handling and Cleaning Data with Python Libraries


Introduction to Pandas/How to Import Data
What is pandas? Pandas is an open-source Python library that anyone can use and contribute to. It provides prewritten functions and data structures that can be applied to almost any dataset. It is time-efficient and flexible, since users can easily bring these functions and features into their own projects and collaborations.

1. How do you use Pandas in Python? To use Pandas you import the library: "import pandas as pd"

  • There are two primary data structures. 
  • Series: A Series is a one-dimensional labeled array. It can hold any data type. 
  • DataFrame: A DataFrame is a two-dimensional labeled table (rows and columns) in which each column can hold a different data type. 
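
To make the two structures concrete, here is a small sketch. The names and values (ages, people, and the data inside them) are made up for illustration and are not from the chapter.

```python
import pandas as pd

# A Series: one-dimensional, with an index label for each value
ages = pd.Series([23, 35, 41], index=["ana", "ben", "cara"])

# A DataFrame: two-dimensional, and each column can have its own data type
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],     # text
    "age": [23, 35, 41],                # integers
    "member": [True, False, True],      # booleans
})

print(ages["ben"])      # look up a value by its label
print(people.shape)     # (number of rows, number of columns)
print(people.dtypes)    # data type of each column
```

Each column of a DataFrame is itself a Series, so people["age"] gives back a one-dimensional Series.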

2. How do you import data with pandas? Pandas lets users import data from file formats and sources such as Excel, JSON, CSV, and SQL databases.

Steps for importing data with pandas:

import pandas as pd
# Reading data from a CSV file into a DataFrame
data = pd.read_csv('data.csv')
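
The steps above need a file named data.csv to exist. As a minimal sketch that runs without any file, the example below reads the same kind of data from a string instead. The CSV text and its column names are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (made-up data, for illustration only)
csv_text = "name,score\nAna,90\nBen,85\n"

# read_csv accepts a file path or a file-like object such as StringIO
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first rows of the DataFrame
print(data.shape)    # (number of rows, number of columns)
```

For the other formats, pandas has matching readers such as pd.read_excel, pd.read_json, and pd.read_sql. They are not shown here because they need extra setup (for example, an Excel file or a database connection).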


Introduction to NumPy
What is NumPy? NumPy is another Python library. It provides arrays and matrices that are efficient because every element shares one data type and the values are stored compactly in memory. The library also includes math functions that operate on whole arrays and matrices at once.


1. How do you use NumPy? Import the library "import numpy as np"

Steps for creating a NumPy array:
import numpy as np
# Creating a NumPy array from a Python list
array = np.array([1, 2, 3, 4, 5])
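
As a short sketch of the math functions mentioned above, the example below runs a few of them on the same array. The variable names other than array are chosen here for illustration.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

doubled = array * 2        # arithmetic is applied to every element
total = array.sum()        # sum of all elements
mean = array.mean()        # average of all elements
roots = np.sqrt(array)     # square root of every element

# A two-dimensional array can be used as a matrix
matrix = np.array([[1, 2], [3, 4]])
product = matrix @ matrix  # matrix multiplication

print(doubled, total, mean)
print(roots)
print(product)
```

No loop is needed: NumPy applies each operation to the whole array at once.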

Introduction to Data Cleaning with Pandas
What is data cleaning in pandas? Data cleaning is the preparation of data for analysis. During cleaning we look for incorrect, duplicated, and irrelevant data so we can correct or remove it, and we also check for missing data. Doing this helps ensure that the analysis of the cleaned data gives accurate results.
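
The cleaning process described above can be sketched in a few lines. This is only one possible approach and is not taken from the book. The table (raw), its column names, and the choice to fill a missing score with the column average are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up messy data: one duplicated row, one missing score, one placeholder city
raw = pd.DataFrame({
    "Name": ["Ana", "Ben", "Ben", "Cara"],
    "Score": [90.0, 85.0, 85.0, np.nan],
    "City": ["NYC", "LA", "LA", "N/A"],
})

# Removing duplicates: drop rows that repeat an earlier row exactly
clean = raw.drop_duplicates()

# Renaming columns: switch to lowercase names
clean = clean.rename(columns={"Name": "name", "Score": "score", "City": "city"})

# Replacing values: turn the "N/A" placeholder into a clearer label
clean = clean.replace({"city": {"N/A": "Unknown"}})

# Handling missing data: fill the missing score with the column average
# (an assumption; dropping the row with dropna() is another option)
clean["score"] = clean["score"].fillna(clean["score"].mean())

# Renumber the rows after the duplicate was dropped
clean = clean.reset_index(drop=True)

print(clean)
```

Whether to fill or drop missing values depends on the dataset, so the fill step here should be read as an example and not as a rule.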

(The book Data Toolkit: Python + Hands on Math, on page 20 of the PDF, walks through each of the cleaning steps in Python: handling missing data, removing duplicates, renaming columns, and replacing values. I would give my own explanation of how to do this in Python, but I feel that Professor Kelsey did a good job and I'd only make it confusing.) 
