Data Cleaning with Pandas

Learn how to clean data with concise, real-life strategies, regardless of where your data comes from: databases, scraped websites, and so on.

9 hours

20 assignments

9 videos

Intermediate

Course Description

Data cleaning is fundamental to every Data Science project. Regardless of where you get your data from, it is never perfect: invalid values, odd types, outliers, or statistically insignificant values will turn up at some point. This course teaches you the common issues found in messy data and how to fix them with simple, clear techniques.
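As a rough illustration of the issues named above (odd types, invalid values, and missing data), here is a minimal pandas sketch. The DataFrame, its column names, and the 0 to 120 age range are assumptions made up for this example; they are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: column names and values are invented for illustration.
df = pd.DataFrame({
    "age": ["25", "31", "unknown", "42", "290"],
    "city": ["Boston", None, "Denver", "Austin", "Miami"],
})

# Odd types: the ages arrive as strings. Coerce them to numbers;
# entries that cannot be parsed ("unknown") become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Invalid values: treat ages outside an assumed plausible range (0 to 120)
# as missing. NaN fails the between() check, so it stays NaN.
df.loc[~df["age"].between(0, 120), "age"] = np.nan

# Missing values: count them per column before deciding what to do.
missing_per_column = df.isna().sum()

# One possible strategy: fill numeric gaps with the median,
# and drop rows where a required text field is missing.
df["age"] = df["age"].fillna(df["age"].median())
cleaned = df.dropna(subset=["city"])

print(missing_per_column)
print(cleaned)
```

Whether to fill, drop, or keep a suspicious value depends on the dataset; the median fill here is only one option among several.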

"For this course, I focused on proven strategies that I always follow when cleaning data for my own projects. Hopefully sharing these methods will help your process run smoothly."

Santiago Basulto

Course Instructor

Curriculum

Dealing with Missing Data

Duplicated Values and Invalid Types

String Handling and Invalid Values

Final Project

Dealing with Missing Data: 7 lessons

Intro to Data Cleaning

Missing Values in Python and NumPy

NumPy Cleaning NaNs Exercises

Missing Values in Pandas

Missing Values Exercises 1

Dealing with Missing Data in Pandas

Missing Values Exercises 2

Final Project: 1 lesson

Final Project

String Handling and Invalid Values: 6 lessons

Pandas String Handling

String Handling Exercises 1

String Handling Exercises 2

Dealing with Invalid Values

Invalid Values Exercises

Using Stats to Clean Data

Duplicated Values and Invalid Types: 4 lessons

Dealing with Duplicated Values in Pandas

Duplicated Values Exercises

Dealing with Invalid Types

Invalid Types Exercises
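To give a feel for the later curriculum topics (string handling, invalid types, and duplicated values), here is a small sketch using standard pandas calls. The `people` table and its contents are hypothetical and are not drawn from the course exercises.

```python
import pandas as pd

# Hypothetical records: names with inconsistent whitespace and casing,
# and dates stored as strings, one of which is not a date at all.
people = pd.DataFrame({
    "name": ["  Ana ", "BOB", "bob", "Cara"],
    "joined": ["2020-01-05", "2020-02-10", "2020-02-10", "not a date"],
})

# String handling: trim whitespace and normalize casing so that
# "BOB" and "bob" compare as the same value.
people["name"] = people["name"].str.strip().str.title()

# Invalid types: parse the strings as dates; unparseable entries become NaT.
people["joined"] = pd.to_datetime(people["joined"], errors="coerce")

# Duplicated values: flag rows that repeat an earlier row, then drop them.
dupe_mask = people.duplicated()
deduped = people.drop_duplicates().reset_index(drop=True)

print(dupe_mask.tolist())
print(deduped)
```

Note that the duplicate only becomes detectable after the names are normalized, which is why the string cleanup runs first in this sketch.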

Simple pricing

Gain access to RMOTR’s entire course library

$49 per month

15% OFF!

$490 per year

  • Full access to all videos & courses.
  • 200+ programming assignments.
  • 20+ GitHub projects to add to your portfolio.
  • Access to exclusive community.
  • Exclusive access to members-only webinars and workshops.

Course final project

New York City Airbnb Open Data

For this project, you’ll apply all the techniques discussed in this course. You’ll be doing a fair amount of analysis as practice to help you better understand when values are valid or invalid given their context.

Testimonials

What Our Students Have to Say

The most trusted Data Science academy online.
1000+ students have trusted us with their Data Science careers.

William Ponton

The perfect combination of building real-world skills through challenging coursework and projects.

Aiya Akatayeva

I have tried multiple ways to learn Data Science. Now, with the lessons and practice provided here, I finally feel like I am making real progress.

Chris McCluskey

Clear guidance while providing detailed explanations. Greatly enhanced my knowledge in a short amount of time!

Course instructor

Santiago Basulto

Data Scientist at RMOTR

Santiago is a Data Scientist and Python programmer with more than 10 years of experience in the field. He started as a Java developer, working as a consultant on high-performance and critical systems, before moving to Python. In 2012 he was hired as the CTO of Athlete.com, a startup analyzing data from runners. He then fell in love with data processing in Python, and in 2015 he founded RMOTR to provide expert-level Data Science training.