Exploring NASA's Meteorite Landing Dataset: Part 2

Exploratory Data Analysis of NASA Meteorite Landing Data

The aim of this study is to use SQL and Python to analyze a large dataset and derive key insights and patterns from the data. First, the environment for data analysis was set up under the following assumptions:

  • Python is already installed in the system

  • Jupyter notebook is already installed in the system

  • A GitHub repository is already created to store the code

  • Git is already installed in the system

  • The dataset is downloaded in csv format

The author has created a folder on GitHub that contains all the files related to this project.

Link - https://github.com/isuri-balasooriya2/TheMathLab/tree/main/NASA_Meteorite_Landing_EDA

Data Cleaning

As the first step, an SQLite database was created and the dataset was loaded using the pandas library in Python.

An SQL table was then created to hold the dataset. The GeoLocation column was dropped at this stage because it stores the coordinates as a text value. Since the dataset already contains separate latitude and longitude columns, geospatial analysis can be done using those two columns.
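The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the table name `meteorites`, the column names, and the sample rows are all assumptions made so the snippet runs on its own. With the real file, the DataFrame would come from `pd.read_csv` on the downloaded CSV instead.

```python
import sqlite3
import pandas as pd

# Hypothetical sample rows standing in for the downloaded CSV.
# Column names are assumed; adjust them to match the real file.
raw = pd.DataFrame({
    "name": ["Sample A", "Sample B", "Sample C"],
    "recclass": ["L6", "H5", "L6"],
    "mass": [21.0, 720.0, 107000.0],
    "year": [1880.0, 1951.0, 1952.0],
    "reclat": [50.775, 56.183, 54.217],
    "reclong": [6.083, 10.233, -113.0],
    "GeoLocation": ["(50.775, 6.083)", "(56.183, 10.233)", "(54.217, -113.0)"],
})

# Drop the text-based GeoLocation column; latitude and longitude remain.
df = raw.drop(columns=["GeoLocation"])

# An in-memory database keeps the sketch self-contained;
# pass a file path to sqlite3.connect to persist the data.
conn = sqlite3.connect(":memory:")
df.to_sql("meteorites", conn, if_exists="replace", index=False)

# Read the table back to confirm what was stored.
loaded = pd.read_sql_query("SELECT * FROM meteorites", conn)
print(loaded.columns.tolist())
```

Letting `to_sql` create the table is the shortest route. Writing an explicit `CREATE TABLE` statement first gives more control over column types.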

The first step of data cleaning was to identify null values. The following columns were checked for null values:

  • Mass

  • Latitude

  • Longitude

Based on the type of data and the size of the affected data portion, it was decided to remove the records with empty values for mass, latitude, longitude and geolocation.
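One way to express the null check and the removal in SQL is sketched below. The table name, the column names, and the four sample rows are assumptions for illustration. The sketch only treats SQL `NULL` as "empty", so empty strings would need an extra condition.

```python
import sqlite3
import pandas as pd

# Hypothetical rows: one is missing mass, one is missing its coordinates.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "mass": [10.0, None, 30.0, 40.0],
    "reclat": [1.0, 2.0, None, 4.0],
    "reclong": [1.0, 2.0, None, 4.0],
})
conn = sqlite3.connect(":memory:")
df.to_sql("meteorites", conn, index=False)

# Count nulls per column; in SQLite, "x IS NULL" evaluates to 0 or 1.
null_counts = pd.read_sql_query(
    """
    SELECT SUM(mass IS NULL)    AS null_mass,
           SUM(reclat IS NULL)  AS null_lat,
           SUM(reclong IS NULL) AS null_long
    FROM meteorites
    """,
    conn,
)
print(null_counts)

# Remove every record that is missing any of the three values.
conn.execute(
    "DELETE FROM meteorites "
    "WHERE mass IS NULL OR reclat IS NULL OR reclong IS NULL"
)
conn.commit()

remaining = pd.read_sql_query(
    "SELECT COUNT(*) AS n FROM meteorites", conn
)["n"].iloc[0]
print(remaining)
```

Counting first and deleting afterwards makes it easy to judge how large the affected portion is before any data is removed.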

Further data cleaning fixed smaller issues: the GeoLocation column, which stores text values, was removed, and the year column, which was being read as a float, was converted to an integer.
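The article does not say whether the year fix was done in SQL or in pandas. A possible pandas version is sketched here, assuming the column is named `year`. The nullable `Int64` dtype is used so that a missing year stays missing rather than raising an error.

```python
import pandas as pd

# Hypothetical frame: the year was read as float, with one missing value.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "year": [1880.0, 1951.0, None],
})

# Convert float years to pandas' nullable integer type.
df["year"] = df["year"].astype("Int64")

print(df.dtypes)
print(df)
```

A plain `astype(int)` would fail if any year were missing, which is why the nullable type is the safer choice in this sketch.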

Analytical Queries

Once the data cleaning was completed, simple analytical queries were used to identify basic patterns and information about the data. Below is a portion of the cleaned dataset.

Below are some of the findings from using general analytical querying on the dataset.

  • The 10 heaviest meteorites

  • The number of unique meteorite classes in the dataset. There are 423 unique classes of meteorites

  • The number of meteorites per class. Class L6 has the highest number of meteorite landings

  • Meteorite landings after the year 2000

While executing the above query, it was noticed that one record has the year 2101, which is an incorrect value. This means further data cleaning is needed to verify that every value in the year column is less than or equal to the current year.

  • The average mass of the meteorites
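The queries listed above might look roughly like the following. This is a sketch under assumptions rather than the author's exact SQL: the table name `meteorites`, the columns `name`, `recclass`, `mass` and `year`, and the five sample rows are all hypothetical. The "after 2000" query also adds an upper bound on the year, as a guard against values such as 2101.

```python
import sqlite3
from datetime import date

import pandas as pd

# Hypothetical cleaned data, including one implausible future year.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "recclass": ["L6", "L6", "H5", "H5", "L6"],
    "mass": [100.0, 250.0, 50.0, 400.0, 200.0],
    "year": [1999, 2005, 2101, 2010, 1950],
})
conn = sqlite3.connect(":memory:")
df.to_sql("meteorites", conn, index=False)


def q(sql, params=()):
    """Run a query and return the result as a DataFrame."""
    return pd.read_sql_query(sql, conn, params=params)


# The 10 heaviest meteorites.
heaviest = q("SELECT name, mass FROM meteorites ORDER BY mass DESC LIMIT 10")

# Number of unique meteorite classes.
n_classes = q("SELECT COUNT(DISTINCT recclass) AS n FROM meteorites")

# Number of meteorites per class, most common first.
per_class = q(
    "SELECT recclass, COUNT(*) AS landings FROM meteorites "
    "GROUP BY recclass ORDER BY landings DESC"
)

# Landings after 2000, excluding years later than the current year.
after_2000 = q(
    "SELECT name, year FROM meteorites WHERE year > 2000 AND year <= ?",
    (date.today().year,),
)

# Average mass across all meteorites.
avg_mass = q("SELECT AVG(mass) AS avg_mass FROM meteorites")

for result in (heaviest, n_classes, per_class, after_2000, avg_mass):
    print(result, end="\n\n")
```

Passing the current year as a query parameter, rather than hard-coding it, keeps the year check valid when the analysis is rerun later.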