Generating Insights
The analysis of meteorites is done based on several factors, but the mass of a meteorite plays a significant role in the studying of meteorites. Based on the recorded mass data in the dataset, the meteorites were categorized as small, medium, large and massive. According to National Geographic, a meteorite is considered as massive if the size is more than 60 tons (National Geographic, 2024).
conn = sqlite3.connect("meteortite_landing.db")
query = '''
SELECT name, "mass (g)", year,
CASE
WHEN "mass (g)" <100 THEN 'small'
WHEN "mass (g)" BETWEEN 100 AND 10000 THEN 'medium'
WHEN "mass (g)" BETWEEN 10000 AND 100000 THEN 'large'
ELSE 'massive'
END AS 'Size_Category'
FROM meteorites
WHERE Size_Category='massive';
'''
df = pd.read_sql(query,conn);
conn.close()
df
By using a WHERE clause in the above SQL query, it was identified that 25804 meteorites were small, 11208 meteorites were medium, 1000 meteorites were large and only 270 meteorites were massive. Its clear from this analysis that most meteorites are of small size which is less than 100 grams. Compared to that number only a small about of meteorites weigh more than 60 tons which categorizes as massive.
- Finding the most active years for meteorite landings
conn = sqlite3.connect("meteortite_landing.db")
query = '''
SELECT year, COUNT(*) AS count
FROM meteorites
WHERE year IS NOT NULL
GROUP BY year
ORDER BY count DESC
LIMIT 10;
'''
df = pd.read_sql(query,conn);
conn.close()
df
- The most impact to earth or environment can happen with massive meteorites. In order to analyze that further, the most number of massive meteorite landings were identified based on the year.
conn = sqlite3.connect("meteortite_landing.db")
query = '''
SELECT year, COUNT(*) AS count,
CASE
WHEN "mass (g)" <100 THEN 'small'
WHEN "mass (g)" BETWEEN 100 AND 10000 THEN 'medium'
WHEN "mass (g)" BETWEEN 10000 AND 100000 THEN 'large'
ELSE 'massive'
END AS 'Size_Category'
FROM meteorites
WHERE Size_Category='massive' AND year >0
GROUP BY year
ORDER BY count DESC
LIMIT 10;
'''
df = pd.read_sql(query,conn);
conn.close()
df
Linking meteorite landings to continents can help understand which areas are more prone to meteorites. This was done using the latitude and longitude data, the geo location data for continents were taken from the Google dataset called countries.csv
The results of this were inconclusive as more than 80% of the meteorites falls under the category unknown.
conn = sqlite3.connect("meteortite_landing.db")
query = '''
SELECT
COUNT(*) AS count,
CASE
WHEN reclat BETWEEN -35 AND 37.3 AND reclong BETWEEN 17.5 AND 51.4 THEN 'Africa'
WHEN reclat BETWEEN -55 AND 12.5 AND reclong BETWEEN -81.7 AND -34.8 THEN 'South America'
WHEN reclat BETWEEN 7.2 AND 83.1 AND reclong BETWEEN -168.1 AND -52.6 THEN 'North America'
WHEN reclat BETWEEN 34.6 AND 71.2 AND reclong BETWEEN -31.3 AND 69 THEN 'Europe'
WHEN reclat BETWEEN -55.1 AND -9.1 AND reclong BETWEEN 112.9 AND 179.9 THEN 'Australia'
WHEN reclat BETWEEN -10 AND 81 AND reclong BETWEEN 25 AND 180 THEN 'Asia'
ELSE 'Unknown'
END AS region
FROM meteorites
GROUP BY region
ORDER BY count DESC;
'''
df = pd.read_sql(query,conn);
conn.close()
df
Data Visualization
This study used python libraries matplotlib and seaborn for data visualization.
- Plotting the meteorite landings per year
- Mapping meteorite landings. In order to visualize geospatial data, the python library folium was used.
- Top 10 largest meteorites
- Top 10 meteorite classes
- Relationship between massive meteorites and their landing continent
Key Findings
Based on this exploratory data analysis, the following key insights were identified,
Most of the meteors are small in size with a mass of less than 100g. out of 35000 meteors only 270 are considered massive with a mass exceeding 60 tons. What we can understand of that is that most meteorites either vaporize by the time they reach the earths ground or they come from smaller meteors or meteoroids from space.
Most meteorite landings were recorded in 1979 with 3045 records.
Most of the massive meteorites were recorded after 1900
Most of the meteorite landings were recorded from Asia and least from Africa
Most of the massive meteorites were recorded in North America and least from Africa
Limitations of the Data
Incomplete Records – Some meteorite entries lack key attributes like mass and geolocation, which led to those records being dropped before analysis
Mass Distribution Bias – The dataset contains many small meteorites but fewer large ones, potentially skewing mass-based insights.
Geographical Reporting Bias – More meteorites are reported in regions with active research (e.g., North America, Europe), while remote areas (e.g., deep oceans, Antarctica) may be underrepresented.
Geolocation accuracy – more than 80% of the meteorite landings in the dataset doesn’t seem to belong to any continent based on the classification used in this study. This leads to the assumption maybe the latitude, longitude data is incorrect or the classification used in the study is incorrect.
The aim of this project was to explore NASA's meteorite landings dataset to uncover patterns and insights related to the distribution, size, and impact of meteorites around the world. By leveraging SQL and Python, data was successfully cleaned and analyzed, revealing trends such as the geographic concentration of meteorite landings and the relationship between meteorite size and impact locations. Through data visualization, the findings were presented in an easily understandable way, showing the varying frequencies of meteorite landings across different regions.
This project highlights the importance of data-driven insights in understanding the behavior of meteorites and sets the stage for further exploration into their role in Earth's history. Moving forward, there is potential for deeper analysis, incorporating machine learning techniques to predict future landings based on environmental and astronomical factors.