Key Excel Functions for Data Science and Statistics Explained

In this article I’m going to delve deeper into data science functions in Excel. Excel has a ton of features to help data analytics as we’ve discussed before. Sure, there are plenty of other options out there today that we can use to accomplish all these tasks, but I always find Excel to be such a simple and basic tool that everyone should know, even if we rarely use it anymore.

Today we are going to look at statistical and data science functions in Excel.

Descriptive Statistics

Descriptive statistics summarize key aspects of data, such as mean, variability, and dispersion.

Mean is the average of a dataset.

Sales
5000
7000
8000
6500

In order to find the average sales we can use AVERAGE() function.

Standard deviation measures by how much data varies from the mean. There are two functions to calculate the standard deviation. STDEV.P() and STDEV.S()

Its crucial to know the difference. In STDEV.P(), the P stands for population. That means this function should be used if you have the data representing the entire population you are interested in studying. On the other hand STDEV.S() means standard deviation of a sample taken from that population.

Lets calculate the standard deviation of the above sales dataset,

If you get a higher value for standard deviation (away from zero) that means your dataset is spread out. If you get a lower value for standard deviation, closer to zero that means you data stays around the mean value.

Variance is the square of the standard deviation and shows the degree of spread.

Again high variance means data is widely spread and low variance means data points are closer to the mean.

Correlation

Correlation helps identify relationships between variables. Lets take the below dataset and see if we can find a relationship between the hours studies and the exam score

Hours StudiedExam Score
570
675
780
890

The range of value for correlation is from -1 to +1

If the value you get is closer to -1 that means there’s a strong negative connection between the two variables. If the value is 0 then there’s no connection. If the value is closer to +1 then there’s a strong positive connection.

The above dataset returns 0.98 value which means there’s a positive connection between the two variables.

Forecasting and Predictive Analytics

Forecasting helps to predict future data based on historical data. For example if we wanted to predict the sales for 2024 based on previous years sales. There are two functions in Excel that we can use for predicting future values.

Predicting Future Values Using Exponential Smoothing: FORECAST.ETS()

YearSales ($)
20205000
20217000
20229000
2023???

Here we use the target date (2023), values (known sales data range), timeline (year data range). Based on that we get the value 11000 as the sales for 2023.

Predicting Future Values Using TREND()

YearRevenue ($)
201910,000
202015,000
202120,000
2022???

Lets use this function to predict the revenue for 2022 in the above dataset.

Based on the existing data we get the value 25000 as the revenue for 2022.

Lets do a practice exercise to summarize what we learned today,

YearRevenue ($)Ad Spend ($)
202050,00010,000
202165,00015,000
202280,00020,000
2023?25,000

Tasks:

  1. Find Mean, Variance, and Standard Deviation of revenue

  2. Calculate the correlation between Ad Spend & Revenue

    \=CORREL(C2:C5,B2:B5)

    \=1

    There’s a strong positive correlation between the two variables

  3. Predict 2023 revenue using TREND() & FORECAST.ETS()

    \=TREND(B2:B4,A2:A4,A5)

    \= 95,000

\=FORECAST.ETS(A5,B2:B4,A2:A4)

\=95,000

That’s all for todays article. Hope you learned something new :)