Mathematical Concept Of Probability In Data Science

Oyinkansola Awosan
4 min read · Mar 13, 2021

Understanding data science and becoming an efficient data scientist starts with understanding probability and statistics. They are the foundations of data science and are therefore crucial to both understanding and practising it.

Probability Theory

Probability theory is the branch of mathematics concerned with the analysis of random phenomena. It is the foundation of statistical inference and, by extension, data analysis, and is therefore essential for data scientists.

Probability is something we use in our day-to-day lives without even realizing it. When we look at the time and ask what the chances are that light will be restored within the next two hours, we are unconsciously using probability. How likely is it that I will get money tomorrow? What are the chances of me meeting my lecturer in her office? What are the chances that this class will hold?

Weighing how probable these events are helps us make decisions. For example, if I decide, based on past experience with that class, that the chances of the class holding are very low, say 0.2 (2 in 10), it is very unlikely that I will attend that class.

Likewise, if I decide, based on past experience of going to see that particular lecturer, that the probability of meeting her in her office is 0.90, I will most likely go ahead and visit her office.

Some commonly used terms in probability theory are defined below.

Probability space

A probability space is made up of three parts:

1. Sample space

2. Events

3. Probability measure

Sample Space

This is the set of all possible outcomes of a given probability experiment. For a single roll of a die, for example, the sample space is {1, 2, 3, 4, 5, 6}.
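
As a minimal sketch, a sample space can be written down as a plain Python set. The die and two-dice experiments here are illustrative choices, and the names `die` and `two_dice` are made up for this example.

```python
from itertools import product

# Sample space for one roll of a six-sided die.
die = set(range(1, 7))

# Sample space for rolling two dice: every ordered pair (first, second).
two_dice = set(product(die, repeat=2))

print(len(die))       # 6
print(len(two_dice))  # 36
```

Listing the outcomes explicitly like this only works for small experiments, but it makes the idea of "all possible outcomes" concrete.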

Event

This is a set of one or more outcomes of an experiment, that is, a subset of the sample space.

There are three types of events:

Independent Events: Each event is unaffected by the others; whether one occurs does not change the probability of another.

Dependent Events: These are conditional, in the sense that an earlier event affects the probability of the event that comes after it.

Mutually Exclusive Events: As the name states, these are events that cannot happen at the same time.
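
The three types of events can be checked by brute force on a small experiment. The sketch below assumes two fair dice and uses the standard test for independence, P(A and B) = P(A) × P(B); the helper names (`prob`, `are_independent`, `are_mutually_exclusive`) are hypothetical and exist only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered results of rolling two fair dice.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes), assuming equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def are_independent(a, b):
    """Two events are independent when P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

def are_mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcome."""
    return not (a & b)

first_is_even = {o for o in sample_space if o[0] % 2 == 0}
second_is_six = {o for o in sample_space if o[1] == 6}
total_is_two = {o for o in sample_space if sum(o) == 2}
total_is_twelve = {o for o in sample_space if sum(o) == 12}

print(are_independent(first_is_even, second_is_six))         # True
print(are_independent(second_is_six, total_is_twelve))       # False, so dependent
print(are_mutually_exclusive(total_is_two, total_is_twelve)) # True
```

The second check comes out dependent because a total of 12 is only possible when the second die shows a 6.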

Probability Measure

This is a function that assigns a probability to each event. The probabilities it assigns lie between 0 and 1, and the probability of the whole sample space is 1.
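
A rough sketch of a probability measure in code, assuming a hypothetical loaded die where a 6 comes up half the time. The `weights` table and the `measure` function are invented for this illustration and are not part of any library.

```python
from fractions import Fraction

# Hypothetical loaded die: faces 1 to 5 each have probability 1/10, face 6 has 1/2.
weights = {face: Fraction(1, 10) for face in range(1, 6)}
weights[6] = Fraction(1, 2)

def measure(event):
    """Assign a probability to an event by adding up the weights of its outcomes."""
    return sum((weights[outcome] for outcome in event), Fraction(0))

evens = {2, 4, 6}
odds = {1, 3, 5}

print(measure(set(weights)))  # 1, the whole sample space
print(measure(evens))         # 7/10
print(measure(odds))          # 3/10
```

Because `evens` and `odds` cannot happen together, their probabilities add up to the probability of "either one happens", which here is 1.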

Probability

This is the measure of how likely an event is to happen, expressed as a number between 0 (impossible) and 1 (certain). It tells us how often we should expect the event to happen over many repeated trials.

When all outcomes are equally likely, the probability of an event happening = Number of ways it can happen / Total number of outcomes.

For example, the probability of getting a 4 when rolling a fair die is ⅙.

This is because only one face of a die has a 4 on it, so there is only one way of getting a 4, hence the numerator is 1.

The total number of outcomes is 6, because there are 5 other possibilities apart from getting a 4, hence we have ⅙.
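
The same calculation can be sketched in Python, together with a quick simulation of repeated trials. The fair die, the fixed seed, and the number of trials are all assumptions chosen for illustration.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Exact probability: number of ways to get a 4 divided by the total number of outcomes.
favourable = [face for face in faces if face == 4]
exact = Fraction(len(favourable), len(faces))
print(exact)  # 1/6

# Estimate by repeated trials: roll many times and count how often a 4 appears.
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
hits = sum(1 for _ in range(trials) if rng.choice(faces) == 4)
estimate = hits / trials
print(estimate)  # should land close to 1/6, about 0.167
```

The estimate will not equal ⅙ exactly, but it tends to get closer as the number of trials grows.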

Experiment

This is a repeatable procedure with a set of possible results.

Outcome

This is a single possible result of an experiment.

Conditional Probability

This is a measure of the probability of an event given that another event, which affects it, has already occurred. If the probability of the second event is changed by the occurrence of an earlier event, the second event is said to be dependent on the first.

See Dependent Events above.
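
Conditional probability is usually computed with the standard formula P(A given B) = P(A and B) / P(B). The sketch below applies it to a made-up example: drawing two marbles, without replacement, from a bag of three red and two blue. The marble labels and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bag: three red marbles and two blue ones.
marbles = ["R1", "R2", "R3", "B1", "B2"]

# Sample space: every ordered way of drawing two different marbles.
draws = set(permutations(marbles, 2))

def prob(event):
    """Probability of an event, assuming every ordered draw is equally likely."""
    return Fraction(len(event), len(draws))

def conditional(a, b):
    """P(A given B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

first_red = {d for d in draws if d[0].startswith("R")}
second_red = {d for d in draws if d[1].startswith("R")}

print(prob(second_red))                    # 3/5
print(conditional(second_red, first_red))  # 1/2
```

Knowing that the first marble was red lowers the chance that the second is red from 3/5 to 1/2, which is exactly what it means for the second event to depend on the first.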

Random Variable

A random experiment is any experiment whose outcomes can’t be predicted with certainty.

A random variable is a numerical value determined by the outcome of a random experiment. Its value is random, and each possible value has a probability associated with it. Examples are the number shown when a die is thrown, Nigeria’s population, etc.
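
One way to make this concrete is to take the total shown by two fair dice as the random variable and tabulate the probability of each value it can take. This is only an illustrative sketch; the function name `total` and the choice of experiment are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random experiment: roll two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def total(outcome):
    """The random variable: the number obtained from an outcome (here, the sum)."""
    return outcome[0] + outcome[1]

# Count how many outcomes lead to each value, then turn the counts into probabilities.
counts = Counter(total(o) for o in outcomes)
distribution = {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}

for value, p in distribution.items():
    print(value, p)
```

The table printed at the end is the probability distribution of the random variable: every value from 2 to 12, each with its own probability, all adding up to 1.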

Now, a random variable can either be discrete or continuous.

What Is A Discrete & Continuous Variable?

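A Discrete Random Variable takes values that can be counted. It has a definite or definable list of possibilities. The number shown when a die is thrown, the month of the year, and the day of the week or month are all defined and countable, and as such are examples of discrete random variables.

A Continuous Random Variable, on the other hand, is exactly as the name states: continuous. It can take any value within a range, so its possible values are infinite and cannot be listed or counted; they are measured instead.

Nigeria’s or the world’s population is often given as an example because it changes over time, literally every day. Strictly speaking, though, a population is a count of people and is therefore discrete; it is only treated as continuous as an approximation because the numbers are so large. Clearer examples of continuous variables are height, weight, and time.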
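
The difference can be seen by sampling from each kind of variable. In this sketch the die roll stands in for a discrete variable and a waiting time of up to two hours stands in for a continuous one; both choices, and the fixed seed, are assumptions made for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Discrete: a die roll can only be one of six countable values.
rolls = [rng.randint(1, 6) for _ in range(1000)]

# Continuous: a waiting time in hours can be any real number between 0 and 2.
waits = [rng.uniform(0.0, 2.0) for _ in range(1000)]

print(sorted(set(rolls)))  # the distinct die values seen, never anything outside 1 to 6
print(len(set(waits)))     # number of distinct waiting times seen
print(waits[:3])           # a few sample waiting times
```

However many times the die is rolled, the list of distinct values never grows beyond six, while the waiting times almost never repeat.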


Oyinkansola Awosan

Technical Writer, Open Source Enthusiast, Machine Learning & Site Reliability Engineer