Week 4 - BALT 4396B - Chapter 5

 Probability and Statistics for Data Science



In chapter 5 of Data Toolkit: Python + Hands-On Math, we are introduced to two important concepts in math and data science: probability and statistics. Their role in data science is to help data scientists interpret and understand the data they are working with. By breaking the data down, they get a clearer picture of it and can make better-informed decisions. Data scientists can also get a feel for the models they are working with and see which ones are producing the best results. More specifically, chapter 5 introduces descriptive statistics and probability distributions, and it shows examples of both being used in Python.

Descriptive Statistics:
What are descriptive statistics and how do they work? Descriptive statistics summarize the key features of a data set. One example is the mean: you take a set of numbers and calculate their average, and that single value summarizes those numbers. Descriptive statistics also include the median, mode, range, variance, and standard deviation.
Here is a more concrete example of a descriptive statistic. Let's say we survey twelve students at school and ask them how old they are. Six of the twelve students say that they are 21, three say that they are 22, and the other three say that they are 20. The mode of these numbers is 21, because it is the value that shows up the most. This tells us that 21 is the most common age in the group. It is a quick description of these twelve students and their ages.
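The survey described above can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module, not the chapter's own code. The variable names are made up for illustration, and it assumes the twelve students are the whole group of interest, which is why it uses the population versions of variance and standard deviation.

```python
import statistics

# Ages from the survey: six students are 21, three are 22, three are 20.
ages = [21] * 6 + [22] * 3 + [20] * 3

mean_age = statistics.mean(ages)      # average age
median_age = statistics.median(ages)  # middle value of the sorted ages
mode_age = statistics.mode(ages)      # most common age
age_range = max(ages) - min(ages)     # spread between oldest and youngest

# Assumption: the twelve students are the entire population,
# so the population formulas (pvariance, pstdev) are used.
variance = statistics.pvariance(ages)
std_dev = statistics.pstdev(ages)

print("mean:", mean_age)
print("median:", median_age)
print("mode:", mode_age)
print("range:", age_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 4))
```

If the twelve students were instead a sample drawn from a larger school, statistics.variance and statistics.stdev would be the usual choice.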

Probability Distributions:
A probability distribution is a mathematical function that gives the probability of each possible outcome of a variable that a data scientist is observing. Common probability distributions include the normal (also known as the Gaussian), binomial, and Poisson distributions. Their shapes, parameters, means, medians, and modes differ from one distribution to another.
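To make the three distributions a little more concrete, here is a rough sketch in Python using only the standard library (Python 3.8 or newer). It is not taken from the chapter. The helper functions binomial_pmf and poisson_pmf are hypothetical names written for this post, and all of the numbers (a mean of 21 with a standard deviation of 1, ten coin flips, an average of 4 events) are assumptions picked for illustration.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events happen on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Normal: assumed mean of 21 and standard deviation of 1.
normal = NormalDist(mu=21, sigma=1)
p_within_one_sd = normal.cdf(22) - normal.cdf(20)  # roughly 0.68

# Binomial: exactly 3 heads in 10 flips of a fair coin.
p_three_heads = binomial_pmf(3, 10, 0.5)  # 120 / 1024

# Poisson: exactly 2 events when 4 are expected on average.
p_two_events = poisson_pmf(2, 4.0)

print("normal, within one standard deviation:", round(p_within_one_sd, 4))
print("binomial, 3 heads in 10 flips:", round(p_three_heads, 4))
print("poisson, 2 events when 4 expected:", round(p_two_events, 4))
```

Each distribution is controlled by different parameters: the normal by its mean and standard deviation, the binomial by the number of trials and the success probability, and the Poisson by a single average rate.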

