Introduction
When I was first exposed to statistics, I kept hearing about p-values, null hypotheses, and confidence intervals. These concepts are difficult to understand without examples. Luckily, Stanton (2017) offers a nice introduction to statistics using hands-on examples. This notebook is a summary of the notes I took while reading the book. I will also update it with other statistical concepts that I learn in the future.
Currently, this notebook is structured as follows:
- Statistical Vocabulary: This chapter introduces statistical terminology that is useful to know up front, such as descriptive vs. inferential statistics, parametric vs. nonparametric methods, levels of measurement, and p-values.
- Probability: This chapter touches on sampling from probability distributions, from which we can estimate population parameters such as the mean and variance.
- Inference: Once we have sample data, inference tells us how much confidence we can place in conclusions drawn from it. This chapter also includes the first example of inferential statistics, the t-test.
- Bayesian and Frequentist Statistics: This chapter introduces the differences between Bayesian and frequentist statistics. In the following chapters, the statistical analyses are conducted using both approaches.
- Comparing Groups: While the t-test compares the means of two groups, this chapter focuses on methods that simultaneously compare means across any number of groups, such as analysis of variance (ANOVA).
- Associations between Variables: This chapter introduces measures of association, which describe how variables are related to each other. It discusses the Pearson product-moment correlation, the chi-square test, Kendall’s tau, and Spearman’s rank-order correlation.
- Linear Multiple Regression: This chapter introduces linear multiple regression, which is a method to predict a continuous variable based on multiple independent variables.
- Interactions in ANOVA and Regression: This chapter introduces interactions between two variables in ANOVA and regression. An interaction occurs when the effect of one variable on the response variable depends on the value of the other variable.
- Logistic Regression: Unlike linear regression, which predicts a continuous variable, logistic regression predicts a categorical variable.
- Analyzing Change over Time: In this chapter, we consider the dependency among data points collected at different points in time. We examine two configurations of data: repeated measures and time series.
- Dealing with Too Many Variables: This chapter introduces methods to deal with too many variables, such as principal component analysis (PCA) and factor analysis.
At the end of this notebook, I also include a summary diagram that organizes the statistical analysis methods by statistical family, number of groups, variable types, and assumptions.
References
Stanton, Jeffrey M. 2017. Reasoning with Data: An Introduction to Traditional and Bayesian Statistics Using R. Guilford Publications.