1.2 Inferential Statistics

Inferential statistics is the process of using sample data to draw conclusions, and make decisions, about the population from which the data were collected. Common methods include the t-test, the chi-square test, and analysis of variance (ANOVA).
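As a minimal sketch of what such a method looks like in practice, the snippet below runs a two-sample t-test with SciPy; the group values are invented purely for illustration.

```python
# A minimal illustration of inferential statistics: a two-sample t-test.
# The data below are invented for illustration only.
from scipy import stats

group_a = [5.1, 4.9, 5.6, 5.0, 5.4, 4.8]
group_b = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0]

# Test whether the two population means differ, based on these samples.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```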

1.2.1 Deduction vs. Induction

  • Deduction (e.g., syllogism) is making an inference from widely accepted facts or premises. For instance, if I know that all objects fall to the ground when dropped, then I can infer that this apple will also fall to the ground when dropped.
  • Induction (e.g., generalization) is making an inference based on observation, often of a sample. For example, if every apple I have observed so far has been red, I might infer that all apples are red. Unlike a syllogism, such a conclusion cannot be proven by observation alone, and statistical inference cannot prove it either. Why does it still matter, then? Because statistical inference lets us accumulate a weight of evidence that reduces the uncertainty of a conclusion. Making estimates and inferences about a population from a sample of data, with a stated degree of confidence, is the aim of statistical inference (see the short sketch after this list).
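The sketch below illustrates this idea of inference with a stated degree of confidence: estimating a population mean from a sample and attaching a 95% confidence interval to it. It assumes SciPy and NumPy are available, and the sample values are made up.

```python
# Sketch: estimating a population mean from a sample with a 95% confidence
# interval, the basic move of statistical inference. Data are invented.
import numpy as np
from scipy import stats

sample = np.array([4.8, 5.2, 5.5, 4.9, 5.1, 5.3, 5.0, 5.4])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```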

1.2.2 Parametric vs. Nonparametric

Parametric statistics is a type of inferential statistics that assumes the data follow a particular distribution, most often the normal distribution. Common parametric methods include the t-test, ANOVA, and linear regression. Nonparametric statistics is a type of inferential statistics that does not rely on such distributional assumptions. Common nonparametric methods include the chi-square test, the Mann-Whitney U test, the Kruskal-Wallis test, and the Friedman test.
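To make the contrast concrete, the sketch below applies a parametric test (t-test) and a nonparametric test (Mann-Whitney U) to the same two invented samples using SciPy.

```python
# Sketch: the same two-group comparison with a parametric and a
# nonparametric test. Data are invented for illustration.
from scipy import stats

group_a = [12, 15, 14, 10, 13, 11, 16]
group_b = [22, 19, 25, 30, 21, 28, 24]

# Parametric: assumes approximately normally distributed data.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Nonparametric: compares ranks, no normality assumption.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test:       t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Mann-Whitney: U = {u_stat:.2f}, p = {u_p:.4f}")
```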

1.2.3 Linear vs. Nonlinear

Linear methods assume that the relationship between the predictor(s) and the response variable is linear; linear regression, the t-test, and ANOVA can all be expressed as special cases of the general linear model described below. Nonlinear methods do not assume a linear relationship; examples include nonlinear regression models such as exponential or logistic growth curves.
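The sketch below fits a linear model and a nonlinear (exponential) model to the same simulated data; both the model forms and the data are chosen purely for illustration.

```python
# Sketch: a linear fit versus a nonlinear (exponential) fit to the same data.
import numpy as np
from scipy import optimize, stats

x = np.arange(1, 11, dtype=float)
y = 2.0 * np.exp(0.3 * x) + np.random.default_rng(42).normal(0, 1, size=10)

# Linear model: y = a + b*x
slope, intercept, r_value, p_value, stderr = stats.linregress(x, y)

# Nonlinear model: y = a * exp(b*x), fitted by nonlinear least squares
def exp_model(x, a, b):
    return a * np.exp(b * x)

params, _ = optimize.curve_fit(exp_model, x, y, p0=(1.0, 0.1))
print(f"linear:    y = {intercept:.2f} + {slope:.2f}x  (R^2 = {r_value**2:.3f})")
print(f"nonlinear: y = {params[0]:.2f} * exp({params[1]:.2f}x)")
```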

1.2.3.1 Independent variables

Independent variables are the variables that are manipulated or chosen by the researcher and used to predict the dependent variable. They are also called predictor variables or explanatory variables.

1.2.3.2 Dependent variables

Dependent variables are the variables that are measured by the researcher and predicted from the independent variables. They are also called response variables or outcome variables.
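A small, made-up data set makes the distinction concrete: the predictor columns are chosen or recorded as inputs, and the response column is what we try to explain or predict. The variable names below are hypothetical.

```python
# Sketch: separating independent (predictor) and dependent (response)
# variables in a small, invented data set.
import pandas as pd

data = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],       # independent / predictor
    "sleep_hours":   [8, 7, 6, 7, 8],       # independent / predictor
    "exam_score":    [55, 60, 68, 74, 80],  # dependent / response
})

X = data[["hours_studied", "sleep_hours"]]  # predictors
y = data["exam_score"]                      # response
print(X.shape, y.shape)
```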

1.2.3.3 General linear model (GLM)

The general linear model (GLM) describes the relationship between a dependent variable and one or more independent variables. The dependent variable is modeled as a linear combination of the independent variables plus a random error term, y = b0 + b1*x1 + ... + bk*xk + e, where the errors are assumed to be normally distributed. The independent variables can be continuous or categorical, which is why linear regression and ANOVA are both special cases of the GLM. Strictly speaking, the general linear model applies to a continuous dependent variable; modeling a binary dependent variable (for example, with logistic regression) requires the closely related generalized linear model.
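As a sketch of the GLM in practice, the snippet below fits an ordinary least squares model with statsmodels to simulated data containing one continuous and one categorical (0/1) predictor; all names and values are invented for illustration.

```python
# Sketch: fitting a general linear model (ordinary least squares) with
# statsmodels. Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)                  # continuous predictor
x2 = rng.integers(0, 2, size=50)          # categorical (0/1) predictor
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=50)

X = sm.add_constant(np.column_stack([x1, x2]))  # design matrix with intercept
model = sm.OLS(y, X).fit()                      # y = Xb + e, normal errors
print(model.summary())
```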