07. T-test analysis

Module items

R Script file

Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter

```r
source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R"))
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/07-ttest.R", "07-ttest.R")
file.edit("07-ttest.R")
```

Lab assignment

T-test

Sample lab assignment

Sample: T-test

Learning outcomes

- Learn what a t-test analysis is
- Learn how to run a t-test analysis in RStudio

T-test analysis basics

Group I and Group II

Is there a significant income difference between Group I and Group II?

| Group I (income mean) | Group II (income mean) |
|---|---|
| $32,000 | $32,000 |

Answer:

We can’t answer this from the means alone; we also need to see how the values are distributed.
The means of the two groups look the same; however, the values in Group II deviate much more from the mean.

RStudio will show whether there is a significant difference. Relying solely on mean scores can be misleading!

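| id | Group I | Group II |
|---|---|---|
| 1 | $32,000 | $23,150 |
| 2 | $31,000 | $120,300 |
| 3 | $29,500 | $12,100 |
| 4 | $30,100 | $32,000 |
| 5 | $32,500 | $25,000 |
| 6 | $31,150 | $20,000 |
| 7 | $30,230 | $11,000 |
| 8 | $32,400 | $12,200 |
| 9 | $33,000 | $11,100 |
| 10 | $38,120 | $53,150 |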
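The "more deviations" point can be checked numerically. The short R sketch below uses base R only (the names `group_1` and `group_2` are just for illustration): it enters the ten incomes from the table and compares each group's mean with its standard deviation, a common measure of spread.

```r
# Incomes (in dollars) copied from the Group I / Group II table above
group_1 <- c(32000, 31000, 29500, 30100, 32500, 31150, 30230, 32400, 33000, 38120)
group_2 <- c(23150, 120300, 12100, 32000, 25000, 20000, 11000, 12200, 11100, 53150)

# Both groups have the same mean...
mean(group_1)  # 32000
mean(group_2)  # 32000

# ...but Group II is spread out far more around that mean
sd(group_1)    # about 2,440
sd(group_2)    # about 33,634
```

The means match, but Group II's standard deviation is more than ten times larger. A t-test takes this spread into account, which is why the means alone are not enough.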

The [[t-test]] is used to determine whether there is a significant difference ([[statistical significance]]) between the means (average scores) of two groups. We conduct a t-test when we want to compare two means; the scores must be continuous.
- We'll check the [[p-value]].
  - [[Is my p-value less than 0.05?]]
- A t-test can answer questions such as:
  - Whether the average number of close friends reported by married individuals versus single individuals significantly differs.
  - Whether the average hours of daily internet usage between younger (18–34) and older (65+) age cohorts significantly differs.
  - Whether the average income between male and female full-time workers significantly differs.

For example, we would use a t-test if we wished to compare the reading achievement scores of juniors and seniors.

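| id | Class standing (1 = juniors; 2 = seniors) | Reading achievement score (0 to 100) |
|---|---|---|
| 1 | 2 | 88 |
| 2 | 2 | 79 |
| 3 | 1 | 54 |
| 4 | 1 | 22 |
| 5 | 1 | 91 |
| 6 | 1 | 56 |
| 7 | 1 | 87 |
| 8 | 2 | 91 |
| ... | ... | ... |
| 300 | 2 | 78 |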

Reading achievement mean score of juniors: 61.20
Reading achievement mean score of seniors: 83.40
- The difference is 22.20 points.

The t-test analysis below shows whether this score difference is statistically significant.

See the "Difference" column: -22.20 (the juniors' mean minus the seniors' mean). On average, seniors' reading achievement score is 22.20 points higher than juniors'.

Now check the p-value column. Based on the t-test analysis, this 22.20-point difference is statistically significant because the p-value is less than 0.05.

Model Summary

| Parameter | Class = Junior | Class = Senior | Difference | 95% CI | t(298) | p |
|---|---|---|---|---|---|---|
| Reading achievement score | 61.20 | 83.40 | -22.20 | (-26.80, -17.60) | -9.51 | <.001 |
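The columns of the Model Summary are tied together by simple arithmetic: the t value is the difference divided by its standard error, and the 95% CI is the difference plus or minus a critical t value times that standard error. The R sketch below is a back-of-the-envelope check that uses only the rounded numbers reported in the table, so small gaps from rounding are expected. The degrees of freedom, 298, equal the 300 students minus 2.

```r
# Values reported in the Model Summary table above
difference <- -22.20   # juniors' mean minus seniors' mean
t_value    <- -9.51
df_t       <- 298      # 300 students - 2

# The t statistic is the difference divided by its standard error
se <- difference / t_value

# 95% CI: difference +/- critical t value * standard error
ci <- difference + c(-1, 1) * qt(0.975, df_t) * se
round(ci, 2)  # close to the table's (-26.80, -17.60)

# Two-sided p-value for the observed t statistic
p_value <- 2 * pt(-abs(t_value), df_t)
p_value < 0.001  # TRUE, matching "<.001"
```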

T-test specifics

In a [[t-test]],
- [[Factor variable]] should be:
  - [[Binary]];
- [[Outcome variable]] should be:
  - [[Continuous]].
- In the example above, class standing is the factor variable with two values (1 = Junior; 2 = Senior), so it is binary.
  - Reading achievement score is the outcome variable, ranging from 0 to 100, so it is continuous.
    - The t-test table shows that students' class standing (being a junior or a senior) has a significant effect on their reading achievement score (p < 0.05).
```
flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Class standing<br/>1=Juniors; 2=Seniors]
    end

    subgraph O["Outcome variable (Continuous)"]
        B[Reading achievement score<br/>Range: 0-100]
    end

    A ==>|Has an effect on <br/> Significant effect; p<0.05| B
```
- The reverse cannot be the case: a high reading achievement score would not make a student a senior the next day.
  - That's why class standing is the factor and reading achievement score is the outcome.
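The two requirements above can also be checked in R. The sketch below uses a hypothetical helper, `check_ttest_vars()` (not part of the course scripts), on the first eight rows of the class-standing table. It assumes "binary" means exactly two distinct non-missing values. `is.numeric()` only confirms that the outcome is stored as numbers, which is necessary but not sufficient for a continuous variable (a 1-5 code is numeric but not continuous), so the variable documentation remains the reliable check.

```r
# Hypothetical helper (not from the course scripts): checks the two
# t-test requirements listed above
check_ttest_vars <- function(outcome, factor_var) {
  groups <- unique(factor_var[!is.na(factor_var)])
  list(
    factor_is_binary   = length(groups) == 2,  # exactly two groups?
    outcome_is_numeric = is.numeric(outcome)   # stored as numbers?
  )
}

# First eight rows of the class-standing table (1 = juniors; 2 = seniors)
class_standing <- c(2, 2, 1, 1, 1, 1, 1, 2)
reading_score  <- c(88, 79, 54, 22, 91, 56, 87, 91)

check_ttest_vars(reading_score, class_standing)
# factor_is_binary TRUE, outcome_is_numeric TRUE

# A five-category factor (like GSS marital status) fails the binary check
check_ttest_vars(reading_score, c(1, 2, 3, 4, 5, 1, 2, 3))$factor_is_binary  # FALSE
```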

GSS example 1: Significant p-value (`sex` and `conrinc`)

Find the variables on the Variables in GSS page

We wonder if respondents' sex has a statistically significant influence on their personal income.

```
flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Sex<br/>1=Males; 2=Females]
    end

    subgraph O["Outcome variable (Continuous)"]
        B[Personal income]
    end

    A ==>|May have an effect on| B
```

We want to make sure that the factor variable, `sex`, is binary, and that the outcome variable, `conrinc`, is continuous.
We check this information on the Variables in GSS page.

Search for the variable names with Ctrl+F (Windows) or Cmd+F (Mac).

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `sex` | Respondents' sex | Binary | What's your sex? (1: Male; 2: Female) |
| `conrinc` | Respondents' personal income | Continuous | What is your income in dollars? |

From: Variables in GSS

[[T-test]] #code

[[Model code]]

```r
t.test(outcome_here ~ factor_here, data = gss) |>
parameters() |> display(format="html")
```

[[Working code]]
```r
t.test(conrinc ~ sex, data = gss) |>
parameters() |> display(format="html")
```
- Line 1: We put `conrinc` in place of `outcome_here` and `sex` in place of `factor_here`.
  - [[Outcome]] variable first, [[factor]] variable second.
  - Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.
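Under the hood, `t.test()` returns an object whose parts map onto the output table; the `parameters()` and `display()` steps in the working code (from the parameters package, presumably loaded by the setup script) format those parts. The sketch below runs the same `outcome ~ factor` call on the first eight rows of the class-standing table, so its numbers are illustrative only and will not match the Model Summary, which uses all 300 students. R's difference is the first group's mean minus the second group's; in the income output, 43037.83 - 31301.51 = 11736.32.

```r
# First eight rows of the class-standing table (a toy subset, not the full data)
reading <- data.frame(
  class_standing = c(2, 2, 1, 1, 1, 1, 1, 2),
  score          = c(88, 79, 54, 22, 91, 56, 87, 91)
)

# Same pattern as the working code: outcome first, factor second
result <- t.test(score ~ class_standing, data = reading)

result$estimate   # group means: 62 (class 1) and 86 (class 2)
result$conf.int   # 95% CI of the difference (group 1 minus group 2)
result$statistic  # t value
result$parameter  # degrees of freedom
result$p.value    # p-value
```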

[[T-test]] #output

| Parameter | sex = Male | sex = Female | Difference | 95% CI | t(2222.45) | p | Sig |
|---|---|---|---|---|---|---|---|
| Respondents' personal income | 43037.83 | 31301.51 | 11736.32 | (9148.26, 14324.39) | 8.89 | 0.000 | *** |
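The fractional degrees of freedom in the header, t(2222.45), are a sign of Welch's t-test, which is what R's `t.test()` runs by default (`var.equal = FALSE`). Welch's version does not assume the two groups have equal variances and adjusts the degrees of freedom with the Welch-Satterthwaite formula. The sketch below computes that formula by hand on a toy subset (the first eight rows of the class-standing table) and compares it with R's value and with the classic equal-variance version, whose degrees of freedom are n1 + n2 - 2.

```r
# First eight rows of the class-standing table, split by group
juniors <- c(54, 22, 91, 56, 87)
seniors <- c(88, 79, 91)

# Welch-Satterthwaite degrees of freedom, computed by hand
v1 <- var(juniors) / length(juniors)
v2 <- var(seniors) / length(seniors)
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(juniors) - 1) + v2^2 / (length(seniors) - 1))

# t.test() uses Welch's version unless var.equal = TRUE
welch_df
unname(t.test(juniors, seniors)$parameter)  # same value as welch_df

# The classic equal-variance test uses n1 + n2 - 2 degrees of freedom instead
unname(t.test(juniors, seniors, var.equal = TRUE)$parameter)  # 6
```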

[[T-test]] #interpretation significant (p < 0.05)

Significant (p < 0.05) t-test interpretation template

The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.

[Variable label of outcome variable] differs by [Variable label of factor variable] in a statistically significant way since the p-value is less than 0.05.

Significant (p < 0.05) t-test interpretation sample

The average personal income of males is $43,037.83, while the average personal income of females is $31,301.51.

Personal income differs by respondents’ sex in a statistically significant way since the p-value is less than 0.05.
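Because the templates are fill-in-the-blank sentences, they can be sketched as a small R function. `interpret_ttest()` below is hypothetical, not part of the course scripts. It takes already formatted means (so units such as "$" or "hours" stay in your hands) and picks the significant or insignificant wording from the p-value.

```r
# Hypothetical helper (not from the course scripts): fills in the
# significant / insignificant interpretation templates
interpret_ttest <- function(outcome, factor_label, group1, mean1,
                            group2, mean2, p) {
  first <- sprintf(
    "The average %s of %s is %s, while the average %s of %s is %s.",
    outcome, group1, mean1, outcome, group2, mean2
  )
  # Capitalize the outcome label where it starts the second sentence
  subject <- paste0(toupper(substr(outcome, 1, 1)), substring(outcome, 2))
  # p < 0.05 counts as significant; anything else does not
  second <- if (p < 0.05) {
    sprintf("%s differs by %s in a statistically significant way since the p-value is less than 0.05.",
            subject, factor_label)
  } else {
    sprintf("%s does not differ by %s in a statistically significant way since the p-value is higher than 0.05.",
            subject, factor_label)
  }
  paste(first, second)
}

interpret_ttest(
  outcome = "personal income", factor_label = "respondents' sex",
  group1 = "males", mean1 = "$43,037.83",
  group2 = "females", mean2 = "$31,301.51",
  p = 0.000  # as displayed in the output table (rounded)
)
```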

GSS example 2: Insignificant p-value (`sex` and `educ`)

Find the variables on the Variables in GSS page

We wonder if respondents' sex has a statistically significant influence on their years of education.

```
flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Sex<br/>1=Males; 2=Females]
    end

    subgraph O["Outcome variable (Continuous)"]
        B[Education in years<br/>Range: 0-20]
    end

    A ==>|May have an effect on| B
```

We want to make sure that the factor variable, `sex`, is binary, and that the outcome variable, `educ`, is continuous.
We check this information on the Variables in GSS page.

Search for the variable names with Ctrl+F (Windows) or Cmd+F (Mac).

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `sex` | Respondents' sex | Binary | What's your sex? (1: Male; 2: Female) |
| `educ` | Respondents' education in years | Continuous | What is the highest year of school you completed? |

From: Variables in GSS

[[T-test]] #code

[[Model code]]

```r
t.test(outcome_here ~ factor_here, data = gss) |>
parameters() |> display(format="html")
```

[[Working code]]
```r
t.test(educ ~ sex, data = gss) |>
parameters() |> display(format="html")
```
- Line 1: We put `educ` in place of `outcome_here` and `sex` in place of `factor_here`.
  - [[Outcome]] variable first, [[factor]] variable second.
  - Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.

[[T-test]] #output

| Parameter | sex = Male | sex = Female | Difference | 95% CI | t(2222.45) | p | Sig |
|---|---|---|---|---|---|---|---|
| Respondents' education in years | 14.29 | 14.22 | 0.06 | (-0.12, 0.24) | 8.89 | 0.67 | |
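Another way to read the same output: for a two-sided test at the 0.05 level, the 95% CI of the difference excludes 0 exactly when p < 0.05, because both come from the same t statistic and degrees of freedom. `ci_includes_zero()` below is a hypothetical helper; the first two intervals are taken from the outputs of examples 1 and 3, and the third is made up to show an interval that spans 0.

```r
# Hypothetical helper: does a confidence interval include 0?
ci_includes_zero <- function(lower, upper) lower <= 0 && upper >= 0

ci_includes_zero(9148.26, 14324.39)  # example 1 (income): FALSE, significant
ci_includes_zero(0.05, 0.64)         # example 3 (TV hours): FALSE, significant
ci_includes_zero(-0.50, 0.80)        # made-up interval spanning 0: TRUE
```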

[[T-test]] #interpretation insignificant (p > 0.05)

Insignificant (p > 0.05) t-test interpretation template

The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.

[Variable label of outcome variable] does not differ by [Variable label of factor variable] in a statistically significant way since the p-value is higher than 0.05.

Insignificant (p > 0.05) t-test interpretation sample

The average education of males is 14.29 years, while the average education of females is 14.22 years.

Education in years does not differ by respondents’ sex in a statistically significant way since the p-value is higher than 0.05.

GSS example 3: Significant p-value (`maritalbinary` and `tvhours`)

Find the variables on the Variables in GSS page

We wonder if respondents' marital status has a statistically significant influence on how many hours they watch television.

```
flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Marital status<br/>1=Single; 2=Nonsingle]
    end

    subgraph O["Outcome variable (Continuous)"]
        B[Television screen time in hours]
    end

    A ==>|May have an effect on| B
```

We want to check whether the factor variable, `marital`, is binary, and whether the outcome variable, `tvhours`, is continuous.
We check this information on the Variables in GSS page.

Search for the variable names with Ctrl+F (Windows) or Cmd+F (Mac).

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `marital` | Respondents' marital status | Nominal | Are you currently — married, widowed, divorced, separated, or have you never been married? (1: Married; 2: Widowed; 3: Divorced; 4: Separated; 5: Never married) |
| `tvhours` | Television screen time in hours | Continuous | On the average day, how many hours do you personally watch television? |

From: Variables in GSS

The original `marital` variable is nominal, not binary. That's why we first need to recode `marital` into a binary variable.

[[Merging values]] #code

```r
gss$maritalbinary <- 
rec(gss$marital, rec = 
"5 = 1 [Single]; 
1, 2, 3, 4 = 2 [Nonsingle]",
var.label = "Recoded respondents' marital status")
```

- Line 1: We put the name of the new, recoded variable here: `maritalbinary`.
- Line 2: We put the original variable we want to recode here: `marital`.
- Lines 3-4: We merge values here. Inside the quotes, list the original values on the left of `=` (separated by commas) and the new value on the right; add or delete values and commas as needed. The text in square brackets, [...], is the label for each new value. These labels will appear in the tables.
- Line 5: We write the new variable's variable label here ("Recoded respondents' marital status").
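The course code uses `rec()` (from the sjmisc package, presumably loaded by the setup script). For comparison only, here is a base-R sketch of the same merge: 5 (never married) becomes 1 "Single", 1-4 become 2 "Nonsingle", and missing values stay missing. The `marital` vector below is made-up toy data, not the GSS; with the real data you would keep using `rec()` as shown above.

```r
# Toy codes using the GSS marital categories (made up for illustration):
# 1 Married, 2 Widowed, 3 Divorced, 4 Separated, 5 Never married
marital <- c(1, 5, 3, NA, 2, 4, 5)

# 5 -> 1 (Single); 1-4 -> 2 (Nonsingle); NA stays NA
maritalbinary <- ifelse(marital == 5, 1, ifelse(marital %in% 1:4, 2, NA))

# Attach the value labels, similar to the [Single] / [Nonsingle] labels in rec()
maritalbinary_f <- factor(maritalbinary, levels = c(1, 2),
                          labels = c("Single", "Nonsingle"))

table(maritalbinary_f, useNA = "ifany")
```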

[[T-test]] #code

[[Model code]]

```r
t.test(outcome_here ~ factor_here, data = gss) |>
parameters() |> display(format="html")
```

[[Working code]]
```r
t.test(tvhours ~ maritalbinary, data = gss) |>
parameters() |> display(format="html")
```
- Line 1: We put `tvhours` in place of `outcome_here` and `maritalbinary` in place of `factor_here`.
  - [[Outcome]] variable first, [[factor]] variable second.
  - Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.

[[T-test]] #output

| Parameter | maritalbinary = Single | maritalbinary = Nonsingle | Difference | 95% CI | t(2222.45) | p | Sig |
|---|---|---|---|---|---|---|---|
| Television screen time in hours | 3.60 | 3.25 | 0.35 | (0.05, 0.64) | 2.30 | 0.022 | * |
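The Sig column uses stars as shorthand for the p-value. The output does not state its cut-offs, so the sketch below assumes the common convention (*** p < .001, ** p < .01, * p < .05); `sig_stars()` is a hypothetical helper. With the p-values of the three examples it reproduces their Sig columns.

```r
# Hypothetical helper; assumed thresholds: *** p < .001, ** p < .01, * p < .05
sig_stars <- function(p) {
  ifelse(p < 0.001, "***",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*", "")))
}

# p-values from examples 1, 2 and 3
sig_stars(c(0.000, 0.67, 0.022))  # "***" "" "*"
```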

[[T-test]] #interpretation significant (p < 0.05)

Significant (p < 0.05) t-test interpretation template

The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.

[Variable label of outcome variable] differs by [Variable label of factor variable] in a statistically significant way since the p-value is less than 0.05.

Significant (p < 0.05) t-test interpretation sample

The average television screen time of single respondents is 3.60 hours, while the average television screen time of nonsingle respondents is 3.25 hours.

Television screen time in hours differs by respondents’ marital status in a statistically significant way since the p-value is less than 0.05.
