07. T-test analysis
Module items
R Script file
Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter
source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R"));
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/07-ttest.R", "07-ttest.R");
file.edit("07-ttest.R")
Lab assignment
Sample lab assignment
Learning outcomes
- Learn t-test analysis
- Learn how to run t-test analysis
Suggested reading
- 📖 Urdan, Timothy C. 2010. “T Tests.” Pp. 93–103 in Statistics in plain English. New York, NY: Routledge.
T-test analysis basics
Group I and Group II
Is there a significant income difference between Group I and Group II?

| Group I (Income mean) | Group II (Income mean) |
|---|---|
| $32,000 | $32,000 |
Show the answer
- We can’t know the answer without seeing the distribution of the variable.
- The means of the two groups look the same; however, the values in Group II deviate much more from the mean.
- RStudio will show whether there is a significant difference. Relying solely on mean scores can be misleading!
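
| id | Group I | Group II |
|---|---|---|
| 1 | $32,000 | $23,150 |
| 2 | $31,000 | $120,300 |
| 3 | $29,500 | $12,100 |
| 4 | $30,100 | $32,000 |
| 5 | $32,500 | $25,000 |
| 6 | $31,150 | $20,000 |
| 7 | $30,230 | $11,000 |
| 8 | $32,400 | $12,200 |
| 9 | $33,000 | $11,100 |
| 10 | $38,120 | $53,150 |
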
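To see why equal means can hide very different distributions, the ten incomes per group listed above can be typed into R directly. This is a minimal sketch in base R; the vector names `group_1` and `group_2` are made up for illustration, and `t.test()` here runs R's default Welch two-sample test, which may differ from the function used in the module's script.

```r
# Incomes from the Group I / Group II table (vector names are illustrative)
group_1 <- c(32000, 31000, 29500, 30100, 32500, 31150, 30230, 32400, 33000, 38120)
group_2 <- c(23150, 120300, 12100, 32000, 25000, 20000, 11000, 12200, 11100, 53150)

# Both groups have the same mean ...
mean(group_1)
mean(group_2)

# ... but the values in Group II are far more spread out
sd(group_1)
sd(group_2)

# Welch two-sample t-test (base R default) comparing the two means
income_test <- t.test(group_1, group_2)
income_test$p.value
```

Because the two means are identical, the t statistic is 0 and the p-value is 1: the test finds no significant difference in means, even though the spreads differ a great deal.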
- The [[t-test]] is used to determine whether there is a significant difference ([[statistical significance]]) between the means (average scores) of two groups. We conduct a t-test when we want to compare two means (the scores must be continuous).
- We'll check the [[p-value]].
- [[Is my p-value less than 0.05?]]
- A t-test can answer questions such as:
- Whether the average number of close friends reported by married individuals versus single individuals significantly differs.
- Whether the average hours of daily internet usage between younger (18–34) and older (65+) age cohorts significantly differs.
- Whether the average income between male and female full-time workers significantly differs.
- For example, we would use a t-test if we wished to compare the reading achievement scores of juniors and seniors.

| id | Class standing (1 = juniors; 2 = seniors) | Reading achievement score (0 to 100) |
|---|---|---|
| 1 | 2 | 88 |
| 2 | 2 | 79 |
| 3 | 1 | 54 |
| 4 | 1 | 22 |
| 5 | 1 | 91 |
| 6 | 1 | 56 |
| 7 | 1 | 87 |
| 8 | 2 | 91 |
| ... | ... | ... |
| 300 | 2 | 78 |

- Reading achievement mean score of juniors: 61.20
- Reading achievement mean score of seniors: 83.40
- The difference is 22.20 points.
- The t-test analysis below shows if this score difference is statistically significant.
- See the "Difference" column: -22.20 (juniors' mean minus seniors' mean). On average, seniors' reading achievement score is 22.20 points higher than juniors'.
- Now check the p-value column. Based on the t-test analysis, this 22.20 points difference is statistically significant because the p-value is less than 0.05.

Model Summary

| Parameter | Class = Junior | Class = Senior | Difference | 95% CI | t(298) | p |
|---|---|---|---|---|---|---|
| Reading achievement score | 61.20 | 83.40 | -22.20 | (-26.80, -17.60) | -9.51 | <.001 |

T-test specifics
- In [[t-test]],
- [[Factor variable]] should be:
- [[Binary]];
- [[Outcome variable]] should be:
- [[Continuous]].
- In the example above, class standing is the factor variable with two values (1 = Junior; 2 = Senior), thus binary.
- Reading achievement score is the outcome variable, ranging from 0 to 100, thus continuous.
- The t-test table shows that students' class standing (being junior or senior) has a significant effect on their reading achievement score (p<0.05).

flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Class standing<br/>1=Juniors; 2=Seniors]
    end
    subgraph O["Outcome variable (Continuous)"]
        B[Reading achievement score<br/>Range: 0-100]
    end
    A ==>|Has an effect on <br/> Significant effect; p<0.05| B

- The opposite couldn't be the case: a high reading achievement score would not make a student a senior the next day.
- That is why class standing is the factor variable and reading achievement score is the outcome variable.
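A quick way to check these two requirements before running a t-test is to count the distinct values of the factor variable and confirm that the outcome is numeric. The sketch below uses the eight rows shown in the junior/senior table; the data frame and column names (`students`, `class_standing`, `reading_score`) are hypothetical.

```r
# First eight rows of the junior/senior example (names are hypothetical)
students <- data.frame(
  class_standing = c(2, 2, 1, 1, 1, 1, 1, 2),         # 1 = juniors; 2 = seniors
  reading_score  = c(88, 79, 54, 22, 91, 56, 87, 91)  # 0 to 100
)

# Factor variable: binary means exactly two distinct non-missing values
n_groups <- length(unique(na.omit(students$class_standing)))
factor_is_binary <- n_groups == 2

# Outcome variable: must at least be numeric
# (numeric is necessary, not sufficient, for "continuous")
outcome_is_numeric <- is.numeric(students$reading_score)

factor_is_binary
outcome_is_numeric
```

A numeric column can still hold category codes, so the variable type listed in the codebook remains the final word on whether an outcome is continuous.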
GSS example 1: Significant p-value (`sex` and `conrinc`)
Find the variables in Variables in GSS page
- We wonder if respondents' sex has a statistically significant influence on their personal income.

flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Sex<br/>1=Males; 2=Females]
    end
    subgraph O["Outcome variable (Continuous)"]
        B[Personal income]
    end
    A ==>|May have an effect on| B

- We want to make sure that the factor variable, `sex`, is binary, and the outcome variable, `conrinc`, is continuous.
- We check this information in the Variables in GSS page.
- Search (Ctrl / Cmd+F) for the variable names.

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `sex` | Respondents' sex | Binary | What's your sex? (1: Male; 2: Female) |
| `conrinc` | Respondents' personal income | Continuous | What is your income in dollars? |

From: Variables in GSS
[[T-test]] #code
- [[Model code]]
- [[Working code]]
    - Line 1: We will put `conrinc` here ➜ `outcome_here` and `sex` here ➜ `factor_here`.
        - [[Outcome]] variable first, [[factor]] variable second.
    - Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.
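As a rough sketch of the outcome-first, factor-second pattern, base R's `t.test()` accepts a formula of the form `outcome ~ factor`. The data frame `toy` and its values below are made up for illustration (they are not GSS values), and the module's script may well use a different function to produce its formatted table.

```r
# Made-up data for illustration only; these are NOT GSS values
toy <- data.frame(
  income = c(40, 45, 50, 42, 48, 30, 28, 35, 31, 33),
  sex    = factor(rep(c("Male", "Female"), each = 5),
                  levels = c("Male", "Female"))
)

# Outcome variable first, factor variable second: outcome ~ factor
toy_test <- t.test(income ~ sex, data = toy)

toy_test$estimate  # mean of each group
toy_test$conf.int  # 95% CI for the difference in means
toy_test$p.value   # compare against 0.05
```

With real data, the same call would use the data set loaded by the module's script in place of `toy`, with `conrinc` as the outcome and `sex` as the factor.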
[[T-test]] #output

| Parameter | sex = Male | sex = Female | Difference | 95% CI | t(2222.45) | p | Sig |
|---|---|---|---|---|---|---|---|
| Respondents' personal income | 43037.83 | 31301.51 | 11736.32 | (9148.26, 14324.39) | 8.89 | 0.000 | *** |

[[T-test]] #interpretation significant (p < 0.05)
Significant (p < 0.05) t-test interpretation template
The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.
[Variable label of outcome variable] differs by [Variable label of factor variable] in a statistically significant way since the p-value is less than 0.05.
Significant (p < 0.05) t-test interpretation sample
The average personal income of males is $43,037, while the average personal income of females is $31,301.
Personal income differs by respondents’ sex in a statistically significant way since the p-value is less than 0.05.
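The template can also be filled in programmatically. The helper below, `interpret_ttest()`, is a hypothetical function written for this sketch (it is not part of the module's script or of any package); it pastes the labels and means into the template and picks the wording based on whether the p-value is below 0.05.

```r
# Hypothetical helper that fills in the t-test interpretation template
interpret_ttest <- function(outcome, by, label_1, mean_1, label_2, mean_2, p_value) {
  # Capitalize the first letter so the second sentence starts properly
  capitalize <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

  means_sentence <- paste0(
    "The average ", outcome, " of ", label_1, " is ", mean_1,
    ", while the average ", outcome, " of ", label_2, " is ", mean_2, "."
  )

  verdict_sentence <- if (p_value < 0.05) {
    paste0(capitalize(outcome), " differs by ", by,
           " in a statistically significant way since the p-value is less than 0.05.")
  } else {
    paste0(capitalize(outcome), " does not differ by ", by,
           " in a statistically significant way since the p-value is higher than 0.05.")
  }

  paste(means_sentence, verdict_sentence)
}

# Values taken from the sex / conrinc output table
interpretation <- interpret_ttest(
  outcome = "personal income", by = "respondents' sex",
  label_1 = "males",   mean_1 = "$43,037",
  label_2 = "females", mean_2 = "$31,301",
  p_value = 0.000
)
interpretation
```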
GSS example 2: Insignificant p-value (`sex` and `educ`)
Find the variables in Variables in GSS page
- We wonder if respondents' sex has a statistically significant influence on their education.

flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Sex<br/>1=Males; 2=Females]
    end
    subgraph O["Outcome variable (Continuous)"]
        B[Education in years<br/>Range: 0-20]
    end
    A ==>|May have an effect on| B

- We want to make sure that the factor variable, `sex`, is binary, and the outcome variable, `educ`, is continuous.
- We check this information in the Variables in GSS page.
- Search (Ctrl / Cmd+F) for the variable names.

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `sex` | Respondents' sex | Binary | What's your sex? (1: Male; 2: Female) |
| `educ` | Respondents' education in years | Continuous | What is the highest year of school you completed? |

From: Variables in GSS
[[T-test]] #code
- [[Model code]]
- [[Working code]]
    - Line 1: We will put `educ` here ➜ `outcome_here` and `sex` here ➜ `factor_here`.
        - [[Outcome]] variable first, [[factor]] variable second.
    - Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.
[[T-test]] #output

| Parameter | sex = Male | sex = Female | Difference | 95% CI | t(2222.45) | p | Sig |
|---|---|---|---|---|---|---|---|
| Respondents' education in years | 14.29 | 14.22 | 0.06 | (-0.12, 0.24) | 8.89 | 0.67 | |

[[T-test]] #interpretation insignificant (p > 0.05)
Insignificant (p > 0.05) t-test interpretation template
The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.
[Variable label of outcome variable] does not differ by [Variable label of factor variable] in a statistically significant way since the p-value is higher than 0.05.
Insignificant (p > 0.05) t-test interpretation sample
The average education in years of males is 14.29 years, while the average education in years of females is 14.22 years.
Education in years does not differ by respondents’ sex in a statistically significant way since the p-value is higher than 0.05.
GSS example 3: Significant p-value (`maritalbinary` and `tvhours`)
Step 1: Find the variables in Variables in GSS page
- We wonder if respondents' marital status has a statistically significant influence on how many hours they watch television.

flowchart LR
    subgraph F["Factor variable (Binary)"]
        A[Marital status<br/>1=Single; 2=Non-single]
    end
    subgraph O["Outcome variable (Continuous)"]
        B[Television screen time in hours]
    end
    A ==>|May have an effect on| B

- We want to make sure that the factor variable, `marital`, is binary, and the outcome variable, `tvhours`, is continuous.
- We check this information in the Variables in GSS page.
- Search (Ctrl / Cmd+F) for the variable names.

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `marital` | Respondents' marital status | Nominal | Are you currently married, widowed, divorced, separated, or have you never been married? (1: Married; 2: Widowed; 3: Divorced; 4: Separated; 5: Never married) |
| `tvhours` | Television screen time in hours | Continuous | On the average day, how many hours do you personally watch television? |

From: Variables in GSS

- The original `marital` variable is not binary, but nominal. That's why we first need to recode `marital` and make it a binary variable.
[[Merging values]] #code
- Line 1: We put the new variable name for the recoded variable here, `maritalbinary`.
- Line 2: We put the original variable we want to recode here, `marital`.
- Lines 3-4: We merge values here: customize the values inside, and delete or add commas and new values (numbers) as needed. The text in [...] gives the new labels for the new values; these labels will appear in the table.
- Line 5: We write the new variable's variable label here: "Recoded variable label".
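As a rough base R sketch of the same kind of merge, the five `marital` codes can be collapsed into two groups with `ifelse()`. The mapping used here is an assumption made for illustration (Married becomes Nonsingle, every other category becomes Single); the text does not state which categories the module's script merges, and the toy values below are not GSS data.

```r
# Toy marital values using the GSS codes 1-5; NA stands for a missing answer
marital <- c(1, 2, 3, 4, 5, 1, 5, NA)

# ASSUMED mapping (not stated in the module): 1 (Married) -> Nonsingle; 2-5 -> Single
maritalbinary <- ifelse(marital == 1, "Nonsingle", "Single")

# Store as a factor so that Single is the first group and Nonsingle the second
maritalbinary <- factor(maritalbinary, levels = c("Single", "Nonsingle"))

# Count respondents per new category, showing missing values if any
table(maritalbinary, useNA = "ifany")
```

Missing values stay missing after the recode, so they are not silently counted in either group.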
[[T-test]] #code
- [[Model code]]
- [[Working code]]
    - Line 1: We will put `tvhours` here ➜ `outcome_here` and `maritalbinary` here ➜ `factor_here`.
        - [[Outcome]] variable first, [[factor]] variable second.
    - Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.
[[T-test]] #output

| Parameter | maritalbinary = Single | maritalbinary = Nonsingle | Difference | 95% CI | t(2222.45) | p | Sig |
|---|---|---|---|---|---|---|---|
| Television screen time in hours | 3.60 | 3.25 | 0.35 | (0.05, 0.64) | 2.30 | 0.022 | * |

[[T-test]] #interpretation significant (p < 0.05)
Significant (p < 0.05) t-test interpretation template
The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.
[Variable label of outcome variable] differs by [Variable label of factor variable] in a statistically significant way since the p-value is less than 0.05.
Significant (p < 0.05) t-test interpretation sample
The average television screen time in hours of single respondents is 3.60 hours, while the average television screen time in hours of nonsingle respondents is 3.25 hours.
Television screen time in hours differs by respondents’ marital status in a statistically significant way since the p-value is less than 0.05.
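The "Sig" column in the three output tables follows the p-value. A small sketch of that decision rule is below; the star thresholds (0.001, 0.01, 0.05) are the common convention and are assumed here, since the package that produced the tables may define them differently. The vector name `p_values` is illustrative.

```r
# p-values reported in the three GSS examples
p_values <- c(conrinc = 0.000, educ = 0.67, tvhours = 0.022)

# Is my p-value less than 0.05?
significant <- p_values < 0.05

# Conventional significance stars (thresholds are an assumption)
stars <- ifelse(p_values < 0.001, "***",
         ifelse(p_values < 0.01,  "**",
         ifelse(p_values < 0.05,  "*", "")))

data.frame(p = p_values, significant = significant, sig = stars)
```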