07. T-test analysis

Module items

R Script file

Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter

source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R")); 
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/07-ttest.R", "07-ttest.R"); 
file.edit("07-ttest.R")

Lab assignment

T-test

Sample lab assignment

Sample: T-test

Learning outcomes

  1. Learn t-test analysis
  2. Learn how to run t-test analysis

Suggested reading

  • 📖 Urdan, Timothy C. 2010. “T Tests.” Pp. 93–103 in Statistics in Plain English. New York, NY: Routledge.

T-test analysis basics

Group I and Group II

Is there a significant income difference between Group I and Group II?

Group I (Income mean) Group II (Income mean)
$32,000 $32,000
Show the answer
  • We can’t know the answer without seeing the distribution of the variable.
  • The means of the two groups look the same; however, incomes vary much more in Group II.
  • RStudio will show whether there is a significant difference. Relying solely on mean scores can be misleading!

    id Group I Group II
    1 $32,000 $23,150
    2 $31,000 $120,300
    3 $29,500 $12,100
    4 $30,100 $32,000
    5 $32,500 $25,000
    6 $31,150 $20,000
    7 $30,230 $11,000
    8 $32,400 $12,200
    9 $33,000 $11,100
    10 $38,120 $53,150
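
The comparison in the table can be reproduced in R. The sketch below is a minimal illustration using base R only; the vector names `group_1` and `group_2` are made up for this example, and the values are the ten incomes listed above.

```r
# Incomes from the table above (vector names are illustrative)
group_1 <- c(32000, 31000, 29500, 30100, 32500, 31150, 30230, 32400, 33000, 38120)
group_2 <- c(23150, 120300, 12100, 32000, 25000, 20000, 11000, 12200, 11100, 53150)

# Both groups have the same mean...
mean(group_1)  # 32000
mean(group_2)  # 32000

# ...but very different spreads
sd(group_1)
sd(group_2)

# t.test() weighs the difference in means against the spread within each group
result <- t.test(group_1, group_2)
result$p.value
```

Here the two means are equal, so the test reports no significant difference. When the means do differ, the same difference can turn out significant or not depending on how much the incomes vary within each group.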
  • The [[t-test]] is used to determine whether there is a significant difference ([[statistical significance]]) between the means (average scores) of two groups. We conduct a t-test when we want to compare two means (the scores must be continuous).
    • We'll check the [[p-value]].
      • [[Is my p-value less than 0.05?]]
    • A t-test can answer questions such as:
      • Whether married and single individuals differ significantly in the average number of close friends they report.
      • Whether younger (18–34) and older (65+) age cohorts differ significantly in their average hours of daily internet usage.
      • Whether male and female full-time workers differ significantly in average income.
  • For example, we would use a t-test if we wished to compare the reading achievement scores of juniors and seniors.

    id Class standing (1 = juniors; 2 = seniors) Reading achievement score (0 to 100)
    1 2 88
    2 2 79
    3 1 54
    4 1 22
    5 1 91
    6 1 56
    7 1 87
    8 2 91
    ... ... ...
    300 2 78
    • Reading achievement mean score of juniors: 61.20
    • Reading achievement mean score of seniors: 83.40
      • The difference is 22.20 points.
    • The t-test analysis below shows if this score difference is statistically significant.
      • See the "Difference" column: -22.20 (juniors minus seniors). On average, seniors' reading achievement score is 22.20 points higher than juniors'.

        • Now check the p-value column. Based on the t-test analysis, this 22.20-point difference is statistically significant because the p-value is less than 0.05.

        Model Summary
        Parameter Class = Junior Class = Senior Difference 95% CI t(298) p
        Reading achievement score 61.20 83.40 -22.20 (-26.80, -17.60) -9.51 <.001
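
As a rough check on the table above, the p-value can be recovered from the reported t statistic and degrees of freedom. This is a sketch using base R's `pt()` and `qt()`; the object names are made up, and because the table values are rounded, the recomputed interval matches the table only approximately.

```r
# Values read from the table above (rounded to two decimals)
difference <- -22.20   # juniors minus seniors
t_value    <- -9.51
deg_free   <- 298

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), deg_free)
p_value < 0.001  # TRUE, in line with the "<.001" in the table

# Standard error implied by the difference and the t statistic
std_error <- difference / t_value

# Approximate 95% confidence interval for the difference
ci <- difference + c(-1, 1) * qt(0.975, deg_free) * std_error
round(ci, 2)
```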

T-test specifics

  • In [[t-test]],
    • [[Factor variable]] should be:
      • [[Binary]];
    • [[Outcome variable]] should be:
      • [[Continuous]].
    • In the example above, class standing is the factor variable with two values (1=Junior; 2=Senior), thus binary.
      • Reading achievement score is the outcome variable, ranging from 0 to 100, thus continuous.

        • The t-test table shows that students' class standing (being junior or senior) has a significant effect on their reading achievement score (p<0.05).
        flowchart LR
            subgraph F["Factor variable (Binary)"]
                A[Class standing<br/>1=Juniors; 2=Seniors]
            end
        
            subgraph O["Outcome variable (Continuous)"]
                B[Reading achievement score<br/>Range: 0-100]
            end
        
            A ==>|Has an effect on <br/> Significant effect; p<0.05| B
        • The opposite couldn't be the case: a high reading achievement score would not make a student a senior the next day.
          • That's why class standing is the factor and the reading achievement score is the outcome.
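
Before running a t-test, it can help to confirm that the two variables meet these requirements. The helper below is a hypothetical sketch: `check_ttest_inputs()` is not part of any package or of this module's script, and it treats "numeric" as a stand-in for "continuous". The small data frame `toy` holds the first eight rows of the reading achievement table shown earlier.

```r
# Hypothetical helper: TRUE when the factor has exactly two distinct values
# and the outcome is stored as a numeric variable
check_ttest_inputs <- function(data, outcome_var, factor_var) {
  n_groups <- length(unique(na.omit(data[[factor_var]])))
  is_binary <- n_groups == 2
  is_numeric_outcome <- is.numeric(data[[outcome_var]])
  is_binary && is_numeric_outcome
}

# First eight rows of the reading achievement example (1 = junior, 2 = senior)
toy <- data.frame(
  class_standing = c(2, 2, 1, 1, 1, 1, 1, 2),
  reading_score  = c(88, 79, 54, 22, 91, 56, 87, 91)
)

ok_binary <- check_ttest_inputs(toy, "reading_score", "class_standing")
ok_binary  # TRUE: two groups, numeric outcome

# A made-up factor with three values does not qualify
toy$three_groups <- c(1, 2, 3, 1, 2, 3, 1, 2)
ok_three <- check_ttest_inputs(toy, "reading_score", "three_groups")
ok_three   # FALSE: three groups
```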

GSS example 1: Significant p-value (sex and conrinc)

Find the variables in the Variables in GSS page

  1. We wonder if respondents' sex has a statistically significant influence on their personal income.
    flowchart LR
        subgraph F["Factor variable (Binary)"]
            A[Sex<br/>1=Males; 2=Females]
        end
    
        subgraph O["Outcome variable (Continuous)"]
            B[Personal income]
        end
    
        A ==>|May have an effect on| B
  2. We want to make sure that the factor variable, sex, is binary and the outcome variable, conrinc, is continuous.
  3. We check this information in the Variables in GSS page.
  4. Search (Ctrl+F / Cmd+F) for the variable names.

    Variable name Variable label Variable type Question wording and response categories
    sex Respondents' sex Binary What's your sex? (1: Male; 2: Female)
    conrinc Respondents' personal income Continuous What is your income in dollars?

    From: Variables in GSS

[[T-test]] #code

  • [[Model code]]
    t.test(outcome_here ~ factor_here, data = gss) |>
    parameters() |> display(format="html")
    
  • [[Working code]]

    t.test(conrinc ~ sex, data = gss) |> 
    parameters() |> display(format="html")
    

    • Line 1: We will put conrinc here ➜ outcome_here and sex here ➜ factor_here.
      • [[Outcome]] variable first, [[factor]] variable second.
      • Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.

[[T-test]] #output

Parameter sex = Male sex = Female Difference 95% CI t(2222.45) p Sig
Respondents' personal income 43037.83 31301.51 11736.32 (9148.26, 14324.39) 8.89 0.000 ***
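
The columns of this output correspond to pieces of the object that `t.test()` returns. The sketch below uses simulated data (the data frame `toy` and its columns are made up) and base R only. It also suggests why the degrees of freedom in the header can be a decimal such as t(2222.45): by default `t.test()` runs Welch's version of the test, which does not assume equal variances and adjusts the degrees of freedom.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated data: two groups with different means and spreads
toy <- data.frame(
  group = rep(c("A", "B"), times = c(40, 60)),
  score = c(rnorm(40, mean = 50, sd = 5), rnorm(60, mean = 55, sd = 12))
)

welch   <- t.test(score ~ group, data = toy)                    # default: Welch
student <- t.test(score ~ group, data = toy, var.equal = TRUE)  # equal variances assumed

welch$estimate                                      # the two group means
unname(welch$estimate[1] - welch$estimate[2])       # "Difference": first group minus second
welch$conf.int                                      # "95% CI"
welch$statistic                                     # the t value
welch$parameter                                     # degrees of freedom (usually a decimal)
welch$p.value                                       # "p"

student$parameter                                   # 98, i.e. 40 + 60 - 2
```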

[[T-test]] #interpretation significant (p < 0.05)

Significant (p < 0.05) t-test interpretation template

The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.

[Variable label of outcome variable] differs by [Variable label of factor variable] in a statistically significant way since the p-value is less than 0.05.

Significant (p < 0.05) t-test interpretation sample

The average personal income of males is $43,038, while the average personal income of females is $31,302.

Personal income differs by respondents’ sex in a statistically significant way since the p-value is less than 0.05.

GSS example 2: Insignificant p-value (sex and educ)

Find the variables in the Variables in GSS page

  1. We wonder if respondents' sex has a statistically significant influence on their education.
    flowchart LR
        subgraph F["Factor variable (Binary)"]
            A[Sex<br/>1=Males; 2=Females]
        end
    
        subgraph O["Outcome variable (Continuous)"]
            B[Education in years<br/>Range: 0-20]
        end
    
        A ==>|May have an effect on| B
  2. We want to make sure that the factor variable, sex, is binary and the outcome variable, educ, is continuous.
  3. We check this information in the Variables in GSS page.
  4. Search (Ctrl+F / Cmd+F) for the variable names.

    Variable name Variable label Variable type Question wording and response categories
    sex Respondents' sex Binary What's your sex? (1: Male; 2: Female)
    educ Respondents' education in years Continuous What is the highest year of school you completed?

    From: Variables in GSS

[[T-test]] #code

  • [[Model code]]
    t.test(outcome_here ~ factor_here, data = gss) |>
    parameters() |> display(format="html")
    
  • [[Working code]]

    t.test(educ ~ sex, data = gss) |> 
    parameters() |> display(format="html")
    

    • Line 1: We will put educ here ➜ outcome_here and sex here ➜ factor_here.
      • [[Outcome]] variable first, [[factor]] variable second.
      • Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.

[[T-test]] #output

Parameter sex = Male sex = Female Difference 95% CI t(2222.45) p Sig
Respondents' education in years 14.29 14.22 0.06 (-0.12, 0.24) 8.89 0.67

[[T-test]] #interpretation insignificant (p > 0.05)

Insignificant (p > 0.05) t-test interpretation template

The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.

[Variable label of outcome variable] does not differ by [Variable label of factor variable] in a statistically significant way since the p-value is higher than 0.05.

Insignificant (p > 0.05) t-test interpretation sample

The average education in years of males is 14.29 years, while the average education in years of females is 14.22 years.

Education in years does not differ by respondents’ sex in a statistically significant way since the p-value is higher than 0.05.

GSS example 3: Significant p-value (maritalbinary and tvhours)

Find the variables in the Variables in GSS page

  1. We wonder if respondents' marital status has a statistically significant influence on how many hours they watch television.
    flowchart LR
        subgraph F["Factor variable (Binary)"]
            A[Marital status<br/>1=Single; 2=Non-single]
        end
    
        subgraph O["Outcome variable (Continuous)"]
            B[Television screen time in hours]
        end
    
        A ==>|May have an effect on| B
  2. We want to make sure that the factor variable, marital, is binary and the outcome variable, tvhours, is continuous.
  3. We check this information in the Variables in GSS page.
  4. Search (Ctrl+F / Cmd+F) for the variable names.

    Variable name Variable label Variable type Question wording and response categories
    marital Respondents' marital status Nominal Are you currently married, widowed, divorced, separated, or have you never been married? (1: Married; 2: Widowed; 3: Divorced; 4: Separated; 5: Never married)
    tvhours Television screen time in hours Continuous On the average day, how many hours do you personally watch television?

    From: Variables in GSS
  5. The original marital variable is not binary, but nominal. That's why we first need to recode marital and make it a binary variable.

[[Merging values]] #code

gss$maritalbinary <- 
rec(gss$marital, rec = 
"5 = 1 [Single]; 
1, 2, 3, 4 = 2 [Nonsingle]",
var.label = "Recoded respondents' marital status")
  • Line 1: We put the new variable name for the recoded variable here, maritalbinary.
  • Line 2: We put the original variable we want to recode here, marital.
  • Lines 3-4: We merge values here. Customize the contents as needed by adding or removing values (numbers) and the commas that separate them. The text in [...] gives the labels for the new values; these labels will appear in the table.
  • Line 5: We write the new variable's variable label here, in place of "Recoded variable label".
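
For comparison, the same merge can be written in base R. This is only a sketch of the idea on a made-up vector of marital codes, not the `rec()` call the module uses; the names `marital` and `maritalbinary` below are local objects, not columns of `gss`.

```r
# Made-up marital codes: 1 Married, 2 Widowed, 3 Divorced, 4 Separated, 5 Never married
marital <- c(1, 5, 3, 5, 2, 4, 1, NA, 5)

# 5 becomes 1 (Single); 1 through 4 become 2 (Nonsingle); anything else stays missing
maritalbinary <- ifelse(marital == 5, 1,
                        ifelse(marital %in% 1:4, 2, NA))

# Attach the labels so tables show "Single" / "Nonsingle" instead of 1 / 2
maritalbinary <- factor(maritalbinary, levels = c(1, 2),
                        labels = c("Single", "Nonsingle"))

# Count each category, including missing values
table(maritalbinary, useNA = "ifany")
```

The result has exactly two categories, which is what the factor variable of a t-test requires.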

[[T-test]] #code

  • [[Model code]]
    t.test(outcome_here ~ factor_here, data = gss) |>
    parameters() |> display(format="html")
    
  • [[Working code]]

    t.test(tvhours ~ maritalbinary, data = gss) |> 
    parameters() |> display(format="html")
    

    • Line 1: We will put tvhours here ➜ outcome_here and maritalbinary here ➜ factor_here.
      • [[Outcome]] variable first, [[factor]] variable second.
      • Find the working code in this module's R script file. [[Highlighting and running]] this code will create the output below.

[[T-test]] #output

Parameter maritalbinary = Single maritalbinary = Nonsingle Difference 95% CI t(2222.45) p Sig
Television screen time in hours 3.60 3.25 0.35 (0.05, 0.64) 2.30 0.022 *

[[T-test]] #interpretation significant (p < 0.05)

Significant (p < 0.05) t-test interpretation template

The average [Variable label of outcome variable] of [label 1 of factor variable] is [mean] year/dollar/point/score, while the average [Variable label of outcome variable] of [label 2 of factor variable] is [mean] year/dollar/point/score.

[Variable label of outcome variable] differs by [Variable label of factor variable] in a statistically significant way since the p-value is less than 0.05.

Significant (p < 0.05) t-test interpretation sample

The average television screen time in hours of single respondents is 3.60 hours, while the average television screen time in hours of nonsingle respondents is 3.25 hours.

Television screen time in hours differs by respondents’ marital status in a statistically significant way since the p-value is less than 0.05.
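
The interpretation templates in this module follow one rule: compare the p-value with 0.05 and word the second sentence accordingly. The function below is a hypothetical sketch of that rule; `interpret_ttest()` and its arguments are invented for illustration and are not part of the module's script. The inputs are the means and p-values reported in the output tables of examples 2 and 3.

```r
# Hypothetical helper that fills in the interpretation template
interpret_ttest <- function(outcome, factor_label, groups, means, unit, p) {
  # Capitalize the first letter of the outcome label for the second sentence
  outcome_cap <- paste0(toupper(substring(outcome, 1, 1)), substring(outcome, 2))

  means_sentence <- sprintf(
    "The average %s of %s is %.2f %s, while the average %s of %s is %.2f %s.",
    outcome, groups[1], means[1], unit, outcome, groups[2], means[2], unit
  )

  # Decision rule: significant when the p-value is below 0.05
  if (p < 0.05) {
    verdict <- sprintf(
      "%s differs by %s in a statistically significant way since the p-value is less than 0.05.",
      outcome_cap, factor_label
    )
  } else {
    verdict <- sprintf(
      "%s does not differ by %s in a statistically significant way since the p-value is higher than 0.05.",
      outcome_cap, factor_label
    )
  }

  paste(means_sentence, verdict)
}

# Example 3 (p = 0.022): significant
ex3 <- interpret_ttest("television screen time in hours", "respondents' marital status",
                       c("single respondents", "nonsingle respondents"),
                       c(3.60, 3.25), "hours", p = 0.022)

# Example 2 (p = 0.67): not significant
ex2 <- interpret_ttest("education in years", "respondents' sex",
                       c("males", "females"),
                       c(14.29, 14.22), "years", p = 0.67)

cat(ex3, ex2, sep = "\n\n")
```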