12. Dummy variables

Module items

R Script file

Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter

source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R")); 
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/12-dummy.R", "12-dummy.R"); 
file.edit("12-dummy.R")

Lab assignment

Dummy variables

Sample lab assignment

Sample: Dummy variables

Learning outcomes

Learn how to create dummy variables for categorical and continuous variables
Learn how to use dummy variables in a linear regression model
Learn how to interpret the coefficients of dummy variables

Dummy variables definition

Linear regression works with numeric variables, where the distance between values is meaningful (one more year of age, one more dollar of income).
- However, in the social sciences we often need to work with categorical variables, whose codes have no real numerical relationship with each other.
A [[dummy variable]] is a numeric variable coded 0 or 1 that is used in regression analysis to represent one category of a categorical factor variable, so that the effect of that category on the outcome variable can be estimated.
- Specifically, dummy variables are used to compare categories, such as “the effect of being female on income compared to being male.”

Examples

Example: Rent prices

Imagine a linear regression analysis where we aim to demonstrate the effects of “population of city” (a continuous variable) and “house type” (categorical: 1 for Townhouse, 2 for Studio) on rent.

In such a model, we would argue that the "population of the city" would increase the rent. Moreover, living in a townhouse would increase the rent compared to living in a studio apartment.

flowchart LR
subgraph F["Factor variables"]
    A[Population of city <br><br> Continuous variable <br><br> Min: 5K, Max: 10M]
    B[House type <br><br> Categorical variable <br><br> 1: Townhouse; 2: Studio]
end

subgraph O["Outcome variable"]
    C[Rent]
end

A -.->|May affect| C
B -.->|May affect| C

If we include the original house type variable in our model as it is, the linear regression model will incorrectly treat the codes as quantities:
- It assumes that living in a "Studio" (2) is somehow twice the value of living in a “Townhouse” (1).
- That is how the model reads a truly numeric variable such as age, where a 40-year-old respondent really is twice as old as a 20-year-old. House type codes carry no such meaning.

The solution is to use [[dummy variable]]s - variables with only two values, zero and one.

In dummy variable 1 (townhouse), 1 indicates people living in a townhouse and 0 indicates people who do not. The townhouse dummy variable therefore shows the effect of “living in a townhouse” on rent.

In dummy variable 2 (studio), 1 indicates people living in a studio and 0 indicates people who do not. The studio dummy variable therefore shows the effect of “living in a studio” on rent.

Respondent	Original variable: housetype	Dummy variable 1: townhouse	Dummy variable 2: studio
1	2 (studio)	0	1
2	1 (townhouse)	1	0
3	2 (studio)	0	1
4	1 (townhouse)	1	0
5	2 (studio)	0	1
6	1 (townhouse)	1	0
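The table above can be reproduced in R as a quick check. This is a minimal sketch, assuming a small hypothetical data frame (`rent_data`) that holds the six respondents from the table. It uses base R's `ifelse()`, which takes only a test, a "yes" value, and a "no" value, so the `label =` argument seen in the course code is left out here.

```r
# Hypothetical data: the six respondents from the table above
# housetype: 1 = Townhouse, 2 = Studio
rent_data <- data.frame(
  respondent = 1:6,
  housetype  = c(2, 1, 2, 1, 2, 1)
)

# One dummy variable per category: 1 if the respondent is in that category, 0 otherwise
rent_data$townhouse <- ifelse(rent_data$housetype == 1, 1, 0)
rent_data$studio    <- ifelse(rent_data$housetype == 2, 1, 0)

print(rent_data)
```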

GSS example: Sex

If we have a variable for sex with two responses (1 = male and 2 = female), we can't use the original values of 1 and 2 and interpret them as meaning that being female is somehow twice being male.
- The solution is to use dummy variables - variables with only two values, zero and one.
It does make sense to create a variable called “male” and interpret it as meaning that someone assigned a 1 on this variable is male and someone with a 0 is not.
- Following the same procedure, we also create a separate “female” dummy variable to show the effect of being female on an outcome variable such as "personal income."

Respondent	Original variable: sex	Dummy variable 1: male	Dummy variable 2: female
1	1 (male)	1	0
2	1 (male)	1	0
3	2 (female)	0	1
4	1 (male)	1	0
5	2 (female)	0	1
6	2 (female)	0	1
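A cross-tabulation of the original variable against a new dummy variable is a simple way to catch coding mistakes: every respondent coded 1 on sex should be 1 on male, and every respondent coded 2 should be 0 on male. The sketch below uses base R's `table()` on a hypothetical vector holding the sex codes of the six respondents in the table above.

```r
# Hypothetical data: sex codes of the six respondents (1 = male, 2 = female)
sex <- c(1, 1, 2, 1, 2, 2)

male   <- ifelse(sex == 1, 1, 0)
female <- ifelse(sex == 2, 1, 0)

# Cross-tabulate the original variable against the dummy variable;
# all counts should sit in the cells (sex = 1, male = 1) and (sex = 2, male = 0)
check <- table(sex, male)
print(check)

# Each respondent belongs to exactly one category
all(male + female == 1)
```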

Dummy variable coding structure

We create one dummy variable for each category of the original variable: two dummy variables for a variable with two categories, three for a variable with three categories, and so on.

Variable name	Variable label	Variable type	Question wording and response categories
sex	Respondents' sex	Binary	What's your sex? (1: Male; 2: Female)

Because the dummy variable code below creates new variables, we also need to write a variable label for each new dummy variable ourselves. For example:
- Respondents' sex (1: Male; 2: Female)
  - Being male
  - Being female
- Respondents' immigrant status (Born in this country? 1: Yes; 2: No)
  - Being nonimmigrant
  - Being immigrant
- Belief in life after death (1: Yes; 2: No)
  - Believing in life after death
  - Not believing in life after death
- Level of finding life exciting (1: Exciting; 2: Routine; 3: Dull)
  - Finding life exciting
  - Finding life routine
  - Finding life dull
- Confidence level in education (1: A great deal; 2: Only some; 3: Hardly any)
  - Having a great deal of confidence in education
  - Having only some confidence in education
  - Having hardly any confidence in education
- Home ownership status (1: Own; 2: Rent)
  - Owning a house
    - Or Having a house
  - Not owning a house
    - Or Not having a house

Model code

gss$dummyvariable1_name <- 
ifelse(gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

gss$dummyvariable2_name <- 
ifelse(gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

Working code
gss$male <- 
ifelse(gss$sex == 1, 1, 0,
label = "Being male")

gss$female <- 
ifelse(gss$sex == 2, 1, 0,
label = "Being female")
- Line 1: We put the new variable name here:
  - male here ➜ dummyvariable1_name
    - It's best to use the value label as the name: something simple and memorable, thus male.
- Line 2: We put the original variable from which we want to create the dummy variable.
  - sex here ➜ orig_var
  - Since 1 is male in the GSS dataset, we put 1 here ➜ value
    - The whole code, with the final 1, 0, means:
      - "If sex is 1, create a new variable called “male”, assign these respondents “1”, and assign the rest “0”."
- Line 3: We write this new dummy variable's variable label here: "Being male"
- Line 5: We put the new variable name here:
  - female here ➜ dummyvariable2_name
    - It's best to use the value label as the name: something simple and memorable, thus female.
- Line 6: We put the original variable from which we want to create the dummy variable.
  - sex here ➜ orig_var
  - Since 2 is female in the GSS dataset, we put 2 here ➜ value
    - The whole code, with the final 1, 0, means:
      - "If sex is 2, create a new variable called “female”, assign these respondents “1”, and assign the rest “0”."
- Line 7: We write this new dummy variable's variable label here: "Being female"
- Creating a dummy variable is a form of recoding: it creates a new variable from an existing one.
- Running this code creates two more variables in the GSS dataset: male and female.
  - That's it: two more variables. Do not expect to see any output.
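One detail the explanation above does not cover is missing answers. In base R, `ifelse()` returns `NA` wherever the test itself is `NA`, so a respondent with no answer on sex stays missing rather than being counted in the “0” group, and `lm()` then drops that respondent by default. A minimal sketch with a hypothetical `sex` vector (`label =` is omitted, since base R's `ifelse()` has no such argument):

```r
# Hypothetical sex codes (1 = male, 2 = female); the third respondent did not answer
sex <- c(1, 2, NA, 2, 1)

male   <- ifelse(sex == 1, 1, 0)
female <- ifelse(sex == 2, 1, 0)

male    # 1 0 NA 0 1
female  # 0 1 NA 1 0
```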

Adding dummy variables to a regression model

Let's add the dummy variables we have just created to the model we used in the Linear regression basics module.

Model code

model5 <- lm(outcome_var ~ factor_var1 + factor_var2 + factor_var3 + factor_var4 + factor_var5 + factor_var6, data = gss)
tab_model(model5, show.std = T, show.ci = F, collapse.se = T)

Working code
model5 <- lm(conrinc ~ res16 + age + prestg10 + educ + male + female, data = gss)
tab_model(model5, show.std = T, show.ci = F, collapse.se = T)
- Line 1: We put conrinc here ➜ outcome_var; res16 here ➜ factor_var1; age here ➜ factor_var2; prestg10 here ➜ factor_var3; educ here ➜ factor_var4; male here ➜ factor_var5; female here ➜ factor_var6.
  - The outcome variable comes first; then the factor variables, separated by plus signs (+).
- Line 2: The first argument of tab_model() must be the name of the model object created in Line 1, here model5.
  - If the names do not match, tab_model() will not display this model.

Respondents' personal income

Factors	Coefficients	std. Beta	p
(Intercept)	-48487.47 (3955.96)	-0.00 (0.02)	0.001***
Population density of residence during adolescence years	570.97 (409.78)	0.03 (0.02)	0.164
Respondents' age	191.36 (40.68)	0.09 (0.02)	0.001***
Respondents' occupational prestige score	624.94 (49.18)	0.26 (0.02)	0.001***
Respondents' education in years	2786.02 (252.43)	0.23 (0.02)	0.001***
male	11787.50 (1187.55)	0.18 (0.02)	0.001***
Observations	2268
R² / R² adjusted	0.231 / 0.229

RStudio created the table, but where's female?

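The RStudio console explains why:

Model matrix is rank deficient. Parameters `female` were not estimable.

Every respondent has male + female = 1, so female carries no information beyond male and the intercept (the two dummy variables are perfectly collinear). R cannot estimate a separate coefficient for female, so it drops it from the model.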

Let's start with the interpretation of the “male” dummy variable:
- “Being male increases personal income by $11,787 compared to being female.”
  - It follows that “being female decreases personal income by $11,787 compared to being male.” Therefore, adding both dummy variables is redundant.
    - If we use female instead of male in the model, the coefficient of female will be negative: -11,787.50.
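Both points can be checked with simulated data. The sketch below is hypothetical: it generates 200 made-up respondents with a `male` dummy and its mirror image `female = 1 - male`, then fits the model with both dummies, with male only, and with female only. With both included, `lm()` reports `NA` for female, the same situation the tab_model() message describes; with one included, the male and female coefficients have the same size and opposite signs.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical respondents: male is 1 or 0, female is its mirror image
n      <- 200
male   <- rbinom(n, size = 1, prob = 0.5)
female <- 1 - male
income <- 30000 + 12000 * male + rnorm(n, sd = 5000)

# Both dummies: female is perfectly collinear with the intercept and male
fit_both <- lm(income ~ male + female)
coef(fit_both)  # the female coefficient is NA

# Only one dummy at a time
fit_male   <- lm(income ~ male)
fit_female <- lm(income ~ female)

coef(fit_male)["male"]
coef(fit_female)["female"]  # same size as the male coefficient, opposite sign
```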

[[Omitting dummy variable]]

Then, which dummy variable(s) should we include, and which should we omit?
- We include “male” if we want to discuss the effect of being male, compared to being female, on personal income;
- We include “female” if we want to discuss the effect of being female, compared to being male, on personal income.
  - This decision won’t change the model's fit; only the sign of the dummy coefficient and the comparison category change.
    - However, we need to decide which one to include and which one to omit.
      - Otherwise, R will drop the last-added dummy variable for us.
Omitting a dummy variable doesn't mean we ignore it:
- The omitted dummy variable becomes our [[comparison dummy variable]] (reference category):
  - every included dummy variable's coefficient is a comparison with it, so it is used in the interpretation as well.

Interpretation of dummy variables

“Being male (included dummy variable) increases personal income by $11,787 compared to being female (omitted comparison dummy variable).”

OR

“Being female (included dummy variable) decreases personal income by $11,787 compared to being male (omitted comparison dummy variable).”

- In general, we include all but one of the dummy variables created from a single original variable:
  - if we create two dummy variables using a single original variable:
    - We need to include one, and omit one.
  - if we create three dummy variables using a single original variable:
    - We need to include two, and omit one.
  - if we create four dummy variables using a single original variable:
    - We need to include three, and omit one.
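For a variable with three categories, the "include k - 1, omit one" rule and the claim that the choice of omitted category does not change the model can be sketched with simulated data. Everything here is hypothetical (300 made-up respondents, made-up incomes). With only the dummy variables in the model, the coefficient of an included dummy equals the difference between that group's mean income and the omitted group's mean income, and the fitted values are identical whichever category is omitted.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical three-category variable
n        <- 300
group    <- sample(c("married", "formerly", "single"), n, replace = TRUE)
married  <- ifelse(group == "married", 1, 0)
formerly <- ifelse(group == "formerly", 1, 0)
single   <- ifelse(group == "single", 1, 0)
income   <- 40000 + 6000 * married - 5000 * single + rnorm(n, sd = 8000)

# Include k - 1 = 2 dummies; "formerly" is the omitted comparison category
fit_a <- lm(income ~ married + single)
# Omit "single" instead
fit_b <- lm(income ~ married + formerly)

# Coefficient of married = mean of married group - mean of omitted group
coef(fit_a)["married"]
mean(income[married == 1]) - mean(income[formerly == 1])

# Same fitted values and R-squared whichever dummy is omitted
all.equal(fitted(fit_a), fitted(fit_b))
c(summary(fit_a)$r.squared, summary(fit_b)$r.squared)
```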

GSS example: Predicting personal income (conrinc)

We'll add dummy variables to the model we used in the Linear regression basics module.

Find the variables on the Variables in GSS page

In this analysis, we propose a cause-and-effect relationship in which these factor variables may affect (increase or decrease) personal income.

flowchart LR
subgraph C0[Continuous factor variables]
    direction TB
    A[Population density]
    B[Age]
    C[Occupational prestige]
    D[Education]
end

subgraph D0[Dummy factor variables]
    subgraph I0[Immigration status]
        direction TB
        I1[Being nonimmigrant]
        I2[Being immigrant]
    end

    subgraph M0[Marital status]
        direction TB
        M1[Being married]
        M2[Being formerly in union]
        M3[Being single ]
    end

    subgraph S0[Socio-economic status]
        direction TB
        S1[Low SES]
        S2[Moderate SES]
        S3[High SES]
    end
end

subgraph O0[Outcome variable]
    E[Personal income]
end

A -.->|May affect| E
B -.->|May affect| E
C -.->|May affect| E
D -.->|May affect| E
I0 -.->|May affect| E
M0 -.->|May affect| E
S0 -.->|May affect| E

We need to confirm that conrinc, res16, age, prestg10, and educ are continuous variables or usable ordinal variables (Ordinal ✅), and we need to see the values of the categorical variables to write the dummy variable code.

Variable name	Variable label	Variable type	Question wording and response categories
`conrinc`	Respondents' personal income	Continuous	What is your income in dollars? (Min: $281.5; Max: $123,761.9)
`res16`	Population density of residence during adolescence years	Ordinal ✅	Which of the categories on this card comes closest to the type of place you were living in when you were 16 years old? (1: Country, nonfarm; 2: Farm; 3: Town less than 50K; 4: 50K to 250K; 5: Big city, suburb; 6: City greater than 250K)
`age`	Respondents' age	Continuous	What is your age? (Min: 18, Max: 89)
`prestg10`	Respondents' occupational prestige score	Continuous	Respondent's occupational prestige score (calculated) (Min: 16, Max: 80)
`educ`	Respondents' education in years	Continuous	What is the highest year of school you completed? (Min: 0, Max: 20)
`born`	Respondents' immigrant status	Binary	Were you born in this country? (1: Yes; 2: No)
`marital`	Respondents' marital status	Nominal	Are you currently — married, widowed, divorced, separated, or have you never been married? (1: Married; 2: Widowed; 3: Divorced; 4: Separated; 5: Never married)
`sei10`	Respondents' socio-economic index score	Continuous	Socio-economic index score of the respondent (calculated) (Min: 9, Max: 92.8)

[[Dummy variable]]: Categorical (binary) #code

Model code

gss$dummyvar1 <- 
ifelse(gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

gss$dummyvar2 <- 
ifelse(gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

Working code
gss$nonimmigrant <- 
ifelse(gss$born == 1, 1, 0,
label = "Being nonimmigrant")

gss$immigrant <- 
ifelse(gss$born == 2, 1, 0,
label = "Being immigrant")
- Line 1: We put the new variable name here:
  - nonimmigrant here ➜ dummyvar1
    - It's best to use something simple and memorable that reflects the value label, thus nonimmigrant.
- Line 2: We put the original variable from which we want to create the dummy variable.
  - born here ➜ orig_var
  - Since 1 is "yes, born in this country" in the GSS dataset, we put 1 here ➜ value
    - The whole code, with the final 1, 0, means:
      - "If born is 1, create a new variable called nonimmigrant, assign these respondents “1”, and assign the rest “0”."
- Line 3: We write this new dummy variable's variable label here: "Being nonimmigrant"
- Line 5: We put the new variable name here:
  - immigrant here ➜ dummyvar2
    - It's best to use something simple and memorable, thus immigrant.
- Line 6: We put the original variable from which we want to create the dummy variable.
  - born here ➜ orig_var
  - Since 2 is "no, not born in this country" in the GSS dataset, we put 2 here ➜ value
    - The whole code, with the final 1, 0, means:
      - "If born is 2, create a new variable called immigrant, assign these respondents “1”, and assign the rest “0”."
- Line 7: We write this new dummy variable's variable label here: "Being immigrant"
- Creating a dummy variable is a form of recoding: it creates a new variable from an existing one.
- Running this code creates two more variables in the GSS dataset: nonimmigrant and immigrant.
  - That's it: two more variables. Do not expect to see any output.

[[Dummy variable]]: Categorical (nominal/ordinal) #code

Model code

gss$dummyvar1 <- 
ifelse(gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

gss$dummyvar2 <- 
ifelse(gss$orig_var == value | gss$orig_var == value | gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

gss$dummyvar3 <- 
ifelse(gss$orig_var == value, 1, 0,
label = "Dummy variable's variable label")

Working code

gss$married <- 
ifelse(gss$marital == 1, 1, 0,
label = "Being married")

gss$formerlyunion <- 
ifelse(gss$marital == 2 | gss$marital == 3 | gss$marital == 4, 1, 0,
label = "Being formerly in union")

gss$single <- 
ifelse(gss$marital == 5, 1, 0,
label = "Being single")

The code is the same as the dummy variable code for binary categorical variables.
- The exception is the second dummy variable, in which we merge several categories.
- Line 5: We put the new variable name here:
  - formerlyunion here ➜ dummyvar2
    - It's best to use something simple and memorable, thus formerlyunion.
- Line 6: We put the original variable from which we want to create the dummy variable.
  - marital here ➜ orig_var
  - Since we want to merge respondents who are 2 "Widowed", 3 "Divorced", and 4 "Separated", we put gss$marital == 2 | gss$marital == 3 | gss$marital == 4, 1, 0 here ➜ gss$orig_var == value | gss$orig_var == value | gss$orig_var == value, 1, 0
    - The vertical bar (|) means "or". The whole code, with the final 1, 0, means:
      - "If marital is 2, 3, or 4, create a new variable called formerlyunion, assign these respondents “1”, and assign the rest “0”."
- Line 7: We write this new dummy variable's variable label here: "Being formerly in union"
- Creating a dummy variable is a form of recoding: it creates a new variable from an existing one.
- Running this code creates three more variables in the GSS dataset: married, formerlyunion, and single.
  - That's it: three more variables. Do not expect to see any output.
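When several categories are merged, base R's `%in%` operator is a shorter alternative to chaining `|` conditions. One caution, shown in this sketch with a hypothetical `marital` vector: `%in%` returns `FALSE` (not `NA`) for a missing value, so a missing respondent would silently be coded 0 unless `NA` is handled explicitly, whereas the chained `|` version keeps it `NA`.

```r
# Hypothetical marital codes: 1 Married, 2 Widowed, 3 Divorced, 4 Separated, 5 Never married
marital <- c(1, 2, 3, 4, 5, NA)

# Chained | as in the working code: the missing respondent stays NA
formerly_or <- ifelse(marital == 2 | marital == 3 | marital == 4, 1, 0)

# %in% alone turns the missing respondent into 0
formerly_in_naive <- ifelse(marital %in% c(2, 3, 4), 1, 0)

# %in% with an explicit NA check matches the chained | version
formerly_in <- ifelse(is.na(marital), NA, ifelse(marital %in% c(2, 3, 4), 1, 0))

formerly_or        # 0 1 1 1 0 NA
formerly_in_naive  # 0 1 1 1 0 0
formerly_in        # 0 1 1 1 0 NA
```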

[[Dummy variable]]: Continuous #code

Model code

gss$dummyvar1 <- 
ifelse(gss$orig_var <= value, 1, 0,
label = "Dummy variable's variable label")

gss$dummyvar2 <- 
ifelse(gss$orig_var >= lowest_value & gss$orig_var <= highest_value, 1, 0,
label = "Dummy variable's variable label")

gss$dummyvar3 <- 
ifelse(gss$orig_var >= value, 1, 0,
label = "Dummy variable's variable label")

Working code

gss$lowses <- 
ifelse(gss$sei10 <= 40, 1, 0,
label = "Having low socio-economic status")

gss$moderateses <- 
ifelse(gss$sei10 >= 41 & gss$sei10 <= 75, 1, 0,
label = "Having moderate socio-economic status")

gss$highses <- 
ifelse(gss$sei10 >=76, 1, 0,
label = "Having high socio-economic status")

Clarification on codes

Creating dummy variables using continuous variables requires a different coding structure.
- This is because continuous variables have many possible values.
- sei10 is the socio-economic index score of the respondent with minimum value of 9 and maximum value of 92.8.
  - We will create three separate categories for this continuous variable:
    - 1: Less than or equal to 40
      - This means respondents with a sei10 score of 40 or lower.
      - Code:
        
        ifelse(gss$sei10 <= 40, 1, 0)
    - 2: Between 41 and 75
      - This means respondents with a sei10 score from 41 to 75.
      - Code:
        
        ifelse(gss$sei10 >= 41 & gss$sei10 <= 75, 1, 0)
    - 3: Greater than or equal to 76
      - This means respondents with a sei10 score of 76 or higher.
      - Code:
        
        ifelse(gss$sei10 >= 76, 1, 0)
    - In these codes:
      - <= means less than or equal to.
        - Example: sei10 <= 40 includes 40 and all values below 40.
      - >= means greater than or equal to.
        - Example: sei10 >= 76 includes 76 and all values above 76.
      - & means and: we use & when both conditions must be true at the same time.
        - Example: sei10 >= 41 & sei10 <= 75 means the sei10 score must be 41 or higher and 75 or lower.

Code explanation

- Line 1: We put the new variable name here:
  - lowses here ➜ dummyvar1
    - It's best to use something simple and memorable, thus lowses.
- Line 2: We put the original variable from which we want to create the dummy variable.
  - sei10 here ➜ orig_var
  - Since we define low SES as "40 and all values below 40", we put 40 here ➜ value
    - The whole code, with the final 1, 0, means:
      - "If sei10 is 40 or below, create a new variable called lowses, assign these respondents “1”, and assign the rest “0”."
- Line 3: We write this new dummy variable's variable label here: "Having low socio-economic status"
- Line 5: We put the new variable name here:
  - moderateses here ➜ dummyvar2
    - It's best to use something simple and memorable, thus moderateses.
- Line 6: We put the original variable from which we want to create the dummy variable.
  - sei10 here ➜ orig_var
  - Since we define moderate SES as "41 or higher and 75 or lower" (all values from 41 to 75), we put 41 here ➜ lowest_value and 75 here ➜ highest_value
    - The whole code, with the final 1, 0, means:
      - "If sei10 is between 41 and 75, create a new variable called moderateses, assign these respondents “1”, and assign the rest “0”."
- Line 7: We write this new dummy variable's variable label here: "Having moderate socio-economic status"
- Line 9: We put the new variable name here:
  - highses here ➜ dummyvar3
    - It's best to use something simple and memorable, thus highses.
- Line 10: We put the original variable from which we want to create the dummy variable.
  - sei10 here ➜ orig_var
  - Since we define high SES as "76 or higher", we put 76 here ➜ value
    - The whole code, with the final 1, 0, means:
      - "If sei10 is 76 or higher, create a new variable called highses, assign these respondents “1”, and assign the rest “0”."
- Line 11: We write this new dummy variable's variable label here: "Having high socio-economic status"
- Creating a dummy variable is a form of recoding: it creates a new variable from an existing one.
- Running this code creates three more variables in the GSS dataset: lowses, moderateses, and highses.
  - That's it: three more variables. Do not expect to see any output.
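The explanation above defines the categories with whole-number cut-offs (40, 41 to 75, and 76). Because sei10 is recorded with decimals (its maximum is 92.8), a score between 40 and 41, or between 75 and 76, would fall into none of the three dummy variables. A quick check is that the three dummy variables add up to 1 for every respondent. The sketch below uses hypothetical scores; the strict-inequality version (> 40 and < 76) is one possible way to close the gaps, not the course's definition.

```r
# Hypothetical sei10 scores, including two decimal values between the cut-offs
sei10 <- c(9, 40, 40.5, 41, 75, 75.5, 76, 92.8)

# Whole-number cut-offs as described above
low_a  <- ifelse(sei10 <= 40, 1, 0)
mod_a  <- ifelse(sei10 >= 41 & sei10 <= 75, 1, 0)
high_a <- ifelse(sei10 >= 76, 1, 0)
rowSums(cbind(low_a, mod_a, high_a))  # 1 1 0 1 1 0 1 1: 40.5 and 75.5 are in no category

# Strict inequalities for the middle category leave no gaps and no overlaps
low_b  <- ifelse(sei10 <= 40, 1, 0)
mod_b  <- ifelse(sei10 > 40 & sei10 < 76, 1, 0)
high_b <- ifelse(sei10 >= 76, 1, 0)
rowSums(cbind(low_b, mod_b, high_b))  # 1 1 1 1 1 1 1 1
```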

[[Linear regression]] with [[dummy variables]] #code

Model code

model1 <- lm(conrinc ~ res16 + age + prestg10 + educ + male + immigrant + married + single + lowses + moderateses, data = gss)
tab_model(model1, show.std = T, show.ci = F, collapse.se = T)

Working code

model1 <- lm(conrinc ~ res16 + age + prestg10 + educ + male + immigrant + married + single + lowses + moderateses, data = gss)
tab_model(model1, show.std = T, show.ci = F, collapse.se = T)
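Creating the dummy variables by hand, as above, makes the omitted category explicit. As an alternative (not the course's method), R's `lm()` can build the same k - 1 dummy variables from a factor, treating the factor's first level as the omitted comparison category; `relevel()` chooses which level that is. A minimal sketch with hypothetical simulated data, showing that both approaches give the same coefficients:

```r
set.seed(7)  # reproducible simulated data

# Hypothetical SES groups and incomes
n           <- 300
ses         <- sample(c("low", "moderate", "high"), n, replace = TRUE)
lowses      <- ifelse(ses == "low", 1, 0)
moderateses <- ifelse(ses == "moderate", 1, 0)
income      <- 60000 - 20000 * lowses - 5000 * moderateses + rnorm(n, sd = 10000)

# Manual dummies with high SES omitted, as in model1
fit_dummy <- lm(income ~ lowses + moderateses)

# Factor version: "high" is made the first (reference) level
ses_f      <- relevel(factor(ses), ref = "high")
fit_factor <- lm(income ~ ses_f)

coef(fit_dummy)
coef(fit_factor)  # named ses_flow and ses_fmoderate, same values
```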

[[Linear regression]] with [[dummy variables]] #output

Respondents' personal income

Factors	Coefficients	std. Beta	p
(Intercept)	4314.45 (6534.17)	-0.00 (0.02)	0.509
Population density of residence during adolescence years	439.36 (400.21)	0.02 (0.02)	0.272
Respondents' age	68.35 (44.21)	0.03 (0.02)	0.122
Respondents' occupational prestige score	126.77 (68.75)	0.05 (0.03)	0.065
Respondents' education in years	1956.08 (253.45)	0.16 (0.02)	0.001***
Being male	10499.06 (1153.09)	0.16 (0.02)	0.001***
Being immigrant	2058.30 (1609.74)	0.02 (0.02)	0.201
Being married	6128.15 (1582.65)	0.09 (0.02)	0.001***
Being single	-5621.85 (1763.87)	-0.08 (0.03)	0.001**
Having low socio-economic status	-22276.87 (2563.88)	-0.35 (0.04)	0.001***
Having moderate socio-economic status	-6606.39 (1969.99)	-0.10 (0.03)	0.001**
Observations	2260
R² / R² adjusted	0.291 / 0.288

[[Linear regression]] with dummy variables #interpretation

Linear regression with dummy variables interpretation template

First section: The significance levels
Mention which variables [variable labels] are statistically significant, and which variables are statistically nonsignificant (if any). Variables with at least one asterisk (*) are statistically significant.

[Variable label of significant factor variable 1], [Variable label of significant factor variable 2], [Variable label of significant factor variable 3]... are statistically significant factors of [Variable label of outcome variable] since the p values are less than 0.05. [If any]: [Variable label of significant factor variable 4], [Variable label of significant factor variable 5]... is(are) not statistically significant factor(s) of [Variable label of outcome variable] since the p value(s) is(are) greater than 0.05.

Second section: The explanation of coefficients
Mention how significant factor variables increase or decrease the value of the outcome variable, using the coefficients (Coefficients column). When reporting the coefficients of continuous variables, make sure the sentence includes the unit of measurement (one unit, a day, a score, a year, a dollar, etc.) of both the factor variable and the outcome variable. When reporting the coefficients of dummy variables, make sure the sentence includes the omitted comparison category.

A [unit/day/score/year/dollar (unit of measurement of factor variable1)] increase in [Variable label of significant factor variable1] increases/decreases [Variable label of outcome variable] by [coefficient + unit/day/score/year/dollar (unit of measurement of outcome variable)].

A [unit/day/score/year/dollar (unit of measurement of factor variable2)] increase in [Variable label of significant factor variable2] increases/decreases [Variable label of outcome variable] by [coefficient + unit/day/score/year/dollar (unit of measurement of outcome variable)].

A [unit/day/score/year/dollar (unit of measurement of factor variable3)] increase in [Variable label of significant factor variable3] increases/decreases [Variable label of outcome variable] by [coefficient + unit/day/score/year/dollar (unit of measurement of outcome variable)].

[Variable label of included dummy variable] increases/decreases [Variable label of outcome variable] by [coefficient + unit/day/score/year/dollar (unit of measurement of outcome variable)] compared to [Variable label of omitted comparison dummy variable].

Note: Do not mention nonsignificant variables here.

Third section: The explanation of standardized coefficients
Mention the strongest factor variables of the outcome variable, in order, using the standardized coefficients (std. Beta column). Only mention the statistically significant ones. Compare standardized coefficients by their absolute values: -0.56 is stronger than 0.45.

The strongest factor of [Variable label of outcome variable] is the [Variable label of first strongest factor variable] (std. Coeff=0.xx), followed by [Variable label of second strongest factor variable] (std. Coeff=0.xx), and [Variable label of third strongest factor variable] (std. Coeff=0.xx).

Note: Do not mention nonsignificant variables here.
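The absolute-value rule can be applied mechanically. This sketch takes the std. Beta values of the statistically significant factors from the output table above and sorts them by absolute size with base R's `order()` and `abs()`. The values are rounded to two decimals, so ties such as 0.16 and 0.16 are possible.

```r
# std. Beta values of the significant factors, copied from the output table
std_beta <- c(
  "Respondents' education in years"       =  0.16,
  "Being male"                            =  0.16,
  "Being married"                         =  0.09,
  "Being single"                          = -0.08,
  "Having low socio-economic status"      = -0.35,
  "Having moderate socio-economic status" = -0.10
)

# Rank by absolute value: the sign shows direction, not strength
ranked <- std_beta[order(abs(std_beta), decreasing = TRUE)]
print(ranked)
```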

Fourth section: The explanation of adjusted R-squared
Report the adjusted R-squared value as a percentage with the statistically significant variables.

The adjusted R squared value indicates that [adjusted R squared value] of the variation in [Variable label of outcome variable] can be explained by [Variable label of significant factor variable1], [Variable label of significant factor variable2], [Variable label of significant factor variable3]...

Note: Do not mention nonsignificant variables here.

Linear regression interpretation sample

First section: The significance levels

Respondents' education in years, being male, being married, being single, having low socio-economic status, and having moderate socio-economic status are statistically significant factors of respondents' personal income since the p values are less than 0.05. Population density of residence during adolescence years, respondents' age, respondents' occupational prestige score, and being immigrant are not statistically significant factors of respondents' personal income since the p values are greater than 0.05.

Second section: The explanation of coefficients

A year increase in respondents' education in years increases respondents' personal income by $1,956.

Being male increases respondents' personal income by $10,499 compared to being female.

Being married increases respondents' personal income by $6,128 compared to being formerly in union.

Being single decreases respondents' personal income by $5,621 compared to being formerly in union.

Having low socio-economic status decreases respondents' personal income by $22,276 compared to having high socio-economic status.

Having moderate socio-economic status decreases respondents' personal income by $6,606 compared to having high socio-economic status.

Third section: The explanation of standardized coefficients

The strongest factor of respondents’ personal income is having low socio-economic status (std. Coeff=-0.35), followed by respondents' education in years (std. Coeff=0.16), being male (std. Coeff=0.16), having moderate socio-economic status (std. Coeff=-0.10), being married (std. Coeff=0.09), and being single (std. Coeff=-0.08).

Fourth section: The explanation of adjusted R-squared

The adjusted R squared value indicates that 28.8% of the variation in respondents' personal income can be explained by having low socio-economic status, respondents' education in years, being male, having moderate socio-economic status, being married, and being single.
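The adjusted R-squared in the output table can be checked by hand with the usual formula, where n is the number of observations and k is the number of factor variables (slope coefficients) in the model. Using the values from the output table (R² = 0.291, n = 2260, k = 10), and allowing for R² being rounded to three decimals:

```latex
\begin{aligned}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \\
          &= 1 - (1 - 0.291)\,\frac{2260 - 1}{2260 - 10 - 1} \\
          &= 1 - 0.709 \times \frac{2259}{2249} \\
          &\approx 1 - 0.7122 \approx 0.288
\end{aligned}
```

This matches the reported adjusted R² of 0.288, that is, 28.8%.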
