04. Recoding variables⚓︎
Module items⚓︎
R Script file⚓︎
Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter
source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R"));
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/04-recoding.R", "04-recoding.R");
file.edit("04-recoding.R")
Lab assignment⚓︎
Sample lab assignment⚓︎
Learning outcomes⚓︎
- Learn the definition of recoding
- Learn the reasons for recoding:
- Merging values
- Reversing values
- Transforming continuous variables into groups
- Identify the common recoding issues
Suggested reading⚓︎
- 📖 Van Tubergen, Frank. 2006. “Occupational Status of Immigrants in Cross-National Perspective: A Multilevel Analysis of Seventeen Western Societies.” Pp. 147–71 in Immigration and the Transformation of Europe, edited by C. A. Parsons and T. M. Smeeding. Cambridge University Press. (Specifically, "Dependent and independent variables," pp. 153–155.)
[[Recoding]] definition⚓︎
- It is rare that we use variables as they are in our analyses.
- Instead, we often customize the values of variables for our needs.
- Recoding means creating a new variable using the values of an original variable.
- After recoding (creating a new variable), the data will include one more variable.
[[Reasons for recoding]]⚓︎
- There are three reasons for recoding:
- [[Merging values]]
- [[Reversing values]]
- [[Transforming continuous variables into groups]]
[[Merging values]]⚓︎
- For our analysis, we may want to merge the values of a variable and create a new variable.
- Merging values is for [[categorical]] variables.
- Take the `marital` variable in GSS:

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `marital` (from: Variables in GSS) | Respondents' marital status | Nominal | Are you currently -- married, widowed, divorced, separated, or have you never been married? (1: Married; 2: Widowed; 3: Divorced; 4: Separated; 5: Never married) |

- For our analysis, imagine we are interested in the income level of 1: married, 2: formerly in union, and 3: never married respondents.
- Then, we will merge values and create a new variable by recoding.

Merging values: `marital` → `maritalgroups`
- 1: married ➜ 1: married
- 2: widowed, 3: divorced, 4: separated ➜ 2: formerly in union
- 5: never married ➜ 3: never married

- After recoding the original `marital` variable, which has 5 responses, our dataset will include one more variable called `maritalgroups` with 3 responses.

| respondent id | marital | maritalgroups |
|---|---|---|
| 1 | 1 (married) | 1 (married) |
| 2 | 2 (widowed) | 2 (formerly in union) |
| 3 | 1 (married) | 1 (married) |
| 4 | 5 (never married) | 3 (never married) |
| 5 | 3 (divorced) | 2 (formerly in union) |
| 6 | 4 (separated) | 2 (formerly in union) |
| 7 | 5 (never married) | 3 (never married) |
| 8 | 4 (separated) | 2 (formerly in union) |
| 9 | 2 (widowed) | 2 (formerly in union) |
| 10 | 1 (married) | 1 (married) |
- We need to inform RStudio about which numbers should be replaced with which numbers in our recoded (new) variable.
- We use a comma (,) to merge the values of categorical variables in the code:

Merging values: `marital` → `maritalgroups`
- 1: married ➜ 1: married (in the code: 1 = 1)
- 2: widowed, 3: divorced, 4: separated ➜ 2: formerly in union (in the code: 2, 3, 4 = 2)
- 5: never married ➜ 3: never married (in the code: 5 = 3)
[[Merging values]] - coding steps⚓︎
- Before recoding the `marital` variable by merging values, note that we have 980 variables in total. After recoding, there will be 981 variables. Remember: recoding is for creating a new variable.
[[Merging values]] #code structure
- [[Model code]]
- [[Working code]]

Code explanation:
- `maritalgroups`: New name for the recoded variable. This will be added to the GSS dataset.
- We type this name ourselves. No spaces, no special characters. Add “groups”, “reversed”, or “recoded” to the end of the original variable name, or type anything that will remind us what this variable is.
- `marital`: The original variable we want to recode. The new variable will be created based on the original variable's values.
- [married], [formerly in union], and [never married]: New labels for the new values. These will appear in the table.
- [var.label]: The last line is for the variable label of the new variable. We put the new variable's name here again, `maritalgroups`, and write its variable label, "Recoded respondents' marital status".
- Line 1: We put the name of the new recoded variable here, `maritalgroups`.
- Line 2: We put the original variable we want to recode here, `marital`.
- Lines 3-4-5: We merge values in these lines. "[...]" are the new labels for the new values. These will appear in our outputs.
- Line 6: We write the new variable's variable label here, "Recoded respondents' marital status".
- After [[highlighting and running]] the code above, the GSS dataset will include one more variable, because we have just created the `maritalgroups` variable.
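The mapping behind this recode can also be checked by hand. The block below is a minimal base R sketch, not the module's working code (that lives in the module's R script file). It assumes a ten-row toy data frame named `gss` built from the example respondents in the table above; `merge_map` and `maritalgroups_f` are hypothetical names used only for illustration.

```r
# Toy stand-in for the GSS data: the ten example respondents from the table above.
# (Hypothetical illustration data, not the real GSS file.)
gss <- data.frame(marital = c(1, 2, 1, 5, 3, 4, 5, 4, 2, 1))

# Lookup vector: position = original value, entry = new value.
# 1 -> 1 (married); 2, 3, 4 -> 2 (formerly in union); 5 -> 3 (never married)
merge_map <- c(1, 2, 2, 2, 3)

# Create a NEW column; the original marital column stays untouched.
gss$maritalgroups <- merge_map[gss$marital]

# Optional labelled version for display.
gss$maritalgroups_f <- factor(
  gss$maritalgroups,
  levels = 1:3,
  labels = c("married", "formerly in union", "never married")
)

# Cross-check: each original value should land in exactly one new group.
print(table(original = gss$marital, recoded = gss$maritalgroups))
```

In this sketch a missing answer (NA) in `marital` stays NA in `maritalgroups`, so missing cases are not silently moved into a group.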
[[Frequency table]] #code for the original variable (`marital`)
- [[Model code]]
- [[Working code]]
- Line 1: We put `marital` in place of `variable_here`.
- Find the working code in this module's R script file.
- [[Highlighting and running]] this code will generate the output below (which will appear in the Viewer pane of RStudio).
[[Frequency table]] #output for the original variable (`marital`)

Respondents' marital status (x)

| val | label | frq | raw.prc | valid.prc | cum.prc |
|---|---|---|---|---|---|
| 1 | Married | 1659 | 41.62 | 41.78 | 41.78 |
| 2 | Widowed | 269 | 6.75 | 6.77 | 48.55 |
| 3 | Divorced | 579 | 14.53 | 14.58 | 63.13 |
| 4 | Separated | 130 | 3.26 | 3.27 | 66.41 |
| 5 | Never married | 1334 | 33.47 | 33.59 | 100.00 |
| NA | NA | 15 | 0.38 | NA | NA |
[[Frequency table]] #interpretation for the original variable (`marital`)

Frequency table interpretation template
The [variable label] variable shows that xx.xx% of the respondents are / have / feel / think / said / reported [label 1], xx.xx% of the respondents are / have / feel / think / said / reported [label 2]...
- After the [variable label], we add the word "variable" in our interpretation:
- "The respondents' marital status variable shows that..."
- Depending on the variable, we need to tweak some parts of the interpretation.
- For example, "41.78% of the respondents are married" etc.
- We interpret the valid percentage column (valid.prc).

Frequency table interpretation sample
The respondents' marital status variable shows that 41.78% of the respondents are married; 6.77% of the respondents are widowed; 14.58% of the respondents are divorced; 3.27% of the respondents are separated; and 33.59% of the respondents are never married.
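The reason we interpret valid.prc rather than raw.prc is easier to see when the columns are computed by hand. The sketch below is plain base R arithmetic on the counts from the output table above. It does not reproduce the module's frequency code, and the names `counts` and `n_missing` are hypothetical.

```r
# Counts copied from the frequency table above.
counts <- c(Married = 1659, Widowed = 269, Divorced = 579,
            Separated = 130, `Never married` = 1334)
n_missing <- 15  # respondents with no answer (the NA row)

# raw.prc: share of ALL respondents, including those with missing answers.
raw_prc <- round(100 * counts / (sum(counts) + n_missing), 2)

# valid.prc: share of respondents who gave a valid answer.
valid_prc <- round(100 * counts / sum(counts), 2)

# cum.prc: running total of the valid shares.
cum_prc <- round(100 * cumsum(counts) / sum(counts), 2)

print(cbind(frq = counts, raw_prc, valid_prc, cum_prc))
```

Because valid.prc leaves the missing answers out of the denominator, its values add up to 100% across the response categories, which is why the interpretation uses that column.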
[[Frequency table for recoded variable]] #code (`maritalgroups`)
- [[Model code]]
- [[Working code]]
- Line 1: We put `maritalgroups` in place of `variable_here`.
- Find the working code in this module's R script file.
- [[Highlighting and running]] this code will generate the output below (which will appear in the Viewer pane of RStudio).
[[Frequency table for recoded variable]] #output (`maritalgroups`)

Recoded respondents' marital status (x)

| val | label | frq | raw.prc | valid.prc | cum.prc |
|---|---|---|---|---|---|
| 1 | Married | 1659 | 41.62 | 41.78 | 41.78 |
| 2 | Formerly in union | 978 | 24.54 | 24.63 | 66.41 |
| 3 | Never married | 1334 | 33.47 | 33.59 | 100.00 |
| NA | NA | 15 | 0.38 | NA | NA |
[[Frequency table for recoded variable]] #interpretation (`maritalgroups`)

Frequency table for recoded variables interpretation template
The [recoded variable label] variable shows that xx.xx% of the respondents are / have / feel / think / said / reported [label 1], xx.xx% of the respondents are / have / feel / think / said / reported [label 2]...
- Before the [variable label], we add the word "recoded".
- After the [variable label], we add the word "variable" in our interpretation:
- "The recoded respondents' marital status variable shows that..."
- Depending on the variable, we need to tweak some parts of the interpretation.
- For example, "24.63% of the respondents are formerly in union" etc.
- We interpret the valid percentage column (valid.prc).

Frequency table for recoded variables interpretation sample
The recoded respondents' marital status variable shows that 41.78% of the respondents are married; 24.63% of the respondents are formerly in union; and 33.59% of the respondents are never married.
[[Reversing values]]⚓︎
- For some variables, reversing their values is necessary to ensure that higher values represent higher levels of what they measure.
- Reversing values is for [[ordinal]] variables.
- Take the `satjob` variable in GSS:

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `satjob` (from: Variables in GSS) | Level of work satisfaction | Ordinal, RECODE | On the whole, how satisfied are you with the work you do? (1: Very satisfied; 2: Moderately satisfied; 3: A little dissatisfied; 4: Very dissatisfied) |

- Imagine respondent A, who is very dissatisfied with their work and responded with 4 (Very dissatisfied), and respondent B, who is very satisfied and responded with 1 (Very satisfied).
- If we use this variable as is in our analysis, the work satisfaction score for respondent A will be higher than the score for respondent B. It should be the opposite.
- Then, we reverse the values and create a new variable by recoding.

Reversing values: `satjob` → `satjobreversed`
- 1: Very satisfied ➜ 4: Very satisfied
- 2: Moderately satisfied ➜ 3: Moderately satisfied
- 3: A little dissatisfied ➜ 2: A little dissatisfied
- 4: Very dissatisfied ➜ 1: Very dissatisfied

- After recoding the original `satjob` variable, which has 4 responses, our dataset will include one more variable called `satjobreversed` with 4 responses.
- The only difference between these two variables is that the responses are reversed, so higher numbers indicate higher levels of work satisfaction.

| respondent id | satjob | satjobreversed |
|---|---|---|
| 1 | 1 (Very satisfied) | 4 (Very satisfied) |
| 2 | 2 (Moderately satisfied) | 3 (Moderately satisfied) |
| 3 | 4 (Very dissatisfied) | 1 (Very dissatisfied) |
| 4 | 3 (A little dissatisfied) | 2 (A little dissatisfied) |
| 5 | 2 (Moderately satisfied) | 3 (Moderately satisfied) |
| 6 | 4 (Very dissatisfied) | 1 (Very dissatisfied) |
| 7 | 1 (Very satisfied) | 4 (Very satisfied) |
| 8 | 3 (A little dissatisfied) | 2 (A little dissatisfied) |
| 9 | 2 (Moderately satisfied) | 3 (Moderately satisfied) |
| 10 | 4 (Very dissatisfied) | 1 (Very dissatisfied) |
- We need to inform RStudio about which numbers should be replaced with which numbers in our new variable (the recoded variable). We simply reverse the values: NO merging, so NO comma (,).
- We pair each original value with its reversed value in the code:

Reversing values: `satjob` → `satjobreversed`
- 1: Very satisfied ➜ 4: Very satisfied (in the code: 1 = 4)
- 2: Moderately satisfied ➜ 3: Moderately satisfied (in the code: 2 = 3)
- 3: A little dissatisfied ➜ 2: A little dissatisfied (in the code: 3 = 2)
- 4: Very dissatisfied ➜ 1: Very dissatisfied (in the code: 4 = 1)
[[Reversing values]] - coding steps⚓︎
- Before recoding the `satjob` variable by [[reversing values]], note that we have 981 variables in total: the original 980 variables + 1 variable we created above, `maritalgroups`.
[[Reversing values]] #code structure:
- [[Model code]]
- [[Working code]]

Code explanation:
- `satjobreversed`: New name for the recoded variable. This will be added to the GSS dataset.
- We type this name ourselves. No spaces, no special characters. Add “groups”, “reversed”, or “recoded” to the end of the original variable name, or type anything that will remind us what this variable is.
- `satjob`: The original variable we want to recode. The new variable will be created based on the original variable's values.
- [Very satisfied], [Moderately satisfied], [A little dissatisfied], and [Very dissatisfied]: The existing labels for the new values. These will appear in the table.
- [var.label]: The last line is for the variable label of the new variable. We put the new variable's name here again, `satjobreversed`, and write its variable label, "Recoded level of work satisfaction".
- Line 1: We put the name of the new recoded variable here, `satjobreversed`.
- Line 2: We put the original variable we want to recode here, `satjob`.
- Lines 3 onward (one line per value): We reverse values in these lines. "[...]" are the labels for the new values. These will appear in our outputs.
- Last line: We write the new variable's variable label here, "Recoded level of work satisfaction".
- After highlighting and running the code above, the GSS dataset will include one more variable, because we have just created the `satjobreversed` variable.
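For a scale coded 1 to 4, reversing is the same as subtracting each value from (lowest + highest). The block below is a minimal base R sketch of that idea, not the module's working code. `gss` is a toy data frame holding the ten example respondents from the table above, and `satjobreversed_f` is a hypothetical name for a labelled copy.

```r
# Toy stand-in for the GSS data: the ten example respondents from the table above.
# (Hypothetical illustration data, not the real GSS file.)
gss <- data.frame(satjob = c(1, 2, 4, 3, 2, 4, 1, 3, 2, 4))

# "lowest + highest - value" flips a 1..4 scale:
# 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1
gss$satjobreversed <- (1 + 4) - gss$satjob

# Labelled copy: the labels travel with the reversed numbers.
gss$satjobreversed_f <- factor(
  gss$satjobreversed,
  levels = 1:4,
  labels = c("Very dissatisfied", "A little dissatisfied",
             "Moderately satisfied", "Very satisfied")
)

# Cross-check: every original value maps to exactly one reversed value.
print(table(original = gss$satjob, reversed = gss$satjobreversed))
```

The shortcut only works when the original codes are consecutive whole numbers; for any other coding, each value has to be paired with its new value one by one, as in the list above.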
[[Frequency table]] #code for the original variable (`satjob`)
- [[Model code]]
- [[Working code]]
- Line 1: We put `satjob` in place of `variable_here`.
- Find the working code in this module's R script file.
- [[Highlighting and running]] this code will generate the output below (which will appear in the Viewer pane of RStudio).
[[Frequency table]] #output for the original variable (`satjob`)

Level of work satisfaction (x)

| val | label | frq | raw.prc | valid.prc | cum.prc |
|---|---|---|---|---|---|
| 1 | Very satisfied | 1162 | 29.15 | 41.77 | 41.77 |
| 2 | Moderately satisfied | 1188 | 29.80 | 42.70 | 84.47 |
| 3 | A little dissatisfied | 294 | 7.38 | 10.57 | 95.04 |
| 4 | Very dissatisfied | 138 | 3.46 | 4.96 | 100.00 |
| NA | NA | 1204 | 30.21 | NA | NA |
[[Frequency table]] #interpretation for the original variable (`satjob`)

Frequency table interpretation template
The [variable label] variable shows that xx.xx% of the respondents are / have / feel / think / said / reported [label 1], xx.xx% of the respondents are / have / feel / think / said / reported [label 2]...
- After the [variable label], we add the word "variable" in our interpretation:
- "The level of work satisfaction variable shows that..."
- Depending on the variable, we need to tweak some parts of the interpretation.
- For example, "41.77% of the respondents are very satisfied" etc.
- We interpret the valid percentage column (valid.prc).

Frequency table interpretation sample
The level of work satisfaction variable shows that 41.77% of the respondents are very satisfied; 42.70% of the respondents are moderately satisfied; 10.57% of the respondents are a little dissatisfied; and 4.96% of the respondents are very dissatisfied with the work they do.
[[Frequency table for recoded variable]] #code (`satjobreversed`)
- [[Model code]]
- [[Working code]]
- Line 1: We put `satjobreversed` in place of `variable_here`.
- Find the working code in this module's R script file.
- [[Highlighting and running]] this code will generate the output below (which will appear in the Viewer pane of RStudio).
[[Frequency table for recoded variable]] #output (`satjobreversed`)

Recoded level of work satisfaction (x)

| val | label | frq | raw.prc | valid.prc | cum.prc |
|---|---|---|---|---|---|
| 1 | Very dissatisfied | 138 | 3.46 | 4.96 | 4.96 |
| 2 | A little dissatisfied | 294 | 7.38 | 10.57 | 15.53 |
| 3 | Moderately satisfied | 1188 | 29.80 | 42.70 | 58.23 |
| 4 | Very satisfied | 1162 | 29.15 | 41.77 | 100.00 |
| NA | NA | 1204 | 30.21 | NA | NA |
[[Frequency table for recoded variable]] #interpretation (`satjobreversed`)

Frequency table for recoded variables interpretation template
The [recoded variable label] variable shows that xx.xx% of the respondents are / have / feel / think / said / reported [label 1], xx.xx% of the respondents are / have / feel / think / said / reported [label 2]...
- Before the [variable label], we add the word "recoded".
- After the [variable label], we add the word "variable" in our interpretation:
- "The recoded level of work satisfaction variable shows that..."
- Depending on the variable, we need to tweak some parts of the interpretation.
- For example, "4.96% of the respondents are very dissatisfied" etc.
- We interpret the valid percentage column (valid.prc).

Frequency table for recoded variables interpretation sample
The recoded level of work satisfaction variable shows that 4.96% of the respondents are very dissatisfied; 10.57% of the respondents are a little dissatisfied; 42.70% of the respondents are moderately satisfied; and 41.77% of the respondents are very satisfied with the work they do.
[[Transforming continuous variables into groups]]⚓︎
- For our analysis, we may want to recode [[continuous]] variables and create [[categorical]] groups. This is also, in a way, merging the values.
- Take the `educ` variable in GSS (“What is the highest year of school you completed?”).

| Variable name | Variable label | Variable type | Question wording and response categories |
|---|---|---|---|
| `educ` (from: Variables in GSS) | Respondents' education in years | Continuous | What is the highest year of school you completed? |

- The responses are from 0 (no schooling) to 20 (20 years of schooling). All the numbers from 0 to 20 are real numbers (continuous variable).
- For our analysis, imagine we’re interested in the income level of respondents with (1) Low level of education, (2) Moderate level of education, and (3) High level of education.
- Then, we will merge some values and create a new variable by recoding. The new values 1, 2, and 3 are not real numbers (categorical variable).

Transforming values: `educ` → `educgroups`
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ➜ 1: Low level of education
- 12, 13, 14, 15 ➜ 2: Moderate level of education
- 16, 17, 18, 19, 20 ➜ 3: High level of education

- After recoding the continuous `educ` variable, which originally had 21 responses (from 0 to 20 years of schooling), our dataset will include one more variable called `educgroups` with 3 responses.
- While `educ` is a continuous variable, `educgroups` is a categorical variable.

| respondent id | educ | educgroups |
|---|---|---|
| 1 | 16 | 3 (High level of education) |
| 2 | 3 | 1 (Low level of education) |
| 3 | 4 | 1 (Low level of education) |
| 4 | 13 | 2 (Moderate level of education) |
| 5 | 20 | 3 (High level of education) |
| 6 | 9 | 1 (Low level of education) |
| 7 | 15 | 2 (Moderate level of education) |
| 8 | 18 | 3 (High level of education) |
| 9 | 17 | 3 (High level of education) |
| 10 | 9 | 1 (Low level of education) |
- We need to inform RStudio about which numbers should be replaced with which numbers in our new variable (the recoded variable).
- We use a colon (:) to merge responses of continuous variables:

Transforming values: `educ` → `educgroups`
- 0 : 11 = 1 [meaning: From 0 years of education to 11 years of education = 1 (Low level of education)]
- 12 : 15 = 2 [meaning: From 12 years of education to 15 years of education = 2 (Moderate level of education)]
- 16 : 20 = 3 [meaning: From 16 years of education to 20 years of education = 3 (High level of education)]
[[Transforming continuous variables into groups]] - coding steps⚓︎
- Before recoding the `educ` variable by transforming continuous variables into groups, note that we have 982 variables in total: the original 980 variables + `maritalgroups` and `satjobreversed`.
[[Transforming continuous variables into groups]] #code structure:
- [[Model code]]
- [[Working code]]

Code explanation:
- `educgroups`: New name for the recoded variable. This will be added to the GSS dataset.
- We type this name ourselves. No spaces, no special characters. Add “groups”, “reversed”, or “recoded” to the end of the original variable name, or type anything that will remind us what this variable is.
- `educ`: The original variable we want to recode. The new variable will be created based on the original variable's values.
- [Low level of education], [Moderate level of education], and [High level of education]: The new labels for the new values. These will appear in the table.
- [var.label]: The last line is for the variable label of the new variable. We put the new variable's name here again, `educgroups`, and write its variable label, "Recoded respondents' education in years".
- Line 1: We put the name of the new recoded variable here, `educgroups`.
- Line 2: We put the original variable we want to recode here, `educ`.
- Lines 3-4-5: We merge values in lines 3, 4, and 5, using a colon (:). "[...]" are the new labels for the new values. These will appear in our outputs.
- Line 6: We write the new variable's variable label here, "Recoded respondents' education in years".
- After highlighting and running the code above, the GSS dataset will include one more variable, because we have just created the `educgroups` variable.
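Grouping a continuous variable amounts to assigning each value to the range it falls in. The block below is a minimal base R sketch that uses `cut()` for this; it is not the module's working code. `gss` is a toy data frame with the ten example respondents from the table above.

```r
# Toy stand-in for the GSS data: the ten example respondents from the table above.
# (Hypothetical illustration data, not the real GSS file.)
gss <- data.frame(educ = c(16, 3, 4, 13, 20, 9, 15, 18, 17, 9))

# cut() uses right-closed intervals by default: (-Inf, 11], (11, 15], (15, 20].
# For whole years this is 0:11 -> 1, 12:15 -> 2, 16:20 -> 3.
gss$educgroups <- cut(
  gss$educ,
  breaks = c(-Inf, 11, 15, 20),
  labels = c("Low level of education",
             "Moderate level of education",
             "High level of education")
)

# The new column is a categorical variable (a factor) with 3 groups.
print(table(gss$educgroups))

# Its underlying group numbers are 1, 2, 3.
print(as.integer(gss$educgroups))
```

Any value above 20 would fall outside the last interval and become NA in this sketch, so the break points should cover the full range of the original variable.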
[[Descriptive table]] #code for the original variable (`educ`)
- [[Model code]]
- [[Working code]]
- Line 1: We put `educ` in place of `variable_here`.
- [[Highlighting and running]] this code will generate the output below (which will appear in the Viewer pane of RStudio).
[[Descriptive table]] #output for the original variable (`educ`)

Basic descriptive statistics

| Variable | Label | N | Missings (%) | Mean | SD |
|---|---|---|---|---|---|
| educ | Respondents' education in years | 3952 | 0.85 | 14.24 | 2.92 |
[[Descriptive table]] #interpretation for the original variable (`educ`)

Descriptive table interpretation template
The [variable label] variable shows that the average [variable label] of the respondents is [mean], with a standard deviation of [SD].
- After the [variable label], we add the word "variable" in our interpretation:
- "The respondents' education in years variable shows that..."
- Depending on the variable, we need to tweak some parts of the interpretation.
- For example, "the average years of education is...", "the average weeks of working is..." etc.
- We use the mean (Mean column) and standard deviation (SD column) in our interpretation.

Descriptive table interpretation sample
The respondents' education in years variable shows that the average years of education is 14.24, with a standard deviation of 2.92.
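The columns of the descriptive table can be reproduced by hand on a small example. The sketch below uses base R on a hypothetical toy vector (the ten example `educ` values used earlier plus one missing answer), so its numbers differ from the GSS output above. It is not the module's descriptive code.

```r
# Hypothetical toy values: ten years-of-schooling answers and one missing answer.
educ <- c(16, 3, 4, 13, 20, 9, 15, 18, 17, 9, NA)

# N: number of valid (non-missing) answers.
n_valid <- sum(!is.na(educ))

# Missings (%): share of all cases with no answer.
missing_prc <- round(100 * mean(is.na(educ)), 2)

# Mean and SD are computed on the valid answers only (na.rm = TRUE).
mean_educ <- round(mean(educ, na.rm = TRUE), 2)
sd_educ <- round(sd(educ, na.rm = TRUE), 2)

print(c(N = n_valid, missing_prc = missing_prc, mean = mean_educ, sd = sd_educ))
```

Without `na.rm = TRUE`, both `mean()` and `sd()` return NA as soon as one answer is missing.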
[[Frequency table for recoded variable]] #code (`educgroups`)
- [[Model code]]
- [[Working code]]
- Line 1: We put `educgroups` in place of `variable_here`.
- Find the working code in this module's R script file.
- [[Highlighting and running]] this code will generate the output below (which will appear in the Viewer pane of RStudio).
[[Frequency table for recoded variable]] #output (`educgroups`)

Recoded respondents' education in years

| val | label | frq | raw.prc | valid.prc | cum.prc |
|---|---|---|---|---|---|
| 1 | Low level of education | 337 | 8.45 | 8.53 | 8.53 |
| 2 | Moderate level of education | 2059 | 51.66 | 52.10 | 60.63 |
| 3 | High level of education | 1556 | 39.04 | 39.37 | 100.00 |
| NA | NA | 34 | 0.85 | NA | NA |
[[Frequency table for recoded variable]] #interpretation (`educgroups`)

Frequency table for recoded variables interpretation template
The [recoded variable label] variable shows that xx.xx% of the respondents are / have / feel / think / said / reported [label 1], xx.xx% of the respondents are / have / feel / think / said / reported [label 2]...
- Before the [variable label], we add the word "recoded".
- After the [variable label], we add the word "variable" in our interpretation:
- "The recoded respondents' education in years variable shows that..."
- Depending on the variable, we need to tweak some parts of the interpretation.
- For example, "8.53% of the respondents have a low level of education" etc.
- We interpret the valid percentage column (valid.prc).

Frequency table for recoded variables interpretation sample
The recoded respondents' education in years variable shows that 8.53% of the respondents have a low level of education; 52.10% of the respondents have a moderate level of education; and 39.37% of the respondents have a high level of education.
How to work with recoding codes⚓︎
Step (1): Determine what kind of recoding you need⚓︎
- [[Merging values]] (categorical to categorical). For example: We have merged the values of `marital` and created `maritalgroups`.
- [[Reversing values]] (categorical to categorical). For example: We have reversed the values of `satjob` and created `satjobreversed`.
- [[Transforming continuous variables into groups]] (continuous to categorical). For example: We have transformed the values of `educ` (continuous variable) and created `educgroups`.
Step (2): Determine how many values you will need in your recoded (new) variable⚓︎
- We needed 3 values for `maritalgroups`,
- We needed 4 values for `satjobreversed`,
- We needed 3 values for `educgroups`.
Step (3): Find [[recoding model codes]]⚓︎
- At the very bottom of this page, there are recoding model codes with every type of recoding and every kind of value possibility.
[[Common recoding issues and troubleshooting]]⚓︎
[[Different recoding codes for different variables]]⚓︎
- Recoding a categorical variable and a continuous variable requires slightly different codes.
- Line 4: Check line 4. For categorical variables, we use a comma (,) between the values for merging. In line 4, the comma means “merge 2, 3, and 4.”
- Line 10: Check line 10. For continuous variables, we use a colon (:) between the values. In line 10, the colon means “merge all numbers from 0 to 11.”

Troubleshooting
- We need to check the "variable type" column of the variable we recode (on the "Variables in GSS" page).
- If we recode a categorical variable, we use a comma (,) between the values for merging.
- If we recode a continuous variable, we use a colon (:) between the values.
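Base R draws the same distinction between a comma-separated list and a colon range, which can help in remembering which separator to use. The sketch below is only an analogy in plain base R, not the module's recoding syntax, and the names `formerly_in_union` and `low_education_years` are hypothetical.

```r
# Comma: an explicit list of separate values (as when merging categories).
formerly_in_union <- c(2, 3, 4)

# Colon: every whole number from the first value to the last (as for ranges).
low_education_years <- 0:11

print(formerly_in_union)
print(low_education_years)

# Membership checks: which original values fall into each group?
print(3 %in% formerly_in_union)      # 3 (divorced) is in the merged group
print(12 %in% low_education_years)   # 12 years is outside the 0 to 11 range
```

Listing 0, 1, 2, ..., 11 with commas would describe the same set of values; the colon is simply the shorter way to write a long run of consecutive numbers.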
[[Use the recoded (new) variable in analyses]]⚓︎
- When we want to display, for example, the frequency table of a recoded (new) variable, we must use the recoded (new) variable’s name in the frequency code.
- This is because, for our analysis, the original variable is no longer relevant. We recoded the original variable and created a new one for our analysis needs.
- Line 1: Wrong!
- Line 3: Correct!
Troubleshooting
- After the recoding process, use the recoded (new) variable in analyses. Make sure you do not use the original variable name in analyses.
[[Recoded variables are always categorical]]⚓︎
- When we recode a continuous variable, the new (recoded) variable is no longer continuous.
- It becomes CATEGORICAL because we have merged the real numbers, and they no longer remain as real numbers.
- Therefore, for example, we use the `frq` code to see the frequency distribution.
- Line 1: Wrong!
- Line 3: Correct!

Troubleshooting
- Recoded variables are always categorical. Therefore, they should be treated as categorical in every analysis in which they are used.
[[Use a model code]]⚓︎
- We will likely make mistakes if we type manually, do not use a [[model code]], and do not compare it with our [[working code]]. The code below has two issues:
- Line 4: Check line 4. A semicolon (;) is missing at the end. The [[RStudio console]] showed this error: Error: ?Syntax error in argument "18:29=130:45=2[Middle"
- Line 5: Imagine there were a semicolon (;) at the end of line 4. This time, line 5 lacks a bracket. This is a much more problematic error, because RStudio will still create `agegroups`, and when we run a frequency table code, the table will show `-Inf` and `2[Middle` labels.

Recoded Respondents' age (x)

| val | label | frq | raw.prc | valid.prc | cum.prc |
|---|---|---|---|---|---|
| -Inf | | 127 | 3.19 | 3.19 | 3.19 |
| 1 | | 606 | 15.20 | 15.20 | 18.39 |
| 2[Middle | | 1196 | 30.01 | 30.01 | 48.39 |
| 3 | | 874 | 21.93 | 21.93 | 70.32 |
| 4 | | 1183 | 29.68 | 29.68 | 100.00 |
| NA | NA | 0 | 0.00 | NA | NA |

Troubleshooting
- On this page, there are recoding model codes for both categorical and continuous variables, with every kind of value possibility. See Recoding model codes.
- Determine what kind of recoding you need (merging, reversing, or transforming continuous variables into groups).
- Determine how many values you will need in your recoded (new) variable.
[[Refresh GSS data if variables are misplaced]]⚓︎
- If variables are misplaced in the codes and have overwritten the original values, we have to load the original GSS data again, because we lost the values of the original variable and we need fresh data.
- From time to time, we may accidentally change the values of original variables (especially when we recode variables).
- When this happens, we go to the very top of the R script file, and highlight and run the "Refresh data and packages" code. If we created new variables previously, we will need to run those codes under our working space again, in order, since this will be fresh data.
- Imagine we accidentally place the original variable, `educ`, whose values we want to change, in the first part of the code. We ran the code, and the values of the `educ` variable are now lost.
- Note that in addition to the 980 original variables, we have created 3 more variables so far, which adds up to 983 variables in total.
- We need to go to the top of the R script file, and highlight and run the "Refresh data and packages" code. That line is there for this exact reason. We normally do not run that code in our sessions.
- Note that now we have 980 variables. The 3 variables we created are gone. We will run the codes under our working space again, in order, since this is new, fresh data.

Troubleshooting
- Mistakes happen. For example, we could put the original variable's name where the new variable's name belongs. When this happens, the values of the original variable are lost.
- Therefore, we should highlight and run the "Refresh data and packages" code.
- We should then run again, in order, each code that came before the wrong code, because the variables those codes created are also lost.
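The difference between overwriting an original variable and creating a new one can be seen on a small example. The block below is a base R sketch on hypothetical toy data. Here `gss_fresh` merely stands in for the data that the "Refresh data and packages" code reloads, and the recode uses `cut()` rather than the module's recoding code.

```r
# Stand-in for freshly loaded data (hypothetical toy values, not the real GSS file).
gss_fresh <- data.frame(educ = c(16, 3, 4, 13, 20))
gss <- gss_fresh

# Mistake: the recoded values are written back into educ itself.
gss$educ <- as.integer(cut(gss$educ, breaks = c(-Inf, 11, 15, 20)))
overwritten <- gss$educ  # group numbers only: the years of schooling are gone

# Recovery: start again from fresh data, then recode into a NEW column.
gss <- gss_fresh
gss$educgroups <- as.integer(cut(gss$educ, breaks = c(-Inf, 11, 15, 20)))

print(overwritten)
print(gss)
```

After the recovery step, `educ` holds the original years again and `educgroups` sits next to it as a separate column, which is the state every recoding in this module aims for.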
[[Run the recoding codes to create a new variable]]⚓︎
- Let’s say we want to recode an existing variable and therefore create a new variable. Then we want to create a frequency table of the new (recoded) variable.
- Preparing the recoding code does not mean we created a new variable. We need to run the recoding code so the frequency code can work. They need to be run in order.
- For example, the `frq(gss$maritalgroups, out = "v")` code didn’t work below, and it yielded an "unknown or uninitialised column: ‘maritalgroups’" error.
- Even though the recoding code that generates the `maritalgroups` variable exists, we didn’t highlight and run it, so the data doesn’t actually include `maritalgroups` yet.
- Below, it works because:
- We did highlight and run the recoding code, and
- We did highlight and run the frequency code. They need to be run in order. Alternatively, we could highlight both and run them together.

Troubleshooting
- Always run the recoding codes before running the frequency codes, or any other codes that use the new (recoded) variable.
- If we do not remember whether we ran it before, we run it again.
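A quick way to confirm whether a recoding code has actually been run is to check whether the new column exists in the data. The sketch below shows the idea in base R on a hypothetical toy data frame; the lookup-vector recode stands in for the module's recoding code.

```r
# Toy stand-in for the GSS data (hypothetical values, not the real GSS file).
gss <- data.frame(marital = c(1, 2, 1, 5, 3))

# Before the recoding code is run, the new column does not exist.
exists_before <- "maritalgroups" %in% names(gss)

# Running the recoding step is what actually creates the column.
gss$maritalgroups <- c(1, 2, 2, 2, 3)[gss$marital]

# Now the column exists and later codes can use it.
exists_after <- "maritalgroups" %in% names(gss)

print(c(before = exists_before, after = exists_after))
```

Running a recoding code a second time simply recreates the same column, so rerunning it when in doubt does no harm.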