09. Visualization

Module items

R Script file

Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter

source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R")); 
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/09-visualization.R", "09-visualization.R"); 
file.edit("09-visualization.R")

Lab assignment

Visualization

Sample lab assignment

Sample: Visualization

Learning outcomes

  1. Learn how to create and customize stacked bar graphs for multiple variables
  2. Learn how to create and customize stacked bar graphs by groups
  3. Learn how to create and customize scatter plots

Suggested reading

  • Healy, Kieran Joseph. 2019. “Look at the Data.” Pp. 1–23 in Data Visualization: A Practical Introduction. Princeton: Princeton University Press.

Stacked bar graph for multiple variables

  • A [[stacked bar graph for multiple variables]] displays:
    • Multiple [[categorical]] variables with the exact same response categories at the same time.
      • Each row represents one variable and shows the percentage breakdown across response categories.
      • It is useful when you want to compare distributions across several related variables.
      • When interpreting stacked bar graphs, we generally interpret one response category.
  • We will create a stacked bar graph for the variables that measure confidence in major US institutions, then interpret it.
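To see what each row of such a graph represents, we can compute the percentages by hand. The sketch below is only an illustration in base R: the data frame `toy` and its columns `item_a`, `item_b`, and `item_c` are made up and are not GSS variables.

```r
# Hypothetical toy data: three items that share the same response categories
categories <- c("A great deal", "Only some", "Hardly any")
toy <- data.frame(
  item_a = factor(c("A great deal", "Only some", "Only some", "Hardly any"),
                  levels = categories),
  item_b = factor(c("A great deal", "A great deal", "Only some", "Hardly any"),
                  levels = categories),
  item_c = factor(c("Hardly any", "Only some", "Hardly any", "Hardly any"),
                  levels = categories)
)

# For each variable, compute the percentage of responses in each category
pct <- vapply(
  toy,
  function(x) as.numeric(round(100 * prop.table(table(x)), 1)),
  numeric(length(categories))
)
rownames(pct) <- categories

# Transpose so that each row is one variable, as in the stacked bar graph
pct_by_variable <- t(pct)
print(pct_by_variable)
```

Each row adds up to 100, which is why every bar in the graph has the same total length.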

Step 1: Find the variables on the Variables in GSS page

  1. We want to make sure all selected variables are categorical.
  2. We check this information on the Variables in GSS page.

    | Variable name | Variable label | Variable type | Question wording and response categories |
    |---|---|---|---|
    | conbus | Confidence level in major companies | Ordinal | Would you say you have confidence in major companies? (1: A great deal; 2: Only some; 3: Hardly any) |
    | coneduc | Confidence level in education | Ordinal | Would you say you have confidence in education? (1: A great deal; 2: Only some; 3: Hardly any) |
    | confed | Confidence level in executive branch of fed. govt. | Ordinal | Would you say you have confidence in executive branch of the federal government? (1: A great deal; 2: Only some; 3: Hardly any) |
    | conmedic | Confidence level in medicine | Ordinal | Would you say you have confidence in medicine? (1: A great deal; 2: Only some; 3: Hardly any) |
    | conarmy | Confidence level in military | Ordinal | Would you say you have confidence in military? (1: A great deal; 2: Only some; 3: Hardly any) |
    | conjude | Confidence level in United States Supreme Court | Ordinal | Would you say you have confidence in Supreme Court? (1: A great deal; 2: Only some; 3: Hardly any) |
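The same check can be sketched in code. This is an optional illustration, not part of the module's script: the data frame `toy` is made up, and it assumes the categorical variables are stored as factors, which may not match how the course data set stores them.

```r
# Hypothetical toy data: two categorical items and one numeric variable
categories <- c("A great deal", "Only some", "Hardly any")
toy <- data.frame(
  item_a = factor(c("A great deal", "Only some", "Hardly any"), levels = categories),
  item_b = factor(c("Only some", "Only some", "Hardly any"), levels = categories),
  income = c(42000, 51000, 38000)  # numeric, so not suitable for this graph
)

# TRUE for columns stored as factors (categorical)
is_categorical <- sapply(toy, is.factor)
print(is_categorical)

# Among the categorical columns, check that all share the same response categories
cat_cols <- toy[is_categorical]
same_categories <- all(sapply(
  cat_cols,
  function(x) identical(levels(x), levels(cat_cols[[1]]))
))
print(same_categories)
```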

Step 2: Run the stacked bar graph code for multiple variables and create the figure

  1. Now, let's create a stacked bar graph.
  2. The code has two parts: the first part creates the graph object, and the second part adjusts the font sizes.
  3. The template #code shows where to paste the variable names and the title.

    # Part 1: create the graph object
    graph <- gss |>
      select(variable1_here, variable2_here, variable3_here) |> # (1)!
      plot_stackfrq(sort.frq = "first.asc", geom.colors = "Blues", # (2)!
                    show.total = FALSE, title = "title_here") # (3)!
    # Part 2: adjust the font sizes
    graph + theme(
      axis.text.x = element_text(size = 14), # (4)!
      axis.text.y = element_text(size = 14), # (5)!
      plot.title = element_text(size = 20), # (6)!
      legend.text = element_text(size = 14)) # (7)!
    
    1. We put the variable names here ➜ variable1_here, variable2_here, variable3_here, separated by a comma.
    2. We can change the color palette here. Replace "Blues" in geom.colors = "Blues" with another palette name.

      Finding color palettes

      color palettes

    3. We put the graph title here ➜ title_here.

    4. Change font size of x-axis labels ➜ size=14
    5. Change font size of y-axis labels ➜ size=14
    6. Change font size of graph title ➜ size=20
    7. Change font size of legend ➜ size=14
  4. The sample #code below shows the code we will run. This and all other code is on the Templates page. Find this section in this module's R script file, highlight all the lines, and click Run in RStudio.

    # Part 1: create the graph object
    graph <- gss |>
      select(conbus, coneduc, confed, conmedic, conarmy, conjude) |> # (1)!
      plot_stackfrq(sort.frq = "first.asc", geom.colors = "Blues", # (2)!
                    show.total = FALSE, title = "Confidence in major US institutions") # (3)!
    # Part 2: adjust the font sizes
    graph + theme(
      axis.text.x = element_text(size = 14), # (4)!
      axis.text.y = element_text(size = 14), # (5)!
      plot.title = element_text(size = 20), # (6)!
      legend.text = element_text(size = 14)) # (7)!
    
    1. We put the variable names here ➜ conbus, coneduc, confed, conmedic, conarmy, conjude, separated by commas.
    2. We can change the color palette here. Replace "Blues" in geom.colors = "Blues" with another palette name.

      Finding color palettes

      color palettes

    3. We put the graph title here ➜ Confidence in major US institutions.

    4. Change font size of x-axis labels ➜ size=14
    5. Change font size of y-axis labels ➜ size=14
    6. Change font size of graph title ➜ size=20
    7. Change font size of legend ➜ size=14
  5. You will see this figure in the RStudio Plots pane:

    A stacked bar graph showing confidence in major U.S. institutions, including the military, medicine, education, the Supreme Court, major companies, and the executive branch of government. Each bar is divided into response categories representing different levels of confidence.
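Palette names such as "Blues" and "Dark2" are ColorBrewer palette names. As a rough sketch, assuming the RColorBrewer package is installed (this module's script does not load it explicitly), we can list the available names before choosing one:

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette;
# the row names are the palette names
palettes <- brewer.pal.info

# Sequential palettes suit ordered categories such as confidence levels
seq_pals <- rownames(palettes)[palettes$category == "seq"]
print(seq_pals)

# Qualitative palettes suit unordered groups
qual_pals <- rownames(palettes)[palettes$category == "qual"]
print(qual_pals)

# The three colors that "Blues" provides for three response categories
blues3 <- brewer.pal(3, "Blues")
print(blues3)
```

Running `display.brewer.all()` draws every palette in the Plots pane, which makes it easier to compare them visually.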

Step 3: Interpret the stacked bar graph for multiple variables

  1. The stacked bar graph interpretation templates (and the interpretation templates for all other analyses) are on the Templates page.
  2. Read the first section, Interpretation structure, of that page, which provides the detailed steps.

    Interpret one response category

    When interpreting stacked bar graphs, we generally pick one response category and report it across all variables.

    For this example, we will interpret the "a great deal" category, which is the light blue segment.

    Stacked bar graph for multiple variables #interpretation sample

    Of the GSS respondents, 40.8% have a great deal of confidence in the military; 27.2% have a great deal of confidence in medicine; 19.6% in education; 16.4% in the Supreme Court; 14.4% in major companies; and 12.5% in the executive branch of government.

    Stacked bar graph for multiple variables #interpretation template

    Of the GSS respondents, xx.xx% have/report/say [response] in [variable 1]; xx.xx% [response] in [variable 2]; xx.xx% [response] in [variable 3]...
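The template can also be filled in with code. This is an optional sketch, not part of the module's script: the vector `pct` simply retypes the "a great deal" percentages from the sample interpretation, and the object names are made up for illustration.

```r
# Percentages for the "a great deal" category, retyped from the sample interpretation
pct <- c(
  "the military" = 40.8,
  "medicine" = 27.2,
  "education" = 19.6,
  "the Supreme Court" = 16.4,
  "major companies" = 14.4,
  "the executive branch of government" = 12.5
)
response <- "a great deal of confidence"

# The first variable carries the full wording; the rest use the short form
first <- sprintf("%.1f%% have %s in %s", pct[1], response, names(pct)[1])
rest <- sprintf("%.1f%% in %s", pct[-1], names(pct)[-1])

sentence <- paste0(
  "Of the GSS respondents, ",
  paste(c(first, rest), collapse = "; "),
  "."
)
print(sentence)
```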

Stacked bar graph by groups

  • We can create a [[stacked bar graph by groups]] to:
    • Compare a [[categorical]] variable's distribution across groups.
  • We will create a stacked bar graph by groups using two variables: perceived personal health quality and confidence in medicine.

    • Because the graph uses two variables, it shows the connection between them. We therefore need to decide which one is the factor variable and which one is the outcome variable.
    flowchart LR
        subgraph F["Factor variable (Categorical)"]
            A[Confidence level in medicine<br/>1: A great deal;<br/> 2: Only some;<br/> 3: Hardly any]
        end
    
        subgraph O["Outcome variable (Categorical)"]
            B[Perceived personal health quality<br/>1: Excellent;<br/> 2: Very Good;<br/> 3: Good;<br/> 4: Fair;<br/> 5: Poor]
        end
    
        A ==>|May have an effect on| B
  • In the code, the [[outcome variable]] goes first and the [[factor variable]] goes second.
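The order of the two variables changes what the percentages mean. The sketch below uses made-up data in base R (`outcome` and `group` are hypothetical, not GSS variables). It assumes the graph reports percentages within each group, which is worth checking against your own figure.

```r
# Hypothetical toy data: group A has 6 respondents, group B has 4
outcome <- factor(
  c("Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No"),
  levels = c("Yes", "No")
)
group <- factor(c(rep("A", 6), rep("B", 4)))

# Outcome first, group second: rows are outcome categories, columns are groups
counts <- table(outcome, group)

# Column percentages: the distribution of the outcome within each group
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
print(col_pct)

# Swapping the order answers a different question:
# the distribution of groups within each outcome category
swapped_pct <- round(100 * prop.table(table(group, outcome), margin = 2), 1)
print(swapped_pct)
```

In the first table, 50% of group A and 25% of group B answered "Yes". In the swapped table, 75% of the "Yes" answers come from group A. The numbers differ because each table divides by a different total.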

Step 1: Find the variables on the Variables in GSS page

  1. We want to make sure all selected variables are categorical.
  2. We check this information on the Variables in GSS page.

    | Variable name | Variable label | Variable type | Question wording and response categories |
    |---|---|---|---|
    | health | Perceived personal health quality | Ordinal | Would you say that in general your health is Excellent, Very good, Good, Fair, or Poor? (1: Excellent; 2: Very Good; 3: Good; 4: Fair; 5: Poor) |
    | conmedic | Confidence level in medicine | Ordinal | Would you say you have confidence in medicine? (1: A great deal; 2: Only some; 3: Hardly any) |

Step 2: Run the stacked bar graph by groups code and create the figure

  1. Now, let's create the stacked bar graph by groups.
  2. The code has two parts: the first part creates the graph object, and the second part adjusts the font sizes.
  3. The template #code shows where to paste the outcome variable, the factor variable, and the title.

        graph <- plot_xtab(
        gss$outcome_here, gss$factor_here, show.n = FALSE, # (1)!
        geom.colors = "Dark2", # (2)!
        show.total = FALSE, title = "title_here") # (3)!
        graph + theme(
        axis.text.x = element_text(size = 14), # (4)!
        axis.text.y = element_text(size = 14), # (5)!
        plot.title = element_text(size = 20), # (6)!
        legend.text = element_text(size = 14)) # (7)!
    
    1. We put the outcome variable here ➜ outcome_here, and the factor variable here ➜ factor_here.
    2. We can change the color palette here. Replace "Dark2" in geom.colors = "Dark2" with another palette name.

      Finding color palettes

      color palettes

    3. We put the graph title here ➜ title_here.

    4. Change font size of x-axis labels ➜ size=14
    5. Change font size of y-axis labels ➜ size=14
    6. Change font size of graph title ➜ size=20
    7. Change font size of legend ➜ size=14
  4. The sample #code below shows the code we will run. This and all other code is on the Templates page. Find this section in this module's R script file, highlight all the lines, and click Run in RStudio.

        graph <- plot_xtab(
        gss$health, gss$conmedic, show.n = FALSE, # (1)!
        geom.colors = "Dark2", # (2)!
        show.total = FALSE, title = "Confidence level in medicine by perceived personal health quality") # (3)!
        graph + theme(
        axis.text.x = element_text(size = 14), # (4)!
        axis.text.y = element_text(size = 14), # (5)!
        plot.title = element_text(size = 20), # (6)!
        legend.text = element_text(size = 14)) # (7)!
    
    1. We put the outcome variable here ➜ health, and the factor variable here ➜ conmedic.
    2. We can change the color palette here. Replace "Dark2" in geom.colors = "Dark2" with another palette name.

      Finding color palettes

      color palettes

    3. We put the graph title here ➜ Confidence level in medicine by perceived personal health quality.

    4. Change font size of x-axis labels ➜ size=14
    5. Change font size of y-axis labels ➜ size=14
    6. Change font size of graph title ➜ size=20
    7. Change font size of legend ➜ size=14

    Variable order matters

    The [[outcome variable]] (health) goes first; the [[factor variable]] (conmedic) goes second.

  5. You will see this figure in the RStudio Plots pane:

    Bar chart titled “Confidence level in medicine by perceived personal health quality.” Across all confidence groups, “Very Good” health is the most common response: 55.6% among those with a great deal of confidence in medicine, 56.6% among those with only some confidence, and 48.4% among those with hardly any confidence. “Good” is next most common for those with only some or hardly any confidence (24.9% and 25.4%), while “Excellent” is somewhat higher among those with a great deal of confidence (19.9%) than the other groups; “Fair” is least common overall, but highest among those with hardly any confidence (8.0%).

Step 3: Interpret the stacked bar graph by groups

  1. The stacked bar graph interpretation templates (and the interpretation templates for all other analyses) are on the Templates page.
  2. Read the first section, Interpretation structure, of that page, which provides the detailed steps.

    Interpret one response category

    When interpreting stacked bar graphs, we generally pick one response category and compare it across all groups.

    Stacked bar graph by groups #interpretation sample

    Of the GSS respondents, 55.6% of those who have a great deal of confidence in medicine, 56.6% of those who have only some confidence in medicine, and 48.4% of those who have hardly any confidence in medicine perceive their health as very good.

    Stacked bar graph by groups #interpretation template

    Of the GSS respondents, xx.xx% of the [group 1], xx.xx% of the [group 2], and xx.xx% of the [group 3] [response] [variable label].
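Comparing one response category across groups often comes down to asking which group is highest, which is lowest, and how far apart they are. The sketch below is optional and not part of the module's script. The vector `pct_very_good` retypes the "Very Good" percentages from the figure in Step 2, and the object names are made up for illustration.

```r
# "Very Good" health percentages within each confidence group,
# retyped from the figure in Step 2
pct_very_good <- c(
  "A great deal" = 55.6,
  "Only some" = 56.6,
  "Hardly any" = 48.4
)

# Which groups have the highest and lowest share?
highest <- names(which.max(pct_very_good))
lowest <- names(which.min(pct_very_good))

# Gap between them, in percentage points
gap <- max(pct_very_good) - min(pct_very_good)

summary_sentence <- sprintf(
  "The share reporting very good health ranges from %.1f%% (%s) to %.1f%% (%s), a gap of %.1f percentage points.",
  min(pct_very_good), lowest, max(pct_very_good), highest, gap
)
print(summary_sentence)
```

A gap is reported in percentage points, not percent, because it is a difference between two percentages.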