14. Modeling exercises⚓︎
Module items⚓︎
R Script file⚓︎
Copy the code below ➜ Paste into [[RStudio console]] ➜ Hit Enter
source(url("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/0_packages_data.R"));
download.file("https://raw.githubusercontent.com/ttezcann/ssric-reg/refs/heads/main/regression/docs/assets/r-scripts/14-modeling-exercises.R", "14-modeling-exercises.R");
file.edit("14-modeling-exercises.R")
Lab assignment⚓︎
Sample lab assignment⚓︎
Learning outcomes⚓︎
- Practice logistic regression analysis
Suggested reading⚓︎
- 📖 Sperandei, Sandro. 2014. “Understanding Logistic Regression Analysis.” Biochemia Medica 24(1):12–18. doi:10.11613/BM.2014.003
The content of the module⚓︎
- We will use the dataset created during in-person classes in previous semesters.
- 189 students tossed a ball into a trash can from various distances.
- The same 189 students also filled out a survey whose items (some relevant, some irrelevant) can be used to predict whether they hit the trash can.
Trashball: Logistic regression classroom activity⚓︎

- The activity involved students attempting to toss a ball into a trash can from various distances.
- Students targeted the wide trash can from 6, 10, and 15 feet away and the narrow trash can from 5, 8, and 12 feet away.
Recorded data and survey⚓︎
| Variable name | Variable label | Variable type | Question wording and response categories | Comparison category |
|---|---|---|---|---|
| success | Hitting the target | Binary-Dummy | Recorded data: (1: Success; 0: No success) | |
| distance | Distance from the trash can | Continuous | Recorded data: (Min: 5, Max: 15 feet) | |
| narrow | Tossing a ball to a narrow trash can | Binary-Dummy | Recorded data: (1: Tossing a ball to a narrow trash can; 0: Tossing a ball to a wide trash can) | Tossing a ball to a wide trash can |
| sleep | Having at least seven hours sleep | Binary-Dummy | Did you sleep at least 7 hours last night? (1: Yes; 0: No) | Having less than seven hours sleep |
| hungry | Hunger level | Ordinal ✅ | How hungry are you right now? (1: Not hungry at all; 2: Slightly hungry; 3: Moderately hungry; 4: Very hungry; 5: Extremely hungry) | |
| physicallyactive | The level of physical activity | Ordinal ✅ | How physically active are you on a regular basis? (1: Not at all; 2: Slightly active; 3: Moderately active; 4: Very active; 5: Extremely active) | |
| regularsports | Playing sports regularly | Binary-Dummy | Do you play any sports regularly? (1: Yes; 0: No) | Not playing sports regularly |
| physicallycoordinated | The level of physical coordination | Ordinal ✅ | Do you consider yourself physically coordinated? (1: Not at all; 2: Slightly; 3: Moderately; 4: Very; 5: Extremely) | |
| competitive | The level of competitiveness | Ordinal ✅ | How competitive do you consider yourself to be? (1: Not at all; 2: Slightly; 3: Moderately; 4: Very; 5: Extremely) | |
| outgoing | The level of outgoingness | Ordinal ✅ | Would you describe yourself as outgoing? (1: Not at all; 2: Slightly; 3: Moderately; 4: Very; 5: Extremely) | |
| nervous | The level of nervousness | Ordinal ✅ | How nervous do you generally feel in new situations? (1: Not at all; 2: Slightly; 3: Moderately; 4: Very; 5: Extremely) | |
| confidence | The level of confidence | Ordinal ✅ | How confident do you feel in general, day-to-day? (1: Not confident at all; 2: Slightly confident; 3: Moderately confident; 4: Very confident; 5: Extremely confident) | |

Binary variables are already dummy
The binary variables success, narrow, sleep, and regularsports are already recorded as dummy variables (1/0).
- In this analysis, we propose a cause-and-effect relationship in which these factor variables may affect (increase or decrease) the odds of hitting the target.
```mermaid
flowchart LR
    subgraph C0[Continuous factor variable]
        direction BT
        C01[Distance from the trash can]
        C02[Physical coordination]
    end
    subgraph D0[Dummy factor variables]
        subgraph I0[Narrow]
            direction TB
            I1[Narrow trash can]
            I2[Wide trash can]
        end
        subgraph M0[Playing sports regularly]
            direction TB
            M1[1: Yes]
            M2[0: No]
        end
    end
    subgraph O0[Dummy outcome variable]
        E[Success<br><br>1: Hitting <br><br> 0: Not hitting]
    end
    C01 -.->|May affect| E
    C02 -.->|May affect| E
    I0 -.->|May affect| E
    M0 -.->|May affect| E
```
[[Logistic regression]] #code⚓︎
- Model code
- Working code

Code explanation
- Line 1: We put `success` in place of `outcome_here`; `distance` in place of `factor1_here`; `narrow` in place of `factor2_here`; `physicallycoordinated` in place of `factor3_here`; `regularsports` in place of `factor4_here`.
    - The outcome variable comes first; then the factor variables, separated by plus signs (+).
- Line 2: Check the first argument: `model1`. The model is saved as `model1`, so we should use `model1` here.
    - This needs to be `model1`; otherwise, this code won't work.
- Note that the data argument is different since we use a different dataset: `data = trashball`.
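The code that the explanation refers to is not reproduced on this page. As a rough sketch of the same two steps, assuming base R's `glm()` with `family = binomial` and `exp(coef())` as a stand-in for whatever table function the course script uses, the model could be fitted as follows. The fallback block that simulates a `trashball` data frame is purely hypothetical and only exists so the sketch runs on its own; its values are not the classroom data.

```r
# Assumption: `trashball` is normally loaded by the packages/data script.
# If it is missing, simulate a stand-in with the same column names
# (simulated values only, NOT the classroom data).
if (!exists("trashball")) {
  set.seed(1)
  n <- 189
  trashball <- data.frame(
    distance = sample(c(5, 6, 8, 10, 12, 15), n, replace = TRUE),
    narrow = rbinom(n, 1, 0.5),
    physicallycoordinated = sample(1:5, n, replace = TRUE),
    regularsports = rbinom(n, 1, 0.5)
  )
  trashball$success <- rbinom(
    n, 1,
    plogis(3 - 0.4 * trashball$distance + 1 * trashball$regularsports)
  )
}

# Step 1: outcome variable first, then factor variables separated by plus (+)
model1 <- glm(
  success ~ distance + narrow + physicallycoordinated + regularsports,
  data = trashball,
  family = binomial
)

# Step 2: the first argument is the object created in step 1 (model1);
# exponentiating the coefficients turns log-odds into odds ratios
odds_ratios <- exp(coef(model1))
round(odds_ratios, 2)
```

If the object in step 1 were saved under a different name, the call in step 2 would have to use that name instead.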
[[Logistic regression]] #output⚓︎
Hitting the target
| Factors | Odds Ratios | std. OR | p |
|---|---|---|---|
| (Intercept) | 278.29 (336.53) | 0.64 (0.18) | 0.001*** |
| Distance from the trash can | 0.37 (0.06) | 0.03 (0.02) | 0.001*** |
| Tossing a ball to a narrow trash can | 0.10 (0.06) | 0.32 (0.09) | 0.001*** |
| The level of physical coordination | 1.48 (0.41) | 1.57 (0.50) | 0.156 |
| Playing sports regularly | 13.70 (11.49) | 3.27 (1.24) | 0.002** |
| Observations | 189 | | |
| R² Tjur | 0.645 | | |

[[Logistic regression]] with dummy variables #interpretation⚓︎
Logistic regression with dummy variables interpretation template
First section: The significance levels
Mention which variables [variable labels] are statistically significant, and which variables are statistically nonsignificant (if any). Variables with at least one asterisk (*) are statistically significant.
[Variable label of significant factor variable 1], [Variable label of significant factor variable 2], [Variable label of significant factor variable 3]... are statistically significant factors of [Variable label of outcome variable] since the p values are less than 0.05. [If any]: [Variable label of nonsignificant factor variable 1], [Variable label of nonsignificant factor variable 2]... is(are) not statistically significant factor(s) of [Variable label of outcome variable] since the p value(s) is(are) greater than 0.05.
Second section: The explanation of odds ratios
Mention how significant factor variables increase or decrease the odds of the outcome variable happening, using the “Odds Ratios” column. When reporting the odds ratios of continuous variables, ensure that the sentence includes the unit of analysis (one unit, a day, a score, a year, a dollar, etc.) of both the factor variables and the outcome variable. When reporting the odds ratios of dummy variables, ensure that the sentence includes the omitted (comparison) category. When an odds ratio is below 1 (the factor decreases the odds), divide 1 by the odds ratio and report the result as a decrease.
A [unit/day/score/year/dollar (unit of analysis of continuous factor variable 1)] increase in [Variable label of significant continuous factor variable 1] increases/decreases the odds of [Variable label of outcome variable] by [odds ratio] times.
[Variable label of included dummy variable 1] increases/decreases the odds of [Variable label of outcome variable] by [odds ratio] times compared to [Variable label of comparison category 1].
[Variable label of included dummy variable 2] increases/decreases the odds of [Variable label of outcome variable] by [odds ratio] times compared to [Variable label of comparison category 2].
Note: Do not mention nonsignificant variables here.
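The "divide 1 by the odds ratio" step can be checked with a few lines of R. This is only a sketch: the vector below copies the rounded odds ratios of the three significant factors from the output table, and the names `odds_ratios`, `direction`, and `times_change` are illustrative choices, not part of the course script.

```r
# Rounded odds ratios of the significant factors, copied from the output table
odds_ratios <- c(distance = 0.37, narrow = 0.10, regularsports = 13.70)

# Below 1: the factor decreases the odds, so report 1 / OR as the "times" value
direction <- ifelse(odds_ratios < 1, "decreases", "increases")
times_change <- ifelse(odds_ratios < 1, 1 / odds_ratios, odds_ratios)

data.frame(odds_ratios, direction, times_change = round(times_change, 2))
```

With the rounded table values, distance gives 2.70 and the narrow trash can gives 10.00. The sample interpretation reports 9.82 for the narrow trash can, which presumably comes from the unrounded odds ratio rather than the displayed 0.10.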
Third section: The explanation of standardized odds ratios
Mention the strongest factor variables of the outcome variable using the "Standardized odds ratios" (std. OR column) in order. Only mention the statistically significant ones. Strength is judged by how far a standardized odds ratio is from 1, so divide 1 by any value below 1 before comparing: a std. OR of 0.03 (1/0.03 = 33.33) is stronger than a std. OR of 3.27.
The strongest factor of [Variable label of outcome variable] is the [Variable label of first strongest factor variable] (std. OR=0.xx), followed by [Variable label of second strongest factor variable] (std. OR=0.xx), and [Variable label of third strongest factor variable] (std. OR=0.xx).
Note: Do not mention nonsignificant variables here.
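The sample interpretation on this page ranks the factors after inverting standardized odds ratios that fall below 1. A small sketch of that ranking, using the rounded std. OR values of the significant factors from the output table (the names `std_or` and `strength` are illustrative):

```r
# Rounded standardized odds ratios of the significant factors (std. OR column)
std_or <- c(distance = 0.03, narrow = 0.32, regularsports = 3.27)

# Put every value on the same "times" scale: invert the ones below 1
strength <- ifelse(std_or < 1, 1 / std_or, std_or)

# Strongest factor first
ranking <- sort(strength, decreasing = TRUE)
names(ranking)
```

On these values the order is distance, then playing sports regularly, then the narrow trash can, which matches the order used in the sample interpretation.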
Fourth section: The explanation of Tjur R-squared
Report the Tjur R-squared value as a percentage with the statistically significant variables.
The Tjur R-squared value indicates that [Tjur R-squared value] of the variation in [Variable label of outcome variable] can be explained by [Variable label of significant factor variable1], [Variable label of significant factor variable2], [Variable label of significant factor variable3]...
Note: Do not mention nonsignificant variables here.
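For readers who want to see where the Tjur R-squared comes from: it is the difference between the average predicted probability among the observed successes and the average predicted probability among the observed failures. A minimal sketch with made-up toy numbers (the helper name `tjur_r2` and the five values are hypothetical, chosen only to make the arithmetic easy to follow):

```r
# Tjur R-squared: mean predicted probability for observed 1s
# minus mean predicted probability for observed 0s
tjur_r2 <- function(observed, predicted_prob) {
  mean(predicted_prob[observed == 1]) - mean(predicted_prob[observed == 0])
}

# Toy example (not the classroom data)
observed <- c(1, 1, 1, 0, 0)
predicted_prob <- c(0.9, 0.8, 0.7, 0.3, 0.2)

tjur_r2(observed, predicted_prob)  # 0.80 - 0.25 = 0.55
```

For a fitted `glm` object such as `model1`, the same helper could be called as `tjur_r2(model1$y, fitted(model1))`, assuming the model was fitted with the default settings so that the outcome is stored in the object.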
Logistic regression with dummy variables interpretation sample
First section: The significance levels
Distance from the trash can, tossing a ball to a narrow trash can, and playing sports regularly are statistically significant factors of hitting the target since the p values are less than 0.05. The level of physical coordination is not a statistically significant factor of hitting the target since the p value is greater than 0.05.
Second section: The explanation of odds ratios
A one-foot increase in distance from the trash can decreases the odds of hitting the target by 2.70 times.
Tossing a ball to a narrow trash can decreases the odds of hitting the target by 9.82 times compared to tossing a ball to a wide trash can.
Playing sports regularly increases the odds of hitting the target by 13.70 times compared to not playing sports regularly.
Third section: The explanation of standardized odds ratios
The strongest factor of hitting the target is distance from the trash can (std. OR=33.33), followed by playing sports regularly (std. OR=3.27), and tossing a ball to a narrow trash can (std. OR=3.12).
Fourth section: The explanation of Tjur R-squared
The Tjur R-squared value indicates that 64.5% of the variation in hitting the target can be explained by distance from the trash can, tossing a ball to a narrow trash can, and playing sports regularly.