@naomispence
Created November 10, 2025 01:33
#LINES 3-22 ARE EXAMPLES OF BIVARIATE ANALYSIS AND INTERPRETATION FOR 2 CATEGORICAL VARIABLES; LINES 25-49 ARE FOR QUANTITATIVE DEPENDENT AND CATEGORICAL INDEPENDENT.
####CHI SQUARE for categorical independent variable and categorical dependent variable
#only change the variable names
data.chisq1 <- chisq.test(wave5addhealth$H5OD2A, wave5addhealth$H5HR2)
data.chisq1
##INTERPRETATION OF CHI SQUARE: The chi square test of independence shows that there is a statistically significant
# association between sex at birth and living arrangements among adults in the U.S. (chi squared=14.715; p<.05).
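#One possible follow-up check (a sketch, not part of the original steps): a common rule of thumb is
#that the chi square test is reliable when the expected count in each cell is about 5 or more.
#The object saved above stores the expected counts, along with the test statistic and the p-value.
data.chisq1$expected
data.chisq1$statistic
data.chisq1$p.value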
#when you have a chi square with p-value less than .05, create a crosstab to see the relationship between the variables.
#NOTE THAT YOU NEED TO PUT YOUR DEPENDENT VARIABLE FIRST; dependent ~ independent
#The order that you list variables in a crosstab is critical for ensuring that you're
#correctly interpreting the results. We "percent down, compare across" to see group
#differences in the dependent variable by groups of the independent variable.
lehmansociology::crosstab(H5HR2 ~ H5OD2A, data = wave5addhealth,
title = "Living Arrangements by Sex Assigned at Birth",
format= "column_percent")
#Interpretation: 85% of males live in their own place, compared to 89% of females.
#A higher percentage of those who were assigned the male sex at birth (9.1%) live with
#their parents as adults in their 30s or early 40s, compared to 6.4% of females.
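#A base R sketch of the same "percent down, compare across" table, in case it helps to see how
#column percents are built. This is only an illustration and is not how lehmansociology::crosstab
#works internally; the object name "counts" is made up for this example.
#Rows are the dependent variable (H5HR2) and columns are the independent variable (H5OD2A).
counts <- table(wave5addhealth$H5HR2, wave5addhealth$H5OD2A)
#margin = 2 divides each cell by its column total, so each column adds up to 100
round(prop.table(counts, margin = 2) * 100, 1)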
####ANOVA for a quantitative dependent variable (DV) and categorical independent variable (IV)
#run an analysis of variance (ANOVA); only change variable names and put in this order DV ~ IV
data.aov1 <- aov(H5ID23 ~ H5HR2, data=wave5addhealth)
summary(data.aov1)
#remove hashtag on line below to run Tukey ONLY if the F test is statistically significant AND there are more than 2 categories on the IV.
#TukeyHSD(data.aov1)
##INTERPRETATION OF ANOVA: The ANOVA results show a statistically significant relationship between living arrangements
# and the amount of time spent watching TV, movies, and videos among adults in the United States (F=8.58; p<.05).
# Tukey's post-hoc test shows more specifically that adults who live in their parents' home or another person's home
# watch significantly more TV, movies, or videos than those who live in their own place (p<.05). There is a difference
# of about 4 hours per week between those living arrangements.
#when you have an ANOVA with p-value less than .05, create a bivariate bar graph to see the relationship between the variables.
##BAR GRAPH FOR QUANTITATIVE DEPENDENT VARIABLE AND CATEGORICAL INDEPENDENT VARIABLE
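#load ggplot2 if it is not already loaded
library(ggplot2)
#stat_summary() plots the mean of the DV for each category of the IV; rows with a missing IV are dropped first
ggplot(data=subset(wave5addhealth, !is.na(H5HR2)))+stat_summary(aes(x=H5HR2,y=H5ID23),fun=mean,geom="bar")+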
ylab("Average Hours Per Week")+
xlab("Current Living Arrangements")+
ggtitle("Bar Graph of Average Time Spent Watching TV/Movies/Videos by Living Arrangements")
#Interpretation: The graph shows that Add Health respondents who live in their own
#home watch about 13 hours of TV, movies, and videos per week. The highest average
#time spent watching TV is among those living in their parents' home or another
#person's home; these groups average about 17.5 hours per week.
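#To see the averages from the graph as numbers, one option (a sketch, not part of the original
#steps) is to compute the mean of the dependent variable for each category of the IV.
#aggregate() with a formula drops rows that are missing on either variable by default.
aggregate(H5ID23 ~ H5HR2, data = wave5addhealth, FUN = mean)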