06A-Respect
January 19, 2023
1 6A: Respect: Does it make a difference?
Name: Blessy, Dana
[1]: # This code will load the R packages we will use
suppressPackageStartupMessages({
library(coursekata)
})
1.1 1.0 - The Research Question
1.1- Do you think you would be more likely to do something if the person asking was being respectful,
even if the thing they are asking you to do is unpleasant?
Yes, I am more likely to do something if the person is being more respectful.
1.2- WRITE A PARAGRAPH about what you learned in our community circle about what you’re
willing to do, based on the situation
We learned that people are willing to do things for a price. If more people are doing
the same thing, a person is more likely to do it too. The situation also affects the
reactions that people have. We learned that it matters who is telling them to do
something, because they wouldn’t want to disrespect someone with higher authority.
That’s what some researchers (Yeager, Hirschi, and Josephs) wanted to know. They had a
hypothesis: Respectful instructions will make people (especially adolescents) more likely to follow the
medical advice of their doctor. What do you think about this hypothesis? How would we write
this idea as a word equation?
We think this hypothesis is reasonable because the researchers have a point: the tone
someone uses could be why people listen. Respect = openness + other stuff
1.2 The Study
1.3 2.0 - The Data
For other reasons, the researchers also had access to some other information about these participants
(e.g., their baseline testosterone from a saliva sample, survey responses to narcissism, openness,
etc.).
[2]: # We have loaded the csv link into an object called linktocsv
# Use that object to write some code to get this data into a data frame called respect_study
linktocsv <- "[Link]2PACX-1vShHDu7P5XnUWo_xBtB67I00R-TTCyB73GyILmjlMq3LGM-kHFg-rIBQwB4upLOU7mTXG6fg8QxTLhB/pub?gid=191363782&single=true&output=csv"
# Run some code to check out the data.
[3]: VegStudy <- read.csv(linktocsv, header = TRUE)
head(VegStudy)
A data.frame: 6 × 23
  subject respect_condition testosterone spoon1_before spoon1_after spoon2_b
    <int> <chr>                    <dbl>         <dbl>        <dbl>    <dbl>
1    4098 No Respect             195.810          1.37         1.27     1.67
2    3957 No Respect              42.848          4.00         3.92     4.20
3    4765 No Respect             211.544          1.81         1.72     2.10
4    7773 No Respect              60.581          1.77         1.70     2.01
5    2374 No Respect              72.091          1.40         1.30     1.69
6    4657 Respect                106.763          2.05         1.73     1.77
2.1- What are the cases?
The cases are the people who took the personality test. They are also the people
who took the Vegemite as part of the experiment and whose information was logged.
Here are the variables in the data frame respect_study:
- subject: ID for the participant.
- respect_condition: Whether the participant watched the “Respect” or “No Respect” video
- testosterone: The measure of the testosterone via the saliva sample
- spoon1_before: The mass (grams) of the Vegemite in the 1st spoon before it was given.
- spoon1_after: The mass (grams) of the Vegemite in the 1st spoon after it was given.
- spoon2_before: The mass (grams) of the Vegemite in the 2nd spoon before it was given.
- spoon2_after: The mass (grams) of the Vegemite in the 2nd spoon after it was given.
- ravens_correct: Number of items answered correctly in a Standardized Progressive Matrix
- openness: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
- conscientious: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
- extraversion: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
- agreeable: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
- narcissism: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
- emotion_stability: Average score from questions regarding emotional stability, 1 (low) - 7 (high)
- reactance: Average score from questions regarding reactance, 1 (low) - 7 (high)
- subjective_power: Average score from questions regarding subjective power, 1 (low) - 7 (high)
- aggressive_right_now: “How aggressive are you feeling right now?”, 1 (low) - 7 (high)
- sex_drive_right_now: “How high is your sex drive right now?”, 1 (low) - 7 (high)
- campus_greek: Are you in a fraternity or sorority?
- status: Average score from questions regarding personal status, 1 (low) - 7 (high)
- competent: Average score from questions regarding self-competence, 1 (low) - 7 (high)
- autonomous: Average score from questions regarding autonomy, 1 (low) - 7 (high)
- respectful: Measure of how respectful they felt the researcher was, 1 (low) - 7 (high)
2.1 - Identify the variables that are most relevant to the respect hypothesis. What would be the
outcome variable? Is there something like “amount of Vegemite eaten after watching the video” in
this list of variables? How could we create such a variable in our data frame?
The variables that are most relevant to respect are openness, because if people are
more open to doing something then they are more likely to do it. Another variable
that might help test this hypothesis is agreeable, because in order to follow medical
advice or to try something new you have to be open to the idea of trying something
new. No, there is not something like the amount of Vegemite eaten after watching the
video. We can create a variable in the data frame by asking them how much they liked
the Vegemite on a scale from 1 through 5.
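One other way such an outcome variable could be built is from the spoon masses already in the list: subtract the mass after from the mass before. The sketch below uses a tiny made-up data frame (the numbers are illustrative, not the study's data) and assumes the 2nd spoon is the one of interest; `toy` is a hypothetical name.

```r
# Toy data frame with made-up spoon masses in grams (NOT the study's data)
toy <- data.frame(
  spoon2_before = c(1.67, 4.20, 2.10),
  spoon2_after  = c(1.67, 4.05, 1.90)
)

# Amount eaten = mass before the spoon was given minus mass after
toy$veg_eaten <- toy$spoon2_before - toy$spoon2_after

print(toy)
```

A value of 0 in `veg_eaten` would then mean the spoon came back with the same mass it started with.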
2.2- WRITE A PARAGRAPH on what you learned from your Big 5 Personality Test (in assignments)
and whether any of those other variables might be a confounding variable for this study
What we learned from our Big 5 personality test was that people with high openness scores
tend to be more adventurous, while people with low scores tend to be practical and “avoid the unknown”.
Another thing we learned from the test was that people who are more agreeable tend to trust
and empathize more, while people who are less agreeable are usually untrusting and put themselves
before others. These variables may be confounding for this study because they help
prove the original hypothesis based on the video that the participants watched. These
variables help us further understand why people are more likely to do something
based on respect.
2.3 - Which is the explanatory variable? Should we use respect_condition or respectful to
explore the researchers’ hypothesis? Why?
The explanatory variable is respect. We should use “respectful” to explore the researchers’ hypothesis
because it shows us more detailed data on how respected they felt, regardless of which video they
watched.
1.4 3.0 - Exploring Variation
3.1 - Explore the variation in the outcome variable using a plot or graph.
[4]: VegStudy$VegEaten <- VegStudy$spoon2_before-VegStudy$spoon2_after
gf_histogram(~VegEaten, data = VegStudy, color = "red", fill = "blue")
[Figure: histogram of VegEaten; x-axis: VegEaten (0.0 to 0.4), y-axis: count]
3.2 - Also make some visualizations to explore the respect hypothesis.
[5]: gf_histogram(~VegEaten, data = VegStudy) %>% gf_facet_grid(respect_condition~.)
[Figure: histograms of VegEaten faceted by respect_condition (panels: No Respect, Respect); x-axis: VegEaten (0.0 to 0.4), y-axis: count]
3.3 - What does a value of zero mean on this outcome variable? What does a high value mean?
0 means they don’t think they were respectful at all, and a high value is a 7, which means
they think that they were very respectful.
3.4 - Did respect make a difference in how much Vegemite the participants ate after watching the
video? Make an argument for EACH side (strictly based on what you see in the data).
• PRO (Some reasons respect DID make a difference):
• Yes, respect did make a difference in how much of the Vegemite they took. Based on
the variable “respect_condition”, people who watched the respectful video took
more Vegemite than the people who didn’t watch the respectful video.
• People who feel that their researcher was more respectful are more likely to try
the Vegemite because the subjects are trying to be more respectful to someone with
higher authority. This falls under the variable “respectful”.
• “status” is really important to know because we need to know the position of
the subject or the researcher to see if they will respect someone with higher
authority or disrespect them.
• “subjective_power” would be good to know to see how much power the subject
has, and whether they are doing this voluntarily or not.
• CON (Some reasons respect did NOT make a difference):
• People who aren’t extroverts are less likely to try the Vegemite because they
don’t want to be involved in things that could embarrass them (variable
“extraversion”).
• “emotion_stability” was really high from what we can see in the data.
• Many people didn’t answer the test correctly; some subjects decided it would be
okay to lie their way out of the survey that was conducted. This is seen in the
variable “ravens_correct”, which is the number of items answered correctly in
a Standardized Progressive Matrix.
3.5 - Could this pattern of data be the result of randomness?
No, this can be a result of randomness because some people got a small amount and
they answered correctly, and some people got a huge amount of Vegemite and they didn’t
give the right answers; they lied on the survey.
1.5 4.0 - Creating Some Simple Models
4.1 - If we used the mean as our empty model to predict how much Vegemite someone in this study
would eat (regardless of condition), what would we have predicted? How many people would we
have predicted correctly?
[6]: [Link] <- lm(VegEaten~NULL, data = VegStudy)
[Link]
Call:
lm(formula = VegEaten ~ NULL, data = VegStudy)
Coefficients:
(Intercept)
0.1836
[7]: tally(~ VegEaten == 0.1836, data = VegStudy)
VegEaten == 0.1836
TRUE FALSE
0 160
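The tally above asks for values that equal the mean exactly, which is a very strict test, and the mean it compares against has been rounded to four decimals. As a rough sketch, the base R code below counts exact matches and also "close" matches on a made-up outcome vector; the values and the 0.05 tolerance are illustrative assumptions, not taken from the study.

```r
# Made-up outcome values in grams (NOT the study's data)
veg_eaten <- c(0, 0, 0, 0.1, 0.2, 0.2, 0.4)

# The empty model's prediction is the mean of the outcome
mean_value <- mean(veg_eaten)

# How many cases does the mean predict exactly?
n_exact <- sum(veg_eaten == mean_value)

# How many cases fall within a hypothetical tolerance of the mean?
tolerance <- 0.05
n_close <- sum(abs(veg_eaten - mean_value) <= tolerance)

print(c(exact = n_exact, close = n_close))
```

The mean often falls between observed values, so it can predict no case exactly while still sitting near the middle of the data.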
4.2 - The mean might be a terrible model for this data. Why does it seem so terrible?
It would seem like a terrible model because there is a lot of data, which means that the
graph would look weird and would be hard for us to read.
4.3 - If we used the mode as our empty model to predict how much Vegemite someone in this study
would eat, what would we predict? How many people would we have predicted correctly?
[8]: tally(~VegEaten == 0, data = VegStudy)
VegEaten == 0
TRUE FALSE
47 113
47 people were predicted correctly and 113 people were not predicted correctly.
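Base R has no built-in function for the statistical mode, so here is one possible way to find it and count the correct predictions. This is a sketch on made-up values (not the study's data), and `mode_value` and `n_correct` are hypothetical names.

```r
# Made-up outcome values in grams (NOT the study's data)
veg_eaten <- c(0, 0, 0, 0.1, 0.2, 0.2, 0.4)

# Count how often each distinct value occurs
counts <- table(veg_eaten)

# The mode is the value with the largest count (first one if there is a tie)
mode_value <- as.numeric(names(counts)[which.max(counts)])

# Number of cases an empty model that predicts the mode gets exactly right
n_correct <- sum(veg_eaten == mode_value)

print(mode_value)
print(n_correct)
```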
4.4 - Use favstats() to put the mean into your visualization (as a blue line). Also put the mode
into your visualization (as a green line).
[9]: # Feel free to copy and paste your visualization from above here.
# Hint: You can pipe on a line to any faceted histograms, boxplot, jitter, etc.
# Try functions like gf_vline (for vertical lines) or gf_hline (for horizontal)
[Link] <- favstats(~VegEaten, data= VegStudy)
gf_histogram(~VegEaten, data = VegStudy) %>%
  gf_facet_grid(respect_condition ~ .) %>%
  gf_vline(xintercept = ~mean, data = [Link], color = "blue") %>%
  gf_vline(xintercept = 0, color = "green")
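[Figure: histograms of VegEaten faceted by respect_condition (panels: No Respect, Respect), with a blue vertical line at the mean and a green vertical line at 0; x-axis: VegEaten (0.0 to 0.4), y-axis: count]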
4.5 - Even though we would have predicted more people correctly using the mode, why might the
mean still be a useful model for this data?
The mean would be useful for this data because it lets us see the difference
between the two sides of the graph. What we mean is that in the bottom graph, at 0.0
Vegemite eaten there were at least 15 people, and on the 0.4 side there were at least
18 people who decided to give Vegemite a try. We can see that the counts of Vegemite
eaten on the two sides were different.
1.6 5.0 - Error from the Models
5.1 - Here is how we might write the mode as an empty model in GLM notation:
$Y_i = 0 + e_i$
Modify the copy below to write the mean (the number) as the empty model:
$Y_i = 0.1836 + e_i$
5.2 - Let’s imagine we are going to use the mode as our model. Take a look at the first student in
the sample. How would you represent that student in GLM format? What would be that student’s
DATA? MODEL? ERROR?
[10]: head(VegStudy,1)
A data.frame: 1 × 24
  subject respect_condition testosterone spoon1_before spoon1_after spoon2_b
    <int> <chr>                    <dbl>         <dbl>        <dbl>    <dbl>
1    4098 No Respect              195.81          1.37         1.27     1.67
Respect = openness + other stuff
5.3 - In the visualization you made above, where would the residuals ($e_i$) for each model be? Which
model “balances” the residuals?
The estimated error would be in openness and respectful.
5.4 - If we calculated the sum of squared residuals off the mean versus off the mode, which model
would have the lower SS? Make your prediction then try calculating these with R. (Note: There
are easy functions to do that for the mean but not for the mode. But you can create a column of
residuals from 0, square them, and add them up.)
[11]: # Write code here.
VegStudy$Residuals <-resid([Link])
sum((VegStudy$Residuals)^2)
VegStudy$ModeResidual <- VegStudy$VegEaten - 0
sum((VegStudy$ModeResidual)^2)
4.7212975
10.1162
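The two sums of squares above are linked by a standard identity: measuring residuals from any constant $c$ instead of the mean $\bar{Y}$ adds $n(\bar{Y} - c)^2$ to the sum of squares. The check below plugs in $c = 0$ (the mode), $n = 160$ (the total from the tallies above), and the rounded mean 0.1836, so the last digits are only approximate.

```latex
\begin{aligned}
\sum_{i=1}^{n} (Y_i - c)^2 &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 + n(\bar{Y} - c)^2 \\
SS_{\text{mode}} &\approx 4.7213 + 160 \times (0.1836 - 0)^2 \\
                 &\approx 4.7213 + 5.3934 \\
                 &\approx 10.115
\end{aligned}
```

This agrees with the computed 10.1162 up to the rounding of the mean. Because the added term is never negative, no other constant can give a smaller sum of squares than the mean does.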
5.5 - To decide which is a better model, we always have to make explicit how we are measuring
“error” (how off the model is). One way of measuring error is to count up how many predictions
were correct versus incorrect. Which model minimizes that kind of error? What kind of error does
the other model minimize?
The first model gives a sum of squared error of 4.72, compared to the other model, which
gives 10.12. The first model is better because the estimated error is
much smaller and easier to work with.
1.7 6.0 - Closing Thoughts
For reasons that are a bit opaque now, in statistics, we really value the Sum of Squares as a
measure of error. As we progress through the course, we will continue to learn more about its
special properties. Squaring, as odd as it seems right now, will allow us to do some cool stuff in
the following chapters.
Here are some questions to think about:
6.1 - Think about the way we represent our models: DATA = MODEL + ERROR. Could there be
a model so good that there is no need to add “ERROR” to it?
Yes, it is possible for a model to be so good that there won’t be any error.
The whole point of a model is to minimize and balance the sum of our error and to
make it always equal 0, so yes, there could possibly be a model that good.
6.2 - If we shuffled this respect_study data such that the Vegemite eaten would be randomly
categorized into two groups, would the empty model for that data change? Why or why not? Why
is the empty model a good way to represent a random DGP?
[12]: respect_study <- VegStudy
respect_study$veg_eaten <- VegStudy$VegEaten
No, the data would still be the same even if the empty model was changed because the
data is all coming from one place. The empty model is a good way to represent a random
DGP because it’s a good way to lower the estimated error.
[13]: # Here's a set of visualizations where the outcome gets shuffled
# and the empty model is fit from the shuffled data.
# Run it a few times to see if the empty model changes.
# this creates the outcome variable
respect_study$veg_eaten <- respect_study$spoon2_before - respect_study$spoon2_after
# this shuffles the outcome
respect_study$shuffled_veg_eaten <- shuffle(respect_study$veg_eaten)
# gets the favstats of the outcome
shuff_stats <- favstats(~ shuffled_veg_eaten, data = respect_study)
# makes a histogram of the shuffled data
gf_histogram(~ shuffled_veg_eaten, data = respect_study, fill = "purple4", color = "purple3") %>%
  gf_facet_grid(respect_condition ~ .) %>%
  gf_vline(xintercept = ~mean, data = shuff_stats) %>%
  gf_labs(title = "veg_eaten has been shuffled")
# makes a jitter plot of same data
gf_jitter(shuffled_veg_eaten ~ respect_condition, data = respect_study, size = 5, alpha = .2, height = 0, color = "purple") %>%
  gf_hline(yintercept = ~mean, data = shuff_stats) %>%
  gf_labs(title = "veg_eaten has been shuffled")
[Figure: "veg_eaten has been shuffled"; histograms of shuffled_veg_eaten faceted by respect_condition (panels: No Respect, Respect); x-axis: shuffled_veg_eaten (0.0 to 0.4), y-axis: count]
[Figure: "veg_eaten has been shuffled"; jitter plot of shuffled_veg_eaten (0.0 to 0.4) by respect_condition (No Respect, Respect)]
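As a rough illustration of why shuffling leaves the empty model alone, the sketch below permutes a made-up outcome vector and compares means. It uses base R's `sample()` as a stand-in for the `shuffle()` call in the cell above; the values, the group labels, and the seed are illustrative assumptions, not the study's data.

```r
set.seed(1)  # arbitrary seed so the sketch gives the same shuffle each run

# Made-up outcome values and conditions (NOT the study's data)
veg_eaten <- c(0, 0, 0, 0.1, 0.2, 0.2, 0.4)
condition <- c("No Respect", "No Respect", "No Respect", "No Respect",
               "Respect", "Respect", "Respect")

# Shuffling only reorders the outcome values; none are added or removed
shuffled <- sample(veg_eaten)

# The overall mean (the empty model) is the same before and after shuffling
print(mean(veg_eaten))
print(mean(shuffled))

# The group means can move, because values land in different groups
print(tapply(veg_eaten, condition, mean))
print(tapply(shuffled, condition, mean))
```

Since the same values are being averaged, only the group means can differ from one shuffle to the next.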