Create an R script that: 1. Establishes your working directory. 2. Reads in a data file. 3. Runs descriptive statistics.
The following R script and notes answer the question above:

## Tells you your current working directory ##
getwd()

## Sets the directory that you want to work from. The path below is a placeholder; replace it with your own. ##
## In RStudio you can also use the menu: 1. go to Session; 2. Set Working Directory; 3. Choose Directory ##
setwd("path/to/your/directory")

# Reading in your data #
health_survey <- read.csv("health_survey.csv", header = TRUE, sep = ",")

# Check your dataset #
names(health_survey)

## What type of variables are you working with? ##
is(health_survey)
is.numeric(health_survey)          # FALSE for a whole data frame; test individual columns instead
is.numeric(health_survey$eat.out)
is.factor(health_survey$burger.pred)
## Note: since R 4.0, read.csv() keeps text columns as character by default, so is.factor() ##
## may return FALSE until the column is converted with factor() below. ##

## Factors are R objects created from a vector. A factor stores the vector along with its ##
## distinct values as labels (levels). The labels are always stored as character strings, ##
## whether the input vector is numeric, character, or logical. Factors are useful in statistical modeling. ##

# What levels does your factor have? Levels are alphabetized by default, but you can reset the order #
levels(health_survey$burger.pred)
health_survey$burger.pred <- factor(health_survey$burger.pred, levels = c("low", "correct", "high"))
health_survey$fat.prediction <- factor(health_survey$fat.prediction, levels = c("10-40g", "41-70g", "71-100g", "101-130g"))

## Does your dataset include missing (NA) values? Note that this evaluates the question for every cell in your dataset. ##
is.na(health_survey)
## If your dataset includes NAs, some analyses will return NA instead of an answer unless you pass na.rm = TRUE. ##

###############
#Data Analysis#
###############

## Exploratory Analysis ##

# Calculate the mean of a numerical variable #
mean(health_survey$eat.out)

# Calculate the standard deviation of a numerical variable #
sd(health_survey$eat.out)

# Summarize the responses in a given category #
summary(health_survey$fat.prediction)

# Summarize your entire dataset #
summary(health_survey)
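The script above checks for NAs cell by cell and notes that missing values can stop some analyses from returning an answer. The sketch below shows one way to handle that, under stated assumptions: `toy_survey` is a made-up data frame standing in for `health_survey` (the values are invented for illustration, not taken from the real file), and the column names simply mirror the ones used above.

```r
# Hypothetical toy data standing in for health_survey (values are made up)
toy_survey <- data.frame(
  eat.out = c(2, 5, NA, 3, 1),
  burger.pred = factor(c("low", "high", "correct", NA, "low"),
                       levels = c("low", "correct", "high"))
)

# Count missing values per column instead of printing every cell
na_counts <- colSums(is.na(toy_survey))
print(na_counts)

# mean() and sd() return NA when the input contains NAs...
mean_with_na <- mean(toy_survey$eat.out)

# ...unless na.rm = TRUE tells them to drop missing values first
mean_no_na <- mean(toy_survey$eat.out, na.rm = TRUE)
sd_no_na <- sd(toy_survey$eat.out, na.rm = TRUE)

print(mean_with_na)
print(mean_no_na)
print(sd_no_na)
```

Counting NAs per column with `colSums(is.na(...))` is usually easier to read than the full TRUE/FALSE table, and `na.rm = TRUE` makes the choice to drop missing values explicit in each call.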
What mechanism drives the evolutionary process?
Natural selection drives the evolutionary process.
Thirty-seven undergraduate students participated in a biomedical research fellowship semester. Pre/post data were collected documenting science self-efficacy and comfort conducting research. What analysis(es) would you run to find out whether there was a statistically significant change in the variables documented in this study?
The first thing I would do is plot the data for each question (pre and post) as histograms. This would allow me to visualize any increase or decrease in the responses from pre to post. Second, I would run paired t-tests, since the same students were measured twice, to test whether the change was statistically significant. Because n is only 37, and the t-test assumes that the pre/post differences are approximately normally distributed, I would also run a Wilcoxon signed-rank test. This is a non-parametric analysis that does not rely on the normality assumption, so it guards against a misleading t-test result when that assumption is doubtful.
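The analysis plan above could be sketched in R roughly as follows. This is only an illustration under stated assumptions: the `pre` and `post` scores are simulated (the study's actual data are not shown here), the means and spreads are arbitrary, and the tests are run in their paired form because the same 37 students are measured twice.

```r
set.seed(42)  # make the simulated example reproducible

# Simulated (made-up) pre/post scores for 37 students
n <- 37
pre <- rnorm(n, mean = 3.0, sd = 0.8)
post <- pre + rnorm(n, mean = 0.4, sd = 0.6)

# Step 1: visualize the distributions and the per-student change
hist(pre, main = "Pre scores")
hist(post, main = "Post scores")
hist(post - pre, main = "Change (post - pre)")

# Step 2: paired t-test on the pre/post scores
t_res <- t.test(post, pre, paired = TRUE)
print(t_res)

# Step 3: Wilcoxon signed-rank test as a non-parametric check
wilcox_res <- wilcox.test(post, pre, paired = TRUE)
print(wilcox_res)
```

If the two tests agree, the conclusion does not hinge on the normality assumption; if they disagree, the histogram of the differences is the first place to look.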