# Tutor profile: Julia V.

## Questions

### Subject: R Programming

You run an experiment to test a sleep drug. The same 16 subjects sleep at your clinic on two different nights. They all take the drug on one night and a placebo on the other; of course, they do not know which is which. On both nights, their sleep is recorded with EEG, and the recordings are preprocessed into a single measure of sleep efficiency for each subject on each night. You import this data into R and obtain a 16×2 table where the rows represent the participants and the columns are Night 1 (drug condition) and Night 2 (placebo condition). Each cell is one subject's sleep efficiency on that night. From this, how do you determine whether the drug had an effect?

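What we are curious about is whether there is a significant difference between the two columns. But how do we define "difference"? Since the data is continuous and we would like an overall picture of the drug's effect, testing for a difference in means seems like the right approach. With two conditions (drug and placebo), we can do a t-test or a Wilcoxon test.

But should the test be paired or unpaired? Since both conditions have the same subjects, each drug measurement is matched with a placebo measurement from the same person, so we need a paired test: either a paired t-test or its non-parametric counterpart, the Wilcoxon signed-rank test (the rank-sum test is meant for two independent samples).

The choice between the two depends on whether the data can be treated as normally distributed. For a paired t-test, this assumption concerns the per-subject differences, so we run a Shapiro-Wilk test on them, assuming the two columns are stored in the vectors `drug` and `placebo`: `shapiro.test(drug - placebo)`.

Assume the test returns p >= 0.05. Strictly speaking, this does not prove that the differences are normally distributed; it only means we found no evidence against normality, which is usually enough to go ahead with the paired t-test: `t.test(drug, placebo, paired = TRUE)`. (Had the Shapiro-Wilk test been significant, we would use `wilcox.test(drug, placebo, paired = TRUE)` instead.)

If the resulting p-value is less than the threshold for statistical significance (conventionally 0.05), we have found a significant difference in mean sleep efficiency, and if the mean is higher on the drug night, the sleep drug likely works!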
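
Under the hood, a paired t-test is simply a one-sample t-test on the per-subject differences, which is why pairing matters. The minimal Python sketch below computes that statistic by hand with only the standard library. The function name `paired_t_statistic` and the constant `T_CRIT_DF15` are illustrative assumptions, not part of the original answer, and the sketch stops at the test statistic rather than computing an exact p-value.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 15 degrees of freedom
# (16 subjects -> 16 - 1 = 15), as listed in standard t tables.
T_CRIT_DF15 = 2.131


def paired_t_statistic(drug, placebo):
    """Return (t, df) for a paired t-test of drug vs. placebo.

    The paired t-test is a one-sample t-test on the per-subject differences:
    t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom.
    """
    if len(drug) != len(placebo):
        raise ValueError("paired samples must have the same length")
    diffs = [d - p for d, p in zip(drug, placebo)]  # one difference per subject
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

With 16 subjects there are 15 degrees of freedom, so a two-sided |t| above roughly 2.131 corresponds to p < 0.05. R's `t.test(drug, placebo, paired = TRUE)` reports the same `t` and `df`, together with an exact p-value.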

### Subject: Python Programming

You have a long list of numbers where elements can occur multiple times. You want to extract a list of the unique elements. What is the easiest way to do this in Python? Given `nums = [1, 1, 9, 1, 5, 3, 4, 4, 3]`, extract a list of the unique numbers: `nums_unique = [1, 3, 4, 5, 9]`.

You might be tempted to solve this with a `for` loop that starts from an empty list, `nums_unique = []`, and appends each `num` in `nums` only if it is not already there (`if num not in nums_unique: nums_unique.append(num)`). But what if I told you we can solve this in a single line? Ever heard of sets? One of the special things about sets is that they cannot contain duplicate elements. Python lets us convert a list into a set, like so: `nums_set = set(nums)`. This yields the set `{1, 3, 4, 5, 9}`. Notice the curly braces? That is how Python writes sets. However, we wanted the result `nums_unique` to be a list. Luckily, Python also lets us convert a set back into a list: `nums_unique = list(nums_set)`. There is no need to store `nums_set` separately unless we plan to use the set itself later in the code, so we can do everything in a single line:

`nums_unique = list(set(nums))`

One caveat: sets are unordered, so the order of the elements in the resulting list is not guaranteed. If you need the sorted result shown above, `sorted(set(nums))` is also a one-liner and returns a list. Smart solutions like this keep our code clean and save some unnecessary typing! :)
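
A minimal sketch comparing the approaches on the example list. `unique_loop` is a hypothetical helper wrapping the `for`-loop version, and `dict.fromkeys` is an extra option not mentioned in the original answer: it removes duplicates while keeping first-occurrence order (dicts preserve insertion order since Python 3.7).

```python
def unique_loop(nums):
    """Loop version: keep the first occurrence of each element, in order."""
    result = []
    for num in nums:
        if num not in result:  # linear scan of the list on every step
            result.append(num)
    return result


nums = [1, 1, 9, 1, 5, 3, 4, 4, 3]

loop_result = unique_loop(nums)             # [1, 9, 5, 3, 4]
set_result = list(set(nums))                # same elements, order not guaranteed
sorted_result = sorted(set(nums))           # [1, 3, 4, 5, 9]
ordered_result = list(dict.fromkeys(nums))  # [1, 9, 5, 3, 4]
```

Note that the loop keeps first-occurrence order rather than sorted order. It also checks `num not in result` against a list, which scans the list each time, so it takes quadratic time on long inputs; building a set or a dict is linear on average.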

### Subject: Machine Learning

What is an example of unsupervised learning?

Clustering is a form of unsupervised learning. For example, k-means clustering groups unlabeled data into k groups called clusters. The method starts with an initial assignment of each data point to a random cluster. After this, an iterative process begins: at each step, each cluster center is recalculated as the average of its member data points, and then every data point is reassigned to the cluster center that is now closest to it. The process finishes once cluster membership has stabilized, that is, no data point changed clusters during the previous iteration. A benefit of this (and of unsupervised learning in general) is that the training data does not need labels. In fact, the aim of the algorithm is to construct the labels by identifying groups in the data. Among other things, this comes in handy for market segmentation, for instance identifying customer groups based on their online activity in order to show them targeted ads.
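
A minimal pure-Python sketch of the k-means procedure described above, using the same random-partition start and the same stopping rule. Points are assumed to be tuples of numbers, and all function names are illustrative. The answer does not say what happens when a cluster ends up empty; here such a cluster is re-seeded at a random data point, which is only one of several common choices.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest_center(p, centers):
    """Index of the center closest to p (the first one wins on ties)."""
    return min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; return (labels, centers)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random initial partition
    centers = []
    for _ in range(max_iter):
        # Step 1: each center becomes the average of its member points.
        centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster is re-seeded at a random data point.
            centers.append(mean_point(members) if members else rng.choice(points))
        # Step 2: reassign every point to the center now closest to it.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:  # no point changed cluster: membership is stable
            break
        labels = new_labels
    return labels, centers
```

For example, `k_means([(0, 0), (0, 0), (10, 10), (10, 10)], 2)` puts the two points at the origin in one cluster and the two at (10, 10) in the other, with the centers at those locations. On less cleanly separated data, the result of any k-means run can depend on the random start, which is controlled here by `seed`.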
