# Tutor profile: Luca S.

## Questions

### Subject: Python Programming

Ask the user for a positive number and output the smallest divisor, bigger than one, for the inputted number. Output "Prime" if the number is a prime number.

num = int(input("Enter a number: ")) if num < 0: print("The number can't be less than 0.") else: div = 1 found_divisor = False while div <= num and not found_divisor: div += 1 if num % div == 0: found_divisor = True if not found_divisor or div == num: print("The number is a PRIME number.") else: print(div) # A few things to note, The use of a while loop instead of a for loop with a break, the user prompting for wrong input values, and the run in O(n) of the algorithm.

### Subject: Machine Learning

You have a data-set comprised of X and y regarding risks levels for patients diagnosed with lung cancer. X is an NxD matrix containing information regarding patients of a clinic, such as temperature, sugar level, etc. , Where N is the number of patients visited, and D is the number of clinical variables per patient. y is a N dimensional vector of integer values between[0,2], where 0 means the patient is at low-to no risk, 1 the patient is a medium risk patient, and 2 the patient is a high risk patient. What would you use to predict y, and why? What if N=10,000 and D=100? and N=10,000,000 and D=100?

The question if fairly broad. The first thing to understand is whether this is a classification or regression problem (or both). From the question statement it is clear this it a classification problem. The second is to understand the context and the type of data. We are dealing with medical data, so there are a few important points. One for example is that we probably want some measure of confidence of the predictions the model will make, so to inform the user of whether the prediction can or can not be trusted, and to what extent. Another is transparency, so if something goes wrong with the model we can perhaps understand fix it. The missing information is here the value for N and D, which will also matter for the answer. For low dimensional data a fully probabilistic approach is desirable, so a Gaussian Process based classifier or another probabilistic model with a fully probabilistic treatment of the data, and a measure of confidence is the best choice. If the number of data-points, or the dimensionality of the data is too high, then other models can be considered. It is usually useful to start from simple models, which give insights into the nature of the problem. With very large data-sets (e.g. N=10,000,000 & D=100) a Neural Network is likely the best-performing choice. Here, note, that transparency is an issue, and guarantees of confidence can not be given. Even if the outputs are made to behave like probabilities through Sigmoidal transformations, in fact, the treatment of the data is NOT probabilistic, so the answers should not be trusted blindly as such. Other considerations can be made: - The data, for example, might be missing some variable values for some patients, in which case a generative model is most appropriate. - The values of y will most likely need to be re-encoded into one-of-k encoding of the same, where, for example 1->100, 2->010 and 3->001. - etc.

### Subject: Basic Math

What is the value of $$\frac{27^x}{9^y}$$, if $$3x−2y=5$$?

The first fact to notice is that the base of the $$x$$ and $$y$$ exponent are both powers of three. It is worthwhile transforming the equation into one with the same base both at the numerator and denominator, for example: $$\frac{(3^{3})^{x}}{(3^{2})^{y}}$$, which simplifies to $$\frac{3^{3x}}{3^{2y}}$$ . Since the rules of fractions and exponential allow us to bring the denominator to the numerator while flipping the sign of the exponential we have that: $$\frac{3^{3x}}{3^{2y}}=3^{3x-2y}$$ Since we know that $$3x−2y=5$$, we can substitute and obtain: $$3^{3x-2y}=2^5$$ Which is the solution.

## Contact tutor

needs and Luca will reply soon.