# Tutor profile: Prasoon G.

## Questions

### Subject: Python Programming

How do I know how much memory my program uses in Python?

If you’re on a Linux machine, you can get a ballpark number by using **top** command. From within Python, use import resource print 'Memory usage: %s kb' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss to print the total memory currently in use. Printing the memory used before and after the model is loaded should give you how much memory the model takes.

### Subject: Machine Learning

What should everyone know about gradient descent?

* Gradient descent is essentially an optimization algorithm, that has nothing to do with machine learning per se. Because a lot of machine learning involves optimization, it is used in machine learning quite a bit. * For convex problems, it converges to the global optimum; for non-convex problems, it converges to a local optimum. * The distinction of batch and stochastic variants comes into the picture in machine learning when the objective function is a function of $m$ data points, but can be *approximated* by $n << m$ data points. Mostly, $n = 1$. For $n > 1$, it is typically referred to as mini-batch SGD. * When implementing gradient descent for a non-trivial function, you should write a gradient checker that compares the output of your gradient computation to a finite difference approximation. * Normalize your data for better convergence. **Batch gradient descent** * It looks at the entire training data for each step. So it computes the exact gradient, but is slow. * You do not have to worry about the step size much. A constant step size throughout the entire optimization works in most cases. This is because for well-behaved objective functions, the gradient is zero at the optimum, and small near it. **Stochastic gradient descent** * It looks at a subset of the training data at each step. So it computes an approximate gradient. But the gradient computation is very fast compared to the batch version for large datasets. * Randomize your data after each epoch for faster convergence. * While implementing SGD, have a parameter $n$ that determines the number of points taken at each iteration for gradient computation. Set $n$ to a large value (close to or equal to $m$) to sanity check your code - it should converge to the optimal. * Choosing the right step size is crucial here, because the approximate gradient is not zero at the optimum. So you need to manually turn down the gradient. Turning it down too slowly will make the convergence slow, turning it down too fast will make the algorithm stop before reaching close to the optimum. * Use a subset of the training data to optimize the step size. Because SGD is primarily independent of the dataset size, you can take a small subset of the data to get a good step size. Finally, while gradient descent is the among the most common techniques used for optimization in practice, it is not the best technique. There is a whole spectrum of iterative optimization techniques. One one end, you have techniques like SGD that compute a noisy estimate of direction and magnitude of step and take many iterations to converge, and on the other end, you have techniques like Newton's method that compute a very precise estimate of direction and magnitude of step and therefore take much fewer steps to converge. In between you have techniques like Levenberg–Marquardt algorithm. Here's a nice paper that discusses some practical tips for SGD: Stochastic Gradient Descent Tricks (http://link.springer.com/chapter/10.1007/978-3-642-35289-8_25)

### Subject: Computer Science (General)

How can I get good at dynamic programming?

**The basic idea is that you need to find a subproblem with similar structure as the problem you’re trying to solve. Usually, there are not many ways you can do this, and just considering all possibilities gives you the solution.** Let’s look at some examples: * Coin change problem (http://www.geeksforgeeks.org/dynamic-programming-set-7-coin-change/): Given a value $N$, if we want to make change for $N$ cents, and we have infinite supply of each of $S = {S_1, S_2, .. , S_m}$ valued coins, how many ways can we make the change? The order of coins doesn’t matter. How do you proceed? Clearly, there are two dimensions to the problem — the value $N$ and the types of coins. If $N$ was small like 1, or there was just one type of coin, the answer would be trivial. So your subproblems will either have a smaller $N$ or fewer types of coins, or both. How do you know which one is it? Let’s consider the case of smaller $N$. Say the given $N$ was 10. Now if you knew how many ways you can make change for a smaller value, say 8, can you *efficiently* find the no. of ways to make change for 10? Note that the operative word here is “efficiently” — if you can’t do it efficiently, DP is useless. Now, you extend the above question — if you knew how many ways you can make change for *all values* of $N$ smaller than 10, can you efficiently find the no. of ways to make change for 10? One way you think you can proceed is to add the no. of ways to make change for $10-S_1$, no. of ways to make change for $10-S_2$, …, because adding one coin to each of these ways would give you a sum of 10. However, this is incorrect — because the order of coins is not important, so you will be overcounting. In particular, if $S_1 = 1$ and $S_2 = 2$, then $10 - S_1$ include ways that include coins of denomination 2, and you’ll add 1 to these to get 10. Similarly, $10 - S_2$ include ways that include coins of denomination 1, and you’ll add 2 to these to get 10. So there is repeated counting. Let’s go to the next dimension — the types of coins. Suppose you knew how many ways there were to make change of $N$ using coins of denominations ${S_1, …, S_{m-1}}$. Can you use this to generate the no. of ways to make change of N using ${S_1, …, S_m}$? Clearly not. But, if you knew how to make change for *any value* below $N$ using coins ${S_1, …, S_{m-1}}$, you can get the answer. In particular, no. of ways to make change for $N$ using coins ${S_1, …, S_m}$ = no. of ways to make change for $N - S_m$ using coins ${S_1, …, S_m}$ + no. of ways to make change for $N$ using coins ${S_1, …, S_{m-1}}$. Note that the above two terms are mutually exclusive — the first term corresponds to the cases where you use at least one coin of denomination $S_m$, while the second case includes no coins of denominations $S_m$. Now, from this point, you just have to perform table-filling to get the answer. * All-pair shortest path problem (https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm): Given a graph, find the shortest path from every vertex to every other vertex. What can you reduce to make the problem size smaller? The graph size — so find shortest path from vertex i to j using subgraph ${v_1, …, v_k}$. And that gives you the Floyd-Warshall algorithm. * Subset sum problem (http://www.geeksforgeeks.org/dynamic-programming-subset-sum-problem/): Given a set of non-negative integers $S = {s_1, …, s_m}$, and a value $N$, determine if there is a subset of the given set $S$ with sum equal to $N$. Again, there are two dimensions — the size of the set and $N$. Proceeding similar to coin-change problem, you need to use subproblems along both dimensions — there is a way to make $N$ if there is a way to make $N - s_m$ using ${s_1, …, s_{m-1}}$, or if there is a way to make $N$ using ${s_1, …, s_m}$. This has the same interpretation as above — either $s_m$ is included in the sum or it is not. **Often, people start by thinking about these mutually exclusive cases like inclusion and exclusion, which makes it tricky because different problems have different criteria to generate mutually exclusive sets. The correct way is to start thinking in terms of dimensions, which is much more consistent across problems.** Finally, I would recommend Algorithm Design (https://www.amazon.com/Algorithm-Design-Jon-Kleinberg/dp/0321295358) book. It has one chapter each on various paradigms of algorithms, like greedy algorithms, dynamic programming, divide and conquer, etc. The end-of-chapter exercises are doable in number, and cover basic to advanced problems.

## Contact tutor

needs and Prasoon will reply soon.