# Tutor profile: Srinjoy G.

## Questions

### Subject: Python Programming

Explain the process of memory allocation in Python programming language.

Computer memory is organized as a sequence of memory words numbered 0 to M - 1, where M is the number of memory words available to the computer. Each memory word has a memory address, usually written as a hexadecimal number.

In Python, objects (instances of classes) are stored in a region of memory called the Python heap. This heap is a pool of memory managed privately by the interpreter; it should not be confused with the heap data structure. Suppose we have a class called `DQN` and we want to store an object of this class in memory. To do that we write `dn = DQN()`. This line creates an object, or instance, of the `DQN` class and stores it in the Python heap.

The Python heap is divided into contiguous chunks of memory called blocks, which are very similar to arrays. To implement this structure, the contiguous holes of available free memory are kept in a linked list called the free list; the links joining these holes are stored inside the holes themselves. As memory is allocated and deallocated, the collection of holes in the free list changes, and the unused memory ends up as disjoint holes separated by blocks of used memory.
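The allocation described above can be observed directly from Python. The sketch below is a minimal illustration, assuming an empty stand-in `DQN` class; note that `id()` returning the heap address is a CPython implementation detail, not a language guarantee.

```python
import sys

class DQN:
    """Empty stand-in for the DQN class mentioned in the text."""
    pass

# This line allocates a block for the new object on the Python heap.
dn = DQN()

# id() returns the object's identity; in CPython this is the object's
# address on the heap, conventionally displayed in hexadecimal.
print(hex(id(dn)))

# sys.getsizeof() reports the size in bytes of the block allocated for
# the object itself (not counting anything the object references).
print(sys.getsizeof(dn))
```

Creating a second instance while the first is still alive yields a different address, since both blocks on the heap are in use at once.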

### Subject: Machine Learning

How are neural networks related to deep learning? Explain. What are activation functions? Give some examples of activation functions with their mathematical equations.

Neural networks, or artificial neural networks, are a specialized type of machine learning algorithm heavily inspired by the biological neurons in the human brain. The brain contains neurons whose dendrites connect to axons through synapses. These synapses carry a connection strength that is modified by external stimuli, and this is how learning happens. Similarly, in a neural network we have a neuron that receives many inputs, and each connection carries a strength that we call a weight. The neuron is a computational unit (called a node) that calculates a specific function, e.g. an affine transformation in the case of linear classifiers. The neuron can produce one or many outputs, depending on the application. The inputs of the neural network are usually provided by training data, such as image pixel values or housing data. The output of the network can be a classifier (classifying fruits or animals based on the training data), a predictor of housing prices, or the steering angle of a self-driving car.

The architecture of a neural network is like that of a computational graph, and the simplest neural network is called the perceptron: input nodes plus a bias neuron feed a single output node, where the computation of the function is performed. For an input layer of nodes xj (where j indexes the input nodes), we have a weight wj associated with each input node, and the bias is denoted by b. The affine transformation performed at the output is y = (sum over j of wj * xj) + b, which is the familiar equation of a straight line, i.e. a linear classifier model. Without the bias term, the decision boundary could only pass through the origin and would fail to capture many of the underlying features of the training data. The bias gives the network the freedom to shift the linear classifier left or right so as to capture the features of the data properly; it helps find the best-fitting line. It also helps us identify systematic errors (which happen when the model is too simple), a symptom of underfitting (when the model is unable to learn the data properly).

Simply put, deep learning is the learning performed by neural networks with multiple layers of neurons (nodes). Between the input and the output we can add layers, each containing many neurons, more commonly known as hidden layers, which make the network deeper. Adding hidden layers together with activation functions gives the network a non-linear architecture, which helps greatly in capturing real-life data.

Activation functions are the functions that determine the type of output values a network produces and introduce non-linearity in its hidden layers. Some of the most commonly used activation functions, with their mathematical descriptions, are given below:

1. Linear activation - provides no non-linearity; used to output real values. f(v) = v
2. Sign function - used to predict binary outputs (e.g. cat vs. dog classification). It is non-linear. f(v) = sign(v)
3. Sigmoid function - used to predict a probability for binary outputs such as cat vs. dog classification. It is non-linear. f(v) = 1 / (1 + exp(-v))
4. Tanh function - used when the outputs are desired to be both positive and negative. Because it is zero-centered and has a larger gradient, it is easier to train. f(v) = (exp(2v) - 1) / (exp(2v) + 1), or equivalently tanh(v) = 2 * sigmoid(2v) - 1
5. ReLU (Rectified Linear Unit) - much easier to train than sigmoids, and usually the default choice for hidden layers. f(v) = max{v, 0}
6. Softmax function - used to predict probability values for more than two outputs, such as classifying cats, dogs and elephants. f(vi) = exp(vi) / (sum from j=1 to k of exp(vj)), where k is the number of outputs
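The activation functions listed above can be written out in plain Python. This is a minimal sketch using only the standard `math` module (the function names are mine); the final check confirms the tanh/sigmoid identity quoted in the list.

```python
import math

def linear(v):
    # Linear activation: passes the value through, no non-linearity.
    return v

def sign(v):
    # Sign function: a hard binary decision.
    return 1 if v > 0 else (-1 if v < 0 else 0)

def sigmoid(v):
    # Squashes any real value into (0, 1), usable as a probability.
    return 1.0 / (1.0 + math.exp(-v))

def tanh(v):
    # Zero-centered squashing into (-1, 1).
    return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

def relu(v):
    # Rectified Linear Unit: max{v, 0}.
    return max(v, 0.0)

def softmax(vs):
    # Normalizes a list of k scores into probabilities summing to 1.
    exps = [math.exp(v) for v in vs]
    total = sum(exps)
    return [e / total for e in exps]

# The identity tanh(v) = 2 * sigmoid(2v) - 1 from the list above:
v = 0.7
assert abs(tanh(v) - (2 * sigmoid(2 * v) - 1)) < 1e-12

print(relu(-3.0), relu(3.0))          # 0.0 3.0
print(sum(softmax([1.0, 2.0, 3.0])))  # approximately 1.0
```

In practice these would be applied element-wise to whole layers (e.g. with NumPy arrays), but the scalar versions show the mathematics directly.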

### Subject: Artificial Intelligence

What is Reinforcement learning? Explain Q-Learning and Temporal Difference in detail.

Before diving into the explanation of reinforcement learning, consider that in AI we have an agent that is supposed to perform some specific task (such as going from point A to point B while overcoming obstacles) in a given environment. The agent learns with the help of three main ingredients: the state of the agent, the action taken by the agent, and a reward awarded to the agent when it accomplishes the task or makes positive progress towards the objective. Reinforcement learning (RL) is an AI paradigm in which the agent learns to take proper actions with the help of rewards: over a period of time it trains itself to reach the right states and take the right actions so as to gain more reward and accomplish the given task successfully.

To explain Q-learning and Temporal Difference, suppose we have an agent assigned the task of moving from A to B while overcoming obstacles in a stochastic environment. Let us define some variables:

- s = current state
- s' = next state
- a = action
- a' = next action
- R = reward
- gamma = discount factor

We assume the agent is rewarded +1 for every correct action (positive progress past an obstacle) and -1 for every wrong action. The agent starts exploring the environment, where its moves are probabilistic, e.g. a 70% chance of moving up, 10% of moving left, and so on. Because each move is probabilistic, this is known as non-deterministic search: the agent is unsure where it will end up next and has to explore to find out the consequence of each of its actions. Multiplying each transition probability by the value of the resulting state gives the expected value of the next state, V(s'). The reward awarded to the agent depends on the current state s and the action a taken by the agent, denoted R(s, a), and we denote the transition probabilities by P(s, a, s'), which depends on the current state s, the action a and the next state s'.

In simple terms, Q-learning is the learning of Quality actions/values by the agent to accomplish a given task. The Q value gives the agent an intuition of how lucrative an action would be and whether it should take it. Q-learning is defined by the equation

Q(s, a) = R(s, a) + gamma * sum over s' [ P(s, a, s') * max over a' of Q(s', a') ]

This equation tells us that the quality Q(s, a) of the current action depends on the reward R(s, a) of the current action plus the cumulative effect of the transition probabilities P(s, a, s') and the qualities Q(s', a') of the next states and actions. These Q values are acquired empirically as the agent explores the environment. The Q-learning equation is expensive to compute directly, because it is recursive and the values depend heavily on each other, so to ease the calculation we use the Temporal Difference (TD) method:

TD(a, s) = [ R(s, a) + gamma * max over a' of Q(s', a') ] - Q(s, a)

where [R(s, a) + gamma * max over a' of Q(s', a')] is the Q value observed after performing a movement and Q(s, a) is the Q value estimated before that movement. Applying the TD formula to our Q-learning equation, we finally arrive at the update

Qt(s, a) = Qt-1(s, a) + alpha * TDt(a, s)

where Qt(s, a) is the Q value at the current step, Qt-1(s, a) is the Q value at the previous step, and TDt(a, s) is the current TD. Here alpha is the learning rate; it cannot be 0 or 1, because if alpha = 0 the TDt term vanishes and no learning takes place, and if alpha = 1 the Qt-1(s, a) term cancels with -Qt-1(s, a) and the old estimate is discarded entirely. So alpha must lie strictly between 0 and 1 for learning to proceed properly. Gradually the TD term approaches 0, and as soon as TD = 0 the agent has reached its optimal state (converged), meaning it no longer needs to change its Q values.
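The TD update can be sketched as a small program. The following is a hypothetical example, not the agent from the text: a one-dimensional world of states 0 to 4 with the goal at state 4, rewards of +1 for progress and -1 otherwise (mirroring the reward scheme in the text), and assumed values gamma = 0.9 and alpha = 0.5.

```python
import random

# Tabular Q-learning on a hypothetical 1-D world: states 0..4,
# actions -1 (left) / +1 (right), goal at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)
GAMMA, ALPHA = 0.9, 0.5   # discount and learning rate, 0 < alpha < 1

# Q table, initialised to zero for every (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    # Move within bounds; +1 reward for progress, -1 otherwise.
    s_next = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s_next > s else -1.0
    return s_next, reward

random.seed(0)
for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy exploration: mostly greedy, sometimes random.
        if random.random() < 0.1:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r = step(s, a)
        # TD(a, s) = [R(s, a) + gamma * max_a' Q(s', a')] - Q(s, a)
        td = (r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)) - Q[(s, a)]
        # Qt(s, a) = Qt-1(s, a) + alpha * TDt(a, s)
        Q[(s, a)] += ALPHA * td
        s = s_next

# After training, the greedy policy in every non-goal state is "right".
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)
```

As the Q values settle, the TD term shrinks towards 0 and the updates stop changing the table, which is exactly the convergence behaviour described above.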
