

COMP9417 - Machine Learning

Homework 2: Logistic Regression & Optimization

Introduction

In this homework, we will first explore some aspects of Logistic Regression and how to perform inference for model parameters. We then turn our attention to gradient based optimisation, the workhorse of modern machine learning methods.

What to Submit

• A single PDF file which contains solutions to each question. For each question, provide your solution in the form of text and requested plots. For some questions you will be requested to provide screen shots of code used to generate your answer — only include these when they are explicitly asked for.

• .py file(s) containing all code you used for the project, which should be provided in a separate .zip file. This code must match the code provided in the report.

• You may be deducted points for not following these instructions.

• You may be deducted points for poorly presented/formatted work. Please be neat and make your solutions clear. Start each question on a new page if necessary.

• You cannot submit a Jupyter notebook; this will receive a mark of zero. This does not stop you from developing your code in a notebook and then copying it into a .py file, or using a tool such as nbconvert or similar.

• We will set up a Moodle forum for questions on this homework. Please read the existing questions before posting new questions. Please do some basic research online before posting questions. Please only post clarification questions. Any questions deemed to be fishing for answers will be ignored and/or deleted.

• As usual, we monitor all online forums such as Chegg, StackExchange, etc. Posting homework questions on these sites is equivalent to plagiarism and will result in a case of academic misconduct.

When and Where to Submit

• Due date: Week 7, Sunday July 18th, 2021 by 11:55pm.

• Late submissions will incur a penalty of 20% per day (from the ceiling, i.e., the total marks available for the homework) for the first 5 days. For example, if you submit 2 days late, the maximum possible mark is 60% of the available 25 marks.

• Submission must be done through Moodle, no exceptions.

Question 1. Regularized Logistic Regression & the Bootstrap

In this problem we will consider the dataset provided in Q1.csv, with binary response variable Y, and 45 continuous features X_1, ..., X_45. Recall that Regularized Logistic Regression is a regression model used when the response variable is binary valued. Instead of using mean squared error loss as in standard regression problems, we instead minimize the log-loss, also referred to as the cross entropy loss. For a parameter vector β = (β_1, ..., β_p) ∈ R^p, y_i ∈ {0,1}, x_i ∈ R^p for i = 1, ..., n, the log-loss is

\[
L(\beta) = -\sum_{i=1}^{n}\Big[ y_i \log s(x_i^\top \beta) + (1 - y_i)\log\big(1 - s(x_i^\top \beta)\big)\Big],
\]
where s(z) = (1 + e^{-z})^{-1} is the logistic sigmoid (see Homework 0 for a refresher). In practice, we will usually add a penalty term, and consider the optimisation:

\[
\min_{\beta_0, \beta}\; \mathrm{penalty}(\beta) + C\sum_{i=1}^{n}\Big[-y_i\log s(\beta_0 + x_i^\top\beta) - (1-y_i)\log\big(1 - s(\beta_0 + x_i^\top\beta)\big)\Big] \qquad (1)
\]

where the penalty is usually not applied to the bias term β_0, and C is a hyper-parameter. For example, in the ℓ1 regularisation case, we take penalty(β) = ‖β‖_1 (a LASSO for logistic regression).

(a) Consider the sklearn logistic regression implementation (section 1.1.11), which claims to minimize the following objective:

\[
\min_{w, c}\; \|w\|_1 + C\sum_{i=1}^{n}\log\big(1 + \exp(-\tilde{y}_i(x_i^\top w + c))\big) \qquad (2)
\]

It turns out that this objective is identical to our objective above, but only after re-coding the binary variables to be in {-1, 1} instead of binary values {0, 1}. That is, ỹ_i ∈ {-1, 1}, whereas y_i ∈ {0, 1}. Argue rigorously that the two objectives (1) and (2) are identical, in that they give us the same solutions (β̂_0 = ĉ and β̂ = ŵ). Further, describe the role of C in the objectives; how does it compare to the standard LASSO parameter λ? What to submit: some commentary/your working.
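A useful starting point for part (a) is the symmetry of the logistic sigmoid; the identities below are only a hint (the full argument, including the treatment of C and the penalty, is left to you):

\begin{align*}
1 - s(z) &= 1 - \frac{1}{1 + e^{-z}} = \frac{e^{-z}}{1 + e^{-z}} = \frac{1}{1 + e^{z}} = s(-z), \\
-\log s(z) &= \log\big(1 + e^{-z}\big),
\end{align*}

so that, writing \tilde{y}_i = 2y_i - 1, each data term in (1) can be put in the form \log\big(1 + e^{-\tilde{y}_i(\beta_0 + x_i^\top \beta)}\big) appearing in (2).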

(b) Take the first 500 observations to be your training set, and the rest as the test set. In this part, we will perform cross validation over the choice of C from scratch (Do not use existing cross validation implementations here, doing so will result in a mark of zero.)

Create a grid of 100 C values ranging from C = 0.0001 to C = 0.6 in equally sized increments, inclusive. For each value of C in your grid, perform 10-fold cross validation (i.e. split the data into 10 folds, fit logistic regression (using the LogisticRegression class in sklearn) with the choice of C on 9 of those folds, and record the log-loss on the 10th, repeating the process 10 times.) For this question, we will take the first fold to be the first 50 rows of the training data, the second fold to be the next 50 rows, etc. Be sure to use ℓ1 regularisation, and the liblinear solver when fitting your models.

To display the results, we will produce a plot: the x-axis should reflect the choice of C values, and for each C, plot a box-plot over the 10 CV scores. Report the value of C that gives you the best CV performance. Re-fit the model with this chosen C, and report both train and test accuracy using this model. Note that we do not need to use the ỹ coding here (the sklearn implementation is able to handle different coding schemes automatically) so no transformations are needed before applying logistic regression to the provided data. What to submit: a single plot, train and test accuracy of your final model, a screen shot of your code for this section, a copy of your python code in solutions.py
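A minimal sketch of one way to structure the manual cross-validation loop is given below; the array names X_train and y_train and the plotting choices are assumptions, not part of the specification.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

Cs = np.linspace(0.0001, 0.6, 100)            # grid of 100 C values, endpoints included
cv_scores = np.zeros((len(Cs), 10))           # log-loss for each (C, fold) pair

for i, C in enumerate(Cs):
    for fold in range(10):
        test_idx = np.arange(50 * fold, 50 * (fold + 1))   # held-out fold: 50 consecutive rows
        train_mask = np.ones(500, dtype=bool)
        train_mask[test_idx] = False
        model = LogisticRegression(penalty='l1', solver='liblinear', C=C)
        model.fit(X_train[train_mask], y_train[train_mask])
        probs = model.predict_proba(X_train[test_idx])[:, 1]
        cv_scores[i, fold] = log_loss(y_train[test_idx], probs)

plt.boxplot(cv_scores.T, positions=Cs, widths=0.004, manage_ticks=False)   # one box-plot per C
plt.xlabel('C')
plt.ylabel('10-fold CV log-loss')
plt.show()

best_C = Cs[np.argmin(cv_scores.mean(axis=1))]   # C with the best average CV log-loss

The best C found this way can then be used to re-fit on the full training set and report train/test accuracy.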

(c) In this part we will compare our results in the previous section to the sklearn implementation of gridsearch, namely, the GridSearchCV class. My initial code for this section looked like:


However, this gave me a very different answer to the result in (b). Provide two reasons for why this is the case, and then, if it is possible, re-run the code with some changes to give consistent results to those in (b), and if not, explain why. It may help to read through the documentation. What to submit: some commentary, a screen shot of your code for this section, a copy of your python code in solutions.py
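The original code listing for this part is not reproduced above. A plausible first attempt (an assumption, not the original snippet), reusing the Cs grid from the sketch in (b), might look like:

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

grid = GridSearchCV(LogisticRegression(penalty='l1', solver='liblinear'),
                    param_grid={'C': Cs})     # note: default scoring and default CV splitter
grid.fit(X_train, y_train)
print(grid.best_params_)

Comparing the defaults of GridSearchCV (its scoring argument and its cv argument) against the setup in (b) is a good place to start when answering this part.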

We next explore the idea of inference. To motivate the difference between prediction and inference, see some of the answers to this stats.stackexchange post. Needless to say, inference is a much more difficult problem than prediction in general. In the next parts, we will study some ways of quantifying the uncertainty in our estimates of the logistic regression parameters. Assume for the remainder of this question that C = 1, and work only with the training data set (n = 500 observations) constructed earlier.

(d) In this part, we will consider the nonparametric bootstrap for building confidence intervals for each of the parameters β_1, ..., β_p. (Do not use existing Bootstrap implementations here, doing so will result in a mark of zero.) To describe this method, let's first focus on the case of β̂_1. The idea behind the nonparametric bootstrap is as follows:

1. Construct B bootstrap samples, each obtained by sampling n observations with replacement from the original training data.

2. On each of the B bootstrap samples, compute an estimate of β_1, giving us a total of B estimates which we denote β̂_1^(1), ..., β̂_1^(B).

3. Define the bootstrap mean and standard error respectively:

\[
\bar{\beta}_1 = \frac{1}{B}\sum_{b=1}^{B}\hat{\beta}_1^{(b)}, \qquad \widehat{\mathrm{se}}\big(\hat{\beta}_1\big) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\big(\hat{\beta}_1^{(b)} - \bar{\beta}_1\big)^2}.
\]
4. A 90% bootstrap confidence interval for β_1 is then given by the interval:

((β̂_1)_L, (β̂_1)_U) = (5th percentile of the bootstrap estimates, 95th percentile of the bootstrap estimates)

The idea behind a 90% confidence interval is that it gives us a range of values within which we believe, with 90% probability, the true parameter lies. If the computed 90% interval contains the value of zero, then this provides us with evidence that β_1 = 0, which means that the first feature should not be included in our model.

Take B = 10000 and set a random seed of 12 (i.e. np.random.seed(12)). Generate a plot where the x-axis represents the different parameters β_1, ..., β_p, and plot a vertical bar that runs from (β̂_p)_L to (β̂_p)_U. For those intervals that contain 0, draw the bar in red, otherwise draw it in blue. Also indicate on each bar the bootstrap mean. Remember to use C = 1.0.

What to submit: a single plot, some commentary, a screen shot of your code for this section, a copy of your python code in solutions.py
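A minimal sketch of the resampling loop, assuming X_train and y_train are NumPy arrays holding the 500 training observations (the plotting step is omitted):

import numpy as np
from sklearn.linear_model import LogisticRegression

np.random.seed(12)
B = 10000
n, p = X_train.shape
boot_coefs = np.zeros((B, p))

for b in range(B):
    idx = np.random.choice(n, size=n, replace=True)          # sample n rows with replacement
    model = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
    model.fit(X_train[idx], y_train[idx])
    boot_coefs[b] = model.coef_.ravel()

lower = np.percentile(boot_coefs, 5, axis=0)     # (beta_j)_L for each parameter
upper = np.percentile(boot_coefs, 95, axis=0)    # (beta_j)_U for each parameter
means = boot_coefs.mean(axis=0)                  # bootstrap means
contains_zero = (lower <= 0) & (upper >= 0)      # intervals to draw in red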

(e) Comment on your results in the previous section, what do the confidence intervals tell you about the underlying data generating distribution? How does this relate to the choice of C when running regularized logistic regression on this data? Is regularization necessary?

Question 2. Gradient Based Optimization

In this question we will explore some algorithms for gradient based optimization. These algorithms have been crucial to the development of machine learning in the last few decades. The most famous example is the backpropagation algorithm used in deep learning, which is in fact just an application of a simple algorithm known as (stochastic) gradient descent. The general framework for a gradient method for finding a minimizer of a function f : R^n → R is defined by

x^{(k+1)} = x^{(k)} - α_k ∇f(x^{(k)}),    k = 0, 1, 2, ...,    (3)

where α_k > 0 is known as the step size, or learning rate. Consider the following simple example of minimizing g(x) = 2√(x³ + 1). We first note that g′(x) = 3x²(x³ + 1)^{-1/2}. We then need to choose a starting value of x, say x^{(0)} = 1. Let's also take the step size to be constant, α_k = α = 0.1. Then we have the following iterations:

x^{(1)} = x^{(0)} - 0.1 × 3(x^{(0)})²((x^{(0)})³ + 1)^{-1/2} = 1 - 0.1 × 3/√2 ≈ 0.7879
x^{(2)} = x^{(1)} - 0.1 × 3(x^{(1)})²((x^{(1)})³ + 1)^{-1/2} ≈ 0.6353
...

and this continues until we terminate the algorithm (as a quick exercise for your own benefit, code this up and compare it to the true minimum of the function, which is x* = -1). This idea works for functions that have vector valued inputs, which is often the case in machine learning. For example, when we minimize a loss function we do so with respect to a weight vector, β. When we take the step size to be constant at each iteration, this algorithm is called gradient descent. For the entirety of this question, do not use any existing implementations of gradient methods; doing so will result in an automatic mark of zero for the entire question.
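The suggested exercise can be coded up in a few lines; this is a quick sketch for your own benefit, not something to submit:

import numpy as np

def g_prime(x):
    return 3 * x**2 / np.sqrt(x**3 + 1)      # derivative of g(x) = 2 * sqrt(x**3 + 1)

x, alpha = 1.0, 0.1                           # x^(0) = 1, constant step size 0.1
for k in range(200):
    x = x - alpha * g_prime(x)
print(x)                                      # compare where the iterates settle with x* = -1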

(a) Consider the following optimisation problem:


Run gradient descent on f using a step size of α = 0.1 and starting point of x^{(0)} = (1, 1, 1, 1). You will need to terminate the algorithm when the following condition is met: ‖∇f(x^{(k)})‖_2 < 0.001. In your answer, clearly write down the version of the gradient steps (3) for this problem. Also, print out the first 5 and last 5 values of x^{(k)}, clearly indicating the value of k, in the form:


What to submit: an equation outlining the explicit gradient update, a print out of the first 5 and last 5 rows of your iterations, a screen shot of any code used for this section and a copy of your python code in solutions.py.
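A skeleton of the loop is sketched below; since the objective from the problem statement is not reproduced here, a simple quadratic is used as a stand-in for f, and grad_f should be replaced with the gradient you derive.

import numpy as np

def grad_f(x):
    return 2 * x        # stand-in gradient (for f(x) = ||x||^2); replace with the real gradient

x = np.array([1.0, 1.0, 1.0, 1.0])            # x^(0) = (1, 1, 1, 1)
alpha = 0.1
iterates = [x.copy()]
while np.linalg.norm(grad_f(x)) >= 0.001:     # stop once ||grad f(x^(k))||_2 < 0.001
    x = x - alpha * grad_f(x)
    iterates.append(x.copy())

for k in list(range(5)) + list(range(len(iterates) - 5, len(iterates))):
    print(f"k = {k}, x = {iterates[k]}")       # first 5 and last 5 iterates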

(b) Note that using a constant step-size is sub-optimal. Ideally, we would like to take large steps at the beginning (when we are far away from the optimum), then take smaller steps as we move closer towards the minimum. There are many proposals in the literature for how best to choose the step size; here we will explore just one of them, called the method of steepest descent. This is almost identical to gradient descent, except at each iteration k, we choose


\[
\alpha_k = \operatorname*{arg\,min}_{\alpha \ge 0} \; f\big(x^{(k)} - \alpha \nabla f(x^{(k)})\big).
\]

In words, the step size is chosen to minimize an objective at each iteration of the gradient method; the objective is different at each step since it depends on the current x-value. In this part, we will run steepest descent to find the minimizer in (a). First, derive an explicit solution for α_k (mathematically, please show your working). Then run steepest descent with the same x^{(0)} as in (a), and α_0 = 0.1. Use the same termination condition. Provide the first and last 5 values of x^{(k)}, as well as a plot of α_k over all iterations. What to submit: a derivation of α_k, a print out of the first 5 and last 5 rows of your iterations, a single plot, a screen shot of any code used for this section, a copy of your python code in solutions.py.

(c) Comment on the differences you observed; why would we prefer steepest descent over gradient descent? Why would you prefer gradient descent over steepest descent? Finally, explain why the condition used above is a reasonable one to use to terminate the algorithm.

In the next few parts, we will use the gradient methods explored above to solve a real machine learning problem. Consider the data provided in Q2.csv. It contains 414 real estate records, each of which contains the following features:

• transactiondate: date of transaction

• age: age of property

• nearestMRT: distance of property to the nearest MRT station

• nConvenience: number of convenience stores in nearby locations

• latitude

• longitude

The target variable is the property price. The goal is to learn to predict property prices as a function of a subset of the above features.

(d) We need to preprocess the data. First, remove any rows with missing values. Then, delete all features except for age, nearestMRT and nConvenience. Then use the sklearn MinMaxScaler to normalize the features. Finally, create a training set from the first half of the resulting dataset, and a test set from the remaining half (one possible implementation is sketched after the expected output below). Your end result should look like:

• first row X train: [0.73059361, 0.00951267, 1.]

• last row X train: [0.87899543, 0.09926012, 0.3]

• first row X test: [0.26255708, 0.20677973, 0.1]

• last row X test: [0.14840183, 0.0103754, 0.9]

• first row Y train: 37.9

• last row Y train: 34.2

• first row Y test: 26.2

• last row Y test: 63.9

What to submit: a copy of your python code in solutions.py
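As referenced above, a minimal sketch of the preprocessing is given below; the target column name ('price') is an assumption, so check it against the header of Q2.csv.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('Q2.csv').dropna()                          # remove rows with missing values
X = df[['age', 'nearestMRT', 'nConvenience']].values          # keep only these three features
y = df['price'].values                                        # property price (assumed column name)

X = MinMaxScaler().fit_transform(X)                           # scale each feature to [0, 1]

half = len(X) // 2
X_train, X_test = X[:half], X[half:]                          # first half for training
y_train, y_test = y[:half], y[half:]                          # remaining half for testing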

(e) Consider the loss function


Instead of computing the gradient directly though, we will rely on an automatic differentiation library called JAX. Read the first section of the documentation to get an idea of the syntax. Implement gradient descent from scratch and use the JAX library to compute the gradient of the loss function at each step. You will only need the following import statements:


Set w^{(0)} = [1, 1, 1, 1]^T, and use a step size of 1. Terminate your algorithm when the absolute value of the change in the loss from one iteration to the next is less than 0.0001. Report the number of iterations taken, and the final weight vector. Further, report the train and test losses achieved by your final model, and produce a plot of the training loss at each step of the algorithm. What to submit: a single plot, the final weight vector, the train and test loss of your final model, a screen shot of your code for this section, a copy of your python code in solutions.py
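A sketch of the gradient-descent loop using JAX is shown below. Since the loss displayed in (e) is not reproduced here, a plain mean-squared-error loss is used as a stand-in (substitute the loss from the question), and prepending a column of ones to X so that w has four entries is also an assumption; with the stand-in loss, a step size of 1 may well be too large.

import jax.numpy as jnp
from jax import grad

Xb_train = jnp.hstack([jnp.ones((X_train.shape[0], 1)), X_train])   # assumed bias column

def loss(w, X, y):
    return jnp.mean((y - X @ w) ** 2)        # stand-in loss; substitute the loss from (e)

grad_loss = grad(loss)                        # gradient of the loss with respect to w

w = jnp.array([1.0, 1.0, 1.0, 1.0])           # w^(0)
alpha = 1.0                                   # step size from the question
train_losses = [float(loss(w, Xb_train, y_train))]
while True:
    w = w - alpha * grad_loss(w, Xb_train, y_train)
    train_losses.append(float(loss(w, Xb_train, y_train)))
    if abs(train_losses[-1] - train_losses[-2]) < 0.0001:    # terminate on a tiny change in loss
        break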

(f) Finally, re-do the previous section but with steepest descent instead. In order to compute α_k at each step, you can either use JAX or it might be easier to use the minimize function in scipy (see Lab 3). Run the algorithm with the same w^{(0)} as above, and take α_0 = 1 as your initial guess when numerically solving for α_k (for each k). Terminate the algorithm when the loss value falls below 2.5. Report the number of iterations it took, as well as the final weight vector, and the train and test losses achieved. Generate a plot of the losses as before and include it. What to submit: a single plot, the final weight vector, the train and test loss of your final model, a screen shot of your code for this section, a copy of your python code in solutions.py
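One way to solve for α_k numerically, reusing the loss and grad_loss defined in the previous sketch (both of which are assumptions about your own code) together with scipy's minimize:

import numpy as np
from scipy.optimize import minimize

def steepest_step(w, X, y):
    g = np.asarray(grad_loss(w, X, y))                        # steepest-descent direction
    step_obj = lambda a: float(loss(w - a * g, X, y))         # loss as a function of the step size
    alpha_k = minimize(step_obj, x0=1.0).x[0]                 # alpha_0 = 1 as the initial guess
    return w - alpha_k * g, alpha_k

Each call returns the updated weight vector together with the chosen α_k, which can be recorded at every iteration until the loss drops below 2.5.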

(g) In this question we have explored the gradient descent and steepest descent variants of the gradient method. Many other gradient based algorithms exist in the literature. Choose one and describe it. What to submit: some commentary, any plots or code you used in this section (you don't need to write any code, or supply plots, but you may). Also be sure to cite any sources used here.



