# COMP9417

### Homework 2: Logistic Regression & Optimization

- A single PDF file which contains solutions to each question. For each question, provide your solution in the form of text and the requested plots. For some questions you will be asked to provide screenshots of the code used to generate your answer; only include these when they are explicitly asked for.

- .py file(s) containing all code you used for the project, which should be provided in a separate .zip file. This code must match the code provided in the report.

- You may be deducted points for not following these instructions.

- You cannot submit a Jupyter notebook; this will receive a mark of zero. This does not stop you from developing your code in a notebook and then copying it into a .py file, or using a tool such as nbconvert or similar.

- As usual, we monitor all online forums such as Chegg, StackExchange, etc. Posting homework questions on these sites is equivalent to plagiarism and will result in a case of academic misconduct.

Question 1. Regularized Logistic Regression & the Bootstrap

In this problem we will consider the dataset provided in Q1.csv, with binary response variable $Y$ and 45 continuous features $X_1, \dots, X_{45}$. Recall that regularized logistic regression is a regression model used when the response variable is binary valued. Instead of using the mean squared error loss as in standard regression problems, we instead minimize the log-loss, also referred to as the cross-entropy loss. For a parameter vector $\beta = (\beta_1, \dots, \beta_p) \in \mathbb{R}^p$, with $y_i \in \{0, 1\}$ and $x_i \in \mathbb{R}^p$ for $i = 1, \dots, n$, the log-loss is

$$\mathcal{L}(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \sigma(x_i^\top \beta) + (1 - y_i) \log\big(1 - \sigma(x_i^\top \beta)\big) \Big],$$

where $\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function.

(b) Take the first 500 observations to be your training set, and the rest as the test set. In this part, we will perform cross-validation over the choice of $C$ from scratch. (Do not use existing cross-validation implementations here; doing so will result in a mark of zero.)

Create a grid of 100 $C$ values ranging from $C = 0.0001$ to $C = 0.6$ in equally sized increments, inclusive. For each value of $C$ in your grid, perform 10-fold cross-validation (i.e. split the data into 10 folds, fit logistic regression (using the LogisticRegression class in sklearn) with that choice of $C$ on 9 of those folds, and record the log-loss on the 10th, repeating the process 10 times). For this question, we will take the first fold to be the first 50 rows of the training data, the second fold to be the next 50 rows, and so on. Be sure to use $\ell_1$ regularisation and the liblinear solver when fitting your models.

To display the results, we will produce a plot: the x-axis should reflect the choice of $C$ values, and for each $C$, plot a box-plot over the 10 CV scores. Report the value of $C$ that gives you the best CV performance. Re-fit the model with this chosen $C$, and report both train and test accuracy using this model. Note that we do not need to use the $\tilde{y}$ coding here (the sklearn implementation is able to handle different coding schemes automatically), so no transformations are needed before applying logistic regression to the provided data. What to submit: a single plot, the train and test accuracy of your final model, a screenshot of your code for this section, and a copy of your Python code in solutions.py
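A minimal sketch of the from-scratch cross-validation loop described above, assuming the training data is ordered so that consecutive blocks of 50 rows form the folds (the function and argument names here are illustrative, not prescribed by the assignment):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def cv_log_loss(X_train, y_train, C, n_folds=10):
    """Return the list of per-fold log-losses for l1-penalised logistic
    regression with the given C, using contiguous blocks of rows as folds."""
    n = X_train.shape[0]
    fold_size = n // n_folds
    losses = []
    for k in range(n_folds):
        # the k-th contiguous block is the validation fold
        val_idx = np.arange(k * fold_size, (k + 1) * fold_size)
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X_train[train_idx], y_train[train_idx])
        proba = model.predict_proba(X_train[val_idx])
        # pass labels explicitly in case a fold happens to be single-class
        losses.append(log_loss(y_train[val_idx], proba, labels=model.classes_))
    return losses
```

Calling this once per value in the $C$ grid gives the $100 \times 10$ array of CV scores needed for the box-plot.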

(c) In this part we will compare our results from the previous section to the sklearn implementation of grid search, namely the GridSearchCV class. My initial code for this section looked like:
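The original snippet is not reproduced here; as a hedged sketch, GridSearchCV usage matching the setup of part (b) might look like the following (the scoring choice and fold handling are assumptions, not the author's original code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

def grid_search_C(X_train, y_train, C_grid):
    """Grid search over C with 10-fold CV on contiguous blocks of rows."""
    grid = GridSearchCV(
        LogisticRegression(penalty="l1", solver="liblinear"),
        param_grid={"C": C_grid},
        cv=KFold(n_splits=10, shuffle=False),  # unshuffled = contiguous folds
        scoring="neg_log_loss",                # CV metric from part (b)
    )
    grid.fit(X_train, y_train)
    return grid

# the grid from part (b): 100 equally spaced values from 0.0001 to 0.6
C_grid = np.linspace(0.0001, 0.6, 100)
```

After fitting, `grid.best_params_` and `grid.cv_results_` can be compared against the from-scratch results.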

(d) In this part, we will consider the nonparametric bootstrap for building confidence intervals for each of the parameters $\beta_1, \dots, \beta_p$. (Do not use existing bootstrap implementations here; doing so will result in a mark of zero.) To describe this method, let's first focus on the case of $\hat{\beta}_1$. The idea behind the nonparametric bootstrap is as follows:

1. Generate $B$ bootstrap samples by sampling $n$ observations with replacement from the original data.

2. On each of the $B$ bootstrap samples, compute an estimate of $\beta_1$, giving us a total of $B$ estimates, which we denote $\hat{\beta}_1^{(1)}, \dots, \hat{\beta}_1^{(B)}$.

3. Define the bootstrap mean and standard error respectively:

$$\bar{\beta}_1 = \frac{1}{B} \sum_{b=1}^{B} \hat{\beta}_1^{(b)}, \qquad \widehat{\mathrm{se}}\big(\hat{\beta}_1\big) = \sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} \Big( \hat{\beta}_1^{(b)} - \bar{\beta}_1 \Big)^2},$$

and take the confidence interval to be

$$\big( (\hat{\beta}_1)_L, (\hat{\beta}_1)_U \big) = \big( \text{5th quantile of the bootstrap estimates}, \ \text{95th quantile of the bootstrap estimates} \big).$$

Take $B = 10000$ and set a random seed of 12 (i.e. np.random.seed(12)). Generate a plot where the x-axis represents the different parameters $\beta_1, \dots, \beta_p$, and for each parameter $\beta_j$ plot a vertical bar that runs from $(\hat{\beta}_j)_L$ to $(\hat{\beta}_j)_U$. For those intervals that contain 0, draw the bar in red; otherwise draw it in blue. Also indicate the bootstrap mean on each bar. Remember to use $C = 1.0$.
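The resampling loop above can be sketched from scratch with numpy alone; here `estimator` is a generic statistic of the resampled data, and for the assignment it would refit LogisticRegression(C=1.0, penalty="l1", solver="liblinear") on the resampled rows and return the coefficient of interest (that substitution is an assumption, not shown here):

```python
import numpy as np

def bootstrap_ci(X, y, estimator, B=10000, seed=12, lower=5, upper=95):
    """Nonparametric bootstrap: resample rows with replacement B times,
    apply `estimator` to each resample, and return the bootstrap mean
    together with the (lower, upper) percentile interval."""
    np.random.seed(seed)          # the spec asks for np.random.seed(12)
    n = X.shape[0]
    stats = np.empty(B)
    for b in range(B):
        idx = np.random.randint(0, n, size=n)   # n rows, with replacement
        stats[b] = estimator(X[idx], y[idx])
    return stats.mean(), (np.percentile(stats, lower),
                          np.percentile(stats, upper))
```

Colouring each bar then only requires checking whether the returned interval straddles zero.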

In this question we will explore some algorithms for gradient-based optimization. These algorithms have been crucial to the development of machine learning in the last few decades. The most famous example is the backpropagation algorithm used in deep learning, which is in fact just an application of a simple algorithm known as (stochastic) gradient descent. The general framework for a gradient method for finding a minimizer of a function $f : \mathbb{R}^n \to \mathbb{R}$ is defined by

$$x^{(k+1)} = x^{(k)} - \alpha_k \nabla f\big(x^{(k)}\big), \qquad k = 0, 1, 2, \dots \tag{3}$$

where $\alpha_k > 0$ is known as the step size, or learning rate. Consider the following simple example of minimizing $g(x) = 2\sqrt{x^3 + 1}$. We first note that $g'(x) = 3x^2 (x^3 + 1)^{-1/2}$. We then need to choose a starting value of $x$, say $x^{(0)} = 1$. Let's also take the step size to be constant, $\alpha_k = \alpha = 0.1$. Then we have the following iterations:

and this continues until we terminate the algorithm (as a quick exercise for your own benefit, code this up and compare it to the true minimum of the function, which is $x^\star = -1$). This idea works for functions that have vector-valued inputs, which is often the case in machine learning. For example, when we minimize a loss function we do so with respect to a weight vector $\beta$. When we take the step size to be constant at each iteration, this algorithm is called gradient descent. For the entirety of this question, do not use any existing implementations of gradient methods; doing so will result in an automatic mark of zero for the entire question.
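As a sketch of that quick exercise (an illustration, not required solution code): note that $g$ is only defined for $x \ge -1$ and $g'$ blows up as $x \to -1$, so with too many iterations a constant step can overshoot the domain.

```python
import math

def grad_g(x):
    # g'(x) = 3 x^2 (x^3 + 1)^(-1/2)
    return 3 * x ** 2 / math.sqrt(x ** 3 + 1)

def gd_scalar(x0=1.0, alpha=0.1, n_iters=5):
    """Run a few constant-step gradient descent iterations on g."""
    x = x0
    for _ in range(n_iters):
        x = x - alpha * grad_g(x)
    return x
```

Starting from $x^{(0)} = 1$ with $\alpha = 0.1$, the iterates decrease slowly toward the true minimizer $x^\star = -1$.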

(a) Consider the following optimisation problem:

- nConvenience: number of convenience stores in nearby locations

- longitude

The target variable is the property price. The goal is to learn to predict property prices as a function of a subset of the above features.

(d) We need to preprocess the data. First, remove any rows with missing values. Then delete all features except for age, nearestMRT and nConvenience. Next, use the sklearn MinMaxScaler to normalize the features. Finally, create a training set from the first half of the resulting dataset, and a test set from the remaining half. Your end result should look like:

- first row of X_train: [0.73059361, 0.00951267, 1.]

- first row of X_test: [0.26255708, 0.20677973, 0.1]

- last row of Y_train: 34.2

Instead of computing the gradient directly, though, we will rely on an automatic differentiation library called JAX. Read the first section of the documentation to get an idea of the syntax. Implement gradient descent from scratch and use the JAX library to compute the gradient of the loss function at each step. You will only need the following import statements:
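A minimal sketch of this pattern, where `jax.grad` differentiates the loss with respect to the weights; `loss_fn(w, X, y)` is a placeholder for whatever loss the assignment specifies, and the function name and arguments are assumptions:

```python
import jax
import jax.numpy as jnp

def gradient_descent(loss_fn, w0, X, y, alpha=0.01, n_iters=100):
    """Constant-step gradient descent; the gradient comes from jax.grad,
    which differentiates loss_fn with respect to its first argument (w)."""
    grad_fn = jax.grad(loss_fn)
    w = w0
    losses = []
    for _ in range(n_iters):
        losses.append(float(loss_fn(w, X, y)))  # record loss at each step
        w = w - alpha * grad_fn(w, X, y)        # one gradient step
    return w, losses
```

The recorded `losses` list is exactly what the requested loss-versus-iteration plot needs.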

(f) Finally, re-do the previous section but with steepest descent instead. In order to compute $\alpha_k$ at each step, you can either use JAX, or it might be easier to use the minimize function in scipy (see Lab 3). Run the algorithm with the same $w^{(0)}$ as above, and take $\alpha_0 = 1$ as your initial guess when numerically solving for $\alpha_k$ (for each $k$). Terminate the algorithm when the loss value falls below 2.5. Report the number of iterations it took, as well as the final weight vector, and the train and test losses achieved. Generate a plot of the losses as before and include it. What to submit: a single plot, the final weight vector, the train and test accuracy of your final model, a screenshot of your code for this section, and a copy of your Python code in solutions.py
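The line-search variant can be sketched as below, using scipy's minimize (with $\alpha_0 = 1$ as the initial guess) to solve the one-dimensional problem for $\alpha_k$ at each step; the function names and the generic `f`/`grad_f` interface are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def steepest_descent(f, grad_f, w0, tol=2.5, max_iters=1000):
    """Steepest descent: at each step, choose alpha_k by numerically
    minimizing f along the negative gradient direction. Terminates once
    the loss falls below `tol`; returns (w, number of iterations)."""
    w = np.asarray(w0, dtype=float)
    for k in range(max_iters):
        if f(w) < tol:
            return w, k
        g = grad_f(w)
        # 1-D minimisation over the step size, starting from alpha0 = 1
        res = minimize(lambda a: f(w - a[0] * g), x0=np.array([1.0]))
        w = w - res.x[0] * g
    return w, max_iters
```

On a quadratic loss the exact line search makes very rapid progress, which is why the iteration count requested here is interesting to compare with part (e).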
