COMP9417 Homework

Date: 2021-07-13

Introduction

In this homework, we will first explore some aspects of Logistic Regression and perform inference for model parameters. We then turn our attention to gradient-based optimisation, the workhorse of modern machine learning methods.

What to Submit

- A single PDF file which contains solutions to each question. For each question, provide your solution in the form of text and requested plots. For some questions you will be requested to provide screen shots of code used to generate your answer; only include these when they are explicitly asked for.

- .py file(s) containing all code you used for the project, which should be provided in a separate .zip file. This code must match the code provided in the report.

- You may be deducted points for not following these instructions.

- You may be deducted points for poorly presented/formatted work. Please be neat and make your solutions clear. Start each question on a new page if necessary.

- You cannot submit a Jupyter notebook; this will receive a mark of zero. This does not stop you from developing your code in a notebook and then copying it into a .py file, or using a tool such as nbconvert or similar.

- We will set up a Moodle forum for questions on this homework. Please read the existing questions before posting new questions, and do some basic research online before posting. Please only post clarification questions. Any questions deemed to be fishing for answers will be ignored and/or deleted.

- As usual, we monitor all online forums such as Chegg, StackExchange, etc. Posting homework questions on these sites is equivalent to plagiarism and will result in a case of academic misconduct.

When and Where to Submit

- Due date: Week 7, Sunday July 18th, 2021 by 11:55pm.

- Late submissions will incur a penalty of 20% per day (from the ceiling, i.e., total marks available for the homework) for the first 5 days. For example, if you submit 2 days late, the maximum possible mark is 60% of the available 25 marks.

- Submission must be done through Moodle, no exceptions.

Question 1. Regularized Logistic Regression & the Bootstrap

In this problem we will consider the dataset provided in Q1.csv, with binary response variable Y and 45 continuous features X_1, …, X_45. Recall that regularized logistic regression is a regression model used when the response variable is binary valued. Instead of using the mean squared error loss as in standard regression problems, we instead minimize the log-loss, also referred to as the cross-entropy loss. For a parameter vector β = (β_1, …, β_p) ∈ R^p, with y_i ∈ {0, 1} and x_i ∈ R^p for i = 1, …, n, the log-loss is

∑_{i=1}^{n} [ −y_i log s(β_0 + x_i^T β) − (1 − y_i) log(1 − s(β_0 + x_i^T β)) ],

where s(z) = (1 + e^{−z})^{−1} is the logistic sigmoid (see Homework 0 for a refresher). In practice, we will usually add a penalty term and consider the optimisation

min_{β_0, β}  penalty(β) + C ∑_{i=1}^{n} [ −y_i log s(β_0 + x_i^T β) − (1 − y_i) log(1 − s(β_0 + x_i^T β)) ],    (1)

where the penalty is usually not applied to the bias term β_0, and C is a hyper-parameter. For example, in the ℓ1 regularisation case, we take penalty(β) = ‖β‖_1 (a LASSO for logistic regression).

(a) Consider the sklearn logistic regression implementation (section 1.1.11 of the user guide), which claims to minimize the following objective:

min_{w, c}  ‖w‖_1 + C ∑_{i=1}^{n} log(1 + exp(−ỹ_i (x_i^T w + c))).    (2)

It turns out that this objective is identical to our objective above, but only after re-coding the binary variable to be in {−1, 1} instead of {0, 1}. That is, ỹ_i ∈ {−1, 1}, whereas y_i ∈ {0, 1}. Argue rigorously that the two objectives (1) and (2) are identical, in that they give us the same solutions (β̂_0 = ĉ and β̂ = ŵ). Further, describe the role of C in the objectives: how does it compare to the standard LASSO parameter λ? What to submit: some commentary/your working.
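As a hint toward the argument, the equivalence rests on the sigmoid identity 1 − s(z) = s(−z); the following is only a sketch of the key step, not the full rigorous argument the question asks for:

```latex
\text{Write } z_i = \beta_0 + x_i^\top \beta.
\text{ If } y_i = 1 \text{ (so } \tilde{y}_i = +1\text{):}
\quad -\log s(z_i) = \log\!\left(1 + e^{-z_i}\right).
\text{ If } y_i = 0 \text{ (so } \tilde{y}_i = -1\text{), using } 1 - s(z) = s(-z)\text{:}
\quad -\log\!\left(1 - s(z_i)\right) = -\log s(-z_i) = \log\!\left(1 + e^{z_i}\right).
\text{Both cases equal } \log\!\left(1 + e^{-\tilde{y}_i z_i}\right),
\text{ so the loss terms agree term by term.}
```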

(b) Take the first 500 observations to be your training set, and the rest as the test set. In this part, we will perform cross validation over the choice of C from scratch. (Do not use existing cross validation implementations here; doing so will result in a mark of zero.)

Create a grid of 100 C values ranging from C = 0.0001 to C = 0.6 in equally sized increments, inclusive. For each value of C in your grid, perform 10-fold cross validation (i.e. split the data into 10 folds, fit logistic regression (using the LogisticRegression class in sklearn) with the choice of C on 9 of those folds, and record the log-loss on the 10th, repeating the process 10 times). For this question, we will take the first fold to be the first 50 rows of the training data, the second fold to be the next 50 rows, etc. Be sure to use ℓ1 regularisation and the liblinear solver when fitting your models.
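The fold construction described above can be sketched as follows. This is a minimal illustration only (not a complete solution), and it uses synthetic stand-in data since Q1.csv is not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic stand-in for the first 500 rows of Q1.csv (45 features, binary y)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 45))
y = (rng.random(500) < 0.5).astype(int)

Cs = np.linspace(0.0001, 0.6, 100)   # 100 equally spaced C values, inclusive
cv_scores = np.zeros((len(Cs), 10))  # one row of 10 fold-losses per C

for i, C in enumerate(Cs):
    for fold in range(10):
        # fold k holds rows 50k .. 50k+49; the other 450 rows are used to fit
        val_idx = np.arange(fold * 50, (fold + 1) * 50)
        trn_idx = np.setdiff1d(np.arange(500), val_idx)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[trn_idx], y[trn_idx])
        proba = clf.predict_proba(X[val_idx])[:, 1]
        cv_scores[i, fold] = log_loss(y[val_idx], proba)

best_C = Cs[cv_scores.mean(axis=1).argmin()]
```

Each row of `cv_scores` is exactly the 10 values the box-plot for that C should summarize.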

To display the results, we will produce a plot: the x-axis should reflect the choice of C values and, for each C, plot a box-plot over the 10 CV scores. Report the value of C that gives you the best CV performance. Re-fit the model with this chosen C, and report both train and test accuracy using this model. Note that we do not need to use the ỹ coding here (the sklearn implementation is able to handle different coding schemes automatically), so no transformations are needed before applying logistic regression to the provided data. What to submit: a single plot, train and test accuracy of your final model, a screen shot of your code for this section, a copy of your python code in solutions.py

(c) In this part we will compare our results in the previous section to the sklearn implementation of gridsearch, namely, the GridSearchCV class. My initial code for this section looked like:

However, this gave me a very different answer to the result in (b). Provide two reasons for why this is the case. Then, if it is possible, re-run the code with some changes to give results consistent with those in (b); if not, explain why. It may help to read through the documentation. What to submit: some commentary, a screen shot of your code for this section, a copy of your python code in solutions.py
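As a point of comparison, one plausible way to configure GridSearchCV so that it mirrors a manual contiguous-fold, log-loss procedure is sketched below; two defaults worth checking against the documentation are the default scoring (accuracy, not log-loss) and the default fold construction. Synthetic stand-in data is used since Q1.csv is not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 45))
y = (rng.random(500) < 0.5).astype(int)

grid = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": np.linspace(0.0001, 0.6, 100)},
    cv=KFold(n_splits=10, shuffle=False),  # contiguous 50-row folds
    scoring="neg_log_loss",                # score on log-loss, not accuracy
)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```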

We next explore the idea of inference. To motivate the difference between prediction and inference, see some of the answers to this stats.stackexchange post. Needless to say, inference is in general a much more difficult problem than prediction. In the next parts, we will study some ways of quantifying the uncertainty in our estimates of the logistic regression parameters. Assume for the remainder of this question that C = 1, and work only with the training data set (n = 500 observations) constructed earlier.

(d) In this part, we will consider the nonparametric bootstrap for building confidence intervals for each of the parameters β_1, …, β_p. (Do not use existing bootstrap implementations here; doing so will result in a mark of zero.) To describe this method, let's first focus on the case of β̂_1. The idea behind the nonparametric bootstrap is as follows:

1. Generate B bootstrap samples, each obtained by sampling n observations with replacement from the training data.

2. On each of the B bootstrap samples, compute an estimate of β_1, giving us a total of B estimates, which we denote β̂_1^(1), …, β̂_1^(B).

3. Define the bootstrap mean and standard error respectively:

β̄_1 = (1/B) ∑_{b=1}^{B} β̂_1^(b),    se(β̂_1) = sqrt( (1/(B − 1)) ∑_{b=1}^{B} (β̂_1^(b) − β̄_1)² ).

4. A 90% bootstrap confidence interval for β_1 is then given by the interval:

((β̂_1)_L, (β̂_1)_U) = (5th percentile of the bootstrap estimates, 95th percentile of the bootstrap estimates)

The idea behind a 90% confidence interval is that it gives us a range of values in which we believe, with 90% probability, the true parameter lives. If the computed 90% interval contains the value zero, then this provides us evidence that β_1 = 0, which would mean that the first feature should not be included in our model.

Take B = 10000 and set a random seed of 12 (i.e. np.random.seed(12)). Generate a plot where the x-axis represents the different parameters β_1, …, β_p, and for each β_j plot a vertical bar that runs from (β̂_j)_L to (β̂_j)_U. For those intervals that contain 0, draw the bar in red; otherwise draw it in blue. Also indicate the bootstrap mean on each bar. Remember to use C = 1.0.

What to submit: a single plot, some commentary, a screen shot of your code for this section, a copy of your python code in solutions.py
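A minimal sketch of the bootstrap loop follows, with a much smaller B and feature count than the question requires (purely to keep the illustration fast), synthetic stand-in data, and numpy's Generator API rather than np.random.seed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(12)
n, p = 500, 10                    # the real data has p = 45
X = rng.normal(size=(n, p))       # synthetic stand-in for the training set
y = (rng.random(n) < 0.5).astype(int)

B = 200                           # the question asks for B = 10000
betas = np.zeros((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # n draws with replacement
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X[idx], y[idx])
    betas[b] = clf.coef_.ravel()

lower = np.percentile(betas, 5, axis=0)      # (beta_j)_L for each j
upper = np.percentile(betas, 95, axis=0)     # (beta_j)_U for each j
contains_zero = (lower <= 0) & (upper >= 0)  # these intervals get red bars
```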

(e) Comment on your results in the previous section: what do the confidence intervals tell you about the underlying data generating distribution? How does this relate to the choice of C when running regularized logistic regression on this data? Is regularization necessary?

Question 2. Gradient Based Optimization

In this question we will explore some algorithms for gradient based optimization. These algorithms have been crucial to the development of machine learning in the last few decades. The most famous example is the backpropagation algorithm used in deep learning, which is in fact just an application of a simple algorithm known as (stochastic) gradient descent. The general framework for a gradient method for finding a minimizer of a function f : R^n → R is defined by

x^{(k+1)} = x^{(k)} − α_k ∇f(x^{(k)}),   k = 0, 1, 2, …,    (3)

where α_k > 0 is known as the step size, or learning rate. Consider the following simple example of minimizing g(x) = 2√(x³ + 1). We first note that g′(x) = 3x²(x³ + 1)^{−1/2}. We then need to choose a starting value of x, say x^{(0)} = 1. Let's also take the step size to be constant, α_k = α = 0.1. Then we have the following iterations:

and this continues until we terminate the algorithm (as a quick exercise for your own benefit, code this up and compare it to the true minimum of the function, which is x* = −1). This idea works for functions that have vector-valued inputs, which is often the case in machine learning. For example, when we minimize a loss function we do so with respect to a weight vector β. When we take the step size to be constant at each iteration, this algorithm is called gradient descent. For the entirety of this question, do not use any existing implementations of gradient methods; doing so will result in an automatic mark of zero for the entire question.
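The quick exercise mentioned above can be sketched as follows. It is worth noticing what happens: since g′(0) = 0, gradient descent started from x^{(0)} = 1 creeps toward the stationary point x = 0 and never reaches the minimizer x* = −1, which is part of the point of the comparison:

```python
def g(x):
    # g(x) = 2 * sqrt(x^3 + 1)
    return 2.0 * (x**3 + 1.0) ** 0.5

def dg(x):
    # g'(x) = 3x^2 (x^3 + 1)^(-1/2)
    return 3.0 * x**2 / (x**3 + 1.0) ** 0.5

x, alpha = 1.0, 0.1        # x^(0) = 1, constant step size
for k in range(100):
    x = x - alpha * dg(x)

# x is now a small positive number near the stationary point 0, not -1
```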

(a) Consider the following optimisation problem:

Run gradient descent on f using a step size of α = 0.1 and a starting point of x^{(0)} = (1, 1, 1, 1). You will need to terminate the algorithm when the following condition is met: ‖∇f(x^{(k)})‖₂ < 0.001. In your answer, clearly write down the version of the gradient steps (3) for this problem. Also, print out the first 5 and last 5 values of x^{(k)}, clearly indicating the value of k, in the form:

What to submit: an equation outlining the explicit gradient update, a print out of the first 5 and last 5 rows of your iterations, a screen shot of any code used for this section and a copy of your python code in solutions.py.
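Since the objective f from (a) is not reproduced here, the following is only a generic template for the loop and termination condition, with a hypothetical quadratic standing in for f; substitute the gradient of the actual f given in (a):

```python
import numpy as np

TARGET = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical; defines the stand-in f

def grad_f(x):
    # gradient of the stand-in f(x) = ||x - TARGET||^2; replace with the real one
    return 2.0 * (x - TARGET)

x = np.ones(4)          # x^(0) = (1, 1, 1, 1)
alpha = 0.1
k = 0
history = [x.copy()]
while np.linalg.norm(grad_f(x)) >= 0.001:  # stop when ||grad f(x^(k))||_2 < 0.001
    x = x - alpha * grad_f(x)
    history.append(x.copy())
    k += 1
# print the first 5 and last 5 iterates from `history`, with their k values
```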

(b) Note that using a constant step size is sub-optimal. Ideally, we would like to take large steps at the beginning (when we are far away from the optimum), then take smaller steps as we move closer to the minimum. There are many proposals in the literature for how best to choose the step size; here we will explore just one of them, called the method of steepest descent. This is almost identical to gradient descent, except that at each iteration k we choose

α_k = argmin_{α ≥ 0} f(x^{(k)} − α ∇f(x^{(k)})).

In words, the step size is chosen to minimize an objective at each iteration of the gradient method; the objective is different at each step since it depends on the current x-value. In this part, we will run steepest descent to find the minimizer in (a). First, derive an explicit solution for α_k (mathematically; please show your working). Then run steepest descent with the same x^{(0)} as in (a), and α_0 = 0.1. Use the same termination condition. Provide the first and last 5 values of x^{(k)}, as well as a plot of α_k over all iterations. What to submit: a derivation of α_k, a print out of the first 5 and last 5 rows of your iterations, a single plot, a screen shot of any code used for this section, a copy of your python code in solutions.py.
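A sketch of the line-search step using scipy, again with a hypothetical quadratic standing in for the f from (a); minimize_scalar performs the one-dimensional minimization over α at each iteration (for this quadratic, the exact line search finds α = 0.5 and converges in one step):

```python
import numpy as np
from scipy.optimize import minimize_scalar

TARGET = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical stand-in objective

def f(x):
    return float(np.sum((x - TARGET) ** 2))

def grad_f(x):
    return 2.0 * (x - TARGET)

x = np.ones(4)                  # same x^(0) as in (a)
alphas = []                     # record alpha_k for the requested plot
while np.linalg.norm(grad_f(x)) >= 0.001:
    d = grad_f(x)
    res = minimize_scalar(lambda a: f(x - a * d))  # alpha_k = argmin_a f(x - a d)
    alphas.append(res.x)
    x = x - res.x * d
```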

(c) Comment on the differences you observed. Why would we prefer steepest descent over gradient descent? Why would you prefer gradient descent over steepest descent? Finally, explain why the condition used to terminate the algorithm is a reasonable one. In the next few parts, we will use the gradient methods explored above to solve a real machine learning problem. Consider the data provided in Q2.csv. It contains 414 real estate records, each of which contains the following features:

- transactiondate: date of transaction

- age: age of property

- nearestMRT: distance of property to the nearest MRT station

- nConvenience: number of convenience stores in nearby locations

- latitude

- longitude

The target variable is the property price. The goal is to learn to predict property prices as a function of a subset of the above features.

(d) We need to preprocess the data. First, remove any rows with missing values. Then delete all features except for age, nearestMRT and nConvenience, and use the sklearn MinMaxScaler to normalize the features. Finally, create a training set from the first half of the resulting dataset, and a test set from the remaining half. Your end result should look like:

- first row X_train: [0.73059361, 0.00951267, 1.]

- last row X_train: [0.87899543, 0.09926012, 0.3]

- first row X_test: [0.26255708, 0.20677973, 0.1]

- last row X_test: [0.14840183, 0.0103754, 0.9]

- first row Y_train: 37.9

- last row Y_train: 34.2

- first row Y_test: 26.2

- last row Y_test: 63.9

What to submit: a copy of your python code in solutions.py
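A sketch of the preprocessing pipeline; since Q2.csv is not included here, a tiny synthetic frame stands in for it, and the target column name ("price") is an assumption to check against the actual file:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny synthetic stand-in for Q2.csv; in practice: df = pd.read_csv("Q2.csv")
df = pd.DataFrame({
    "transactiondate": [2013.0] * 6,
    "age": [13.3, 35.5, np.nan, 5.0, 8.1, 20.4],
    "nearestMRT": [561.9, 390.6, 2175.0, 2175.0, 390.6, 104.8],
    "nConvenience": [5, 9, 3, 7, 2, 1],
    "latitude": [24.98] * 6,
    "longitude": [121.54] * 6,
    "price": [37.9, 42.2, 47.3, 54.8, 43.1, 63.9],  # assumed target name
})

df = df.dropna()                               # 1. drop rows with missing values
X = df[["age", "nearestMRT", "nConvenience"]]  # 2. keep only these features
X = MinMaxScaler().fit_transform(X)            # 3. scale each feature to [0, 1]
y = df["price"].to_numpy()

half = len(df) // 2                            # 4. first half train, rest test
X_train, X_test = X[:half], X[half:]
y_train, y_test = y[:half], y[half:]
```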

(e) Consider the loss function

Instead of computing the gradient directly, though, we will rely on an automatic differentiation library called JAX. Read the first section of the documentation to get an idea of the syntax. Implement gradient descent from scratch and use the JAX library to compute the gradient of the loss function at each step. You will only need the following import statements:

Use w^{(0)} = [1, 1, 1, 1]^T and a step size of 1. Terminate your algorithm when the absolute value of the change in the loss from one iteration to the next is less than 0.0001. Report the number of iterations taken and the final weight vector. Further, report the train and test losses achieved by your final model, and produce a plot of the training loss at each step of the algorithm. What to submit: a single plot, the final weight vector, the train and test loss of your final model, a screen shot of your code for this section, a copy of your python code in solutions.py
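A sketch of the JAX-based loop. The loss below is a plain mean-squared-error stand-in (the exact loss from (e) is not reproduced here), the data is synthetic, and a smaller step size than the question specifies is used for stability on the toy data:

```python
import jax
import jax.numpy as jnp

def loss(w, X, y):
    # stand-in MSE loss over an affine model; replace with the loss from (e)
    preds = X @ w[1:] + w[0]
    return jnp.mean((y - preds) ** 2)

grad_loss = jax.grad(loss)   # gradient with respect to w, computed by JAX

# synthetic normalized features, standing in for the output of part (d)
key = jax.random.PRNGKey(0)
X = jax.random.uniform(key, (50, 3))
y = X @ jnp.array([1.0, 2.0, 3.0]) + 0.5

w = jnp.ones(4)              # w^(0) = [1, 1, 1, 1]^T
alpha = 0.1                  # the question specifies 1; 0.1 here for the toy data
losses = [float(loss(w, X, y))]
while True:
    w = w - alpha * grad_loss(w, X, y)
    losses.append(float(loss(w, X, y)))
    if abs(losses[-1] - losses[-2]) < 0.0001:
        break
```

The `losses` list is exactly what the requested training-loss plot should display.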

(f) Finally, re-do the previous section but with steepest descent instead. In order to compute α_k at each step, you can either use JAX, or it might be easier to use the minimize function in scipy (see Lab 3). Run the algorithm with the same w^{(0)} as above, and take α_0 = 1 as your initial guess when numerically solving for α_k (for each k). Terminate the algorithm when the loss value falls below 2.5. Report the number of iterations it took, as well as the final weight vector and the train and test losses achieved. Generate a plot of the losses as before and include it. What to submit: a single plot, the final weight vector, the train and test losses of your final model, a screen shot of your code for this section, a copy of your python code in solutions.py

(g) In this question we have explored the gradient descent and steepest descent variants of the gradient method. Many other gradient-based algorithms exist in the literature. Choose one and describe it. What to submit: some commentary, and any plots or code you used in this section (you don't need to write any code or supply plots, but you may). Be sure to cite any sources used here.
