
Python Assignment | Machine Learning Assignment - CSE 5523: Machine Learning
Date: 2020-12-07
Instructions (Important)

• Total points = 55. Full mark = 50.
• Read each problem statement carefully. Make sure that you answer exactly what is required and that your solution is expressed clearly.
• Some notation introduced in class, e.g., the notation we used for true risk, empirical risk, the output of a learning algorithm given a training set, parameter vectors, etc., will be used here directly without being defined again. Please refer to the notes if you have any confusion about the notation. Also, here ‖·‖ will be used to denote the Euclidean (i.e., L2) norm.
• You may use any result covered in class, but please cite the result that you use.

Problem 1: Linear Hard-Margin SVMs (15 points)

In this problem, refer to the figure attached on the last page. In the attached figure, consider the constellation of feature vectors x^(i) = (x1^(i), x2^(i)) ∈ ℝ² and their assigned labels y^(i), i = 1, ..., 7 (see the color/label code in the top-right corner of the figure):

1. Find the margin-maximizing linear classifier; that is, find (ω, b) = ((w1, w2), b) describing the equation of the plane w1·x1 + w2·x2 + b = 0 of the largest margin with respect to the given constellation. Express (ω, b) after performing the proper normalization such that

   min_{i ∈ {1,...,7}} y^(i) (⟨ω, x^(i)⟩ + b) = 1.

   [Note: You do not need to solve the optimization problem for the SVM or prove formally that your plane is margin-maximizing. It is only required to find the parameters of the margin-maximizing classifier (ω, b) as instructed above. You may want to provide an intuitive explanation of why you think your solution is optimal, for which you may get partial credit in case your solution is incorrect.]

2. Given your solution in Part 1:

   (a) Identify the set Q ⊆ {x^(1), ..., x^(7)} of points that are closest to your plane.

   (b) Find a set of scalars {αi : i ∈ Q} that satisfies all of the following (showing all details of your derivation):

       ∀ i ∈ Q: αi ≥ 0,    Σ_{i∈Q} y^(i) αi = 0,    ω = Σ_{i∈Q} αi y^(i) x^(i).

   (c) What are the support vectors of your classifier?

Problem 2: Solving a Kernel-Based SVM with SGD (18 points)

This problem illustrates a simple example of solving a kernel-based SVM with a Gaussian kernel via SGD. Consider a training set of 3 data points (x^(i), y^(i)) ∈ ℝ² × {−1, +1}, i = 1, 2, 3, where

   x^(1) = (0, 0), y^(1) = +1
   x^(2) = (2, 1), y^(2) = 1
   x^(3) = (2, 1), y^(3) = 1

Let K : ℝ² × ℝ² → ℝ be a Gaussian kernel defined as

   K(x, x′) = exp( −‖x − x′‖² / 8 )   for any x, x′ ∈ ℝ².

Suppose we want to solve the kernel-based SVM as described in class with respect to this training set:

1. Obtain the kernel matrix with respect to the given training set. Hence, write down the optimization problem corresponding to solving the soft-margin kernel-based SVM, i.e., write down an expression for the objective function (the regularized loss) that needs to be minimized over the variable coefficients α = (α1, α2, α3), as discussed in class.

2. Set the regularization parameter λ in the objective function of Part 1 to λ = 1. Now suppose we want to solve this optimization problem via SGD as discussed in class. For simplicity, consider SGD with a total number of updates T = 6, i.e., 5 update steps not including the initialization, which is assumed to be α^(1) = 0. Showing all intermediate steps, perform those updates and obtain the final coefficient vector α̂ (i.e., show all updates α^(t), t = 1, ..., 6, as well as the final output α̂). Note that in an actual execution, SGD samples one data point uniformly at random from the training set in each step. For the sake of this problem, assume that over the course of the 5 update steps after initialization, SGD samples the following data points in the following order:

   at t = 1, (x^(1), y^(1)) is sampled
   at t = 2, (x^(2), y^(2)) is sampled
   at t = 3, (x^(3), y^(3)) is sampled
   at t = 4, (x^(2), y^(2)) is sampled
   at t = 5, (x^(3), y^(3)) is sampled

3. Given the coefficient vector α̂ obtained in Part 2:

   (a) Give an expression for the resulting kernel-based classifier.

   (b) Suppose you are given a new feature vector x = (0, 1) ∈ ℝ². What label will your kernel classifier generate for x? Show your derivation.
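For Problem 2, the sketch below shows one way to carry out these computations numerically. It assumes a kernelized Pegasos-style subgradient step on the regularized hinge loss, which may differ from the exact update rule defined in class, and it copies the data values as they appear in the problem statement above (some signs may have been dropped in the original handout), so treat both the update rule and the numbers as placeholders.

    import numpy as np

    # Training data as printed in the problem statement; adjust if any signs
    # were lost in the original handout.
    X = np.array([[0.0, 0.0],
                  [2.0, 1.0],
                  [2.0, 1.0]])
    y = np.array([1.0, 1.0, 1.0])

    def gaussian_kernel(a, b):
        # K(x, x') = exp(-||x - x'||^2 / 8), as defined in Problem 2.
        return np.exp(-np.linalg.norm(a - b) ** 2 / 8.0)

    # Part 1: the kernel matrix.
    K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])
    print("Kernel matrix:\n", K)

    # Part 2: SGD with lambda = 1 and the fixed sampling order given above,
    # written here as a kernelized Pegasos-style update (alpha[i] counts how
    # often point i triggered an update).
    lam = 1.0
    schedule = [0, 1, 2, 1, 2]          # x(1), x(2), x(3), x(2), x(3)
    alpha = np.zeros(len(X))            # alpha^(1) = 0

    for t, i in enumerate(schedule, start=1):
        margin = y[i] * np.sum(alpha * y * K[:, i]) / (lam * t)
        if margin < 1:
            alpha[i] += 1.0
        print(f"t = {t}: sampled x({i + 1}), alpha = {alpha}")

    # Part 3: the induced kernel classifier and its prediction for x = (0, 1).
    T = len(schedule)

    def classify(x_new):
        k_vec = np.array([gaussian_kernel(xi, x_new) for xi in X])
        score = np.sum(alpha * y * k_vec) / (lam * T)
        return np.sign(score), score

    print("Prediction for x = (0, 1):", classify(np.array([0.0, 1.0])))

If the update rule from class differs, only the body of the loop needs to change; the kernel matrix and the final prediction remain functions of the coefficient vector α.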
Problem 3: Stability (10 points)

We say that a learning algorithm A is an AERM (Asymptotic Empirical Risk Minimizer) for the class H with excess empirical risk ε(n) if for every distribution D it holds that

   E_{S∼Dⁿ}[ L̂(A(S); S) − min_{h∈H} L̂(h; S) ] ≤ ε(n).

We say that a learning algorithm A learns a class H with excess true risk ε(n) if for every distribution D it holds that

   E_{S∼Dⁿ}[ L(A(S); D) − min_{h∈H} L(h; D) ] ≤ ε(n).

Prove that if a learning algorithm A is ε1(n)-On-Average-Replace-One stable and is an AERM with excess empirical risk ε2(n) for the class H, then it learns H with excess true risk ε1(n) + ε2(n).

Problem 4: Noisy Gradient Descent (12 points)

Let M, ρ > 0. Let C ⊆ ℝᵈ be a closed convex set with max_{w∈C} ‖w‖ ≤ M. Let f : ℝᵈ → ℝ be a convex, ρ-Lipschitz function. Consider a variant of the basic (non-stochastic) gradient descent algorithm discussed in class where, at each update step, we get a noisy version of the gradient. Namely, at each iteration t = 1, ..., T − 1, the update step is

   w_{t+1} = Π_C( w_t − α (∇f(w_t) + v_t) ),

where v_t ∼ N(0, σ² I_d), t = 1, ..., T − 1, are independent Gaussian random variables (I_d is the identity matrix of size d), and Π_C is the Euclidean projection onto C discussed in class. You may assume the standard initialization w_1 = 0.

Show that if α = M / √((ρ² + dσ²) T), then the output ŵ = (1/T) Σ_{t=1}^{T} w_t after T iterations satisfies

   E[f(ŵ)] − min_{w∈C} f(w) ≤ M √(ρ² + dσ²) / √T,

where the expectation is taken w.r.t. the Gaussian noise variables v_t, t = 1, ..., T − 1.

[Note that the problem here is an optimization problem, not a learning problem; i.e., there is no training data and no notion of excess risk. The goal is to minimize an objective function f over C ⊆ ℝᵈ via this noisy version of the basic gradient descent algorithm.]

[Note that the bound to be proved is on the expectation of the optimization error. In your derivation, you need to take the expectation w.r.t. the randomness due to the Gaussian noise.]

[Figure (last page): the seven points x^(1), ..., x^(7) for Problem 1, plotted in the (x1, x2)-plane with a legend for the +1 and −1 labels; the coordinates shown in the figure are (1, 5), (4, −2), (−1, 1), (−1, 4), (−2, −5), (1, −1), and (−5, 3).]
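For Problem 4 above, the following is a minimal sketch of projected gradient descent with Gaussian-perturbed gradients. The objective f, the constraint set C (taken here to be the Euclidean ball of radius M), and all constants are illustrative choices rather than part of the assignment; the sketch only mirrors the update rule, the step size α = M/√((ρ² + dσ²)T), and the averaged output ŵ.

    import numpy as np

    rng = np.random.default_rng(0)

    d = 5                     # dimension
    M = 2.0                   # radius of C = {w : ||w|| <= M}
    sigma = 0.1               # noise std: v_t ~ N(0, sigma^2 I_d)
    T = 10_000                # number of iterates w_1, ..., w_T

    c = np.full(d, 3.0)       # f(w) = ||w - c|| is convex and 1-Lipschitz
    rho = 1.0

    def f(w):
        return np.linalg.norm(w - c)

    def grad_f(w):
        diff = w - c
        nrm = np.linalg.norm(diff)
        return diff / nrm if nrm > 0 else np.zeros_like(diff)   # subgradient at w = c

    def project_onto_C(w):
        # Euclidean projection onto the ball of radius M.
        nrm = np.linalg.norm(w)
        return w if nrm <= M else (M / nrm) * w

    alpha = M / np.sqrt((rho**2 + d * sigma**2) * T)   # step size from the problem

    w = np.zeros(d)            # w_1 = 0
    running_sum = w.copy()
    for t in range(1, T):      # updates produce w_2, ..., w_T
        v = rng.normal(0.0, sigma, size=d)             # v_t ~ N(0, sigma^2 I_d)
        w = project_onto_C(w - alpha * (grad_f(w) + v))
        running_sum += w

    w_bar = running_sum / T    # averaged output (1/T) * sum_{t=1}^T w_t
    bound = M * np.sqrt(rho**2 + d * sigma**2) / np.sqrt(T)
    f_star = np.linalg.norm(c) - M                     # min of f over the ball

    print("f(w_bar) - min f =", f(w_bar) - f_star)
    print("claimed bound    =", bound)

A single run gives only one realization of the noise, so the printed gap is not the expectation appearing in the bound; averaging the gap over repeated runs would estimate it.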

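For Problem 1, a candidate (ω, b) can be sanity-checked numerically by fitting a linear SVM with a very large C value, which approximates the hard-margin SVM, and reading off the hyperplane, the support vectors, and the dual coefficients needed in Part 2. The coordinates below are the ones listed in the figure summary, but their pairing with the point indices and labels is not recoverable here, so the label vector is a purely hypothetical placeholder; substitute the actual assignment from the figure.

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder data: the seven coordinates from the figure summary above,
    # paired with arbitrary labels for illustration only.
    X = np.array([[1, 5], [4, -2], [-1, 1], [-1, 4],
                  [-2, -5], [1, -1], [-5, 3]], dtype=float)
    y = np.array([+1, -1, -1, +1, -1, -1, +1])

    # A linear SVM with a very large C approximates the hard-margin SVM.
    clf = SVC(kernel="linear", C=1e8)
    clf.fit(X, y)

    w = clf.coef_[0]          # (w1, w2)
    b = clf.intercept_[0]
    print("w =", w, " b =", b)

    # Under the hard-margin normalization, the closest points satisfy
    # y_i (<w, x_i> + b) = 1, so the minimum functional margin should be ~1.
    margins = y * (X @ w + b)
    print("functional margins:", margins)
    print("min margin:", margins.min())

    # Support vectors and the products alpha_i * y_i (nonzero only for the
    # support vectors), which should satisfy sum_i alpha_i y_i = 0 and
    # w = sum_i alpha_i y_i x_i, as required in Part 2(b).
    print("support vector indices:", clf.support_)
    print("alpha_i * y_i:", clf.dual_coef_[0])
    print("reconstructed w:", clf.dual_coef_[0] @ clf.support_vectors_)

The dual coefficients reported by scikit-learn are the products αi·y^(i); they sum to zero and reproduce ω, which mirrors the conditions listed in Part 2(b).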