CSE 5523: Machine Learning

Date: 2020-12-07

Instructions (Important)
• Total points = 55. Full mark = 50.
• Read each problem statement carefully. Make sure that you answer exactly what is required and that your solution is expressed clearly.
• Some notation introduced in class, e.g., the notation we used for true risk, empirical risk, the output of a learning algorithm given a training set, parameter vectors, etc., will be used here directly without being defined again. Please refer to the notes if you have any confusion about the notation. Also, here ‖·‖ will be used to denote the Euclidean (i.e., L2) norm.
• You may use any result covered in class, but please cite the result that you use.
Problem 1: Linear Hard-Margin SVMs (15 points)
In this problem, refer to the figure attached on the last page. In the attached figure, consider the constellation of feature vectors x^(i) = (x1^(i), x2^(i)) ∈ R^2 and their assigned labels y^(i), i = 1, ..., 7 (see the color/label code in the top-right corner of the figure):
1. Find the margin-maximizing linear classifier; that is, find (w, b) = ((w1, w2), b) describing the equation of the plane w1 x1 + w2 x2 + b = 0 of the largest margin with respect to the given constellation. Express (w, b) after performing the proper normalization such that min_{i ∈ {1,...,7}} y^(i) (⟨w, x^(i)⟩ + b) = 1.
[Note: You do not need to solve the optimization problem for the SVM or prove formally that your plane is margin-maximizing. It is only required to find the parameters of the margin-maximizing classifier (w, b) as instructed above. You may want to provide an intuitive explanation of why you think your solution is optimal, for which you may get partial credit in case your solution is incorrect.]
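A quick numerical way to sanity-check a candidate answer is to verify the margins directly. The sketch below uses a hypothetical point-to-label assignment and a hypothetical candidate (w, b); the actual labels, and therefore the actual answer, are determined by the attached figure.

```python
import numpy as np

# The seven feature vectors; the labels here are ASSUMED for illustration
# (the real assignment is given by the color code in the figure).
X = np.array([[1, 5], [4, -2], [-1, 1], [-1, 4],
              [-2, -5], [1, -1], [-5, 3]], dtype=float)
y = np.array([+1, -1, +1, +1, -1, -1, +1], dtype=float)

def check_candidate(w, b):
    """Verify a candidate plane w.x + b = 0:
    separates: every functional margin y_i (<w, x_i> + b) is positive;
    canonical: the smallest functional margin equals 1;
    also returns the geometric margin 1 / ||w||."""
    margins = y * (X @ w + b)
    separates = bool(np.all(margins > 0))
    canonical = bool(np.isclose(margins.min(), 1.0))
    return separates, canonical, 1.0 / np.linalg.norm(w)

# Hypothetical candidate (NOT the answer to the problem): the plane x2 = x1,
# scaled so the closest points attain functional margin exactly 1.
separates, canonical, geo_margin = check_candidate(np.array([-0.5, 0.5]), 0.0)
```

Under this assumed labeling the candidate separates the points and is canonically normalized; for the exam, the same check applies to whichever (w, b) you read off the figure.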
2. Given your solution in Part 1:
(a) Identify the set Q ⊆ {x^(1), ..., x^(7)} of points that are closest to your plane.
(b) Find a set of scalars {α_i : i ∈ Q} that satisfy all of the following (showing all details of your derivation):

    ∀ i ∈ Q, α_i ≥ 0,
    Σ_{i ∈ Q} y^(i) α_i = 0,
    w = Σ_{i ∈ Q} α_i y^(i) x^(i).

(c) What are the support vectors of your classifier?

Problem 2: Solving a Kernel-Based SVM with SGD (18 points)
This problem illustrates a simple example of solving a kernel-based SVM with a Gaussian kernel via SGD. Consider a training set of 3 data points: (x^(i), y^(i)) ∈ R^2 × {−1, +1}, i = 1, 2, 3, where

    x^(1) = (0, 0), y^(1) = +1
    x^(2) = (2, 1), y^(2) = −1
    x^(3) = (2, 1), y^(3) = −1

Let K : R^2 × R^2 → R be a Gaussian kernel defined as K(x, x′) = exp(−(1/8) ‖x − x′‖^2) for any x, x′ ∈ R^2.
Suppose we want to solve the kernel-based SVM as described in class with respect to this training set:
1. Obtain the kernel matrix with respect to the given training set. Hence, write down the optimization problem corresponding to solving the soft-margin kernel-based SVM, i.e., write down an expression for the objective function (the regularized loss) that needs to be minimized over the variable coefficients α = (α_1, α_2, α_3) as discussed in class.
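As a numerical cross-check for Part 1, the 3×3 kernel matrix follows directly from the definition (using the training points as they appear in the statement; note that x^(2) and x^(3) coincide, so their kernel entry is 1):

```python
import numpy as np

# Training points from the problem statement
X = np.array([[0.0, 0.0], [2.0, 1.0], [2.0, 1.0]])

def gaussian_kernel(a, b):
    # K(x, x') = exp(-(1/8) * ||x - x'||^2)
    return np.exp(-np.sum((a - b) ** 2) / 8.0)

# Kernel (Gram) matrix: G[i, j] = K(x^(i+1), x^(j+1))
G = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])
```

For example, ‖x^(1) − x^(2)‖^2 = 4 + 1 = 5, so G[0, 1] = exp(−5/8).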
2. Set the regularization parameter λ in the objective function of Part 1 to λ = 1. Now, suppose we want to solve this optimization problem via SGD as discussed in class. For simplicity, consider SGD with a total number of updates T = 6, i.e., 5 update steps not including the initialization, which is assumed to be at α^(1) = 0. Showing all intermediate steps, perform those updates and obtain the final coefficient vector α̂ (i.e., show all updates α^(t), t = 1, ..., 6, as well as the final output α̂). Note that in the actual execution, in each step, SGD samples one data point uniformly at random from the training set. For the sake of this problem, assume that over the course of the 5 update steps after initialization, SGD samples the following data points in the following order:

    at t = 1, (x^(1), y^(1)) is sampled
    at t = 2, (x^(2), y^(2)) is sampled
    at t = 3, (x^(3), y^(3)) is sampled
    at t = 4, (x^(2), y^(2)) is sampled
    at t = 5, (x^(3), y^(3)) is sampled
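The exact update rule depends on the formulation used in class; the sketch below follows one common kernelized-SGD scheme for the regularized hinge loss (a textbook-style variant, shown only to illustrate the mechanics; the labels for points 2 and 3 are assumed to be −1, and α̂ is taken as the average of the iterates):

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 1.0], [2.0, 1.0]])   # training points
y = np.array([+1.0, -1.0, -1.0])                      # assumed labels
# Kernel (Gram) matrix via broadcasting: G[i, j] = exp(-||x_i - x_j||^2 / 8)
G = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 8.0)

lam, T = 1.0, 6                 # lambda = 1, six iterates in total
order = [0, 1, 2, 1, 2]         # the fixed sampling order from the problem

beta = np.zeros(3)              # unnormalized coefficients
alphas = []                     # alpha^(t), t = 1..6
for t in range(1, T + 1):
    alpha = beta / (lam * t)    # alpha^(t) = beta / (lambda t); alpha^(1) = 0
    alphas.append(alpha)
    if t <= len(order):         # 5 update steps after initialization
        i = order[t - 1]
        # hinge-loss subgradient step at the sampled point
        if y[i] * (G[i] @ alpha) < 1:
            beta[i] += y[i]

alpha_hat = np.mean(alphas, axis=0)   # averaged output
```

With these assumptions the averaged coefficients come out positive for the +1 point and negative for the two −1 points; your in-class update rule may differ, so treat this only as a mechanical cross-check of your hand computation.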
3. Given the coefficient vector α̂ obtained in Part 2,
(a) give an expression for the resulting kernel-based classifier,
(b) suppose you are given a new feature vector x = (0, 1) ∈ R^2. What label will your kernel classifier generate for x? Show your derivation.
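Whatever α̂ comes out of Part 2, the kernel classifier predicts sign(Σ_i α̂_i K(x^(i), x)). A minimal sketch with a placeholder coefficient vector (NOT the answer to Part 2):

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 1.0], [2.0, 1.0]])   # training points
alpha_hat = np.array([0.3, -0.2, -0.1])              # placeholder coefficients

def kernel(a, b):
    # K(x, x') = exp(-(1/8) * ||x - x'||^2)
    return np.exp(-np.sum((a - b) ** 2) / 8.0)

def predict(x_new):
    # kernel-based classifier: sign of the kernel expansion at x_new
    score = sum(a * kernel(xi, x_new) for a, xi in zip(alpha_hat, X))
    return +1 if score >= 0 else -1

label = predict(np.array([0.0, 1.0]))
```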
Problem 3: Stability (10 points)
We say that a learning algorithm A is an AERM (Asymptotic Empirical Risk Minimizer) for the class H with excess empirical risk ε(n) if for every distribution D it holds that

    E_{S∼D^n}[ L̂(A(S); S) − min_{h∈H} L̂(h; S) ] ≤ ε(n).

We say a learning algorithm A learns a class H with excess true risk ε(n) if for every distribution D it holds that

    E_{S∼D^n}[ L(A(S); D) ] − min_{h∈H} L(h; D) ≤ ε(n).

Prove that if a learning algorithm A is ε_1(n)-On-Average-Replace-One stable, and is an AERM with excess empirical risk ε_2(n) for the class H, then it learns H with excess true risk ε_1(n) + ε_2(n).
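A standard way to organize such a proof (assuming the course's definitions match the usual on-average-replace-one stability setup) is a three-term decomposition of the excess true risk:

```latex
\mathbb{E}_{S \sim \mathcal{D}^n}\bigl[L(\mathcal{A}(S);\mathcal{D})\bigr] - \min_{h \in \mathcal{H}} L(h;\mathcal{D})
= \underbrace{\mathbb{E}_{S}\bigl[L(\mathcal{A}(S);\mathcal{D}) - \widehat{L}(\mathcal{A}(S);S)\bigr]}_{\le\; \varepsilon_1(n) \text{ by stability}}
+ \underbrace{\mathbb{E}_{S}\Bigl[\widehat{L}(\mathcal{A}(S);S) - \min_{h \in \mathcal{H}} \widehat{L}(h;S)\Bigr]}_{\le\; \varepsilon_2(n) \text{ by AERM}}
+ \underbrace{\mathbb{E}_{S}\Bigl[\min_{h \in \mathcal{H}} \widehat{L}(h;S)\Bigr] - \min_{h \in \mathcal{H}} L(h;\mathcal{D})}_{\le\; 0}
```

The last term is ≤ 0 because E_S[min_{h∈H} L̂(h; S)] ≤ min_{h∈H} E_S[L̂(h; S)] = min_{h∈H} L(h; D).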
Problem 4: Noisy Gradient Descent (12 points)
Let M, ρ > 0. Let C ⊆ R^d be a closed convex set where max_{w∈C} ‖w‖ ≤ M. Let f : R^d → R be a convex, ρ-Lipschitz function. Consider a variant of the basic (non-stochastic) gradient descent algorithm discussed in class, where at each update step we get a noisy version of the gradient. Namely, at each iteration t = 1, ..., T − 1, the update step is given by:

    w_{t+1} = Π_C( w_t − η (∇f(w_t) + v_t) ),

where v_t ∼ N(0, σ^2 I_d), t = 1, ..., T − 1, are independent Gaussian random variables (I_d is the identity matrix of size d), and Π_C is the Euclidean projection onto C discussed in class. You may assume the standard initialization: w_1 = 0.
Show that if η = M / √((ρ^2 + dσ^2) T), then the output of the algorithm ŵ = (1/T) Σ_{t=1}^{T} w_t after T iterations satisfies

    E[f(ŵ)] − min_{w∈C} f(w) ≤ M √(ρ^2 + dσ^2) / √T,

where the expectation is taken w.r.t. the Gaussian noise variables v_t, t = 1, ..., T − 1.
[Note that the problem here is an optimization problem, not a learning problem, i.e., there is no training data or notion of excess risk. The goal is to minimize an objective function f over C ⊆ R^d via this noisy version of the basic gradient descent algorithm.]
[Note that the bound to be proved is on the expectation of the optimization error. In your derivation, you need to take the expectation w.r.t. the randomness due to the Gaussian noise.]

[Figure (referenced in Problem 1): a constellation of seven points x^(1), ..., x^(7) in R^2 at coordinates (1, 5), (4, −2), (−1, 1), (−1, 4), (−2, −5), (1, −1), (−5, 3), each labeled +1 or −1 according to the color/label code in the top-right corner of the figure.]
