
CSE 142 Machine Learning
Date: 2020-11-12
Directions: This homework is to be done individually. Please upload a set of solutions containing your name and @ucsc.edu e-mail address to Canvas. Typeset (e.g. TeX) solutions are preferred, but scans or photographs of hand-written solutions are acceptable provided that they are neat and legible. The TA may deduct points for poorly organized or illegible solutions.

Question:       1    2    3    4    Total
Points:         20   30   20   30   100
Bonus Points:   0    0    0    0    0
Score:

Questions:

1. Logistic Regression. Logistic regression treats a binary classification (e.g. is/is-not a dog) probabilistically, where the probability for any example x to be assigned the class y = 1 is given in terms of the logistic sigmoid function g and a weight vector w that must be learned:

    g(w · x) = exp(w · x) / (1 + exp(w · x))

    p(1 | x, w) = g(w · x)
    p(0 | x, w) = 1 − g(w · x)

(a) (10 points) Let g(w · x) = q. From the definitions above, prove that w · x is a logit or "log-odds" function when 0 < q < 1:

    w · x = ln( q / (1 − q) )

(b) (10 points) Just as p(1 | x, w) = g(w · x) is the probability of a positive classification conditioned on a set of weights, it is also the likelihood of the weights w given a classification y = 1. Show that the gradient (in parameter space; derivatives should be taken with respect to the components of w) of this log-likelihood reduces to

    ∇ log g(w · x) = x (1 − g(w · x)),

which is a vector quantity parallel to x.
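Both identities are short algebraic derivations, but they are easy to spot-check numerically. Below is a minimal sketch, assuming NumPy; the particular w and x are arbitrary values chosen for the check, not part of the assignment.

```python
# Numerical spot-check of the two Question 1 identities (a sketch,
# not part of the assignment hand-in). Assumes NumPy.
import numpy as np

def g(z):
    """Logistic sigmoid: g(z) = exp(z) / (1 + exp(z))."""
    return np.exp(z) / (1.0 + np.exp(z))

w = np.array([0.5, -1.2, 2.0])   # arbitrary weights for the check
x = np.array([1.0, 0.3, -0.7])   # arbitrary example
z = w @ x

# (a) the logit recovers w . x from q = g(w . x)
q = g(z)
assert np.isclose(np.log(q / (1 - q)), z)

# (b) compare the analytic gradient x * (1 - g(w . x)) against a
# central finite-difference gradient of log g(w . x) taken in w.
analytic = x * (1 - g(z))
eps = 1e-6
numeric = np.array([
    (np.log(g((w + eps * e) @ x)) - np.log(g((w - eps * e) @ x))) / (2 * eps)
    for e in np.eye(len(w))
])
assert np.allclose(numeric, analytic, atol=1e-6)
print("both identities verified numerically")
```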
2. (30 points) Naive Bayes. Use Naive Bayes to estimate whether a student will be an honor student (H) or a normal student (N) in college based on their high school performance. Each instance has two features: the student's high school GPA (a real number) and whether or not the student took any AP courses (a Boolean value, yes=1, no=0). Based on the following training data, create (by hand or with computational tools) a Naive Bayes prediction rule using normal (Gaussian) distributions to estimate the conditional probability density of high school GPAs given honors status (H or N) (this assigns non-zero probability to negative or greater-than-four GPA values, but that is fine for our purposes) and a Bernoulli distribution for the AP feature.

    label  AP   GPA
    H      yes  4.0
    H      yes  3.7
    H      no   2.5
    N      no   3.8
    N      yes  3.3
    N      yes  3.0
    N      no   3.0
    N      no   2.7
    N      no   2.2

Recall that Naive Bayes makes the simplifying assumption that the features are conditionally independent given a class/label: p(gpa, ap | honors) = p(gpa | honors) p(ap | honors). Use maximum likelihood estimation (not the unbiased or Laplace estimates) for the distributions of the two features conditioned on the two classes. Give the mean and variance for each distribution over GPA values. For the variance here, you only need to calculate the biased sample variance estimator (divided by n), not the unbiased one (divided by n − 1).

Describe your prediction rule in the following form:

    If AP courses are taken, predict H if the GPA is in a ⊆ ℝ;
    if AP courses are not taken, predict H if the GPA is in b ⊆ ℝ,

where a and b must be found. Hint: it is probably easier to get this description if you take logarithms. Three digits of precision should be sufficient. Also, the logarithms of the Gaussian densities are quadratic, possibly yielding two distinct zeros that correspond to the boundaries of the intervals a or b.
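For checking hand calculations, here is one possible sketch, assuming NumPy, of the MLE fits and of the hint's root-finding step; the data layout and variable names are my own, not part of the assignment.

```python
# Sketch (not the required hand-in) of the Question 2 MLE fits and of
# locating the boundary GPAs as roots of the quadratic difference of
# the two log-posteriors, per the hint. Assumes NumPy.
import numpy as np

# (label, AP, GPA) rows copied from the table above.
rows = [("H", 1, 4.0), ("H", 1, 3.7), ("H", 0, 2.5),
        ("N", 0, 3.8), ("N", 1, 3.3), ("N", 1, 3.0),
        ("N", 0, 3.0), ("N", 0, 2.7), ("N", 0, 2.2)]

params = {}
for label in ("H", "N"):
    gpas = np.array([g for (l, _, g) in rows if l == label])
    aps = np.array([a for (l, a, _) in rows if l == label])
    params[label] = dict(
        prior=len(gpas) / len(rows),  # class prior p(label)
        mu=gpas.mean(),               # MLE mean of the Gaussian
        var=gpas.var(),               # biased MLE variance (divide by n)
        p_ap=aps.mean(),              # Bernoulli MLE for p(AP=1 | label)
    )
print(params)

def boundary_gpas(ap):
    """GPA values where the H and N log-posteriors are equal."""
    h, n = params["H"], params["N"]
    # log prior + log Bernoulli + log Gaussian, collected into a
    # quadratic a*g^2 + b*g + c in the GPA g; return its roots.
    a = -1 / (2 * h["var"]) + 1 / (2 * n["var"])
    b = h["mu"] / h["var"] - n["mu"] / n["var"]
    c = (-h["mu"]**2 / (2 * h["var"]) + n["mu"]**2 / (2 * n["var"])
         - 0.5 * np.log(h["var"]) + 0.5 * np.log(n["var"])
         + np.log(h["prior"]) - np.log(n["prior"])
         + (np.log(h["p_ap"]) - np.log(n["p_ap"]) if ap
            else np.log(1 - h["p_ap"]) - np.log(1 - n["p_ap"])))
    return np.roots([a, b, c])

print("AP = yes boundary GPAs:", boundary_gpas(1))
print("AP = no  boundary GPAs:", boundary_gpas(0))
```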

3. Nearest Neighbors. Assume that examples are drawn uniformly from the unit square. Independent of example features (i.e., location (x1, x2) in the unit square), labels are generated at random such that a proportion q, where 0.5 < q < 1, are assigned label y = 1 and the remainder are assigned label y = 0. The Bayes-optimal hypothesis minimizes the error rate of its predictions given (x1, x2) by always predicting y = 1, suffering an error rate of 1 − q. How should we expect the 3-Nearest-Neighbors algorithm to perform? Assume that the algorithm is trained on a large set of known labels. For each new sample, the algorithm finds the three closest (according to any metric!) points in its training set, finds the label shared by the majority of these three, and assigns this label to the new example.

(a) (10 points) What is the expected error rate of the 3-Nearest-Neighbor algorithm, in terms of q?

(b) (10 points) When is this better or worse than the Bayes-optimal solution (recall, 0.5 < q < 1)?
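A candidate answer to part (a) can be checked by simulation: because the labels are independent of the features, the three nearest neighbors' labels are just i.i.d. Bernoulli(q) draws, so the geometry drops out entirely. A minimal Monte Carlo sketch, assuming NumPy; compare its output against whatever closed-form expression you derive.

```python
# Monte Carlo checker for Question 3(a) (a sketch, not required by the
# assignment). Geometry is ignored: neighbor labels are i.i.d. Bernoulli(q).
import numpy as np

rng = np.random.default_rng(0)

def three_nn_error(q, trials=1_000_000):
    """Estimate the 3-NN error rate when labels are Bernoulli(q)."""
    neighbors = rng.random((trials, 3)) < q   # labels of the 3 nearest points
    majority = neighbors.sum(axis=1) >= 2     # majority vote of the 3 labels
    truth = rng.random(trials) < q            # the query point's own label
    return float(np.mean(majority != truth))

for q in (0.6, 0.75, 0.9):
    print(f"q = {q}: simulated 3-NN error ≈ {three_nn_error(q):.4f}")
```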

4. Decision Tree. Sammy the Slug owns a car dealership that sells two types of cars: Honda (H) and BMW (B). He collected the following data about his customers, recording their gender, annual income, and the type of car they purchased:

    Gender  AI (Annual Income in thousands)  Preference
    F       100                              B
    M       150                              B
    F       80                               B
    M       75                               B
    F       90                               H
    F       85                               H
    M       85                               H
    F       70                               H

This question is about building decision trees to predict the car type that a new customer will prefer. Note that one of the attributes, Annual Income (AI), is a continuous variable. Let's assume that we will only allow binary splits for this attribute, of the form AI < a and AI ≥ a, where a lies in the dataset. However, there can be multiple such splits in one path from root to leaf. For all your calculations, use log base 2.

(a) (3 points) For this part of the question, let's assume that for some unknown reason, Sammy insists on keeping "Annual Income" at the root node. How many possible values of a does he need to consider? What are they?

(b) (3 points) What is the entropy of the labels (car type) in the training dataset?

(c) (9 points) What is the optimal root node for this dataset? Show your calculations.

(d) (15 points) Draw the decision tree that would be learned by ID3 on this dataset. Only binary splits are allowed, i.e. a node can have only two children. Your tree should contain all the information necessary to read it. At each node, indicate:
1. the attribute you are splitting on (when splitting on AI, also include the a used);
2. the label distribution of the instances at that node before the split (e.g. if there are 3 instances at a node and 2 belong to class H and 1 belongs to class B, then label the node as ⟨2, 1⟩);
3. additionally, for each non-leaf node, the gain attained by the corresponding split, and label each leaf node with its class label (H or B);
4. all edges between a parent and a child should be labeled with the value of the attribute;
5. it is okay to draw the tree by hand and include a clear picture in your PDF;
6. don't forget to include your calculations (6 points here).
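The entropy and gain arithmetic in parts (b)-(d) is repetitive, so a small helper can double-check hand calculations. A minimal sketch in plain Python (the row layout and function names are my own), using log base 2 as the question specifies:

```python
# Entropy / information-gain helper for Question 4 (a sketch for
# checking hand arithmetic, not the required solution).
from math import log2
from collections import Counter

# (Gender, AI in thousands, Preference) rows from the table above.
rows = [("F", 100, "B"), ("M", 150, "B"), ("F", 80, "B"), ("M", 75, "B"),
        ("F", 90, "H"), ("F", 85, "H"), ("M", 85, "H"), ("F", 70, "H")]

def entropy(labels):
    """H(Y) = -sum_c p(c) * log2 p(c) over the class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain(rows, test):
    """Information gain of a binary test: H(Y) minus the
    size-weighted entropies of the two child nodes."""
    labels = [r[2] for r in rows]
    left = [r[2] for r in rows if test(r)]
    right = [r[2] for r in rows if not test(r)]
    n = len(rows)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

print("H(labels) =", entropy([r[2] for r in rows]))
print("gain(Gender = F) =", gain(rows, lambda r: r[0] == "F"))
for a in sorted({r[1] for r in rows}):     # candidate thresholds from the data
    print(f"gain(AI < {a}) =", gain(rows, lambda r, a=a: r[1] < a))
```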