Nearest neighbor classification
CSE 250B
The problem we’ll solve today
Given an image of a handwritten digit, say which digit it is.
[Image: a handwritten digit, whose label is 3.]
Some more examples: [image grid of handwritten digits]
The machine learning approach
Assemble a data set:
The MNIST data set of handwritten digits:
• Training set of 60,000 images and their labels.
• Test set of 10,000 images and their labels.
And let the machine figure out the underlying patterns.
Nearest neighbor classification
Training images x(1), x(2), x(3), . . . , x(60000)
Labels y(1), y(2), y(3), . . . , y(60000) are numbers in the range 0–9
How to classify a new image x?
• Find its nearest neighbor amongst the x(i)
• Return y(i)
The data space
How to measure the distance between images?
MNIST images:
• Size 28 × 28 (total: 784 pixels)
• Each pixel is grayscale: 0–255
Stretch each image into a vector with 784 coordinates:
• Data space X = R^784
• Label space Y = {0, 1, . . . , 9}
The distance function
Remember Euclidean distance in two dimensions?
x = (1, 2)
z = (3, 5)
Euclidean distance in higher dimension
Euclidean distance between 784-dimensional vectors x, z is
‖x − z‖ = √( Σ_{i=1}^{784} (x_i − z_i)² )
Here x_i is the ith coordinate of x.
Nearest neighbor classification
Training images x(1), . . . , x(60000), labels y(1), . . . , y(60000)
To classify a new image x:
• Find its nearest neighbor amongst the x(i) using Euclidean distance in R^784
• Return y(i)
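A minimal sketch of this rule in NumPy (an illustration, not from the slides; it assumes the images have already been flattened into arrays, and the variable names are hypothetical):

```python
import numpy as np

def nearest_neighbor_classify(train_x, train_y, query):
    """Return the label of the training point closest to `query` in Euclidean distance.

    train_x: (n, 784) array of flattened training images
    train_y: (n,) array of labels in {0, ..., 9}
    query:   (784,) array, one flattened test image
    """
    # Squared Euclidean distance to every training point; the square root
    # is monotone, so omitting it does not change which point is nearest.
    dists = np.sum((train_x - query) ** 2, axis=1)
    return train_y[np.argmin(dists)]

# Hypothetical usage with random data standing in for MNIST:
rng = np.random.default_rng(0)
train_x = rng.integers(0, 256, size=(1000, 784)).astype(float)
train_y = rng.integers(0, 10, size=1000)
query = rng.integers(0, 256, size=784).astype(float)
print(nearest_neighbor_classify(train_x, train_y, query))
```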
How accurate is this classifier?
Accuracy of nearest neighbor on MNIST
Training set of 60,000 points.
• What is the error rate on training points? Zero.
  In general, training error is an overly optimistic predictor of future performance.
• A better gauge: separate test set of 10,000 points.
  Test error = fraction of test points incorrectly classified.
• What test error would we expect for a random classifier?
  (One that picks a label 0–9 at random?) 90%.
• Test error of nearest neighbor: 3.09%.
Examples of errors
Test set of 10,000 points:
• 309 are misclassified
• Error rate 3.09%
Examples of errors:
[Image grid: misclassified test digits ("Query") shown next to their nearest neighbors ("NN").]
Ideas for improvement: (1) k-NN (2) better distance function.
K-nearest neighbor classification
Classify a point using the labels of its k nearest neighbors among the
training points.
MNIST:
k               1      3      5      7      9      11
Test error (%)  3.09   2.94   3.13   3.10   3.43   3.34
In real life, there’s no test set. How to decide which k is best?
1 Hold-out set.
  • Let S be the training set.
  • Choose a subset V ⊂ S as a validation set.
  • What fraction of V is misclassified by finding the k-nearest neighbors in S \ V?
2 Leave-one-out cross-validation.
  • For each point x ∈ S, find the k-nearest neighbors in S \ {x}.
  • What fraction are misclassified?
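A rough sketch of both options, under the assumption that the training set is stored as NumPy arrays; the function names (`knn_predict`, `holdout_error`, `leave_one_out_error`) are made up for illustration, not part of any library:

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.sum((train_x - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def holdout_error(S_x, S_y, val_idx, k):
    """Fraction of the validation set V misclassified by k-NN trained on S \\ V."""
    mask = np.zeros(len(S_y), dtype=bool)
    mask[val_idx] = True                             # mark the points in V
    train_x, train_y = S_x[~mask], S_y[~mask]        # training set S \ V
    errors = sum(knn_predict(train_x, train_y, x, k) != y
                 for x, y in zip(S_x[mask], S_y[mask]))
    return errors / mask.sum()

def leave_one_out_error(S_x, S_y, k):
    """Fraction of points x in S misclassified by k-NN on S \\ {x}."""
    n = len(S_y)
    idx = np.arange(n)
    errors = sum(knn_predict(S_x[idx != i], S_y[idx != i], S_x[i], k) != S_y[i]
                 for i in range(n))
    return errors / n
```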
Cross-validation
How to estimate the error of k-NN for a particular k?
10-fold cross-validation
• Divide the training set into 10 equal pieces.
  Training set (call it S): 60,000 points.
  Call the pieces S1, S2, . . . , S10: 6,000 points each.
• For each piece Si:
  • Classify each point in Si using k-NN with training set S \ Si
  • Let εi = fraction of Si that is incorrectly classified
• Take the average of these 10 numbers:
  estimated error with k-NN = (ε1 + · · · + ε10) / 10
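A minimal sketch of this procedure, assuming a k-NN predictor such as the `knn_predict` function sketched earlier is available; for brevity the folds are assigned round-robin rather than as contiguous blocks:

```python
import numpy as np

def cross_validation_error(S_x, S_y, k, knn_predict, n_folds=10):
    """Estimate the error of k-NN by n-fold cross-validation.

    Splits S into pieces S_1, ..., S_n, classifies each piece using the
    rest of S as training data, and averages the per-piece error rates.
    """
    n = len(S_y)
    fold_of = np.arange(n) % n_folds                 # which piece each point belongs to
    fold_errors = []
    for i in range(n_folds):
        test = fold_of == i
        train_x, train_y = S_x[~test], S_y[~test]    # training set S \ S_i
        errs = sum(knn_predict(train_x, train_y, x, k) != y
                   for x, y in zip(S_x[test], S_y[test]))
        fold_errors.append(errs / test.sum())        # eps_i
    return float(np.mean(fold_errors))               # (eps_1 + ... + eps_10) / 10
```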
Another improvement: better distance functions
The Euclidean (ℓ2) distance between these two images is very high!
Much better idea: distance measures that are invariant under:
• Small translations and rotations, e.g. tangent distance.
• A broader family of natural deformations, e.g. shape context.
Test error rates:
Distance         ℓ2     tangent distance   shape context
Test error (%)   3.09   1.10               0.63
Related problem: feature selection
Feature selection/reweighting is part of picking a distance function.
And, one noisy feature can wreak havoc with nearest neighbor!
[Figure: the same data set shown with and without a noisy feature, changing which point is the nearest neighbor.]
Algorithmic issue: speeding up NN search
Naive search takes time O(n) for training set of size n: slow!
There are data structures for speeding up nearest neighbor search, like:
1 Locality sensitive hashing
2 Ball trees
3 k-d trees
These are part of standard Python libraries for NN, and help a lot.
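For example, with scikit-learn (assuming the library is installed and the data are flattened arrays), the classifier can be told which search structure to use; the random data below is only a stand-in for MNIST:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data; in practice these would be the flattened MNIST arrays.
rng = np.random.default_rng(0)
train_x = rng.random((2000, 784))
train_y = rng.integers(0, 10, size=2000)
test_x = rng.random((100, 784))

# algorithm may be 'brute', 'kd_tree', 'ball_tree', or 'auto'.
clf = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree')
clf.fit(train_x, train_y)
predictions = clf.predict(test_x)
```

Note that in very high dimensions (such as R^784) these tree structures lose much of their advantage, which is the curse of dimension discussed below.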
Example: k-d trees for NN search
A hierarchical, rectilinear spatial partition.
For a data set S ⊂ R^d:
• Pick a coordinate 1 ≤ i ≤ d.
• Compute v = median({x_i : x ∈ S}).
• Split S into two halves:
  S_L = {x ∈ S : x_i < v}
  S_R = {x ∈ S : x_i ≥ v}
• Recurse on S_L, S_R.
Two types of search, given a query q ∈ R^d:
• Defeatist search: Route q to a leaf cell and return the NN in that cell. This might not be the true NN.
• Comprehensive search: Grow the search region to other cells that cannot be ruled out using the triangle inequality.
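A minimal from-scratch sketch of the construction and of defeatist search (comprehensive search is omitted); here the split coordinate simply cycles with depth, one common way to "pick a coordinate", and the class and function names are made up for illustration:

```python
import numpy as np

class KDNode:
    """One cell of the k-d tree: either a leaf holding points, or a split."""
    def __init__(self, points, leaf_size=10, depth=0):
        self.points = None
        self.left = self.right = None
        if len(points) <= leaf_size:
            self.points = points                      # small cell: store the points
            return
        self.coord = depth % points.shape[1]          # pick a coordinate (cycled with depth)
        self.threshold = np.median(points[:, self.coord])
        left_mask = points[:, self.coord] < self.threshold
        if left_mask.all() or not left_mask.any():    # degenerate split: stop here
            self.points = points
            return
        self.left = KDNode(points[left_mask], leaf_size, depth + 1)    # S_L: x_i < v
        self.right = KDNode(points[~left_mask], leaf_size, depth + 1)  # S_R: x_i >= v

def defeatist_nn(node, query):
    """Route the query to a single leaf cell and return the closest point in it.
    Fast, but the answer is not guaranteed to be the true nearest neighbor."""
    while node.points is None:                        # descend to a leaf
        node = node.left if query[node.coord] < node.threshold else node.right
    dists = np.sum((node.points - query) ** 2, axis=1)
    return node.points[np.argmin(dists)]

# Hypothetical usage on random 2-D data:
rng = np.random.default_rng(0)
data = rng.random((1000, 2))
tree = KDNode(data)
print(defeatist_nn(tree, np.array([0.5, 0.5])))
```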
The curse of dimension in NN search
Situation: n data points in R^d.
1 Storage is O(nd)
2 Time to compute a distance is O(d) for ℓp norms
3 Geometry
It is possible to have 2^{O(d)} points that are roughly equidistant from each other.
Current methods for fast exact NN search have time complexity proportional to 2^d and log n.
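A rough numerical illustration of the geometric point (the Gaussian data here is just an assumption for the demo, not from the slides): as the dimension grows, the smallest and largest pairwise distances among random points become nearly equal.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
for d in (2, 100, 10000):
    points = rng.standard_normal((50, d))             # 50 random points in R^d
    dists = [np.linalg.norm(a - b) for a, b in combinations(points, 2)]
    # A ratio close to 1 means all pairs are roughly equidistant.
    print(f"d = {d:5d}   min/max pairwise distance = {min(dists) / max(dists):.2f}")
```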
Postscript: useful families of distance functions
1 ℓp norms
2 Metric spaces
Measuring distance in R^m
Usual choice: Euclidean distance:
‖x − z‖_2 = √( Σ_{i=1}^{m} (x_i − z_i)² ).
For p ≥ 1, here is the ℓp distance:
‖x − z‖_p = ( Σ_{i=1}^{m} |x_i − z_i|^p )^{1/p}
• p = 2: Euclidean distance
• ℓ1 distance: ‖x − z‖_1 = Σ_{i=1}^{m} |x_i − z_i|
• ℓ∞ distance: ‖x − z‖_∞ = max_i |x_i − z_i|
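A quick sketch of these three distances in NumPy, on an arbitrary pair of vectors chosen purely for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([3.0, 5.0, 1.0])

l2   = np.sqrt(np.sum((x - z) ** 2))   # Euclidean (l2) distance
l1   = np.sum(np.abs(x - z))           # l1 distance
linf = np.max(np.abs(x - z))           # l-infinity distance

def lp_distance(x, z, p):
    """General lp distance for p >= 1."""
    return np.sum(np.abs(x - z) ** p) ** (1.0 / p)

print(l2, lp_distance(x, z, 2))        # identical values
print(l1, lp_distance(x, z, 1))
print(linf)                            # the limit of lp_distance(x, z, p) as p grows
```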
Example 1
Consider the all-ones vector (1, 1, . . . , 1) in R^d.
What are its ℓ2, ℓ1, and ℓ∞ lengths?
Example 2
In R², draw all points with:
1 ℓ2 length 1
2 ℓ1 length 1
3 ℓ∞ length 1
Metric spaces
Let X be the space in which data lie.
A distance function d : X × X → R is a metric if it satisfies these
properties:
• d(x, y) ≥ 0 (nonnegativity)
• d(x, y) = 0 if and only if x = y
• d(x, y) = d(y, x) (symmetry)
• d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
Example 1
X = R^m and d(x, y) = ‖x − y‖_p
Check:
• d(x, y) ≥ 0 (nonnegativity)
• d(x, y) = 0 if and only if x = y
• d(x, y) = d(y, x) (symmetry)
• d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
Example 2
X = {strings over some alphabet} and d = edit distance
Check:
• d(x, y) ≥ 0 (nonnegativity)
• d(x, y) = 0 if and only if x = y
• d(x, y) = d(y, x) (symmetry)
• d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
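For concreteness (an illustration, not from the slides), a standard dynamic-programming implementation of edit distance; the properties above can be spot-checked against it:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a[i-1]
                           dp[i][j - 1] + 1,         # insert b[j-1]
                           dp[i - 1][j - 1] + cost)  # substitute (or match)
    return dp[m][n]

# Symmetry can be spot-checked:
print(edit_distance("kitten", "sitting"))   # 3
print(edit_distance("sitting", "kitten"))   # 3
```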

A non-metric distance function
Let p, q be probability distributions on some set X.
The Kullback-Leibler divergence or relative entropy between p, q is:
d(p, q) = Σ_{x∈X} p(x) log( p(x) / q(x) ).
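A small sketch showing why this fails to be a metric (it is not even symmetric), assuming distributions are given as arrays of probabilities:

```python
import numpy as np

def kl_divergence(p, q):
    """d(p, q) = sum_x p(x) * log(p(x) / q(x)).
    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.25, 0.25]
print(kl_divergence(p, q))   # differs from the value below, so symmetry fails
print(kl_divergence(q, p))
```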