Date: 2021-07-08
COMP527 - JAN21 - CA Assignment 2
Data Clustering
Implementing the k-means and k-medians clustering algorithms
Assessment Information
Assignment Number: 2 (of 2)
Weighting: 15%
Assignment Circulated: 08th June 2021
Deadline: 28th July 2021, 17:00 UK Time (UTC)
Submission Mode: Electronic via Canvas
Learning outcome assessed: (1) A critical awareness of current problems and research issues in data mining.
Purpose of assessment: This assignment assesses understanding of the k-means clustering algorithm through implementing k-means for text clustering.
Marking criteria: Marks for each question are indicated under the corresponding question.
Late Submission Penalty: Standard UoL Policy applies.
1 Submission Instructions
Submit via Canvas the following three files (please do NOT zip files into an archive):
1. the source code for all your programs (do not provide ipython/jupyter/colab notebooks,
instead submit standalone code in a single .py file),
2. a README file (plain text) describing how to compile/run your code to produce the various
results required by the assignment, and
3. a PDF file providing the answer to the questions.
It is extremely important that you provide all the files described above and not just the source
code!
2 Objectives
This assignment requires you to implement the k-means and k-medians clustering algorithms using
the Python programming language.
No credit will be given for implementing any other type of clustering algorithm
or for using an existing library for clustering instead of implementing it
yourself. However, you are allowed to use the numpy library for data structures
such as numpy arrays, although using numpy is not a requirement of the
assignment. You can use matplotlib for plotting, but this is not compulsory.
You must provide a README file describing how to run your code to reproduce
your results. Programs that do not run will result in a mark of zero!
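As an illustration of what implementing the algorithms with numpy arrays only might look like, the sketch below covers both variants in one routine. The function name `cluster`, its parameters, the random initialisation from k distinct instances, the choice of squared Euclidean distance for k-means and Manhattan distance for k-medians, and the stopping rule are all assumptions made for this example; none of them is prescribed by the assignment.

```python
import numpy as np


def cluster(X, k, use_median=False, max_iter=100, seed=0):
    """Hypothetical sketch of k-means (default) or k-medians.

    X is an (n, d) array of instances. Returns (labels, centres).
    """
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centres with k distinct instances picked at random.
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)

    for _ in range(max_iter):
        # Pairwise differences between instances and centres, shape (n, k, d).
        diff = X[:, None, :] - centres[None, :, :]
        if use_median:
            dist = np.abs(diff).sum(axis=2)   # Manhattan (L1) distance
        else:
            dist = (diff ** 2).sum(axis=2)    # squared Euclidean distance
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                             # assignments no longer change
        labels = new_labels

        # Recompute each centre from the instances currently assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue                      # empty cluster: keep the old centre
            if use_median:
                centres[j] = np.median(members, axis=0)
            else:
                centres[j] = members.mean(axis=0)
    return labels, centres
```

The only differences between the two variants in this sketch are the distance used in the assignment step and the statistic (mean or component-wise median) used in the update step. Results depend on the random initialisation, so fixing `seed` helps make runs reproducible.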
3 Assignment Description
In the assignment, you are required to cluster words belonging to four categories: animals, countries,
fruits and veggies. The words are arranged into four different files that you will find in the archive
CA2data.zip. The first entry in each line is a word followed by 300 features (word embedding)
describing the meaning of that word.
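The file layout described above can be read with a few lines of Python. The following is a minimal sketch under stated assumptions: entries on a line are separated by whitespace, words themselves contain no whitespace, and each category sits in its own file. The helper names `parse_embeddings` and `load_dataset` are hypothetical, and the actual file names inside CA2data.zip are not given here, so they must be supplied by the caller.

```python
import numpy as np


def parse_embeddings(lines):
    """Parse lines of the form '<word> <f1> <f2> ... <fn>'.

    Assumes whitespace-separated entries and words without spaces.
    """
    words, vectors = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        words.append(parts[0])
        vectors.append([float(v) for v in parts[1:]])
    return words, np.array(vectors, dtype=float)


def load_dataset(paths):
    """Load one file per category; the file's position in `paths` is its gold label."""
    words, blocks, gold = [], [], []
    for label, path in enumerate(paths):
        with open(path, encoding="utf-8") as handle:
            file_words, X = parse_embeddings(handle)
        words.extend(file_words)
        blocks.append(X)
        gold.extend([label] * len(file_words))
    return words, np.vstack(blocks), np.array(gold)
```

Keeping the category of each word as a gold label is what later makes it possible to score a clustering against the four known categories.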
Questions
(1) (25 marks) Implement the k-means clustering algorithm to cluster the instances into k
clusters.
(2) (25 marks) Implement the k-medians clustering algorithm to cluster the instances into k
clusters.
(3) (10 marks) Run the k-means clustering algorithm you implemented in part (1) to cluster the
given instances. Vary the value of k from 1 to 9 and compute the B-CUBED precision, recall,
and F-score for each set of clusters. Plot k in the horizontal axis and the B-CUBED precision,
recall and F-score in the vertical axis in the same plot.
(4) (10 marks) Now re-run the k-means clustering algorithm you implemented in part (1) but
normalise each object (vector) to unit ℓ2 length before clustering. Vary the value of k from 1
to 9 and compute the B-CUBED precision, recall, and F-score for each set of clusters. Plot k
in the horizontal axis and the B-CUBED precision, recall and F-score in the vertical axis in
the same plot.
(5) (10 marks) Run the k-medians clustering algorithm you implemented in part (2) over the
unnormalised objects. Vary the value of k from 1 to 9 and compute the B-CUBED precision,
recall, and F-score for each set of clusters. Plot k in the horizontal axis and the B-CUBED
precision, recall and F-score in the vertical axis in the same plot.
(6) (10 marks) Now re-run the k-medians clustering algorithm you implemented in part (2) but
normalise each object (vector) to unit ℓ2 length before clustering. Vary the value of k from 1
to 9 and compute the B-CUBED precision, recall, and F-score for each set of clusters. Plot k
in the horizontal axis and the B-CUBED precision, recall and F-score in the vertical axis in
the same plot.
(7) (10 marks) Comparing the different clusterings you obtained in (3)-(6), discuss in which
setting you obtained the best clustering for this dataset.
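Questions (3) to (6) rely on two supporting computations: scaling each vector to unit ℓ2 length and scoring a clustering with B-CUBED. The sketch below shows one way to do both, assuming cluster labels and gold category labels are given as equal-length sequences. The names `normalise_rows` and `bcubed` are hypothetical. Note also that B-CUBED F-score has more than one convention; this example computes a per-item F-score and then averages, which is an assumption and should be checked against the definition used in the module.

```python
import numpy as np


def normalise_rows(X):
    """Scale each row of X to unit L2 length (all-zero rows are left unchanged)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero
    return X / norms


def bcubed(labels, gold):
    """Return average B-CUBED (precision, recall, F-score) over all items."""
    labels = np.asarray(labels)
    gold = np.asarray(gold)
    # Boolean (n, n) matrices: do items i and j share a cluster / a gold class?
    same_cluster = labels[:, None] == labels[None, :]
    same_class = gold[:, None] == gold[None, :]
    # Items sharing both cluster and class with item i (the item itself included).
    correct = (same_cluster & same_class).sum(axis=1)
    precision = correct / same_cluster.sum(axis=1)
    recall = correct / same_class.sum(axis=1)
    # Assumption: F-score is computed per item and then averaged.
    f_score = 2 * precision * recall / (precision + recall)
    return precision.mean(), recall.mean(), f_score.mean()
```

For example, putting four items from two equally sized classes into a single cluster gives precision 0.5 and recall 1.0 for every item, which matches the intuition that k = 1 maximises recall at the expense of precision.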