Python Coursework Writing Service - COMP527 - Assignment 2

Date: 2021-07-08

COMP527 - JAN21 - CA Assignment 2

Data Clustering

Implementing the k-means and k-medians clustering algorithms

Assessment Information

Assignment Number: 2 (of 2)

Weighting: 15%

Assignment Circulated: 08th June 2021

Deadline: 28th July 2021, 17:00 UK Time (UTC)

Submission Mode: Electronic via Canvas

Learning outcome assessed: (1) A critical awareness of current problems and research issues in data mining.

Purpose of assessment: This assignment assesses the understanding of the k-means clustering algorithm by implementing k-means for text clustering.

Marking criteria: Marks for each question are indicated under the corresponding question.

Late Submission Penalty: Standard UoL Policy applies.

1 Submission Instructions

Submit the following three files via Canvas (please do NOT zip the files into an archive):

1. the source code for all your programs (do not provide ipython/jupyter/colab notebooks; instead, submit standalone code in a single .py file),

2. a README file (plain text) describing how to compile/run your code to produce the various results required by the assignment, and

3. a PDF file providing the answers to the questions.

It is extremely important that you provide all the files described above and not just the source code!


2 Objectives

This assignment requires you to implement the k-means and k-medians clustering algorithms using the Python programming language.

No credit will be given for implementing any other types of clustering algorithms or for using an existing library for clustering instead of implementing it yourself. However, you are allowed to use the numpy library for data structures such as numpy arrays, although using numpy is not a requirement of the assignment. You can use matplotlib for plotting, but it is not compulsory. You must provide a README file describing how to run your code to reproduce your results. Programs that do not run will receive a mark of zero!

3 Assignment Description

In this assignment, you are required to cluster words belonging to four categories: animals, countries, fruits and veggies. The words are arranged into four different files that you will find in the archive CA2data.zip. The first entry in each line is a word, followed by 300 features (a word embedding) describing the meaning of that word.
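The file layout described above could be read along the following lines. This is only a sketch under assumptions the brief does not confirm: the four files are assumed to be named after the categories (animals, countries, fruits, veggies) with no extension, and the entries on each line are assumed to be separated by whitespace. The names CATEGORY_FILES, parse_line and load_dataset are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: one file per category, named after the category.
# Adjust these names to match the actual contents of CA2data.zip.
CATEGORY_FILES = ["animals", "countries", "fruits", "veggies"]


def parse_line(line):
    """Split one line into (word, feature vector as a list of floats).

    ASSUMPTION: entries are whitespace-separated, word first.
    """
    parts = line.split()
    return parts[0], [float(value) for value in parts[1:]]


def load_dataset(data_dir, category_files=CATEGORY_FILES):
    """Return (words, vectors, labels); labels[i] is the category of words[i]."""
    words, vectors, labels = [], [], []
    for category in category_files:
        path = Path(data_dir) / category
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines
                word, vector = parse_line(line)
                words.append(word)
                vectors.append(vector)
                labels.append(category)  # the file name serves as the true label
    return words, vectors, labels
```

The category of each word is kept only as a ground-truth label for evaluation; it should not be given to the clustering algorithm as a feature.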

Questions

(1) (25 marks) Implement the k-means clustering algorithm to cluster the instances into k clusters.
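For orientation, a minimal k-means sketch in pure Python is given here. It is not a model answer. It assumes that the initial centroids are k objects drawn at random from the data, that distance is squared Euclidean, that iteration stops when assignments no longer change, and that an empty cluster keeps its previous centroid. None of these choices is prescribed by the brief, and all function names are hypothetical.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


def k_means(data, k, max_iterations=100, seed=0):
    """Cluster data (a list of float lists) into k clusters.

    Returns (assignments, centroids), where assignments[i] is the
    cluster index of data[i].
    """
    rng = random.Random(seed)
    # ASSUMPTION: initialise centroids with k distinct objects from the data.
    centroids = [list(point) for point in rng.sample(data, k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each object goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_euclidean(point, centroids[c]))
            for point in data
        ]
        if new_assignments == assignments:
            break  # converged: no object changed cluster
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:  # ASSUMPTION: an empty cluster keeps its old centroid
                centroids[c] = mean_vector(members)
    return assignments, centroids
```

Fixing the seed makes runs repeatable, which helps when the README must explain how to reproduce the reported results.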

(2) (25 marks) Implement the k-medians clustering algorithm to cluster the instances into k clusters.
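A corresponding k-medians sketch follows. It assumes one common formulation: objects are assigned by Manhattan (L1) distance and each cluster representative is the component-wise median of its members. The brief does not fix these details, so check them against the course material. Initialisation, stopping rule and empty-cluster handling are likewise assumptions, and the function names are hypothetical.

```python
import random
from statistics import median


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def median_vector(points):
    """Component-wise median of a non-empty list of vectors."""
    return [median(column) for column in zip(*points)]


def k_medians(data, k, max_iterations=100, seed=0):
    """Cluster data (a list of float lists) into k clusters.

    Returns (assignments, medians), where assignments[i] is the
    cluster index of data[i].
    """
    rng = random.Random(seed)
    # ASSUMPTION: initialise representatives with k distinct objects.
    medians = [list(point) for point in rng.sample(data, k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: nearest representative under L1 distance.
        new_assignments = [
            min(range(k), key=lambda c: manhattan(point, medians[c]))
            for point in data
        ]
        if new_assignments == assignments:
            break  # converged: no object changed cluster
        assignments = new_assignments
        # Update step: component-wise median of each cluster's members.
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:  # ASSUMPTION: an empty cluster keeps its old median
                medians[c] = median_vector(members)
    return assignments, medians
```

The only differences from a k-means loop are the distance function and the update rule; the median makes the representatives less sensitive to outlying feature values.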

(3) (10 marks) Run the k-means clustering algorithm you implemented in part (1) to cluster the given instances. Vary the value of k from 1 to 9 and compute the B-CUBED precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and the B-CUBED precision, recall and F-score on the vertical axis in the same plot.
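The B-CUBED measures can be sketched as follows. For each object, precision is the fraction of objects in its cluster that share its true category, and recall is the fraction of objects of its category that fall in its cluster; both are then averaged over all objects. The F-score here is taken as the harmonic mean of the averaged precision and recall, which is an assumption: some course notes instead average a per-object F-score, so confirm which definition is expected. The function name b_cubed is hypothetical.

```python
from collections import Counter


def b_cubed(assignments, labels):
    """Return (precision, recall, f_score) for a clustering.

    assignments[i] is the cluster of object i; labels[i] is its true category.
    """
    cluster_sizes = Counter(assignments)            # objects per cluster
    label_sizes = Counter(labels)                   # objects per true category
    pair_counts = Counter(zip(assignments, labels)) # objects per (cluster, category)
    total = len(labels)

    precision = 0.0
    recall = 0.0
    for cluster, label in zip(assignments, labels):
        same = pair_counts[(cluster, label)]  # same cluster AND same category
        precision += same / cluster_sizes[cluster]
        recall += same / label_sizes[label]
    precision /= total
    recall /= total

    # ASSUMPTION: F-score is the harmonic mean of the averaged scores.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

With k = 1 every object lands in one cluster, so recall is 1 while precision falls to the level set by the category proportions; this gives a useful sanity check for the left edge of the plot.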

(4) (10 marks) Now re-run the k-means clustering algorithm you implemented in part (1), but normalise each object (vector) to unit ℓ2 length before clustering. Vary the value of k from 1 to 9 and compute the B-CUBED precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and the B-CUBED precision, recall and F-score on the vertical axis in the same plot.
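Normalising to unit ℓ2 length means dividing each vector by its Euclidean norm, so that the sum of its squared components equals 1. A minimal sketch is shown here; the function name l2_normalise is hypothetical, and leaving an all-zero vector unchanged is an assumption made only to avoid division by zero.

```python
import math


def l2_normalise(vector):
    """Return a copy of vector scaled to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # ASSUMPTION: an all-zero vector cannot be scaled, so return it as is.
        return list(vector)
    return [x / norm for x in vector]
```

For example, `l2_normalise([3.0, 4.0])` divides both components by 5. After normalisation, Euclidean distance between two objects depends only on the angle between them, which is often a better match for word embeddings.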

(5) (10 marks) Run the k-medians clustering algorithm you implemented in part (2) over the unnormalised objects. Vary the value of k from 1 to 9 and compute the B-CUBED precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and the B-CUBED precision, recall and F-score on the vertical axis in the same plot.

(6) (10 marks) Now re-run the k-medians clustering algorithm you implemented in part (2), but normalise each object (vector) to unit ℓ2 length before clustering. Vary the value of k from 1 to 9 and compute the B-CUBED precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and the B-CUBED precision, recall and F-score on the vertical axis in the same plot.
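The experiments in (3) to (6) share one pattern: run a clustering function for k = 1 to 9, score each result, and draw the three curves in one figure. The sketch below illustrates that pattern under stated assumptions. cluster_fn is a hypothetical callable taking (data, k) and returning one cluster index per object; score_fn is a hypothetical callable taking (assignments, labels) and returning (precision, recall, f_score). Plotting uses matplotlib, which the brief permits but does not require, and the import is kept inside plot_scores so the sweep itself runs without it.

```python
def sweep_k(cluster_fn, score_fn, data, labels, ks=range(1, 10)):
    """Cluster for each k and collect (k, precision, recall, f_score) rows."""
    results = []
    for k in ks:
        assignments = cluster_fn(data, k)
        precision, recall, f_score = score_fn(assignments, labels)
        results.append((k, precision, recall, f_score))
    return results


def plot_scores(results, title, output_path):
    """Plot precision, recall and F-score against k and save the figure."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    ks = [row[0] for row in results]
    fig, ax = plt.subplots()
    for column, name in ((1, "precision"), (2, "recall"), (3, "F-score")):
        ax.plot(ks, [row[column] for row in results], marker="o", label=name)
    ax.set_xlabel("k")
    ax.set_ylabel("B-CUBED score")
    ax.set_title(title)
    ax.legend()
    fig.savefig(output_path)
    plt.close(fig)
```

Passing the clustering and scoring functions as arguments lets the same two helpers serve all four experiments; only the data (raw or normalised) and the clustering function change.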

(7) (10 marks) Comparing the different clusterings you obtained in (3)-(6), discuss in which setting you obtained the best clustering for this dataset.


