一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写

Python代寫|算法代寫 - CSC 371 – SPRING 2017
時間:2017-11-28
DATA MANAGEMENT AND VISUALIZATION ASSIGNMENT 4 UNIVERSITY OF VICTORIA Due: Thursday, March 30th, 2017 in class. Late assignments will not be accepted. Submit your answers to the questions below on paper by the deadline above. Addition- ally, you must upload a copy of all code you use (SQL, Python, R, Excel, LibreOffice, etc.) to conneX by 11:59pm on March 30th. All code submissions must be your own work. However, you are permitted to use the code discussed in lectures (or code posted on conneX) as the basis for your submissions if proper attribution is given. Question 1: BC Ferries Data Visualizations [7 marks] The questions below use the a3_28days.sql dataset from assignment 3. For all of the visualizations in this section, you may use any software you want (Python, R, Excel, LibreOffice, etc.), but you must hand in all of the source files and intermediate data you use. Additionally, you will lose marks if your visualizations are not well-formatted, with axis labels, titles, a legend, good choices for axis ranges and histogram bins (where applicable). (a) The query below generates the duration of every sailing on route 1. select (strftime(‘%s’,arrival)-strftime(‘%s’,scheduled_departure))/60 as duration_minutes from sailings where route_number = 1; Create a histogram of the sailing times for route 1. Choose a set of bins which makes the distribution clear. (b) Create a histogram of the sailing times for route 30 in the same format as the histogram above. (c) The query below computes the average sailing time per day in February for each route. select route_number, strftime(‘%d’,scheduled_departure) as day, avg((strftime(‘%s’,arrival)-strftime(‘%s’,scheduled_departure))/60) as avg_duration_minutes from sailings where strftime(‘%m’,scheduled_departure) = ’02’ group by route_number, day order by route_number, day; Construct a line plot with the average sailing time per day for each of routes 1, 2 and 30 (on the same plot). You will receive a zero if more than one plot is used. To separate the data by route, you will need to either modify the query above (and make three variations) to produce separate tables for the three routes, or use np.where to filter the result of the query above. Question 2: Regular Expressions [6 marks] Design regular expressions to match the classes of text described below. You will lose marks if your expressions matches anything not within the described class or if it does not match anything 1 which is. You can use regular parentheses () to delineate groups if necessary (there is no need to indicate non-capturing groups). (a) Street addresses, which consist of a number, possibly followed by exactly one letter (as in ‘1234a Main St.’), followed by a street name, which must be one word long and capitalized. The street name is then followed by a capitalized extension, which must end in a period (e.g. ‘St.’, ‘Rd.’, ‘Ave.’, etc.). Any number of spaces (but at least one) may appear between the number and the street name or the street name and the extension. (b) Sequences of lowercase letters (not necessarily forming an English word) which contain the letter ‘A’ at most 3 times. (c) Sequences of lowercase letters (not necessarily forming an English word) with an even number of characters (0, 2, 4, 6, etc.) and where every second character is a vowel, starting with the first character. The other characters may any letter (including vowels). For example, sequences like ‘at’, ‘away’, ‘economic’, ‘avocados’ and ‘onomatopoeia’ should match, but sequences like ‘canoe’, ‘bore’, ‘omega’ and ‘avocado’ should not. Question 3: Technological Literary Analysis and Deconstruction [11 marks] Project Gutenberg (http://www.gutenberg.org) maintains a large database of public domain eBooks. Most of the books in their collection are in the public domain due to their age (since the copyright on literary works expires after a certain number of years since the death of the author). These books are useful for analysing the patterns found in English text. A sign-up sheet has been posted to conneX, containing a selection of works from the Project Gutenberg repository. For this question, you must choose one of the books in the list and reserve it on conneX. Only one student is allowed to choose each book, and you will receive zero marks on this question if you analyse a book other than the one you have reserved on the online system. Since the analysis does not require actually reading the book1, the length or genre of the book should not affect the difficulty of this question. Once you have chosen a book, follow the link from the signup sheet and download the UTF-8 text version. This will be your input data. You should not modify this file at all (leave all of the Project Gutenberg preamble in place). The purpose of this question is to analyse the length and quantity of English words used in your chosen book. An “english word” is defined to be any sequence of characters that meets either of the following criteria. ? A sequence of one or more letters, which may be uppercase or lowercase in any combination. ? A sequence of one or more letters followed by exactly one hyphen (the ‘-’ character), followed by a sequence of one or more letters. The letters may be uppercase or lowercase in any combination. You can use regular expressions to extract all of the words from the downloaded text file. If you print all of the extracted words to a text file (one per line), you can treat the resulting file as a CSV spreadsheet (with one column) and import it into SQLite (as a table with one column). You can then perform analysis on the lengths or frequencies of words and export the resulting data for visualization. 1. Although some of the books are certainly worth reading… 2 In your submission, answer the questions below for your chosen book (and be sure to clearly indicate what book you chose). (a) Generate a list of the 25 words that appear most frequently, along with their counts, ranked in descending order. This can be done with a simple aggregation-based SQL query. The result should be case-sensitive (so ‘The’ should be considered to be a different word than ‘the’). For Jane Austen’s Pride and Prejudice (which is not available as a choice for this question), the 25 most frequently occurring words are as follows. word occurrences ———- ———– the 4218 to 4163 of 3707 and 3493 her 2138 I 2070 a 1955 was 1844 in 1836 that 1540 not 1419 she 1384 it 1301 be 1255 you 1201 his 1190 had 1152 as 1134 he 1104 with 1066 for 1060 is 861 have 835 Mr 786 him 764 (b) Find the longest word in the book. In SQLite, you can use the length() function to get the length of a text field (for example, select word, length(word) from word_list; will list each word and its length). The length function can also be used for aggregation or WHERE clauses. For Jane Austen’s Pride and Prejudice, the longest word is ‘respectable-looking’ (c) Generate a table consisting of the number of words of each length. For Pride and Prejudice, the table is word_length words_with_that_length ———– ———————- 1 4809 2 23900 3 3 28961 4 22163 5 12013 6 9284 7 8670 8 5126 9 5163 10 2490 11 1428 12 917 13 432 14 123 15 41 16 17 17 8 19 1 (d) Plot a histogram of the number of words of each length (that is, a bar chart of the data from part (c). You may use any visualization software, and your result should look superficially similar to the chart below. 30000 25000 20000 15000 10000 5000 0 Words in Jane Austen’s “Pride and Prejudice” Length vs. Frequency 1 2 3 4 5 6 7 8 9 10111213141516171819 Word Length (# characters) Submission Instructions Your primary submission should be a hard copy document containing the answers to all of the above questions. In addition to the hard copy, you must submit all code (and intermediate data such as CSV files or spreadsheets) to conneX. The usual rules for electronic submissions apply. You are permitted to delete and resubmit your assignment as many times as you want before the due date, but no submissions or resubmissions will be accepted after the due date has passed. Ensure that each code file contains a comment with your name and student number. If you do not submit the code files needed for a part of the assignment, it will not be possible to mark that part 4 Number of Occurrences of your submission and you will receive a mark of zero. After submitting your assignment, conneX will automatically send you a confirmation email. If you do not receive such an email, your submission was not received. If you have problems with the submission process, send an email to the instructor before the due date.

在線客服

售前咨詢
售后咨詢
微信號
Essay_Cheery
微信
专业essay代写|留学生论文,作业,网课,考试|代做功課服務-PROESSAY HKG 专业留学Essay|Assignment代写|毕业论文代写-rushmyessay,绝对靠谱负责 代写essay,代写assignment,「立减5%」网课代修-Australiaway 代写essay,代写assignment,代写PAPER,留学生论文代写网 毕业论文代写,代写paper,北美CS代写-编程代码,代写金融-第一代写网 作业代写:CS代写|代写论文|统计,数学,物理代写-天天论文网 提供高质量的essay代写,Paper代写,留学作业代写-天才代写 全优代写 - 北美Essay代写,Report代写,留学生论文代写作业代写 北美顶级代写|加拿大美国论文作业代写服务-最靠谱价格低-CoursePass 论文代写等留学生作业代做服务,北美网课代修领导者AssignmentBack