Intro Comp

Assignment 3 – Text mining

This is an individual assignment. Please be sure to watch the recorded "Assignment Lectures" for Assignment 3 (two parts) before you start. The lecture videos are under Content and contain step-by-step demos. You must submit the Excel Graph activity before you move on to this assignment.

Purpose: Some of the most important data that we have are large quantities of text data. Text data includes books, articles, blog posts, social media posts, emails, reports, journals and diaries, shopping lists, and so on. This data is unstructured and can be massive. Software tools can help us make sense of large quantities of text data. When we use software to analyze text data in this way, we call it text mining. A practical example is chat bots. Chat bots use algorithms to try to make sense of what the user types so that they can select an appropriate response. In this assignment we will explore the text mining strategies of looking at word frequency and occurrence. Other methods, beyond the scope of this class, include part-of-speech tagging and machine learning.


In this assignment you will need files or web links (URLs) with text data. Please watch the How-To video before you start the assignment. You will use two free browser-based text mining tools to analyze your text.

Task 1. Visualizing Word Frequency. Looking at the frequency of occurrence of each word can give you an overall sense of a document. You will use your first file to create a word cloud to visualize frequency and understand how stop words affect word frequency analysis. We will use WordClouds for this.

Task 2. Taking a deeper look at word frequency and occurrence. For the second text document we will use Voyant Tools to explore other visualization methods as well as look at word correlations.
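
To make the idea of word frequency and stop words concrete, here is a minimal Python sketch. It is not how WordClouds or Voyant Tools are implemented. The short stop word list, the sample sentence, and the function names are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# A tiny illustrative stop word list (an assumption; real tools use much longer lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, remove_stop_words=False):
    """Count how often each word occurs, optionally skipping stop words."""
    words = tokenize(text)
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return Counter(words)

# Hypothetical sample text, used only to show the effect of stop words.
sample = "The cat sat on the mat. The dog sat on the cat."
with_stops = word_frequencies(sample)
without_stops = word_frequencies(sample, remove_stop_words=True)

print(with_stops.most_common(3))
print(without_stops.most_common(3))
```

With stop words included, "the" is the most frequent word even though it says nothing about the topic. Once stop words are removed, the content words rise to the top. This is the same effect you will compare in your two word clouds.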

Download the attached Word document template, add your screenshot for Task 2, and answer the questions for each task. This website can show you how to take a screenshot on whatever device you are using.

Steps for Task 1:

1. Open the following website: WordClouds

2. Watch this HowTo video. To create a Word Cloud, you can upload a file or paste text or submit a URL.

3. Personalize your Word Cloud by using different Theme, Shape, Gap, Font, or other options. Note that Stop Words can be included or excluded. They are excluded by default, so you will need to create two Word Clouds from the same text: one with the stop words and one without. The website already includes the common English stop words.

4. Save your Word Clouds as two images, called "WordCloud with Stop Words" and "WordCloud without Stop Words". Use the PNG file format. Submit these images to the Assignment 3 drop box.

5. Answer the questions in the Word document for Task 1.
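
A word cloud draws frequent words larger than rare ones. The sketch below shows one simple way this could be done, by scaling font size linearly with word count. The scaling rule, the size limits, and the sample counts are all assumptions for illustration. The WordClouds website may use a different rule.

```python
def font_sizes(frequencies, min_size=12, max_size=72):
    """Map each word's count to a font size using linear scaling (a hypothetical rule)."""
    lowest = min(frequencies.values())
    highest = max(frequencies.values())
    span = highest - lowest
    sizes = {}
    for word, count in frequencies.items():
        if span == 0:
            # Every word is equally frequent, so draw them all at the same size.
            sizes[word] = max_size
        else:
            sizes[word] = min_size + (count - lowest) * (max_size - min_size) / span
    return sizes

# Hypothetical counts: one stop word ("the") and three content words.
counts = {"the": 40, "data": 10, "text": 7, "mining": 4}
sizes = font_sizes(counts)

for word, size in sizes.items():
    print(word, round(size))
```

In this made-up example the stop word "the" takes the largest size and squeezes the content words toward the small end of the range. That is why a cloud with stop words can look so different from one without them.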

Steps for Task 2:

1. Open the following website: Voyant Tools

2. Paste in some URLs, or paste the text from your Task 1 text file or from a different text file.

3. You will see data in five different windows and each window has a tab.

a. In the upper left window click 'Links'

b. In the upper middle window click 'TermsBerry'

c. In the upper right window, leave it on 'Trends' (or click it if it's not there)

d. In the bottom left window, leave it on 'Summary' (or click it if not there)

e. In the bottom right window, click 'Correlations'

4. Take a screenshot of the entire screen and add it to your Word document.

5. Answer the questions in the Word document for Task 2.
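
The idea behind word correlations can be sketched in a few lines of Python. This sketch assumes that correlation means the Pearson correlation between two words' counts across equal-sized segments of the text. Voyant Tools may compute its values differently. The function names, the number of segments, and the sample text are assumptions for illustration.

```python
import math
import re

def segment_counts(text, term, segments=4):
    """Split the text into equal-sized word segments and count `term` in each one."""
    words = re.findall(r"[a-z']+", text.lower())
    size = math.ceil(len(words) / segments)
    return [words[i * size:(i + 1) * size].count(term) for i in range(segments)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; 0.0 is used as a placeholder.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample text of 16 words, which splits into four segments of 4 words.
sample = "rain cloud rain cloud sun sky sun sky rain cloud wet wet sun sky dry dry"
rain = segment_counts(sample, "rain")
cloud = segment_counts(sample, "cloud")
sun = segment_counts(sample, "sun")

print(pearson(rain, cloud))  # words that rise and fall together
print(pearson(rain, sun))    # words that appear in different segments
```

A value near 1 means the two words tend to appear in the same parts of the text, and a value near -1 means that where one word is frequent the other is rare. Keep this in mind when you read the numbers in the 'Correlations' window.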

P.S. Other tools that can also be used for this task:

· LINCS Project mirror

· Huma-Num mirror

Upload the following:

1. Two word cloud images and the Word document with screenshots and answers to the questions for Tasks 1 and 2.

2. The text files that you used in Tasks 1 and 2, or the URLs you used.

