BIAM 530 Week 5 iLab, Analyzing Data with Hadoop




BIAM 530 Week 5 Step 1: Review Hadoop Tutorial and Answer Questions

Question 1

Visit the US National Oceanic and Atmospheric Administration (NOAA) data access site and view the GHCN-Daily Sample PDF file illustrating the data used for this analysis. Are there other data columns here that might be relevant to 311 call volume besides those used in the tutorial? If so, what are they, and how might you include them in the analysis?

Question 2

Review what other data files are available on the NOAA National Centers for Environmental Information Quick Links page. Select one other file that might contain data of interest to a company or government agency and briefly describe an analysis that could be performed using this data.

Question 3

Visit the NYC Open Data site and follow the links to Social Services and the 311 Service Requests page. In addition to the data columns used in the tutorial, what other available data columns might be useful in an analysis of 311 calls? Suggest some analyses that might be performed using these data.

Question 4

Explore the NYC Open Data site to identify other data sets available besides the 311 service calls. Select one of these data sets and briefly describe how it could be used in an analysis for a government agency or business.

Question 5

Summarize the major steps performed in the analysis shown in the tutorial, and the IBM BigSheets functions used in the analysis.

Question 6

Compare and contrast the analysis shown in the tutorial, using Hadoop and IBM BigSheets, with how a similar analysis might be performed on a smaller data set using Microsoft Excel. What was similar and what was different? What challenges would be … an analyst skilled in Excel when adjusting to working with big data sets using Hadoop?

BIAM 530 Week 5 Step 3: Get Connection Details and View Ambari Console

Q1: What percentage of disk space on the Hadoop Distributed File System (HDFS) is being used? How many gigabytes (GB) or terabytes (TB) does this represent? (Hint: Hover over the HDFS Disk Usage section and add the values for DFS Used and non-DFS Used.)

Q2: How many DataNodes are in the cluster? (Hint: Look in the HDFS Links section.)

Q3: How long has the cluster been running since the last restart? (Hint: Look in the NameNode Uptime section.)
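
The arithmetic behind the Q1 hint can be sketched in a few lines of Python. This is only an illustration: the function name `hdfs_disk_usage` and the sample readings are hypothetical, and it assumes the percentage is taken as (DFS Used + non-DFS Used) divided by total capacity, with 1 TB counted as 1024 GB. Substitute the values shown in your own Ambari console.

```python
def hdfs_disk_usage(dfs_used_gb, non_dfs_used_gb, capacity_gb):
    """Return (used_gb, used_percent) for a cluster.

    Assumption: "used" means DFS Used plus non-DFS Used, as the Q1 hint
    suggests, measured against the total configured capacity.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    used_gb = dfs_used_gb + non_dfs_used_gb
    used_percent = 100.0 * used_gb / capacity_gb
    return used_gb, used_percent


# Hypothetical readings for illustration only, not values from a real cluster.
used_gb, used_pct = hdfs_disk_usage(
    dfs_used_gb=120.0, non_dfs_used_gb=80.0, capacity_gb=1000.0
)

# Convert GB to TB assuming 1 TB = 1024 GB (binary units).
used_tb = used_gb / 1024
print(f"{used_gb:.1f} GB used ({used_tb:.3f} TB), {used_pct:.1f}% of capacity")
```

If the console reports sizes in TB rather than GB, apply the same sum and ratio after converting all three readings to a common unit.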

BIAM 530 Week 5 Step 4: Use Basic HDFS Commands
