Homework Assignment #2 - Calculating Size of Distributed File Systems

Question # 00838096 | Posted By: wildcraft | Updated on: 02/03/2023 08:21 PM | Due on: 02/04/2023
Subject: Computer Science | Topic: General Computer Science

Exercise 3-1 Imagine that you want to analyze one terabyte (1 TB) of data that resides on a single machine with eight input/output channels, where each channel has a reading speed of 150 megabytes per second (MB/s).

1. Calculate the time it takes for the reader to read the entire file.

2. To speed up the reading operation, consider adding more machines and creating a distributed cluster. What is the minimum number of machines you should install in the cluster so the entire read time is less than 10 seconds?
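
The arithmetic behind Exercise 3-1 can be sketched in Python. This is a minimal sketch, not a model answer: it assumes decimal units (1 TB = 1,000,000 MB), assumes every channel on every machine reads in parallel at full speed with no overhead, and the function names `read_time_seconds` and `min_machines` are made up for illustration.

```python
def read_time_seconds(size_mb, machines, channels, mb_per_s):
    """Ideal read time when every channel on every machine reads in parallel."""
    return size_mb / (machines * channels * mb_per_s)


def min_machines(size_mb, channels, mb_per_s, limit_s):
    """Smallest machine count whose ideal read time is strictly below limit_s."""
    mb_per_machine = channels * mb_per_s * limit_s  # MB one machine reads in limit_s
    return size_mb // mb_per_machine + 1


SIZE_MB = 1_000_000  # assumption: 1 TB = 10**6 MB (decimal units)

single = read_time_seconds(SIZE_MB, machines=1, channels=8, mb_per_s=150)
needed = min_machines(SIZE_MB, channels=8, mb_per_s=150, limit_s=10)
print(f"one machine: {single:.1f} s; machines for < 10 s: {needed}")
# prints: one machine: 833.3 s; machines for < 10 s: 84
```

If the course treats 1 TB as 1,048,576 MB (binary units), the same functions give roughly 873.8 s and 88 machines, so state which convention you use.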

Exercise 3-2 A cluster with 50 machines is storing blocks of data that belong to customer complaints. The size of the file is 5 TB, and each machine has four channels with a reading speed of 100 MB/s for each channel. Is the number of machines (50) sufficient to read the data in under 20 seconds? If not, how many more similar machines need to be added to the cluster?
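
One way to check the numbers in Exercise 3-2 is the short Python sketch below. It assumes decimal units (5 TB = 5,000,000 MB) and ideal parallel reads with no overhead; the helper names are hypothetical.

```python
def cluster_read_time(size_mb, machines, channels, mb_per_s):
    """Ideal read time in seconds with all channels reading in parallel."""
    return size_mb / (machines * channels * mb_per_s)


def machines_for_limit(size_mb, channels, mb_per_s, limit_s):
    """Smallest machine count whose ideal read time is strictly below limit_s."""
    return size_mb // (channels * mb_per_s * limit_s) + 1


SIZE_MB = 5_000_000  # assumption: 1 TB = 10**6 MB
CURRENT = 50

seconds_now = cluster_read_time(SIZE_MB, CURRENT, channels=4, mb_per_s=100)
required = machines_for_limit(SIZE_MB, channels=4, mb_per_s=100, limit_s=20)
extra = max(0, required - CURRENT)
print(f"now: {seconds_now:.0f} s; required: {required}; extra: {extra}")
# prints: now: 250 s; required: 626; extra: 576
```

Under these assumptions 625 machines read the file in exactly 20 seconds, so a strict reading of "under 20 seconds" needs 626. If the intended reading is "20 seconds or less", the totals become 625 machines and 575 extra.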

Exercise 3-3 You want to store a 500 MB file in a cluster with 12 nodes, which are located in four different racks (three nodes per rack) as shown in the figure below.

1. If a data block can store 128 MB, how many data blocks are needed to split this file?

2. Use a replication factor of 3 and the write principles discussed earlier to allocate the data blocks into this cluster.

3. Repeat steps 1 and 2 but with a block size of 256 MB.
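
The block count and one possible allocation for Exercise 3-3 can be sketched in Python. The "write principles discussed earlier" are not reproduced here, so this sketch assumes the default HDFS rack-aware policy for a replication factor of 3: the first replica on one node, the second on a node in a different rack, and the third on another node in that second rack. The node labels (`R1-N1` and so on) and the round-robin choice of racks are illustrative assumptions, not taken from the figure.

```python
def block_sizes(file_mb, block_mb):
    """Split a file into full blocks plus one smaller final block if needed."""
    full, rest = divmod(file_mb, block_mb)
    return [block_mb] * full + ([rest] if rest else [])


def place_replicas(block_index, racks):
    """Return three nodes: one in a first rack, two distinct nodes in a second rack."""
    first_rack = racks[block_index % len(racks)]
    second_rack = racks[(block_index + 1) % len(racks)]
    first = first_rack[block_index % len(first_rack)]
    i = block_index % len(second_rack)
    return [first, second_rack[i], second_rack[(i + 1) % len(second_rack)]]


# Hypothetical labels for 4 racks with 3 Data Nodes each.
RACKS = [[f"R{r}-N{n}" for n in range(1, 4)] for r in range(1, 5)]

for block_mb in (128, 256):
    sizes = block_sizes(500, block_mb)
    print(f"block size {block_mb} MB -> {len(sizes)} blocks: {sizes}")
    for index in range(len(sizes)):
        print(f"  block {index + 1}: {place_replicas(index, RACKS)}")
```

With 128 MB blocks the file splits into four blocks (three full blocks and one of 116 MB); with 256 MB blocks it splits into two (256 MB and 244 MB). Any allocation that follows the same placement rule is equally valid; the round-robin order only keeps the example deterministic.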

Exercise 3-4 Use the same cluster in the figure shown below for a file size of 50 GB. Each Data Node can store up to 8 GB of data. You need to allocate the data blocks, each 256 MB in size, in the cluster using a replication factor of 3.

1. Is the number of Data Nodes (12) sufficient to store this data file?

2. If not, how many more Data Nodes are needed? If needed, add them to the cluster in a separate rack and allocate the blocks in the modified cluster.

3. If 12 is sufficient, allocate the data blocks in the cluster.

4. Repeat steps 1 through 3 but with a block size of 128 MB.
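
The capacity check in Exercise 3-4 can be sketched as below. The sketch assumes binary units (1 GB = 1024 MB), that each Data Node holds only whole blocks, and that replicas can be spread evenly; `nodes_required` is a hypothetical helper name.

```python
import math


def nodes_required(file_gb, block_mb, node_capacity_gb, replication=3, mb_per_gb=1024):
    """Return (blocks, replicas, minimum Data Nodes) for the given file."""
    blocks = math.ceil(file_gb * mb_per_gb / block_mb)
    replicas = blocks * replication
    slots_per_node = (node_capacity_gb * mb_per_gb) // block_mb  # whole blocks per node
    return blocks, replicas, math.ceil(replicas / slots_per_node)


for block_mb in (256, 128):
    blocks, replicas, nodes = nodes_required(50, block_mb, node_capacity_gb=8)
    print(f"{block_mb} MB blocks: {blocks} blocks, {replicas} replicas, "
          f"{nodes} Data Nodes needed, {max(0, nodes - 12)} more than the 12 available")
```

Under these assumptions both block sizes need 150 GB of raw storage against 96 GB available, which works out to 19 Data Nodes, or 7 more than the cluster has.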

Exercise 3-5 Consider the block allocations shown in the figure below. Using a replication factor of 3, are all blocks allocated in the correct Rack and Data Node?

If not, reallocate the blocks correctly. Explain your decision.

Exercise 3-6 Consider the block allocations shown in the figure below. Using a replication factor of 3, are all blocks allocated in the correct Rack and Data Node?

If not, reallocate the blocks correctly. Explain your decision.
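
For Exercises 3-5 and 3-6, a placement can be checked mechanically once the figure is transcribed. The sketch below assumes the default HDFS rack-aware rule for a replication factor of 3 (three distinct Data Nodes spread over exactly two racks); if the course uses a different rule, adjust the checks. The function name and the `(rack, node)` tuples are illustrative, since the figures are not reproduced here.

```python
def check_placement(replicas, replication=3):
    """Return a list of problems for one block's replicas, given as (rack, node) tuples."""
    problems = []
    if len(replicas) != replication:
        problems.append(f"expected {replication} replicas, found {len(replicas)}")
    if len(set(replicas)) != len(replicas):
        problems.append("two replicas share the same Data Node")
    racks = {rack for rack, _ in replicas}
    if len(racks) < 2:
        problems.append("all replicas sit in one rack")
    elif len(racks) > 2:
        problems.append("replicas span more than two racks")
    return problems


# Hypothetical placements, not taken from the figures.
print(check_placement([("R1", "N1"), ("R2", "N4"), ("R2", "N5")]))  # prints: []
print(check_placement([("R1", "N1"), ("R1", "N2"), ("R1", "N3")]))
# prints: ['all replicas sit in one rack']
```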

Exercise 3-7 Use the HDFS commands provided in Appendix A-Part 2 (HDFS) to perform the following tasks. Submit a document in which each command is associated with a screenshot of its result.

1. Create a directory in HDFS.

2. Copy any file from the local machine to the newly created directory.

3. List the directory’s contents.

4. View the contents of the file.

5. Rename any file in HDFS.

6. Create another directory in HDFS and move any file from one directory to another.

7. Delete any file.

8. Delete any directory.

9. Move any file from HDFS to the local machine.

10. Display the size of files.

11. Change the group of any file or directory.

12. Change permissions of any file or directory.
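
Appendix A is not reproduced here, so the following Python sketch only lists the standard `hdfs dfs` shell subcommands that commonly correspond to the twelve tasks; the appendix may use different commands. All paths, file names, and the group name are hypothetical placeholders. Each entry illustrates one task in isolation, and the script prints the commands rather than running them.

```python
HDFS = ["hdfs", "dfs"]
BASE = "/user/student/hw2"      # hypothetical HDFS directory
OTHER = "/user/student/hw2_b"   # hypothetical second HDFS directory
FILE = BASE + "/notes.txt"      # hypothetical file

TASKS = {
    1: [HDFS + ["-mkdir", "-p", BASE]],
    2: [HDFS + ["-put", "notes.txt", BASE]],
    3: [HDFS + ["-ls", BASE]],
    4: [HDFS + ["-cat", FILE]],
    5: [HDFS + ["-mv", FILE, BASE + "/renamed.txt"]],
    6: [HDFS + ["-mkdir", "-p", OTHER], HDFS + ["-mv", FILE, OTHER]],
    7: [HDFS + ["-rm", FILE]],
    8: [HDFS + ["-rm", "-r", OTHER]],
    # Assumption: copy to local with -get, then delete the HDFS copy.
    9: [HDFS + ["-get", FILE, "."], HDFS + ["-rm", FILE]],
    10: [HDFS + ["-du", "-h", BASE]],
    11: [HDFS + ["-chgrp", "analysts", FILE]],  # "analysts" is a made-up group
    12: [HDFS + ["-chmod", "644", FILE]],
}

for number, commands in TASKS.items():
    for command in commands:
        print(f"{number:>2}. {' '.join(command)}")
```

Task 9 is shown as a copy followed by a delete because `-moveToLocal` is documented as not implemented in the Hadoop FileSystem shell; check which behavior your installed version and Appendix A expect.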
