Case Study 2.1 - Facebook’s Big Data Storage

Read “Case 2.1: Facebook’s Big Data Storage” and write an essay that answers the following questions:
1. What is Facebook's main motivation for creating new database management systems?
2. What are some of the challenges that Facebook faces when it comes to managing big data?
3. How does Facebook's use of Scuba and Cubrick database management systems improve its advertising efforts?
4. What types of data does Facebook collect from its users to inform its advertising efforts?
5. How does Facebook use machine learning to analyze user data and improve its advertising targeting?
6. What are the benefits and potential drawbacks of Facebook's use of big data and machine learning for advertising?
7. How does Facebook ensure that user data is protected and not misused for advertising purposes?
8. What are some potential ethical concerns related to Facebook's use of big data and machine learning for advertising?
9. How can Facebook improve transparency and communication with its users regarding its use of their data for advertising purposes?
10. What are some potential future developments in the field of big data and machine learning that could impact Facebook's advertising efforts?
Requirements:
· There is no specific page requirement for your analysis. Instead, your work will be evaluated based on how thoroughly it addresses each of the questions that have been outlined for you.
· You must utilize proper APA formatting and citations throughout your paper. If you use any supporting evidence from external sources, it is imperative that you provide accurate citations for each reference.
· You must include a minimum of two sources from scholarly articles or business periodicals, aside from the course textbook.
· Include your best critical thinking and analysis to arrive at your justification.
· Please note that any use of AI text generation tools such as ChatGPT is strictly prohibited and will result in a grade of zero due to plagiarism.
Submission: Upload/attach your completed paper to this assignment by the due date.
Case 2.1: Facebook’s Big Data Storage
Facebook uses many database management systems to sort and store big data. These include Hive, Hadoop, and the Operational Data Store (ODS). These systems are not traditional relational databases; they process data for analysis, such as deciding whom to advertise to, rather than serving users directly. Hadoop is the base for Facebook’s database management systems, but Facebook is constantly trying to create new ones to become more efficient. Facebook receives an enormous amount of data every minute of every day. On average Facebook’s system “processes 2.5 billion pieces of content and 500+ terabytes of data per day. It is pulling in 2.7 billion like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour” (Constine, 2012). The database management systems it has traditionally used have become less efficient as the data has grown. In order to gain an advantage, Facebook is working to create better database management systems.
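To put the quoted figures on a common scale, the short calculation below converts them to per-day and per-second rates. It is only a back-of-the-envelope sketch: it assumes decimal units (1 terabyte = 1,000 gigabytes), treats "500+" as exactly 500, and assumes the load is spread evenly over the day. The variable names are illustrative.

```python
# Figures quoted from Constine (2012).
INGEST_TB_PER_DAY = 500          # "500+ terabytes of data per day", taken as 500
SCAN_TB_PER_HALF_HOUR = 105      # "roughly 105 terabytes of data each half hour"

SECONDS_PER_DAY = 24 * 60 * 60
HALF_HOURS_PER_DAY = 48

# Average ingest rate in gigabytes per second (assumes 1 TB = 1,000 GB).
ingest_gb_per_second = INGEST_TB_PER_DAY * 1000 / SECONDS_PER_DAY

# Total data scanned per day if the half-hour rate held all day.
scanned_tb_per_day = SCAN_TB_PER_HALF_HOUR * HALF_HOURS_PER_DAY

print(f"Average ingest: {ingest_gb_per_second:.2f} GB/s")
print(f"Scanned per day: {scanned_tb_per_day} TB")
```

Under these assumptions the system scans about ten times as much data each day as it takes in, which illustrates why query speed, and not only storage, is the concern.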
Facebook’s data warehouse, also known as the Hive, is capable of storing more than 300 petabytes of data in more than 800,000 tables. It runs more than 600,000 queries and 1 million map-reduce jobs per day. The common query engines over Hive include Presto, HiveQL, Hadoop, and Giraph. The data is used for a variety of applications, from traditional database processing to image analytics, machine learning, and real-time interactive analytics (Wiener and Bronson, 2014). Facebook uses the operational data store (ODS) to support data mining of operational data, or base data that is summarized for a data warehouse. ODS stores about 2 billion time series records. The data is loaded into an enterprise data architecture after being extracted from operational databases. The process includes standardization, cleansing, consolidation, and transformation. ODS is used most commonly in alerts and dashboards and for troubleshooting system metrics with 1–5 min of time lag. It handles about 40,000 queries per second (Wiener and Bronson, 2014).
Two database management systems that Facebook specifically created to sort data faster for advertising are Scuba and Cubrick. These database management systems work fast to find the data Facebook can use to target advertising purposefully. This helps Facebook know what users want at that moment. The time it takes to process this data has shortened dramatically. Facebook receives millions of data points every second, and its drive to create new database management systems is aimed at significantly advancing the speed of information analysis, a goal it is always working toward.
Scuba is just “one of many ‘Big Data’ software platforms Facebook has produced to control the information generated by its online operation – platforms that push the boundaries of distributed computing, the art of training hundreds or even thousands of computers on a single task” (Metz, 2017). Scuba is mainly used at Facebook for performance monitoring, trend spotting, and pattern mining. It takes in data, sorts it quickly, and deletes data automatically. Data sometimes has to be deleted because it is old or because space in the table is limited; in that case Scuba deletes the oldest data in the table and keeps the most relevant data. Scuba keeps all data in “high-speed memory systems running across hundreds of computer servers – not the hard disks, the memory systems – and this means you can query the data in near real-time” (Metz, 2017). By sorting current data and finding what users are doing now, Facebook can choose which advertisements users will respond to best. Facebook earns much more money when an advertisement is liked, clicked on, commented on, or shared, so advertising strategically increases its revenue. Data scientists can use database management systems to analyze how effectively Facebook is running and how its users behave. This means Facebook can feed data directly to users (Metz, 2017).
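The deletion behavior described above (drop data that is too old, and drop the oldest rows when the table runs out of space) can be sketched as a small in-memory table. This is an illustration only, not Scuba's actual implementation: the class `InMemoryTable`, its parameters `max_rows` and `max_age_seconds`, and the sample events are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InMemoryTable:
    """Hypothetical bounded table: rows live in memory, oldest rows go first."""
    max_rows: int          # space limit (assumed parameter)
    max_age_seconds: int   # age limit (assumed parameter)
    rows: deque = field(default_factory=deque)

    def insert(self, timestamp: int, event: dict) -> None:
        # Rows are assumed to arrive in timestamp order.
        self.rows.append((timestamp, event))
        self._evict(now=timestamp)

    def _evict(self, now: int) -> None:
        # Drop rows that are older than the age limit.
        while self.rows and now - self.rows[0][0] > self.max_age_seconds:
            self.rows.popleft()
        # Drop the oldest rows while the table is over its size limit.
        while len(self.rows) > self.max_rows:
            self.rows.popleft()

    def count_where(self, key: str, value) -> int:
        # A plain scan over in-memory rows, standing in for a query.
        return sum(1 for _, event in self.rows if event.get(key) == value)


table = InMemoryTable(max_rows=3, max_age_seconds=60)
table.insert(0, {"action": "like"})
table.insert(10, {"action": "click"})
table.insert(20, {"action": "like"})
table.insert(30, {"action": "share"})   # over the size limit: the row at t=0 is dropped
table.insert(100, {"action": "like"})   # rows more than 60 seconds old are dropped
remaining = [timestamp for timestamp, _ in table.rows]
print(remaining, table.count_where("action", "like"))
```

Because every row stays in memory and old rows are discarded on insert, a query only ever scans recent data, which is the property the case study credits for near real-time answers.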
Cubrick is used more for sorting generic data. A table of information is loaded into the database management system, and Cubrick takes all the attributes and connects the similar ones. The colored dots represent users, and the attributes are connected by bricks, which together form one cube. It is like graphing, but with multiple graphs connected by the same range and dimension. This enables a lean database mechanism that functions only over primitive data types. Cubrick sorts the data and can rapidly group people together based on similar characteristics such as gender or region. That is how Facebook can select certain groups of people to advertise to.
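The grouping idea can be sketched as follows: each row is mapped to the coordinates of the cell ("brick") it falls into along each dimension, and users sharing a brick form one audience. This is a simplified illustration under assumed names, not Cubrick's actual design; the dimension lists, `brick_key`, and `build_bricks` are hypothetical.

```python
from collections import defaultdict

# Hypothetical dimensions; the case study names gender and region as examples.
REGIONS = ["north", "south", "east", "west"]
GENDERS = ["female", "male", "other"]


def brick_key(row: dict) -> tuple:
    """Map a row to the coordinates of the brick (cell) it falls into."""
    return (REGIONS.index(row["region"]), GENDERS.index(row["gender"]))


def build_bricks(rows: list) -> dict:
    """Group user ids by brick so that similar users sit together."""
    bricks = defaultdict(list)
    for row in rows:
        bricks[brick_key(row)].append(row["user_id"])
    return dict(bricks)


rows = [
    {"user_id": 1, "region": "north", "gender": "female"},
    {"user_id": 2, "region": "south", "gender": "male"},
    {"user_id": 3, "region": "north", "gender": "female"},
]
bricks = build_bricks(rows)
audience = bricks[(0, 0)]   # users in the north/female brick
print(audience)
```

Selecting an audience then amounts to reading one brick instead of scanning every row, which is why grouping by shared characteristics can be fast.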
Facebook uses these database management systems because they are faster and return just the data it wants. This helps Facebook advertise what the consumer wants now, not what they wanted in the past. It groups people together using the data, and the grouping can be based on simple demographics such as gender, age, income, housing type, and education level. All this information can be filled out on a person’s profile and can be public, as mentioned before. Facebook can also market by geography, advertising only in certain countries, states, or regions, because areas differ in weather and in their distinct cultures and values. Again, Facebook gains this information when you fill out a profile and specify where you live; if you have your location turned on, it can also track where you go. Price segmentation can be used as well but can be harder to track. Facebook can decide which products to market to users, such as cheap, medium-priced, or expensive products. To do this, it targets users based on the occupation and degree level listed on their profile and on their posts. After Facebook gets the data, it feeds it into its prediction algorithm.
Facebook uses a machine learning algorithm to rank what it thinks you would like to see in your feed. The algorithm does not just predict whether you will actually “hit the like button on a post based on your past behavior. It also predicts whether you’ll click, comment, share, or hide it, or even mark it as spam. It will predict each of these outcomes, and others, with a certain degree of confidence, then combine them all to produce a single relevancy score that’s specific to both you and that post” (Oremus, 2016). This is how Facebook can choose which advertisements to show to which people. The higher the chance that a consumer will respond to an advertisement in any positive way, the more likely the advertisement is to succeed. If the user shares it, likes it, or tags someone to recommend the product, other people see the ad and may make a purchase as well. There is a high chance that many friends of someone who likes the product will also like it, which makes this system very effective.
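One simple way to picture "combine them all to produce a single relevancy score" is a weighted sum of the predicted probabilities. The source does not say how Facebook actually combines the predictions, so the sketch below is an assumption throughout: the weights, the function names `relevancy_score` and `rank`, and the sample probabilities are invented for illustration.

```python
# Hypothetical weights: positive actions raise the score, negative ones lower it.
WEIGHTS = {
    "like": 1.0,
    "click": 1.0,
    "comment": 2.0,
    "share": 3.0,
    "hide": -4.0,
    "spam": -10.0,
}


def relevancy_score(predictions: dict) -> float:
    """Combine per-action probabilities into one score (weighted sum, assumed)."""
    return sum(WEIGHTS[action] * p for action, p in predictions.items())


def rank(candidates: dict) -> list:
    """Order candidate ads from highest to lowest relevancy score."""
    return sorted(candidates, key=lambda name: relevancy_score(candidates[name]), reverse=True)


# Made-up predicted probabilities for one user and two candidate ads.
candidates = {
    "ad_a": {"like": 0.5, "click": 0.25, "hide": 0.125},
    "ad_b": {"like": 0.25, "share": 0.25, "spam": 0.125},
}
ranking = rank(candidates)
print(ranking)
```

In this example ad_b has a good chance of being shared, but its predicted spam reports outweigh that, so ad_a ranks first. The point is only that negative outcomes can be folded into the same score as positive ones.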
Traditional large database systems are becoming incapable of processing big data that is generated at the current speed and volume. This is the reason many are looking to new developments in cloud-based big data analytics. Advancements in cloud technology, which also benefit the development of AI, make it possible to store more data and to access it via the internet.
Cloud computing is based on remote servers on the internet that provide services to users in different ways. AI technology in cloud computing is usually programmed to think like a human and to mimic human reactions and actions in certain circumstances. “By using the internet and central remote services it maintains the data, applications, etc. which offers much more efficient computing by centralizing storage, memory, processing, bandwidth and so on” (Mollah et al., 2012). Infrastructure as a service (IaaS) rents out storage, networks, operating systems, servers, and virtual machines, and it lets you pay based on your usage of those services. Platform as a service (PaaS) is designed to make creating mobile apps and websites easier, and it eliminates the need to constantly update or manage them. Software as a service (SaaS) is a cloud service that gives the user access to an application on a phone, tablet, or other electronic device. Cloud computing has unique features that enable open access to a wide range of resources, and it can be accessed easily from phones, laptops, and computers. It supports applications with rapid elasticity in the resources clients use, and that usage is automatically monitored. The fusion of cloud computing and AI technology will bring a significant change to the technology industry.
Cloud computing, where computing services can be delivered to virtually anywhere there is an internet connection, has also seen significant growth in recent years, as many companies are moving toward using the cloud to deliver services. Together, big data and cloud computing are setting the trend of ever greater connectivity and improved data collection and analytics.
Although the cloud is not new, having been around since the late 1990s, it is still of great interest because it is so widely used. Almost every aspect of technology in our lives is affiliated with the cloud, such as video game storage, iTunes storage, and business information. With technology advancing at a very rapid pace, the cloud will be just as important, if not more important, by the year 2020, which is when 5G is expected to launch globally. Even though 5G would increase the speed and reach of technology, there will still need to be a place for easy, large-scale storage. One part of the outlook for the cloud is increasing security. Because many people are capable of hacking today, security is a huge problem not only for individuals but also for the majority of businesses.
