In this article, we will have a quick introduction of what is Big Data and Hadoop. Learn how it helps to solve complex data processing and give meaningful results. At the end of this article, we put a link on how to install Hadoop in Linux.
Big Data is any data that is too large or too complex to process by traditional data processing software. We commonly store our data using transactional databases like Oracle or MySQL. This allows us to see data in a tabular fashion. In the 90s and early 2000s, this is great as we haven’t had that much information and we mostly store text or simple data information.
Now, information has evolved. Different types of data eg. image, sound, etc., are transferred, analyzed, and is stored for future use. In 2018 alone, 2.5 quintillion bytes of data are produced every day. With that, the traditional databases could not store or analyze this data in its data warehouse. That’s when the big data concept was introduced.
Watch this video from Simplilearn to get a real-world example of how big data helps in everyday work.
After learning what is Big Data, let’s proceed in understanding Apache Hadoop and how it helps with storing our Big Data.
From Apache Hadoop site, they define Hadoop as a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
For example, you have a 1GB file and if you will store it in Hadoop, Hadoop can slice this file into multiple chunks and store it in different machines. In this manner, you can scale easily by adding machines to your Hadoop cluster in case you are running out of space.
Hadoop also has replication. It can replicate your data in different machines so that in case a machine broke down, your data is still available and can be pulled.
Internally, Hadoop uses HDFS (Hadoop Distributed File System), MapReduce, and YARN to be able to process and store your data. Watch the video here from Simplilearn to have an example of what is Hadoop.
The best way to get familiar with this technology is by experiencing it yourself. In our next tutorial, we will be showing you how to install a single-node Hadoop in a Linux Virtual Box machine.