
Big Data Overview

We live in the age of data. Data is everywhere, much like the famous line from the poem ‘The Rime of the Ancient Mariner’: ‘Water, water, every where, nor any drop to drink.’ Thankfully, we can derive value from that data, so yes, it is drinkable. The internet accounts for most of the data we see today. It keeps growing and now looks indispensable to our lives. In this post, we are going to look at some, if not all, aspects of Big Data.

Let’s begin with some of its common features:
  • Because of its variety, it is segmented and cumulative.
  • It does not support decisions directly. You have to derive value from it first to make informed decisions.
  • It comes in many forms and is not a substitute for structured data. It is further classified into three types of data: structured, semi-structured, and unstructured.
  • It is generally wide, with hundreds of fields.
  • It grows continuously and is versatile, as new data is created every day.
  • For businesses and organizations, it is generated both internally and externally.
  • It can be managed with frameworks and databases such as Hadoop and Cassandra.
Big Data is characterized by six Vs: Volume, Velocity, Variety, Veracity, Visualization, and Value.

Volume denotes the sheer scale of data, ranging from terabytes to zettabytes. The internet and social media have caused an explosion of data, which is why big data is sometimes also referred to as digital data. Data volumes have grown from gigabytes to terabytes, petabytes, exabytes, and zettabytes, and internet data is expected to exceed ten zettabytes in the next ten years.

Anyone working with Big Data should be familiar with the data size tables. Data has been growing so drastically that new terms for data sizes have been added (see the image).
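To get a feel for these size terms, here is a minimal Python sketch that turns a raw byte count into a readable unit. It assumes decimal (SI) units where each step is 1000 times the previous one; the function name human_size is just an illustrative choice, not part of any library.

```python
# Decimal (SI) data-size units; each one is 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_size(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the last unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


print(human_size(1500))          # 1.5 KB
print(human_size(10 * 1000**7))  # ten zettabytes expressed in bytes
```

Some tables use binary units (1024 per step) instead, so the exact figures depend on which convention is being followed.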

Velocity accounts for the streaming of data and the movement of large volumes of data at high speed. It refers to the speed at which data grows. Today it is hard to imagine people without a gadget, and as a result data is generated continuously through tablets, mobile phones, laptops, smart devices, and so on.

The various sources of data are as depicted in the picture:

Due to the increase in the global customer base and transactions and interactions with customers, the data created within an organization is growing along with external data. The contributors to this data growth are as follows:
  • Web
  • Billing
  • ERP
  • Machine Data
  • Network Elements
  • Social Media
  • Surveys
Variety refers to managing the complexity of data in different structures, ranging from relational data to logs and raw text. It refers to the different types of data, including text, images, audio, video, XML, and HTML. The three types of data are:
  • Structured data: data with a fixed schema, represented in tabular format. Example: a MySQL database.
  • Semi-structured data: data that does not follow a formal tabular data model but carries tags or keys that describe its fields. Example: XML files.
  • Unstructured data: data that does not have a predefined data model. Example: text files.
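The difference between the three types is easier to see side by side. The sketch below uses only the Python standard library and a made-up customer record (the names and values are hypothetical) to show the same information in each form.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (hypothetical sample row).
table = io.StringIO("id,name,city\n1,Asha,Pune\n")
rows = list(csv.DictReader(table))

# Semi-structured: no fixed table schema, but tags describe each value.
doc = ET.fromstring("<customer id='1'><name>Asha</name><city>Pune</city></customer>")
record = {child.tag: child.text for child in doc}

# Unstructured: free text with no predefined model; meaning has to be inferred.
note = "Asha called from Pune about her order."
words = note.rstrip(".").split()

print(rows[0]["city"], record["city"], "Pune" in words)
```

With the structured and semi-structured forms, the city can be looked up by name. With the free text, the program can only check whether the word appears, which is why unstructured data usually needs more processing before analysis.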
Veracity refers to the truthfulness of data.

Visualization refers to the presentation of data in a graphical format.

Value refers to the value an organization derives from using big data. This is basically achieved through big data analytics.

Industry-wise use of big data:
Every industry has some use for big data. Some of the big data use cases are as follows:

  • Retail Sector: affinity detection and market analysis.
  • Credit Card Companies: detect fraudulent purchases to alert customers, and examine loan and credit history (such as CIBIL history) before issuing a credit card.
  • Banks: examine customer data before granting a loan.
  • Medical Diagnostics: diagnose a patient's illness based on symptoms.
  • Digital Marketing: find effective marketing channels.
  • Insurance Companies: design policies and calculate premiums.
  • Manufacturing Units and Oil Rigs: reduce the risk of equipment failure.
  • Advertising: identify the target audience.
Big Data Analytics:
With the advent of big data analytics, an analysis can be run on the complete data set instead of on a sample.

Big Data analytics helps in:
  • Finding associations in data
  • Predicting future outcomes
  • Performing prescriptive analysis
  • Making data-driven decisions
  • Increasing safety
  • Reducing maintenance costs
  • Preventing failures
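As a small illustration of finding associations in data, the sketch below counts how often pairs of items appear together in shopping baskets, which is the basic idea behind affinity detection in retail. The baskets are invented for the example, and pair_support is a hypothetical helper name; a real system would work on far larger transaction logs and use more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; real data would come from transaction logs.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(baskets)
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])
```

Here bread and butter turn up together in three of the four baskets, so that pair has the highest support.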
Traditional technology can be compared with big data technology in the following ways:

Traditional Technology:
  • Limited scalability
  • Uses highly parallel processors
  • Data in one place
  • High-end hardware used
  • Uses storage technology, such as SAN

Big Data Technology:
  • Highly scalable (an RDBMS scales vertically, while non-relational stores scale horizontally)
  • Uses distributed processing
  • Data is distributed
  • Commodity hardware used
  • Uses distributed data with data redundancy
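To make "distributed data with data redundancy" a little more concrete, here is a simplified Python sketch. It places each key on several nodes using a stable hash, so the data survives the loss of any one machine. The node names, the replication factor of 3, and the placement function are all assumptions for illustration. This is a stand-in for the idea, not the actual placement logic of Hadoop or Cassandra.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical commodity machines
REPLICAS = 3  # assumed replication factor


def placement(key: str, nodes=NODES, replicas=REPLICAS):
    """Pick `replicas` distinct nodes for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Walk around the node list from the starting position.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def readable_after_failure(key: str, failed: str) -> bool:
    """A key stays readable if at least one of its replicas is on a live node."""
    return any(node != failed for node in placement(key))


print(placement("customer:42"))
print(all(readable_after_failure("customer:42", node) for node in NODES))
```

Because every key lives on three different nodes, no single failure makes it unreadable, which is the trade-off big data systems make: cheaper hardware, with copies of the data providing the reliability.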

