
Big Data Overview


We live in a data age. Data is everywhere, much like the famous line from the poem ‘The Rime of the Ancient Mariner’: water, water everywhere, but not a drop to drink. Thankfully, we can derive value from that data, so yes, it’s drinkable. The internet accounts for most of the data that we see today. It keeps growing and now looks indispensable to our lives. In this post we are going to look at some, if not all, aspects of Big Data.

Let’s begin with some of its common features:
  • Because of its variety, it is segmented and cumulative.
  • It does not facilitate direct decisions. You have to derive value from it before it can support informed decision making.
  • It comes in many forms and is not a substitute for structured data. It is further classified into three types of data: structured, semi-structured, and unstructured.
  • Generally, it is wide, with hundreds of fields.
  • It is unstoppable and versatile, as new data is created every day.
  • For businesses and organizations, it is generated both internally and externally.
  • It can be managed with distributed frameworks and databases such as Hadoop (distributed storage and processing) and Cassandra (a distributed NoSQL database).
Big Data is characterized by six Vs, which are:

Volume:
Volume denotes the sheer scale of data, ranging from terabytes to zettabytes. The rise of the internet and social media resulted in an explosion of data; for this reason big data is sometimes also referred to as digital data. Data has grown from gigabytes to terabytes, petabytes, exabytes, and zettabytes, and internet data is expected to exceed ten zettabytes in the next ten years.



Anyone working with Big Data should be familiar with the data-size tables. Big data has grown so drastically that new terms for data sizes have been added; see the image.
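
The data-size ladder can also be reproduced in a few lines of code. This is a minimal Python sketch, assuming the decimal (SI) convention in which each unit is 1,000 times the previous one; the helper name `human_size` is made up for illustration.

```python
# Decimal (SI) data-size units; each step is 1,000x the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_size(num_bytes):
    """Return a readable string such as '1.5 TB' for a byte count."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the number is below 1,000,
        # or at the largest unit we know about.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000

print(human_size(1_500_000_000_000))  # 1.5 TB
print(human_size(10 * 1000**7))       # 10.0 ZB
```

Some tables use binary units instead (1 KiB = 1,024 bytes), so the exact figures depend on which convention a given table follows.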



Velocity:
Velocity accounts for the streaming of data and the movement of large volumes of data at high speed. It refers to the speed at which data grows. Today it is hard to imagine people without a gadget; as a result, data is generated continuously through devices such as tablets, mobile phones, laptops, and other smart devices.

The various sources of data are as depicted in the picture:

 
As the global customer base grows, along with transactions and interactions with customers, the data created within an organization is growing alongside external data. The contributors to this data growth are as follows:
  • Web
  • Billing
  • ERP
  • Machine Data
  • Network Elements
  • Social Media
  • Surveys
Variety:
Variety refers to managing the complexity of data in different structures, ranging from relational data to logs and raw text. It covers the different types of data, including text, images, audio, video, XML, and HTML. The three types of data are:
  • Structured data: data represented in a tabular format with a fixed schema. Example: tables in a MySQL database.
  • Semi-structured data: data that does not follow a rigid tabular schema but carries tags or markers that describe its structure. Example: XML files.
  • Unstructured data: data that does not have a predefined data model. Example: text files.
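
To make the three types concrete, here is a small Python sketch using only the standard library. The sample records (names, cities, the order, and the review sentence) are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns with a fixed schema (as in a database table).
table = io.StringIO("id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n")
rows = list(csv.DictReader(table))
print(rows[0]["city"])  # Pune

# Semi-structured: no fixed table schema, but tags describe each value.
doc = ET.fromstring("<order id='7'><item>pen</item><item>ink</item></order>")
items = [item.text for item in doc.findall("item")]
print(doc.get("id"), items)  # 7 ['pen', 'ink']

# Unstructured: free text with no predefined model; structure must be inferred.
review = "Delivery was late but the pen writes well."
words = review.lower().rstrip(".").split()
print(len(words))  # 8
```

Notice how the amount of work needed to get at a value grows from the first case to the last: a column lookup, then a tag search, then ad hoc text processing.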
Veracity:
It refers to the truthfulness of the data, that is, how accurate and trustworthy it is.

Visualization:
It refers to the presentation of data in a graphical format.

Value:
It refers to the value an organization derives from using big data, which is essentially the job of big data analytics.

Industry-wise use of big data:
Every industry has some use for big data. Some of the big data use cases are as follows:

  • Retail Sector: affinity detection (finding which products are bought together) and market analysis.
  • Credit Card Companies: detect fraudulent purchases and alert customers; examine loan and credit history (for example, CIBIL history) before issuing a credit card.
  • Banks: examine customer data before granting a loan.
  • Medical Diagnostics: diagnose a patient's illness based on symptoms.
  • Digital Marketing: find effective marketing channels.
  • Insurance Companies: design policies and calculate premiums.
  • Manufacturing Units and Oil Rigs: reduce the risk of equipment failure.
  • Advertising: identify the target audience.
Big Data Analytics:
With the advent of big data analytics, complete sets of data can be used instead of sample data to conduct an analysis.

Big Data analytics helps in:
  • Finding associations in data
  • Predicting future outcomes
  • Performing prescriptive analysis
  • Making data-driven decisions
  • Increasing safety
  • Reducing maintenance costs
  • Preventing failures
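
As an illustration of "finding associations in data", the sketch below counts how often pairs of items appear together, which is the basic idea behind retail affinity detection. It is a simplified Python example: the baskets are hypothetical, and real analytics tools use more elaborate algorithms over far larger datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of all baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
best_pair, best_support = max(support.items(), key=lambda kv: kv[1])
print(best_pair, best_support)  # ('bread', 'butter') 0.75
```

Because the full set of baskets is scanned rather than a sample, this also reflects the point above about analysing complete data sets.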
Traditional technology can be compared with big data technology in the following ways:

Traditional Technology:
  • Limited scalability
  • Uses highly parallel processors
  • Data in one place
  • High-end hardware used
  • Uses storage technology, such as SAN

Big Data Technology:
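  • Highly scalable: non-relational stores scale horizontally by adding machines, whereas a traditional RDBMS typically scales vertically by upgrading a single machine
  • Uses distributed processing
  • Data is distributed
  • Commodity hardware used
  • Uses distributed data with data redundancy (replicated copies)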
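
The idea of distributed data with redundancy can be sketched in Python as follows. This is a toy placement scheme written for this post, not how Hadoop or Cassandra actually place data; the node names, the replica count, and the helpers `place` and `readable_after_failure` are all assumptions made for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical commodity machines
REPLICAS = 2  # each record is stored on this many distinct nodes

def place(key, nodes=NODES, replicas=REPLICAS):
    """Pick `replicas` consecutive nodes for a key using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

def readable_after_failure(key, failed_node):
    """A record stays readable if at least one replica is on a live node."""
    return any(node != failed_node for node in place(key))

# Show where a few records would be stored.
for key in ["order-1", "order-2", "order-3"]:
    print(key, place(key))

# Losing any single node never loses a record, because the replicas
# of each record sit on distinct nodes.
print(all(readable_after_failure("order-1", n) for n in NODES))  # True
```

With data in one place, a single hardware failure can make everything unavailable; spreading replicated copies across inexpensive machines trades extra storage for fault tolerance.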
