
Is it Possible to use MySQL for Big Data Analysis?

MySQL is a popular relational database management system (RDBMS) for web applications (think of Twitter) and many other kinds of software. The question here is how far MySQL can stretch: is it possible to use MySQL for Big Data analysis? Even if your instinct says ‘yes’, the definition of Big Data is loose, and that creates a haze of confusion for new learners. MySQL can handle data mining and online analytical processing (OLAP) on moderately sized data sets. So, does that mean we can scale and analyze Big Data using MySQL? There is no definitive answer, but there are two possibilities worth looking at.

The first possibility is to pair MySQL with Hadoop. Big Data is generated continuously and in large volumes, so it needs something bigger, like Hadoop, for storage, management and batch processing. MySQL can be connected to Hadoop through Apache Sqoop, which imports data from an RDBMS into Hadoop and exports it back. HDFS stores the data, and the portion needed for analysis can be passed on to MySQL. For instance, raw metrics can stay in HDFS while summarized data is sent to MySQL for analysis.
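To make the "raw metrics in HDFS, summaries in MySQL" idea concrete, here is a minimal sketch of the summarizing step. It is only an illustration under stated assumptions: the sample records, the `daily_metrics` table and the function names are hypothetical, an in-memory list stands in for files in HDFS, and Python's built-in `sqlite3` stands in for MySQL so that the example runs without a server. In a real setup the transfer itself would be handled by a tool such as Sqoop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw metrics, standing in for records kept in HDFS.
RAW_METRICS = [
    ("2024-01-01", "page_view", 3),
    ("2024-01-01", "page_view", 5),
    ("2024-01-01", "signup", 1),
    ("2024-01-02", "page_view", 4),
]


def summarize(rows):
    """Collapse raw events into one total per (day, metric)."""
    totals = defaultdict(int)
    for day, metric, value in rows:
        totals[(day, metric)] += value
    return sorted((day, metric, total) for (day, metric), total in totals.items())


def load_summary(conn, summary):
    """Write the small summary into a relational table for analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_metrics "
        "(day TEXT, metric TEXT, total INTEGER, PRIMARY KEY (day, metric))"
    )
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO
    # or INSERT ... ON DUPLICATE KEY UPDATE instead.
    conn.executemany("INSERT OR REPLACE INTO daily_metrics VALUES (?, ?, ?)", summary)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
load_summary(conn, summarize(RAW_METRICS))

# The relational side only ever sees the summarized rows.
page_views = conn.execute(
    "SELECT SUM(total) FROM daily_metrics WHERE metric = 'page_view'"
).fetchone()[0]
```

The point of the split is that the relational database holds a handful of summary rows per day, while the bulky raw events never leave the distributed store.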

Secondly, people who plan to store Big Data in MySQL itself have little chance of success, because MySQL has its own limitations at that scale. Sharding is one option for Big Data storage: it spreads the data horizontally across multiple MySQL nodes. Still, it is not a convincing answer, as MySQL lacks built-in parallel query processing.
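For readers new to sharding, the routing idea can be sketched in a few lines. This is not how any particular MySQL deployment does it; the `ShardedStore` class and its methods are hypothetical, plain dictionaries stand in for MySQL nodes, and hashing the key is just one of several possible routing schemes.

```python
import hashlib


class ShardedStore:
    """Toy key-to-shard router; each dict stands in for one MySQL node."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # A stable hash keeps the same key on the same shard across runs.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, row):
        self.shards[self.shard_for(key)][key] = row

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(shard_count=4)
for user_id in ("user-1", "user-2", "user-3", "user-4", "user-5"):
    store.put(user_id, {"id": user_id})
```

Note that with this simple modulo scheme, changing the number of shards moves most keys to a different node, which is one reason sharding takes planning.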

To make this clearer, let’s look at the limitations of MySQL with Big Data.

Poor Memory-focused Search Engine:
The data cached in RAM grows large whenever an application stores huge chunks of data. Many queries are then served from RAM alone, but MySQL lacks a strong memory-focused search engine, so it cannot serve a high volume of fast requests well. Overall, this hampers performance in many cases.

MySQL is Incapable of Dealing with Highly Volatile Data:
Think of online flash sales, where thousands of updates are made to keep the audience well informed. The data changes at a fast rate, and maintaining exact values is critical to the overall success of a campaign. MySQL struggles here because it is designed around transactional semantics, with support for long transactions and durability. The data is safe and is not overwritten, but it is not processed rapidly.

Full Text Searches:
When it comes to handling full text searches, MySQL slips off the podium because it cannot process a query in parallel. As a result, full text searches slow down as the data volume increases.

High Volume Data:
MySQL was originally built around a single node, not around modern data center technology. Today, storing high volume data in MySQL on one node is not practical. You have to resort to sharding, which is to a large extent a manual procedure and therefore affects the applications built on top of it.
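Here is a rough illustration of why manual sharding leaks into the application. Assume, hypothetically, that an `orders` table is split across two shards by order, so one customer's orders can land on different nodes. A total per customer can no longer be a single query; the application has to ask every shard and merge the partial results itself. In-memory `sqlite3` databases stand in for the MySQL shards, and all table, function and customer names are made up for the sketch.

```python
import sqlite3


def make_shard(rows):
    """Create one stand-in shard holding a slice of the orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Hypothetical data: alice has orders on both shards.
SHARDS = [
    make_shard([("alice", 30), ("bob", 20)]),
    make_shard([("carol", 50), ("alice", 10)]),
]


def total_per_customer(shards):
    """Scatter the same query to every shard, then merge in the application."""
    totals = {}
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    for conn in shards:
        for customer, subtotal in conn.execute(query):
            totals[customer] = totals.get(customer, 0) + subtotal
    return totals


totals = total_per_customer(SHARDS)
```

Every report, join or aggregate that crosses shards needs this kind of merge logic, and that is the application-level cost of sharding.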

Clearly, MySQL can be a great RDBMS, but it is not a good fit for scaling to Big Data. Big Data is something big; you need bigger arrangements to work with it.

