Friday, June 1, 2012

Big Data Analytics - Techniques and Trends …


   Making sense out of BIG DATA
Alright! We now have tonnes of information about Big Data. The question is, how do enterprises make sense of it? Let us explore the data analysis techniques that are either 1) most commonly used by companies across industries or 2) relatively new but showing strong growth potential. Through a series of posts, we will touch upon these techniques. The idea is to get familiar with the buzzwords around Big Data.


Although there is a buzz around “Advanced Analytics” these days for Big Data analysis, researchers claim that these techniques are mostly built upon the fundamentals of “Business Intelligence” or “BI”. So, setting aside the tweaks, customizations and modifications for the moment, let us grasp the basics first.

BI encompasses a set of computer-based methodologies that help analyze and report on large amounts of ‘structured’ or ‘unstructured’ data. Is this something new? Apparently not: businesses have long used it to support activities like decision making, prediction and number crunching. Check out this marketing video by a company called Avitas, which gives an idea of BI and its prospects: http://goo.gl/blKTe

However, the context in which these techniques are used is changing: they are now being applied to Big Data, which is just data after all!

Here are some known techniques under BI:

1. OLAP – Online Analytical Processing:
A data retrieval process used for structured databases, more commonly known as data warehouses. The major focus of this technique is to query and effectively combine data from multiple sources or dimensions, aggregated in one structure. Commonly used are OLAP cubes, which combine, analyze and present data along several dimensions (three in the classic picture of a cube). A typical query would read: sales of a company’s product x in region y for period z, where x, y and z are values picked from the product, region and period dimensions respectively.
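To make the cube idea a little more concrete, here is a minimal sketch in plain Python. It is not how a real OLAP engine works internally; the fact rows, the dimension names and the helpers `roll_up` and `slice_cube` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, period, sales amount).
FACTS = [
    ("x", "north", "2012-Q1", 100),
    ("x", "north", "2012-Q2", 150),
    ("x", "south", "2012-Q1", 80),
    ("w", "north", "2012-Q1", 60),
]

# The three dimensions of this toy cube, in column order.
DIMENSIONS = ("product", "region", "period")


def roll_up(facts, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


def slice_cube(facts, **fixed):
    """Keep only the rows whose dimensions match the given values."""
    return [
        row for row in facts
        if all(row[DIMENSIONS.index(d)] == v for d, v in fixed.items())
    ]


# Sales per (product, region), with the period dimension aggregated away.
by_product_region = roll_up(FACTS, ("product", "region"))

# "Sales of product x in region north for period 2012-Q1".
x_north_q1 = slice_cube(FACTS, product="x", region="north", period="2012-Q1")
```

Here `roll_up` plays the role of aggregating along a dimension, and `slice_cube` picks out one cell of the cube, which is the "product x in region y for period z" query described above.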

2. Data Mining:
A methodology used to extract patterns from large datasets by combining methods from statistics and machine learning with database management. Examples of usage might include mining customer data to determine segments most likely to respond to an offer, mining human resources data to identify characteristics of most successful employees, or market basket analysis to model the purchase behavior of customers.

Drilling further into this category, the following are methods that are used independently or in conjunction with one another to analyze data or, by extension, ‘Big Data’:

- Association rule learning
A technique for discovering interesting relationships, i.e., “association rules,” among variables in large databases, using a family of algorithms. One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). You can refer to the Forbes article about the IBM computing that brought about that discovery here: http://goo.gl/UNIFS
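As a rough illustration of market basket analysis, the sketch below counts how often pairs of items appear together and derives the two usual measures for a rule "A implies B": support (share of baskets containing both) and confidence (share of baskets with A that also contain B). The baskets and the function name `pair_rules` are invented for this example, and real tools use more scalable algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules 'a implies b'."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


rules = pair_rules(BASKETS, min_support=0.5)
```

With this toy data only the diapers and beer pair clears the support threshold: every basket with beer also has diapers, while two out of three baskets with diapers also have beer.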

- Cluster analysis
A method for dividing a diverse group of objects into smaller groups of ‘seemingly’ similar objects, where the characteristics that define similarity are not known in advance. An example of cluster analysis is segmenting consumers into self-similar groups based on their behavior for targeted marketing, for instance recommending to a customer a movie that was bought or liked by another customer in the same group. It is almost the opposite of simple ‘classification’, up next!
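One well-known way to do this kind of grouping is k-means, sketched below. This is only an illustration: the post does not name a specific algorithm, and the customer data (visits per month, average spend), the starting centroids and the function name `kmeans` are assumptions made for the example.

```python
# Hypothetical customers described by (visits per month, average spend).
CUSTOMERS = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(col) / len(col) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Starting centroids are fixed here so that the run is repeatable.
centroids, clusters = kmeans(CUSTOMERS, [(1, 10), (9, 80)])
```

Note that no labels are given up front: the two groups (low-spend and high-spend customers in this toy data) emerge from the data itself.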

- Classification
This method identifies the categories to which new data points belong, based on a training set of data points that have already been categorized. One application is the prediction of segment-specific customer buying behavior where there is a clear hypothesis or objective outcome.
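A simple way to see classification at work is the k-nearest-neighbours approach, sketched below as one possible method among many. The training data, the labels "occasional" and "frequent", and the function name `classify` are hypothetical.

```python
from collections import Counter

# Hypothetical training set: ((visits per month, average spend), label).
TRAINING = [
    ((1, 10), "occasional"),
    ((2, 12), "occasional"),
    ((1, 11), "occasional"),
    ((9, 80), "frequent"),
    ((10, 85), "frequent"),
    ((8, 78), "frequent"),
]


def classify(point, training, k=3):
    """Label `point` by majority vote among its k nearest training points."""
    by_distance = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(point, item[0])),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


new_customer_a = classify((2, 15), TRAINING)
new_customer_b = classify((9, 70), TRAINING)
```

Unlike clustering, the categories are known in advance here: each new customer is simply assigned to whichever existing category its closest labelled neighbours belong to.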

Dear avid readers! Considering the heaviness of the data dose in this post, we have decided to use a common technique for delivering the most sought-after information effectively (no, it’s not related to Big Data!): it’s simply called a 'sequel'. So keep visiting for the next post, where we will talk a bit more about some other basic techniques and introduce the latest trends in managing BIG DATA, like Hadoop, Mashup and MapReduce.

Sources and references for detailed report and materials:
McKinsey report: http://goo.gl/ycvef
TDWI library reports on BigData: www.Tdwi.org


