
What is BIG DATA ANALYTICS?

Big data analytics is the process of examining large and varied data sets -- i.e., big data (black-box data, social media data, stock exchange data, search engine data, transport data, power-grid data) -- to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more informed business decisions. Volume, Variety and Velocity are the three V's of big data.

A person who does this kind of job is called a “Data Analyst”. To earn that tag from a large group of people, he or she needs to be very good at statistics; and if that is accompanied by software development skills, he or she would be called a “Data Scientist”.

How do you become a renowned and efficient data scientist?

Technical skills the person needs to be good at:
  • Statistical methods and packages (e.g. SPSS)
  • R and/or SAS languages
  • Data warehousing and business intelligence platforms
  • SQL databases and database querying languages
  • Programming (e.g. XML, JavaScript or ETL frameworks)
  • Database design
  • Data mining
  • Data cleaning and munging
  • Data visualization and reporting techniques
  • Working knowledge of Hadoop and MapReduce
  • Machine learning techniques
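Of the skills above, data cleaning and munging is often where analysts spend most of their time. As a minimal sketch (the CSV data, column names and cleaning rules here are made up for illustration), cleaning might mean trimming whitespace, normalizing case and dropping records with missing values before any analysis starts:

```python
import csv
import io

# Hypothetical messy input: inconsistent casing, stray spaces, a missing age.
raw = """name,age,city
 Alice ,34,Delhi
BOB,,Mumbai
carol,29, Pune
"""

def clean_rows(text):
    """Return tidy records: stripped fields, title-cased names, numeric ages."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {k: v.strip() for k, v in row.items()}  # trim stray whitespace
        if not row["age"]:                            # drop incomplete records
            continue
        row["name"] = row["name"].title()             # normalize casing
        row["age"] = int(row["age"])                  # coerce to a number
        rows.append(row)
    return rows

print(clean_rows(raw))
```

Real projects would use a library such as pandas for this, but the steps -- trim, validate, normalize, convert -- stay the same.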

Weapons they need to be aware of:

And when we talk about big data analytics, Hadoop is something we can’t afford to miss. Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware using the MapReduce programming model. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. It is currently managed by the Apache Software Foundation, a global community of software developers and contributors.
Currently, four core modules are included in the basic framework from the Apache Foundation:
Hadoop Common – the libraries and utilities used by other Hadoop modules.
Hadoop Distributed File System (HDFS) – the Java-based scalable system that stores data across multiple machines without prior organization.
YARN – (Yet Another Resource Negotiator) provides resource management for the processes running on Hadoop.
MapReduce – a parallel processing software framework comprising two steps. In the map step, a master node takes the input, partitions it into smaller sub-problems and distributes them to worker nodes. In the reduce step, the answers to all the sub-problems are collected and combined to produce the output.
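The map and reduce steps above can be sketched on a single machine. This is not Hadoop itself -- a real cluster distributes these phases across many nodes -- just a toy word-count illustration of the model:

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: turn each input into (key, value) pairs -- here (word, 1)."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: combine all values that share the same key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def word_count(documents):
    # On a cluster, a shuffle/sort phase routes each key to one reducer;
    # here the dict grouping plays that role.
    return reduce_phase(map_phase(documents))

docs = ["big data needs big tools", "data beats opinion"]
print(word_count(docs))  # 'big' and 'data' each appear twice
```

Word count is the canonical MapReduce example because the map output (one pair per word) and the reduce logic (summing per key) are both trivially parallelizable.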
Operational data: a “NoSQL” database provides a mechanism for storing and retrieving this kind of data.
MongoDB and Cassandra are examples of NoSQL databases (both free and open-source).
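What makes a document store like MongoDB different from a SQL table is that records (documents) need not share a fixed schema. The toy in-memory store below is not the MongoDB API -- the class and its methods are invented for illustration -- but it shows the document model: insert free-form records, then query by field values:

```python
class DocumentStore:
    """A toy in-memory document store illustrating the NoSQL document model."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # No schema check: any set of fields is accepted.
        self._docs.append(dict(doc))

    def find(self, **query):
        # Return every document whose fields match the query exactly.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]

db = DocumentStore()
db.insert({"user": "alice", "follows": ["bob"], "plan": "free"})
db.insert({"user": "bob", "plan": "pro"})  # different fields: no fixed schema
print(db.find(plan="pro"))                 # → [{'user': 'bob', 'plan': 'pro'}]
```

Real document databases add indexing, persistence and distribution on top of this basic insert/query idea.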
Jaspersoft BI Suite: it can generate reports from the databases.
Karmasphere Studio and Analyst: an IDE that makes creating and running Hadoop jobs easy. Its tools, like Karmasphere Analyst, are designed to simplify the process of plowing through all of the data in a Hadoop cluster. It has subroutines for uncompressing zipped log files; it then strings them together and parameterizes the Hive calls to produce a table of output for perusal.
And there are many other tools you can put to use as you find convenient.
Remember, it’s not just about learning these tools; applying them matters more, and that depends on your own creativity and thinking skills. Self-help tutorials won’t teach you the secret sauce, like which statistics to consider and which to ignore, so stay in touch with some analytics professionals (and with the social media platforms flooding the internet nowadays, that’s possible!).

Want to learn it? Check out these links:
(If there exist any other helpful resource not mentioned here please write in the comment section.)







