Big data analytics is the process of examining large and varied data sets (big data such as black box data, social media data, stock exchange data, search engine data, transport data and power grid data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make better-informed business decisions. Volume, Variety and Velocity are the three V’s of Big Data.
A person who does this kind of work is called a “Data Analyst”. To earn that title, he or she needs to be very good at statistics; with software development skills on top of that, he or she would be called a “Data Scientist”.
How to be a renowned and efficient data scientist?
The technical skills this person needs to be good at are:
- Statistical methods and packages (e.g. SPSS)
- R and/or SAS languages
- Data warehousing and business intelligence platforms
- SQL databases and database querying languages
- Programming (e.g. XML, JavaScript or ETL frameworks)
- Database design
- Data mining
- Data cleaning and munging
- Data visualization and reporting techniques
- Working knowledge of Hadoop & MapReduce
- Machine learning techniques
Weapons they need to be aware of:
When we talk about big data analytics, Hadoop is something we can’t afford to miss.
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware using the MapReduce programming model. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. It is currently managed by the Apache Software Foundation, a global community of software developers and contributors.
Currently, four core modules are included in the basic framework from the Apache Foundation:
- Hadoop Common: the libraries and utilities used by other Hadoop modules.
- Hadoop Distributed File System (HDFS): the Java-based scalable file system that stores data across multiple machines without prior organization.
- YARN (Yet Another Resource Negotiator): provides resource management for the processes running on Hadoop.
- MapReduce: a parallel processing software framework. It consists of two steps. In the map step, a master node takes the input, partitions it into smaller subproblems and distributes them to worker nodes. In the reduce step, the master node takes the answers to all of the subproblems and combines them to produce the output.
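To make the two MapReduce steps concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop at all: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are made up for illustration, and everything runs in a single process instead of on a cluster. It only shows the shape of the idea: split the input, process each piece independently, then combine the partial answers.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all partial counts for one word."""
    return key, sum(values)

def word_count(chunks):
    # On a real cluster each chunk would be handled by a different worker node;
    # here they are simply processed one after another.
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input, already partitioned into two chunks.
chunks = ["big data needs big storage", "big clusters process data"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped without looking at any other chunk, the map work can be spread over as many machines as there are chunks, which is where the parallelism comes from.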
Operational Data: A “NoSQL” database provides a mechanism for storing and retrieving data that is not modeled as the tables of a relational database. MongoDB and Cassandra are examples of NoSQL databases (both free and open-source).
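As a rough illustration of what "not modeled as tables" can mean, the sketch below imitates a document-style store with plain Python dictionaries. This is not the API of MongoDB or Cassandra; the records, field names and the `find` helper are all hypothetical. The point is only that two records in the same collection do not have to share the same set of fields.

```python
import json

# Two hypothetical customer records; the second has a nested field the first lacks.
documents = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ravi", "orders": [{"item": "book", "qty": 2}]},
]

def find(docs, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

matches = find(documents, name="Ravi")
print(json.dumps(matches, indent=2))
```

In a relational table, adding the `orders` field would usually mean changing the schema or adding a second table; here it is just one more key on one record.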
Jaspersoft BI Suite: It can generate reports from databases.
Karmasphere Studio and Analyst: Karmasphere Studio is an IDE that makes creating and running Hadoop jobs easy. Karmasphere Analyst is a companion tool designed to simplify the process of plowing through all of the data in a Hadoop cluster. It has subroutines for uncompressing zipped log files; it then strings them together and parameterizes the Hive calls to produce a table of output for perusing.
There are many other tools that you can put to use as you see fit.
Remember, it’s not just about learning the tools; applying them matters more, and that depends on your own creativity and thinking skills. Self-help tutorials won’t teach you the secret sauce, such as which statistics to consider and which to ignore, so stay in touch with analytics professionals (with today’s abundance of social media platforms, that is easy to do!).
Want to learn it? Check out these links:
- https://www.analyticsvidhya.com/blog/2015/07/big-data-analytics-youtube-ted-resources/#
- https://www.coursera.org/learn/big-data-introduction
- https://www.ngdata.com/big-data-analysis-resources/
(If there are any other helpful resources not mentioned here, please share them in the comment section.)