1 Sai Prasad Potharaju, 2 Shanmuk Srinivas A, 3 Ravi Kumar Tirandasu
1,2,3 SRES COE, Department of Computer Engineering, Kopargaon, Maharashtra, India
1 psaiprasadcse@gmail.com
Abstract
Hadoop is a framework of tools rather than a single piece of software that you download onto your computer. These tools are used to run applications on big data, that is, data that is huge in volume, must be processed quickly, and can come in a variety of forms. To manage big data, Hive is used as a data warehouse system for Hadoop that facilitates ad-hoc queries and the analysis of large datasets stored in Hadoop. Hive provides a SQL-like language called HiveQL. In this paper we explain how to use Hive on Hadoop with a simple real-world example, and show how to create a table, load data into the table from an external file, and retrieve data from the table, along with related statistics such as the CPU time for each stage of query execution, the cumulative CPU time, and the time taken to fetch records.
Key Words: Hadoop, Hive, MapReduce, HDFS, HiveQL
1. INTRODUCTION
1.1. Hadoop
Hadoop is open source and is distributed under the Apache license. It is a framework of tools rather than a single piece of software that you download. These tools are used to run applications on big data. Big data is characterized by its volume, its speed, and its variety of forms (including unstructured data). In the traditional approach, big data is processed on a single powerful computer, but such a computer does a good job only up to a limit, because it is not scalable: its processing capacity is bounded by its processor (core) type and speed and by its memory capacity.
Hadoop takes a different approach (Fig. 1) from the traditional one. It breaks the data into smaller pieces.
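As a rough illustration of the splitting idea only, the following Python sketch cuts a small in-memory list of records into fixed-size pieces. It is not how Hadoop itself stores or distributes data; the function name `split_into_pieces`, the sample records, and the piece size of 4 are all hypothetical choices made for this example.

```python
def split_into_pieces(records, piece_size):
    """Split a list of records into consecutive pieces of at most piece_size items.

    Illustrative helper only (hypothetical name); it is not part of Hadoop.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    # Step through the list in strides of piece_size and slice out each piece.
    return [records[i:i + piece_size] for i in range(0, len(records), piece_size)]


# Ten made-up records stand in for a large dataset.
records = [f"record-{n}" for n in range(10)]
pieces = split_into_pieces(records, piece_size=4)

for index, piece in enumerate(pieces):
    print(index, piece)
```

Each piece is small enough to be handled separately, and putting the pieces back together in order reproduces the original data.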
Breaking the data into smaller pieces is a good idea, but what about the computation?
The computation is also broken into pieces, i.e., it is processed at different levels/nodes and