<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Abhishek Tiwari&#039;s Technology Blog &#187; Hive</title>
	<atom:link href="http://www.atiwari.com/articles/tag/hive/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.atiwari.com/articles</link>
	<description>My thoughts and opinions about life and technology</description>
	<lastBuildDate>Mon, 03 Oct 2011 22:43:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Apache Hive Installation</title>
		<link>http://www.atiwari.com/articles/2009/08/apache-hive-installation/</link>
		<comments>http://www.atiwari.com/articles/2009/08/apache-hive-installation/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 12:11:32 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Distributed Computing]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Hive]]></category>
		<category><![CDATA[Hive Installation]]></category>
		<category><![CDATA[Hive running on Single Node Cluster Hadoop]]></category>
		<category><![CDATA[Running Hive]]></category>

		<guid isPermaLink="false">http://www.atiwari.com/articles/?p=48</guid>
		<description><![CDATA[Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data and it also provides a simple query language called Hive QL which is based [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://hadoop.apache.org/hive/"><img class="alignright" src="http://hadoop.apache.org/hive/images/hive_logo_medium.jpg" alt="" width="189" height="81" /></a></p>
<p><a href="http://hadoop.apache.org/hive/">Hive</a> is a data warehouse infrastructure built on top of <a href="http://hadoop.apache.org/">Hadoop</a> that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL query this data. At the same time, the language lets traditional map/reduce programmers plug in their custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.</p>
<p>Installation of Hive is pretty straightforward and easy. With the least chit-chat, I will get down to business for ya!</p>
<h2>Prerequisites</h2>
<h3>Sun Java 6</h3>
<p>Hadoop requires Sun Java 5.0.x. However, the Hive wiki lists Sun Java 6.0 as a prerequisite, so we will stick to Sun Java 6.0.</p>
<h3>Hadoop (0.17.x &#8211; 0.19.x)</h3>
<p>We must have Hadoop already up and running (support for 0.20.x is still in progress, so 0.17.x to 0.19.x is preferable)! If you don&#8217;t have Hadoop installed yet, deploy it by going through the following tutorials:</p>
<p>Single Node Cluster Hadoop Installation: <a href="http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)">http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)</a><br />
Multi Node Cluster Hadoop Installation: <a href="http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)">http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)</a></p>
<p>I would have written a guide for Hadoop installation, but I find Michael&#8217;s tutorials really easy for anyone to follow and get going with Hadoop! So if you haven&#8217;t installed Hadoop, that&#8217;s the place to learn and do it, fellas!</p>
<p>Note: For the purposes of this tutorial, we will be referring to a Single Node Hadoop installation.</p>
<h3>SVN</h3>
<p>SVN, aka Subversion, is an open source version control system. Most Apache projects are hosted on SVN, so it&#8217;s a good idea to have it on your system if you don&#8217;t already.</p>
<p>For this tutorial, you will need it to grab the code from the Hive SVN repository.</p>
<p>Download it from: <a href="http://subversion.tigris.org/">http://subversion.tigris.org/</a></p>
<h3>Ant</h3>
<p>Ant, or Apache Ant, is a Java-based build tool. In the present context, you will need it to build the &#8216;checked out&#8217; Hive code.</p>
<p>Download it from: <a href="http://ant.apache.org/">http://ant.apache.org/</a></p>
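<p>Before moving on, it can help to confirm that the prerequisites above are actually reachable from your shell. The following is a minimal sketch, not an official check: the function name <code>missing_tools</code> is made up for illustration, and it only tests whether each command is on your PATH, not which version is installed (run <code>java -version</code> for that). Hadoop may also be located through HADOOP_HOME instead of PATH, so a reported miss for hadoop is not necessarily a problem.</p>

```shell
# Print the name of every listed command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of Hive or Hadoop.)
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# The four prerequisites named above; no output means all were found.
missing_tools java hadoop svn ant
```

<p>Anything printed by the last line is a tool you still need to install or add to your PATH.</p>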
<h2>Downloading and Building Hive</h2>
<p>Hive is available via SVN at: <a href="http://svn.apache.org/repos/asf/hadoop/hive/trunk">http://svn.apache.org/repos/asf/hadoop/hive/trunk</a></p>
<h3>We will first check out Hive&#8217;s code</h3>
<blockquote><p>svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive</p></blockquote>
<p>This will put the contents of Hive trunk (Hive&#8217;s development repository) in your local &#8216;hive&#8217; directory.</p>
<h3>Now, we will build the downloaded code</h3>
<blockquote><p>cd hive</p>
<p>ant -Dhadoop.version="&lt;your-hadoop-version&gt;" package</p></blockquote>
<p><strong>For example</strong></p>
<blockquote><p>ant -Dhadoop.version="0.19.2" package</p></blockquote>
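<p>If you are not sure which Hadoop version you are running, you can read it from the output of <code>hadoop version</code> rather than typing it by hand. This is only a sketch: the helper name <code>parse_hadoop_version</code> is my own, and it assumes the first line of that output has the form &#8220;Hadoop 0.19.2&#8221;. The example feeds it sample text so it runs even without Hadoop on the PATH.</p>

```shell
# Extract the version number from the first line of `hadoop version` output.
# Assumption: that line looks like "Hadoop 0.19.2".
# (parse_hadoop_version is a hypothetical helper name.)
parse_hadoop_version() {
  awk 'NR == 1 { print $2 }'
}

# Sample text standing in for real `hadoop version` output.
printf 'Hadoop 0.19.2\n' | parse_hadoop_version
```

<p>With Hadoop on your PATH, the build command could then be written as <code>ant -Dhadoop.version="$(hadoop version | parse_hadoop_version)" package</code>.</p>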
<h3>Your built code is now in the build/dist directory</h3>
<blockquote><p>cd build/dist<br />
ls</p></blockquote>
<p>Running &#8216;ls&#8217; shows the following contents:</p>
<blockquote><p>README.txt<br />
bin/ (all the shell scripts)<br />
lib/ (required jar files)<br />
conf/ (configuration files)<br />
examples/ (sample input and query files)</p></blockquote>
<p>The &#8220;build/dist/&#8221; directory is your Hive installation; from here on, we are going to call it Hive Home.</p>
<h3>Let us set an environment variable for our Hive Home too:</h3>
<blockquote><p>export HIVE_HOME=&lt;some path&gt;/build/dist</p></blockquote>
<p><strong>For example</strong></p>
<blockquote><p>export HIVE_HOME=/data/build/dist</p></blockquote>
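<p>A quick sanity check can catch a mistyped path here. The sketch below assumes the build/dist layout listed earlier (bin/, lib/ and conf/) and the bin/hive script used to start the CLI; the function name <code>looks_like_hive_home</code> is hypothetical.</p>

```shell
# Succeed only if the directory has the layout described above:
# bin/, lib/ and conf/ subdirectories plus the bin/hive script.
# (looks_like_hive_home is a hypothetical helper name.)
looks_like_hive_home() {
  [ -n "$1" ] && [ -d "$1/bin" ] && [ -d "$1/lib" ] \
    && [ -d "$1/conf" ] && [ -f "$1/bin/hive" ]
}

if looks_like_hive_home "${HIVE_HOME:-}"; then
  echo "HIVE_HOME looks good: $HIVE_HOME"
else
  echo "HIVE_HOME is unset or is not a Hive build: ${HIVE_HOME:-unset}"
fi
```

<p>Note that the export only lasts for the current shell session; add the export line to your shell startup file if you want it to persist.</p>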
<h2>Hadoop Side Changes</h2>
<h3><strong>Hive uses Hadoop, which means:</strong></h3>
<p>1. you must have hadoop in your path OR<br />
2. export HADOOP_HOME=&lt;hadoop-install-dir&gt;</p>
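<p>To see which of the two options applies on your machine, you can resolve the Hadoop launcher yourself. This is a sketch of the lookup the two options imply, not a copy of what Hive&#8217;s own scripts do; the function name <code>find_hadoop</code> and the choice to try HADOOP_HOME before PATH are my assumptions.</p>

```shell
# Print the path of the hadoop launcher, trying HADOOP_HOME first and
# falling back to PATH. Returns non-zero if neither provides one.
# (find_hadoop and the lookup order are assumptions for illustration.)
find_hadoop() {
  if [ -n "${HADOOP_HOME:-}" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "$HADOOP_HOME/bin/hadoop"
  else
    command -v hadoop
  fi
}

find_hadoop || echo "hadoop not found: add it to PATH or export HADOOP_HOME"
```

<p>If this prints a path, at least one of the two conditions above is satisfied.</p>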
<p><strong>In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) in HDFS and make them group-writable (chmod g+w) before a table can be created in Hive.</strong></p>
<h3><strong>Commands to perform these changes</strong></h3>
<blockquote><p>$HADOOP_HOME/bin/hadoop fs -mkdir       /tmp<br />
$HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse<br />
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp<br />
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse</p></blockquote>
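<p>If you set up more than one machine, the four commands above can be wrapped in a small function. This is just a convenience sketch: the name <code>prepare_hive_dirs</code> is hypothetical, and the launcher is passed as an argument so that you can hand it <code>echo</code> for a dry run that prints the commands instead of touching HDFS.</p>

```shell
# Run the four HDFS setup commands through the launcher given as $1.
# Pass "echo" as the launcher to print the commands instead of running them.
# (prepare_hive_dirs is a hypothetical helper name.)
prepare_hive_dirs() {
  launcher=$1
  for dir in /tmp /user/hive/warehouse; do
    "$launcher" fs -mkdir "$dir"
  done
  for dir in /tmp /user/hive/warehouse; do
    "$launcher" fs -chmod g+w "$dir"
  done
}

# Dry run: show what would be executed.
prepare_hive_dirs echo
```

<p>Once the dry run looks right, call it for real with <code>prepare_hive_dirs "$HADOOP_HOME/bin/hadoop"</code>.</p>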
<h2>Running Hive</h2>
<p><strong>Now you are all set to run Hive for yourself! Invoke the command line interface (CLI) from the shell:</strong></p>
<blockquote><p>$HIVE_HOME/bin/hive</p></blockquote>
<p><strong>Note: You can also read about <a href="http://wiki.apache.org/hadoop/Hive/GettingStarted">Hive Installation at Hive&#8217;s wiki</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.atiwari.com/articles/2009/08/apache-hive-installation/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

