<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2803979945699110663</id><updated>2011-11-27T16:39:48.729-08:00</updated><category term='maven'/><category term='hadoop'/><title type='text'>Maven Plugin for hadoop</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://maven-hadoop.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2803979945699110663/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://maven-hadoop.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Kay Kay</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2803979945699110663.post-5680648159392780449</id><published>2010-02-26T21:17:00.001-08:00</published><updated>2010-02-26T21:18:19.415-08:00</updated><title type='text'>Maven for Hadoop - 0.20.1</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;Version 0.20.0 was the first release of the plugin. It had a nasty bug: the hadoop:pack goal pulled in only the first-level dependencies and skipped the transitive dependencies in the dependency graph. 
&lt;br /&gt;&lt;br /&gt;That should be fixed in 0.20.1, which has since been published to the central Maven repository. As before, declare the plugin as follows to use it. &lt;br /&gt;&lt;br /&gt;&lt;pre class='brush: xml'&gt;&amp;lt;plugin&amp;gt; &lt;br /&gt;&amp;lt;groupId&amp;gt;com.github.maven-hadoop.plugin&amp;lt;/groupId&amp;gt;&lt;br /&gt;&amp;lt;artifactId&amp;gt;maven-hadoop-plugin&amp;lt;/artifactId&amp;gt;&lt;br /&gt;&amp;lt;version&amp;gt;0.20.1&amp;lt;/version&amp;gt;&lt;br /&gt;&amp;lt;configuration&amp;gt;&lt;br /&gt;&amp;lt;hadoopHome&amp;gt;/opt/software/hadoop&amp;lt;/hadoopHome&amp;gt;&lt;br /&gt;&amp;lt;/configuration&amp;gt;&lt;br /&gt;&amp;lt;/plugin&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2803979945699110663/posts/default/5680648159392780449'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2803979945699110663/posts/default/5680648159392780449'/><link rel='alternate' type='text/html' href='http://maven-hadoop.blogspot.com/2010/02/maven-for-hadoop-0201.html' title='Maven for Hadoop - 0.20.1'/><author><name>Kay Kay</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-2803979945699110663.post-8751364248291296758</id><published>2010-01-29T16:37:00.000-08:00</published><updated>2010-02-08T12:49:09.559-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='maven'/><category scheme='http://www.blogger.com/atom/ns#' term='hadoop'/><title type='text'>Maven plugin for hadoop - 0.20.0 released</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;As anyone who has worked with M-R jobs in the Hadoop framework knows, the job jar has to be wrapped up as a single file and submitted to the master (JobTracker), which is responsible for propagating it to the slaves (TaskTrackers) that perform the job.&lt;br /&gt;&lt;br /&gt;I used to use a glorified shell script that bundled the project class files and the dependencies together to create it. In the Ant world it is relatively simple to write the packing script by hand, but with mvn it gets a bit trickier unless you fall back on mvn exec:exec. And increasingly I find mvn far easier than Ant for bootstrapping a project, with all due respect to the latter. &lt;br /&gt;&lt;br /&gt;So I wrote this initial goal, pack, which creates a single jar file along with the resources of the project. &lt;br /&gt;&lt;br /&gt;The dependencies of the project are placed in the ./lib directory of the jar, where M-R picks them up on the classpath of the job when it starts. The alternative is to flatten out the dependency jars and stitch them all together with the project resources, which is not exactly intuitive or comfortable. &lt;br /&gt;&lt;br /&gt;Warning: This comes with absolutely NO WARRANTY whatsoever and is released under the Apache License. The mojo is just a glorified script with minimal error checking and has a lot of room for improvement. So use it at your own risk! 
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Installation&lt;/h3&gt;&lt;br /&gt;&lt;pre class='brush: xml'&gt;&amp;lt;plugin&amp;gt; &lt;br /&gt;&amp;lt;groupId&amp;gt;com.github.maven-hadoop.plugin&amp;lt;/groupId&amp;gt;&lt;br /&gt;&amp;lt;artifactId&amp;gt;maven-hadoop-plugin&amp;lt;/artifactId&amp;gt;&lt;br /&gt;&amp;lt;version&amp;gt;0.20.0&amp;lt;/version&amp;gt;&lt;br /&gt;&amp;lt;configuration&amp;gt;&lt;br /&gt;&amp;lt;hadoopHome&amp;gt;/opt/software/hadoop&amp;lt;/hadoopHome&amp;gt;&lt;br /&gt;&amp;lt;/configuration&amp;gt;&lt;br /&gt;&amp;lt;/plugin&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Set hadoopHome to the location of the Hadoop installation that you use.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Usage&lt;/h3&gt;&lt;br /&gt;Currently a single goal, pack, is available. It creates the jar file to be submitted to the Hadoop job engine.&lt;br /&gt;&lt;br /&gt;&lt;pre class='brush: bash'&gt;$ mvn hadoop:pack&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The jar contains a directory called ./lib, which holds the dependency artifacts of the current project, alongside the classes of the current project itself.&lt;br /&gt;&lt;br /&gt;The jar is written to $basedir/target/hadoop-deploy/ as ${ant.project.name}-hdeploy.jar.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Caveat:&lt;/b&gt; The dependencies copied to lib in the jar file are a subset of the project dependencies. To be clear, they are the full list of transitive dependencies minus Hadoop and Hadoop's own transitive dependencies, since those are already on the classpath when hadoop RunJar is launched. &lt;br /&gt;&lt;br /&gt;So the jars in the lib directory of $basedir/target/hadoop-deploy/${ant.project.name}-hdeploy.jar are (A - B): the dependencies of the current project (A) minus the transitive dependencies of Hadoop (B), which avoids classpath pollution. 
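&lt;br /&gt;&lt;br /&gt;The pack goal above is run by hand. As a minimal sketch, and assuming the pack mojo can be bound to a lifecycle phase the way most Maven goals can (this has not been verified for this plugin, and the execution id hadoop-pack is made up for illustration), the plugin declaration could be extended so that the jar is also built on every mvn package:&lt;br /&gt;&lt;br /&gt;&lt;pre class='brush: xml'&gt;&amp;lt;plugin&amp;gt;&lt;br /&gt;&amp;lt;groupId&amp;gt;com.github.maven-hadoop.plugin&amp;lt;/groupId&amp;gt;&lt;br /&gt;&amp;lt;artifactId&amp;gt;maven-hadoop-plugin&amp;lt;/artifactId&amp;gt;&lt;br /&gt;&amp;lt;version&amp;gt;0.20.0&amp;lt;/version&amp;gt;&lt;br /&gt;&amp;lt;configuration&amp;gt;&lt;br /&gt;&amp;lt;hadoopHome&amp;gt;/opt/software/hadoop&amp;lt;/hadoopHome&amp;gt;&lt;br /&gt;&amp;lt;/configuration&amp;gt;&lt;br /&gt;&amp;lt;executions&amp;gt;&lt;br /&gt;&amp;lt;!-- Assumption: the pack goal supports phase binding; hadoop-pack is a hypothetical id --&amp;gt;&lt;br /&gt;&amp;lt;execution&amp;gt;&lt;br /&gt;&amp;lt;id&amp;gt;hadoop-pack&amp;lt;/id&amp;gt;&lt;br /&gt;&amp;lt;phase&amp;gt;package&amp;lt;/phase&amp;gt;&lt;br /&gt;&amp;lt;goals&amp;gt;&lt;br /&gt;&amp;lt;goal&amp;gt;pack&amp;lt;/goal&amp;gt;&lt;br /&gt;&amp;lt;/goals&amp;gt;&lt;br /&gt;&amp;lt;/execution&amp;gt;&lt;br /&gt;&amp;lt;/executions&amp;gt;&lt;br /&gt;&amp;lt;/plugin&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If the binding does not behave as expected, running mvn hadoop:pack directly remains the documented route.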
&lt;br /&gt;&lt;br /&gt;Once created, the jar can be submitted with the hadoop jar command:&lt;br /&gt;&lt;br /&gt;&lt;pre class='brush: bash'&gt;$ $HADOOP_HOME/bin/hadoop jar $basedir/target/hadoop-deploy/${ant.project.name}-hdeploy.jar job.launching.mainClass&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maven-hadoop.blogspot.com/feeds/8751364248291296758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://maven-hadoop.blogspot.com/2010/01/maven-plugin-for-hadoop-0200-released.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2803979945699110663/posts/default/8751364248291296758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2803979945699110663/posts/default/8751364248291296758'/><link rel='alternate' type='text/html' href='http://maven-hadoop.blogspot.com/2010/01/maven-plugin-for-hadoop-0200-released.html' title='Maven plugin for hadoop - 0.20.0 released'/><author><name>Kay Kay</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
