<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Analyzing Apache logs with Pig</title>
	<atom:link href="http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/</link>
	<description>Hadoop and Cloudera&#039;s Products and Services</description>
	<lastBuildDate>Fri, 10 Feb 2012 20:11:24 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Dam</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-16339</link>
		<dc:creator>Dam</dc:creator>
		<pubDate>Fri, 16 Sep 2011 15:29:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-16339</guid>
		<description>For my shipped script, I build a self-extracting bash script like this (bundling geostream.pl and its dependencies):
wrapper.sh.template (mode 755):
#!/bin/bash
sed &#039;0,/^__ARCHIVE_BELOW__$/d&#039; $0 &#124; tar xj
export PERL5LIB=$PERL5LIB:$(pwd)
./geostream.pl $1
rm -rf ./*
exit 0

__ARCHIVE_BELOW__

Then I build the script with this command:
cp wrapper.sh.template wrapper.sh; tar cjf - Geo geostream.pl &gt;&gt; wrapper.sh

Then I can use it in my Pig script:
DEFINE iplookup `wrapper.sh $GEO` 
ship (&#039;wrapper.sh&#039;) 
cache(&#039;/GeoIP/$GEO#$GEO&#039;);

The data file /GeoIP/GeoLiteCity.dat is stored in my HDFS storage and copied to the tasks with cache.</description>
		<content:encoded><![CDATA[<p>For my shipped script, I build a self-extracting bash script like this (bundling geostream.pl and its dependencies):<br />
wrapper.sh.template (mode 755):<br />
#!/bin/bash<br />
sed '0,/^__ARCHIVE_BELOW__$/d' $0 | tar xj<br />
export PERL5LIB=$PERL5LIB:$(pwd)<br />
./geostream.pl $1<br />
rm -rf ./*<br />
exit 0</p>
<p>__ARCHIVE_BELOW__</p>
<p>Then I build the script with this command:<br />
cp wrapper.sh.template wrapper.sh; tar cjf - Geo geostream.pl &gt;&gt; wrapper.sh</p>
<p>Then I can use it in my Pig script:<br />
DEFINE iplookup `wrapper.sh $GEO`<br />
ship ('wrapper.sh')<br />
cache('/GeoIP/$GEO#$GEO');</p>
<p>The data file /GeoIP/GeoLiteCity.dat is stored in my HDFS storage and copied to the tasks with cache.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: androm</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-16149</link>
		<dc:creator>androm</dc:creator>
		<pubDate>Mon, 11 Jul 2011 21:02:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-16149</guid>
		<description>Hi, I was able to run this in local mode, but when I tried to run it in Hadoop mode, I received the error message &quot;ERROR 2055: Received Error while processing the map plan: &#039;ipwrapper.sh GeoLiteCity.dat &#039; failed with exit status: 2&quot;

Any ideas? Thanks!</description>
		<content:encoded><![CDATA[<p>Hi, I was able to run this in local mode, but when I tried to run it in Hadoop mode, I received the error message &#8220;ERROR 2055: Received Error while processing the map plan: 'ipwrapper.sh GeoLiteCity.dat ' failed with exit status: 2&#8221;</p>
<p>Any ideas? Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-15222</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Tue, 25 Jan 2011 22:58:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-15222</guid>
		<description>What&#039;s the status of &quot;org.apache.pig.piggybank.filtering&quot;? I don&#039;t see it in Piggybank yet, as of release 0.8.</description>
		<content:encoded><![CDATA[<p>What&#8217;s the status of &#8220;org.apache.pig.piggybank.filtering&#8221;? I don&#8217;t see it in Piggybank yet, as of release 0.8.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pig to analyze Apache Logs &#124; user&#39;s Blog!</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-14047</link>
		<dc:creator>Pig to analyze Apache Logs &#124; user&#39;s Blog!</dc:creator>
		<pubDate>Thu, 14 Oct 2010 12:39:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-14047</guid>
		<description>[...] is an interesting blog post on Cloudera with title Analyzing Apache logs with Pig. The author starts with how to install pig on Ubuntu and goes on to create a world map to visualize [...]</description>
		<content:encoded><![CDATA[<p>[...] is an interesting blog post on Cloudera with title Analyzing Apache logs with Pig. The author starts with how to install pig on Ubuntu and goes on to create a world map to visualize [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tecosystems &#187; A Non-Exhaustive Look at Some Free and Non-Free Web Analytics Packages</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-7757</link>
		<dc:creator>tecosystems &#187; A Non-Exhaustive Look at Some Free and Non-Free Web Analytics Packages</dc:creator>
		<pubDate>Wed, 02 Dec 2009 20:38:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-7757</guid>
		<description>[...] Instead we have pageviews and all of the other metrics with which to draw imperfect conclusions. I expect this to be a temporary blindness, however: too many people want these kinds of answers for the questions to go unanswered for long. And if we can&#8217;t get the answers we need from someone else, we just might take matters into our own hands using tools like Hadoop and Pig. [...]</description>
		<content:encoded><![CDATA[<p>[...] Instead we have pageviews and all of the other metrics with which to draw imperfect conclusions. I expect this to be a temporary blindness, however: too many people want these kinds of answers for the questions to go unanswered for long. And if we can&#8217;t get the answers we need from someone else, we just might take matters into our own hands using tools like Hadoop and Pig. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joshua Barratt</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-2765</link>
		<dc:creator>Joshua Barratt</dc:creator>
		<pubDate>Sun, 28 Jun 2009 18:35:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-2765</guid>
		<description>Great article. I have already been looking at processing web logs on a large scale, and this is a very useful example.

For the &#039;packing and shipping a Perl script&#039; part, check out Par::Packer.

It allows you to turn a Perl script and all its dependencies into a single monolithic script file.</description>
		<content:encoded><![CDATA[<p>Great article. I have already been looking at processing web logs on a large scale, and this is a very useful example.</p>
<p>For the &#8216;packing and shipping a Perl script&#8217; part, check out Par::Packer.</p>
<p>It allows you to turn a Perl script and all its dependencies into a single monolithic script file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: a little pig helping make me famous &#171; spack++</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-2412</link>
		<dc:creator>a little pig helping make me famous &#171; spack++</dc:creator>
		<pubDate>Fri, 19 Jun 2009 03:19:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-2412</guid>
		<description>[...] there you go. Recently, someone ported my stuff (which was awesome!), and folks at cloudera are blogging about [...]</description>
		<content:encoded><![CDATA[<p>[...] there you go. Recently, someone ported my stuff (which was awesome!), and folks at cloudera are blogging about [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Gwartney</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-2397</link>
		<dc:creator>David Gwartney</dc:creator>
		<pubDate>Thu, 18 Jun 2009 19:30:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-2397</guid>
		<description>This is great information. It answered a question I had the other day about streaming inside Pig.

Thanks for the info.

Dave</description>
		<content:encoded><![CDATA[<p>This is great information. It answered a question I had the other day about streaming inside Pig.</p>
<p>Thanks for the info.</p>
<p>Dave</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Analysing Apache logs &#171; Dumbotics</title>
		<link>http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/comment-page-1/#comment-2382</link>
		<dc:creator>Analysing Apache logs &#171; Dumbotics</dc:creator>
		<pubDate>Thu, 18 Jun 2009 12:03:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=808#comment-2382</guid>
		<description>[...] Apache&#160;logs  The Cloudera guys blogged about using Pig for examining Apache logs yesterday. Although it nicely illustrates several [...]</description>
		<content:encoded><![CDATA[<p>[...] Apache&nbsp;logs  The Cloudera guys blogged about using Pig for examining Apache logs yesterday. Although it nicely illustrates several [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

