<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PHP App Engine &#187; HadoopAndPig</title>
	<atom:link href="http://php-app-engine.com/category/hadoop-pig/feed/" rel="self" type="application/rss+xml" />
	<link>http://php-app-engine.com</link>
	<description>Clouds and NoSQL - by Smart Robot</description>
	<lastBuildDate>Tue, 07 Sep 2010 19:20:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Hadoop for the Lone Analyst, Why and How</title>
		<link>http://php-app-engine.com/2010/hadoop-pig/hadoop-for-the-lone-analyst-why-and-how/</link>
		<comments>http://php-app-engine.com/2010/hadoop-pig/hadoop-for-the-lone-analyst-why-and-how/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 08:21:03 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=813</guid>
		<description><![CDATA[http://blog.tech.stylefeeder.com/2010/01/14/hadoop-for-the-lone-analyst/
Here at StyleFeeder, we spend a lot of time figuring out what our users are doing, and trying to figure out what they want. One of the tools we have brought to bear on these questions is Hadoop. Among the technical tools these days, Hadoop is like the prettiest girl in school, and it’s easy [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/hadoop-pig/hadoop-for-the-lone-analyst-why-and-how/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cloud9: Getting started with EC2</title>
		<link>http://php-app-engine.com/2010/how-to/cloud9-getting-started-with-ec2/</link>
		<comments>http://php-app-engine.com/2010/how-to/cloud9-getting-started-with-ec2/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 23:53:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=802</guid>
		<description><![CDATA[http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/content/start-EC2.html
This tutorial will get you started with Cloud9 on Amazon&#8217;s EC2 (running the simple word count demo). For a gentler introduction to Hadoop, or if you don&#8217;t feel like experimenting with EC2, try my tutorial on getting started with Cloud9 in standalone mode. This tutorial assumes you&#8217;ve already downloaded Cloud9 and gotten it set up. [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/how-to/cloud9-getting-started-with-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Michael G. Noll &#8211; Hadoop and Python</title>
		<link>http://php-app-engine.com/2010/how-to/michael-g-noll-hadoop-and-python/</link>
		<comments>http://php-app-engine.com/2010/how-to/michael-g-noll-hadoop-and-python/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 10:50:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=798</guid>
		<description><![CDATA[http://www.michael-noll.com/wiki/Hadoop

    * Writing An Hadoop MapReduce Program In Python
    * Running Hadoop On Ubuntu Linux (Single-Node Cluster)
    * Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
 
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/how-to/michael-g-noll-hadoop-and-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop and HBase in production</title>
		<link>http://php-app-engine.com/2010/hadoop-pig/hadoop-and-hbase-in-production/</link>
		<comments>http://php-app-engine.com/2010/hadoop-pig/hadoop-and-hbase-in-production/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 23:23:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[noSQL]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=789</guid>
		<description><![CDATA[http://blog.readpath.com/2009/12/28/hadoop-and-hbase-in-production/
The personalized content scoring features of ReadPath depend on having a good measurement of term frequencies. So to support this, there is a dictionary of all of the terms used in the content database along with their frequencies. The initial implementation of the dictionary wasn’t scaling properly so it was converted to a Map/Reduce job [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/hadoop-pig/hadoop-and-hbase-in-production/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Re: Analyzing MySQL slow query logs using Pig + Hadoop</title>
		<link>http://php-app-engine.com/2010/hadoop-pig/re-analyzing-mysql-slow-query-logs-using-pig-hadoop/</link>
		<comments>http://php-app-engine.com/2010/hadoop-pig/re-analyzing-mysql-slow-query-logs-using-pig-hadoop/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 02:27:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=783</guid>
		<description><![CDATA[http://www.mail-archive.com/pig-user@hadoop.apache.org/msg01633.html
A word of warning regarding that blog post &#8212; it&#8217;s written to explain
things, not to show how one would run them in production. So it&#8217;s a
bit verbose and does silly things like calling out to awk. Don&#8217;t take
it as a style guide.
Someone recently commented that it&#8217;s way too long for the job it [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/hadoop-pig/re-analyzing-mysql-slow-query-logs-using-pig-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HadoopHackDay was a major hit</title>
		<link>http://php-app-engine.com/2010/performance/hadoophackday-was-a-major-hit/</link>
		<comments>http://php-app-engine.com/2010/performance/hadoophackday-was-a-major-hit/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 00:47:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Money]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=781</guid>
		<description><![CDATA[http://www.jonathanboutelle.com/mt/archives/2010/01/hadoophackday_w.html
Hadoop is very resource-intensive! We started out using 1-node clusters to run our jobs against small subsets of data. Very quickly teams started upgrading to 5-node clusters due to the amount of time they were having to wait for results. Final runs against full data sets were powered by 10-node clusters of &#8220;medium&#8221; EC2 servers. [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/performance/hadoophackday-was-a-major-hit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop, Pig, and Twitter (NoSQL East 2009)</title>
		<link>http://php-app-engine.com/2010/hadoop-pig/hadoop-pig-and-twitter-nosql-east-2009/</link>
		<comments>http://php-app-engine.com/2010/hadoop-pig/hadoop-pig-and-twitter-nosql-east-2009/#comments</comments>
		<pubDate>Fri, 08 Jan 2010 19:51:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=748</guid>
		<description><![CDATA[http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-east-2009
A talk on the use of Hadoop and Pig inside Twitter, focusing on the flexibility and simplicity of Pig, and the benefits of that for solving real-world big data problems.

]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/hadoop-pig/hadoop-pig-and-twitter-nosql-east-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Problem Solving with Apache Hadoop &amp; Pig</title>
		<link>http://php-app-engine.com/2010/hadoop-pig/problem-solving-with-apache-hadoop-pig/</link>
		<comments>http://php-app-engine.com/2010/hadoop-pig/problem-solving-with-apache-hadoop-pig/#comments</comments>
		<pubDate>Fri, 01 Jan 2010 13:59:36 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=720</guid>
		<description><![CDATA[http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
Practical Problem Solving with Hadoop and Pig, by Milind Bhandarkar (milindb@yahoo-inc.com)
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2010/hadoop-pig/problem-solving-with-apache-hadoop-pig/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>piglet</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/piglet/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/piglet/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 07:11:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Ruby Kids]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=707</guid>
		<description><![CDATA[http://github.com/iconara/piglet
Piglet is a DSL for writing Pig Latin scripts in Ruby:
  a = load 'input'
  b = a.group :c
  store b, 'output'
The code above will be translated to the following Pig Latin:
  relation_2 = LOAD 'input';
  relation_1 = GROUP relation_2 BY c;
  STORE relation_1 INTO 'output';
The aim is to [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/piglet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building Hadoop Clusters On Linux In EC2</title>
		<link>http://php-app-engine.com/2009/how-to/building-hadoop-clusters-on-linux-in-ec2/</link>
		<comments>http://php-app-engine.com/2009/how-to/building-hadoop-clusters-on-linux-in-ec2/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 19:53:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=669</guid>
		<description><![CDATA[http://www.higherpass.com/linux/Tutorials/Building-Hadoop-Clusters-On-Linux-In-Ec2/
Learn to build and use multi-node Hadoop clusters running in Amazon EC2. This article assumes a few things, first among them a basic knowledge of Hadoop. If you haven&#8217;t used Hadoop before, you probably want to read the Intro To Hadoop article first.
http://www.higherpass.com/java/Tutorials/Building-Hadoop-Mapreduce-Jobs-In-Java/
Hadoop is a parallel job processing framework from the Apache foundation. [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/how-to/building-hadoop-clusters-on-linux-in-ec2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Benchmark for Hive, PIG and Hadoop</title>
		<link>http://php-app-engine.com/2009/performance/a-benchmark-for-hive-pig-and-hadoop/</link>
		<comments>http://php-app-engine.com/2009/performance/a-benchmark-for-hive-pig-and-hadoop/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 19:40:32 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=667</guid>
		<description><![CDATA[http://issues.apache.org/jira/secure/attachment/12413737/hive_benchmark_2009-07-12.pdf
A Benchmark for Hive, PIG and Hadoop
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/performance/a-benchmark-for-hive-pig-and-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>pigpy</title>
		<link>http://php-app-engine.com/2009/opensource/pigpy/</link>
		<comments>http://php-app-engine.com/2009/opensource/pigpy/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 07:10:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=659</guid>
		<description><![CDATA[http://code.google.com/p/pigpy/
pigpy &#8211; a Python tool to manage Pig reports
Pig provides an amazing set of tools to create complex relational processes on top of Hadoop, but it has a few missing pieces:
    * Looping constructs for easily creating multiple similar reports
    * Caching of intermediate calculations
    * Data management and cleanup code
    * Easy testing for [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/opensource/pigpy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hive vs. Pig</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/hive-vs-pig/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/hive-vs-pig/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 01:23:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=638</guid>
		<description><![CDATA[http://www.larsgeorge.com/2009/10/hive-vs-pig.html
While I was looking at Hive and Pig for processing large amounts of data without the need to write MapReduce code I found that there is no easy way to compare them against each other without reading into both in greater detail.
In this post I am trying to give you a 10,000ft view of both [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/hive-vs-pig/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>wukong :: hadoop made easy</title>
		<link>http://php-app-engine.com/2009/opensource/wukong-hadoop-made-easy/</link>
		<comments>http://php-app-engine.com/2009/opensource/wukong-hadoop-made-easy/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 01:00:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=610</guid>
		<description><![CDATA[http://mrflip.github.com/wukong/index.html

Wukong: Hadoop made so easy a Chimpanzee could run it.
Treat your dataset like a
    * stream of lines when it’s efficient to process by lines
    * stream of field arrays when it’s efficient to deal directly with fields
    * stream of lightweight objects when it’s efficient to [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/opensource/wukong-hadoop-made-easy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Map Reduce, Hadoop &amp; Pig</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/map-reduce-hadoop-pig/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/map-reduce-hadoop-pig/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 00:51:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=602</guid>
		<description><![CDATA[http://www.scribd.com/doc/23844299/Map-Reduce-Hadoop-Pig

A Hadoop, MapReduce and Pig summary
PowerPoint, 24 pages
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/map-reduce-hadoop-pig/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pig Performance Benchmarks</title>
		<link>http://php-app-engine.com/2009/performance/pig-performance-benchmarks/</link>
		<comments>http://php-app-engine.com/2009/performance/pig-performance-benchmarks/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 16:52:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=593</guid>
		<description><![CDATA[https://issues.apache.org/jira/browse/PIG-200
To benchmark Pig performance, we need a TPC-H-like large data set plus a script collection. These would be used to compare different Pig releases, and to compare Pig against other systems (e.g. Pig + Hadoop vs. Hadoop only).
Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance
I am currently running long-running Pig scripts over data sets on the order of [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/performance/pig-performance-benchmarks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hw09 Making Hadoop Easy On Amazon Web Services</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/hw09-making-hadoop-easy-on-amazon-web-services/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/hw09-making-hadoop-easy-on-amazon-web-services/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 21:04:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=591</guid>
		<description><![CDATA[http://www.slideshare.net/cloudera/hw09-making-hadoop-easy-on-amazon-web-services
Amazon Elastic MapReduce &#8211; Peter Sirota 
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/hw09-making-hadoop-easy-on-amazon-web-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cascading</title>
		<link>http://php-app-engine.com/2009/big-guys/cascading/</link>
		<comments>http://php-app-engine.com/2009/big-guys/cascading/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 20:17:32 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Big Guys]]></category>
		<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=560</guid>
		<description><![CDATA[http://www.cascading.org/
Cascading is a feature-rich API for defining and executing complex, scale-free, and fault-tolerant data processing workflows on a Hadoop cluster.
The processing API lets the developer quickly assemble complex distributed processes without having to &#8220;think&#8221; in MapReduce, and efficiently schedule them based on their dependencies and other available metadata. Obviously simple data processing [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/big-guys/cascading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pig Frustrations</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/pig-frustrations/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/pig-frustrations/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 20:05:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=558</guid>
		<description><![CDATA[http://developeraspirations.wordpress.com/2009/11/30/pig-frustrations/
My desire to implement better scalability through pre-processing reports via the Grid has led me to Pig. Unfortunately, while Pig does remove some of the difficulties of writing for Hadoop (you no longer have to write all of the map-reduce jobs yourself in Java), it has many limitations.
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/pig-frustrations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing Apache logs with Pig</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/analyzing-apache-logs-with-pig/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/analyzing-apache-logs-with-pig/#comments</comments>
		<pubDate>Mon, 30 Nov 2009 23:23:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=544</guid>
		<description><![CDATA[http://www.cloudera.com/blog/2009/06/17/analyzing-apache-logs-with-pig/
In this blog post, we will use Pig to examine the download logs recorded on our server, demonstrating several features that are often glossed over in introductory Pig tutorials—parameter substitution in PigLatin scripts, Pig Streaming, and the use of custom loaders and user-defined functions (UDFs). It’s worth mentioning here that, as of last week, the [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/analyzing-apache-logs-with-pig/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics at Twitter</title>
		<link>http://php-app-engine.com/2009/hadoop-pig/analytics-at-twitter/</link>
		<comments>http://php-app-engine.com/2009/hadoop-pig/analytics-at-twitter/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 09:16:34 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=533</guid>
		<description><![CDATA[http://blog.tonybain.com/tony_bain/2009/11/analytics-at-twitter.html
Twitter, like many web 2.0 apps, started life as a MySQL-based RDBMS application. Today, Twitter is still using MySQL for much of their online operational functionality (although this is likely to change in the near future – think distributed), but on the analytics side of things Twitter has spent the last 6 months [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/hadoop-pig/analytics-at-twitter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to combine Elastic Mapreduce/Hadoop with other Amazon Web Services</title>
		<link>http://php-app-engine.com/2009/how-to/how-to-combine-elastic-mapreducehadoop-with-other-amazon-web-services/</link>
		<comments>http://php-app-engine.com/2009/how-to/how-to-combine-elastic-mapreducehadoop-with-other-amazon-web-services/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 06:04:31 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>
		<category><![CDATA[Misc]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=441</guid>
		<description><![CDATA[http://atbrox.com/2009/11/11/how-to-combine-elastic-mapreducehadoop-with-other-amazon-web-services/
By default, Elastic MapReduce reads from and stores to S3. When you need to access other AWS services, e.g. SQS queues or the database services SimpleDB and RDS (MySQL), the best approach from Python is to use Boto. To get Boto to work with Elastic MapReduce you need to dynamically load Boto on each [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/how-to/how-to-combine-elastic-mapreducehadoop-with-other-amazon-web-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>cloudmapreduce</title>
		<link>http://php-app-engine.com/2009/opensource/cloudmapreduce/</link>
		<comments>http://php-app-engine.com/2009/opensource/cloudmapreduce/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 06:23:16 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=436</guid>
		<description><![CDATA[http://code.google.com/p/cloudmapreduce/

Cloud MapReduce was developed at Accenture Technology Labs by Huan Liu and Dan Orban. It is a MapReduce implementation on top of the Amazon Cloud OS.
By exploiting a cloud OS&#8217;s scalability, Cloud MapReduce achieves three primary advantages over other MapReduce implementations built on a traditional OS:
    * It is faster than other [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/opensource/cloudmapreduce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to use C++ Compiled Python for Amazon’s Elastic Mapreduce (Hadoop)</title>
		<link>http://php-app-engine.com/2009/how-to/how-to-use-c-compiled-python-for-amazon%e2%80%99s-elastic-mapreduce-hadoop/</link>
		<comments>http://php-app-engine.com/2009/how-to/how-to-use-c-compiled-python-for-amazon%e2%80%99s-elastic-mapreduce-hadoop/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 16:04:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>
		<category><![CDATA[Misc]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=163</guid>
		<description><![CDATA[http://atbrox.com/2009/10/07/how-to-use-c-compiled-python-for-amazons-elastic-mapreduce-hadoop/
Sometimes it can be useful to compile Python code for Amazon’s Elastic MapReduce into C++ and then into a binary. The motivation could be to integrate with (existing) C or C++ code, or to increase performance for CPU-intensive mapper or reducer methods. A description of how to do that follows:
Based on Shedskin
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/how-to/how-to-use-c-compiled-python-for-amazon%e2%80%99s-elastic-mapreduce-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2</title>
		<link>http://php-app-engine.com/2009/how-to/building-a-data-intensive-web-application-with-cloudera-hadoop-hive-pig-and-ec2/</link>
		<comments>http://php-app-engine.com/2009/how-to/building-a-data-intensive-web-application-with-cloudera-hadoop-hive-pig-and-ec2/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 05:45:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=119</guid>
		<description><![CDATA[http://www.cloudera.com/hadoop-data-intensive-application-tutorial
This tutorial will show you how to use Amazon EC2 and Cloudera&#8217;s Distribution for Hadoop to run batch jobs for a data-intensive web application. During the tutorial, we will perform the following data processing steps:
    * Configure and launch a Hadoop cluster on Amazon EC2 using the Cloudera tools
  [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/how-to/building-a-data-intensive-web-application-with-cloudera-hadoop-hive-pig-and-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache LogAnalysis using Pig</title>
		<link>http://php-app-engine.com/2009/how-to/apache-loganalysis-using-pig/</link>
		<comments>http://php-app-engine.com/2009/how-to/apache-loganalysis-using-pig/#comments</comments>
		<pubDate>Fri, 02 Oct 2009 03:09:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[HowTos]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=105</guid>
		<description><![CDATA[http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2728
Analyze your Apache logs using Pig and Amazon Elastic MapReduce.
    * Total bytes transferred per hour
    * A list of the top 50 IP addresses by traffic per hour
    * A list of the top 50 external referrers
    * The top 50 search [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/how-to/apache-loganalysis-using-pig/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ec2cluster</title>
		<link>http://php-app-engine.com/2009/opensource/ec2cluster/</link>
		<comments>http://php-app-engine.com/2009/opensource/ec2cluster/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 03:10:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[HadoopAndPig]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=85</guid>
		<description><![CDATA[ec2cluster is a Rails web console, including a REST API, that launches temporary Beowulf clusters on Amazon EC2 for parallel processing. You upload input data and code to Amazon S3, then submit a job request including how many nodes you want in your cluster. ec2cluster will spin up and configure a private Beowulf cluster, process [...]]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/opensource/ec2cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloudera</title>
		<link>http://php-app-engine.com/2009/big-guys/cloudera/</link>
		<comments>http://php-app-engine.com/2009/big-guys/cloudera/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 03:17:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Big Guys]]></category>
		<category><![CDATA[HadoopAndPig]]></category>

		<guid isPermaLink="false">http://php-app-engine.com/?p=16</guid>
		<description><![CDATA[Bringing Big Data to the Enterprise with Apache Hadoop
]]></description>
		<wfw:commentRss>http://php-app-engine.com/2009/big-guys/cloudera/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
