ec2base » Hadoop
Amazon EC2 by Robot and Me

Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2
Posted by admin on Sat, 03 Oct 2009
https://php-app-engine.com/2009/how-to/building-a-data-intensive-web-application-with-cloudera-hadoop-hive-pig-and-ec2/
Source: http://www.cloudera.com/hadoop-data-intensive-application-tutorial

This tutorial shows you how to use Amazon EC2 and Cloudera’s Distribution for Hadoop to run batch jobs for a data-intensive web application. During the tutorial, we will perform the following data processing steps:

* Configure and launch a Hadoop cluster on Amazon EC2 using the Cloudera tools
* Load Wikipedia log data into Hadoop from Amazon Elastic Block Store (EBS) snapshots and Amazon S3
* Run simple Pig and Hive commands on the log data
* Write a MapReduce job to clean the raw data and aggregate it to a daily level (page_title, date, count)
* Write a Hive query that finds trending Wikipedia articles by calling a custom mapper script
* Join the trend data in Hive with a table of Wikipedia page IDs
* Export the trend query results to S3 as a tab-delimited text file for use in our web application’s MySQL database
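
The cleaning and daily aggregation step in the list above can be sketched in plain Python as a map function followed by a reduce function. This is only an illustration, not the tutorial's actual MapReduce job: the input layout (space-separated `project page_title count bytes` lines, each paired with a date string), the choice to keep only the `en` project, and the function names `map_record`, `reduce_counts`, and `daily_counts` are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import unquote


def map_record(date, line):
    """Map step: parse one raw hourly line and yield ((page_title, date), count).

    Assumed input layout (not taken from the tutorial):
    "project page_title count bytes", separated by single spaces.
    """
    parts = line.split(" ")
    if len(parts) != 4:
        return  # drop malformed rows
    project, raw_title, count, _size = parts
    if project != "en" or not count.isdigit():
        return  # assumption: keep only English Wikipedia rows with a numeric count
    title = unquote(raw_title).strip()
    if not title:
        return
    yield (title, date), int(count)


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (page_title, date) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return sorted((title, date, total) for (title, date), total in totals.items())


def daily_counts(records):
    """Run the map and reduce steps in memory over (date, line) records."""
    pairs = (pair for date, line in records for pair in map_record(date, line))
    return reduce_counts(pairs)
```

Calling `daily_counts` on a list of `(date, line)` pairs returns sorted `(page_title, date, count)` tuples, the same shape as the daily table described in the list. On a real cluster the map and reduce functions would run as separate tasks, with Hadoop grouping the keys between them.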

Apache LogAnalysis using Pig
Posted by admin on Fri, 02 Oct 2009
https://php-app-engine.com/2009/how-to/apache-loganalysis-using-pig/
Source: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2728

Analyze your Apache logs using Pig and Amazon Elastic MapReduce. The Pig script generates the following reports:

* Total bytes transferred per hour
* A list of the top 50 IP addresses by traffic per hour
* A list of the top 50 external referrers
* The top 50 search terms in referrals from Bing and Google

You can modify the Pig script to generate additional information.
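
To make the first two reports concrete, here is a minimal Python sketch of the same aggregations. It is not the Pig script from the tutorial. It assumes the logs use the Apache combined log format, that "traffic" means bytes transferred, and that an hour can be keyed by the first 14 characters of the timestamp; the names `LOG_LINE`, `parse_line`, and `summarize` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache combined log format:
# ip ident user [timestamp] "request" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} (\d+|-)')


def parse_line(line):
    """Return (ip, hour, bytes) for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    ip, timestamp, size = match.groups()
    hour = timestamp[:14]  # "02/Oct/2009:03" from "02/Oct/2009:03:09:37 +0000"
    return ip, hour, 0 if size == "-" else int(size)


def summarize(lines, top_n=50):
    """Compute total bytes per hour and the top_n IP addresses by bytes per hour."""
    bytes_per_hour = Counter()
    bytes_per_ip = defaultdict(Counter)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that are not in the assumed format
        ip, hour, size = parsed
        bytes_per_hour[hour] += size
        bytes_per_ip[hour][ip] += size
    top_ips = {hour: counts.most_common(top_n) for hour, counts in bytes_per_ip.items()}
    return dict(bytes_per_hour), top_ips
```

`summarize` returns a pair: a dictionary of total bytes per hour, and a dictionary mapping each hour to its `(ip, bytes)` list sorted by traffic. The referrer and search-term reports would follow the same pattern with a different grouping key.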
