... more stuff at php-app-engine.com

Archive for the ‘Open Source Projects’ Category

redis-admin

Wednesday, January 27th, 2010

http://code.google.com/p/redis-admin/

Redis Admin, or ReAdmin, is an open source web interface for administering Redis. ReAdmin is written entirely in PHP, using Redis, of course.

rediska

Monday, January 25th, 2010

http://rediska.geometria-lab.net/

Rediska (“radish” in Russian) is a PHP client for Redis.

Redis is an advanced fast key-value database written in C. It can be used like memcached, in front of a traditional database, or on its own thanks to the fact that the in-memory datasets are not volatile but instead persisted on disk. One of the cool features is that you can store not only strings, but lists and sets with atomic operations to push/pop elements.
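To make the list and set operations concrete, here is a minimal in-memory sketch in Python. It does not talk to a Redis server: the `TinyStore` class is a hypothetical stand-in whose method names loosely mirror Redis's push, pop and set-add commands, and a single lock stands in for the atomicity that Redis provides.

```python
import threading
from collections import defaultdict, deque


class TinyStore:
    """Toy in-memory model of Redis-style lists and sets (not a Redis client)."""

    def __init__(self):
        self._lock = threading.Lock()      # one lock makes each operation atomic
        self._lists = defaultdict(deque)
        self._sets = defaultdict(set)

    def rpush(self, key, value):
        """Append value to the tail of the list at key; return the new length."""
        with self._lock:
            self._lists[key].append(value)
            return len(self._lists[key])

    def lpop(self, key):
        """Remove and return the head of the list at key, or None if empty."""
        with self._lock:
            items = self._lists[key]
            return items.popleft() if items else None

    def sadd(self, key, member):
        """Add member to the set at key; return True if it was newly added."""
        with self._lock:
            is_new = member not in self._sets[key]
            self._sets[key].add(member)
            return is_new


store = TinyStore()
store.rpush("jobs", "resize-image")
store.rpush("jobs", "send-email")
first_job = store.lpop("jobs")              # the oldest pushed job comes out first
first_add = store.sadd("tags", "php")       # newly added
second_add = store.sadd("tags", "php")      # already present, so nothing changes
```

Used this way, a list behaves like a simple work queue and a set silently ignores duplicates.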

Neo4j

Friday, January 22nd, 2010

http://neo4j.org/

You can think of Neo4j as a high-performance graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.
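As a rough illustration of working with a network structure instead of tables, here is a toy property graph in Python. It is not Neo4j's API: the `Graph` class, the node names and the relationship types are all made up for this sketch.

```python
from collections import deque


class Graph:
    """Toy property graph: nodes carry properties, edges carry a type."""

    def __init__(self):
        self.props = {}   # node id -> dict of properties
        self.edges = {}   # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def reachable(self, start, rel_type):
        """Breadth-first traversal following only edges of one type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for kind, target in self.edges[current]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
for name in ("ann", "ben", "cy"):
    g.add_node(name, name=name.title())
g.relate("ann", "KNOWS", "ben")
g.relate("ben", "KNOWS", "cy")
g.relate("ann", "WORKS_WITH", "cy")
friends_of_friends = g.reachable("ann", "KNOWS")
```

The point of the sketch is that a question like "who can Ann reach through KNOWS relationships" becomes a traversal rather than a chain of table joins.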

Neo4j is released under a dual free software/commercial license model (which basically means that it’s “open source”, but if you’re interested in using it commercially, you must buy a commercial license).

Neo4j has been in commercial development for 8 years and in production for over 5 years. It is a mature and robust graph database that provides:

Thoughts On The Design Of The Fossil DVCS

Thursday, January 21st, 2010

http://www.fossil-scm.org/index.html/doc/tip/www/theory1.wiki

Two questions (or criticisms) that arise frequently regarding Fossil can be summarized as follows:

1. Why is Fossil based on SQLite instead of a distributed NoSQL database?

2. Why is Fossil written in C instead of a modern high-level language?

Neither question can be answered directly because they are both based on false assumptions. We claim that Fossil is not based on SQLite at all and that Fossil is not based on a distributed NoSQL database because Fossil is a distributed NoSQL database. And, Fossil does use a modern high-level language for its implementation, namely SQL.

s3-bash

Thursday, January 21st, 2010

http://www.ipconfig.co.nz/blog/post.cfm/s3-bash

s3-bash is a small collection of Bash scripts that let you use Amazon S3 from a Unix, Linux or Mac OS X command line without needing Perl, Python, Java, .Net, etc.

CloudFusion

Monday, January 11th, 2010

http://getcloudfusion.com/

* Fast, powerful PHP toolkit with an easy to learn, consistent API.
* Highly extensible framework for easily adding new services.
* Ridiculously thorough API reference, code samples and tutorials.
* Works with Amazon Web Services and Eucalyptus.

Used to be called ‘Tarzan’.
Nice move with Eucalyptus.
Nice move with the name too.

Review my software: Keyspace consistently replicated key-value store (scalien.com)

Thursday, January 7th, 2010

http://news.ycombinator.com/item?id=705977

scalien.com

Keyspace

This is the document that the interested programmer or engineer should read first:

This paper describes the design and architecture of Keyspace, a distributed key-value store offering strong consistency, fault-tolerance and high availability. The source code is released as free, open-source software under the BSD license.

PaxosLease

PaxosLease is a Paxos-based, diskless algorithm for negotiating leases in a distributed system. It is used for master leases in Keyspace.

This paper describes PaxosLease, a distributed algorithm for lease negotiation. PaxosLease is based on Paxos, but does not require disk writes and does not make clock synchrony and skew assumptions. PaxosLease is used for master lease negotiation in the open-source Keyspace replicated key-value store.

mcollective

Saturday, January 2nd, 2010

http://code.google.com/p/mcollective/

The Marionette Collective, aka mcollective, is a framework for building server orchestration or parallel job execution systems.

Primarily we’ll use it as a means to programmatically execute actions on clusters of servers. In this regard we operate in the same space as tools like Func, Fabric or Capistrano.

We’ve attempted to think outside the box a bit in designing this system by not relying on central inventories and tools like SSH; we’re not simply a fancy SSH “for loop”. MCollective uses modern tools like publish/subscribe middleware and modern philosophies like real-time discovery of network resources using metadata rather than hostnames, delivering a very scalable and very fast parallel execution environment.
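The idea of addressing servers by metadata rather than by hostname can be sketched in a few lines of Python. This is not MCollective's API: the `Node` class, the fact names and the `broadcast` helper are hypothetical, and a plain loop stands in for the publish/subscribe middleware.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A server that knows its own metadata ("facts"); names are illustrative."""
    hostname: str
    facts: dict = field(default_factory=dict)

    def matches(self, wanted):
        # The node itself decides whether a broadcast request applies to it.
        return all(self.facts.get(k) == v for k, v in wanted.items())


def broadcast(nodes, wanted, action):
    """Stand-in for a pub/sub topic: every node sees the request,
    and only the nodes whose facts match run the action and reply."""
    return {n.hostname: action(n) for n in nodes if n.matches(wanted)}


nodes = [
    Node("web1", {"role": "web", "country": "uk"}),
    Node("web2", {"role": "web", "country": "us"}),
    Node("db1", {"role": "db", "country": "uk"}),
]

# No inventory lookup by hostname: address "all UK web servers" by metadata.
replies = broadcast(nodes, {"role": "web", "country": "uk"}, lambda n: "restarted")
```

Because each node filters requests itself, the caller never needs a central list of hosts; the set of responders is discovered at request time.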

jclouds

Thursday, December 24th, 2009

http://code.google.com/p/jclouds/

jclouds is an open source framework that helps you get started in the cloud and reuse your Java development skills. Our API gives you the freedom to use portable abstractions or cloud-specific features. We support many clouds including Amazon, VMware, Azure, and Rackspace.

MapBox

Wednesday, December 23rd, 2009

http://mapbox.com/


MapBox is a suite of open source tools to create beautiful custom maps in Amazon’s cloud.

pigpy

Sunday, December 20th, 2009

http://code.google.com/p/pigpy/

pigpy – a Python tool to manage Pig reports

Pig provides an amazing set of tools to create complex relational processes on top of Hadoop, but it has a few missing pieces:

* Looping constructs for easily creating multiple similar reports
* Caching of intermediate calculations
* Data management and cleanup code
* Easy testing for report correctness

pigpy is an attempt to fill in these holes by providing a Python module that knows how to talk to a Hadoop cluster and can create and manage complex report structures.
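The "looping constructs" gap is easy to picture: Pig Latin has no loop, so several near-identical reports mean several near-identical scripts. The sketch below generates them from one template in plain Python. It does not use the API of the tool described above; the paths, the schema and the `build_reports` helper are assumptions made up for illustration.

```python
# One Pig Latin template; only the grouping field changes between reports.
PIG_TEMPLATE = """\
logs = LOAD '{input_path}' USING PigStorage('\\t') AS (user:chararray, country:chararray, page:chararray);
grouped = GROUP logs BY {field};
counts = FOREACH grouped GENERATE group, COUNT(logs);
STORE counts INTO '{output_dir}/by_{field}';
"""


def build_reports(input_path, output_dir, fields):
    """Return one Pig Latin script per field, all sharing the same template."""
    return {
        field: PIG_TEMPLATE.format(
            input_path=input_path, output_dir=output_dir, field=field
        )
        for field in fields
    }


reports = build_reports("/data/logs", "/reports", ["user", "country", "page"])
```

Each value in `reports` is a complete script that could be written to a file and handed to Pig.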

appscale

Tuesday, December 15th, 2009

http://code.google.com/p/appscale/

AppScale is a platform that allows users to deploy and host their own Google App Engine applications. It executes automatically over Amazon EC2 and Eucalyptus as well as Xen and KVM. It has been developed and is maintained by the RACELab at UC Santa Barbara.

wukong :: hadoop made easy

Wednesday, December 9th, 2009

http://mrflip.github.com/wukong/index.html


Wukong: Hadoop made so easy a Chimpanzee could run it.

Treat your dataset like a

* stream of lines when it’s efficient to process by lines
* stream of field arrays when it’s efficient to deal directly with fields
* stream of lightweight objects when it’s efficient to deal with objects
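The three views in the list above can be sketched as three small generators. Wukong itself is a Ruby library, so this Python version only illustrates the idea; the `PageView` record and the tab-separated layout are assumptions.

```python
from collections import namedtuple

# Hypothetical record type for tab-separated "user<TAB>page<TAB>hits" rows.
PageView = namedtuple("PageView", ["user", "page", "hits"])


def as_lines(stream):
    """View 1: raw lines, with the trailing newline stripped."""
    for line in stream:
        yield line.rstrip("\n")


def as_fields(stream, sep="\t"):
    """View 2: each line split into an array of fields."""
    for line in as_lines(stream):
        yield line.split(sep)


def as_objects(stream):
    """View 3: each line turned into a lightweight object."""
    for user, page, hits in as_fields(stream):
        yield PageView(user, page, int(hits))


raw = ["alice\t/home\t3\n", "bob\t/about\t1\n"]
total_hits = sum(view.hits for view in as_objects(raw))
```

Each view is built on the one before it, so a job can stop at whichever level is cheapest for the work it has to do.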

Wukong is friends with Hadoop the elephant, Pig the query language, and the cat on your command line.

Send Wukong questions to the Infinite Monkeywrench mailing list

Vlad the Deployer

Sunday, December 6th, 2009

http://rubyhitsquad.com/Ruby_Hit_Squad.html

Basically, an attack on Capistrano.
Grab the popcorn.


• Do the simplest thing that could possibly work.
• Nothing to 1.0 in four(ish) days.
• Targets the 80% use case.
• Uses Rake, as god intended.
• Use the right tool for the job (ssh, rsync, etc).
• Fold in the Rails Machine recipes.
• Clever is bad. Period.

Amazon S3 Authentication Tool for Curl

Thursday, December 3rd, 2009

http://developer.amazonwebservices.com/connect/entry.jspa?%20externalID=128

Curl is a popular command-line tool for interacting with HTTP services. This Perl script calculates the proper signature, then calls Curl with the appropriate arguments.
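For a sense of what "calculates the proper signature" involves, here is a simplified Python sketch of the HMAC-SHA1 request signing that S3's REST API used at the time (commonly called Signature Version 2; newer AWS regions require the different Signature Version 4). It is not a port of the Perl script: the credentials, bucket and key are placeholders, and `x-amz-*` headers are ignored.

```python
import base64
import hashlib
import hmac


def s3_authorization(access_key, secret_key, verb, resource, date,
                     content_md5="", content_type=""):
    """Build the Authorization header value for S3's HMAC-SHA1 request signing.

    Simplified: ignores x-amz-* headers, which a full implementation
    must canonicalize and include in the string to sign.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "AWS {}:{}".format(access_key, signature)


# Placeholder credentials and bucket/key, for illustration only.
request_date = "Thu, 03 Dec 2009 12:00:00 +0000"
header = s3_authorization(
    access_key="EXAMPLEACCESSKEY",
    secret_key="example-secret-key",
    verb="GET",
    resource="/example-bucket/photo.jpg",
    date=request_date,
)

# The arguments a wrapper would then pass to curl.
curl_args = ["curl", "-H", "Date: " + request_date,
             "-H", "Authorization: " + header,
             "https://s3.amazonaws.com/example-bucket/photo.jpg"]
```

The `Date` header sent to S3 must be the same string that was signed, which is why the sketch reuses `request_date` in both places.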

StarCluster

Wednesday, December 2nd, 2009

http://web.mit.edu/stardev/cluster/


Multiple Clusters – Currently, StarCluster only supports launching a single cluster on EC2. In theory, the software should be able to START multiple clusters, but it’s not equipped to handle the accounting of which nodes belong to which cluster after the initial startup. This means things like listing the nodes, terminating a particular cluster, etc. will not work. Support for the correct accounting of multiple clusters should come in future versions.

Dynamic Load Balancing – Support for a dynamically resizing cluster on EC2. Integrating the Service Domain Manager (SDM) and Hedeby software products from Sun into the AMI will allow EC2 nodes to easily be added to the Sun Grid Engine queue. This means you could theoretically start a single-node cluster and, as the load increases, EC2 nodes would be launched, added to the cluster, used for computation, and then removed when they’re idle. The impact of this would be to significantly lower the cost of using EC2 by only having one node up 24/7 and adding/removing nodes as needed.

Cascading

Wednesday, December 2nd, 2009

http://www.cascading.org/

Cascading is a feature rich API for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster.

The processing API lets the developer quickly assemble complex distributed processes without having to “think” in MapReduce, and efficiently schedule them based on their dependencies and other available metadata. Simple data processing applications are supported as well, since complex jobs tend to start simple.
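Scheduling by dependency can be illustrated with a topological sort. The sketch below uses Python's standard `graphlib` module rather than Cascading's Java API; the workflow and its step names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "parse_logs": set(),
    "load_users": set(),
    "join_users": {"parse_logs", "load_users"},
    "count_by_country": {"join_users"},
    "export_report": {"count_by_country"},
}


def schedule(steps):
    """Group steps into waves; every step in a wave can run in parallel."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # steps whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = schedule(workflow)
```

Here the two input steps land in the first wave because neither depends on anything, and every later step waits for the wave before it.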

Cascading is Open Source and dual licensed under the GPL and OEM/Commercial Licenses. OEM/Commercial Licenses and Developer Support can be obtained through Concurrent, Inc.

Provisioning a Hudson CI server

Tuesday, November 24th, 2009

CI in a box and Hudson.

http://thediscoblog.com/2009/11/24/provisioning-a-hudson-ci-server/

http://www.ciinabox.com/

CI in a Box is one of the easiest ways to get up and running with Continuous Integration. In fact, if you don’t believe me, check out the CI in a Box tutorial video. As you’ll see, CI in a Box makes setting up a Hudson CI server practically a breeze by leveraging Amazon’s EC2; what’s more, the video quickly sets up an SVN project that contains an Ant build (don’t worry, CI in a Box supports Maven as well!).

Apache Traffic Server

Tuesday, November 24th, 2009

http://incubator.apache.org/projects/trafficserver.html

Traffic Server fills the need for a fast, extensible and scalable HTTP 1.1 proxy and cache. We have a production proven piece of software that can deliver HTTP traffic at high rates, and can scale well on modern SMP hardware. We have benchmarked Traffic Server to handle in excess of 35,000 RPS on a single box. Traffic Server has a rich feature set, implementing most of HTTP/1.1 to the RFC specifications.

Nimbus

Tuesday, November 17th, 2009

http://workspace.globus.org/

Nimbus is an open source toolkit that allows you to turn your cluster into an Infrastructure-as-a-Service (IaaS) cloud. Feature highlights include:

Two sets of Web Service interfaces: Amazon EC2 WSDLs and Grid community WSRF, read more about interfaces…

Implementation based on the Xen hypervisor (KVM coming soon), read more about supported virtualization technologies…

Wow. Build your own EC2.
And try selling it. Ra!

LibAWS++

Wednesday, November 11th, 2009

http://aws.28msec.com/


We are proud to announce the second beta release of the libaws project (version 0.9.2). Libaws is an easy-to-use code library that helps you write C++ software that communicates with Amazon Web Services. In more detail, it supports communication with the following Amazon services:

* Amazon Simple Storage Service (Amazon S3)
* Amazon SimpleDB
* Amazon Simple Queue Service (Amazon SQS)

cloudmapreduce

Tuesday, November 10th, 2009

http://code.google.com/p/cloudmapreduce/


Cloud MapReduce was developed at Accenture Technology Labs by Huan Liu and Dan Orban. It is a MapReduce implementation on top of the Amazon Cloud OS.

By exploiting a cloud OS’s scalability, Cloud MapReduce achieves three primary advantages over other MapReduce implementations built on a traditional OS:

* It is faster than other implementations (e.g., 60 times faster than Hadoop in one case).

* It is more scalable because it has no single point of bottleneck.

* It is dramatically simpler with only 3,000 lines of code (e.g., two orders of magnitude simpler than Hadoop).
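For readers new to the programming model that these implementations share, here is a single-process word count in Python. It is not Cloud MapReduce's API, and it runs on one machine; the function names are made up for the sketch.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["the cloud", "the map the reduce"]
pairs = [pair for line in lines for pair in map_words(line)]
counts = dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

A real implementation distributes the map and reduce calls across machines; the grouping step in the middle is where most of the engineering effort goes.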

See details in Cloud MapReduce Technical Report.

See Command line options for details on how to specify a job run, and Pre-built AMI for how to use the pre-built AMI image to make running the job easier. A tutorial is coming soon.

ec2-elastic-backups

Tuesday, November 10th, 2009

http://github.com/truthtrap/ec2-elastic-backups

i put together these 2 files because i wanted dirvish-like backups, but with snapshots and not too much work. all in all it still took me too much, but thanks to tools like simpledb (http://code.google.com/p/amazon-simpledb-cli/) and the wonderful amazon aws tools i have what i want.

Cassandra

Sunday, November 1st, 2009

http://incubator.apache.org/cassandra/

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
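A ColumnFamily-based model can be pictured as nested maps: column family, then row key, then column name. The Python sketch below shows only that shape. It is not a Cassandra client, it leaves out distribution, timestamps and consistency entirely, and the `Users` column family and its contents are hypothetical.

```python
from collections import defaultdict

# column family -> row key -> column name -> value (all names are hypothetical)
keyspace = defaultdict(lambda: defaultdict(dict))


def insert(column_family, row_key, column, value):
    """Set one column of one row; the row is created on first use."""
    keyspace[column_family][row_key][column] = value


def get_row(column_family, row_key):
    """Return all columns of one row, sorted by column name in this sketch."""
    return dict(sorted(keyspace[column_family][row_key].items()))


insert("Users", "alice", "email", "alice@example.com")
insert("Users", "alice", "city", "Lisbon")
insert("Users", "bob", "email", "bob@example.com")   # rows need not share columns

alice = get_row("Users", "alice")
```

Compared with a plain key/value store, the extra level means a single row key leads to a set of named columns, and different rows can carry different columns.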

Cassandra was open sourced by Facebook in 2008, where it was designed by one of the authors of Amazon’s Dynamo. In a lot of ways you can think of Cassandra as Dynamo 2.0. Cassandra is in production use at Facebook but is still under heavy development.

Voldemort : EC2 Testing Infrastructure

Friday, October 30th, 2009

http://wiki.github.com/kirktrue/voldemort/ec2-testing-infrastructure

An open source clone of Amazon’s Dynamo

Goals and Deliverables

The primary goals of this project are as follows:

1. Ability to initialize EC2 instances for use with Voldemort
2. Cluster deployment, configuration
3. Individual node start/stop
4. Ability to leverage above for performance and correctness tests

4store.org

Tuesday, October 27th, 2009

http://4store.org/

4store, an efficient, scalable and stable RDF database

It is written in ANSI C99, and designed to run on UNIX-like systems.

4store is optimised to run on shared–nothing clusters of up to 32 nodes

When configured as a cluster, import performance of 120 kT/s (120,000 triples per second) is easily achievable. Query times for relatively simple queries are often in the low milliseconds, even over the standard HTTP SPARQL protocol.
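Querying over the SPARQL protocol comes down to an HTTP request with the query text in a `query` parameter. The Python sketch below only builds such a URL; the endpoint address is an assumption, so adjust the host, port and path to your own installation.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint: adjust host, port and path to your own installation.
ENDPOINT = "http://localhost:8000/sparql/"


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, query)
# Fetching `url` (for example with urllib.request.urlopen) would return the results.
```

Any HTTP client can then fetch the URL, which is what makes the protocol convenient for quick tests from scripts or the command line.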

http://thinklinks.wordpress.com/2009/10/27/4store-amazon-machine-image-and-billion-triple-challenge-data-set/

Today, we are making publicly available an Amazon Machine Image for 4store. Additionally, we are making an Elastic Block Storage snapshot of the BTC dataset for 4store. Thus, developers can easily get started using 4store with a billion triples on Amazon’s cloud.

MongoDB

Friday, October 23rd, 2009

http://www.mongodb.org/display/DOCS/Home

The best features of document databases, key/value stores, and RDBMSes in one.

Mongo (from “humongous”) is a high-performance, open source, schema-free document-oriented database. MongoDB is written in C++ and offers the following features:

* Collection oriented storage: easy storage of object/JSON-style data
* Dynamic queries
* Full index support, including on inner objects and embedded arrays
* Query profiling
* Replication and fail-over support
* Efficient storage of binary data including large objects (e.g. photos and videos)
* Auto-sharding for cloud-level scalability
* Commercial Support, Hosting and Consulting Available

A key goal of MongoDB is to bridge the gap between key/value stores (which are fast and highly scalable) and traditional RDBMS systems (which are deep in functionality).

http://www.mongodb.org/display/DOCS/Amazon+EC2

MongoDB runs well on Amazon EC2. This page includes some notes in this regard.

aws – simple access to Amazon EC2 and S3

Sunday, October 18th, 2009

http://timkay.com/aws/


aws is a command-line tool that gives you easy access to Amazon EC2 and Amazon S3. aws is designed to be simple to install and simple to use.

Thanks to your feedback, aws is the top-rated “community code” for all of Amazon EC2 and S3! See the ratings and reviews at EC2 and S3. They make me blush! Thank you!

http://timkay.com/tools/
many other tools are kewl

Sumo

Thursday, October 8th, 2009

http://github.com/adamwiggins/sumo/

Tired of wrestling with server provisioning? Sumo!
Want to fire up a one-off EC2 instance, pronto? ec2-run-instances got you down? Try Sumo.

EC2 on Rails

Thursday, October 8th, 2009

http://ec2onrails.rubyforge.org/

EC2 on Rails is an Ubuntu Linux server image for Amazon’s EC2 hosting service that’s ready to run a standard Ruby on Rails application with little or no customization. It’s a Ruby on Rails virtual appliance.