
Performance comparison: key/value stores for language model counts

http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/

Here are timings for a single counting process: iterate over 45,000 short text messages (tweets), tokenize them, then increment counters for their unigrams and bigrams. (The speed of the data store is only one component of the total running time.) There are about 17 increments per tweet, for roughly 750k total increments spread over 400k unique terms. This is much smaller than the data I actually need to count, but it’s small enough to test easily. I tried several very different architectures and packages, explained below.
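To make the workload concrete, here is a minimal sketch of the counting loop, assuming a plain in-memory `collections.Counter` as the key/value store. The `tokenize` function (lowercase, split on whitespace) is a hypothetical stand-in, not the tokenizer used for the timings. Under this scheme a message with n tokens produces n unigram increments plus n - 1 bigram increments, so each message costs 2n - 1 read-modify-write operations against the store.

```python
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def count_ngrams(messages: Iterable[str]) -> Counter:
    """Count unigrams and bigrams over all messages.

    Keys are tuples: (w,) for a unigram, (w1, w2) for a bigram.
    Each `+=` is one increment, i.e. a read-modify-write that an
    external key/value store would also have to perform.
    """
    counts: Counter = Counter()
    for msg in messages:
        toks = tokenize(msg)
        for tok in toks:
            counts[(tok,)] += 1          # one increment per unigram
        for pair in zip(toks, toks[1:]):
            counts[pair] += 1            # one increment per adjacent pair
    return counts


if __name__ == "__main__":
    c = count_ngrams(["the cat sat", "the cat ran"])
    # unigram count, bigram count, unique keys, total increments
    print(c[("the",)], c[("the", "cat")], len(c), sum(c.values()))  # 2 2 7 10
```

Swapping the `Counter` for a disk-backed or networked store changes only the two increment lines, which is why the choice of store can dominate the running time even though tokenizing is part of the loop too.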
