And here is an example of code written with it:
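from disco.core import Disco, result_iterator

# Map: emit a (word, 1) pair for every word in the input entry.
def fun_map(e, params):
    return [(w, 1) for w in e.split()]

# Reduce: sum the counts per word and write the totals to the output.
def fun_reduce(iter, out, params):
    s = {}
    for w, f in iter:
        s[w] = s.get(w, 0) + int(f)
    for w, f in s.iteritems():
        out.add(w, f)

# Submit the job to the Disco master and block until it finishes.
results = Disco("disco://localhost").new_job(
    name = "wordcount",
    input = ["http://discoproject.org/chekhov.txt"],
    map = fun_map,
    reduce = fun_reduce).wait()

# Print each word with its frequency (Python 2 syntax, as in the original).
for word, frequency in result_iterator(results):
    print word, frequency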
This is a fully working Disco script that computes word frequencies in a text corpus. Disco automatically distributes the script to a cluster, so it can use all available CPUs in parallel. For details, see the Disco tutorial.
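To see what the map and reduce steps compute without a Disco cluster, here is a rough local sketch in plain Python 3. It is not Disco's API: `run_local` is a hypothetical helper invented for this illustration, the functions take simplified arguments (no `params`, no `out` object), and everything runs sequentially in one process, so it shows only the data flow, not the distribution.

```python
from collections import defaultdict

def fun_map(line):
    # Same idea as the Disco map function: one (word, 1) pair per word.
    return [(w, 1) for w in line.split()]

def fun_reduce(pairs):
    # Same idea as the Disco reduce function: sum the counts per word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += int(count)
    return dict(totals)

def run_local(lines):
    # Hypothetical stand-in for the cluster: map every line, then reduce
    # all emitted pairs in a single process.
    pairs = []
    for line in lines:
        pairs.extend(fun_map(line))
    return fun_reduce(pairs)

counts = run_local(["the cat sat", "the cat ran"])
for word, frequency in sorted(counts.items()):
    print(word, frequency)
```

In a real Disco job, the map calls would run on different nodes over different input chunks, and the framework would take care of shipping the pairs to the reduce phase.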