Archive for March, 2012

Map Reduce at Google App Engine

March 29, 2012

I recently had to prepare a Cloud Computing 101 lecture including exercises for a teacher exchange week at the Oulu University of Applied Sciences.

As the curriculum was targeted at internet computing, I wanted to present Google App Engine as a PaaS alternative and MapReduce as an important background technology of the cloud. Luckily, the two work together, available here in an experimental stage. I quickly found out that the Python support is much more advanced than the Java one (which has no reducer), so I decided to give it a go.

The typical WordCount example is available, but its map and reduce functions are not written in a way that lets you add a Combiner (which is also not documented at the moment) to the MapReduce pipeline. Below is the changed map/reduce code.

import logging

# split_into_sentences() and split_into_words() are helpers from the demo's main.py.

def word_count_map(data):
  """Word count map function."""
  (entry, text_fn) = data
  text = text_fn()
  logging.debug("Got %s", entry.filename)
  for s in split_into_sentences(text):
    for w in split_into_words(s.lower()):
      # original, does not work with a combiner:
      # yield (w, "")
      yield (w, "1")

def word_count_reduce(key, values):
  """Word count reduce function."""
  # original, does not work with a combiner:
  # yield "%s: %d\n" % (key, len(values))
  value_ints = [int(x) for x in values]
  yield "%s: %d\n" % (key, sum(value_ints))
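Here is a minimal local sketch in plain Python 3 (no App Engine required) that shows why the original reducer breaks once a combiner is involved. It assumes, for illustration only, that a combiner has already collapsed some of the "1" values into partial counts. `reduce_with_len` and `reduce_with_sum` are hypothetical names for the original and the changed reducer logic. The same reasoning explains the switch from "" to "1" in the mapper: an empty string cannot be summed, but "1" can.

```python
def reduce_with_len(key, values):
    # Original demo logic: count how many values arrived for the key.
    return "%s: %d\n" % (key, len(values))


def reduce_with_sum(key, values):
    # Changed logic: add up the (possibly pre-aggregated) counts.
    return "%s: %d\n" % (key, sum(int(x) for x in values))


# Without a combiner, every occurrence of "cloud" reaches the reducer as "1".
raw_values = ["1", "1", "1", "1", "1"]
# Assumed combiner output: two shards already collapsed their occurrences
# into partial counts (3 on one shard, 2 on the other).
combined_values = ["3", "2"]

print(reduce_with_len("cloud", raw_values), end="")       # cloud: 5
print(reduce_with_len("cloud", combined_values), end="")  # cloud: 2 (wrong)
print(reduce_with_sum("cloud", raw_values), end="")       # cloud: 5
print(reduce_with_sum("cloud", combined_values), end="")  # cloud: 5
```

Summing works whether or not a combiner ran, so the same reducer can be used in both pipelines.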

The combiner is plugged into the pipeline through the combiner_spec parameter:

from mapreduce import mapreduce_pipeline

# run() is a method of the demo's word count pipeline class (a subclass of
# base_handler.PipelineBase); StoreOutput is defined elsewhere in the demo's main.py.
def run(self, filekey, blobkey):
  logging.debug("filename is %s" % filekey)
  output = yield mapreduce_pipeline.MapreducePipeline(
      "word_count",
      "main.word_count_map",
      "main.word_count_reduce",
      combiner_spec="main.TestCombiner",
      input_reader_spec="mapreduce.input_readers.BlobstoreZipInputReader",
      output_writer_spec="mapreduce.output_writers.BlobstoreOutputWriter",
      mapper_params={
          "blob_key": blobkey,
      },
      reducer_params={
          "mime_type": "text/plain",
      },
      shards=16)
  yield StoreOutput("WordCount", filekey, output)

and the Combiner has to be defined as follows:

from mapreduce import operation

class TestCombiner(object):
  """Test combine handler."""
  invocations = []

  def __call__(self, key, values, combiner_values):
    self.invocations.append((key, values, combiner_values))

    value_ints = [int(x) for x in values]
    combiner_values_int = [int(x) for x in combiner_values]
    yield sum(value_ints + combiner_values_int)
    yield operation.counters.Increment("combiner-call")

  @classmethod
  def reset(cls):
    cls.invocations = []
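The `combiner_values` argument is what makes the combiner incremental. As far as the signature suggests, the library passes in new mapper values together with whatever the combiner produced earlier for the same key. The following local sketch assumes exactly that behaviour. `CountingCombiner` and `combine_in_batches` are hypothetical names, and a plain attribute stands in for the `combiner-call` counter.

```python
class CountingCombiner(object):
    """Local stand-in for TestCombiner that runs without App Engine.

    self.calls replaces operation.counters.Increment("combiner-call")
    (hypothetical substitute, not the mapreduce API).
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, key, values, combiner_values):
        self.calls += 1
        value_ints = [int(x) for x in values]
        combiner_values_int = [int(x) for x in combiner_values]
        # One partial count replaces everything seen so far for this key.
        yield sum(value_ints + combiner_values_int)


def combine_in_batches(combiner, key, batches):
    """Hypothetical driver: feed batches of mapper values to the combiner,
    carrying its previous output forward as combiner_values."""
    combined = []
    for batch in batches:
        combined = list(combiner(key, batch, combined))
    return combined


combiner = CountingCombiner()
result = combine_in_batches(combiner, "cloud", [["1", "1"], ["1"], ["1", "1", "1"]])
print(result, combiner.calls)  # [6] 3
```

Because each call adds the new values to the previous partial sum, the order and size of the batches do not change the final count.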

With these changes you can play around with GAE/MR using two different pipelines, one with a combiner and one without.
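Before deploying both pipelines, you can get a feeling for what the combiner saves by simulating word count locally over two shards. The following is only a simplified sketch. In it, the combiner runs once per shard before the shuffle and takes only (key, values), which may not match how the library applies it. The word splitting uses a regular expression instead of the demo's helpers, and all function names are hypothetical.

```python
import re
from collections import defaultdict


def word_count_map(text):
    # Simplified mapper: regex split instead of the demo's helper functions.
    for w in re.findall(r"[a-z]+", text.lower()):
        yield (w, "1")


def combine(key, values):
    # Simplified per-shard combiner: collapse a key's values into one count.
    yield str(sum(int(x) for x in values))


def word_count_reduce(key, values):
    yield "%s: %d" % (key, sum(int(x) for x in values))


def run(shards, use_combiner):
    """Map (and optionally combine) per shard, shuffle, then reduce.

    Returns the sorted output lines and the number of records shuffled.
    """
    shuffled = defaultdict(list)
    records = 0
    for shard in shards:
        per_shard = defaultdict(list)
        for text in shard:
            for key, value in word_count_map(text):
                per_shard[key].append(value)
        for key, values in per_shard.items():
            out = list(combine(key, values)) if use_combiner else values
            shuffled[key].extend(out)
            records += len(out)
    output = []
    for key in sorted(shuffled):
        output.extend(word_count_reduce(key, shuffled[key]))
    return output, records


shards = [["the cloud is the cloud"], ["the cloud"]]
print(run(shards, use_combiner=False))  # (['cloud: 3', 'is: 1', 'the: 3'], 7)
print(run(shards, use_combiner=True))   # (['cloud: 3', 'is: 1', 'the: 3'], 5)
```

Both runs produce the same counts, but with the combiner only 5 instead of 7 records reach the shuffle. With many repeated words per shard, the gap grows accordingly.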


Pilgrimage to Mariazell

March 12, 2012

After having walked one of the eastern routes to Mariazell seven times (book tip: “Pilgerwege nach Mariazell: Band Ost & Nord” by Fritz and Erika Käfer), over the Mamauwiese (a nice place to stay overnight, by the way) via Frein an der Mürz and Schöneben, we are taking part of a northern route this time (originally a 7-day hike): from St. Pölten over Kaiserkogel and Türnitz to Mariazell, 3 days and 85 km. Try to book your accommodation early, especially if you want to go during Pentecost and stay for just one night. It took me half a day to get it right for 11 pilgrims.

In the coming years we are going to explore the western and southern routes to Mariazell, which are well described in “Pilgerwege nach Mariazell: Band West & Süd”.

Cloud Computing all over Austria in 2012

March 12, 2012

After some time spent on the question “To Cloud Compute or not?”, with a focus on legal and security issues, it seems that we have finally arrived at the stage of “Yes, but how?”. There are a number of opportunities to gain insights and to exchange ideas with experienced cloud users.

The Austrian Computer Society (OCG) hosts a permanent working group on cloud computing, as does IT Cluster Vienna. Additionally, EuroCloud Austria regularly organizes events, including brunches with a few talks and good opportunities to network. The next big one, EuroCloud Day, takes place on 23 May 2012.


Cloud Camp is also coming back to Austria on 4 April 2012 at the Technical University of Vienna (Prechtlsaal), organized by Ivona Brandic, Mario Meir-Huber and myself. This unconference is a great opportunity for interested people, from students to professionals, to learn from real-world experience rather than marketing.

Conferences in Austria are also putting a focus on cloud computing in 2012, e.g. the Austrian International Networking Academy Conference 2012 (13 and 14 March @ TGM Vienna) and ASQT 2012 (6 and 7 September @ University Klagenfurt). In addition, the 3rd International Conference on Cloud Computing will take place in Vienna from 24 to 26 September 2012.

Another good source for following cloud computing developments and events in Austria is the well-informed CloudUserGroup @ Facebook.