Archive for the ‘science’ Category

Map Reduce at Google App Engine

March 29, 2012

I recently had to prepare a Cloud Computing 101 lecture including exercises for a teacher exchange week at the Oulu University of Applied Sciences.

As the curriculum was targeted at internet computing, I wanted to present Google App Engine as a PaaS alternative and MapReduce as an important background technology of the cloud. Luckily, the two work together: MapReduce for App Engine is available at an experimental stage here. I quickly found out that the Python support is much more advanced than the Java one (which has no reducer), so I decided to have a go with Python.

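The typical WordCount example is available, but neither its map nor its reduce function is written in a way that lets you include a Combiner (which is also not documented at the moment) in the MapReduce pipeline. The original mapper emits an empty string per word and the reducer simply counts the values it receives, which no longer gives the right result once a combiner has merged several values into one partial sum. Below is the changed Map/Reduce code.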

def word_count_map(data):
  """Word count map function."""
  (entry, text_fn) = data
  text = text_fn()
  logging.debug("Got %s", entry.filename)
  for s in split_into_sentences(text):
    for w in split_into_words(s.lower()):
      # Original, not working with a combiner:
      # yield (w, "")
      # Emit an explicit count so that partial results can be summed.
      yield (w, "1")

def word_count_reduce(key, values):
  """Word count reduce function."""
  # Original, not working with a combiner:
  # yield "%s: %d\n" % (key, len(values))
  # Sum the counts, since a value may already be a partial sum.
  value_ints = [int(x) for x in values]
  yield "%s: %d\n" % (key, sum(value_ints))
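To see why the commented-out original lines stop working, here is a minimal plain-Python sketch that runs outside App Engine. The names reduce_original and reduce_summing and the sample values are hypothetical and only mirror the two reducer variants above; they are not part of the MapReduce library.

```python
def reduce_original(key, values):
    # Mirrors the original reducer: counts how many values arrived.
    return "%s: %d\n" % (key, len(values))


def reduce_summing(key, values):
    # Mirrors the changed reducer: adds up the numeric strings.
    return "%s: %d\n" % (key, sum(int(v) for v in values))


# Three occurrences of "cloud" as emitted by the changed mapper.
raw_values = ["1", "1", "1"]
# The same three occurrences after a combiner merged them into one partial sum.
combined_values = ["3"]

# Counting the values only works on the raw mapper output ...
original_on_raw = reduce_original("cloud", raw_values)
original_on_combined = reduce_original("cloud", combined_values)

# ... while summing gives the same answer in both cases.
summing_on_raw = reduce_summing("cloud", raw_values)
summing_on_combined = reduce_summing("cloud", combined_values)
```

Counting the values reports one occurrence once they have been merged, whereas summing is unaffected by how many partial sums the reducer receives.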

The combiner has to be included in the pipeline via the combiner_spec parameter:

def run(self, filekey, blobkey):
  logging.debug("filename is %s" % filekey)
  output = yield mapreduce_pipeline.MapreducePipeline(
           "word_count",
           "main.word_count_map",
           "main.word_count_reduce",
           combiner_spec="main.TestCombiner",
           input_reader_spec="mapreduce.input_readers.BlobstoreZipInputReader",
           output_writer_spec="mapreduce.output_writers.BlobstoreOutputWriter",
           mapper_params={
                          "blob_key": blobkey,
           },
           reducer_params={
                          "mime_type": "text/plain",
           },
           shards=16)
  yield StoreOutput("WordCount", filekey, output)

and the Combiner has to be defined as follows:

class TestCombiner(object):
  """Test combine handler."""
  invocations = []

  def __call__(self, key, values, combiner_values):
    self.invocations.append((key, values, combiner_values))

    value_ints = [int(x) for x in values]
    combiner_values_int = [int(x) for x in combiner_values]
    yield sum(value_ints + combiner_values_int)
    yield operation.counters.Increment("combiner-call")

  @classmethod
  def reset(cls):
    cls.invocations = []

With this in place, you can play around with GAE/MR using different pipelines, one including a combiner and the other not.
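As a rough local illustration of what the two pipelines differ in, the following sketch simulates both variants in plain Python. It is a simplification under stated assumptions: run_local, map_words and combine are hypothetical helpers, the combiner is applied exactly once per shard, and how the real library schedules combiner calls is not modelled.

```python
from collections import defaultdict


def map_words(text):
    # Simplified mapper: one (word, "1") pair per word.
    for word in text.lower().split():
        yield (word, "1")


def combine(key, values, combiner_values):
    # Same arithmetic as the combiner above, without the counter.
    return sum(int(x) for x in values) + sum(int(x) for x in combiner_values)


def run_local(shards, use_combiner):
    """Return (word counts, number of intermediate records shuffled)."""
    shuffled = defaultdict(list)
    records = 0
    for text in shards:
        # Group the mapper output of this shard by key.
        per_shard = defaultdict(list)
        for key, value in map_words(text):
            per_shard[key].append(value)
        for key, values in per_shard.items():
            if use_combiner:
                # Assumption: one combiner call per key and shard.
                values = [str(combine(key, values, []))]
            shuffled[key].extend(values)
            records += len(values)
    # Reduce step: sum whatever arrived for each key.
    counts = {key: sum(int(v) for v in values)
              for key, values in shuffled.items()}
    return counts, records


shards = ["the cloud and the grid", "the cloud the cloud"]
plain_counts, plain_records = run_local(shards, use_combiner=False)
combined_counts, combined_records = run_local(shards, use_combiner=True)
```

Both runs produce the same word counts, but the run with the combiner passes fewer intermediate records to the reduce step, which is the effect to look for when comparing the two pipelines on App Engine.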

Cloud Computing all over Austria in 2012

March 12, 2012

After some time of thinking about “To Cloud Compute or not?” with a focus on legal and security issues, it seems that we have finally arrived at the stage of “Yes, but how?”. A number of possibilities exist for gaining insights and discussing with experienced cloud users.

The Austrian Computer Society (OCG) hosts a permanent working group on cloud computing as well as IT Cluster Vienna. Additionally, EuroCloud Austria regularly organizes events, including brunches with some talks and a good opportunity to network. The next big one, called EuroCloud Day, takes place on 23 May 2012.


Cloud Camp is also coming back to Austria on 4 April 2012 @ Technical University of Vienna (Prechtlsaal), organized by Ivona Brandic, Mario Meir-Huber and myself. This unconference is a great opportunity for interested people, from students to professionals, to hear first-hand, non-marketing experiences.

Also, conferences in Austria are putting a focus on Cloud Computing in 2012 – e.g. the Austrian International Networking Academy Conference 2012 (13 and 14 March @ TGM Vienna) as well as ASQT 2012 (6 and 7 September @ University Klagenfurt). Besides, the 3rd International Conference on Cloud Computing will take place in Vienna on 24 – 26 September 2012.

Another good source for following cloud computing developments and events in Austria is the well-informed CloudUserGroup @ Facebook.

CloudCamp in Austria

September 21, 2011

After some time of preparation we finally made it: CloudCamp will take place at the St. Pölten University of Applied Sciences on 11 November 2011 between 11 a.m. and 4 p.m. with multiple tracks. It is an unconference with no predefined schedule and it is free to join!

I’m co-organizing the event together with Mario Meir-Huber from CodeFore and Constantin Hofstetter from the Google Technology User Group Vienna. We are happy to be supported by the Austrian Computer Society as well as the IT Cluster Vienna!

Details and registration can be found here!

CAE Linux 2010 as VM

July 1, 2011

Due to my recent professional transition I’m looking into tools for industrial simulation. I found CAE Linux quite useful as it comes with a lot of the available open-source tool set for that purpose. That said, I wanted to have a VM image of it; a preconfigured one is available for CAE Linux 2008, but not for CAE Linux 2010. So I downloaded the .iso file and tried to create a VM for VMPlayer 4. Although it seemed promising at first, the guest OS (in this case 64-bit Ubuntu) didn’t start correctly. After playing around for a while I switched my efforts to Virtual Box, where it worked out of the box, unfortunately just with 640 x 480 resolution. That is easy to fix with the Virtual Box Guest Additions. The idea is to elaborate on combining the available industrial simulation tools with workflow technology such as Kepler or Meandre.

Research group management

November 10, 2010

Having been in research for six years, I have experienced several ways of research group management, from a very strict regime with pre-scheduled weekly meetings to very loose management. A recent article in CACM on SCORE (SCrum fOr REsearch) describes an interesting approach in the middle.

Beyond one’s own nose

May 27, 2010

Today I attended a rather economics-oriented talk by Tim Jackson at the Business University of Vienna on the topic “Prosperity without growth”. He is a great speaker who introduced and discussed well the dilemma of growth and the possible ways out of it. Looks like a hard and long way to go for us...

Publish or perish aka give me your number

March 8, 2010

I recently discovered a great tool called PoP (for Publish or Perish) that calculates many publication statistics, e.g. the h-index, which are often used to evaluate researchers when applying for a position.
It is based on Google Scholar and lets you quickly analyze a person of interest.
Unfortunately, with a special character in your name, like in mine, it doesn’t perform so well and some post-processing is required.
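For readers unfamiliar with the h-index: it is the largest number h such that h of a researcher's publications have at least h citations each. The sketch below computes it from a plain list of citation counts. The function name h_index and the sample numbers are made up for illustration; this is not how PoP itself is implemented.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations."""
    # Rank the papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            # The paper at this rank still has enough citations.
            h = rank
        else:
            break
    return h


# Made-up citation counts for five papers.
example = h_index([10, 8, 5, 4, 3])
```

In the made-up example four papers have at least four citations each, but there are not five papers with five or more, so the result is 4.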