Hadoop: Generic Type Information at Runtime in Java

I’ve been playing around with Hadoop lately, and it’s a situation where Java’s lack of runtime information about generics (type erasure) really hurts.
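To make the problem concrete, here is a minimal sketch of erasure in plain Java, with no Hadoop involved. The class name `ErasureDemo` is just for illustration. Two lists with different type arguments end up with the very same runtime class, so there is nothing left to ask about `String` or `Integer` once the code is running.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true when both lists share one runtime class, which they do
    // because the type arguments are erased during compilation.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```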

The bad news: Java isn’t going to change that anytime soon.
The good news: you don’t need to wait for it.

Here’s how.

1. Define a Useful Generic Base

The secret here is to add one method per generic type parameter that exposes its type. Here’s an example:

public interface UsefulMap<In1, Out1>
{
    public Class<In1> getIn1Type();
    public Class<Out1> getOut1Type();
}

2. Implement

Now, when you implement the interface, you have to provide the runtime type information.

public class MyMap implements UsefulMap<Double, String>
{
    @Override
    public Class<Double> getIn1Type()
    {
        return Double.class;
    }

    @Override
    public Class<String> getOut1Type()
    {
        return String.class;
    }
}

The nice part is that if you forget a method, or return a class that doesn’t match the type parameter, it won’t compile.

public Class<Double> getIn1Type()
{
    return Integer.class; // <- Type mismatch: cannot convert from Class<Integer> to Class<Double>
}
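Here is a rough sketch of what this buys you on the consuming side. The helper `TypeInspector.describe` is a made-up name for illustration, and the interface and implementation are repeated (without `public`) only so the snippet compiles on its own. The point is that code holding nothing more than a `UsefulMap<?, ?>` can still find out the concrete types at runtime.

```java
interface UsefulMap<In1, Out1> {
    Class<In1> getIn1Type();
    Class<Out1> getOut1Type();
}

class MyMap implements UsefulMap<Double, String> {
    @Override public Class<Double> getIn1Type() { return Double.class; }
    @Override public Class<String> getOut1Type() { return String.class; }
}

class TypeInspector {
    // Hypothetical helper: accepts any UsefulMap without knowing its
    // type arguments and reads them back from the exposed Class objects.
    static String describe(UsefulMap<?, ?> map) {
        return map.getIn1Type().getSimpleName()
                + " -> "
                + map.getOut1Type().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new MyMap())); // prints "Double -> String"
    }
}
```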

Once you do this, there are a bunch of convenience functions you can write that make your job setup much easier and safer.

For example, you can now write:

HadoopJob.Create()
    .WithMapper(new MyMapper())
    .AndReducer(new MyReducer())
    .Start();
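The post doesn’t show what sits behind that call chain, so what follows is only a guess at one possible shape, not the actual `HadoopJob`. Everything here is hypothetical: `HadoopJobSketch`, the `UsefulReduce` interface, and the `MyMapper` and `MyReducer` bodies. The method names are capitalized only to mirror the chain above. The sketch collects the exposed `Class` objects and checks that the mapper’s output type matches the reducer’s input type. A real version would presumably hand those classes to Hadoop’s job configuration (for instance `Job.setMapOutputValueClass` and `Job.setOutputValueClass`) instead of just reporting them.

```java
interface UsefulMap<In1, Out1> {
    Class<In1> getIn1Type();
    Class<Out1> getOut1Type();
}

// Hypothetical reducer-side counterpart; not part of the original post.
interface UsefulReduce<In1, Out1> {
    Class<In1> getIn1Type();
    Class<Out1> getOut1Type();
}

class MyMapper implements UsefulMap<Double, String> {
    @Override public Class<Double> getIn1Type() { return Double.class; }
    @Override public Class<String> getOut1Type() { return String.class; }
}

class MyReducer implements UsefulReduce<String, Integer> {
    @Override public Class<String> getIn1Type() { return String.class; }
    @Override public Class<Integer> getOut1Type() { return Integer.class; }
}

// Hypothetical builder; it never touches Hadoop itself.
class HadoopJobSketch {
    private Class<?> mapIn;
    private Class<?> mapOut;
    private Class<?> reduceOut;

    static HadoopJobSketch Create() {
        return new HadoopJobSketch();
    }

    HadoopJobSketch WithMapper(UsefulMap<?, ?> mapper) {
        mapIn = mapper.getIn1Type();
        mapOut = mapper.getOut1Type();
        return this;
    }

    HadoopJobSketch AndReducer(UsefulReduce<?, ?> reducer) {
        if (mapOut == null) {
            throw new IllegalStateException("Call WithMapper first");
        }
        // Runtime check that is only possible because the types are exposed.
        if (reducer.getIn1Type() != mapOut) {
            throw new IllegalArgumentException(
                    "Reducer expects " + reducer.getIn1Type().getSimpleName()
                    + " but mapper emits " + mapOut.getSimpleName());
        }
        reduceOut = reducer.getOut1Type();
        return this;
    }

    // A real version would configure and submit a job here; this sketch
    // only reports the types it collected.
    String Start() {
        if (mapIn == null || reduceOut == null) {
            throw new IllegalStateException("Mapper and reducer must both be set");
        }
        return mapIn.getSimpleName() + " -> " + mapOut.getSimpleName()
                + " -> " + reduceOut.getSimpleName();
    }

    public static void main(String[] args) {
        String summary = Create()
                .WithMapper(new MyMapper())
                .AndReducer(new MyReducer())
                .Start();
        System.out.println(summary); // prints "Double -> String -> Integer"
    }
}
```

Because the caller never repeats the classes by hand, a mapper and reducer that don’t line up fail as soon as the job is assembled rather than somewhere in the middle of a run.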

