mapreduce - Can I use HCatInputFormat with MultipleInputs in Hadoop?


I'm attempting a join between two datasets, one stored in a Hive table and the other not. From what I've seen, this is not a common setup: people either define everything as Hive tables or don't use Hive at all.

Now, there's the MultipleInputs class, whose addInputPath method takes a configuration, a Path, an InputFormat, and a Mapper.
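To make the goal concrete, here is a minimal plain-Java sketch (no Hadoop dependencies) of the reduce-side join pattern that a per-input mapper makes possible: each source gets its own map function that tags its records, and the reduce step pairs up records that share a key. The class name, the tags, and the "key,value" line layout of the non-Hive input are all assumptions made for illustration, not anything taken from Hadoop or HCatalog.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Hypothetical tags marking which input a record came from.
    static final String HIVE_TAG = "H";
    static final String FILE_TAG = "F";

    // Stand-in for the mapper attached to the Hive table input.
    // Emits a (key, tagged value) pair.
    static String[] mapHiveRow(String key, String value) {
        return new String[] { key, HIVE_TAG + "\t" + value };
    }

    // Stand-in for the mapper attached to the plain-file input.
    // Assumes each line looks like "key,value".
    static String[] mapFileLine(String line) {
        String[] parts = line.split(",", 2);
        return new String[] { parts[0], FILE_TAG + "\t" + parts[1] };
    }

    // Stand-in for the reducer: separates values by tag, then emits
    // one joined record per (Hive value, file value) combination.
    static List<String> reduce(String key, List<String> taggedValues) {
        List<String> hiveValues = new ArrayList<>();
        List<String> fileValues = new ArrayList<>();
        for (String tagged : taggedValues) {
            String[] parts = tagged.split("\t", 2);
            if (HIVE_TAG.equals(parts[0])) {
                hiveValues.add(parts[1]);
            } else {
                fileValues.add(parts[1]);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String h : hiveValues) {
            for (String f : fileValues) {
                joined.add(key + "," + h + "," + f);
            }
        }
        return joined;
    }

    // Simulates the shuffle: runs both "mappers", groups the tagged
    // values by key (sorted), and feeds each group to the reducer.
    static List<String> join(Map<String, String> hiveRows, List<String> fileLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> row : hiveRows.entrySet()) {
            String[] kv = mapHiveRow(row.getKey(), row.getValue());
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : fileLines) {
            String[] kv = mapFileLine(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            output.addAll(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> hiveRows = new LinkedHashMap<>();
        hiveRows.put("1", "alice");
        hiveRows.put("2", "bob");
        List<String> fileLines = List.of("1,NY", "3,LA");
        for (String record : join(hiveRows, fileLines)) {
            System.out.println(record);
        }
    }
}
```

In a real job the two map functions would be separate Mapper classes, one registered per input, which is exactly why a MultipleInputs-style API is needed here.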

I could use the HCat input format there and try to pass the table name disguised as a path, but that sounds like a wild guess at best.

There's a patch for a newer version of Hive (I'm on CDH4, which means Hive 0.10 and HCat 0.5, sadly). I found the patch is not quite straightforward to translate to my current version, and it seems to work only for multiple tables, not for a mix of tables and other inputs.

https://issues.apache.org/jira/browse/hive-4997

Is this possible, or do you have any recommendations?

The only thing I can think of is reading the raw data without using the table, but that implies logic for Hive-specific formats, which I'd rather avoid.

HCatMultipleInputs can be used for reading multiple Hive tables.

Here is a patch (for 0.13) that you can look at for installing multiple-table support. It adds HCatMultipleInputs to support multiple Hive tables.

https://issues.apache.org/jira/i#browse/hive-4997

  Example usage: HCatMultipleInputs.addInput(job, table1, db1, properties1, Mapper1.class);

You can use the working code at the link below: https://github.com/abhirj87/training/tree/master/multipleinputs

