hadoop - Pig filter fails due to unexpected data


I am running Cassandra and have 20k records in it to play with. I'm trying to run a filter in Pig on this data, but I get the following message back:

2015-07-23 13:02:23,559 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:260)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:205)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:263)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:44)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.<init>(CqlRecordReader.java:259)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:151)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)
    ... 7 more

You'd think this would be an obvious error, and believe me, there are a ton of results on Google for it. It's clear that some piece of my data isn't conforming to the expected type of a given column. What I don't understand is 1.) why this is happening, and 2.) how to debug it. If I try to insert invalid data into Cassandra from my Node.js app, it throws this kind of error when the data type doesn't match the column's data type, which means this shouldn't be possible? I've read that data validation using UTF8 is wonky and that setting a different kind of validation is the answer, but I don't know how to do that. Here are my steps to reproduce:

grunt> define CqlNativeStorage org.apache.cassandra.hadoop.pig.CqlNativeStorage();
grunt> test = load 'cql://blah/blahblah' using CqlNativeStorage();
grunt> describe test;
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - found ksdef name: blah
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - partition keys: ["ad_id"]
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster keys: []
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - row key validator: org.apache.cassandra.db.marshal.UTF8Type
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster key validator: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type)
blahblah: {ad_id: chararray,address: chararray,city: chararray,date_created: long,date_listed: long,fireplace: bytearray,furnished: bytearray,garage: bytearray,neighbourhood: chararray,num_bathrooms: int,num_bedrooms: int,pet_friendly: bytearray,postal_code: chararray,price: double,province: chararray,square_feet: int,url: chararray,utilities_included: bytearray}
grunt> query1 = filter blahblah by city == 'new york';
grunt> dump query1;

It then runs for a while, dumps out tons of logs, and the error appears.

I discovered the problem: the Pig partitioner did not match the one used for CQL3, and therefore the data was being parsed incorrectly. The environment variable had been set to PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner. After I changed it to PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner, it started working.
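
For anyone wondering why a partitioner mismatch surfaces as "expected 8 or 0 byte long (1)", here is a rough Python sketch of one plausible reading. It assumes Murmur3Partitioner tokens travel as fixed 8-byte longs while RandomPartitioner tokens are arbitrary-precision integers with a compact encoding. The function names are made up for illustration and are not Cassandra or driver APIs.

```python
# Illustrative only: these helpers are hypothetical and are not part of
# Cassandra, the DataStax driver, or Pig.

def encode_long(token: int) -> bytes:
    """Fixed-width encoding: always 8 bytes, big-endian, signed."""
    return token.to_bytes(8, byteorder="big", signed=True)


def encode_varint(token: int) -> bytes:
    """Variable-width encoding: compact big-endian two's-complement form."""
    length = (token.bit_length() + 8) // 8  # leave room for the sign bit
    return token.to_bytes(length, byteorder="big", signed=True)


def validate_long(raw: bytes) -> None:
    """Stand-in for a check that only accepts an empty or 8-byte value."""
    if len(raw) not in (0, 8):
        raise ValueError(f"expected 8 or 0 byte long ({len(raw)})")


# A token sent in the fixed 8-byte form passes the check.
validate_long(encode_long(5))

# The same token in the compact form is a single byte and is rejected.
try:
    validate_long(encode_varint(5))
except ValueError as err:
    print(err)
```

If that reading is right, the "(1)" is simply the length of the value that arrived, and the fix is to make PIG_PARTITIONER agree with whatever partitioner the cluster actually reports (for example in the output of `nodetool describecluster`).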

