How does ORC indexing work?


This is how indexing works in a database, referring to the answer by Xenph Yan:

Creating an index on a field in a table creates another data structure which holds the field value and a pointer to the record it relates to. This index structure is then sorted, allowing binary searches to be performed on it.
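To make the quoted description concrete, here is a minimal sketch in Python of a sorted index searched with binary search. The table contents, the `records` list and the `lookup` helper are made up for illustration; a real database uses on-disk structures such as B-trees rather than an in-memory list.

```python
import bisect

# Hypothetical table: each record is (id, name).
# A record's position in the list plays the role of the "pointer".
records = [
    (17, "carol"),
    (3, "alice"),
    (42, "dave"),
    (8, "bob"),
]

# The index holds (field value, record position) pairs, sorted by field value.
index = sorted((row[0], pos) for pos, row in enumerate(records))
keys = [value for value, _ in index]


def lookup(value):
    """Return the record whose id equals value, or None, via binary search."""
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return records[index[i][1]]
    return None
```

Calling `lookup(42)` walks the sorted keys in O(log n) steps and then follows the stored position back to the record, without scanning the table itself.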

The way I understood ORC indexing is that ORC keeps statistics (min, max, sum) about the rows for every 10,000 rows (by default), and when you query the data it looks at those statistics to figure out whether it needs to read a row chunk or not.
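As a rough illustration of that idea (not ORC's actual reader code), the sketch below computes min/max/sum per chunk of 10,000 rows and uses min and max to decide which chunks an equality filter has to read. The names `RowGroupStats`, `build_stats` and `groups_to_read` are hypothetical.

```python
from dataclasses import dataclass

ROW_INDEX_STRIDE = 10_000  # rows per chunk, matching the default mentioned above


@dataclass
class RowGroupStats:
    start: int    # first row of the chunk (inclusive)
    end: int      # last row of the chunk (exclusive)
    minimum: int
    maximum: int
    total: int


def build_stats(column, stride=ROW_INDEX_STRIDE):
    """Compute min/max/sum for every chunk of `stride` rows."""
    stats = []
    for start in range(0, len(column), stride):
        chunk = column[start:start + stride]
        stats.append(
            RowGroupStats(start, start + len(chunk), min(chunk), max(chunk), sum(chunk))
        )
    return stats


def groups_to_read(stats, value):
    """Keep only the chunks whose [min, max] range could contain `value`."""
    return [s for s in stats if s.minimum <= value <= s.maximum]


# Values that happen to be stored in order: only one chunk can match.
ordered = list(range(30_000))
ordered_hits = groups_to_read(build_stats(ordered), 15_000)

# Values spread evenly over all rows: every chunk spans 0..99, nothing is skipped.
scattered = [i % 100 for i in range(30_000)]
scattered_hits = groups_to_read(build_stats(scattered), 50)
```

In this sketch the statistics never reorder anything. They only allow skipping chunks, and how many chunks can be skipped depends on how the values happen to be laid out in the file.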

So is it correct that ORC indexing does not sort the data?

I have a large table with 69 columns of unstructured data and would like to be able to perform ad-hoc queries on every column. So I would like to be able to sort every column through an index (or at least most of them). There is no 'key' column in the data that gets queried rapidly.

Hive has been designed as a pseudo-SQL front-end for running (long) batch jobs on (massive) data sets. You can run "ad hoc queries", but forget about "rapidly".

Besides, when you index a column in a database (i.e. with the CREATE INDEX command in SQL), you index the entire, exact value of each row. If your data is indeed "unstructured", that would make no sense.

So... if you need full-text search, why don't you dump your data into Elasticsearch or Solr instead?

