python - How can I create an efficient hash function for a list of numbers with a range of over a billion?
I am writing a script in Python, attempting to create a hash table for 647050 values (most values occur twice, none more than 3-4 times) taken from a column of a CSV file. The values have an incredible range, the minimum being 651 and the maximum being 2147477024.
How can I create an efficient hash function that can place 647050 values with a range of 2147476373 into a hash table in a sensible amount of time? And how might I structure the code?
For example, I have tried the simple (but messy) hash function below, which slows to such a crawl that I haven't allowed it to complete. I timed different intervals as values were added to the hash table and saw an exponential slowdown (it doesn't slow down to an absolute crawl until over 640000 keys have been processed). The table values are initialized to 0:
def hash_function(value):
    hash_key = value % 647050
    return hash_key

def is_inserted(hash_table, hash_key, num):
    if hash_table[hash_key] == 0:
        hash_table[hash_key] = num
        return True
    return False

def insert(hash_table, hash_key, num):
    current_hash = hash_key
    inserted = False
    while not inserted:
        if current_hash == 647049:  # prevent array out of bounds
            current_hash = 0
        inserted = is_inserted(hash_table, current_hash, num)
        current_hash += 1

def main():
    # some code
    hash_table = [0] * 647050
    for num in nums:
        hash_key = hash_function(num)
        insert(hash_table, hash_key, num)
    # some code
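For reference, a minimal sketch of how the interval timing mentioned above could be reproduced around this insert loop (the chunk size of 10000 and the use of time.perf_counter are my own assumptions, not part of the original code):

import time

def timed_build(nums):
    # Build the table exactly as in main(), but print the elapsed time
    # every 10000 insertions so the slowdown becomes visible.
    hash_table = [0] * 647050
    start = time.perf_counter()
    for count, num in enumerate(nums, 1):
        insert(hash_table, hash_function(num), num)
        if count % 10000 == 0:
            elapsed = time.perf_counter() - start
            print(count, "keys inserted,", round(elapsed, 2), "seconds elapsed")
    return hash_table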
Edit/clarification: I'm not sure if I asked the question correctly. I need to later search this table for values matching entries in another list (around 300 values), and of course I want to avoid a nested loop that performs 647050 checks for each of the 300 values. I thought of placing the values into a list based on a hash function so I could search them quickly, but I am not sure how to implement this. My current simple implementation creates far too many collisions and takes forever.
If I understood the question correctly, this should do the job:

hash_table = {i: nums[i] for i in range(647050)}

I am using Python's builtin dictionary, which seems to be more efficient than using lists.
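For the lookup described in the edit, the same built-in hash machinery can be used directly for membership tests. Here is a minimal sketch, assuming nums holds the CSV column and targets holds the 300 values to check (both names are placeholders I chose):

def find_matches(nums, targets):
    # set() is backed by the same hash table machinery as dict, so each
    # membership check is O(1) on average instead of scanning 647050 entries.
    present = set(nums)
    return [t for t in targets if t in present]

This replaces the 300 x 647050 nested-loop comparison with a single pass to build the set plus 300 constant-time probes.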