python - How can I create an efficient hash function for a a list of numbers with a range of over a billion? -


i writing script in python , attempting create hash table 647050 values (most values occur twice, no more 3-4 times) taken column of csv file. have incredible range minimum value being 651 , maximum being 2147477024.

how can create efficient hash function can place 647050 values range of 2147476373 hash table in sensible amount of time?

and how might structure code?

for example have tried simple (but messy) hash function below slows such crawl haven't allowed complete. timed different intervals of values added hash table see exponential slow down (it doesn't slow down absolute crawl until on 640000 keys have been processed).

the values initialized 0:

def hash_function(value):     hash_key = value % 647050     return hash_key  def is_inserted(hash_table, hash_key, num):     if hash_table[hash_key] == 0:         hash_table[hash_key] = num         return true     return false  def insert(hash_table, hash_key, num):     current_hash = hash_key     inserted = false     while not inserted:         if current_hash == 647049: #prevent array out of bounds             current_hash = 0:          inserted = is_inserted(hash_table, current_hash, num)         current_hash += 1  def main():     #some code     hash_table = [0] * 647050     num in nums         hash_key = hash_function(num)         insert(hash_table, hash_key, num)     #some code 

edit/clarification : i'm sure if asked question correctly. need later search list of values matching entries (for 300 values), , want of course avoid nested each loops 647050 checks each of 300 values. thought of placing values list based on hash function search them quickly. not sure how implement this. current simple implementation creates far many collisions , takes forever.

if understood question correctly, should job:

hash_table = {i:nums[i] in range(647050)} 

i using python's builtin dictionary, seems more efficient using lists.


Comments

Popular posts from this blog

javascript - Karma not able to start PhantomJS on Windows - Error: spawn UNKNOWN -

Nuget pack csproj using nuspec -

c# - Display ASPX Popup control in RowDeleteing Event (ASPX Gridview) -