|
Often when programming, there is a need to combine multiple variables into a single named object. Doing so makes it easier to keep track of your variables and adds clarity to your code. In Sheerpower, we call this object a CLUSTER (see CLUSTERS in Sheerpower). This section focuses on finding data in cluster arrays using the FINDROW() function (see also FINDROW Function).
The FINDROW() function creates an internal hash-table and uses it to quickly find a given value. Hashing is the process of converting data such as a name into one or more numbers. Instead of having to find the name, you can find the hashed number of the name. The hashed numbers can be organized in ways that make finding data extremely fast. Sheerpower’s FINDROW() function can find information at a rate of over 10 million rows per second. If the value being searched for does not exist, then the search is at a rate of over 15 million rows per second.
FINDROW() uses a method called “hashing” to perform high-speed lookups on cluster data. Internally, hashing uses what is called a hash-table. As new rows are added to the cluster, optimizations are done to maintain the hash-table. If values within the cluster change, more optimizations are done to maintain the changed hash-table.
Assuming there are 10 rows in your cluster, Sheerpower starts by building a hash-table with 16 slots. This over-allocation of 6 slots increases the efficiency of hash searches at the expense of some memory. The over-allocation also guarantees that there are always unused hash-table slots. The hash-table is accessed in three stages:
For example, if you are storing the value of “Boston” into this cluster (city$), Sheerpower creates three hash values (HVALUES) for this cluster variable...
Let’s find the first occurrence of Boston. Using three sets of prime numbers, we create three hash-values for Boston. The process of searching the hash-table is similar to when we built it.
When we want to insert a duplicate key, we find the hash-table slot for the first occurrence of the key. That slot contains, among other things, the current duplicate_key_count for the key. We increment the counter and then re-hash the duplicate key by factoring in the duplicate_key_count.
Our empirical studies using hundreds of millions of keys show the performance of FINDROW():
Hide Description
|
|
Enter or modify the code below, and then click on RUN |
Looking for the full power of Sheerpower?
Check out the Sheerpower website. Free to download. Free to use. |