Programming with Judy C Language Judy Version 4.0 For more information about the Judy technology, visit the Judy web site at: http://devresource.hp.
Legal Notices The information in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material. Warranty.
Judy Technology Software License ATTENTION: USE OF THE SOFTWARE IS SUBJECT TO THE HP SOFTWARE LICENSE AND WARRANTY TERMS SET FORTH BELOW. USING THE SOFTWARE INDICATES YOUR ACCEPTANCE OF THESE LICENSE AND WARRANTY TERMS. IF YOU DO NOT ACCEPT THESE TERMS, YOU MAY RETURN THE SOFTWARE FOR A FULL REFUND. IF THE SOFTWARE IS BUNDLED WITH ANOTHER PRODUCT, YOU MAY RETURN THE ENTIRE UNUSED PRODUCT FOR A FULL REFUND.
item" as defined in FAR 2.101(a), or as "Restricted computer software" as defined in FAR 52.227-19 (Jun 1987)(or any equivalent agency regulation or contract clause), whichever is applicable. You have only those rights provided for such Software and any accompanying documentation by the applicable FAR or DFARS clause or the HP standard software agreement for the product involved. DISCLAIMER.
Contents 1. About Judy When To Use Judy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2. The Judy Advantage Judy Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Compared with Other Search and Retrieval Methods. . . . . . . . . . . . . . . . . . . . . . . . . . 18 Hashing versus JudyL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Contents 6
Preface This book contains information about the Judy technology, a programming innovation from Hewlett-Packard that combines scalability and ease-of-use with a significant performance advantage in many data-intensive applications.
About Judy 1 About Judy The Judy technology brings the benefits of exceptionally efficient, dynamic arrays to the C programmer. • Judy is designed to work well over a wide, dynamic range of values and maintains excellent performance across all array populations. Data-intensive applications using Judy can scale up without tuning as program requirements grow. • Code creation and maintenance is much easier.
About Judy because Judy stores the indexes in sorted order allowing compression of the common digits in the index word. With binary-tree structures, you expect the speed to be close to a log2 N function of the number of elements stored. Judy's speed is closer to log256 N. Unlike many data structures, Judy only slows a little when the population of indexes increases by a factor of 200. With this gentle slope, Judy performs well into the tera-element populations if that much memory is available.
About Judy When To Use Judy Then it can efficiently determine how many prime numbers exist between any random pair of numbers (prime or not). Because Judy maintains counts internally, the count function call can return an answer very quickly. When To Use Judy Designing programs with an unbounded array paradigm is one of the most powerful uses of Judy. Use Judy when your program requires: • • • • • Sparse arrays with dynamic or unbounded indexes (0, 1, 2...
About Judy When To Use Judy Judy libraries The table below shows the location of the libraries that are provided with the Judy technology on the HP-UX system: Hardware Architecture HP-PA 1.1 (32-bit only) HP-PA 2.0 Type Location on system (from root) 32-bit 64-bit archive /usr/lib/libJudy.a N/A shared /usr/lib/libJudy.sl N/A archive none /usr/lib/pa20_64/ libJudy.a shared /usr/lib/pa20_32/libJudy.sl /usr/lib/pa20_64/libJudy.sl NOTE The 32-bit HP-PA 1.1 shared library (/usr/lib/libJudy.
About Judy When To Use Judy In summary Judy can replace many commonly used data structures: arrays, sparse arrays, hash tables, B-trees, binary trees, linear lists, skiplists, and counting functions. Judy is designed to: ❏ Scale dynamically for future program requirements. ❏ Handle large, unbounded array structures particularly well with performance that is almost linear across large populations. ❏ Provide fast, built-in sorting and counting. ❏ Increase productivity through an easy-to-use API.
About Judy When To Use Judy 14 Chapter 1
The Judy Advantage Judy Data Structures 2 The Judy Advantage The Judy advantage is a measurable performance benefit that can be demonstrated by benchmarking Judy against hash tables, skip lists, tree structures, stacks, and queues. The chapter describes how Judy uses the digital tree data structure in a new and unique way to provide enhanced performance. This chapter also contains real-time performance measurements that show Judy compared with a hash algorithm.
The Judy Advantage Judy Data Structures necessary to search a digital tree until you reach the value. The index is divided into a series of array indexes until a unique index suffix is obtained.
The Judy Advantage Judy Data Structures do this, Judy keeps internal population counts at each node. These counts are also the reason why you can call Judy1Count or JudyLCount to find all the indexes between x and y and get an answer back much faster than you can count a linear array of the same population. Here is what you might find in an element of a Judy node: Pointers Like a traditional tree, Judy provides pointers to the next tree level.
The Judy Advantage Compared with Other Search and Retrieval Methods Compared with Other Search and Retrieval Methods The table below shows how Judy algorithms compare with other search and retrieval methods. Table 2-1 Search and Retrieval method (data structure) O number for lookup Arrays Hashing Depends on algorithm Advantages Disadvantages Fast indexing Pre-allocation of entire array before storing a single element (inefficient for sparse arrays).
The Judy Advantage Compared with Other Search and Retrieval Methods Table 2-1 Search and Retrieval method (data structure) O number for lookup Advantages Skip Lists O(log n) • • Digital Trees (Trie) O (logm n) Judy Less than O(log256 n) • • • • • • • • Chapter 2 Simplify the tree-balancing problem by using a probabilistic method instead of a strictly enforced balancing method.
The Judy Advantage Hashing versus JudyL Hashing versus JudyL The figures below show JudyL compared to a commonly used hashing algorithm. The hash algorithm is a simple static table sized to be a power of two with a linear (linked list) collision chain. The table is a fixed size of 1,048,576 buckets (220). Each bucket is a pointer to a chain. Each chain element is 3 words (key, value, next pointer).
The Judy Advantage Hashing versus JudyL Figure 2-2 Judy versus hash insertion Chapter 2 21
The Judy Advantage Hashing versus JudyL Figure 2-3 Judy versus hash retrieval 22 Chapter 2
The Judy Advantage Hashing versus JudyL Figure 2-4 Judy versus hash memory performance Chapter 2 23
The Judy Advantage Hashing versus JudyL Most programmers understand that hashing with a static table can deliver excellent performance. However when you use hashing, you must remember to: • • • • Size your hash table correctly. Beware of operators like mod (i.e. %) on a PA-RISC system. Use any available optimizations, like mallopt(3C). In-line your code. When you use Judy, you don’t need to remember anything on this checklist! Judy takes care of all the optimizations for you.
Using Judy 3 Using Judy The Judy library functions support small or large, dynamic arrays with 32- or 64-bit indexes depending on the processor type. Each type of Judy array has a different kind of index and/or value. You can think of a Judy array as: value-type JudyArray [4294967296]; // or 232 on a 32-bit processor value-type JudyArray [18446744073709551616]; // or 264 on a 64-bit processor Judy arrays are quick to access and space efficient.
Using Judy Table 3-1 A summary of Judy functions Judy Array Type Sorting Function Judy1: maps ulong_t to bit JudyL: maps ulong_t to ulong_t Count (between two indexes) Judy1Count JudyLCount Count (locate the Nth index within an array, where N=Count) Judy1ByCount JudyLByCount Free Array Judy1FreeArray JudyLFreeArray JudySL: maps string to ulong_t JudySLFreeArray As part of normal processing, Judy sorts indexes numerically using its own, internal sorting criteria.
Using Judy JudySL sorts are case sensitive and JudySL literally stores strings as chunks of bytes or numbers. Sorting is numeric per the ASCII value of the character, so the letter "A" is sorted using a different numeric sequence than the letter "a". For more information about sorting strings, see the “JudySL sorting example” on page 45. Counting You can use Judy counting functions to rapidly determine the number of valid indexes between any pair of indexes.
Using Judy Using Judy1 Using Judy1 Judy1 functions represent Boolean values (bit maps) in a Judy array. These functions support arrays of one-bit (binary) values, representing true/false, valid/invalid, or present/absent. The table below lists the Judy1 functions. For more information, see the Judy1(3X )man page. Table 3-2 Judy1 functions Actions Judy1Test Indicates whether the index is set in the array. Judy1Set Sets a bit in the array. Judy1Unset Clears a bit in the array.
Using Judy Using Judy1 Judy1 example You can find the code for this example and others on the Judy web site: http://devresource.hp.com/judy/ // Judy 1 Example code: bit set operations. // There should be a header file that defines the bit operation types: #define JUDY1OP_AND 1L #define JUDY1OP_OR 2L #define JUDY1OP_ANDNOT 3L #include "Judy.h" /******************************************************************* * Name: Judy1Op * Description: * Logical set operations on Judy1 arrays.
Using Judy Using Judy1 * JError_t * PJError (OUT) * Judy Error struct used to return Judy error. * Returns: * !JERR if successful * JERR if an error occurs * If the error is a caller error (invalid Operation or no PPDest) * then the PJError error code will be JU_ERRNO_NONE.
Using Judy Using Judy1 if (Index1 > Index2) { Index2 = Index1; Judy_rv = Judy1First(PSet2, &Index2, PJError); } else { // do the AND Judy_rv = Judy1Set(&PnewJArray, Index1, PJError); if (Judy_rv == JERR) return JERR; // bump to the next bits Judy_rv = Judy1Next(PSet1, &Index1, PJError); Judy_rv += Judy1Next(PSet2, &Index2, PJError); } } *PPDest = PnewJArray; break; case JUDY1OP_OR: /* Set all the bits from PSet1 */ for (Index1 = 0L, Judy_rv = Judy1First(PSet1, &Index1, PJError); Judy_rv == 1; Judy_rv =
Using Judy Using Judy1 Judy_rv = Judy1Next(PSet2, &Index1, PJError)) { if (Judy1Set(&PnewJArray, Index1, PJError) == JERR) return JERR; } *PPDest = PnewJArray; break; case JUDY1OP_ANDNOT: // PSet1 with PSet2 removed // 0010 = PSet1(1010) ANDNOT PSet2(1100) for (Index1 = 0L, Judy_rv = Judy1First(PSet1, &Index1, PJError); Judy_rv == 1; Judy_rv = Judy1Next(PSet1, &Index1, PJError)) { // if bit doesn't exist in PSet2, then add to result if (0 == Judy1Test(PSet2, Index1, PJError)) { if (Judy1Set(&PnewJArray, I
Using Judy Using JudyL Using JudyL JudyL functions provide a way to represent long-word values in a Judy array. These functions support large arrays of 32-bit or 64-bit values (long words). Since an initial (empty) JudyL array is represented by a null pointer, you can make JudyL arrays multi-dimensional, the equivalent of: void * JudyLArray [4294967296][4294967296]...; The table below shows the JudyL functions. For more information, see the JudyL(3X) man page.
Using Judy Using JudyL Table 3-3 JudyL example JudyL functions Actions JudyLCount Returns the count of set indexes between two specified indexes, including the specified indexes themselves. JudyLByCount Locates the Nth index (N = count) that is set in the JudyL array. JudyLFreeArray Frees the entire JudyL array. This is much faster than using JudyLNext and JudyLDel. You can find the code for this example and others on the Judy web site: http://devresource.hp.
Using Judy Using JudyL // or having to predetermine a good hash table size. It also gives you a // sorted list at the same time. Also notice that JudyLCount is much faster than you // can sequentially count items in an array. // #include #include #include #include "Judy.h" // Default number of iterations (number of random numbers generated) // This may be overridden on the command line #define DEFAULT_ITER 1000000 // The number of buckets the histogram is divided into.
Using Judy Using JudyL struct timeval TBeg, TEnd; #define STARTTm gettimeofday(&TBeg, NULL) #define ENDTm gettimeofday(&TEnd, NULL) #define DeltaUSec \ ((double)(TEnd.tv_sec * 1000000.0 + TEnd.tv_usec) \ - (TBeg.tv_sec * 1000000.0 + TBeg.
Using Judy Using JudyL (void) puts ("\nJudyL demonstration: random(3M) histogram"); // SET NUMBER OF RANDOMS TO GENERATE if (argc != 2) { // SET TO DEFAULT_ITER num_vals = DEFAULT_ITER; (void) printf ("Usage: %s [numvals]\n", argv[0]); (void) printf (" Since you did not specify a number of values, %d\n", num_vals); (void) puts (" will be used as the number of random numbers to insert"); (void) puts (" into the Judy array"); } else { // OVERRIDE NUMBER OF RANDOMS TO GENERATE num_vals = atol(argv[
Using Judy Using JudyL random(); } /* end of random number generator time */ ENDTm; random_gen_time = DeltaUSec; (void) printf("It took %.3f sec to generate %ld random numbers\n", random_gen_time/1000000, num_vals); (void) printf(" (ie. %.3f uSec/number)\n\n", random_gen_time/num_vals); // REGENERATE RANDOMS AND INSERT THEM INTO A JUDYL ARRAY (void) puts("Please wait while the random numbers are inserted into"); (void) puts("a JudyL array (with a usage count) ...
Using Judy Using JudyL // COUNT THE NUMBER OF ELEMENTS IN THE JUDYL ARRAY // IE. COUNT THE NUMBER OF UNIQUE RANDOM NUMBERS STARTTm; unique_nums = JudyLCount(PJArray, 0, ~0L, &JError); ENDTm; (void) printf("\nThere were %ld unique random numbers generated\n", unique_nums); (void) printf("It took %.3f uSec to count them in the Judy array.\n", DeltaUSec); // FIND HOW MANY NUMBERS WERE GENERATED ONCE, TWICE, ...
Using Judy Using JudyL } (*PGenCount)++; // increment hit count } ENDTm; (void) printf("That took %.3f Secs or %.
Using Judy Using JudyL (void) printf(" %ld unique values from %8x - %8x\n", JudyLCount(PJArray, Index, Index + histo_incr, &JError), Index, Index + histo_incr); Index += histo_incr + 1; } // PRINT MEAN (average), // MEDIAN (middle value, or average of two middle values) // RANGE (low and high value) tmp1 = (ulong_t)(ran_sum/(long long)num_vals); (void) printf(" mean: 0x%08x\n", tmp1); // If there were an even number of randoms generated, then average // the two middle numbers.
Using Judy Using JudyL Index = ~0; JudyLLast(PJArray, &Index, 0); (void) printf(" last random generated: 0x%08x\n", Index); exit(0); /*NOTREACHED*/ } // main() Sample output $ funhist 10000000 JudyL demonstration: random(3M) histogram timing random number generation It took 0.416 sec to generate 10000000 random numbers (ie. 0.042 uSec/number) Please wait while the random numbers are inserted into a JudyL array (with a usage count) ... That took 1.619 uSec/Index.
Using Judy Using JudyL 310960 unique values from 10000000 - 13ffffff 311522 unique values from 14000000 - 17ffffff 312201 unique values from 18000000 - 1bffffff 312361 unique values from 1c000000 - 1fffffff 312487 unique values from 20000000 - 23ffffff 311927 unique values from 24000000 - 27ffffff 313360 unique values from 28000000 - 2bffffff 312025 unique values from 2c000000 - 2fffffff 311236 unique values from 30000000 - 33ffffff 310973 unique values from 34000000 - 37ffffff 310617 unique values from 380
Using Judy Using JudySL Using JudySL The JudySL functions provide a way to use string values as indexes into a Judy array. In JudySL arrays, elements are sorted lexicographically (case-sensitive) by indexes: void * JudySLArray ["Toto, I don't think we're in Kansas any more"]; Since an initial (empty) JudySL array is represented by a null pointer, you can make JudySL arrays multi-dimensional: void * JudySLArray [char *][char *]...; The table below shows the JudySL functions.
Using Judy Using JudySL JudySL sorting example You can find the code for this example and others on the Judy web site: http://devresource.hp.com/judy/ // JudySort.c // Judy demonstration code: Judy equivalent of sort -u. While Judy is not // primarily intended as a sorting algorithm, in many cases it's faster to // store values in a Judy array and read them back in sorted order than to sort // them using standard algorithms.
Using Judy Using JudySL // CHECK FOR REQUIRED INPUT FILE PARAMETER: if (argc != 2) { (void) fprintf (stderr, "Usage: %s \n", argv[0]); (void) fputs ("Uses each line as a JudySL index.
Using Judy Using JudySL == PPJERR) { fprintf(stderr, "File: '%s', line: %d: Judy error: %d\n", __FILE__, __LINE__, JU_ERRNO(&JError)); exit (1); } // To modify the program to output duplication counts, like uniq -d, or to emit // multiple copies of repeated strings: #ifdef notdef ++(*((ulong_t *) PPValue)); #endif } // PRINT SORTED STRINGS: // // To output all strings and not just the unique ones, output Index multiple // times = *((ulong_t *) PPValue).
Using Judy Using JudySL In summary The Judy technology provides three array types each supporting a different kind of index and/or value. These are: Judy1 Map ulong_t to bit JudyL Map ulong_t to ulong_t JudySL Map char* to ulong_t Each of these array types have associated functions described in Table 3-1 on page 25.
Example of a Multi-dimensional Array Concepts 4 Example of a Multi-dimensional Array The JudySLGet and JudySLIns functions shown in this chapter illustrate a multi-dimensional JudyL array. This code is the proof of concept code for the Judy Version 4.0 functions: JudySLGet() and JudySLIns(). Although it is similar to the current JudySL, this code should not be used with released JudySL functions.
Example of a Multi-dimensional Array Example Code Example Code #include "Judy.h" #define TRUE 1L #define CHNULL '\0' #define PCNULL ((char *) NULL) #define PPVNULL ((PPvoid_t) NULL) #define WORDSIZE (sizeof (ulong_t)) // bytes in a word = JudyL index. // Copy a word from one location to another; typically one of which is not // word-aligned: // // Like strncpy(), null-pad the destination, but based on performance // measurements don't use strncpy().
Example of a Multi-dimensional Array Example Code // **************************************************************************** // J U D Y S L G E T PPvoid_t JudySLGet ( const void * PArray, const char * Index, JError_t * PJError) { char * pos = (char *)Index; ulong_t indexword; PPvoid_t PPvalue; // place in Index. // next word to find. // from JudyL array.
Example of a Multi-dimensional Array Example Code // CHECK FOR END OF STRING IN CURRENT indexword: if (LASTWORD (indexword)) return (PPvalue); // is value for whole Index string. // CONTINUE TO NEXT LEVEL DOWN JUDYL ARRAY TREE: // // If a previous JudySLIns() ran out of memory partway down the tree, it left a // null *PPvalue; this is automatically treated here as a dead-end. pos += WORDSIZE; PArray = *PPvalue; // each value -> next array.
Example of a Multi-dimensional Array Example Code assert (PPArray != PPVNULL); assert (Index != PCNULL); while (TRUE) // until return. { WORDCPY (& indexword, pos); if ((PPvalue = JudyLIns (PPArray, indexword, PJError)) == PPJERR) return (PPVNULL); // Index cannot be stored. if (LASTWORD (indexword)) return (PPvalue); pos // is value for whole Index string. += WORDSIZE; PPArray = PPvalue; // each value -> next array.
Example of a Multi-dimensional Array Example Code 54 Chapter 4
Tuning HP-UX Memory Access The chatr Command 5 Tuning HP-UX Memory Access Like other kinds of arrays, Judy arrays can potentially consume large amounts of memory. Because of this you should be aware of some HP-UX features that directly affect memory performance and the amount of memory you can access. The chatr Command The chatr command changes a program's internal attributes and can be used to modify the data page size for a specific executable.
Tuning HP-UX Memory Access Kernel Tunable Parameters For over 2GB of RAM, use chatr to enable the use of data space above 2GB: chatr +q3p enable (only needed on a 32-bit a.out) For over 3GB of RAM, use: chatr +q4p enable (only needed on a 32-bit a.out) See the chatr(1) man page for more information. Kernel Tunable Parameters Use the following parameters to tune the HP-UX kernel for more memory availability: maxdsize Limits the data-segment size of 32-bit processes.
Tuning HP-UX Memory Access Compiler Switch Compiler Switch The -N C compiler switch (32-bit only) is passed through to the linker and directs it to lower the start of the heap to immediately after the text (code) segment. This allows you almost a 1GB larger data segment, but makes your a.out unsharable. See ld(1) for more information. Swap Partition This is the disk partition used for a system swap area.
Tuning HP-UX Memory Access Swap Partition 58 Chapter 5
Where Did Judy Come From? A Where Did Judy Come From? The technological innovations that eventually became the Judy code were designed, implemented, and tested through more than a decade of research and development at Hewlett-Packard. The time line below shows the evolution of the Judy technology in HP internal products and code releases to the HP programming community.
Where Did Judy Come From? Why is it called Judy? We tried many other names and acronyms such as Sparse Array Memory Manager (SAMM) or Associative Memory Lookup Routine (AMLR), but none of them were adopted by the HP engineering community. The inventor, Doug, realized his relationship with the algorithm paralleled the relationship with his sister— Judy.
Caching and Memory Management B Caching and Memory Management Why use caching? Cache is a small, fast memory holding recently accessed data. The purpose of caching is to avoid unnecessary reloads of hot data that is likely to be used soon or frequently. CPU caching is usually used to access processor memory but other types of caching can also be used for a local copy of data accessible over a network (such as web browser caching or disk buffer caching).
Caching and Memory Management What happened here? The software requires access to data that is not in cache. Most of the time this means having to access an index that is not in cache memory and having to fill cache from main memory. The more frequently cache-line fills are required the more performance degrades over time. If this gets bad enough, the computer spends most of its time thrashing by repeatedly accessing main memory. Hence the saying, “Thrashing is virtual crashing.
Glossary Terms in this glossary attributed to (NIST) are derived from the National Institute of Standards and Technology (NIST) glossary. Other terms are derived from the Free Online Dictionary of Computing and whatis?com.1 A abstract data type (ADT) A mathematically specified collection of data-storing entities with operations to create, access, or change instances. (NIST) See also data structure. algorithm A computable set of steps to achieve a desired result.
computer's main memory. This is analogous to using a buffer cache to minimize disk I/O, but is implemented in hardware. Cache lines are size-aligned and typically 16 words in size in modern processors. cache-line fill Filling a cache line by accessing main memory as the result of a cache miss for one or more bytes not currently in any cache line. A cache-line fill takes ~30 times longer than a cache hit. cardinal number A basic or primary value. Examples of cardinal numbers are 1, 7, 9, and 123.
the tree as the expanse and/or population shrinks to best represent the population's indexes with the fewest cache-line fills and the least amount of memory. I index A key value to an ADT that appears array-like at the application interface. Since Judy appears array-like, keys into Judy ADTs are referred to as indexes.
need. Typically, the rescaling is to a larger size or volume. (2) The ability not only to function well in the rescaled situation, but to actually take full advantage of it. skip list A probabilistic alternative to balanced trees that was invented by William Pugh in 1987. Parallel lists at higher levels skip geometrically more items. Searching begins at the highest level to quickly get to the right part of the list, then uses progressively lower level lists. With enough levels, searching is O(log n).
Index Numerics 10-way digital tree, 15 A advantages (of Judy), 15 algorithms counting, 27 search and retrieval, 17 sorting, 26 arrays binary, 28 dynamic, 11, 16, 25 in Judy, 11, 25, 59 in memory, 62 Judy1, 28 JudyL, 33 JudySL, 44 multi-dimensional, 44, 49 sparse, 11 unbounded, 9, 11 B big O notation, 10 Boolean values, 28 C caching, 61 cache line fills, 17, 61 CPU, 61 curve, 61 defined, 61 in Judy, 62 counting, 10, 11, 17, 27 CPU caching, 61 D data structures, 10, 15 digital trees, 15, 59 in memory, 17 defi
Index table of functions, 44 L libraries (in Judy), 12 long-word values, 33 M memory management, 61 memory requirements, 9 multi-dimensional arrays, 44, 49 N nodes (Judy tree), 17 O overview of Judy, 9 P performance, 10, 11, 12 pointers (Judy tree), 17 S scalability, 10, 11 search and retrieval, 11, 15, 17 sorting, 10, 11, 26 example of, 45 speed, 10 string values, 44 strings, sorting, 26 summary of functions (Judy), 25 T trie See digital trees type (Judy tree), 17 U unbounded arrays, 9, 11 V values Boolean