User guide

Data Integration with Sybase Avaki Studio 155
Join
Inner and outer tables
The algorithm and join type choices refer to inner and outer tables. Studio doesn’t
require you to specify which is which when you connect your inputs to the Join
operator. Instead, you can choose which of the two connected inputs will be the
inner table by using the Inner Table popup.
Note: The terms “inner table,” “inner result set,” and “inner input” are used
interchangeably in the discussion of join operations. Similarly, “outer table,”
“outer result set,” and “outer input” are equivalent.
Join algorithms
You can choose the algorithm that Studio uses to process the join operation. Generally,
we recommend the “Automatic” option, which causes Studio to choose the algorithm
based on the actual data it receives at runtime. However, if you know specifics about
the size and shape of the two inputs, you can choose an algorithm yourself. The
algorithms are as follows:
Algorithm    Description

Automatic    Studio automatically determines the most appropriate algorithm
             based on the size of the incoming result sets. It starts by
             trying to perform a hash join. If the data in the inner table is
             too large to fit in memory, Studio tries to read the outer table
             into memory. If the outer table fits, Studio performs a hash join
             with the tables reversed. If neither table fits, Studio switches
             to the sort-merge algorithm.

Sort Merge   This is a scalable algorithm that works well with data sets of
             any size. Studio breaks the data into chunks in order to sort it
             before performing the join.

Nested Loop  The nested loop algorithm processes each row in the inner table
             once for each row in the outer table. It is scalable, and uses
             less disk space than the sort-merge algorithm. If one of the
             tables being joined fits in memory, however, the hash algorithm
             will generally be more efficient.

Hash         Studio reads the inner table into memory and hashes it, allowing
             very quick lookup as it reads each row of the outer table. This
             is very efficient if the inner table is small enough to easily
             fit into memory.
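
The following Python sketch illustrates the general idea behind the algorithms described above. It is not Studio's implementation. It assumes an inner equi-join on a single column, rows held as dictionaries, and a hypothetical memory_limit expressed as a row count. All function names, the sample data, and the string labels are invented for illustration. The sort-merge version sorts in memory and does not model chunking, and the automatic choice compares known sizes rather than attempting to read each input at runtime.

```python
from collections import defaultdict
from operator import itemgetter


def hash_join(inner, outer, key):
    """Hash the inner rows in memory, then probe once per outer row."""
    buckets = defaultdict(list)
    for row in inner:
        buckets[row[key]].append(row)
    return [{**o, **i} for o in outer for i in buckets.get(o[key], [])]


def nested_loop_join(inner, outer, key):
    """Scan the whole inner input once for each outer row."""
    return [{**o, **i} for o in outer for i in inner if o[key] == i[key]]


def sort_merge_join(inner, outer, key):
    """Sort both inputs on the key, then merge them in a forward pass.

    Simplification: both sorts happen in memory; chunking is not modeled.
    """
    inner = sorted(inner, key=itemgetter(key))
    outer = sorted(outer, key=itemgetter(key))
    result, start = [], 0
    for o in outer:
        # Skip inner rows whose key is smaller than the current outer key.
        while start < len(inner) and inner[start][key] < o[key]:
            start += 1
        # Emit every inner row that shares the outer row's key.
        pos = start
        while pos < len(inner) and inner[pos][key] == o[key]:
            result.append({**o, **inner[pos]})
            pos += 1
    return result


def choose_algorithm(inner_size, outer_size, memory_limit):
    """Mirror the documented fallback order using a hypothetical row limit."""
    if inner_size <= memory_limit:
        return "hash"
    if outer_size <= memory_limit:
        return "hash-reversed"
    return "sort-merge"


def automatic_join(inner, outer, key, memory_limit):
    """Run whichever algorithm choose_algorithm selects."""
    choice = choose_algorithm(len(inner), len(outer), memory_limit)
    if choice == "hash":
        return hash_join(inner, outer, key)
    if choice == "hash-reversed":
        # Roles swapped: the outer input is the one hashed in memory.
        return hash_join(outer, inner, key)
    return sort_merge_join(inner, outer, key)


# Hypothetical sample inputs: customers is the inner table, orders the outer.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Grace"}]
orders = [
    {"cust_id": 2, "item": "disk"},
    {"cust_id": 1, "item": "tape"},
    {"cust_id": 2, "item": "cable"},
    {"cust_id": 3, "item": "fan"},
]

joined = automatic_join(customers, orders, "cust_id", memory_limit=2)
```

With these inputs the inner table fits within the limit, so the sketch takes the hash path. All three algorithms return the same set of joined rows and differ only in row order and in how much memory and how many passes they need, which is why the choice can safely be left to the Automatic option.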