Parallel Computing Toolbox™ User's Guide R2015a
Parallel Computing Toolbox™ User's Guide
© COPYRIGHT 2004–2015 by The MathWorks, Inc.

How to Contact MathWorks
Latest news: www.mathworks.com
Sales and services: www.mathworks.com/sales_and_services
User community: www.mathworks.com/matlabcentral
Technical support: www.mathworks.com/support/contact_us
Phone: 508-647-7000
The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098

The software described in this document is furnished under a license agreement.
Revision History: Online-only releases, November 2004 through March 2015.
1 Getting Started

• “Parallel Computing Toolbox Product Description” on page 1-2
• “Parallel Computing with MathWorks Products” on page 1-3
• “Key Problems Addressed by Parallel Computing” on page 1-4
• “Introduction to Parallel Solutions” on page 1-6
• “Determine Product Installation and Versions” on page 1-14
Parallel Computing Toolbox Product Description

Perform parallel computations on multicore computers, GPUs, and computer clusters

Parallel Computing Toolbox lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. High-level constructs—parallel for-loops, special array types, and parallelized numerical algorithms—let you parallelize MATLAB® applications without CUDA or MPI programming.
Parallel Computing with MathWorks Products

In addition to the local cluster of workers that Parallel Computing Toolbox provides on your client machine, MATLAB Distributed Computing Server software allows you to run as many MATLAB workers on a remote cluster of computers as your licensing allows. Most MathWorks products let you write code in such a way that applications run in parallel.
Key Problems Addressed by Parallel Computing

In this section...
“Run Parallel for-Loops (parfor)” on page 1-4
“Execute Batch Jobs in Parallel” on page 1-5
“Partition Large Data Sets” on page 1-5

Run Parallel for-Loops (parfor)

Many applications involve multiple segments of code, some of which are repetitive. Often you can use for-loops to solve these cases.
If the workers run on the same machine as the client, you might see significant performance improvement on a multicore/multiprocessor machine. So whether your loop takes a long time to run because it has many iterations or because each iteration takes a long time, you can improve your loop speed by distributing iterations to MATLAB workers.
Introduction to Parallel Solutions

In this section...
“Interactively Run a Loop in Parallel” on page 1-6
“Run a Batch Job” on page 1-8
“Run a Batch Parallel Loop” on page 1-9
“Run Script as Batch Job from the Current Folder Browser” on page 1-11
“Distribute Arrays and Run SPMD” on page 1-12

Interactively Run a Loop in Parallel

This section shows how to modify a simple for-loop so that it runs in parallel.
[Figure: a parfor-loop distributes iterations from the MATLAB® client to the MATLAB® workers.]

Because the iterations run in parallel in other MATLAB sessions, each iteration must be completely independent of all other iterations. The worker calculating the value for A(100) might not be the same worker calculating A(500). There is no guarantee of sequence, so A(900) might be calculated before A(400). (The MATLAB Editor can help identify some problems with parfor code that might not contain independent iterations.)
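A minimal sketch of such a conversion, using the same sine-wave loop that appears later in this section:

```matlab
% Serial version: each iteration is independent of the others.
A = zeros(1,1024);
for i = 1:1024
    A(i) = sin(i*2*pi/1024);
end

% Parallel version: replacing for with parfor distributes the
% iterations across the workers in the parallel pool.
parfor i = 1:1024
    A(i) = sin(i*2*pi/1024);
end
plot(A)
```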
Run a Batch Job

To offload work from your MATLAB session to run in the background in another session, you can use the batch command. This example uses the for-loop from the previous example, inside a script.

1  To create the script, type:

   edit mywave

2  In the MATLAB Editor, enter the text of the for-loop:

   for i = 1:1024
       A(i) = sin(i*2*pi/1024);
   end

3  Save the file and close the Editor.
batch runs your code on a local worker or a cluster worker, but does not require a parallel pool.

You can use batch to run either scripts or functions. For more details, see the batch reference page.

Run a Batch Parallel Loop

You can combine the abilities to offload a job and run a parallel loop. In the previous two examples, you modified a for-loop to make a parfor-loop, and you submitted a script with a for-loop as a batch job.
[Figure: batch offloads the parfor computation from the MATLAB® client to the MATLAB® workers.]

5  To view the results:

   wait(job)
   load(job,'A')
   plot(A)

The results look the same as before; however, there are two important differences in execution:

• The work of defining the parfor-loop and accumulating its results are offloaded to another MATLAB session by batch.
Run Script as Batch Job from the Current Folder Browser

From the Current Folder browser, you can run a MATLAB script as a batch job by browsing to the file’s folder, right-clicking the file, and selecting Run Script as Batch Job. The batch job runs on the cluster identified by the default cluster profile. The following figure shows the menu option to run the script file script1.m:

Running a script as a batch from the browser uses only one worker from the cluster.
Distribute Arrays and Run SPMD

Distributed Arrays

The workers in a parallel pool communicate with each other, so you can distribute an array among the workers. Each worker contains part of the array, and all the workers are aware of which portion of the array each worker has.
The line above retrieves the data from worker 3 to assign the value of X. The following code sends data to worker 3:

X = X + 2;
R{3} = X;   % Send the value of X from the client to worker 3.

If the parallel pool remains open between spmd statements and the same workers are used, the data on each worker persists from one spmd statement to another.

spmd
    R = R + labindex   % Use values of R from previous spmd.
end
Determine Product Installation and Versions

To determine if Parallel Computing Toolbox software is installed on your system, type this command at the MATLAB prompt.

ver

When you enter this command, MATLAB displays information about the version of MATLAB you are running, including a list of all toolboxes installed on your system and their version numbers.
2 Parallel for-Loops (parfor)

• “Introduction to parfor” on page 2-2
• “Create a parfor-Loop” on page 2-4
• “Comparing for-Loops and parfor-Loops” on page 2-6
• “Reductions: Cumulative Values Updated by Each Iteration” on page 2-8
• “parfor Programming Considerations” on page 2-10
• “parfor Limitations” on page 2-11
• “Inputs and Outputs in parfor-Loops” on page 2-12
• “Objects and Handles in parfor-Loops” on page 2-13
• “Nesting and Flow in parfor-Loops” on page 2-15
• “Variables and Transparency in parfor-Loops” on page 2-19
Introduction to parfor

In this section...
“parfor-Loops in MATLAB” on page 2-2
“Deciding When to Use parfor” on page 2-2

parfor-Loops in MATLAB

The basic concept of a parfor-loop in MATLAB software is the same as the standard MATLAB for-loop: MATLAB executes a series of statements (the loop body) over a range of values.
A parfor-loop might run more slowly than a serial one when you have only a small number of simple calculations. The examples of this section are only to illustrate the behavior of parfor-loops, not necessarily to show the applications best suited to them.
Create a parfor-Loop

The safest approach when creating a parfor-loop is to assume that iterations are performed on different MATLAB workers in the parallel pool, so there is no sharing of information between iterations. If you have a for-loop in which all iterations are completely independent of each other, this loop is a good candidate for a parfor-loop.
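As a minimal sketch (assuming a parallel pool is open), a loop of this form qualifies, because each iteration assigns only its own element of the output array:

```matlab
A = zeros(1,100);
parfor i = 1:100
    A(i) = i^2;   % No iteration reads or writes another iteration's element
end
```

After the loop, all elements of A are available in the client workspace, as the next topic describes.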
More About

• “Comparing for-Loops and parfor-Loops” on page 2-6
• “Reductions: Cumulative Values Updated by Each Iteration” on page 2-8
• “parfor Programming Considerations” on page 2-10
• “parfor Limitations” on page 2-11
Comparing for-Loops and parfor-Loops

Because parfor-loops are not quite the same as for-loops, there are specific behaviors of each to be aware of. As seen from the example in the topic “Create a parfor-Loop” on page 2-4, when you assign to an array variable (such as A in that example) inside the loop by indexing with the loop variable, the elements of that array are available in the client workspace after the loop, much the same as with a for-loop.
A parfor-loop requires that each iteration be independent of the other iterations, and that all code that follows the parfor-loop not depend on the loop iteration sequence.
Reductions: Cumulative Values Updated by Each Iteration

These two examples show parfor-loops using reduction assignments. A reduction is an accumulation across iterations of a loop. The example on the left uses x to accumulate a sum across 10 iterations of the loop. The example on the right generates a concatenated array, 1:10.
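The two reductions described above can be sketched as:

```matlab
x = 0;
parfor i = 1:10
    x = x + i;      % Sum reduction: accumulates a total across iterations
end

x2 = [];
parfor i = 1:10
    x2 = [x2, i];   % Concatenation reduction: builds the array 1:10
end
```

Both loops are valid even though each iteration depends on a shared variable, because the final result is independent of the iteration order.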
More About

• “Introduction to parfor” on page 2-2
• “Comparing for-Loops and parfor-Loops” on page 2-6
• “Reduction Variables” on page 2-32
parfor Programming Considerations

In this section...
“MATLAB Path” on page 2-10
“Error Handling” on page 2-10

MATLAB Path

All workers executing a parfor-loop must have the same MATLAB search path as the client, so that they can execute any functions called in the body of the loop. Therefore, whenever you use cd, addpath, or rmpath on the client, it also executes on all the workers, if possible. For more information, see the parpool reference page.
parfor Limitations

Most of these restrictions result from the need for loop iterations to be completely independent of each other, or the fact that the iterations run on MATLAB worker sessions instead of the client session.
Inputs and Outputs in parfor-Loops

In this section...
“Functions with Interactive Inputs” on page 2-12
“Displaying Output” on page 2-12

Functions with Interactive Inputs

If you use a function that is not strictly computational in nature (e.g., input, keyboard) in a parfor-loop or in any function called by a parfor-loop, the behavior of that function occurs on the worker. The behavior might include hanging the worker process or having no visible effect at all.
Objects and Handles in parfor-Loops

In this section...
“Using Objects in parfor-Loops” on page 2-13
“Handle Classes” on page 2-13
“Sliced Variables Referencing Function Handles” on page 2-13

Using Objects in parfor-Loops

If you are passing objects into or out of a parfor-loop, the objects must properly facilitate being saved and loaded. For more information, see “Save and Load Process”.
B = @sin;
for ii = 1:100
    A(ii) = B(ii);
end

A corresponding parfor-loop does not allow B to reference a function handle.
Nesting and Flow in parfor-Loops Nesting and Flow in parfor-Loops In this section... “Nested Functions” on page 2-15 “Nested Loops” on page 2-15 “Nested spmd Statements” on page 2-17 “Break and Return Statements” on page 2-17 “P-Code Scripts” on page 2-17 Nested Functions The body of a parfor-loop cannot make reference to a nested function. However, it can call a nested function by means of a function handle. Nested Loops The body of a parfor-loop cannot contain another parfor-loop.
Limitations of Nested for-Loops

For proper variable classification, the range of a for-loop nested in a parfor must be defined by constant numbers or variables. In the following example, the code on the left does not work because the for-loop upper limit is defined by a function call.
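The invalid and valid forms can be sketched like this (a minimal reconstruction of the pattern described, not the original listing):

```matlab
% Invalid sketch: the inner range is defined by a function call.
% parfor i = 1:100
%     for j = 1:size(A,2)   % Not allowed: range comes from a function call
%         A(i,j) = i + j;
%     end
% end

% Valid: capture the limit in a variable before the parfor-loop.
A = zeros(100,200);
n = size(A,2);
parfor i = 1:100
    for j = 1:n            % Allowed: range is a broadcast variable
        A(i,j) = i + j;
    end
end
```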
Inside a parfor, if you use multiple for-loops (not nested inside each other) to index into a single sliced array, they must loop over the same range of values. Furthermore, a sliced output variable can be used in only one nested for-loop.
More About

• “parfor Limitations” on page 2-11
• “Convert Nested for-Loops to parfor” on page 2-48
Variables and Transparency in parfor-Loops Variables and Transparency in parfor-Loops In this section... “Unambiguous Variable Names” on page 2-19 “Transparency” on page 2-19 “Structure Arrays in parfor-Loops” on page 2-20 “Scalar Expansion with Sliced Outputs” on page 2-21 “Global and Persistent Variables” on page 2-22 Unambiguous Variable Names If you use a name that MATLAB cannot unambiguously distinguish as a variable inside a parfor-loop, at parse time MATLAB assumes you are referencing a function.
Similarly, you cannot clear variables from a worker's workspace by executing clear inside a parfor statement:

parfor ii = 1:4
    clear('X')   % cannot clear: transparency violation
end

As a workaround, you can free up most of the memory used by a variable by setting its value to empty, presumably when it is no longer needed in your parfor statement:

parfor ii = 1:4
    X = [];
Variables and Transparency in parfor-Loops temp = struct(); temp.myfield1 = rand(); temp.myfield2 = i; end parfor i = 1:4 temp = struct('myfield1',rand(),'myfield2',i); end Slicing Structure Fields You cannot use structure fields as sliced input or output arrays in a parfor-loop; that is, you cannot use the loop variable to index the elements of a structure field. For example, in the following code both lines in the loop generate a classification error because of the indexing: parfor i = 1:4 outputData.
x = zeros(10,12);
parfor idx = 1:12
    x(:,idx) = idx;
end

The following code offers a suggested workaround for this limitation.

x = zeros(10,12);
parfor idx = 1:12
    x(:,idx) = repmat(idx,10,1);
end

Global and Persistent Variables

The body of a parfor-loop cannot contain global or persistent variable declarations.
Classification of Variables in parfor-Loops

When a name in a parfor-loop is recognized as referring to a variable, the variable is classified into one of several categories. A parfor-loop generates an error if it contains any variables that cannot be uniquely categorized or if any variables violate their category restrictions.
[Figure: an example parfor-loop with its variables labeled by category: loop variable, sliced input variable, sliced output variable, broadcast variable, reduction variable, and temporary variable.]

Notes about Required and Recommended Guidelines

The detailed topics linked from the table above include guidelines and restrictions in shaded boxes like the one shown below. Those labeled as Required result in an error if your parfor code does not adhere to them.
Loop Variable

The loop variable defines the loop index value for each iteration. It is set with the beginning line of a parfor statement:

parfor p = 1:12

For values across all iterations, the loop variable must evaluate to ascending consecutive integers. Each iteration is independent of all others, and each has its own loop index value. The following restriction is required, because changing p in the parfor body cannot guarantee the independence of iterations.
More About

• “Classification of Variables in parfor-Loops” on page 2-23
Sliced Variables

A sliced variable is one whose value can be broken up into segments, or slices, which are then operated on separately by different workers. Each iteration of the loop works on a different slice of the array. Using sliced variables is important because this type of variable can reduce communication between the client and workers. Only those slices needed by a worker are sent to it, and only when it starts working on a particular range of indices.
After the first level, you can use any type of valid MATLAB indexing in the second and further levels. Of the two variables shown here, the first is not sliced; the second is sliced:

A.q{i,12}     % not sliced
A{i,12}.q     % sliced

Fixed Index Listing. Within the first-level parentheses or braces of a sliced variable’s indexing, the list of indices is the same for all occurrences of a given variable.
a simple (nonindexed) broadcast variable; and every other index is a scalar constant, a simple broadcast variable, a nested for-loop index, colon, or end.

With i as the loop variable, the A variables shown here are marked as not sliced or sliced:

Not sliced:
A(i+f(k),j,:,3)   % f(k) invalid for slicing
A(i,20:30,end)    % 20:30 not scalar
A(i,:,s.field1)   % s.

Sliced:
A(i+k,j,:,3)
A(i,:,end)
However, if it is clear that in every iteration, every reference to an array element is set before it is used, the variable is not a sliced input variable. In this example, all the elements of A are set, and then only those fixed values are used:

parfor ii = 1:n
    if someCondition
        A(ii) = 32;
    else
        A(ii) = 17;
    end
    % loop code that uses A(ii)
end

Even if a sliced variable is not explicitly referenced as an input, implicit usage might make it so.
Broadcast Variables

A broadcast variable is any variable other than the loop variable or a sliced variable that is not affected by an assignment inside the loop. At the start of a parfor-loop, the values of any broadcast variables are sent to all workers. Although this type of variable can be useful or even essential, broadcast variables that are large can cause a lot of communication between client and workers.
Reduction Variables

MATLAB supports an important exception, called reductions, to the rule that loop iterations must be independent. A reduction variable accumulates a value that depends on all the iterations together, but is independent of the iteration order. MATLAB allows reduction variables in parfor-loops. Reduction variables appear on both sides of an assignment statement, such as any of the following, where expr is a MATLAB expression.
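A few representative forms (a partial sketch; the full table in the original lists additional operators and functions):

```matlab
X = X + expr;       % or X = expr + X
X = X - expr;       % minus: X must appear as the first argument
X = X .* expr;      % elementwise multiply
X = max(X, expr);   % similarly min
X = [X, expr];      % concatenation, consistently ordered
```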
parfor i = 1:n
    X = X + d(i);
end

This loop is equivalent to the following, where each d(i) is calculated by a different iteration:

X = X + d(1) + ... + d(n)

If the loop were a regular for-loop, the variable X in each iteration would get its value either before entering the loop or from the previous iteration of the loop. However, this concept does not apply to parfor-loops: In a parfor-loop, the value of X is never transmitted from client to workers or from worker to worker.
Required (static): If the reduction assignment uses * or [,], then in every reduction assignment for X, X must be consistently specified as the first argument or consistently specified as the second.

The parfor-loop on the left below is not valid because the order of items in the concatenation is not consistent throughout the loop.
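A sketch of the inconsistent (invalid) and consistent (valid) forms:

```matlab
% Invalid sketch: X appears first in one assignment, second in another.
% parfor i = 1:10
%     if mod(i,2)
%         X = [X, i];    % X first
%     else
%         X = [i, X];    % X second: inconsistent ordering
%     end
% end

% Valid: X is always the first argument of the concatenation.
X = [];
parfor i = 1:10
    X = [X, i];
end
```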
beginning of each iteration. The parfor on the right is correct, because it does not assign f inside the loop:

Invalid:
f = @(x,k)x * k;
parfor i = 1:n
    a = f(a,i);
    % loop body continued
    f = @times;   % Affects f
end

Valid:
f = @(x,k)x * k;
parfor i = 1:n
    a = f(a,i);
    % loop body continued
end

Note that the operators && and || are not listed in the table in “Reduction Variables” on page 2-32.
parfor statement might produce values of X with different round-off errors. This is an unavoidable cost of parallelism. For example, the first statement here yields 1, while the second returns 1 + eps:

(1 + eps/2) + eps/2
1 + (eps/2 + eps/2)

With the exception of the minus operator (-), all the special cases listed in the table in “Reduction Variables” on page 2-32 have a corresponding (perhaps approximately) associative function.
f(e,a) = a = f(a,e)

Examples of identity elements for some functions are listed in this table.

Function        Identity Element
+               0
* and .*        1
[,] and [;]     []
&               true
|               false

MATLAB uses the identity elements of reduction functions when it knows them. So, in addition to associativity and commutativity, you should also keep identity elements in mind when overloading these functions.

Recommended: An overload of +, *, .
First consider the reduction function itself. To compare an iteration's result against another's, the function requires as input the current iteration's result and the known maximum result from other iterations so far. Each of the two inputs is a vector containing an iteration's result data and iteration number.
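A sketch of such a reduction function, assuming the convention that each input vector holds [value, iterationNumber] (the function name comparemax is illustrative):

```matlab
function mc = comparemax(A, B)
% Custom reduction: keep the input whose first element (the result
% value) is larger; ties keep the first input. Both inputs have the
% form [value, iterationNumber].
if A(1) >= B(1)
    mc = A;
else
    mc = B;
end
end
```

Because this comparison is associative and commutative, it is safe to use as a parfor reduction regardless of the order in which iterations complete.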
Temporary Variables

A temporary variable is any variable that is the target of a direct, nonindexed assignment, but is not a reduction variable. In the following parfor-loop, a and d are temporary variables:

a = 0;
z = 0;
r = rand(1,10);
parfor i = 1:10
    a = i;          % Variable a is temporary
    z = z + i;
    if i <= 5
        d = 2*a;    % Variable d is temporary
    end
end

In contrast to the behavior of a for-loop, MATLAB effectively clears any temporary variables before each iteration of a parfor-loop.
        b = false;
    end
    ...
end

This loop is acceptable as an ordinary for-loop, but as a parfor-loop, b is a temporary variable because it occurs directly as the target of an assignment inside the loop. Therefore it is cleared at the start of each iteration, so it is guaranteed to be uninitialized when used in the condition of the if.
• “Reduction Variables” on page 2-32
Improving parfor Performance

Where to Create Arrays

With a parfor-loop, it might be faster to have each MATLAB worker create its own arrays or portions of them in parallel, rather than to create a large array in the client before the loop and send it out to all the workers separately.
before the loop (as shown on the left below), rather than have each worker create its own arrays inside the loop (as shown on the right). Try the following examples running a parallel pool locally, and notice the difference in time execution for each loop. First open a local parallel pool:

parpool('local')

Then enter the following examples.
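The two timed variants can be sketched as follows; the specific matrices and sizes here are illustrative, not the original listing:

```matlab
% Version 1: create the arrays in the client before the loop.
% The full matrices M and R are sent to every worker.
tic
n = 200;
M = magic(n);
R = rand(n);
parfor i = 1:n
    A(i) = sum(M(i,:).*R(n+1-i,:));
end
toc

% Version 2: create the arrays on the workers inside the loop.
% No large transfer from the client, at the cost of recomputing
% the matrices in every iteration.
tic
n = 200;
parfor i = 1:n
    M = magic(n);
    R = rand(n);
    A(i) = sum(M(i,:).*R(n+1-i,:));
end
toc
```

Comparing the two toc times shows which cost dominates for your data sizes: transferring arrays to the workers, or recreating them per iteration.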
Parallel Pools

In this section...
Automatically Start and Stop a Parallel Pool

By default, a parallel pool starts automatically when needed by certain parallel language features. The following statements and functions can cause a pool to start:

• parfor
• spmd
• distributed
• Composite
• parfeval
• parfevalOnAll
• gcp
• mapreduce
• mapreducer

Your parallel preferences specify which cluster the pool runs on, and the preferred number of workers in the pool.
2 Parallel for-Loops (parfor) To open a parallel pool based on your preference settings: parpool To open a pool of a specific size: parpool(4) To use a cluster other than your default, to specify where the pool runs: parpool('MyCluster',4) Shut Down a Parallel Pool You can get the current parallel pool, and use that object when you want to shut down the pool: p = gcp; delete(p) Pool Size and Cluster Selection There are several places to specify your pool size.
Parallel Pools If you specify a pool size at the command line, this overrides the setting of your preferences. But this value must fall within the limits of the applicable cluster profile. 4 Parallel preferences If you do not specify a pool size at the command line, MATLAB attempts to start a parallel pool with a size determined by your parallel preferences, provided that this value falls within the limits of the applicable cluster profile.
Convert Nested for-Loops to parfor

A typical use case for nested loops is to step through an array using one loop variable to index through one dimension, and a nested loop variable to index another dimension. The basic form looks like this:

X = zeros(n,m);
for a = 1:n
    for b = 1:m
        X(a,b) = fun(a,b)
    end
end

The following code shows an extremely simple example, with results you can easily view.
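A sketch of such an example, with parfor on the outer loop; the numeric encoding lets you see which pair of loop indices produced each element:

```matlab
M1 = magic(5);
parfor a = 1:5
    for b = 1:5
        M2(a,b) = a*10 + b + M1(a,b)/10000;
    end
end
M2
```

Here only one parfor-loop is created, and each worker handles full rows of M2, so the parfor overhead is paid once.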
M1 = magic(5);
for a = 1:5
    parfor b = 1:5
        M2(a,b) = a*10 + b + M1(a,b)/10000;
    end
end
M2

In this case, each iteration of the outer loop in MATLAB initiates a parfor-loop. That is, this code creates five parfor-loops. There is generally more overhead to a parfor-loop than a for-loop, so you might find that this approach does not perform optimally.
3 Single Program Multiple Data (spmd)

• “Execute Simultaneously on Multiple Data Sets” on page 3-2
• “Access Worker Variables with Composites” on page 3-6
• “Distribute Arrays” on page 3-10
• “Programming Tips” on page 3-13
Execute Simultaneously on Multiple Data Sets

In this section...
“Introduction” on page 3-2
“When to Use spmd” on page 3-2
“Define an spmd Statement” on page 3-3
“Display Output” on page 3-5

Introduction

The single program multiple data (spmd) language construct allows seamless interleaving of serial and parallel programming. The spmd statement lets you define a block of code to run simultaneously on multiple workers.
Define an spmd Statement

The general form of an spmd statement is:

spmd
    <statements>
end

Note If a parallel pool is not running, spmd creates a pool using your default cluster profile, if your parallel preferences are set accordingly.

The block of code represented by <statements> executes in parallel simultaneously on all workers in the parallel pool.
    R = rand(4,4);
end

Note All subsequent examples in this chapter assume that a parallel pool is open and remains open between sequences of spmd statements.

Unlike a parfor-loop, the workers used for an spmd statement each have a unique value for labindex. This lets you specify code to be run on only certain workers, or to customize execution, usually for the purpose of accessing unique data.
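For example, labindex can select per-worker behavior; a minimal sketch:

```matlab
spmd
    if labindex == 1
        R = rand(4) + 1;   % Worker 1 gets values in [1,2]
    else
        R = rand(4);       % Remaining workers get values in [0,1]
    end
end
```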
Display Output

When running an spmd statement on a parallel pool, all command-line output from the workers displays in the client Command Window. Because the workers are MATLAB sessions without displays, any graphical output (for example, figure windows) from the pool does not display at all.
Access Worker Variables with Composites

In this section...
“Introduction to Composites” on page 3-6
“Create Composites in spmd Statements” on page 3-6
“Variable Persistence and Sequences of spmd” on page 3-8
“Create Composites Outside spmd Statements” on page 3-9

Introduction to Composites

Composite objects in the MATLAB client session let you directly access data values on the workers. Most often, you assign these variables within spmd statements.
Access Worker Variables with Composites 3 4 5 9 7 2 2 11 7 14 3 10 6 15 MM{2} 16 5 9 4 13 8 12 1 A variable might not be defined on every worker. For the workers on which a variable is not defined, the corresponding Composite element has no value. Trying to read that element throws an error. spmd if labindex > 1 HH = rand(4); end end HH Lab 1: No data Lab 2: class = double, size = [4 Lab 3: class = double, size = [4 4] 4] You can also set values of Composite elements from the client.
Data transfers from worker to client when you explicitly assign a variable in the client workspace using a Composite element:

M = MM{1}    % Transfer data from worker 1 to variable M on the client

M =
     8     1     6
     3     5     7
     4     9     2

Assigning an entire Composite to another Composite does not cause a data transfer.
AA(:)    % Composite
    [1]
    [2]
    [3]
    [4]

spmd
    AA = AA * 2;    % Multiply existing value
end
AA(:)    % Composite
    [2]
    [4]
    [6]
    [8]

clear AA    % Clearing in client also clears on workers

spmd; AA = AA * 2; end    % Generates error

delete(gcp)

Create Composites Outside spmd Statements

The Composite function creates Composite objects without using an spmd statement.
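As a hedged sketch of this approach (assuming an open parallel pool; values are illustrative), you can create a Composite in the client and populate its elements before any spmd block runs:

```matlab
% Sketch: create a Composite without an spmd statement and set its
% elements from the client.
c = Composite();         % one element per worker, all initially undefined
for w = 1:numel(c)
    c{w} = w * 10;       % assign each worker's element from the client
end
spmd
    c = c + labindex;    % each worker updates its own copy
end
c{1}                     % reads back worker 1's value (11) in the client
```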
3 Single Program Multiple Data (spmd) Distribute Arrays In this section... “Distributed Versus Codistributed Arrays” on page 3-10 “Create Distributed Arrays” on page 3-10 “Create Codistributed Arrays” on page 3-11 Distributed Versus Codistributed Arrays You can create a distributed array in the MATLAB client, and its data is stored on the workers of the open parallel pool.
Distribute Arrays requirements. These overloaded functions include eye(___,'distributed'), rand(___,'distributed'), etc. For a full list, see the distributed object reference page. • Create a codistributed array inside an spmd statement, then access it as a distributed array outside the spmd statement. This lets you use distribution schemes other than the default.
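A minimal sketch of the first approach, assuming an open parallel pool (the variable names are illustrative):

```matlab
% Sketch: create a distributed array directly from the client, then
% operate on it with an overloaded function.
d = rand(1000, 'distributed');   % data is stored on the pool workers
n = norm(d);                     % overloaded norm runs on the workers
n = gather(n);                   % ensure the scalar result is on the client
```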
parpool('local',2)    % Create pool
spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist);
    Z = Z + labindex;
end
Z             % View results in client. Z is a distributed array here.
delete(gcp)   % Stop pool

For more details on codistributed arrays, see “Working with Codistributed Arrays” on page 5-5.
Programming Tips Programming Tips In this section... “MATLAB Path” on page 3-13 “Error Handling” on page 3-13 “Limitations” on page 3-13 MATLAB Path All workers executing an spmd statement must have the same MATLAB search path as the client, so that they can execute any functions called in their common block of code. Therefore, whenever you use cd, addpath, or rmpath on the client, it also executes on all the workers, if possible. For more information, see the parpool reference page.
X = 5;
spmd
    eval('X');
end

Similarly, you cannot clear variables from a worker's workspace by executing clear inside an spmd statement:

spmd; clear('X'); end

To clear a specific variable from a worker, clear its Composite from the client workspace. Alternatively, you can free up most of the memory used by a variable by setting its value to empty, presumably when it is no longer needed in your spmd statement:

spmd
    <statements...>
    X = [];
end
Programming Tips run in parallel in another parallel pool, but runs serially in a single thread on the worker running its containing function. Nested parfor-Loops The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop. Break and Return Statements The body of an spmd statement cannot contain break or return statements. Global and Persistent Variables The body of an spmd statement cannot contain global or persistent variable declarations.
4 Interactive Parallel Computation with pmode This chapter describes interactive pmode in the following sections: • “pmode Versus spmd” on page 4-2 • “Run Communicating Jobs Interactively Using pmode” on page 4-3 • “Parallel Command Window” on page 4-10 • “Running pmode Interactive Jobs on a Cluster” on page 4-15 • “Plotting Distributed Data Using pmode” on page 4-16 • “pmode Limitations and Unexpected Results” on page 4-18 • “pmode Troubleshooting” on page 4-19
4 Interactive Parallel Computation with pmode pmode Versus spmd pmode lets you work interactively with a communicating job running simultaneously on several workers. Commands you type at the pmode prompt in the Parallel Command Window are executed on all workers at the same time. Each worker executes the commands in its own workspace on its own variables.
Run Communicating Jobs Interactively Using pmode

This example uses a local scheduler and runs the workers on your local MATLAB client machine. It does not require an external cluster or scheduler. The steps include the pmode prompt (P>>) for commands that you type in the Parallel Command Window.

1 Start pmode with the pmode command.
4 Interactive Parallel Computation with pmode 4 A variable does not necessarily have the same value on every worker. The labindex function returns the ID particular to each worker working on this communicating job. In this example, the variable x exists with a different value in the workspace of each worker. P>> x = labindex 5 Return the total number of workers working on the current communicating job with the numlabs function. P>> all = numlabs 6 Create a replicated array on all the workers.
Run Communicating Jobs Interactively Using pmode 7 Assign a unique value to the array on each worker, dependent on the worker number (labindex). With a different value on each worker, this is a variant array.
4 Interactive Parallel Computation with pmode 8 Until this point in the example, the variant arrays are independent, other than having the same name. Use the codistributed.build function to aggregate the array segments into a coherent array, distributed among the workers. P>> codist = codistributor1d(2, [2 2 2 2], [3 8]) P>> whole = codistributed.build(segment, codist) This combines four separate 3-by-2 arrays into one 3-by-8 codistributed array.
Run Communicating Jobs Interactively Using pmode P>> combined = gather(whole) Notice, however, that this gathers the entire array into the workspaces of all the workers. See the gather reference page for the syntax to gather the array into the workspace of only one worker. 12 Because the workers ordinarily do not have displays, if you want to perform any graphical tasks involving your data, such as plotting, you must do this from the client workspace.
4 Interactive Parallel Computation with pmode 14 If you require distribution along a different dimension, you can use the redistribute function. In this example, the argument 1 to codistributor1d specifies distribution of the array along the first dimension (rows).
Run Communicating Jobs Interactively Using pmode 15 Exit pmode and return to the regular MATLAB desktop.
4 Interactive Parallel Computation with pmode Parallel Command Window When you start pmode on your local client machine with the command pmode start local 4 four workers start on your local machine and a communicating job is created to run on them. The first time you run pmode with these options, you get a tiled display of the four workers.
Parallel Command Window You have several options for how to arrange the tiles showing your worker outputs. Usually, you will choose an arrangement that depends on the format of your data. For example, the data displayed until this point in this section, as in the previous figure, is distributed by columns. It might be convenient to arrange the tiles side by side.
P>> distobj = codistributor('1d',1);
P>> I = redistribute(I, distobj)

When you rearrange the tiles, you see the following. [Figure: select vertical arrangement; drag to adjust tile sizes.]

You can control the relative positions of the command window and the worker output. The following figure shows how to set the output to display beside the input, rather than above it.
You can choose to view the worker outputs by tabs. [Figure: 1. Select tabbed display; 2. Select tab; 3. Select labs shown in this tab.]

You can have multiple workers send their output to the same tile or tab. This allows you to have fewer tiles or tabs than workers. [Figure: click tabbed output; select only two tabs.] In this case, the window provides shading to help distinguish the outputs from the various workers.
[Figure: multiple labs shown in the same tab.]
Running pmode Interactive Jobs on a Cluster Running pmode Interactive Jobs on a Cluster When you run pmode on a cluster of workers, you are running a job that is much like any other communicating job, except it is interactive. The cluster can be heterogeneous, but with certain limitations described at http://www.mathworks.com/products/parallelcomputing/requirements.
4 Interactive Parallel Computation with pmode Plotting Distributed Data Using pmode Because the workers running a job in pmode are MATLAB sessions without displays, they cannot create plots or other graphic outputs on your desktop. When working in pmode with codistributed arrays, one way to plot a codistributed array is to follow these basic steps: 1 Use the gather function to collect the entire array into the workspace of one worker.
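The step above, followed by the transfer to the client, can be sketched as follows. This is a hedged sketch: the variable names (whole, Z, Zclient) are illustrative, and the lab2client syntax should be checked against the pmode reference page.

```matlab
% Sketch: gather a codistributed array onto one lab, copy it to the
% client, and plot it there (workers have no displays).
P>> Z = gather(whole, 1)            % collect the entire array on lab 1 only
P>> pmode lab2client Z 1 Zclient    % copy Z from lab 1 into client variable Zclient
% Then, at the regular MATLAB client prompt:
plot(Zclient)
```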
Plotting Distributed Data Using pmode This is not the only way to plot codistributed data. One alternative method, especially useful when running noninteractive communicating jobs, is to plot the data to a file, then view it from a later MATLAB session.
pmode Limitations and Unexpected Results

Using Graphics in pmode

Displaying a GUI

The workers that run the tasks of a communicating job are MATLAB sessions without displays. As a result, these workers cannot display graphical tools, so you cannot do things like plotting from within pmode.
pmode Troubleshooting pmode Troubleshooting In this section... “Connectivity Testing” on page 4-19 “Hostname Resolution” on page 4-19 “Socket Connections” on page 4-19 Connectivity Testing For testing connectivity between the client machine and the machines of your compute cluster, you can use Admin Center.
5 Math with Codistributed Arrays This chapter describes the distribution or partition of data across several workers, and the functionality provided for operations on that data in spmd statements, communicating jobs, and pmode. The sections are as follows.
5 Math with Codistributed Arrays Nondistributed Versus Distributed Arrays In this section... “Introduction” on page 5-2 “Nondistributed Arrays” on page 5-2 “Codistributed Arrays” on page 5-4 Introduction All built-in data types and data structures supported by MATLAB software are also supported in the MATLAB parallel computing environment. This includes arrays of any number of dimensions containing numeric, character, logical values, cells, or structures; but not function handles or user-defined objects.
WORKER 1        WORKER 2        WORKER 3        WORKER 4
 8  1  6    |    8  1  6    |    8  1  6    |    8  1  6
 3  5  7    |    3  5  7    |    3  5  7    |    3  5  7
 4  9  2    |    4  9  2    |    4  9  2    |    4  9  2

Variant Arrays

A variant array also resides in the workspaces of all workers, but its content differs on one or more workers. When you create the array, MATLAB assigns a different value to the same variable on all workers.
5 Math with Codistributed Arrays Codistributed Arrays With replicated and variant arrays, the full content of the array is stored in the workspace of each worker. Codistributed arrays, on the other hand, are partitioned into segments, with each segment residing in the workspace of a different worker. Each worker has its own array segment to work with.
Working with Codistributed Arrays Working with Codistributed Arrays In this section...
end
    Lab 1: This lab stores D(:,1:250).
    Lab 2: This lab stores D(:,251:500).
    Lab 3: This lab stores D(:,501:750).
    Lab 4: This lab stores D(:,751:1000).

Each worker has access to all segments of the array. Access to the local segment is faster than to a remote segment, because the latter requires sending and receiving data between workers and thus takes more time.
Working with Codistributed Arrays number is not evenly divisible by the number of workers, MATLAB partitions the array as evenly as possible. MATLAB provides codistributor object properties called Dimension and Partition that you can use to determine the exact distribution of an array. See “Indexing into a Codistributed Array” on page 5-14 for more information on indexing with codistributed arrays.
spmd, A = [11:18; 21:28; 31:38; 41:48], end

A =
    11    12    13    14    15    16    17    18
    21    22    23    24    25    26    27    28
    31    32    33    34    35    36    37    38
    41    42    43    44    45    46    47    48

The next line uses the codistributed function to construct a single 4-by-8 matrix D that is distributed along the second dimension of the array:

spmd
    D = codistributed(A);
    getLocalPart(D)
end

1: Local Part    2: Local Part    3: Local Part    4: Local Part
    11    12        13    14        15    16        17    18
    21    22        23    24        25    26        27    28
    31    32        33    34        35    36        37    38
    41    42        43    44        45    46        47    48
(local part) on each worker first, and then combine them into a single array that is distributed across the workers. This example creates a 4-by-250 variant array A on each of four workers and then uses codistributor to distribute these segments across four workers, creating a 4-by-1000 codistributed array. Here is the variant array, A:

spmd
    A = [1:250; 251:500; 501:750; 751:1000] + 250 * (labindex - 1);
end

WORKER 1
      1      2   ...    250
    251    252   ...    500
    501    502   ...    750
    751    752   ...   1000
5 Math with Codistributed Arrays Constructor Functions The codistributed constructor functions are listed here. Use the codist argument (created by the codistributor function: codist=codistributor()) to specify over which dimension to distribute the array. See the individual reference pages for these functions for further syntax and usage information.
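As a hedged sketch of using the codist argument with a constructor function (assuming an spmd block running on four workers):

```matlab
% Sketch: use a codistributor object with a constructor function to
% choose the distribution dimension explicitly.
spmd
    codist = codistributor1d(1);     % distribute over rows (dimension 1)
    D = zeros(8, 8, codist);         % 8-by-8 codistributed array of zeros
    size(getLocalPart(D))            % each of 4 workers holds a 2-by-8 part
end
```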
size(D)
    L = getLocalPart(D);
    size(L)
end

returns on each worker:

     3    80
     3    20

Each worker recognizes that the codistributed array D is 3-by-80. However, notice that the size of the local part, L, is 3-by-20 on each worker, because the 80 columns of D are distributed over four workers.

Creating a Codistributed from Local Arrays

Use the codistributed function to perform the reverse operation.
5 Math with Codistributed Arrays where D is any MATLAB array. Determining the Dimension of Distribution The codistributor object determines how an array is partitioned and its dimension of distribution. To access the codistributor of an array, use the getCodistributor function.
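A minimal sketch of inspecting the codistributor, assuming D is an existing codistributed array inside an spmd block (the variable names are illustrative):

```matlab
% Sketch: query how an existing codistributed array is partitioned.
spmd
    C   = getCodistributor(D);   % codistributor object for D
    dim = C.Dimension;           % dimension of distribution
    par = C.Partition;           % number of rows/columns held by each lab
end
```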
Construct an 8-by-16 codistributed array D of random values distributed by columns on four workers:

spmd
    D = rand(8,16,codistributor());
    size(getLocalPart(D))
end

returns on each worker:

     8     4

Create a new codistributed array distributed by rows from an existing one already distributed by columns:

spmd
    X = redistribute(D, codistributor1d(1));
    size(getLocalPart(X))
end

returns on each worker:

     2    16

Restoring the Full Array

You can restore a codistributed array to its undistributed form by gathering its segments.
    11    12    13  |  14    15    16  |  17    18  |  19    20
    21    22    23  |  24    25    26  |  27    28  |  29    30
    31    32    33  |  34    35    36  |  37    38  |  39    40
    41    42    43  |  44    45    46  |  47    48  |  49    50

spmd, size(getLocalPart(D)), end
    Lab 1: 4 3
    Lab 2: 4 3
    Lab 3: 4 2
    Lab 4: 4 2

Restore the undistributed segments to the full array form by gathering the segments:

spmd, X = gather(D), end
X =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

spmd, size(X), end
     4    10
Working with Codistributed Arrays to the end of the entire array; that is, the last subscript of the final segment. The length of each segment is also not given by using the length or size functions, as they only return the length of the entire array. The MATLAB colon operator and end keyword are two of the basic tools for indexing into nondistributed arrays. For codistributed arrays, MATLAB provides a version of the colon operator, called codistributed.colon.
5 Math with Codistributed Arrays Element is in position 25000 on worker 2. Notice if you use a pool of a different size, the element ends up in a different location on a different worker, but the same code can be used to locate the element. 2-Dimensional Distribution As an alternative to distributing by a single dimension of rows or columns, you can distribute a matrix by blocks using '2dbc' or two-dimensional block-cyclic distribution.
Now you can use this codistributor object to distribute the original matrix:

P>> AA = codistributed(A, DIST)

This distributes the array among the workers according to this scheme:

[Figure: the 8-by-8 array, with elements 1 through 64, is divided into four 4-by-4 blocks, one assigned to each of the four labs.]

If the lab grid does
[Figure: distribution with a smaller block size — the 2-by-2 lab grid is overlaid repeatedly on the original 8-by-8 matrix. The diagram shows a scheme that requires four overlays of the lab grid to accommodate the entire original matrix.]
Working with Codistributed Arrays The following points are worth noting: • '2dbc' distribution might not offer any performance enhancement unless the block size is at least a few dozen. The default block size is 64. • The lab grid should be as close to a square as possible. • Not all functions that are enhanced to work on '1d' codistributed arrays work on '2dbc' codistributed arrays.
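A hedged sketch of creating a '2dbc' codistributor directly, assuming an spmd block on four workers; the lab grid and block size shown are illustrative (note the default block size is 64):

```matlab
% Sketch: create a '2dbc' codistributor and distribute a matrix with it.
spmd
    DIST = codistributor2dbc([2 2], 4);            % 2-by-2 lab grid, block size 4
    AA   = codistributed(reshape(1:64, 8, 8), DIST);
    getLocalPart(AA)                               % each lab's 4-by-4 block
end
```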
5 Math with Codistributed Arrays Looping Over a Distributed Range (for-drange) In this section... “Parallelizing a for-Loop” on page 5-20 “Codistributed Arrays in a for-drange Loop” on page 5-21 Note Using a for-loop over a distributed range (drange) is intended for explicit indexing of the distributed dimension of codistributed arrays (such as inside an spmd statement or a communicating job). For most applications involving parallel for-loops you should first try using parfor loops.
plot(1:numDataSets, res);
    print -dtiff -r300 fig.tiff;
    save \\central\myResults\today.mat res
end

Note that the length of the for iteration and the length of the codistributed array results need to match in order to index into results within a for-drange loop. This way, no communication is required between the workers.
5 Math with Codistributed Arrays D = eye(8, 8, codistributor()) E = zeros(8, 8, codistributor()) By default, these arrays are distributed by columns; that is, each of the four workers contains two columns of each array. If you use these arrays in a for-drange loop, any calculations must be self-contained within each worker. In other words, you can only perform calculations that are limited within each worker to the two columns of the arrays that the workers contain.
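A minimal sketch of a self-contained for-drange computation over these arrays, assuming four workers (so each worker holds two columns):

```matlab
% Sketch: each worker updates only the columns of E that it stores
% locally, so no inter-worker communication is needed.
spmd
    D = eye(8, 8, codistributor());
    E = zeros(8, 8, codistributor());
    for j = drange(1:size(D,2))
        E(:,j) = 2 * D(:,j);    % stays within this worker's own columns
    end
end
```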
Looping Over a Distributed Range (for-drange) To loop over all elements in the array, you can use for-drange on the dimension of distribution, and regular for-loops on all other dimensions. The following example executes in an spmd statement running on a parallel pool of 4 workers: spmd PP = zeros(6,8,12,'codistributed'); RR = rand(6,8,12,codistributor()) % Default distribution: % by third dimension, evenly across 4 workers.
5 Math with Codistributed Arrays MATLAB Functions on Distributed and Codistributed Arrays Many functions in MATLAB software are enhanced or overloaded so that they operate on codistributed arrays in much the same way that they operate on arrays contained in a single workspace. In most cases, if any of the input arguments to these functions is a distributed or codistributed array, their output arrays are distributed or codistributed, respectively.
atand, atanh, besselh, besseli, besselj, besselk, bessely, beta, betainc, betaincinv, betaln, bitand, bitor, bitxor, bsxfun, cart2pol, ctranspose ('), cummax, cummin, cumprod, cumsum, diag, diff, dot, double, eig, end, eps, eq (==), erf, erfc, erfcinv, ifft, ifft2, ifftn, imag, Inf, int16, int32, int64, int8, inv, ipermute, isempty, isequal, isequaln, isfinite, isfloat, mrdivide (/), mtimes (*), mod, mode, NaN, ndims, ndgrid, ne (~=), nextpow2, nnz, nonzeros, norm, normest, not (~), nthroot, num2cell, repmat, reshape
6 Programming Overview This chapter provides information you need for programming with Parallel Computing Toolbox software. Further details of evaluating functions in a cluster, programming independent jobs, and programming communicating jobs are covered in later chapters. This chapter describes features common to programming all kinds of jobs. The sections are as follows.
6 Programming Overview How Parallel Computing Products Run a Job In this section... “Overview” on page 6-2 “Toolbox and Server Components” on page 6-3 “Life Cycle of a Job” on page 6-7 Overview Parallel Computing Toolbox and MATLAB Distributed Computing Server software let you solve computationally and data-intensive problems using MATLAB and Simulink on multicore and multiprocessor computers.
[Figure: Basic Parallel Computing Setup — a MATLAB client running Parallel Computing Toolbox communicates through a scheduler with multiple MATLAB workers, each running MATLAB Distributed Computing Server.]

Toolbox and Server Components
• “MJS, Workers, and Clients” on page 6-3
• “Local Cluster” on page 6-5
• “Third-Party Schedulers” on page 6-5
• “Components on Mixed Platforms or Heterogeneous Clusters” on page 6-6
• “mdce Service”
6 Programming Overview A MATLAB Distributed Computing Server software setup usually includes many workers that can all execute tasks simultaneously, speeding up execution of large MATLAB jobs. It is generally not important which worker executes a specific task. In an independent job, the workers evaluate tasks one at a time as available, perhaps simultaneously, perhaps not, returning the results to the MJS. In a communicating job, the workers evaluate tasks simultaneously.
[Figure: Cluster with Multiple Clients and MJSs — several clients share access to two schedulers (MJSs), each of which manages its own set of workers.]

Local Cluster

A feature of Parallel Computing Toolbox software is the ability to run a local cluster of workers on the client machine, so that you can run jobs without requiring a remote cluster or MATLAB Distributed Computing Server software.
6 Programming Overview • Is the handling of parallel computing jobs the only cluster scheduling management you need? The MJS is designed specifically for MathWorks® parallel computing applications. If other scheduling tasks are not needed, a third-party scheduler might not offer any advantages. • Is there a file sharing configuration on your cluster already? The MJS can handle all file and data sharing necessary for your parallel computing applications.
How Parallel Computing Products Run a Job same platform. The cluster can also be comprised of both 32-bit and 64-bit machines, so long as your data does not exceed the limitations posed by the 32-bit systems. Other limitations are described at http://www.mathworks.com/products/parallel-computing/ requirements.html. In a mixed-platform environment, system administrators should be sure to follow the proper installation instructions for the local machine on which you are installing the software.
[Figure: Stages of a Job — jobs move from Pending to Queued to Running to Finished on the cluster; createJob creates a job in the client, submit queues it, and fetchOutputs retrieves the results after the workers finish.]

The following table describes each stage in the life cycle of a job.

Job Stage    Description
Pending      You create a job on the scheduler with the createJob function in your client session of Parallel Computing Toolbox software. The job's first state is pending.
How Parallel Computing Products Run a Job Job Stage Description Failed When using a third-party scheduler, a job might fail if the scheduler encounters an error when attempting to execute its commands or access necessary files. Deleted When a job’s data has been removed from its data location or from the MJS with the delete function, the state of the job in the client is deleted. This state is available only as long as the job object remains in the client.
6 Programming Overview Create Simple Independent Jobs Program a Job on a Local Cluster In some situations, you might need to define the individual tasks of a job, perhaps because they might evaluate different functions or have uniquely structured arguments. To program a job like this, the typical Parallel Computing Toolbox client session includes the steps shown in the following example. This example illustrates the basic steps in creating and running a job that contains a few simple tasks.
results =
    [2]
    [4]
    [6]

6 Delete the job. When you have the results, you can permanently remove the job from the scheduler's storage location.
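The complete sequence of steps in this example can be sketched end to end as follows (the default cluster profile is assumed):

```matlab
% Sketch: the full life cycle of a simple independent job.
c = parcluster();                     % 1. get the default cluster object
j = createJob(c);                     % 2. create a job
createTask(j, @sum, 1, {[1 1]});      % 3. add three tasks, each summing a vector
createTask(j, @sum, 1, {[2 2]});
createTask(j, @sum, 1, {[3 3]});
submit(j);                            % 4. submit the job for execution
wait(j);                              % block until the job finishes
results = fetchOutputs(j);            % 5. retrieve the results: {[2];[4];[6]}
delete(j);                            % 6. remove the job from storage
```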
6 Programming Overview Parallel Preferences You can access parallel preferences in the general preferences for MATLAB. To open the Preferences dialog box, use any one of the following: • On the Home tab in the Environment section, click Parallel > Parallel Preferences • Click the desktop pool indicator icon, and select Parallel preferences. • In the command window, type preferences In the navigation tree of the Preferences dialog box, click Parallel Computing Toolbox.
• Automatically create a parallel pool — This setting causes a pool to start automatically if one is not already running at the time a parallel language construct is encountered that runs on a pool, such as:
  • parfor
  • spmd
  • distributed
  • Composite
  • parfeval
  • parfevalOnAll
  • gcp
  • mapreduce
  • mapreducer

With this setting, you never need to manually open a pool using the parpool function. If a pool automatically opens, you can still access the pool object with gcp.
6 Programming Overview Clusters and Cluster Profiles In this section...
Clusters and Cluster Profiles This opens the Discover Clusters dialog box, where you select the location of your clusters. As clusters are discovered, they populate a list for your selection: If you already have a profile for any of the listed clusters, those profile names are included in the list. If you want to create a new profile for one of the discovered clusters, select the name of the cluster you want to use, and click Next.
6 Programming Overview discovery of MJS clusters by identifying specific hosts rather than broadcasting across your network. A DNS service (SRV) record defines the location of hosts and ports of services, such as those related to the clusters you want to discover. Your system administrator creates DNS SRV records in your organization’s DNS infrastructure. For a description of the required record, and validation information, see “DNS SRV Record”.
Clusters and Cluster Profiles The imported profile appears in your Cluster Profile Manager list. Note that the list contains the profile name, which is not necessarily the file name. If you already have a profile with the same name as the one you are importing, the imported profile gets an extension added to its name so you can distinguish it. You can also export and import profiles programmatically with the parallel.exportProfile and parallel.importProfile functions.
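A minimal sketch of the programmatic route (the profile and file names are illustrative):

```matlab
% Sketch: export a profile to a .settings file, then import it elsewhere.
parallel.exportProfile('MyMJSprofile1', 'savedProfile');    % writes savedProfile.settings
newName = parallel.importProfile('savedProfile.settings');  % returns the imported profile name
```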
The following example provides instructions on how to create and modify profiles using the Cluster Profile Manager. Suppose you want to create a profile to set several properties for jobs to run in an MJS cluster. The following example illustrates a possible workflow, where you create two profiles differentiated only by the number of workers they use.

1 In the Cluster Profile Manager, select Add > Custom > MATLAB Job Scheduler (MJS).
Clusters and Cluster Profiles This creates and displays a new profile, called MJSProfile1. 2 Double-click the new profile name in the listing, and modify the profile name to be MyMJSprofile1. 3 Click Edit in the tool strip so that you can set your profile property values. In the Description field, enter the text MJS with 4 workers, as shown in the following figure. Enter the host name for the machine on which the MJS is running, and the name of the MJS.
6 Programming Overview You might want to edit other properties depending on your particular network and cluster situation. 5 Click Done to save the profile settings. To create a similar profile with just a few differences, you can duplicate an existing profile and modify only the parts you need to change, as follows: 1 In the Cluster Profile Manager, right-click the profile name MyMJSprofile1 in the list and select Duplicate.
Clusters and Cluster Profiles 5 Scroll down to the Workers section, and for the Range of number of workers, clear the [4 4] and leave the field blank, as highlighted in the following figure: 6 Click Done to save the profile settings and to close the properties editor. You now have two profiles that differ only in the number of workers required for running a job. When creating a job, you can apply either profile to that job as a way of specifying how many workers it should run on.
6 Programming Overview You can see examples of profiles for different kinds of supported schedulers in the MATLAB Distributed Computing Server installation instructions at “Configure Your Cluster”. Validate Cluster Profiles The Cluster Profile Manager includes the ability to validate profiles. Validation assures that the MATLAB client session can access the cluster, and that the cluster can run the various types of jobs with the settings of your profile.
Clusters and Cluster Profiles Note Validation will fail if you already have a parallel pool open. When the tests are complete, you can click Show Details to get more information about test results. This information includes any error messages, debug logs, and other data that might be useful in diagnosing problems or helping to determine proper network settings. The Validation Results tab keeps the test results available until the current MATLAB session closes.
6 Programming Overview • The Cluster Profile Manager indicates which is the default profile. You can select any profile in the list, then click Set as Default. • You can get or set the default profile programmatically by using the parallel.defaultClusterProfile function. The following sets of commands achieve the same thing: parallel.
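The programmatic route can be sketched as follows (the profile name is illustrative):

```matlab
% Sketch: get and set the default cluster profile programmatically.
oldDefault = parallel.defaultClusterProfile;       % query the current default
parallel.defaultClusterProfile('MyMJSprofile1');   % set a new default
c = parcluster();    % parcluster with no argument now uses MyMJSprofile1
```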
Apply Callbacks to MJS Jobs and Tasks Apply Callbacks to MJS Jobs and Tasks The MATLAB job scheduler (MJS) has the ability to trigger callbacks in the client session whenever jobs or tasks in the MJS cluster change to specific states.
disp(['Finished task: ' num2str(task.ID)])

Create a job and set its QueuedFcn, RunningFcn, and FinishedFcn properties, using a function handle to an anonymous function that sends information to the display.

c = parcluster('MyMJS');
j = createJob(c,'Name','Job_52a');
j.QueuedFcn   = @(job,eventdata) disp([job.Name ' now ' job.State]);
j.RunningFcn  = @(job,eventdata) disp([job.Name ' now ' job.State]);
j.FinishedFcn = @(job,eventdata) disp([job.Name ' now ' job.State]);
Apply Callbacks to MJS Jobs and Tasks Create and save a callback function clientTaskCompleted.m on the path of the MATLAB client, with the following content. (If you created this function for the previous example, you can use that.) function clientTaskCompleted(task,eventdata) disp(['Finished task: ' num2str(task.ID)]) Create objects for the cluster, job, and task. Then submit the job. All the callback properties are set from the profile when the objects are created.
guarantee of the order in which the tasks finish, so the plots might overwrite each other. Likewise, the FinishedFcn callback for a job might be triggered to start before the FinishedFcn callbacks for all its tasks are complete.
• Submissions made with batch use applicable job and task callbacks. Parallel pools can trigger job callbacks defined by their cluster profile.
Job Monitor Job Monitor In this section... “Job Monitor GUI” on page 6-29 “Manage Jobs Using the Job Monitor” on page 6-30 “Identify Task Errors Using the Job Monitor” on page 6-30 Job Monitor GUI The Job Monitor displays the jobs in the queue for the scheduler determined by your selection of a cluster profile. Open the Job Monitor from the MATLAB desktop on the Home tab in the Environment section, by clicking Parallel > Monitor Jobs.
6 Programming Overview Typical Use Cases The Job Monitor lets you accomplish many different goals pertaining to job tracking and queue management.
If you save this script in a file named invert_me.m, you can try to run the script as a batch job on the default cluster:

batch('invert_me')

When updated after the job runs, the Job Monitor includes the job created by the batch command, with an error icon for this job. Right-click the job in the list, and select Show Errors.
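The same error information is available programmatically. Here is a sketch (the variable names are illustrative) that inspects the failing task from the command line instead of through the Job Monitor:

```matlab
% Sketch: retrieve the error from the failed batch job programmatically.
c = parcluster();            % default cluster profile
j = batch(c,'invert_me');    % same failing script as above
wait(j);                     % block until the job finishes
t = j.Tasks(1);              % a batch script runs as a single task
t.Error                      % MException with identifier, message, and stack
```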
6 Programming Overview Programming Tips In this section...
Programming Tips 3 Modify your code for division. Decide how you want your code divided. For an independent job, determine how best to divide it into tasks; for example, each iteration of a for-loop might define one task. For a communicating job, determine how best to take advantage of parallel processing; for example, a large array can be distributed across all your workers. 4 Use pmode to develop parallel functionality.
C:\TEMP\MDCE\Checkpoint\nodeA52_worker22_mlworker_log\work

Writing to Files from Workers

When multiple workers attempt to write to the same file, you might end up with a race condition, clash, or one worker might overwrite the data from another worker. This is likely to occur when:
• There is more than one worker per machine, and they attempt to write to the same file.
• The workers have a shared file system, and use the same path to identify a file for writing.
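One common remedy, sketched below under the assumption that each task saves its own results, is to derive a unique file name from the running task:

```matlab
% Sketch: avoid write collisions by giving every task its own output file.
t = getCurrentTask();        % returns [] when not running on a worker
if isempty(t)
    fname = 'results_client.mat';
else
    fname = sprintf('results_task%d.mat', t.ID);   % unique per task
end
x = rand(3);                 % stand-in for the data being saved
save(fname,'x');
```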
Programming Tips clears all Parallel Computing Toolbox objects from the current MATLAB session. They still remain in the MJS. For information on recreating these objects in the client session, see “Recover Objects” on page 7-14. Running Tasks That Call Simulink Software The first task that runs on a worker session that uses Simulink software can take a long time to run, as Simulink is not automatically started at the beginning of the worker session. Instead, Simulink starts up when first called.
6 Programming Overview initialize a task is far greater than the actual time it takes for the worker to evaluate the task function.
Control Random Number Streams

In this section...
“Different Workers” on page 6-37
“Client and Workers” on page 6-38
“Client and GPU” on page 6-39
“Worker CPU and Worker GPU” on page 6-41

Different Workers

By default, each worker in a cluster working on the same job has a unique random number stream. This example uses two workers in a parallel pool to show they generate unique random number sequences.
delete(p)

Note Because rng('shuffle') seeds the random number generator based on the current time, you should not use this command to set the random number stream on different workers if you want to assure independent streams. This is especially true when the command is sent to multiple workers simultaneously, such as inside a parfor, spmd, or a communicating job.
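If you need streams that are both reproducible and independent, a safer pattern than rng('shuffle') is to give each worker a distinct substream of a generator that supports substreams. A sketch:

```matlab
% Sketch: independent, reproducible streams via substreams of 'mrg32k3a'.
spmd
    s = RandStream('mrg32k3a','Seed',1);   % same generator and seed everywhere
    s.Substream = labindex;                % but a distinct substream per worker
    RandStream.setGlobalStream(s);
    r = rand(1,2);                         % independent across workers
end
```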
For identical results, you can set the client and worker to use the same generator and seed. Here the file randScript2.m contains the following code:

s = RandStream('CombRecursive','Seed',1);
RandStream.setGlobalStream(s);
R = rand(1,4);

Now, run the new script in the client and on a worker:

randScript2; % In client
R
R =
    0.4957    0.2243    0.2073    0.6823

j = batch(c,'randScript2'); % On worker
wait(j); load(j); R
R =
    0.4957    0.2243    0.2073    0.6823
Keyword                          Generator                                Multiple Stream and   Approximate Period
                                                                          Substream Support     in Full Precision
'CombRecursive' or 'mrg32k3a'    Combined multiple recursive generator    Yes                   2^127
'Philox4x32-10'                  Philox 4x32 generator with 10 rounds     Yes                   2^129
'Threefry4x64-20'                Threefry 4x64 generator with 20 rounds   Yes                   2^258

None of these is the default client generator for the CPU. To generate the same sequence on CPU and GPU, you must use the only generator supported by both: 'CombRecursive'.
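For example, to make the client CPU and the GPU produce the same sequence, you might set both global streams to 'CombRecursive' with a common seed. This is a sketch; the NormalTransform setting matters only when you also compare randn values:

```matlab
% Sketch: matching CPU and GPU streams with the one shared generator.
seed = 1;
RandStream.setGlobalStream( ...
    RandStream('CombRecursive','Seed',seed,'NormalTransform','Inversion'));
parallel.gpu.RandStream.setGlobalStream( ...
    parallel.gpu.RandStream('CombRecursive','Seed',seed, ...
                            'NormalTransform','Inversion'));
Rc = rand(1,4);                       % generated on the CPU
Rg = gather(rand(1,4,'gpuArray'));    % generated on the GPU; matches Rc
```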
Control Random Number Streams Rg = -0.0108 -0.7577 -0.8159 0.4742 Worker CPU and Worker GPU Code running on a worker’s CPU uses the same generator to create random numbers as code running on a worker’s GPU, but they do not share the same stream. You can use a common seed to generate the same sequence of numbers, as shown in this example, where each worker creates the same sequence on GPU and CPU, but different from the sequence on the other worker.
Profiling Parallel Code

In this section...
“Introduction” on page 6-42
“Collecting Parallel Profile Data” on page 6-42
“Viewing Parallel Profile Data” on page 6-43

Introduction

The parallel profiler provides an extension of the profile command and the profile viewer specifically for communicating jobs, to enable you to see how much time each worker spends evaluating each function and how much time communicating or waiting for communications with the other workers.
Profiling Parallel Code • Amount of data transferred between each worker • Amount of time each worker spends waiting for communications With the parallel profiler on, you can proceed to execute your code while the profiler collects the data. In the pmode Parallel Command Window, to find out if the profiler is on, type: P>> mpiprofile status For a complete list of options regarding profiler data details, clearing data, etc., see the mpiprofile reference page.
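A typical collection session in the pmode Parallel Command Window might look like this sketch (the codistributed computation is just an example workload that forces inter-worker communication):

```matlab
P>> mpiprofile on                       % start collecting on all workers
P>> A = rand(1000, codistributor());    % codistributed work that communicates
P>> B = A * A';
P>> mpiprofile viewer                   % transfer the data and open the viewer
```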
The function summary report displays the data for each function executed on a worker in sortable columns with the following headers:

Column Header     Description
Calls             How many times the function was called on this worker
Total Time        The total amount of time this worker spent executing this function
Self Time         The time this worker spent inside this function, not within children or local functions
Total Comm Time   The total time this worker spent transferring data with other workers
Profiling Parallel Code Column Header Description Total Time Plot Bar graph showing relative size of Self Time, Self Comm Waiting Time, and Total Time for this function on this worker Click the name of any function in the list for more details about the execution of that function. The function detail report for codistributed.mtimes includes this listing: The code that is displayed in the report is taken from the client.
6 Programming Overview communication. Manual Comparison Selection allows you to compare data from specific workers or workers that meet certain criteria. The following listing from the summary report shows the result of using the Automatic Comparison Selection of Compare (max vs. min TotalTime). The comparison shows data from worker (lab) 3 compared to worker (lab) 1 because these are the workers that spend the most versus least amount of time executing the code.
The next figure shows a summary report for the workers that spend the most versus least time for each function. A Manual Comparison Selection of max Time Aggregate against min Time >0 Aggregate generated this summary. Both aggregate settings indicate that the profiler should consider data from all workers for all functions, for both maximum and minimum. This report lists the data for codistributed.mtimes.
Click on a function name in the summary listing of a comparison to get a detailed comparison. The detailed comparison for codistributed.mtimes.
Profiling Parallel Code To see plots of communication data, select Plot All PerLab Communication in the Show Figures menu. The top portion of the plot view report plots how much data each worker receives from each other worker for all functions. To see only a plot of interworker communication times, select Plot CommTimePerLab in the Show Figures menu.
6 Programming Overview Plots like those in the previous two figures can help you determine the best way to balance work among your workers, perhaps by altering the partition scheme of your codistributed arrays.
Benchmarking Performance

HPC Challenge Benchmarks

Several MATLAB files are available to illustrate HPC Challenge benchmark performance. You can find the files in the folder matlabroot/toolbox/distcomp/examples/benchmark/hpcchallenge. Each file is self-documented with explanatory comments. These files are not self-contained examples, but rather require that you know enough about your cluster to be able to provide the necessary information when using these files.
Troubleshooting and Debugging

In this section...
“Object Data Size Limitations” on page 6-52
“File Access and Permissions” on page 6-52
“No Results or Failed Job” on page 6-54
“Connection Problems Between the Client and MJS” on page 6-54
“SFTP Error: Received Message Too Long” on page 6-55

Object Data Size Limitations

The size of data transfers among the parallel computing objects is limited by the Java Virtual Machine (JVM) memory allocation.
Error using ==> feval
Undefined command/function 'function_name'.

The worker that ran the task did not have access to the function function_name. One solution is to make sure the location of the function’s file, function_name.m, is included in the job’s AdditionalPaths property. Another solution is to transfer the function file to the worker by adding function_name.m to the AttachedFiles property of the job.
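Both fixes can be applied at the command line. This sketch reuses the function_name placeholder from the error message above:

```matlab
% Sketch: make function_name.m available to the workers, either by path...
c = parcluster();
j = createJob(c);
j.AdditionalPaths = {'/shared/code'};    % folder already visible to the workers
% ...or by copying the file to each worker:
j.AttachedFiles = {'function_name.m'};
createTask(j, @function_name, 1, {});
submit(j);
```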
6 Programming Overview • MATLAB could not read/write the job input/output files in the scheduler’s job storage location. The storage location might not be accessible to all the worker nodes, or the user that MATLAB runs as does not have permission to read/write the job files. • If using a generic scheduler: • The environment variable MDCE_DECODE_FUNCTION was not defined before the MATLAB worker started. • The decode function was not on the worker’s path.
Troubleshooting and Debugging how to start it and how to test connectivity, see “Start Admin Center” and “Test Connectivity” in the MATLAB Distributed Computing Server documentation. Detailed instructions for other methods of diagnosing connection problems between the client and MJS can be found in some of the Bug Reports listed on the MathWorks Web site. The following sections can help you identify the general nature of some connection problems.
6 Programming Overview Could not send Job3.common.mat for job 3: One of your shell's init files contains a command that is writing to stdout, interfering with sftp. Access help com.mathworks.toolbox.distcomp.remote.spi.plugin.SftpExtraBytesFromShellException: One of your shell's init files contains a command that is writing to stdout, interfering with sftp.
Run mapreduce on a Parallel Pool

In this section...
“Start Parallel Pool” on page 6-57
“Compare Parallel mapreduce” on page 6-57

Start Parallel Pool

If you have Parallel Computing Toolbox installed, execution of mapreduce can open a parallel pool on the cluster specified by your default profile, for use as the execution environment. You can set your parallel preferences so that a pool does not automatically open.
6 Programming Overview Create two MapReducer objects for specifying the different execution environments for mapreduce. inMatlab = mapreducer(0); inPool = mapreducer(p); Create and preview the datastore. The data set used in this example is available in matlabroot/toolbox/matlab/demos. ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
readall(meanDelay)

Key                   Value
___________________   ________
'MeanArrivalDelay'    [7.1201]

Then, run the calculation on the current parallel pool. Note that the output text indicates a parallel mapreduce.
Related Examples
• “Getting Started with MapReduce”
• “Run mapreduce on a Hadoop Cluster”

More About
• “MapReduce”
• “Datastore”
Run mapreduce on a Hadoop Cluster

In this section...
“Cluster Preparation” on page 6-61
“Output Format and Order” on page 6-61
“Calculate Mean Delay” on page 6-61

Cluster Preparation

Before you can run mapreduce on a Hadoop® cluster, make sure that the cluster and client machine are properly configured. Consult your system administrator, or see “Configure a Hadoop Cluster”.
outputFolder = '/home/user/logs/hadooplog';

Note The specified outputFolder must not already exist. The mapreduce output from a Hadoop cluster cannot overwrite an existing folder.

Create a MapReducer object to specify that mapreduce should use your Hadoop cluster.

mr = mapreducer(cluster);

Create and preview the datastore. The data set is available in matlabroot/toolbox/matlab/demos.

ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
meanDelay =

  KeyValueDatastore with properties:

       Files: {
              ' .../tmp/alafleur/tpc00621b1_4eef_4abc_8078_646aa916e7d9/part0.seq'
              }
    ReadSize: 1 key-value pairs
    FileType: 'seq'

Read the result.

readall(meanDelay)

Key                   Value
___________________   ________
'MeanArrivalDelay'    [7.1201]

Although for demonstration purposes this example uses a local data set, it is likely when using Hadoop that your data set is stored in an HDFS™ file system.
6 Programming Overview Partition a Datastore in Parallel Partitioning a datastore in parallel, with a portion of the datastore on each worker in a parallel pool, can provide benefits in many cases: • Perform some action on only one part of the whole datastore, or on several defined parts simultaneously. • Search for specific values in the data store, with all workers acting simultaneously on their own partitions. • Perform a reduction calculation on the workers across all partitions.
[total,count] = sumAndCountArrivalDelay(ds)
sumtime = toc
mean = total/count

total =
    17211680

count =
    2417320

sumtime =
    10.8273

mean =
    7.1201

The partition function allows you to partition the datastore into smaller parts, each represented as a datastore itself. These smaller datastores work completely independently of each other, so that you can work with them inside of parallel language features such as parfor loops and spmd blocks.
6 Programming Overview total = 0; count = 0; parfor ii = 1:N % Get partition ii of the datastore. subds = partition(ds,N,ii); [localTotal,localCount] = sumAndCountArrivalDelay(subds); total = total + localTotal; count = count + localCount; end end Now the MATLAB code calls this new function, so that the counting and summing of the non-NAN values can occur in parallel loop iterations.
Partition a Datastore in Parallel Rather than let the software calculate the number of partitions, you can explicitly set this value, so that the data can be appropriately partitioned to fit your algorithm. For example, to parallelize data from within an spmd block, you can specify the number of workers (numlabs) as the number of partitions to use. The following function uses an spmd block to perform a parallel read, and explicitly sets the number of partitions equal to the number of workers.
6 Programming Overview mean = 7.1201 delete(p); Parallel pool using the 'local' profile is shutting down. You might get some idea of modest performance improvements by comparing the times recorded in the variables sumtime, parfortime, and spmdtime. Your results might vary, as the performance can be affected by the datastore size, parallel pool size, hardware configuration, and other factors.
7 Program Independent Jobs • “Program Independent Jobs” on page 7-2 • “Program Independent Jobs on a Local Cluster” on page 7-3 • “Program Independent Jobs for a Supported Scheduler” on page 7-8 • “Share Code with the Workers” on page 7-16 • “Program Independent Jobs for a Generic Scheduler” on page 7-21
Program Independent Jobs

An independent job is one whose tasks do not directly communicate with each other, that is, the tasks are independent of each other. The tasks do not need to run simultaneously, and a worker might run several tasks of the same job in succession. Typically, all tasks perform the same or similar functions on different data sets in an embarrassingly parallel configuration.
Program Independent Jobs on a Local Cluster

In this section...
“Create and Run Jobs with a Local Cluster” on page 7-3
“Local Cluster Behavior” on page 7-6

Create and Run Jobs with a Local Cluster

For jobs that require more control than the functionality offered by such high level constructs as spmd and parfor, you have to program all the steps for creating and running the job.
7 Program Independent Jobs Create a Cluster Object You use the parcluster function to create an object in your local MATLAB session representing the local scheduler. parallel.defaultClusterProfile('local'); c = parcluster(); Create a Job You create a job with the createJob function. This statement creates a job in the cluster’s job storage location, creates the job object job1 in the client session, and if you omit the semicolon at the end of the command, displays some information about the job.
Program Independent Jobs on a Local Cluster c Local Cluster Associated Jobs Number Pending: Number Queued: Number Running: Number Finished: 1 0 0 0 Create Tasks After you have created your job, you can create tasks for the job using the createTask function. Tasks define the functions to be evaluated by the workers during the running of the job. Often, the tasks of a job are all identical. In this example, five tasks will each generate a 3-by-3 matrix of random numbers.
Fetch the Job’s Results

The results of each task’s evaluation are stored in the task object’s OutputArguments property as a cell array. After waiting for the job to complete, use the function fetchOutputs to retrieve the results from all the tasks in the job.

wait(job1)
results = fetchOutputs(job1);

Display the results from each task.

results{1:5}

ans =
    0.9501    0.4860    0.4565
    0.2311    0.8913    0.0185
    0.6068    0.7621    0.8214

ans =
    0.4447    0.9218    0.4057
    0.6154    0.7382    0.9355
    0.7919    0.1763    0.9169
Program Independent Jobs on a Local Cluster evaluation to the local cluster, the scheduler starts a MATLAB worker for each task in the job, but only up to as many workers as allowed by the local profile. If your job has more tasks than allowed workers, the scheduler waits for one of the current tasks to complete before starting another MATLAB worker to evaluate the next task. You can modify the number of allowed workers in the local cluster profile.
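You can also adjust that limit programmatically rather than through the Cluster Profile Manager; for example (a sketch):

```matlab
% Sketch: raise the local cluster's worker limit and persist it.
c = parcluster('local');
c.NumWorkers = 8;      % allow up to eight simultaneous local workers
saveProfile(c);        % write the change back into the 'local' profile
```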
7 Program Independent Jobs Program Independent Jobs for a Supported Scheduler In this section... “Create and Run Jobs” on page 7-8 “Manage Objects in the Scheduler” on page 7-13 Create and Run Jobs This section details the steps of a typical programming session with Parallel Computing Toolbox software using a supported job scheduler on a cluster.
where MATLAB is accessed and many other cluster properties. The exact properties are determined by the type of cluster. The steps in this section all assume that the profile named MyProfile identifies the cluster you want to use, with all necessary property settings. With the proper use of a profile, the rest of the programming is the same, regardless of cluster type.
                   Modified: false
                       Host: node345
                   Username: mylogin
                 NumWorkers: 1
             NumBusyWorkers: 0
             NumIdleWorkers: 1
         JobStorageLocation: Database on node345
          ClusterMatlabRoot: C:\apps\matlab
            OperatingSystem: windows
           AllHostAddresses: 0:0:0:0
              SecurityLevel: 0 (No security)
    HasSecureCommunication: false

    Associated Jobs

    Number Pending:  0
    Number Queued:   0
    Number Running:  0
    Number Finished: 0

Create a Job

You create a job with the createJob function.
    Number Running:    0
    Number Finished:   0
    Task ID of Errors: []

Note that the job’s State property is pending. This means the job has not been queued for running yet, so you can now add tasks to it.
7 Program Independent Jobs Alternatively, you can create the five tasks with one call to createTask by providing a cell array of five cell arrays defining the input arguments to each task. T = createTask(job1, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}}); In this case, T is a 5-by-1 matrix of task objects. Submit a Job to the Job Queue To run your job and have its tasks evaluated, you submit the job to the job queue with the submit function.
wait(job1)
results = fetchOutputs(job1);

Display the results from each task.

results{1:5}

ans =
    0.9501    0.4860    0.4565
    0.2311    0.8913    0.0185
    0.6068    0.7621    0.8214

ans =
    0.4447    0.9218    0.4057
    0.6154    0.7382    0.9355
    0.7919    0.1763    0.9169

ans =
    0.4103    0.3529    0.1389
    0.8936    0.8132    0.2028
    0.0579    0.0099    0.1987

ans =
    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186

ans =
    0.8462    0.6721    0.6813
    0.5252    0.8381    0.3795
    0.2026    0.0196    0.8318
7 Program Independent Jobs Computing Server software or other cluster resources remain in place. When the client session ends, only the local reference objects are lost, not the actual job and task data in the cluster. Therefore, if you have submitted your job to the cluster job queue for execution, you can quit your client session of MATLAB, and the job will be executed by the cluster. You can retrieve the job results later in another client session.
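A later client session can locate the job by its properties and fetch the results. The profile and job name in this sketch are placeholders:

```matlab
% Sketch: recover a finished job in a new MATLAB session.
c = parcluster('MyProfile');        % same profile used when submitting
j = findJob(c,'Name','Job_52a');    % or match on 'ID' or 'State'
wait(j);                            % in case it is still running
results = fetchOutputs(j);
```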
Program Independent Jobs for a Supported Scheduler Remove Objects Permanently Jobs in the cluster continue to exist even after they are finished, and after the MJS is stopped and restarted. The ways to permanently remove jobs from the cluster are explained in the following sections: • “Delete Selected Objects” on page 7-15 • “Start an MJS from a Clean State” on page 7-15 Delete Selected Objects From the command line in the MATLAB client session, you can call the delete function for any job or task object.
7 Program Independent Jobs Share Code with the Workers Because the tasks of a job are evaluated on different machines, each machine must have access to all the files needed to evaluate its tasks. The basic mechanisms for sharing code are explained in the following sections: In this section...
Share Code with the Workers c = parcluster(); % Use default job1 = createJob(c); ap = {'/central/funcs','/dept1/funcs', ... '\\OurDomain\central\funcs','\\OurDomain\dept1\funcs'}; job1.AdditionalPaths = ap; • By putting the path command in any of the appropriate startup files for the worker: • matlabroot\toolbox\local\startup.m • matlabroot\toolbox\distcomp\user\jobStartup.m • matlabroot\toolbox\distcomp\user\taskStartup.
7 Program Independent Jobs more than one task for the job. (Note: Do not confuse this property with the UserData property on any objects in the MATLAB client. Information in UserData is available only in the client, and is not available to the scheduler or workers.) • AttachedFiles — This property of the job object is a cell array in which you manually specify all the folders and files that get sent to the workers.
Share Code with the Workers manually attached files to determine which code files are necessary for the workers, and to automatically send those files to the workers. You can set this property value in a cluster profile using the Profile Manager, or you can set it programmatically on a job object at the command line. c = parcluster(); j = createJob(c); j.AutoAttachFiles = true; The supported code file formats for automatic attachment are MATLAB files (.m extension), P-code files (.p), and MEX-files (.
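After submission you can check which files the dependency analysis actually attached; myFunction is a stand-in name in this sketch:

```matlab
% Sketch: inspect the automatically attached code files.
c = parcluster();
j = createJob(c);
j.AutoAttachFiles = true;
createTask(j, @myFunction, 1, {});
submit(j);
listAutoAttachedFiles(j)    % lists the code files sent to the workers
```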
7 Program Independent Jobs • taskStartup.m automatically executes on a worker each time the worker begins evaluation of a task. • poolStartup.m automatically executes on a worker each time the worker is included in a newly started parallel pool. • taskFinish.m automatically executes on a worker each time the worker completes evaluation of a task.
Program Independent Jobs for a Generic Scheduler

In this section...
(Figure: data flow with a generic scheduler — on the client node, the MATLAB client's submit function sets environment variables, which pass through the scheduler to the worker node, where the decode function reads them for the MATLAB worker.)

Note Whereas the MJS keeps MATLAB workers running between tasks, a third-party scheduler runs MATLAB workers for only as long as it takes each worker to evaluate its one task.
Program Independent Jobs for a Generic Scheduler testlocation = 'Plant30' c.IndependentSubmitFcn = {@mysubmitfunc, time_limit, testlocation} In this example, the submit function requires five arguments: the three defaults, along with the numeric value of time_limit and the string value of testlocation.
7 Program Independent Jobs exist before the worker starts. For more information on the decode function, see “MATLAB Worker Decode Function” on page 7-27. Standard decode functions for independent and communicating jobs are provided with the product. If your submit functions make use of the definitions in these decode functions, you do not have to provide your own decode functions.
Program Independent Jobs for a Generic Scheduler Define Scheduler Command to Run MATLAB Workers The submit function must define the command necessary for your scheduler to start MATLAB workers. The actual command is specific to your scheduler and network configuration. The commands for some popular schedulers are listed in the following table. This table also indicates whether or not the scheduler automatically passes environment variables with its submission.
7 Program Independent Jobs This example function uses only the three default arguments. You can have additional arguments passed into your submit function, as discussed in “MATLAB Client Submit Function” on page 7-22. 2 Identify the values you want to send to your environment variables. For convenience, you define local variables for use in this function.
derived from the values of your object properties. This command is inside the for-loop so that your scheduler gets a command to start a MATLAB worker on the cluster for each task.

Note If you are not familiar with your network scheduler, ask your system administrator for help.

MATLAB Worker Decode Function

The sole purpose of the MATLAB worker’s decode function is to read certain job and task information into the MATLAB worker session.
7 Program Independent Jobs 'parallel.cluster.generic.independentDecodeFcn'. The remainder of this section is useful only if you use names and settings other than the standards used in the provided decode functions. Identify File Name and Location The client’s submit function and the worker’s decode function work together as a pair. For more information on the submit function, see “MATLAB Client Submit Function” on page 7-22.
Program Independent Jobs for a Generic Scheduler With those values from the environment variables, the decode function must set the appropriate property values of the object that is its argument. The property values that must be set are the same as those in the corresponding submit function, except that instead of the cell array TaskLocations, each worker has only the individual string TaskLocation, which is one element of the TaskLocations cell array.
c = parcluster('MyGenericProfile')

If your cluster uses a shared file system for workers to access job and task data, set the JobStorageLocation and HasSharedFilesystem properties to specify where the job data is stored and that the workers should access job data directly in a shared file system.

c.JobStorageLocation = '\\share\scratch\jobdata'
c.HasSharedFilesystem = true
Program Independent Jobs for a Generic Scheduler 2. Create a Job You create a job with the createJob function, which creates a job object in the client session. The job data is stored in the folder specified by the cluster object's JobStorageLocation property. j = createJob(c) This statement creates the job object j in the client session. Note Properties of a particular job or task should be set from only one computer at a time.
T = createTask(j, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});

In this case, T is a 5-by-1 matrix of task objects.

4. Submit a Job to the Job Queue

To run your job and have its tasks evaluated, you submit the job to the scheduler’s job queue.

submit(j)

The scheduler distributes the tasks of j to MATLAB workers for evaluation. The job runs asynchronously. If you need to wait for it to complete before you continue in your MATLAB client session, you can use the wait function.
ans =
    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186

ans =
    0.8462    0.6721    0.6813
    0.5252    0.8381    0.3795
    0.2026    0.0196    0.8318

Supplied Submit and Decode Functions

There are several submit and decode functions provided with the toolbox for your use with the generic scheduler interface. These files are in the folder

matlabroot/toolbox/distcomp/examples/integration

In this folder are subdirectories for each of several types of scheduler.
Filename            Description
deleteJobFcn.m      Script to delete a job from the scheduler
extractJobId.m      Script to get the job’s ID from the scheduler
getJobStateFcn.m    Script to get the job's state from the scheduler
getSubmitString.m   Script to get the submission string for the scheduler

These files are all programmed to use the standard decode functions provided with the product, so they do not have specialized decode functions.
for ii = 1:props.NumberOfTasks
    define scheduler command per task
end
submit job to scheduler
data_array = parse data returned from scheduler % possibly NumberOfTasks-by-2 matrix
setJobClusterData(cluster, job, data_array)

If your scheduler accepts only submissions of individual tasks, you might get return data pertaining only to each individual task. In this case, your submit function might have code structured like this:

for ii = 1:props.NumberOfTasks
command to scheduler canceling job job_id

In a similar way, you can define what to do for deleting a job, and what to do for canceling and deleting tasks.

Delete or Cancel a Running Job

After your functions are written, you set the appropriate properties of the cluster object with handles to your functions.
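For instance, assuming you have written functions with these (placeholder) names, the wiring looks like:

```matlab
% Sketch: attach the custom scheduler-interaction functions to the cluster.
c = parcluster('MyGenericProfile');
c.CancelJobFcn  = @cancelJobFcn;
c.DeleteJobFcn  = @deleteJobFcn;
c.CancelTaskFcn = @cancelTaskFcn;
c.DeleteTaskFcn = @deleteTaskFcn;
```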
Program Independent Jobs for a Generic Scheduler delete(j1) Get State Information About a Job or Task When using a third-party scheduler, it is possible that the scheduler itself can have more up-to-date information about your jobs than what is available to the toolbox from the job storage location. To retrieve that information from the scheduler, you can write a function to do that, and set the value of the GetJobStateFcn property as a handle to your function.
The following step occurs in your network:

1 For each task, the scheduler starts a MATLAB worker session on a cluster node.

The following steps occur in each MATLAB worker session:

1 The MATLAB worker automatically runs the decode function, finding it on the path.
2 The decode function reads the pertinent environment variables.
3 The decode function sets the properties of its argument object with values from the environment variables.
8 Program Communicating Jobs • “Program Communicating Jobs” on page 8-2 • “Program Communicating Jobs for a Supported Scheduler” on page 8-4 • “Program Communicating Jobs for a Generic Scheduler” on page 8-7 • “Further Notes on Communicating Jobs” on page 8-10
Program Communicating Jobs

Communicating jobs are those in which the workers can communicate with each other during the evaluation of their tasks. A communicating job consists of only a single task that runs simultaneously on several workers, usually with different data. More specifically, the task is duplicated on each worker, so each worker can perform the task on a different set of data, or on a particular segment of a large data set.
Program Communicating Jobs Some of the details of a communicating job and its tasks might depend on the type of scheduler you are using.
8 Program Communicating Jobs Program Communicating Jobs for a Supported Scheduler In this section... “Schedulers and Conditions” on page 8-4 “Code the Task Function” on page 8-4 “Code in the Client” on page 8-5 Schedulers and Conditions You can run a communicating job using any type of scheduler. This section illustrates how to program communicating jobs for supported schedulers (MJS, local scheduler, Microsoft Windows HPC Server (including CCS), Platform LSF, PBS Pro, or TORQUE).
Program Communicating Jobs for a Supported Scheduler The function for this example is shown below.
8 Program Communicating Jobs When your cluster object is defined, you create the job object with the createCommunicatingJob function. The job Type property must be set as 'SPMD' when you create the job. cjob = createCommunicatingJob(c,'Type','SPMD'); The function file colsum.m (created in “Code the Task Function” on page 8-4) is on the MATLAB client path, but it has to be made available to the workers.
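One way to complete the client code is to attach the file to the job so that it is copied to each worker. This sketch assumes colsum takes no inputs and returns one output, as in the task-function example; the worker count is illustrative.

```matlab
cjob = createCommunicatingJob(c,'Type','SPMD');
cjob.AttachedFiles = {'colsum.m'};   % copy the task function to the workers
cjob.NumWorkersRange = [4 4];        % illustrative: run on exactly 4 workers
createTask(cjob, @colsum, 1, {});    % the single task, duplicated per worker
submit(cjob);
wait(cjob);
results = fetchOutputs(cjob);        % one cell per worker
```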
Program Communicating Jobs for a Generic Scheduler Program Communicating Jobs for a Generic Scheduler In this section... “Introduction” on page 8-7 “Code in the Client” on page 8-7 Introduction This section discusses programming communicating jobs using the generic scheduler interface. This interface lets you execute jobs on your cluster with any scheduler you might have. The principles of using the generic scheduler interface for communicating jobs are the same as those for independent jobs.
8 Program Communicating Jobs 3 Use createCommunicatingJob to create a communicating job object for your cluster. 4 Create a task, run the job, and retrieve the results as usual. Supplied Submit and Decode Functions There are several submit and decode functions provided with the toolbox for your use with the generic scheduler interface. These files are in the folder matlabroot/toolbox/distcomp/examples/integration In this folder are subfolders for each of several types of scheduler.
Program Communicating Jobs for a Generic Scheduler Filename Description getSubmitString.m Script to get the submission string for the scheduler These files are all programmed to use the standard decode functions provided with the product, so they do not have specialized decode functions. For communicating jobs, the standard decode function provided with the product is parallel.cluster.generic.communicatingDecodeFcn. You can view the required variables in this file by typing edit parallel.cluster.generic.communicatingDecodeFcn
8 Program Communicating Jobs Further Notes on Communicating Jobs In this section... “Number of Tasks in a Communicating Job” on page 8-10 “Avoid Deadlock and Other Dependency Errors” on page 8-10 Number of Tasks in a Communicating Job Although you create only one task for a communicating job, the system copies this task for each worker that runs the job. For example, if a communicating job runs on four workers, the Tasks property of the job contains four task objects.
Further Notes on Communicating Jobs In another example, suppose you want to transfer data from every worker to the next worker on the right (defined as the next higher labindex). First you define for each worker what the workers on the left and right are. from_lab_left = mod(labindex - 2, numlabs) + 1; to_lab_right = mod(labindex, numlabs) + 1; Then try to pass data around the ring.
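A deadlock-free way to perform the transfer is labSendReceive, which pairs each worker's send and receive in a single call so that no worker blocks waiting for its neighbor; the payload below is illustrative.

```matlab
spmd
    from_lab_left = mod(labindex - 2, numlabs) + 1;
    to_lab_right  = mod(labindex, numlabs) + 1;
    outData = labindex;   % example payload: each worker sends its own index
    inData  = labSendReceive(to_lab_right, from_lab_left, outData);
    % inData now holds the value sent by the worker on the left
end
```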
9 GPU Computing • “GPU Capabilities and Performance” on page 9-2 • “Establish Arrays on a GPU” on page 9-3 • “Run Built-In Functions on a GPU” on page 9-8 • “Run Element-wise MATLAB Code on GPU” on page 9-13 • “Identify and Select a GPU Device” on page 9-18 • “Run CUDA or PTX Code on GPU” on page 9-20 • “Run MEX-Functions Containing CUDA Code” on page 9-31 • “Measure and Improve GPU Performance” on page 9-35
9 GPU Computing GPU Capabilities and Performance In this section... “Capabilities” on page 9-2 “Performance Benchmarking” on page 9-2 Capabilities Parallel Computing Toolbox enables you to program MATLAB to use your computer’s graphics processing unit (GPU) for matrix operations. In many cases, execution in the GPU is faster than in the CPU, so this feature might offer improved performance.
Establish Arrays on a GPU Establish Arrays on a GPU In this section... “Transfer Arrays Between Workspace and GPU” on page 9-3 “Create GPU Arrays Directly” on page 9-4 “Examine gpuArray Characteristics” on page 9-7 Transfer Arrays Between Workspace and GPU Send Arrays to the GPU A gpuArray in MATLAB represents an array that is stored on the GPU.
9 GPU Computing Transfer Array of a Specified Precision Create a matrix of double-precision random values in MATLAB, and then transfer the matrix as single-precision from MATLAB to the GPU: X = rand(1000); G = gpuArray(single(X)); Construct an Array for Storing on the GPU Construct a 100-by-100 matrix of uint32 ones and transfer it to the GPU.
Establish Arrays on a GPU For example, to see the help on the colon constructor, type

help gpuArray/colon

Example: Construct an Identity Matrix on the GPU To create a 1024-by-1024 identity matrix of type int32 on the GPU, type

II = eye(1024,'int32','gpuArray');
size(II)

        1024        1024

With one numerical argument, you create a 2-dimensional matrix.
9 GPU Computing parallel.gpu.RandStream These functions perform in the same way as rng and RandStream in MATLAB, but with certain limitations on the GPU. For more information on the use and limits of these functions, type help parallel.gpu.rng help parallel.gpu.RandStream The GPU uses the combined multiplicative recursive generator by default to create uniform random values, and uses inversion for creating normal values.
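For example, a sketch of seeding and drawing on the device; 'CombRecursive' is the combined multiplicative recursive generator named above, but check help parallel.gpu.rng for the exact generator names in your release.

```matlab
parallel.gpu.rng(0, 'CombRecursive');   % seed the GPU default generator
r = rand(1000, 1, 'gpuArray');          % uniform values created on the GPU
n = randn(1000, 1, 'gpuArray');         % normal values, via inversion
```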
Establish Arrays on a GPU For more information about generating random numbers on a GPU, and a comparison between GPU and CPU generation, see “Control Random Number Streams” on page 6-37. For an example that shows performance comparisons for different random generators, see Generating Random Numbers on a GPU.
9 GPU Computing Run Built-In Functions on a GPU In this section... “MATLAB Functions with gpuArray Arguments” on page 9-8 “Example: Functions with gpuArray Input and Output” on page 9-9 “Sparse Arrays on a GPU” on page 9-10 “Considerations for Complex Numbers” on page 9-11 MATLAB Functions with gpuArray Arguments Many MATLAB built-in functions support gpuArray input arguments.
Run Built-In Functions on a GPU atan2d atand atanh besselj bessely beta betainc betaincinv betaln bitand bitcmp bitget bitor bitset bitshift bitxor blkdiag bsxfun cart2pol cart2sph cast cat cdf2rdf ceil chol circshift cumprod cumsum del2 det diag diff disp display dot double eig eps eq erf erfc erfcinv erfcx erfinv exp expm1 eye factorial false fft fft2 fftn idivide ifft ifft2 ifftn ifftshift imag ind2sub Inf inpolygon int16 int2str int32 int64 int8 interp1 interp2 interp3 interpn inv ipermute isaUnderlying
9 GPU Computing

Ga = rand(1000,'single','gpuArray');
Gfft = fft(Ga);
Gb = (real(Gfft) + Ga) * 6;
G = gather(Gb);

The whos command is instructive for showing where each variable's data is stored.

whos
  Name       Size          Bytes  Class

  G       1000x1000      4000000  single
  Ga      1000x1000          108  gpuArray
  Gb      1000x1000          108  gpuArray
  Gfft    1000x1000          108  gpuArray

Notice that all the arrays are stored on the GPU (gpuArray), except for G, which is the result of the gather function.
Run Built-In Functions on a GPU

   (1,2)        1
   (2,5)        1

g = gpuArray(s);    % g is a sparse gpuArray
gt = transpose(g);  % gt is a sparse gpuArray
f = full(gt)        % f is a full gpuArray

f =
     0     0
     1     0
     0     0
     0     0
     0     1

Considerations for Complex Numbers If the output of a function running on the GPU could potentially be complex, you must explicitly specify its input arguments as complex. This applies to gpuArray inputs and to functions called in code run by arrayfun.
9 GPU Computing 9-12

Function       Input Range for Real Output
log(x)         x >= 0
log1p(x)       x >= -1
log10(x)       x >= 0
log2(x)        x >= 0
power(x,y)     x >= 0
reallog(x)     x >= 0
realsqrt(x)    x >= 0
sqrt(x)        x >= 0
Run Element-wise MATLAB Code on GPU Run Element-wise MATLAB Code on GPU In this section... “MATLAB Code vs. gpuArray Objects” on page 9-13 “Run Your MATLAB Functions on a GPU” on page 9-13 “Example: Run Your MATLAB Code” on page 9-14 “Supported MATLAB Code” on page 9-15 MATLAB Code vs. gpuArray Objects You have options for performing MATLAB calculations on the GPU: • You can transfer or create data on the GPU, and use the resulting gpuArray as input to enhanced built-in functions that support them.
9 GPU Computing Example: Run Your MATLAB Code In this example, a small function applies correction data to an array of measurement data. The function defined in the file myCal.m is: function c = myCal(rawdata, gain, offst) c = (rawdata .* gain) + offst; The function performs only element-wise operations when applying a gain factor and offset to each element of the rawdata array.
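To run myCal element-wise on the GPU, you can pass its handle to arrayfun with the data on the device; the array size and correction values here are illustrative.

```matlab
rawdata = rand(1000, 'gpuArray');   % measurement data, created on the GPU
gain  = 1.02;                       % example scalar gain
offst = -0.3;                       % example scalar offset
corrected = arrayfun(@myCal, rawdata, gain, offst);  % runs on the GPU
results = gather(corrected);        % copy back to the workspace if needed
```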
Run Element-wise MATLAB Code on GPU Supported MATLAB Code The function you pass into arrayfun or bsxfun can contain the following built-in MATLAB functions and operators: abs and acos acosh acot acoth acsc acsch asec asech asin asinh atan atan2 atanh beta betaln bitand bitcmp bitget bitor bitset bitshift bitxor ceil complex conj cos cosh cot coth csc csch double eps eq erf erfc erfcinv erfcx erfinv exp expm1 false fix floor gamma gammaln ge gt hypot imag Inf int8 int16 int32 int64 intmax intmin isfinite i
9 GPU Computing Generate Random Numbers on a GPU The function you pass to arrayfun or bsxfun for execution on a GPU can contain the random number generator functions rand, randi, and randn. However, the GPU does not support the complete functionality that MATLAB does. arrayfun and bsxfun support the following functions for random matrix generation on the GPU: rand rand() rand('single') rand('double') randn randn() randn('single') randn('double') randi randi() randi(IMAX, ...) randi([IMIN IMAX], ...
Run Element-wise MATLAB Code on GPU for gpuArray” on page 9-5. For more information about generating random numbers on a GPU, and a comparison between GPU and CPU generation, see “Control Random Number Streams” on page 6-37. For an example that shows performance comparisons for different random generators, see Generating Random Numbers on a GPU. Tips and Restrictions The following limitations apply to the code within the function that arrayfun or bsxfun is evaluating on a GPU.
9 GPU Computing Identify and Select a GPU Device If you have only one GPU in your computer, that GPU is the default.
Identify and Select a GPU Device

         AvailableMemory: 4.9190e+09
     MultiprocessorCount: 13
            ClockRateKHz: 614500
             ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
  KernelExecutionTimeout: 0
        CanMapHostMemory: 1
         DeviceSupported: 1
          DeviceSelected: 1

If this is the device you want to use, you can proceed. 3 To use another device, call gpuDevice with the index of the other device, and view its properties to verify that it is the one you want.
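For example, to look over all devices before settling on one. Note that calling gpuDevice with an index also selects that device; indices and properties vary by machine.

```matlab
for ii = 1:gpuDeviceCount
    g = gpuDevice(ii);   % selects and returns device ii
    fprintf('Device %d: %s (compute capability %s)\n', ...
            ii, g.Name, g.ComputeCapability);
end
gpuDevice(1);            % finally select the device you want to use
```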
9 GPU Computing Run CUDA or PTX Code on GPU In this section... “Overview” on page 9-20 “Create a CUDAKernel Object” on page 9-21 “Run a CUDAKernel” on page 9-26 “Complete Kernel Workflow” on page 9-28 Overview This topic explains how to create an executable kernel from CU or PTX (parallel thread execution) files, and run that kernel on a GPU from MATLAB. The kernel is represented in MATLAB by a CUDAKernel object, which can operate on MATLAB array or gpuArray variables.
Run CUDA or PTX Code on GPU The following sections provide details of these commands and workflow steps.
9 GPU Computing k = parallel.gpu.CUDAKernel('myfun.ptx','float *, const float *, float'); Another use for C prototype input is when your source code uses an unrecognized renaming of a supported data type. (See the supported types below.) Suppose your kernel comprises the following code. typedef float ArgType; __global__ void add3( ArgType * v1, const ArgType * v2 ) { int idx = threadIdx.
Run CUDA or PTX Code on GPU Integer Types int8_T, int16_T, int32_T, int64_T uint8_T, uint16_T, uint32_T, uint64_T The header file is shipped as matlabroot/extern/include/tmwtypes.h. You include the file in your program with the line: #include "tmwtypes.h" Argument Restrictions All inputs can be scalars or pointers, and can be labeled const. The C declaration of a kernel is always of the form: __global__ void aKernel(inputs ...
9 GPU Computing These rules have some implications. The most notable is that every output from a kernel must also be an input to the kernel: because a kernel cannot allocate memory on the GPU, the corresponding input is what defines the size (and initial contents) of the output. CUDAKernel Object Properties When you create a kernel object without a terminating semicolon, or when you type the object variable at the command line, MATLAB displays the kernel object properties. For example: k = parallel.
Run CUDA or PTX Code on GPU __global__ void simplestKernelEver( float * x, float val ) then the PTX code contains an entry that might be called _Z18simplestKernelEverPff. When you have multiple entry points, specify the entry name for the particular kernel when calling CUDAKernel to generate your kernel. Note The CUDAKernel function searches for your entry name in the PTX file, and matches on any substring occurrences. Therefore, you should not name any of your entries as substrings of any others.
9 GPU Computing • GridSize — A vector of three elements, the product of which determines the number of blocks. • ThreadBlockSize — A vector of three elements, the product of which determines the number of threads per block. (Note that the product cannot exceed the value of the property MaxThreadsPerBlock.) The default value for both of these properties is [1 1 1], but suppose you want to use 500 threads to run element-wise operations on vectors of 500 elements in parallel.
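A sketch of that 500-thread case, assuming a kernel compiled from hypothetical files myfun.ptx/myfun.cu that takes two single-precision vectors:

```matlab
k = parallel.gpu.CUDAKernel('myfun.ptx', 'myfun.cu');
k.ThreadBlockSize = [500 1 1];   % 500 threads in a single block
k.GridSize        = [1 1 1];     % one block (the default)
in1 = gpuArray(rand(500, 1, 'single'));
in2 = gpuArray(rand(500, 1, 'single'));
out = feval(k, in1, in2);        % each thread handles one element
```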
Run CUDA or PTX Code on GPU Use gpuArray Variables It might be more efficient to use gpuArray objects as input when running a kernel: k = parallel.gpu.CUDAKernel('conv.ptx','conv.cu'); i1 = gpuArray(rand(100,1,'single')); i2 = gpuArray(rand(100,1,'single')); result1 = feval(k,i1,i2); Because the output is a gpuArray, you can now perform other operations using this input or output data without further transfers between the MATLAB workspace and the GPU.
9 GPU Computing The input values x1 and x2 correspond to pInOut and c in the C function prototype. The output argument y corresponds to the value of pInOut in the C function prototype after the C kernel has executed.
Run CUDA or PTX Code on GPU 2 Compile the CU code at the shell command line to generate a PTX file called test.ptx.

nvcc -ptx test.cu

3 Create the kernel in MATLAB. Currently this PTX file has only one entry, so you do not need to specify it. If you were to put more kernels in, you would specify add1 as the entry name.

k = parallel.gpu.CUDAKernel('test.ptx','test.cu');

4 Run the kernel with two numeric inputs. By default, a kernel runs on one thread.
9 GPU Computing 4 Before you run the kernel, set the number of threads correctly for the vectors you want to add. N = 128; k.ThreadBlockSize = N; in1 = ones(N,1,'gpuArray'); in2 = ones(N,1,'gpuArray'); result = feval(k,in1,in2); Example with CU and PTX Files For an example that shows how to work with CUDA, and provides CU and PTX files for you to experiment with, see Illustrating Three Approaches to GPU Computing: The Mandelbrot Set.
Run MEX-Functions Containing CUDA Code Run MEX-Functions Containing CUDA Code In this section... “Write a MEX-File Containing CUDA Code” on page 9-31 “Set Up for MEX-File Compilation” on page 9-32 “Compile a GPU MEX-File” on page 9-33 “Run the Resulting MEX-Functions” on page 9-33 “Comparison to a CUDA Kernel” on page 9-33 “Access Complex Data” on page 9-34 Write a MEX-File Containing CUDA Code Note Creating MEX-functions for gpuArray data is supported only on 64-bit platforms (win64, glnxa64, maci64).
9 GPU Computing { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < N) B[i] = 2.0 * A[i]; } It contains the following lines to determine the array size and launch a grid of the proper size: N = (int)(mxGPUGetNumberOfElements(A)); blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N); Set Up for MEX-File Compilation • Your MEX source file that includes CUDA code must have a name with the extension .cu, not .c nor .cpp.
Run MEX-Functions Containing CUDA Code Compile a GPU MEX-File When you have set up the options file, use the mex command in MATLAB to compile a MEX-file containing the CUDA code. You can compile the example file using the command: mex -largeArrayDims mexGPUExample.cu The -largeArrayDims option is required to ensure that 64-bit values for array dimensions are passed to the MEX API.
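After compilation succeeds, you call the resulting MEX-function with gpuArray input like any other function. The size below is illustrative; the doubled output matches the TimesTwo kernel shown earlier.

```matlab
x = gpuArray(rand(1, 10));   % input data already on the GPU
y = mexGPUExample(x);        % y is a gpuArray computed on the device
result = gather(y);          % bring the result back to the workspace
```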
9 GPU Computing • MEX-files can analyze the size of the input and allocate memory of a different size, or launch grids of a different size, from C or C++ code. In comparison, MATLAB code that calls CUDAKernel objects must pre-allocate output memory and determine the grid size. Access Complex Data Complex data on a GPU device is stored in interleaved complex format. That is, for a complex gpuArray A, the real and imaginary parts of element i are stored in consecutive addresses.
Measure and Improve GPU Performance Measure and Improve GPU Performance In this section... “Basic Workflow for Improving Performance” on page 9-35 “Advanced Tools for Improving Performance” on page 9-36 “Best Practices for Improving Performance” on page 9-37 “Measure Performance on the GPU” on page 9-38 “Vectorize for Improved GPU Performance” on page 9-39 Basic Workflow for Improving Performance The purpose of GPU computing in MATLAB is to speed up your applications.
9 GPU Computing you might need to vectorize your code, replacing looped scalar operations with MATLAB matrix and vector operations. While vectorizing is generally a good practice on the CPU, it is usually critical for achieving high performance on the GPU. For more information, see “Vectorize for Improved GPU Performance” on page 9-39.
Measure and Improve GPU Performance on the GPU, rewrites the code to use arrayfun for element-wise operations, and finally shows how to integrate a custom CUDA kernel for the same operation. Alternately, you can write a CUDA kernel as part of a MEX-file and call it using the CUDA Runtime API inside the MEX-file. Either of these approaches might let you work with low-level features of the GPU, such as shared memory and texture memory, that are not directly available in MATLAB code.
9 GPU Computing if you make that the first dimension. Similarly, if you frequently operate along a particular dimension, it is usually best to have it as the first dimension. In some cases, if consecutive operations target different dimensions of an array, it might be beneficial to transpose or permute the array between these operations. GPUs achieve high performance by calculating many results in parallel.
Measure and Improve GPU Performance repeating the timed operation to get better resolution, executing the function before measurement to avoid initialization overhead, and subtracting out the overhead of the timing function. Also, gputimeit ensures that all operations on the GPU have completed before the final timing. For example, consider measuring the time taken to compute the lu factorization of a random matrix A of size N-by-N.
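A sketch of that measurement, with an illustrative size:

```matlab
N = 1000;                    % illustrative size
A = rand(N, N, 'gpuArray');  % random matrix on the GPU
fh = @() lu(A);              % function handle taking no arguments
t = gputimeit(fh)            % robust timing, synchronized with the GPU
```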
9 GPU Computing transform of a filter vector, transforms back to the time domain, and stores the result in an output matrix.

function y = fastConvolution(data,filter)
[m,n] = size(data);
% Zero-pad filter to the column length of data, and transform
filter_f = fft(filter,m);
% Create an array of zeros of the same size and class as data
y = zeros(m,n,'like',data);
% Transform each column of data
for ix = 1:n
    af = fft(data(:,ix));
    y(:,ix) = ifft(af .* filter_f);
end
end
Measure and Improve GPU Performance On the same machine, this code displays the output:

Execution time on CPU = 0.019335
Execution time on GPU = 0.027235
Maximum absolute error = 1.1374e-14

Unfortunately, the GPU is slower than the CPU for this problem. The reason is that the for-loop is executing the FFT, multiplication, and inverse FFT operations on individual columns of length 4096.
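The vectorized version whose timings follow can be sketched like this (a sketch, not the verbatim shipped example): fft and ifft operate on all columns in one call, and bsxfun expands the filter transform across the columns.

```matlab
function y = fastConvolutionVectorized(data, filter)
% Same computation as fastConvolution, without the per-column loop.
[m, ~] = size(data);
filter_f = fft(filter, m);        % zero-pad the filter and transform once
af = fft(data);                   % transform every column in a single call
y = ifft(bsxfun(@times, af, filter_f));  % scale and return to time domain
end
```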
9 GPU Computing

Execution time on GPU = 0.0020537
Maximum absolute error = 1.1374e-14

In conclusion, vectorizing the code helps both the CPU and GPU versions to run faster. However, vectorization helps the GPU version much more than the CPU version. The improved CPU version is nearly twice as fast as the original; the improved GPU version is 13 times faster than the original. The GPU code went from being 40% slower than the CPU in the original version, to about five times faster in the revised version.
10 Objects — Alphabetical List
10 Objects — Alphabetical List codistributed Access elements of arrays distributed among workers in parallel pool Constructor codistributed, codistributed.build You can also create a codistributed array explicitly from spmd code or a communicating job task with any of several overloaded MATLAB functions. eye(___,'codistributed') rand(___,'codistributed') false(___,'codistributed') randi(___,'codistributed') Inf(___,'codistributed') randn(___,'codistributed') NaN(___,'codistributed') codistributed.
codistributed Also among the methods are several for examining the characteristics of the array itself.
10 Objects — Alphabetical List codistributor1d 1-D distribution scheme for codistributed array Constructor codistributor1d Description A codistributor1d object defines the 1-D distribution scheme for a codistributed array. The 1-D codistributor distributes arrays along a single specified dimension, the distribution dimension, in a noncyclic, partitioned manner.
codistributor2dbc codistributor2dbc 2-D block-cyclic distribution scheme for codistributed array Constructor codistributor2dbc Description A codistributor2dbc object defines the 2-D block-cyclic distribution scheme for a codistributed array. The 2-D block-cyclic codistributor can only distribute twodimensional matrices. It distributes matrices along two subscripts over a rectangular computational grid of labs in a blocked, cyclic manner.
10 Objects — Alphabetical List Composite Access nondistributed variables on multiple workers from client Constructor Composite Description Variables that exist on the workers running an spmd statement are accessible on the client as a Composite object. A Composite resembles a cell array with one element for each worker. So for Composite C: C{1} represents value of C on worker1 C{2} represents value of C on worker2 etc.
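For example, with a parallel pool open, values assigned inside an spmd block are available afterward on the client as a Composite:

```matlab
spmd
    c = labindex * 10;   % each worker stores its own value
end
c{1}   % value of c from worker 1 (here, 10)
c{2}   % value of c from worker 2 (here, 20)
```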
CUDAKernel CUDAKernel Kernel executable on GPU Constructor parallel.gpu.CUDAKernel Description A CUDAKernel object represents a CUDA kernel, that can execute on a GPU. You create the kernel when you compile PTX or CU code, as described in “Run CUDA or PTX Code on GPU” on page 9-20. Methods Properties A CUDAKernel object has the following properties: Property Name Description ThreadBlockSize Size of block of threads on the kernel.
10 Objects — Alphabetical List Property Name Description GridSize Size of grid; a vector of three elements, each of which must not exceed the corresponding element in the vector of the MaxGridSize property of the GPUDevice object. SharedMemorySize The amount of dynamic shared memory (in bytes) that each thread block can use. Each thread block has an available shared memory region. The size of this region is limited in current cards to ~16 kB, and is shared with registers on the multiprocessors. As with all memory, this needs to be allocated before the kernel is launched.
CUDAKernel See Also gpuArray, GPUDevice 10-9
10 Objects — Alphabetical List distributed Access elements of distributed arrays from client Constructor distributed You can also create a distributed array explicitly from the client with any of several overloaded MATLAB functions. eye(___,'distributed') rand(___,'distributed') false(___,'distributed') randi(___,'distributed') Inf(___,'distributed') randn(___,'distributed') NaN(___,'distributed') distributed.cell ones(___,'distributed') distributed.spalloc true(___,'distributed') distributed.
distributed Methods The overloaded methods for distributed arrays are too numerous to list here. Most resemble and behave the same as built-in MATLAB functions. See “MATLAB Functions on Distributed and Codistributed Arrays”. Also among the methods are several for examining the characteristics of the array itself.
10 Objects — Alphabetical List gpuArray Array stored on GPU Constructor gpuArray converts an array in the MATLAB workspace into a gpuArray with elements stored on the GPU device. Also, the following create gpuArrays: eye(___,'gpuArray') rand(___,'gpuArray') false(___,'gpuArray') randi(___,'gpuArray') Inf(___,'gpuArray') randn(___,'gpuArray') NaN(___,'gpuArray') gpuArray.colon ones(___,'gpuArray') gpuArray.freqspace true(___,'gpuArray') gpuArray.linspace zeros(___,'gpuArray') gpuArray.
gpuArray Description A gpuArray object represents an array stored on the GPU. You can use the array for direct calculations, or in CUDA kernels that execute on the GPU. You can return the array to the MATLAB workspace with the gather function. Methods Other overloaded methods for a gpuArray object are too numerous to list here. Most resemble and behave the same as built-in MATLAB functions. See “Run Built-In Functions on a GPU ”.
10 Objects — Alphabetical List GPUDevice Graphics processing unit (GPU) Constructor gpuDevice Description A GPUDevice object represents a graphics processing unit (GPU) in your computer. You can use the GPU to execute CUDA kernels or MATLAB code. Methods The following methods of the class let you identify, select, reset, or wait for a GPU device: Method Name Description parallel.gpu.GPUDevice.
GPUDevice where methodname is the name of the method. For example, to get help on isAvailable, type help parallel.gpu.GPUDevice.isAvailable Properties A GPUDevice object has the following read-only properties: Property Name Description Name Name of the CUDA device. Index Index by which you can select the device. ComputeCapability Computational capability of the CUDA device. Must meet required specification. SupportsDouble Indicates if this device can support double precision operations.
10 Objects — Alphabetical List Property Name Description ClockRateKHz Peak clock rate of the GPU in kHz. ComputeMode The compute mode of the device, according to the following values: 'Default' — The device is not restricted and can be used by multiple applications simultaneously. MATLAB can share the device with other applications, including other MATLAB sessions or workers. 'Exclusive thread' or 'Exclusive process' — The device can be used by only one application at a time.
mxGPUArray mxGPUArray Type for MATLAB gpuArray Description mxGPUArray is an opaque C language type that allows a MEX function access to the elements in a MATLAB gpuArray. Using the mxGPU API, you can perform calculations on a MATLAB gpuArray, and return gpuArray results to MATLAB. All MEX functions receive inputs and pass outputs as mxArrays. A gpuArray in MATLAB is a special kind of mxArray that represents an array stored on the GPU.
10 Objects — Alphabetical List See Also gpuArray, mxArray 10-18
parallel.Cluster parallel.Cluster Access cluster properties and behaviors Constructors parcluster getCurrentCluster (in the workspace of the MATLAB worker) Container Hierarchy Parent None Children parallel.Job, parallel.Pool Description A parallel.Cluster object provides access to a cluster, which controls the job queue, and distributes tasks to workers for execution. Types The two categories of clusters are the MATLAB job scheduler (MJS) and common job scheduler (CJS).
10 Objects — Alphabetical List Cluster Type Description parallel.cluster.HPCServer Interact with CJS cluster running Windows Microsoft HPC Server parallel.cluster.LSF Interact with CJS cluster running Platform LSF parallel.cluster.PBSPro Interact with CJS cluster running Altair PBS Pro parallel.cluster.Torque Interact with CJS cluster running TORQUE parallel.cluster.
parallel.
10 Objects — Alphabetical List Property Description SecurityLevel Degree of security applied to cluster and its jobs. For descriptions of security levels, see “Set MJS Cluster Security”. State Current state of cluster Username User accessing cluster Local Local cluster objects have no editable properties beyond the properties common to all clusters.
parallel.
10 Objects — Alphabetical List Property Description DeleteTaskFcn Function to run when deleting task GetJobStateFcn Function to run when querying job state IndependentSubmitFcn Function to run when submitting independent job HasSharedFilesystem Specify whether client and cluster nodes share JobStorageLocation Help For further help on cluster objects, including links to help for specific cluster types and object properties, type: help parallel.Cluster See Also parallel.Job, parallel.
parallel.cluster.Hadoop parallel.cluster.Hadoop Hadoop cluster for mapreducer Constructors parallel.cluster.Hadoop Description A parallel.cluster.Hadoop object provides access to a cluster for configuring mapreducer to use a Hadoop cluster for the computation environment. Properties A parallel.cluster.Hadoop object has the following properties.
10 Objects — Alphabetical List Property Description RequiresMathWorksHostedLicensing Specify whether cluster uses MathWorks hosted licensing Help For further help, type: help parallel.cluster.Hadoop See Also parallel.Cluster, parallel.
parallel.Future parallel.Future Request function execution on parallel pool workers Constructors parfeval, parfevalOnAll Container Hierarchy Parent parallel.Pool.FevalQueue Types The following table describes the available types of future objects. Future Type Description parallel.FevalFuture Single parfeval future instance parallel.FevalOnAllFuture parfevalOnAll future instance Description A parallel.
10 Objects — Alphabetical List Method Description cancel Cancel queued or running future fetchNext Retrieve next available unread future outputs (FevalFuture only) fetchOutputs Retrieve all outputs of future isequal True if futures have same ID (FevalFuture only) wait Wait for futures to complete Properties Future objects have the following properties. Note that some exist only for parallel.FevalFuture objects, not parallel.FevalOnAllFuture objects.
parallel.Future help parallel.FevalFuture help parallel.FevalOnAllFuture See Also parallel.
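For example, a future created by parfeval is read back with fetchOutputs:

```matlab
p = gcp();                       % get (or start) the current parallel pool
f = parfeval(p, @magic, 1, 3);   % request magic(3) on some worker
value = fetchOutputs(f);         % blocks until the future is finished
isequal(value, magic(3))         % returns logical 1 (true)
```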
10 Objects — Alphabetical List parallel.Job Access job properties and behaviors Constructors createCommunicatingJob, createJob, findJob, recreate getCurrentJob (in the workspace of the MATLAB worker) Container Hierarchy Parent parallel.Cluster Children parallel.Task Description A parallel.Job object provides access to a job, which you create, define, and submit for execution. Types The following table describes the available types of job objects.
parallel.Job Methods All job type objects have the same methods, described in the following table. Properties Common to All Job Types The following properties are common to all job object types.
10 Objects — Alphabetical List Property Description UserData Information associated with job object Username Name of user who owns job MJS Jobs MJS independent job objects and MJS communicating job objects have the following properties in addition to the common properties: Property Description AuthorizedUsers Users authorized to access job FinishedFcn Callback function executed on client when this job finishes NumWorkersRange Minimum and maximum limits for number of workers to run job Queued
parallel.Job Help To get further help on a particular type of parallel.Job object, including a list of links to help for its properties, type help parallel.job.. For example: help parallel.job.MJSIndependentJob See Also parallel.Cluster, parallel.Task, parallel.
10 Objects — Alphabetical List parallel.Pool Access parallel pool Constructors parpool, gcp Description A parallel.Pool object provides access to a parallel pool running on a cluster. Methods A parallel pool object has the following methods. Properties A parallel pool object has the following properties.
parallel.Pool Help To get further help on parallel.Pool objects, including a list of links to help for specific properties, type: help parallel.Pool See Also parallel.Cluster, parallel.
parallel.Task
Access task properties and behaviors

Constructors
createTask, findTask
getCurrentTask (in the workspace of the MATLAB worker)

Container Hierarchy
Parent: parallel.Job
Children: none

Description
A parallel.Task object provides access to a task, which executes on a worker as part of a job.

Types
The following table describes the available types of task objects, determined by the type of cluster.

Task Type    Description
parallel.task.

Properties Common to All Task Types
The following properties are common to all task object types.

MJS Tasks
MJS task objects have the following properties in addition to the common properties:

Property         Description
FailureInfo      Information returned from failed task
FinishedFcn      Callback executed in client when task finishes
MaximumRetries   Maximum number of times to rerun failed task
NumFailures      Number of times task failed
RunningFcn       Callback executed in client when task starts running
Timeout          Time limit, in seconds, to complete task

CJS Tasks
CJS task object
parallel.Worker
Access worker that ran task

Constructors
getCurrentWorker (in the workspace of the MATLAB worker)
In the client workspace, a parallel.Worker object is available from the Worker property of a parallel.Task object.

Container Hierarchy
Parent: parallel.cluster.MJS
Children: none

Description
A parallel.Worker object provides access to the MATLAB worker session that executed a task as part of a job.

Types
Worker Type    Description
parallel.cluster.

Properties
MJS Worker
The following table describes the properties of an MJS worker.

Property           Description
AllHostAddresses   IP addresses of worker host
Name               Name of worker, set when worker session started
Parent             MJS cluster to which this worker belongs

CJS Worker
The following table describes the properties of a CJS worker.
RemoteClusterAccess
Connect to schedulers when client utilities are not available locally

Constructor
r = parallel.cluster.RemoteClusterAccess(username)
r = parallel.cluster.RemoteClusterAccess(username,P1,V1,...,Pn,Vn)

Description
parallel.cluster.RemoteClusterAccess allows you to establish a connection and run commands on a remote host. This class is intended for use with the generic scheduler interface when using remote submission of jobs or on nonshared file systems. r = parallel.

Methods
Method Name          Description
connect              connect(r,clusterHost) establishes a connection to the specified host using the user credential options supplied in the constructor. File mirroring is not supported. connect(r,clusterHost,remoteDataLocation) establishes a connection to the specified host using the user credential options supplied in the constructor. remoteDataLocation identifies a folder on the clusterHost that is used for file mirroring.
runCommand           [status,result] = runCommand(r,command) runs the supplied command on the remote host and returns the resulting status and standard output. The connect method must have already been called.
startMirrorForJob    startMirrorForJob(r,job) copies all the job files from the local DataLocation to the remote DataLocation, and starts mirroring files so that any changes to the files in the remote DataLocation are copied back to the local DataLocation.

Property Name         Description
JobStorageLocation    Location on the remote host for files that are being mirrored.
UseIdentityFile       Indicates if an identity file should be used when connecting to the remote host.
Username              User name for connecting to the remote host.

Examples
Mirror files from the remote data location. Assume the object job represents a job on your generic scheduler.

remoteConnection = parallel.cluster.
11 Functions — Alphabetical List
addAttachedFiles
Attach files or folders to parallel pool

Syntax
addAttachedFiles(poolobj,files)

Description
addAttachedFiles(poolobj,files) adds extra attached files to the specified parallel pool. These files are transferred to each worker and are treated exactly the same as if they had been set at the time the pool was opened, as specified by the parallel profile or the 'AttachedFiles' argument of the parpool function.

Files or folders to attach, specified as a string or cell array of strings. Each string can specify either an absolute or relative path to a file or folder. Example: {'myFun1.m','myFun2.
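A minimal usage sketch (the file name is hypothetical):

```matlab
poolobj = gcp();                          % get the current parallel pool
addAttachedFiles(poolobj,{'myFun1.m'});   % transfer myFun1.m to every worker in the pool
```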
arrayfun
Apply function to each element of array on GPU

Syntax
A = arrayfun(FUN, B)
A = arrayfun(FUN,B,C,...)
[A,B,...] = arrayfun(FUN,C,...)

Description
This method of a gpuArray object is very similar in behavior to the MATLAB function arrayfun, except that the actual evaluation of the function happens on the GPU, not on the CPU.

The inputs B, C, ... must all have the same size or be scalar. Any scalar inputs are scalar expanded before being input to the function FUN. One or more of the inputs B, C, ... must be a gpuArray; any of the others can reside in CPU memory. Each array that is held in CPU memory is converted to a gpuArray before calling the function on the GPU. If you plan to use an array in several different arrayfun calls, it is more efficient to convert that array to a gpuArray before making the series of calls to arrayfun. [A,B,...

R2 = rand(2,1,4,3,'gpuArray');
R3 = rand(1,5,4,3,'gpuArray');
R = arrayfun(@(x,y,z)(x+y.*z),R1,R2,R3);
size(R)
    2     5     4     3

R1 = rand(2,2,0,4,'gpuArray');
R2 = rand(2,1,1,4,'gpuArray');
R = arrayfun(@plus,R1,R2);
size(R)
    2     2     0     4

• Because the operations supported by arrayfun are strictly element-wise, and each element’s computation is performed independently of the others, certain restrictions are imposed:
• Input and output arrays cannot change shape or size.

  Name    Size       Bytes    Class
  o2      400x400    108      gpuArray
  s1      400x400    108      gpuArray
  s2      400x400    108      gpuArray
  s3      400x400    108      gpuArray

Use gather to retrieve the data from the GPU to the MATLAB workspace.
batch
Run MATLAB script or function on worker

Syntax
j = batch('aScript')
j = batch(myCluster,'aScript')
j = batch(fcn,N,{x1, ..., xn})
j = batch(myCluster,fcn,N,{x1,...,xn})
j = batch(...,'p1',v1,'p2',v2,...)

Arguments
j            The batch job object.
'aScript'    The script of MATLAB code to be evaluated by the worker.
myCluster    Cluster object representing cluster compute resources.
fcn          Function handle or string of function name to be evaluated by the worker.

Description
j = batch(fcn,N,{x1, ..., xn}) runs the function specified by a function handle or function name, fcn, on a worker in the cluster identified by the default cluster profile. The function returns j, a handle to the job object that runs the function. The function is evaluated with the given arguments, x1,...,xn, returning N output arguments. The function file for fcn is copied to the worker. (Do not include the .m file extension with the function name argument.) j = batch(myCluster,fcn,N,{x1,...

is the cwd of MATLAB when the batch command is executed. If the string for this argument is '.', there is no change in folder before batch execution.
• 'CaptureDiary': A logical flag to indicate that the toolbox should collect the diary from the function call. See the diary function for information about the collected data. The default is true.

Examples
Clean up a batch job’s data after you are finished with it:

delete(j)

Run a batch function on a cluster that generates a 10-by-10 random matrix:

c = parcluster();
j = batch(c,@rand,1,{10,10});

wait(j)     % Wait for the job to finish
diary(j)    % Display the diary

r = fetchOutputs(j);   % Get results into a cell array
r{1}                   % Display result

More About
Tips
To see your batch job’s status or to track its progress, use the Job Monitor, as described in “Job Monitor” on page 6-29.
bsxfun
Binary singleton expansion function for gpuArray

Syntax
C = bsxfun(FUN,A,B)

Description
bsxfun with gpuArray input is similar in behavior to the MATLAB function bsxfun, except that the actual evaluation of the function, FUN, happens on the GPU, not on the CPU. C = bsxfun(FUN,A,B) applies the element-by-element binary operation specified by the function handle FUN to arrays A and B, with singleton expansion enabled.

size(R)
    2     5     4     3

R1 = rand(2,2,0,4,'gpuArray');
R2 = rand(2,1,1,4,'gpuArray');
R = bsxfun(@plus,R1,R2);
size(R)
    2     2     0     4

Examples
Subtract the mean of each column from all elements in that column:

A = rand(8,'gpuArray');
M = bsxfun(@minus,A,mean(A));

See Also
arrayfun | gather | gpuArray | pagefun
cancel
Cancel job or task

Syntax
cancel(t)
cancel(j)

Arguments
t    Pending or running task to cancel.
j    Pending, running, or queued job to cancel.

Description
cancel(t) stops the task object, t, that is currently in the pending or running state. The task’s State property is set to finished, and no output arguments are returned.

Examples
c = parcluster();
job1 = createJob(c);
t = createTask(job1, @rand, 1, {3,3});
cancel(t)
t

Task with properties:

                  ID: 1
               State: finished
            Function: @rand
              Parent: Job 1
           StartTime:
    Running Duration: 0 days 0h 0m 0s
     ErrorIdentifier: parallel:task:UserCancellation
        ErrorMessage: The task was cancelled by user "mylogin" on machine "myhost.mydomain.com".
cancel (FevalFuture)
Cancel queued or running future

Syntax
cancel(F)

Description
cancel(F) stops the queued and running futures contained in F. No action is taken for finished futures. Each element of F that is not already in state 'finished' has its State property set to 'finished', and its Error property is set to contain an MException indicating that execution was cancelled.

Examples
Run a function several times until a satisfactory result is found.

See Also
fetchOutputs | isequal | parfeval | parfevalOnAll | fetchNext
changePassword
Prompt user to change MJS password

Syntax
changePassword(mjs)
changePassword(mjs,username)

Arguments
mjs         MJS cluster object on which password is changing
username    Character string identifying the user whose password is changing

Description
changePassword(mjs) prompts you to change your password as the current user on the MATLAB job scheduler (MJS) cluster represented by cluster object mjs. (Use the parcluster function to create a cluster object.)

Change your password for the MJS cluster on which the parallel pool is running.

p = gcp;
mjs = p.
classUnderlying
Class of elements within gpuArray or distributed array

Syntax
C = classUnderlying(D)

Description
C = classUnderlying(D) returns the name of the class of the elements contained within the gpuArray or distributed array D. Similar to the MATLAB class function, this returns a string indicating the class of the data.

Examples
Examine the class of the elements of a gpuArray.

c1 = classUnderlying(D1)

c8 =
uint8

c1 =
single

See Also
distributed | codistributed | gpuArray
clear
Remove objects from MATLAB workspace

Syntax
clear obj

Arguments
obj    An object or an array of objects.

Description
clear obj removes obj from the MATLAB workspace.

Examples
This example creates two job objects on the MATLAB job scheduler jm. The variables for these job objects in the MATLAB workspace are job1 and job2. job1 is copied to a new variable, job1copy; then job1 and job2 are cleared from the MATLAB workspace.

1

isequal(job1copy, j2)

ans =
0

More About
Tips
If obj references an object in the cluster, it is cleared from the workspace, but it remains in the cluster. You can restore obj to the workspace with the parcluster, findJob, or findTask function; or with the Jobs or Tasks property.
codistributed
Create codistributed array from replicated local data

Syntax
C = codistributed(X)
C = codistributed(X,codist)
C = codistributed(X,codist,lab)
C = codistributed(C1,codist)

Description
C = codistributed(X) distributes a replicated array X using the default codistributor, creating a codistributed array C as a result. X must be a replicated array, that is, it must have the same value on all workers. size(C) is the same as size(X).

Examples
Create a 1000-by-1000 codistributed array C1 using the default distribution scheme.

spmd
    N = 1000;
    X = magic(N);            % Replicated on every worker
    C1 = codistributed(X);   % Partitioned among the workers
end

Create a 1000-by-1000 codistributed array C2, distributed by rows (over its first dimension).
codistributed.build
Create codistributed array from distributed data

Syntax
D = codistributed.build(L, codist)
D = codistributed.build(L, codist, 'noCommunication')

Description
D = codistributed.build(L, codist) forms a codistributed array with getLocalPart(D) = L. The codistributed array D is created as if you had combined all copies of the local array L. The distribution scheme is specified by codist.

% Distribute the matrix over the second dimension (columns),
% and let the codistributor derive the partition from the
% global size.
codistr = codistributor1d(2, ...
    codistributor1d.unsetPartition, globalSize)

% On 4 workers, codistr.Partition equals [251, 250, 250, 250].
% Allocate storage for the local part.
localSize = [N, codistr.Partition(labindex)];
L = zeros(localSize);

% Use globalIndices to map the indices of the columns
% of the local part into the global column indices.
codistributed.cell
Create codistributed cell array

Syntax
C = codistributed.cell(n)
C = codistributed.cell(m, n, p, ...)
C = codistributed.cell([m, n, p, ...])
C = cell(n, codist)
C = cell(m, n, p, ..., codist)
C = cell([m, n, p, ...], codist)

Description
C = codistributed.cell(n) creates an n-by-n codistributed array of underlying class cell, distributing along columns. C = codistributed.cell(m, n, p, ...) or C = codistributed.cell([m, n, p, ...

C = cell(8, codistributor1d());
end

C = cell(m, n, p, ..., codist) and C = cell([m, n, p, ...], codist) are the same as C = codistributed.cell(m, n, p, ...) and C = codistributed.cell([m, n, p, ...]), respectively. You can also use the optional 'noCommunication' argument with this syntax.

Examples
With four workers,

spmd(4)
    C = codistributed.cell(1000);
end

creates a 1000-by-1000 distributed cell array C, distributed by its second dimension (columns).
codistributed.colon
Distributed colon operation

Syntax
codistributed.colon(a,d,b)
codistributed.colon(a,b)
codistributed.colon( ___ ,codist)
codistributed.colon( ___ ,'noCommunication')
codistributed.colon( ___ ,codist,'noCommunication')

Description
codistributed.colon(a,d,b) partitions the vector a:d:b into numlabs contiguous subvectors of equal, or nearly equal, length, and creates a codistributed array whose local portion on each worker is the labindex-th subvector.

spmd(4); C = codistributed.colon(1,10), end

Lab 1:
  This worker stores C(1:3).
      LocalPart: [1 2 3]
  Codistributor: [1x1 codistributor1d]
Lab 2:
  This worker stores C(4:6).
      LocalPart: [4 5 6]
  Codistributor: [1x1 codistributor1d]
Lab 3:
  This worker stores C(7:8).
      LocalPart: [7 8]
  Codistributor: [1x1 codistributor1d]
Lab 4:
  This worker stores C(9:10).
codistributed.spalloc
Allocate space for sparse codistributed matrix

Syntax
SD = codistributed.spalloc(M, N, nzmax)
SD = spalloc(M, N, nzmax, codist)

Description
SD = codistributed.spalloc(M, N, nzmax) creates an M-by-N all-zero sparse codistributed matrix with room to hold nzmax nonzeros. Optional arguments to codistributed.

SD = codistributed.spalloc(N, N, 2*N);
for ii=1:N-1
    SD(ii,ii:ii+1) = [ii ii];
end
end

See Also
spalloc | sparse | distributed.
codistributed.speye
Create codistributed sparse identity matrix

Syntax
CS = codistributed.speye(n)
CS = codistributed.speye(m, n)
CS = codistributed.speye([m, n])
CS = speye(n, codist)
CS = speye(m, n, codist)
CS = speye([m, n], codist)

Description
CS = codistributed.speye(n) creates an n-by-n sparse codistributed array of underlying class double. CS = codistributed.speye(m, n) or CS = codistributed.

CS = speye(m, n, codist) and CS = speye([m, n], codist) are the same as CS = codistributed.speye(m, n) and CS = codistributed.speye([m, n]), respectively. You can also use the optional arguments with this syntax.
codistributed.sprand
Create codistributed sparse array of uniformly distributed pseudo-random values

Syntax
CS = codistributed.sprand(m, n, density)
CS = sprand(n, codist)

Description
CS = codistributed.sprand(m, n, density) creates an m-by-n sparse codistributed array with approximately density*m*n uniformly distributed nonzero double entries. Optional arguments to codistributed.

Examples
spmd(4)
    CS = codistributed.sprand(1000, 1000, .001);
end

creates a 1000-by-1000 sparse codistributed double array CS with approximately 1000 nonzeros. CS is distributed by its second dimension (columns), and each worker contains a 1000-by-250 local piece of CS.

spmd(4)
    codist = codistributor1d(2, 1:numlabs);
    CS = sprand(10, 10, .1, codist);
end

creates a 10-by-10 codistributed double array CS with approximately 10 nonzeros.
codistributed.sprandn
Create codistributed sparse array of normally distributed pseudo-random values

Syntax
CS = codistributed.sprandn(m, n, density)
CS = sprandn(n, codist)

Description
CS = codistributed.sprandn(m, n, density) creates an m-by-n sparse codistributed array with approximately density*m*n normally distributed nonzero double entries. Optional arguments to codistributed.

Examples
spmd(4)
    CS = codistributed.sprandn(1000, 1000, .001);
end

creates a 1000-by-1000 sparse codistributed double array CS with approximately 1000 nonzeros. CS is distributed by its second dimension (columns), and each worker contains a 1000-by-250 local piece of CS.

spmd(4)
    codist = codistributor1d(2, 1:numlabs);
    CS = sprandn(10, 10, .1, codist);
end

creates a 10-by-10 codistributed double array CS with approximately 10 nonzeros.
codistributor
Create codistributor object for codistributed arrays

Syntax
codist = codistributor()
codist = codistributor('1d')
codist = codistributor('1d', dim)
codist = codistributor('1d', dim, part)
codist = codistributor('2dbc')
codist = codistributor('2dbc', lbgrid)
codist = codistributor('2dbc', lbgrid, blksize)

Description
There are two schemes for distributing arrays.

codist = codistributor('2dbc') forms a 2-D block-cyclic codistributor object. For more information about '2dbc' distribution, see “2-Dimensional Distribution” on page 5-16. codist = codistributor('2dbc', lbgrid) forms a 2-D block-cyclic codistributor object with the lab grid defined by lbgrid and with default block size. codist = codistributor('2dbc', lbgrid, blksize) forms a 2-D block-cyclic codistributor object with the lab grid defined by lbgrid and with a block size defined by blksize.

See Also
codistributed | codistributor1d | codistributor2dbc | getCodistributor | getLocalPart | redistribute
codistributor1d
Create 1-D codistributor object for codistributed arrays

Syntax
codist = codistributor1d()
codist = codistributor1d(dim)
codist = codistributor1d(dim,part)
codist = codistributor1d(dim,part,gsize)

Description
The 1-D codistributor distributes arrays along a single, specified distribution dimension, in a noncyclic, partitioned manner. codist = codistributor1d() forms a codistributor1d object using default dimension and partition.

To use a default dimension, specify codistributor1d.unsetDimension for that argument; the distribution dimension is derived from gsize and is set to the last non-singleton dimension. Similarly, to use a default partition, specify codistributor1d.unsetPartition for that argument; the partition is then derived from the default for that global size and distribution dimension.
codistributor1d.defaultPartition
Default partition for codistributed array

Syntax
P = codistributor1d.defaultPartition(n)

Description
P = codistributor1d.defaultPartition(n) is a vector with sum(P) = n and length(P) = numlabs. The first rem(n,numlabs) elements of P are equal to ceil(n/numlabs) and the remaining elements are equal to floor(n/numlabs). This function is the basis for the default distribution of codistributed arrays.
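As a short worked example of the rule above: with numlabs = 4 and n = 10, rem(10,4) = 2 workers receive ceil(10/4) = 3 elements each, and the remaining 2 workers receive floor(10/4) = 2 elements each:

```matlab
spmd(4)
    P = codistributor1d.defaultPartition(10);   % P is [3 3 2 2] on every worker
end
```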
codistributor2dbc
Create 2-D block-cyclic codistributor object for codistributed arrays

Syntax
codist = codistributor2dbc()
codist = codistributor2dbc(lbgrid)
codist = codistributor2dbc(lbgrid,blksize)
codist = codistributor2dbc(lbgrid,blksize,orient)
codist = codistributor2dbc(lbgrid,blksize,orient,gsize)

Description
The 2-D block-cyclic codistributor can be used only for two-dimensional arrays.

codist = codistributor2dbc(lbgrid,blksize,orient,gsize) forms a codistributor object that distributes arrays with the global size gsize. The resulting codistributor object is complete and can therefore be used to build a codistributed array from its local parts with codistributed.build. To use the default values for lab grid, block size, and orientation, specify them using codistributor2dbc.defaultLabGrid, codistributor2dbc.defaultBlockSize, and codistributor2dbc.
codistributor2dbc.defaultLabGrid
Default computational grid for 2-D block-cyclic distributed arrays

Syntax
grid = codistributor2dbc.defaultLabGrid()

Description
grid = codistributor2dbc.defaultLabGrid() returns a vector, grid = [nrow ncol], defining a computational grid of nrow-by-ncol workers in the open parallel pool, such that numlabs = nrow x ncol. The grid defined by codistributor2dbc.defaultLabGrid is as close to a square as possible.
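A brief sketch: since the default grid is as close to square as possible, a four-worker pool yields a 2-by-2 grid (nrow x ncol = numlabs):

```matlab
spmd(4)
    grid = codistributor2dbc.defaultLabGrid();   % [2 2] for a 4-worker pool
end
```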
Composite
Create Composite object

Syntax
C = Composite()
C = Composite(nlabs)

Description
C = Composite() creates a Composite object on the client using workers from the parallel pool. The actual number of workers referenced by this Composite object depends on the size of the pool and any existing Composite objects. Generally, you should construct Composite objects outside any spmd statement.

Examples
The following examples all use a local parallel pool of four workers, opened with the statement:

p = parpool('local',4);

This example shows how to create a Composite object with no defined elements, then assign values using a for-loop in the client.

c = Composite();   % One element per worker in the pool
for w = 1:length(c)
    c{w} = 0;      % Value stored on each worker
end

This example shows how to assign Composite elements in an spmd block.

d = distributed([3 1 4 2]);   % One integer per worker
spmd
    c = getLocalPart(d);      % Unique value on each worker
end
c{:}

3
1
4
2

See Also
parpool | spmd
createCommunicatingJob
Create communicating job on cluster

Syntax
job = createCommunicatingJob(cluster)
job = createCommunicatingJob(...,'p1',v1,'p2',v2,...)
job = createCommunicatingJob(...,'Type','pool',...)
job = createCommunicatingJob(...,'Type','spmd',...)
job = createCommunicatingJob(...,'Profile','profileName',...)

Description
job = createCommunicatingJob(cluster) creates a communicating job object for the identified cluster. job = createCommunicatingJob(...

simultaneously on all workers, and lab* functions can be used for communication between workers. job = createCommunicatingJob(...,'Profile','profileName',...) creates a communicating job object with the property values specified in the profile 'profileName'. If no profile is specified and the cluster object has a value specified in its 'Profile' property, the cluster’s profile is automatically applied.

Delete the job from the cluster.
createJob
Create independent job on cluster

Syntax
obj = createJob(cluster)
obj = createJob(...,'p1',v1,'p2',v2,...)
job = createJob(...,'Profile','profileName',...)

Arguments
obj       The job object.
cluster   The cluster object created by parcluster.
p1, p2    Object properties configured at object creation.
v1, v2    Initial values for corresponding object properties.

Description
obj = createJob(cluster) creates an independent job object for the identified cluster.

is not specified and the cluster has a value specified in its 'Profile' property, the cluster’s profile is automatically applied. For details about defining and applying profiles, see “Clusters and Cluster Profiles” on page 6-14.

Examples
Create and Run a Basic Job
Construct an independent job object using the default profile.

c = parcluster
j = createJob(c);

Add tasks to the job.

for i = 1:10
    createTask(j,@rand,1,{10});
end

Run the job.

{'myapp/folderA','myapp/folderB','myapp/file1.
createTask
Create new task in job

Syntax
t = createTask(j, F, N, {inputargs})
t = createTask(j, F, N, {C1,...,Cm})
t = createTask(..., 'p1',v1,'p2',v2,...)
t = createTask(...,'Profile', 'ProfileName',...)

Arguments
t    Task object or vector of task objects.
j    The job that the task object is created in.
F    A handle to the function that is called when the task is evaluated, or an array of function handles.

Description
by a function handle or function name F, with the given input arguments {inputargs}, returning N output arguments. t = createTask(j, F, N, {C1,...,Cm}) uses a cell array of m cell arrays to create m task objects in job j, and returns a vector, t, of references to the new task objects. Each task evaluates the function specified by a function handle or function name F.

Run the job.

submit(j);

Wait for the job to finish running, and get the output from the task evaluation.

wait(j);
taskoutput = fetchOutputs(j);

Show the 10-by-10 random matrix.

disp(taskoutput{1});

Create a Job with Three Tasks
This example creates a job with three tasks, each of which generates a 10-by-10 random matrix.
delete
Remove job or task object from cluster and memory

Syntax
delete(obj)

Description
delete(obj) removes the job or task object, obj, from the local MATLAB session, and removes it from the cluster’s JobStorageLocation. When the object is deleted, references to it become invalid. Invalid objects should be removed from the workspace with the clear command. If multiple references to an object exist in the workspace, deleting one reference to that object invalidates the remaining references to it.

Delete all jobs on the cluster identified by the profile myProfile:

myCluster = parcluster('myProfile');
delete(myCluster.
delete (Pool)
Shut down parallel pool

Syntax
delete(poolobj)

Description
delete(poolobj) shuts down the parallel pool associated with the object poolobj, and destroys the communicating job that comprises the pool. Subsequent parallel language features will automatically start a new parallel pool, unless your parallel preferences disable this behavior. References to the deleted pool object become invalid. Invalid objects should be removed from the workspace with the clear command.
demote
Demote job in cluster queue

Syntax
demote(c,job)

Arguments
c      Cluster object that contains the job.
job    Job object demoted in the job queue.

Description
demote(c,job) demotes the job object job that is queued in the cluster c. If job is not the last job in the queue, demote exchanges the position of job and the job that follows it in the queue.

Examine the new queue sequence:

[pjobs,qjobs,rjobs,fjobs] = findJob(c);
get(qjobs,'Name')

'Job A'
'Job C'
'Job B'

More About
Tips
After a call to demote or promote, there is no change in the order of job objects contained in the Jobs property of the cluster object. To see the scheduled order of execution for jobs in the queue, use the findJob function in the form [pending queued running finished] = findJob(c).
diary
Display or save Command Window text of batch job

Syntax
diary(job)
diary(job, 'filename')

Arguments
job           Job from which to view Command Window output text.
'filename'    File to append with Command Window output text from batch job

Description
diary(job) displays the Command Window output from the batch job in the MATLAB Command Window. The Command Window output will be captured only if the batch command included the 'CaptureDiary' argument with a value of true.
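A brief usage sketch (the script name is hypothetical):

```matlab
j = batch('myScript','CaptureDiary',true);   % run a script as a batch job, capturing its diary
wait(j);                                     % wait for the job to finish
diary(j)                                     % display its Command Window output on the client
```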
distributed
Create distributed array from data in client workspace

Syntax
D = distributed(X)

Description
D = distributed(X) creates a distributed array from X. X is an array stored on the MATLAB client workspace, and D is a distributed array stored in parts on the workers of the open parallel pool. Constructing a distributed array from local data this way is appropriate only if the MATLAB client can store the entirety of X in its memory.

D1 = distributed(magic(Nsmall));

Create a large distributed array directly, using a build method:

Nlarge = 1000;
D2 = rand(Nlarge,'distributed');

Retrieve elements of a distributed array, and note where the arrays are located by their Class:

D3 = gather(D2);
whos

  Name      Size         Bytes      Class
  D1        50x50        733        distributed
  D2        1000x1000    733        distributed
  D3        1000x1000    8000000    double
  Nlarge    1x1          8          double
  Nsmall    1x1          8          double

See Also
codistributed | gather | parpool
distributed.cell
Create distributed cell array

Syntax
D = distributed.cell(n)
D = distributed.cell(m, n, p, ...)
D = distributed.cell([m, n, p, ...])

Description
D = distributed.cell(n) creates an n-by-n distributed array of underlying class cell. D = distributed.cell(m, n, p, ...) or D = distributed.cell([m, n, p, ...]) create an m-by-n-by-p-by-... distributed array of underlying class cell.

Examples
Create a distributed 1000-by-1000 cell array:

D = distributed.
distributed.spalloc
Allocate space for sparse distributed matrix

Syntax
SD = distributed.spalloc(M, N, nzmax)

Description
SD = distributed.spalloc(M, N, nzmax) creates an M-by-N all-zero sparse distributed matrix with room to hold nzmax nonzeros.

Examples
Allocate space for a 1000-by-1000 sparse distributed matrix with room for up to 2000 nonzero elements, then define several elements:

N = 1000;
SD = distributed.
distributed.speye
Create distributed sparse identity matrix

Syntax
DS = distributed.speye(n)
DS = distributed.speye(m, n)
DS = distributed.speye([m, n])

Description
DS = distributed.speye(n) creates an n-by-n sparse distributed array of underlying class double. DS = distributed.speye(m, n) or DS = distributed.speye([m, n]) creates an m-by-n sparse distributed array of underlying class double.

Examples
Create a distributed 1000-by-1000 sparse identity matrix:

N = 1000;
DS = distributed.
distributed.sprand
Create distributed sparse array of uniformly distributed pseudo-random values

Syntax
DS = distributed.sprand(m, n, density)

Description
DS = distributed.sprand(m, n, density) creates an m-by-n sparse distributed array with approximately density*m*n uniformly distributed nonzero double entries.

Examples
Create a 1000-by-1000 sparse distributed double array DS with approximately 1000 nonzeros.

DS = distributed.sprand(1000, 1000, .001);
distributed.sprandn
Create distributed sparse array of normally distributed pseudo-random values

Syntax
DS = distributed.sprandn(m, n, density)

Description
DS = distributed.sprandn(m, n, density) creates an m-by-n sparse distributed array with approximately density*m*n normally distributed nonzero double entries.

Examples
Create a 1000-by-1000 sparse distributed double array DS with approximately 1000 nonzeros.

DS = distributed.sprandn(1000, 1000, .001);
11 Functions — Alphabetical List dload Load distributed arrays and Composite objects from disk Syntax dload dload filename dload filename X dload filename X Y Z ... dload -scatter ... [X,Y,Z,...] = dload('filename','X','Y','Z',...) Description dload without any arguments retrieves all variables from the binary file named matlab.mat. If matlab.mat is not available, the command generates an error. dload filename retrieves all variables from a file given a full pathname or a relative partial pathname.
When loading Composite objects, the data is sent to the available parallel pool workers. If the Composite is too large to fit on the current parallel pool, the data is not loaded. If the Composite is smaller than the current parallel pool, a warning is issued.

Examples
Load variables X, Y, and Z from the file fname.mat:

dload fname X Y Z

Use the function form of dload to load distributed arrays P and Q from file fname.mat:

[P,Q] = dload('fname.
11 Functions — Alphabetical List dsave Save workspace distributed arrays and Composite objects to disk Syntax dsave dsave filename dsave filename X dsave filename X Y Z Description dsave without any arguments creates the binary file named matlab.mat and writes to the file all workspace variables, including distributed arrays and Composite objects. You can retrieve the variable data using dload. dsave filename saves all workspace variables to the binary file named filename.mat.
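A minimal sketch of saving selected variables with dsave (file and variable names are illustrative; assumes an open parallel pool for the distributed arrays):

```matlab
% Create two distributed arrays, then save only X and Y to fname.mat
X = distributed.rand(1000);
Y = distributed.ones(1000, 10);
dsave fname X Y
```

The saved file can later be read back with dload.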
See Also
save | distributed | Composite | dload | parpool
11 Functions — Alphabetical List exist Check whether Composite is defined on workers Syntax h = exist(C,labidx) h = exist(C) Description h = exist(C,labidx) returns true if the entry in Composite C has a defined value on the worker with labindex labidx, false otherwise. In the general case where labidx is an array, the output h is an array of the same size as labidx, and h(i) indicates whether the Composite entry labidx(i) has a defined value. h = exist(C) is equivalent to h = exist(C, 1:length(C)).
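As a sketch of how this might be used (assumes an open parallel pool; the variable name c is illustrative):

```matlab
% Define a Composite entry on worker 1 only, then test which entries exist
spmd
    if labindex == 1
        c = magic(4);   % only worker 1 assigns a value
    end
end
h = exist(c)   % logical vector: true for worker 1, false elsewhere
```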
existsOnGPU existsOnGPU Determine if gpuArray or CUDAKernel is available on GPU Syntax TF = existsOnGPU(DATA) Description TF = existsOnGPU(DATA) returns a logical value indicating whether the gpuArray or CUDAKernel object represented by DATA is still present on the GPU and available from your MATLAB session. The result is false if DATA is no longer valid and cannot be used.
     4    14    15     1

reset(g);
M_exists = existsOnGPU(M)
     0

M   % Try to display gpuArray
Data no longer exists on the GPU.

clear M

See Also
gpuDevice | gpuArray | parallel.gpu.
eye
Identity matrix

Syntax
E = eye(sz,arraytype)
E = eye(sz,datatype,arraytype)
E = eye(sz,'like',P)
E = eye(sz,datatype,'like',P)
C = eye(sz,codist)
C = eye(sz,datatype,codist)
C = eye(sz, ___ ,codist,'noCommunication')
C = eye(sz, ___ ,codist,'like',P)

Description
E = eye(sz,arraytype) creates an arraytype identity matrix with underlying class of double, with ones on the main diagonal and zeros elsewhere.
Argument    Values                        Descriptions
            'uint8', 'int16', 'uint16',
            'int32', 'uint32', 'int64',
            or 'uint64'

E = eye(sz,'like',P) creates an identity matrix of the same type and underlying class (data type) as array P.

E = eye(sz,datatype,'like',P) creates an identity matrix of the specified underlying class (datatype), and the same type as array P.
D = eye(1000,'distributed');

Create Codistributed Identity Matrix
Create a 1000-by-1000 codistributed double identity matrix, distributed by its second dimension (columns).

spmd(4)
    C = eye(1000,'codistributed');
end

With four workers, each worker contains a 1000-by-250 local piece of C. Create a 1000-by-1000 codistributed uint16 identity matrix, distributed by its columns.
false
Array of logical 0 (false)

Syntax
F = false(sz,arraytype)
F = false(sz,'like',P)
C = false(sz,codist)
C = false(sz, ___ ,codist,'noCommunication')
C = false(sz, ___ ,codist,'like',P)

Description
F = false(sz,arraytype) creates a matrix with false values in all elements. The size and type of array are specified by the argument options according to the following table.

Argument    Values    Descriptions
sz          n         Specifies size as an n-by-n matrix.
arraytype
see the reference pages for codistributor1d and codistributor2dbc. To use the default distribution scheme, you can specify a codistributor constructor without arguments. For example:

spmd
    C = false(8,codistributor1d());
end

C = false(sz, ___ ,codist,'noCommunication') specifies that no interworker communication is to be performed when constructing a codistributed array, skipping some error checking steps.
Each worker contains a 100-by-labindex local piece of C.
fetchNext fetchNext Retrieve next available unread FevalFuture outputs Syntax [idx,B1,B2,...,Bn] = fetchNext(F) [idx,B1,B2,...,Bn] = fetchNext(F,TIMEOUT) Description [idx,B1,B2,...,Bn] = fetchNext(F) waits for an unread FevalFuture in the array of futures F to finish, and then returns the index of that future in array F as idx, along with the future’s results in B1,B2,...,Bn. Before this call, the 'Read' property of the particular future is false; afterward it is true. [idx,B1,B2,...
end
% Build a waitbar to track progress
h = waitbar(0,'Waiting for FevalFutures to complete...
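The waitbar example above is cut off in this copy; a minimal end-to-end use of fetchNext, collecting results in completion order, might look like this (assumes an open parallel pool):

```matlab
% Queue several futures, then retrieve each result as it finishes
N = 4;
for idx = N:-1:1
    F(idx) = parfeval(@magic, 1, idx);
end
results = cell(1, N);
for k = 1:N
    [idx, value] = fetchNext(F);   % next finished, unread future in F
    results{idx} = value;
end
```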
fetchOutputs (job) fetchOutputs (job) Retrieve output arguments from all tasks in job Syntax data = fetchOutputs(job) Description data = fetchOutputs(job) retrieves the output arguments contained in the tasks of a finished job. If the job has M tasks, each row of the M-by-N cell array data contains the output arguments for the corresponding task in the job. Each row has N elements, where N is the greatest number of output arguments from any one task in the job.
Wait for the job to finish and retrieve the random matrix:

wait(j)
data = fetchOutputs(j);
data{1}
fetchOutputs (FevalFuture) fetchOutputs (FevalFuture) Retrieve all output arguments from Future Syntax [B1,B2,...,Bn] = fetchOutputs(F) [B1,B2,...,Bn] = fetchOutputs(F,'UniformOutput',false) Description [B1,B2,...,Bn] = fetchOutputs(F) fetches all outputs of future object F after first waiting for each element of F to reach the state 'finished'. An error results if any element of F has NumOutputArguments less than the requested number of outputs.
    0.0048    0.9658    0.8488

Create an FevalFuture vector, and fetch all its outputs.
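A minimal sketch of fetching the output of a single future (assumes an open parallel pool):

```matlab
% Queue one function evaluation and block until its result is available
F = parfeval(@magic, 1, 3);
M = fetchOutputs(F);   % waits for F to reach the 'finished' state
```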
feval feval Evaluate kernel on GPU Syntax feval(KERN, x1, ..., xn) [y1, ..., ym] = feval(KERN, x1, ..., xn) Description feval(KERN, x1, ..., xn) evaluates the CUDA kernel KERN with the given arguments x1, ..., xn. The number of input arguments, n, must equal the value of the NumRHSArguments property of KERN, and their types must match the description in the ArgumentTypes property of KERN. The input data can be regular MATLAB data, GPU arrays, or a mixture of the two. [y1, ..., ym] = feval(KERN, x1, ...
[y1, y2] = feval(KERN, x1, x2, x3)

The three input arguments, x1, x2, and x3, correspond to the three arguments that are passed into the CUDA function. The output arguments, y1 and y2, are gpuArray types, and correspond to the values of pInOut1 and pInOut2 after the CUDA kernel has executed.

See Also
arrayfun | gather | gpuArray | parallel.gpu.
findJob
Find job objects stored in cluster

Syntax
out = findJob(c)
[pending queued running completed] = findJob(c)
out = findJob(c,'p1',v1,'p2',v2,...)

Arguments
c           Cluster object in which to find the job.
pending     Array of jobs whose State is pending in cluster c.
queued      Array of jobs whose State is queued in cluster c.
running     Array of jobs whose State is running in cluster c.
completed   Array of jobs that have completed running, i.e., whose State is finished or failed in cluster c.
11 Functions — Alphabetical List completed jobs include those that failed. Jobs that are deleted or whose status is unavailable are not returned by this function. out = findJob(c,'p1',v1,'p2',v2,...) returns an array, out, of job objects whose property values match those passed as property-value pairs, p1, v1, p2, v2, etc. The property name must be in the form of a string, with the value being the appropriate type for that property.
findTask findTask Task objects belonging to job object Syntax tasks = findTask(j) [pending running completed] = findTask(j) tasks = findTask(j,'p1',v1,'p2',v2,...) Arguments j Job object. tasks Returned task objects. pending Array of tasks in job obj whose State is pending. running Array of tasks in job obj whose State is running. completed Array of completed tasks in job obj, i.e., those whose State is finished or failed. p1, p2 Task object properties to match.
11 Functions — Alphabetical List specified property-value pairs, p1, v1, p2, v2, etc. The property name must be in the form of a string, with the value being the appropriate type for that property. For a match, the object property value must be exactly the same as specified, including letter case. For example, if a task’s Name property value is MyTask, then findTask will not find that object while searching for a Name property value of mytask. Examples Create a job object.
for
for-loop over distributed range

Syntax
for variable = drange(colonop)
    statement
    ...
    statement
end

Description
The general format is

for variable = drange(colonop)
    statement
    ...
    statement
end

The colonop is an expression of the form start:increment:finish or start:finish. The default value of increment is 1. The colonop is partitioned by codistributed.colon into numlabs contiguous segments of nearly equal length.
Examples
Find the rank of magic squares. Access only the local portion of a codistributed array.

r = zeros(1, 40, codistributor());
for n = drange(1:40)
    r(n) = rank(magic(n));
end
r = gather(r);

Perform a Monte Carlo approximation of pi. Each worker is initialized to a different random number state.

m = 10000;
for p = drange(1:numlabs)
    z = rand(m, 1) + i*rand(m, 1);
    c = sum(abs(z) < 1)
end
k = gplus(c)
p = 4*k/(m*numlabs);

Attempt to compute Fibonacci numbers.
gather gather Transfer distributed array or gpuArray to local workspace Syntax X = gather(A) X = gather(C,lab) Description X = gather(A) can operate inside an spmd statement, pmode, or communicating job to gather together the elements of a codistributed array, or outside an spmd statement to gather the elements of a distributed array. If you execute this inside an spmd statement, pmode, or communicating job, X is a replicated array with all the elements of the array on every worker.
11 Functions — Alphabetical List n = 10; spmd C = codistributed(magic(n)); M = gather(C) % Gather all elements to all workers end S = gather(C) % Gather elements to client Gather all of the elements of C onto worker 1, for operations that cannot be performed across distributed arrays.
gather W 1024x1 8192 double More About Tips Note that gather assembles the codistributed or distributed array in the workspaces of all the workers on which it executes, or on the MATLAB client, respectively, but not both. If you are using gather within an spmd statement, the gathered array is accessible on the client via its corresponding Composite object; see “Access Worker Variables with Composites”.
gcat
Global concatenation

Syntax
Xs = gcat(X)
Xs = gcat(X, dim)
Xs = gcat(X, dim, targetlab)

Description
Xs = gcat(X) concatenates the variant array X from each worker in the second dimension. The result is replicated on all workers.

Xs = gcat(X, dim) concatenates the variant array X from each worker in the dimension indicated by dim.

Xs = gcat(X, dim, targetlab) performs the concatenation, and places the result into Xs only on the worker indicated by targetlab.
gcp gcp Get current parallel pool Syntax p = gcp p = gcp('nocreate') Description p = gcp returns a parallel.Pool object representing the current parallel pool. The current pool is where parallel language features execute, such as parfor, spmd, distributed, Composite, parfeval and parfevalOnAll. If no parallel pool exists, gcp starts a new parallel pool and returns a pool object for that, unless automatic pool starts are disabled in your parallel preferences.
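A common pattern is to query the pool without triggering automatic creation; a minimal sketch:

```matlab
% gcp('nocreate') returns [] rather than starting a pool
p = gcp('nocreate');
if isempty(p)
    poolsize = 0;            % no pool is running
else
    poolsize = p.NumWorkers; % size of the current pool
end
```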
delete(gcp('nocreate'))

See Also
Composite | delete | distributed | parfeval | parfevalOnAll | parfor | parpool | spmd
getAttachedFilesFolder getAttachedFilesFolder Folder into which AttachedFiles are written Syntax folder = getAttachedFilesFolder Arguments folder String indicating location where files from job’s AttachedFiles property are placed Description folder = getAttachedFilesFolder returns a string, which is the path to the local folder into which AttachedFiles are written. This function returns an empty array if it is not called on a MATLAB worker. Examples Find the current AttachedFiles folder.
11 Functions — Alphabetical List getCodistributor Codistributor object for existing codistributed array Syntax codist = getCodistributor(D) Description codist = getCodistributor(D) returns the codistributor object of codistributed array D. Properties of the object are Dimension and Partition for 1-D distribution; and BlockSize, LabGrid, and Orientation for 2-D block cyclic distribution. For any one codistributed array, getCodistributor returns the same values on all workers.
getCodistributor ornt = codist2.Orientation end Demonstrate that these codistributor objects are complete: spmd (4) isComplete(codist1) isComplete(codist2) end See Also codistributed | codistributed.
11 Functions — Alphabetical List getCurrentCluster Cluster object that submitted current task Syntax c = getCurrentCluster Arguments c The cluster object that scheduled the task currently being evaluated by the worker session. Description c = getCurrentCluster returns the parallel.Cluster object that has sent the task currently being evaluated by the worker session. Cluster object c is the Parent of the task’s parent job. Examples Find the current cluster.
See Also
getAttachedFilesFolder | getCurrentJob | getCurrentTask | getCurrentWorker
getCurrentJob
Job object whose task is currently being evaluated

Syntax
job = getCurrentJob

Arguments
job    The job object that contains the task currently being evaluated by the worker session.

Description
job = getCurrentJob returns the parallel.Job object that is the Parent of the task currently being evaluated by the worker session.

More About
Tips
If the function is executed in a MATLAB session that is not a worker, you get an empty result.
getCurrentTask
Task object currently being evaluated in this worker session

Syntax
task = getCurrentTask

Arguments
task    The task object that the worker session is currently evaluating.

Description
task = getCurrentTask returns the parallel.Task object whose function is currently being evaluated by the MATLAB worker session on the cluster.

More About
Tips
If the function is executed in a MATLAB session that is not a worker, you get an empty result.
getCurrentWorker
Worker object currently running this session

Syntax
worker = getCurrentWorker

Arguments
worker    The worker object that is currently evaluating the task that contains this function.

Description
worker = getCurrentWorker returns the parallel.Worker object representing the MATLAB worker session that is currently evaluating the task function that contains this call. If the function runs in a MATLAB session that is not a worker, it returns an empty result.
getCurrentWorker j = createJob(c); j.AttachedFiles = {'identifyWorkerHost.
11 Functions — Alphabetical List getDebugLog Read output messages from job run in CJS cluster Syntax str = getDebugLog(cluster, job_or_task) Arguments str Variable to which messages are returned as a string expression. cluster Cluster object referring to Microsoft Windows HPC Server (or CCS), Platform LSF, PBS Pro, or TORQUE cluster, created by parcluster. job_or_task Object identifying job or task whose messages you want.
getDebugLog(c,j);

See Also
createCommunicatingJob | createJob | createTask | parcluster
11 Functions — Alphabetical List getJobClusterData Get specific user data for job on generic cluster Syntax userdata = getJobClusterData(cluster,job) Arguments userdata Information that was previously stored for this job cluster Cluster object identifying the generic third-party cluster running the job job Job object identifying the job for which to retrieve data Description userdata = getJobClusterData(cluster,job) returns data stored for the job job that was derived from the generic cluster clus
getJobFolder
Folder on client where jobs are stored

Syntax
joblocation = getJobFolder(cluster,job)

Description
joblocation = getJobFolder(cluster,job) returns the path to the folder on disk where files are stored for the specified job and cluster. This folder is valid only in the client MATLAB session, not necessarily on the workers. This method exists only on clusters using the generic interface.
11 Functions — Alphabetical List getJobFolderOnCluster Folder on cluster where jobs are stored Syntax joblocation = getJobFolderOnCluster(cluster,job) Description joblocation = getJobFolderOnCluster(cluster,job) returns the path to the folder on disk where files are stored for the specified job and cluster. This folder is valid only in worker MATLAB sessions. An error results if the HasSharedFilesystem property of the cluster is false. This method exists only on clusters using the generic interface.
getLocalPart getLocalPart Local portion of codistributed array Syntax L = getLocalPart(A) Description L = getLocalPart(A) returns the local portion of a codistributed array.
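A minimal sketch of inspecting each worker's local portion (assumes an open pool of four workers; sizes are illustrative):

```matlab
% Each worker sees only its own columns of the codistributed array
spmd(4)
    C = codistributed(magic(8));   % distributed by columns by default
    L = getLocalPart(C);           % an 8-by-2 local piece on each worker
    size(L)
end
```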
11 Functions — Alphabetical List getLogLocation Log location for job or task Syntax logfile = getLogLocation(cluster,cj) logfile = getLogLocation(cluster,it) Description logfile = getLogLocation(cluster,cj) for a generic cluster cluster and communicating job cj, returns the location where the log data should be stored for the whole job cj.
globalIndices globalIndices Global indices for local part of codistributed array Syntax K = globalIndices(C,dim) K = globalIndices(C,dim,lab) [E,F] = globalIndices(C,dim) [E,F] = globalIndices(C,dim,lab) K = globalIndices(codist,dim,lab) [E,F] = globalIndices(codist,dim,lab) Description globalIndices tells you the relationship between indices on a local part and the corresponding index range in a given dimension on the codistributed array.
Examples
Create a 2-by-22 codistributed array among four workers, and view the global indices on each lab:

spmd
    C = zeros(2,22,codistributor1d(2,[6 6 5 5]));
    if labindex == 1
        K = globalIndices(C,2)      % returns K = 1:6.
    elseif labindex == 2
        [E,F] = globalIndices(C,2)  % returns E = 7, F = 12.
    end
    K = globalIndices(C,2,3)        % returns K = 13:17.
    [E,F] = globalIndices(C,2,4)    % returns E = 18, F = 22.
end
gop
Global operation across all workers

Syntax
res = gop(FUN,x)
res = gop(FUN,x,targetlab)

Arguments
FUN          Function to operate across workers.
x            Argument to function FUN; it should be the same variable on all workers, but can have different values.
res          Variable to hold reduction result.
targetlab    Lab to which reduction results are returned. This value is returned by that worker's labindex.

Description
res = gop(FUN,x) is the reduction via the function FUN of the quantities x from each worker.
Examples
This example shows how to calculate the sum and maximum values for x among all workers.

p = parpool('local',4);
x = Composite();
x{1} = 3;
x{2} = 1;
x{3} = 4;
x{4} = 2;
spmd
    xsum = gop(@plus,x);
    xmax = gop(@max,x);
end
xsum{1}
    10
xmax{1}
    4

This example shows how to horizontally concatenate the column vectors of x from all workers into a matrix. It uses the same 4-worker parallel pool opened by the previous example.
spmd
    res = gop(afun,num2str(labindex));
end
res{1}
1 2 3 4

See Also
labBarrier | labindex | numlabs
11 Functions — Alphabetical List gplus Global addition Syntax S = gplus(X) S = gplus(X, targetlab) Description S = gplus(X) returns the addition of the variant array X from each worker. The result S is replicated on all workers. S = gplus(X, targetlab) performs the addition, and places the result into S only on the worker indicated by targetlab. S is set to [] on all other workers. Examples With four workers, S = gplus(labindex) returns S = 1 + 2 + 3 + 4 = 10 on all four workers.
gpuArray gpuArray Create array on GPU Syntax G = gpuArray(X) Description G = gpuArray(X) copies the numeric array X to the GPU, and returns a gpuArray object. You can operate on this array by passing its gpuArray to the feval method of a CUDA kernel object, or by using one of the methods defined for gpuArray objects in “Establish Arrays on a GPU” on page 9-3. The MATLAB array X must be numeric (for example: single, double, int8, etc.
G2      10x10    108    gpuArray

Copy the array back to the MATLAB workspace.

G1 = gather(G2);
whos G1
Name    Size     Bytes    Class
G1      10x10    400      single

See Also
arrayfun | bsxfun | existsOnGPU | feval | gather | parallel.gpu.
gpuDevice gpuDevice Query or select GPU device Syntax D = gpuDevice D = gpuDevice() D = gpuDevice(IDX) gpuDevice([ ]) Description D = gpuDevice or D = gpuDevice(), if no device is already selected, selects the default GPU device and returns a GPUDevice object representing that device. If a GPU device is already selected, this returns an object representing that device without clearing it. D = gpuDevice(IDX) selects the GPU device specified by index IDX. IDX must be in the range of 1 to gpuDeviceCount.
for ii = 1:gpuDeviceCount
    g = gpuDevice(ii);
    fprintf(1,'Device %i has ComputeCapability %s \n', ...
            g.Index,g.ComputeCapability)
end

Device 1 has ComputeCapability 3.5
Device 2 has ComputeCapability 2.0

See Also
arrayfun | wait (GPUDevice) | feval | gpuDeviceCount | parallel.gpu.
gpuDeviceCount
Number of GPU devices present

Syntax
n = gpuDeviceCount

Description
n = gpuDeviceCount returns the number of GPU devices present in your computer.

Examples
Determine how many GPU devices you have available in your computer and examine the properties of each.

n = gpuDeviceCount;
for ii = 1:n
    gpuDevice(ii)
end

See Also
arrayfun | feval | gpuDevice | parallel.gpu.
11 Functions — Alphabetical List gputimeit Time required to run function on GPU Syntax t = gputimeit(F) t = gputimeit(F,N) Description t = gputimeit(F) measures the typical time (in seconds) required to run the function specified by the function handle F. The function handle accepts no external input arguments, but can be defined with input arguments to its internal function call. t = gputimeit(F,N) calls F to return N output arguments.
gputimeit t1 = gputimeit(f,1) 0.2933 More About Tips gputimeit is preferable to timeit for functions that use the GPU, because it ensures that all operations on the GPU have finished before recording the time and compensates for the overhead. For operations that do not use a GPU, timeit offers greater precision. Note the following limitations: • The function F should not call tic or toc. • You cannot use tic and toc to measure the execution time of gputimeit itself.
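A minimal sketch of timing a GPU operation, per the description above (assumes a supported GPU is selected; the matrix size is illustrative):

```matlab
% Time a GPU matrix multiply; the handle takes no external inputs,
% but captures A from the workspace
A = gpuArray(rand(1000));
f = @() A*A;
t = gputimeit(f);
```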
11 Functions — Alphabetical List help Help for toolbox functions in Command Window Syntax help class/function Arguments class A Parallel Computing Toolbox object class, for example, parallel.cluster, parallel.job, or parallel.task. function A function or property of the specified class. To see what functions or properties are available for a class, see the methods or properties reference page. Description help class/function returns command-line help for the specified function of the given class.
help parallel.job.CJSIndependentJob help parallel.job/createTask help parallel.
Inf
Array of infinity

Syntax
A = Inf(sz,arraytype)
A = Inf(sz,datatype,arraytype)
A = Inf(sz,'like',P)
A = Inf(sz,datatype,'like',P)
C = Inf(sz,codist)
C = Inf(sz,datatype,codist)
C = Inf(sz, ___ ,codist,'noCommunication')
C = Inf(sz, ___ ,codist,'like',P)

Description
A = Inf(sz,arraytype) creates a matrix with underlying class of double, with Inf values in all elements.
Argument    Values                          Descriptions
datatype    'gpuArray'                      Specifies gpuArray.
            'double' (default), 'single'    Specifies underlying class of the array, i.e., the data type of its elements.

A = Inf(sz,'like',P) creates an array of Inf values with the same type and underlying class (data type) as array P.

A = Inf(sz,datatype,'like',P) creates an array of Inf values with the specified underlying class (datatype), and the same type as array P.
Create Codistributed Inf Matrix
Create a 1000-by-1000 codistributed double matrix of Infs, distributed by its second dimension (columns).

spmd(4)
    C = Inf(1000,'codistributed');
end

With four workers, each worker contains a 1000-by-250 local piece of C. Create a 1000-by-1000 codistributed single matrix of Infs, distributed by its columns.
isaUnderlying isaUnderlying True if distributed array's underlying elements are of specified class Syntax TF = isaUnderlying(D, 'classname') Description TF = isaUnderlying(D, 'classname') returns true if the elements of distributed or codistributed array D are either an instance of classname or an instance of a class derived from classname. isaUnderlying supports the same values for classname as the MATLAB isa function does.
11 Functions — Alphabetical List iscodistributed True for codistributed array Syntax tf = iscodistributed(X) Description tf = iscodistributed(X) returns true for a codistributed array, or false otherwise. For a description of codistributed arrays, see “Nondistributed Versus Distributed Arrays” on page 5-2.
isComplete isComplete True if codistributor object is complete Syntax tf = isComplete(codist) Description tf = isComplete(codist) returns true if codist is a completely defined codistributor, or false otherwise. For a description of codistributed arrays, see “Nondistributed Versus Distributed Arrays” on page 5-2.
11 Functions — Alphabetical List isdistributed True for distributed array Syntax tf = isdistributed(X) Description tf = isdistributed(X) returns true for a distributed array, or false otherwise. For a description of a distributed array, see “Nondistributed Versus Distributed Arrays” on page 5-2.
isequal isequal True if clusters have same property values Syntax isequal(C1,C2) isequal(C1,C2,C3,...) Description isequal(C1,C2) returns logical 1 (true) if clusters C1 and C2 have the same property values, or logical 0 (false) otherwise. isequal(C1,C2,C3,...) returns true if all clusters are equal. isequal can operate on arrays of clusters. In this case, the arrays are compared element by element. When comparing clusters, isequal does not compare the contents of the clusters’ Jobs property.
11 Functions — Alphabetical List isequal (FevalFuture) True if futures have same ID Syntax eq = isequal(F1,F2) Description eq = isequal(F1,F2) returns logical 1 (true) if futures F1 and F2 have the same ID property value, or logical 0 (false) otherwise. Examples Compare future object in workspace to queued future object. p = parpool('local',2); q = p.FevalQueue; Fp = parfevalOnAll(p,@pause,0,30); F1 = parfeval(p,@magic,1,10); F2 = q.
isreplicated isreplicated True for replicated array Syntax tf = isreplicated(X) Description tf = isreplicated(X) returns true for a replicated array, or false otherwise. For a description of a replicated array, see “Nondistributed Versus Distributed Arrays” on page 5-2. isreplicated also returns true for a Composite X if all its elements are identical.
11 Functions — Alphabetical List jobStartup File for user-defined options to run when job starts Syntax jobStartup(job) Arguments job The job for which this startup is being executed. Description jobStartup(job) runs automatically on a worker the first time that worker evaluates a task for a particular job. You do not call this function from the client session, nor explicitly as part of a task function. You add MATLAB code to the jobStartup.m file to define job initialization actions on the worker.
labBarrier labBarrier Block execution until all workers reach this call Syntax labBarrier Description labBarrier blocks execution of a parallel algorithm until all workers have reached the call to labBarrier. This is useful for coordinating access to shared resources such as file I/O. Examples Synchronize Workers for Timing When timing code execution on the workers, use labBarrier to ensure all workers are synchronized and start their timed work together.
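The timing example above is truncated here; a minimal sketch of the pattern it describes (assumes an open parallel pool; the pause stands in for real work):

```matlab
% All workers start their timed section together
spmd
    labBarrier;           % wait until every worker reaches this point
    tic
    pause(labindex/10);   % placeholder for per-worker work
    t = toc;
end
```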
11 Functions — Alphabetical List labBroadcast Send data to all workers or receive data sent to all workers Syntax shared_data = labBroadcast(srcWkrIdx,data) shared_data = labBroadcast(srcWkrIdx) Arguments srcWkrIdx The labindex of the worker sending the broadcast. data The data being broadcast. This argument is required only for the worker that is broadcasting. The absence of this argument indicates that a worker is receiving. shared_data The broadcast data as it is received on all other workers.
labBroadcast Examples In this case, the broadcaster is the worker whose labindex is 1.
11 Functions — Alphabetical List labindex Index of this worker Syntax id = labindex Description id = labindex returns the index of the worker currently executing the function. labindex is assigned to each worker when a job begins execution, and applies only for the duration of that job. The value of labindex spans from 1 to n, where n is the number of workers running the current job, defined by numlabs.
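A minimal sketch of inspecting labindex and numlabs (assumes an open parallel pool):

```matlab
% Each worker reports its own index and the pool size
spmd
    fprintf('This is worker %d of %d.\n', labindex, numlabs);
end
```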
labProbe labProbe Test to see if messages are ready to be received from other worker Syntax isDataAvail = labProbe isDataAvail = labProbe(srcWkrIdx) isDataAvail = labProbe('any',tag) isDataAvail = labProbe(srcWkrIdx,tag) [isDataAvail,srcWkrIdx,tag] = labProbe Arguments srcWkrIdx labindex of a particular worker from which to test for a message. tag Tag defined by the sending worker’s labSend function to identify particular data.
11 Functions — Alphabetical List [isDataAvail,srcWkrIdx,tag] = labProbe returns labindex of the workers and tags of ready messages. If no data is available, srcWkrIdx and tag are returned as [].
labReceive labReceive Receive data from another worker Syntax data = labReceive data = labReceive(srcWkrIdx) data = labReceive('any',tag) data = labReceive(srcWkrIdx,tag) [data,srcWkrIdx,tag] = labReceive Arguments srcWkrIdx labindex of a particular worker from which to receive data. tag Tag defined by the sending worker’s labSend function to identify particular data. 'any' String to indicate that data can come from any worker. data Data sent by the sending worker’s labSend function.
11 Functions — Alphabetical List More About Tips This function blocks execution in the worker until the corresponding call to labSend occurs in the sending worker.
labSend
Send data to another worker

Syntax
labSend(data,rcvWkrIdx)
labSend(data,rcvWkrIdx,tag)

Arguments
data         Data sent to the other workers; any MATLAB data type.
rcvWkrIdx    labindex of receiving worker or workers.
tag          Nonnegative integer to identify data.

Description
labSend(data,rcvWkrIdx) sends the data to the specified destination. data can be any MATLAB data type.
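A minimal sketch of a point-to-point transfer, pairing labSend with labReceive (assumes an open parallel pool with at least two workers):

```matlab
% Worker 1 sends a matrix to worker 2; labReceive blocks until it arrives
spmd
    if labindex == 1
        labSend(magic(3), 2);
    elseif labindex == 2
        data = labReceive(1);
    end
end
```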
11 Functions — Alphabetical List labSendReceive Simultaneously send data to and receive data from another worker Syntax dataReceived = labSendReceive(rcvWkrIdx,srcWkrIdx,dataSent) dataReceived = labSendReceive(rcvWkrIdx,srcWkrIdx,dataSent,tag) Arguments dataSent Data on the sending worker that is sent to the receiving worker; any MATLAB data type. dataReceived Data accepted on the receiving worker. rcvWkrIdx labindex of the receiving worker to which data is sent.
labSendReceive dataReceived = labSendReceive(rcvWkrIdx,srcWkrIdx,dataSent,tag) uses the specified tag for the communication. tag can be any integer from 0 to 32767. Examples Create a unique set of data on each worker, and transfer each worker’s data one worker to the right (to the next higher labindex). First use the magic function to create a unique value for the variant array mydata on each worker.
Lab 2: otherdata =
     1
Lab 3: otherdata =
     1     3
     4     2

Transfer data to the next worker without wrapping data from the last worker to the first worker.
length Length of object array Syntax length(obj) Arguments obj An object or an array of objects. Description length(obj) returns the length of obj. It is equivalent to the command max(size(obj)). Examples Examine how many tasks are in the job j1. length(j1.
listAutoAttachedFiles List of files automatically attached to job, task, or parallel pool Syntax listAutoAttachedFiles(obj) Description listAutoAttachedFiles(obj) performs a dependency analysis on all the task functions, or on the batch job script or function. Then it displays a list of the code files that are already attached or are going to be automatically attached to the job or task object obj.
listAutoAttachedFiles(obj) Automatically Attach Files Programmatically Programmatically set a job to automatically attach code files, and then view a list of those files for one of the tasks in the job. c = parcluster(); % Use default profile j = createJob(c); j.
load Load workspace variables from batch job Syntax load(job) load(job, 'X') load(job, 'X', 'Y', 'Z*') load(job, '-regexp', 'PAT1', 'PAT2') S = load(job ...) Arguments job Job from which to load workspace variables. 'X', 'Y', 'Z*' Variables to load from the job. Wildcards allow pattern matching in MAT-file style. '-regexp' Indication to use regular expression pattern matching. S Struct containing the variables after loading.
load S = load(job ...) returns the contents of job into variable S, which is a struct containing fields matching the variables retrieved. Examples Run a batch job and load its results into your client workspace. j = batch('myScript'); wait(j) load(j) Load only variables whose names start with 'a'. load(job, 'a*') Load only variables whose names contain any digits.
logout Log out of MJS cluster Syntax logout(c) Description logout(c) logs you out of the MJS cluster specified by cluster object c. Any subsequent call to a privileged action requires you to re-authenticate with a valid password. Logging out might be useful when you are finished working on a shared machine.
mapreducer Define parallel execution environment for mapreduce mapreducer is the execution configuration function for mapreduce. This function specifies where mapreduce execution takes place. With Parallel Computing Toolbox, you can expand the execution environment to include various compute clusters.
mapreducer(hcluster) specifies a Hadoop cluster for parallel execution of mapreduce. hcluster is a parallel.cluster.Hadoop object. mapreducer(mr) sets the global execution environment for mapreduce using a previously created MapReducer object, mr, if its ObjectVisibility property is 'On'. mr = mapreducer( ___ ) returns a MapReducer object to specify the execution environment.
mapreducer Output Arguments mr — Execution environment for mapreduce MapReducer object Execution environment for mapreduce, returned as a MapReducer object. See Also gcmr | gcp | mapreduce | parallel.cluster.
methods List functions of object class Syntax methods(obj) out = methods(obj) Arguments obj An object or an array of objects. out Cell array of strings. Description methods(obj) returns the names of all methods for the class of which obj is an instance. out = methods(obj) returns the names of the methods as a cell array of strings. Examples Create cluster, job, and task objects, and examine what methods are available for each.
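A short sketch of the example described above, assuming the default cluster profile is available:

```matlab
% Create cluster and job objects, then list the methods each supports.
c = parcluster();      % cluster object from the default profile
j = createJob(c);      % job object on that cluster
methods(c)             % display methods of the cluster class
methods(j)             % display methods of the job class
m = methods(j);        % the same names, returned as a cell array of strings
```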
methods See Also help
mpiLibConf Location of MPI implementation Syntax [primaryLib, extras] = mpiLibConf Arguments primaryLib MPI implementation library used by a communicating job. extras Cell array of other required library names. Description [primaryLib, extras] = mpiLibConf returns the MPI implementation library to be used by a communicating job. primaryLib is the name of the shared library file containing the MPI entry points.
mpiLibConf More About Tips Under all circumstances, the MPI library must support all MPI-1 functions. Additionally, the MPI library must support null arguments to MPI_Init as defined in section 4.2 of the MPI-2 standard. The library must also use an mpi.h header file that is fully compatible with MPICH2.
mpiprofile Profile parallel communication and execution times Syntax mpiprofile mpiprofile on mpiprofile off mpiprofile resume mpiprofile clear mpiprofile status mpiprofile reset mpiprofile info mpiprofile viewer mpiprofile('viewer', ) Description mpiprofile enables or disables the parallel profiler data collection on a MATLAB worker running a communicating job. mpiprofile aggregates statistics on execution time and communication times.
mpiprofile Option Description additionally records information about built-in functions such as eig or labReceive. -messagedetail default -messagedetail simplified Specifies the detail at which communication information is stored. -messagedetail default collects information on a per-lab basis. -messagedetail simplified turns off collection of the *PerLab data fields, which reduces the profiling overhead.
mpiprofile info returns a profiling data structure with additional fields beyond those provided by the standard profile info command in the FunctionTable entry. All these fields are recorded on a per-function and per-line basis, except for the *PerLab fields.
mpiprofile Examples In pmode, turn on the parallel profiler, run your function in parallel, and call the viewer: mpiprofile on; % call your function; mpiprofile viewer; If you want to obtain the profiler information from a communicating job outside of pmode (i.e., in the MATLAB client), you need to return output arguments of mpiprofile info by using the functional form of the command.
mpiSettings Configure options for MPI communication Syntax mpiSettings('DeadlockDetection','on') mpiSettings('MessageLogging','on') mpiSettings('MessageLoggingDestination','CommandWindow') mpiSettings('MessageLoggingDestination','stdout') mpiSettings('MessageLoggingDestination','File','filename') Description mpiSettings('DeadlockDetection','on') turns on deadlock detection during calls to labSend and labReceive.
mpiSettings Examples Set deadlock detection for a communicating job inside the jobStartup.m file for that job: % Inside jobStartup.m for the communicating job mpiSettings('DeadlockDetection', 'on'); myLogFname = sprintf('%s_%d.
mxGPUCopyFromMxArray (C) Copy mxArray to mxGPUArray C Syntax #include "gpu/mxGPUArray.h" mxGPUArray* mxGPUCopyFromMxArray(mxArray const * const mp) Arguments mp Pointer to an mxArray that contains either GPU or CPU data. Returns Pointer to an mxGPUArray. Description mxGPUCopyFromMxArray produces a new mxGPUArray object with the same characteristics as the input mxArray. • If the input mxArray contains a gpuArray, the output is a new copy of the data on the GPU.
mxGPUCopyGPUArray (C) Duplicate (deep copy) mxGPUArray object C Syntax #include "gpu/mxGPUArray.h" mxGPUArray* mxGPUCopyGPUArray(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns Pointer to an mxGPUArray. Description mxGPUCopyGPUArray produces a new array on the GPU and copies the data, and then returns a new mxGPUArray that refers to the copy. Use mxGPUDestroyGPUArray to delete the result when you are done with it.
mxGPUCopyImag (C) Copy imaginary part of mxGPUArray C Syntax #include "gpu/mxGPUArray.h" mxGPUArray* mxGPUCopyImag(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. The target gpuArray must be full, not sparse. Returns Pointer to an mxGPUArray. Description mxGPUCopyImag copies the imaginary part of GPU data, and returns a new mxGPUArray object that refers to the copy.
mxGPUCopyReal (C) Copy real part of mxGPUArray C Syntax #include "gpu/mxGPUArray.h" mxGPUArray* mxGPUCopyReal(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. The target gpuArray must be full, not sparse. Returns Pointer to an mxGPUArray. Description mxGPUCopyReal copies the real part of GPU data, and returns a new mxGPUArray object that refers to the copy. If the input is real rather than complex, the function returns a copy of the input.
mxGPUCreateComplexGPUArray (C) Create complex GPU array from two real gpuArrays C Syntax #include "gpu/mxGPUArray.h" mxGPUArray* mxGPUCreateComplexGPUArray(mxGPUArray const * const mgpR, mxGPUArray const * const mgpI) Arguments mgpR, mgpI Pointers to mxGPUArray data containing real and imaginary coefficients. The target gpuArrays must be full, not sparse. Returns Pointer to an mxGPUArray.
mxGPUCreateFromMxArray (C) Create read-only mxGPUArray object from input mxArray C Syntax #include "gpu/mxGPUArray.h" mxGPUArray const * mxGPUCreateFromMxArray(mxArray const * const mp) Arguments mp Pointer to an mxArray that contains either GPU or CPU data. Returns Pointer to a read-only mxGPUArray object. Description mxGPUCreateFromMxArray produces a read-only mxGPUArray object from an mxArray.
mxGPUCreateGPUArray (C) Create mxGPUArray object, allocating memory on GPU C Syntax #include "gpu/mxGPUArray.h" mxGPUArray* mxGPUCreateGPUArray(mwSize const ndims, mwSize const * const dims, mxClassID const cid, mxComplexity const ccx, mxGPUInitialize const init0) Arguments ndims mwSize type specifying the number of dimensions in the created mxGPUArray. dims Pointer to an mwSize vector specifying the sizes of each dimension in the created mxGPUArray.
mxGPUCreateGPUArray (C) Returns Pointer to an mxGPUArray. Description mxGPUCreateGPUArray creates a new mxGPUArray object with the specified size, type, and complexity. It also allocates the required memory on the GPU, and initializes the memory if requested. This function allocates a new mxGPUArray object on the CPU. Use mxGPUDestroyGPUArray to delete the object when you are done with it.
mxGPUCreateMxArrayOnCPU (C) Create mxArray for returning CPU data to MATLAB with data from GPU C Syntax #include "gpu/mxGPUArray.h" mxArray* mxGPUCreateMxArrayOnCPU(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns Pointer to an mxArray object containing CPU data that is a copy of the GPU data. Description mxGPUCreateMxArrayOnCPU copies the GPU data from the specified mxGPUArray into an mxArray on the CPU for return to MATLAB.
mxGPUCreateMxArrayOnGPU (C) Create mxArray for returning GPU data to MATLAB C Syntax #include "gpu/mxGPUArray.h" mxArray* mxGPUCreateMxArrayOnGPU(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns Pointer to an mxArray object containing GPU data. Description mxGPUCreateMxArrayOnGPU puts the mxGPUArray into an mxArray for return to MATLAB. The data remains on the GPU and the returned class in MATLAB is gpuArray.
mxGPUDestroyGPUArray (C) Delete mxGPUArray object C Syntax #include "gpu/mxGPUArray.h" mxGPUDestroyGPUArray(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Description mxGPUDestroyGPUArray deletes an mxGPUArray object on the CPU. Use this function to delete an mxGPUArray object you created with: • mxGPUCreateGPUArray • mxGPUCreateFromMxArray • mxGPUCopyFromMxArray • mxGPUCopyReal • mxGPUCopyImag, or • mxGPUCreateComplexGPUArray.
mxGPUGetClassID (C) mxClassID associated with data on GPU C Syntax #include "gpu/mxGPUArray.h" mxClassID mxGPUGetClassID(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns mxClassID type. Description mxGPUGetClassID returns an mxClassID type indicating the underlying class of the input data.
mxGPUGetComplexity (C) Complexity of data on GPU C Syntax #include "gpu/mxGPUArray.h" mxComplexity mxGPUGetComplexity(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns mxComplexity type. Description mxGPUGetComplexity returns an mxComplexity type indicating the complexity of the GPU data.
mxGPUGetData (C) Raw pointer to underlying data C Syntax #include "gpu/mxGPUArray.h" void* mxGPUGetData(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray on the GPU. The target gpuArray must be full, not sparse. Returns Pointer to data. Description mxGPUGetData returns a raw pointer to the underlying data. Cast this pointer to the type of data that you want to use on the device.
mxGPUGetDataReadOnly (C) Read-only raw pointer to underlying data C Syntax #include "gpu/mxGPUArray.h" void const* mxGPUGetDataReadOnly(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray on the GPU. The target gpuArray must be full, not sparse. Returns Read-only pointer to data. Description mxGPUGetDataReadOnly returns a read-only raw pointer to the underlying data. Cast it to the type of data that you want to use on the device.
mxGPUGetDimensions (C) mxGPUArray dimensions C Syntax #include "gpu/mxGPUArray.h" mwSize const * mxGPUGetDimensions(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns Pointer to a read-only array of mwSize type. Description mxGPUGetDimensions returns a pointer to an array of mwSize indicating the dimensions of the input argument. Use mxFree to delete the output.
mxGPUGetNumberOfDimensions (C) Size of dimension array for mxGPUArray C Syntax #include "gpu/mxGPUArray.h" mwSize mxGPUGetNumberOfDimensions(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns mwSize type. Description mxGPUGetNumberOfDimensions returns the size of the dimension array for the mxGPUArray input argument, indicating the number of its dimensions.
mxGPUGetNumberOfElements (C) Number of elements on GPU for array C Syntax #include "gpu/mxGPUArray.h" mwSize mxGPUGetNumberOfElements(mxGPUArray const * const mgp) Arguments mgp Pointer to an mxGPUArray. Returns mwSize type. Description mxGPUGetNumberOfElements returns the total number of elements on the GPU for this array.
mxGPUIsSame (C) Determine if two mxGPUArrays refer to same GPU data C Syntax #include "gpu/mxGPUArray.h" int mxGPUIsSame(mxGPUArray const * const mgp1, mxGPUArray const * const mgp2) Arguments mgp1, mgp2 Pointers to mxGPUArray. Returns int type. Description mxGPUIsSame returns an integer indicating if two mxGPUArray pointers refer to the same GPU data: • 1 (true) indicates that the inputs refer to the same data.
mxGPUIsSparse (C) Determine if mxGPUArray contains sparse GPU data C Syntax #include "gpu/mxGPUArray.h" int mxGPUIsSparse(mxGPUArray const * mp); Arguments mp Pointer to an mxGPUArray to be queried for sparse data. Returns Integer indicating true result: • 1 indicates the input is a sparse gpuArray. • 0 indicates the input is not a sparse gpuArray.
mxGPUIsValidGPUData (C) Determine if mxArray is pointer to valid GPU data C Syntax #include "gpu/mxGPUArray.h" int mxGPUIsValidGPUData(mxArray const * const mp) Arguments mp Pointer to an mxArray. Returns int type. Description mxGPUIsValidGPUData indicates whether the mxArray is a pointer to valid GPU data. If the GPU device is reinitialized in MATLAB with gpuDevice, all data on the device becomes invalid, but the CPU data structures that refer to the GPU data still exist.
mxInitGPU (C) Initialize MATLAB GPU library on currently selected device C Syntax #include "gpu/mxGPUArray.h" int mxInitGPU() Returns int type with one of the following values: • MX_GPU_SUCCESS if the MATLAB GPU library is successfully initialized. • MX_GPU_FAILURE if not successfully initialized. Description Before using any CUDA code in your MEX file, initialize the MATLAB GPU library if you intend to use any mxGPUArray functionality in MEX or any GPU calls in MATLAB.
mxIsGPUArray (C) Determine if mxArray contains GPU data C Syntax #include "gpu/mxGPUArray.h" int mxIsGPUArray(mxArray const * const mp); Arguments mp Pointer to an mxArray that might contain gpuArray data. Returns Integer indicating true result: • 1 indicates the input is a gpuArray. • 0 indicates the input is not a gpuArray.
NaN Array of Not-a-Numbers Syntax
A = NaN(sz,arraytype)
A = NaN(sz,datatype,arraytype)
A = NaN(sz,'like',P)
A = NaN(sz,datatype,'like',P)
C = NaN(sz,codist)
C = NaN(sz,datatype,codist)
C = NaN(sz, ___ ,codist,'noCommunication')
C = NaN(sz, ___ ,codist,'like',P)
Description A = NaN(sz,arraytype) creates a matrix with underlying class of double, with NaN values in all elements. A = NaN(sz,datatype,arraytype) creates a matrix with underlying class of datatype, with NaN values in all elements.
Argument datatype Values Descriptions 'gpuArray' Specifies gpuArray. 'double' (default) or 'single' Specifies the underlying class of the array, i.e., the data type of its elements. A = NaN(sz,'like',P) creates an array of NaN values with the same type and underlying class (data type) as array P. A = NaN(sz,datatype,'like',P) creates an array of NaN values with the specified underlying class (datatype), and the same type as array P.
NaN Create Codistributed NaN Matrix Create a 1000-by-1000 codistributed double matrix of NaNs, distributed by its second dimension (columns). spmd(4) C = NaN(1000,'codistributed'); end With four workers, each worker contains a 1000-by-250 local piece of C. Create a 1000-by-1000 codistributed single matrix of NaNs, distributed by its columns. spmd(4) codist = codistributor('1d',2,100*[1:numlabs]); C = NaN(1000,1000,'single',codist); end Each worker contains a 1000-by-(100*labindex) local piece of C.
numlabs Total number of workers operating in parallel on current job Syntax n = numlabs Description n = numlabs returns the total number of workers currently operating on the current job. This value is the maximum value that can be used with labSend and labReceive. More About Tips In an spmd block, numlabs on each worker returns the parallel pool size. However, inside a parfor-loop, numlabs always returns a value of 1.
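The difference noted above between spmd and parfor can be seen in a small sketch (pool size of four workers is an assumption):

```matlab
% Inside spmd, numlabs is the pool size and labindex identifies the worker.
spmd
    fprintf('Worker %d of %d\n', labindex, numlabs);
end

% Inside parfor, numlabs always returns 1.
n = zeros(1,4);
parfor i = 1:4
    n(i) = numlabs;    % every element of n is 1
end
```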
ones Array of ones Syntax
N = ones(sz,arraytype)
N = ones(sz,datatype,arraytype)
N = ones(sz,'like',P)
N = ones(sz,datatype,'like',P)
C = ones(sz,codist)
C = ones(sz,datatype,codist)
C = ones(sz, ___ ,codist,'noCommunication')
C = ones(sz, ___ ,codist,'like',P)
Description N = ones(sz,arraytype) creates a matrix with underlying class of double, with ones in all elements. N = ones(sz,datatype,arraytype) creates a matrix with underlying class of datatype, with ones in all elements.
Argument datatype Values Descriptions 'gpuArray' Specifies gpuArray. 'double' (default), 'single', 'int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', or 'uint64' Specifies the underlying class of the array, i.e., the data type of its elements. N = ones(sz,'like',P) creates an array of ones with the same type and underlying class (data type) as array P.
ones Examples Create Distributed Ones Matrix Create a 1000-by-1000 distributed array of ones with underlying class double: D = ones(1000,'distributed'); Create Codistributed Ones Matrix Create a 1000-by-1000 codistributed double matrix of ones, distributed by its second dimension (columns). spmd(4) C = ones(1000,'codistributed'); end With four workers, each worker contains a 1000-by-250 local piece of C. Create a 1000-by-1000 codistributed uint16 matrix of ones, distributed by its columns.
pagefun Apply function to each page of array on GPU Syntax A = pagefun(FUN,B) A = pagefun(FUN,B,C,...) [A,B,...] = pagefun(FUN,C,...) Description pagefun iterates over the pages of a gpuArray, applying the same function to each page. A = pagefun(FUN,B) applies the function specified by FUN to each page of the gpuArray B, and returns the results in gpuArray A, such that A(:,:,I,J,...) = FUN(B(:,:,I,J,...)).
pagefun FUN must be a handle to a function that is written in the MATLAB language (i.e., not a built-in function or a MEX-function).
B = rand(K,N,P,'gpuArray');
C = pagefun(@mtimes,A,B);
s = size(C) % returns M-by-N-by-P
s =
   300  1000  200
See Also arrayfun | bsxfun | gather | gpuArray
parallel.cluster.Hadoop Create Hadoop cluster object Syntax hcluster = parallel.cluster.Hadoop hcluster = parallel.cluster.Hadoop(Name,Value) Description hcluster = parallel.cluster.Hadoop creates a parallel.cluster.Hadoop object representing the Hadoop cluster. You use the resulting object as input to the mapreducer function, for specifying the Hadoop cluster as the mapreduce parallel execution environment. hcluster = parallel.cluster.
Input Arguments Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'HadoopInstallFolder','/share/hadoop/a1.2.
parallel.clusterProfiles Names of all available cluster profiles Syntax ALLPROFILES = parallel.clusterProfiles [ALLPROFILES, DEFAULTPROFILE] = parallel.clusterProfiles Description ALLPROFILES = parallel.clusterProfiles returns a cell array containing the names of all available profiles. [ALLPROFILES, DEFAULTPROFILE] = parallel.clusterProfiles returns a cell array containing the names of all available profiles, and separately the name of the default profile.
parallel.clusterProfiles allNames = parallel.clusterProfiles() myCluster = parcluster(allNames{end}); See Also parallel.defaultClusterProfile | parallel.exportProfile | parallel.
parallel.defaultClusterProfile Examine or set default cluster profile Syntax p = parallel.defaultClusterProfile oldprofile = parallel.defaultClusterProfile(newprofile) Description p = parallel.defaultClusterProfile returns the name of the default cluster profile. oldprofile = parallel.defaultClusterProfile(newprofile) sets the default profile to be newprofile and returns the previous default profile.
parallel.defaultClusterProfile oldDefault = parallel.defaultClusterProfile('Profile2'); strcmp(oldDefault,'MyProfile') % returns true See Also parallel.clusterProfiles | parallel.
parallel.exportProfile Export one or more profiles to file Syntax parallel.exportProfile(profileName, filename) parallel.exportProfile({profileName1, profileName2,..., profileNameN}, filename) Description parallel.exportProfile(profileName, filename) exports the profile with the name profileName to the specified filename. The extension .settings is appended to the filename, unless it is already there. parallel.exportProfile({profileName1, profileName2,...
parallel.exportProfile notLocal = ~strcmp(allProfiles,'local'); profilesToExport = allProfiles(notLocal); if ~isempty(profilesToExport) parallel.exportProfile(profilesToExport,'AllProfiles'); end See Also parallel.clusterProfiles | parallel.
parallel.gpu.CUDAKernel Create GPU CUDA kernel object from PTX and CU code Syntax
KERN = parallel.gpu.CUDAKernel(PTXFILE, CPROTO)
KERN = parallel.gpu.CUDAKernel(PTXFILE, CPROTO, FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE, CUFILE)
KERN = parallel.gpu.CUDAKernel(PTXFILE, CUFILE, FUNC)
Description KERN = parallel.gpu.CUDAKernel(PTXFILE, CPROTO) and KERN = parallel.gpu.
parallel.gpu.CUDAKernel int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx < vecLen) { pi[idx] += c; } and simpleEx.ptx contains the PTX resulting from compiling simpleEx.cu into PTX, both of the following statements return a kernel object that you can use to call the addToVector CUDA kernel. kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ... 'simpleEx.cu'); kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
parallel.importProfile Import cluster profiles from file Syntax prof = parallel.importProfile(filename) Description prof = parallel.importProfile(filename) imports the profiles stored in the specified file and returns the names of the imported profiles. If filename has no extension, .settings is assumed; configuration files must be specified with the .mat extension. Configuration .mat files contain only one profile, but profile .
parallel.importProfile Import all the profiles from the file ManyProfiles.settings, and use the first one to open a parallel pool. profs = parallel.importProfile('ManyProfiles'); parpool(profs{1}) Import a configuration from the file OldConfiguration.mat, and set it as the default parallel profile. old_conf = parallel.importProfile('OldConfiguration.mat') parallel.defaultClusterProfile(old_conf) See Also parallel.clusterProfiles | parallel.defaultClusterProfile | parallel.
parcluster Create cluster object Syntax c = parcluster c = parcluster(profile) Description c = parcluster returns a cluster object representing the cluster identified by the default cluster profile, with the cluster object properties set to the values defined in that profile. c = parcluster(profile) returns a cluster object representing the cluster identified by the specified cluster profile, with the cluster object properties set to the values defined in that profile.
parcluster parpool(myCluster); Find a particular cluster using the profile named 'MyProfile', and create an independent job on the cluster. myCluster = parcluster('MyProfile'); j = createJob(myCluster); See Also createJob | parallel.clusterProfiles | parallel.
parfeval Execute function asynchronously on parallel pool worker Syntax F = parfeval(p,fcn,numout,in1,in2,...) F = parfeval(fcn,numout,in1,in2,...) Description F = parfeval(p,fcn,numout,in1,in2,...) requests asynchronous execution of the function fcn on a worker contained in the parallel pool p, expecting numout output arguments and supplying as input arguments in1,in2,.... The asynchronous evaluation of fcn does not block MATLAB. F is a parallel.
parfeval for idx = 1:10 f(idx) = parfeval(p,@magic,1,idx); % Square size determined by idx end % Collect the results as they become available. magicResults = cell(1,10); for idx = 1:10 % fetchNext blocks until next results are available. [completedIdx,value] = fetchNext(f); magicResults{completedIdx} = value; fprintf('Got result with index: %d.
parfevalOnAll Execute function asynchronously on all workers in parallel pool Syntax F = parfevalOnAll(p,fcn,numout,in1,in2,...) F = parfevalOnAll(fcn,numout,in1,in2,...) Description F = parfevalOnAll(p,fcn,numout,in1,in2,...) requests the asynchronous execution of the function fcn on all workers in the parallel pool p, expecting numout output arguments from each worker and supplying input arguments in1,in2,... to each worker. F is a parallel.
parfor Execute loop iterations in parallel Syntax parfor loopvar = initval:endval, statements, end parfor (loopvar = initval:endval, M), statements, end Description parfor loopvar = initval:endval, statements, end allows you to write a loop for a statement or block of code that executes in parallel on a cluster of workers, which are identified and reserved with the parpool command.
than that number, even if additional workers are available. If you request more resources than are available, MATLAB uses the maximum number available at the time of the call. If the parfor-loop cannot run on workers in a parallel pool (for example, if no workers are available or M is 0), MATLAB executes the loop on the client in a serial manner. In this situation, the parfor semantics are preserved in that the loop iterations can execute in any order.
parfor Notably, the assignments to the variables i, t, and u do not affect variables with the same name in the context of the parfor statement. The rationale is that the body of the parfor is executed in parallel for all values of i, and there is no deterministic way to say what the “final” values of these variables are. Thus, parfor is defined to leave these variables unaffected in the context of the parfor statement.
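The variable behavior described above can be illustrated with a small sketch: sliced outputs are collected across iterations, reductions are combined, and same-named variables in the enclosing context are left unaffected:

```matlab
% Sketch of parfor variable classes (illustrative values).
r = 0;                 % reduction variable
a = zeros(1,10);       % sliced output variable
t = -1;                % same-named temporary exists in the client context
parfor i = 1:10
    t = i^2;           % temporary: recreated in each iteration
    a(i) = t;          % sliced: one element assigned per iteration
    r = r + i;         % reduction: accumulated across iterations
end
% After the loop: a(i) == i^2 and r == 55; the loop's assignments to
% i and t do not affect the client-context variables of the same name.
```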
are necessary for its execution, then automatically attaches those files to the parallel pool so that the code is available to the workers.
parpool Create parallel pool on cluster Syntax parpool parpool(poolsize) parpool(profilename) parpool(profilename,poolsize) parpool(cluster) parpool(cluster,poolsize) parpool( ___ ,Name,Value) poolobj = parpool( ___ ) Description parpool enables the full functionality of the parallel language features (parfor and spmd) in MATLAB by creating a special job on a pool of workers, and connecting the MATLAB client to the parallel pool.
parpool( ___ ,Name,Value) applies the specified values for certain properties when starting the pool. poolobj = parpool( ___ ) returns a parallel.Pool object to the client workspace representing the pool on the cluster. You can use the pool object to programmatically delete the pool or to access its properties. Examples Create Pool from Default Profile Start a parallel pool using the default profile to define the number of workers.
parpool Return Pool Object and Delete Pool Create a parallel pool with the default profile, and later delete the pool. poolobj = parpool; delete(poolobj) Determine Size of Current Pool Find the number of workers in the current parallel pool. poolobj = gcp('nocreate'); % If no pool, do not create new one. if isempty(poolobj) poolsize = 0; else poolsize = poolobj.
Example: c = parcluster(); Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'AttachedFiles',{'myFun.
parpool More About Tips • The pool status indicator in the lower-left corner of the desktop shows the client session connection to the pool and the pool status. Click the icon for a menu of supported pool actions. With a pool running: With no pool running: • If you set your parallel preferences to automatically create a parallel pool when necessary, you do not need to explicitly call the parpool command.
This slight difference in behavior might be an issue in a mixed-platform environment where the client is not the same platform as the workers, where folders local to or mapped from the client are not available in the same way to the workers, or where folders are in a nonshared file system.
parpool • “Parallel Preferences” • “Clusters and Cluster Profiles” • “Pass Data to and from Worker Sessions” See Also Composite | delete | distributed | gcp | parallel.
pause Pause MATLAB job scheduler queue Syntax pause(mjs) Arguments mjs MATLAB job scheduler object whose queue is paused. Description pause(mjs) pauses the MATLAB job scheduler’s queue so that jobs waiting in the queued state will not run. Jobs that are already running also pause, after completion of tasks that are already running. No further jobs or tasks will run until the resume function is called for the MJS.
pctconfig Configure settings for Parallel Computing Toolbox client session Syntax pctconfig('p1', v1, ...) config = pctconfig('p1', v1, ...) config = pctconfig() Arguments p1 Property to configure. Supported properties are 'portrange', 'hostname'. v1 Value for corresponding property. config Structure of configuration values. Description pctconfig('p1', v1, ...) sets the client configuration property p1 with the value v1.
11 Functions — Alphabetical List If the property is 'hostname', the specified value is used to set the hostname for the client session of Parallel Computing Toolbox software. This is useful when the client computer is known by more than one hostname. The value you should use is the hostname by which the cluster nodes can contact the client computer. The toolbox supports both short hostnames and fully qualified domain names. config = pctconfig('p1', v1, ...) returns a structure to config.
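For illustration, a typical session might combine both forms. The port range and hostname values below are placeholders, not recommendations:

```matlab
% Restrict the client to a firewall-friendly port range and declare the
% hostname that cluster nodes should use to reach this machine.
% (Values are illustrative.)
pctconfig('portrange', [30000 30100], 'hostname', 'myclient.example.com');

% Query the resulting configuration as a structure.
cfg = pctconfig();
disp(cfg.hostname)
```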
pctRunDeployedCleanup

Clean up after deployed parallel applications

Syntax

pctRunDeployedCleanup

Description

pctRunDeployedCleanup performs necessary cleanup so that the client JVM can properly terminate when the deployed application exits. All deployed applications that use Parallel Computing Toolbox functionality need to call pctRunDeployedCleanup after the last call to Parallel Computing Toolbox functionality.
pctRunOnAll

Run command on client and all workers in parallel pool

Syntax

pctRunOnAll command

Description

pctRunOnAll command runs the specified command on all the workers of the parallel pool as well as the client, and prints any command-line output back to the client Command Window. The specified command runs in the base workspace of the workers and does not have any return variables.

See Also
parpool
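A couple of typical uses, assuming a pool is already running:

```matlab
% Clear the base workspace on the client and on every pool worker.
pctRunOnAll clear all

% Turn off a specific warning everywhere, so workers and client behave alike.
pctRunOnAll warning('off', 'MATLAB:rankDeficientMatrix')
```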
pload

Load file into parallel session

Syntax

pload(fileroot)

Arguments

fileroot    Part of the filename common to all saved files being loaded.

Description

pload(fileroot) loads the data from the files named [fileroot num2str(labindex)] into the workers running a communicating job. The files should have been created by the psave command. The number of workers should be the same as the number of files. The files should be accessible to all the workers.

This creates three files (threeThings1.mat, threeThings2.mat, threeThings3.mat) in the current working directory.

Clear the workspace on all the workers and confirm there are no variables.

clear all
whos

Load the previously saved data into the workers. Confirm its presence.
pmode

Interactive Parallel Command Window

Syntax

pmode start
pmode start numworkers
pmode start prof numworkers
pmode quit
pmode exit
pmode client2lab clientvar workers workervar
pmode lab2client workervar worker clientvar
pmode cleanup prof

Description

pmode allows the interactive parallel execution of MATLAB commands. pmode achieves this by defining and submitting a communicating job, and opening a Parallel Command Window connected to the workers running the job.
pmode quit or pmode exit stops the pmode job, deletes it, and closes the Parallel Command Window. You can enter this command at the MATLAB prompt or the pmode prompt.

pmode client2lab clientvar workers workervar copies the variable clientvar from the MATLAB client to the variable workervar on the workers identified by workers. If workervar is omitted, the copy is named clientvar. workers can be either a single index or a vector of indices.
pmode start local 4

Start pmode using the profile myProfile and eight workers on the cluster.

pmode start myProfile 8

Execute a command on all workers.

P>> x = 2*labindex;

Copy the variable x from worker 7 to the MATLAB client.

pmode lab2client x 7

Copy the variable y from the MATLAB client to workers 1 through 8.

pmode client2lab y 1:8

Display the current working directory of each worker.

P>> pwd

See Also
createCommunicatingJob | parallel.
poolStartup

File for user-defined options to run on each worker when parallel pool starts

Syntax

poolStartup

Description

poolStartup runs automatically on a worker each time the worker forms part of a parallel pool. You do not call this function from the client session, nor explicitly as part of a task function. You add MATLAB code to the poolStartup.m file to define pool initialization on the worker. The worker looks for poolStartup.
See Also
jobStartup | taskFinish | taskStartup
promote

Promote job in MJS cluster queue

Syntax

promote(c,job)

Arguments

c      The MJS cluster object that contains the job.
job    Job object promoted in the queue.

Description

promote(c,job) promotes the job object job, which is queued in the MJS cluster c. If job is not the first job in the queue, promote exchanges the position of job and the previous job.

Examine the new queue sequence:

[pjobs, qjobs, rjobs, fjobs] = findJob(c);
get(qjobs,'Name')

    'Job A'
    'Job C'
    'Job B'

More About

Tips

After a call to promote or demote, there is no change in the order of job objects contained in the Jobs property of the MJS cluster object. To see the scheduled order of execution for jobs in the queue, use the findJob function in the form [pending queued running finished] = findJob(c).
psave

Save data from communicating job session

Syntax

psave(fileroot)

Arguments

fileroot    Part of the filename common to all saved files.

Description

psave(fileroot) saves the data from the workers' workspaces into the files named [fileroot num2str(labindex)]. The files can be loaded by using the pload command with the same fileroot, which should point to a folder accessible to all the workers.

Clear the workspace on all the workers and confirm there are no variables.

clear all
whos

Load the previously saved data into the workers. Confirm its presence.
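The full round trip implied by this example, entered at the pmode prompt, might look as follows; the fileroot 'threeThings' and the variables are taken from the surrounding example:

```matlab
P>> clear all
P>> x = labindex; y = magic(4); z = rand(3);
P>> psave('threeThings')     % worker N writes threeThingsN.mat
P>> clear all
P>> whos                     % nothing listed
P>> pload('threeThings')     % worker N reloads threeThingsN.mat
P>> whos                     % x, y, and z are back on each worker
```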
rand

Array of rand values

Syntax

R = rand(sz,arraytype)
R = rand(sz,datatype,arraytype)
R = rand(sz,'like',P)
R = rand(sz,datatype,'like',P)
C = rand(sz,codist)
C = rand(sz,datatype,codist)
C = rand(sz, ___ ,codist,'noCommunication')
C = rand(sz, ___ ,codist,'like',P)

Description

R = rand(sz,arraytype) creates a matrix with underlying class of double, with rand values in all elements.

Argument    Values                         Descriptions
arraytype   'gpuArray'                     Specifies gpuArray.
datatype    'double' (default), 'single'   Specifies underlying class of the array, i.e., the data type of its elements.

R = rand(sz,'like',P) creates an array of rand values with the same type and underlying class (data type) as array P.

R = rand(sz,datatype,'like',P) creates an array of rand values with the specified underlying class (datatype), and the same type as array P.

Create Codistributed Rand Matrix

Create a 1000-by-1000 codistributed double matrix of rands, distributed by its second dimension (columns).

spmd(4)
    C = rand(1000,'codistributed');
end

With four workers, each worker contains a 1000-by-250 local piece of C.

Create a 1000-by-1000 codistributed single matrix of rands, distributed by its columns.

spmd(4)
    codist = codistributor('1d',2,100*[1:numlabs]);
    C = rand(1000,1000,'single',codist);
end

Each worker contains a 1000-by-(100*labindex) local piece of C.
randi

Array of random integers

Syntax

R = randi(valrange,sz,arraytype)
R = randi(valrange,sz,datatype,arraytype)
R = randi(valrange,sz,'like',P)
R = randi(valrange,sz,datatype,'like',P)
C = randi(valrange,sz,codist)
C = randi(valrange,sz,datatype,codist)
C = randi(valrange,sz, ___ ,codist,'noCommunication')
C = randi(valrange,sz, ___ ,codist,'like',P)

Description

R = randi(valrange,sz,arraytype) creates a matrix with underlying class of double, with randi integer values in all elements.

Argument    Values                                  Descriptions
arraytype   'codistributed'                         Specifies codistributed array, using the default distribution scheme.
            'gpuArray'                              Specifies gpuArray.
datatype    'double' (default), 'single', 'int8',   Specifies underlying class of the array, i.e., the data type of its elements.
            'uint8', 'int16', 'uint16', 'int32',
            'uint32', 'int64', or 'uint64'

R = randi(valrange,sz,'like',P) creates an array of randi values with the same type and underlying class (data type) as array P.

Examples

Create Distributed Randi Matrix

Create a 1000-by-1000 distributed array of randi values from 1 to 100, with underlying class double:

D = randi(100,1000,'distributed');

Create Codistributed Randi Matrix

Create a 1000-by-1000 codistributed double matrix of randi values from 0 to 12, distributed by its second dimension (columns).

spmd(4)
    C = randi([0 12],1000,'codistributed');
end

With four workers, each worker contains a 1000-by-250 local piece of C.
randn

Array of randn values

Syntax

R = randn(sz,arraytype)
R = randn(sz,datatype,arraytype)
R = randn(sz,'like',P)
R = randn(sz,datatype,'like',P)
C = randn(sz,codist)
C = randn(sz,datatype,codist)
C = randn(sz, ___ ,codist,'noCommunication')
C = randn(sz, ___ ,codist,'like',P)

Description

R = randn(sz,arraytype) creates a matrix with underlying class of double, with randn values in all elements.

Argument    Values                         Descriptions
arraytype   'gpuArray'                     Specifies gpuArray.
datatype    'double' (default), 'single'   Specifies underlying class of the array, i.e., the data type of its elements.

R = randn(sz,'like',P) creates an array of randn values with the same type and underlying class (data type) as array P.

R = randn(sz,datatype,'like',P) creates an array of randn values with the specified underlying class (datatype), and the same type as array P.

Create Codistributed Randn Matrix

Create a 1000-by-1000 codistributed double matrix of randn values, distributed by its second dimension (columns).

spmd(4)
    C = randn(1000,'codistributed');
end

With four workers, each worker contains a 1000-by-250 local piece of C.

Create a 1000-by-1000 codistributed single matrix of randn values, distributed by its columns.
recreate

Create new job from existing job

Syntax

newjob = recreate(oldjob)
newjob = recreate(oldjob,'TaskID',ids)

Arguments

newjob      New job object.
oldjob      Original job object to be duplicated.
'TaskID'    Option to include only some tasks.
ids         Vector of integers specifying task IDs.

Description

newjob = recreate(oldjob) creates a new job object based on an existing job, containing the same tasks and settable properties as oldjob.

Recreate a Job with Specified Tasks

This example shows how to recreate an independent job, which has only the tasks with IDs 21 to 32 from the job oldIndependentJob.

newJob = recreate(oldIndependentJob,'TaskID',[21:32]);

Recreate Jobs of a Specific User

This example shows how to find and recreate all failed jobs submitted by user Mary. Assume the default cluster is the one Mary had submitted her jobs to.
redistribute

Redistribute codistributed array with another distribution scheme

Syntax

D2 = redistribute(D1, codist)

Description

D2 = redistribute(D1, codist) redistributes a codistributed array D1 and returns D2 using the distribution scheme defined by the codistributor object codist.

Examples

Redistribute an array according to the distribution scheme of another array.
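A sketch of such a redistribution, assuming a pool of four workers (the matrices and partition are illustrative):

```matlab
spmd(4)
    % An array distributed by columns with an uneven partition.
    M = codistributed(magic(10), codistributor1d(2, [1 2 3 4]));
    % Another array using the default distribution scheme.
    P = codistributed(pascal(10));
    % Give P the same distribution scheme as M.
    R = redistribute(P, getCodistributor(M));
end
```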
reset

Reset GPU device and clear its memory

Syntax

reset(gpudev)

Description

reset(gpudev) resets the GPU device and clears its memory of gpuArray and CUDAKernel data. The GPU device identified by gpudev remains the selected device, but all gpuArray and CUDAKernel objects in MATLAB representing data on that device are invalid.

M    % Display gpuArray

    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1

reset(g);
g    % Show that the device is still selected

g =

  CUDADevice with properties:
    Name, Index, ComputeCapability, SupportsDouble, DriverVersion,
    ToolkitVersion, MaxThreadsPerBlock, MaxShmemPerBlock,
    MaxThreadBlockSize, MaxGridSize, SIMDWidth, TotalMemory,
    AvailableMemory, MultiprocessorCount, ClockRateKHz, ComputeMode,
    GPUOverlapsTransfers, KernelExecutionTimeout, CanMapHostMemory,
    DeviceSupported, DeviceSelected

clear M

See Also
gpuDevice | gpuArray | parallel.gpu.
resume

Resume processing queue in MATLAB job scheduler

Syntax

resume(mjs)

Arguments

mjs    MATLAB job scheduler object whose queue is resumed.

Description

resume(mjs) resumes processing of the specified MATLAB job scheduler's queue so that jobs waiting in the queued state will be run. This call does nothing if the MJS is not paused.
saveAsProfile

Save cluster properties to specified profile

Description

saveAsProfile(cluster,profileName) saves the properties of the cluster object to the specified profile, and updates the cluster Profile property value to indicate the new profile name.

Examples

Create a cluster, then modify a property and save the properties to a new profile.

myCluster = parcluster('local');
myCluster.
saveProfile

Save modified cluster properties to its current profile

Description

saveProfile(cluster) saves the modified properties on the cluster object to the profile specified by the cluster's Profile property, and sets the Modified property to false. If the cluster's Profile property is empty, an error is thrown.

Examples

Create a cluster, then modify a property and save the change to the profile.

    Properties:
       Profile: local
      Modified: false
          Host: HOSTNAME
    NumWorkers: 3

After saving, the local profile now matches the current property settings, so the myCluster.Modified property is false.
setConstantMemory

Set some constant memory on GPU

Syntax

setConstantMemory(kern,sym,val)
setConstantMemory(kern,sym1,val1,sym2,val2,...)

Description

setConstantMemory(kern,sym,val) sets the constant memory in the CUDA kernel kern with symbol name sym to contain the data in val. val can be any numeric array, including a gpuArray. The command errors if the named symbol does not exist or if it is not big enough to contain the specified data.

setConstantMemory(KERN,'N1',int32(10));
setConstantMemory(KERN,'N2',int32(10));
setConstantMemory(KERN,'CONST_DATA',1:10);

or

setConstantMemory(KERN,'N1',int32(10),'N2',int32(10),'CONST_DATA',1:10);

See Also
gpuArray | parallel.gpu.
setJobClusterData

Set specific user data for job on generic cluster

Syntax

setJobClusterData(cluster,job,userdata)

Arguments

cluster     Cluster object identifying the generic third-party cluster running the job.
job         Job object identifying the job for which to store data.
userdata    Information to store for this job.

Description

setJobClusterData(cluster,job,userdata) stores data for the job job that is running on the generic cluster cluster.
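In a generic scheduler integration, this is typically called from the submit function to record the external scheduler's own job identifier, so that later callbacks can look the job up. The field name and ID value below are illustrative:

```matlab
% Inside a generic cluster's submit function (sketch):
schedulerID = '12345';                         % illustrative external job ID
userdata = struct('SchedulerJobID', schedulerID);
setJobClusterData(cluster, job, userdata);

% Later, e.g. in the cluster's GetJobStateFcn, retrieve the stored data:
data = getJobClusterData(cluster, job);
id = data.SchedulerJobID;
```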
size

Size of object array

Syntax

d = size(obj)
[m,n] = size(obj)
[m1,m2,m3,...,mn] = size(obj)
m = size(obj,dim)

Arguments

obj               An object or an array of objects.
dim               The dimension of obj.
d                 The number of rows and columns in obj.
m                 The number of rows in obj, or the length of the dimension specified by dim.
n                 The number of columns in obj.
m1,m2,m3,...,mn   The lengths of the first n dimensions of obj.

See Also
length
sparse

Create sparse distributed or codistributed matrix

Syntax

SD = sparse(FD)
SC = sparse(m,n,codist)
SC = sparse(m,n,codist,'noCommunication')
SC = sparse(i,j,v,m,n,nzmax)
SC = sparse(i,j,v,m,n)
SC = sparse(i,j,v)

Description

SD = sparse(FD) converts a full distributed or codistributed array FD to a sparse distributed or codistributed (respectively) array SD.

To simplify this six-argument call, you can pass scalars for the argument v and one of the arguments i or j, in which case they are expanded so that i, j, and v all have the same length.

SC = sparse(i,j,v,m,n) uses nzmax = max([length(i) length(j)]).

SC = sparse(i,j,v) uses m = max(i) and n = max(j). The maxima are computed before any zeros in v are removed, so one of the rows of [i j v] might be [m n 0], assuring the matrix size satisfies the requirements of m and n.

Create a sparse codistributed array from vectors of indices and a distributed array of element values:

r = [ 1 1 4 4 8];
c = [ 1 4 1 4 8];
v = [10 20 30 40 0];
V = distributed(v);
spmd
    SC = sparse(r,c,V);
end

In this example, even though the fifth element of the value array v is 0, the size of the result is an 8-by-8 matrix because of the corresponding maximum indices in r and c.
spmd

Execute code in parallel on workers of parallel pool

Syntax

spmd, statements, end
spmd(n), statements, end
spmd(m,n), statements, end

Description

The general form of an spmd (single program, multiple data) statement is:

spmd
    statements
end

spmd, statements, end defines an spmd statement on a single line. MATLAB executes the spmd body denoted by statements on several MATLAB workers simultaneously.

By default, MATLAB uses as many workers as it finds available in the pool. When there are no MATLAB workers available, MATLAB executes the block body locally and creates Composite objects as necessary.

spmd(n), statements, end uses n to specify the exact number of MATLAB workers to evaluate statements, provided that n workers are available from the parallel pool. If there are not enough workers available, an error is thrown.

• If the AutoAttachFiles property in the cluster profile for the parallel pool is set to true, MATLAB performs an analysis on an spmd block to determine what code files are necessary for its execution, then automatically attaches those files to the parallel pool job so that the code is available to the workers.
• For information about restrictions and limitations when using spmd, see "Limitations" on page 3-13.
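A minimal sketch of an spmd block, assuming a pool of at least three workers:

```matlab
spmd(3)
    % Each worker computes a different-sized magic square.
    q = magic(labindex + 2);
end
% On the client, q is a Composite; index it to fetch per-worker values.
q{3}    % the 5-by-5 magic square computed on worker 3
```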
submit

Queue job in scheduler

Syntax

submit(j)

Arguments

j    Job object to be queued.

Description

submit(j) queues the job object j in its cluster queue. The cluster used for this job was determined when the job was created.

Examples

Find the MJS cluster identified by the cluster profile Profile1.

c1 = parcluster('Profile1');

Create a job object in this cluster.

j1 = createJob(c1);

Add a task object to be evaluated for the job.

More About

Tips

When a job is submitted to a cluster queue, the job's State property is set to queued, and the job is added to the list of jobs waiting to be executed. The jobs in the waiting list are executed in a first-in, first-out manner; that is, in the order in which they were submitted, except when the sequence is altered by promote, demote, cancel, or delete.
subsasgn

Subscripted assignment for Composite

Syntax

C(i) = {B}
C(1:end) = {B}
C([i1, i2]) = {B1, B2}
C{i} = B

Description

subsasgn assigns remote values to Composite objects. The values reside on the workers in the current parallel pool.

C(i) = {B} sets the entry of C on worker i to the value B.

C(1:end) = {B} sets all entries of C to the value B.

C([i1, i2]) = {B1, B2} assigns different values on workers i1 and i2.

C{i} = B sets the entry of C on worker i to the value B.

subsref

Subscripted reference for Composite

Syntax

B = C(i)
B = C([i1, i2, ...])
B = C{i}
[B1, B2, ...] = C{[i1, i2, ...]}

Description

subsref retrieves remote values of a Composite object from the workers in the current parallel pool.

B = C(i) returns the entry of Composite C from worker i as a cell array.

B = C([i1, i2, ...]) returns multiple entries as a cell array.

B = C{i} returns the value of Composite C from worker i as a single entry.

[B1, B2, ...] = C{[i1, i2, ...]} returns multiple values from the workers with the specified indices.
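The assignment and reference forms work together; a short sketch, assuming a running parallel pool:

```matlab
c = Composite();        % one entry per worker in the current pool
c(1:end) = {0};         % subsasgn: set every entry to 0
c{1} = 2*pi;            % subsasgn: give worker 1 a different value
b = c(1);               % subsref: cell array containing worker 1's value
v = c{1};               % subsref: the value itself
```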
taskFinish

User-defined options to run on worker when task finishes

Syntax

taskFinish(task)

Arguments

task    The task being evaluated by the worker.

Description

taskFinish(task) runs automatically on a worker each time the worker finishes evaluating a task for a particular job. You do not call this function from the client session, nor explicitly as part of a task function. You add MATLAB code to the taskFinish.m file to define anything you want executed on the worker when a task is finished.

taskStartup

User-defined options to run on worker when task starts

Syntax

taskStartup(task)

Arguments

task    The task being evaluated by the worker.

Description

taskStartup(task) runs automatically on a worker each time the worker evaluates a task for a particular job. You do not call this function from the client session, nor explicitly as part of a task function. You add MATLAB code to the taskStartup.m file to define task initialization on the worker.
true

Array of logical 1 (true)

Syntax

T = true(sz,arraytype)
T = true(sz,'like',P)
C = true(sz,codist)
C = true(sz, ___ ,codist,'noCommunication')
C = true(sz, ___ ,codist,'like',P)

Description

T = true(sz,arraytype) creates a matrix with true values in all elements. The size and type of array are specified by the argument options according to the following table.

Argument    Values    Descriptions
sz          n         Specifies size as an n-by-n matrix.

reference pages for codistributor1d and codistributor2dbc. To use the default distribution scheme, you can specify a codistributor constructor without arguments. For example:

spmd
    C = true(8,codistributor1d());
end

C = true(sz, ___ ,codist,'noCommunication') specifies that no interworker communication is to be performed when constructing a codistributed array, skipping some error checking steps.

Each worker contains a 100-by-labindex local piece of C.
updateAttachedFiles

Update attached files or folders on parallel pool

Syntax

updateAttachedFiles(poolobj)

Description

updateAttachedFiles(poolobj) checks all the attached files of the specified parallel pool to see if they have changed, and replicates any changes to each of the workers in the pool. This checks files that were attached (by a profile or parpool argument) when the pool was started, and those subsequently attached with the addAttachedFiles command.

See Also
addAttachedFiles | gcp | listAutoAttachedFiles | parpool
wait

Wait for job to change state

Syntax

wait(j)
wait(j,'state')
wait(j,'state',timeout)

Arguments

j          Job object whose change in state to wait for.
'state'    Value of the job object's State property to wait for.
timeout    Maximum time to wait, in seconds.

Description

wait(j) blocks execution in the client session until the job identified by the object j reaches the 'finished' state or fails. This occurs when all the job's tasks are finished processing on the workers.

Note: Simulink models cannot run while a MATLAB session is blocked by wait. If you must run Simulink from the MATLAB client while also running jobs, you cannot use wait.

Examples

Submit a job to the queue, and wait for it to finish running before retrieving its results.

submit(j);
wait(j,'finished')
results = fetchOutputs(j)

Submit a batch job and wait for it to finish before retrieving its variables.
wait (FevalFuture)

Wait for futures to complete

Syntax

OK = wait(F)
OK = wait(F,STATE)
OK = wait(F,STATE,TIMEOUT)

Description

OK = wait(F) blocks execution until each element of the array of futures F has reached the 'finished' state. OK is true if the wait completed successfully, and false if any of the futures was cancelled or failed execution.

OK = wait(F,STATE) blocks execution until the array of futures F has reached the state STATE.
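For example, to block until a batch of parfeval futures finishes (function and timeout here are illustrative):

```matlab
p = gcp();
for k = 1:4
    F(k) = parfeval(p, @sin, 1, k);     % four asynchronous evaluations
end
ok = wait(F, 'finished', 60);           % true if all finish within 60 s
```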
wait (GPUDevice)

Wait for GPU calculation to complete

Syntax

wait(gpudev)

Description

wait(gpudev) blocks execution in MATLAB until the GPU device identified by the GPUDevice object gpudev completes its calculations. This can be used before calls to toc when timing GPU code that does not gather results back to the workspace. When gathering results from a GPU, MATLAB automatically waits until all GPU calculations are complete, so you do not need to explicitly call wait in that situation.
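A typical timing pattern, with the matrix size chosen only for illustration:

```matlab
g = gpuDevice();
A = rand(4000, 'gpuArray');
tic
B = A * A;        % runs asynchronously on the GPU
wait(g)           % block until the multiply completes
t = toc;          % t now reflects the full GPU computation time
```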
zeros

Array of zeros

Syntax

Z = zeros(sz,arraytype)
Z = zeros(sz,datatype,arraytype)
Z = zeros(sz,'like',P)
Z = zeros(sz,datatype,'like',P)
C = zeros(sz,codist)
C = zeros(sz,datatype,codist)
C = zeros(sz, ___ ,codist,'noCommunication')
C = zeros(sz, ___ ,codist,'like',P)

Description

Z = zeros(sz,arraytype) creates a matrix with underlying class of double, with zeros in all elements.

Argument    Values                                  Descriptions
arraytype   'gpuArray'                              Specifies gpuArray.
datatype    'double' (default), 'single', 'int8',   Specifies underlying class of the array, i.e., the data type of its elements.
            'uint8', 'int16', 'uint16', 'int32',
            'uint32', 'int64', or 'uint64'

Z = zeros(sz,'like',P) creates an array of zeros with the same type and underlying class (data type) as array P.

Examples

Create Distributed Zeros Matrix

Create a 1000-by-1000 distributed array of zeros with underlying class double:

D = zeros(1000,'distributed');

Create Codistributed Zeros Matrix

Create a 1000-by-1000 codistributed double matrix of zeros, distributed by its second dimension (columns).

spmd(4)
    C = zeros(1000,'codistributed');
end

With four workers, each worker contains a 1000-by-250 local piece of C.
Glossary

CHECKPOINTBASE
    The name of the parameter in the mdce_def file that defines the location of the checkpoint directories for the MATLAB job scheduler and workers.

checkpoint directory
    See CHECKPOINTBASE.

client
    The MATLAB session that defines and submits the job. This is the MATLAB session in which the programmer usually develops and prototypes applications. Also known as the MATLAB client.

client computer
    The computer running the MATLAB client; often your desktop.

distributed application
    The same application that runs independently on several nodes, possibly with different input parameters. There is no communication, shared data, or synchronization points between the nodes, so they are generally considered to be coarse-grained.

distributed array
    An array partitioned into segments, with each segment residing in the workspace of a different worker.

homogeneous cluster
    A cluster of identical machines, in terms of both hardware and software.

independent job
    A job composed of independent tasks, which do not communicate with each other during evaluation. Tasks do not need to run simultaneously.

job
    The complete large-scale operation to perform in MATLAB, composed of a set of tasks.

job scheduler checkpoint information
    Snapshot of information necessary for the MATLAB job scheduler to recover from a system crash or reboot.

node
    A computer that is part of a cluster.

parallel application
    The same application that runs on several workers simultaneously, with communication, shared data, or synchronization points between the workers.

parallel pool
    A collection of workers that are reserved by the client and running a special communicating job for execution of parfor-loops, spmd statements, and distributed arrays.

private array
    An array that resides in the workspaces of one or more, but perhaps not all, workers.

worker checkpoint information
    Files required by the worker during the execution of tasks.