User manual

2 Network Administration
Locating Checkpoint Directories

Checkpoint directories contain information related to persistence data, which
the engine services use to create continuity from one instance of a session to
another. For example, if you stop and restart a job manager, the new session
will continue the old session, using all the same data.

A primary feature offered by the checkpoint directories is crash recovery.
This allows engine services to automatically resume their sessions after a
system goes down and comes back up, minimizing the loss of data. However, if
a MATLAB worker goes down during the evaluation of a task, that task is
neither reevaluated nor reassigned to another worker. In this case, a finished
job may not have a complete set of output data, as data from any unfinished
tasks might be missing.

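Because a finished job can still lack output from tasks that were interrupted, it can be worth checking each task before relying on the results. The following is a minimal Python sketch of that check, not the product's API: the `TaskRecord` type, its `state` labels, and the `find_incomplete_tasks` helper are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TaskRecord:
    """Hypothetical stand-in for one task of a finished job."""
    task_id: int
    state: str                    # assumed labels, e.g. "finished" or "running"
    output: Optional[Any] = None  # None means no output was recorded


def find_incomplete_tasks(tasks: List[TaskRecord]) -> List[int]:
    """Return IDs of tasks that did not finish or have no recorded output."""
    return [t.task_id for t in tasks if t.state != "finished" or t.output is None]


# Example: the worker evaluating task 2 went down mid-evaluation, so the
# task was neither reevaluated nor reassigned and has no output.
job_tasks = [
    TaskRecord(1, "finished", output=3.14),
    TaskRecord(2, "running"),
    TaskRecord(3, "finished", output=2.72),
]

missing = find_incomplete_tasks(job_tasks)
if missing:
    print(f"Job finished with missing output for tasks: {missing}")
```

In a real deployment, the task states and outputs would come from the job manager's own job and task objects. Tasks reported by a check like this would need to be resubmitted manually, since they are not rerun automatically.
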
Note: If a job manager crashes and restarts, its workers can take up to 2
minutes to reregister with it.