Basic System Problem Analysis - August 2003

Case Study: SA663 continued
Finally, we can format the GUFD for file $d and see that it agrees with what was found:
the bad status was $fc0e008f.
There are some other interesting things to be seen; the GDPD_PTR is zero. This should
be a pointer to the last GDPD in a linked list. Since the call to tm_unlink_gdpd failed we
could assume (and it would be correct!) that the failure was due to the fact that this value
is null. Something else seems to have either cleared the value or unlinked the GDPD
erroneously. It is also possible that the file being closed was never actually linked into the
list correctly in the first place.
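The reasoning above can be illustrated with a small model. This is only a sketch under stated assumptions: the names GUFD and GDPD_PTR come from the dump, but the class layout, the back link, and the functions link_gdpd and unlink_gdpd are hypothetical and are not the MPE/iX implementation. It shows why a null list pointer makes an unlink fail, and why the dump alone cannot distinguish between the possible causes.

```python
# Hypothetical model only: field names mirror the dump (GUFD, GDPD_PTR),
# but the layout and logic are assumptions, not MPE/iX source code.
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class GDPD:
    name: str
    prev: Optional["GDPD"] = None  # assumed back link toward older entries


@dataclass
class GUFD:
    gdpd_ptr: Optional[GDPD] = None  # points at the LAST GDPD in the list


def link_gdpd(gufd: GUFD, gdpd: GDPD) -> None:
    """Append gdpd as the new last entry of the list."""
    gdpd.prev = gufd.gdpd_ptr
    gufd.gdpd_ptr = gdpd


def unlink_gdpd(gufd: GUFD, target: GDPD) -> bool:
    """Remove target from the list; return False if it cannot be found."""
    newer = None
    node = gufd.gdpd_ptr
    while node is not None:
        if node is target:
            if newer is None:
                gufd.gdpd_ptr = node.prev  # target was the last entry
            else:
                newer.prev = node.prev     # bypass target in the chain
            node.prev = None
            return True
        newer, node = node, node.prev
    return False  # null list pointer, or target is not on the list


# Scenario 1: something else clears the pointer after the file was linked.
cleared = GUFD()
file_a = GDPD("a")
link_gdpd(cleared, file_a)
cleared.gdpd_ptr = None
result_cleared = unlink_gdpd(cleared, file_a)

# Scenario 2: the entry was already unlinked once (erroneously).
twice = GUFD()
file_b = GDPD("b")
link_gdpd(twice, file_b)
unlink_gdpd(twice, file_b)
result_twice = unlink_gdpd(twice, file_b)

# Scenario 3: the entry was never linked in the first place.
never = GUFD()
file_c = GDPD("c")
result_never = unlink_gdpd(never, file_c)
```

In this model all three scenarios end the same way: the list pointer is null and the unlink reports failure, which is what the dump shows. The final state says nothing about which scenario actually occurred.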
That’s the problem with reading memory dumps: it’s relatively simple to find out what
happened. Figuring out why or how it happened is far more difficult!
One final bit of potentially interesting information is the fact that the GUFD field
STORE_ACTIVE is 1. That seems to imply that a STORE may be running. This is
another of those “how can you tell without the source code” problems.
The most direct way of confirming the suspicion would be to find out if STORE is
running. That can be done by setting a filter on the string “STORE” and using the
PM_PTREE macro (with no input PIN) to scan all processes in the dump:
$251 ($70) nmdat > env filter 'STORE'
$252 ($70) nmdat STORE> pm_ptree
$1d7 (STORE.PUB.SYS) #J2697
$21d (STORE.PUB.SYS) #J2697
It’s running, all right. So does that mean STORE did this? No, not necessarily, but it is
a data point.
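If the PM_PTREE output is saved to a text file, the matching processes can be tabulated for a failure log. The following is a minimal sketch, not part of DAT: the line shape is inferred only from the two output lines shown above, and the names PTREE_LINE and parse_ptree are hypothetical.

```python
import re

# Assumed line shape, inferred only from the two output lines in this case study:
#   $<hex PIN> (<program file>) #<job or session number>
PTREE_LINE = re.compile(r"^\$([0-9a-fA-F]+)\s+\(([^)]+)\)\s+(#[JS]\d+)\s*$")


def parse_ptree(text):
    """Return (pin, program, job) tuples for lines matching the assumed shape."""
    rows = []
    for line in text.splitlines():
        match = PTREE_LINE.match(line.strip())
        if match:
            pin = int(match.group(1), 16)  # PINs are displayed in hex
            rows.append((pin, match.group(2), match.group(3)))
    return rows


sample = """\
$1d7 (STORE.PUB.SYS) #J2697
$21d (STORE.PUB.SYS) #J2697
"""

rows = parse_ptree(sample)
jobs = sorted({job for _, _, job in rows})  # distinct jobs running the program
```

Here both processes belong to the same job, which suggests a single STORE job rather than two unrelated ones.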
If you are reporting this to the Response Center you can supply this information in your
initial contact with them. If there are internal reports that indicate there is a problem, the
support engineer may be able to recommend a patch right away. You can also check the
ITRC database to see if there are any documents reporting this too.
This information can also be recorded in a failure log for later reference.
Last but certainly not least, since the problem involves a file, it would be wise to schedule
a time to run FSCHECK. The problem itself is unlikely to be due to physical damage to
the file, but why take chances?