al45tair.net – Interesting OS X Crash Report Tidbits

Yesterday I spent some time looking at OS X crash logs; as anyone who has been working on the OS X platform as long as I have will have noticed, the precise format of the crash logs your applications generate is now somewhat different to the reports that were generated way back on OS X 10.0.

Indeed, Apple documents no fewer than six versions of the crash log format, and that doesn’t include the variants in use on iOS, the effects of running applications under Rosetta, or a couple of newer formats that have appeared more recently. Aside from version 1 crash logs, all of the formats include a field named “Report Version”; a more complete table of crash log versions would look like this:

Version	Platform
1	Mac OS X prior to 10.3.2
2	Mac OS X 10.3.2 through 10.3.9
3	Mac OS X 10.4.x on PowerPC
4	Mac OS X 10.4.x on Intel
5	Apparently never shipped
6	Mac OS X 10.5 through 10.7
7	Output from `sample` command line tool
10	Mac OS X 10.8 and later
11	Mac OS X 10.8 spin/hang report
101	iOS 1 (reported as OS X 1.x)
102	Unknown
103	iOS 2
104	iOS 3 and later
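Since every format other than version 1 carries a “Report Version” field, a tool that reads crash logs can use it to decide how to parse the rest. The following is only a sketch: the function name crash_report_version is made up for illustration, and it assumes the field sits at the start of a line as “Report Version:” followed by spaces or tabs and a decimal number.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the value of the "Report Version"
   field, 1 if the field is absent (version 1 logs don't have one),
   or -1 if the field is present but malformed. */
int crash_report_version(const char *log)
{
  static const char key[] = "Report Version:";
  const char *line = log;

  while (line && *line) {
    if (strncmp(line, key, sizeof(key) - 1) == 0) {
      const char *p = line + sizeof(key) - 1;
      long v;

      /* Skip the padding between the colon and the number */
      while (*p == ' ' || *p == '\t')
        ++p;
      if (*p < '0' || *p > '9')
        return -1;

      v = strtol(p, NULL, 10);
      if (v <= 0 || v > INT_MAX)
        return -1;
      return (int)v;
    }

    /* Move on to the start of the next line, if there is one */
    line = strchr(line, '\n');
    if (line)
      ++line;
  }

  return 1;
}
```

A caller would then look the result up in the table above, treating any value it doesn’t recognise as an unknown format rather than guessing.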

Interestingly, the symbolicatecrash script that is used with iOS crash reports doesn’t appear to know about report version 101, but does know about report version 102. In case you don’t already know, symbolicatecrash can be found at /Developer/Platforms/iPhoneOS.platform/Developer/Library/PrivateFrameworks/DTDeviceKit.framework/Resources/symbolicatecrash. (With Xcode 4, this is inside the application bundle, so tack /Applications/Xcode.app/Contents onto the front too.)

There’s little point going through all the changes between the different versions on Mac OS X, because, up to version 6 at least, they’re more than adequately documented in TN2123. Its iOS counterpart, TN2151, seems rather less useful, though it does at least include a list of exception codes that you might see.

iOS does differ a bit, though; it doesn’t include the “Command” field from Mac OS X, but it does include fields named “Incident Identifier” and “CrashReporter Key”, the former of which is a UUID and the latter of which is a 40-character hexadecimal number.
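If you want to tell an iOS log from a Mac OS X one by looking at these fields, a check along the following lines might do. It is a sketch only: the function name is hypothetical, and it assumes the caller has already extracted the field’s value, with surrounding whitespace removed, into a NUL-terminated string.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if value consists of exactly
   40 hexadecimal digits, which is the shape described above for
   the "CrashReporter Key" field, and 0 otherwise. */
int looks_like_crashreporter_key(const char *value)
{
  size_t i;

  for (i = 0; i < 40; ++i) {
    /* A shorter string fails here when we reach its terminator,
       so we never read past the end */
    if (!isxdigit((unsigned char)value[i]))
      return 0;
  }

  return value[40] == '\0';
}
```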

Additionally, if a thread has been assigned a name using pthread_setname_np(), on Mac OS X the backtrace for that thread will start with a line resembling the following:

Thread <number>[ Crashed]:: <name of thread>

while on iOS you’ll see two lines:

Thread <number> name:  <name of thread>
Thread <number>[ Crashed]:
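A parser that wants to cope with both styles could classify each line along these lines. This is a minimal sketch rather than a complete parser; the type and function names are invented for the example, and the 64-byte name buffer is an arbitrary choice (longer names are truncated).

```c
#include <stdio.h>
#include <string.h>

typedef enum {
  THREAD_LINE_NONE,    /* not a thread header at all */
  THREAD_LINE_HEADER,  /* "Thread N[ Crashed]:", maybe ":: name" */
  THREAD_LINE_NAME     /* iOS-style "Thread N name:  <name>" */
} thread_line_kind;

typedef struct {
  int  number;
  int  crashed;
  char name[64];
} thread_line;

/* Copy the rest of the line, minus leading spaces, into dst */
static void copy_name(const char *p, char *dst, size_t size)
{
  size_t len;

  while (*p == ' ')
    ++p;
  len = strcspn(p, "\r\n");
  if (len >= size)
    len = size - 1;
  memcpy(dst, p, len);
  dst[len] = '\0';
}

thread_line_kind parse_thread_line(const char *line, thread_line *out)
{
  int consumed = 0;
  const char *p;

  out->number = -1;
  out->crashed = 0;
  out->name[0] = '\0';

  if (sscanf(line, "Thread %d%n", &out->number, &consumed) != 1)
    return THREAD_LINE_NONE;
  p = line + consumed;

  /* iOS puts the name on a line of its own */
  if (strncmp(p, " name:", 6) == 0) {
    copy_name(p + 6, out->name, sizeof(out->name));
    return THREAD_LINE_NAME;
  }

  if (strncmp(p, " Crashed", 8) == 0) {
    out->crashed = 1;
    p += 8;
  }
  if (*p != ':')
    return THREAD_LINE_NONE;
  ++p;

  /* Mac OS X appends the name after a second colon */
  if (*p == ':')
    copy_name(p + 1, out->name, sizeof(out->name));

  return THREAD_LINE_HEADER;
}
```

On iOS the caller would need to remember the name from a THREAD_LINE_NAME line and attach it to the THREAD_LINE_HEADER line with the same thread number that follows.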

Obviously the thread state dump (with the registers) will differ from processor to processor, but there are additional differences in the format of the “Binary Images” section.

One other interesting feature (and it was this that got me interested yesterday) is that some crash reports now contain additional information labelled “Application Specific Information” or similar. This could be quite useful, but the mechanism by which it appears is completely undocumented…

Anyway, as a result of my investigations yesterday, it seems there are two different mechanisms for adding this to a crash report. On Mac OS X and iOS, there is a special symbol __crashreporter_info__ that you can define that lets you add to the “Application Specific Information” field; e.g.

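/* Non-static, so the compiler cannot discard a variable that this
   file only ever writes to */
const char *__crashreporter_info__ = 0;

/* Mach-O prefixes C symbols with an underscore, hence the three
   leading underscores at the assembler level */
asm(".desc ___crashreporter_info__, 0x10");

void crash(void)
{
  __crashreporter_info__ = "This crash is expected!";
  *(int *)4 = 8;  /* deliberately write to an invalid address */
}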

The asm statement is used to mark the __crashreporter_info__ field as “referenced dynamically”; this means it won’t get stripped and will be included in the resulting binary so the crash reporter can see it.

I don’t know exactly which version of Mac OS X this was added in, but it is the older of the two mechanisms and exists on iOS as well.

On newer versions of Mac OS X, you can also give extra information to the crash reporter via a special data structure:

/* crash_info_t is always 64-bit, even if you build 32-bit code,
   so we set the alignment of its members to 8 bytes to achieve
   the appropriate layout in both cases */
#define CRASH_ALIGN __attribute__((aligned(8)))

typedef struct {
  unsigned    version   CRASH_ALIGN;
  const char *message   CRASH_ALIGN;
  const char *signature CRASH_ALIGN;
  const char *backtrace CRASH_ALIGN;
  const char *message2  CRASH_ALIGN;
  void       *reserved  CRASH_ALIGN;
  void       *reserved2 CRASH_ALIGN;
} crash_info_t;

#define CRASH_ANNOTATION __attribute__((section("__DATA,__crash_info")))
#define CRASH_VERSION    4

crash_info_t gCRAnnotations CRASH_ANNOTATION = { CRASH_VERSION,
                                                 0, 0, 0, 0,
                                                 0, 0 };

void crash(void)
{
  gCRAnnotations.message = "Message #1";
  gCRAnnotations.signature = "My test crash";
  gCRAnnotations.backtrace =
  "0   MyTest     0x12345678 myTest(3, 4, 5)\n"
  "1   MyTest     0x23456789 myMain";
  gCRAnnotations.message2 = "Message #2";
  *(int *)4 = 8;
}
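Because the layout matters more than anything else here, it may be worth having the compiler check it. The sketch below repeats the structure definition so that it stands alone, and asserts that every member occupies its own 8-byte slot in both 32-bit and 64-bit builds. The expected offsets are derived from the alignment comment above, not from any Apple documentation, and _Static_assert requires a C11 compiler.

```c
#include <stddef.h>

#define CRASH_ALIGN __attribute__((aligned(8)))

typedef struct {
  unsigned    version   CRASH_ALIGN;
  const char *message   CRASH_ALIGN;
  const char *signature CRASH_ALIGN;
  const char *backtrace CRASH_ALIGN;
  const char *message2  CRASH_ALIGN;
  void       *reserved  CRASH_ALIGN;
  void       *reserved2 CRASH_ALIGN;
} crash_info_t;

/* Seven members, each in an 8-byte slot, whatever the pointer size */
_Static_assert(offsetof(crash_info_t, version)   ==  0, "version");
_Static_assert(offsetof(crash_info_t, message)   ==  8, "message");
_Static_assert(offsetof(crash_info_t, signature) == 16, "signature");
_Static_assert(offsetof(crash_info_t, backtrace) == 24, "backtrace");
_Static_assert(offsetof(crash_info_t, message2)  == 32, "message2");
_Static_assert(offsetof(crash_info_t, reserved)  == 40, "reserved");
_Static_assert(offsetof(crash_info_t, reserved2) == 48, "reserved2");
_Static_assert(sizeof(crash_info_t) == 56, "total size");
```

If any of these fail, the build stops, which is preferable to finding out from a garbled crash report.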

Both of these mechanisms are per image. That is, the crash reporter will collect up information from every image loaded into the address space of the crashed process. In the case of the “Application Specific Information” field, the messages are output one after the other; the same is true for the “Application Specific Signature” field. If you are generating your own backtrace (as the NSException code does), each backtrace is output separately, and they are numbered. For instance, if we change the line

  *(int *)4 = 8;

to read

  [NSException raise:@"TestException" format:@"Nothing to see here"];

then the resulting crash log will contain two backtraces labelled “Application Specific Backtrace 1” and “Application Specific Backtrace 2”.

Important Note

None of this is documented. If you are going to use it, be sure that you initialise the variables to zero/NULL, and DO NOT USE THE reserved or reserved2 fields.