UTIs Are Better Than You Think — and Here’s Why

If you’re a Mac or iOS developer, you’ve probably heard of UTIs; if you come from some other platform, you probably haven’t, so I’ll start with a quick overview.

UTI stands for Uniform Type Identifier, and it’s used in a similar way to a MIME type (it’s a name for a particular kind of data), but unlike MIME types, UTIs are reverse-DNS-style strings (with two notable exceptions, which we’ll get to in a minute). If you want a new MIME type, you either have to register it with IANA, which is time-consuming and a source of uncertainty (they might not agree that you need one), or you can stick an x- prefix on it (but this might create clashes). If you want a UTI, however, all you need to do is own an Internet domain; if you own example.com, you also own all of the UTI namespace starting with com.example, so you could use com.example.MyDataType.

If you’re familiar with MIME types, you’ll also know that they’re split into two pieces separated by a ‘/’ character, for instance text/plain or image/jpeg. The part before the ‘/’ is supposed to tell you what kind of data you’re dealing with. This is useful, but it really doesn’t give you much information; for instance, if you have a MIME type starting with text/, you know it’s some kind of text, but you don’t really know how to decode it, or even whether it’s compatible with US-ASCII.

UTIs, on the other hand, have a conformance hierarchy, so it’s possible to tell the system that your wonderful new data format, com.example.MyDataType, is actually a kind of video file. Moreover, you can be specific about it; maybe it’s actually a special kind of MPEG file. How does this work? Well, Apple has reserved all UTIs starting with public., and has defined a large set of standard UTIs in that space. For instance, public.jpeg conforms to public.image, which in turn conforms to public.data, which in turn conforms to public.item. There’s actually more than one hierarchy (e.g. public.text conforms to both public.data and public.content), and Apple provides APIs that let you test whether any given UTI conforms to any other.
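As a rough illustration, here is a minimal Python sketch of a transitive conformance test. The DIRECT_PARENTS table holds only the relationships mentioned above, and conforms_to is a hypothetical helper; on Apple’s platforms you would call the system’s own API (such as UTTypeConformsTo) rather than maintain a table like this.

```python
# Toy model of a UTI conformance hierarchy (hypothetical data, not Apple's
# database): each UTI maps to the UTIs it conforms to directly.
DIRECT_PARENTS: dict[str, set[str]] = {
    "public.jpeg": {"public.image"},
    "public.image": {"public.data"},
    "public.data": {"public.item"},
    "public.text": {"public.data", "public.content"},
    "public.item": set(),
    "public.content": set(),
}


def conforms_to(uti: str, target: str) -> bool:
    """True if `uti` equals `target` or reaches it via conformance links."""
    seen: set[str] = set()
    pending = [uti]
    while pending:  # depth-first walk; a UTI can have several parents
        current = pending.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(DIRECT_PARENTS.get(current, ()))
    return False


print(conforms_to("public.jpeg", "public.item"))     # True
print(conforms_to("public.text", "public.content"))  # True
print(conforms_to("public.jpeg", "public.text"))     # False
```

Because a UTI may have several parents, the walk is a graph search rather than a simple chain lookup.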

Additionally, every UTI has a set of tags associated with it. These are things like file extensions, MIME types and classic Mac OS OSType codes, and each class of tag is identified by (you guessed it) a UTI. So the tag class for file extensions is public.filename-extension, for MIME types it’s public.mime-type and for OSTypes it’s com.apple.ostype. Again, APIs are provided that let you ask, for example, what file extension corresponds to a given UTI.
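A toy version of these tag lookups might look like the sketch below. The TAGS table and both functions are hypothetical, and the jpeg entries are illustrative rather than copied from Apple’s database; the point is simply that tags hang off a UTI, grouped by tag-class UTI, and that lookups can run in either direction.

```python
from typing import Optional

# Hypothetical tag table: UTI -> tag-class UTI -> tags, preferred tag first.
# The entries are illustrative only.
TAGS: dict[str, dict[str, list[str]]] = {
    "public.jpeg": {
        "public.filename-extension": ["jpeg", "jpg"],
        "public.mime-type": ["image/jpeg"],
    },
}


def preferred_tag(uti: str, tag_class: str) -> Optional[str]:
    """Return the first (preferred) tag of `tag_class` for `uti`, if any."""
    tags = TAGS.get(uti, {}).get(tag_class, [])
    return tags[0] if tags else None


def uti_for_tag(tag_class: str, tag: str) -> Optional[str]:
    """Reverse lookup: find a UTI that declares `tag` under `tag_class`."""
    for uti, classes in TAGS.items():
        if tag in classes.get(tag_class, []):
            return uti
    return None


print(preferred_tag("public.jpeg", "public.filename-extension"))  # jpeg
print(uti_for_tag("public.mime-type", "image/jpeg"))               # public.jpeg
print(uti_for_tag("public.filename-extension", "frob"))            # None
```

In this toy an unknown tag simply yields None; the real system does something cleverer with it.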

Finally, Apple’s platform allows applications to declare their own UTIs by adding information to their Info.plist files, which makes them accessible to all applications on the system.

So what, you ask? Well, here’s the really clever part: there’s an API that, given a tag class and a tag, will find you the most appropriate UTI. For instance, if you give it public.filename-extension and jpeg, it will return public.jpeg. However, sometimes you’ll have a file that the system doesn’t recognize; you might know something about it (e.g. its file extension), but it isn’t in the system’s UTI database, and no application has declared it in its Info.plist. What happens now? Well, if, for instance, you ask for the UTI for a file you’ve found with the extension frob, you’ll get back a rather cryptic-looking answer: dyn.age80q6xtqk.

The neat part about this odd-looking UTI is that the information you gave the system is still there. If you ask any Mac or iOS device for the file extension for dyn.age80q6xtqk, it will immediately tell you: frob. You could transmit this UTI across a network, and the receiving device would still tell you, when asked, that the extension is frob.

This runs slightly deeper, though. When you ask for a UTI, you can specify that it must conform to a particular type. Maybe you know that your frob file is really some kind of public.text? In that case, you might instead get the answer: dyn.ah62d4r34ge80q6xtqk. You might be able to guess that this new UTI also knows that it conforms to public.text, and again, any Mac or iOS device presented with this UTI will be able to tell you so.

Why is this feature useful? Well, web server administrators will have seen the unusual behaviour of various client systems when their server doesn’t know the correct MIME type for a file; some systems try to guess from the file extension, while others throw their hands up and treat the file as raw data (which is probably, but not necessarily, the default on any given web server). If the web used UTIs instead, the UTI would preserve all of the information the server did have, which would improve the experience in some cases.

What Apple doesn’t tell you is that these dynamic UTIs can actually hold more than just a single tag class. For example, the UTI

dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c

has a MIME type (text/X-frob) and a file extension (frob), as well as stipulating conformance to public.text. There’s no API on Mac OS X or iOS that will construct this UTI for you, mind… so how did I do it?

Well, the first thing you need to know is that all dynamic UTIs currently start with the string dyn.a. The first part, dyn., identifies the UTI as dynamically generated; the ‘a’ is, I’m guessing, a format identifier (so if you’re parsing these UTIs yourself, you must check that it’s an ‘a’ and not some other character; if it isn’t, you should report that you don’t understand the UTI).
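A minimal sketch of that check in Python, assuming only the rule just described (a dyn. prefix followed by a format character that must be ‘a’); dynamic_uti_payload is a hypothetical helper name:

```python
from typing import Optional


def dynamic_uti_payload(uti: str) -> Optional[str]:
    """Return the encoded part of a dyn.a UTI, or None for non-dynamic UTIs.

    Raises ValueError for a dynamic UTI whose format character is not 'a',
    because we can't assume its layout matches the one described here.
    """
    if not uti.startswith("dyn."):
        return None  # an ordinary, declared UTI
    rest = uti[len("dyn."):]
    if not rest.startswith("a"):
        raise ValueError(f"unknown dynamic UTI format: {uti!r}")
    return rest[1:]


print(dynamic_uti_payload("dyn.age80q6xtqk"))  # ge80q6xtqk
print(dynamic_uti_payload("public.text"))      # None
```

Raising rather than guessing keeps a future format change from being silently misread.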

Strip that off, and we’re left with a funny-looking string which, it turns out, is encoded in a custom form of base-32 using the encoding vector

abcdefghkmnpqrstuvwxyz0123456789

Let’s decode the dynamic UTI above and see what it contains; feeding the string

h62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c

through a custom base-32 decoder, we get

?0=7:3=text/X-frob:1=frob
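Here is one way to write such a decoder in Python. The alphabet is the one above; the bit order (five bits per character, most significant bit first, with leftover bits at the end discarded as padding) is an assumption, chosen because it reproduces the decoded string shown here. Interpreting the bytes as UTF-8 is also an assumption, and decode_base32 is a hypothetical name.

```python
ALPHABET = "abcdefghkmnpqrstuvwxyz0123456789"
VALUES = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_base32(payload: str) -> bytes:
    """Decode the custom base-32 payload (the part after 'dyn.a').

    Bits are read most-significant first, five per character; trailing bits
    that don't make up a whole byte are treated as padding and dropped.
    """
    out = bytearray()
    buffer = 0  # bit accumulator
    nbits = 0   # number of unread bits currently held in `buffer`
    for ch in payload:
        try:
            value = VALUES[ch]
        except KeyError:
            raise ValueError(f"invalid character {ch!r} in dynamic UTI") from None
        buffer = (buffer << 5) | value
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((buffer >> nbits) & 0xFF)
            buffer &= (1 << nbits) - 1  # keep only the unread bits
    return bytes(out)


print(decode_base32("h62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c").decode("utf-8"))
# ?0=7:3=text/X-frob:1=frob
```

Note that eight characters carry exactly five bytes, which is why the same run of characters (e.g. ge80q6xtqk for 1=frob) can reappear in different UTIs when it starts on such a boundary.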

Hmmm. Let’s look at another example; say we have a custom tag class, com.example.SpecialType, with the tag foobar. Let’s generate a UTI with that. Here it is:

dyn.ah62d4r34qr104pxftbu046dqqy1fg6dfqry0c5cytf2gntpgr710e2pw

Yikes! That’s a lot longer. What happened? Well, decoding it, we see

?0=7:com.example.SpecialType=foobar

Aha. So these are key-value pairs, where the key is the tag class and the value is the tag. What about that funny ?0=7 at the front? Well, when I created this UTI I said it conformed to public.text. What if it conforms instead to com.example.Borked? Now I get

dyn.ah62d425try1gn8dbrz2g23mskm11e45fqu7gg55rf3w1u2prsb0gnpwxsbw0g4pbrvnhw6dfhzxg855cqf3a

which decodes to

?0=com.example.Borked:com.example.SpecialType=foobar

Hmmm. So ?0 means “the UTI we conform to”, right? So ‘7’ must mean public.text?

Well, it turns out that the system has a built-in list of identifiers that get abbreviated to the single hexadecimal digits ‘0’ through ‘F’. They are:

0: UTTypeConformsTo
1: public.filename-extension
2: com.apple.ostype
3: public.mime-type
4: com.apple.nspboard-type
5: public.url-scheme
6: public.data
7: public.text
8: public.plain-text
9: public.utf16-plain-text
A: com.apple.traditional-mac-plain-text
B: public.image
C: public.video
D: public.audio
E: public.directory
F: public.folder

Now we can understand the string we saw earlier:

?0=7:3=text/X-frob:1=frob

expands to

?UTTypeConformsTo=public.text:public.mime-type=text/X-frob:public.filename-extension=frob
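The expansion step can be sketched as follows. This simplified version splits on ‘:’ and ‘=’ without handling escapes, and it assumes that abbreviations apply to keys and to the values of the conformance key only (tag values such as file extensions are left alone, since they aren’t UTIs). The leading ‘?’ on a key is carried through unchanged. ABBREVIATIONS and expand are hypothetical names.

```python
# Single-hex-digit abbreviations, as listed above.
ABBREVIATIONS = {
    "0": "UTTypeConformsTo",
    "1": "public.filename-extension",
    "2": "com.apple.ostype",
    "3": "public.mime-type",
    "4": "com.apple.nspboard-type",
    "5": "public.url-scheme",
    "6": "public.data",
    "7": "public.text",
    "8": "public.plain-text",
    "9": "public.utf16-plain-text",
    "A": "com.apple.traditional-mac-plain-text",
    "B": "public.image",
    "C": "public.video",
    "D": "public.audio",
    "E": "public.directory",
    "F": "public.folder",
}


def expand(decoded: str) -> str:
    """Expand abbreviated keys (and conformance values) in a decoded payload.

    Simplified: ignores escaping, so tags containing ':' or '=' will break it.
    """
    pairs = []
    for pair in decoded.split(":"):
        key, _, value = pair.partition("=")
        marker = "?" if key.startswith("?") else ""  # keep the leading '?'
        bare = key[len(marker):]
        key = ABBREVIATIONS.get(bare, bare)
        if key == "UTTypeConformsTo":
            # Conformance targets are UTIs, so they may be abbreviated too.
            value = ",".join(ABBREVIATIONS.get(v, v) for v in value.split(","))
        pairs.append(f"{marker}{key}={value}")
    return ":".join(pairs)


print(expand("?0=7:3=text/X-frob:1=frob"))
# ?UTTypeConformsTo=public.text:public.mime-type=text/X-frob:public.filename-extension=frob
```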

There are, as ever, a couple of niggles. The first is that tags might in general contain special characters (for instance, ‘=’ signs) that would mess us up. What does the UTI system do about these? It escapes them, that’s what. The set of characters that are escaped is:

, : = \ NUL

The ‘,’ is interesting; it turns out that each of the keys in the string can have more than one value associated with it. That is, it’s legal for the UTI to encode something like

?0=7,B:3=text/X-frob,image/X-frob:1=frob

resulting in something like

dyn.ah62d4r3qkk7dgtpyqz6hkp42fzxhe55cfvy042phqy1zuppgsm10esvvhzxhe55c

If you ask the system about this UTI, you’ll find that it conforms to both public.text and public.image, in spite of the fact that neither type conforms to the other. Unfortunately, Apple only provides a way to copy the preferred tag given a tag class and a UTI, so if you ask for its MIME type, you’ll only get text/X-frob (the first one, as per the documentation).
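A parser that copes with multiple values and escaped characters might look like the sketch below. The text above says which characters are escaped but not how; this sketch assumes a backslash escapes the character that follows it, which is a guess rather than documented behaviour. parse_payload is a hypothetical name, and keys are returned still abbreviated.

```python
from typing import Optional


def parse_payload(decoded: str) -> dict[str, list[str]]:
    """Split a decoded payload into {key: [values]}.

    ASSUMPTION: a backslash escapes the next character (the text lists which
    characters are escaped, but not the escape syntax).
    """
    result: dict[str, list[str]] = {}
    key: Optional[str] = None  # None until we've seen this pair's '='
    values: list[str] = []     # completed values for the current key
    token: list[str] = []      # characters of the key or value being read
    escaped = False

    def end_pair() -> None:
        nonlocal key, values
        if key is None:
            raise ValueError(f"pair without '=': {''.join(token)!r}")
        values.append("".join(token))
        result.setdefault(key, []).extend(values)
        key, values = None, []
        token.clear()

    for ch in decoded:
        if escaped:                          # take this character literally
            token.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "=" and key is None:      # end of the key
            key = "".join(token)
            token.clear()
        elif ch == "," and key is not None:  # another value for the same key
            values.append("".join(token))
            token.clear()
        elif ch == ":":                      # end of this key-value pair
            end_pair()
        else:
            token.append(ch)
    if escaped:
        raise ValueError("payload ends with a dangling escape")
    if key is not None or token:
        end_pair()
    return result


fields = parse_payload("?0=7,B:3=text/X-frob,image/X-frob:1=frob")
print(fields["?0"])  # ['7', 'B']
print(fields["3"])   # ['text/X-frob', 'image/X-frob']
```

Both conformance targets survive parsing, which matches the system reporting conformance to public.text and public.image at once.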

So, in summary:

  • We’ve seen that UTIs can encapsulate any kind of type information, not just the obvious ones like file extensions. You can define your own type information, entirely orthogonal to the set Apple uses, if you like.

  • We’ve seen that conformance information allows programs using UTIs to determine that (for instance) they know how to handle data with a UTI they’ve never even seen before because it conforms to a UTI they do understand.

  • We’ve seen that it’s possible to generate dynamic UTIs that will remember the information with which they were created even when passed to another system.

  • We’ve seen how these dynamic UTIs are generated, and the mechanism by which they hold the information given when they were created.
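Putting the pieces together also answers the earlier question of how to build a UTI that carries both a MIME type and a file extension. The sketch below is hypothetical (make_dynamic_uti is not an Apple API); it assumes most-significant-bit-first packing with zero padding, the abbreviation table above, conformance listed first under a ‘?0’ key, and backslash escaping of the special characters. Fed the tags from the frob example, it reproduces dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c.

```python
ALPHABET = "abcdefghkmnpqrstuvwxyz0123456789"

# The abbreviation table from above, inverted: full identifier -> hex digit.
SHORT = {
    "UTTypeConformsTo": "0",
    "public.filename-extension": "1",
    "com.apple.ostype": "2",
    "public.mime-type": "3",
    "com.apple.nspboard-type": "4",
    "public.url-scheme": "5",
    "public.data": "6",
    "public.text": "7",
    "public.plain-text": "8",
    "public.utf16-plain-text": "9",
    "com.apple.traditional-mac-plain-text": "A",
    "public.image": "B",
    "public.video": "C",
    "public.audio": "D",
    "public.directory": "E",
    "public.folder": "F",
}
SPECIAL = {",", ":", "=", "\\", "\0"}  # characters that must be escaped


def escape(text: str) -> str:
    # ASSUMPTION: escaping means prefixing the character with a backslash.
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def encode_base32(data: bytes) -> str:
    """Pack bytes into 5-bit groups, MSB first, zero-padding the last group."""
    out = []
    buffer = 0
    nbits = 0
    for byte in data:
        buffer = (buffer << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            out.append(ALPHABET[(buffer >> nbits) & 0x1F])
        buffer &= (1 << nbits) - 1  # drop the bits already emitted
    if nbits:
        out.append(ALPHABET[(buffer << (5 - nbits)) & 0x1F])
    return "".join(out)


def make_dynamic_uti(conforms_to: list[str], tags: dict[str, list[str]]) -> str:
    """Build a dyn.a UTI from conformance targets and {tag class: [tags]}."""
    pairs = []
    if conforms_to:
        targets = ",".join(SHORT.get(u, escape(u)) for u in conforms_to)
        pairs.append("?" + SHORT["UTTypeConformsTo"] + "=" + targets)
    for tag_class, values in tags.items():  # dicts keep insertion order
        key = SHORT.get(tag_class, escape(tag_class))
        pairs.append(key + "=" + ",".join(escape(v) for v in values))
    return "dyn.a" + encode_base32(":".join(pairs).encode("utf-8"))


print(make_dynamic_uti(
    ["public.text"],
    {"public.mime-type": ["text/X-frob"], "public.filename-extension": ["frob"]},
))
# dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c
```

Since the system only promises that the first tag of each class is the preferred one, the order in which you pass values matters.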

If you’re interested in reading Apple’s documentation, you might find their Uniform Type Identifiers Overview informative. Note that the format of dynamic UTIs is undocumented. If you are going to rely on the information I presented above, make sure you check that they start with dyn.a, and do not assume that you understand any dynamic UTI that has a different format identifier.