thread safety requirements in MIT krb5 libraries
Ken Raeburn
raeburn at MIT.EDU
Thu Dec 18 00:41:27 EST 2003
The proposal for thread safety changes to the MIT Kerberos library
almost two years ago had certain assumptions built in, some of which
seem not to be relevant any longer, or at least not nearly as important
as we had guessed they might be.
I'd like to get some discussion on some of them.
Previous discussions, for reference:
http://mv.ezproxy.com.ezproxyberklee.flo.org/menelaus.mit.edu/krb5dev/6761
http://mailman.mit.edu/pipermail/krbdev/2003-August/001838.html
and other messages from those threads.
Callbacks:
I don't think the ability to register callback functions for thread
system operations will be needed. It appears that for all the
platforms we care about, there's a standard thread system (or more than
one, mapping to the same basic OS interface so as to be interoperable).
We were concerned that people might want to be able to use, for
example, the GNU Pth library instead of native, preemptive pthreads,
but I haven't heard many people expressing interest.
Not supporting callback registration makes some things easier, like
letting us use the system mutex type instead of always calling an
allocation function that returns a pointer. In particular, it also
lets us use statically initialized mutex or pthread_once_t (or
equivalent) objects.
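As a minimal sketch of what static initialization buys us, assuming POSIX threads: the names table_lock, table_once, table_init and table_use below are made up for illustration and are not existing library symbols. No allocation function or registration step is needed before the first use.

```c
#include <pthread.h>

/* Hypothetical names, for illustration only. Both objects are usable
   without any runtime allocation or callback registration. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

static int table_init_count = 0;   /* how many times table_init ran */
static int table_users = 0;        /* protected by table_lock */

static void table_init(void)
{
    /* One-time setup; pthread_once guarantees a single call. */
    table_init_count++;
}

/* Returns the number of calls made so far, including this one. */
int table_use(void)
{
    int n;

    pthread_once(&table_once, table_init);
    pthread_mutex_lock(&table_lock);
    n = ++table_users;
    pthread_mutex_unlock(&table_lock);
    return n;
}

int table_init_calls(void)
{
    return table_init_count;
}
```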
I brought this up in August, but it didn't generate much discussion.
Single-threaded programs:
Obviously, a single-threaded program must continue to work. I believe
creating threads in a program written to be single-threaded may
complicate the signal handling semantics quite a bit, so the library
can't create threads of its own. To the best of my knowledge, though,
compiling against system threading headers and simply not ever calling
the thread creation functions should be fully compatible with a
single-threaded application.
It would be nice if we could avoid having to link against the system
pthread library, when it is a separate library, if the program isn't
going to create threads. Many systems have support for weak references
(where &foo is a null pointer rather than a link-time error if foo
isn't linked in), which would let us make the pthread library optional.
If that doesn't work on some system, however, I don't think it's a
disaster if we start requiring the pthread library. In shared-library
builds, it would happen through library dependencies; in static-library
builds, if the application builder is using the "krb5-config" script
we've been trying to encourage people to use (with what degree of
success, I don't know), then it should still be automatic.
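A rough sketch of the weak-reference approach, assuming a GCC-compatible compiler on an ELF platform where "#pragma weak" is supported. The wk_ names are hypothetical. Note that on systems where the pthread functions live in the C library itself, the check is always true and the wrappers simply call through.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumption: the toolchain honors #pragma weak, so an unresolved
   symbol has a null address instead of causing a link error. */
#pragma weak pthread_create
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

/* Nonzero if the thread library appears to be linked in. */
int wk_threads_available(void)
{
    return &pthread_create != NULL
        && &pthread_mutex_lock != NULL
        && &pthread_mutex_unlock != NULL;
}

/* Lock only when threads can exist; otherwise do nothing. */
int wk_mutex_lock(pthread_mutex_t *m)
{
    return wk_threads_available() ? pthread_mutex_lock(m) : 0;
}

int wk_mutex_unlock(pthread_mutex_t *m)
{
    return wk_threads_available() ? pthread_mutex_unlock(m) : 0;
}
```

If no thread library is present, no second thread can exist, so skipping the lock is harmless in that case.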
Shim layer:
I think we still want a set of macros and functions to provide an API
similar to a subset of the POSIX API, rather than always using the
POSIX API directly and emulating it on platforms like Windows.
Using a new API requires us to specify precisely what we require of the
underlying thread system, so that porting to a new non-POSIX system is
more straightforward. It also allows us to easily implement checks for
weak references on the platforms that use them, without having to
repeatedly code such conditionalized checks throughout the libraries.
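To make the idea concrete, here is one possible shape for such a shim. This is only an illustration, not a proposed API: the shim_ names are invented, and the Windows branch assumes CRITICAL_SECTION is an acceptable mutex substitute.

```c
/* Hypothetical shim: a POSIX-like subset mapped onto the native
   thread system. All shim_ names are illustrative assumptions. */
#ifdef _WIN32
#include <windows.h>

typedef CRITICAL_SECTION shim_mutex;

static inline int shim_mutex_init(shim_mutex *m)
{
    InitializeCriticalSection(m);
    return 0;
}
static inline int shim_mutex_lock(shim_mutex *m)
{
    EnterCriticalSection(m);
    return 0;
}
static inline int shim_mutex_unlock(shim_mutex *m)
{
    LeaveCriticalSection(m);
    return 0;
}
static inline int shim_mutex_destroy(shim_mutex *m)
{
    DeleteCriticalSection(m);
    return 0;
}
#else
#include <pthread.h>

typedef pthread_mutex_t shim_mutex;

static inline int shim_mutex_init(shim_mutex *m)
{
    return pthread_mutex_init(m, NULL);
}
static inline int shim_mutex_lock(shim_mutex *m)
{
    return pthread_mutex_lock(m);
}
static inline int shim_mutex_unlock(shim_mutex *m)
{
    return pthread_mutex_unlock(m);
}
static inline int shim_mutex_destroy(shim_mutex *m)
{
    return pthread_mutex_destroy(m);
}
#endif
```

Library code would call only the shim_ functions, so any weak-reference check or platform conditional lives in one place.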
Also, Sam has convinced me that an auxiliary library we link against
may not be a totally evil thing, especially if it's automatically
pulled in via shared library dependencies or "krb5-config --libs".
This would be a good place to put any thread support functions we find
we need. (And also, probably, a replacement for getaddrinfo, for
platforms where it doesn't exist, like the old IRIX version MIT is
using, or where it's too broken for us to use, like Mac OS X. And
other stuff like that.) This avoids any need to compile stuff into
each library, with library-dependent prefixes to avoid name collisions.
I'm not suggesting any specific API at this point; still thinking about
that.
Thread-safe objects:
We've been assuming for a long time that a krb5_context would be used
in only one thread at a time, for performance reasons and to reduce the
implementation work on our part. I don't think we've talked about the
krb5_auth_context objects; I've kind of assumed they'd always be used
in conjunction with the krb5_context under which they were created.
There's been discussion about being able to use certain other objects
in multiple threads, and to take objects created in one thread and
context and use them in another thread and context serially.
Specifically, replay caches, credentials caches, and key tables would
be good to have locked so they can be shared across threads (especially
the replay cache, for a multithreaded server). Principal names and
other small data we'd like to be able to "move" from one context to
another, so they shouldn't share any data with the krb5_context, or if
they do, we document the heck out of what functions return references
to context data and what functions create independent copies.
This also means that, unlike in the scheme proposed two years ago, an
arbitrary number of locks may be needed, so we need the ability to
create and destroy them dynamically, rather than request a fixed number
at initialization time.
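A small sketch of what dynamically created locks might look like, assuming POSIX threads. The struct shared_rcache type and its functions are hypothetical stand-ins for a shareable object such as a replay cache; they are not the library's actual types.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shareable object carrying its own lock. */
struct shared_rcache {
    pthread_mutex_t lock;   /* created and destroyed with the object */
    int entries;            /* protected by lock */
};

struct shared_rcache *shared_rcache_create(void)
{
    struct shared_rcache *rc = malloc(sizeof(*rc));

    if (rc == NULL)
        return NULL;
    if (pthread_mutex_init(&rc->lock, NULL) != 0) {
        free(rc);
        return NULL;
    }
    rc->entries = 0;
    return rc;
}

/* Records one entry; returns the new entry count. */
int shared_rcache_store(struct shared_rcache *rc)
{
    int n;

    pthread_mutex_lock(&rc->lock);
    n = ++rc->entries;
    pthread_mutex_unlock(&rc->lock);
    return n;
}

/* Caller must ensure no other thread still uses rc. */
void shared_rcache_destroy(struct shared_rcache *rc)
{
    pthread_mutex_destroy(&rc->lock);
    free(rc);
}
```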
File locking:
It looks like the standard UNIX/POSIX file locking techniques have some
interesting drawbacks for multiple threads in contention over a file.
Basically, the locks are per file and not per file descriptor, and per
process and not per thread. Furthermore, closing any file descriptor
opened on a given file releases the locks associated with that file, so
opening a file, doing an fstat to see if it's the same as an already
opened file, and closing it, will release any locks held on the opened
file in another thread of the same process.
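The close-releases-locks behavior can be demonstrated with a short test program. This is a sketch assuming POSIX fcntl record locks on a local filesystem; the function names are made up. A child process probes the lock because a process never conflicts with its own locks.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to write-lock the whole file without blocking. */
static int set_whole_file_wrlock(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 1 if another process could lock the file, 0 if it is
   blocked, -1 on error. The child's lock vanishes when it exits. */
static int lock_is_free(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(2);
        _exit(set_whole_file_wrlock(fd) == 0 ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Lock via fd1, then open and close a second descriptor. */
int lock_demo(int *free_before, int *free_after)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd1 = mkstemp(path), fd2;

    if (fd1 < 0)
        return -1;
    if (set_whole_file_wrlock(fd1) != 0) {
        close(fd1);
        unlink(path);
        return -1;
    }
    *free_before = lock_is_free(path);
    fd2 = open(path, O_RDONLY);
    if (fd2 >= 0)
        close(fd2);             /* this drops the lock taken via fd1 */
    *free_after = lock_is_free(path);
    close(fd1);
    unlink(path);
    return 0;
}
```

On a system with standard POSIX semantics, the file is reported as locked before the second descriptor is closed and as free afterwards, even though fd1 is still open.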
We could maintain a global (per-process) list of files held open with
locking capability, and check the 'fstat' data for files in this cache
before locking another file. A file matching an already open file can
be closed as soon as we know we don't have any locks held on that file
via other file descriptors. If a file is opened and closed without
locking, and another thread has a lock on another file descriptor on
the same file, we may have a problem...but perhaps we could just have
the close operation block on the release of the lock?
Sam suggested we could consider an application restriction that all
references to a file must use the same name, i.e., "/var/tmp/foobar"
and "/tmp/foobar" would be handled as separate files, even if "/tmp" is
a symlink to "/var/tmp". It would simplify things greatly, and in most
cases wouldn't be a problem, but it still makes me uncomfortable.
Using the pathname would, however, be a good first cut -- i.e., we
shouldn't need to use fstat to know that two replay caches opened with
the same absolute pathname will be the same, given that the replay
cache is supposed to refresh from the file if it changes.
The filename checks would be a good idea in any case, though. If two
or ten threads open the same replay cache, it's silly to have multiple
copies of all of the data, and have each thread reload it every time
another thread changes it.
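A first-cut sketch of sharing by pathname, assuming POSIX threads and exact string comparison of absolute paths (so symlinked names are treated as different files, as discussed above). The struct open_cache type and the cache_acquire and cache_release functions are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-process registry of caches, keyed by pathname. */
struct open_cache {
    char *path;
    int refcount;               /* protected by registry_lock */
    struct open_cache *next;
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct open_cache *registry = NULL;

/* Returns the shared object for abs_path, creating it if needed. */
struct open_cache *cache_acquire(const char *abs_path)
{
    struct open_cache *c;

    pthread_mutex_lock(&registry_lock);
    for (c = registry; c != NULL; c = c->next)
        if (strcmp(c->path, abs_path) == 0)
            break;
    if (c == NULL && (c = malloc(sizeof(*c))) != NULL) {
        size_t len = strlen(abs_path) + 1;

        c->path = malloc(len);
        if (c->path == NULL) {
            free(c);
            c = NULL;
        } else {
            memcpy(c->path, abs_path, len);
            c->refcount = 0;
            c->next = registry;
            registry = c;
        }
    }
    if (c != NULL)
        c->refcount++;
    pthread_mutex_unlock(&registry_lock);
    return c;
}

/* Drops one reference; frees the object when the last user is gone. */
void cache_release(struct open_cache *c)
{
    struct open_cache **pp;

    pthread_mutex_lock(&registry_lock);
    if (--c->refcount == 0) {
        for (pp = &registry; *pp != NULL; pp = &(*pp)->next) {
            if (*pp == c) {
                *pp = c->next;
                break;
            }
        }
        free(c->path);
        free(c);
    }
    pthread_mutex_unlock(&registry_lock);
}
```

With this arrangement, ten threads opening the same name get one object and one copy of the data; the per-object contents would need their own lock as well.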
Dynamic loading:
I believe it's going to be a requirement that we be able to load the
Kerberos or GSSAPI library dynamically, do some stuff with it, and
unload it, and repeat the cycle, without resource leaks, at least for a
properly written program. So any thread-specific storage or
globally-used heap storage we keep but hide from the application needs
to be freed up when the library is unloaded. That shouldn't be hard,
but the internal APIs we use for per-thread storage might need a little
adjusting from the POSIX versions to support this better.
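One way the unload-time cleanup could look, as a sketch only. It assumes POSIX thread-specific data and the GCC/Clang "destructor" attribute, which runs a function when a shared object is unloaded or the process exits. The lib_ names are hypothetical, and the sketch assumes no other thread is inside the library when lib_fini runs.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t tsd_key;
static int tsd_key_created = 0;
static pthread_once_t tsd_once = PTHREAD_ONCE_INIT;

static void create_key(void)
{
    /* free() releases a thread's buffer when that thread exits. */
    if (pthread_key_create(&tsd_key, free) == 0)
        tsd_key_created = 1;
}

/* Returns this thread's hidden buffer, allocating it on first use. */
void *lib_get_thread_buffer(size_t size)
{
    void *buf;

    pthread_once(&tsd_once, create_key);
    if (!tsd_key_created)
        return NULL;
    buf = pthread_getspecific(tsd_key);
    if (buf == NULL) {
        buf = calloc(1, size);
        if (buf != NULL && pthread_setspecific(tsd_key, buf) != 0) {
            free(buf);
            buf = NULL;
        }
    }
    return buf;
}

/* Assumption: runs at unload time. pthread_key_delete does not call
   destructors, so the calling thread's buffer is freed by hand.
   Buffers owned by other live threads are NOT reachable from here. */
__attribute__((destructor))
void lib_fini(void)
{
    if (!tsd_key_created)
        return;                 /* idempotent: safe to call twice */
    free(pthread_getspecific(tsd_key));
    pthread_setspecific(tsd_key, NULL);
    pthread_key_delete(tsd_key);
    tsd_key_created = 0;
}
```

The comment in lib_fini points at the gap: plain POSIX keys give no way to reach other threads' values at unload time, which is one reason the internal API might need to track per-thread blocks itself.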
What does "a properly written program" mean in this case? Well, I
assume the caller will probably have to free up any objects it's
created through the library APIs before unloading the library. The
unload-time cleanup should only need to deal with stuff we maintain
under the covers.
Cancellation:
It would probably be difficult, if not impossible, to make the library
code async-cancel-safe. However, making it safe for synchronous
(deferred) cancellation may be doable. How much do threaded programs actually use
pthread_cancel or the Windows equivalent?
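For reference, a sketch of what safety under deferred cancellation involves with POSIX threads: any resource held across a cancellation point needs a cleanup handler. The names cache_lock, worker and run_cancel_demo are invented for this example, and sleep() stands in for a blocking network wait.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t worker_ready;

static void unlock_cleanup(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&cache_lock);
    /* Without this handler, cancellation would leave the lock held. */
    pthread_cleanup_push(unlock_cleanup, &cache_lock);
    sem_post(&worker_ready);
    for (;;)
        sleep(1);               /* cancellation point */
    pthread_cleanup_pop(1);     /* never reached; pairs with push */
    return NULL;
}

/* Returns 1 if the worker was cancelled and its lock was released. */
int run_cancel_demo(void)
{
    pthread_t t;
    void *ret;

    if (sem_init(&worker_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    while (sem_wait(&worker_ready) == -1 && errno == EINTR)
        ;                       /* retry if interrupted by a signal */
    pthread_cancel(t);
    pthread_join(t, &ret);
    sem_destroy(&worker_ready);
    if (ret != PTHREAD_CANCELED)
        return 0;
    if (pthread_mutex_trylock(&cache_lock) != 0)
        return 0;
    pthread_mutex_unlock(&cache_lock);
    return 1;
}
```

Every lock, allocation, and open descriptor held across a blocking call in the library would need this treatment, which gives a sense of the work involved.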
There's another kind of cancellation that might be desirable too. If
thread 1 manages a GUI with a cancel button, and thread 2 is waiting
for packets from the nameserver or KDC, or is running a long
calculation to generate a key, thread 1 may want to tell thread 2 to
stop what it's doing. Cancelling the entire thread is one way of doing
that, but might we want to be able to cancel just the current
operation, and propagate that fact up to the caller? This is probably
a question for people working on long-running Mac and Windows GUI
programs; UNIX guys like me just hit control-C in our terminal windows.
:-)
This latter form may just involve having a flag someplace (krb5
context?) that will tell the Kerberos library to simply return a
special error code from whatever it's currently doing, as quickly as
possible. But how do we get the flag set? Do multiple threads come
into it again?
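One possible answer, sketched with C11 atomics: another thread sets a flag that the operation polls. Everything here is hypothetical, including the demo_context type and the DEMO_ERR_CANCELLED code; the real context structure and error table are not being described.

```c
#include <stdatomic.h>

#define DEMO_ERR_CANCELLED (-1)     /* hypothetical error code */

/* Hypothetical stand-in for a per-operation or per-context flag. */
struct demo_context {
    atomic_int cancel_requested;
};

/* May be called from any thread, e.g. a GUI cancel-button handler. */
void demo_request_cancel(struct demo_context *ctx)
{
    atomic_store(&ctx->cancel_requested, 1);
}

/* Runs up to 'iterations' steps, checking the flag before each one.
   Stores the number of completed steps in *done. */
int demo_long_operation(struct demo_context *ctx,
                        unsigned long iterations, unsigned long *done)
{
    unsigned long i;

    for (i = 0; i < iterations; i++) {
        if (atomic_load(&ctx->cancel_requested)) {
            *done = i;
            return DEMO_ERR_CANCELLED;
        }
        /* ... one unit of key generation or one receive timeout ... */
    }
    *done = iterations;
    return 0;
}
```

This does bring multiple threads back into it: the flag is the one part of the context that a second thread would be allowed to touch, so it has to be atomic or otherwise protected.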
Feature testing:
How does an application know if the Kerberos library it's using is
thread-safe? We can do all we want in the Kerberos library, but if
we've only got gethostbyname available for address lookups, for
example, the resulting program can't be thread-safe. And I know at
least one getaddrinfo implementation that's not thread-safe.
Should we provide a way to describe which objects can be used
simultaneously from multiple threads and which cannot, in case we add
mutex protection to additional objects in the future?
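If we did provide such a description, a run-time query returning a bitmask is one simple form it could take. The sketch below is purely illustrative: no such function exists, and the EX_TS_ flags and example_ names are invented.

```c
/* Hypothetical flags naming object types that may be shared across
   threads. New bits could be added in later releases. */
#define EX_TS_RCACHE   0x01u    /* replay caches */
#define EX_TS_CCACHE   0x02u    /* credentials caches */
#define EX_TS_KEYTAB   0x04u    /* key tables */
#define EX_TS_CONTEXT  0x08u    /* contexts: not set in this sketch */

/* Returns the set of object types this build protects with locks. */
unsigned int example_thread_safe_objects(void)
{
    return EX_TS_RCACHE | EX_TS_CCACHE | EX_TS_KEYTAB;
}

/* Nonzero if every object type in 'which' is safe to share. */
int example_object_is_thread_safe(unsigned int which)
{
    return (example_thread_safe_objects() & which) == which;
}
```

A run-time query rather than a compile-time macro lets an application built against one release learn what a newer shared library supports.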
MIT applications:
Unless some good reason for it can be presented, none of MIT's Kerberos
programs will use threads. (KDC performance might be improved, for
example, if one thread can decrypt an incoming message while another
waits for disk blocks to be paged in from the database. But I don't
know that we need that sort of minor gain at this time, especially
given the work required.)