thread safety requirements in MIT krb5 libraries
Nicolas Williams
Nicolas.Williams at sun.com
Thu Dec 18 17:45:14 EST 2003
On Thu, Dec 18, 2003 at 12:41:27AM -0500, Ken Raeburn wrote:
> The proposal for thread safety changes to the MIT Kerberos library
> almost two years ago had certain assumptions built in, some of which
> seem not to be relevant any longer, or at least not nearly as important
> as we had guessed they might be.
>
> I'd like to get some discussion on some of them.
>
> Previous discussions, for reference:
> http://mv.ezproxy.com.ezproxyberklee.flo.org/menelaus.mit.edu/krb5dev/6761
> http://mailman.mit.edu.ezproxyberklee.flo.org/pipermail/krbdev/2003-August/001838.html
> and other messages from those threads.
>
> Callbacks:
[...]
> Single-threaded programs:
>
> Obviously, a single-threaded program must continue to work. I believe
> creating threads in a program written to be single-threaded may
> complicate the signal handling semantics quite a bit, so the library
> can't create threads of its own. To the best of my knowledge, though,
> compiling against system threading headers and simply not ever calling
> the thread creation functions should be fully compatible with a
> single-threaded application.
Right.
> It would be nice if we could avoid having to link against the system
> pthread library, when it is a separate library, if the program isn't
> going to create threads. Many systems have support for weak references
> (where &foo is a null pointer rather than a link-time error if foo
> isn't linked in), which would let us make the pthread library optional.
> If that doesn't work on some system, however, I don't think it's a
> disaster if we start requiring the pthread library. In shared-library
> builds, it would happen through library dependencies; in static-library
> builds, if the application builder is using the "krb5-config" script
> we've been trying to encourage people to use (with what degree of
> success, I don't know), then it should still be automatic.
This is not a consideration for Sun.
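
For readers unfamiliar with the weak-reference idea Ken mentions, here is a minimal sketch, assuming a GCC- or Clang-style toolchain that honors `#pragma weak`. The helper names (`threads_available`, `sketch_lock`, `sketch_unlock`) are hypothetical and not part of any krb5 API. On platforms whose C library always provides these symbols (or stubs for them), the test simply always succeeds.

```c
#include <pthread.h>
#include <stddef.h>

/* Toolchain extension (GCC/Clang): an unresolved weak symbol gets a
 * null address instead of causing a link-time error. */
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

/* Hypothetical helper: nonzero if the pthread functions were linked in. */
static int threads_available(void)
{
    return pthread_mutex_lock != NULL && pthread_mutex_unlock != NULL;
}

/* Lock only if a thread library is present; otherwise the program is
 * single-threaded and locking is a no-op. */
int sketch_lock(pthread_mutex_t *m)
{
    return threads_available() ? pthread_mutex_lock(m) : 0;
}

int sketch_unlock(pthread_mutex_t *m)
{
    return threads_available() ? pthread_mutex_unlock(m) : 0;
}
```

Whether a null weak reference is actually produced depends on the platform's linker and object format, so this is an illustration of the technique rather than a portable recipe.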
>
> Shim layer:
[...]
Ok.
> Thread-safe objects:
>
> We've been assuming for a long time that a krb5_context would be used
> in only one thread at a time, for performance reasons and to reduce the
> implementation work on our part. I don't think we've talked about the
> krb5_auth_context objects; I've kind of assumed they'd always be used
> in conjunction with the krb5_context under which they were created.
Don't forget that the GSS-API krb5 mechanism currently uses a single
global krb5_context.
Perhaps each operation's effect on the krb5_context can be classified as
destructive, read-only, or nil, with reader/writer locks on the
krb5_context taken accordingly.
Or perhaps the mech can be modified to use a per-thread krb5_context
(much as is often done with errno). A krb5 gss context can always be
associated with a krb5_context, for gss ops that involve gss contexts.
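
The per-thread alternative could look roughly like the following sketch, which uses POSIX thread-specific data. The type `struct tls_context` is a hypothetical stand-in for krb5_context (the real type is opaque and would be created with the library's own constructor), and `get_thread_context` is an invented name.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for krb5_context. */
struct tls_context {
    int last_error;
};

static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;
static int ctx_key_ok;

/* Runs at thread exit for every thread holding a non-null value. */
static void ctx_destroy(void *p)
{
    free(p);
}

static void ctx_key_init(void)
{
    ctx_key_ok = (pthread_key_create(&ctx_key, ctx_destroy) == 0);
}

/* Return the calling thread's context, creating it on first use.
 * Returns NULL if the key or the context cannot be allocated. */
struct tls_context *get_thread_context(void)
{
    struct tls_context *ctx;

    pthread_once(&ctx_once, ctx_key_init);
    if (!ctx_key_ok)
        return NULL;

    ctx = pthread_getspecific(ctx_key);
    if (ctx == NULL) {
        ctx = calloc(1, sizeof(*ctx));
        if (ctx != NULL && pthread_setspecific(ctx_key, ctx) != 0) {
            free(ctx);
            ctx = NULL;
        }
    }
    return ctx;
}
```

Each thread sees its own context, so no lock is needed on the context itself. The key is never deleted here, which matters if the library must be unloadable.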
> There's been discussion about being able to use certain other objects
> in multiple threads, and to take objects created in one thread and
> context and use them in another thread and context serially.
Yes.
> Specifically, replay caches, credentials caches, and key tables would
> be good to have locked so they can be shared across threads (especially
> the replay cache, for a multithreaded server). Principal names and
> other small data we'd like to be able to "move" from one context to
> another, so they shouldn't share any data with the krb5_context, or if
> they do, we document the heck out of what functions return references
> to context data and what functions create independent copies.
And corresponding GSS objects.
> This also means that, unlike in the scheme proposed two years ago, an
> arbitrary number of locks may be needed, so we need the ability to
> create and destroy them dynamically, rather than request a fixed number
> at initialization time.
Correct.
> File locking:
>
> It looks like the standard UNIX/POSIX file locking techniques have some
> interesting drawbacks for multiple threads in contention over a file.
> Basically, the locks are per file and not per file descriptor, and per
> process and not per thread. Furthermore, closing any file descriptor
> opened on a given file releases the locks associated with that file, so
> opening a file, doing an fstat to see if it's the same as an already
> opened file, and closing it, will release any locks held on the opened
> file in another thread of the same process.
Er, why not use stat(2) instead of open(2), fstat(2), close(2)?
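
The hazard Ken describes can be reproduced with a short program. The sketch below takes an fcntl() write lock through one descriptor, opens and closes a second descriptor on the same file, and asks a child process whether the lock is still held. All function names are made up for the illustration, and it assumes a system with traditional POSIX record locks and a writable /tmp.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask, from a child process, whether a conflicting lock exists on the
 * file.  fcntl locks are not inherited across fork(), so the child sees
 * the parent's lock as a conflict.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int locked_by_parent(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { 0 };
        int fd = open(path, O_RDWR);

        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: whole file */
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}

/* Returns 1 if closing an unrelated descriptor dropped the lock,
 * 0 if the lock survived, -1 on error. */
int lock_lost_after_unrelated_close(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    struct flock fl = { 0 };
    int fd1, fd2, before = -1, after = -1;

    fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd1, F_SETLK, &fl) == 0) {
        before = locked_by_parent(path);

        fd2 = open(path, O_RDONLY);   /* e.g. only to fstat() the file */
        if (fd2 >= 0)
            close(fd2);               /* drops the lock taken via fd1 */

        after = locked_by_parent(path);
    }
    close(fd1);
    unlink(path);

    if (before != 1 || after < 0)
        return -1;
    return after == 0;
}
```

Using stat(2) on the pathname avoids the extra open/close pair, at the cost of a window in which the name may come to refer to a different file.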
> We could maintain a global (per-process) list of files held open with
> locking capability, and check the 'fstat' data for files in this cache
> before locking another file. A file matching an already open file can
> be closed as soon as we know we don't have any locks held on that file
> via other file descriptors. If a file is opened and closed without
> locking, and another thread has a lock on another file descriptor on
> the same file, we may have a problem...but perhaps we could just have
> the close operation block on the release of the lock?
A user-land lock manager shim is a good choice if the problem can't be
avoided. Another possibility is to use lockfiles instead of file
locking.
> Sam suggested we could consider an application restriction that all
> references to a file must use the same name, i.e., "/var/tmp/foobar"
> and "/tmp/foobar" would be handled as separate files, even if "/tmp" is
> a symlink to "/var/tmp". It would simplify things greatly, and in most
> cases wouldn't be a problem, but it still makes me uncomfortable.
If you go as far as having a user-land lock manager shim, you might as
well use {fsid, inode number} tuples to index the lock manager
structures.
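
A minimal sketch of such an index, assuming that the {st_dev, st_ino} pair from stat(2) serves as the {fsid, inode number} tuple. The fixed-size table and the names `file_entry`, `file_entry_acquire`, and `file_entry_release` are illustrative only, not an existing interface.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_TRACKED_FILES 64

struct file_entry {
    dev_t dev;
    ino_t ino;
    int refs;                 /* number of users mapped to this file */
    pthread_mutex_t lock;     /* per-file, in-process lock */
};

static struct file_entry table[MAX_TRACKED_FILES];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find or create the entry for the file that `path` refers to.  stat()
 * follows symlinks, so different names for one file share an entry.
 * Returns NULL if stat() fails or the table is full. */
struct file_entry *file_entry_acquire(const char *path)
{
    struct stat st;
    struct file_entry *found = NULL, *unused = NULL;
    size_t i;

    if (stat(path, &st) != 0)
        return NULL;

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < MAX_TRACKED_FILES; i++) {
        struct file_entry *e = &table[i];

        if (e->refs > 0 && e->dev == st.st_dev && e->ino == st.st_ino) {
            found = e;
            break;
        }
        if (e->refs == 0 && unused == NULL)
            unused = e;
    }
    if (found == NULL && unused != NULL) {
        found = unused;
        found->dev = st.st_dev;
        found->ino = st.st_ino;
        pthread_mutex_init(&found->lock, NULL);
    }
    if (found != NULL)
        found->refs++;
    pthread_mutex_unlock(&table_lock);
    return found;
}

/* Drop one reference; the per-file lock is destroyed with the last one. */
void file_entry_release(struct file_entry *e)
{
    pthread_mutex_lock(&table_lock);
    if (--e->refs == 0)
        pthread_mutex_destroy(&e->lock);
    pthread_mutex_unlock(&table_lock);
}
```

With this keying, "/tmp/foobar" and "/var/tmp/foobar" resolve to the same entry when "/tmp" is a symlink to "/var/tmp", which is the case the pathname-only scheme cannot handle.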
> Using the pathname would, however, be a good first cut -- i.e., we
> shouldn't need to use fstat to know that two replay caches opened with
> the same absolute pathname will be the same, given that the replay
> cache is supposed to refresh from the file if it changes.
Right.
> The filename checks would be a good idea in any case, though. If two
> or ten threads open the same replay cache, it's silly to have multiple
> copies of all of the data, and have each thread reload it every time
> another thread changes it.
Definitely.
> Dynamic loading:
>
> I believe it's going to be a requirement that we be able to load the
> Kerberos or GSSAPI library dynamically, do some stuff with it, and
> unload it, and repeat the cycle, without resource leaks, at least for a
> properly written program. So any thread-specific storage or
> globally-used heap storage we keep but hide from the application needs
> to be freed up when the library is unloaded. That shouldn't be hard,
> but the internal APIs we use for per-thread storage might need a little
> adjusting from the POSIX versions to support this better.
>
> What does "a properly written program" mean in this case? Well, I
> assume the caller will probably have to free up any objects it's
> created through the library APIs before unloading the library. The
> unload-time cleanup should only need to deal with stuff we maintain
> under the covers.
This may mean cleaning up the krb5 API to ensure the consistent use of
krb5 constructors and destructors for krb5 objects.
> Cancellation:
>
> It would probably be difficult, if not impossible, to make the library
> code be async cancel safe. However, making it safe for synchronous
> cancellation may be doable. How much do threaded programs actually use
> pthread_cancel or the Windows equivalent?
>
> There's another kind of cancellation that might be desirable too. If
> thread 1 manages a GUI with a cancel button, and thread 2 is waiting
> for packets from the nameserver or KDC, or is running a long
> calculation to generate a key, thread 1 may want to tell thread 2 to
> stop what it's doing. Cancelling the entire thread is one way of doing
> that, but might we want to be able to cancel just the current
> operation, and propagate that fact up to the caller? This is probably
> a question for people working on long-running Mac and Windows GUI
> programs; UNIX guys like me just hit control-C in our terminal windows.
> :-)
>
> This latter form may just involve having a flag someplace (krb5
> context?) that will tell the Kerberos library to simply return a
> special error code from whatever it's currently doing, as quickly as
> possible. But how do we get the flag set? Do multiple threads come
> into it again?
Perhaps a thread-local variable pointing at the current krb5_context,
set on krb5 API entry and restored on exit?
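
To make the two ideas concrete, here is a rough sketch combining a cancel flag in the context with a thread-local "current context" pointer. It assumes C11 (`_Thread_local`, `<stdatomic.h>`); `struct cancel_context`, the function names, and the error code are all hypothetical, not anything in the krb5 API.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_ERR_CANCELLED 1   /* hypothetical error code */

/* Hypothetical stand-in for the relevant part of a krb5_context. */
struct cancel_context {
    atomic_int cancel_requested;
};

/* Context the calling thread is currently operating on, if any. */
static _Thread_local struct cancel_context *current_ctx;

/* Called from another thread, e.g. the one running the GUI. */
void sketch_request_cancel(struct cancel_context *ctx)
{
    atomic_store(&ctx->cancel_requested, 1);
}

/* Checked at convenient points deep inside library code, which can
 * find the context without it being passed down explicitly. */
static int cancel_pending(void)
{
    return current_ctx != NULL &&
           atomic_load(&current_ctx->cancel_requested);
}

/* Stand-in for a long operation such as waiting for a KDC reply. */
int sketch_long_operation(struct cancel_context *ctx, int steps)
{
    struct cancel_context *saved = current_ctx;   /* API entry */
    int ret = 0, i;

    current_ctx = ctx;
    for (i = 0; i < steps; i++) {
        if (cancel_pending()) {
            /* Consume the request and report it to the caller. */
            atomic_store(&ctx->cancel_requested, 0);
            ret = SKETCH_ERR_CANCELLED;
            break;
        }
        /* ... one unit of work would go here ... */
    }
    current_ctx = saved;                          /* API exit */
    return ret;
}
```

The cancelling thread still needs some handle on the context to set the flag; the thread-local pointer only helps the working thread find it. A blocking system call would additionally need a timeout or a wakeup mechanism for the flag to be noticed promptly.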
>
> Feature testing:
>
> How does an application know if the Kerberos library it's using is
> thread-safe? We can do all we want in the Kerberos library, but if
> we've only got gethostbyname available for address lookups, for
> example, the resulting program can't be thread-safe. And I know at
> least one getaddrinfo implementation that's not thread-safe.
Er, if the lack of a thread-safe utility on some platform makes a
function thread-unsafe, then that function should take a big lock around
its calls to the thread-unsafe utility, and the documentation should
state clearly that on that platform the function relies on a
thread-unsafe utility (and which one).
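
As an illustration of the big-lock approach, the sketch below serializes calls to gethostbyname() and copies the result out of its static storage before releasing the lock. The wrapper name `locked_lookup_ipv4` is invented for this example and it handles only the first IPv4 address.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up the first IPv4 address of `name`.  gethostbyname() returns a
 * pointer to static storage, so the address is copied out before the
 * lock is dropped.  Returns 0 on success, -1 on failure. */
int locked_lookup_ipv4(const char *name, struct in_addr *out)
{
    struct hostent *he;
    int ret = -1;

    pthread_mutex_lock(&resolver_lock);
    he = gethostbyname(name);
    if (he != NULL && he->h_addrtype == AF_INET &&
        he->h_length == (int)sizeof(*out) &&
        he->h_addr_list[0] != NULL) {
        memcpy(out, he->h_addr_list[0], sizeof(*out));
        ret = 0;
    }
    pthread_mutex_unlock(&resolver_lock);
    return ret;
}
```

The lock only protects callers that go through the wrapper. If the application or another library calls gethostbyname() directly, the race remains, which is one more reason to document the dependency.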
> Should we provide a way to describe which objects can be used
> simultaneously from multiple threads and which cannot, in case we add
> mutex protection to additional objects in the future?
So that programs can adapt by merely re-linking? Nah...
> MIT applications:
>
> Unless some good reason for it can be presented, none of MIT's Kerberos
> programs will use threads. (KDC performance might be improved, for
> example, if one thread can decrypt an incoming message while another
> waits for disk blocks to be paged in from the database. But I don't
> know that we need that sort of minor gain at this time, especially
> given the work required.)
krb5kdc should really be multi-threaded, particularly once PKINIT comes
into the picture. This probably has some impact on libkdb, but it
should be a low-priority item.
Cheers,
Nico