Patch to increase the scope of the locked critical path in fget_light to include
the conditional where there is only one reference to the passed file_struct.
Currently there is no protection against someone modifying that reference count
after it has been read in fget_light and falling into a code path where the fd
array is modified. The result is a race condition that leads to a corrupted fd
table and potential oopses. This patch corrects that by enforcing the locking
protocol that is used by all other accessors of the fd table on the 1 reference
case in fget_light. Smoke tested by me, with no failures.
Signed-off-by: Neil Horman <[email protected]>
file_table.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
--- linux-2.6.git/fs/file_table.c.racefix 2005-05-20 07:32:12.000000000 -0400
+++ linux-2.6.git/fs/file_table.c 2005-05-20 08:53:03.000000000 -0400
@@ -174,17 +174,17 @@ struct file fastcall *fget_light(unsigne
struct files_struct *files = current->files;
*fput_needed = 0;
+ spin_lock(&files->file_lock);
if (likely((atomic_read(&files->count) == 1))) {
file = fcheck_files(files, fd);
} else {
- spin_lock(&files->file_lock);
file = fcheck_files(files, fd);
if (file) {
get_file(file);
*fput_needed = 1;
}
- spin_unlock(&files->file_lock);
}
+ spin_unlock(&files->file_lock);
return file;
}
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/
On Fri, May 20, 2005 at 09:23:25AM -0400, Neil Horman wrote:
> Patch to increase the scope of the locked critical path in fget_light to include
> the conditional where there is only one reference to the passed file_struct.
> Currently there is no protection against someone modifying that reference count
> after it has been read in fget_light and falling into a code path where the fd
> array is modified. The result is a race condition that leads to a corrupted fd
> table and potential oopses. This patch corrects that by enforcing the locking
> protocol that is used by all other accessors of the fd table on the 1 reference
> case in fget_light. Smoke tested by me, with no failures.
Er... If we get 1, we *KNOW* who holds the only reference - that's us.
And to change refcount of files_struct you need to hold a reference to
it.
Do you have a full race scenario? With all participants spelled out, please.
On Fri, May 20, 2005 at 02:33:38PM +0100, Al Viro wrote:
> Er... If we get 1, we *KNOW* who holds the only reference - that's us.
> And to change refcount of files_struct you need to hold a reference to
or contents
On Fri, May 20, 2005 at 02:40:46PM +0100, Al Viro wrote:
> On Fri, May 20, 2005 at 02:33:38PM +0100, Al Viro wrote:
> > Er... If we get 1, we *KNOW* who holds the only reference - that's us.
> > And to change refcount of files_struct you need to hold a reference to
> or contents
I don't have the complete race scenario, just a stack that suggests that
files->fd was corrupted. This problem isn't recreatable at will (yet), so this
is really based on a thought experiment more than anything else. The conditions
that I was envisioning was a multithreaded application in which two threads
modified the same file descriptor at the same time. Its my understanding, from
the way I read the code that the ref count on a file_struct will still be one
for a multithreaded application, and as such it would be possible, using the
fget_light routine for one thread to be be preforming an operation on an
descriptor in the fd array, while another thread preformed another operation on
it (say close). The result potentially would be (if the two operations were on
the same fd index), that the routine operating in fget_light would retrieve a
corrupted fd value from the files->fd array, since it was being modified at the
same time by another execution context. Given the above, I suppose it might be
more appropriate to increment the reference count on thread creation as well as
process creation, but I wasn't certain of the potential side effects of doing
so. Also, every other location in the vfs that calls fcheck_files to retrieve a
value out of file_structs fd array, do so under the protection of the
file_structs file_lock, so this really seemed like a candidate for race badness
to me.
Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/
On Fri, May 20, 2005 at 11:25:50AM -0400, Neil Horman wrote:
> I don't have the complete race scenario, just a stack that suggests that
> files->fd was corrupted. This problem isn't recreatable at will (yet), so this
> is really based on a thought experiment more than anything else. The conditions
> that I was envisioning was a multithreaded application in which two threads
> modified the same file descriptor at the same time. Its my understanding, from
> the way I read the code that the ref count on a file_struct will still be one
> for a multithreaded application, and as such it would be possible, using the
> fget_light routine for one thread to be be preforming an operation on an
> descriptor in the fd array, while another thread preformed another operation
Incorrect. References to files_struct are held by task_struct. Kernel
stack is determined by task_struct. So your two threads would have
to share task_struct (i.e. be purely userland ones) and could not
run in the kernel at the same time for very obvious reasons.
The rules are simple:
* all access to files_struct is done from upper-half (i.e. is
process-synchronous).
* the only files_struct you can modify is *(current->files)
* each task_struct that has ->files pointing to given files_struct
contributes 1 to ->count of that files_struct. There might be other holders
of temporary references and they also contribute to ->count.
* all changes of task->files itself are process-synchronous. Only
two kinds of changes are possible:
1) current->files can be set to NULL. That drops a reference to
original files_struct.
2) current->files can be replaced with a pointer to a new copy of
previous files_struct. This operations drops a reference to old one and
sets the refcount on a copy to 1. It could either be done explicitly (when
unshare(2) gets merged into Linus' tree) or implicitly at the clone()/fork()
time. In the latter case that's done by parent to child before the child
gets a chance to run.
* at task creation time, child inherits ->files from parent; that
acquires a new reference to it. That might be followed by implicit unshare()
(see above). fork(2) always unshares ->files, clone(2) does that unless
CLONE_FILES had been passed to it in flags.
IOW, the only way for two tasks to have ->files pointing to the same object
is to have it unchanged all the way back to common ancestor. In particular,
if current->files->count is 1, we know that no other task has ->files pointing
to our files_struct and that will remain true until we call clone(). It does
*not* mean that current->files->count will remain 1; somebody might acquire
a temporary reference to our files_struct. However, we are guaranteed that
all such references will be used only for read-only access (that happens,
e.g., when somebody does ls /proc/<our_pid>/fd - they will grab a reference
to our ->files and go looking at the descriptor table; they are not allowed
to change it, though).
On Fri, May 20, 2005 at 10:28:24PM +0100, Al Viro wrote:
> On Fri, May 20, 2005 at 11:25:50AM -0400, Neil Horman wrote:
> > I don't have the complete race scenario, just a stack that suggests that
> > files->fd was corrupted. This problem isn't recreatable at will (yet), so this
> > is really based on a thought experiment more than anything else. The conditions
> > that I was envisioning was a multithreaded application in which two threads
> > modified the same file descriptor at the same time. Its my understanding, from
> > the way I read the code that the ref count on a file_struct will still be one
> > for a multithreaded application, and as such it would be possible, using the
> > fget_light routine for one thread to be be preforming an operation on an
> > descriptor in the fd array, while another thread preformed another operation
>
> Incorrect. References to files_struct are held by task_struct. Kernel
> stack is determined by task_struct. So your two threads would have
> to share task_struct (i.e. be purely userland ones) and could not
> run in the kernel at the same time for very obvious reasons.
>
> The rules are simple:
> * all access to files_struct is done from upper-half (i.e. is
> process-synchronous).
> * the only files_struct you can modify is *(current->files)
> * each task_struct that has ->files pointing to given files_struct
> contributes 1 to ->count of that files_struct. There might be other holders
> of temporary references and they also contribute to ->count.
> * all changes of task->files itself are process-synchronous. Only
> two kinds of changes are possible:
> 1) current->files can be set to NULL. That drops a reference to
> original files_struct.
> 2) current->files can be replaced with a pointer to a new copy of
> previous files_struct. This operations drops a reference to old one and
> sets the refcount on a copy to 1. It could either be done explicitly (when
> unshare(2) gets merged into Linus' tree) or implicitly at the clone()/fork()
> time. In the latter case that's done by parent to child before the child
> gets a chance to run.
> * at task creation time, child inherits ->files from parent; that
> acquires a new reference to it. That might be followed by implicit unshare()
> (see above). fork(2) always unshares ->files, clone(2) does that unless
> CLONE_FILES had been passed to it in flags.
>
> IOW, the only way for two tasks to have ->files pointing to the same object
> is to have it unchanged all the way back to common ancestor. In particular,
> if current->files->count is 1, we know that no other task has ->files pointing
> to our files_struct and that will remain true until we call clone(). It does
> *not* mean that current->files->count will remain 1; somebody might acquire
> a temporary reference to our files_struct. However, we are guaranteed that
> all such references will be used only for read-only access (that happens,
> e.g., when somebody does ls /proc/<our_pid>/fd - they will grab a reference
> to our ->files and go looking at the descriptor table; they are not allowed
> to change it, though).
Ok, thanks for the through explination here, quite educational. I'll go looking
for other causes for this.
Thanks and regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/