When one thread is calling sys_ioctl(), and another thread is calling
sys_close(), current code has protected most cases.
But for the below case, it will cause issue:
T1 T2 T3
sys_close(oldfile) sys_open(newfile) sys_ioctl(oldfile)
-> __close_fd()
lock file_lock
assign NULL file
put fd to be unused
unlock file_lock
get new fd is same as old
assign newfile to same fd
fget_flight()
get the newfile!!!
decrease file->f_count
file->f_count == 0
--> try to release file
The race is when T1 try to close one oldFD, T3 is trying to ioctl the oldFD,
if currently the new T2 is trying to open a newfile, it maybe get the newFD is
same as oldFD.
And normal case T3 should get NULL file pointer due to released by T1, but T3
get the newfile pointer, and continue the ioctl accessing.
It maybe causes unexpectable error, we hit one system panic at do_vfs_ioctl().
Here we can fix it that putting "put_unused_fd()" after filp_close(),
it can avoid this case.
Signed-off-by: liu chuansheng <[email protected]>
---
fs/file.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index 4a78f98..3f9b825 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -590,6 +590,7 @@ int __close_fd(struct files_struct *files, unsigned fd)
{
struct file *file;
struct fdtable *fdt;
+ int ret;
spin_lock(&files->file_lock);
fdt = files_fdtable(files);
@@ -600,9 +601,17 @@ int __close_fd(struct files_struct *files, unsigned fd)
goto out_unlock;
rcu_assign_pointer(fdt->fd[fd], NULL);
__clear_close_on_exec(fd, fdt);
+ spin_unlock(&files->file_lock);
+ ret = filp_close(file, files);
+
+ /* Delaying put_unused_fd after flip_close, otherwise
+ * when race happening between fget() and close(),
+ * the fget() may get one wrong file pointer.
+ */
+ spin_lock(&files->file_lock);
__put_unused_fd(files, fd);
spin_unlock(&files->file_lock);
- return filp_close(file, files);
+ return ret;
out_unlock:
spin_unlock(&files->file_lock);
--
1.7.0.4
On Tue, Aug 27, 2013 at 12:12:49AM +0800, Chuansheng Liu wrote:
>
> When one thread is calling sys_ioctl(), and another thread is calling
> sys_close(), current code has protected most cases.
>
> But for the below case, it will cause issue:
> T1 T2 T3
> sys_close(oldfile) sys_open(newfile) sys_ioctl(oldfile)
> -> __close_fd()
> lock file_lock
> assign NULL file
> put fd to be unused
> unlock file_lock
> get new fd is same as old
> assign newfile to same fd
> fget_flight()
> get the newfile!!!
> decrease file->f_count
> file->f_count == 0
> --> try to release file
>
> The race is when T1 try to close one oldFD, T3 is trying to ioctl the oldFD,
> if currently the new T2 is trying to open a newfile, it maybe get the newFD is
> same as oldFD.
>
> And normal case T3 should get NULL file pointer due to released by T1, but T3
> get the newfile pointer, and continue the ioctl accessing.
>
> It maybe causes unexpectable error, we hit one system panic at do_vfs_ioctl().
>
> Here we can fix it that putting "put_unused_fd()" after filp_close(),
> it can avoid this case.
NAK. T3 getting the new file is valid (think what happens if T1 returns from
close() before T2 enters open() and T3 hits ioctl() after both of those),
the userland code is, at the very least, racy and no, moving put_unused_fd()
around is not going to solve any problems - it might shift the race window,
but that's it.
It certainly does not affect the possibility of panics in do_vfs_ioctl()
you are seeing and I would really like to see the details on those instead of
this kind of voodoo "fixes".
On Tue, 2013-08-27 at 00:12 +0800, Chuansheng Liu wrote:
> When one thread is calling sys_ioctl(), and another thread is calling
> sys_close(), current code has protected most cases.
>
> But for the below case, it will cause issue:
> T1 T2 T3
> sys_close(oldfile) sys_open(newfile) sys_ioctl(oldfile)
> -> __close_fd()
> lock file_lock
> assign NULL file
> put fd to be unused
> unlock file_lock
> get new fd is same as old
> assign newfile to same fd
> fget_flight()
> get the newfile!!!
> decrease file->f_count
> file->f_count == 0
> --> try to release file
>
> The race is when T1 try to close one oldFD, T3 is trying to ioctl the oldFD,
> if currently the new T2 is trying to open a newfile, it maybe get the newFD is
> same as oldFD.
>
> And normal case T3 should get NULL file pointer due to released by T1, but T3
> get the newfile pointer, and continue the ioctl accessing.
>
> It maybe causes unexpectable error, we hit one system panic at do_vfs_ioctl().
>
Not clear if the bug is not elsewhere.
What panic did you have exactly ?
> Here we can fix it that putting "put_unused_fd()" after filp_close(),
> it can avoid this case.
>
Three threads doing this kind of stuff cannot expect T3 gets the old or
new file anyway. Its totally unspecified.
Hello Al and Eric,
Thanks your comments, please see the below comments.
> -----Original Message-----
> From: Al Viro [mailto:[email protected]] On Behalf Of Al Viro
> Sent: Monday, August 26, 2013 7:30 PM
> To: Liu, Chuansheng
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH] Fix the race between the fget() and close()
>
> On Tue, Aug 27, 2013 at 12:12:49AM +0800, Chuansheng Liu wrote:
> >
> > When one thread is calling sys_ioctl(), and another thread is calling
> > sys_close(), current code has protected most cases.
> >
> > But for the below case, it will cause issue:
> > T1 T2
> T3
> > sys_close(oldfile) sys_open(newfile)
> sys_ioctl(oldfile)
> > -> __close_fd()
> > lock file_lock
> > assign NULL file
> > put fd to be unused
> > unlock file_lock
> > get new fd is same as old
> > assign newfile to same fd
> > fget_flight()
> >
> get the newfile!!!
> > decrease file->f_count
> > file->f_count == 0
> > --> try to release file
> >
> > The race is when T1 try to close one oldFD, T3 is trying to ioctl the oldFD,
> > if currently the new T2 is trying to open a newfile, it maybe get the newFD is
> > same as oldFD.
> >
> > And normal case T3 should get NULL file pointer due to released by T1, but T3
> > get the newfile pointer, and continue the ioctl accessing.
> >
> > It maybe causes unexpectable error, we hit one system panic at
> do_vfs_ioctl().
> >
> > Here we can fix it that putting "put_unused_fd()" after filp_close(),
> > it can avoid this case.
>
> NAK. T3 getting the new file is valid (think what happens if T1 returns from
> close() before T2 enters open() and T3 hits ioctl() after both of those),
> the userland code is, at the very least, racy and no, moving put_unused_fd()
> around is not going to solve any problems - it might shift the race window,
> but that's it.
Yes, the fix is really insane, I realized it after sent it.
But I think the race is there, right?
>
> It certainly does not affect the possibility of panics in do_vfs_ioctl()
> you are seeing and I would really like to see the details on those instead of
> this kind of voodoo "fixes".
The whole panic stack is below, and my kernel code is:
int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
unsigned long arg)
{
int error = 0;
int __user *argp = (int __user *)arg;
struct inode *inode = filp->f_path.dentry->d_inode;
=== > Panic here, due to filp->f_path.dentry == NULL, then accessing 0x20 failed.
Could you share some scenario for this case? Thanks.
<1>[ 392.448004] BUG: unable to handle kernel NULL pointer dereference at 00000020
<1>[ 392.456626] IP: [<c132c043>] do_vfs_ioctl+0x23/0x570
<4>[ 392.462545] *pde = 00000000
<4>[ 392.466101] Oops: 0000 [#1] PREEMPT SMP
<4>[ 392.471284] Modules linked in: atomisp lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core hdmi_audio bcm_bt_lpm bcm43241(O) kct_daemon(O)
<4>[ 392.489564]
<4>[ 392.491397] Pid: 3625, comm: Binder_4 Tainted: G O 3.4.43-187546-g85c9adb #1
<4>[ 392.501069] EIP: 0060:[<c132c043>] EFLAGS: 00010296 CPU: 0
<4>[ 392.507379] EIP is at do_vfs_ioctl+0x23/0x570
<4>[ 392.512539] EAX: 00000000 EBX: e7fb8cc0 ECX: c0186201 EDX: c0186201
<4>[ 392.519721] ESI: 00000009 EDI: 79022b9c EBP: db30df90 ESP: db30df1c
<4>[ 392.527040] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
<4>[ 392.533253] CR0: 80050033 CR2: 00000020 CR3: 2765a000 CR4: 001007d0
<4>[ 392.540558] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
<4>[ 392.547752] DR6: ffff0ff0 DR7: 00000400
<0>[ 392.552336] Process Binder_4 (pid: 3625, ti=db30c000 task=e7602c00 task.ti=db30c000)
<0>[ 392.561177] Stack:
<4>[ 392.563707] db30df00 c14d8344 00000000 00000001 00000001 f1204910 00000000 c7db3320
<4>[ 392.573881] 76f4feb4 00000000 00000002 00000000 00000000 00000000 00000000 e7a1bd88
<4>[ 392.584192] 00000002 00000001 00000000 76f4feb4 db30dfac c128aad0 00000000 e7fb8cc0
<0>[ 392.594362] Call Trace:
<4>[ 392.597272] [<c14d8344>] ? security_file_permission+0x24/0xb0
<4>[ 392.603979] [<c128aad0>] ? sys_futex+0x80/0x130
<4>[ 392.609428] [<c131dd14>] ? fget_light+0x44/0xe0
<4>[ 392.614764] [<c132c5f8>] sys_ioctl+0x68/0x80
<4>[ 392.619930] [<c1a89331>] syscall_call+0x7/0xb
<0>[ 392.625066] Code: ff ff f3 90 eb 8e 66 90 55 89 e5 83 ec 74 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 c3 8b 40 0c 81 f9 52 54 00 00 89 d6 89 ca <8b> 78 20 0f 84 74 03 00 00 76 72 81 f9 77 58 04 c0 0f 84 c6 02
<0>[ 392.656788] EIP: [<c132c043>] do_vfs_ioctl+0x23/0x570 SS:ESP 0068:db30df1c
<4>[ 392.664976] CR2: 0000000000000020
<1>[ 392.669816] BUG: unable to handle kernel NULL pointer dereference at 00000020
<1>[ 392.678055] IP: [<c131c8a7>] vfs_read+0x97/0x160
<4>[ 392.683460] *pde = 00000000
<4>[ 392.686835] Oops: 0000 [#2] PREEMPT SMP
<4>[ 392.691545] Modules linked in: atomisp lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core hdmi_audio bcm_bt_lpm bcm43241(O) kct_daemon(O)
<4>[ 392.708638]
<4>[ 392.710379] Pid: 345, comm: adbd Tainted: G O 3.4.43-187546-g85c9adb #1
<4>[ 392.719179] EIP: 0060:[<c131c8a7>] EFLAGS: 00010206 CPU: 0
<4>[ 392.725449] EIP is at vfs_read+0x97/0x160
<4>[ 392.730015] EAX: 00000018 EBX: f108a6c0 ECX: 00000000 EDX: 00000000
<4>[ 392.737154] ESI: 00000001 EDI: 00000018 EBP: f0e23f8c ESP: f0e23f6c
<4>[ 392.744251] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
<4>[ 392.750424] CR0: 8005003b CR2: 00000020 CR3: 30cc6000 CR4: 001007d0
<4>[ 392.757517] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
<4>[ 392.764658] DR6: ffff0ff0 DR7: 00000400
<0>[ 392.769022] Process adbd (pid: 345, ti=f0e22000 task=f1806880 task.ti=f0e22000)
<0>[ 392.777329] Stack:
<4>[ 392.779657] f0e23f9c 00000000 f108a6c0 c17df320 c131dd14 f108a6c0 080b7440 00000018
<4>[ 392.789056] f0e23fac c131c9ad f0e23f9c 00000001 00000000 00000000 00000006 080b7440
<4>[ 392.798400] f0e22000 c1a89331 00000006 080ba17c 00000018 080b7440 00000018 40310d48
<0>[ 392.807831] Call Trace:
<4>[ 392.810647] [<c17df320>] ? mtp_ioctl+0x410/0x410
<4>[ 392.816034] [<c131dd14>] ? fget_light+0x44/0xe0
<4>[ 392.821282] [<c131c9ad>] sys_read+0x3d/0x70
<4>[ 392.826145] [<c1a89331>] syscall_call+0x7/0xb
<4>[ 392.831246] [<c1a80000>] ? uncompress_inline.isra.42+0x86/0xff
<0>[ 392.837950] Code: 89 f2 8b 40 08 89 45 ec 85 c0 8b 45 08 89 04 24 89 d8 0f 84 b4 00 00 00 8b 75 ec ff d6 89 c7 85 ff 7e 6c 8b 53 0c be 01 00 00 00 <8b> 42 20 89 45 f0 0f b7 00 66 25 00 f0 66 3d 00 40 b8 01 00 00
<0>[ 392.864457] EIP: [<c131c8a7>] vfs_read+0x97/0x160 SS:ESP 0068:f0e23f6c
<4>[ 392.872059] CR2: 0000000000000020
<4>[ 392.877014] ---[ end trace ed88a6d4a6a2b4b7 ]---
<0>[ 392.882285] Kernel panic - not syncing: Fatal exception
On Mon, Aug 26, 2013 at 11:56:43PM +0000, Liu, Chuansheng wrote:
> Yes, the fix is really insane, I realized it after sent it.
> But I think the race is there, right?
Yes, in userland code. Depending on the timing, T3 might
* issue ioctl() on old file if called before close() in T1
* fail with EBADF if called between the close() by T1 and reopen by T2
* fail with EBADF if open() in T2 happens before the file gets closed
by T1 and ioctl() happens after close() - in that case new descriptor will
be different from the old one
* issue ioctl() on new file if reopen happens after close() and
ioctl() happens after open().
* issue ioctl() on a completely unrelated file, if some other
thread calls open()/pipe()/whatnot between close() in T1 and open() in T2
IOW, userland code doing that kind of things is very, very broken - it
can't depend on which file (if any) ioctl() will be applied to. The
kernel should *not* panic/oops/whatnot on that, of course. And there
we have four cases:
1) fget/fget_light picks old pointer to file and atomic increment of
->f_count succeeds. In that case close() is irrelevant - we have pinned
struct file down and the final fput() won't happen until we are done.
open() is even more irrelevant in that case, of course, since we don't
even look at new struct file.
2) fget/fget_light picks old pointer to file just as close() had
been replacing it with NULL and dropping the final reference to old struct
file; by the time we do atomic increment of f_count, it has already reached
0 and atomic_long_inc_not_zero fails. Since we are under rcu_read_lock(),
struct file memory couldn't have been reused yet, so looking at f_count is
safe. We act as if we'd been called just *after* close() removing the
pointer to file from the table (instead of just as it was clearing that
pointer) and fail with EBADF.
3) fget/fget_light picks NULL; nothing to do, EBADF time for us.
4) fget/fget_light picks pointer to *new* struct file. In that
case close() (and all earlier history of that descriptor) is irrelevant -
it's the same as if ioctl() had been done right after open().
> The whole panic stack is below, and my kernel code is:
> int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
> unsigned long arg)
> {
> int error = 0;
> int __user *argp = (int __user *)arg;
> struct inode *inode = filp->f_path.dentry->d_inode;
> === > Panic here, due to filp->f_path.dentry == NULL, then accessing 0x20 failed.
>
> Could you share some scenario for this case? Thanks.
*shrug*
Might be buggered refcounting on struct file somewhere (i.e. extra fput() done,
getting the file freed *before* close(), leaving a dangling pointer in
descriptor table). Might be memory corruption of some kind, slapping junk
pointer into descriptor table. Might be buggered refcounting on struct
dentry - if extra dput() is done somewhere, dentry might get freed under
us or become negative.
Hell, might be buggered refcounting on descriptor table - binder is playing
interesting games there. Try to reproduce that with CONFIG_DEBUG_KMEMLEAK
and slab debugging turned on, see if you hit anything from those; if it's
more or less readily reproducible, I would start with that - too many
scenarios involve broken refcounting of one sort or another.
On Tue, Aug 27, 2013 at 01:42:47AM +0100, Al Viro wrote:
> Might be buggered refcounting on struct file somewhere (i.e. extra fput() done,
> getting the file freed *before* close(), leaving a dangling pointer in
> descriptor table). Might be memory corruption of some kind, slapping junk
> pointer into descriptor table. Might be buggered refcounting on struct
> dentry - if extra dput() is done somewhere, dentry might get freed under
> us or become negative.
>
> Hell, might be buggered refcounting on descriptor table - binder is playing
> interesting games there. Try to reproduce that with CONFIG_DEBUG_KMEMLEAK
> and slab debugging turned on, see if you hit anything from those; if it's
> more or less readily reproducible, I would start with that - too many
> scenarios involve broken refcounting of one sort or another.
Nevermind dentry refcounting - you get NULL dentry, not NULL inode.
Other scenarios still remain, so I'd really recommend slab/kmemleak
debugging turned on.
Thanks Al.
> -----Original Message-----
> From: Al Viro [mailto:[email protected]] On Behalf Of Al Viro
> Sent: Tuesday, August 27, 2013 8:43 AM
> To: Liu, Chuansheng
> Cc: Eric Dumazet; [email protected]; [email protected]
> Subject: Re: [PATCH] Fix the race between the fget() and close()
>
> On Mon, Aug 26, 2013 at 11:56:43PM +0000, Liu, Chuansheng wrote:
>
>
> Hell, might be buggered refcounting on descriptor table - binder is playing
I also suspected binder do some strange thing there, but in the panic log, there is another
process causes this case during vfs_read(), maybe it is not related with binder totally.
<1>[ 392.669816] BUG: unable to handle kernel NULL pointer dereference at 00000020
<1>[ 392.678055] IP: [<c131c8a7>] vfs_read+0x97/0x160
> interesting games there. Try to reproduce that with
> CONFIG_DEBUG_KMEMLEAK
> and slab debugging turned on, see if you hit anything from those; if it's
> more or less readily reproducible, I would start with that - too many
> scenarios involve broken refcounting of one sort or another.
It is not easy to hit, will try with CONFIG_DEBUG_KMEMLEAK and SLAB DEBUGGING on.
Hello Al,
> -----Original Message-----
> From: Al Viro [mailto:[email protected]] On Behalf Of Al Viro
> Sent: Tuesday, August 27, 2013 8:49 AM
> To: Liu, Chuansheng
> Cc: Eric Dumazet; [email protected]; [email protected]
> Subject: Re: [PATCH] Fix the race between the fget() and close()
>
> On Tue, Aug 27, 2013 at 01:42:47AM +0100, Al Viro wrote:
> > Might be buggered refcounting on struct file somewhere (i.e. extra fput()
> done,
> > getting the file freed *before* close(), leaving a dangling pointer in
> > descriptor table). Might be memory corruption of some kind, slapping junk
> > pointer into descriptor table. Might be buggered refcounting on struct
> > dentry - if extra dput() is done somewhere, dentry might get freed under
> > us or become negative.
> >
> > Hell, might be buggered refcounting on descriptor table - binder is playing
> > interesting games there. Try to reproduce that with
> CONFIG_DEBUG_KMEMLEAK
> > and slab debugging turned on, see if you hit anything from those; if it's
> > more or less readily reproducible, I would start with that - too many
> > scenarios involve broken refcounting of one sort or another.
>
> Nevermind dentry refcounting - you get NULL dentry, not NULL inode.
> Other scenarios still remain, so I'd really recommend slab/kmemleak
> debugging turned on.
I think I found one of possible race here(two processes P1 and P2):
P1 has the the files_struct pointer FILES1, P2 has the files_struct pointer FILES2,
When P1 open file, the new struct file pointer SHARE_FILE will be installed into FILES1,
and file refcount is 1;
And in P1, we can get P2's files_struct FILES2, and thru _fd_install(), we can add SHARE_FILE
into P2's FILES2.
Then the same file pointer SHARE_FILE stayed in both P1 and P2's files_struct, and the panic case
will happen:
P1 P2
Open the SHARE_FILE
Installed SHARE_FILE into P2's file_struct FILES2
Ioctl(SHARE_FILE) When P2 exiting,
fget_light()
due to FILES1->refcount is 1, put_files_struct will be called,
there will be no RCU and SHARE_FILE refcount increasing will close all files including SHARE_FILE
But at this time, P1 is still operate SHARE_FILE without the refcount safety.
Then the panic will happen at vfs_ioctl() due to the SHARE_FILE has been freed.
Is it allowable that installing one file pointer into another FILES_STRUCT? Seems binder is doing the similar things.
In fact, if in ioctl function, we can call fget() instead of fget_light(), this panic can be avoided.
Is it making sense?
On Sat, Aug 31, 2013 at 05:53:11AM +0000, Liu, Chuansheng wrote:
> I think I found one of possible race here(two processes P1 and P2):
> P1 has the the files_struct pointer FILES1, P2 has the files_struct pointer FILES2,
>
> When P1 open file, the new struct file pointer SHARE_FILE will be installed into FILES1,
> and file refcount is 1;
>
> And in P1, we can get P2's files_struct FILES2, and thru _fd_install(), we can add SHARE_FILE
> into P2's FILES2.
>
> Then the same file pointer SHARE_FILE stayed in both P1 and P2's files_struct, and the panic case
> will happen:
> P1 P2
> Open the SHARE_FILE
> Installed SHARE_FILE into P2's file_struct FILES2
... without bumping refcount on SHARE_FILE? Then you really have a big
problem. task_fd_install() call is preceded by grabbing a reference
to the file we are installing, though... BTW, /* TODO: fput? */ after
that call is really bogus - the code doesn't call fput() there and it's
quite correct as is, since at that point the reference had gone into
descriptor table we'd been installing into and doesn't need to be dropped.
> Ioctl(SHARE_FILE) When P2 exiting,
> fget_light()
> due to FILES1->refcount is 1, put_files_struct will be called,
> there will be no RCU and SHARE_FILE refcount increasing will close all files including SHARE_FILE
>
> But at this time, P1 is still operate SHARE_FILE without the refcount safety.
>
> Then the panic will happen at vfs_ioctl() due to the SHARE_FILE has been freed.
>
> Is it allowable that installing one file pointer into another FILES_STRUCT? Seems binder is doing the similar things.
> In fact, if in ioctl function, we can call fget() instead of fget_light(), this panic can be avoided.
>
> Is it making sense?
No, it doesn't. For one thing, any reference in any files_struct should
contribute 1 to refcount of struct file. For another, you can modify
files_struct *ONLY* if you hold a reference to it. binder, a misdesigned
piece of shit it is, does that only via proc->files, which is set in
binder_mmap() by grabbing a new reference to current->files of mmap(2)
caller. It is safe to do (nobody can switch task's ->files to another
files_struct under it) and once that's done, there's a pinned reference
to that files_struct. If, at the time of task_fd_install(), it happens
to be task->files_struct of some process, its refcount is going to be
at least 2, fdget() done by that other process will see that descriptor
table is shared and will bump the refcount of file being accessed.
The subtle part here is that mmap() does *NOT* use fdget() - the property
we are aiming for is that if at the time of fdget() descriptor table
hadn't been shared, no new references that could be used to modify it
will be acquired until the matching fdput(). So binder_mmap() can
legitimately grab a reference to the descriptor table of calling process.
> -----Original Message-----
> From: Al Viro [mailto:[email protected]] On Behalf Of Al Viro
> Sent: Saturday, August 31, 2013 2:48 PM
> To: Liu, Chuansheng
> Cc: Eric Dumazet; [email protected]; [email protected]
> Subject: Re: [PATCH] Fix the race between the fget() and close()
>
> On Sat, Aug 31, 2013 at 05:53:11AM +0000, Liu, Chuansheng wrote:
>
> > I think I found one of possible race here(two processes P1 and P2):
> > P1 has the the files_struct pointer FILES1, P2 has the files_struct pointer
> FILES2,
> >
> > When P1 open file, the new struct file pointer SHARE_FILE will be installed
> into FILES1,
> > and file refcount is 1;
> >
> > And in P1, we can get P2's files_struct FILES2, and thru _fd_install(), we can
> add SHARE_FILE
> > into P2's FILES2.
> >
> > Then the same file pointer SHARE_FILE stayed in both P1 and P2's files_struct,
> and the panic case
> > will happen:
> > P1
> P2
> > Open the SHARE_FILE
> > Installed SHARE_FILE into P2's file_struct FILES2
>
> ... without bumping refcount on SHARE_FILE? Then you really have a big
> problem. task_fd_install() call is preceded by grabbing a reference
> to the file we are installing, though... BTW, /* TODO: fput? */ after
> that call is really bogus - the code doesn't call fput() there and it's
> quite correct as is, since at that point the reference had gone into
> descriptor table we'd been installing into and doesn't need to be dropped.
>
> > Ioctl(SHARE_FILE)
> When P2 exiting,
> > fget_light()
> > due to FILES1->refcount is 1,
> put_files_struct will be called,
> > there will be no RCU and SHARE_FILE refcount increasing
> will close all files including SHARE_FILE
> >
> > But at this time, P1 is still operate SHARE_FILE without the refcount safety.
> >
> > Then the panic will happen at vfs_ioctl() due to the SHARE_FILE has been
> freed.
> >
> > Is it allowable that installing one file pointer into another FILES_STRUCT?
> Seems binder is doing the similar things.
> > In fact, if in ioctl function, we can call fget() instead of fget_light(), this panic
> can be avoided.
> >
> > Is it making sense?
>
> No, it doesn't. For one thing, any reference in any files_struct should
> contribute 1 to refcount of struct file. For another, you can modify
> files_struct *ONLY* if you hold a reference to it. binder, a misdesigned
My scenario is:
P1 files_struct refcount is 1, P2's is 1 also.
P1 get_files_struct(P2)
P1 install one file into P2's files_struct
P1 put_files_struct(P2)
Then P1 and P2's files_struct refcount are 1, then when P1 is doing ioctl() and P2 is exiting
with put_files_struct(P2), the race will occur, my understanding is wrong?
On Sat, Aug 31, 2013 at 07:01:33AM +0000, Liu, Chuansheng wrote:
> My scenario is:
> P1 files_struct refcount is 1, P2's is 1 also.
> P1 get_files_struct(P2)
> P1 install one file into P2's files_struct
> P1 put_files_struct(P2)
>
> Then P1 and P2's files_struct refcount are 1, then when P1 is doing ioctl() and P2 is exiting
> with put_files_struct(P2), the race will occur, my understanding is wrong?
First of all, this wouldn't have been a problem (so you get a new reference
to file inserted in P2's files_struct; file refcount had been bumped, so
destruction of P2's files_struct will undo that increment of file refcount
and we are still fine). _Removal_ in a similar scenario would have been
a problem, with P2 doing fdget() while its table isn't shared, then P1
removing a reference from it and dropping a file - the last one, at that,
since fdget() assumed that the reference would've stayed in P2's descriptor
table. HOWEVER, P1 does not do get_files_struct(P2) at all - it's only
done by P2 in binder_mmap().
Again, the invariant to look for is this:
* if current->files had not been shared at fdget() time, it won't
be shared at matching fdput() and no entries will have been removed in
between.
task_fd_install()/task_close_fd() are done on proc->files, which contributes
to descriptor table refcount. All other modifications are done to
current->files, which also contributes to refcount. If at fdget() time
current->files had refcount 1, we had no other processes with task->files
pointing to this descriptor table *and* no binder_proc had their ->files
pointint to it. No new ones may appear, since new process could get
such a reference only from do_fork() called by us and new binder_proc could
get such a reference only from binder_mmap() called by us. Neither is
called between fdget() and fdput(). So in that case the only reference
to this descriptor table will remain current->files and all removals
would have to be done by ourselves (and not via task_close_fd(), at that).
And AFAICS, binder_lock() prevents proc->files being dropped under
task_close_fd() and task_fd_install(). Hell knows...
How reproducible it is? Do you have any more instances, or had that
been a one-off panic?
> -----Original Message-----
> From: Al Viro [mailto:[email protected]] On Behalf Of Al Viro
> Sent: Saturday, August 31, 2013 3:36 PM
> To: Liu, Chuansheng
> Cc: Eric Dumazet; [email protected]; [email protected]
> Subject: Re: [PATCH] Fix the race between the fget() and close()
>
> On Sat, Aug 31, 2013 at 07:01:33AM +0000, Liu, Chuansheng wrote:
>
> > My scenario is:
> > P1 files_struct refcount is 1, P2's is 1 also.
> > P1 get_files_struct(P2)
> > P1 install one file into P2's files_struct
> > P1 put_files_struct(P2)
> >
> > Then P1 and P2's files_struct refcount are 1, then when P1 is doing ioctl() and
> P2 is exiting
> > with put_files_struct(P2), the race will occur, my understanding is wrong?
>
> First of all, this wouldn't have been a problem (so you get a new reference
> to file inserted in P2's files_struct; file refcount had been bumped, so
> destruction of P2's files_struct will undo that increment of file refcount
> and we are still fine). _Removal_ in a similar scenario would have been
> a problem, with P2 doing fdget() while its table isn't shared, then P1
> removing a reference from it and dropping a file - the last one, at that,
> since fdget() assumed that the reference would've stayed in P2's descriptor
> table. HOWEVER, P1 does not do get_files_struct(P2) at all - it's only
> done by P2 in binder_mmap().
Got it, thanks. In other process, the fget() + _fd_install() should be the same
as the process call open() directly, and the file reference count will be at least 2.
>
> Again, the invariant to look for is this:
> * if current->files had not been shared at fdget() time, it won't
> be shared at matching fdput() and no entries will have been removed in
> between.
>
> task_fd_install()/task_close_fd() are done on proc->files, which contributes
> to descriptor table refcount. All other modifications are done to
> current->files, which also contributes to refcount. If at fdget() time
> current->files had refcount 1, we had no other processes with task->files
> pointing to this descriptor table *and* no binder_proc had their ->files
> pointint to it. No new ones may appear, since new process could get
> such a reference only from do_fork() called by us and new binder_proc could
> get such a reference only from binder_mmap() called by us. Neither is
> called between fdget() and fdput(). So in that case the only reference
> to this descriptor table will remain current->files and all removals
> would have to be done by ourselves (and not via task_close_fd(), at that).
>
> And AFAICS, binder_lock() prevents proc->files being dropped under
> task_close_fd() and task_fd_install(). Hell knows...
>
> How reproducible it is? Do you have any more instances, or had that
> been a one-off panic?
Just meet once yet.