2013-03-20 15:34:10

by Ming Lei

[permalink] [raw]
Subject: [PATCH 0/2] sysfs: fix use after free in sysfs_readdir()

Hi,

These two patches fix two bugs inside sysfs_readdir(), and both
the two bugs may cause use after free problem, which is reported
by Dave on trinity fuzz test on syscall.


Thanks,
--
Ming Lei


2013-03-20 15:26:44

by Ming Lei

[permalink] [raw]
Subject: [PATCH 1/2] sysfs: fix race between readdir and lseek

While readdir() is running, lseek() may set filp->f_pos as zero,
then may leave filp->private_data pointing to one sysfs_dirent
object without holding its reference counter, so the sysfs_dirent
object may be used after free in next readdir().

This patch holds inode->i_mutex to avoid the problem since
the lock is always held in readdir path.

Reported-by: Dave Jones <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
fs/sysfs/dir.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 2fbdff6..c9e1660 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -1058,10 +1058,21 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
return 0;
}

+static loff_t sysfs_dir_llseek(struct file *file, loff_t offset, int whence)
+{
+ struct inode *inode = file_inode(file);
+ loff_t ret;
+
+ mutex_lock(&inode->i_mutex);
+ ret = generic_file_llseek(file, offset, whence);
+ mutex_unlock(&inode->i_mutex);
+
+ return ret;
+}

const struct file_operations sysfs_dir_operations = {
.read = generic_read_dir,
.readdir = sysfs_readdir,
.release = sysfs_dir_release,
- .llseek = generic_file_llseek,
+ .llseek = sysfs_dir_llseek,
};
--
1.7.9.5

2013-03-20 15:26:55

by Ming Lei

[permalink] [raw]
Subject: [PATCH 2/2] sysfs: handle failure path correctly for readdir()

In case of 'if (filp->f_pos == 0 or 1)' of sysfs_readdir(),
the failure from filldir() isn't handled, and the reference counter
of the sysfs_dirent object pointed by filp->private_data will be
released without clearing filp->private_data, so use after free
bug will be triggered later.

This patch returns immeadiately under the situation for fixing the bug,
and it is reasonable to return from readdir() when filldir() fails.

Reported-by: Dave Jones <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
fs/sysfs/dir.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index c9e1660..e145126 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -1020,6 +1020,8 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
ino = parent_sd->s_ino;
if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
+ else
+ return 0;
}
if (filp->f_pos == 1) {
if (parent_sd->s_parent)
@@ -1028,6 +1030,8 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
ino = parent_sd->s_ino;
if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
+ else
+ return 0;
}
mutex_lock(&sysfs_mutex);
for (pos = sysfs_dir_pos(ns, parent_sd, filp->f_pos, pos);
--
1.7.9.5

2013-03-20 16:26:48

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 2/2] sysfs: handle failure path correctly for readdir()

On Wed, Mar 20, 2013 at 9:25 AM, Ming Lei <[email protected]> wrote:
> In case of 'if (filp->f_pos == 0 or 1)' of sysfs_readdir(),
> the failure from filldir() isn't handled, and the reference counter
> of the sysfs_dirent object pointed by filp->private_data will be
> released without clearing filp->private_data, so use after free
> bug will be triggered later.
>
> This patch returns immeadiately under the situation for fixing the bug,
> and it is reasonable to return from readdir() when filldir() fails.
>
> Reported-by: Dave Jones <[email protected]>
> Tested-by: Sasha Levin <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> fs/sysfs/dir.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index c9e1660..e145126 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -1020,6 +1020,8 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
> ino = parent_sd->s_ino;
> if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
> filp->f_pos++;
> + else
> + return 0;
> }
> if (filp->f_pos == 1) {
> if (parent_sd->s_parent)
> @@ -1028,6 +1030,8 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
> ino = parent_sd->s_ino;
> if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
> filp->f_pos++;
> + else
> + return 0;
> }

Looks good to me. This is just an observation. readdir callers are
checking against NULL as opposed 0. Not a problem really probably
since NULL is defined as 0.

-- Shuah

2013-03-21 02:41:42

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On 2013/3/20 23:25, Ming Lei wrote:
> While readdir() is running, lseek() may set filp->f_pos as zero,
> then may leave filp->private_data pointing to one sysfs_dirent
> object without holding its reference counter, so the sysfs_dirent
> object may be used after free in next readdir().
>
> This patch holds inode->i_mutex to avoid the problem since
> the lock is always held in readdir path.
>

In fact the same race exists between readdir() and read()/write()...

> Reported-by: Dave Jones <[email protected]>
> Tested-by: Sasha Levin <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> fs/sysfs/dir.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)

2013-03-21 03:17:52

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan <[email protected]> wrote:
>
> In fact the same race exists between readdir() and read()/write()...

Fortunately, no read()/write() are implemented on sysfs directory, :-)

Thanks,
--
Ming Lei

2013-03-21 03:31:51

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On 2013/3/21 11:17, Ming Lei wrote:
> On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan <[email protected]> wrote:
>>
>> In fact the same race exists between readdir() and read()/write()...
>
> Fortunately, no read()/write() are implemented on sysfs directory, :-)
>

That's irrelevant...

See my report:

https://patchwork.kernel.org/patch/2160771/

2013-03-21 04:48:40

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On Thu, Mar 21, 2013 at 11:28 AM, Li Zefan <[email protected]> wrote:
> On 2013/3/21 11:17, Ming Lei wrote:
>> On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan <[email protected]> wrote:
>>>
>>> In fact the same race exists between readdir() and read()/write()...
>>
>> Fortunately, no read()/write() are implemented on sysfs directory, :-)
>>
>
> That's irrelevant...

As far as sysfs is concerned, the filp->f_ops can't be changed in
read/write path.

>
> See my report:
>
> https://patchwork.kernel.org/patch/2160771/

Yes, I know there might be some mess after the commit ef3d0fd2
(vfs: do (nearly) lockless generic_file_llseek).

Also looks it has been stated in Documentation/filesystems/Locking:

->llseek() locking has moved from llseek to the individual llseek
implementations. If your fs is not using generic_file_llseek, you
need to acquire and release the appropriate locks in your ->llseek().
For many filesystems, it is probably safe to acquire the inode
mutex or just to use i_size_read() instead.
Note: this does not protect the file->f_pos against concurrent modifications
since this is something the userspace has to take care about.


Thanks,
--
Ming Lei

2013-03-22 05:49:14

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On 2013/3/21 12:48, Ming Lei wrote:
> On Thu, Mar 21, 2013 at 11:28 AM, Li Zefan <[email protected]> wrote:
>> On 2013/3/21 11:17, Ming Lei wrote:
>>> On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan <[email protected]> wrote:
>>>>
>>>> In fact the same race exists between readdir() and read()/write()...
>>>
>>> Fortunately, no read()/write() are implemented on sysfs directory, :-)
>>>
>>
>> That's irrelevant...
>
> As far as sysfs is concerned, the filp->f_ops can't be changed in
> read/write path.
>

Yes, it can...As I said, it's irrelevant, because it's vfs that changes
file->f_pos.

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
struct fd f = fdget(fd);
ssize_t ret = -EBADF;

if (f.file) {
loff_t pos = file_pos_read(f.file); <--- read f_pos
ret = vfs_read(f.file, buf, count, &pos); <--- return -EISDIR
file_pos_write(f.file, pos); <--- write f_pos
fdput(f);
}
return ret;
}

>>
>> See my report:
>>
>> https://patchwork.kernel.org/patch/2160771/
>
> Yes, I know there might be some mess after the commit ef3d0fd2
> (vfs: do (nearly) lockless generic_file_llseek).
>
> Also looks it has been stated in Documentation/filesystems/Locking:
>
> ->llseek() locking has moved from llseek to the individual llseek
> implementations. If your fs is not using generic_file_llseek, you
> need to acquire and release the appropriate locks in your ->llseek().
> For many filesystems, it is probably safe to acquire the inode
> mutex or just to use i_size_read() instead.
> Note: this does not protect the file->f_pos against concurrent modifications
> since this is something the userspace has to take care about.
>

2013-03-22 09:31:21

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On Fri, Mar 22, 2013 at 1:48 PM, Li Zefan <[email protected]> wrote:
> On 2013/3/21 12:48, Ming Lei wrote:
>
> Yes, it can...As I said, it's irrelevant, because it's vfs that changes
> file->f_pos.
>
> SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
> {
> struct fd f = fdget(fd);
> ssize_t ret = -EBADF;
>
> if (f.file) {
> loff_t pos = file_pos_read(f.file); <--- read f_pos
> ret = vfs_read(f.file, buf, count, &pos); <--- return -EISDIR
> file_pos_write(f.file, pos); <--- write f_pos

Considered that f_pos of sysfs directory is always less than INT_MAX,
we need't worry about atomic writing it in file_pos_write().

The only probable problem on sysfs is below scenario in read()/write():

- pos is read as less than 2 in file_pos_read(f.file)
- ret = vfs_read(f.file, buf, count, &pos)
---> readdir() in another path
- file_pos_write(pos)
---> readdir() found f_pos becomes 0 or 1, and may cause
use-after-free problem

Considered that vfs_read()/vfs_write on sysfs dir is almost doing nothing, the
above problem may only exist in theory. The patch[1] can't avoid it too.
I am wondering if it need to be fixed, but I will try to figure out how to
avoid it in sysfs_readdir() if possible.

[1], https://patchwork.kernel.org/patch/2160771/

Thanks,
--
Ming Lei

2013-03-26 07:31:07

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On 2013/3/22 17:31, Ming Lei wrote:
> On Fri, Mar 22, 2013 at 1:48 PM, Li Zefan <[email protected]> wrote:
>> On 2013/3/21 12:48, Ming Lei wrote:
>>
>> Yes, it can...As I said, it's irrelevant, because it's vfs that changes
>> file->f_pos.
>>
>> SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
>> {
>> struct fd f = fdget(fd);
>> ssize_t ret = -EBADF;
>>
>> if (f.file) {
>> loff_t pos = file_pos_read(f.file); <--- read f_pos
>> ret = vfs_read(f.file, buf, count, &pos); <--- return -EISDIR
>> file_pos_write(f.file, pos); <--- write f_pos
>
> Considered that f_pos of sysfs directory is always less than INT_MAX,
> we need't worry about atomic writing it in file_pos_write().
>
> The only probable problem on sysfs is below scenario in read()/write():
>
> - pos is read as less than 2 in file_pos_read(f.file)
> - ret = vfs_read(f.file, buf, count, &pos)
> ---> readdir() in another path
> - file_pos_write(pos)
> ---> readdir() found f_pos becomes 0 or 1, and may cause
> use-after-free problem
>
> Considered that vfs_read()/vfs_write on sysfs dir is almost doing nothing, the
> above problem may only exist in theory.

The read() vs readdir() race in sysfs directory doesn't exist in theory only.

Mar 25 11:16:57 lxc34 kernel: [ 3581.923110] ------------[ cut here ]------------
Mar 25 11:16:57 lxc34 kernel: [ 3581.923124] WARNING: at fs/sysfs/sysfs.h:195 sysfs_readdir+0x277/0x290()
Mar 25 11:16:57 lxc34 kernel: [ 3581.923131] Hardware name: Tecal RH2285
Mar 25 11:16:57 lxc34 kernel: [ 3581.923136] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_i
scsi bridge ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave binfmt_misc fuse loop dm_mod c
oretemp acpi_cpufreq mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf
128mul bnx2 iTCO_wdt iTCO_vendor_support sg i2c_i801 ehci_pci mptctl tpm_tis tpm tpm_bios serio_raw microcode lp
c_ich i2c_core hid_generic mfd_core button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd
ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix libata mptsas mptscsih mptbase scs
i_transport_sas scsi_mod thermal thermal_sys hwmon
Mar 25 11:16:57 lxc34 kernel: [ 3581.923238] Pid: 13289, comm: a.out Not tainted 3.9.0-rc1-0.7-default+ #38
Mar 25 11:16:57 lxc34 kernel: [ 3581.923245] Call Trace:
Mar 25 11:16:57 lxc34 kernel: [ 3581.923251] [<ffffffff8120b137>] ? sysfs_readdir+0x277/0x290
Mar 25 11:16:57 lxc34 kernel: [ 3581.923258] [<ffffffff8120b137>] ? sysfs_readdir+0x277/0x290
Mar 25 11:16:57 lxc34 kernel: [ 3581.923273] [<ffffffff81042d3f>] warn_slowpath_common+0x7f/0xc0
Mar 25 11:16:57 lxc34 kernel: [ 3581.923281] [<ffffffff81042d9a>] warn_slowpath_null+0x1a/0x20
Mar 25 11:16:57 lxc34 kernel: [ 3581.923288] [<ffffffff8120b137>] sysfs_readdir+0x277/0x290
Mar 25 11:16:57 lxc34 kernel: [ 3581.923296] [<ffffffff811a11a0>] ? sys_ioctl+0x90/0x90
Mar 25 11:16:57 lxc34 kernel: [ 3581.923303] [<ffffffff811a11a0>] ? sys_ioctl+0x90/0x90
Mar 25 11:16:57 lxc34 kernel: [ 3581.923310] [<ffffffff811a1589>] vfs_readdir+0xa9/0xc0
Mar 25 11:16:57 lxc34 kernel: [ 3581.923317] [<ffffffff811a163b>] sys_getdents64+0x9b/0x110
Mar 25 11:16:57 lxc34 kernel: [ 3581.923327] [<ffffffff814bb5d9>] system_call_fastpath+0x16/0x1b
Mar 25 11:16:57 lxc34 kernel: [ 3581.923333] ---[ end trace a995ae360b2301bd ]---
Mar 25 11:16:57 lxc34 kernel: [ 3581.923339] ida_remove called for id=19055 which is not allocated.
...

And finally my kernel crashed.

I'm not asking you to fix it, but just let you know about this bug. Al proposed a fix but I don't know
if he'll work on it.

2013-03-26 08:45:35

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On Tue, Mar 26, 2013 at 3:30 PM, Li Zefan <[email protected]> wrote:
>> Considered that vfs_read()/vfs_write on sysfs dir is almost doing nothing, the
>> above problem may only exist in theory.
>
> The read() vs readdir() race in sysfs directory doesn't exist in theory only.

Could you let me know if you have applied the two patches on your test?

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=e5110f411d2ee35bf8d202ccca2e89c633060dca

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=991f76f837bf22c5bb07261cfd86525a0a96650c

Also I appreciate it if you may share your test case...

Thanks,
--
Ming Lei

2013-03-26 14:03:36

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

Hi Zefan,

On Tue, Mar 26, 2013 at 4:45 PM, Ming Lei <[email protected]> wrote:
> On Tue, Mar 26, 2013 at 3:30 PM, Li Zefan <[email protected]> wrote:
>>> Considered that vfs_read()/vfs_write on sysfs dir is almost doing nothing, the
>>> above problem may only exist in theory.
>>
>> The read() vs readdir() race in sysfs directory doesn't exist in theory only.
>
> Could you let me know if you have applied the two patches on your test?
>
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=e5110f411d2ee35bf8d202ccca2e89c633060dca
>
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=991f76f837bf22c5bb07261cfd86525a0a96650c
>
> Also I appreciate it if you may share your test case...

If you mean the test code on link[1], I can't reproduce the
warning with the two sysfs fix patches in 4 hours's test.

[1], https://patchwork.kernel.org/patch/2160771/

Thanks,
--
Ming Lei

2013-03-26 15:59:11

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

On Tue, Mar 26, 2013 at 10:03 PM, Ming Lei <[email protected]> wrote:
>
> If you mean the test code on link[1], I can't reproduce the
> warning with the two sysfs fix patches in 4 hours's test.
>
> [1], https://patchwork.kernel.org/patch/2160771/

You are right, looks it is not a problem just in theory, and I can
reproduce it now with your test code by the following steps:

- load all modules
- run your test code on the directory of '/sys/module'
- then can observe the use after free after minutes(a bit easier to
add below debug code[1])

Previously, I can't reproduce because I just test on one specific
unused module directory.

[1], debug code
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -280,6 +280,11 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
* sd->s_parent won't change beneath us.
*/
parent_sd = sd->s_parent;
+ if(!(sd->s_flags & SYSFS_FLAG_REMOVED)) {
+ printk("%s-%d sysfs_dirent use after free: %s-%s\n",
+ __func__, __LINE__, parent_sd->s_name, sd->s_name);
+ dump_stack();
+ }


The below patch(also attached) can fix the issue.
--
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 79a0fd2..484f25e 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -1022,6 +1022,7 @@ static int sysfs_readdir(struct file * filp,
void * dirent, filldir_t filldir)
enum kobj_ns_type type;
const void *ns;
ino_t ino;
+ loff_t off;

type = sysfs_ns_type(parent_sd);
ns = sysfs_info(dentry->d_sb)->ns[type];
@@ -1044,6 +1045,7 @@ static int sysfs_readdir(struct file * filp,
void * dirent, filldir_t filldir)
return 0;
}
mutex_lock(&sysfs_mutex);
+ off = filp->f_pos;
for (pos = sysfs_dir_pos(ns, parent_sd, filp->f_pos, pos);
pos;
pos = sysfs_dir_next_pos(ns, parent_sd, filp->f_pos, pos)) {
@@ -1055,19 +1057,24 @@ static int sysfs_readdir(struct file * filp,
void * dirent, filldir_t filldir)
len = strlen(name);
ino = pos->s_ino;
type = dt_type(pos);
- filp->f_pos = pos->s_hash;
+ off = filp->f_pos = pos->s_hash;
filp->private_data = sysfs_get(pos);

mutex_unlock(&sysfs_mutex);
- ret = filldir(dirent, name, len, filp->f_pos, ino, type);
+ ret = filldir(dirent, name, len, off, ino, type);
mutex_lock(&sysfs_mutex);
if (ret < 0)
break;
}
mutex_unlock(&sysfs_mutex);
- if ((filp->f_pos > 1) && !pos) { /* EOF */
- filp->f_pos = INT_MAX;
+
+ /* don't reference last entry if its refcount is dropped */
+ if (!pos) {
filp->private_data = NULL;
+
+ /* EOF and not changed as 0 or 1 in read/write path */
+ if (off == filp->f_pos && off > 1)
+ filp->f_pos = INT_MAX;
}
return 0;
}



Thanks,
--
Ming Lei


Attachments:
sysfs-fix-readdir-v5.patch (1.48 kB)