2019-11-25 12:56:11

by Vladis Dronov

Subject: [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open

In a case when a chardev file (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a use-after-free. This
reproduces easily in a KVM virtual machine:

# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }

# uname -r
5.4.0-219d5433
# cat /proc/cmdline
... slub_debug=FZP
# modprobe ptp_kvm
# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
# rmmod ptp_kvm
# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
# ...woken up
[ 102.375849] general protection fault: 0000 [#1] SMP
[ 102.377372] CPU: 1 PID: 670 Comm: openptp0 Not tainted 5.4.0-219d5433 #1
[ 102.379163] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 102.381129] RIP: 0010:module_put.part.0+0x7/0x80
[ 102.383019] RSP: 0018:ffff9ba440687e00 EFLAGS: 00010202
[ 102.383451] RAX: 0000000000002000 RBX: 6b6b6b6b6b6b6b6b RCX: ffff91e736800ad0
[ 102.384030] RDX: ffffcf6408bc2808 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[ 102.386032] ... ^^^ a slub poison
[ 102.389866] Call Trace:
[ 102.390086] __fput+0x21f/0x240
[ 102.390363] task_work_run+0x79/0x90
[ 102.390671] do_exit+0x2c9/0xad0
[ 102.390931] ? vfs_write+0x16a/0x190
[ 102.391241] do_group_exit+0x35/0x90
[ 102.391549] __x64_sys_exit_group+0xf/0x10
[ 102.391898] do_syscall_64+0x3d/0x110
[ 102.392240] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 102.392695] RIP: 0033:0x7f0fa7016246
[ 102.396615] ...
[ 102.397225] Modules linked in: [last unloaded: ptp_kvm]
[ 102.410323] Fixing recursive fault but reboot is needed!
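
The source of openptp0.c is elided in the transcript above. A minimal sketch of what such a reproducer might look like follows; the helper name hold_open() and its exact behaviour are assumptions, and only the two printed messages are taken from the transcript:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original posting): open 'path'
 * read-only, sleep for 'secs' seconds, then close it.
 * Returns 0 on success, -1 if the file cannot be opened or closed.
 */
int hold_open(const char *path, unsigned int secs)
{
	FILE *fp;

	assert(path != NULL);

	fp = fopen(path, "r");
	if (!fp) {
		perror(path);
		return -1;
	}

	printf("opened %s, sleeping %us...\n", path, secs);
	fflush(stdout);

	/* The device is expected to be removed during this window. */
	sleep(secs);

	printf("...woken up\n");

	/* Dropping the last reference to the file is what runs __fput(). */
	return fclose(fp) == 0 ? 0 : -1;
}
```

Calling hold_open("/dev/ptp0", 10) from main() and running rmmod during the sleep would correspond to the session shown above. In the trace the final __fput() is reached via do_exit(), so the original program may simply exit without an explicit close.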

This happens in:

static void __fput(struct file *file)
{ ...
if (file->f_op->release)
file->f_op->release(inode, file); <<< cdev is kfree'd here
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev); <<< cdev fields are accessed here

because of:

__fput()
posix_clock_release()
kref_put(&clk->kref, delete_clock) <<< the last reference
delete_clock()
delete_ptp_clock()
kfree(ptp) <<< cdev is embedded in ptp
cdev_put
module_put(p->owner) <<< *p is kfree'd
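
The ordering problem in this call chain can be modelled in userspace. This is only a sketch: all fake_* names are hypothetical stand-ins rather than kernel types, and the kfree() is replaced by a flag so that the stale access can be observed without undefined behaviour:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cdev. */
struct fake_cdev {
	int kobj_refs;          /* models the cdev->kobj refcount */
};

/* Hypothetical stand-in for struct ptp_clock. */
struct fake_ptp {
	int kref;               /* models posix_clock's kref */
	struct fake_cdev cdev;  /* embedded, as in the real structures */
	bool freed;             /* set where the kernel would kfree() */
};

/* Models posix_clock_release(): drops the kref, "frees" on zero. */
void fake_release(struct fake_ptp *ptp)
{
	assert(ptp->kref > 0);
	if (--ptp->kref == 0)
		ptp->freed = true;
}

/* Models cdev_put(): reports whether it touched freed memory. */
bool fake_cdev_put(struct fake_ptp *ptp)
{
	bool stale = ptp->freed;

	ptp->cdev.kobj_refs--;
	return stale;
}

/* Models the tail of __fput() with the original ordering. */
bool fake_fput(struct fake_ptp *ptp)
{
	fake_release(ptp);          /* file->f_op->release() */
	return fake_cdev_put(ptp);  /* cdev_put(inode->i_cdev) */
}
```

With kref == 1 (the device has already been unregistered and only the open file holds a reference), fake_fput() returns true: the second counter is decremented inside an object that the first counter has already released.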

The fix is to call cdev_put() before file->f_op->release(). This fixes the
class of bugs where a chardev device is removed while its file is open, for
example:

# lspci
00:09.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer
# ./openwdog0 &
[1] 672
opened /dev/watchdog0, sleeping 10s...
# echo 1 > /sys/devices/pci0000:00/0000:00:09.0/remove
# ls /dev/watch*
ls: cannot access '/dev/watch*': No such file or directory
# ...woken up
[ 63.500271] general protection fault: 0000 [#1] SMP
[ 63.501757] CPU: 1 PID: 672 Comm: openwdog0 Not tainted 5.4.0-219d5433 #4
[ 63.503605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 63.507064] RIP: 0010:module_put.part.0+0x7/0x80
[ 63.513841] RSP: 0018:ffffb96b00667e00 EFLAGS: 00010202
[ 63.515376] RAX: 0000000000002000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000150013
[ 63.517478] RDX: 0000000000000246 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b

Analyzed-by: Stephen Johnston <[email protected]>
Analyzed-by: Vern Lovejoy <[email protected]>
Signed-off-by: Vladis Dronov <[email protected]>
---
fs/file_table.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 30d55c9a1744..21ba35024950 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -276,12 +276,12 @@ static void __fput(struct file *file)
if (file->f_op->fasync)
file->f_op->fasync(-1, file, 0);
}
- if (file->f_op->release)
- file->f_op->release(inode, file);
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev);
}
+ if (file->f_op->release)
+ file->f_op->release(inode, file);
fops_put(file->f_op);
put_pid(file->f_owner.pid);
if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
--
2.20.1


2019-12-08 19:50:11

by Al Viro

Subject: Re: [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open

On Mon, Nov 25, 2019 at 01:53:42PM +0100, Vladis Dronov wrote:
> In a case when a chardev file (like /dev/ptp0) is open but an underlying
> device is removed, closing this file leads to a use-after-free. This
> reproduces easily in a KVM virtual machine:
>
> # cat openptp0.c
> int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }

> static void __fput(struct file *file)
> { ...
> if (file->f_op->release)
> file->f_op->release(inode, file); <<< cdev is kfree'd here

> if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
> !(mode & FMODE_PATH))) {
> cdev_put(inode->i_cdev); <<< cdev fields are accessed here
>
> because of:
>
> __fput()
> posix_clock_release()
> kref_put(&clk->kref, delete_clock) <<< the last reference
> delete_clock()
> delete_ptp_clock()
> kfree(ptp) <<< cdev is embedded in ptp
> cdev_put
> module_put(p->owner) <<< *p is kfree'd
>
> The fix is to call cdev_put() before file->f_op->release(). This fixes the
> class of bugs where a chardev device is removed while its file is open, for
> example:

And what's to prevent rmmod coming and freeing ->release code right as you
are executing it?

2019-12-08 20:19:27

by Al Viro

Subject: Re: [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open

On Sun, Dec 08, 2019 at 07:49:07PM +0000, Al Viro wrote:
> On Mon, Nov 25, 2019 at 01:53:42PM +0100, Vladis Dronov wrote:
> > In a case when a chardev file (like /dev/ptp0) is open but an underlying
> > device is removed, closing this file leads to a use-after-free. This
> > reproduces easily in a KVM virtual machine:
> >
> > # cat openptp0.c
> > int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
>
> > static void __fput(struct file *file)
> > { ...
> > if (file->f_op->release)
> > file->f_op->release(inode, file); <<< cdev is kfree'd here
>
> > if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
> > !(mode & FMODE_PATH))) {
> > cdev_put(inode->i_cdev); <<< cdev fields are accessed here
> >
> > because of:
> >
> > __fput()
> > posix_clock_release()
> > kref_put(&clk->kref, delete_clock) <<< the last reference
> > delete_clock()
> > delete_ptp_clock()
> > kfree(ptp) <<< cdev is embedded in ptp
> > cdev_put
> > module_put(p->owner) <<< *p is kfree'd
> >
> > The fix is to call cdev_put() before file->f_op->release(). This fixes the
> > class of bugs where a chardev device is removed while its file is open, for
> > example:
>
> And what's to prevent rmmod coming and freeing ->release code right as you
> are executing it?

FWIW, the bug here seems to be that the lifetime rules of cdev are fucked -
if it can get freed while its ->kobj is still alive, we have something
very wrong there. IOW, you have ptp lifetime controlled by *TWO*
refcounts - that of clk and that of cdev->kobj. That doesn't work.
Replace that kfree() with dropping a kobject reference, perhaps, so
that freeing would've been done by its release callback?
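
The idea of a single refcount whose release callback does the freeing can be sketched in userspace as follows. This is an illustration only: the fake_* names are hypothetical, and a flag stands in for the actual free:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define fake_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a kobject: one counter, one release hook. */
struct fake_kobj {
	int refs;
	void (*release)(struct fake_kobj *kobj);
};

/* Hypothetical stand-in for the driver structure embedding the kobject. */
struct fake_clock {
	int index;
	struct fake_kobj kobj;
	bool released;          /* set where the kernel would kfree() */
};

void fake_kobj_get(struct fake_kobj *kobj)
{
	assert(kobj->refs > 0);
	kobj->refs++;
}

/* The last put, whoever issues it, triggers the release callback. */
void fake_kobj_put(struct fake_kobj *kobj)
{
	assert(kobj->refs > 0);
	if (--kobj->refs == 0 && kobj->release)
		kobj->release(kobj);
}

/* Release callback: the only place where the structure is "freed". */
void fake_clock_release(struct fake_kobj *kobj)
{
	struct fake_clock *clk =
		fake_container_of(kobj, struct fake_clock, kobj);

	clk->released = true;
}
```

Because every holder (the registration, an open file) uses the same counter, the order in which they drop their references no longer matters; the structure goes away only after the last one.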

2019-12-27 02:28:11

by Vladis Dronov

Subject: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[ 48.010809] general protection fault: 0000 [#1] SMP
[ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[ 48.019470] ... ^^^ a slub poison
[ 48.023854] Call Trace:
[ 48.024050] __fput+0x21f/0x240
[ 48.024288] task_work_run+0x79/0x90
[ 48.024555] do_exit+0x2af/0xab0
[ 48.024799] ? vfs_write+0x16a/0x190
[ 48.025082] do_group_exit+0x35/0x90
[ 48.025387] __x64_sys_exit_group+0xf/0x10
[ 48.025737] do_syscall_64+0x3d/0x130
[ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.026479] RIP: 0033:0x7f53b12082f6
[ 48.026792] ...
[ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[ 48.045001] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{ ...
if (file->f_op->release)
file->f_op->release(inode, file); <<< cdev is kfree'd here
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
posix_clock_release()
kref_put(&clk->kref, delete_clock) <<< the last reference
delete_clock()
delete_ptp_clock()
kfree(ptp) <<< cdev is embedded in ptp
cdev_put
module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add(),
which was created especially for such cases. This way the parent device with
its ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7fda ("chardev: add
helper function to register char devs with a struct device").
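
The parent/child relationship described above can be modelled in userspace. The sketch below is a simplified model under stated assumptions: the fake_* names are hypothetical, reference counts taken internally by the driver core are ignored, and the cdev is assumed to hold one reference on its parent device until its own last reference is dropped:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device. */
struct fake_dev {
	int refs;
	bool released;          /* set where ->release() would kfree() */
};

/* Hypothetical stand-in for struct cdev. */
struct fake_cdev {
	int refs;
	struct fake_dev *parent;
};

void fake_get_device(struct fake_dev *dev)
{
	assert(!dev->released);
	dev->refs++;
}

void fake_put_device(struct fake_dev *dev)
{
	assert(dev->refs > 0);
	if (--dev->refs == 0)
		dev->released = true;
}

/* Models cdev_device_add(): the cdev pins its parent device. */
void fake_cdev_device_add(struct fake_cdev *cdev, struct fake_dev *dev)
{
	cdev->refs = 1;
	cdev->parent = dev;
	fake_get_device(dev);
}

void fake_cdev_get(struct fake_cdev *cdev)
{
	assert(cdev->refs > 0);
	cdev->refs++;
}

/* Models cdev_put(): the last put drops the reference on the parent. */
void fake_cdev_put(struct fake_cdev *cdev)
{
	assert(cdev->refs > 0);
	if (--cdev->refs == 0)
		fake_put_device(cdev->parent);
}
```

In this model, an open file holds one reference on each object. After unregistration and after the file's release() has dropped its device reference, the device is still alive, because the cdev keeps it pinned until the final fake_cdev_put().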

Link: https://lore.kernel.org/linux-fsdevel/[email protected]/T/#u
Analyzed-by: Stephen Johnston <[email protected]>
Analyzed-by: Vern Lovejoy <[email protected]>
Signed-off-by: Vladis Dronov <[email protected]>
---
drivers/ptp/ptp_clock.c | 31 ++++++++++++++-----------------
drivers/ptp/ptp_private.h | 2 +-
include/linux/posix-clock.h | 19 +++++++++++--------
kernel/time/posix-clock.c | 31 +++++++++++++------------------
4 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index e60eab7f8a61..61fafe0374ce 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -166,9 +166,9 @@ static struct posix_clock_operations ptp_clock_ops = {
.read = ptp_read,
};

-static void delete_ptp_clock(struct posix_clock *pc)
+static void ptp_clock_release(struct device *dev)
{
- struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+ struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);

mutex_destroy(&ptp->tsevq_mux);
mutex_destroy(&ptp->pincfg_mux);
@@ -213,7 +213,6 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
}

ptp->clock.ops = ptp_clock_ops;
- ptp->clock.release = delete_ptp_clock;
ptp->info = info;
ptp->devid = MKDEV(major, index);
ptp->index = index;
@@ -236,15 +235,6 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
if (err)
goto no_pin_groups;

- /* Create a new device in our class. */
- ptp->dev = device_create_with_groups(ptp_class, parent, ptp->devid,
- ptp, ptp->pin_attr_groups,
- "ptp%d", ptp->index);
- if (IS_ERR(ptp->dev)) {
- err = PTR_ERR(ptp->dev);
- goto no_device;
- }
-
/* Register a new PPS source. */
if (info->pps) {
struct pps_source_info pps;
@@ -260,8 +250,18 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
}
}

- /* Create a posix clock. */
- err = posix_clock_register(&ptp->clock, ptp->devid);
+ /* Initialize a new device of our class in our clock structure. */
+ device_initialize(&ptp->dev);
+ ptp->dev.devt = ptp->devid;
+ ptp->dev.class = ptp_class;
+ ptp->dev.parent = parent;
+ ptp->dev.groups = ptp->pin_attr_groups;
+ ptp->dev.release = ptp_clock_release;
+ dev_set_drvdata(&ptp->dev, ptp);
+ dev_set_name(&ptp->dev, "ptp%d", ptp->index);
+
+ /* Create a posix clock and link it to the device. */
+ err = posix_clock_register(&ptp->clock, &ptp->dev);
if (err) {
pr_err("failed to create posix clock\n");
goto no_clock;
@@ -273,8 +273,6 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
if (ptp->pps_source)
pps_unregister_source(ptp->pps_source);
no_pps:
- device_destroy(ptp_class, ptp->devid);
-no_device:
ptp_cleanup_pin_groups(ptp);
no_pin_groups:
if (ptp->kworker)
@@ -304,7 +302,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
if (ptp->pps_source)
pps_unregister_source(ptp->pps_source);

- device_destroy(ptp_class, ptp->devid);
ptp_cleanup_pin_groups(ptp);

posix_clock_unregister(&ptp->clock);
diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
index 9171d42468fd..6b97155148f1 100644
--- a/drivers/ptp/ptp_private.h
+++ b/drivers/ptp/ptp_private.h
@@ -28,7 +28,7 @@ struct timestamp_event_queue {

struct ptp_clock {
struct posix_clock clock;
- struct device *dev;
+ struct device dev;
struct ptp_clock_info *info;
dev_t devid;
int index; /* index into clocks.map */
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index fe6cfdcfbc26..5cfe13293243 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -69,29 +69,32 @@ struct posix_clock_operations {
*
* @ops: Functional interface to the clock
* @cdev: Character device instance for this clock
- * @kref: Reference count.
+ * @dev: Pointer to the clock's device.
* @rwsem: Protects the 'zombie' field from concurrent access.
* @zombie: If 'zombie' is true, then the hardware has disappeared.
- * @release: A function to free the structure when the reference count reaches
- * zero. May be NULL if structure is statically allocated.
*
* Drivers should embed their struct posix_clock within a private
* structure, obtaining a reference to it during callbacks using
* container_of().
+ *
+ * Drivers should supply an initialized but not exposed struct device
+ * to posix_clock_register(). It is used to manage lifetime of the
+ * driver's private structure. Its 'release' field should be set to
+ * a release function for this private structure.
*/
struct posix_clock {
struct posix_clock_operations ops;
struct cdev cdev;
- struct kref kref;
+ struct device *dev;
struct rw_semaphore rwsem;
bool zombie;
- void (*release)(struct posix_clock *clk);
};

/**
* posix_clock_register() - register a new clock
- * @clk: Pointer to the clock. Caller must provide 'ops' and 'release'
- * @devid: Allocated device id
+ * @clk: Pointer to the clock. Caller must provide 'ops' field
+ * @dev: Pointer to the initialized device. Caller must provide
+ * 'release' filed
*
* A clock driver calls this function to register itself with the
* clock device subsystem. If 'clk' points to dynamically allocated
@@ -100,7 +103,7 @@ struct posix_clock {
*
* Returns zero on success, non-zero otherwise.
*/
-int posix_clock_register(struct posix_clock *clk, dev_t devid);
+int posix_clock_register(struct posix_clock *clk, struct device *dev);

/**
* posix_clock_unregister() - unregister a clock
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ec960bb939fd..200fb2d3be99 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -14,8 +14,6 @@

#include "posix-timers.h"

-static void delete_clock(struct kref *kref);
-
/*
* Returns NULL if the posix_clock instance attached to 'fp' is old and stale.
*/
@@ -125,7 +123,7 @@ static int posix_clock_open(struct inode *inode, struct file *fp)
err = 0;

if (!err) {
- kref_get(&clk->kref);
+ get_device(clk->dev);
fp->private_data = clk;
}
out:
@@ -141,7 +139,7 @@ static int posix_clock_release(struct inode *inode, struct file *fp)
if (clk->ops.release)
err = clk->ops.release(clk);

- kref_put(&clk->kref, delete_clock);
+ put_device(clk->dev);

fp->private_data = NULL;

@@ -161,38 +159,35 @@ static const struct file_operations posix_clock_file_operations = {
#endif
};

-int posix_clock_register(struct posix_clock *clk, dev_t devid)
+int posix_clock_register(struct posix_clock *clk, struct device *dev)
{
int err;

- kref_init(&clk->kref);
init_rwsem(&clk->rwsem);

cdev_init(&clk->cdev, &posix_clock_file_operations);
+ err = cdev_device_add(&clk->cdev, dev);
+ if (err) {
+ pr_err("%s unable to add device %d:%d\n",
+ dev_name(dev), MAJOR(dev->devt), MINOR(dev->devt));
+ return err;
+ }
clk->cdev.owner = clk->ops.owner;
- err = cdev_add(&clk->cdev, devid, 1);
+ clk->dev = dev;

- return err;
+ return 0;
}
EXPORT_SYMBOL_GPL(posix_clock_register);

-static void delete_clock(struct kref *kref)
-{
- struct posix_clock *clk = container_of(kref, struct posix_clock, kref);
-
- if (clk->release)
- clk->release(clk);
-}
-
void posix_clock_unregister(struct posix_clock *clk)
{
- cdev_del(&clk->cdev);
+ cdev_device_del(&clk->cdev, clk->dev);

down_write(&clk->rwsem);
clk->zombie = true;
up_write(&clk->rwsem);

- kref_put(&clk->kref, delete_clock);
+ put_device(clk->dev);
}
EXPORT_SYMBOL_GPL(posix_clock_unregister);

--
2.20.1

2019-12-27 15:03:14

by Richard Cochran

Subject: Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

On Fri, Dec 27, 2019 at 03:26:27AM +0100, Vladis Dronov wrote:
> Here cdev is embedded in posix_clock which is embedded in ptp_clock.
> The race happens because ptp_clock's lifetime is controlled by two
> refcounts: kref and cdev.kobj in posix_clock. This is wrong.
>
> Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
> created especially for such cases. This way the parent device with its
> ptp_clock is not released until all references to the cdev are released.
> This adds a requirement that an initialized but not exposed struct
> device should be provided to posix_clock_register() by a caller instead
> of a simple dev_t.
>
> This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
> the race between the release of watchdog_core_data and cdev"). See
> details of the implementation in the commit 233ed09d7fda ("chardev: add
> helper function to register char devs with a struct device").

Thanks for digging into this!

Acked-by: Richard Cochran <[email protected]>

> /**
> * posix_clock_register() - register a new clock
> - * @clk: Pointer to the clock. Caller must provide 'ops' and 'release'
> - * @devid: Allocated device id
> + * @clk: Pointer to the clock. Caller must provide 'ops' field
> + * @dev: Pointer to the initialized device. Caller must provide
> + * 'release' filed

field

Thanks,
Richard

2019-12-27 17:27:01

by Vladis Dronov

Subject: Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

Hello, Richard,

Thank you for the review!

> > + * @dev: Pointer to the initialized device. Caller must provide
> > + * 'release' filed
>
> field

Indeed. *sigh* Nothing is ideal. Let's hope a maintainer could fix it if
this is approved.

Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer


2019-12-31 04:20:50

by David Miller

Subject: Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

From: Vladis Dronov <[email protected]>
Date: Fri, 27 Dec 2019 03:26:27 +0100

> In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
> device is removed, closing this file leads to a race. This reproduces
> easily in a kvm virtual machine:
...
> This happens in:
>
> static void __fput(struct file *file)
> { ...
> if (file->f_op->release)
> file->f_op->release(inode, file); <<< cdev is kfree'd here
> if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
> !(mode & FMODE_PATH))) {
> cdev_put(inode->i_cdev); <<< cdev fields are accessed here
>
> Namely:
>
> __fput()
> posix_clock_release()
> kref_put(&clk->kref, delete_clock) <<< the last reference
> delete_clock()
> delete_ptp_clock()
> kfree(ptp) <<< cdev is embedded in ptp
> cdev_put
> module_put(p->owner) <<< *p is kfree'd, bang!
>
> Here cdev is embedded in posix_clock which is embedded in ptp_clock.
> The race happens because ptp_clock's lifetime is controlled by two
> refcounts: kref and cdev.kobj in posix_clock. This is wrong.
>
> Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
> created especially for such cases. This way the parent device with its
> ptp_clock is not released until all references to the cdev are released.
> This adds a requirement that an initialized but not exposed struct
> device should be provided to posix_clock_register() by a caller instead
> of a simple dev_t.
>
> This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
> the race between the release of watchdog_core_data and cdev"). See
> details of the implementation in the commit 233ed09d7fda ("chardev: add
> helper function to register char devs with a struct device").
>
> Link: https://lore.kernel.org/linux-fsdevel/[email protected]/T/#u
> Analyzed-by: Stephen Johnston <[email protected]>
> Analyzed-by: Vern Lovejoy <[email protected]>
> Signed-off-by: Vladis Dronov <[email protected]>

Applied, thanks.