After reviewing driver submissions with new cdev + ioctl usages one
common stumbling block is coordinating the shutdown of the ioctl path,
or other file operations, at driver ->remove() time. While cdev_del()
guarantees that no new file descriptors will be established, operations
on existing file descriptors can proceed indefinitely.

Given the observation that the kernel spends the resources for a percpu_ref
per request_queue shared with all block_devices on a gendisk, do the
same for all the cdev instances that share the same
cdev_add()-to-cdev_del() lifetime.

With this in place cdev_del() not only guarantees 'no new opens', but it
also guarantees 'no new operations invocations' and 'all threads running
in an operation handler have exited that handler'.
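
The two extra guarantees can be illustrated with a small userspace model.
This is only a sketch in portable C11, not the kernel implementation: it
uses a plain atomic counter where the series uses a percpu_ref, and every
name in it (ops_gate, ops_gate_enter(), model_ioctl(), ...) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-cdev reference. */
struct ops_gate {
	atomic_bool dying;	/* set once by the removal path */
	atomic_int active;	/* handlers currently inside an operation */
};

void ops_gate_init(struct ops_gate *g)
{
	atomic_init(&g->dying, false);
	atomic_init(&g->active, 0);
}

/*
 * Called at the top of every file operation. Count ourselves first, then
 * check for removal, so the remover either sees our count or we see its
 * flag (both accesses are sequentially consistent).
 */
bool ops_gate_enter(struct ops_gate *g)
{
	atomic_fetch_add(&g->active, 1);
	if (atomic_load(&g->dying)) {
		atomic_fetch_sub(&g->active, 1);
		return false;	/* 'no new operations invocations' */
	}
	return true;
}

/* Called on every exit path of an operation that entered the gate. */
void ops_gate_exit(struct ops_gate *g)
{
	int prev = atomic_fetch_sub(&g->active, 1);

	assert(prev > 0);	/* unbalanced exit would be a caller bug */
	(void)prev;
}

/* Removal path, step 1: refuse all future operations. */
void ops_gate_kill(struct ops_gate *g)
{
	atomic_store(&g->dying, true);
}

/*
 * Removal path, step 2: true once every handler has left. A real
 * implementation would sleep until this holds instead of polling.
 */
bool ops_gate_drained(struct ops_gate *g)
{
	return atomic_load(&g->active) == 0;
}

/* Model of an ioctl handler wrapped by the gate; echoes its argument. */
long model_ioctl(struct ops_gate *g, long arg)
{
	long ret;

	if (!ops_gate_enter(g))
		return -ENXIO;
	ret = arg;	/* driver context would be safe to touch here */
	ops_gate_exit(g);
	return ret;
}
```

In this model the removal path calls ops_gate_kill() and then waits for
ops_gate_drained() before freeing anything the handlers dereference.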

As a proof point of the way driver implementations open-code around this
gap in the api, the libnvdimm ioctl path is reworked with a result of:

    4 files changed, 101 insertions(+), 153 deletions(-)

---

Dan Williams (3):
      cdev: Finish the cdev api with queued mode support
      libnvdimm/ida: Switch to non-deprecated ida helpers
      libnvdimm/ioctl: Switch to cdev_register_queued()

 drivers/nvdimm/btt_devs.c       |    6 +
 drivers/nvdimm/bus.c            |  177 +++++++++------------------------------
 drivers/nvdimm/core.c           |   14 ++-
 drivers/nvdimm/dax_devs.c       |    4 -
 drivers/nvdimm/dimm_devs.c      |   53 +++++++++---
 drivers/nvdimm/namespace_devs.c |   14 +--
 drivers/nvdimm/nd-core.h        |   14 ++-
 drivers/nvdimm/pfn_devs.c       |    4 -
 fs/char_dev.c                   |  108 ++++++++++++++++++++++--
 include/linux/cdev.h            |   21 ++++-
 10 files changed, 238 insertions(+), 177 deletions(-)

On Wed, Jan 20, 2021 at 11:38 AM Dan Williams wrote:
>
> After reviewing driver submissions with new cdev + ioctl usages one
> common stumbling block is coordinating the shutdown of the ioctl path,
> or other file operations, at driver ->remove() time. While cdev_del()
> guarantees that no new file descriptors will be established, operations
> on existing file descriptors can proceed indefinitely.
>
> Given the observation that the kernel spends the resources for a percpu_ref
> per request_queue shared with all block_devices on a gendisk, do the
> same for all the cdev instances that share the same
> cdev_add()-to-cdev_del() lifetime.
>
> With this in place cdev_del() not only guarantees 'no new opens', but it
> also guarantees 'no new operations invocations' and 'all threads running
> in an operation handler have exited that handler'.

Prompted by the reaction, I realized that this is pushing an incomplete
story about why this is needed, and the "queued" concept is way off
base. The problem this is trying to solve is situations like this:

    long xyz_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
    {
            xyz_ioctl_dev = file->private_data;
            /* may run after unbind, when the context is already freed */
            xyz_driver_context = xyz_ioctl_dev->context;
            ...
    }

    int xyz_probe(struct device *dev)
    {
            /* devm: freed when the parent driver unbinds */
            xyz_driver_context = devm_kzalloc(...);
            ...
            xyz_ioctl_dev = kmalloc(...);
            device_initialize(&xyz_ioctl_dev->dev);
            xyz_ioctl_dev->context = xyz_driver_context;
            ...
            cdev_device_add(&xyz_ioctl_dev->cdev, &xyz_ioctl_dev->dev);
    }

...where a parent driver allocates context tied to the driver-bind
lifetime of the parent device, and that context ends up getting
used in the ioctl path. I interpret Greg's assertion "if you do this
right you don't have this problem" as "don't reference anything with a
lifetime shorter than the xyz_ioctl_dev lifetime in your ioctl
handler". That is true, but it can be awkward to constrain
xyz_driver_context to a sub-device, and doing so gives up some of the
convenience of devm. So the goal is to have a cdev api that accounts
for all the common lifetimes when devm is in use. So I'm now thinking
of an api like:

    devm_cdev_device_add(struct device *host, struct cdev *cdev,
                         struct device *dev)

...where @host bounds the lifetime of data used by the cdev
file_operations, and @dev is the typical containing structure for
@cdev. Internally I would refactor the debugfs mechanism for flushing
in-flight file_operations so that it is shared by the cdev
implementation. Either adopt the debugfs method for file_operations
syncing, or switch debugfs to percpu_ref (leaning towards the former).
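
As a rough illustration of what "@host bounds the lifetime" could mean in
practice, here is a toy userspace model in plain C. It is not kernel code
and not the proposed implementation. It relies on one fact about devm,
that release actions run in reverse order of registration, so a cdev
registered against @host after the context is allocated gets torn down
before that context is freed. All names (struct host, host_add_action(),
xyz_model_probe(), ...) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

/* Hypothetical stand-in for a device's devres list. */
struct host {
	release_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t nr;
};

/* Register a release action; returns 0 on success, -1 if the list is full. */
int host_add_action(struct host *h, release_fn fn, void *data)
{
	if (h->nr == MAX_ACTIONS)
		return -1;
	h->fn[h->nr] = fn;
	h->data[h->nr] = data;
	h->nr++;
	return 0;
}

/* Model of driver unbind: run actions in reverse registration order. */
void host_release_all(struct host *h)
{
	while (h->nr > 0) {
		h->nr--;
		h->fn[h->nr](h->data[h->nr]);
	}
}

/* State the model tracks for one xyz device. */
struct xyz_model {
	int context_valid;	/* models the devm_kzalloc()'d context */
	int cdev_live;		/* models file_operations still reachable */
	int freed_while_live;	/* set if the context died under a live cdev */
};

void release_context(void *data)
{
	struct xyz_model *m = data;

	if (m->cdev_live)
		m->freed_while_live = 1;	/* the bug being avoided */
	m->context_valid = 0;
}

/* Models cdev teardown plus flushing of in-flight file_operations. */
void release_cdev(void *data)
{
	struct xyz_model *m = data;

	m->cdev_live = 0;
}

/* Probe: context first, cdev second, both tied to the same host. */
int xyz_model_probe(struct host *h, struct xyz_model *m)
{
	m->context_valid = 1;
	if (host_add_action(h, release_context, m))
		return -1;
	m->cdev_live = 1;
	if (host_add_action(h, release_cdev, m))
		return -1;
	return 0;
}
```

Because the cdev action is registered last it runs first, so in this model
the context is never released while file operations can still reach it.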
Does this clarify the raised concerns?