xillybus_find_inode() is called by xillybus_open() and xillyusb_open()
to translate the inode's major and minor into a pointer to a relevant
data structure and an index.
But with xillyusb_open(), the data structure can be freed by
xillyusb_disconnect() during an unintentional time gap between the
release of the mutex that is taken by xillybus_find_inode() and the
mutex that is subsequently taken by xillyusb_open().
To fix this, xillybus_find_inode() supplies the pointer to the mutex that
it has locked (when returning success), so xillyusb_open() releases this
mutex only after obtaining the mutex that is specific to a device file.
This ensures that xillyusb_disconnect() won't release anything that is in
use.
This manipulation of mutexes poses no risk for a deadlock, because in
all usage scenarios, @unit_mutex (which is locked by xillybus_find_inode)
is always taken when no other mutex is locked. Hence a consistent locking
order is guaranteed.
xillybus_open() unlocks this mutex immediately, as this driver doesn't
support hot unplugging anyhow.
Reported-by: Hyunwoo Kim <[email protected]>
Signed-off-by: Eli Billauer <[email protected]>
---
drivers/char/xillybus/xillybus_class.c | 8 +++++---
drivers/char/xillybus/xillybus_class.h | 2 ++
drivers/char/xillybus/xillybus_core.c | 6 +++++-
drivers/char/xillybus/xillyusb.c | 4 +++-
4 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/char/xillybus/xillybus_class.c b/drivers/char/xillybus/xillybus_class.c
index 0f238648dcfe..c846dc3ed225 100644
--- a/drivers/char/xillybus/xillybus_class.c
+++ b/drivers/char/xillybus/xillybus_class.c
@@ -211,6 +211,7 @@ void xillybus_cleanup_chrdev(void *private_data,
EXPORT_SYMBOL(xillybus_cleanup_chrdev);
int xillybus_find_inode(struct inode *inode,
+ struct mutex **to_unlock,
void **private_data, int *index)
{
int minor = iminor(inode);
@@ -227,13 +228,14 @@ int xillybus_find_inode(struct inode *inode,
break;
}
- mutex_unlock(&unit_mutex);
-
- if (!unit)
+ if (!unit) {
+ mutex_unlock(&unit_mutex);
return -ENODEV;
+ }
*private_data = unit->private_data;
*index = minor - unit->lowest_minor;
+ *to_unlock = &unit_mutex;
return 0;
}
diff --git a/drivers/char/xillybus/xillybus_class.h b/drivers/char/xillybus/xillybus_class.h
index 5dbfdfc95c65..7cf89ac8bb32 100644
--- a/drivers/char/xillybus/xillybus_class.h
+++ b/drivers/char/xillybus/xillybus_class.h
@@ -12,6 +12,7 @@
#include <linux/device.h>
#include <linux/fs.h>
#include <linux/module.h>
+#include <linux/mutex.h>
int xillybus_init_chrdev(struct device *dev,
const struct file_operations *fops,
@@ -25,6 +26,7 @@ void xillybus_cleanup_chrdev(void *private_data,
struct device *dev);
int xillybus_find_inode(struct inode *inode,
+ struct mutex **to_unlock, /* Mutex to release */
void **private_data, int *index);
#endif /* __XILLYBUS_CLASS_H */
diff --git a/drivers/char/xillybus/xillybus_core.c b/drivers/char/xillybus/xillybus_core.c
index 11b7c4749274..7fd35f055310 100644
--- a/drivers/char/xillybus/xillybus_core.c
+++ b/drivers/char/xillybus/xillybus_core.c
@@ -1431,11 +1431,15 @@ static int xillybus_open(struct inode *inode, struct file *filp)
struct xilly_endpoint *endpoint;
struct xilly_channel *channel;
int index;
+ struct mutex *to_unlock; /* Mutex locked by xillybus_find_inode() */
- rc = xillybus_find_inode(inode, (void **)&endpoint, &index);
+ rc = xillybus_find_inode(inode, &to_unlock,
+ (void **)&endpoint, &index);
if (rc)
return rc;
+ mutex_unlock(to_unlock);
+
if (endpoint->fatal_error)
return -EIO;
diff --git a/drivers/char/xillybus/xillyusb.c b/drivers/char/xillybus/xillyusb.c
index 39bcbfd908b4..63e94d935067 100644
--- a/drivers/char/xillybus/xillyusb.c
+++ b/drivers/char/xillybus/xillyusb.c
@@ -1236,8 +1236,9 @@ static int xillyusb_open(struct inode *inode, struct file *filp)
struct xillyusb_endpoint *out_ep = NULL;
int rc;
int index;
+ struct mutex *to_unlock; /* Mutex locked by xillybus_find_inode() */
- rc = xillybus_find_inode(inode, (void **)&xdev, &index);
+ rc = xillybus_find_inode(inode, &to_unlock, (void **)&xdev, &index);
if (rc)
return rc;
@@ -1245,6 +1246,7 @@ static int xillyusb_open(struct inode *inode, struct file *filp)
filp->private_data = chan;
mutex_lock(&chan->lock);
+ mutex_unlock(to_unlock);
rc = -ENODEV;
--
2.17.1
On Wed, Oct 26, 2022 at 11:52:40AM +0300, Eli Billauer wrote:
> xillybus_find_inode() is called by xillybus_open() and xillyusb_open()
> to translate the inode's major and minor into a pointer to a relevant
> data structure and an index.
>
> But with xillyusb_open(), the data structure can be freed by
> xillyusb_disconnect() during an unintentional time gap between the
> release of the mutex that is taken by xillybus_find_inode() and the
> mutex that is subsequently taken by xillyusb_open().
>
> To fix this, xillybus_find_inode() supplies the pointer to the mutex that
> it has locked (when returning success), so xillyusb_open() releases this
> mutex only after obtaining the mutex that is specific to a device file.
> This ensures that xillyusb_disconnect() won't release anything that is in
> use.
That's really odd, and not normal at all. We don't pass around mutexes
like this as how do you know if that's allowed?
>
> This manipulation of mutexes poses no risk for a deadlock, because in
> all usage scenarios, @unit_mutex (which is locked by xillybus_find_inode)
> is always taken when no other mutex is locked. Hence a consistent locking
> order is guaranteed.
>
> xillybus_open() unlocks this mutex immediately, as this driver doesn't
> support hot unplugging anyhow.
>
> Reported-by: Hyunwoo Kim <[email protected]>
> Signed-off-by: Eli Billauer <[email protected]>
> ---
> drivers/char/xillybus/xillybus_class.c | 8 +++++---
> drivers/char/xillybus/xillybus_class.h | 2 ++
> drivers/char/xillybus/xillybus_core.c | 6 +++++-
> drivers/char/xillybus/xillyusb.c | 4 +++-
> 4 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/xillybus/xillybus_class.c b/drivers/char/xillybus/xillybus_class.c
> index 0f238648dcfe..c846dc3ed225 100644
> --- a/drivers/char/xillybus/xillybus_class.c
> +++ b/drivers/char/xillybus/xillybus_class.c
> @@ -211,6 +211,7 @@ void xillybus_cleanup_chrdev(void *private_data,
> EXPORT_SYMBOL(xillybus_cleanup_chrdev);
>
> int xillybus_find_inode(struct inode *inode,
> + struct mutex **to_unlock,
To unlock when? Who unlocks it? What is the lifespan here?
Why can't it just be part of the structure?
> void **private_data, int *index)
> {
> int minor = iminor(inode);
> @@ -227,13 +228,14 @@ int xillybus_find_inode(struct inode *inode,
> break;
> }
>
> - mutex_unlock(&unit_mutex);
> -
> - if (!unit)
> + if (!unit) {
> + mutex_unlock(&unit_mutex);
> return -ENODEV;
> + }
>
> *private_data = unit->private_data;
> *index = minor - unit->lowest_minor;
> + *to_unlock = &unit_mutex;
Why are you wanting the caller to unlock this? It's a global mutex (for
the whole file), this feels really odd.
What is this function supposed to be doing? You only return an int, and
you have some odd opaque void pointer being set. That feels wrong and
is not a normal design at all.
confused,
greg k-h
On Wed, Oct 26, 2022 at 11:52:40AM +0300, Eli Billauer wrote:
> xillybus_find_inode() is called by xillybus_open() and xillyusb_open()
> to translate the inode's major and minor into a pointer to a relevant
> data structure and an index.
>
> But with xillyusb_open(), the data structure can be freed by
> xillyusb_disconnect() during an unintentional time gap between the
> release of the mutex that is taken by xillybus_find_inode() and the
> mutex that is subsequently taken by xillyusb_open().
>
> To fix this, xillybus_find_inode() supplies the pointer to the mutex that
> it has locked (when returning success), so xillyusb_open() releases this
> mutex only after obtaining the mutex that is specific to a device file.
> This ensures that xillyusb_disconnect() won't release anything that is in
> use.
The standard way of handling this problem is different from this. The
driver defines a private mutex, and it ensures that any routine calling
*_find_inode() holds the mutex. It also ensures that the mutex is held
while a new device is being registered and while a device is being
removed.
Even that won't fix all the synchronization problems. A process can
open a device, and then after the device has been removed the process
can still try to access the device. The driver needs to ensure that
such accesses are not allowed.
Alan Stern
Hello Greg,
I'm going to follow Alan's advice, and add a mutex instead of juggling
with the existing one. So I'll prepare a new patch with a completely
different solution. I've answered your questions below, even though I'm
not sure any of that is still relevant.
On 26/10/2022 15:07, Greg KH wrote:
>> int xillybus_find_inode(struct inode *inode,
>> + struct mutex **to_unlock,
>
> To unlock when? Who unlocks it? What is the lifespan here?
xillybus_find_inode() finds the structure that represents a Xillybus
(PCIe / OF) or XillyUSB (USB) device. The idea was to delay the unlock
of the mutex until other means have been taken to prevent that structure
from being freed. Which was virtually immediately after
xillybus_find_inode() returns, but gave XillyUSB's driver a chance to
acquire another mutex before releasing this one.
XillyUSB's driver can't prevent the release of this structure before it
knows which structure it is (that's xillybus_find_inode()'s job to find
out). But because xillybus_find_inode() unlocks its mutex before
returning, there is a short period of time where the structure is
unprotected. So I thought, let's extend the life of the first mutex just
a little bit to keep the whole thing watertight.
>
> Why can't it just be part of the structure?
The problem is that this structure is either a struct xilly_endpoint
(for PCIe / OF) or a struct xillyusb_dev (for USB), and these have
virtually nothing in common. xillybus_find_inode() is used by two
completely different drivers that are grouped into a single class.
>>
>> *private_data = unit->private_data;
>> *index = minor - unit->lowest_minor;
>> + *to_unlock = &unit_mutex;
>
> Why are you wanting the caller to unlock this? It's a global mutex (for
> the whole file), this feels really odd.
As mentioned above, this gives the caller (i.e. XillyUSB's driver) a
chance to ensure that the structure isn't freed as a result of the
device's disconnection.
>
> What is this function supposed to be doing? You only return an int, and
> you have some odd opaque void pointer being set. That feels wrong and
> is not a normal design at all.
xillybus_find_inode() finds the device's structure, based upon the
inode's major and minor.
The opaque void pointer is the structure that represents the device,
either a PCIe / OF device or a XillyUSB device. Because of the
difference between a driver for PCIe / OF and a USB driver, these are
different structs, and hence represented by a void pointer. The int just
says which of the device files, that are covered by the struct, has been
opened.
Once again, all this is history, as I'm preparing a new patch, which is
going to add a separate mutex.
Regards,
Eli
Hello Alan,
On 26/10/2022 18:02, Alan Stern wrote:
> On Wed, Oct 26, 2022 at 11:52:40AM +0300, Eli Billauer wrote:
>> To fix this, xillybus_find_inode() supplies the pointer to the mutex that
>> it has locked (when returning success), so xillyusb_open() releases this
>> mutex only after obtaining the mutex that is specific to a device file.
>> This ensures that xillyusb_disconnect() won't release anything that is in
>> use.
>
> The standard way of handling this problem is different from this. The
> driver defines a private mutex, and it ensures that any routine calling
> *_find_inode() holds the mutex. It also ensures that the mutex is held
> while a new device is being registered and while a device is being
> removed.
Thanks, I'm going to follow that advice in my v2 patch.
>
> Even that won't fix all the synchronization problems. A process can
> open a device, and then after the device has been removed the process
> can still try to access the device. The driver needs to ensure that
> such accesses are not allowed.
Indeed. For that purpose, the relevant struct has a kref_counter, and an
error flag that indicates a removal among others, along with mutexes.
The problem is the time gap from the moment that the struct has been
found by xillybus_find_inode() until it has been secured with the kref.
A new mutex is going to solve that.
Regards,
Eli