2021-08-18 07:41:36

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 0/9] drm, kernel: update locking for DRM

Hi,

The patches in this series are largely fixes and prepwork leading up to
the final patch which plugs races with modesetting rights. Most of the
fixes don't have bug reports, so comments would be very appreciated.

The biggest change from the previous version is that we convert
drm_device.master_mutex into master_rwsem, instead of introducing
master_rwsem as a third lock.

Overall, this series makes the following changes:

- Patch 1: Move master_lookup_lock into struct drm_device (enables us to
use it to protect attributes accessed by different drm_files)

- Patch 2: Add a missing master_lookup_lock in drm_master_release

- Patch 3: Fix a potential race in drm_is_current_master_locked

- Patch 4: Fix potential null ptr dereferences in drm_{auth, ioctl}

- Patch 5: Move magic_map,unique{_len} out from master_mutex's
protection into master_lookup_lock's protection (allows us to avoid
read_lock -> write_lock deadlocks)

- Patch 6: Convert master_mutex into rwsem (avoids creating a new lock)

- Patch 7: Update global mutex locking in the ioctl handler (avoids
deadlock when grabbing read lock on master_rwsem in drm_ioctl_kernel)

- Patch 8: Export task_work_add (enables us to write drm_master_flush)

- Patch 9: Plug races with drm modesetting rights

v2 -> v3:
- Unexport drm_master_flush, as suggested by Daniel Vetter.
- Merge master_mutex and master_rwsem, as suggested by Daniel Vetter.
- Export task_work_add, reported by kernel test robot.
- Make master_flush static, reported by kernel test robot.
- Move master_lookup_lock into struct drm_device.
- Add a missing lock on master_lookup_lock in drm_master_release.
- Fix a potential race in drm_is_current_master_locked.
- Fix potential null ptr dereferences in drm_{auth, ioctl}.
- Protect magic_map,unique{_len} with master_lookup_lock.
- Convert master_mutex into a rwsem.
- Update global mutex locking in the ioctl handler.

v1 -> v2 (suggested by Daniel Vetter):
- Address an additional race when drm_open runs.
- Switch from SRCU to rwsem to synchronise readers and writers.
- Implement drm_master_flush with task_work so that flushes can be
queued to run before returning to userspace without creating a new
DRM_MASTER_FLUSH ioctl flag.

Best wishes,
Desmond

Desmond Cheong Zhi Xi (9):
drm: move master_lookup_lock into drm_device
drm: hold master_lookup_lock when releasing a drm_file's master
drm: check for null master in drm_is_current_master_locked
drm: fix potential null ptr dereferences in drm_{auth,ioctl}
drm: protect magic_map,unique{_len} with master_lookup_lock
drm: convert drm_device.master_mutex into a rwsem
drm: update global mutex lock in the ioctl handler
kernel: export task_work_add
drm: avoid races with modesetting rights

drivers/gpu/drm/drm_auth.c | 108 ++++++++++++++++++++++++---------
drivers/gpu/drm/drm_debugfs.c | 4 +-
drivers/gpu/drm/drm_drv.c | 4 +-
drivers/gpu/drm/drm_file.c | 1 -
drivers/gpu/drm/drm_internal.h | 1 +
drivers/gpu/drm/drm_ioctl.c | 39 +++++++-----
drivers/gpu/drm/drm_lease.c | 1 +
include/drm/drm_auth.h | 6 +-
include/drm/drm_device.h | 27 +++++++--
include/drm/drm_file.h | 20 +++---
kernel/task_work.c | 1 +
11 files changed, 145 insertions(+), 67 deletions(-)

--
2.25.1


2021-08-18 07:42:14

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 1/9] drm: move master_lookup_lock into drm_device

The master_lookup_lock spinlock can be useful as an inner lock for
other attributes that are currently protected by
drm_device.master_mutex.

However, since the spinlock belongs to struct drm_file, its use case
is limited to serializing accesses by a single drm_file. Moving this
lock into struct drm_device allows us to use it for structures that
are accessed by multiple drm_files, such as drm_master.magic_map.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 18 +++++++++---------
drivers/gpu/drm/drm_drv.c | 1 +
drivers/gpu/drm/drm_file.c | 1 -
include/drm/drm_device.h | 3 +++
include/drm/drm_file.h | 10 ++++------
5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 60a6b21474b1..8efb58aa7d95 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -63,7 +63,7 @@

static bool drm_is_current_master_locked(struct drm_file *fpriv)
{
- lockdep_assert_once(lockdep_is_held(&fpriv->master_lookup_lock) ||
+ lockdep_assert_once(lockdep_is_held(&fpriv->minor->dev->master_lookup_lock) ||
lockdep_is_held(&fpriv->minor->dev->master_mutex));

return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
@@ -83,9 +83,9 @@ bool drm_is_current_master(struct drm_file *fpriv)
{
bool ret;

- spin_lock(&fpriv->master_lookup_lock);
+ spin_lock(&fpriv->minor->dev->master_lookup_lock);
ret = drm_is_current_master_locked(fpriv);
- spin_unlock(&fpriv->master_lookup_lock);
+ spin_unlock(&fpriv->minor->dev->master_lookup_lock);

return ret;
}
@@ -174,9 +174,9 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv)
new_master = drm_master_create(dev);
if (!new_master)
return -ENOMEM;
- spin_lock(&fpriv->master_lookup_lock);
+ spin_lock(&dev->master_lookup_lock);
fpriv->master = new_master;
- spin_unlock(&fpriv->master_lookup_lock);
+ spin_unlock(&dev->master_lookup_lock);

fpriv->is_master = 1;
fpriv->authenticated = 1;
@@ -338,9 +338,9 @@ int drm_master_open(struct drm_file *file_priv)
if (!dev->master) {
ret = drm_new_set_master(dev, file_priv);
} else {
- spin_lock(&file_priv->master_lookup_lock);
+ spin_lock(&dev->master_lookup_lock);
file_priv->master = drm_master_get(dev->master);
- spin_unlock(&file_priv->master_lookup_lock);
+ spin_unlock(&dev->master_lookup_lock);
}
mutex_unlock(&dev->master_mutex);

@@ -405,13 +405,13 @@ struct drm_master *drm_file_get_master(struct drm_file *file_priv)
{
struct drm_master *master = NULL;

- spin_lock(&file_priv->master_lookup_lock);
+ spin_lock(&file_priv->minor->dev->master_lookup_lock);
if (!file_priv->master)
goto unlock;
master = drm_master_get(file_priv->master);

unlock:
- spin_unlock(&file_priv->master_lookup_lock);
+ spin_unlock(&file_priv->minor->dev->master_lookup_lock);
return master;
}
EXPORT_SYMBOL(drm_file_get_master);
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7a5097467ba5..218c16f11c80 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -612,6 +612,7 @@ static int drm_dev_init(struct drm_device *dev,
mutex_init(&dev->filelist_mutex);
mutex_init(&dev->clientlist_mutex);
mutex_init(&dev->master_mutex);
+ spin_lock_init(&dev->master_lookup_lock);

ret = drmm_add_action(dev, drm_dev_init_release, NULL);
if (ret)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ed25168619fc..b8679bbaea69 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -176,7 +176,6 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
init_waitqueue_head(&file->event_wait);
file->event_space = 4096; /* set aside 4k for event buffer */

- spin_lock_init(&file->master_lookup_lock);
mutex_init(&file->event_read_lock);

if (drm_core_check_feature(dev, DRIVER_GEM))
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 604b1d1b2d72..506eb2784819 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -152,6 +152,9 @@ struct drm_device {
*/
struct mutex master_mutex;

+ /** @master_lookup_lock: Serializes &drm_file.master. */
+ spinlock_t master_lookup_lock;
+
/**
* @open_count:
*
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index a3acb7ac3550..0536e9612a46 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -227,15 +227,16 @@ struct drm_file {
* @master:
*
* Master this node is currently associated with. Protected by struct
- * &drm_device.master_mutex, and serialized by @master_lookup_lock.
+ * &drm_device.master_mutex, and serialized by
+ * &drm_device.master_lookup_lock.
*
* Only relevant if drm_is_primary_client() returns true. Note that
* this only matches &drm_device.master if the master is the currently
* active one.
*
* To update @master, both &drm_device.master_mutex and
- * @master_lookup_lock need to be held, therefore holding either of
- * them is safe and enough for the read side.
+ * &drm_device.master_lookup_lock need to be held, therefore holding
+ * either of them is safe and enough for the read side.
*
* When dereferencing this pointer, either hold struct
* &drm_device.master_mutex for the duration of the pointer's use, or
@@ -248,9 +249,6 @@ struct drm_file {
*/
struct drm_master *master;

- /** @master_lock: Serializes @master. */
- spinlock_t master_lookup_lock;
-
/** @pid: Process that opened this file. */
struct pid *pid;

--
2.25.1

2021-08-18 07:42:32

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 2/9] drm: hold master_lookup_lock when releasing a drm_file's master

When drm_file.master changes value, the corresponding
drm_device.master_lookup_lock should be held.

In drm_master_release, a call to drm_master_put sets the
file_priv->master to NULL, so we protect this section with
drm_device.master_lookup_lock.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 8efb58aa7d95..8c0e0dba1611 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -373,8 +373,11 @@ void drm_master_release(struct drm_file *file_priv)
}

/* drop the master reference held by the file priv */
- if (file_priv->master)
+ if (file_priv->master) {
+ spin_lock(&dev->master_lookup_lock);
drm_master_put(&file_priv->master);
+ spin_unlock(&dev->master_lookup_lock);
+ }
mutex_unlock(&dev->master_mutex);
}

--
2.25.1

2021-08-18 07:42:43

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 3/9] drm: check for null master in drm_is_current_master_locked

There is a window after calling drm_master_release, and before a file
is freed, where drm_file can have is_master set to true, but both the
drm_file and drm_device have no master.

This could result in wrongly approving permissions in
drm_is_current_master_locked. Add a check that fpriv->master is
non-NULl to guard against this scenario.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 8c0e0dba1611..f9267b21556e 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -66,7 +66,8 @@ static bool drm_is_current_master_locked(struct drm_file *fpriv)
lockdep_assert_once(lockdep_is_held(&fpriv->minor->dev->master_lookup_lock) ||
lockdep_is_held(&fpriv->minor->dev->master_mutex));

- return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
+ return (fpriv->is_master && fpriv->master &&
+ drm_lease_owner(fpriv->master) == fpriv->minor->dev->master);
}

/**
--
2.25.1

2021-08-18 07:42:59

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 4/9] drm: fix potential null ptr dereferences in drm_{auth,ioctl}

There are three areas where we dereference struct drm_master without
checking if the pointer is non-NULL.

1. drm_getmagic is called from the ioctl_handler. Since
DRM_IOCTL_GET_MAGIC has no ioctl flags, drm_getmagic is run without
any check that drm_file.master has been set.

2. Similarly, drm_getunique is called from the ioctl_handler, but
DRM_IOCTL_GET_UNIQUE has no ioctl flags. So there is no guarantee that
drm_file.master has been set.

3. drm_master_release can also be called without having a
drm_file.master set. Here is one error path:
drm_open():
drm_open_helper():
drm_master_open():
drm_new_set_master(); <--- returns -ENOMEM,
drm_file.master not set
drm_file_free():
drm_master_release(); <--- NULL ptr dereference
(file_priv->master->magic_map)

Fix these by checking if the master pointers are NULL before use.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 16 ++++++++++++++--
drivers/gpu/drm/drm_ioctl.c | 5 +++++
2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index f9267b21556e..b7230604496b 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -95,11 +95,18 @@ EXPORT_SYMBOL(drm_is_current_master);
int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
{
struct drm_auth *auth = data;
+ struct drm_master *master;
int ret = 0;

mutex_lock(&dev->master_mutex);
+ master = file_priv->master;
+ if (!master) {
+ mutex_unlock(&dev->master_mutex);
+ return -EINVAL;
+ }
+
if (!file_priv->magic) {
- ret = idr_alloc(&file_priv->master->magic_map, file_priv,
+ ret = idr_alloc(&master->magic_map, file_priv,
1, 0, GFP_KERNEL);
if (ret >= 0)
file_priv->magic = ret;
@@ -355,8 +362,12 @@ void drm_master_release(struct drm_file *file_priv)

mutex_lock(&dev->master_mutex);
master = file_priv->master;
+
+ if (!master)
+ goto unlock;
+
if (file_priv->magic)
- idr_remove(&file_priv->master->magic_map, file_priv->magic);
+ idr_remove(&master->magic_map, file_priv->magic);

if (!drm_is_current_master_locked(file_priv))
goto out;
@@ -379,6 +390,7 @@ void drm_master_release(struct drm_file *file_priv)
drm_master_put(&file_priv->master);
spin_unlock(&dev->master_lookup_lock);
}
+unlock:
mutex_unlock(&dev->master_mutex);
}

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 26f3a9ede8fe..4d029d3061d9 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -121,6 +121,11 @@ int drm_getunique(struct drm_device *dev, void *data,

mutex_lock(&dev->master_mutex);
master = file_priv->master;
+ if (!master) {
+ mutex_unlock(&dev->master_mutex);
+ return -EINVAL;
+ }
+
if (u->unique_len >= master->unique_len) {
if (copy_to_user(u->unique, master->unique, master->unique_len)) {
mutex_unlock(&dev->master_mutex);
--
2.25.1

2021-08-18 07:43:14

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

In a future patch, a read lock on drm_device.master_rwsem is
held in the ioctl handler before the check for ioctl
permissions. However, this produces the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
------------------------------------------------------
kms_lease/1752 is trying to acquire lock:
ffffffff827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280

but task is already holding lock:
ffff88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
drm_ioctl_kernel+0xfb/0x1a0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->master_rwsem){++++}-{3:3}:
lock_acquire+0xd3/0x310
down_read+0x3b/0x140
drm_master_internal_acquire+0x1d/0x60
drm_client_modeset_commit+0x10/0x40
__drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
drm_fb_helper_set_par+0x34/0x40
intel_fbdev_set_par+0x11/0x40 [i915]
fbcon_init+0x270/0x4f0
visual_init+0xc6/0x130
do_bind_con_driver+0x1de/0x2c0
do_take_over_console+0x10e/0x180
do_fbcon_takeover+0x53/0xb0
register_framebuffer+0x22d/0x310
__drm_fb_helper_initial_config_and_unlock+0x36c/0x540
intel_fbdev_initial_config+0xf/0x20 [i915]
async_run_entry_fn+0x28/0x130
process_one_work+0x26d/0x5c0
worker_thread+0x37/0x390
kthread+0x13b/0x170
ret_from_fork+0x1f/0x30

-> #1 (&helper->lock){+.+.}-{3:3}:
lock_acquire+0xd3/0x310
__mutex_lock+0xa8/0x930
__drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
intel_fbdev_restore_mode+0x2b/0x50 [i915]
drm_lastclose+0x27/0x50
drm_release_noglobal+0x42/0x60
__fput+0x9e/0x250
task_work_run+0x6b/0xb0
exit_to_user_mode_prepare+0x1c5/0x1d0
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x46/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (drm_global_mutex){+.+.}-{3:3}:
validate_chain+0xb39/0x1e70
__lock_acquire+0x5a1/0xb70
lock_acquire+0xd3/0x310
__mutex_lock+0xa8/0x930
drm_open+0x64/0x280
drm_stub_open+0x9f/0x100
chrdev_open+0x9f/0x1d0
do_dentry_open+0x14a/0x3a0
dentry_open+0x53/0x70
drm_mode_create_lease_ioctl+0x3cb/0x970
drm_ioctl_kernel+0xc9/0x1a0
drm_ioctl+0x201/0x3d0
__x64_sys_ioctl+0x6a/0xa0
do_syscall_64+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:
Chain exists of:
drm_global_mutex --> &helper->lock --> &dev->master_rwsem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->master_rwsem);
lock(&helper->lock);
lock(&dev->master_rwsem);
lock(drm_global_mutex);

*** DEADLOCK ***

The lock hierarchy inversion happens because we grab the
drm_global_mutex while already holding on to master_rwsem. To avoid
this, we do some prep work to grab the drm_global_mutex before
checking for ioctl permissions.

At the same time, we update the check for the global mutex to use the
drm_dev_needs_global_mutex helper function.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 880fc565d599..2cb57378a787 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
if (drm_dev_is_unplugged(dev))
return -ENODEV;

+ /* Enforce sane locking for modern driver ioctls. */
+ if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
+ mutex_lock(&drm_global_mutex);
+
retcode = drm_ioctl_permit(flags, file_priv);
if (unlikely(retcode))
- return retcode;
+ goto out;

- /* Enforce sane locking for modern driver ioctls. */
- if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
- (flags & DRM_UNLOCKED))
- retcode = func(dev, kdata, file_priv);
- else {
- mutex_lock(&drm_global_mutex);
- retcode = func(dev, kdata, file_priv);
+ retcode = func(dev, kdata, file_priv);
+
+out:
+ if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
mutex_unlock(&drm_global_mutex);
- }
return retcode;
}
EXPORT_SYMBOL(drm_ioctl_kernel);
--
2.25.1

2021-08-18 07:43:14

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 5/9] drm: protect magic_map,unique{_len} with master_lookup_lock

Currently, drm_device.master_mutex is used to serialize writes to the
drm_master.magic_map idr and to protect drm_master.unique{_len}.

In preparation for converting drm_device.master_mutex into an outer
rwsem that might be read locked before entering some of these
functions, we can instead serialize access to drm_master.magic_map and
drm_master.unique{_len} using drm_device.master_lookup_lock which is
an inner lock.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 12 +++++++-----
drivers/gpu/drm/drm_ioctl.c | 10 ++++++----
include/drm/drm_auth.h | 6 +++---
include/drm/drm_device.h | 7 ++++++-
4 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index b7230604496b..0acb444fbbac 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -98,10 +98,10 @@ int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
struct drm_master *master;
int ret = 0;

- mutex_lock(&dev->master_mutex);
+ spin_lock(&dev->master_lookup_lock);
master = file_priv->master;
if (!master) {
- mutex_unlock(&dev->master_mutex);
+ spin_unlock(&dev->master_lookup_lock);
return -EINVAL;
}

@@ -112,7 +112,7 @@ int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
file_priv->magic = ret;
}
auth->magic = file_priv->magic;
- mutex_unlock(&dev->master_mutex);
+ spin_unlock(&dev->master_lookup_lock);

DRM_DEBUG("%u\n", auth->magic);

@@ -127,13 +127,13 @@ int drm_authmagic(struct drm_device *dev, void *data,

DRM_DEBUG("%u\n", auth->magic);

- mutex_lock(&dev->master_mutex);
+ spin_lock(&dev->master_lookup_lock);
file = idr_find(&file_priv->master->magic_map, auth->magic);
if (file) {
file->authenticated = 1;
idr_replace(&file_priv->master->magic_map, NULL, auth->magic);
}
- mutex_unlock(&dev->master_mutex);
+ spin_unlock(&dev->master_lookup_lock);

return file ? 0 : -EINVAL;
}
@@ -366,8 +366,10 @@ void drm_master_release(struct drm_file *file_priv)
if (!master)
goto unlock;

+ spin_lock(&dev->master_lookup_lock);
if (file_priv->magic)
idr_remove(&master->magic_map, file_priv->magic);
+ spin_unlock(&dev->master_lookup_lock);

if (!drm_is_current_master_locked(file_priv))
goto out;
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 4d029d3061d9..e5c3845b6e62 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -119,21 +119,21 @@ int drm_getunique(struct drm_device *dev, void *data,
struct drm_unique *u = data;
struct drm_master *master;

- mutex_lock(&dev->master_mutex);
+ spin_lock(&dev->master_lookup_lock);
master = file_priv->master;
if (!master) {
- mutex_unlock(&dev->master_mutex);
+ spin_unlock(&dev->master_lookup_lock);
return -EINVAL;
}

if (u->unique_len >= master->unique_len) {
if (copy_to_user(u->unique, master->unique, master->unique_len)) {
- mutex_unlock(&dev->master_mutex);
+ spin_unlock(&dev->master_lookup_lock);
return -EFAULT;
}
}
u->unique_len = master->unique_len;
- mutex_unlock(&dev->master_mutex);
+ spin_unlock(&dev->master_lookup_lock);

return 0;
}
@@ -405,7 +405,9 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f
* Version 1.1 includes tying of DRM to specific device
* Version 1.4 has proper PCI domain support
*/
+ spin_lock(&dev->master_lookup_lock);
retcode = drm_set_busid(dev, file_priv);
+ spin_unlock(&dev->master_lookup_lock);
if (retcode)
goto done;
}
diff --git a/include/drm/drm_auth.h b/include/drm/drm_auth.h
index ba248ca8866f..f5be73153798 100644
--- a/include/drm/drm_auth.h
+++ b/include/drm/drm_auth.h
@@ -67,17 +67,17 @@ struct drm_master {
struct drm_device *dev;
/**
* @unique: Unique identifier: e.g. busid. Protected by
- * &drm_device.master_mutex.
+ * &drm_device.master_lookup_lock.
*/
char *unique;
/**
* @unique_len: Length of unique field. Protected by
- * &drm_device.master_mutex.
+ * &drm_device.master_lookup_lock.
*/
int unique_len;
/**
* @magic_map: Map of used authentication tokens. Protected by
- * &drm_device.master_mutex.
+ * &drm_device.master_lookup_lock.
*/
struct idr magic_map;
void *driver_priv;
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 506eb2784819..cf5d15aeb25f 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -152,7 +152,12 @@ struct drm_device {
*/
struct mutex master_mutex;

- /** @master_lookup_lock: Serializes &drm_file.master. */
+ /**
+ * @master_lookup_lock:
+ *
+ * Serializes &drm_file.master, &drm_master.magic_map,
+ * &drm_master.unique, and &drm_master.unique_len.
+ */
spinlock_t master_lookup_lock;

/**
--
2.25.1

2021-08-18 07:43:41

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 6/9] drm: convert drm_device.master_mutex into a rwsem

drm_device.master_mutex currently protects the following:
- drm_device.master
- drm_file.master
- drm_file.was_master
- drm_file.is_master

These attributes determine modesetting permissions for drm devices,
and there is a clear separation between functions that read or change
the permissions. Hence, convert master_mutex into a rwsem to enable
concurrent readers.

Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 27 ++++++++++++++-------------
drivers/gpu/drm/drm_debugfs.c | 4 ++--
drivers/gpu/drm/drm_drv.c | 3 +--
drivers/gpu/drm/drm_ioctl.c | 4 ++--
include/drm/drm_device.h | 11 +++++++----
include/drm/drm_file.h | 12 ++++++------
6 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 0acb444fbbac..b65681ff42fc 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -64,7 +64,7 @@
static bool drm_is_current_master_locked(struct drm_file *fpriv)
{
lockdep_assert_once(lockdep_is_held(&fpriv->minor->dev->master_lookup_lock) ||
- lockdep_is_held(&fpriv->minor->dev->master_mutex));
+ lockdep_is_held(&fpriv->minor->dev->master_rwsem));

return (fpriv->is_master && fpriv->master &&
drm_lease_owner(fpriv->master) == fpriv->minor->dev->master);
@@ -175,7 +175,7 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv)
struct drm_master *old_master;
struct drm_master *new_master;

- lockdep_assert_held_once(&dev->master_mutex);
+ lockdep_assert_held_once(&dev->master_rwsem);

WARN_ON(fpriv->is_master);
old_master = fpriv->master;
@@ -257,7 +257,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data,
{
int ret;

- mutex_lock(&dev->master_mutex);
+ down_write(&dev->master_rwsem);

ret = drm_master_check_perm(dev, file_priv);
if (ret)
@@ -289,7 +289,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data,

drm_set_master(dev, file_priv, false);
out_unlock:
- mutex_unlock(&dev->master_mutex);
+ up_write(&dev->master_rwsem);
return ret;
}

@@ -306,7 +306,7 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
{
int ret;

- mutex_lock(&dev->master_mutex);
+ down_write(&dev->master_rwsem);

ret = drm_master_check_perm(dev, file_priv);
if (ret)
@@ -329,8 +329,9 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
}

drm_drop_master(dev, file_priv);
+
out_unlock:
- mutex_unlock(&dev->master_mutex);
+ up_write(&dev->master_rwsem);
return ret;
}

@@ -342,7 +343,7 @@ int drm_master_open(struct drm_file *file_priv)
/* if there is no current master make this fd it, but do not create
* any master object for render clients
*/
- mutex_lock(&dev->master_mutex);
+ down_write(&dev->master_rwsem);
if (!dev->master) {
ret = drm_new_set_master(dev, file_priv);
} else {
@@ -350,7 +351,7 @@ int drm_master_open(struct drm_file *file_priv)
file_priv->master = drm_master_get(dev->master);
spin_unlock(&dev->master_lookup_lock);
}
- mutex_unlock(&dev->master_mutex);
+ up_write(&dev->master_rwsem);

return ret;
}
@@ -360,7 +361,7 @@ void drm_master_release(struct drm_file *file_priv)
struct drm_device *dev = file_priv->minor->dev;
struct drm_master *master;

- mutex_lock(&dev->master_mutex);
+ down_write(&dev->master_rwsem);
master = file_priv->master;

if (!master)
@@ -393,7 +394,7 @@ void drm_master_release(struct drm_file *file_priv)
spin_unlock(&dev->master_lookup_lock);
}
unlock:
- mutex_unlock(&dev->master_mutex);
+ up_write(&dev->master_rwsem);
}

/**
@@ -468,9 +469,9 @@ EXPORT_SYMBOL(drm_master_put);
/* Used by drm_client and drm_fb_helper */
bool drm_master_internal_acquire(struct drm_device *dev)
{
- mutex_lock(&dev->master_mutex);
+ down_read(&dev->master_rwsem);
if (dev->master) {
- mutex_unlock(&dev->master_mutex);
+ up_read(&dev->master_rwsem);
return false;
}

@@ -481,6 +482,6 @@ EXPORT_SYMBOL(drm_master_internal_acquire);
/* Used by drm_client and drm_fb_helper */
void drm_master_internal_release(struct drm_device *dev)
{
- mutex_unlock(&dev->master_mutex);
+ up_read(&dev->master_rwsem);
}
EXPORT_SYMBOL(drm_master_internal_release);
diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index b0a826489488..b34c9c263188 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -55,7 +55,7 @@ static int drm_name_info(struct seq_file *m, void *data)
struct drm_device *dev = minor->dev;
struct drm_master *master;

- mutex_lock(&dev->master_mutex);
+ down_read(&dev->master_rwsem);
master = dev->master;
seq_printf(m, "%s", dev->driver->name);
if (dev->dev)
@@ -65,7 +65,7 @@ static int drm_name_info(struct seq_file *m, void *data)
if (dev->unique)
seq_printf(m, " unique=%s", dev->unique);
seq_printf(m, "\n");
- mutex_unlock(&dev->master_mutex);
+ up_read(&dev->master_rwsem);

return 0;
}
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 218c16f11c80..d61079ae6259 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -570,7 +570,6 @@ static void drm_dev_init_release(struct drm_device *dev, void *res)
/* Prevent use-after-free in drm_managed_release when debugging is
* enabled. Slightly awkward, but can't really be helped. */
dev->dev = NULL;
- mutex_destroy(&dev->master_mutex);
mutex_destroy(&dev->clientlist_mutex);
mutex_destroy(&dev->filelist_mutex);
mutex_destroy(&dev->struct_mutex);
@@ -611,7 +610,7 @@ static int drm_dev_init(struct drm_device *dev,
mutex_init(&dev->struct_mutex);
mutex_init(&dev->filelist_mutex);
mutex_init(&dev->clientlist_mutex);
- mutex_init(&dev->master_mutex);
+ init_rwsem(&dev->master_rwsem);
spin_lock_init(&dev->master_lookup_lock);

ret = drmm_add_action(dev, drm_dev_init_release, NULL);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index e5c3845b6e62..880fc565d599 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -390,7 +390,7 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f
struct drm_set_version *sv = data;
int if_version, retcode = 0;

- mutex_lock(&dev->master_mutex);
+ down_read(&dev->master_rwsem);
if (sv->drm_di_major != -1) {
if (sv->drm_di_major != DRM_IF_MAJOR ||
sv->drm_di_minor < 0 || sv->drm_di_minor > DRM_IF_MINOR) {
@@ -427,7 +427,7 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f
sv->drm_di_minor = DRM_IF_MINOR;
sv->drm_dd_major = dev->driver->major;
sv->drm_dd_minor = dev->driver->minor;
- mutex_unlock(&dev->master_mutex);
+ up_read(&dev->master_rwsem);

return retcode;
}
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index cf5d15aeb25f..f1ae4570a20a 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -107,7 +107,7 @@ struct drm_device {
* @master:
*
* Currently active master for this device.
- * Protected by &master_mutex
+ * Protected by &master_rwsem
*/
struct drm_master *master;

@@ -146,11 +146,14 @@ struct drm_device {
struct mutex struct_mutex;

/**
- * @master_mutex:
+ * @master_rwsem:
*
- * Lock for &drm_minor.master and &drm_file.is_master
+ * Synchronizes modesetting rights between multiple users. Users that
+ * can change the modeset or display state must hold a read lock on
+ * @master_rwsem, and users that change modesetting rights should hold
+ * a write lock.
*/
- struct mutex master_mutex;
+ struct rw_semaphore master_rwsem;

/**
* @master_lookup_lock:
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 0536e9612a46..ce213ac23c38 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -205,7 +205,7 @@ struct drm_file {
* @was_master:
*
* This client has or had, master capability. Protected by struct
- * &drm_device.master_mutex.
+ * &drm_device.master_rwsem.
*
* This is used to ensure that CAP_SYS_ADMIN is not enforced, if the
* client is or was master in the past.
@@ -216,7 +216,7 @@ struct drm_file {
* @is_master:
*
* This client is the creator of @master. Protected by struct
- * &drm_device.master_mutex.
+ * &drm_device.master_rwsem.
*
* See also the :ref:`section on primary nodes and authentication
* <drm_primary_node>`.
@@ -227,20 +227,20 @@ struct drm_file {
* @master:
*
* Master this node is currently associated with. Protected by struct
- * &drm_device.master_mutex, and serialized by
+ * &drm_device.master_rwsem, and serialized by
* &drm_device.master_lookup_lock.
*
* Only relevant if drm_is_primary_client() returns true. Note that
* this only matches &drm_device.master if the master is the currently
* active one.
*
- * To update @master, both &drm_device.master_mutex and
+ * To update @master, both &drm_device.master_rwsem and
* &drm_device.master_lookup_lock need to be held, therefore holding
* either of them is safe and enough for the read side.
*
* When dereferencing this pointer, either hold struct
- * &drm_device.master_mutex for the duration of the pointer's use, or
- * use drm_file_get_master() if struct &drm_device.master_mutex is not
+ * &drm_device.master_rwsem for the duration of the pointer's use, or
+ * use drm_file_get_master() if struct &drm_device.master_rwsem is not
* currently held and there is no other need to hold it. This prevents
* @master from being freed during use.
*
--
2.25.1

2021-08-18 07:44:23

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 8/9] kernel: export task_work_add

The task_work_add function is needed to prevent userspace races with
DRM modesetting rights.

Some DRM ioctls can change modesetting permissions while other
concurrent users are performing modesetting. To prevent races with
userspace, such functions should flush readers of old permissions
before returning to user mode. As the function that changes
permissions might itself be a reader of the old permissions, we intend
to schedule this flush using task_work_add.

However, when DRM is compiled as a loadable kernel module without
exporting task_work_add, we get the following compilation error:

ERROR: modpost: "task_work_add" [drivers/gpu/drm/drm.ko] undefined!

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
kernel/task_work.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 1698fbe6f0e1..90000404af2b 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -60,6 +60,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,

return 0;
}
+EXPORT_SYMBOL(task_work_add);

/**
* task_work_cancel_match - cancel a pending work added by task_work_add()
--
2.25.1

2021-08-18 07:44:42

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: [PATCH v3 9/9] drm: avoid races with modesetting rights

In drm_client_modeset.c and drm_fb_helper.c,
drm_master_internal_{acquire,release} are used to avoid races with DRM
userspace. These functions hold onto drm_device.master_rwsem while
committing, and bail if there's already a master.

However, there are other places where modesetting rights can race. A
time-of-check-to-time-of-use error can occur if an ioctl that changes
the modeset has its rights revoked after it validates its permissions,
but before it completes.

There are four places where modesetting permissions can change:

- DROP_MASTER ioctl removes rights for a master and its leases

- REVOKE_LEASE ioctl revokes rights for a specific lease

- SET_MASTER ioctl sets the device master if the master role hasn't
been acquired yet

- drm_open which can create a new master for a device if one does not
currently exist

These races can be avoided using drm_device.master_rwsem: users that
perform modesetting should hold a read lock on the new
drm_device.master_rwsem, and users that change these permissions
should either hold a write lock, or should flush readers before
returning to userspace.

Reported-by: Daniel Vetter <[email protected]>
Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 29 +++++++++++++++++++++++++++++
drivers/gpu/drm/drm_internal.h | 1 +
drivers/gpu/drm/drm_ioctl.c | 8 ++++++--
drivers/gpu/drm/drm_lease.c | 1 +
include/drm/drm_device.h | 10 +++++++++-
5 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index b65681ff42fc..84d00275ff8a 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -29,6 +29,7 @@
*/

#include <linux/slab.h>
+#include <linux/task_work.h>

#include <drm/drm_auth.h>
#include <drm/drm_drv.h>
@@ -127,6 +128,7 @@ int drm_authmagic(struct drm_device *dev, void *data,

DRM_DEBUG("%u\n", auth->magic);

+ lockdep_assert_held_once(&dev->master_rwsem);
spin_lock(&dev->master_lookup_lock);
file = idr_find(&file_priv->master->magic_map, auth->magic);
if (file) {
@@ -485,3 +487,30 @@ void drm_master_internal_release(struct drm_device *dev)
up_read(&dev->master_rwsem);
}
EXPORT_SYMBOL(drm_master_internal_release);
+
+/* After flushing, all readers that might have seen old master/lease
+ * permissions are guaranteed to have completed.
+ */
+static void master_flush(struct callback_head *cb)
+{
+ struct drm_device *dev = container_of(cb, struct drm_device,
+ master_flush_work);
+
+ down_write(&dev->master_rwsem);
+ up_write(&dev->master_rwsem);
+}
+
+/* Queues up work to flush all readers of master/lease permissions. This work
+ * is run before this task returns to user mode. Calling this function when a
+ * task changes modesetting rights ensures that other processes that perform
+ * modesetting do not race with userspace.
+ */
+void drm_master_flush(struct drm_device *dev)
+{
+ init_task_work(&dev->master_flush_work, master_flush);
+ task_work_add(current, &dev->master_flush_work, TWA_RESUME);
+ /* If task_work_add fails, then the task is exiting. In this case, it
+ * doesn't matter if master_flush is run, so we don't need an
+ * alternative mechanism for flushing.
+ */
+}
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 17f3548c8ed2..b1cd39338756 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -144,6 +144,7 @@ int drm_master_open(struct drm_file *file_priv);
void drm_master_release(struct drm_file *file_priv);
bool drm_master_internal_acquire(struct drm_device *dev);
void drm_master_internal_release(struct drm_device *dev);
+void drm_master_flush(struct drm_device *dev);

/* drm_sysfs.c */
extern struct class *drm_class;
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 2cb57378a787..7f523e1c5650 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -390,7 +390,7 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f
struct drm_set_version *sv = data;
int if_version, retcode = 0;

- down_read(&dev->master_rwsem);
+ lockdep_assert_held_once(&dev->master_rwsem);
if (sv->drm_di_major != -1) {
if (sv->drm_di_major != DRM_IF_MAJOR ||
sv->drm_di_minor < 0 || sv->drm_di_minor > DRM_IF_MINOR) {
@@ -427,7 +427,6 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f
sv->drm_di_minor = DRM_IF_MINOR;
sv->drm_dd_major = dev->driver->major;
sv->drm_dd_minor = dev->driver->minor;
- up_read(&dev->master_rwsem);

return retcode;
}
@@ -783,6 +782,9 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
mutex_lock(&drm_global_mutex);

+ if (unlikely(flags & DRM_MASTER))
+ down_read(&dev->master_rwsem);
+
retcode = drm_ioctl_permit(flags, file_priv);
if (unlikely(retcode))
goto out;
@@ -790,6 +792,8 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
retcode = func(dev, kdata, file_priv);

out:
+ if (unlikely(flags & DRM_MASTER))
+ up_read(&dev->master_rwsem);
if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
mutex_unlock(&drm_global_mutex);
return retcode;
diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c
index dee4f24a1808..983701198ffd 100644
--- a/drivers/gpu/drm/drm_lease.c
+++ b/drivers/gpu/drm/drm_lease.c
@@ -723,6 +723,7 @@ int drm_mode_revoke_lease_ioctl(struct drm_device *dev,
}

_drm_lease_revoke(lessee);
+ drm_master_flush(dev);

fail:
mutex_unlock(&dev->mode_config.idr_mutex);
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index f1ae4570a20a..617f7fe1d3bf 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -151,10 +151,18 @@ struct drm_device {
* Synchronizes modesetting rights between multiple users. Users that
* can change the modeset or display state must hold a read lock on
* @master_rwsem, and users that change modesetting rights should hold
- * a write lock.
+ * a write lock or flush readers before returning to userspace.
*/
struct rw_semaphore master_rwsem;

+ /**
+ * @master_flush_work:
+ *
+ * Callback structure used internally to queue work to flush readers of
+ * master/lease permissions.
+ */
+ struct callback_head master_flush_work;
+
/**
* @master_lookup_lock:
*
--
2.25.1

2021-08-18 10:06:30

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] drm: check for null master in drm_is_current_master_locked

On Wed, Aug 18, 2021 at 03:38:18PM +0800, Desmond Cheong Zhi Xi wrote:
> There is a window after calling drm_master_release, and before a file
> is freed, where drm_file can have is_master set to true, but both the
> drm_file and drm_device have no master.
>
> This could result in wrongly approving permissions in
> drm_is_current_master_locked. Add a check that fpriv->master is
> non-NULl to guard against this scenario.
>
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>

This should be impossible, drm_master_release is only called when the
struct file is released, which means all ioctls and anything else have
finished (they hold a temporary reference).

fpriv->master can change (if the drm_file becomes newly minted master and
wasnt one before through the setmaster ioctl), but it cannot become NULL
before it's completely gone from the system.
-Daniel


> ---
> drivers/gpu/drm/drm_auth.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index 8c0e0dba1611..f9267b21556e 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -66,7 +66,8 @@ static bool drm_is_current_master_locked(struct drm_file *fpriv)
> lockdep_assert_once(lockdep_is_held(&fpriv->minor->dev->master_lookup_lock) ||
> lockdep_is_held(&fpriv->minor->dev->master_mutex));
>
> - return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
> + return (fpriv->is_master && fpriv->master &&
> + drm_lease_owner(fpriv->master) == fpriv->minor->dev->master);
> }
>
> /**
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-18 10:10:12

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] drm: hold master_lookup_lock when releasing a drm_file's master

On Wed, Aug 18, 2021 at 03:38:17PM +0800, Desmond Cheong Zhi Xi wrote:
> When drm_file.master changes value, the corresponding
> drm_device.master_lookup_lock should be held.
>
> In drm_master_release, a call to drm_master_put sets the
> file_priv->master to NULL, so we protect this section with
> drm_device.master_lookup_lock.
>
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>

At this points all refcounts to drm_file have disappeared, so yeah this is
a lockless access, but also no one can observe it anymore. See also next
patch.

Hence I think the current code is fine.
-Daniel

> ---
> drivers/gpu/drm/drm_auth.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index 8efb58aa7d95..8c0e0dba1611 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -373,8 +373,11 @@ void drm_master_release(struct drm_file *file_priv)
> }
>
> /* drop the master reference held by the file priv */
> - if (file_priv->master)
> + if (file_priv->master) {
> + spin_lock(&dev->master_lookup_lock);
> drm_master_put(&file_priv->master);
> + spin_unlock(&dev->master_lookup_lock);
> + }
> mutex_unlock(&dev->master_mutex);
> }
>
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-18 10:13:13

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] drm: fix potential null ptr dereferences in drm_{auth,ioctl}

On Wed, Aug 18, 2021 at 03:38:19PM +0800, Desmond Cheong Zhi Xi wrote:
> There are three areas where we dereference struct drm_master without
> checking if the pointer is non-NULL.
>
> 1. drm_getmagic is called from the ioctl_handler. Since
> DRM_IOCTL_GET_MAGIC has no ioctl flags, drm_getmagic is run without
> any check that drm_file.master has been set.
>
> 2. Similarly, drm_getunique is called from the ioctl_handler, but
> DRM_IOCTL_GET_UNIQUE has no ioctl flags. So there is no guarantee that
> drm_file.master has been set.

I think the above two are impossible, due to the refcounting rules for
struct file.

> 3. drm_master_release can also be called without having a
> drm_file.master set. Here is one error path:
> drm_open():
> drm_open_helper():
> drm_master_open():
> drm_new_set_master(); <--- returns -ENOMEM,
> drm_file.master not set
> drm_file_free():
> drm_master_release(); <--- NULL ptr dereference
> (file_priv->master->magic_map)
>
> Fix these by checking if the master pointers are NULL before use.
>
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> ---
> drivers/gpu/drm/drm_auth.c | 16 ++++++++++++++--
> drivers/gpu/drm/drm_ioctl.c | 5 +++++
> 2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index f9267b21556e..b7230604496b 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -95,11 +95,18 @@ EXPORT_SYMBOL(drm_is_current_master);
> int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
> {
> struct drm_auth *auth = data;
> + struct drm_master *master;
> int ret = 0;
>
> mutex_lock(&dev->master_mutex);
> + master = file_priv->master;
> + if (!master) {
> + mutex_unlock(&dev->master_mutex);
> + return -EINVAL;
> + }
> +
> if (!file_priv->magic) {
> - ret = idr_alloc(&file_priv->master->magic_map, file_priv,
> + ret = idr_alloc(&master->magic_map, file_priv,
> 1, 0, GFP_KERNEL);
> if (ret >= 0)
> file_priv->magic = ret;
> @@ -355,8 +362,12 @@ void drm_master_release(struct drm_file *file_priv)
>
> mutex_lock(&dev->master_mutex);
> master = file_priv->master;
> +
> + if (!master)
> + goto unlock;

This is a bit convoluted, since we're in the single-threaded release path
we don't need any locking for file_priv related things. Therefore we can
pull the master check out and just directly return.

But since it's a bit surprising maybe a comment that this can happen when
drm_master_open in drm_open_helper fails?

Another option, and maybe cleaner, would be to move the drm_master_release
from drm_file_free into drm_close_helper. That would be fully symmetrical
and should also fix the bug here?
-Daniel


> +
> if (file_priv->magic)
> - idr_remove(&file_priv->master->magic_map, file_priv->magic);
> + idr_remove(&master->magic_map, file_priv->magic);
>
> if (!drm_is_current_master_locked(file_priv))
> goto out;
> @@ -379,6 +390,7 @@ void drm_master_release(struct drm_file *file_priv)
> drm_master_put(&file_priv->master);
> spin_unlock(&dev->master_lookup_lock);
> }
> +unlock:
> mutex_unlock(&dev->master_mutex);
> }
>
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 26f3a9ede8fe..4d029d3061d9 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -121,6 +121,11 @@ int drm_getunique(struct drm_device *dev, void *data,
>
> mutex_lock(&dev->master_mutex);
> master = file_priv->master;
> + if (!master) {
> + mutex_unlock(&dev->master_mutex);
> + return -EINVAL;
> + }
> +
> if (u->unique_len >= master->unique_len) {
> if (copy_to_user(u->unique, master->unique, master->unique_len)) {
> mutex_unlock(&dev->master_mutex);
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-18 10:45:27

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] drm: protect magic_map,unique{_len} with master_lookup_lock

On Wed, Aug 18, 2021 at 03:38:20PM +0800, Desmond Cheong Zhi Xi wrote:
> Currently, drm_device.master_mutex is used to serialize writes to the
> drm_master.magic_map idr and to protect drm_master.unique{_len}.
>
> In preparation for converting drm_device.master_mutex into an outer
> rwsem that might be read locked before entering some of these
> functions, we can instead serialize access to drm_master.magic_map and
> drm_master.unique{_len} using drm_device.master_lookup_lock which is
> an inner lock.
>
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> ---
> drivers/gpu/drm/drm_auth.c | 12 +++++++-----
> drivers/gpu/drm/drm_ioctl.c | 10 ++++++----
> include/drm/drm_auth.h | 6 +++---
> include/drm/drm_device.h | 7 ++++++-
> 4 files changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index b7230604496b..0acb444fbbac 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -98,10 +98,10 @@ int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
> struct drm_master *master;
> int ret = 0;
>
> - mutex_lock(&dev->master_mutex);
> + spin_lock(&dev->master_lookup_lock);
> master = file_priv->master;
> if (!master) {
> - mutex_unlock(&dev->master_mutex);
> + spin_unlock(&dev->master_lookup_lock);
> return -EINVAL;
> }
>
> @@ -112,7 +112,7 @@ int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
> file_priv->magic = ret;
> }
> auth->magic = file_priv->magic;
> - mutex_unlock(&dev->master_mutex);
> + spin_unlock(&dev->master_lookup_lock);
>
> DRM_DEBUG("%u\n", auth->magic);
>
> @@ -127,13 +127,13 @@ int drm_authmagic(struct drm_device *dev, void *data,
>
> DRM_DEBUG("%u\n", auth->magic);
>
> - mutex_lock(&dev->master_mutex);
> + spin_lock(&dev->master_lookup_lock);
> file = idr_find(&file_priv->master->magic_map, auth->magic);
> if (file) {
> file->authenticated = 1;
> idr_replace(&file_priv->master->magic_map, NULL, auth->magic);
> }
> - mutex_unlock(&dev->master_mutex);
> + spin_unlock(&dev->master_lookup_lock);
>
> return file ? 0 : -EINVAL;
> }
> @@ -366,8 +366,10 @@ void drm_master_release(struct drm_file *file_priv)
> if (!master)
> goto unlock;
>
> + spin_lock(&dev->master_lookup_lock);
> if (file_priv->magic)
> idr_remove(&master->magic_map, file_priv->magic);
> + spin_unlock(&dev->master_lookup_lock);
>
> if (!drm_is_current_master_locked(file_priv))
> goto out;
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 4d029d3061d9..e5c3845b6e62 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -119,21 +119,21 @@ int drm_getunique(struct drm_device *dev, void *data,
> struct drm_unique *u = data;
> struct drm_master *master;
>
> - mutex_lock(&dev->master_mutex);
> + spin_lock(&dev->master_lookup_lock);
> master = file_priv->master;
> if (!master) {
> - mutex_unlock(&dev->master_mutex);
> + spin_unlock(&dev->master_lookup_lock);
> return -EINVAL;
> }
>
> if (u->unique_len >= master->unique_len) {
> if (copy_to_user(u->unique, master->unique, master->unique_len)) {
> - mutex_unlock(&dev->master_mutex);
> + spin_unlock(&dev->master_lookup_lock);

copy_to_user while holding a spinlock isn't going to work well, at least
when we take a fault. The might_fault() annotations (if enabled) should
catch that.

Which is really annoying, because it kinda puts a wrench into this neat
plan here :-/

> return -EFAULT;
> }
> }
> u->unique_len = master->unique_len;
> - mutex_unlock(&dev->master_mutex);
> + spin_unlock(&dev->master_lookup_lock);
>
> return 0;
> }
> @@ -405,7 +405,9 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f
> * Version 1.1 includes tying of DRM to specific device
> * Version 1.4 has proper PCI domain support
> */
> + spin_lock(&dev->master_lookup_lock);
> retcode = drm_set_busid(dev, file_priv);
> + spin_unlock(&dev->master_lookup_lock);

Similar issue with drm_set_busid, calling kmalloc under a spinlock isn't a
good idea. This one here is at least much easier to fix by pushing the
locking down a lot.

I'm wondering a bit whether a better fix for these ioctls wouldn't be to
- drop the DRM_MASTER flag from the ioctl table
- take the rwsem in write mode (which would replace our current
dev->master_lock) and check for master status while holding that lock

I think that would result in simpler locking or am I missing something?
Maybe this could even work as a replacment for the lookup spinlock, since
we're untangling the nesting quite a bit?

Really just tossing ideas around since I feel like we don't have the best
one yet ...
-Daniel

> if (retcode)
> goto done;
> }
> diff --git a/include/drm/drm_auth.h b/include/drm/drm_auth.h
> index ba248ca8866f..f5be73153798 100644
> --- a/include/drm/drm_auth.h
> +++ b/include/drm/drm_auth.h
> @@ -67,17 +67,17 @@ struct drm_master {
> struct drm_device *dev;
> /**
> * @unique: Unique identifier: e.g. busid. Protected by
> - * &drm_device.master_mutex.
> + * &drm_device.master_lookup_lock.
> */
> char *unique;
> /**
> * @unique_len: Length of unique field. Protected by
> - * &drm_device.master_mutex.
> + * &drm_device.master_lookup_lock.
> */
> int unique_len;
> /**
> * @magic_map: Map of used authentication tokens. Protected by
> - * &drm_device.master_mutex.
> + * &drm_device.master_lookup_lock.
> */
> struct idr magic_map;
> void *driver_priv;
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index 506eb2784819..cf5d15aeb25f 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -152,7 +152,12 @@ struct drm_device {
> */
> struct mutex master_mutex;
>
> - /** @master_lookup_lock: Serializes &drm_file.master. */
> + /**
> + * @master_lookup_lock:
> + *
> + * Serializes &drm_file.master, &drm_master.magic_map,
> + * &drm_master.unique, and &drm_master.unique_len.
> + */
> spinlock_t master_lookup_lock;
>
> /**
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-18 11:07:25

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> In a future patch, a read lock on drm_device.master_rwsem is
> held in the ioctl handler before the check for ioctl
> permissions. However, this produces the following lockdep splat:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> ------------------------------------------------------
> kms_lease/1752 is trying to acquire lock:
> ffffffff827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
>
> but task is already holding lock:
> ffff88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
> drm_ioctl_kernel+0xfb/0x1a0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&dev->master_rwsem){++++}-{3:3}:
> lock_acquire+0xd3/0x310
> down_read+0x3b/0x140
> drm_master_internal_acquire+0x1d/0x60
> drm_client_modeset_commit+0x10/0x40
> __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
> drm_fb_helper_set_par+0x34/0x40
> intel_fbdev_set_par+0x11/0x40 [i915]
> fbcon_init+0x270/0x4f0
> visual_init+0xc6/0x130
> do_bind_con_driver+0x1de/0x2c0
> do_take_over_console+0x10e/0x180
> do_fbcon_takeover+0x53/0xb0
> register_framebuffer+0x22d/0x310
> __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
> intel_fbdev_initial_config+0xf/0x20 [i915]
> async_run_entry_fn+0x28/0x130
> process_one_work+0x26d/0x5c0
> worker_thread+0x37/0x390
> kthread+0x13b/0x170
> ret_from_fork+0x1f/0x30
>
> -> #1 (&helper->lock){+.+.}-{3:3}:
> lock_acquire+0xd3/0x310
> __mutex_lock+0xa8/0x930
> __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
> intel_fbdev_restore_mode+0x2b/0x50 [i915]
> drm_lastclose+0x27/0x50
> drm_release_noglobal+0x42/0x60
> __fput+0x9e/0x250
> task_work_run+0x6b/0xb0
> exit_to_user_mode_prepare+0x1c5/0x1d0
> syscall_exit_to_user_mode+0x19/0x50
> do_syscall_64+0x46/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #0 (drm_global_mutex){+.+.}-{3:3}:
> validate_chain+0xb39/0x1e70
> __lock_acquire+0x5a1/0xb70
> lock_acquire+0xd3/0x310
> __mutex_lock+0xa8/0x930
> drm_open+0x64/0x280
> drm_stub_open+0x9f/0x100
> chrdev_open+0x9f/0x1d0
> do_dentry_open+0x14a/0x3a0
> dentry_open+0x53/0x70
> drm_mode_create_lease_ioctl+0x3cb/0x970
> drm_ioctl_kernel+0xc9/0x1a0
> drm_ioctl+0x201/0x3d0
> __x64_sys_ioctl+0x6a/0xa0
> do_syscall_64+0x37/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
> Chain exists of:
> drm_global_mutex --> &helper->lock --> &dev->master_rwsem
> Possible unsafe locking scenario:
> CPU0 CPU1
> ---- ----
> lock(&dev->master_rwsem);
> lock(&helper->lock);
> lock(&dev->master_rwsem);
> lock(drm_global_mutex);
>
> *** DEADLOCK ***
>
> The lock hierarchy inversion happens because we grab the
> drm_global_mutex while already holding on to master_rwsem. To avoid
> this, we do some prep work to grab the drm_global_mutex before
> checking for ioctl permissions.
>
> At the same time, we update the check for the global mutex to use the
> drm_dev_needs_global_mutex helper function.

This is intentional, essentially we force all non-legacy drivers to have
unlocked ioctl (otherwise everyone forgets to set that flag).

For non-legacy drivers the global lock only ensures ordering between
drm_open and lastclose (I think at least), and between
drm_dev_register/unregister and the backwards ->load/unload callbacks
(which are called in the wrong place, but we cannot fix that for legacy
drivers).

->load/unload should be completely unused (maybe radeon still uses it),
and ->lastclose is also on the decline.

Maybe we should update the comment of drm_global_mutex to explain what it
protects and why.

I'm also confused how this patch connects to the splat, since for i915 we
shouldn't be taking the drm_global_lock here at all. The problem seems to
be the drm_open_helper when we create a new lease, which is an entirely
different can of worms.

I'm honestly not sure how to best do that, but we should be able to create
a file and then call drm_open_helper directly, or well a version of that
which never takes the drm_global_mutex. Because that is not needed for
nested drm_file opening:
- legacy drivers never go down this path because leases are only supported
with modesetting, and modesetting is only supported for non-legacy
drivers
- the races against dev->open_count due to last_close or ->load callbacks
don't matter, because for the entire ioctl we already have an open
drm_file and that wont disappear.

So this should work, but I'm not entirely sure how to make it work.
-Daniel

> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> ---
> drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 880fc565d599..2cb57378a787 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
> if (drm_dev_is_unplugged(dev))
> return -ENODEV;
>
> + /* Enforce sane locking for modern driver ioctls. */
> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> + mutex_lock(&drm_global_mutex);
> +
> retcode = drm_ioctl_permit(flags, file_priv);
> if (unlikely(retcode))
> - return retcode;
> + goto out;
>
> - /* Enforce sane locking for modern driver ioctls. */
> - if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
> - (flags & DRM_UNLOCKED))
> - retcode = func(dev, kdata, file_priv);
> - else {
> - mutex_lock(&drm_global_mutex);
> - retcode = func(dev, kdata, file_priv);
> + retcode = func(dev, kdata, file_priv);
> +
> +out:
> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> mutex_unlock(&drm_global_mutex);
> - }
> return retcode;
> }
> EXPORT_SYMBOL(drm_ioctl_kernel);
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-18 11:10:25

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 8/9] kernel: export task_work_add

On Wed, Aug 18, 2021 at 03:38:23PM +0800, Desmond Cheong Zhi Xi wrote:
> The task_work_add function is needed to prevent userspace races with
> DRM modesetting rights.
>
> Some DRM ioctls can change modesetting permissions while other
> concurrent users are performing modesetting. To prevent races with
> userspace, such functions should flush readers of old permissions
> before returning to user mode. As the function that changes
> permissions might itself be a reader of the old permissions, we intend
> to schedule this flush using task_work_add.
>
> However, when DRM is compiled as a loadable kernel module without
> exporting task_work_add, we get the following compilation error:
>
> ERROR: modpost: "task_work_add" [drivers/gpu/drm/drm.ko] undefined!
>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>

Just realized another benefit of pushing the dev->master_rwsem write
locks down into ioctls that need them: We wouldn't need this function here
exported for use in drm. But also I'm not sure that works any better than
the design in your current patch set ...
-Daniel

> ---
> kernel/task_work.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 1698fbe6f0e1..90000404af2b 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -60,6 +60,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>
> return 0;
> }
> +EXPORT_SYMBOL(task_work_add);
>
> /**
> * task_work_cancel_match - cancel a pending work added by task_work_add()
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-18 14:51:06

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] drm: hold master_lookup_lock when releasing a drm_file's master

On 18/8/21 6:05 pm, Daniel Vetter wrote:
> On Wed, Aug 18, 2021 at 03:38:17PM +0800, Desmond Cheong Zhi Xi wrote:
>> When drm_file.master changes value, the corresponding
>> drm_device.master_lookup_lock should be held.
>>
>> In drm_master_release, a call to drm_master_put sets the
>> file_priv->master to NULL, so we protect this section with
>> drm_device.master_lookup_lock.
>>
>> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
>
> At this points all refcounts to drm_file have disappeared, so yeah this is
> a lockless access, but also no one can observe it anymore. See also next
> patch.
>
> Hence I think the current code is fine.
> -Daniel
>

Ah ok got it, I didn't realize that. I'll drop patch 2 and 3 from the
series then.

>> ---
>> drivers/gpu/drm/drm_auth.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
>> index 8efb58aa7d95..8c0e0dba1611 100644
>> --- a/drivers/gpu/drm/drm_auth.c
>> +++ b/drivers/gpu/drm/drm_auth.c
>> @@ -373,8 +373,11 @@ void drm_master_release(struct drm_file *file_priv)
>> }
>>
>> /* drop the master reference held by the file priv */
>> - if (file_priv->master)
>> + if (file_priv->master) {
>> + spin_lock(&dev->master_lookup_lock);
>> drm_master_put(&file_priv->master);
>> + spin_unlock(&dev->master_lookup_lock);
>> + }
>> mutex_unlock(&dev->master_mutex);
>> }
>>
>> --
>> 2.25.1
>>
>

2021-08-18 15:38:29

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] drm: fix potential null ptr dereferences in drm_{auth,ioctl}

On 18/8/21 6:11 pm, Daniel Vetter wrote:
> On Wed, Aug 18, 2021 at 03:38:19PM +0800, Desmond Cheong Zhi Xi wrote:
>> There are three areas where we dereference struct drm_master without
>> checking if the pointer is non-NULL.
>>
>> 1. drm_getmagic is called from the ioctl_handler. Since
>> DRM_IOCTL_GET_MAGIC has no ioctl flags, drm_getmagic is run without
>> any check that drm_file.master has been set.
>>
>> 2. Similarly, drm_getunique is called from the ioctl_handler, but
>> DRM_IOCTL_GET_UNIQUE has no ioctl flags. So there is no guarantee that
>> drm_file.master has been set.
>
> I think the above two are impossible, due to the refcounting rules for
> struct file.
>

Right, will drop those two parts from the patch.

>> 3. drm_master_release can also be called without having a
>> drm_file.master set. Here is one error path:
>> drm_open():
>> drm_open_helper():
>> drm_master_open():
>> drm_new_set_master(); <--- returns -ENOMEM,
>> drm_file.master not set
>> drm_file_free():
>> drm_master_release(); <--- NULL ptr dereference
>> (file_priv->master->magic_map)
>>
>> Fix these by checking if the master pointers are NULL before use.
>>
>> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
>> ---
>> drivers/gpu/drm/drm_auth.c | 16 ++++++++++++++--
>> drivers/gpu/drm/drm_ioctl.c | 5 +++++
>> 2 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
>> index f9267b21556e..b7230604496b 100644
>> --- a/drivers/gpu/drm/drm_auth.c
>> +++ b/drivers/gpu/drm/drm_auth.c
>> @@ -95,11 +95,18 @@ EXPORT_SYMBOL(drm_is_current_master);
>> int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
>> {
>> struct drm_auth *auth = data;
>> + struct drm_master *master;
>> int ret = 0;
>>
>> mutex_lock(&dev->master_mutex);
>> + master = file_priv->master;
>> + if (!master) {
>> + mutex_unlock(&dev->master_mutex);
>> + return -EINVAL;
>> + }
>> +
>> if (!file_priv->magic) {
>> - ret = idr_alloc(&file_priv->master->magic_map, file_priv,
>> + ret = idr_alloc(&master->magic_map, file_priv,
>> 1, 0, GFP_KERNEL);
>> if (ret >= 0)
>> file_priv->magic = ret;
>> @@ -355,8 +362,12 @@ void drm_master_release(struct drm_file *file_priv)
>>
>> mutex_lock(&dev->master_mutex);
>> master = file_priv->master;
>> +
>> + if (!master)
>> + goto unlock;
>
> This is a bit convoluted, since we're in the single-threaded release path
> we don't need any locking for file_priv related things. Therefore we can
> pull the master check out and just directly return.
>
> But since it's a bit surprising maybe a comment that this can happen when
> drm_master_open in drm_open_helper fails?
>

Sounds good. This can actually also happen in the failure path of
mock_drm_getfile if anon_inode_getfile fails. I'll leave a short note
about both of them.

> Another option, and maybe cleaner, would be to move the drm_master_release
> from drm_file_free into drm_close_helper. That would be fully symmetrical
> and should also fix the bug here?
> -Daniel
>
Hmmm maybe the first option to move the check out of the lock might be
better. If I'm not wrong, we would otherwise also need to move
drm_master_release into drm_client_close.

>
>> +
>> if (file_priv->magic)
>> - idr_remove(&file_priv->master->magic_map, file_priv->magic);
>> + idr_remove(&master->magic_map, file_priv->magic);
>>
>> if (!drm_is_current_master_locked(file_priv))
>> goto out;
>> @@ -379,6 +390,7 @@ void drm_master_release(struct drm_file *file_priv)
>> drm_master_put(&file_priv->master);
>> spin_unlock(&dev->master_lookup_lock);
>> }
>> +unlock:
>> mutex_unlock(&dev->master_mutex);
>> }
>>
>> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
>> index 26f3a9ede8fe..4d029d3061d9 100644
>> --- a/drivers/gpu/drm/drm_ioctl.c
>> +++ b/drivers/gpu/drm/drm_ioctl.c
>> @@ -121,6 +121,11 @@ int drm_getunique(struct drm_device *dev, void *data,
>>
>> mutex_lock(&dev->master_mutex);
>> master = file_priv->master;
>> + if (!master) {
>> + mutex_unlock(&dev->master_mutex);
>> + return -EINVAL;
>> + }
>> +
>> if (u->unique_len >= master->unique_len) {
>> if (copy_to_user(u->unique, master->unique, master->unique_len)) {
>> mutex_unlock(&dev->master_mutex);
>> --
>> 2.25.1
>>
>

2021-08-18 16:36:01

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH v3 4/9] drm: fix potential null ptr dereferences in drm_{auth, ioctl}

On Wed, Aug 18, 2021 at 5:37 PM Desmond Cheong Zhi Xi
<[email protected]> wrote:
>
> On 18/8/21 6:11 pm, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 03:38:19PM +0800, Desmond Cheong Zhi Xi wrote:
> >> There are three areas where we dereference struct drm_master without
> >> checking if the pointer is non-NULL.
> >>
> >> 1. drm_getmagic is called from the ioctl_handler. Since
> >> DRM_IOCTL_GET_MAGIC has no ioctl flags, drm_getmagic is run without
> >> any check that drm_file.master has been set.
> >>
> >> 2. Similarly, drm_getunique is called from the ioctl_handler, but
> >> DRM_IOCTL_GET_UNIQUE has no ioctl flags. So there is no guarantee that
> >> drm_file.master has been set.
> >
> > I think the above two are impossible, due to the refcounting rules for
> > struct file.
> >
>
> Right, will drop those two parts from the patch.
>
> >> 3. drm_master_release can also be called without having a
> >> drm_file.master set. Here is one error path:
> >> drm_open():
> >> drm_open_helper():
> >> drm_master_open():
> >> drm_new_set_master(); <--- returns -ENOMEM,
> >> drm_file.master not set
> >> drm_file_free():
> >> drm_master_release(); <--- NULL ptr dereference
> >> (file_priv->master->magic_map)
> >>
> >> Fix these by checking if the master pointers are NULL before use.
> >>
> >> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> >> ---
> >> drivers/gpu/drm/drm_auth.c | 16 ++++++++++++++--
> >> drivers/gpu/drm/drm_ioctl.c | 5 +++++
> >> 2 files changed, 19 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> >> index f9267b21556e..b7230604496b 100644
> >> --- a/drivers/gpu/drm/drm_auth.c
> >> +++ b/drivers/gpu/drm/drm_auth.c
> >> @@ -95,11 +95,18 @@ EXPORT_SYMBOL(drm_is_current_master);
> >> int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
> >> {
> >> struct drm_auth *auth = data;
> >> + struct drm_master *master;
> >> int ret = 0;
> >>
> >> mutex_lock(&dev->master_mutex);
> >> + master = file_priv->master;
> >> + if (!master) {
> >> + mutex_unlock(&dev->master_mutex);
> >> + return -EINVAL;
> >> + }
> >> +
> >> if (!file_priv->magic) {
> >> - ret = idr_alloc(&file_priv->master->magic_map, file_priv,
> >> + ret = idr_alloc(&master->magic_map, file_priv,
> >> 1, 0, GFP_KERNEL);
> >> if (ret >= 0)
> >> file_priv->magic = ret;
> >> @@ -355,8 +362,12 @@ void drm_master_release(struct drm_file *file_priv)
> >>
> >> mutex_lock(&dev->master_mutex);
> >> master = file_priv->master;
> >> +
> >> + if (!master)
> >> + goto unlock;
> >
> > This is a bit convoluted, since we're in the single-threaded release path
> > we don't need any locking for file_priv related things. Therefore we can
> > pull the master check out and just directly return.
> >
> > But since it's a bit surprising maybe a comment that this can happen when
> > drm_master_open in drm_open_helper fails?
> >
>
> Sounds good. This can actually also happen in the failure path of
> mock_drm_getfile if anon_inode_getfile fails. I'll leave a short note
> about both of them.
>
> > Another option, and maybe cleaner, would be to move the drm_master_release
> > from drm_file_free into drm_close_helper. That would be fully symmetrical
> > and should also fix the bug here?
> > -Daniel
> >
> Hmmm maybe the first option to move the check out of the lock might be
> better. If I'm not wrong, we would otherwise also need to move
> drm_master_release into drm_client_close.

Do we have to?

If I haven't missed anything, the drm_client stuff only calls
drm_file_alloc and doesn't set up a master. So this should work?
-Daniel

>
> >
> >> +
> >> if (file_priv->magic)
> >> - idr_remove(&file_priv->master->magic_map, file_priv->magic);
> >> + idr_remove(&master->magic_map, file_priv->magic);
> >>
> >> if (!drm_is_current_master_locked(file_priv))
> >> goto out;
> >> @@ -379,6 +390,7 @@ void drm_master_release(struct drm_file *file_priv)
> >> drm_master_put(&file_priv->master);
> >> spin_unlock(&dev->master_lookup_lock);
> >> }
> >> +unlock:
> >> mutex_unlock(&dev->master_mutex);
> >> }
> >>
> >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> >> index 26f3a9ede8fe..4d029d3061d9 100644
> >> --- a/drivers/gpu/drm/drm_ioctl.c
> >> +++ b/drivers/gpu/drm/drm_ioctl.c
> >> @@ -121,6 +121,11 @@ int drm_getunique(struct drm_device *dev, void *data,
> >>
> >> mutex_lock(&dev->master_mutex);
> >> master = file_priv->master;
> >> + if (!master) {
> >> + mutex_unlock(&dev->master_mutex);
> >> + return -EINVAL;
> >> + }
> >> +
> >> if (u->unique_len >= master->unique_len) {
> >> if (copy_to_user(u->unique, master->unique, master->unique_len)) {
> >> mutex_unlock(&dev->master_mutex);
> >> --
> >> 2.25.1
> >>
> >
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-08-19 05:35:39

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH v3 4/9] drm: fix potential null ptr dereferences in drm_{auth, ioctl}

On 19/8/21 12:33 am, Daniel Vetter wrote:
> On Wed, Aug 18, 2021 at 5:37 PM Desmond Cheong Zhi Xi
> <[email protected]> wrote:
>>
>> On 18/8/21 6:11 pm, Daniel Vetter wrote:
>>> On Wed, Aug 18, 2021 at 03:38:19PM +0800, Desmond Cheong Zhi Xi wrote:
>>>> There are three areas where we dereference struct drm_master without
>>>> checking if the pointer is non-NULL.
>>>>
>>>> 1. drm_getmagic is called from the ioctl_handler. Since
>>>> DRM_IOCTL_GET_MAGIC has no ioctl flags, drm_getmagic is run without
>>>> any check that drm_file.master has been set.
>>>>
>>>> 2. Similarly, drm_getunique is called from the ioctl_handler, but
>>>> DRM_IOCTL_GET_UNIQUE has no ioctl flags. So there is no guarantee that
>>>> drm_file.master has been set.
>>>
>>> I think the above two are impossible, due to the refcounting rules for
>>> struct file.
>>>
>>
>> Right, will drop those two parts from the patch.
>>
>>>> 3. drm_master_release can also be called without having a
>>>> drm_file.master set. Here is one error path:
>>>> drm_open():
>>>> drm_open_helper():
>>>> drm_master_open():
>>>> drm_new_set_master(); <--- returns -ENOMEM,
>>>> drm_file.master not set
>>>> drm_file_free():
>>>> drm_master_release(); <--- NULL ptr dereference
>>>> (file_priv->master->magic_map)
>>>>
>>>> Fix these by checking if the master pointers are NULL before use.
>>>>
>>>> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
>>>> ---
>>>> drivers/gpu/drm/drm_auth.c | 16 ++++++++++++++--
>>>> drivers/gpu/drm/drm_ioctl.c | 5 +++++
>>>> 2 files changed, 19 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
>>>> index f9267b21556e..b7230604496b 100644
>>>> --- a/drivers/gpu/drm/drm_auth.c
>>>> +++ b/drivers/gpu/drm/drm_auth.c
>>>> @@ -95,11 +95,18 @@ EXPORT_SYMBOL(drm_is_current_master);
>>>> int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
>>>> {
>>>> struct drm_auth *auth = data;
>>>> + struct drm_master *master;
>>>> int ret = 0;
>>>>
>>>> mutex_lock(&dev->master_mutex);
>>>> + master = file_priv->master;
>>>> + if (!master) {
>>>> + mutex_unlock(&dev->master_mutex);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> if (!file_priv->magic) {
>>>> - ret = idr_alloc(&file_priv->master->magic_map, file_priv,
>>>> + ret = idr_alloc(&master->magic_map, file_priv,
>>>> 1, 0, GFP_KERNEL);
>>>> if (ret >= 0)
>>>> file_priv->magic = ret;
>>>> @@ -355,8 +362,12 @@ void drm_master_release(struct drm_file *file_priv)
>>>>
>>>> mutex_lock(&dev->master_mutex);
>>>> master = file_priv->master;
>>>> +
>>>> + if (!master)
>>>> + goto unlock;
>>>
>>> This is a bit convoluted, since we're in the single-threaded release path
>>> we don't need any locking for file_priv related things. Therefore we can
>>> pull the master check out and just directly return.
>>>
>>> But since it's a bit surprising maybe a comment that this can happen when
>>> drm_master_open in drm_open_helper fails?
>>>
>>
>> Sounds good. This can actually also happen in the failure path of
>> mock_drm_getfile if anon_inode_getfile fails. I'll leave a short note
>> about both of them.
>>
>>> Another option, and maybe cleaner, would be to move the drm_master_release
>>> from drm_file_free into drm_close_helper. That would be fully symmetrical
>>> and should also fix the bug here?
>>> -Daniel
>>>
>> Hmmm maybe the first option to move the check out of the lock might be
>> better. If I'm not wrong, we would otherwise also need to move
>> drm_master_release into drm_client_close.
>
> Do we have to?
>
> If I haven't missed anything, the drm_client stuff only calls
> drm_file_alloc and doesn't set up a master. So this should work?
> -Daniel
>

Ahhhh ok yes, I understand what's going on better now. I think you're
right, moving drm_master_release into drm_close_helper should fix all
the places it can go wrong.

>>
>>>
>>>> +
>>>> if (file_priv->magic)
>>>> - idr_remove(&file_priv->master->magic_map, file_priv->magic);
>>>> + idr_remove(&master->magic_map, file_priv->magic);
>>>>
>>>> if (!drm_is_current_master_locked(file_priv))
>>>> goto out;
>>>> @@ -379,6 +390,7 @@ void drm_master_release(struct drm_file *file_priv)
>>>> drm_master_put(&file_priv->master);
>>>> spin_unlock(&dev->master_lookup_lock);
>>>> }
>>>> +unlock:
>>>> mutex_unlock(&dev->master_mutex);
>>>> }
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
>>>> index 26f3a9ede8fe..4d029d3061d9 100644
>>>> --- a/drivers/gpu/drm/drm_ioctl.c
>>>> +++ b/drivers/gpu/drm/drm_ioctl.c
>>>> @@ -121,6 +121,11 @@ int drm_getunique(struct drm_device *dev, void *data,
>>>>
>>>> mutex_lock(&dev->master_mutex);
>>>> master = file_priv->master;
>>>> + if (!master) {
>>>> + mutex_unlock(&dev->master_mutex);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> if (u->unique_len >= master->unique_len) {
>>>> if (copy_to_user(u->unique, master->unique, master->unique_len)) {
>>>> mutex_unlock(&dev->master_mutex);
>>>> --
>>>> 2.25.1
>>>>
>>>
>>
>
>

2021-08-19 09:29:50

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 8/9] kernel: export task_work_add

On Wed, Aug 18, 2021 at 03:38:23PM +0800, Desmond Cheong Zhi Xi wrote:
> +EXPORT_SYMBOL(task_work_add);

EXPORT_SYMBOL_GPL for this kinds of functionality, please.

2021-08-19 09:43:13

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [PATCH v3 8/9] kernel: export task_work_add

On 19/8/21 5:26 pm, Christoph Hellwig wrote:
> On Wed, Aug 18, 2021 at 03:38:23PM +0800, Desmond Cheong Zhi Xi wrote:
>> +EXPORT_SYMBOL(task_work_add);
>
> EXPORT_SYMBOL_GPL for this kinds of functionality, please.
>

Thanks, I wasn't aware of the GPL-only export. I'll update this in a
future series if we still need the export.

2021-08-19 10:54:57

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

On 18/8/21 7:02 pm, Daniel Vetter wrote:
> On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
>> In a future patch, a read lock on drm_device.master_rwsem is
>> held in the ioctl handler before the check for ioctl
>> permissions. However, this produces the following lockdep splat:
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
>> ------------------------------------------------------
>> kms_lease/1752 is trying to acquire lock:
>> ffffffff827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
>>
>> but task is already holding lock:
>> ffff88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
>> drm_ioctl_kernel+0xfb/0x1a0
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&dev->master_rwsem){++++}-{3:3}:
>> lock_acquire+0xd3/0x310
>> down_read+0x3b/0x140
>> drm_master_internal_acquire+0x1d/0x60
>> drm_client_modeset_commit+0x10/0x40
>> __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
>> drm_fb_helper_set_par+0x34/0x40
>> intel_fbdev_set_par+0x11/0x40 [i915]
>> fbcon_init+0x270/0x4f0
>> visual_init+0xc6/0x130
>> do_bind_con_driver+0x1de/0x2c0
>> do_take_over_console+0x10e/0x180
>> do_fbcon_takeover+0x53/0xb0
>> register_framebuffer+0x22d/0x310
>> __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
>> intel_fbdev_initial_config+0xf/0x20 [i915]
>> async_run_entry_fn+0x28/0x130
>> process_one_work+0x26d/0x5c0
>> worker_thread+0x37/0x390
>> kthread+0x13b/0x170
>> ret_from_fork+0x1f/0x30
>>
>> -> #1 (&helper->lock){+.+.}-{3:3}:
>> lock_acquire+0xd3/0x310
>> __mutex_lock+0xa8/0x930
>> __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
>> intel_fbdev_restore_mode+0x2b/0x50 [i915]
>> drm_lastclose+0x27/0x50
>> drm_release_noglobal+0x42/0x60
>> __fput+0x9e/0x250
>> task_work_run+0x6b/0xb0
>> exit_to_user_mode_prepare+0x1c5/0x1d0
>> syscall_exit_to_user_mode+0x19/0x50
>> do_syscall_64+0x46/0xb0
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> -> #0 (drm_global_mutex){+.+.}-{3:3}:
>> validate_chain+0xb39/0x1e70
>> __lock_acquire+0x5a1/0xb70
>> lock_acquire+0xd3/0x310
>> __mutex_lock+0xa8/0x930
>> drm_open+0x64/0x280
>> drm_stub_open+0x9f/0x100
>> chrdev_open+0x9f/0x1d0
>> do_dentry_open+0x14a/0x3a0
>> dentry_open+0x53/0x70
>> drm_mode_create_lease_ioctl+0x3cb/0x970
>> drm_ioctl_kernel+0xc9/0x1a0
>> drm_ioctl+0x201/0x3d0
>> __x64_sys_ioctl+0x6a/0xa0
>> do_syscall_64+0x37/0xb0
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> other info that might help us debug this:
>> Chain exists of:
>> drm_global_mutex --> &helper->lock --> &dev->master_rwsem
>> Possible unsafe locking scenario:
>> CPU0 CPU1
>> ---- ----
>> lock(&dev->master_rwsem);
>> lock(&helper->lock);
>> lock(&dev->master_rwsem);
>> lock(drm_global_mutex);
>>
>> *** DEADLOCK ***
>>
>> The lock hierarchy inversion happens because we grab the
>> drm_global_mutex while already holding on to master_rwsem. To avoid
>> this, we do some prep work to grab the drm_global_mutex before
>> checking for ioctl permissions.
>>
>> At the same time, we update the check for the global mutex to use the
>> drm_dev_needs_global_mutex helper function.
>
> This is intentional, essentially we force all non-legacy drivers to have
> unlocked ioctl (otherwise everyone forgets to set that flag).
>
> For non-legacy drivers the global lock only ensures ordering between
> drm_open and lastclose (I think at least), and between
> drm_dev_register/unregister and the backwards ->load/unload callbacks
> (which are called in the wrong place, but we cannot fix that for legacy
> drivers).
>
> ->load/unload should be completely unused (maybe radeon still uses it),
> and ->lastclose is also on the decline.
>

Ah ok got it, I'll change the check back to
drm_core_check_feature(dev, DRIVER_LEGACY) then.

> Maybe we should update the comment of drm_global_mutex to explain what it
> protects and why.
>

The comments in drm_dev_needs_global_mutex make sense I think, I just
didn't read the code closely enough.

> I'm also confused how this patch connects to the splat, since for i915 we

Right, my bad, this is a separate instance of circular locking. I was
too hasty when I saw that for legacy drivers we might grab master_rwsem
then drm_global_mutex in the ioctl handler.

> shouldn't be taking the drm_global_lock here at all. The problem seems to
> be the drm_open_helper when we create a new lease, which is an entirely
> different can of worms.
>
> I'm honestly not sure how to best do that, but we should be able to create
> a file and then call drm_open_helper directly, or well a version of that
> which never takes the drm_global_mutex. Because that is not needed for
> nested drm_file opening:
> - legacy drivers never go down this path because leases are only supported
> with modesetting, and modesetting is only supported for non-legacy
> drivers
> - the races against dev->open_count due to last_close or ->load callbacks
> don't matter, because for the entire ioctl we already have an open
> drm_file and that wont disappear.
>
> So this should work, but I'm not entirely sure how to make it work.
> -Daniel
>

One idea that comes to mind is to change the outcome of
drm_dev_needs_global_mutex while we're in the ioctl, but that requires
more locking which sounds like a bad idea.

Another idea, which is quite messy, but just for thoughts, uses the idea
of pushing the master_rwsem read lock down:

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7f523e1c5650..5d05e744b728 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -712,7 +712,7 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
- DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
+ DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_GET_LEASE, drm_mode_get_lease_ioctl, DRM_MASTER),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_REVOKE_LEASE, drm_mode_revoke_lease_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c
index 983701198ffd..a25bc69522b4 100644
--- a/drivers/gpu/drm/drm_lease.c
+++ b/drivers/gpu/drm/drm_lease.c
@@ -500,6 +500,19 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
return -EINVAL;
}

+ /* Clone the lessor file to create a new file for us */
+ DRM_DEBUG_LEASE("Allocating lease file\n");
+ lessee_file = file_clone_open(lessor_file);
+ if (IS_ERR(lessee_file))
+ return PTR_ERR(lessee_file);
+
+ down_read(&dev->master_rwsem);
+
+ if (!drm_is_current_master(lessor_priv)) {
+ ret = -EACCES;
+ goto out_file;
+ }
+
lessor = drm_file_get_master(lessor_priv);
/* Do not allow sub-leases */
if (lessor->lessor) {
@@ -547,14 +560,6 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
goto out_leases;
}

- /* Clone the lessor file to create a new file for us */
- DRM_DEBUG_LEASE("Allocating lease file\n");
- lessee_file = file_clone_open(lessor_file);
- if (IS_ERR(lessee_file)) {
- ret = PTR_ERR(lessee_file);
- goto out_lessee;
- }
-
lessee_priv = lessee_file->private_data;
/* Change the file to a master one */
drm_master_put(&lessee_priv->master);
@@ -571,17 +576,19 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
fd_install(fd, lessee_file);

drm_master_put(&lessor);
+ up_read(&dev->master_rwsem);
DRM_DEBUG_LEASE("drm_mode_create_lease_ioctl succeeded\n");
return 0;

-out_lessee:
- drm_master_put(&lessee);
-
out_leases:
put_unused_fd(fd);

out_lessor:
drm_master_put(&lessor);
+
+out_file:
+ up_read(&dev->master_rwsem);
+ fput(lessee_file);
DRM_DEBUG_LEASE("drm_mode_create_lease_ioctl failed: %d\n", ret);
return ret;
}


Something like this would also address the other deadlock we'd hit in
drm_mode_create_lease_ioctl():

drm_ioctl_kernel():
down_read(&master_rwsem); <--- down_read()
drm_mode_create_lease_ioctl():
drm_lease_create():
file_clone_open():
...
drm_open():
drm_open_helper():
drm_master_open():
down_write(&master_rwsem); <--- down_write()

Overall, I think the suggestion to push master_rwsem write locks down
into ioctls would solve the nesting problem for those ioctls.

Although I'm still a little concerned that, just like here, there might
be deeply embedded nested locking, so locking becomes prone to breaking.
It does smell a bit to me.

>> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
>> ---
>> drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
>> 1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
>> index 880fc565d599..2cb57378a787 100644
>> --- a/drivers/gpu/drm/drm_ioctl.c
>> +++ b/drivers/gpu/drm/drm_ioctl.c
>> @@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
>> if (drm_dev_is_unplugged(dev))
>> return -ENODEV;
>>
>> + /* Enforce sane locking for modern driver ioctls. */
>> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
>> + mutex_lock(&drm_global_mutex);
>> +
>> retcode = drm_ioctl_permit(flags, file_priv);
>> if (unlikely(retcode))
>> - return retcode;
>> + goto out;
>>
>> - /* Enforce sane locking for modern driver ioctls. */
>> - if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
>> - (flags & DRM_UNLOCKED))
>> - retcode = func(dev, kdata, file_priv);
>> - else {
>> - mutex_lock(&drm_global_mutex);
>> - retcode = func(dev, kdata, file_priv);
>> + retcode = func(dev, kdata, file_priv);
>> +
>> +out:
>> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
>> mutex_unlock(&drm_global_mutex);
>> - }
>> return retcode;
>> }
>> EXPORT_SYMBOL(drm_ioctl_kernel);
>> --
>> 2.25.1
>>
>

2021-08-19 11:25:51

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

On Thu, Aug 19, 2021 at 12:53 PM Desmond Cheong Zhi Xi
<[email protected]> wrote:
>
> On 18/8/21 7:02 pm, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> >> In a future patch, a read lock on drm_device.master_rwsem is
> >> held in the ioctl handler before the check for ioctl
> >> permissions. However, this produces the following lockdep splat:
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> >> ------------------------------------------------------
> >> kms_lease/1752 is trying to acquire lock:
> >> ffffffff827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
> >>
> >> but task is already holding lock:
> >> ffff88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
> >> drm_ioctl_kernel+0xfb/0x1a0
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&dev->master_rwsem){++++}-{3:3}:
> >> lock_acquire+0xd3/0x310
> >> down_read+0x3b/0x140
> >> drm_master_internal_acquire+0x1d/0x60
> >> drm_client_modeset_commit+0x10/0x40
> >> __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
> >> drm_fb_helper_set_par+0x34/0x40
> >> intel_fbdev_set_par+0x11/0x40 [i915]
> >> fbcon_init+0x270/0x4f0
> >> visual_init+0xc6/0x130
> >> do_bind_con_driver+0x1de/0x2c0
> >> do_take_over_console+0x10e/0x180
> >> do_fbcon_takeover+0x53/0xb0
> >> register_framebuffer+0x22d/0x310
> >> __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
> >> intel_fbdev_initial_config+0xf/0x20 [i915]
> >> async_run_entry_fn+0x28/0x130
> >> process_one_work+0x26d/0x5c0
> >> worker_thread+0x37/0x390
> >> kthread+0x13b/0x170
> >> ret_from_fork+0x1f/0x30
> >>
> >> -> #1 (&helper->lock){+.+.}-{3:3}:
> >> lock_acquire+0xd3/0x310
> >> __mutex_lock+0xa8/0x930
> >> __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
> >> intel_fbdev_restore_mode+0x2b/0x50 [i915]
> >> drm_lastclose+0x27/0x50
> >> drm_release_noglobal+0x42/0x60
> >> __fput+0x9e/0x250
> >> task_work_run+0x6b/0xb0
> >> exit_to_user_mode_prepare+0x1c5/0x1d0
> >> syscall_exit_to_user_mode+0x19/0x50
> >> do_syscall_64+0x46/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> -> #0 (drm_global_mutex){+.+.}-{3:3}:
> >> validate_chain+0xb39/0x1e70
> >> __lock_acquire+0x5a1/0xb70
> >> lock_acquire+0xd3/0x310
> >> __mutex_lock+0xa8/0x930
> >> drm_open+0x64/0x280
> >> drm_stub_open+0x9f/0x100
> >> chrdev_open+0x9f/0x1d0
> >> do_dentry_open+0x14a/0x3a0
> >> dentry_open+0x53/0x70
> >> drm_mode_create_lease_ioctl+0x3cb/0x970
> >> drm_ioctl_kernel+0xc9/0x1a0
> >> drm_ioctl+0x201/0x3d0
> >> __x64_sys_ioctl+0x6a/0xa0
> >> do_syscall_64+0x37/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> other info that might help us debug this:
> >> Chain exists of:
> >> drm_global_mutex --> &helper->lock --> &dev->master_rwsem
> >> Possible unsafe locking scenario:
> >> CPU0 CPU1
> >> ---- ----
> >> lock(&dev->master_rwsem);
> >> lock(&helper->lock);
> >> lock(&dev->master_rwsem);
> >> lock(drm_global_mutex);
> >>
> >> *** DEADLOCK ***
> >>
> >> The lock hierarchy inversion happens because we grab the
> >> drm_global_mutex while already holding on to master_rwsem. To avoid
> >> this, we do some prep work to grab the drm_global_mutex before
> >> checking for ioctl permissions.
> >>
> >> At the same time, we update the check for the global mutex to use the
> >> drm_dev_needs_global_mutex helper function.
> >
> > This is intentional, essentially we force all non-legacy drivers to have
> > unlocked ioctl (otherwise everyone forgets to set that flag).
> >
> > For non-legacy drivers the global lock only ensures ordering between
> > drm_open and lastclose (I think at least), and between
> > drm_dev_register/unregister and the backwards ->load/unload callbacks
> > (which are called in the wrong place, but we cannot fix that for legacy
> > drivers).
> >
> > ->load/unload should be completely unused (maybe radeon still uses it),
> > and ->lastclose is also on the decline.
> >
>
> Ah ok got it, I'll change the check back to
> drm_core_check_feature(dev, DRIVER_LEGACY) then.
>
> > Maybe we should update the comment of drm_global_mutex to explain what it
> > protects and why.
> >
>
> The comments in drm_dev_needs_global_mutex make sense I think, I just
> didn't read the code closely enough.
>
> > I'm also confused how this patch connects to the splat, since for i915 we
>
> Right, my bad, this is a separate instance of circular locking. I was
> too hasty when I saw that for legacy drivers we might grab master_rwsem
> then drm_global_mutex in the ioctl handler.
>
> > shouldn't be taking the drm_global_lock here at all. The problem seems to
> > be the drm_open_helper when we create a new lease, which is an entirely
> > different can of worms.
> >
> > I'm honestly not sure how to best do that, but we should be able to create
> > a file and then call drm_open_helper directly, or well a version of that
> > which never takes the drm_global_mutex. Because that is not needed for
> > nested drm_file opening:
> > - legacy drivers never go down this path because leases are only supported
> > with modesetting, and modesetting is only supported for non-legacy
> > drivers
> > - the races against dev->open_count due to last_close or ->load callbacks
> > don't matter, because for the entire ioctl we already have an open
> > drm_file and that wont disappear.
> >
> > So this should work, but I'm not entirely sure how to make it work.
> > -Daniel
> >
>
> One idea that comes to mind is to change the outcome of
> drm_dev_needs_global_mutex while we're in the ioctl, but that requires
> more locking which sounds like a bad idea.
>
> Another idea, which is quite messy, but just for thoughts, uses the idea
> of pushing the master_rwsem read lock down:

Yeah I think that's cleaner, and I think that also should work a lot
better for the other ioctls:
- We don't have a need to flush readers anymore since we'll just take
the rwsem in write mode
- There's much less inversions, and maybe we could even get rid of the
spinlock since at that point all readers should at least have the
rwsem read-locked.

>
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 7f523e1c5650..5d05e744b728 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -712,7 +712,7 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
> DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
> DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
> - DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
> + DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 0),
> DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER),
> DRM_IOCTL_DEF(DRM_IOCTL_MODE_GET_LEASE, drm_mode_get_lease_ioctl, DRM_MASTER),
> DRM_IOCTL_DEF(DRM_IOCTL_MODE_REVOKE_LEASE, drm_mode_revoke_lease_ioctl, DRM_MASTER),
> diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c
> index 983701198ffd..a25bc69522b4 100644
> --- a/drivers/gpu/drm/drm_lease.c
> +++ b/drivers/gpu/drm/drm_lease.c
> @@ -500,6 +500,19 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
> return -EINVAL;
> }
>
> + /* Clone the lessor file to create a new file for us */
> + DRM_DEBUG_LEASE("Allocating lease file\n");
> + lessee_file = file_clone_open(lessor_file);
> + if (IS_ERR(lessee_file))
> + return PTR_ERR(lessee_file);
> +
> + down_read(&dev->master_rwsem);
> +
> + if (!drm_is_current_master(lessor_priv)) {
> + ret = -EACCES;
> + goto out_file;
> + }
> +
> lessor = drm_file_get_master(lessor_priv);
> /* Do not allow sub-leases */
> if (lessor->lessor) {
> @@ -547,14 +560,6 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
> goto out_leases;
> }
>
> - /* Clone the lessor file to create a new file for us */
> - DRM_DEBUG_LEASE("Allocating lease file\n");
> - lessee_file = file_clone_open(lessor_file);
> - if (IS_ERR(lessee_file)) {
> - ret = PTR_ERR(lessee_file);
> - goto out_lessee;
> - }
> -
> lessee_priv = lessee_file->private_data;
> /* Change the file to a master one */
> drm_master_put(&lessee_priv->master);
> @@ -571,17 +576,19 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
> fd_install(fd, lessee_file);
>
> drm_master_put(&lessor);
> + up_read(&dev->master_rwsem);
> DRM_DEBUG_LEASE("drm_mode_create_lease_ioctl succeeded\n");
> return 0;
>
> -out_lessee:
> - drm_master_put(&lessee);
> -
> out_leases:
> put_unused_fd(fd);
>
> out_lessor:
> drm_master_put(&lessor);
> +
> +out_file:
> + up_read(&dev->master_rwsem);
> + fput(lessee_file);
> DRM_DEBUG_LEASE("drm_mode_create_lease_ioctl failed: %d\n", ret);
> return ret;
> }
>
>
> Something like this would also address the other deadlock we'd hit in
> drm_mode_create_lease_ioctl():
>
> drm_ioctl_kernel():
> down_read(&master_rwsem); <--- down_read()
> drm_mode_create_lease_ioctl():
> drm_lease_create():
> file_clone_open():
> ...
> drm_open():
> drm_open_helper():
> drm_master_open():
> down_write(&master_rwsem); <--- down_write()
>
> Overall, I think the suggestion to push master_rwsem write locks down
> into ioctls would solve the nesting problem for those ioctls.

Yup, my gut feeling agress. And the above is a nice solution without
having to dig out all the code for creating a file directly (it's
doable I think at least, we do it for dma-buf).

> Although I'm still a little concerned that, just like here, there might
> be deeply embedded nested locking, so locking becomes prone to breaking.
> It does smell a bit to me.

Yeah, that's pretty much the bane of locking cleanup/rework. You have
to do it to figure out what goes boom :-/ Even with the most careful
audit there's surprises left.
-Daniel

> >> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> >> ---
> >> drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
> >> 1 file changed, 9 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> >> index 880fc565d599..2cb57378a787 100644
> >> --- a/drivers/gpu/drm/drm_ioctl.c
> >> +++ b/drivers/gpu/drm/drm_ioctl.c
> >> @@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
> >> if (drm_dev_is_unplugged(dev))
> >> return -ENODEV;
> >>
> >> + /* Enforce sane locking for modern driver ioctls. */
> >> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> >> + mutex_lock(&drm_global_mutex);
> >> +
> >> retcode = drm_ioctl_permit(flags, file_priv);
> >> if (unlikely(retcode))
> >> - return retcode;
> >> + goto out;
> >>
> >> - /* Enforce sane locking for modern driver ioctls. */
> >> - if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
> >> - (flags & DRM_UNLOCKED))
> >> - retcode = func(dev, kdata, file_priv);
> >> - else {
> >> - mutex_lock(&drm_global_mutex);
> >> - retcode = func(dev, kdata, file_priv);
> >> + retcode = func(dev, kdata, file_priv);
> >> +
> >> +out:
> >> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> >> mutex_unlock(&drm_global_mutex);
> >> - }
> >> return retcode;
> >> }
> >> EXPORT_SYMBOL(drm_ioctl_kernel);
> >> --
> >> 2.25.1
> >>
> >
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch