When the Big Kernel Lock was removed in commit
2a48fc0ab24241755dc93bfd4f01d68efab47f5a ("block: autoconvert trivial BKL
users to private mutex"), it was replaced in sr with a single driver-wide
mutex.

This causes very poor performance when using multiple sr devices: the
driver cannot execute more than one command at any given time, no matter
how many CD drives are available, because open, release and ioctl on
every drive all serialise on the same mutex.

Replace the global mutex with a per-sr-device mutex.

This change was proposed at the time but never made it upstream because
of concerns about possible race conditions. It is not clear that the
patch actually caused those:

https://www.spinics.net/lists/linux-scsi/msg63706.html
https://www.spinics.net/lists/linux-scsi/msg63750.html

Also see

http://lists.xiph.org/pipermail/paranoia/2019-December/001647.html

Signed-off-by: Merlijn Wajer <[email protected]>
---
drivers/scsi/sr.c | 16 +++++++++-------
drivers/scsi/sr.h | 2 ++
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 38ddbbfe5..6809fdcfd 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -77,7 +77,6 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_WORM);
CDC_CD_R|CDC_CD_RW|CDC_DVD|CDC_DVD_R|CDC_DVD_RAM|CDC_GENERIC_PACKET| \
CDC_MRW|CDC_MRW_W|CDC_RAM)
-static DEFINE_MUTEX(sr_mutex);
static int sr_probe(struct device *);
static int sr_remove(struct device *);
static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt);
@@ -535,9 +534,9 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
scsi_autopm_get_device(sdev);
check_disk_change(bdev);
- mutex_lock(&sr_mutex);
+ mutex_lock(&cd->lock);
ret = cdrom_open(&cd->cdi, bdev, mode);
- mutex_unlock(&sr_mutex);
+ mutex_unlock(&cd->lock);
scsi_autopm_put_device(sdev);
if (ret)
@@ -550,10 +549,10 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
static void sr_block_release(struct gendisk *disk, fmode_t mode)
{
struct scsi_cd *cd = scsi_cd(disk);
- mutex_lock(&sr_mutex);
+ mutex_lock(&cd->lock);
cdrom_release(&cd->cdi, mode);
+ mutex_unlock(&cd->lock);
scsi_cd_put(cd);
- mutex_unlock(&sr_mutex);
}
static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
@@ -564,7 +563,7 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
void __user *argp = (void __user *)arg;
int ret;
- mutex_lock(&sr_mutex);
+ mutex_lock(&cd->lock);
ret = scsi_ioctl_block_when_processing_errors(sdev, cmd,
(mode & FMODE_NDELAY) != 0);
@@ -594,7 +593,7 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
scsi_autopm_put_device(sdev);
out:
- mutex_unlock(&sr_mutex);
+ mutex_unlock(&cd->lock);
return ret;
}
@@ -700,6 +699,7 @@ static int sr_probe(struct device *dev)
disk = alloc_disk(1);
if (!disk)
goto fail_free;
+ mutex_init(&cd->lock);
spin_lock(&sr_index_lock);
minor = find_first_zero_bit(sr_index_bits, SR_DISKS);
@@ -1009,6 +1009,8 @@ static void sr_kref_release(struct kref *kref)
put_disk(disk);
+ mutex_destroy(&cd->lock);
+
kfree(cd);
}
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index a2bb7b8ba..339c624e0 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -20,6 +20,7 @@
#include <linux/genhd.h>
#include <linux/kref.h>
+#include <linux/mutex.h>
#define MAX_RETRIES 3
#define SR_TIMEOUT (30 * HZ)
@@ -51,6 +52,7 @@ typedef struct scsi_cd {
bool ignore_get_event:1; /* GET_EVENT is unreliable, use TUR */
struct cdrom_device_info cdi;
+ struct mutex lock;
/* We hold gendisk and scsi_device references on probe and use
* the refs on this kref to decide when to release them */
struct kref kref;
--
2.17.1
Hi,

I was wondering whether there are any changes the maintainers would like
me to make in order to get this merged.

This regression was introduced back in 2010, and it recently took me just
under a week to find the root cause. The Internet Archive is digitising
hundreds of thousands of CDs in an effort to preserve whatever is on
them, and not being able to digitise more than one CD at a time per
machine is a real pain.

My hope is that once I have massaged the patch into something that can be
approved, we can also send it to the various stable trees.

Digging through some old email threads, I also found a suggestion to move
the lock to the cdrom device instead. Would that be preferred over having
the lock per sr instance?

Merlijn

On Wed, Feb 12, 2020 at 5:45 PM Merlijn Wajer <[email protected]> wrote:
>
> When replacing the Big Kernel Lock in commit:
> <2a48fc0ab24241755dc93bfd4f01d68efab47f5a> ("block: autoconvert trivial BKL
> users to private mutex") , the lock was replaced with a sr-wide lock.
>
> This causes very poor performance when using multiple sr devices, as the
> sr driver was not able to execute more than one command to one drive at
> any given time, even when there were many CD drives available.
>
> Replace the global mutex with per-sr-device mutex.
>
> Someone tried this patch at the time, but it never made it
> upstream, due to possible concerns with race conditions, but it's not
> clear the patch actually caused those:
>
> https://www.spinics.net/lists/linux-scsi/msg63706.html
> https://www.spinics.net/lists/linux-scsi/msg63750.html
>
> Also see
>
> http://lists.xiph.org/pipermail/paranoia/2019-December/001647.html
>
> Signed-off-by: Merlijn Wajer <[email protected]>

This all looks reasonable to me. The conversion from BKL to a per-driver
mutex was done in a mostly automated way, and I did not attempt to make
it more fine-grained then.

I don't see any global state accessed in the open/close/ioctl functions,
so this is probably completely safe. It may even be possible to avoid
that mutex completely, but that is harder to prove.

Acked-by: Arnd Bergmann <[email protected]>
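
For readers who want to see the locking pattern outside the driver, here is a minimal userspace sketch of a per-device mutex, using POSIX threads instead of the kernel mutex API. It is not the sr code: `struct fake_cd`, `fake_cd_alloc()`, `fake_cd_command()`, `fake_cd_worker()`, `fake_cd_release()` and `fake_cd_put()` are hypothetical names invented for illustration, and the plain integer reference count is a simplified stand-in for the kref. The sketch drops its reference only after unlocking, on the assumption that the final put may free the structure that contains the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define FAKE_CMDS_PER_WORKER 1000

/* Hypothetical stand-in for struct scsi_cd: the lock lives in the device. */
struct fake_cd {
	pthread_mutex_t lock;	/* per-device, plays the role of cd->lock */
	int refcount;		/* simplified, non-atomic stand-in for the kref */
	int commands_done;	/* state protected by lock */
};

/* Allocate a device and initialise its lock (compare mutex_init in probe). */
static struct fake_cd *fake_cd_alloc(void)
{
	struct fake_cd *cd = calloc(1, sizeof(*cd));

	if (!cd)
		return NULL;
	pthread_mutex_init(&cd->lock, NULL);
	cd->refcount = 1;
	return cd;
}

/* Drop a reference; the last put destroys the lock and frees the device. */
static void fake_cd_put(struct fake_cd *cd)
{
	assert(cd->refcount > 0);
	if (--cd->refcount == 0) {
		pthread_mutex_destroy(&cd->lock);
		free(cd);
	}
}

/* One "command": only callers using the same device contend on the lock. */
static void fake_cd_command(struct fake_cd *cd)
{
	pthread_mutex_lock(&cd->lock);
	cd->commands_done++;
	pthread_mutex_unlock(&cd->lock);
}

/* Thread entry point: issue a batch of commands to a single device. */
static void *fake_cd_worker(void *arg)
{
	struct fake_cd *cd = arg;

	for (int i = 0; i < FAKE_CMDS_PER_WORKER; i++)
		fake_cd_command(cd);
	return NULL;
}

/* Release: unlock first, then put, because the put may free cd. */
static void fake_cd_release(struct fake_cd *cd)
{
	pthread_mutex_lock(&cd->lock);
	/* per-device teardown would go here */
	pthread_mutex_unlock(&cd->lock);
	fake_cd_put(cd);
}
```

Passing `fake_cd_worker` to `pthread_create()` once per device lets each thread run against its own lock, so two devices never wait on each other. With a single global mutex, every call to `fake_cd_command()` would be serialised regardless of which device it targets.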