2016-03-01 11:00:12

by Johan de Jong

Subject: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Dear developers,

(Please CC me as I am not subscribed (yet))

Writing (backing up) to multiple optical drives at the same time is about
7 to 10 times slower than writing to a single drive.

After digging around, it seems the problem arose about 5 years ago after the
Big Kernel Lock removal and the introduction of the new "sr_mutex" private
mutex in drivers/scsi/sr.c, which locks on a per-driver basis instead of a
per-device basis.

Users have reported this issue on various mailing lists, so I think there
is interest in a solution in the Linux community. So far, it appears not to
have attracted the attention of the kernel developers, or not to have been
identified as a priority by them. However, I think
a Linux-based DIY server with multiple optical drives for the purpose of
backing up files in multiple offline copies is a very useful application
and it would be unfortunate if the current behavior keeps such an
application infeasible.

Would someone be willing to look into this and/or comment on the issue?

Sincerely,
Johan de Jong


2016-03-05 20:15:18

by Johan de Jong

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Dear developers,

In the meantime I have applied and tested the 2013 patch by Otto Meta:

http://marc.info/?l=linux-scsi&m=135705061804384&w=2

which, in short, replaces mutex_lock(&sr_mutex) (the global mutex that
was introduced in 2010 to replace lock_kernel()) with per-device mutexes,
allowing concurrent ioctl(SG_IO) calls from different processes on
different sr devices.
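
To illustrate the difference between the two locking schemes, here is a
minimal userspace sketch using POSIX mutexes. It is not the kernel patch
and does not reproduce drivers/scsi/sr.c; struct fake_drive, global_lock,
fake_drive_init() and can_run_concurrently() are hypothetical names chosen
for this example. It only shows that a second drive cannot take a shared
global lock while a first drive holds it, whereas per-device locks do not
block each other.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-drive structure (not struct scsi_cd). */
struct fake_drive {
    pthread_mutex_t own_lock;  /* lock private to this drive */
    pthread_mutex_t *lock;     /* lock this drive actually uses */
};

/* Hypothetical stand-in for a single driver-wide mutex. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set up a drive to use either its own lock or the shared global one. */
void fake_drive_init(struct fake_drive *d, bool per_device)
{
    int rc = pthread_mutex_init(&d->own_lock, NULL);
    assert(rc == 0);
    d->lock = per_device ? &d->own_lock : &global_lock;
}

/*
 * While drive a holds its lock (as if in the middle of an ioctl),
 * check whether drive b could take its lock without waiting.
 */
bool can_run_concurrently(struct fake_drive *a, struct fake_drive *b)
{
    bool ok;
    int rc;

    pthread_mutex_lock(a->lock);
    rc = pthread_mutex_trylock(b->lock);
    ok = (rc == 0);
    if (ok)
        pthread_mutex_unlock(b->lock);
    else
        assert(rc == EBUSY);   /* lock is held, b would have to wait */
    pthread_mutex_unlock(a->lock);
    return ok;
}
```

With both drives initialized with per_device set to false, the check
reports that the second drive would have to wait; with per_device set to
true, both drives can proceed at the same time.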

I had to apply some parts by hand, as the posted patch did not apply
cleanly to my more recent source tree, but I'm happy to report
that the patch does what it intends and solves the performance
issue with accessing multiple sr devices concurrently. Repeated
concurrent writes to three SATA burners ran reliably and without
any performance penalty. In addition, repeated concurrent drive
tray open and close commands (eject (-t) /dev/sr0 & eject (-t) /dev/sr1 &
eject (-t) /dev/sr2) result in simultaneous tray movement (unlike on the
unpatched kernel) that is reliable, with no visible
indications of locking problems caused by the patch, either physically
or in the kernel logs.

I have been running the patched kernel for a number of days now to
full satisfaction and relief. I would therefore venture to suggest
that mutex_lock(&sr_mutex) is indeed the cause of the severe
performance penalty and that the 2013 Otto Meta patch proposes a
viable remedy that bears nomination for patching into the main kernel
tree.

Sincerely,
Johan de Jong

2016-03-05 20:46:47

by Thomas Schmitt

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Hi,

as developer of libburn i got several user complaints about poor
concurrent throughput. Since last year i suffer from it myself
on kernel 3.16 of Debian 8. Before i had 2.6.18 which did very well
in that aspect.

An old workaround for IDE master-slave concurrency problems brings
a certain degree of relief on some drives. See
http://libburnia-project.org/wiki/ConcurrentLinuxSr

But the much better solution would be to remove the need for the
global lock shared by all ioctl(SG_IO) to all /dev/sr*.

Given the old reports of Otto Meta about possible race conditions
with drives at the same IDE controller, and the rareness of IDE
attached drives nowadays, i propose to keep the global sr_mutex lock
for IDE attached drives.
The question is how this can be determined from the device parameters
of the calls in question:
struct block_device *bdev
struct gendisk *disk
struct scsi_cd *cd


Have a nice day :)

Thomas

2016-03-05 21:25:28

by Wakko Warner

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Johan de Jong wrote:
> In the meantime I have applied and tested the 2013 patch by Otto Meta:
>
> http://marc.info/?l=linux-scsi&m=135705061804384&w=2
>
> which, in short, replaces mutex_lock(&sr_mutex) (global mutex), that
> was introduced in 2010 to replace lock_kernel(), by per-device mutexes
> and allowing concurrent ioctl(SG_IO) in different processes with
> different sr devices.

There seem to be a few patches floating around. I've had one running on
3.3.0 for a long time without any issues. I'm currently using the one from
Tim Small (search for the subject "Fix performance burning or extracting
audio etc. from multiple optical drives.") on 4.x (where x is 3-4) and on a
3.14.something without any issues.

I still have the emails from Tim. My current usage is 2 systems with 3
burners from the same source.

2016-03-05 21:36:37

by Johan de Jong

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Hi Wakko,

If I remember correctly I did see you commenting on discussions on
either the Otto Meta patch, or another that proposed to remove the
mutex entirely. I was unaware of any others.

Do you have more information on why this never resulted in a successful
concerted effort to get a patch into the kernel tree? Do the patches
have drawbacks, or have they never been submitted properly? If the
latter, might we attempt it?

Best,
Johan

2016-03-06 02:06:16

by Wakko Warner

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Johan de Jong wrote:
> Hi Wakko,
>
> If I remember correctly I did see you commenting on discussions on
> either the Otto Meta patch, or another that proposed to remove the
> mutex entirely. I was unaware of any others.

I received the last set of patches from Tim more than a year ago. I wasn't
using a system with multiple drives other than my 3.3.0 box. Last year,
around November, I gathered some more drives, put them in another system,
and decided to test the patches. I sent a report about the success back to
the list in November 2015 (maybe October).

> Do you have more information on why this never resulted in a successful
> concerted effort to get a patch in the kernel tree? Do the patches
> have drawbacks or have they never been submitted properly? If the
> latter, we might endeavor it?

No, sorry. I'm just a user. I haven't had any crashes with the patches.
I'd like to see the patches go in, I'm tired of patching every new kernel.
I'm running the ones from Tim on 2 machines. One machine has the dvd
burners (exported as iscsi targets) and the other machine connects to it.
The patches have to be on both machines for it to work (I assume). I'm able
to burn at 16x to 3 burners over iscsi at the same time. I've burned many
discs this way without any issues (other than bad media).

Just FYI: The computer with the burners is running 4.4.1 and the iscsi
initiator is 3.12.52.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2016-03-07 12:14:01

by Alan Cox

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

On Sat, 05 Mar 2016 21:47:00 +0100
"Thomas Schmitt" wrote:

> Hi,
>
> as developer of libburn i got several user complaints about poor
> concurrent throughput. Since last year i suffer from it myself
> on kernel 3.16 of Debian 8. Before i had 2.6.18 which did very well
> in that aspect.
>
> An old workaround for IDE master-slave concurrency problems brings
> a certain degree of relief on some drives. See
> http://libburnia-project.org/wiki/ConcurrentLinuxSr
>
> But the much better solution would be to remove the need for the
> global lock shared by all ioctl(SG_IO) to all /dev/sr*.
>
> Given the old reports of Otto Meta about possible race conditions
> with drives at the same IDE controller, and the rareness of IDE
> attached drives nowadays, i propose to keep the global sr_mutex lock
> for IDE attached drives.

If there are race conditions present in the libata drivers then they want
fixing there. The old IDE drivers are basically obsoleted by libata for
all real world uses and most "IDE" devices are actually SATA now anyway.

Alan

2016-03-07 13:11:33

by Thomas Schmitt

Subject: Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

Hi,

i wrote:
> > Given the old reports of Otto Meta about possible race conditions
> > with drives at the same IDE controller, and the rareness of IDE
> > attached drives nowadays, i propose to keep the global sr_mutex lock
> > for IDE attached drives.

One Thousand Gnomes wrote:
> If there are race conditions present in the libata drivers then they want
> fixing there.

From the view of software architecture: of course, yes.

But the research of Johan de Jong shows that this patch was
proposed several times and never reached a decision because of
problems when testing heavy concurrency on IDE attached drives.

The newest threads known to me (besides this one) were started by Tim Small
in November 2014:
"[PATCH 0/4] Fix performance burning or extracting audio etc.
from multiple optical drives."
http://marc.info/?t=141692734400009&r=1&w=2
"Very slow throughput when using cdparanoia on two SATA CDROM drives with /dev/sr but not /dev/sg"
http://marc.info/?t=141528207400003&r=1&w=2
In the middle of the discussion Jens Axboe was positive towards
the issue. But then came IDE problems.

It is not clear to me whether the reported problems existed already
with the Big Kernel Lock and whether they do not exist with the global
sr_mutex lock which is currently in drivers/scsi/sr.c.

In particular, the problem reports of Otto Meta in 2013 cannot be explained
solely by misdirected SCSI commands or confused bookkeeping in
the lower drivers. In
http://marc.info/?l=linux-scsi&m=135734072119667&w=2
he reports that a drive tray was stuck out and moved in only on
command eject -t, but not on pressing the drive's eject button.

This is not SCSI MMC (as payload of ATAPI) behavior.
The SCSI command 1Eh PREVENT/ALLOW MEDIUM REMOVAL is defined in MMC-5
to override the definition in SPC-3. MMC-5, 6.14 says about it:
"[...] requests that the Drive enable or disable the removal of the
medium in the Drive. The Drive shall not allow medium removal if any
Host currently has medium removal prevented."
The drive cannot protect the medium when the tray is out. So being
stuck in this state is not normal on drive firmware level.
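
For reference, the command discussed above can be sketched as a 6-byte
command descriptor block (CDB). This is only an illustration of the byte
layout as i understand it: operation code 1Eh in byte 0 and the Prevent
bit in bit 0 of byte 4. The names PREVENT_ALLOW_MEDIUM_REMOVAL_OPCODE,
CDB6_LEN and build_prevent_allow_cdb() are hypothetical and chosen for
this example. The sketch does not send anything to a drive; doing so
would additionally need an sg_io_hdr_t and ioctl(SG_IO).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch. */
#define PREVENT_ALLOW_MEDIUM_REMOVAL_OPCODE 0x1E
#define CDB6_LEN 6

/*
 * Fill a 6-byte CDB for PREVENT/ALLOW MEDIUM REMOVAL.
 * prevent == true  asks the drive to lock the medium in,
 * prevent == false asks the drive to allow removal again.
 */
void build_prevent_allow_cdb(uint8_t cdb[CDB6_LEN], bool prevent)
{
    assert(cdb != NULL);
    memset(cdb, 0, CDB6_LEN);
    cdb[0] = PREVENT_ALLOW_MEDIUM_REMOVAL_OPCODE;
    cdb[4] = prevent ? 0x01 : 0x00;  /* Prevent bit: bit 0 of byte 4 */
}
```

A drive that honors "allow" (Prevent bit 0) should react to its eject
button again, which is why a tray that ignores the button while it is
already out does not look like an effect of this command.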


> The old IDE drivers are basically obsoleted by libata for
> all real world uses and most "IDE" devices are actually SATA now anyway.

Of course, if we can get reports that a modern kernel on a machine
with two optical drives on the same IDE controller works fine,
then we do not have to care for older kernels.

But given the situation i see, it seems better to handle all IDE
drives like they are handled now, and to only let the SATA or USB
attached drives perform per-drive locking.
We have several positive reports with SATA drives. So i consider it
proven that no concurrency problems exist before SATA processing
gets separated from IDE processing.

If still concurrency problems show up on IDE, then they cannot be
blamed on the relaxed locking of the other drives.
If IDE users want no discrimination, one could give them a kernel
configuration option and let them search for problems at their own
risk. Maybe they find out what's really wrong in IDE.

(Uninformed guess:
include/uapi/linux/major.h and block/genhd.c function
add_disk(struct gendisk *disk) make me think that one could
possibly recognize IDE attached drives by comparing
static int ide_majors[] =
{IDE0_MAJOR, IDE1_MAJOR, IDE2_MAJOR, IDE3_MAJOR, IDE4_MAJOR,
IDE5_MAJOR, IDE6_MAJOR, IDE7_MAJOR, IDE8_MAJOR, IDE9_MAJOR,
-1};
with
MAJOR(disk_to_dev(disk)->devt)
)
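
The comparison guessed at above can be tried out in userspace with the
following sketch. It is an assumption-laden illustration, not kernel
code: is_ide_major() and devt_is_ide() are hypothetical helpers, the
major numbers are copied by hand from include/uapi/linux/major.h, and
major()/makedev() from <sys/sysmacros.h> stand in for the kernel's MAJOR()
macro. Note that /dev/sr* nodes themselves carry SCSI_CDROM_MAJOR (11),
so it remains an open question whether the gendisk of an sr device would
ever show one of the IDE majors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/sysmacros.h>  /* major(), makedev() */
#include <sys/types.h>      /* dev_t */

/* IDE0_MAJOR .. IDE9_MAJOR as listed in include/uapi/linux/major.h. */
static const unsigned int ide_majors[] = {
    3, 22, 33, 34, 56, 57, 88, 89, 90, 91
};

/* Hypothetical helper: is this major number one of the IDE majors? */
bool is_ide_major(unsigned int maj)
{
    const size_t n = sizeof(ide_majors) / sizeof(ide_majors[0]);

    assert(n == 10);  /* IDE0 through IDE9 */
    for (size_t i = 0; i < n; i++) {
        if (ide_majors[i] == maj)
            return true;
    }
    return false;
}

/* Hypothetical helper: same check, starting from a device number. */
bool devt_is_ide(dev_t devt)
{
    return is_ide_major(major(devt));
}
```

For example, devt_is_ide(makedev(22, 0)) is true because 22 is IDE1_MAJOR,
while devt_is_ide(makedev(11, 0)) is false because 11 is SCSI_CDROM_MAJOR.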


Have a nice day :)

Thomas