2019-11-08 08:45:32

by Damien Le Moal

Subject: Re: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On 2019/11/08 4:00, Andrea Vai wrote:
> [Sorry for the duplicate message, it didn't reach the lists due to
> html formatting]
> On Thu, 7 Nov 2019 at 08:54, Damien Le Moal
> <[email protected]> wrote:
>>
>> On 2019/11/07 16:04, Andrea Vai wrote:
>>> On Wed, 06/11/2019 at 22:13 +0000, Damien Le Moal wrote:
>>>>
>>>>
>>>> Please simply try your write tests after doing this:
>>>>
>>>> echo mq-deadline > /sys/block/<name of your USB
>>>> disk>/queue/scheduler
>>>>
>>>> And confirm that mq-deadline is selected with:
>>>>
>>>> cat /sys/block/<name of your USB disk>/queue/scheduler
>>>> [mq-deadline] kyber bfq none
>>>
>>> ok, which kernel should I test with this: the fresh git cloned, or the
>>> one just patched with Alan's patch, or doesn't matter which one?
>>
>> Probably all of them to see if there are any differences.
>
> with both kernels, the output of
> cat /sys/block/sdh/queue/scheduler
>
> already contains [mq-deadline]: is it correct to assume that the echo
> command and the subsequent testing is useless? What to do now?

Probably, yes. Have you obtained a blktrace of the workload during these
tests? Is there any significant difference in the IO pattern (IO size and
randomness) or in the IO timing (any idle time where the device has no
command to process)? I am asking because the problem may be above the
block layer, with the file system for instance.
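
For the scheduler check discussed above, the active scheduler is the name
shown in square brackets in the sysfs file. A minimal sketch, assuming a
POSIX shell and sed; the helper name active_scheduler is made up for
illustration and is not part of any tool:

```sh
# Print the active I/O scheduler, i.e. the bracketed name in the
# scheduler list. Reads the given file(s), or stdin when none is given.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$@"
}
```

For example, `active_scheduler /sys/block/sdh/queue/scheduler` would be
expected to print mq-deadline on the setup described here. Running it on
each kernel under test records which scheduler was actually in effect.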

>
> Thanks, and bye
> Andrea
>


--
Damien Le Moal
Western Digital Research


2019-11-08 14:34:48

by Jens Axboe

Subject: Re: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On 11/8/19 1:42 AM, Damien Le Moal wrote:
> On 2019/11/08 4:00, Andrea Vai wrote:
>> [Sorry for the duplicate message, it didn't reach the lists due to
>> html formatting]
>> On Thu, 7 Nov 2019 at 08:54, Damien Le Moal
>> <[email protected]> wrote:
>>>
>>> On 2019/11/07 16:04, Andrea Vai wrote:
>>>> On Wed, 06/11/2019 at 22:13 +0000, Damien Le Moal wrote:
>>>>>
>>>>>
>>>>> Please simply try your write tests after doing this:
>>>>>
>>>>> echo mq-deadline > /sys/block/<name of your USB
>>>>> disk>/queue/scheduler
>>>>>
>>>>> And confirm that mq-deadline is selected with:
>>>>>
>>>>> cat /sys/block/<name of your USB disk>/queue/scheduler
>>>>> [mq-deadline] kyber bfq none
>>>>
>>>> ok, which kernel should I test with this: the fresh git cloned, or the
>>>> one just patched with Alan's patch, or doesn't matter which one?
>>>
>>> Probably all of them to see if there are any differences.
>>
>> with both kernels, the output of
>> cat /sys/block/sdh/queue/scheduler
>>
>> already contains [mq-deadline]: is it correct to assume that the echo
>> command and the subsequent testing is useless? What to do now?
>
> Probably, yes. Have you obtained a blktrace of the workload during these
> tests ? Any significant difference in the IO pattern (IO size and
> randomness) and IO timing (any device idle time where the device has no
> command to process) ? Asking because the problem may be above the block
> layer, with the file system for instance.

blktrace would indeed be super useful, especially if you can do that
with a kernel that's fast for you, and one with the current kernel
where it's slow.

Given that your device is sdh, you simply do:

# blktrace /dev/sdh

and then run the test, then ctrl-c the blktrace. Then do:

# blkparse sdh > output

and save that output file. Do both runs, and bzip2 them up. The shorter
the run you can reproduce with the better, to cut down on the size of
the traces.
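
Once the blkparse output file exists, a rough first look at the IO size
and device idle time can be taken before sending the traces. This is only
a sketch: it assumes the default blkparse line layout (device, CPU,
sequence, timestamp, PID, action, RWBS, sector, "+", sector count), and
the function name summarize_trace is made up for illustration.

```sh
# Summarize write requests in default-format blkparse output:
# number of writes issued to the driver (action D), their average
# size in sectors, and the total time with no write in flight
# between a completion (action C) and the next issue.
summarize_trace() {
    awk '
        $6 == "D" && $7 ~ /W/ {
            # Device was idle since the last completion: count the gap.
            if (inflight == 0 && have_last) idle += $4 - last_done
            inflight++; writes++; sectors += $10
        }
        $6 == "C" && $7 ~ /W/ {
            if (inflight > 0) inflight--
            if (inflight == 0) { last_done = $4; have_last = 1 }
        }
        END {
            if (writes)
                printf "writes=%d avg_sectors=%.1f idle_s=%.3f\n",
                       writes, sectors / writes, idle
        }
    ' "$@"
}
```

Running `summarize_trace output` on the fast and the slow trace and
comparing the two lines gives a quick hint about whether the slow kernel
issues smaller writes or leaves the device idle for longer. It does not
replace looking at the full traces.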

--
Jens Axboe

2019-11-09 10:10:38

by Ming Lei

Subject: Re: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On Fri, Nov 08, 2019 at 08:42:53AM +0000, Damien Le Moal wrote:
> On 2019/11/08 4:00, Andrea Vai wrote:
> > [Sorry for the duplicate message, it didn't reach the lists due to
> > html formatting]
> > On Thu, 7 Nov 2019 at 08:54, Damien Le Moal
> > <[email protected]> wrote:
> >>
> >> On 2019/11/07 16:04, Andrea Vai wrote:
> >>> On Wed, 06/11/2019 at 22:13 +0000, Damien Le Moal wrote:
> >>>>
> >>>>
> >>>> Please simply try your write tests after doing this:
> >>>>
> >>>> echo mq-deadline > /sys/block/<name of your USB
> >>>> disk>/queue/scheduler
> >>>>
> >>>> And confirm that mq-deadline is selected with:
> >>>>
> >>>> cat /sys/block/<name of your USB disk>/queue/scheduler
> >>>> [mq-deadline] kyber bfq none
> >>>
> >>> ok, which kernel should I test with this: the fresh git cloned, or the
> >>> one just patched with Alan's patch, or doesn't matter which one?
> >>
> >> Probably all of them to see if there are any differences.
> >
> > with both kernels, the output of
> > cat /sys/block/sdh/queue/scheduler
> >
> > already contains [mq-deadline]: is it correct to assume that the echo
> > command and the subsequent testing is useless? What to do now?
>
> Probably, yes. Have you obtained a blktrace of the workload during these
> tests ? Any significant difference in the IO pattern (IO size and
> randomness) and IO timing (any device idle time where the device has no
> command to process) ? Asking because the problem may be above the block
> layer, with the file system for instance.

You may get the IO pattern via the previous trace

https://lore.kernel.org/linux-usb/[email protected]/

IMO, if it is related to write order, one possibility could be that
the queue lock is killed in .make_request_fn().


Thanks,
Ming

2019-11-11 10:47:53

by Andrea Vai

Subject: Re: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On Fri, 08/11/2019 at 07:33 -0700, Jens Axboe wrote:
> On 11/8/19 1:42 AM, Damien Le Moal wrote:
> > On 2019/11/08 4:00, Andrea Vai wrote:
> >> [Sorry for the duplicate message, it didn't reach the lists due to
> >> html formatting]
> >> On Thu, 7 Nov 2019 at 08:54, Damien Le Moal
> >> <[email protected]> wrote:
> >>>
> >>> On 2019/11/07 16:04, Andrea Vai wrote:
> >>>> On Wed, 06/11/2019 at 22:13 +0000, Damien Le Moal wrote:
> >>>>>
> >>>>>
> >>>>> Please simply try your write tests after doing this:
> >>>>>
> >>>>> echo mq-deadline > /sys/block/<name of your USB
> >>>>> disk>/queue/scheduler
> >>>>>
> >>>>> And confirm that mq-deadline is selected with:
> >>>>>
> >>>>> cat /sys/block/<name of your USB disk>/queue/scheduler
> >>>>> [mq-deadline] kyber bfq none
> >>>>
> >>>> ok, which kernel should I test with this: the fresh git cloned, or the
> >>>> one just patched with Alan's patch, or doesn't matter which one?
> >>>
> >>> Probably all of them to see if there are any differences.
> >>
> >> with both kernels, the output of
> >> cat /sys/block/sdh/queue/scheduler
> >>
> >> already contains [mq-deadline]: is it correct to assume that the echo
> >> command and the subsequent testing is useless? What to do now?
> >
> > Probably, yes. Have you obtained a blktrace of the workload during these
> > tests? Any significant difference in the IO pattern (IO size and
> > randomness) and IO timing (any device idle time where the device has no
> > command to process)? Asking because the problem may be above the block
> > layer, with the file system for instance.
>
> blktrace would indeed be super useful, especially if you can do that
> with a kernel that's fast for you, and one with the current kernel
> where it's slow.
>
> Given that your device is sdh, you simply do:
>
> # blktrace /dev/sdh
>
> and then run the test, then ctrl-c the blktrace. Then do:
>
> # blkparse sdh > output
>
> and save that output file. Do both runs, and bzip2 them up. The shorter
> the run you can reproduce with the better, to cut down on the size of
> the traces.

Sorry, the next message from Ming...

-----
You may get the IO pattern via the previous trace
https://lore.kernel.org/linux-usb/[email protected]/

IMO, if it is related to write order, one possibility could be that
the queue lock is killed in .make_request_fn().
-----

...made me wonder whether I should really do the blktrace/blkparse test
or not. So please confirm whether it's needed (testing is quite
time-consuming, so I'd like to do it only if it's needed).

Thanks, and bye,
Andrea