2020-02-29 02:52:55

by Josh Triplett

Subject: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

After initialization, nvme_wait_ready checks for readiness every 100ms,
even though the drive may be ready far sooner than that. This delays
system boot by hundreds of milliseconds. Reduce the delay, checking for
readiness every millisecond instead.

Boot-time tests on an AWS c5.12xlarge:

Before:
[ 0.546936] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
...
[ 0.764178] nvme nvme0: 2/0/0 default/read/poll queues
[ 0.768424] nvme0n1: p1
[ 0.774132] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.774146] VFS: Mounted root (ext4 filesystem) on device 259:1.
...
[ 0.788141] Run /sbin/init as init process

After:
[ 0.537088] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
...
[ 0.543457] nvme nvme0: 2/0/0 default/read/poll queues
[ 0.548473] nvme0n1: p1
[ 0.554339] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.554344] VFS: Mounted root (ext4 filesystem) on device 259:1.
...
[ 0.567931] Run /sbin/init as init process

Signed-off-by: Josh Triplett <[email protected]>
---
drivers/nvme/host/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a4d8c90ee7cc..04174a40b9b0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2074,7 +2074,7 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u64 cap, bool enabled)
if ((csts & NVME_CSTS_RDY) == bit)
break;

- msleep(100);
+ usleep_range(1000, 2000);
if (fatal_signal_pending(current))
return -EINTR;
if (time_after(jiffies, timeout)) {
--
2.25.1


2020-03-01 02:03:58

by Chaitanya Kulkarni

Subject: Re: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

Nit: please have a look at the patch subject line and make
sure it does not exceed the required length.

One question, though: have you seen a similar kind of performance
improvement when the system is booted?

I took some numbers and couldn't see a similar benefit. See [1]:

Without:

714.532560-714.456099 = .076461
721.189886-721.110845 = .079041
727.836938-727.765572 = .071366
734.589886-734.519779 = .070107
741.244296-741.173503 = .070793

With this patch:

805.549656-805.461924 = .087732
812.199549-812.124364 = .075185
818.868111-818.793779 = .074332
825.636130-825.554311 = .081819
832.287043-832.205882 = .081161

Regards,
Chaitanya

[1] Detailed log:

Without this patch:

[ 714.456099] nvme_init 3133
[ 714.458501] nvme nvme0: pci function 0000:61:00.0
[ 714.532560] nvme nvme0: 32/0/0 default/read/poll queues
[ 721.110845] nvme_init 3133
[ 721.114112] nvme nvme0: pci function 0000:61:00.0
[ 721.189886] nvme nvme0: 32/0/0 default/read/poll queues
[ 727.765572] nvme_init 3133
[ 727.767814] nvme nvme0: pci function 0000:61:00.0
[ 727.836938] nvme nvme0: 32/0/0 default/read/poll queues
[ 734.519779] nvme_init 3133
[ 734.522099] nvme nvme0: pci function 0000:61:00.0
[ 734.589886] nvme nvme0: 32/0/0 default/read/poll queues
[ 741.173503] nvme_init 3133
[ 741.176089] nvme nvme0: pci function 0000:61:00.0
[ 741.244296] nvme nvme0: 32/0/0 default/read/poll queues

With this patch:

[ 805.461924] nvme_init 3133
[ 805.464521] nvme nvme0: pci function 0000:61:00.0
[ 805.549656] nvme nvme0: 32/0/0 default/read/poll queues
[ 812.124364] nvme_init 3133
[ 812.126975] nvme nvme0: pci function 0000:61:00.0
[ 812.199549] nvme nvme0: 32/0/0 default/read/poll queues
[ 818.793779] nvme_init 3133
[ 818.796581] nvme nvme0: pci function 0000:61:00.0
[ 818.868111] nvme nvme0: 32/0/0 default/read/poll queues
[ 825.554311] nvme_init 3133
[ 825.556551] nvme nvme0: pci function 0000:61:00.0
[ 825.636130] nvme nvme0: 32/0/0 default/read/poll queues
[ 832.205882] nvme_init 3133
[ 832.208934] nvme nvme0: pci function 0000:61:00.0
[ 832.287043] nvme nvme0: 32/0/0 default/read/poll queues


On 02/28/2020 06:52 PM, Josh Triplett wrote:
> After initialization, nvme_wait_ready checks for readiness every 100ms,
> even though the drive may be ready far sooner than that. This delays
> system boot by hundreds of milliseconds. Reduce the delay, checking for
> readiness every millisecond instead.
>
> Boot-time tests on an AWS c5.12xlarge:
>
> Before:
> [ 0.546936] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
> ...
> [ 0.764178] nvme nvme0: 2/0/0 default/read/poll queues
> [ 0.768424] nvme0n1: p1
> [ 0.774132] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
> [ 0.774146] VFS: Mounted root (ext4 filesystem) on device 259:1.
> ...
> [ 0.788141] Run /sbin/init as init process
>
> After:
> [ 0.537088] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
> ...
> [ 0.543457] nvme nvme0: 2/0/0 default/read/poll queues
> [ 0.548473] nvme0n1: p1
> [ 0.554339] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
> [ 0.554344] VFS: Mounted root (ext4 filesystem) on device 259:1.
> ...
> [ 0.567931] Run /sbin/init as init process
>
> Signed-off-by: Josh Triplett <[email protected]>
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a4d8c90ee7cc..04174a40b9b0 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2074,7 +2074,7 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u64 cap, bool enabled)
> if ((csts & NVME_CSTS_RDY) == bit)
> break;
>
> - msleep(100);
> + usleep_range(1000, 2000);
> if (fatal_signal_pending(current))
> return -EINTR;
> if (time_after(jiffies, timeout)) {
>

2020-03-01 09:03:20

by Josh Triplett

Subject: Re: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

On Sun, Mar 01, 2020 at 02:01:05AM +0000, Chaitanya Kulkarni wrote:
> Nit: please have a look at the patch subject line and make
> sure it does not exceed the required length.

Documentation/process/submitting-patches.rst says "no more than 70-75
characters", and the summary here is 61. Checkpatch similarly says 75.
Is there somewhere I missed that gives a different number?

> One question, though: have you seen a similar kind of performance
> improvement when the system is booted?

I tested with nvme compiled in, both with one NVMe device and two NVMe
devices, and in both cases it provided a *substantial* speedup. I didn't
test nvme compiled as a module, but in general I'd expect that if you're
trying to optimize initialization time you'd want to build it in.

> I took some numbers and couldn't see a similar benefit. See [1]:
>
> Without:
>
> 714.532560-714.456099 = .076461
> 721.189886-721.110845 = .079041
> 727.836938-727.765572 = .071366
> 734.589886-734.519779 = .070107
> 741.244296-741.173503 = .070793

With numbers in this range, I don't see how you could be hitting the
100ms msleep at all, even once: your deltas are all around 70-80ms, and a
single sleep would already push them past 100ms. So this patch shouldn't
have any effect on the timing you're measuring.
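
For reference, the loop in question, paraphrasing the surrounding code
in drivers/nvme/host/core.c (error handling elided): the readiness check
comes before the sleep, so a controller that is ready on the first CSTS
read never sleeps at all.

	unsigned long timeout =
		((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
	u32 bit = enabled ? NVME_CSTS_RDY : 0;
	u32 csts;

	while (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts) == 0) {
		if ((csts & NVME_CSTS_RDY) == bit)
			break;		/* ready: no sleep taken */
		msleep(100);		/* only reached when not yet ready */
		if (fatal_signal_pending(current))
			return -EINTR;
		if (time_after(jiffies, timeout))
			return -ENODEV;
	}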

- Josh Triplett

2020-03-01 18:33:29

by Keith Busch

Subject: Re: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

On Fri, Feb 28, 2020 at 06:52:28PM -0800, Josh Triplett wrote:
> @@ -2074,7 +2074,7 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u64 cap, bool enabled)
> if ((csts & NVME_CSTS_RDY) == bit)
> break;
>
> - msleep(100);
> + usleep_range(1000, 2000);
> if (fatal_signal_pending(current))
> return -EINTR;
> if (time_after(jiffies, timeout)) {

The key being this sleep schedules the task unlike udelay. It's neat
you can boot where 100ms is considered a long time.
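
For anyone following along, the contrast between the delay primitives
(illustrative, not from the patch):

	udelay(100);              /* busy-waits: the CPU spins, no reschedule     */
	msleep(100);              /* schedules: jiffies-based, sleeps >= 100ms    */
	usleep_range(1000, 2000); /* schedules: hrtimer-based, wakes within 1-2ms */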

This clearly helps when you have one NVMe device that becomes ready
quickly, but what happens with many NVMe devices that are slow to become
ready? This change will end up polling the status of those thousands of
times. I wonder if there's a point where this frequent sleep/wake cycle,
initializing a whole lot of NVMe devices in parallel, may interfere with
other init tasks.
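
Rough numbers: CAP.TO gives a controller up to 127.5 seconds to become
ready, so a device that takes, say, 10 seconds to ready would now be
polled on the order of 5,000-10,000 times (one CSTS read per 1-2ms
wakeup) instead of ~100 times at the old 100ms interval.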

I doubt there's really an issue there, but I thought it was worth
considering what happens at the other end of the spectrum.

Anyway, the patch looks fine to me.

Reviewed-by: Keith Busch <[email protected]>

2020-03-01 19:16:24

by Josh Triplett

Subject: Re: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

On Sun, Mar 01, 2020 at 10:32:31AM -0800, Keith Busch wrote:
> On Fri, Feb 28, 2020 at 06:52:28PM -0800, Josh Triplett wrote:
> > @@ -2074,7 +2074,7 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u64 cap, bool enabled)
> > if ((csts & NVME_CSTS_RDY) == bit)
> > break;
> >
> > - msleep(100);
> > + usleep_range(1000, 2000);
> > if (fatal_signal_pending(current))
> > return -EINTR;
> > if (time_after(jiffies, timeout)) {
>
> The key being this sleep schedules the task unlike udelay.

Right; I don't think it's reasonable to busyloop here, just sleep for
less time.

> It's neat you can boot where 100ms is considered a long time.

It's been fun. This was one of the longest single delays in a ~1s boot.

> This clearly helps when you have one NVMe device that becomes ready
> quickly, but what happens with many NVMe devices that are slow to become
> ready? This change will end up polling the status of those thousands of
> times. I wonder if there's a point where this frequent sleep/wake cycle,
> initializing a whole lot of NVMe devices in parallel, may interfere with
> other init tasks.

usleep_range allows the kernel to consolidate those wakeups, so if you
have multiple NVMe devices, the kernel should in theory just wake up
once, check them all for readiness, and go back to sleep.
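
That consolidation comes from hrtimer slack; usleep_range() is roughly
the following (paraphrasing kernel/time/timer.c):

	void usleep_range(unsigned long min, unsigned long max)
	{
		ktime_t exp = ktime_add_us(ktime_get(), min);
		u64 delta = (u64)(max - min) * NSEC_PER_USEC;

		for (;;) {
			__set_current_state(TASK_UNINTERRUPTIBLE);
			/* the slack window lets nearby wakeups coalesce */
			if (!schedule_hrtimeout_range(&exp, delta,
						      HRTIMER_MODE_ABS))
				break;
		}
	}

The (max - min) window is the slack the timer code can use to batch
expirations from multiple waiters into a single wakeup point.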

> I doubt there's really an issue there, but I thought it was worth
> considering what happens at the other end of the spectrum.
>
> Anyway, the patch looks fine to me.
>
> Reviewed-by: Keith Busch <[email protected]>

Thank you!

Does this seem reasonable to enqueue for 5.7?

- Josh Triplett

2020-03-01 19:54:44

by Chaitanya Kulkarni

Subject: Re: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

Keith,

On 03/01/2020 10:32 AM, Keith Busch wrote:
> The key being this sleep schedules the task unlike udelay. It's neat
> you can boot where 100ms is considered a long time.
>
> This clearly helps when you have one NVMe device that becomes ready
> quickly, but what happens with many NVMe devices that are slow to become
> ready? This change will end up polling the status of those thousands of
> times. I wonder if there's a point where this frequent sleep/wake cycle,
> initializing a whole lot of NVMe devices in parallel, may interfere with
> other init tasks.

This is what I didn't understand: although the patch does improve
performance for one controller in one situation (and after testing I
didn't see big issues with the controllers that I have), I didn't
understand how it will behave in the situation you have mentioned. But
anyway, looks good.

Reviewed-by: Chaitanya Kulkarni <[email protected]>

2020-03-02 14:53:38

by Keith Busch

Subject: Re: [PATCH] nvme: Check for readiness more quickly, to speed up boot time

On Sun, Mar 01, 2020 at 11:15:01AM -0800, Josh Triplett wrote:
> On Sun, Mar 01, 2020 at 10:32:31AM -0800, Keith Busch wrote:
> > I doubt there's really an issue there, but I thought it was worth
> > considering what happens at the other end of the spectrum.
> >
> > Anyway, the patch looks fine to me.
> >
> > Reviewed-by: Keith Busch <[email protected]>
>
> Thank you!
>
> Does this seem reasonable to enqueue for 5.7?

Yes, early enough for 5.7. Let's just give this a few more days to see if
the nvme fabrics developers have any comments. Reading controller status
for those transports is more complicated than it is for PCI. I don't see
an issue there either, but since this is common code, we should see if
anyone else wants to weigh in.