2023-11-24 20:58:20

by Mark O'Donovan

Subject: [PATCH RESEND] nvme: fine-tune sending of first keep-alive

Keep-alive commands are sent halfway through the kato period.
This normally works well, but fails when the keep-alive mechanism is
started more than halfway through the kato period.
This can happen on larger setups or due to host delays.
With this change, the initial keep-alive command is timed from
the controller initialisation time rather than from the keep-alive
mechanism activation time.

Signed-off-by: Mark O'Donovan <[email protected]>
---
drivers/nvme/host/core.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 46a4c9c5ea96..8bf24c1cd8bb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1192,8 +1192,16 @@ static unsigned long nvme_keep_alive_work_period(struct nvme_ctrl *ctrl)

 static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
 {
-	queue_delayed_work(nvme_wq, &ctrl->ka_work,
-			   nvme_keep_alive_work_period(ctrl));
+	unsigned long now = jiffies;
+	unsigned long delay = nvme_keep_alive_work_period(ctrl);
+	unsigned long ka_next_check_tm = ctrl->ka_last_check_time + delay;
+
+	if (time_after(now, ka_next_check_tm))
+		delay = 0;
+	else
+		delay = ka_next_check_tm - now;
+
+	queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
 }

 static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
@@ -4471,6 +4479,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	INIT_DELAYED_WORK(&ctrl->failfast_work, nvme_failfast_work);
 	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
 	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
+	ctrl->ka_last_check_time = jiffies;

 	BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) >
 			PAGE_SIZE);

base-commit: 5b7ad877e4d81f8904ce83982b1ba5c6e83deccb
--
2.39.2


2023-11-27 18:28:38

by Keith Busch

Subject: Re: [PATCH RESEND] nvme: fine-tune sending of first keep-alive

On Fri, Nov 24, 2023 at 08:56:59PM +0000, Mark O'Donovan wrote:
> Keep-alive commands are sent halfway through the kato period.
> This normally works well, but fails when the keep-alive mechanism is
> started more than halfway through the kato period.
> This can happen on larger setups or due to host delays.
> With this change, the initial keep-alive command is timed from
> the controller initialisation time rather than from the keep-alive
> mechanism activation time.

Thanks, applied to nvme-6.7.
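
For readers following the timing logic, here is a minimal userspace sketch of the delay calculation the patch introduces. It is not the kernel code. The names sketch_time_after() and sketch_ka_delay() are hypothetical, plain unsigned long values stand in for jiffies, and the comparison is assumed to behave like the kernel's wraparound-safe time_after().

```c
#include <limits.h>

/*
 * Userspace stand-in for the kernel's time_after(a, b): true when a is
 * later than b. The signed difference keeps the comparison correct across
 * counter wraparound, provided the two values are less than half the
 * counter range apart.
 */
static int sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical helper mirroring the patched nvme_queue_keep_alive_work():
 * returns how many ticks to wait before the next keep-alive.
 *
 * now        - current tick count (jiffies in the kernel)
 * last_check - tick count recorded at controller initialisation
 * period     - keep-alive work period in ticks
 */
static unsigned long sketch_ka_delay(unsigned long now,
				     unsigned long last_check,
				     unsigned long period)
{
	unsigned long next_check = last_check + period;

	/* Already past the scheduled time: send without further delay. */
	if (sketch_time_after(now, next_check))
		return 0;

	/* Otherwise wait only for what remains of the period. */
	return next_check - now;
}
```

With a period of 500 ticks, a keep-alive mechanism started 200 ticks after initialisation waits 300 more ticks, and one started 600 ticks after initialisation queues the work immediately. Before the patch, both cases would have waited a full 500 ticks from activation.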