From: Daniel Wagner
To: linux-nvme@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, Daniel Wagner, Christoph Hellwig,
    Sagi Grimberg, Johannes Thumshirn
Subject: [RFC] nvmet: Always remove processed AER elements from list
Date: Wed, 30 Oct 2019 16:24:18 +0100
Message-Id: <20191030152418.23753-1-dwagner@suse.de>

All async events are enqueued via nvmet_add_async_event(), which
updates the ctrl->async_event_cmds[] array and additionally adds a
struct nvmet_async_event to the ctrl->async_events list. Under normal
operation, nvmet_async_event_work() updates the
ctrl->async_event_cmds[] array again and removes the corresponding
struct nvmet_async_event from the list.
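To illustrate the two bookkeeping structures, here is a minimal sketch
of the enqueue side, simplified from nvmet_add_async_event() in
drivers/nvme/target/core.c (written from memory, so treat the details
as approximate):

void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
		u8 event_info, u8 log_page)
{
	struct nvmet_async_event *aen;

	aen = kmalloc(sizeof(*aen), GFP_KERNEL);
	if (!aen)
		return;

	aen->event_type = event_type;
	aen->event_info = event_info;
	aen->log_page = log_page;

	mutex_lock(&ctrl->lock);
	/* Bookkeeping side #1: the pending-event list. */
	list_add_tail(&aen->entry, &ctrl->async_events);
	mutex_unlock(&ctrl->lock);

	/*
	 * Bookkeeping side #2, ctrl->async_event_cmds[] (filled when the
	 * host posts AER commands), is consumed together with this list
	 * by nvmet_async_event_work(); both must stay in sync.
	 */
	schedule_work(&ctrl->async_event_work);
}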
However, nvmet_sq_destroy() may be called, which calls
nvmet_async_events_free(), and that function only updates the
ctrl->async_event_cmds[] array; the ctrl->async_events list is left
untouched.

Add a new function nvmet_async_events_process() which processes the
async events and updates both the array and the list. With this we
avoid having two places where the array and the list are modified.

Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: Johannes Thumshirn
Signed-off-by: Daniel Wagner
---
Hi,

I've got the following oops:

PID: 79413  TASK: ffff92f03a814ec0  CPU: 19  COMMAND: "kworker/19:2"
 #0 [ffffa5308b8c3c58] machine_kexec at ffffffff8e05dd02
 #1 [ffffa5308b8c3ca8] __crash_kexec at ffffffff8e12102a
 #2 [ffffa5308b8c3d68] crash_kexec at ffffffff8e122019
 #3 [ffffa5308b8c3d80] oops_end at ffffffff8e02e091
 #4 [ffffa5308b8c3da0] general_protection at ffffffff8e8015c5
    [exception RIP: nvmet_async_event_work+94]
    RIP: ffffffffc0d9a80e  RSP: ffffa5308b8c3e58  RFLAGS: 00010202
    RAX: dead000000000100  RBX: ffff92dcbc7464b0  RCX: 0000000000000002
    RDX: 0000000000040002  RSI: 38ffff92dc9814cf  RDI: ffff92f217722f20
    RBP: ffff92dcbc746418  R8:  0000000000000000  R9:  0000000000000000
    R10: 000000000000035b  R11: ffff92efb8dd2091  R12: ffff92dcbc7464a0
    R13: ffff92dbe03a5f29  R14: 0000000000000000  R15: 0ffff92f92f26864
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa5308b8c3e78] process_one_work at ffffffff8e0a3b0c
 #6 [ffffa5308b8c3eb8] worker_thread at ffffffff8e0a41e7
 #7 [ffffa5308b8c3f10] kthread at ffffffff8e0a93af
 #8 [ffffa5308b8c3f50] ret_from_fork at ffffffff8e800235

The faulting instruction maps to nvmet_async_event_result(), so it
looks like this function accesses a stale aen pointer (note that RAX
holds dead000000000100, which is LIST_POISON1 on x86_64, hinting at a
dereference of an entry that has already been list_del()'ed):

static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
{
	return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
}

The test which is executed removes one of the nodes during operation.
The only odd thing I could figure out so far is that the list is not
updated in sync with the array in all cases. I suppose this could be
the source of the crash, but I have not verified it yet. Johannes
thought it makes sense to post this as a question.
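For reference, here is the current nvmet_async_events_free() (the code
removed by the diff below), annotated to highlight the asymmetry; the
comments are mine and describe the suspected gap, not verified
behaviour:

static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
{
	struct nvmet_req *req;

	while (1) {
		mutex_lock(&ctrl->lock);
		if (!ctrl->nr_async_event_cmds) {
			mutex_unlock(&ctrl->lock);
			return;
		}

		/* Pops the outstanding AER command from the array... */
		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
		mutex_unlock(&ctrl->lock);
		nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
		/*
		 * ...but the matching struct nvmet_async_event stays on
		 * ctrl->async_events: there is no list_del()/kfree() here,
		 * unlike in nvmet_async_event_work(). Whatever walks the
		 * list afterwards can see a stale entry.
		 */
	}
}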
Thanks,
Daniel

 drivers/nvme/target/core.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 3a67e244e568..9d9efd585847 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -129,27 +129,8 @@ static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
 	return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
 }
 
-static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
+static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
 {
-	struct nvmet_req *req;
-
-	while (1) {
-		mutex_lock(&ctrl->lock);
-		if (!ctrl->nr_async_event_cmds) {
-			mutex_unlock(&ctrl->lock);
-			return;
-		}
-
-		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
-		mutex_unlock(&ctrl->lock);
-		nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
-	}
-}
-
-static void nvmet_async_event_work(struct work_struct *work)
-{
-	struct nvmet_ctrl *ctrl =
-		container_of(work, struct nvmet_ctrl, async_event_work);
 	struct nvmet_async_event *aen;
 	struct nvmet_req *req;
 
@@ -163,16 +144,30 @@ static void nvmet_async_event_work(struct work_struct *work)
 		}
 
 		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
-		nvmet_set_result(req, nvmet_async_event_result(aen));
+		if (status == 0)
+			nvmet_set_result(req, nvmet_async_event_result(aen));
 
 		list_del(&aen->entry);
 		kfree(aen);
 
 		mutex_unlock(&ctrl->lock);
-		nvmet_req_complete(req, 0);
+		nvmet_req_complete(req, status);
 	}
 }
 
+static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
+{
+	nvmet_async_events_process(ctrl, NVME_SC_INTERNAL | NVME_SC_DNR);
+}
+
+static void nvmet_async_event_work(struct work_struct *work)
+{
+	struct nvmet_ctrl *ctrl =
+		container_of(work, struct nvmet_ctrl, async_event_work);
+
+	nvmet_async_events_process(ctrl, 0);
+}
+
 void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 		u8 event_info, u8 log_page)
 {
-- 
2.16.4
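P.S.: for readability, this is roughly how the consolidated function
reads with the patch applied. The loop head between the two hunks is
unchanged context that is not visible in the diff, so it is
reconstructed here from the upstream code and should be treated as
approximate:

static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
{
	struct nvmet_async_event *aen;
	struct nvmet_req *req;

	while (1) {
		mutex_lock(&ctrl->lock);
		aen = list_first_entry_or_null(&ctrl->async_events,
				struct nvmet_async_event, entry);
		if (!aen || !ctrl->nr_async_event_cmds) {
			mutex_unlock(&ctrl->lock);
			return;
		}

		/* Array and list are now always updated together... */
		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
		if (status == 0)
			nvmet_set_result(req, nvmet_async_event_result(aen));

		list_del(&aen->entry);
		kfree(aen);

		mutex_unlock(&ctrl->lock);
		/*
		 * ...and both callers complete the request: the worker
		 * with status 0, the free path with
		 * NVME_SC_INTERNAL | NVME_SC_DNR.
		 */
		nvmet_req_complete(req, status);
	}
}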