Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754168AbdCBHJ4 (ORCPT ); Thu, 2 Mar 2017 02:09:56 -0500 Received: from mail.linux-iscsi.org ([67.23.28.174]:60178 "EHLO linux-iscsi.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750994AbdCBHJv (ORCPT ); Thu, 2 Mar 2017 02:09:51 -0500 Message-ID: <1488438545.21712.73.camel@haakon3.risingtidesystems.com> Subject: Re: [PATCH] target: Fix NULL dereference during LUN lookup + active I/O shutdown From: "Nicholas A. Bellinger" To: "Bryant G. Ly" Cc: target-devel , linux-scsi , lkml , Rob Millner , Vaibhav Tandon Date: Wed, 01 Mar 2017 23:09:05 -0800 In-Reply-To: <1fd6d4ca-c010-f4c1-5586-127a6595ac53@linux.vnet.ibm.com> References: <1487832999-3390-1-git-send-email-nab@linux-iscsi.org> <1fd6d4ca-c010-f4c1-5586-127a6595ac53@linux.vnet.ibm.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.4.4-1 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2847 Lines: 79 On Thu, 2017-02-23 at 11:46 -0600, Bryant G. Ly wrote: > > From: Nicholas Bellinger > > > > When transport_clear_lun_ref() is shutting down a se_lun via > > configfs with new I/O in-flight, it's possible to trigger a > > NULL pointer dereference in transport_lookup_cmd_lun() due > > to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD > > checking before incrementing lun->lun_ref.count after > > lun->lun_ref has switched to atomic_t mode. > > > > This results in a NULL pointer dereference as LUN shutdown > > code in core_tpg_remove_lun() continues running after the > > existing ->release() -> core_tpg_lun_ref_release() callback > > completes, and clears the RCU protected se_lun->lun_se_dev > > pointer. > > > > During the OOPs, the state of lun->lun_ref in the process > > which triggered the NULL pointer dereference looks like > > the following on v4.1.y stable code: > > > > struct se_lun { > > lun_link_magic = 4294932337, > > lun_status = TRANSPORT_LUN_STATUS_FREE, > > > > ..... > > > > lun_se_dev = 0x0, > > lun_sep = 0x0, > > > > ..... > > > > lun_ref = { > > count = { > > counter = 1 > > }, > > percpu_count_ptr = 3, > > release = 0xffffffffa02fa1e0 , > > confirm_switch = 0x0, > > force_atomic = false, > > rcu = { > > next = 0xffff88154fa1a5d0, > > func = 0xffffffff8137c4c0 > > } > > } > > } > > > > To address this bug, use percpu_ref_tryget_live() to ensure > > once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref > > has switched to atomic_t, all new I/Os will fail to obtain > > a new lun->lun_ref reference. > > > > Also use an explicit percpu_ref_kill_and_confirm() callback > > to block on ->lun_ref_comp to allow the first stage and > > associated RCU grace period to complete, and then block on > > ->lun_ref_shutdown waiting for the final percpu_ref_put() > > to drop the last reference via transport_lun_remove_cmd() > > before continuing with core_tpg_remove_lun() shutdown. > > > > Reported-by: Rob Millner > > Tested-by: Rob Millner > > Cc: Rob Millner > > Tested-by: Vaibhav Tandon > > Cc: Vaibhav Tandon > > Signed-off-by: Nicholas Bellinger > > --- > > drivers/target/target_core_device.c | 10 ++++++++-- > > drivers/target/target_core_tpg.c | 3 ++- > > drivers/target/target_core_transport.c | 31 ++++++++++++++++++++++++++++++- > > include/target/target_core_base.h | 1 + > > 4 files changed, 41 insertions(+), 4 deletions(-) > > > I have seen this and have tested this with our custom kernel. > > So this looks good from me! > Added your Tested-by to the patch. Thanks Bryant.