Subject: Re: [PATCH] nbd: fix potential NULL pointer fault in connect and disconnect process
To: Mike Christie
CC: Xiubo Li
References: <20200117115005.37006-1-sunke32@huawei.com> <5E21EF96.1010204@redhat.com>
In-Reply-To: <5E21EF96.1010204@redhat.com>
From: "sunke (E)"
Date: Sun, 19 Jan 2020 15:10:43 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Thanks for your detailed suggestions.

On 2020/1/18 1:32, Mike Christie wrote:
> On 01/17/2020 05:50 AM, Sun Ke wrote:
>> Connect and disconnect a nbd device repeatedly, will cause
>> NULL pointer fault.
>>
>> It will appear by the steps:
>> 1. Connect the nbd device and disconnect it, but now nbd device
>>    is not disconnected totally.
>> 2. Connect the same nbd device again immediately, it will fail
>>    in nbd_start_device with a EBUSY return value.
>> 3. Wait a second to make sure the last config_refs is reduced
>>    and run nbd_config_put to disconnect the nbd device totally.
>> 4. Start another process to open the nbd_device, config_refs
>>    will increase and at the same time disconnect it.
>
> Just to make sure I understood this, for step 4 the process is doing:
>
> open(/dev/nbdX);
> ioctl(NBD_DISCONNECT, /dev/nbdX) or nbd_genl_disconnect(for /dev/nbdX)
>
> ?
>

Step 4 does nbd_genl_disconnect(for /dev/nbdX).

I tested it. I connected /dev/nbdX through the ioctl interface with

    nbd-client -L -N export localhost /dev/nbdX

and through the netlink interface with

    nbd-client localhost XXXX /dev/nbdX

and disconnected /dev/nbdX with

    nbd-client -d /dev/nbdX

In both cases the disconnect calls nbd_genl_disconnect(for /dev/nbdX),
and both hit the same NULL pointer dereference.

> There is no successful NBD_DO_IT / nbd_genl_connect between the open and
> disconnect calls at step #4, because it would normally be done at #2 and
> that failed. nbd_disconnect_and_put could then reference a null
> recv_workq. If we are also racing with a close() then that could free
> the device/config from under nbd_disconnect_and_put.
>

Yes, nbd_disconnect_and_put could then reference a null recv_workq.

>>
>> To fix it, add a NBD_HAS_STARTED flag. Set it in nbd_start_device_ioctl
>
> I'm not sure if we need the new bit. We could just add a check for a non
> null task_recv in nbd_genl_disconnect like how nbd_start_device and
> nbd_genl_disconnect do.
>

I am also not sure which is better, because in nbd_config_put it is not
only recv_workq that becomes NULL; nbd->task_recv and nbd->config do too.
So I suspect that if step 4 did something else, it could also
dereference a NULL pointer.

> The new bit might be more clear which is nice. If we got this route,
> should the new bit be a runtime_flag like other device state bits?
>
>

Yes, I see that. Just adding a check for a non-NULL task_recv in
nbd_genl_disconnect is better, right?

>> and nbd_genl_connect if nbd device is started successfully.
>> Clear it in nbd_config_put. Test it in nbd_genl_disconnect and
>> nbd_genl_reconfigure.
>>
>> Signed-off-by: Sun Ke
>> ---
>>  drivers/block/nbd.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index b4607dd96185..ddd364e208ab 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -83,6 +83,7 @@ struct link_dead_args {
>>
>>  #define NBD_DESTROY_ON_DISCONNECT	0
>>  #define NBD_DISCONNECT_REQUESTED	1
>> +#define NBD_HAS_STARTED		2
>>
>>  struct nbd_config {
>>  	u32 flags;
>> @@ -1215,6 +1216,7 @@ static void nbd_config_put(struct nbd_device *nbd)
>>  		nbd->disk->queue->limits.discard_alignment = 0;
>>  		blk_queue_max_discard_sectors(nbd->disk->queue, UINT_MAX);
>>  		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, nbd->disk->queue);
>> +		clear_bit(NBD_HAS_STARTED, &nbd->flags);
>>
>>  		mutex_unlock(&nbd->config_lock);
>>  		nbd_put(nbd);
>> @@ -1290,6 +1292,8 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>>  	ret = nbd_start_device(nbd);
>>  	if (ret)
>>  		return ret;
>> +	else
>> +		set_bit(NBD_HAS_STARTED, &nbd->flags);
>>
>>  	if (max_part)
>>  		bdev->bd_invalidated = 1;
>> @@ -1961,6 +1965,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
>>  	mutex_unlock(&nbd->config_lock);
>>  	if (!ret) {
>>  		set_bit(NBD_RT_HAS_CONFIG_REF, &config->runtime_flags);
>> +		set_bit(NBD_HAS_STARTED, &nbd->flags);
>>  		refcount_inc(&nbd->config_refs);
>>  		nbd_connect_reply(info, nbd->index);
>>  	}
>> @@ -2008,6 +2013,14 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
>>  			index);
>>  		return -EINVAL;
>>  	}
>> +
>> +	if (!test_bit(NBD_HAS_STARTED, &nbd->flags)) {
>> +		mutex_unlock(&nbd_index_mutex);
>> +		printk(KERN_ERR "nbd: device at index %d failed to start\n",
>> +		       index);
>> +		return -EBUSY;
>> +	}
>> +
>>  	if (!refcount_inc_not_zero(&nbd->refs)) {
>>  		mutex_unlock(&nbd_index_mutex);
>>  		printk(KERN_ERR "nbd: device at index %d is going down\n",
>> @@ -2049,6 +2062,14 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
>>  			index);
>>  		return -EINVAL;
>>  	}
>> +
>> +	if (!test_bit(NBD_HAS_STARTED, &nbd->flags)) {
>> +		mutex_unlock(&nbd_index_mutex);
>> +		printk(KERN_ERR "nbd: device at index %d failed to start\n",
>> +		       index);
>> +		return -EBUSY;
>> +	}
>> +
>>  	if (!refcount_inc_not_zero(&nbd->refs)) {
>>  		mutex_unlock(&nbd_index_mutex);
>>  		printk(KERN_ERR "nbd: device at index %d is going down\n",
>>

I thought the change in nbd_genl_reconfigure was necessary although my
test does not call it, but now I think it is superfluous:
nbd_genl_reconfigure already checks for a non-NULL task_recv.

Thanks,
Ke
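
For reference, the alternative discussed in this thread (checking for a
non-NULL task_recv in the disconnect path instead of adding a new flag)
can be modelled in userspace roughly as below. This is only a sketch and
not kernel code: struct nbd_model and the model_* helpers are
hypothetical names, only the two pointers mentioned above are modelled,
and the -EINVAL return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct nbd_device: only the two
 * pointers discussed in this thread are modelled. */
struct nbd_model {
	void *task_recv;	/* non-NULL only after a successful start */
	void *recv_workq;	/* set up by a successful start */
};

/* Model of a successful start: both pointers become valid. */
static void model_start(struct nbd_model *nbd, void *task, void *wq)
{
	nbd->task_recv = task;
	nbd->recv_workq = wq;
}

/* Model of the final nbd_config_put: both pointers are cleared. */
static void model_config_put(struct nbd_model *nbd)
{
	nbd->task_recv = NULL;
	nbd->recv_workq = NULL;
}

/* Model of the disconnect path with the guard: bail out early when the
 * device never started, so recv_workq is never used while NULL. */
static int model_disconnect(struct nbd_model *nbd)
{
	if (!nbd->task_recv)
		return -EINVAL;	/* assumed error code */

	/* Stands in for the use of recv_workq in nbd_disconnect_and_put. */
	assert(nbd->recv_workq != NULL);

	model_config_put(nbd);
	return 0;
}
```

With this guard, the step 4 sequence (disconnect on a device whose start
failed at step 2) returns an error instead of reaching the recv_workq
use. The model ignores locking and the close() race mentioned above.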