Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp4250443rwr; Sun, 23 Apr 2023 01:24:06 -0700 (PDT) X-Google-Smtp-Source: AKy350ayh1rO4FDrKqmRwG2Vps1OolAVd3OEsKNr0nwmR2vDdtJMl4PgOc7xRgAtGZbbvVecx4ql X-Received: by 2002:a05:6a00:230a:b0:63d:2d6a:47be with SMTP id h10-20020a056a00230a00b0063d2d6a47bemr12380655pfh.2.1682238245957; Sun, 23 Apr 2023 01:24:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682238245; cv=none; d=google.com; s=arc-20160816; b=u206RHgmtf6PUMy7tdKrnugo2Nm7zgM0D/EqhIrdSD4+CV2gLemoYSyOOrm3JdbtjM MLfxQzw5LtOAvPh98MUiZVgct21h+H+WYOkIWH7Rn6oW8yrW9mKv7qWqF2iKAFOGRBI/ iwFAmmCVfSI7cYGzSWPVNb5316hhDPguCDABCghHL+1WZNLTLDAX3LjLEBqN12awb2+o 9jQu1uxsqEvxamXx9SPsRgvlD8QsXe2jaLc0gp+GFn2HakBU45yhAojoNjhnCkt/lEYm Fa9Yl8iGpJwedAwl0bmv9yEDyna6fjvZCdNd9kGNnFtbRHyUUdR23LLPihchdS+/d4QA o8iA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=UVUzZBmohM3mT9Djpzsnf1V/TbkK2lxvlA0qOc5D3+Q=; b=eZMQis1w4f4V7n/vU9KMwuPRs6aInaZQz84/rGvYg7ZUXsZH7ux7WYQaJGW2x/07mY Iy1z7fia/0WIDn6J4Fshv4xWmyiunfriRQwWzpR+SHBqYbzqDh9lQ9RKrFyVKwLaxSl2 Mv6w25AuA1Uyo9CzsY19PSN+FuA2KIb+9Kb26N6J7Eld7HqxHe7x7AP3eLJ6+sBp4vth PmKsZ9HNaGkva8Gp3Uln/lrjI1H7UdAP2Jut8KtOF0QgD0+xHYPLVCit8AD3jJ6e8gwk VBY5oBHf1bNBiu41mR99VcBnQDFMZoZRIPNrbsiEtelvzAWq4w0bO3W5bDgpaXRIwDoo ofJg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k12-20020aa79d0c000000b0063b8a37e42bsi8397000pfp.123.2023.04.23.01.23.54; Sun, 23 Apr 2023 01:24:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229547AbjDWIQM (ORCPT + 99 others); Sun, 23 Apr 2023 04:16:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229511AbjDWIQL (ORCPT ); Sun, 23 Apr 2023 04:16:11 -0400 Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E56651986; Sun, 23 Apr 2023 01:16:05 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4Q41L817nHz4f3khR; Sun, 23 Apr 2023 16:16:00 +0800 (CST) Received: from [10.174.176.73] (unknown [10.174.176.73]) by APP4 (Coremail) with SMTP id gCh0CgD3X7NA6URkr9jnHw--.47528S3; Sun, 23 Apr 2023 16:16:01 +0800 (CST) Subject: Re: [PATCH for-6.4/block] block/rq_qos: protect rq_qos apis with a new lock To: Yu Kuai , tj@kernel.org, hch@lst.de, josef@toxicpanda.com, axboe@kernel.dk Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, "yukuai (C)" References: <20230414084008.2085155-1-yukuai1@huaweicloud.com> From: Yu Kuai Message-ID: Date: Sun, 23 Apr 2023 16:15:59 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <20230414084008.2085155-1-yukuai1@huaweicloud.com> Content-Type: text/plain; charset=gbk; format=flowed Content-Transfer-Encoding: 8bit X-CM-TRANSID: gCh0CgD3X7NA6URkr9jnHw--.47528S3 X-Coremail-Antispam: 1UD129KBjvJXoW3Wry7JF1DtFyfJFyDKFy8Xwb_yoWxCr17pa y8KF43A392gr4Dua1DGw4xXwsIgws5KrW8CrWfW34ayrZF9r10vF1kAFyUWFWrArsxZF4k XrW8WrsYkr1UCrJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUU9F14x267AKxVW8JVW5JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK02 1l84ACjcxK6xIIjxv20xvE14v26F1j6w1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4U JVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Gc CE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E 2Ix0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJV W8JwACjcxG0xvEwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2Y2ka 0xkIwI1lc7I2V7IY0VAS07AlzVAYIcxG8wCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7x kEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E 67AF67kF1VAFwI0_Jw0_GFylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCw CI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6rW3Jr0E 3s1lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcS sGvfC2KfnxnUUI43ZEXa7VUbXdbUUUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,KHOP_HELO_FCRDNS, MAY_BE_FORGED,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, ?? 2023/04/14 16:40, Yu Kuai ะด??: > From: Yu Kuai > > commit 50e34d78815e ("block: disable the elevator int del_gendisk") > move rq_qos_exit() from disk_release() to del_gendisk(), this will > introduce some problems: > > 1) If rq_qos_add() is triggered by enabling iocost/iolatency through > cgroupfs, then it can concurrent with del_gendisk(), it's not safe to > write 'q->rq_qos' concurrently. > > 2) Activate cgroup policy that is relied on rq_qos will call > rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is > called in the middle, null-ptr-dereference will be triggered in > blkcg_activate_policy(). > > 3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the > disk, then if rq_qos_exit() from del_gendisk() is done before > rq_qos_add(), then memory will be leaked. > > This patch add a new disk level mutex 'rq_qos_mutex': > > 1) The lock will protect rq_qos_exit() directly. > > 2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be > called from disk initialization for now because wbt can't be > destructed until rq_qos_exit(), so it's safe not to protect wbt for > now. Hoever, in case that rq_qos dynamically destruction is supported > in the furture, this patch also protect rq_qos_add() from wbt_init() > directly, this is enough because blk-sysfs already synchronize > writers with disk removal. > > 3) For iocost and iolatency, in order to synchronize disk removal and > cgroup configuration, the lock is held after blkdev_get_no_open() > from blkg_conf_open_bdev(), and is released in blkg_conf_exit(). > In order to fix the above memory leak, disk_live() is checked after > holding the new lock. > Friendly ping ... Thanks, Kuai > Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") > Signed-off-by: Yu Kuai > --- > block/blk-cgroup.c | 9 +++++++++ > block/blk-core.c | 1 + > block/blk-rq-qos.c | 20 ++++++-------------- > block/blk-wbt.c | 2 ++ > include/linux/blkdev.h | 1 + > 5 files changed, 19 insertions(+), 14 deletions(-) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index 1c1ebeb51003..0d79d864ecb1 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -705,6 +705,13 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx) > return -ENODEV; > } > > + mutex_lock(&bdev->bd_queue->rq_qos_mutex); > + if (!disk_live(bdev->bd_disk)) { > + blkdev_put_no_open(bdev); > + mutex_unlock(&bdev->bd_queue->rq_qos_mutex); > + return -ENODEV; > + } > + > ctx->body = input; > ctx->bdev = bdev; > return 0; > @@ -849,6 +856,7 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep); > */ > void blkg_conf_exit(struct blkg_conf_ctx *ctx) > __releases(&ctx->bdev->bd_queue->queue_lock) > + __releases(&ctx->bdev->bd_queue->rq_qos_mutex) > { > if (ctx->blkg) { > spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock); > @@ -856,6 +864,7 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx) > } > > if (ctx->bdev) { > + mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex); > blkdev_put_no_open(ctx->bdev); > ctx->body = NULL; > ctx->bdev = NULL; > diff --git a/block/blk-core.c b/block/blk-core.c > index 269765d16cfd..fc7f902bdf5b 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -420,6 +420,7 @@ struct request_queue *blk_alloc_queue(int node_id) > mutex_init(&q->debugfs_mutex); > mutex_init(&q->sysfs_lock); > mutex_init(&q->sysfs_dir_lock); > + mutex_init(&q->rq_qos_mutex); > spin_lock_init(&q->queue_lock); > > init_waitqueue_head(&q->mq_freeze_wq); > diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c > index d8cc820a365e..167be74df4ee 100644 > --- a/block/blk-rq-qos.c > +++ b/block/blk-rq-qos.c > @@ -288,11 +288,13 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data, > > void rq_qos_exit(struct request_queue *q) > { > + mutex_lock(&q->rq_qos_mutex); > while (q->rq_qos) { > struct rq_qos *rqos = q->rq_qos; > q->rq_qos = rqos->next; > rqos->ops->exit(rqos); > } > + mutex_unlock(&q->rq_qos_mutex); > } > > int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id, > @@ -300,6 +302,8 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id, > { > struct request_queue *q = disk->queue; > > + lockdep_assert_held(&q->rq_qos_mutex); > + > rqos->disk = disk; > rqos->id = id; > rqos->ops = ops; > @@ -307,18 +311,13 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id, > /* > * No IO can be in-flight when adding rqos, so freeze queue, which > * is fine since we only support rq_qos for blk-mq queue. > - * > - * Reuse ->queue_lock for protecting against other concurrent > - * rq_qos adding/deleting > */ > blk_mq_freeze_queue(q); > > - spin_lock_irq(&q->queue_lock); > if (rq_qos_id(q, rqos->id)) > goto ebusy; > rqos->next = q->rq_qos; > q->rq_qos = rqos; > - spin_unlock_irq(&q->queue_lock); > > blk_mq_unfreeze_queue(q); > > @@ -330,7 +329,6 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id, > > return 0; > ebusy: > - spin_unlock_irq(&q->queue_lock); > blk_mq_unfreeze_queue(q); > return -EBUSY; > } > @@ -340,21 +338,15 @@ void rq_qos_del(struct rq_qos *rqos) > struct request_queue *q = rqos->disk->queue; > struct rq_qos **cur; > > - /* > - * See comment in rq_qos_add() about freezing queue & using > - * ->queue_lock. > - */ > - blk_mq_freeze_queue(q); > + lockdep_assert_held(&q->rq_qos_mutex); > > - spin_lock_irq(&q->queue_lock); > + blk_mq_freeze_queue(q); > for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) { > if (*cur == rqos) { > *cur = rqos->next; > break; > } > } > - spin_unlock_irq(&q->queue_lock); > - > blk_mq_unfreeze_queue(q); > > mutex_lock(&q->debugfs_mutex); > diff --git a/block/blk-wbt.c b/block/blk-wbt.c > index e49a48684532..53bf5aa6f9ad 100644 > --- a/block/blk-wbt.c > +++ b/block/blk-wbt.c > @@ -942,7 +942,9 @@ int wbt_init(struct gendisk *disk) > /* > * Assign rwb and add the stats callback. > */ > + mutex_lock(&q->rq_qos_mutex); > ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops); > + mutex_unlock(&q->rq_qos_mutex); > if (ret) > goto err_free; > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 6ede578dfbc6..17774f55743e 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -395,6 +395,7 @@ struct request_queue { > > struct blk_queue_stats *stats; > struct rq_qos *rq_qos; > + struct mutex rq_qos_mutex; > > const struct blk_mq_ops *mq_ops; > >