From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe
Date: Wed, 27 Jul 2022 15:06:00 -0600
Message-Id: <20220727210600.120221-6-logang@deltatee.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220727210600.120221-1-logang@deltatee.com>
References: <20220727210600.120221-1-logang@deltatee.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [PATCH 5/5] md/raid5: Ensure batch_last is released before sleeping for quiesce
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

A race condition exists where if raid5_quiesce() is called in the
middle of a request that has set batch_last, it will deadlock.

batch_last will hold a reference to a stripe when raid5_quiesce() is
called. This will cause the next raid5_get_active_stripe() call to
sleep waiting for the quiesce to finish, but the raid5_quiesce() thread
will wait for active_stripes to go to zero which will never happen
because the request thread is waiting for the quiesce to stop.

Fix this by creating a special __raid5_get_active_stripe() function
which takes the request context and clears batch_last before
sleeping.

While we're at it, change the arguments of raid5_get_active_stripe()
to bools.

Fixes: 4fcbd9abb6f2 ("md/raid5: Keep a reference to last stripe_head for batch")
Reported-by: David Sloan
Signed-off-by: Logan Gunthorpe
---
 drivers/md/raid5.c | 36 ++++++++++++++++++++++++++++--------
 drivers/md/raid5.h |  2 +-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0a8687fd1748..421bac221a74 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -800,9 +800,9 @@ static bool is_inactive_blocked(struct r5conf *conf, int hash)
 	return active < (conf->max_nr_stripes * 3 / 4);
 }
 
-struct stripe_head *
-raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
-			int previous, int noblock, int noquiesce)
+static struct stripe_head *__raid5_get_active_stripe(struct r5conf *conf,
+		struct stripe_request_ctx *ctx, sector_t sector,
+		bool previous, bool noblock, bool noquiesce)
 {
 	struct stripe_head *sh;
 	int hash = stripe_hash_locks_hash(conf, sector);
@@ -812,9 +812,22 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 	spin_lock_irq(conf->hash_locks + hash);
 
 retry:
-	wait_event_lock_irq(conf->wait_for_quiescent,
-			    conf->quiesce == 0 || noquiesce,
-			    *(conf->hash_locks + hash));
+	if (!noquiesce && conf->quiesce) {
+		/*
+		 * Must release the reference to batch_last before waiting
+		 * on quiesce, otherwise the batch_last will hold a reference
+		 * to a stripe and raid5_quiesce() will deadlock waiting for
+		 * active_stripes to go to zero.
+		 */
+		if (ctx && ctx->batch_last) {
+			raid5_release_stripe(ctx->batch_last);
+			ctx->batch_last = NULL;
+		}
+
+		wait_event_lock_irq(conf->wait_for_quiescent, !conf->quiesce,
+				    *(conf->hash_locks + hash));
+	}
+
 	sh = find_get_stripe(conf, sector, conf->generation - previous, hash);
 	if (sh)
 		goto out;
@@ -850,6 +863,13 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 	return sh;
 }
 
+struct stripe_head *raid5_get_active_stripe(struct r5conf *conf,
+		sector_t sector, bool previous, bool noblock, bool noquiesce)
+{
+	return __raid5_get_active_stripe(conf, NULL, sector, previous, noblock,
+					 noquiesce);
+}
+
 static bool is_full_stripe_write(struct stripe_head *sh)
 {
 	BUG_ON(sh->overwrite_disks > (sh->disks - sh->raid_conf->max_degraded));
@@ -5992,8 +6012,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	pr_debug("raid456: %s, sector %llu logical %llu\n", __func__,
 		 new_sector, logical_sector);
 
-	sh = raid5_get_active_stripe(conf, new_sector, previous,
-				     (bi->bi_opf & REQ_RAHEAD), 0);
+	sh = __raid5_get_active_stripe(conf, ctx, new_sector, previous,
+				       (bi->bi_opf & REQ_RAHEAD), 0);
 	if (unlikely(!sh)) {
 		/* cannot get stripe, just give-up */
 		bi->bi_status = BLK_STS_IOERR;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 638d29863503..a5082bed83c8 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -812,7 +812,7 @@ extern sector_t raid5_compute_sector(struct r5conf *conf, sector_t r_sector,
 				     struct stripe_head *sh);
 extern struct stripe_head *
 raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
-			int previous, int noblock, int noquiesce);
+			bool previous, bool noblock, bool noquiesce);
 extern int raid5_calc_degraded(struct r5conf *conf);
 extern int r5c_journal_mode_set(struct mddev *mddev, int journal_mode);
 #endif
-- 
2.30.2
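
For readers who want to step through the deadlock described in the commit
message, here is a minimal single-threaded user-space model of it. It is a
sketch only and not kernel code: struct model_conf, struct model_ctx and all
model_*() functions are hypothetical names invented for illustration, and a
"sleep" is modelled as returning false instead of blocking. The release_first
flag selects between the old behaviour (keep batch_last while sleeping) and
the behaviour this patch introduces (drop it first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_conf {
	bool quiesce;		/* a quiesce has been requested */
	int active_stripes;	/* stripe references currently held */
};

struct model_ctx {
	bool holds_batch_last;	/* request still references its last stripe */
};

/*
 * Model of the quiesce check at the top of the stripe lookup. Returns true
 * if the caller obtained a stripe reference, false if it would have to
 * sleep until the quiesce ends. With release_first set, the batch_last
 * reference is dropped before sleeping.
 */
static bool model_get_stripe(struct model_conf *conf, struct model_ctx *ctx,
			     bool release_first)
{
	if (conf->quiesce) {
		if (release_first && ctx && ctx->holds_batch_last) {
			conf->active_stripes--;
			ctx->holds_batch_last = false;
		}
		return false;	/* would sleep waiting for the quiesce to end */
	}
	conf->active_stripes++;
	return true;
}

/* The quiescing side can only finish once no stripe is active. */
static bool model_quiesce_can_finish(const struct model_conf *conf)
{
	return conf->quiesce && conf->active_stripes == 0;
}

/*
 * Replays the scenario from the commit message. Returns true if the
 * quiesce can complete, false if both sides are stuck (deadlock).
 */
static bool model_scenario(bool release_first)
{
	struct model_conf conf = { .quiesce = false, .active_stripes = 0 };
	struct model_ctx ctx = { .holds_batch_last = false };
	bool got;

	/* The request takes its first stripe and keeps it as batch_last. */
	got = model_get_stripe(&conf, &ctx, release_first);
	assert(got);
	(void)got;
	ctx.holds_batch_last = true;

	/* A quiesce starts in the middle of the request. */
	conf.quiesce = true;

	/* The request asks for its next stripe and has to sleep. */
	if (model_get_stripe(&conf, &ctx, release_first))
		return false;	/* not expected while quiescing */

	/* Progress is possible only if the quiesce side can finish. */
	return model_quiesce_can_finish(&conf);
}
```

In this model, model_scenario(false) leaves active_stripes at 1 while the
request sleeps, so the quiesce can never finish; model_scenario(true) drops
the count to 0 first, which is the ordering the patch enforces.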