From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira,
    David Sloan, Logan Gunthorpe, Christoph Hellwig
Date: Thu, 16 Jun 2022 13:19:40 -0600
Message-Id: <20220616191945.23935-11-logang@deltatee.com>
In-Reply-To: <20220616191945.23935-1-logang@deltatee.com>
References: <20220616191945.23935-1-logang@deltatee.com>
X-Mailer: git-send-email 2.30.2
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v3 10/15] md/raid5: Keep a reference to last stripe_head for batch

When batching, every stripe head has to find the previous stripe head to
add to the batch list. This involves taking the hash lock which is highly
contended during IO.

Instead of finding the previous stripe_head each time, store a
reference to the previous stripe_head in a pointer so that it doesn't
require taking the contended lock another time.

The reference to the previous stripe must be released before scheduling
and waiting for work to get done. Otherwise, it can hold up
raid5_activate_delayed() and deadlock.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
Acked-by: Guoqing Jiang
---
 drivers/md/raid5.c | 52 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 17ddaa41147c..34f8d6c18bd3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -843,7 +843,8 @@ static bool stripe_can_batch(struct stripe_head *sh)
 }
 
 /* we only do back search */
-static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh)
+static void stripe_add_to_batch_list(struct r5conf *conf,
+		struct stripe_head *sh, struct stripe_head *last_sh)
 {
 	struct stripe_head *head;
 	sector_t head_sector, tmp_sec;
@@ -856,15 +857,20 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 		return;
 	head_sector = sh->sector - RAID5_STRIPE_SECTORS(conf);
 
-	hash = stripe_hash_locks_hash(conf, head_sector);
-	spin_lock_irq(conf->hash_locks + hash);
-	head = find_get_stripe(conf, head_sector, conf->generation, hash);
-	spin_unlock_irq(conf->hash_locks + hash);
-
-	if (!head)
-		return;
-	if (!stripe_can_batch(head))
-		goto out;
+	if (last_sh && head_sector == last_sh->sector) {
+		head = last_sh;
+		atomic_inc(&head->count);
+	} else {
+		hash = stripe_hash_locks_hash(conf, head_sector);
+		spin_lock_irq(conf->hash_locks + hash);
+		head = find_get_stripe(conf, head_sector, conf->generation,
+				       hash);
+		spin_unlock_irq(conf->hash_locks + hash);
+		if (!head)
+			return;
+		if (!stripe_can_batch(head))
+			goto out;
+	}
 
 	lock_two_stripes(head, sh);
 	/* clear_batch_ready clear the flag */
@@ -5794,6 +5800,8 @@ enum stripe_result {
 };
 
 struct stripe_request_ctx {
+	/* a reference to the last stripe_head for batching */
+	struct stripe_head *batch_last;
 	/* the request had REQ_PREFLUSH, cleared after the first stripe_head */
 	bool do_flush;
 };
@@ -5888,8 +5896,13 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 		goto out_release;
 	}
 
-	if (stripe_can_batch(sh))
-		stripe_add_to_batch_list(conf, sh);
+	if (stripe_can_batch(sh)) {
+		stripe_add_to_batch_list(conf, sh, ctx->batch_last);
+		if (ctx->batch_last)
+			raid5_release_stripe(ctx->batch_last);
+		atomic_inc(&sh->count);
+		ctx->batch_last = sh;
+	}
 
 	if (ctx->do_flush) {
 		set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
@@ -5984,6 +5997,18 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			continue;
 
 		if (res == STRIPE_SCHEDULE_AND_RETRY) {
+			/*
+			 * Must release the reference to batch_last before
+			 * scheduling and waiting for work to be done,
+			 * otherwise the batch_last stripe head could prevent
+			 * raid5_activate_delayed() from making progress
+			 * and thus deadlocking.
+			 */
+			if (ctx.batch_last) {
+				raid5_release_stripe(ctx.batch_last);
+				ctx.batch_last = NULL;
+			}
+
 			schedule();
 			prepare_to_wait(&conf->wait_for_overlap, &w,
 					TASK_UNINTERRUPTIBLE);
@@ -5995,6 +6020,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 
 	finish_wait(&conf->wait_for_overlap, &w);
 
+	if (ctx.batch_last)
+		raid5_release_stripe(ctx.batch_last);
+
 	if (rw == WRITE)
 		md_write_end(mddev);
 	bio_endio(bi);
-- 
2.30.2
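
For readers who want to see the caching idea outside the kernel, here is
a minimal userspace sketch. It is not the raid5 code: struct stripe,
lookup_locked(), get_prev(), put_stripe(), run_request() and the
STRIPE_SECTORS value are hypothetical stand-ins, and the "hash lock" is
modelled only by a counter of how often the slow lookup path runs. It
assumes stripes are submitted in ascending sector order, which is the
case the cached reference is meant to help.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_STRIPES     8
#define STRIPE_SECTORS 8	/* hypothetical stripe size in sectors */

struct stripe {
	unsigned long sector;
	atomic_int count;	/* reference count */
};

static struct stripe table[NR_STRIPES];
static int locked_lookups;	/* how often the slow, "locked" path ran */

/* Slow path: stands in for taking the hash lock and searching. */
static struct stripe *lookup_locked(unsigned long sector)
{
	locked_lookups++;
	for (size_t i = 0; i < NR_STRIPES; i++) {
		if (table[i].sector == sector) {
			atomic_fetch_add(&table[i].count, 1);
			return &table[i];
		}
	}
	return NULL;
}

/* Drop one reference; the caller must hold one. */
static void put_stripe(struct stripe *s)
{
	assert(atomic_load(&s->count) > 0);
	atomic_fetch_sub(&s->count, 1);
}

/* Get a reference to the stripe preceding sh, preferring the cache. */
static struct stripe *get_prev(struct stripe *sh, struct stripe *last)
{
	unsigned long head_sector = sh->sector - STRIPE_SECTORS;

	if (last && last->sector == head_sector) {
		atomic_fetch_add(&last->count, 1);	/* fast path */
		return last;
	}
	return lookup_locked(head_sector);
}

/* Submit all stripes in order; return the number of slow lookups. */
static int run_request(void)
{
	struct stripe *batch_last = NULL;

	for (size_t i = 0; i < NR_STRIPES; i++) {
		table[i].sector = (i + 1) * STRIPE_SECTORS;
		atomic_init(&table[i].count, 0);
	}
	locked_lookups = 0;

	for (size_t i = 0; i < NR_STRIPES; i++) {
		struct stripe *sh = &table[i];
		struct stripe *head = get_prev(sh, batch_last);

		if (head)
			put_stripe(head);	/* done with the batch head */
		if (batch_last)
			put_stripe(batch_last);	/* replace cached reference */
		atomic_fetch_add(&sh->count, 1);
		batch_last = sh;
	}

	/* The cached reference must be dropped before returning. */
	if (batch_last)
		put_stripe(batch_last);
	return locked_lookups;
}
```

Calling run_request() from a main() should take the slow path only for
the first stripe, which has no predecessor in the table; every later
stripe finds its predecessor through the cached pointer. The final
put_stripe() mirrors the release added at the end of
raid5_make_request(); the sketch does not model the
schedule-and-retry path.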