From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Theodore Tso, stable@kernel.org
Subject: [PATCH 5.15 144/148] ext4: limit the number of retries after discarding preallocations blocks
Date: Mon, 26 Sep 2022 12:12:58 +0200
Message-Id: <20220926100801.615046585@linuxfoundation.org>
In-Reply-To: <20220926100756.074519146@linuxfoundation.org>
References: <20220926100756.074519146@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Theodore Ts'o

commit 80fa46d6b9e7b1527bfd2197d75431fd9c382161 upstream.

This patch avoids threads live-locking for hours when a large number of
threads are competing over the last few free extents as blocks get
added to and removed from preallocation pools.  From our bug reporter:

   A reliable way of triggering this is to have multiple writers
   continuously write() to files when the filesystem is full, while
   small amounts of space are freed (e.g. by truncating a large file
   by 1 MiB at a time). In the local filesystem, this can be done by
   simply not checking the return code of write (0) and/or the error
   (ENOSPC) that is set. Over NFS with an async mount, even clients
   with proper error checking will behave this way, since the Linux
   NFS client implementation will not propagate the server errors
   [the write syscalls immediately return success] until the file
   handle is closed.
   This leads to a situation where NFS clients send a continuous
   stream of WRITE RPCs which result in ENOSPC -- but since the
   client isn't seeing this, the stream of writes continues at
   maximum network speed.

   When some space does appear, multiple writers will all attempt to
   claim it for their current write. For NFS, we may see dozens to
   hundreds of threads that do this.

   The real-world scenario of this is database backup tooling (in
   particular, github.com/mdkent/percona-xtrabackup) which may write
   large files (>1TiB) to NFS for safekeeping. Some temporary files
   are written, rewound, and read back -- all before closing the file
   handle (the temp file is actually unlinked, to trigger automatic
   deletion on close/crash.) An application like this operating on an
   async NFS mount will not see an error code until TiB of data have
   been written/read.

   The lockup was observed when running this database backup on large
   filesystems (64 TiB in this case) with a high number of block
   groups and no free space. Fragmentation is generally not a factor
   in this filesystem (~thousands of large files, mostly contiguous
   except for the parts written while the filesystem is at capacity.)

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman
---
 fs/ext4/mballoc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5539,6 +5539,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
 	ext4_fsblk_t block = 0;
 	unsigned int inquota = 0;
 	unsigned int reserv_clstrs = 0;
+	int retries = 0;
 	u64 seq;
 
 	might_sleep();
@@ -5641,7 +5642,8 @@ repeat:
 			ar->len = ac->ac_b_ex.fe_len;
 		}
 	} else {
-		if (ext4_mb_discard_preallocations_should_retry(sb, ac, &seq))
+		if (++retries < 3 &&
+		    ext4_mb_discard_preallocations_should_retry(sb, ac, &seq))
 			goto repeat;
 		/*
 		 * If block allocation fails then the pa allocated above