Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A79C3C6FD1D for ; Fri, 17 Mar 2023 08:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231663AbjCQIjV (ORCPT ); Fri, 17 Mar 2023 04:39:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54252 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231322AbjCQIig (ORCPT ); Fri, 17 Mar 2023 04:38:36 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 283D41ACCC; Fri, 17 Mar 2023 01:38:00 -0700 (PDT) Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32H70BsR004924; Fri, 17 Mar 2023 08:37:40 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : mime-version; s=pp1; bh=hSpAz0/Gptd847h02VX3CFwUKMDByZ32pZ2ZeItHqQU=; b=EC0ywy3otQwPfFq4YQkXltA4qxNXZuq7r6P5T91AUAijdKOh/+iz+54ow7tth84OH9u3 rBnQeMWS/P6NvYUJ5l3iq9qTBckLMan4bTEltypnoKzVqiLycVpELgq/PZUH+uwkKSy7 BiGZ1T6Qv+X5Ry/l0ZzIJEcgngKN312ehD7Ftgf5ib3KdzbCLA0XBvY/WMTxXgIs9Kz7 lIumC7alatnrCNubRHirZCM1zcgOcMJHQXauEOE8LsBXZYQ1CA3gl6OAvS0JJddKQfc8 OzZLBEezxgzA4q0KzEfsTWLwBs2WdHoNvtr6gM5OX8N5SCFui2Xummvw+g7xMe6ypLyi Fw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3pckedaa78-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Mar 2023 08:37:40 +0000 Received: from m0098416.ppops.net (m0098416.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 32H72HhC015398; Fri, 17 Mar 2023 08:37:39 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3pckedaa6j-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Mar 2023 08:37:39 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 32GHUhRb024411; Fri, 17 Mar 2023 08:37:38 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3pbsmbhj1k-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Mar 2023 08:37:37 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 32H8bZAK30540478 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Mar 2023 08:37:35 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7C95D20F6E; Fri, 17 Mar 2023 08:37:35 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6069E20F63; Fri, 17 Mar 2023 08:37:33 +0000 (GMT) Received: from li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com (unknown [9.43.91.202]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 17 Mar 2023 08:37:33 +0000 (GMT) From: Ojaswin Mujoo To: linux-ext4@vger.kernel.org, "Theodore Ts'o" Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara , rookxu , Ritesh Harjani Subject: [PATCH v5 6/9] ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa() Date: Fri, 17 Mar 2023 14:07:10 +0530 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: qpZNkgbnKTIoRndN4oGwXgp-fKbb4WdU X-Proofpoint-GUID: 0Vi4Ix1WZHPXo4vqXsQWnJSf9-WMpobF Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-17_04,2023-03-16_02,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 clxscore=1015 adultscore=0 impostorscore=0 malwarescore=0 suspectscore=0 mlxlogscore=999 bulkscore=0 priorityscore=1501 mlxscore=0 phishscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303150002 definitions=main-2303170057 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When the length of best extent found is less than the length of goal extent we need to make sure that the best extent atleast covers the start of the original request. This is done by adjusting the ac_b_ex.fe_logical (logical start) of the extent. While doing so, the current logic sometimes results in the best extent's logical range overflowing the goal extent. Since this best extent is later added to the inode preallocation list, we have a possibility of introducing overlapping preallocations. This is discussed in detail here [1]. To fix this, replace the existing logic with the below logic for adjusting best extent as it keeps fragmentation in check while ensuring logical range of best extent doesn't overflow out of goal extent: 1. Check if best extent can be kept at end of goal range and still cover original start. 2. Else, check if best extent can be kept at start of goal range and still cover original start. 3. Else, keep the best extent at start of original request. Also, add a few extra BUG_ONs that might help catch errors faster. [1] https://lore.kernel.org/r/Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 49 ++++++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 37bf6507cbfd..1304c95d8c59 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4328,6 +4328,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, BUG_ON(start < pa->pa_pstart); BUG_ON(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len)); BUG_ON(pa->pa_free < len); + BUG_ON(ac->ac_b_ex.fe_len <= 0); pa->pa_free -= len; mb_debug(ac->ac_sb, "use %llu/%d from inode pa %p\n", start, len, pa); @@ -4666,10 +4667,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa = ac->ac_pa; if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) { - int winl; - int wins; - int win; - int offs; + int new_bex_start; + int new_bex_end; /* we can't allocate as much as normalizer wants. * so, found space must get proper lstart @@ -4677,26 +4676,40 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) BUG_ON(ac->ac_g_ex.fe_logical > ac->ac_o_ex.fe_logical); BUG_ON(ac->ac_g_ex.fe_len < ac->ac_o_ex.fe_len); - /* we're limited by original request in that - * logical block must be covered any way - * winl is window we can move our chunk within */ - winl = ac->ac_o_ex.fe_logical - ac->ac_g_ex.fe_logical; + /* + * Use the below logic for adjusting best extent as it keeps + * fragmentation in check while ensuring logical range of best + * extent doesn't overflow out of goal extent: + * + * 1. Check if best ex can be kept at end of goal and still + * cover original start + * 2. Else, check if best ex can be kept at start of goal and + * still cover original start + * 3. Else, keep the best ex at start of original request. + */ + new_bex_end = ac->ac_g_ex.fe_logical + + EXT4_C2B(sbi, ac->ac_g_ex.fe_len); + new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + if (ac->ac_o_ex.fe_logical >= new_bex_start) + goto adjust_bex; - /* also, we should cover whole original request */ - wins = EXT4_C2B(sbi, ac->ac_b_ex.fe_len - ac->ac_o_ex.fe_len); + new_bex_start = ac->ac_g_ex.fe_logical; + new_bex_end = + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + if (ac->ac_o_ex.fe_logical < new_bex_end) + goto adjust_bex; - /* the smallest one defines real window */ - win = min(winl, wins); + new_bex_start = ac->ac_o_ex.fe_logical; + new_bex_end = + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - offs = ac->ac_o_ex.fe_logical % - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - if (offs && offs < win) - win = offs; +adjust_bex: + ac->ac_b_ex.fe_logical = new_bex_start; - ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical - - EXT4_NUM_B2C(sbi, win); BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); + BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + + EXT4_C2B(sbi, ac->ac_g_ex.fe_len))); } pa->pa_lstart = ac->ac_b_ex.fe_logical; -- 2.31.1