Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7E7EC636D4 for ; Fri, 17 Feb 2023 12:15:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230287AbjBQMPZ (ORCPT ); Fri, 17 Feb 2023 07:15:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230093AbjBQMO7 (ORCPT ); Fri, 17 Feb 2023 07:14:59 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E950E6605B; Fri, 17 Feb 2023 04:14:46 -0800 (PST) Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31HB0S8H012085; Fri, 17 Feb 2023 12:14:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : mime-version; s=pp1; bh=3Aoc7po0EoQIZm0dBtmhB5yc7Mo+m+A9HqmnpsDSHAY=; b=rSSZ7AxfuV2ffljsEKifuU5MoeGW1dltZJVZIvBoqzKUhNZlSD8UmK2BiycZqVOfcjsf FTolX5M7cNte1eyKT5wKUsoMKghKq3ClllkWQYaY05YUA88V/9JKXU6SvTyrZRDbcU+z ticaRkDqas6BiAZOu0dZJEZHrHJaTCRZbGQV68bMgAnt4swFNyfPG+h3QVqGFBnfwudr 5eNddnpQK265w4ph+v/VWUJxcjp+geA/NEGUt6q1lXWlNuvz7naHOFLVwITDImyR+UyT JzKxVGSYcn9ykcUjpwGrKdp4CnmYxahShiWg4oNyPvUUzAjmv0tIaRaZDY/x68Q0aCEQ bw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nt3310tb4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:43 +0000 Received: from m0098399.ppops.net (m0098399.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 31HBN3RL025225; Fri, 17 Feb 2023 12:14:42 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nt3310taf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:42 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 31HBUJBU010698; Fri, 17 Feb 2023 12:14:40 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma04ams.nl.ibm.com (PPS) with ESMTPS id 3np2n700h3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:40 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 31HCEbEq46596600 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Feb 2023 12:14:38 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CFBAA2004D; Fri, 17 Feb 2023 12:14:37 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0047620040; Fri, 17 Feb 2023 12:14:36 +0000 (GMT) Received: from li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com (unknown [9.43.3.39]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 17 Feb 2023 12:14:35 +0000 (GMT) From: Ojaswin Mujoo To: linux-ext4@vger.kernel.org, "Theodore Ts'o" Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara , rookxu Subject: [PATCH v4 6/9] ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa() Date: Fri, 17 Feb 2023 17:44:15 +0530 Message-Id: <79b5240a6168171577b1bb9ef7a27c0c52676d37.1676634592.git.ojaswin@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: vPwLMry1_oXZ-X3kOsVT1PbtfwuiDn8g X-Proofpoint-GUID: 89MkkpUqUWSEHQxDnsMKRrnpyxAvFgL5 Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.170.22 definitions=2023-02-17_06,2023-02-17_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 suspectscore=0 mlxlogscore=999 adultscore=0 spamscore=0 phishscore=0 clxscore=1015 priorityscore=1501 impostorscore=0 malwarescore=0 bulkscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302170109 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org When the length of best extent found is less than the length of goal extent we need to make sure that the best extent atleast covers the start of the original request. This is done by adjusting the ac_b_ex.fe_logical (logical start) of the extent. While doing so, the current logic sometimes results in the best extent's logical range overflowing the goal extent. Since this best extent is later added to the inode preallocation list, we have a possibility of introducing overlapping preallocations. This is discussed in detail here [1]. To fix this, replace the existing logic with the below logic for adjusting best extent as it keeps fragmentation in check while ensuring logical range of best extent doesn't overflow out of goal extent: 1. Check if best extent can be kept at end of goal range and still cover original start. 2. Else, check if best extent can be kept at start of goal range and still cover original start. 3. Else, keep the best extent at start of original request. Also, add a few extra BUG_ONs that might help catch errors faster. [1] https://lore.kernel.org/r/Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Signed-off-by: Ojaswin Mujoo --- fs/ext4/mballoc.c | 49 ++++++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index fdb9d0a8f35d..ba9d26e2f2aa 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4330,6 +4330,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, BUG_ON(start < pa->pa_pstart); BUG_ON(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len)); BUG_ON(pa->pa_free < len); + BUG_ON(ac->ac_b_ex.fe_len <= 0); pa->pa_free -= len; mb_debug(ac->ac_sb, "use %llu/%d from inode pa %p\n", start, len, pa); @@ -4668,10 +4669,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa = ac->ac_pa; if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) { - int winl; - int wins; - int win; - int offs; + int new_bex_start; + int new_bex_end; /* we can't allocate as much as normalizer wants. * so, found space must get proper lstart @@ -4679,26 +4678,40 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) BUG_ON(ac->ac_g_ex.fe_logical > ac->ac_o_ex.fe_logical); BUG_ON(ac->ac_g_ex.fe_len < ac->ac_o_ex.fe_len); - /* we're limited by original request in that - * logical block must be covered any way - * winl is window we can move our chunk within */ - winl = ac->ac_o_ex.fe_logical - ac->ac_g_ex.fe_logical; + /* + * Use the below logic for adjusting best extent as it keeps + * fragmentation in check while ensuring logical range of best + * extent doesn't overflow out of goal extent: + * + * 1. Check if best ex can be kept at end of goal and still + * cover original start + * 2. Else, check if best ex can be kept at start of goal and + * still cover original start + * 3. Else, keep the best ex at start of original request. + */ + new_bex_end = ac->ac_g_ex.fe_logical + + EXT4_C2B(sbi, ac->ac_g_ex.fe_len); + new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + if (ac->ac_o_ex.fe_logical >= new_bex_start) + goto adjust_bex; - /* also, we should cover whole original request */ - wins = EXT4_C2B(sbi, ac->ac_b_ex.fe_len - ac->ac_o_ex.fe_len); + new_bex_start = ac->ac_g_ex.fe_logical; + new_bex_end = + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + if (ac->ac_o_ex.fe_logical < new_bex_end) + goto adjust_bex; - /* the smallest one defines real window */ - win = min(winl, wins); + new_bex_start = ac->ac_o_ex.fe_logical; + new_bex_end = + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - offs = ac->ac_o_ex.fe_logical % - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - if (offs && offs < win) - win = offs; +adjust_bex: + ac->ac_b_ex.fe_logical = new_bex_start; - ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical - - EXT4_NUM_B2C(sbi, win); BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); + BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + + EXT4_C2B(sbi, ac->ac_g_ex.fe_len))); } /* preallocation can change ac_b_ex, thus we store actually -- 2.31.1