Date: Mon, 11 Dec 2023 16:24:14 +0530
From: Ojaswin Mujoo
To: John Garry
Cc: linux-ext4@vger.kernel.org, "Theodore Ts'o", Ritesh Harjani,
    linux-kernel@vger.kernel.org, "Darrick J . Wong",
    linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, dchinner@redhat.com
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO
References: <8c06c139-f994-442b-925e-e177ef2c5adb@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi John,

On Mon, Dec 04, 2023 at 02:44:36PM +0000, John Garry wrote:
> On 04/12/2023 13:38, Ojaswin Mujoo wrote:
> > > So are we supposed to be doing atomic writes on unwritten ranges only in the
> > > file to get the aligned allocations?
> > If we do an atomic write on a hole, ext4 will give us an aligned extent
> > provided the hole is big enough to accomodate it.
> >
> > However, if we do an atomic write on a existing extent (written or
> > unwritten) ext4 would check if it satisfies the alignment and length
> > requirement and returns an error if it doesn't.
>
> This seems a rather big drawback.

So, if I'm not wrong, we force the extent alignment as well as the size
of the extent in XFS, right?

We didn't want to overly restrict the users of atomic writes by forcing
the extents to be of a certain alignment/size irrespective of the size
of the write. The design in this patchset provides this flexibility at
the cost of some added precautions that the user should take (e.g. not
doing an atomic write on a pre-existing unaligned extent).

However, I don't think it should be too hard to provide an optional
forced alignment feature on top of this if there's interest in that.

>
> > Since we don't have cow
> > like functionality afaik the only way we could let this kind of write go
> > through is by punching the pre-existing extent which is not something we
> > can directly do in the same write operation, hence we return error.
>
> Well, as you prob saw, for XFS we were relying on forcing extent alignment,
> and not CoW (yet).

That's right.
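
To make the "alignment and length requirement" on a pre-existing extent
a bit more concrete, here is a minimal sketch in C. It is not the ext4
code from this patchset. It assumes the requirement means that the write
length is a power of two, that the extent fully covers the write, and
that the physical block backing the start of the write is naturally
aligned to the write length. The struct and function names are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an extent (all units are blocks). */
struct extent_sketch {
	uint64_t logical_start;   /* first file block covered by the extent */
	uint64_t physical_start;  /* first disk block backing the extent */
	uint64_t len;             /* number of blocks in the extent */
};

static bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Assumed rule, not taken from the patchset: an atomic write of
 * write_len blocks at file block write_off can reuse the extent only if
 * the extent covers the whole write and the backing physical block is
 * aligned to write_len. Otherwise the caller would return an error.
 */
static bool extent_fits_atomic_write(const struct extent_sketch *ex,
				     uint64_t write_off, uint64_t write_len)
{
	uint64_t delta;

	if (!is_power_of_two(write_len))
		return false;

	/* The write must start inside the extent. */
	if (write_off < ex->logical_start)
		return false;
	delta = write_off - ex->logical_start;
	if (delta >= ex->len)
		return false;

	/* Length requirement: the extent must cover the whole write. */
	if (write_len > ex->len - delta)
		return false;

	/* Alignment requirement: physical start aligned to the write size. */
	return ((ex->physical_start + delta) % write_len) == 0;
}
```

With an extent of 16 blocks starting at physical block 8, an 8-block
write at file block 0 passes this check. The same write on an extent
starting at physical block 9 fails it, which corresponds to the case
where an error is returned instead of punching the extent.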
>
> >
> > > I actually tried that, and I got a WARN triggered:
> > >
> > > # mkfs.ext4 /dev/sda
> > > mke2fs 1.46.5 (30-Dec-2021)
> > > Creating filesystem with 358400 1k blocks and 89760 inodes
> > > Filesystem UUID: 7543a44b-2957-4ddc-9d4a-db3a5fd019c9
> > > Superblock backups stored on blocks:
> > > 8193, 24577, 40961, 57345, 73729, 204801, 221185
> > >
> > > Allocating group tables: done
> > > Writing inode tables: done
> > > Creating journal (8192 blocks): done
> > > Writing superblocks and filesystem accounting information: done
> > >
> > > [ 12.745889] mkfs.ext4 (150) used greatest stack depth: 13304 bytes left
> > > # mount /dev/sda mnt
> > > [ 12.798804] EXT4-fs (sda): mounted filesystem
> > > 7543a44b-2957-4ddc-9d4a-db3a5fd019c9 r/w with ordered data mode. Quota
> > > mode: none.
> > > # touch mnt/file
> > > #
> > > # /test-statx -a /root/mnt/file
> > > statx(/root/mnt/file) = 0
> > > dump_statx results=5fff
> > > Size: 0 Blocks: 0 IO Block: 1024 regular file
> > > Device: 08:00 Inode: 12 Links: 1
> > > Access: (0644/-rw-r--r--) Uid: 0 Gid: 0
> > > Access: 2023-12-04 10:27:40.002848720+0000
> > > Modify: 2023-12-04 10:27:40.002848720+0000
> > > Change: 2023-12-04 10:27:40.002848720+0000
> > > Birth: 2023-12-04 10:27:40.002848720+0000
> > > stx_attributes_mask=0x703874
> > > STATX_ATTR_WRITE_ATOMIC set
> > > unit min: 1024
> > > uunit max: 524288
> > > Attributes: 0000000000400000 (........ ........ ........ ........
> > > ........ .?--.... ..---... .---.-..)
> > > #
> > >
> > >
> > >
> > > looks ok so far, then write 4KB at offset 0:
> > >
> > > # /test-pwritev2 -a -d -p 0 -l 4096 /root/mnt/file
> > > file=/root/mnt/file write_size=4096 offset=0 o_flags=0x4002 wr_flags=0x24
> > ...
>
> > > Please note that I tested on my own dev branch, which contains changes over
> > > [1], but I expect it would not make a difference for this test.
> > Hmm this should not ideally happen, can you please share your test
> > script with me if possible?
>
> It's doing nothing special, just RWF_ATOMIC flag is set for DIO write:
>
> https://github.com/johnpgarry/linux/blob/236870d48ecb19c1cf89dc439e188182a0524cd4/samples/vfs/test-pwritev2.c

Thanks for the script, will try to replicate this today and get back to
you.

>
> > > >
> > > > Script to test using pwritev2() can be found here:
> > > > https://gist.github.com/OjaswinM/e67accee3cbb7832bd3f1a9543c01da9
> > > Please note that the posix_memalign() call in the program should PAGE align.
> > Why do you say that? direct IO seems to be working when the userspace
> > buffer is 512 byte aligned, am I missing something?
>
> ah, sorry, if you just use 1x IOV vector then no special alignment are
> required, so ignore this. Indeed, I need to improve kernel checks for
> alignment anyway.
>
> Thanks,
> John
>

Regards,
ojaswin
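
For reference, the 4 KB write at offset 0 quoted above can be checked
against the reported limits (unit min 1024, unit max 524288) with a
small helper. This is only a sketch in C. It assumes the usual rules for
RWF_ATOMIC writes as I understand them: the length is a power of two
within [unit min, unit max], and the file offset is naturally aligned to
the length. The function name is hypothetical, and nothing here is taken
from the kernel or from test-pwritev2.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace pre-check for an atomic write request.
 * unit_min and unit_max are the values reported by statx for the file
 * (1024 and 524288 in the output quoted in this thread).
 */
static bool atomic_write_args_ok(uint64_t offset, uint64_t len,
				 uint32_t unit_min, uint32_t unit_max)
{
	/* Length must be a non-zero power of two. */
	if (len == 0 || (len & (len - 1)) != 0)
		return false;

	/* Length must lie within the advertised atomic write unit range. */
	if (len < unit_min || len > unit_max)
		return false;

	/* Offset must be naturally aligned to the write length. */
	return (offset & (len - 1)) == 0;
}
```

Under these assumptions, atomic_write_args_ok(0, 4096, 1024, 524288) is
true, so the request in the test above is well formed from the caller's
side. A 4096-byte write at offset 2048 would be rejected by this check
because the offset is not aligned to the length.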