From: Gao Xiang
To: Jan Kara
Cc: linux-ext4@vger.kernel.org, Theodore Ts'o, Matthew Bobrowski, Christoph Hellwig, Joseph Qi, "Darrick J. Wong"
Date: Tue, 19 Sep 2023 21:47:34 +0800
Subject: Re: [bug report] ext4 misses final i_size meta sync under O_DIRECT | O_SYNC semantics after iomap DIO conversion
Message-ID: <9fccc0e4-8f51-d3e7-21de-f85f8837be7f@linux.alibaba.com>
In-Reply-To: <20230919120532.5dg7mgdnwd5lezgz@quack3>

(sorry... adding Darrick here...)

Hi Jan,

On 2023/9/19 20:05, Jan Kara wrote:
> Hello!
>
> On Tue 19-09-23 14:00:04, Gao Xiang wrote:
>> Our consumer reports a behavior change between pre-iomap and iomap
>> direct I/O conversion:
>>
>> If the system crashes after an appending write to a file opened with
>> the O_DIRECT | O_SYNC flags set, the file's i_size won't be updated
>> even though O_SYNC was set.
>>
>> It can be reproduced by a test program in the attachment with
>>   gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
>>
>> After some analysis, we found that before the iomap direct I/O
>> conversion, the timing was roughly (taking the Linux 3.10 codebase as
>> an example):
>>
>>   ..
>>   - ext4_file_dio_write
>>     - __generic_file_aio_write
>>         ..
>>       - ext4_direct_IO                  # generic_file_direct_write
>>         - ext4_ext_direct_IO
>>           - ext4_ind_direct_IO          # final_size > inode->i_size
>>             - ..
>>             - ret = blockdev_direct_IO()
>>             - i_size_write(inode, end)  # orphan && ret > 0 &&
>>                                         # end > inode->i_size
>>             - ext4_mark_inode_dirty()
>>             - ...
>>     - generic_write_sync                # handling O_SYNC
>>
>> So the dirty inode metadata will be committed to the journal
>> immediately if O_SYNC is set. However, after commit 569342dc2485
>> ("ext4: move inode extension/truncate code out from ->iomap_end()
>> callback"), the new behavior seems to be as below:
>>
>>   ..
>>   - ext4_dio_write_iter
>>     - ext4_dio_write_checks             # extend = 1
>>     - iomap_dio_rw
>>       - __iomap_dio_rw
>>       - iomap_dio_complete
>>         - generic_write_sync
>>     - ext4_handle_inode_extension       # extend = 1
>>
>> So i_size is recorded only after generic_write_sync() has been
>> called, and O_SYNC won't flush the updated i_size to disk.
>
> Indeed, that looks like a bug. Thanks for the report!

Thanks for the confirmation!

>
>> On the other hand, after a quick look at the XFS side, it records
>> i_size changes in xfs_dio_write_end_io(), so it seems that XFS doesn't
>> have this problem.
>
> Yes, I'm a bit hazy on the details but I think we've decided to call
> ext4_handle_inode_extension() directly from ext4_dio_write_iter() because
> from ext4_dio_write_end_io() it was difficult to test in a race-free way
> whether extending i_size (and i_disksize) is needed or not (we don't
> necessarily hold i_rwsem there). I'll think about how we could fix the
> problem you've reported.

Yes. Another concern is O_DSYNC; I'm not quite sure whether its behavior
has changed too. I have a rough feeling that the current iomap DIO
behaviors here are too strict and might not fit each filesystem's
specific implementation details, though.

Thanks,
Gao Xiang

>
> Honza
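The attached repro.c is not included in this message. As a rough, hypothetical sketch of the kind of appending O_DIRECT | O_SYNC write the report describes (the helper name append_dio_sync(), the 4096-byte alignment and the EINVAL fallback are all assumptions, not taken from the attachment), a user-space C version could look like this:

```c
#define _GNU_SOURCE /* glibc only exposes O_DIRECT with _GNU_SOURCE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096 bytes satisfies the O_DIRECT alignment rules (buffer
 * address, file offset and length) on the filesystem under test. */
#define DIO_ALIGN 4096

/*
 * Hypothetical helper, not taken from the attached repro.c: append one
 * DIO_ALIGN-sized block at the current end of 'path' with
 * O_DIRECT | O_SYNC and return the i_size reported by fstat() afterwards,
 * or -1 on error. The file size must already be a multiple of DIO_ALIGN.
 */
off_t append_dio_sync(const char *path)
{
    struct stat st;
    void *buf = NULL;
    off_t ret = -1;
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
    if (fd < 0 && errno == EINVAL)
        /* Some filesystems reject O_DIRECT at open() time; fall back so the
         * sketch still runs, but it then no longer takes the DIO path. */
        fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0)
        goto out;
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0) {
        buf = NULL;
        goto out;
    }
    memset(buf, 'a', DIO_ALIGN);

    /* Appending write: the offset is the old i_size, so i_size must grow. */
    if (pwrite(fd, buf, DIO_ALIGN, st.st_size) != DIO_ALIGN)
        goto out;
    if (fstat(fd, &st) < 0)
        goto out;
    ret = st.st_size;
out:
    free(buf);
    close(fd);
    return ret;
}
```

The size returned here is only the in-memory i_size. Following the report, one would run this against a file on ext4 and then crash the machine with echo c > /proc/sysrq-trigger; on kernels after the iomap DIO conversion, the on-disk i_size may then be missing the appended block despite O_SYNC.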