Date: Tue, 19 Sep 2023 14:05:32 +0200
From: Jan Kara
To: Gao Xiang
Cc: linux-ext4@vger.kernel.org, Theodore Ts'o, Jan Kara, Matthew Bobrowski,
    Christoph Hellwig, Joseph Qi
Subject: Re: [bug report] ext4 misses final i_size meta sync under
    O_DIRECT | O_SYNC semantics after iomap DIO conversion
Message-ID: <20230919120532.5dg7mgdnwd5lezgz@quack3>
References: <02d18236-26ef-09b0-90ad-030c4fe3ee20@linux.alibaba.com>
In-Reply-To: <02d18236-26ef-09b0-90ad-030c4fe3ee20@linux.alibaba.com>
X-Mailing-List: linux-ext4@vger.kernel.org

Hello!

On Tue 19-09-23 14:00:04, Gao Xiang wrote:
> Our consumer reports a behavior change between pre-iomap and iomap
> direct io conversion:
>
> If the system crashes after an appending write to a file open with
> O_DIRECT | O_SYNC flag set, file i_size won't be updated even if
> O_SYNC was marked before.
>
> It can be reproduced by a test program in the attachment with
> gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
>
> After some analysis, we found that before iomap direct I/O conversion,
> the timing was roughly (taking Linux 3.10 codebase as an example):
>
> ..
>  - ext4_file_dio_write
>    - __generic_file_aio_write
>        ..
>      - ext4_direct_IO  # generic_file_direct_write
>        - ext4_ext_direct_IO
>          - ext4_ind_direct_IO  # final_size > inode->i_size
>            - ..
>            - ret = blockdev_direct_IO()
>            - i_size_write(inode, end)  # orphan && ret > 0 &&
>                                        # end > inode->i_size
>            - ext4_mark_inode_dirty()
>            - ...
>    - generic_write_sync  # handling O_SYNC
>
> So the dirty inode meta will be committed into journal immediately
> if O_SYNC is set. However, After commit 569342dc2485 ("ext4: move
> inode extension/truncate code out from ->iomap_end() callback"),
> the new behavior seems as below:
>
> ..
>  - ext4_dio_write_iter
>    - ext4_dio_write_checks  # extend = 1
>    - iomap_dio_rw
>      - __iomap_dio_rw
>      - iomap_dio_complete
>        - generic_write_sync
>    - ext4_handle_inode_extension  # extend = 1
>
> So that i_size will be recorded only after generic_write_sync() is
> called. So O_SYNC won't flush the update i_size to the disk.

Indeed, that looks like a bug. Thanks for the report!

> On the other side, after a quick look of XFS side, it will record
> i_size changes in xfs_dio_write_end_io() so it seems that it doesn't
> have this problem.

Yes, I'm a bit hazy on the details, but I think we decided to call
ext4_handle_inode_extension() directly from ext4_dio_write_iter()
because from ext4_dio_write_end_io() it was difficult to test in a
race-free way whether extending i_size (and i_disksize) is needed or
not (we don't necessarily hold i_rwsem there). I'll think about how we
could fix the problem you've reported.

								Honza
-- 
Jan Kara
SUSE Labs, CR
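
To make the ordering problem discussed above concrete, here is a minimal toy model in C. It is only a sketch and not kernel code: every toy_* name is hypothetical, toy_write_sync() merely stands in for generic_write_sync(), and toy_extend() stands in for the i_size update that ext4_handle_inode_extension() performs. It assumes a crash preserves only what was synced, which is a simplification of the journal.

```c
#include <assert.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_inode {
    long i_size;       /* in-memory file size */
    long durable_size; /* size that would be found after a crash */
};

/* Stand-in for generic_write_sync(): persist the current in-memory size. */
static void toy_write_sync(struct toy_inode *inode)
{
    inode->durable_size = inode->i_size;
}

/* Stand-in for the i_size update done when a write extends the file. */
static void toy_extend(struct toy_inode *inode, long end)
{
    if (end > inode->i_size)
        inode->i_size = end;
}

/* Order described for the iomap path: sync first, extend afterwards. */
static long toy_append_sync_then_extend(struct toy_inode *inode, long len)
{
    long end = inode->i_size + len;

    toy_write_sync(inode);
    toy_extend(inode, end);
    return inode->durable_size; /* what a crash right now would leave */
}

/* Order described for the 3.10 path: extend first, sync afterwards. */
static long toy_append_extend_then_sync(struct toy_inode *inode, long len)
{
    long end = inode->i_size + len;

    toy_extend(inode, end);
    toy_write_sync(inode);
    return inode->durable_size;
}

/* Runs both orderings on an empty file with one 4096-byte append. */
static int toy_demo(void)
{
    struct toy_inode a = { 0, 0 }, b = { 0, 0 };

    assert(toy_append_sync_then_extend(&a, 4096) == 0);    /* size lost */
    assert(toy_append_extend_then_sync(&b, 4096) == 4096); /* size kept */
    return 0;
}
```

Calling toy_demo() from a small main() exercises both orderings. In the first, the sync runs before the size is updated, so the durable size stays at the old value, which is the symptom in the report.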