From: Ritesh Harjani
To: Christoph Hellwig
Cc: Matthew Bobrowski, tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca,
    linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    david@fromorbit.com, darrick.wong@oracle.com
Subject: Re: [PATCH v3 5/6] ext4: introduce direct IO write path using iomap infrastructure
Date: Tue, 17 Sep 2019 15:42:19 +0530
Message-Id: <20190917101220.3568352052@d06av21.portsmouth.uk.ibm.com>
In-Reply-To: <20190917090233.GB29487@infradead.org>
References: <20190916121248.GD4005@infradead.org>
    <20190916223741.GA5936@bobrowski>
    <20190917090016.266CB520A1@d06av21.portsmouth.uk.ibm.com>
    <20190917090233.GB29487@infradead.org>
X-Mailing-List: linux-ext4@vger.kernel.org

On 9/17/19 2:32 PM, Christoph Hellwig wrote:
> On Tue, Sep 17, 2019 at 02:30:15PM +0530, Ritesh Harjani wrote:
>> So if we have a delayed buffered write to a file,
>> in that case we first only update inode->i_size and update
>> i_disksize at writeback time
>> (i.e. during block allocation).
>> In that case when we call for ext4_dio_write_iter
>> since offset + len > i_disksize, we call for ext4_update_i_disksize().
>>
>> Now if writeback for some reason failed. And the system crashes, during the
>> DIO writes, after the blocks are allocated. Then during reboot we may have
>> an inconsistent inode, since we did not add the inode into the
>> orphan list before we updated the inode->i_disksize. And journal replay
>> may not succeed.
>>
>> 1. Can above actually happen? I am still not able to figure out the
>> race/inconsistency completely.
>> 2. Can you please help explain under what other cases
>> it was necessary to call ext4_update_i_disksize() in DIO write paths?
>> 3. When will i_disksize be out-of-sync with i_size during DIO writes?
>
> None of the above seems new in this patchset, does it? That being said

In the original code, before updating i_disksize in ext4_direct_IO_write,
we used to add the inode to the orphan list (which marks the iloc dirty
and also updates the on-disk inode size). Only then did we update
i_disksize to inode->i_size (I still don't understand why that has to be
done inside an open journal handle).

So if a crash happens, recovery can replay the journal and truncate any
extra blocks beyond i_size (ext4_orphan_cleanup()).

In the new iomap implementation (i.e. this patchset), we are doing this
in reverse. We first call ext4_update_i_disksize() in
ext4_dio_write_iter(), and only then, in ext4_iomap_begin() after
ext4_map_blocks(), do we add the inode to the orphan list. I am not
really sure whether that ordering is consistent with the on-disk size.

> I found the early size update odd. XFS updates the on-disk size only
> at I/O completion time to deal with various races including the
> potential exposure of stale data.
>

Yes, I can't really say why that is the case in ext4. That's mostly what
I wanted to understand from my previous queries.

-ritesh
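
To make the ordering question above concrete, here is a toy userspace
model of the two sequences. It is only a sketch and not kernel code: the
enum, the struct and both helper functions are hypothetical names made up
for illustration, and the model deliberately ignores journalling, so it
says nothing about what actually reaches the disk. It only counts the
crash points at which i_disksize has already been moved forward while the
inode is not yet on the orphan list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy steps; the names only mirror the functions mentioned in this mail. */
enum step { ORPHAN_ADD, UPDATE_DISKSIZE, MAP_BLOCKS };

/* The two facts this model tracks (hypothetical, not kernel state). */
struct toy_state {
	bool on_orphan_list;	/* inode was added to the orphan list */
	bool disksize_updated;	/* i_disksize was moved forward */
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Order described for the original ext4_direct_IO_write path. */
static const enum step old_order[] = { ORPHAN_ADD, UPDATE_DISKSIZE };

/* Order described for the iomap path in this patchset. */
static const enum step new_order[] = {
	UPDATE_DISKSIZE, MAP_BLOCKS, ORPHAN_ADD
};

/* Apply the first n steps of seq, i.e. "crash" right after step n. */
static struct toy_state run_until_crash(const enum step *seq, size_t n)
{
	struct toy_state st = { false, false };

	for (size_t i = 0; i < n; i++) {
		if (seq[i] == ORPHAN_ADD)
			st.on_orphan_list = true;
		else if (seq[i] == UPDATE_DISKSIZE)
			st.disksize_updated = true;
		/* MAP_BLOCKS changes nothing that this model tracks. */
	}
	return st;
}

/* Count crash points where the size moved but the inode is no orphan. */
static size_t unprotected_crash_points(const enum step *seq, size_t len)
{
	size_t count = 0;

	for (size_t n = 0; n <= len; n++) {
		struct toy_state st = run_until_crash(seq, n);

		if (st.disksize_updated && !st.on_orphan_list)
			count++;
	}
	return count;
}
```

Calling unprotected_crash_points(old_order, ARRAY_LEN(old_order)) finds no
such crash point, while the same call on new_order finds two (after the
size update, and after the block mapping). Whether that window can really
lead to an inconsistent inode once the journal is taken into account is
exactly the open question in this thread.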