Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp6629032ybe; Wed, 18 Sep 2019 06:39:45 -0700 (PDT) X-Google-Smtp-Source: APXvYqxOqDiqf00AyG4Pp6ToK7y2S4r+eMByVkbDzSvV8BdEy9HzI1hfiQUq96klM9/BYKIaFSv1 X-Received: by 2002:a50:b501:: with SMTP id y1mr10241243edd.167.1568813985350; Wed, 18 Sep 2019 06:39:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568813985; cv=none; d=google.com; s=arc-20160816; b=LO1+La2zmplkF8IsL7mxQsUEEoCqQ4dH97V3kKaUo6rnOdexqWnlEv+8jm+yygX/hv jofvuo8fKPs9igUA9DLlY9CyLbFOBmkhgAMOMOnPB67nr4pRzgiadHh7H7e1IhCIhElV 0mBc9joGC/sxZHqadLwBwdT0lukdR0R6bGQIh1GZPsHzE3pa0cthG09igso8zn0UyKzt gQyHX/Nkb4Xgfk7K3/NsRW7uGOkxhJ3eV6oWe5Bt9CFKiInRl9D2KwyHPqR/GY4BUm2f t0LHVXPZfxwkqVnuCKGBFsqL0omBolCrucu3Ti+XSfcFB2j9sr3aYc0l6PJt/ziV1c+j 0dsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=JCGNrZZAS0iQsNqBXaxwE+ULN2unaM234UgifTuHviA=; b=gnREWWvu7mKCdIu3tD/tsD+6kmhOOxUcSbS6Ji2Acjo1QP32Bc+dBTh1+x8MxQJ6uc tagBkJJIiOII+mMAIRYzmgu/FyM6boyLd8YmKkeDjNqVBHX047n8TGBePsAYcKGr/SaW Kgdf3HXjSM5o6sEMFqlGhVn/icTAfwuSmVzhMWiFaINcW5rFvojzWv+zS9TQ+aNohFAx +HIDG2+d9FKGt54R/hsok1FdVuecm4gxpVaameJYABYhonDH0bPo3Qc73NoPlxyHrpRs OPQ1tFnDh3aV8wwiEjqk3up8FL/HjJe4OC5q7eXSwmv/6G9oA2DnjfGUblgfNB63wK/E 5QAw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v25si3408486edb.286.2019.09.18.06.39.20; Wed, 18 Sep 2019 06:39:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726369AbfIRN1L (ORCPT + 99 others); Wed, 18 Sep 2019 09:27:11 -0400 Received: from mx2.suse.de ([195.135.220.15]:44006 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726038AbfIRN1L (ORCPT ); Wed, 18 Sep 2019 09:27:11 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8567EB633; Wed, 18 Sep 2019 13:27:09 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id C907A1E4201; Wed, 18 Sep 2019 15:27:19 +0200 (CEST) Date: Wed, 18 Sep 2019 15:27:19 +0200 From: Jan Kara To: yangerkun Cc: Jan Kara , tytso@mit.edu, linux-ext4@vger.kernel.org, yi.zhang@huawei.com, houtao1@huawei.com Subject: Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page_commit Message-ID: <20190918132719.GE31891@quack2.suse.cz> References: <20190917084814.40370-1-yangerkun@huawei.com> <20190918104535.GC25056@quack2.suse.cz> <71e28624-214c-f676-b215-19f78266de84@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <71e28624-214c-f676-b215-19f78266de84@huawei.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed 18-09-19 21:09:00, yangerkun wrote: > On 2019/9/18 18:45, Jan Kara wrote: > > On Tue 17-09-19 16:48:14, yangerkun wrote: > > > No need to wait when offset equals to 0. And it will trigger a bug since > > > the latter __ext4_journalled_invalidatepage can free the buffers but leave > > > page still dirty. > > > > > > [ 26.057508] ------------[ cut here ]------------ > > > [ 26.058531] kernel BUG at fs/ext4/inode.c:2134! > > > ... > > > [ 26.088130] Call trace: > > > [ 26.088695] ext4_writepage+0x914/0xb28 > > > [ 26.089541] writeout.isra.4+0x1b4/0x2b8 > > > [ 26.090409] move_to_new_page+0x3b0/0x568 > > > [ 26.091338] __unmap_and_move+0x648/0x988 > > > [ 26.092241] unmap_and_move+0x48c/0xbb8 > > > [ 26.093096] migrate_pages+0x220/0xb28 > > > [ 26.093945] kernel_mbind+0x828/0xa18 > > > [ 26.094791] __arm64_sys_mbind+0xc8/0x138 > > > [ 26.095716] el0_svc_common+0x190/0x490 > > > [ 26.096571] el0_svc_handler+0x60/0xd0 > > > [ 26.097423] el0_svc+0x8/0xc > > > > > > Run below parallel can reproduce it easily(ext3): > > > void main() > > > { > > > int fd, fd1, fd2, fd3, ret; > > > void *addr; > > > size_t length = 4096; > > > int flags; > > > off_t offset = 0; > > > char *str = "12345"; > > > > > > fd = open("a", O_RDWR | O_CREAT); > > > assert(fd >= 0); > > > > > > ret = ftruncate(fd, length); > > > assert(ret == 0); > > > > > > fd1 = open("a", O_RDWR | O_CREAT, -1); > > > assert(fd1 >= 0); > > > > > > flags = 0xc00f;/*Journal data mode*/ > > > ret = ioctl(fd1, _IOW('f', 2, long), &flags); > > > assert(ret == 0); > > > > > > fd2 = open("a", O_RDWR | O_CREAT); > > > assert(fd2 >= 0); > > > > > > fd3 = open("a", O_TRUNC | O_NOATIME); > > > assert(fd3 >= 0); > > > > > > addr = mmap(NULL, length, 0xe, 0x28013, fd2, offset); > > > > Ugh, these mmap flags look pretty bogus. Were they generated by some > > fuzzer? > Yeah, generated by syzkaller. > > > > > assert(addr != (void *)-1); > > > memcpy(addr, str, 5); > > > > Also the O_TRUNC open above will truncate "a" to 0 so the mapping is > > actually beyond i_size and this memcpy should fail with SIGBUS. So I'm > > surprised your test program gets up to mbind()... > > We run the program parallel, sometimes will run as below: > > reproduce1 reproduce2 > > ... | ... > truncate to 4k | > change to journal data mode | > | memcpy(set page dirty) > truncate to 0: | > ext4_setattr: | > ... | > ext4_wait_for_tail_page_commit | > | mbind(trigger bug) > truncate_pagecache(clean dirty)| ... > ... | > Reproduce2 will mark page as dirty by memcpy, then mbind run between > ext4_wait_for_tail_page_commit and truncate_pagecache in ext4_setattr can > trigger the bug with page still be dirty but buffer head has been free. Aha! Thanks for explanation. Makes sense. So I agree with your patch but we also need to update the comment before the condition in ext4_wait_for_tail_page_commit(). Something like: If the page is fully truncated, we don't need to wait for any commit (and we even should not as __ext4_journalled_invalidatepage() may strip all buffers from the page but keep the page dirty which can then confuse e.g. concurrent ext4_writepage() seeing dirty page without buffers). Also we don't need to wait for any commit if all buffers in the page remain valid. This is most beneficial for the common case of blocksize == PAGE_SIZE. Honza > > > --- > > > fs/ext4/inode.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > > index 006b7a2070bf..a9943ae4f74d 100644 > > > --- a/fs/ext4/inode.c > > > +++ b/fs/ext4/inode.c > > > @@ -5479,7 +5479,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode) > > > * do. We do the check mainly to optimize the common PAGE_SIZE == > > > * blocksize case > > > */ > > > - if (offset > PAGE_SIZE - i_blocksize(inode)) > > > + if (!offset || offset > PAGE_SIZE - i_blocksize(inode)) > > > return; > > > while (1) { > > > page = find_lock_page(inode->i_mapping, -- Jan Kara SUSE Labs, CR