Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp346497ybe; Wed, 18 Sep 2019 18:36:22 -0700 (PDT) X-Google-Smtp-Source: APXvYqwhPVcYyPUupJFpK/uXGd2GWXii7q1t3ZKy1yYmFdp0D9acfaZLODUSlmbW9zd0xIdqbXDy X-Received: by 2002:aa7:df14:: with SMTP id c20mr13585990edy.133.1568856982683; Wed, 18 Sep 2019 18:36:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568856982; cv=none; d=google.com; s=arc-20160816; b=ZiOND6sIoxVcVPTnwDcz/o3z5FNbGQtHCymV9fWJNePRq6FsqCBeB5iOLAPGFH8Ny7 kmvNz7e5d6vUeeoTM/S5HiNJlvNa6uPdx7O1LqP8a79MrIsIJoNrRqQTDSkNZjg44pnN SCQYh8HEMEfHNcxr3eauVUsiApQXFiBzHf9OLtualHKssk2bdgWkoBbQbvOb4rOozVlZ XmfiIVVQCWRX2YJCpv/As2UbUYtH3jxssUsWLhGRYfgnVOVqRh+1a246izwDcRfSmKIQ TDPHWsWRJDhzqzmwHYPizLPQbSy1x89t+MQ/lBdMNKT/5cdH3n2i1hhZQsNuUXtAvNQZ AbvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=TRPRNSXu705BZiLqniL3yQ7LZIJzm+Ne056OHrr/ZoU=; b=Yixr0EE11ETfRiWWKLAFItM7t1k+MDS9HA2VwjLSQZXsMZe6Dd/K8m0S0gzPqWKpoa QVQxVbsSGja5XhlTOW3YIVRzSA7cIjccqCKT5IkVdShI1vtVYvv/V4f2TDrz6NkEtvFf atl9jvCoGafCjldvi8+gDfKmdGkWDy467YkflCu5XQfWnHoSotVxHDu/O6X2yy2Dmafd gK+S3qjpyud55xEBkii5PhUen1fVPbG3qdYEQYm8gzRpi9aC56x/x57BzKKt7civ/Vpz wDv5Ak1wG/tB4DW5veKcQIlJwxvDJ+Xo2saEC1XY2TZ9dPhLTmWm2huRaWmFY4ow5NoG HwJg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h53si4887080edh.147.2019.09.18.18.35.50; Wed, 18 Sep 2019 18:36:22 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387785AbfISB3g (ORCPT + 99 others); Wed, 18 Sep 2019 21:29:36 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:2730 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2387395AbfISB3f (ORCPT ); Wed, 18 Sep 2019 21:29:35 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id E4FEFF1993D3E79DF086; Thu, 19 Sep 2019 09:29:33 +0800 (CST) Received: from [127.0.0.1] (10.133.210.141) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.439.0; Thu, 19 Sep 2019 09:29:27 +0800 Subject: Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page_commit To: Jan Kara CC: , , , References: <20190917084814.40370-1-yangerkun@huawei.com> <20190918104535.GC25056@quack2.suse.cz> <71e28624-214c-f676-b215-19f78266de84@huawei.com> <20190918132719.GE31891@quack2.suse.cz> From: yangerkun Message-ID: Date: Thu, 19 Sep 2019 09:29:26 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: <20190918132719.GE31891@quack2.suse.cz> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.133.210.141] X-CFilter-Loop: Reflected Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On 2019/9/18 21:27, Jan Kara wrote: > On Wed 18-09-19 21:09:00, yangerkun wrote: >> On 2019/9/18 18:45, Jan Kara wrote: >>> On Tue 17-09-19 16:48:14, yangerkun wrote: >>>> No need to wait when offset equals to 0. And it will trigger a bug since >>>> the latter __ext4_journalled_invalidatepage can free the buffers but leave >>>> page still dirty. >>>> >>>> [ 26.057508] ------------[ cut here ]------------ >>>> [ 26.058531] kernel BUG at fs/ext4/inode.c:2134! >>>> ... >>>> [ 26.088130] Call trace: >>>> [ 26.088695] ext4_writepage+0x914/0xb28 >>>> [ 26.089541] writeout.isra.4+0x1b4/0x2b8 >>>> [ 26.090409] move_to_new_page+0x3b0/0x568 >>>> [ 26.091338] __unmap_and_move+0x648/0x988 >>>> [ 26.092241] unmap_and_move+0x48c/0xbb8 >>>> [ 26.093096] migrate_pages+0x220/0xb28 >>>> [ 26.093945] kernel_mbind+0x828/0xa18 >>>> [ 26.094791] __arm64_sys_mbind+0xc8/0x138 >>>> [ 26.095716] el0_svc_common+0x190/0x490 >>>> [ 26.096571] el0_svc_handler+0x60/0xd0 >>>> [ 26.097423] el0_svc+0x8/0xc >>>> >>>> Run below parallel can reproduce it easily(ext3): >>>> void main() >>>> { >>>> int fd, fd1, fd2, fd3, ret; >>>> void *addr; >>>> size_t length = 4096; >>>> int flags; >>>> off_t offset = 0; >>>> char *str = "12345"; >>>> >>>> fd = open("a", O_RDWR | O_CREAT); >>>> assert(fd >= 0); >>>> >>>> ret = ftruncate(fd, length); >>>> assert(ret == 0); >>>> >>>> fd1 = open("a", O_RDWR | O_CREAT, -1); >>>> assert(fd1 >= 0); >>>> >>>> flags = 0xc00f;/*Journal data mode*/ >>>> ret = ioctl(fd1, _IOW('f', 2, long), &flags); >>>> assert(ret == 0); >>>> >>>> fd2 = open("a", O_RDWR | O_CREAT); >>>> assert(fd2 >= 0); >>>> >>>> fd3 = open("a", O_TRUNC | O_NOATIME); >>>> assert(fd3 >= 0); >>>> >>>> addr = mmap(NULL, length, 0xe, 0x28013, fd2, offset); >>> >>> Ugh, these mmap flags look pretty bogus. Were they generated by some >>> fuzzer? >> Yeah, generated by syzkaller. >>> >>>> assert(addr != (void *)-1); >>>> memcpy(addr, str, 5); >>> >>> Also the O_TRUNC open above will truncate "a" to 0 so the mapping is >>> actually beyond i_size and this memcpy should fail with SIGBUS. So I'm >>> surprised your test program gets up to mbind()... >> >> We run the program parallel, sometimes will run as below: >> >> reproduce1 reproduce2 >> >> ... | ... >> truncate to 4k | >> change to journal data mode | >> | memcpy(set page dirty) >> truncate to 0: | >> ext4_setattr: | >> ... | >> ext4_wait_for_tail_page_commit | >> | mbind(trigger bug) >> truncate_pagecache(clean dirty)| ... >> ... | >> Reproduce2 will mark page as dirty by memcpy, then mbind run between >> ext4_wait_for_tail_page_commit and truncate_pagecache in ext4_setattr can >> trigger the bug with page still be dirty but buffer head has been free. > > Aha! Thanks for explanation. Makes sense. So I agree with your patch but we > also need to update the comment before the condition in > ext4_wait_for_tail_page_commit(). Something like: > > If the page is fully truncated, we don't need to wait for any commit (and > we even should not as __ext4_journalled_invalidatepage() may strip all > buffers from the page but keep the page dirty which can then confuse e.g. > concurrent ext4_writepage() seeing dirty page without buffers). Also we > don't need to wait for any commit if all buffers in the page remain valid. > This is most beneficial for the common case of blocksize == PAGE_SIZE. I will add this comment and reorganize the patch. Thanks a lot! > > Honza > >>>> --- >>>> fs/ext4/inode.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>> index 006b7a2070bf..a9943ae4f74d 100644 >>>> --- a/fs/ext4/inode.c >>>> +++ b/fs/ext4/inode.c >>>> @@ -5479,7 +5479,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode) >>>> * do. We do the check mainly to optimize the common PAGE_SIZE == >>>> * blocksize case >>>> */ >>>> - if (offset > PAGE_SIZE - i_blocksize(inode)) >>>> + if (!offset || offset > PAGE_SIZE - i_blocksize(inode)) >>>> return; >>>> while (1) { >>>> page = find_lock_page(inode->i_mapping,