Subject: Re: [PATCH v2] f2fs: fix out-of-free problem caused by atomic write
To: Chao Yu, Jaegeuk Kim
References: <1509027715-80477-1-git-send-email-yunlong.song@huawei.com>
 <1509368658-60355-1-git-send-email-yunlong.song@huawei.com>
 <20171103034618.GD11335@jaegeuk-macbookpro.roam.corp.google.com>
 <3ab54002-4164-1582-cb76-eb337e126ebf@huawei.com>
 <8bc47f71-2827-c80e-9099-d7a601e896be@kernel.org>
From: Yunlong Song
Date: Mon, 6 Nov 2017 09:34:10 +0800
In-Reply-To: <8bc47f71-2827-c80e-9099-d7a601e896be@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

So there is no connection between sbi->user_block_count - valid_user_blocks(sbi)
and fi->inmem_blocks. It is quite possible that sbi->user_block_count -
valid_user_blocks(sbi) is smaller than fi->inmem_blocks.

On 2017/11/3 23:23, Chao Yu wrote:
> On 2017/11/3 22:40, Yunlong Song wrote:
>> Test:
>> Newest kernel source code from f2fs-dev
>> 1G zram with f2fs
>> 8 threads doing atomic writes to the same file on zram
>> there are four kinds of atomic write at the same time:
>> 1 no atomic start, with atomic commit
>> 2 no atomic start, no atomic commit
>> 3 atomic start, with atomic commit
>> 4 atomic start, no atomic commit
>>
>> And I added dump_stack after the check as follows,
>> +	if ((sbi->user_block_count - valid_user_blocks(sbi)) <
>> +				fi->inmem_blocks) {
> valid_user_blocks contains fi->inmem_blocks and all reserved new node blocks?
>
> Thanks,
>
>> +		dump_stack();
>> +		err = -ENOSPC;
>> +		goto drop;
>> +	}
>>
>> then we have:
>>
>> [ 136.237247] F2FS-fs (zram1): Unexpected flush for atomic writes: ino=4, npages=8193
>> [ 136.952469] CPU: 1 PID: 1274 Comm: atomic_t2 Not tainted 4.14.0-rc4+ #109
>> [ 136.952947] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>> [ 136.953162] Call Trace:
>> [ 136.953162] dump_stack+0x4d/0x6e
>> [ 136.953162] commit_inmem_pages+0x258/0x270
>> [ 136.953162] ? __sb_start_write+0x48/0x80
>> [ 136.953162] ? __mnt_want_write_file+0x18/0x30
>> [ 136.953162] f2fs_ioctl+0x1025/0x1e30
>> [ 136.953162] ? up_write+0x25/0x30
>> [ 136.953162] ? f2fs_file_write_iter+0xf3/0x1e0
>> [ 136.953162] ? selinux_file_ioctl+0x114/0x1e0
>> [ 136.953162] do_vfs_ioctl+0x96/0x5a0
>> [ 136.953162] SyS_ioctl+0x79/0x90
>> [ 136.953162] ? SyS_lseek+0x87/0xb0
>> [ 136.953162] entry_SYSCALL_64_fastpath+0x13/0x94
>> [ 136.953162] RIP: 0033:0x434b97
>> [ 136.953162] RSP: 002b:00007ffc68859de8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>> [ 136.953162] RAX: ffffffffffffffda RBX: 00000000006b78e0 RCX: 0000000000434b97
>> [ 136.953162] RDX: 00000000006b70e8 RSI: 000000000000f502 RDI: 0000000000000003
>> [ 136.953162] RBP: 0000000002000010 R08: 00000000006b70e8 R09: 00000000006b7160
>> [ 136.953162] R10: 0000000000000022 R11: 0000000000000202 R12: 00007f491a1c4010
>> [ 136.953162] R13: 0000000002001000 R14: 0000000002000000 R15: 00000000006b7938
>>
>> So I think we should add the check code.
>>
>> On 2017/11/3 12:48, Yunlong Song wrote:
>>> Because I found that it still leads to the out-of-free problem without that check.
>>> I traced it and found that the number of data pages being committed for the atomic
>>> file can be bigger than sbi->user_block_count - valid_user_blocks(sbi), so I added
>>> this check.
>>>
>>> On 2017/11/3 11:46, Jaegeuk Kim wrote:
>>>> On 10/30, Yunlong Song wrote:
>>>>> f2fs_balance_fs only activates once in commit_inmem_pages, but there
>>>>> is more than one page to commit, so all the other pages will miss the
>>>>> check. This will lead to an out-of-free problem when committing a very
>>>>> large file. However, we cannot do f2fs_balance_fs for each inmem page,
>>>>> since this will break atomicity. As a result, we should collect prefree
>>>>> segments if needed and stop atomic commit when there are not enough
>>>>> available blocks to write atomic pages.
>>>>>
>>>>> Signed-off-by: Yunlong Song
>>>>> ---
>>>>>  fs/f2fs/f2fs.h    |  1 +
>>>>>  fs/f2fs/segment.c | 29 ++++++++++++++++++++++++++++-
>>>>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index 13a96b8..04ce48f 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -610,6 +610,7 @@ struct f2fs_inode_info {
>>>>>  	struct list_head inmem_pages;	/* inmemory pages managed by f2fs */
>>>>>  	struct task_struct *inmem_task;	/* store inmemory task */
>>>>>  	struct mutex inmem_lock;	/* lock for inmemory pages */
>>>>> +	unsigned long inmem_blocks;	/* inmemory blocks */
>>>>>  	struct extent_tree *extent_tree;	/* cached extent_tree entry */
>>>>>  	struct rw_semaphore dio_rwsem[2];/* avoid racing between dio and gc */
>>>>>  	struct rw_semaphore i_mmap_sem;
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 46dfbca..813c110 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -210,6 +210,7 @@ void register_inmem_page(struct inode *inode, struct page *page)
>>>>>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
>>>>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>>  	inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
>>>>> +	fi->inmem_blocks++;
>>>>>  	mutex_unlock(&fi->inmem_lock);
>>>>>  	trace_f2fs_register_inmem_page(page, INMEM);
>>>>> @@ -221,6 +222,7 @@ static int __revoke_inmem_pages(struct inode *inode,
>>>>>  	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>>>>  	struct inmem_pages *cur, *tmp;
>>>>>  	int err = 0;
>>>>> +	struct f2fs_inode_info *fi = F2FS_I(inode);
>>>>>  	list_for_each_entry_safe(cur, tmp, head, list) {
>>>>>  		struct page *page = cur->page;
>>>>> @@ -263,6 +265,7 @@ static int __revoke_inmem_pages(struct inode *inode,
>>>>>  		list_del(&cur->list);
>>>>>  		kmem_cache_free(inmem_entry_slab, cur);
>>>>>  		dec_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
>>>>> +		fi->inmem_blocks--;
>>>>>  	}
>>>>>  	return err;
>>>>>  }
>>>>> @@ -302,6 +305,10 @@ void drop_inmem_pages(struct inode *inode)
>>>>>  	if (!list_empty(&fi->inmem_ilist))
>>>>>  		list_del_init(&fi->inmem_ilist);
>>>>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>> +	if (fi->inmem_blocks) {
>>>>> +		f2fs_bug_on(sbi, 1);
>>>>> +		fi->inmem_blocks = 0;
>>>>> +	}
>>>>>  	mutex_unlock(&fi->inmem_lock);
>>>>>  	clear_inode_flag(inode, FI_ATOMIC_FILE);
>>>>> @@ -326,6 +333,7 @@ void drop_inmem_page(struct inode *inode, struct page *page)
>>>>>  	f2fs_bug_on(sbi, !cur || cur->page != page);
>>>>>  	list_del(&cur->list);
>>>>> +	fi->inmem_blocks--;
>>>>>  	mutex_unlock(&fi->inmem_lock);
>>>>>  	dec_page_count(sbi, F2FS_INMEM_PAGES);
>>>>> @@ -410,11 +418,26 @@ int commit_inmem_pages(struct inode *inode)
>>>>>  	INIT_LIST_HEAD(&revoke_list);
>>>>>  	f2fs_balance_fs(sbi, true);
>>>>> +	if (prefree_segments(sbi)
>>>>> +		&& has_not_enough_free_secs(sbi, 0,
>>>>> +		fi->inmem_blocks / BLKS_PER_SEC(sbi))) {
>>>>> +		struct cp_control cpc;
>>>>> +
>>>>> +		cpc.reason = __get_cp_reason(sbi);
>>>>> +		err = write_checkpoint(sbi, &cpc);
>>>>> +		if (err)
>>>>> +			goto drop;
>>>>> +	}
>>>>>  	f2fs_lock_op(sbi);
>>>>>  	set_inode_flag(inode, FI_ATOMIC_COMMIT);
>>>>>  	mutex_lock(&fi->inmem_lock);
>>>>> +	if ((sbi->user_block_count - valid_user_blocks(sbi)) <
>>>> What does this mean? We already allocated blocks successfully?
>>>>
>>>>> +			fi->inmem_blocks) {
>>>>> +		err = -ENOSPC;
>>>>> +		goto drop;
>>>>> +	}
>>>>>  	err = __commit_inmem_pages(inode, &revoke_list);
>>>>>  	if (err) {
>>>>>  		int ret;
>>>>> @@ -429,7 +452,7 @@ int commit_inmem_pages(struct inode *inode)
>>>>>  		ret = __revoke_inmem_pages(inode, &revoke_list, false, true);
>>>>>  		if (ret)
>>>>>  			err = ret;
>>>>> -
>>>>> +drop:
>>>>>  		/* drop all uncommitted pages */
>>>>>  		__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
>>>>>  	}
>>>>> @@ -437,6 +460,10 @@ int commit_inmem_pages(struct inode *inode)
>>>>>  	if (!list_empty(&fi->inmem_ilist))
>>>>>  		list_del_init(&fi->inmem_ilist);
>>>>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>> +	if (fi->inmem_blocks) {
>>>>> +		f2fs_bug_on(sbi, 1);
>>>>> +		fi->inmem_blocks = 0;
>>>>> +	}
>>>>>  	mutex_unlock(&fi->inmem_lock);
>>>>>  	clear_inode_flag(inode, FI_ATOMIC_COMMIT);
>>>>> --
>>>>> 1.8.5.2
>>>> .
>>>>
> .
>
--
Thanks,
Yunlong Song