Received: by 10.223.164.221 with SMTP id h29csp195850wrb; Fri, 3 Nov 2017 07:43:12 -0700 (PDT) X-Google-Smtp-Source: ABhQp+RADJzAuHUxSUyNLRNB7H+SO0crsxg+HQItRjedSMMX8LCvavlT+FC/uDFtxD5q6gq9bIRx X-Received: by 10.101.73.7 with SMTP id p7mr7543915pgs.106.1509720192328; Fri, 03 Nov 2017 07:43:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1509720192; cv=none; d=google.com; s=arc-20160816; b=USYxKoTz0M1b5Y9LCdeV/qIqBSYbDMrvSOhQJ2zs7+8mXwocTht8V5SZJIgf2MkTUo U9kYdtU113f6kv5w5xisfJ5TqptfmDeCM9MHvpa21PNeiYYzAfpnZAABQFThv/na3TkY +yiqj035Ll1g6W6EL68EgQ2sCM6+D813VRjdKGxWr/8JaECg7KIhDx49BW0g7nSQYrqU HXa7WdO4m2uYx9D/RSmRrspjKzY3SnN7CqEpiNetLhdsK5fKrQEofFEFdhGSIA7BfzoW K5A/eRQWa1R55lcQgb/mEsB1qR+Ws2d1Qu4RkyaT8DBokrxyrwr+gOeMm8GbanChiYn9 eUPA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:from:subject:arc-authentication-results; bh=JzHMVKniyRDvQzdcvgvYbl7WhF988IAflwRuqDt/uq4=; b=fWCCeKhJNGBFiypokd5pPxh8ZQcVntrgetUNGpN/RYUxp0On2fSCCX/I4Bqnym+x4g Mz16YxfoGdgP0PRPGLXdC24TiiSPJAeMofvv8nDbtmSeGcN2ORFDet+uR044F7Fw6TMC +F/hgL42AcysVD+LDADdOarXovFt7DhpUGRDkgcyovYWIpcPbb+Q4z6+fwLVdLkyxNWR hY4odeSxu6oUXrufbpSYadioGzpaLwhLByYc6OHoDjQQADHJ0RWVM8J7T/tJJxqRm5Xt rcxgZUzKborVxv+koqBn3T17a4GVeLNV2gTlBGea57HW2E2T40tKGn3PBXXqehXpXyy8 +OlQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 3si4952497plz.564.2017.11.03.07.42.58; Fri, 03 Nov 2017 07:43:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752302AbdKCOmW (ORCPT + 95 others); Fri, 3 Nov 2017 10:42:22 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:49961 "EHLO huawei.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751370AbdKCOmU (ORCPT ); Fri, 3 Nov 2017 10:42:20 -0400 Received: from DGGEMS403-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 9810AEB0B5BB0; Fri, 3 Nov 2017 22:42:05 +0800 (CST) Received: from [127.0.0.1] (10.111.220.140) by DGGEMS403-HUB.china.huawei.com (10.3.19.203) with Microsoft SMTP Server id 14.3.361.1; Fri, 3 Nov 2017 22:42:00 +0800 Subject: Re: [PATCH v2] f2fs: fix out-of-free problem caused by atomic write From: Yunlong Song To: Jaegeuk Kim CC: , , , , , , , References: <1509027715-80477-1-git-send-email-yunlong.song@huawei.com> <1509368658-60355-1-git-send-email-yunlong.song@huawei.com> <20171103034618.GD11335@jaegeuk-macbookpro.roam.corp.google.com> Message-ID: <3ab54002-4164-1582-cb76-eb337e126ebf@huawei.com> Date: Fri, 3 Nov 2017 22:40:42 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Originating-IP: [10.111.220.140] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Test: Newest kernel source code from f2fs-dev 1G zram with f2fs 8 threads to atomic write one same file on zram there are four kinds of atomic write at the same time: 1 no atomic start, with atomic commit 2 no atomic start, no atomic commit 3 atomic start, with atomic commit 4 atomic start, no atomic commit And I add dump_stack after the check as following, + if ((sbi->user_block_count - valid_user_blocks(sbi)) < + fi->inmem_blocks) { + dump_stack(); + err = -ENOSPC; + goto drop; + } then we have: [ 136.237247] F2FS-fs (zram1): Unexpected flush for atomic writes: ino=4, npages=8193 [ 136.952469] CPU: 1 PID: 1274 Comm: atomic_t2 Not tainted 4.14.0-rc4+ #109 [ 136.952947] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [ 136.953162] Call Trace: [ 136.953162] dump_stack+0x4d/0x6e [ 136.953162] commit_inmem_pages+0x258/0x270 [ 136.953162] ? __sb_start_write+0x48/0x80 [ 136.953162] ? __mnt_want_write_file+0x18/0x30 [ 136.953162] f2fs_ioctl+0x1025/0x1e30 [ 136.953162] ? up_write+0x25/0x30 [ 136.953162] ? f2fs_file_write_iter+0xf3/0x1e0 [ 136.953162] ? selinux_file_ioctl+0x114/0x1e0 [ 136.953162] do_vfs_ioctl+0x96/0x5a0 [ 136.953162] SyS_ioctl+0x79/0x90 [ 136.953162] ? SyS_lseek+0x87/0xb0 [ 136.953162] entry_SYSCALL_64_fastpath+0x13/0x94 [ 136.953162] RIP: 0033:0x434b97 [ 136.953162] RSP: 002b:00007ffc68859de8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 136.953162] RAX: ffffffffffffffda RBX: 00000000006b78e0 RCX: 0000000000434b97 [ 136.953162] RDX: 00000000006b70e8 RSI: 000000000000f502 RDI: 0000000000000003 [ 136.953162] RBP: 0000000002000010 R08: 00000000006b70e8 R09: 00000000006b7160 [ 136.953162] R10: 0000000000000022 R11: 0000000000000202 R12: 00007f491a1c4010 [ 136.953162] R13: 0000000002001000 R14: 0000000002000000 R15: 00000000006b7938 So I think we should add the check code. On 2017/11/3 12:48, Yunlong Song wrote: > Because I found that it will still lead to out-of-free problem with > out that check. > I trace and find that it is possible that the committing date pages of > the atomic > file is bigger than the sbi->user_block_count - > valid_user_blocks(sbi), so I add > this check. > > On 2017/11/3 11:46, Jaegeuk Kim wrote: >> On 10/30, Yunlong Song wrote: >>> f2fs_balance_fs only actives once in the commit_inmem_pages, but there >>> are more than one page to commit, so all the other pages will miss the >>> check. This will lead to out-of-free problem when commit a very large >>> file. However, we cannot do f2fs_balance_fs for each inmem page, since >>> this will break atomicity. As a result, we should collect prefree >>> segments if needed and stop atomic commit when there are not enough >>> available blocks to write atomic pages. >>> >>> Signed-off-by: Yunlong Song >>> --- >>> fs/f2fs/f2fs.h | 1 + >>> fs/f2fs/segment.c | 29 ++++++++++++++++++++++++++++- >>> 2 files changed, 29 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h >>> index 13a96b8..04ce48f 100644 >>> --- a/fs/f2fs/f2fs.h >>> +++ b/fs/f2fs/f2fs.h >>> @@ -610,6 +610,7 @@ struct f2fs_inode_info { >>> struct list_head inmem_pages; /* inmemory pages managed by >>> f2fs */ >>> struct task_struct *inmem_task; /* store inmemory task */ >>> struct mutex inmem_lock; /* lock for inmemory pages */ >>> + unsigned long inmem_blocks; /* inmemory blocks */ >>> struct extent_tree *extent_tree; /* cached extent_tree >>> entry */ >>> struct rw_semaphore dio_rwsem[2];/* avoid racing between dio >>> and gc */ >>> struct rw_semaphore i_mmap_sem; >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c >>> index 46dfbca..813c110 100644 >>> --- a/fs/f2fs/segment.c >>> +++ b/fs/f2fs/segment.c >>> @@ -210,6 +210,7 @@ void register_inmem_page(struct inode *inode, >>> struct page *page) >>> list_add_tail(&fi->inmem_ilist, >>> &sbi->inode_list[ATOMIC_FILE]); >>> spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); >>> inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES); >>> + fi->inmem_blocks++; >>> mutex_unlock(&fi->inmem_lock); >>> trace_f2fs_register_inmem_page(page, INMEM); >>> @@ -221,6 +222,7 @@ static int __revoke_inmem_pages(struct inode >>> *inode, >>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>> struct inmem_pages *cur, *tmp; >>> int err = 0; >>> + struct f2fs_inode_info *fi = F2FS_I(inode); >>> list_for_each_entry_safe(cur, tmp, head, list) { >>> struct page *page = cur->page; >>> @@ -263,6 +265,7 @@ static int __revoke_inmem_pages(struct inode >>> *inode, >>> list_del(&cur->list); >>> kmem_cache_free(inmem_entry_slab, cur); >>> dec_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES); >>> + fi->inmem_blocks--; >>> } >>> return err; >>> } >>> @@ -302,6 +305,10 @@ void drop_inmem_pages(struct inode *inode) >>> if (!list_empty(&fi->inmem_ilist)) >>> list_del_init(&fi->inmem_ilist); >>> spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); >>> + if (fi->inmem_blocks) { >>> + f2fs_bug_on(sbi, 1); >>> + fi->inmem_blocks = 0; >>> + } >>> mutex_unlock(&fi->inmem_lock); >>> clear_inode_flag(inode, FI_ATOMIC_FILE); >>> @@ -326,6 +333,7 @@ void drop_inmem_page(struct inode *inode, struct >>> page *page) >>> f2fs_bug_on(sbi, !cur || cur->page != page); >>> list_del(&cur->list); >>> + fi->inmem_blocks--; >>> mutex_unlock(&fi->inmem_lock); >>> dec_page_count(sbi, F2FS_INMEM_PAGES); >>> @@ -410,11 +418,26 @@ int commit_inmem_pages(struct inode *inode) >>> INIT_LIST_HEAD(&revoke_list); >>> f2fs_balance_fs(sbi, true); >>> + if (prefree_segments(sbi) >>> + && has_not_enough_free_secs(sbi, 0, >>> + fi->inmem_blocks / BLKS_PER_SEC(sbi))) { >>> + struct cp_control cpc; >>> + >>> + cpc.reason = __get_cp_reason(sbi); >>> + err = write_checkpoint(sbi, &cpc); >>> + if (err) >>> + goto drop; >>> + } >>> f2fs_lock_op(sbi); >>> set_inode_flag(inode, FI_ATOMIC_COMMIT); >>> mutex_lock(&fi->inmem_lock); >>> + if ((sbi->user_block_count - valid_user_blocks(sbi)) < >> What does this mean? We already allocated blocks successfully? >> >>> + fi->inmem_blocks) { >>> + err = -ENOSPC; >>> + goto drop; >>> + } >>> err = __commit_inmem_pages(inode, &revoke_list); >>> if (err) { >>> int ret; >>> @@ -429,7 +452,7 @@ int commit_inmem_pages(struct inode *inode) >>> ret = __revoke_inmem_pages(inode, &revoke_list, false, true); >>> if (ret) >>> err = ret; >>> - >>> +drop: >>> /* drop all uncommitted pages */ >>> __revoke_inmem_pages(inode, &fi->inmem_pages, true, false); >>> } >>> @@ -437,6 +460,10 @@ int commit_inmem_pages(struct inode *inode) >>> if (!list_empty(&fi->inmem_ilist)) >>> list_del_init(&fi->inmem_ilist); >>> spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); >>> + if (fi->inmem_blocks) { >>> + f2fs_bug_on(sbi, 1); >>> + fi->inmem_blocks = 0; >>> + } >>> mutex_unlock(&fi->inmem_lock); >>> clear_inode_flag(inode, FI_ATOMIC_COMMIT); >>> -- >>> 1.8.5.2 >> . >> > -- Thanks, Yunlong Song From 1583019408252418143@xxx Fri Nov 03 04:55:52 +0000 2017 X-GM-THRID: 1582330312060990093 X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread