Hello,
Testing f2fs (of linux-3.13) under fault simulation, we detected
umount() hangs up after
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
f2fs_init_acl()->
f2fs_get_acl()->
f2fs_getxattr()->
read_all_xattrs() fails.
Also there was a BUG_ON triggered after the fault in
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
remove_inode_page() ->
f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);
In this case there are 2 blocks allocated, which are counted with
inode->i_blocks field:
static struct page *init_inode_metadata(struct inode *inode,
struct inode *dir, const struct qstr *name)
{
struct page *page;
int err;
if (is_inode_flag_set(F2FS_I(inode), FI_NEW_INODE)) {
page = new_inode_page(inode, name); <----- First block is
allocated for the inode
if (IS_ERR(page))
return page;
if (S_ISDIR(inode->i_mode)) {
err = make_empty_dir(inode, dir, page); <----- Second block
is allocated for the inode
if (err)
goto error;
}
err = f2fs_init_acl(inode, dir, page); <----- This call returns
error
if (err)
goto error;
...
} else {
...
}
init_dent_inode(name, page);
/*
* This file should be checkpointed during fsync.
* We lost i_pino from now on.
*/
if (is_inode_flag_set(F2FS_I(inode), FI_INC_LINK)) {
file_lost_pino(inode);
inc_nlink(inode);
}
return page;
error:
f2fs_put_page(page, 1);
remove_inode_page(inode); <----- Expects at most one block
allocated for inode
return ERR_PTR(err);
}
Found by Linux File System Verification project (linuxtesting.org).
Here are some additional details.
F2fs-related kernel configuration is:
CONFIG_F2FS_FS=m
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_FS_SECURITY=y
CONFIG_F2FS_CHECK_FS=y
BUG_ON log:
[ 117.863869] kernel BUG at fs/f2fs/node.c:825!
[ 117.863870] invalid opcode: 0000 [#1] SMP
[ 117.863872] Modules linked in: f2fs kedr_fsim_indicator_common(OF)
kedr_fsim_indicator_capable(OF)
kedr_fsim_indicator_kmalloc(OF) kedr_fsim_vmm(OF) kedr_fsim_mem_util(OF)
kedr_fsim_capable(OF)
kedr_fsim_uaccess(OF) kedr_fsim_cmm(OF) kedr_fault_simulation(OF)
kedr(OF) fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack
ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw microcode i2c_piix4 parport_pc
e1000 i2c_core parport ata_generic pata_acpi [last unloaded: kedr]
[ 117.863894] CPU: 0 PID: 2766 Comm: fs-driver-tests Tainted:
GF O 3.13.0fs #2
[ 117.863895] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 117.863896] task: ffff8800001b6420 ti: ffff8800111ac000 task.ti:
ffff8800111ac000
[ 117.863897] RIP: 0010:[<ffffffffa01e03bf>] [<ffffffffa01e03bf>]
remove_inode_page+0xbf/0xd0 [f2fs]
[ 117.863901] RSP: 0018:ffff8800111ade08 EFLAGS: 00010202
[ 117.863902] RAX: 0000000000000000 RBX: ffff8800125a0c30 RCX:
ffff8800001b6a78
[ 117.863903] RDX: ffff880000000000 RSI: ffffea00004b9780 RDI:
ffff8800125a0c30
[ 117.863903] RBP: ffff8800111ade50 R08: 0000000000000001 R09:
0000000000000000
[ 117.863904] R10: 0000000000000006 R11: 000000000000000f R12:
ffffea00004b9780
[ 117.863905] R13: 0000000000000004 R14: ffffea00004b9780 R15:
ffffea00004b97c0
[ 117.863907] FS: 00007ff92dc9f740(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000
[ 117.863908] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 117.863909] CR2: 00007f6673415000 CR3: 000000003c628000 CR4:
00000000000006f0
[ 117.863913] Stack:
[ 117.863913] 00000000000041ff ffffea00004b97c0 ffff8800111ade50
ffffffffa01e9ffa
[ 117.863915] ffffea00004b9780 ffff880000000000 ffff880012e5d000
0000000000000003
[ 117.863917] ffffea00004b9740 ffff8800111adec0 ffffffffa01d0d20
ffff880012532948
[ 117.863919] Call Trace:
[ 117.863922] [<ffffffffa01e9ffa>] ? f2fs_init_acl+0x10a/0x180 [f2fs]
[ 117.863925] [<ffffffffa01d0d20>] __f2fs_add_link+0x560/0x7c0 [f2fs]
[ 117.863928] [<ffffffffa01d3b8b>] f2fs_mkdir+0xbb/0x150 [f2fs]
[ 117.863929] [<ffffffff811cf4c7>] vfs_mkdir+0xb7/0x160
[ 117.863931] [<ffffffff811d010f>] SyS_mkdir+0x5f/0xc0
[ 117.863933] [<ffffffff8165bf29>] system_call_fastpath+0x16/0x1b
[ 117.863934] Code: e6 49 8b 14 24 83 e2 01 74 23 4c 89 e7 89 45 bc e8
f7 f9 f6 e0 4c 89 e7 e8 ff d1 f7
e0 8b 45 bc 48 83 c4 30 5b 41 5c 41 5d 5d c3 <0f> 0b e8 c8 9d 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[ 117.863953] RIP [<ffffffffa01e03bf>] remove_inode_page+0xbf/0xd0 [f2fs]
[ 117.863956] RSP <ffff8800111ade08>
There are several ways for simulate faults in kernel function calls,
e.g. manual faults insertion,
fault injection kernel infrustructure.
In my tests I use KEDR framework (http://forge.ispras.ru/projects/kedr)
for that purpose,
that is why kedr* modules are in the log above.
Call stack of the fault simulated is:
[ 117.863789] KEDR FAULT SIMULATION: forcing a failure
[ 117.863792] CPU: 0 PID: 2766 Comm: fs-driver-tests Tainted:
GF O 3.13.0fs #2
[ 117.863794] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 117.863795] ffff8800111adce8 ffff8800111adcb0 ffffffff8164bd96
0000000000000001
[ 117.863797] ffff8800111adcd8 ffffffffa018ace2 ffffffffa018ac45
ffffffffa01e8b54
[ 117.863799] 0000000000001000 ffff8800111add18 ffffffffa0194816
ffffffffa01e8b54
[ 117.863801] Call Trace:
[ 117.863806] [<ffffffff8164bd96>] dump_stack+0x45/0x56
[ 117.863810] [<ffffffffa018ace2>] kedr_fsim_point_simulate+0xa2/0xb0
[kedr_fault_simulation]
[ 117.863812] [<ffffffffa018ac45>] ? kedr_fsim_point_simulate+0x5/0xb0
[kedr_fault_simulation]
[ 117.863816] [<ffffffffa01e8b54>] ? read_all_xattrs+0x3c4/0x3e0 [f2fs]
[ 117.863819] [<ffffffffa0194816>] kedr_repl___kmalloc+0x36/0x80
[kedr_fsim_cmm]
[ 117.863822] [<ffffffffa01e8b54>] ? read_all_xattrs+0x3c4/0x3e0 [f2fs]
[ 117.863824] [<ffffffffa01955d2>]
kedr_intermediate_func___kmalloc+0x72/0xd0 [kedr_fsim_cmm]
[ 117.863826] [<ffffffffa01e8b54>] ? read_all_xattrs+0x3c4/0x3e0 [f2fs]
[ 117.863829] [<ffffffffa01e8b54>] read_all_xattrs+0x3c4/0x3e0 [f2fs]
[ 117.863832] [<ffffffffa01e91d4>] f2fs_getxattr+0x44/0xf0 [f2fs]
[ 117.863835] [<ffffffffa01e9ba9>] f2fs_get_acl+0xe9/0x390 [f2fs]
[ 117.863838] [<ffffffffa01d99c9>] ? set_dirty_dir_page+0xb9/0xe0 [f2fs]
[ 117.863841] [<ffffffffa01e9ffa>] f2fs_init_acl+0x10a/0x180 [f2fs]
[ 117.863843] [<ffffffffa01d0ccc>] __f2fs_add_link+0x50c/0x7c0 [f2fs]
[ 117.863846] [<ffffffffa01d3b8b>] f2fs_mkdir+0xbb/0x150 [f2fs]
[ 117.863849] [<ffffffff811cf4c7>] vfs_mkdir+0xb7/0x160
[ 117.863851] [<ffffffff811d010f>] SyS_mkdir+0x5f/0xc0
[ 117.863854] [<ffffffff8165bf29>] system_call_fastpath+0x16/0x1b
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web: http://linuxtesting.org
---
Это сообщение свободно от вирусов и вредоносного ПО благодаря защите от вирусов avast!
http://www.avast.com
Hi,
Thank you for the test and valuable report.
This bug was fixed recently by:
commit 03dea3129d558bf5293a6e9f12777176619ac876
Author: Jaegeuk Kim <[email protected]>
Date: Wed Feb 5 11:16:39 2014 +0900
f2fs: fix to truncate dentry pages in the error case
You can find that from the tree:
git://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git/log/?h=dev
Thanks,
2014-02-06 (목), 09:43 +0400, Andrey Tsyvarev:
> Hello,
>
> Testing f2fs (of linux-3.13) under fault simulation, we detected
> umount() hangs up after
> mkdir()->
> f2fs_add_link()->
> init_inode_metadata()->
> f2fs_init_acl()->
> f2fs_get_acl()->
> f2fs_getxattr()->
> read_all_xattrs() fails.
>
> Also there was a BUG_ON triggered after the fault in
> mkdir()->
> f2fs_add_link()->
> init_inode_metadata()->
> remove_inode_page() ->
> f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);
>
> In this case there are 2 blocks allocated, which are counted with
> inode->i_blocks field:
>
> static struct page *init_inode_metadata(struct inode *inode,
> struct inode *dir, const struct qstr *name)
> {
> struct page *page;
> int err;
>
> if (is_inode_flag_set(F2FS_I(inode), FI_NEW_INODE)) {
> page = new_inode_page(inode, name); <----- First block is
> allocated for the inode
> if (IS_ERR(page))
> return page;
>
> if (S_ISDIR(inode->i_mode)) {
> err = make_empty_dir(inode, dir, page); <----- Second block
> is allocated for the inode
> if (err)
> goto error;
> }
>
> err = f2fs_init_acl(inode, dir, page); <----- This call returns
> error
> if (err)
> goto error;
> ...
> } else {
> ...
> }
>
> init_dent_inode(name, page);
>
> /*
> * This file should be checkpointed during fsync.
> * We lost i_pino from now on.
> */
> if (is_inode_flag_set(F2FS_I(inode), FI_INC_LINK)) {
> file_lost_pino(inode);
> inc_nlink(inode);
> }
> return page;
>
> error:
> f2fs_put_page(page, 1);
> remove_inode_page(inode); <----- Expects at most one block
> allocated for inode
> return ERR_PTR(err);
> }
>
> Found by Linux File System Verification project (linuxtesting.org).
>
>
> Here are some additional details.
>
> F2fs-related kernel configuration is:
> CONFIG_F2FS_FS=m
> CONFIG_F2FS_STAT_FS=y
> CONFIG_F2FS_FS_XATTR=y
> CONFIG_F2FS_FS_POSIX_ACL=y
> CONFIG_F2FS_FS_SECURITY=y
> CONFIG_F2FS_CHECK_FS=y
>
>
> BUG_ON log:
> [ 117.863869] kernel BUG at fs/f2fs/node.c:825!
> [ 117.863870] invalid opcode: 0000 [#1] SMP
> [ 117.863872] Modules linked in: f2fs kedr_fsim_indicator_common(OF)
> kedr_fsim_indicator_capable(OF)
> kedr_fsim_indicator_kmalloc(OF) kedr_fsim_vmm(OF) kedr_fsim_mem_util(OF)
> kedr_fsim_capable(OF)
> kedr_fsim_uaccess(OF) kedr_fsim_cmm(OF) kedr_fault_simulation(OF)
> kedr(OF) fuse nf_conntrack_netbios_ns
> nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack
> ebtable_nat ebtable_broute bridge stp llc
> ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
> nf_nat_ipv6 ip6table_mangle
> ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
> iptable_raw microcode i2c_piix4 parport_pc
> e1000 i2c_core parport ata_generic pata_acpi [last unloaded: kedr]
> [ 117.863894] CPU: 0 PID: 2766 Comm: fs-driver-tests Tainted:
> GF O 3.13.0fs #2
> [ 117.863895] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> [ 117.863896] task: ffff8800001b6420 ti: ffff8800111ac000 task.ti:
> ffff8800111ac000
> [ 117.863897] RIP: 0010:[<ffffffffa01e03bf>] [<ffffffffa01e03bf>]
> remove_inode_page+0xbf/0xd0 [f2fs]
> [ 117.863901] RSP: 0018:ffff8800111ade08 EFLAGS: 00010202
> [ 117.863902] RAX: 0000000000000000 RBX: ffff8800125a0c30 RCX:
> ffff8800001b6a78
> [ 117.863903] RDX: ffff880000000000 RSI: ffffea00004b9780 RDI:
> ffff8800125a0c30
> [ 117.863903] RBP: ffff8800111ade50 R08: 0000000000000001 R09:
> 0000000000000000
> [ 117.863904] R10: 0000000000000006 R11: 000000000000000f R12:
> ffffea00004b9780
> [ 117.863905] R13: 0000000000000004 R14: ffffea00004b9780 R15:
> ffffea00004b97c0
> [ 117.863907] FS: 00007ff92dc9f740(0000) GS:ffff88003fc00000(0000)
> knlGS:0000000000000000
> [ 117.863908] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 117.863909] CR2: 00007f6673415000 CR3: 000000003c628000 CR4:
> 00000000000006f0
> [ 117.863913] Stack:
> [ 117.863913] 00000000000041ff ffffea00004b97c0 ffff8800111ade50
> ffffffffa01e9ffa
> [ 117.863915] ffffea00004b9780 ffff880000000000 ffff880012e5d000
> 0000000000000003
> [ 117.863917] ffffea00004b9740 ffff8800111adec0 ffffffffa01d0d20
> ffff880012532948
> [ 117.863919] Call Trace:
> [ 117.863922] [<ffffffffa01e9ffa>] ? f2fs_init_acl+0x10a/0x180 [f2fs]
> [ 117.863925] [<ffffffffa01d0d20>] __f2fs_add_link+0x560/0x7c0 [f2fs]
> [ 117.863928] [<ffffffffa01d3b8b>] f2fs_mkdir+0xbb/0x150 [f2fs]
> [ 117.863929] [<ffffffff811cf4c7>] vfs_mkdir+0xb7/0x160
> [ 117.863931] [<ffffffff811d010f>] SyS_mkdir+0x5f/0xc0
> [ 117.863933] [<ffffffff8165bf29>] system_call_fastpath+0x16/0x1b
> [ 117.863934] Code: e6 49 8b 14 24 83 e2 01 74 23 4c 89 e7 89 45 bc e8
> f7 f9 f6 e0 4c 89 e7 e8 ff d1 f7
> e0 8b 45 bc 48 83 c4 30 5b 41 5c 41 5d 5d c3 <0f> 0b e8 c8 9d 00 00 66
> 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> [ 117.863953] RIP [<ffffffffa01e03bf>] remove_inode_page+0xbf/0xd0 [f2fs]
> [ 117.863956] RSP <ffff8800111ade08>
>
> There are several ways for simulate faults in kernel function calls,
> e.g. manual faults insertion,
> fault injection kernel infrustructure.
> In my tests I use KEDR framework (http://forge.ispras.ru/projects/kedr)
> for that purpose,
> that is why kedr* modules are in the log above.
> Call stack of the fault simulated is:
>
> [ 117.863789] KEDR FAULT SIMULATION: forcing a failure
> [ 117.863792] CPU: 0 PID: 2766 Comm: fs-driver-tests Tainted:
> GF O 3.13.0fs #2
> [ 117.863794] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> [ 117.863795] ffff8800111adce8 ffff8800111adcb0 ffffffff8164bd96
> 0000000000000001
> [ 117.863797] ffff8800111adcd8 ffffffffa018ace2 ffffffffa018ac45
> ffffffffa01e8b54
> [ 117.863799] 0000000000001000 ffff8800111add18 ffffffffa0194816
> ffffffffa01e8b54
> [ 117.863801] Call Trace:
> [ 117.863806] [<ffffffff8164bd96>] dump_stack+0x45/0x56
> [ 117.863810] [<ffffffffa018ace2>] kedr_fsim_point_simulate+0xa2/0xb0
> [kedr_fault_simulation]
> [ 117.863812] [<ffffffffa018ac45>] ? kedr_fsim_point_simulate+0x5/0xb0
> [kedr_fault_simulation]
> [ 117.863816] [<ffffffffa01e8b54>] ? read_all_xattrs+0x3c4/0x3e0 [f2fs]
> [ 117.863819] [<ffffffffa0194816>] kedr_repl___kmalloc+0x36/0x80
> [kedr_fsim_cmm]
> [ 117.863822] [<ffffffffa01e8b54>] ? read_all_xattrs+0x3c4/0x3e0 [f2fs]
> [ 117.863824] [<ffffffffa01955d2>]
> kedr_intermediate_func___kmalloc+0x72/0xd0 [kedr_fsim_cmm]
> [ 117.863826] [<ffffffffa01e8b54>] ? read_all_xattrs+0x3c4/0x3e0 [f2fs]
> [ 117.863829] [<ffffffffa01e8b54>] read_all_xattrs+0x3c4/0x3e0 [f2fs]
> [ 117.863832] [<ffffffffa01e91d4>] f2fs_getxattr+0x44/0xf0 [f2fs]
> [ 117.863835] [<ffffffffa01e9ba9>] f2fs_get_acl+0xe9/0x390 [f2fs]
> [ 117.863838] [<ffffffffa01d99c9>] ? set_dirty_dir_page+0xb9/0xe0 [f2fs]
> [ 117.863841] [<ffffffffa01e9ffa>] f2fs_init_acl+0x10a/0x180 [f2fs]
> [ 117.863843] [<ffffffffa01d0ccc>] __f2fs_add_link+0x50c/0x7c0 [f2fs]
> [ 117.863846] [<ffffffffa01d3b8b>] f2fs_mkdir+0xbb/0x150 [f2fs]
> [ 117.863849] [<ffffffff811cf4c7>] vfs_mkdir+0xb7/0x160
> [ 117.863851] [<ffffffff811d010f>] SyS_mkdir+0x5f/0xc0
> [ 117.863854] [<ffffffff8165bf29>] system_call_fastpath+0x16/0x1b
>
> --
> Best regards,
> Andrey Tsyvarev
> Linux Verification Center, ISPRAS
> web: http://linuxtesting.org
>
> ---
> Это сообщение свободно от вирусов и вредоносного ПО благодаря защите от вирусов avast!
> http://www.avast.com
>
--
Jaegeuk Kim
Samsung
Hi,
06.02.2014 10:02, Jaegeuk Kim пишет:
> Hi,
>
> Thank you for the test and valuable report.
>
> This bug was fixed recently by:
>
> commit 03dea3129d558bf5293a6e9f12777176619ac876
> Author: Jaegeuk Kim <[email protected]>
> Date: Wed Feb 5 11:16:39 2014 +0900
>
> f2fs: fix to truncate dentry pages in the error case
Now remove_inode_page() succeed, but another assertion failed (tested on
revision e964751c):
[ 1272.747011] kernel BUG at fs/f2fs/inode.c:274!
[ 1272.747011] invalid opcode: 0000 [#1] SMP
[ 1272.747011] Modules linked in: f2fs kedr_fsim_indicator_common(OF)
kedr_fsim_indicator_capable(OF)
kedr_fsim_indicator_kmalloc(OF) kedr_fsim_vmm(OF) kedr_fsim_mem_util(OF)
kedr_fsim_capable(OF)
kedr_fsim_uaccess(OF) kedr_fsim_cmm(OF) kedr_fault_simulation(OF)
kedr(OF) fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack
ebtable_nat ebtable_broute bridge stp
llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw parport_pc i2c_piix4 e1000
i2c_core microcode parport ata_generic pata_acpi [last unloaded: kedr]
[ 1272.747011] CPU: 0 PID: 14613 Comm: fs-driver-tests Tainted: GF
W O 3.14.0-rc1fs #1
[ 1272.747011] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1272.747011] task: ffff88001e939190 ti: ffff88000d7ec000 task.ti:
ffff88000d7ec000
[ 1272.747011] RIP: 0010:[<ffffffffa01c74a8>] [<ffffffffa01c74a8>]
f2fs_evict_inode+0x178/0x180 [f2fs]
[ 1272.747011] RSP: 0018:ffff88000d7ede50 EFLAGS: 00010202
[ 1272.747011] RAX: 0000000000000001 RBX: ffff88000475cc30 RCX:
ffff88001e9398a0
[ 1272.747011] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
ffff88000475ce10
[ 1272.747011] RBP: ffff88000d7ede68 R08: 00000000ffffffff R09:
0000000000000000
[ 1272.747011] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff88000475cc30
[ 1272.747011] R13: ffff88000f147800 R14: ffffffffa01e7080 R15:
ffff88000f147b80
[ 1272.747011] FS: 00007f1795424740(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000
[ 1272.747011] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1272.747011] CR2: 00007fc33bfa9000 CR3: 000000000f14e000 CR4:
00000000000006f0
[ 1272.747011] Stack:
[ 1272.747011] ffff88000475cc30 ffff88000475cdc8 ffffffffa01e7080
ffff88000d7ede90
[ 1272.747011] ffffffff811fde03 ffff88000475cc30 ffff88000475ccb8
ffff88000f147000
[ 1272.747011] ffff88000d7edec0 ffffffff811fe615 ffff88000475cc30
ffff88000f147800
[ 1272.747011] Call Trace:
[ 1272.747011] [<ffffffff811fde03>] evict+0xa3/0x1a0
[ 1272.747011] [<ffffffff811fe615>] iput+0xf5/0x180
[ 1272.747011] [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
[ 1272.747011] [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
[ 1272.747011] [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
[ 1272.747011] [<ffffffff81680769>] system_call_fastpath+0x16/0x1b
[ 1272.747011] Code: 01 e1 4c 89 e7 e8 39 59 03 e1 5b 41 5c 41 5d 5d c3
31 c0 49 83 bc 24 c8 00 00 00 01 0f 97 c0
eb 8f 4c 89 e7 e8 fa ec ff ff eb 89 <0f> 0b 66 0f 1f 44 00 00 0f 1f 44
00 00 55 48 c7 c0 dc ff ff ff
[ 1272.747011] RIP [<ffffffffa01c74a8>] f2fs_evict_inode+0x178/0x180 [f2fs]
[ 1272.747011] RSP <ffff88000d7ede50>
Failed assertion claims that dirty dentries counter should be zero when
inode is deleted.
This counter is incremented by
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
make_empty_dir()->
set_page_dirty();
but no one decrement it.
May be, this should be done along with truncating directory inode in
error-path of init_inode_metadata() ?
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web: http://linuxtesting.org
Hi,
2014-02-06 (목), 16:17 +0400, Andrey Tsyvarev:
> Hi,
>
> 06.02.2014 10:02, Jaegeuk Kim пишет:
> > Hi,
> >
> > Thank you for the test and valuable report.
> >
> > This bug was fixed recently by:
> >
> > commit 03dea3129d558bf5293a6e9f12777176619ac876
> > Author: Jaegeuk Kim <[email protected]>
> > Date: Wed Feb 5 11:16:39 2014 +0900
> >
> > f2fs: fix to truncate dentry pages in the error case
> Now remove_inode_page() succeed, but another assertion failed (tested on
> revision e964751c):
>
> [ 1272.747011] kernel BUG at fs/f2fs/inode.c:274!
> [ 1272.747011] invalid opcode: 0000 [#1] SMP
> [ 1272.747011] Modules linked in: f2fs kedr_fsim_indicator_common(OF)
> kedr_fsim_indicator_capable(OF)
> kedr_fsim_indicator_kmalloc(OF) kedr_fsim_vmm(OF) kedr_fsim_mem_util(OF)
> kedr_fsim_capable(OF)
> kedr_fsim_uaccess(OF) kedr_fsim_cmm(OF) kedr_fault_simulation(OF)
> kedr(OF) fuse nf_conntrack_netbios_ns
> nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack
> ebtable_nat ebtable_broute bridge stp
> llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
> ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
> iptable_raw parport_pc i2c_piix4 e1000
> i2c_core microcode parport ata_generic pata_acpi [last unloaded: kedr]
> [ 1272.747011] CPU: 0 PID: 14613 Comm: fs-driver-tests Tainted: GF
> W O 3.14.0-rc1fs #1
> [ 1272.747011] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> [ 1272.747011] task: ffff88001e939190 ti: ffff88000d7ec000 task.ti:
> ffff88000d7ec000
> [ 1272.747011] RIP: 0010:[<ffffffffa01c74a8>] [<ffffffffa01c74a8>]
> f2fs_evict_inode+0x178/0x180 [f2fs]
> [ 1272.747011] RSP: 0018:ffff88000d7ede50 EFLAGS: 00010202
> [ 1272.747011] RAX: 0000000000000001 RBX: ffff88000475cc30 RCX:
> ffff88001e9398a0
> [ 1272.747011] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
> ffff88000475ce10
> [ 1272.747011] RBP: ffff88000d7ede68 R08: 00000000ffffffff R09:
> 0000000000000000
> [ 1272.747011] R10: 0000000000000000 R11: 0000000000000001 R12:
> ffff88000475cc30
> [ 1272.747011] R13: ffff88000f147800 R14: ffffffffa01e7080 R15:
> ffff88000f147b80
> [ 1272.747011] FS: 00007f1795424740(0000) GS:ffff88003fc00000(0000)
> knlGS:0000000000000000
> [ 1272.747011] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1272.747011] CR2: 00007fc33bfa9000 CR3: 000000000f14e000 CR4:
> 00000000000006f0
> [ 1272.747011] Stack:
> [ 1272.747011] ffff88000475cc30 ffff88000475cdc8 ffffffffa01e7080
> ffff88000d7ede90
> [ 1272.747011] ffffffff811fde03 ffff88000475cc30 ffff88000475ccb8
> ffff88000f147000
> [ 1272.747011] ffff88000d7edec0 ffffffff811fe615 ffff88000475cc30
> ffff88000f147800
> [ 1272.747011] Call Trace:
> [ 1272.747011] [<ffffffff811fde03>] evict+0xa3/0x1a0
> [ 1272.747011] [<ffffffff811fe615>] iput+0xf5/0x180
> [ 1272.747011] [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
> [ 1272.747011] [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
> [ 1272.747011] [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
> [ 1272.747011] [<ffffffff81680769>] system_call_fastpath+0x16/0x1b
> [ 1272.747011] Code: 01 e1 4c 89 e7 e8 39 59 03 e1 5b 41 5c 41 5d 5d c3
> 31 c0 49 83 bc 24 c8 00 00 00 01 0f 97 c0
> eb 8f 4c 89 e7 e8 fa ec ff ff eb 89 <0f> 0b 66 0f 1f 44 00 00 0f 1f 44
> 00 00 55 48 c7 c0 dc ff ff ff
> [ 1272.747011] RIP [<ffffffffa01c74a8>] f2fs_evict_inode+0x178/0x180 [f2fs]
> [ 1272.747011] RSP <ffff88000d7ede50>
>
> Failed assertion claims that dirty dentries counter should be zero when
> inode is deleted.
> This counter is incremented by
> mkdir()->
> f2fs_add_link()->
> init_inode_metadata()->
> make_empty_dir()->
> set_page_dirty();
>
> but no one decrement it.
> May be, this should be done along with truncating directory inode in
> error-path of init_inode_metadata() ?
It's weird, since original intention was that pages should be
invalidated by:
f2fs_evict_inode
- truncate_inode_pages
- f2fs_invalidate_page
- decrement dirty_dents
I'll see what happened a little bit more.
Thanks,
>
>
> --
> Best regards,
> Andrey Tsyvarev
> Linux Verification Center, ISPRAS
> web: http://linuxtesting.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Jaegeuk Kim
Samsung
Hi,
It turns out that make_bad_inode prior to iput sets i_mode to a regular
file, so that f2fs_evict_inode -> truncate_inode_pages ->
f2fs_invalidate_data_page doesn't decrement dirty_dents.
This patch should resolve the bug.
Thank you :)
----
When a new directory is allocated, if an error is occurred, we should
truncate
preallocated dentry pages too.
This bug was reported by Andrey Tsyvarev after a while as follows.
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
f2fs_init_acl()->
f2fs_get_acl()->
f2fs_getxattr()->
read_all_xattrs() fails.
Also there was a BUG_ON triggered after the fault in
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
remove_inode_page() ->
f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);
But, previous patch wasn't perfect to resolve that bug, so the following
bug
report was also submitted.
kernel BUG at fs/f2fs/inode.c:274!
Call Trace:
[<ffffffff811fde03>] evict+0xa3/0x1a0
[<ffffffff811fe615>] iput+0xf5/0x180
[<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
[<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
[<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
[<ffffffff81680769>] system_call_fastpath+0x16/0x1b
Finally, this patch resolves all the issues like below.
If an error is occurred after make_empty_dir(),
1. truncate_inode_pages()
The make_bad_inode() prior to iput() will change i_mode to S_IFREG,
which
means that f2fs will not decrement fi->dirty_dents during
f2fs_evict_inode.
But, by calling it here, we can do that.
2. truncate_blocks()
Preallocated dentry pages are trucated here to sync i_blocks.
Reported-by: Andrey Tsyvarev <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/dir.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index bfcb4ae..92ce1db 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -372,6 +372,9 @@ static struct page *init_inode_metadata(struct inode
*inode,
put_error:
f2fs_put_page(page, 1);
+ /* once the failed inode becomes a bad inode, i_mode is S_IFREG */
+ truncate_inode_pages(&inode->i_data, 0);
+ truncate_blocks(inode, 0);
error:
remove_inode_page(inode);
return ERR_PTR(err);
--
1.8.4.474.g128a96c
--
Jaegeuk Kim
Samsung
Hi,
> It turns out that make_bad_inode prior to iput sets i_mode to a regular
> file, so that f2fs_evict_inode -> truncate_inode_pages ->
> f2fs_invalidate_data_page doesn't decrement dirty_dents.
>
It seems that remove_dirty_dir_inode() call should also be added to the
error-path of
init_inode_metadata, because its functionality is also based on
inode->i_mode field
which is changed by make_bad_inode().
Otherwise memory leak is reported when f2fs module is unloaded:
[ 231.378192] BUG f2fs_dirty_dir_entry (Tainted: GF O):
Objects remaining in f2fs_dirty_dir_entry on kmem_cache_close()
[ 231.378193]
-----------------------------------------------------------------------------
[ 231.378194] Disabling lock debugging due to kernel taint
[ 231.378195] INFO: Slab 0xffffea0000437200 objects=102 used=1
fp=0xffff880010dc8fc8 flags=0x3fffc000000080
[ 231.378197] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O
3.14.0-rc1fs #4
[ 231.378198] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 231.378199] ffff88000e5e3200 ffff88000cc9bd40 ffffffff8166fd7e
ffffea0000437200
[ 231.378202] ffff88000cc9be28 ffffffff811c3fdf ffff88003fc10066
ffffffff0cc9bda0
[ 231.378203] ffffffff00000020 ffff88000cc9be38 ffff88000cc9bde0
656a624f00000296
[ 231.378205] Call Trace:
[ 231.378210] [<ffffffff8166fd7e>] dump_stack+0x45/0x56
[ 231.378213] [<ffffffff811c3fdf>] slab_err+0xaf/0xc0
[ 231.378215] [<ffffffff811c84a3>] ? kmem_cache_close+0x133/0x340
[ 231.378216] [<ffffffff811c6b55>] ? __kmalloc+0x1f5/0x250
[ 231.378218] [<ffffffff811c84c3>] kmem_cache_close+0x153/0x340
[ 231.378221] [<ffffffff81193417>] ? kmem_cache_destroy+0x27/0xf0
[ 231.378223] [<ffffffff811c86c4>] __kmem_cache_shutdown+0x14/0x80
[ 231.378224] [<ffffffff81193431>] kmem_cache_destroy+0x41/0xf0
[ 231.378229] [<ffffffffa01eab91>] destroy_checkpoint_caches+0x21/0x30
[f2fs]
[ 231.378232] [<ffffffffa01facda>] exit_f2fs_fs+0x28/0x34e [f2fs]
[ 231.378235] [<ffffffff810ffe32>] SyS_delete_module+0x152/0x1f0
[ 231.378237] [<ffffffff8111d85c>] ? __audit_syscall_entry+0x9c/0xf0
[ 231.378239] [<ffffffff81680729>] system_call_fastpath+0x16/0x1b
[ 231.378242] INFO: Object 0xffff880010dc8000 @offset=0
[ 231.378245] kmem_cache_destroy f2fs_dirty_dir_entry: Slab cache still
has objects
[ 231.378247] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O
3.14.0-rc1fs #4
[ 231.378247] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 231.378248] ffff88000e5e3268 ffff88000cc9beb8 ffffffff8166fd7e
ffff88000e5e3200
[ 231.378250] ffff88000cc9bed8 ffffffff811934cf 0000000000000000
ffffffffa0204f60
[ 231.378251] ffff88000cc9bee8 ffffffffa01eab91 ffff88000cc9bef8
ffffffffa01facda
[ 231.378253] Call Trace:
[ 231.378255] [<ffffffff8166fd7e>] dump_stack+0x45/0x56
[ 231.378256] [<ffffffff811934cf>] kmem_cache_destroy+0xdf/0xf0
[ 231.378259] [<ffffffffa01eab91>] destroy_checkpoint_caches+0x21/0x30
[f2fs]
[ 231.378262] [<ffffffffa01facda>] exit_f2fs_fs+0x28/0x34e [f2fs]
[ 231.378263] [<ffffffff810ffe32>] SyS_delete_module+0x152/0x1f0
[ 231.378265] [<ffffffff8111d85c>] ? __audit_syscall_entry+0x9c/0xf0
[ 231.378266] [<ffffffff81680729>] system_call_fastpath+0x16/0x1b
Stack of allocation (obtained with KEDR, which is also used for fault
simulation):
[ 231.414875] [leak_check] Address: 0xffff880010dc8000, size: 24; stack
trace of the allocation:
[ 231.414886] [leak_check] [<ffffffffa01e9d72>]
set_dirty_dir_page+0x62/0xe0 [f2fs]
[ 231.414893] [leak_check] [<ffffffffa01ec9be>]
f2fs_set_data_page_dirty+0x4e/0x90 [f2fs]
[ 231.414898] [leak_check] [<ffffffff8117b02a>] set_page_dirty+0x3a/0x60
[ 231.414904] [leak_check] [<ffffffffa01dfeb2>]
__f2fs_add_link+0x732/0x7d0 [f2fs]
[ 231.414909] [leak_check] [<ffffffffa01e2f1b>] f2fs_mkdir+0xbb/0x150
[f2fs]
[ 231.414914] [leak_check] [<ffffffff811f2a37>] vfs_mkdir+0xb7/0x160
[ 231.414918] [leak_check] [<ffffffff811f367f>] SyS_mkdir+0x5f/0xc0
[ 231.414923] [leak_check] [<ffffffff81680729>]
system_call_fastpath+0x16/0x1b
[ 231.414931] [leak_check] [<ffffffffffffffff>] 0xffffffffffffffff
P.S. It was required to add 'slub_debug' kernel options for make SLUB
output correct cache name,
otherwise cache "f2fs_dirty_dir_entry" was merged into "free_nid" one.
It was surprise for me,
that's why patch investigation took so long time.
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web:http://linuxtesting.org
Hi Andrey,
On 02/11/2014 04:29 PM, Andrey Tsyvarev wrote:
> Hi,
>
>> It turns out that make_bad_inode prior to iput sets i_mode to a regular
>> file, so that f2fs_evict_inode -> truncate_inode_pages ->
>> f2fs_invalidate_data_page doesn't decrement dirty_dents.
>>
> It seems that remove_dirty_dir_inode() call should also be added to the error-path of
> init_inode_metadata, because its functionality is also based on inode->i_mode field
> which is changed by make_bad_inode().
It seems that your opinion is correct. remove_dirty_dir_inode() will not clean up the
dir_inode_entry because make_bad_inode() sets i_mode to S_IFREG in the fail path of
init_inode_metadata, and it leads to the following "memory leak".
BTW, have you tested the case that added remove_dirty_dir_inode() into the fail path
of init_inode_metadata?
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index e095a4f..d5a2c9e 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -375,6 +375,7 @@ put_error:
/* once the failed inode becomes a bad inode, i_mode is S_IFREG */
truncate_inode_pages(&inode->i_data, 0);
truncate_blocks(inode, 0);
+ remove_dirty_dir_inode(inode);
error:
remove_inode_page(inode);
return ERR_PTR(err);
Regards,
Gu
>
> Otherwise memory leak is reported when f2fs module is unloaded:
>
> [ 231.378192] BUG f2fs_dirty_dir_entry (Tainted: GF O): Objects remaining in f2fs_dirty_dir_entry on kmem_cache_close()
> [ 231.378193] -----------------------------------------------------------------------------
>
> [ 231.378194] Disabling lock debugging due to kernel taint
> [ 231.378195] INFO: Slab 0xffffea0000437200 objects=102 used=1 fp=0xffff880010dc8fc8 flags=0x3fffc000000080
> [ 231.378197] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O 3.14.0-rc1fs #4
> [ 231.378198] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [ 231.378199] ffff88000e5e3200 ffff88000cc9bd40 ffffffff8166fd7e ffffea0000437200
> [ 231.378202] ffff88000cc9be28 ffffffff811c3fdf ffff88003fc10066 ffffffff0cc9bda0
> [ 231.378203] ffffffff00000020 ffff88000cc9be38 ffff88000cc9bde0 656a624f00000296
> [ 231.378205] Call Trace:
> [ 231.378210] [<ffffffff8166fd7e>] dump_stack+0x45/0x56
> [ 231.378213] [<ffffffff811c3fdf>] slab_err+0xaf/0xc0
> [ 231.378215] [<ffffffff811c84a3>] ? kmem_cache_close+0x133/0x340
> [ 231.378216] [<ffffffff811c6b55>] ? __kmalloc+0x1f5/0x250
> [ 231.378218] [<ffffffff811c84c3>] kmem_cache_close+0x153/0x340
> [ 231.378221] [<ffffffff81193417>] ? kmem_cache_destroy+0x27/0xf0
> [ 231.378223] [<ffffffff811c86c4>] __kmem_cache_shutdown+0x14/0x80
> [ 231.378224] [<ffffffff81193431>] kmem_cache_destroy+0x41/0xf0
> [ 231.378229] [<ffffffffa01eab91>] destroy_checkpoint_caches+0x21/0x30 [f2fs]
> [ 231.378232] [<ffffffffa01facda>] exit_f2fs_fs+0x28/0x34e [f2fs]
> [ 231.378235] [<ffffffff810ffe32>] SyS_delete_module+0x152/0x1f0
> [ 231.378237] [<ffffffff8111d85c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 231.378239] [<ffffffff81680729>] system_call_fastpath+0x16/0x1b
> [ 231.378242] INFO: Object 0xffff880010dc8000 @offset=0
> [ 231.378245] kmem_cache_destroy f2fs_dirty_dir_entry: Slab cache still has objects
> [ 231.378247] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O 3.14.0-rc1fs #4
> [ 231.378247] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [ 231.378248] ffff88000e5e3268 ffff88000cc9beb8 ffffffff8166fd7e ffff88000e5e3200
> [ 231.378250] ffff88000cc9bed8 ffffffff811934cf 0000000000000000 ffffffffa0204f60
> [ 231.378251] ffff88000cc9bee8 ffffffffa01eab91 ffff88000cc9bef8 ffffffffa01facda
> [ 231.378253] Call Trace:
> [ 231.378255] [<ffffffff8166fd7e>] dump_stack+0x45/0x56
> [ 231.378256] [<ffffffff811934cf>] kmem_cache_destroy+0xdf/0xf0
> [ 231.378259] [<ffffffffa01eab91>] destroy_checkpoint_caches+0x21/0x30 [f2fs]
> [ 231.378262] [<ffffffffa01facda>] exit_f2fs_fs+0x28/0x34e [f2fs]
> [ 231.378263] [<ffffffff810ffe32>] SyS_delete_module+0x152/0x1f0
> [ 231.378265] [<ffffffff8111d85c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 231.378266] [<ffffffff81680729>] system_call_fastpath+0x16/0x1b
>
>
> Stack of allocation (obtained with KEDR, which is also used for fault simulation):
>
> [ 231.414875] [leak_check] Address: 0xffff880010dc8000, size: 24; stack trace of the allocation:
> [ 231.414886] [leak_check] [<ffffffffa01e9d72>] set_dirty_dir_page+0x62/0xe0 [f2fs]
> [ 231.414893] [leak_check] [<ffffffffa01ec9be>] f2fs_set_data_page_dirty+0x4e/0x90 [f2fs]
> [ 231.414898] [leak_check] [<ffffffff8117b02a>] set_page_dirty+0x3a/0x60
> [ 231.414904] [leak_check] [<ffffffffa01dfeb2>] __f2fs_add_link+0x732/0x7d0 [f2fs]
> [ 231.414909] [leak_check] [<ffffffffa01e2f1b>] f2fs_mkdir+0xbb/0x150 [f2fs]
> [ 231.414914] [leak_check] [<ffffffff811f2a37>] vfs_mkdir+0xb7/0x160
> [ 231.414918] [leak_check] [<ffffffff811f367f>] SyS_mkdir+0x5f/0xc0
> [ 231.414923] [leak_check] [<ffffffff81680729>] system_call_fastpath+0x16/0x1b
> [ 231.414931] [leak_check] [<ffffffffffffffff>] 0xffffffffffffffff
>
>
> P.S. It was required to add 'slub_debug' kernel options for make SLUB output correct cache name,
> otherwise cache "f2fs_dirty_dir_entry" was merged into "free_nid" one. It was surprise for me,
> that's why patch investigation took so long time.
>
Hi,
> BTW, have you tested the case that added remove_dirty_dir_inode() into the fail path
> of init_inode_metadata?
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index e095a4f..d5a2c9e 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -375,6 +375,7 @@ put_error:
> /* once the failed inode becomes a bad inode, i_mode is S_IFREG */
> truncate_inode_pages(&inode->i_data, 0);
> truncate_blocks(inode, 0);
> + remove_dirty_dir_inode(inode);
> error:
> remove_inode_page(inode);
> return ERR_PTR(err);
Yes, i have tested that case. Fail in init_inode_metadata has been
processed correctly. Thanks.
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web:http://linuxtesting.org
Hi,
On 02/13/2014 05:40 PM, Andrey Tsyvarev wrote:
> Hi,
>
>> BTW, have you tested the case that added remove_dirty_dir_inode() into the fail path
>> of init_inode_metadata?
>> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
>> index e095a4f..d5a2c9e 100644
>> --- a/fs/f2fs/dir.c
>> +++ b/fs/f2fs/dir.c
>> @@ -375,6 +375,7 @@ put_error:
>> /* once the failed inode becomes a bad inode, i_mode is S_IFREG */
>> truncate_inode_pages(&inode->i_data, 0);
>> truncate_blocks(inode, 0);
>> + remove_dirty_dir_inode(inode);
>> error:
>> remove_inode_page(inode);
>> return ERR_PTR(err);
> Yes, i have tested that case. Fail in init_inode_metadata has been processed correctly. Thanks.
If no other regressions, maybe you can send out the fix patch about this issue.:)
Thanks,
Gu
>
Hi,
Sorry for the late response.
I suffered from flu during last a couple of days. :(
2014-02-11 (화), 12:29 +0400, Andrey Tsyvarev:
> Hi,
>
> > It turns out that make_bad_inode prior to iput sets i_mode to a regular
> > file, so that f2fs_evict_inode -> truncate_inode_pages ->
> > f2fs_invalidate_data_page doesn't decrement dirty_dents.
> >
> It seems that remove_dirty_dir_inode() call should also be added to the
> error-path of
> init_inode_metadata, because its functionality is also based on
> inode->i_mode field
> which is changed by make_bad_inode().
Agreed.
I'll update the patch with this.
Thanks,
--
Jaegeuk Kim
Samsung
Hi,
2014-02-13 (목), 17:48 +0800, Gu Zheng:
> Hi,
> On 02/13/2014 05:40 PM, Andrey Tsyvarev wrote:
>
> > Hi,
> >
> >> BTW, have you tested the case that added remove_dirty_dir_inode() into the fail path
> >> of init_inode_metadata?
> >> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> >> index e095a4f..d5a2c9e 100644
> >> --- a/fs/f2fs/dir.c
> >> +++ b/fs/f2fs/dir.c
> >> @@ -375,6 +375,7 @@ put_error:
> >> /* once the failed inode becomes a bad inode, i_mode is S_IFREG */
> >> truncate_inode_pages(&inode->i_data, 0);
> >> truncate_blocks(inode, 0);
> >> + remove_dirty_dir_inode(inode);
> >> error:
> >> remove_inode_page(inode);
> >> return ERR_PTR(err);
> > Yes, i have tested that case. Fail in init_inode_metadata has been processed correctly. Thanks.
>
> If no other regressions, maybe you can send out the fix patch about this issue.:)
The original patch can treat this too.
Thank you. :)
>
> Thanks,
> Gu
>
> >
>
>
--
Jaegeuk Kim
Samsung
Hello,
When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):
[ 2416.364463] kernel BUG at fs/f2fs/node.c:215!
[ 2416.364464] invalid opcode: 0000 [#1] SMP
[ 2416.364466] Modules linked in: f2fs fuse ip6t_rpfilter ip6t_REJECT
xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp
llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw vboxsf(OF) snd_intel8x0 snd_ac97_codec
ac97_bus snd_seq snd_seq_device ppdev snd_pcm snd_page_alloc snd_timer
snd e1000 joydev soundcore microcode serio_raw parport_pc parport
vboxvideo(OF) drm i2c_piix4 i2c_core vboxguest(OF) ata_generic pata_acpi
[ 2416.364493] CPU: 0 PID: 2117 Comm: mount Tainted: GF O 3.10.11fs #4
[ 2416.364494] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 2416.364496] task: ffff8800304d3fc0 ti: ffff88000dbae000 task.ti:
ffff88000dbae000
[ 2416.364497] RIP: 0010:[<ffffffffa0329f2e>] [<ffffffffa0329f2e>]
set_node_addr.clone.1+0x1de/0x270 [f2fs]
[ 2416.364503] RSP: 0018:ffff88000dbafaa8 EFLAGS: 00010202
[ 2416.364504] RAX: ffff880034bc0030 RBX: ffff88000dbafaf8 RCX:
0000000000000000
[ 2416.364505] RDX: 0000000000000000 RSI: 0000000000000005 RDI:
0000000000000000
[ 2416.364505] RBP: ffff88000dbafae8 R08: ffff880034bc0030 R09:
ffff88000860e6e8
[ 2416.364506] R10: 0000000000000001 R11: 000000000084642a R12:
ffff88001f617020
[ 2416.364507] R13: ffff88001f617000 R14: ffff88001f617010 R15:
00000000ffffffff
[ 2416.364509] FS: 00007f8597b25880(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000
[ 2416.364510] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2416.364511] CR2: 00007ffc645020b0 CR3: 000000003c699000 CR4:
00000000000006f0
[ 2416.364514] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2416.364515] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 2416.364516] Stack:
[ 2416.364517] 01fa000000000400 ffff88001f617000 ffff88000dbafae8
ffff880033900000
[ 2416.364519] ffffea0000ddbec0 ffff8800339008f8 ffff88003bc4b000
ffff880000000000
[ 2416.364521] ffff88000dbafb68 ffffffffa032ebad 0000000500000005
000000000001fa00
[ 2416.364523] Call Trace:
[ 2416.364528] [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
[ 2416.364531] [<ffffffff811446e7>] ? __lock_page+0x67/0x70
[ 2416.364535] [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
[ 2416.364538] [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
[ 2416.364541] [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
[ 2416.364544] [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
[ 2416.364547] [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
[ 2416.364549] [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
[ 2416.364551] [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
[ 2416.364554] [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
[ 2416.364557] [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
[ 2416.364559] [<ffffffff811b9497>] mount_fs+0x47/0x1c0
[ 2416.364562] [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
[ 2416.364564] [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
[ 2416.364566] [<ffffffff811d6763>] do_mount+0x493/0x910
[ 2416.364568] [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
[ 2416.364570] [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
[ 2416.364573] [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
[ 2416.364574] Code: a0 24 02 00 01 48 8b 13 48 89 50 18 48 8b 53 08 48
89 50 20 48 8b 53 10 48 89 50 28 48 83 7b 08 00 74 c4 48 83 05 82 24 02
00 01 <0f> 0b 48 83 05 80 24 02 00 01 48 83 05 58 24 02 00 01 0f 0b 48
[ 2416.364595] RIP [<ffffffffa0329f2e>]
set_node_addr.clone.1+0x1de/0x270 [f2fs]
[ 2416.364598] RSP <ffff88000dbafaa8>
[ 2416.364600] ---[ end trace d203dddb09f4fc3d ]---
Found by Linux File System Verification project (linuxtesting.org).
fsck.f2fs reports that given filesystem is valid.
Moreover, on kernels 3.13.2, 3.14 mount continues to fail(with same
error) even after these operations on given filesystem's image:
mkfs -t f2fs <img>
mount -t f2fs -omand <img> <mount-point>
touch <mount-point>/file.txt
setfacl <mount-point>/file.txt
umount <mount-point>
Initial filesystem's content for above operations is important: if one
applies them to zero-filled or one-filled image, resulted filesystem is
mounted successfully.
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web: http://linuxtesting.org
--
Andrey Tsyvarev<[email protected]>
Hi,
Thank you for the report.
I retrieved the fault image and found out that previous garbage data
wreak such the wrong behaviors.
So, I wrote the following patch that fills one zero-block at the
checkpoint procedure.
If the underlying device supports discard, I expect that it mostly
doesn't incur any performance regression significantly.
Could you test this patch?
>From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Tue, 15 Apr 2014 13:57:55 +0900
Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
garbage blocks
The f2fs always scans the next chain of direct node blocks.
But some garbage blocks are able to be remained due to no discard
support or
SSR triggers.
This occasionally wreaks recovering wrong inodes that were used or
BUG_ONs
due to reallocating node ids as follows.
When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):
kernel BUG at fs/f2fs/node.c:215!
Call Trace:
[<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
[<ffffffff811446e7>] ? __lock_page+0x67/0x70
[<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
[<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
[<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
[<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
[<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
[<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
[<ffffffff811b8769>] mount_bdev+0x1c9/0x210
[<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
[<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
[<ffffffff811b9497>] mount_fs+0x47/0x1c0
[<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
[<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
[<ffffffff811d6763>] do_mount+0x493/0x910
[<ffffffff811615cb>] ? strndup_user+0x5b/0x80
[<ffffffff811d6c70>] SyS_mount+0x90/0xe0
[<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
Found by Linux File System Verification project (linuxtesting.org).
Reported-by: Andrey Tsyvarev <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 ++++++
fs/f2fs/f2fs.h | 1 +
fs/f2fs/segment.c | 17 +++++++++++++++--
3 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 4aa521a..890e23d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
bool is_umount)
void *kaddr;
int i;
+ /*
+ * This avoids to conduct wrong roll-forward operations and uses
+ * metapages, so should be called prior to sync_meta_pages below.
+ */
+ discard_next_dnode(sbi);
+
/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META))
sync_meta_pages(sbi, META, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2ecac83..2c5a5da 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
void invalidate_blocks(struct f2fs_sb_info *, block_t);
void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
void clear_prefree_segments(struct f2fs_sb_info *);
+void discard_next_dnode(struct f2fs_sb_info *);
int npages_for_summary_flush(struct f2fs_sb_info *);
void allocate_new_segments(struct f2fs_sb_info *);
struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 1e264e7..9993f94 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
f2fs_sb_info *sbi, unsigned int segno)
mutex_unlock(&dirty_i->seglist_lock);
}
-static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
+static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
block_t blkstart, block_t blklen)
{
sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
- blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
+ return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
+}
+
+void discard_next_dnode(struct f2fs_sb_info *sbi)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
+ block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
+
+ if (f2fs_issue_discard(sbi, blkaddr, 1)) {
+ struct page *page = grab_meta_page(sbi, blkaddr);
+ /* zero-filled page */
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+ }
}
static void add_discard_addrs(struct f2fs_sb_info *sbi,
--
1.8.4.474.g128a96c
--
Jaegeuk Kim
Samsung
Hi,
With this patch mounting of the image continues to fail (with similar
BUG_ON).
But when image is formatted again (and steps mentioned in the previous
message are performed),
mounting of it is now succeed.
Is this is a true purpose of the patch?
15.04.2014 15:04, Jaegeuk Kim пишет:
> Hi,
>
> Thank you for the report.
> I retrieved the fault image and found out that previous garbage data
> wreak such the wrong behaviors.
> So, I wrote the following patch that fills one zero-block at the
> checkpoint procedure.
> If the underlying device supports discard, I expect that it mostly
> doesn't incur any performance regression significantly.
>
> Could you test this patch?
>
> >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <[email protected]>
> Date: Tue, 15 Apr 2014 13:57:55 +0900
> Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
> garbage blocks
>
> The f2fs always scans the next chain of direct node blocks.
> But some garbage blocks are able to be remained due to no discard
> support or
> SSR triggers.
> This occasionally wreaks recovering wrong inodes that were used or
> BUG_ONs
> due to reallocating node ids as follows.
>
> When mount this f2fs image:
> http://linuxtesting.org/downloads/f2fs_fault_image.zip
> BUG_ON is triggered in f2fs driver (messages below are generated on
> kernel 3.13.2; for other kernels output is similar):
>
> kernel BUG at fs/f2fs/node.c:215!
> Call Trace:
> [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
> [<ffffffff811446e7>] ? __lock_page+0x67/0x70
> [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
> [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
> [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
> [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
> [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
> [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
> [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
> [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
> [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
> [<ffffffff811b9497>] mount_fs+0x47/0x1c0
> [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
> [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
> [<ffffffff811d6763>] do_mount+0x493/0x910
> [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
> [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
> [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
>
> Found by Linux File System Verification project (linuxtesting.org).
>
> Reported-by: Andrey Tsyvarev <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 6 ++++++
> fs/f2fs/f2fs.h | 1 +
> fs/f2fs/segment.c | 17 +++++++++++++++--
> 3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 4aa521a..890e23d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
> bool is_umount)
> void *kaddr;
> int i;
>
> + /*
> + * This avoids to conduct wrong roll-forward operations and uses
> + * metapages, so should be called prior to sync_meta_pages below.
> + */
> + discard_next_dnode(sbi);
> +
> /* Flush all the NAT/SIT pages */
> while (get_pages(sbi, F2FS_DIRTY_META))
> sync_meta_pages(sbi, META, LONG_MAX);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2ecac83..2c5a5da 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
> void invalidate_blocks(struct f2fs_sb_info *, block_t);
> void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
> void clear_prefree_segments(struct f2fs_sb_info *);
> +void discard_next_dnode(struct f2fs_sb_info *);
> int npages_for_summary_flush(struct f2fs_sb_info *);
> void allocate_new_segments(struct f2fs_sb_info *);
> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 1e264e7..9993f94 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
> f2fs_sb_info *sbi, unsigned int segno)
> mutex_unlock(&dirty_i->seglist_lock);
> }
>
> -static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
> +static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
> block_t blkstart, block_t blklen)
> {
> sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
> sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
> - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
> trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
> + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
> +}
> +
> +void discard_next_dnode(struct f2fs_sb_info *sbi)
> +{
> + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> +
> + if (f2fs_issue_discard(sbi, blkaddr, 1)) {
> + struct page *page = grab_meta_page(sbi, blkaddr);
> + /* zero-filled page */
> + set_page_dirty(page);
> + f2fs_put_page(page, 1);
> + }
> }
>
> static void add_discard_addrs(struct f2fs_sb_info *sbi,
--
Andrey Tsyvarev <[email protected]>
Hi,
2014-04-16 (수), 13:11 +0400, Andrey Tsyvarev:
> Hi,
>
> With this patch mounting of the image continues to fail (with similar
> BUG_ON).
> But when image is formatted again (and steps mentioned in the previous
> message are performed),
> mounting of it is now succeed.
>
> Is this is a true purpose of the patch?
Indeed. The patch solves there-in root cause.
But, if you're trying to use the failed image again, simply you can skip
the errorneous part by:
# mount ... -o disable_roll_forward ...
Once sync or umount whatever checkpoint is done after that, the image
will be mounted without "disable_roll_forward".
Thanks,
>
> 15.04.2014 15:04, Jaegeuk Kim пишет:
> > Hi,
> >
> > Thank you for the report.
> > I retrieved the fault image and found out that previous garbage data
> > wreak such the wrong behaviors.
> > So, I wrote the following patch that fills one zero-block at the
> > checkpoint procedure.
> > If the underlying device supports discard, I expect that it mostly
> > doesn't incur any performance regression significantly.
> >
> > Could you test this patch?
> >
> > >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <[email protected]>
> > Date: Tue, 15 Apr 2014 13:57:55 +0900
> > Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
> > garbage blocks
> >
> > The f2fs always scans the next chain of direct node blocks.
> > But some garbage blocks are able to be remained due to no discard
> > support or
> > SSR triggers.
> > This occasionally wreaks recovering wrong inodes that were used or
> > BUG_ONs
> > due to reallocating node ids as follows.
> >
> > When mount this f2fs image:
> > http://linuxtesting.org/downloads/f2fs_fault_image.zip
> > BUG_ON is triggered in f2fs driver (messages below are generated on
> > kernel 3.13.2; for other kernels output is similar):
> >
> > kernel BUG at fs/f2fs/node.c:215!
> > Call Trace:
> > [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
> > [<ffffffff811446e7>] ? __lock_page+0x67/0x70
> > [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
> > [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
> > [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
> > [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
> > [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
> > [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
> > [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
> > [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
> > [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
> > [<ffffffff811b9497>] mount_fs+0x47/0x1c0
> > [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
> > [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
> > [<ffffffff811d6763>] do_mount+0x493/0x910
> > [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
> > [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
> > [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
> >
> > Found by Linux File System Verification project (linuxtesting.org).
> >
> > Reported-by: Andrey Tsyvarev <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/checkpoint.c | 6 ++++++
> > fs/f2fs/f2fs.h | 1 +
> > fs/f2fs/segment.c | 17 +++++++++++++++--
> > 3 files changed, 22 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 4aa521a..890e23d 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
> > bool is_umount)
> > void *kaddr;
> > int i;
> >
> > + /*
> > + * This avoids to conduct wrong roll-forward operations and uses
> > + * metapages, so should be called prior to sync_meta_pages below.
> > + */
> > + discard_next_dnode(sbi);
> > +
> > /* Flush all the NAT/SIT pages */
> > while (get_pages(sbi, F2FS_DIRTY_META))
> > sync_meta_pages(sbi, META, LONG_MAX);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 2ecac83..2c5a5da 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
> > void invalidate_blocks(struct f2fs_sb_info *, block_t);
> > void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
> > void clear_prefree_segments(struct f2fs_sb_info *);
> > +void discard_next_dnode(struct f2fs_sb_info *);
> > int npages_for_summary_flush(struct f2fs_sb_info *);
> > void allocate_new_segments(struct f2fs_sb_info *);
> > struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 1e264e7..9993f94 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
> > f2fs_sb_info *sbi, unsigned int segno)
> > mutex_unlock(&dirty_i->seglist_lock);
> > }
> >
> > -static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
> > +static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
> > block_t blkstart, block_t blklen)
> > {
> > sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
> > sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
> > - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
> > trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
> > + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
> > +}
> > +
> > +void discard_next_dnode(struct f2fs_sb_info *sbi)
> > +{
> > + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> > + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> > +
> > + if (f2fs_issue_discard(sbi, blkaddr, 1)) {
> > + struct page *page = grab_meta_page(sbi, blkaddr);
> > + /* zero-filled page */
> > + set_page_dirty(page);
> > + f2fs_put_page(page, 1);
> > + }
> > }
> >
> > static void add_discard_addrs(struct f2fs_sb_info *sbi,
>
--
Jaegeuk Kim
Samsung
Hi,
But would not ability to trigger BUG_ON by mounting a crafted image
considered as an issue having security implications?
Regards,
Alexey
On 16.04.2014 16:35, Jaegeuk Kim wrote:
> Hi,
>
> 2014-04-16 (수), 13:11 +0400, Andrey Tsyvarev:
>> Hi,
>>
>> With this patch mounting of the image continues to fail (with similar
>> BUG_ON).
>> But when image is formatted again (and steps mentioned in the previous
>> message are performed),
>> mounting of it is now succeed.
>>
>> Is this is a true purpose of the patch?
> Indeed. The patch solves there-in root cause.
> But, if you're trying to use the failed image again, simply you can skip
> the errorneous part by:
>
> # mount ... -o disable_roll_forward ...
>
> Once sync or umount whatever checkpoint is done after that, the image
> will be mounted without "disable_roll_forward".
>
> Thanks,
>
>> 15.04.2014 15:04, Jaegeuk Kim пишет:
>>> Hi,
>>>
>>> Thank you for the report.
>>> I retrieved the fault image and found out that previous garbage data
>>> wreak such the wrong behaviors.
>>> So, I wrote the following patch that fills one zero-block at the
>>> checkpoint procedure.
>>> If the underlying device supports discard, I expect that it mostly
>>> doesn't incur any performance regression significantly.
>>>
>>> Could you test this patch?
>>>
>>> >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <[email protected]>
>>> Date: Tue, 15 Apr 2014 13:57:55 +0900
>>> Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
>>> garbage blocks
>>>
>>> The f2fs always scans the next chain of direct node blocks.
>>> But some garbage blocks are able to be remained due to no discard
>>> support or
>>> SSR triggers.
>>> This occasionally wreaks recovering wrong inodes that were used or
>>> BUG_ONs
>>> due to reallocating node ids as follows.
>>>
>>> When mount this f2fs image:
>>> http://linuxtesting.org/downloads/f2fs_fault_image.zip
>>> BUG_ON is triggered in f2fs driver (messages below are generated on
>>> kernel 3.13.2; for other kernels output is similar):
>>>
>>> kernel BUG at fs/f2fs/node.c:215!
>>> Call Trace:
>>> [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
>>> [<ffffffff811446e7>] ? __lock_page+0x67/0x70
>>> [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
>>> [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
>>> [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
>>> [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
>>> [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
>>> [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
>>> [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
>>> [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
>>> [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
>>> [<ffffffff811b9497>] mount_fs+0x47/0x1c0
>>> [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
>>> [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
>>> [<ffffffff811d6763>] do_mount+0x493/0x910
>>> [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
>>> [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
>>> [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
>>>
>>> Found by Linux File System Verification project (linuxtesting.org).
>>>
>>> Reported-by: Andrey Tsyvarev <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/checkpoint.c | 6 ++++++
>>> fs/f2fs/f2fs.h | 1 +
>>> fs/f2fs/segment.c | 17 +++++++++++++++--
>>> 3 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index 4aa521a..890e23d 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
>>> bool is_umount)
>>> void *kaddr;
>>> int i;
>>>
>>> + /*
>>> + * This avoids to conduct wrong roll-forward operations and uses
>>> + * metapages, so should be called prior to sync_meta_pages below.
>>> + */
>>> + discard_next_dnode(sbi);
>>> +
>>> /* Flush all the NAT/SIT pages */
>>> while (get_pages(sbi, F2FS_DIRTY_META))
>>> sync_meta_pages(sbi, META, LONG_MAX);
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 2ecac83..2c5a5da 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
>>> void invalidate_blocks(struct f2fs_sb_info *, block_t);
>>> void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
>>> void clear_prefree_segments(struct f2fs_sb_info *);
>>> +void discard_next_dnode(struct f2fs_sb_info *);
>>> int npages_for_summary_flush(struct f2fs_sb_info *);
>>> void allocate_new_segments(struct f2fs_sb_info *);
>>> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 1e264e7..9993f94 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
>>> f2fs_sb_info *sbi, unsigned int segno)
>>> mutex_unlock(&dirty_i->seglist_lock);
>>> }
>>>
>>> -static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
>>> +static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
>>> block_t blkstart, block_t blklen)
>>> {
>>> sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
>>> sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
>>> - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
>>> trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
>>> + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
>>> +}
>>> +
>>> +void discard_next_dnode(struct f2fs_sb_info *sbi)
>>> +{
>>> + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
>>> + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
>>> +
>>> + if (f2fs_issue_discard(sbi, blkaddr, 1)) {
>>> + struct page *page = grab_meta_page(sbi, blkaddr);
>>> + /* zero-filled page */
>>> + set_page_dirty(page);
>>> + f2fs_put_page(page, 1);
>>> + }
>>> }
>>>
>>> static void add_discard_addrs(struct f2fs_sb_info *sbi,
Hi,
2014-04-16 (수), 18:11 -0700, Alexey Khoroshilov:
> Hi,
>
> But would not ability to trigger BUG_ON by mounting a crafted image
> considered as an issue having security implications?
Sorry, I can't come up with you.
Could you please explain why this can be related to the security hole?
Did you mean it needs to avoid such the BUG_ONs if the image has
obsolete data being used before?
Thanks,
>
> Regards,
> Alexey
>
>
> On 16.04.2014 16:35, Jaegeuk Kim wrote:
> > Hi,
> >
> > 2014-04-16 (수), 13:11 +0400, Andrey Tsyvarev:
> >> Hi,
> >>
> >> With this patch mounting of the image continues to fail (with similar
> >> BUG_ON).
> >> But when image is formatted again (and steps mentioned in the previous
> >> message are performed),
> >> mounting of it is now succeed.
> >>
> >> Is this is a true purpose of the patch?
> > Indeed. The patch solves there-in root cause.
> > But, if you're trying to use the failed image again, simply you can skip
> > the errorneous part by:
> >
> > # mount ... -o disable_roll_forward ...
> >
> > Once sync or umount whatever checkpoint is done after that, the image
> > will be mounted without "disable_roll_forward".
> >
> > Thanks,
> >
> >> 15.04.2014 15:04, Jaegeuk Kim пишет:
> >>> Hi,
> >>>
> >>> Thank you for the report.
> >>> I retrieved the fault image and found out that previous garbage data
> >>> wreak such the wrong behaviors.
> >>> So, I wrote the following patch that fills one zero-block at the
> >>> checkpoint procedure.
> >>> If the underlying device supports discard, I expect that it mostly
> >>> doesn't incur any performance regression significantly.
> >>>
> >>> Could you test this patch?
> >>>
> >>> >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <[email protected]>
> >>> Date: Tue, 15 Apr 2014 13:57:55 +0900
> >>> Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
> >>> garbage blocks
> >>>
> >>> The f2fs always scans the next chain of direct node blocks.
> >>> But some garbage blocks are able to be remained due to no discard
> >>> support or
> >>> SSR triggers.
> >>> This occasionally wreaks recovering wrong inodes that were used or
> >>> BUG_ONs
> >>> due to reallocating node ids as follows.
> >>>
> >>> When mount this f2fs image:
> >>> http://linuxtesting.org/downloads/f2fs_fault_image.zip
> >>> BUG_ON is triggered in f2fs driver (messages below are generated on
> >>> kernel 3.13.2; for other kernels output is similar):
> >>>
> >>> kernel BUG at fs/f2fs/node.c:215!
> >>> Call Trace:
> >>> [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
> >>> [<ffffffff811446e7>] ? __lock_page+0x67/0x70
> >>> [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
> >>> [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
> >>> [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
> >>> [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
> >>> [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
> >>> [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
> >>> [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
> >>> [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
> >>> [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
> >>> [<ffffffff811b9497>] mount_fs+0x47/0x1c0
> >>> [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
> >>> [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
> >>> [<ffffffff811d6763>] do_mount+0x493/0x910
> >>> [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
> >>> [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
> >>> [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
> >>>
> >>> Found by Linux File System Verification project (linuxtesting.org).
> >>>
> >>> Reported-by: Andrey Tsyvarev <[email protected]>
> >>> Signed-off-by: Jaegeuk Kim <[email protected]>
> >>> ---
> >>> fs/f2fs/checkpoint.c | 6 ++++++
> >>> fs/f2fs/f2fs.h | 1 +
> >>> fs/f2fs/segment.c | 17 +++++++++++++++--
> >>> 3 files changed, 22 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index 4aa521a..890e23d 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
> >>> bool is_umount)
> >>> void *kaddr;
> >>> int i;
> >>>
> >>> + /*
> >>> + * This avoids to conduct wrong roll-forward operations and uses
> >>> + * metapages, so should be called prior to sync_meta_pages below.
> >>> + */
> >>> + discard_next_dnode(sbi);
> >>> +
> >>> /* Flush all the NAT/SIT pages */
> >>> while (get_pages(sbi, F2FS_DIRTY_META))
> >>> sync_meta_pages(sbi, META, LONG_MAX);
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 2ecac83..2c5a5da 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
> >>> void invalidate_blocks(struct f2fs_sb_info *, block_t);
> >>> void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
> >>> void clear_prefree_segments(struct f2fs_sb_info *);
> >>> +void discard_next_dnode(struct f2fs_sb_info *);
> >>> int npages_for_summary_flush(struct f2fs_sb_info *);
> >>> void allocate_new_segments(struct f2fs_sb_info *);
> >>> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>> index 1e264e7..9993f94 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
> >>> f2fs_sb_info *sbi, unsigned int segno)
> >>> mutex_unlock(&dirty_i->seglist_lock);
> >>> }
> >>>
> >>> -static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
> >>> +static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
> >>> block_t blkstart, block_t blklen)
> >>> {
> >>> sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
> >>> sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
> >>> - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
> >>> trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
> >>> + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
> >>> +}
> >>> +
> >>> +void discard_next_dnode(struct f2fs_sb_info *sbi)
> >>> +{
> >>> + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> >>> + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> >>> +
> >>> + if (f2fs_issue_discard(sbi, blkaddr, 1)) {
> >>> + struct page *page = grab_meta_page(sbi, blkaddr);
> >>> + /* zero-filled page */
> >>> + set_page_dirty(page);
> >>> + f2fs_put_page(page, 1);
> >>> + }
> >>> }
> >>>
> >>> static void add_discard_addrs(struct f2fs_sb_info *sbi,
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Jaegeuk Kim
Samsung
On 17.04.2014 00:45, Jaegeuk Kim wrote:
> Hi,
>
> 2014-04-16 (수), 18:11 -0700, Alexey Khoroshilov:
>> Hi,
>>
>> But would not ability to trigger BUG_ON by mounting a crafted image
>> considered as an issue having security implications?
> Sorry, I can't come up with you.
> Could you please explain why this can be related to the security hole?
> Did you mean it needs to avoid such the BUG_ONs if the image has
> obsolete data being used before?
An ability to trigger a BUG_ON assert by mounting a crafted image is
usually considered as a local denial of service [1-3]. As far as I
understand, the reason is that some kernel data may become inconsistent
that can lead to further problems.
[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
[2] http://www.openwall.com/lists/oss-security/2011/06/24/4
[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
etc.
--
Alexey
> On 16.04.2014 16:35, Jaegeuk Kim wrote:
>>> Hi,
>>>
>>> 2014-04-16 (수), 13:11 +0400, Andrey Tsyvarev:
>>>> Hi,
>>>>
>>>> With this patch mounting of the image continues to fail (with similar
>>>> BUG_ON).
>>>> But when image is formatted again (and steps mentioned in the previous
>>>> message are performed),
>>>> mounting of it is now succeed.
>>>>
>>>> Is this is a true purpose of the patch?
>>> Indeed. The patch solves there-in root cause.
>>> But, if you're trying to use the failed image again, simply you can skip
>>> the errorneous part by:
>>>
>>> # mount ... -o disable_roll_forward ...
>>>
>>> Once sync or umount whatever checkpoint is done after that, the image
>>> will be mounted without "disable_roll_forward".
>>>
>>> Thanks,
>>>
>>>> 15.04.2014 15:04, Jaegeuk Kim пишет:
>>>>> Hi,
>>>>>
>>>>> Thank you for the report.
>>>>> I retrieved the fault image and found out that previous garbage data
>>>>> wreak such the wrong behaviors.
>>>>> So, I wrote the following patch that fills one zero-block at the
>>>>> checkpoint procedure.
>>>>> If the underlying device supports discard, I expect that it mostly
>>>>> doesn't incur any performance regression significantly.
>>>>>
>>>>> Could you test this patch?
>>>>>
>>>>> >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
>>>>> From: Jaegeuk Kim <[email protected]>
>>>>> Date: Tue, 15 Apr 2014 13:57:55 +0900
>>>>> Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
>>>>> garbage blocks
>>>>>
>>>>> The f2fs always scans the next chain of direct node blocks.
>>>>> But some garbage blocks are able to be remained due to no discard
>>>>> support or
>>>>> SSR triggers.
>>>>> This occasionally wreaks recovering wrong inodes that were used or
>>>>> BUG_ONs
>>>>> due to reallocating node ids as follows.
>>>>>
>>>>> When mount this f2fs image:
>>>>> http://linuxtesting.org/downloads/f2fs_fault_image.zip
>>>>> BUG_ON is triggered in f2fs driver (messages below are generated on
>>>>> kernel 3.13.2; for other kernels output is similar):
>>>>>
>>>>> kernel BUG at fs/f2fs/node.c:215!
>>>>> Call Trace:
>>>>> [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
>>>>> [<ffffffff811446e7>] ? __lock_page+0x67/0x70
>>>>> [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
>>>>> [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
>>>>> [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
>>>>> [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
>>>>> [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
>>>>> [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
>>>>> [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
>>>>> [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
>>>>> [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
>>>>> [<ffffffff811b9497>] mount_fs+0x47/0x1c0
>>>>> [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
>>>>> [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
>>>>> [<ffffffff811d6763>] do_mount+0x493/0x910
>>>>> [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
>>>>> [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
>>>>> [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
>>>>>
>>>>> Found by Linux File System Verification project (linuxtesting.org).
>>>>>
>>>>> Reported-by: Andrey Tsyvarev <[email protected]>
>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>> ---
>>>>> fs/f2fs/checkpoint.c | 6 ++++++
>>>>> fs/f2fs/f2fs.h | 1 +
>>>>> fs/f2fs/segment.c | 17 +++++++++++++++--
>>>>> 3 files changed, 22 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>>> index 4aa521a..890e23d 100644
>>>>> --- a/fs/f2fs/checkpoint.c
>>>>> +++ b/fs/f2fs/checkpoint.c
>>>>> @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
>>>>> bool is_umount)
>>>>> void *kaddr;
>>>>> int i;
>>>>>
>>>>> + /*
>>>>> + * This avoids to conduct wrong roll-forward operations and uses
>>>>> + * metapages, so should be called prior to sync_meta_pages below.
>>>>> + */
>>>>> + discard_next_dnode(sbi);
>>>>> +
>>>>> /* Flush all the NAT/SIT pages */
>>>>> while (get_pages(sbi, F2FS_DIRTY_META))
>>>>> sync_meta_pages(sbi, META, LONG_MAX);
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index 2ecac83..2c5a5da 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
>>>>> void invalidate_blocks(struct f2fs_sb_info *, block_t);
>>>>> void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
>>>>> void clear_prefree_segments(struct f2fs_sb_info *);
>>>>> +void discard_next_dnode(struct f2fs_sb_info *);
>>>>> int npages_for_summary_flush(struct f2fs_sb_info *);
>>>>> void allocate_new_segments(struct f2fs_sb_info *);
>>>>> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 1e264e7..9993f94 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
>>>>> f2fs_sb_info *sbi, unsigned int segno)
>>>>> mutex_unlock(&dirty_i->seglist_lock);
>>>>> }
>>>>>
>>>>> -static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
>>>>> +static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
>>>>> block_t blkstart, block_t blklen)
>>>>> {
>>>>> sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
>>>>> sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
>>>>> - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
>>>>> trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
>>>>> + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
>>>>> +}
>>>>> +
>>>>> +void discard_next_dnode(struct f2fs_sb_info *sbi)
>>>>> +{
>>>>> + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
>>>>> + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
>>>>> +
>>>>> + if (f2fs_issue_discard(sbi, blkaddr, 1)) {
>>>>> + struct page *page = grab_meta_page(sbi, blkaddr);
>>>>> + /* zero-filled page */
>>>>> + set_page_dirty(page);
>>>>> + f2fs_put_page(page, 1);
>>>>> + }
>>>>> }
>>>>>
>>>>> static void add_discard_addrs(struct f2fs_sb_info *sbi,
>>
Thank you for the explanation.
The following patch will resolve the issue.
Thanks,
>From 2048e7458c982f4297da9d3366ab29224ae2e8b0 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Fri, 18 Apr 2014 15:21:04 +0900
Subject: [PATCH] f2fs: avoid BUG_ON when mouting corrupted image having
garbage blocks
If the disk has some garbage blocks, F2FS is able to face with BUG_ON
when
recovering direct node blocks.
This patch detects the error case and avoids that prior to reaching
BUG_ON.
Alexey Khoroshilov addressed the potential security issues as follows.
"An ability to trigger a BUG_ON assert by mounting a crafted image is
usually considered as a local denial of service [1-3]. As far as I
understand, the reason is that some kernel data may become inconsistent
that can lead to further problems.
[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
[2] http://www.openwall.com/lists/oss-security/2011/06/24/4
[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
etc."
Reported-by: Andrey Tsyvarev <[email protected]>
Cc: Alexey Khoroshilov <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/node.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 6ebdba1..64755f4 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1609,6 +1609,11 @@ int recover_inode_page(struct f2fs_sb_info *sbi,
struct page *page)
struct node_info old_ni, new_ni;
struct page *ipage;
+ get_node_info(sbi, ino, &old_ni);
+
+ if (unlikely(old_ni.blk_addr != NULL_ADDR))
+ return -EINVAL;
+
ipage = grab_cache_page(NODE_MAPPING(sbi), ino);
if (!ipage)
return -ENOMEM;
@@ -1616,7 +1621,6 @@ int recover_inode_page(struct f2fs_sb_info *sbi,
struct page *page)
/* Should not use this inode from free nid list */
remove_free_nid(NM_I(sbi), ino);
- get_node_info(sbi, ino, &old_ni);
SetPageUptodate(ipage);
fill_node_footer(ipage, ino, ino, 0, true);
--
1.8.4.474.g128a96c
--
Jaegeuk Kim
Samsung
Hi Alexey, Kim,
On 04/18/2014 02:04 PM, Alexey Khoroshilov wrote:
> On 17.04.2014 00:45, Jaegeuk Kim wrote:
>> Hi,
>>
>> 2014-04-16 (수), 18:11 -0700, Alexey Khoroshilov:
>>> Hi,
>>>
>>> But would not ability to trigger BUG_ON by mounting a crafted image
>>> considered as an issue having security implications?
>> Sorry, I can't come up with you.
>> Could you please explain why this can be related to the security hole?
>> Did you mean it needs to avoid such the BUG_ONs if the image has
>> obsolete data being used before?
> An ability to trigger a BUG_ON assert by mounting a crafted image is
> usually considered as a local denial of service [1-3]. As far as I
> understand, the reason is that some kernel data may become inconsistent
> that can lead to further problems.
I think I caught the key point if I do not miss-read it.
Alexey's misgiving is that users can mount a crafted or malicious
image to trigger a BUG_ON, and results in the server panic, especially the
normal user can do this via fuse,user_namespace...
So it seems a leak that the user only mounts an invalid image, but results
in denial of service.
The following 3 cases also show this point.
So the right way may be return the suitable *ERROR* to the user rather than
trigger BUG_ON.
Regards,
Gu
>
> [1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
> [2] http://www.openwall.com/lists/oss-security/2011/06/24/4
> [3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
> etc.
>
> --
> Alexey
>
>
>> On 16.04.2014 16:35, Jaegeuk Kim wrote:
>>>> Hi,
>>>>
>>>> 2014-04-16 (수), 13:11 +0400, Andrey Tsyvarev:
>>>>> Hi,
>>>>>
>>>>> With this patch mounting of the image continues to fail (with similar
>>>>> BUG_ON).
>>>>> But when image is formatted again (and steps mentioned in the previous
>>>>> message are performed),
>>>>> mounting of it is now succeed.
>>>>>
>>>>> Is this is a true purpose of the patch?
>>>> Indeed. The patch solves there-in root cause.
>>>> But, if you're trying to use the failed image again, simply you can skip
>>>> the errorneous part by:
>>>>
>>>> # mount ... -o disable_roll_forward ...
>>>>
>>>> Once sync or umount whatever checkpoint is done after that, the image
>>>> will be mounted without "disable_roll_forward".
>>>>
>>>> Thanks,
>>>>
>>>>> 15.04.2014 15:04, Jaegeuk Kim пишет:
>>>>>> Hi,
>>>>>>
>>>>>> Thank you for the report.
>>>>>> I retrieved the fault image and found out that previous garbage data
>>>>>> wreak such the wrong behaviors.
>>>>>> So, I wrote the following patch that fills one zero-block at the
>>>>>> checkpoint procedure.
>>>>>> If the underlying device supports discard, I expect that it mostly
>>>>>> doesn't incur any performance regression significantly.
>>>>>>
>>>>>> Could you test this patch?
>>>>>>
>>>>>> >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001
>>>>>> From: Jaegeuk Kim <[email protected]>
>>>>>> Date: Tue, 15 Apr 2014 13:57:55 +0900
>>>>>> Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained
>>>>>> garbage blocks
>>>>>>
>>>>>> The f2fs always scans the next chain of direct node blocks.
>>>>>> But some garbage blocks are able to be remained due to no discard
>>>>>> support or
>>>>>> SSR triggers.
>>>>>> This occasionally wreaks recovering wrong inodes that were used or
>>>>>> BUG_ONs
>>>>>> due to reallocating node ids as follows.
>>>>>>
>>>>>> When mount this f2fs image:
>>>>>> http://linuxtesting.org/downloads/f2fs_fault_image.zip
>>>>>> BUG_ON is triggered in f2fs driver (messages below are generated on
>>>>>> kernel 3.13.2; for other kernels output is similar):
>>>>>>
>>>>>> kernel BUG at fs/f2fs/node.c:215!
>>>>>> Call Trace:
>>>>>> [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
>>>>>> [<ffffffff811446e7>] ? __lock_page+0x67/0x70
>>>>>> [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
>>>>>> [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
>>>>>> [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
>>>>>> [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
>>>>>> [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
>>>>>> [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
>>>>>> [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
>>>>>> [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
>>>>>> [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
>>>>>> [<ffffffff811b9497>] mount_fs+0x47/0x1c0
>>>>>> [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
>>>>>> [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
>>>>>> [<ffffffff811d6763>] do_mount+0x493/0x910
>>>>>> [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
>>>>>> [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
>>>>>> [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
>>>>>>
>>>>>> Found by Linux File System Verification project (linuxtesting.org).
>>>>>>
>>>>>> Reported-by: Andrey Tsyvarev <[email protected]>
>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>> ---
>>>>>> fs/f2fs/checkpoint.c | 6 ++++++
>>>>>> fs/f2fs/f2fs.h | 1 +
>>>>>> fs/f2fs/segment.c | 17 +++++++++++++++--
>>>>>> 3 files changed, 22 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>>>> index 4aa521a..890e23d 100644
>>>>>> --- a/fs/f2fs/checkpoint.c
>>>>>> +++ b/fs/f2fs/checkpoint.c
>>>>>> @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi,
>>>>>> bool is_umount)
>>>>>> void *kaddr;
>>>>>> int i;
>>>>>>
>>>>>> + /*
>>>>>> + * This avoids to conduct wrong roll-forward operations and uses
>>>>>> + * metapages, so should be called prior to sync_meta_pages below.
>>>>>> + */
>>>>>> + discard_next_dnode(sbi);
>>>>>> +
>>>>>> /* Flush all the NAT/SIT pages */
>>>>>> while (get_pages(sbi, F2FS_DIRTY_META))
>>>>>> sync_meta_pages(sbi, META, LONG_MAX);
>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>> index 2ecac83..2c5a5da 100644
>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>> @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *);
>>>>>> void invalidate_blocks(struct f2fs_sb_info *, block_t);
>>>>>> void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
>>>>>> void clear_prefree_segments(struct f2fs_sb_info *);
>>>>>> +void discard_next_dnode(struct f2fs_sb_info *);
>>>>>> int npages_for_summary_flush(struct f2fs_sb_info *);
>>>>>> void allocate_new_segments(struct f2fs_sb_info *);
>>>>>> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 1e264e7..9993f94 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct
>>>>>> f2fs_sb_info *sbi, unsigned int segno)
>>>>>> mutex_unlock(&dirty_i->seglist_lock);
>>>>>> }
>>>>>>
>>>>>> -static void f2fs_issue_discard(struct f2fs_sb_info *sbi,
>>>>>> +static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
>>>>>> block_t blkstart, block_t blklen)
>>>>>> {
>>>>>> sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart);
>>>>>> sector_t len = SECTOR_FROM_BLOCK(sbi, blklen);
>>>>>> - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
>>>>>> trace_f2fs_issue_discard(sbi->sb, blkstart, blklen);
>>>>>> + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
>>>>>> +}
>>>>>> +
>>>>>> +void discard_next_dnode(struct f2fs_sb_info *sbi)
>>>>>> +{
>>>>>> + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
>>>>>> + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
>>>>>> +
>>>>>> + if (f2fs_issue_discard(sbi, blkaddr, 1)) {
>>>>>> + struct page *page = grab_meta_page(sbi, blkaddr);
>>>>>> + /* zero-filled page */
>>>>>> + set_page_dirty(page);
>>>>>> + f2fs_put_page(page, 1);
>>>>>> + }
>>>>>> }
>>>>>>
>>>>>> static void add_discard_addrs(struct f2fs_sb_info *sbi,
>>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> .
>
Hello,
Using memory error detector reveals the following use-after-free error
in 3.15.0:
AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
[<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/f2fs.h:584
[<ffffffff812359af>] evict+0x15f/0x290
/home/tester/linux-sources/linux-kasan/fs/inode.c:550
[< inlined >] iput+0x196/0x280 iput_final
/home/tester/linux-sources/linux-kasan/fs/inode.c:1418
[<ffffffff812369a6>] iput+0x196/0x280
/home/tester/linux-sources/linux-kasan/fs/inode.c:1436
[<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:434
[<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
/home/tester/linux-sources/linux-kasan/fs/super.c:406
[<ffffffff812105fd>] kill_block_super+0x4d/0xb0
/home/tester/linux-sources/linux-kasan/fs/super.c:1019
[<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
/home/tester/linux-sources/linux-kasan/fs/super.c:284
[<ffffffff81211c98>] deactivate_super+0x68/0x80
/home/tester/linux-sources/linux-kasan/fs/super.c:307
[<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
/home/tester/linux-sources/linux-kasan/fs/namespace.c:986 (discriminator 3)
[< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount
/home/tester/linux-sources/linux-kasan/fs/namespace.c:1424
[<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
/home/tester/linux-sources/linux-kasan/fs/namespace.c:1392
[<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
/home/tester/linux-sources/linux-kasan/arch/x86/kernel/entry_64.S:426
Freed by thread T3:
[<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:408
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
/home/tester/linux-sources/linux-kasan/kernel/rcu/rcu.h:114
[< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
/home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2242
[< inlined >] rcu_process_callbacks+0x2d6/0x930
invoke_rcu_callbacks
/home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2499
[< inlined >] rcu_process_callbacks+0x2d6/0x930
__rcu_process_callbacks
/home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2466
[<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
/home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2483
[<ffffffff8107cce2>] __do_softirq+0x142/0x380
/home/tester/linux-sources/linux-kasan/kernel/softirq.c:269
[<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
/home/tester/linux-sources/linux-kasan/kernel/softirq.c:658
[<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
/home/tester/linux-sources/linux-kasan/kernel/smpboot.c:160
[<ffffffff810a8238>] kthread+0x148/0x160
/home/tester/linux-sources/linux-kasan/kernel/kthread.c:207
[<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0
/home/tester/linux-sources/linux-kasan/arch/x86/kernel/entry_64.S:351
Allocated by thread T22276:
[<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:356
[<ffffffff8123471d>] alloc_inode+0x2d/0xe0
/home/tester/linux-sources/linux-kasan/fs/inode.c:208
[<ffffffff81235e2a>] iget_locked+0x10a/0x230
/home/tester/linux-sources/linux-kasan/fs/inode.c:1085
[<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/inode.c:129
[<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:1021
[<ffffffff81211bce>] mount_bdev+0x1de/0x240
/home/tester/linux-sources/linux-kasan/fs/super.c:992
[<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
/home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:1127
[<ffffffff81212a85>] mount_fs+0x55/0x220
/home/tester/linux-sources/linux-kasan/fs/super.c:1095
[<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
/home/tester/linux-sources/linux-kasan/fs/namespace.c:851
[< inlined >] do_mount+0x2b4/0x1120 do_new_mount
/home/tester/linux-sources/linux-kasan/fs/namespace.c:2129
[<ffffffff812400d4>] do_mount+0x2b4/0x1120
/home/tester/linux-sources/linux-kasan/fs/namespace.c:2453
[< inlined >] SyS_mount+0xb2/0x110 SYSC_mount
/home/tester/linux-sources/linux-kasan/fs/namespace.c:2647
[<ffffffff812414a2>] SyS_mount+0xb2/0x110
/home/tester/linux-sources/linux-kasan/fs/namespace.c:2620
[<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
/home/tester/linux-sources/linux-kasan/arch/x86/kernel/entry_64.S:426
The buggy address ffff8800587866c8 is located 48 bytes inside
of 680-byte region [ffff880058786698, ffff880058786940)
Memory state around the buggy address:
ffff880058786100: ffffffff ffffffff ffffffff ffffffff
ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
ffff880058786400: ffffffff ffffffff ffffffff ffffffff
ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
^
ffff880058786700: ffffffff ffffffff ffffffff ffffffff
ffff880058786800: ffffffff ffffffff ffffffff ffffffff
ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
ffff880058786a00: ........ ........ ........ ........
ffff880058786b00: ........ ........ ........ ........
Legend:
f - 8 freed bytes
r - 8 redzone bytes
. - 8 allocated bytes
x=1..7 - x allocated bytes + (8-x) redzone bytes
Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().
It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and formely
'node_inode' of the first filesystem is returned.
Found by Linux File System Verification project (linuxtesting.org).
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web:http://linuxtesting.org
Hi Andrey,
On 07/21/2014 06:56 PM, Andrey Tsyvarev wrote:
> Hello,
>
> Using memory error detector reveals the following use-after-free error in 3.15.0:
>
> AddressSanitizer: heap-use-after-free in f2fs_evict_inode
> Read of size 8 by thread T22279:
> [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/f2fs.h:584
> [<ffffffff812359af>] evict+0x15f/0x290 /home/tester/linux-sources/linux-kasan/fs/inode.c:550
> [< inlined >] iput+0x196/0x280 iput_final /home/tester/linux-sources/linux-kasan/fs/inode.c:1418
> [<ffffffff812369a6>] iput+0x196/0x280 /home/tester/linux-sources/linux-kasan/fs/inode.c:1436
> [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:434
> [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0 /home/tester/linux-sources/linux-kasan/fs/super.c:406
> [<ffffffff812105fd>] kill_block_super+0x4d/0xb0 /home/tester/linux-sources/linux-kasan/fs/super.c:1019
> [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80 /home/tester/linux-sources/linux-kasan/fs/super.c:284
> [<ffffffff81211c98>] deactivate_super+0x68/0x80 /home/tester/linux-sources/linux-kasan/fs/super.c:307
> [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250 /home/tester/linux-sources/linux-kasan/fs/namespace.c:986 (discriminator 3)
> [< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount /home/tester/linux-sources/linux-kasan/fs/namespace.c:1424
> [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0 /home/tester/linux-sources/linux-kasan/fs/namespace.c:1392
> [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b /home/tester/linux-sources/linux-kasan/arch/x86/kernel/entry_64.S:426
>
> Freed by thread T3:
> [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:408
> [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim /home/tester/linux-sources/linux-kasan/kernel/rcu/rcu.h:114
> [< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch /home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2242
> [< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks /home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2499
> [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks /home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2466
> [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930 /home/tester/linux-sources/linux-kasan/kernel/rcu/tree.c:2483
> [<ffffffff8107cce2>] __do_softirq+0x142/0x380 /home/tester/linux-sources/linux-kasan/kernel/softirq.c:269
> [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50 /home/tester/linux-sources/linux-kasan/kernel/softirq.c:658
> [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280 /home/tester/linux-sources/linux-kasan/kernel/smpboot.c:160
> [<ffffffff810a8238>] kthread+0x148/0x160 /home/tester/linux-sources/linux-kasan/kernel/kthread.c:207
> [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0 /home/tester/linux-sources/linux-kasan/arch/x86/kernel/entry_64.S:351
>
> Allocated by thread T22276:
> [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:356
> [<ffffffff8123471d>] alloc_inode+0x2d/0xe0 /home/tester/linux-sources/linux-kasan/fs/inode.c:208
> [<ffffffff81235e2a>] iget_locked+0x10a/0x230 /home/tester/linux-sources/linux-kasan/fs/inode.c:1085
> [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/inode.c:129
> [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:1021
> [<ffffffff81211bce>] mount_bdev+0x1de/0x240 /home/tester/linux-sources/linux-kasan/fs/super.c:992
> [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs] /home/tester/linux-sources/linux-kasan/fs/f2fs/super.c:1127
> [<ffffffff81212a85>] mount_fs+0x55/0x220 /home/tester/linux-sources/linux-kasan/fs/super.c:1095
> [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200 /home/tester/linux-sources/linux-kasan/fs/namespace.c:851
> [< inlined >] do_mount+0x2b4/0x1120 do_new_mount /home/tester/linux-sources/linux-kasan/fs/namespace.c:2129
> [<ffffffff812400d4>] do_mount+0x2b4/0x1120 /home/tester/linux-sources/linux-kasan/fs/namespace.c:2453
> [< inlined >] SyS_mount+0xb2/0x110 SYSC_mount /home/tester/linux-sources/linux-kasan/fs/namespace.c:2647
> [<ffffffff812414a2>] SyS_mount+0xb2/0x110 /home/tester/linux-sources/linux-kasan/fs/namespace.c:2620
> [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b /home/tester/linux-sources/linux-kasan/arch/x86/kernel/entry_64.S:426
>
> The buggy address ffff8800587866c8 is located 48 bytes inside
> of 680-byte region [ffff880058786698, ffff880058786940)
>
> Memory state around the buggy address:
> ffff880058786100: ffffffff ffffffff ffffffff ffffffff
> ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
> ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
> ffff880058786400: ffffffff ffffffff ffffffff ffffffff
> ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
> ^
> ffff880058786700: ffffffff ffffffff ffffffff ffffffff
> ffff880058786800: ffffffff ffffffff ffffffff ffffffff
> ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
> ffff880058786a00: ........ ........ ........ ........
> ffff880058786b00: ........ ........ ........ ........
> Legend:
> f - 8 freed bytes
> r - 8 redzone bytes
> . - 8 allocated bytes
> x=1..7 - x allocated bytes + (8-x) redzone bytes
>
>
> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
>
> It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free().
> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely
> 'node_inode' of the first filesystem is returned.
The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
and meta_inode?
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 870fe19..e114418 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
write_checkpoint(sbi, true);
- iput(sbi->node_inode);
iput(sbi->meta_inode);
+ iput(sbi->node_inode);
/* destroy f2fs internal modules */
destroy_node_manager(sbi);
Thanks,
Gu
>
>
> Found by Linux File System Verification project (linuxtesting.org).
>
>
Hi Gu,
>> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
>> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
>>
>> It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free().
>> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely
>> 'node_inode' of the first filesystem is returned.
> The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
> and meta_inode?
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 870fe19..e114418 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
> if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
> write_checkpoint(sbi, true);
>
> - iput(sbi->node_inode);
> iput(sbi->meta_inode);
> + iput(sbi->node_inode);
>
> /* destroy f2fs internal modules */
> destroy_node_manager(sbi);
>
> Thanks,
> Gu
With reclaim order of node_inode and meta_inode swapped, use-after-free
error disappears.
But shouldn't initialization order of these inodes be swapped too?
As meta_inode uses node_inode, it seems logical that it should be
initialized after it.
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web:http://linuxtesting.org
Hi Andrey Gu,
> -----Original Message-----
> From: Andrey Tsyvarev [mailto:[email protected]]
> Sent: Tuesday, July 22, 2014 6:04 PM
> To: Gu Zheng
> Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; [email protected]
> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>
> Hi Gu,
>
> >> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses
> invalidate_mapping_pages() for 'node_inode'.
> >> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
> >>
> >> It seems that in common usage scenario this use-after-free is benign, because 'node_inode'
> remains partially valid data even after kmem_cache_free().
> >> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted)
> f2fs filesystem requests inode from cache, and formely
> >> 'node_inode' of the first filesystem is returned.
> > The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
> > and meta_inode?
> >
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 870fe19..e114418 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
> > if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
> > write_checkpoint(sbi, true);
> >
> > - iput(sbi->node_inode);
> > iput(sbi->meta_inode);
> > + iput(sbi->node_inode);
> >
> > /* destroy f2fs internal modules */
> > destroy_node_manager(sbi);
> >
> > Thanks,
> > Gu
>
> With reclaim order of node_inode and meta_inode swapped, use-after-free
> error disappears.
>
> But shouldn't initialization order of these inodes be swapped too?
> As meta_inode uses node_inode, it seems logical that it should be
> initialized after it.
IMO, it's not easy to exchange order of initialization between meta_inode and
node_inode, because we should use meta_inode in get_valid_checkpoint for valid
cp first for usual verification, then init node_inode.
As I checked, nids for both meta_inode and node_inode are reservation, so it's not
necessary for us to invalidate pages which will never alloced.
How about skipping it as following?
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 2cf6962..cafba3c 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode)
if (inode->i_ino == F2FS_NODE_INO(sbi) ||
inode->i_ino == F2FS_META_INO(sbi))
- goto no_delete;
+ goto out_clear;
f2fs_bug_on(get_dirty_dents(inode));
remove_dirty_dir_inode(inode);
@@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode)
sb_end_intwrite(inode->i_sb);
no_delete:
- clear_inode(inode);
invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
+out_clear:
+ clear_inode(inode);
}
>
> --
> Best regards,
>
> Andrey Tsyvarev
> Linux Verification Center, ISPRAS
> web:http://linuxtesting.org
>
>
> ------------------------------------------------------------------------------
> Want fast and easy access to all the code in your enterprise? Index and
> search up to 200,000 lines of code with a free copy of Black Duck
> Code Sight - the same software that powers the world's largest code
> search on Ohloh, the Black Duck Open Hub! Try it now.
> http://p.sf.net/sfu/bds
> _______________________________________________
> Linux-f2fs-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Hi,
On 07/23/2014 10:12 AM, Chao Yu wrote:
> Hi Andrey Gu,
>
>> -----Original Message-----
>> From: Andrey Tsyvarev [mailto:[email protected]]
>> Sent: Tuesday, July 22, 2014 6:04 PM
>> To: Gu Zheng
>> Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; [email protected]
>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>>
>> Hi Gu,
>>
>>>> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses
>> invalidate_mapping_pages() for 'node_inode'.
>>>> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
>>>>
>>>> It seems that in common usage scenario this use-after-free is benign, because 'node_inode'
>> remains partially valid data even after kmem_cache_free().
>>>> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted)
>> f2fs filesystem requests inode from cache, and formely
>>>> 'node_inode' of the first filesystem is returned.
>>> The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
>>> and meta_inode?
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 870fe19..e114418 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
>>> if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
>>> write_checkpoint(sbi, true);
>>>
>>> - iput(sbi->node_inode);
>>> iput(sbi->meta_inode);
>>> + iput(sbi->node_inode);
>>>
>>> /* destroy f2fs internal modules */
>>> destroy_node_manager(sbi);
>>>
>>> Thanks,
>>> Gu
>>
>> With reclaim order of node_inode and meta_inode swapped, use-after-free
>> error disappears.
>>
>> But shouldn't initialization order of these inodes be swapped too?
>> As meta_inode uses node_inode, it seems logical that it should be
>> initialized after it.
The initialization order dose not affect anything, so swapping the order dose not
make more sense here.
>
> IMO, it's not easy to exchange order of initialization between meta_inode and
> node_inode, because we should use meta_inode in get_valid_checkpoint for valid
> cp first for usual verification, then init node_inode.
Yeah, but I think just moving node_inode's initialization to the front of meta_inode
dose not break anything.
>
> As I checked, nids for both meta_inode and node_inode are reservation, so it's not
> necessary for us to invalidate pages which will never alloced.
>
> How about skipping it as following?
It seems the right way to fix this issue.
To Andrey:
Could you please try this one?
Thanks,
Gu
>
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 2cf6962..cafba3c 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> if (inode->i_ino == F2FS_NODE_INO(sbi) ||
> inode->i_ino == F2FS_META_INO(sbi))
> - goto no_delete;
> + goto out_clear;
>
> f2fs_bug_on(get_dirty_dents(inode));
> remove_dirty_dir_inode(inode);
> @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> sb_end_intwrite(inode->i_sb);
> no_delete:
> - clear_inode(inode);
> invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
> +out_clear:
> + clear_inode(inode);
> }
>
>>
>> --
>> Best regards,
>>
>> Andrey Tsyvarev
>> Linux Verification Center, ISPRAS
>> web:http://linuxtesting.org
>>
>>
>> ------------------------------------------------------------------------------
>> Want fast and easy access to all the code in your enterprise? Index and
>> search up to 200,000 lines of code with a free copy of Black Duck
>> Code Sight - the same software that powers the world's largest code
>> search on Ohloh, the Black Duck Open Hub! Try it now.
>> http://p.sf.net/sfu/bds
>> _______________________________________________
>> Linux-f2fs-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>
> .
>
Hi,
With patch skipping invalidating pages for node_inode and meta_inode
use-after-free error disappears too.
23.07.2014 7:39, Gu Zheng пишет:
> Hi,
> On 07/23/2014 10:12 AM, Chao Yu wrote:
>
>> Hi Andrey Gu,
>>
>>> -----Original Message-----
>>> From: Andrey Tsyvarev [mailto:[email protected]]
>>> Sent: Tuesday, July 22, 2014 6:04 PM
>>> To: Gu Zheng
>>> Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; [email protected]
>>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>>>
>>> Hi Gu,
>>>
>>>>> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses
>>> invalidate_mapping_pages() for 'node_inode'.
>>>>> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
>>>>>
>>>>> It seems that in common usage scenario this use-after-free is benign, because 'node_inode'
>>> remains partially valid data even after kmem_cache_free().
>>>>> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted)
>>> f2fs filesystem requests inode from cache, and formely
>>>>> 'node_inode' of the first filesystem is returned.
>>>> The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
>>>> and meta_inode?
>>>>
>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>> index 870fe19..e114418 100644
>>>> --- a/fs/f2fs/super.c
>>>> +++ b/fs/f2fs/super.c
>>>> @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
>>>> if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
>>>> write_checkpoint(sbi, true);
>>>>
>>>> - iput(sbi->node_inode);
>>>> iput(sbi->meta_inode);
>>>> + iput(sbi->node_inode);
>>>>
>>>> /* destroy f2fs internal modules */
>>>> destroy_node_manager(sbi);
>>>>
>>>> Thanks,
>>>> Gu
>>> With reclaim order of node_inode and meta_inode swapped, use-after-free
>>> error disappears.
>>>
>>> But shouldn't initialization order of these inodes be swapped too?
>>> As meta_inode uses node_inode, it seems logical that it should be
>>> initialized after it.
> The initialization order dose not affect anything, so swapping the order dose not
> make more sense here.
>
>> IMO, it's not easy to exchange order of initialization between meta_inode and
>> node_inode, because we should use meta_inode in get_valid_checkpoint for valid
>> cp first for usual verification, then init node_inode.
> Yeah, but I think just moving node_inode's initialization to the front of meta_inode
> dose not break anything.
>
>> As I checked, nids for both meta_inode and node_inode are reservation, so it's not
>> necessary for us to invalidate pages which will never alloced.
>>
>> How about skipping it as following?
> It seems the right way to fix this issue.
>
> To Andrey:
> Could you please try this one?
>
> Thanks,
> Gu
>
>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>> index 2cf6962..cafba3c 100644
>> --- a/fs/f2fs/inode.c
>> +++ b/fs/f2fs/inode.c
>> @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode)
>>
>> if (inode->i_ino == F2FS_NODE_INO(sbi) ||
>> inode->i_ino == F2FS_META_INO(sbi))
>> - goto no_delete;
>> + goto out_clear;
>>
>> f2fs_bug_on(get_dirty_dents(inode));
>> remove_dirty_dir_inode(inode);
>> @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode)
>>
>> sb_end_intwrite(inode->i_sb);
>> no_delete:
>> - clear_inode(inode);
>> invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
>> +out_clear:
>> + clear_inode(inode);
>> }
>>
>>> --
>>> Best regards,
>>>
>>> Andrey Tsyvarev
>>> Linux Verification Center, ISPRAS
>>> web:http://linuxtesting.org
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Want fast and easy access to all the code in your enterprise? Index and
>>> search up to 200,000 lines of code with a free copy of Black Duck
>>> Code Sight - the same software that powers the world's largest code
>>> search on Ohloh, the Black Duck Open Hub! Try it now.
>>> http://p.sf.net/sfu/bds
>>> _______________________________________________
>>> Linux-f2fs-devel mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>> .
>>
>
>
--
Best regards,
Andrey Tsyvarev
Linux Verification Center, ISPRAS
web:http://linuxtesting.org
Hi,
To Andrey:
Thanks for your test on this patch!
To Gu:
If you do not object, let me make and resend a patch base on the one which
skip invalidating pages.
Regards,
Yu
> -----Original Message-----
> From: Andrey Tsyvarev [mailto:[email protected]]
> Sent: Thursday, July 24, 2014 6:15 PM
> To: Gu Zheng; Chao Yu
> Cc: 'Jaegeuk Kim'; 'linux-kernel'; 'Alexey Khoroshilov';
> [email protected]
> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>
> Hi,
>
> With patch skipping invalidating pages for node_inode and meta_inode
> use-after-free error disappears too.
>
> 23.07.2014 7:39, Gu Zheng пишет:
> > Hi,
> > On 07/23/2014 10:12 AM, Chao Yu wrote:
> >
> >> Hi Andrey Gu,
> >>
> >>> -----Original Message-----
> >>> From: Andrey Tsyvarev [mailto:[email protected]]
> >>> Sent: Tuesday, July 22, 2014 6:04 PM
> >>> To: Gu Zheng
> >>> Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; [email protected]
> >>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
> >>>
> >>> Hi Gu,
> >>>
> >>>>> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses
> >>> invalidate_mapping_pages() for 'node_inode'.
> >>>>> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
> >>>>>
> >>>>> It seems that in common usage scenario this use-after-free is benign, because 'node_inode'
> >>> remains partially valid data even after kmem_cache_free().
> >>>>> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another
> (mounted)
> >>> f2fs filesystem requests inode from cache, and formely
> >>>>> 'node_inode' of the first filesystem is returned.
> >>>> The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
> >>>> and meta_inode?
> >>>>
> >>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>>> index 870fe19..e114418 100644
> >>>> --- a/fs/f2fs/super.c
> >>>> +++ b/fs/f2fs/super.c
> >>>> @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
> >>>> if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
> >>>> write_checkpoint(sbi, true);
> >>>>
> >>>> - iput(sbi->node_inode);
> >>>> iput(sbi->meta_inode);
> >>>> + iput(sbi->node_inode);
> >>>>
> >>>> /* destroy f2fs internal modules */
> >>>> destroy_node_manager(sbi);
> >>>>
> >>>> Thanks,
> >>>> Gu
> >>> With reclaim order of node_inode and meta_inode swapped, use-after-free
> >>> error disappears.
> >>>
> >>> But shouldn't initialization order of these inodes be swapped too?
> >>> As meta_inode uses node_inode, it seems logical that it should be
> >>> initialized after it.
> > The initialization order dose not affect anything, so swapping the order dose not
> > make more sense here.
> >
> >> IMO, it's not easy to exchange order of initialization between meta_inode and
> >> node_inode, because we should use meta_inode in get_valid_checkpoint for valid
> >> cp first for usual verification, then init node_inode.
> > Yeah, but I think just moving node_inode's initialization to the front of meta_inode
> > dose not break anything.
> >
> >> As I checked, nids for both meta_inode and node_inode are reservation, so it's not
> >> necessary for us to invalidate pages which will never alloced.
> >>
> >> How about skipping it as following?
> > It seems the right way to fix this issue.
> >
> > To Andrey:
> > Could you please try this one?
> >
> > Thanks,
> > Gu
> >
> >> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> >> index 2cf6962..cafba3c 100644
> >> --- a/fs/f2fs/inode.c
> >> +++ b/fs/f2fs/inode.c
> >> @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode)
> >>
> >> if (inode->i_ino == F2FS_NODE_INO(sbi) ||
> >> inode->i_ino == F2FS_META_INO(sbi))
> >> - goto no_delete;
> >> + goto out_clear;
> >>
> >> f2fs_bug_on(get_dirty_dents(inode));
> >> remove_dirty_dir_inode(inode);
> >> @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode)
> >>
> >> sb_end_intwrite(inode->i_sb);
> >> no_delete:
> >> - clear_inode(inode);
> >> invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
> >> +out_clear:
> >> + clear_inode(inode);
> >> }
> >>
> >>> --
> >>> Best regards,
> >>>
> >>> Andrey Tsyvarev
> >>> Linux Verification Center, ISPRAS
> >>> web:http://linuxtesting.org
> >>>
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Want fast and easy access to all the code in your enterprise? Index and
> >>> search up to 200,000 lines of code with a free copy of Black Duck
> >>> Code Sight - the same software that powers the world's largest code
> >>> search on Ohloh, the Black Duck Open Hub! Try it now.
> >>> http://p.sf.net/sfu/bds
> >>> _______________________________________________
> >>> Linux-f2fs-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> >> .
> >>
> >
> >
>
> --
> Best regards,
>
> Andrey Tsyvarev
> Linux Verification Center, ISPRAS
> web:http://linuxtesting.org
On 07/25/2014 11:22 AM, Chao Yu wrote:
> Hi,
>
> To Andrey:
> Thanks for your test on this patch!
>
> To Gu:
> If you do not object, let me make and resend a patch base on the one which
> skip invalidating pages.
Please go ahead.:)
Thanks,
Gu
>
> Regards,
> Yu
>
>> -----Original Message-----
>> From: Andrey Tsyvarev [mailto:[email protected]]
>> Sent: Thursday, July 24, 2014 6:15 PM
>> To: Gu Zheng; Chao Yu
>> Cc: 'Jaegeuk Kim'; 'linux-kernel'; 'Alexey Khoroshilov';
>> [email protected]
>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>>
>> Hi,
>>
>> With patch skipping invalidating pages for node_inode and meta_inode
>> use-after-free error disappears too.
>>
>> 23.07.2014 7:39, Gu Zheng пишет:
>>> Hi,
>>> On 07/23/2014 10:12 AM, Chao Yu wrote:
>>>
>>>> Hi Andrey Gu,
>>>>
>>>>> -----Original Message-----
>>>>> From: Andrey Tsyvarev [mailto:[email protected]]
>>>>> Sent: Tuesday, July 22, 2014 6:04 PM
>>>>> To: Gu Zheng
>>>>> Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; [email protected]
>>>>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>>>>>
>>>>> Hi Gu,
>>>>>
>>>>>>> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses
>>>>> invalidate_mapping_pages() for 'node_inode'.
>>>>>>> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
>>>>>>>
>>>>>>> It seems that in common usage scenario this use-after-free is benign, because 'node_inode'
>>>>> remains partially valid data even after kmem_cache_free().
>>>>>>> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another
>> (mounted)
>>>>> f2fs filesystem requests inode from cache, and formely
>>>>>>> 'node_inode' of the first filesystem is returned.
>>>>>> The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
>>>>>> and meta_inode?
>>>>>>
>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>> index 870fe19..e114418 100644
>>>>>> --- a/fs/f2fs/super.c
>>>>>> +++ b/fs/f2fs/super.c
>>>>>> @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
>>>>>> if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
>>>>>> write_checkpoint(sbi, true);
>>>>>>
>>>>>> - iput(sbi->node_inode);
>>>>>> iput(sbi->meta_inode);
>>>>>> + iput(sbi->node_inode);
>>>>>>
>>>>>> /* destroy f2fs internal modules */
>>>>>> destroy_node_manager(sbi);
>>>>>>
>>>>>> Thanks,
>>>>>> Gu
>>>>> With reclaim order of node_inode and meta_inode swapped, use-after-free
>>>>> error disappears.
>>>>>
>>>>> But shouldn't initialization order of these inodes be swapped too?
>>>>> As meta_inode uses node_inode, it seems logical that it should be
>>>>> initialized after it.
>>> The initialization order dose not affect anything, so swapping the order dose not
>>> make more sense here.
>>>
>>>> IMO, it's not easy to exchange order of initialization between meta_inode and
>>>> node_inode, because we should use meta_inode in get_valid_checkpoint for valid
>>>> cp first for usual verification, then init node_inode.
>>> Yeah, but I think just moving node_inode's initialization to the front of meta_inode
>>> dose not break anything.
>>>
>>>> As I checked, nids for both meta_inode and node_inode are reservation, so it's not
>>>> necessary for us to invalidate pages which will never alloced.
>>>>
>>>> How about skipping it as following?
>>> It seems the right way to fix this issue.
>>>
>>> To Andrey:
>>> Could you please try this one?
>>>
>>> Thanks,
>>> Gu
>>>
>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>>> index 2cf6962..cafba3c 100644
>>>> --- a/fs/f2fs/inode.c
>>>> +++ b/fs/f2fs/inode.c
>>>> @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode)
>>>>
>>>> if (inode->i_ino == F2FS_NODE_INO(sbi) ||
>>>> inode->i_ino == F2FS_META_INO(sbi))
>>>> - goto no_delete;
>>>> + goto out_clear;
>>>>
>>>> f2fs_bug_on(get_dirty_dents(inode));
>>>> remove_dirty_dir_inode(inode);
>>>> @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode)
>>>>
>>>> sb_end_intwrite(inode->i_sb);
>>>> no_delete:
>>>> - clear_inode(inode);
>>>> invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
>>>> +out_clear:
>>>> + clear_inode(inode);
>>>> }
>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Andrey Tsyvarev
>>>>> Linux Verification Center, ISPRAS
>>>>> web:http://linuxtesting.org
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Want fast and easy access to all the code in your enterprise? Index and
>>>>> search up to 200,000 lines of code with a free copy of Black Duck
>>>>> Code Sight - the same software that powers the world's largest code
>>>>> search on Ohloh, the Black Duck Open Hub! Try it now.
>>>>> http://p.sf.net/sfu/bds
>>>>> _______________________________________________
>>>>> Linux-f2fs-devel mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>> .
>>>>
>>>
>>>
>>
>> --
>> Best regards,
>>
>> Andrey Tsyvarev
>> Linux Verification Center, ISPRAS
>> web:http://linuxtesting.org
>
> .
>
Thank you guys.
I merged two patches. :)
--
Jaegeuk Kim
2014-07-24 22:49 GMT-07:00 Gu Zheng <[email protected]>:
> On 07/25/2014 11:22 AM, Chao Yu wrote:
>
>> Hi,
>>
>> To Andrey:
>> Thanks for your test on this patch!
>>
>> To Gu:
>> If you do not object, let me make and resend a patch base on the one which
>> skip invalidating pages.
>
> Please go ahead.:)
>
> Thanks,
> Gu
>
>>
>> Regards,
>> Yu
>>
>>> -----Original Message-----
>>> From: Andrey Tsyvarev [mailto:[email protected]]
>>> Sent: Thursday, July 24, 2014 6:15 PM
>>> To: Gu Zheng; Chao Yu
>>> Cc: 'Jaegeuk Kim'; 'linux-kernel'; 'Alexey Khoroshilov';
>>> [email protected]
>>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>>>
>>> Hi,
>>>
>>> With patch skipping invalidating pages for node_inode and meta_inode
>>> use-after-free error disappears too.
>>>
>>> 23.07.2014 7:39, Gu Zheng пишет:
>>>> Hi,
>>>> On 07/23/2014 10:12 AM, Chao Yu wrote:
>>>>
>>>>> Hi Andrey Gu,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrey Tsyvarev [mailto:[email protected]]
>>>>>> Sent: Tuesday, July 22, 2014 6:04 PM
>>>>>> To: Gu Zheng
>>>>>> Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; [email protected]
>>>>>> Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
>>>>>>
>>>>>> Hi Gu,
>>>>>>
>>>>>>>> Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses
>>>>>> invalidate_mapping_pages() for 'node_inode'.
>>>>>>>> But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().
>>>>>>>>
>>>>>>>> It seems that in common usage scenario this use-after-free is benign, because 'node_inode'
>>>>>> remains partially valid data even after kmem_cache_free().
>>>>>>>> But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another
>>> (mounted)
>>>>>> f2fs filesystem requests inode from cache, and formely
>>>>>>>> 'node_inode' of the first filesystem is returned.
>>>>>>> The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde
>>>>>>> and meta_inode?
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>>> index 870fe19..e114418 100644
>>>>>>> --- a/fs/f2fs/super.c
>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>> @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb)
>>>>>>> if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES))
>>>>>>> write_checkpoint(sbi, true);
>>>>>>>
>>>>>>> - iput(sbi->node_inode);
>>>>>>> iput(sbi->meta_inode);
>>>>>>> + iput(sbi->node_inode);
>>>>>>>
>>>>>>> /* destroy f2fs internal modules */
>>>>>>> destroy_node_manager(sbi);
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gu
>>>>>> With reclaim order of node_inode and meta_inode swapped, use-after-free
>>>>>> error disappears.
>>>>>>
>>>>>> But shouldn't initialization order of these inodes be swapped too?
>>>>>> As meta_inode uses node_inode, it seems logical that it should be
>>>>>> initialized after it.
>>>> The initialization order dose not affect anything, so swapping the order dose not
>>>> make more sense here.
>>>>
>>>>> IMO, it's not easy to exchange order of initialization between meta_inode and
>>>>> node_inode, because we should use meta_inode in get_valid_checkpoint for valid
>>>>> cp first for usual verification, then init node_inode.
>>>> Yeah, but I think just moving node_inode's initialization to the front of meta_inode
>>>> dose not break anything.
>>>>
>>>>> As I checked, nids for both meta_inode and node_inode are reservation, so it's not
>>>>> necessary for us to invalidate pages which will never alloced.
>>>>>
>>>>> How about skipping it as following?
>>>> It seems the right way to fix this issue.
>>>>
>>>> To Andrey:
>>>> Could you please try this one?
>>>>
>>>> Thanks,
>>>> Gu
>>>>
>>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>>>> index 2cf6962..cafba3c 100644
>>>>> --- a/fs/f2fs/inode.c
>>>>> +++ b/fs/f2fs/inode.c
>>>>> @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode)
>>>>>
>>>>> if (inode->i_ino == F2FS_NODE_INO(sbi) ||
>>>>> inode->i_ino == F2FS_META_INO(sbi))
>>>>> - goto no_delete;
>>>>> + goto out_clear;
>>>>>
>>>>> f2fs_bug_on(get_dirty_dents(inode));
>>>>> remove_dirty_dir_inode(inode);
>>>>> @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode)
>>>>>
>>>>> sb_end_intwrite(inode->i_sb);
>>>>> no_delete:
>>>>> - clear_inode(inode);
>>>>> invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
>>>>> +out_clear:
>>>>> + clear_inode(inode);
>>>>> }
>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>> Andrey Tsyvarev
>>>>>> Linux Verification Center, ISPRAS
>>>>>> web:http://linuxtesting.org
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Want fast and easy access to all the code in your enterprise? Index and
>>>>>> search up to 200,000 lines of code with a free copy of Black Duck
>>>>>> Code Sight - the same software that powers the world's largest code
>>>>>> search on Ohloh, the Black Duck Open Hub! Try it now.
>>>>>> http://p.sf.net/sfu/bds
>>>>>> _______________________________________________
>>>>>> Linux-f2fs-devel mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>>> .
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Andrey Tsyvarev
>>> Linux Verification Center, ISPRAS
>>> web:http://linuxtesting.org
>>
>> .
>>
>
>