2012-02-15 11:48:59

by Amit Sahrawat

[permalink] [raw]
Subject: [PATCH 1/1] xfs: fix buffer flushing during log unmount

Whenever there is a mount/unmount failure - there is a chance of calling the
callbacks functions once - transaction ail mount pointer is destroyed. So, it results
in NULL pointer exception followed by hang. So, before unmount of the log - flush all
the pending buffers.

Signed-off-by: Amit Sahrawat <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
---
fs/xfs/xfs_log.c | 10 ++++++++++
fs/xfs/xfs_mount.c | 9 ---------
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e2cc356..b924a5b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -739,6 +739,16 @@ xfs_log_unmount_write(xfs_mount_t *mp)
void
xfs_log_unmount(xfs_mount_t *mp)
{
+ int error = 0;
+ /*
+ * Make sure all buffers have been flushed and completed before
+ * unmounting the log.
+ */
+ error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
+ if (error)
+ cmn_err(CE_WARN, "%d busy buffers during log unmount.", error);
+ xfs_wait_buftarg(mp->m_ddev_targp);
+
xfs_trans_ail_destroy(mp);
xlog_dealloc_log(mp->m_log);
}
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d06afbc..3bd2246 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1519,15 +1519,6 @@ xfs_unmountfs(
"Freespace may not be correct on next mount.");
xfs_unmountfs_writesb(mp);

- /*
- * Make sure all buffers have been flushed and completed before
- * unmounting the log.
- */
- error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
- if (error)
- xfs_warn(mp, "%d busy buffers during unmount.", error);
- xfs_wait_buftarg(mp->m_ddev_targp);
-
xfs_log_unmount_write(mp);
xfs_log_unmount(mp);
xfs_uuid_unmount(mp);
--
1.7.2.3


2012-02-15 12:26:56

by Amit Sahrawat

[permalink] [raw]
Subject: Re: [PATCH 1/1] xfs: fix buffer flushing during log unmount

Errror logs on 3.0.18
Architecture: ARM
Just tried to mount a corrupted USB HDD - which resulted in the below
mentioned behaviour.

#> mount -t xfs /dev/sdb3 /mnt/
XFS (sdb3): Mounting Filesystem
XFS (sdb3): Starting recovery (logdev: internal)
e420d000: 3f b5 ce 5d 15 3b 64 e2 bb b4 f2 9b a0 97 f5 f4 ?..].;d.........
XFS (sdb3): Internal error xfs_btree_check_sblock at line 120 of file
fs/xfs/xfs_btree.c. Caller 0xc012c444

[<c003a008>] (unwind_backtrace+0x0/0xe4) from [<c013c990>]
(xfs_corruption_error+0x54/0x70)
[<c013c990>] (xfs_corruption_error+0x54/0x70) from [<c012c310>]
(xfs_btree_check_sblock+0xe4/0xf8)
[<c012c310>] (xfs_btree_check_sblock+0xe4/0xf8) from [<c012c444>]
(xfs_btree_read_buf_block+0x78/0x98)
[<c012c444>] (xfs_btree_read_buf_block+0x78/0x98) from [<c012e0b0>]
(xfs_btree_rshift+0xb0/0x508)
[<c012e0b0>] (xfs_btree_rshift+0xb0/0x508) from [<c012e5c4>]
(xfs_btree_make_block_unfull+0xbc/0x168)
[<c012e5c4>] (xfs_btree_make_block_unfull+0xbc/0x168) from
[<c012e854>] (xfs_btree_insrec+0x1e4/0x504)
[<c012e854>] (xfs_btree_insrec+0x1e4/0x504) from [<c012ebd8>]
(xfs_btree_insert+0x64/0x15c)
[<c012ebd8>] (xfs_btree_insert+0x64/0x15c) from [<c011a51c>]
(xfs_free_ag_extent+0x478/0x5a8)
[<c011a51c>] (xfs_free_ag_extent+0x478/0x5a8) from [<c011af1c>]
(xfs_free_extent+0xcc/0x108)
[<c011af1c>] (xfs_free_extent+0xcc/0x108) from [<c014d2b4>]
(xlog_recover_process_efi+0x168/0x1d4)
[<c014d2b4>] (xlog_recover_process_efi+0x168/0x1d4) from [<c014d380>]
(xlog_recover_process_efis+0x60/0xac)
[<c014d380>] (xlog_recover_process_efis+0x60/0xac) from [<c014d8b4>]
(xlog_recover_finish+0x18/0x90)
[<c014d8b4>] (xlog_recover_finish+0x18/0x90) from [<c0154390>]
(xfs_mountfs+0x4c8/0x5c4)
[<c0154390>] (xfs_mountfs+0x4c8/0x5c4) from [<c0167c5c>]
(xfs_fs_fill_super+0x150/0x244)
[<c0167c5c>] (xfs_fs_fill_super+0x150/0x244) from [<c00c05f4>]
(mount_bdev+0x120/0x19c)
[<c00c05f4>] (mount_bdev+0x120/0x19c) from [<c0166198>] (xfs_fs_mount+0x10/0x18)
[<c0166198>] (xfs_fs_mount+0x10/0x18) from [<c00bf39c>] (mount_fs+0x10/0xb8)
[<c00bf39c>] (mount_fs+0x10/0xb8) from [<c00d6d50>] (vfs_kern_mount+0x50/0x88)
[<c00d6d50>] (vfs_kern_mount+0x50/0x88) from [<c00d700c>]
(do_kern_mount+0x34/0xc8)
[<c00d700c>] (do_kern_mount+0x34/0xc8) from [<c00d8420>] (do_mount+0x620/0x688)
[<c00d8420>] (do_mount+0x620/0x688) from [<c00d850c>] (sys_mount+0x84/0xc4)
[<c00d850c>] (sys_mount+0x84/0xc4) from [<c0034260>] (ret_fast_syscall+0x0/0x30)
XFS (sdb3): Corruption detected. Unmount and run xfs_repair
XFS (sdb3): Internal error xfs_trans_cancel at line 1928 of file
fs/xfs/xfs_trans.c. Caller 0xc014d314

[<c003a008>] (unwind_backtrace+0x0/0xe4) from [<c0156e34>]
(xfs_trans_cancel+0x70/0xfc)
[<c0156e34>] (xfs_trans_cancel+0x70/0xfc) from [<c014d314>]
(xlog_recover_process_efi+0x1c8/0x1d4)
[<c014d314>] (xlog_recover_process_efi+0x1c8/0x1d4) from [<c014d380>]
(xlog_recover_process_efis+0x60/0xac)
[<c014d380>] (xlog_recover_process_efis+0x60/0xac) from [<c014d8b4>]
(xlog_recover_finish+0x18/0x90)
[<c014d8b4>] (xlog_recover_finish+0x18/0x90) from [<c0154390>]
(xfs_mountfs+0x4c8/0x5c4)
[<c0154390>] (xfs_mountfs+0x4c8/0x5c4) from [<c0167c5c>]
(xfs_fs_fill_super+0x150/0x244)
[<c0167c5c>] (xfs_fs_fill_super+0x150/0x244) from [<c00c05f4>]
(mount_bdev+0x120/0x19c)
[<c00c05f4>] (mount_bdev+0x120/0x19c) from [<c0166198>] (xfs_fs_mount+0x10/0x18)
[<c0166198>] (xfs_fs_mount+0x10/0x18) from [<c00bf39c>] (mount_fs+0x10/0xb8)
[<c00bf39c>] (mount_fs+0x10/0xb8) from [<c00d6d50>] (vfs_kern_mount+0x50/0x88)
[<c00d6d50>] (vfs_kern_mount+0x50/0x88) from [<c00d700c>]
(do_kern_mount+0x34/0xc8)
[<c00d700c>] (do_kern_mount+0x34/0xc8) from [<c00d8420>] (do_mount+0x620/0x688)
[<c00d8420>] (do_mount+0x620/0x688) from [<c00d850c>] (sys_mount+0x84/0xc4)
[<c00d850c>] (sys_mount+0x84/0xc4) from [<c0034260>] (ret_fast_syscall+0x0/0x30)
XFS (sdb3): xfs_do_force_shutdown(0x8) called from line 1929 of file
fs/xfs/xfs_trans.c. Return address = 0xc0156e48
XFS (sdb3): Corruption of in-memory data detected. Shutting down filesystem
XFS (sdb3): Please umount the filesystem and rectify the problem(s)
XFS (sdb3): Failed to recover EFIs
XFS (sdb3): log mount finish failed
Unable to handle kernel paging request at virtual address ffffffff
pgd = e80bc000
[ffffffff] *pgd=68ffc821, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 Not tainted (3.0.18 #17)
PC is at strnlen+0x10/0x28
LR is at string+0x34/0xcc
pc : [<c01964e0>] lr : [<c0197ad8>] psr: a0000093
sp : e424fca0 ip : 00000000 fp : 00000400
r10: e424fd8c r9 : 00000002 r8 : ffffffff
r7 : 00000000 r6 : 0000ffff r5 : c03abca8 r4 : c03ab8b0
r3 : 00000000 r2 : ffffffff r1 : ffffffff r0 : ffffffff
Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 680bc04a DAC: 00000015
Process mount (pid: 656, stack limit = 0xe424e2f0)
Stack: (0xe424fca0 to 0xe4250000)
...
...
ffe0: 00000000 be88470c 000436dc 00009604 a0000010 be884b4f 45d65600 04000000
[<c01964e0>] (strnlen+0x10/0x28) from [<c0197ad8>] (string+0x34/0xcc)
[<c0197ad8>] (string+0x34/0xcc) from [<c0198874>] (vsnprintf+0x1bc/0x344)
[<c0198874>] (vsnprintf+0x1bc/0x344) from [<c0198a68>] (vscnprintf+0xc/0x24)
[<c0198a68>] (vscnprintf+0xc/0x24) from [<c0057424>] (vprintk+0x14c/0x3fc)
[<c0057424>] (vprintk+0x14c/0x3fc) from [<c0293820>] (printk+0x18/0x24)
[<c0293820>] (printk+0x18/0x24) from [<c01660c8>] (xfs_alert_tag+0x64/0x98)
[<c01660c8>] (xfs_alert_tag+0x64/0x98) from [<c0158034>]
(xfs_trans_ail_delete_bulk+0x74/0x118)
[<c0158034>] (xfs_trans_ail_delete_bulk+0x74/0x118) from [<c012fe80>]
(xfs_buf_iodone+0x2c/0x38)
[<c012fe80>] (xfs_buf_iodone+0x2c/0x38) from [<c012fe30>]
(xfs_buf_do_callbacks+0x28/0x38)
[<c012fe30>] (xfs_buf_do_callbacks+0x28/0x38) from [<c012fffc>]
(xfs_buf_iodone_callbacks+0x13c/0x164)
[<c012fffc>] (xfs_buf_iodone_callbacks+0x13c/0x164) from [<c015ffc4>]
(xfs_buf_iodone_work+0x1c/0x40)
[<c015ffc4>] (xfs_buf_iodone_work+0x1c/0x40) from [<c0160194>]
(xfs_bioerror+0x44/0x4c)
[<c0160194>] (xfs_bioerror+0x44/0x4c) from [<c016075c>]
(xfs_flush_buftarg+0xcc/0x148)
[<c016075c>] (xfs_flush_buftarg+0xcc/0x148) from [<c01607f8>]
(xfs_free_buftarg+0x20/0x5c)
[<c01607f8>] (xfs_free_buftarg+0x20/0x5c) from [<c0167cd8>]
(xfs_fs_fill_super+0x1cc/0x244)
[<c0167cd8>] (xfs_fs_fill_super+0x1cc/0x244) from [<c00c05f4>]
(mount_bdev+0x120/0x19c)
[<c00c05f4>] (mount_bdev+0x120/0x19c) from [<c0166198>] (xfs_fs_mount+0x10/0x18)
[<c0166198>] (xfs_fs_mount+0x10/0x18) from [<c00bf39c>] (mount_fs+0x10/0xb8)
[<c00bf39c>] (mount_fs+0x10/0xb8) from [<c00d6d50>] (vfs_kern_mount+0x50/0x88)
[<c00d6d50>] (vfs_kern_mount+0x50/0x88) from [<c00d700c>]
(do_kern_mount+0x34/0xc8)
[<c00d700c>] (do_kern_mount+0x34/0xc8) from [<c00d8420>] (do_mount+0x620/0x688)
[<c00d8420>] (do_mount+0x620/0x688) from [<c00d850c>] (sys_mount+0x84/0xc4)
[<c00d850c>] (sys_mount+0x84/0xc4) from [<c0034260>] (ret_fast_syscall+0x0/0x30)
Code: e3a03000 e1510003 e0832000 0a000003 (e7d0c003)
---[ end trace 9fae26d925820746 ]---
note: mount[656] exited with preempt_count 2
Segmentation fault
#>
#>

Regards,
Amit Sahrawat

On Wed, Feb 15, 2012 at 5:26 PM, Amit Sahrawat
<[email protected]> wrote:
> Whenever there is a mount/unmount failure - there is a chance of calling the
> callbacks functions once - transaction ail mount pointer is destroyed. So, it results
> in NULL pointer exception followed by hang. So, before unmount of the log - flush all
> the pending buffers.
>
> Signed-off-by: Amit Sahrawat <[email protected]>
> Signed-off-by: Namjae Jeon <[email protected]>
> ---
> ?fs/xfs/xfs_log.c ? | ? 10 ++++++++++
> ?fs/xfs/xfs_mount.c | ? ?9 ---------
> ?2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index e2cc356..b924a5b 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -739,6 +739,16 @@ xfs_log_unmount_write(xfs_mount_t *mp)
> ?void
> ?xfs_log_unmount(xfs_mount_t *mp)
> ?{
> + ? ? ? int error = 0;
> + ? ? ? /*
> + ? ? ? ?* Make sure all buffers have been flushed and completed before
> + ? ? ? ?* unmounting the log.
> + ? ? ? ?*/
> + ? ? ? error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
> + ? ? ? if (error)
> + ? ? ? ? ? ? ? cmn_err(CE_WARN, "%d busy buffers during log unmount.", error);
> + ? ? ? xfs_wait_buftarg(mp->m_ddev_targp);
> +
> ? ? ? ?xfs_trans_ail_destroy(mp);
> ? ? ? ?xlog_dealloc_log(mp->m_log);
> ?}
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index d06afbc..3bd2246 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1519,15 +1519,6 @@ xfs_unmountfs(
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"Freespace may not be correct on next mount.");
> ? ? ? ?xfs_unmountfs_writesb(mp);
>
> - ? ? ? /*
> - ? ? ? ?* Make sure all buffers have been flushed and completed before
> - ? ? ? ?* unmounting the log.
> - ? ? ? ?*/
> - ? ? ? error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
> - ? ? ? if (error)
> - ? ? ? ? ? ? ? xfs_warn(mp, "%d busy buffers during unmount.", error);
> - ? ? ? xfs_wait_buftarg(mp->m_ddev_targp);
> -
> ? ? ? ?xfs_log_unmount_write(mp);
> ? ? ? ?xfs_log_unmount(mp);
> ? ? ? ?xfs_uuid_unmount(mp);
> --
> 1.7.2.3
>

2012-02-17 05:39:00

by Namjae Jeon

[permalink] [raw]
Subject: Re: [PATCH 1/1] xfs: fix buffer flushing during log unmount

Hi. Dave, Christoph.

Would you please review this patch ?

Thanks.

2012/2/15 Amit Sahrawat <[email protected]>:
> Errror logs on 3.0.18
> Architecture: ARM
> Just tried to mount a corrupted USB HDD - which resulted in the below
> mentioned behaviour.
>
> #> mount -t xfs /dev/sdb3 /mnt/
> XFS (sdb3): Mounting Filesystem
> XFS (sdb3): Starting recovery (logdev: internal)
> e420d000: 3f b5 ce 5d 15 3b 64 e2 bb b4 f2 9b a0 97 f5 f4  ?..].;d.........
> XFS (sdb3): Internal error xfs_btree_check_sblock at line 120 of file
> fs/xfs/xfs_btree.c.  Caller 0xc012c444
>
> [<c003a008>] (unwind_backtrace+0x0/0xe4) from [<c013c990>]
> (xfs_corruption_error+0x54/0x70)
> [<c013c990>] (xfs_corruption_error+0x54/0x70) from [<c012c310>]
> (xfs_btree_check_sblock+0xe4/0xf8)
> [<c012c310>] (xfs_btree_check_sblock+0xe4/0xf8) from [<c012c444>]
> (xfs_btree_read_buf_block+0x78/0x98)
> [<c012c444>] (xfs_btree_read_buf_block+0x78/0x98) from [<c012e0b0>]
> (xfs_btree_rshift+0xb0/0x508)
> [<c012e0b0>] (xfs_btree_rshift+0xb0/0x508) from [<c012e5c4>]
> (xfs_btree_make_block_unfull+0xbc/0x168)
> [<c012e5c4>] (xfs_btree_make_block_unfull+0xbc/0x168) from
> [<c012e854>] (xfs_btree_insrec+0x1e4/0x504)
> [<c012e854>] (xfs_btree_insrec+0x1e4/0x504) from [<c012ebd8>]
> (xfs_btree_insert+0x64/0x15c)
> [<c012ebd8>] (xfs_btree_insert+0x64/0x15c) from [<c011a51c>]
> (xfs_free_ag_extent+0x478/0x5a8)
> [<c011a51c>] (xfs_free_ag_extent+0x478/0x5a8) from [<c011af1c>]
> (xfs_free_extent+0xcc/0x108)
> [<c011af1c>] (xfs_free_extent+0xcc/0x108) from [<c014d2b4>]
> (xlog_recover_process_efi+0x168/0x1d4)
> [<c014d2b4>] (xlog_recover_process_efi+0x168/0x1d4) from [<c014d380>]
> (xlog_recover_process_efis+0x60/0xac)
> [<c014d380>] (xlog_recover_process_efis+0x60/0xac) from [<c014d8b4>]
> (xlog_recover_finish+0x18/0x90)
> [<c014d8b4>] (xlog_recover_finish+0x18/0x90) from [<c0154390>]
> (xfs_mountfs+0x4c8/0x5c4)
> [<c0154390>] (xfs_mountfs+0x4c8/0x5c4) from [<c0167c5c>]
> (xfs_fs_fill_super+0x150/0x244)
> [<c0167c5c>] (xfs_fs_fill_super+0x150/0x244) from [<c00c05f4>]
> (mount_bdev+0x120/0x19c)
> [<c00c05f4>] (mount_bdev+0x120/0x19c) from [<c0166198>] (xfs_fs_mount+0x10/0x18)
> [<c0166198>] (xfs_fs_mount+0x10/0x18) from [<c00bf39c>] (mount_fs+0x10/0xb8)
> [<c00bf39c>] (mount_fs+0x10/0xb8) from [<c00d6d50>] (vfs_kern_mount+0x50/0x88)
> [<c00d6d50>] (vfs_kern_mount+0x50/0x88) from [<c00d700c>]
> (do_kern_mount+0x34/0xc8)
> [<c00d700c>] (do_kern_mount+0x34/0xc8) from [<c00d8420>] (do_mount+0x620/0x688)
> [<c00d8420>] (do_mount+0x620/0x688) from [<c00d850c>] (sys_mount+0x84/0xc4)
> [<c00d850c>] (sys_mount+0x84/0xc4) from [<c0034260>] (ret_fast_syscall+0x0/0x30)
> XFS (sdb3): Corruption detected. Unmount and run xfs_repair
> XFS (sdb3): Internal error xfs_trans_cancel at line 1928 of file
> fs/xfs/xfs_trans.c.  Caller 0xc014d314
>
> [<c003a008>] (unwind_backtrace+0x0/0xe4) from [<c0156e34>]
> (xfs_trans_cancel+0x70/0xfc)
> [<c0156e34>] (xfs_trans_cancel+0x70/0xfc) from [<c014d314>]
> (xlog_recover_process_efi+0x1c8/0x1d4)
> [<c014d314>] (xlog_recover_process_efi+0x1c8/0x1d4) from [<c014d380>]
> (xlog_recover_process_efis+0x60/0xac)
> [<c014d380>] (xlog_recover_process_efis+0x60/0xac) from [<c014d8b4>]
> (xlog_recover_finish+0x18/0x90)
> [<c014d8b4>] (xlog_recover_finish+0x18/0x90) from [<c0154390>]
> (xfs_mountfs+0x4c8/0x5c4)
> [<c0154390>] (xfs_mountfs+0x4c8/0x5c4) from [<c0167c5c>]
> (xfs_fs_fill_super+0x150/0x244)
> [<c0167c5c>] (xfs_fs_fill_super+0x150/0x244) from [<c00c05f4>]
> (mount_bdev+0x120/0x19c)
> [<c00c05f4>] (mount_bdev+0x120/0x19c) from [<c0166198>] (xfs_fs_mount+0x10/0x18)
> [<c0166198>] (xfs_fs_mount+0x10/0x18) from [<c00bf39c>] (mount_fs+0x10/0xb8)
> [<c00bf39c>] (mount_fs+0x10/0xb8) from [<c00d6d50>] (vfs_kern_mount+0x50/0x88)
> [<c00d6d50>] (vfs_kern_mount+0x50/0x88) from [<c00d700c>]
> (do_kern_mount+0x34/0xc8)
> [<c00d700c>] (do_kern_mount+0x34/0xc8) from [<c00d8420>] (do_mount+0x620/0x688)
> [<c00d8420>] (do_mount+0x620/0x688) from [<c00d850c>] (sys_mount+0x84/0xc4)
> [<c00d850c>] (sys_mount+0x84/0xc4) from [<c0034260>] (ret_fast_syscall+0x0/0x30)
> XFS (sdb3): xfs_do_force_shutdown(0x8) called from line 1929 of file
> fs/xfs/xfs_trans.c.  Return address = 0xc0156e48
> XFS (sdb3): Corruption of in-memory data detected.  Shutting down filesystem
> XFS (sdb3): Please umount the filesystem and rectify the problem(s)
> XFS (sdb3): Failed to recover EFIs
> XFS (sdb3): log mount finish failed
> Unable to handle kernel paging request at virtual address ffffffff
> pgd = e80bc000
> [ffffffff] *pgd=68ffc821, *pte=00000000, *ppte=00000000
> Internal error: Oops: 17 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 1    Not tainted  (3.0.18 #17)
> PC is at strnlen+0x10/0x28
> LR is at string+0x34/0xcc
> pc : [<c01964e0>]    lr : [<c0197ad8>]    psr: a0000093
> sp : e424fca0  ip : 00000000  fp : 00000400
> r10: e424fd8c  r9 : 00000002  r8 : ffffffff
> r7 : 00000000  r6 : 0000ffff  r5 : c03abca8  r4 : c03ab8b0
> r3 : 00000000  r2 : ffffffff  r1 : ffffffff  r0 : ffffffff
> Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c53c7d  Table: 680bc04a  DAC: 00000015
> Process mount (pid: 656, stack limit = 0xe424e2f0)
> Stack: (0xe424fca0 to 0xe4250000)
> ...
> ...
> ffe0: 00000000 be88470c 000436dc 00009604 a0000010 be884b4f 45d65600 04000000
> [<c01964e0>] (strnlen+0x10/0x28) from [<c0197ad8>] (string+0x34/0xcc)
> [<c0197ad8>] (string+0x34/0xcc) from [<c0198874>] (vsnprintf+0x1bc/0x344)
> [<c0198874>] (vsnprintf+0x1bc/0x344) from [<c0198a68>] (vscnprintf+0xc/0x24)
> [<c0198a68>] (vscnprintf+0xc/0x24) from [<c0057424>] (vprintk+0x14c/0x3fc)
> [<c0057424>] (vprintk+0x14c/0x3fc) from [<c0293820>] (printk+0x18/0x24)
> [<c0293820>] (printk+0x18/0x24) from [<c01660c8>] (xfs_alert_tag+0x64/0x98)
> [<c01660c8>] (xfs_alert_tag+0x64/0x98) from [<c0158034>]
> (xfs_trans_ail_delete_bulk+0x74/0x118)
> [<c0158034>] (xfs_trans_ail_delete_bulk+0x74/0x118) from [<c012fe80>]
> (xfs_buf_iodone+0x2c/0x38)
> [<c012fe80>] (xfs_buf_iodone+0x2c/0x38) from [<c012fe30>]
> (xfs_buf_do_callbacks+0x28/0x38)
> [<c012fe30>] (xfs_buf_do_callbacks+0x28/0x38) from [<c012fffc>]
> (xfs_buf_iodone_callbacks+0x13c/0x164)
> [<c012fffc>] (xfs_buf_iodone_callbacks+0x13c/0x164) from [<c015ffc4>]
> (xfs_buf_iodone_work+0x1c/0x40)
> [<c015ffc4>] (xfs_buf_iodone_work+0x1c/0x40) from [<c0160194>]
> (xfs_bioerror+0x44/0x4c)
> [<c0160194>] (xfs_bioerror+0x44/0x4c) from [<c016075c>]
> (xfs_flush_buftarg+0xcc/0x148)
> [<c016075c>] (xfs_flush_buftarg+0xcc/0x148) from [<c01607f8>]
> (xfs_free_buftarg+0x20/0x5c)
> [<c01607f8>] (xfs_free_buftarg+0x20/0x5c) from [<c0167cd8>]
> (xfs_fs_fill_super+0x1cc/0x244)
> [<c0167cd8>] (xfs_fs_fill_super+0x1cc/0x244) from [<c00c05f4>]
> (mount_bdev+0x120/0x19c)
> [<c00c05f4>] (mount_bdev+0x120/0x19c) from [<c0166198>] (xfs_fs_mount+0x10/0x18)
> [<c0166198>] (xfs_fs_mount+0x10/0x18) from [<c00bf39c>] (mount_fs+0x10/0xb8)
> [<c00bf39c>] (mount_fs+0x10/0xb8) from [<c00d6d50>] (vfs_kern_mount+0x50/0x88)
> [<c00d6d50>] (vfs_kern_mount+0x50/0x88) from [<c00d700c>]
> (do_kern_mount+0x34/0xc8)
> [<c00d700c>] (do_kern_mount+0x34/0xc8) from [<c00d8420>] (do_mount+0x620/0x688)
> [<c00d8420>] (do_mount+0x620/0x688) from [<c00d850c>] (sys_mount+0x84/0xc4)
> [<c00d850c>] (sys_mount+0x84/0xc4) from [<c0034260>] (ret_fast_syscall+0x0/0x30)
> Code: e3a03000 e1510003 e0832000 0a000003 (e7d0c003)
> ---[ end trace 9fae26d925820746 ]---
> note: mount[656] exited with preempt_count 2
> Segmentation fault
> #>
> #>
>
> Regards,
> Amit Sahrawat
>
> On Wed, Feb 15, 2012 at 5:26 PM, Amit Sahrawat
> <[email protected]> wrote:
>> Whenever there is a mount/unmount failure - there is a chance of calling the
>> callbacks functions once - transaction ail mount pointer is destroyed. So, it results
>> in NULL pointer exception followed by hang. So, before unmount of the log - flush all
>> the pending buffers.
>>
>> Signed-off-by: Amit Sahrawat <[email protected]>
>> Signed-off-by: Namjae Jeon <[email protected]>
>> ---
>>  fs/xfs/xfs_log.c   |   10 ++++++++++
>>  fs/xfs/xfs_mount.c |    9 ---------
>>  2 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index e2cc356..b924a5b 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -739,6 +739,16 @@ xfs_log_unmount_write(xfs_mount_t *mp)
>>  void
>>  xfs_log_unmount(xfs_mount_t *mp)
>>  {
>> +       int error = 0;
>> +       /*
>> +        * Make sure all buffers have been flushed and completed before
>> +        * unmounting the log.
>> +        */
>> +       error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
>> +       if (error)
>> +               cmn_err(CE_WARN, "%d busy buffers during log unmount.", error);
>> +       xfs_wait_buftarg(mp->m_ddev_targp);
>> +
>>        xfs_trans_ail_destroy(mp);
>>        xlog_dealloc_log(mp->m_log);
>>  }
>> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
>> index d06afbc..3bd2246 100644
>> --- a/fs/xfs/xfs_mount.c
>> +++ b/fs/xfs/xfs_mount.c
>> @@ -1519,15 +1519,6 @@ xfs_unmountfs(
>>                                "Freespace may not be correct on next mount.");
>>        xfs_unmountfs_writesb(mp);
>>
>> -       /*
>> -        * Make sure all buffers have been flushed and completed before
>> -        * unmounting the log.
>> -        */
>> -       error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
>> -       if (error)
>> -               xfs_warn(mp, "%d busy buffers during unmount.", error);
>> -       xfs_wait_buftarg(mp->m_ddev_targp);
>> -
>>        xfs_log_unmount_write(mp);
>>        xfs_log_unmount(mp);
>>        xfs_uuid_unmount(mp);
>> --
>> 1.7.2.3
>>

2012-02-17 19:15:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] xfs: fix buffer flushing during log unmount

> Whenever there is a mount/unmount failure - there is a chance of calling the
> callbacks functions once - transaction ail mount pointer is destroyed. So, it results
> in NULL pointer exception followed by hang. So, before unmount of the log - flush all
> the pending buffers.

> void
> xfs_log_unmount(xfs_mount_t *mp)
> {
> + int error = 0;
> + /*
> + * Make sure all buffers have been flushed and completed before
> + * unmounting the log.
> + */
> + error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
> + if (error)
> + cmn_err(CE_WARN, "%d busy buffers during log unmount.", error);
> + xfs_wait_buftarg(mp->m_ddev_targp);
> +

We do exactly that sequence before the xfs_log_unmount_write call on
umount. Care to explain what code in xfs_log_unmount_write would
require this to be called again?

2012-02-18 16:30:14

by Amit Sahrawat

[permalink] [raw]
Subject: Re: [PATCH 1/1] xfs: fix buffer flushing during log unmount

Hi Christoph,

There are ?2? scenarios which results in the similar problem related
to flushing. Both related to handling of buffer callbacks after the
corresponding mount point passed in the transactions is destroyed.

The first one reported 2 months back (in umount path) ? which was
related with the asynchronous callback of buf-iodone handlers being
called after the freeing up of the mount point in:
void xfs_log_unmount(xfs_mount_t *mp)
{
xfs_trans_ail_destroy(mp);
xlog_dealloc_log(mp->m_log);
}

The complete tracing information for the first issue is available at:
http://patchwork.xfs.org/patch/2485/

For which the solution was also provided by you as -> xfs: fix buffer
flushing during unmount
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=87c7bec7fc3377b3873eb3a0f4b603981ea16ebb)

Now, we have encountered a similar issue in mount failure path.
In this scenarios also ? it is the calling of ?xfs_log_umount? without
waiting for the dependent buffer callbacks to be flushed.
xfs_mountfs()
{
?
?
error = xfs_log_mount_finish(mp);
if (error) {
cmn_err(CE_WARN, "XFS: log mount finish failed");
goto out_rtunmount;
}
?
..
out_log_dealloc:
xfs_log_unmount(mp);
?
}
xfs_fs_fill_super()
{
?
error = xfs_mountfs(mp);
if (error)
goto out_filestream_unmount;
?
?
out_filestream_unmount:
xfs_filestream_unmount(mp);
out_free_sb:
xfs_freesb(mp);
out_destroy_counters:
xfs_icsb_destroy_counters(mp);
xfs_close_devices(mp);
}

Now, in xfs_close_devices() ? it tries to flush all the pending buffer
callbacks ? but because xfs_log_umount has resulted in destroying of
ail mountpoint in xfs_trans_ail_destroy(mp);
This again results in crash at
xfs_buf_iodone()
{
?
xfs_trans_ail_delete(ailp, (xfs_log_item_t *)bip);
?
}
The backtrace and exact problem was shared as part of the reply to the
patch submission mail.

So a solution to this ? also we need to flush buffers.

And because it is a similar situation and also ? we can make this a
generic solution.
The fix from the patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=87c7bec7fc3377b3873eb3a0f4b603981ea16ebb
was removed and instead tried to put it at one place which could solve
the similar issues.

Please let me know incase I am not able to clarify properly.

Thanks & Regards,
Amit Sahrawat

On Sat, Feb 18, 2012 at 12:45 AM, Christoph Hellwig <[email protected]> wrote:
>> Whenever there is a mount/unmount failure - there is a chance of calling the
>> callbacks functions once - transaction ail mount pointer is destroyed. So, it results
>> in NULL pointer exception followed by hang. So, before unmount of the log - flush all
>> the pending buffers.
>
>> ?void
>> ?xfs_log_unmount(xfs_mount_t *mp)
>> ?{
>> + ? ? int error = 0;
>> + ? ? /*
>> + ? ? ?* Make sure all buffers have been flushed and completed before
>> + ? ? ?* unmounting the log.
>> + ? ? ?*/
>> + ? ? error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
>> + ? ? if (error)
>> + ? ? ? ? ? ? cmn_err(CE_WARN, "%d busy buffers during log unmount.", error);
>> + ? ? xfs_wait_buftarg(mp->m_ddev_targp);
>> +
>
> We do exactly that sequence before the xfs_log_unmount_write call on
> umount. ?Care to explain what code in xfs_log_unmount_write would
> require this to be called again?
>

2012-02-19 22:27:40

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] xfs: fix buffer flushing during log unmount

Hi Amit,

thanks, your explanation makes sense. Can you resend the patch with
the full description of what is going on? (or if Ben wants to edit it in
place before applying it's also fine with me).

Thanks a lot for tracking this down.

Reviewed-by: Christoph Hellwig <[email protected]>

2012-02-20 01:33:08

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH 1/1] xfs: fix buffer flushing during log unmount


[ Just as an administritive note, please don't top-post. It's bad
form and makes it hard to follow threads.
http://idallen.com/topposting.html

There's also a lot of random control characters in your messages
that make quoting hard - can you make sure you are using clean 7-bit
ascii for your messages? ]

On Sat, Feb 18, 2012 at 10:00:11PM +0530, Amit Sahrawat wrote:
> On Sat, Feb 18, 2012 at 12:45 AM, Christoph Hellwig <[email protected]> wrote:
> >> Whenever there is a mount/unmount failure - there is a chance of calling the
> >> callbacks functions once - transaction ail mount pointer is destroyed. So, it results
> >> in NULL pointer exception followed by hang. So, before unmount of the log - flush all
> >> the pending buffers.
> >
> >> void
> >> xfs_log_unmount(xfs_mount_t *mp)
> >> {
> >> + int error = 0;
> >> + /*
> >> + * Make sure all buffers have been flushed and completed before
> >> + * unmounting the log.
> >> + */
> >> + error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
> >> + if (error)
> >> + cmn_err(CE_WARN, "%d busy buffers during log unmount.", error);
> >> + xfs_wait_buftarg(mp->m_ddev_targp);
> >> +

BTW, this won't compile - cmn_err() doesn't exist anymore.

> > We do exactly that sequence before the xfs_log_unmount_write call on
> > umount. Care to explain what code in xfs_log_unmount_write would
> > require this to be called again?
>
> There are scenarios which results in the similar problem related
> to flushing. Both related to handling of buffer callbacks after the
> corresponding mount point passed in the transactions is destroyed.
>
> The first one reported 2 months back (in umount path) which was
> related with the asynchronous callback of buf-iodone handlers being
> called after the freeing up of the mount point in:
> void xfs_log_unmount(xfs_mount_t *mp)
> {
> xfs_trans_ail_destroy(mp);
> xlog_dealloc_log(mp->m_log);
> }
.....

So the question is, why is the mount failure path leaving stuff
around in the AIL?

".....
XFS (sdb3): Failed to recover EFIs
XFS (sdb3): log mount finish failed
...."

Came from your crash log. So the problem is that the log recovery
code is not cleaning up after a failure and leaving stale recovery
items in the AIL. i.e. xlog_recover_process_efis() is just returning
an error after a recovery failure and not doing any cleanup at all.
EFI recovery uses the AIL, so it's guaranteed to have stuff in the
AIL at this point regardless of success or failure ofthe recovery.
IOWs, it is assuming that an error has caused the filesystem to shut
down and so none of the work it has done will get to disk.

In this case, however, we probably should be be flushing the log and
AIL list in xfs_log_mount_finish() regardless of the error state of
the function - we want all the items on disk and the recovery
process complete before we start doing real work.

Indeed, in the first phase of recovery (xlog_do_recover) we flush
the buftarg on before checking for a shutdown situation. I think we
need to do this flushing in both phases of log recovery regardless
of the error state, and the EFI recovery will also need a log force
before flushing to ensure that the extent free transactions that
recovery runs hit the disk so we can flush the modified metadata
from the AIL....

That then ensures that on either a successful or unsuccessful log
recovery we finish log recovery with an empty AIL and the entirity
of log recovery on disk. This seems like a much better way to solve
the problem than to try to make the high level mount/unmount code
paper over problems caused by failed log recovery....

Cheers,

Dave.
--
Dave Chinner
[email protected]