I received this when running xfs_fsr to defragment my xfs volume (which
resides on a crypted lvm).
Could it be that something is wrong with my xfs volume and this is not
being properly handled (since it seems to refer to getbmap below?).
Kernel is configured with oldconfig from ubuntu hardy stock kernel plus
Alsa. Hardware can be seen at
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/235074>.
Kernel oopsed (caps lock+other flashing when I woke up this morning),
don't know if it's related to the below bug or if it's something else.
Screensaver was on so I couldn't see any text to see more info.
Jul 8 04:44:56 via kernel: [554197.888008] ------------[ cut here ]------------
Jul 8 04:44:56 via kernel: [554197.888008] kernel BUG at fs/xfs/support/debug.c:81!
Jul 8 04:44:56 via kernel: [554197.888008] invalid opcode: 0000 [#1] SMP
Jul 8 04:44:56 via kernel: [554197.888008] Modules linked in: soundcore
ac97_bus isofs udf crc_itu_t nvidia(P) af_packet rfcomm l2c
ap bluetooth ppdev cpufreq_conservative cpufreq_stats cpufreq_ondemand
cpufreq_userspace cpufreq_powersave freq_table container sbs
sbshc wmi video output battery iptable_filter ip_tables x_tables xfs
sha256_generic aes_i586 dm_crypt ac max6650 padlock_sha padlock
_aes crypto_blkcipher aes_generic parport_pc lp parport pcspkr i2c_viapro
i2c_core netxen_nic button via_agp shpchp pci_hotplug agpg
art ipv6 evdev ext3 jbd mbcache usbhid hid sr_mod cdrom pata_acpi pata_via
sg sd_mod uhci_hcd ehci_hcd via_velocity crc_ccitt ata_ge
neric ahci libata via_rhine mii scsi_mod usbcore dock raid10 raid456
async_xor async_memcpy async_tx xor raid1 raid0 multipath linea
r md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fuse
[last unloaded: nvidia]
Jul 8 04:44:56 via kernel: [554197.888008]
Jul 8 04:44:56 via kernel: [554197.888008] Pid: 16200, comm: xfs_fsr Tainted: P (2.6.26-rc8 #1)
Jul 8 04:44:56 via kernel: [554197.888008] EIP: 0060:[<f956201b>] EFLAGS: 00010282 CPU: 0
Jul 8 04:44:56 via kernel: [554197.888008] EIP is at assfail+0x1b/0x20 [xfs]
Jul 8 04:44:56 via kernel: [554197.888008] EAX: 00000080 EBX: 00000000 ECX: 00000001 EDX: 00000092
Jul 8 04:44:56 via kernel: [554197.888008] ESI: 00000000 EDI: c279fed0 EBP: e890ea80 ESP: c279fde0
Jul 8 04:44:56 via kernel: [554197.888008] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jul 8 04:44:56 via kernel: [554197.888008] Process xfs_fsr (pid: 16200, ti=c279e000 task=c6f7ec00 task.ti=c279e000)
Jul 8 04:44:56 via kernel: [554197.888008] Stack: f9570520 f9566db4 f95719fd 000016f7 f9505590 ffffffff ffffffff 00000000
Jul 8 04:44:56 via kernel: [554197.888008] 00000000 00000001 019cc780 f4463880 c279fe98 00000000 c01a1c25 f7879c80
Jul 8 04:44:56 via kernel: [554197.888008] c0196cbb bf8e9ffc f4463880 c279ff04 002e9b68 00000000 00000000 c01a6318
Jul 8 04:44:56 via kernel: [554197.888008] Call Trace:
Jul 8 04:44:56 via kernel: [554197.888008] [<f9505590>] xfs_getbmap+0x320/0x7e0 [xfs]
Jul 8 04:44:56 via kernel: [554197.888008] [xfs:dput+0x65/0x2b0] dput+0x65/0xe0
Jul 8 04:44:56 via kernel: [554197.888008] [__follow_mount+0x1b/0x70] __follow_mount+0x1b/0x70
Jul 8 04:44:56 via kernel: [554197.888008] [fuse:mntput_no_expire+0x18/0x3fc0] mntput_no_expire+0x18/0x100
Jul 8 04:44:56 via kernel: [554197.888008] [sg:seq_open+0x3c/0x100] seq_open+0x3c/0x90
Jul 8 04:44:56 via kernel: [554197.888008] [fuse:mntput_no_expire+0x18/0x3fc0] mntput_no_expire+0x18/0x100
Jul 8 04:44:56 via kernel: [554197.888008] [blk_init_queue_node+0x86/0x160] blk_init_queue_node+0x86/0x160
Jul 8 04:44:56 via kernel: [554197.888008] [<f955b8e0>] xfs_ioc_getbmap+0x60/0xd0 [xfs]
Jul 8 04:44:56 via kernel: [554197.888008] [blk_init_queue_node+0x86/0x160] blk_init_queue_node+0x86/0x160
Jul 8 04:44:56 via kernel: [554197.888008] [<f955d036>] xfs_ioctl+0x306/0x780 [xfs]
Jul 8 04:44:56 via kernel: [554197.888008] [free_pgtables+0x85/0xc0] free_pgtables+0x85/0xc0
Jul 8 04:44:56 via kernel: [554197.888008] [flush_tlb_mm+0x5f/0x70] flush_tlb_mm+0x5f/0x70
Jul 8 04:44:56 via kernel: [554197.888008] [blk_init_queue_node+0x86/0x160] blk_init_queue_node+0x86/0x160
Jul 8 04:44:56 via kernel: [554197.888008] [<f955ada0>] xfs_file_ioctl_invis+0x30/0x70 [xfs]
Jul 8 04:44:56 via kernel: [554197.888008] [blk_init_queue_node+0x86/0x160] blk_init_queue_node+0x86/0x160
Jul 8 04:44:56 via kernel: [554197.888008] [<f955ad70>] xfs_file_ioctl_invis+0x0/0x70 [xfs]
Jul 8 04:44:56 via kernel: [554197.888008] [vfs_ioctl+0x2b/0x90] vfs_ioctl+0x2b/0x90
Jul 8 04:44:56 via kernel: [554197.888008] [do_vfs_ioctl+0x25b/0x2a0] do_vfs_ioctl+0x25b/0x2a0
Jul 8 04:44:56 via kernel: [554197.888008] [sys_ioctl+0x56/0x70] sys_ioctl+0x56/0x70
Jul 8 04:44:56 via kernel: [554197.888008] [sysenter_past_esp+0x78/0xb9] sysenter_past_esp+0x78/0xb9
Jul 8 04:44:56 via kernel: [554197.888008] [blk_init_queue_node+0x86/0x160] blk_init_queue_node+0x86/0x160
Jul 8 04:44:56 via kernel: [554197.888008] =======================
Jul 8 04:44:56 via kernel: [554197.888008] Code: 00 e8 ea 6e cb c6 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24 04 c7 04 24 20 05 57 f9 e8 35 76 bc c6 <0f> 0b eb fe 90 55 57 89 c7 56 b8 20 71 58 f9 53 89 d6 83 ec 0c
Jul 8 04:44:56 via kernel: [554197.888008] EIP: [<f956201b>] assfail+0x1b/0x20 [xfs] SS:ESP 0068:c279fde0
Jul 8 04:44:56 via kernel: [554201.441862] ---[ end trace 598853b934570aec ]---
--
Mikael Abrahamsson email: [email protected]
On Fri, Jul 11, 2008 at 09:46:47AM +0200, Mikael Abrahamsson wrote:
>
> I received this when running xfs_fsr to defragment my xfs volume (which
> resides on a crypted lvm).
Hmmmm. XFS has a history of triggering unique bugs in crypted
volumes - thank you for letting us know up front that this is
what you are using.
> Could it be that something is wrong with my xfs volume and this is not
> being properly handled (since it seems to refer to getbmap below?).
Could be - xfs_check will tell you if there is anything wrong with
it.
> Kernel is configured with oldconfig from ubuntu hardy stock kernel plus
> Alsa. Hardware can be seen at
> <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/235074>.
>
> Kernel oopsed (caps lock+other flashing when I woke up this morning),
> don't know if it's related to the below bug or if it's something else.
> Screensaver was on so I couldn't see any text to see more info.
>
> Jul 8 04:44:56 via kernel: [554197.888008] ------------[ cut here ]------------
> Jul 8 04:44:56 via kernel: [554197.888008] kernel BUG at fs/xfs/support/debug.c:81!
> Jul 8 04:44:56 via kernel: [554197.888008] invalid opcode: 0000 [#1] SMP
> Jul 8 04:44:56 via kernel: [554197.888008] Modules linked in: soundcore
> ac97_bus isofs udf crc_itu_t nvidia(P) af_packet rfcomm l2c
^^^^^^^^^
That could also be a factor.
> Pid: 16200, comm: xfs_fsr Tainted: P (2.6.26-rc8 #1)
> EIP: 0060:[<f956201b>] EFLAGS: 00010282 CPU: 0
> EIP is at assfail+0x1b/0x20 [xfs]
^^^^^^^
Oh - you must be running a debug XFS. CONFIG_XFS_DEBUG was only
introduced in 2.6.26-rc1 and defaults to 'N', so you must have
selected the non-default option when prompted. This will cause your
machine to oops at the slightest inconsistency that is found,
regardless of whether it is fatal or not. Like the help text says,
don't set this unless you are an XFS developer....
That aside, what was the assert failure reported prior to the oops? i.e.
paste the lines in the log before the ---[ cut here ]--- line?
One of them will start with 'Assertion failed:', I think....
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Fri, 11 Jul 2008, Dave Chinner wrote:
> That aside, what was the assert failure reported prior to the oops? i.e.
> paste the lines in the log before the ---[ cut here ]--- line? One of
> them will start with 'Assertion failed:', I think....
These ones?
Jul 8 04:44:56 via kernel: [554197.888008] Assertion failed: whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c, line: 5879
Jul 9 03:25:21 via kernel: [42940.748007] Assertion failed: whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c, line: 5879
I'll happily rebuild the kernel without the debug option and do xfs_check
to weed out any possible logical problem with the volume, if you don't
need any further information from the current state of my volume.
I should also say that this assert failue happened two nights in a row so
I guess it's fairly reproducible (didn't happen on the 10th, and today,
the 11th it seems to have panic:ed around 03:30 (I start the
defragmentation via cron at 03:00) which I think is related.
--
Mikael Abrahamsson email: [email protected]
* Dave Chinner | 2008-07-11 18:42:49 [+1000]:
>Oh - you must be running a debug XFS. CONFIG_XFS_DEBUG was only
>introduced in 2.6.26-rc1 and defaults to 'N', so you must have
>selected the non-default option when prompted. This will cause your
>machine to oops at the slightest inconsistency that is found,
>regardless of whether it is fatal or not. Like the help text says,
>don't set this unless you are an XFS developer....
Could you please add this to the Kconfig entry. Debug mode is usually
noisy, little slower and mostly usefull just to the developers but *I*
would not expect to BUG() in the non-fatal case.
Not sure but if this is just for hch and you than a define in xfs.h
might be safer :)
>Dave.
Sebastian
Sebastian Siewior wrote:
> * Dave Chinner | 2008-07-11 18:42:49 [+1000]:
>
>> Oh - you must be running a debug XFS. CONFIG_XFS_DEBUG was only
>> introduced in 2.6.26-rc1 and defaults to 'N', so you must have
>> selected the non-default option when prompted. This will cause your
>> machine to oops at the slightest inconsistency that is found,
>> regardless of whether it is fatal or not. Like the help text says,
>> don't set this unless you are an XFS developer....
> Could you please add this to the Kconfig entry. Debug mode is usually
> noisy, little slower and mostly usefull just to the developers but *I*
> would not expect to BUG() in the non-fatal case.
> Not sure but if this is just for hch and you than a define in xfs.h
> might be safer :)
>
>> Dave.
> Sebastian
>
>
heh, it ws hch who added the Kconfig option in the first place :)
> Back when I first submitted XFS for mainline inclusion we made the
> decision that the debug code is far to extensive to be accidentally
> enabled by users in mainline. But then again it's often quite useful
> to track problems down and hacking the makefile all the time is rather
> annoying. Given all the debug options with even more overhead like
> lockdep or DEBUG_PAGE_ALLOC users (or rather developers) should know
> by now what they're doing.
>
>
> Signed-off-by: Christoph Hellwig <[email protected]>
But yeah, a bit more of a stern warning about the fatality of the debug
tests might be useful. Just in case anyone reads that part and not the
"only use this if you are an xfs developer" part ;)
-Eric
On Fri, Jul 11, 2008 at 09:02:09PM +0200, Sebastian Siewior wrote:
> * Dave Chinner | 2008-07-11 18:42:49 [+1000]:
>
> >Oh - you must be running a debug XFS. CONFIG_XFS_DEBUG was only
> >introduced in 2.6.26-rc1 and defaults to 'N', so you must have
> >selected the non-default option when prompted. This will cause your
> >machine to oops at the slightest inconsistency that is found,
> >regardless of whether it is fatal or not. Like the help text says,
> >don't set this unless you are an XFS developer....
> Could you please add this to the Kconfig entry.
I effectively quoted from it:
config XFS_DEBUG
bool "XFS Debugging support (EXPERIMENTAL)"
depends on XFS_FS && EXPERIMENTAL
help
Say Y here to get an XFS build with many debugging features,
including ASSERT checks, function wrappers around macros,
and extra sanity-checking functions in various code paths.
Note that the resulting code will be HUGE and SLOW, and probably
not useful unless you are debugging a particular problem.
Say N unless you are an XFS developer, or you play one on TV.
> Debug mode is usually
> noisy, little slower and mostly usefull just to the developers but *I*
> would not expect to BUG() in the non-fatal case.
What do you expect debug code to do? Asserts are designed to
drop the machine into a debugger when they fail so the problem can
be, well, debugged.
> Not sure but if this is just for hch and you than a define in xfs.h
> might be safer :)
And any other XFS developer using the XFS git tree or mainline, as
tends to happen these days. And there are cases where a debug XFS
might be needed to help find a problem that is being hit out in
the field.
Like all other debug config options, don't set them unless you know
what you are doing....
Cheers,
Dave.
--
Dave Chinner
[email protected]
* Dave Chinner | 2008-07-12 09:22:01 [+1000]:
>I effectively quoted from it:
>
>config XFS_DEBUG
> bool "XFS Debugging support (EXPERIMENTAL)"
> depends on XFS_FS && EXPERIMENTAL
> help
> Say Y here to get an XFS build with many debugging features,
> including ASSERT checks, function wrappers around macros,
> and extra sanity-checking functions in various code paths.
>
> Note that the resulting code will be HUGE and SLOW, and probably
> not useful unless you are debugging a particular problem.
>
> Say N unless you are an XFS developer, or you play one on TV.
>
>> Debug mode is usually
>> noisy, little slower and mostly usefull just to the developers but *I*
>> would not expect to BUG() in the non-fatal case.
>
>What do you expect debug code to do? Asserts are designed to
>drop the machine into a debugger when they fail so the problem can
>be, well, debugged.
Sorry, I haven't read this. Userspace assert() results in abort() so it
sane to bug() in kernel.
>Dave.
Sebastian
Mikael Abrahamsson wrote:
> On Fri, 11 Jul 2008, Dave Chinner wrote:
>
>> That aside, what was the assert failure reported prior to the oops?
>> i.e. paste the lines in the log before the ---[ cut here ]--- line?
>> One of them will start with 'Assertion failed:', I think....
>
> These ones?
>
> Jul 8 04:44:56 via kernel: [554197.888008] Assertion failed: whichfork
> == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c,
> line: 5879
> Jul 9 03:25:21 via kernel: [42940.748007] Assertion failed: whichfork
> == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c,
> line: 5879
xfs_ilock(ip, XFS_IOLOCK_SHARED);
if (whichfork == XFS_DATA_FORK &&
(ip->i_delayed_blks || ip->i_size > ip->i_d.di_size)) {
/* xfs_fsize_t last_byte = xfs_file_last_byte(ip); */
error = xfs_flush_pages(ip, (xfs_off_t)0,
-1, 0, FI_REMAPF);
if (error) {
xfs_iunlock(ip, XFS_IOLOCK_SHARED);
return error;
}
}
ASSERT(whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0);
This is a race between xfs_fsr and a mmap write. xfs_fsr acquires the
iolock and then flushes the file and because it has the iolock it doesn't
expect any new delayed allocations to occur. A mmap write can allocate
delayed allocations without acquiring the iolock so is able to get in
after the flush but before the ASSERT.
>
> I'll happily rebuild the kernel without the debug option and do
> xfs_check to weed out any possible logical problem with the volume, if
> you don't need any further information from the current state of my volume.
>
> I should also say that this assert failue happened two nights in a row
> so I guess it's fairly reproducible (didn't happen on the 10th, and
> today, the 11th it seems to have panic:ed around 03:30 (I start the
> defragmentation via cron at 03:00) which I think is related.
>
On Fri, Jul 11, 2008 at 12:21:56PM +0200, Mikael Abrahamsson wrote:
> On Fri, 11 Jul 2008, Dave Chinner wrote:
>
>> That aside, what was the assert failure reported prior to the oops?
>> i.e. paste the lines in the log before the ---[ cut here ]--- line? One
>> of them will start with 'Assertion failed:', I think....
>
> These ones?
>
> Jul 8 04:44:56 via kernel: [554197.888008] Assertion failed: whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c, line: 5879
> Jul 9 03:25:21 via kernel: [42940.748007] Assertion failed: whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c, line: 5879
That implies a flush to disk failed to write everything, but no error was
reported back to the flush. Not particularly conclusive what caused
your problem.
That being said, it's not a fatal error - it simply means that
the bmap will return a bogus block number reported for the delalloc
extent that still exists.
This implies an in-memory error, not an on-disk error...
> I should also say that this assert failue happened two nights in a row so
> I guess it's fairly reproducible (didn't happen on the 10th, and today,
> the 11th it seems to have panic:ed around 03:30 (I start the
> defragmentation via cron at 03:00) which I think is related.
Can you find the file it is failing on and run 'xfs_bmap -vvp
<file>' to just extract the extent map outside the context of
xfs_fsr?
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Mon, 14 Jul 2008, Dave Chinner wrote:
> Can you find the file it is failing on and run 'xfs_bmap -vvp <file>' to
> just extract the extent map outside the context of xfs_fsr?
It's run twice without any errors now, so I guess I can't reproduce it
anymore. If it happens again, how can I find out which file is causing the
problem? I looked at the xfs_fsr man page and it didn't say much, would it
help if I ran it in "-v" mode?
--
Mikael Abrahamsson email: [email protected]
On Mon, Jul 14, 2008 at 05:34:51PM +1000, Lachlan McIlroy wrote:
> Mikael Abrahamsson wrote:
>> On Fri, 11 Jul 2008, Dave Chinner wrote:
>>
>>> That aside, what was the assert failure reported prior to the oops?
>>> i.e. paste the lines in the log before the ---[ cut here ]--- line?
>>> One of them will start with 'Assertion failed:', I think....
>>
>> These ones?
>>
>> Jul 8 04:44:56 via kernel: [554197.888008] Assertion failed: whichfork
>> == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c,
>> line: 5879
>> Jul 9 03:25:21 via kernel: [42940.748007] Assertion failed: whichfork
>> == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c,
>> line: 5879
>
> xfs_ilock(ip, XFS_IOLOCK_SHARED);
>
> if (whichfork == XFS_DATA_FORK &&
> (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size)) {
> /* xfs_fsize_t last_byte = xfs_file_last_byte(ip); */
> error = xfs_flush_pages(ip, (xfs_off_t)0,
> -1, 0, FI_REMAPF);
> if (error) {
> xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> return error;
> }
> }
>
> ASSERT(whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0);
>
> This is a race between xfs_fsr and a mmap write. xfs_fsr acquires the
> iolock and then flushes the file and because it has the iolock it doesn't
> expect any new delayed allocations to occur. A mmap write can allocate
> delayed allocations without acquiring the iolock so is able to get in
> after the flush but before the ASSERT.
Christoph and I were contemplating this problem with ->page_mkwrite
reecently. The problem is that we can't, right now, return an
EAGAIN-like error to ->page_mkwrite() and have it retry the
page fault. Other parts of the page faulting code can do this,
so it seems like a solvable problem.
The basic concept is that if we can return a EAGAIN result we can
try-lock the inode and hold the locks necessary to avoid this race
or prevent the page fault from dirtying the page until the
filesystem is unfrozen.
Added linux-mm to the cc list for discussion.
Cheers,
Dave.
--
Dave Chinner
[email protected]
Dave Chinner wrote:
> On Mon, Jul 14, 2008 at 05:34:51PM +1000, Lachlan McIlroy wrote:
>> Mikael Abrahamsson wrote:
>>> On Fri, 11 Jul 2008, Dave Chinner wrote:
>>>
>>>> That aside, what was the assert failure reported prior to the oops?
>>>> i.e. paste the lines in the log before the ---[ cut here ]--- line?
>>>> One of them will start with 'Assertion failed:', I think....
>>> These ones?
>>>
>>> Jul 8 04:44:56 via kernel: [554197.888008] Assertion failed: whichfork
>>> == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c,
>>> line: 5879
>>> Jul 9 03:25:21 via kernel: [42940.748007] Assertion failed: whichfork
>>> == XFS_ATTR_FORK || ip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap.c,
>>> line: 5879
>> xfs_ilock(ip, XFS_IOLOCK_SHARED);
>>
>> if (whichfork == XFS_DATA_FORK &&
>> (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size)) {
>> /* xfs_fsize_t last_byte = xfs_file_last_byte(ip); */
>> error = xfs_flush_pages(ip, (xfs_off_t)0,
>> -1, 0, FI_REMAPF);
>> if (error) {
>> xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>> return error;
>> }
>> }
>>
>> ASSERT(whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0);
>>
>> This is a race between xfs_fsr and a mmap write. xfs_fsr acquires the
>> iolock and then flushes the file and because it has the iolock it doesn't
>> expect any new delayed allocations to occur. A mmap write can allocate
>> delayed allocations without acquiring the iolock so is able to get in
>> after the flush but before the ASSERT.
>
> Christoph and I were contemplating this problem with ->page_mkwrite
> reecently. The problem is that we can't, right now, return an
> EAGAIN-like error to ->page_mkwrite() and have it retry the
> page fault. Other parts of the page faulting code can do this,
> so it seems like a solvable problem.
>
> The basic concept is that if we can return a EAGAIN result we can
> try-lock the inode and hold the locks necessary to avoid this race
> or prevent the page fault from dirtying the page until the
> filesystem is unfrozen.
Why do we need to try-lock the inode? Will we have an ABBA deadlock
if we block on the iolock in ->page_mkwrite()?
On Tue, Jul 15, 2008 at 12:12:52PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Mon, Jul 14, 2008 at 05:34:51PM +1000, Lachlan McIlroy wrote:
>>> This is a race between xfs_fsr and a mmap write. xfs_fsr acquires the
>>> iolock and then flushes the file and because it has the iolock it doesn't
>>> expect any new delayed allocations to occur. A mmap write can allocate
>>> delayed allocations without acquiring the iolock so is able to get in
>>> after the flush but before the ASSERT.
>>
>> Christoph and I were contemplating this problem with ->page_mkwrite
>> reecently. The problem is that we can't, right now, return an
>> EAGAIN-like error to ->page_mkwrite() and have it retry the
>> page fault. Other parts of the page faulting code can do this,
>> so it seems like a solvable problem.
>>
>> The basic concept is that if we can return a EAGAIN result we can
>> try-lock the inode and hold the locks necessary to avoid this race
>> or prevent the page fault from dirtying the page until the
>> filesystem is unfrozen.
> Why do we need to try-lock the inode? Will we have an ABBA deadlock
> if we block on the iolock in ->page_mkwrite()?
Yes. With the mmap_sem. Look at the rules in mm/filemap.c
and replace i_mutex with iolock....
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Monday 14 July 2008 22:13, Dave Chinner wrote:
> Christoph and I were contemplating this problem with ->page_mkwrite
> reecently. The problem is that we can't, right now, return an
> EAGAIN-like error to ->page_mkwrite() and have it retry the
> page fault. Other parts of the page faulting code can do this,
> so it seems like a solvable problem.
>
> The basic concept is that if we can return a EAGAIN result we can
> try-lock the inode and hold the locks necessary to avoid this race
> or prevent the page fault from dirtying the page until the
> filesystem is unfrozen.
>
> Added linux-mm to the cc list for discussion.
It would be easily possible to do, yes.
> It would be easily possible to do, yes.
What happened to your plans to merge ->nopfn into ->fault? Beeing
able to restart page based faults would be a logical fallout from that.
On Tuesday 15 July 2008 22:22, Christoph Hellwig wrote:
> > It would be easily possible to do, yes.
>
> What happened to your plans to merge ->nopfn into ->fault? Beeing
> able to restart page based faults would be a logical fallout from that.
Yeah I guess I should really do that, and you're right it would
work nicely for this.
Actually I have some code but it is not quite as nice as I'd like.
The problem is that we have the generic file fault handler, but not
a generic page mkwrite handler. So we still need some kind of
page_mkwrite aop which the file fault handler can then call if it
exists. It isn't a big problem AFAIKS, but just a bit irritating.