2007-01-06 03:49:20

by Sami Farin

[permalink] [raw]
Subject: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

Linux 2.6.19.1 SMP [2] on Pentium D...
I was running dt-15.14 [2] and I ran
"cinfo datafile" (it does mincore()).
Well it went OK but when I ran "strace cinfo datafile"...:
04:18:48.062466 mincore(0x37f1f000, 2147266560,
...
2007-01-06 04:19:03.788181500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
2007-01-06 04:19:03.788221500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
2007-01-06 04:19:03.788223500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
2007-01-06 04:19:03.788224500 <4> [<c0103dcb>] show_trace+0x12/0x14
2007-01-06 04:19:03.788225500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
2007-01-06 04:19:03.788227500 <4> [<c01546a6>] cancel_dirty_page+0x7e/0x80
2007-01-06 04:19:03.788228500 <4> [<c01546c2>] truncate_complete_page+0x1a/0x47
2007-01-06 04:19:03.788229500 <4> [<c0154854>] truncate_inode_pages_range+0x114/0x2ae
2007-01-06 04:19:03.788245500 <4> [<c0154a08>] truncate_inode_pages+0x1a/0x1c
2007-01-06 04:19:03.788247500 <4> [<c0269244>] fs_flushinval_pages+0x40/0x77
2007-01-06 04:19:03.788248500 <4> [<c026d48c>] xfs_write+0x8c4/0xb68
2007-01-06 04:19:03.788250500 <4> [<c0268b14>] xfs_file_aio_write+0x7e/0x95
2007-01-06 04:19:03.788251500 <4> [<c016d66c>] do_sync_write+0xca/0x119
2007-01-06 04:19:03.788265500 <4> [<c016d842>] vfs_write+0x187/0x18c
2007-01-06 04:19:03.788267500 <4> [<c016d8e8>] sys_write+0x3d/0x64
2007-01-06 04:19:03.788268500 <4> [<c0102e73>] syscall_call+0x7/0xb
2007-01-06 04:19:03.788269500 <4> [<001cf410>] 0x1cf410
2007-01-06 04:19:03.788289500 <4> =======================

funny that when stracing, mincore does not return?

$ time cinfo dtfile-2091
dtfile-2091: 524285 pages, 0 pages cached (0.00%)

real 0m0.049s
user 0m0.003s
sys 0m0.046s

safari 6941 29.9 10.8 2098768 108788 pts/2 D+ 04:20 3:41 strace -vfttT cinfo dtfile-2091

strace D C179A000 0 6941 8737 6942 2089 (NOTLB)
e7dddd90 00000046 c172e240 c179a000 c1731880 c17c51a0 c17b5be0 c1771c20
e9b1f740 00000086 d55dd74c e7dddd6c 00000000 c02666b8 c17fd700 c2c2513c
0000fdac b2203610 0000616f c2c25030 c17fd700 e7ddddf8 c17d31f0 e7dddd9c
Call Trace:
[<c04b4568>] io_schedule+0x26/0x30
[<c014cdad>] sync_page+0x3d/0x48
[<c04b47b1>] __wait_on_bit_lock+0x45/0x67
[<c014d556>] __lock_page+0x88/0x95
[<c014e5a7>] filemap_nopage+0x1f4/0x386
[<c015b24c>] do_no_page+0x82/0x2fa
[<c015b8a3>] __handle_mm_fault+0x1fe/0x2eb
[<c0159a99>] get_user_pages+0xc7/0x2e5
[<c015bb8a>] access_process_vm+0x74/0x116
[<c010639e>] arch_ptrace+0x388/0x539
[<c012ad7f>] sys_ptrace+0x58/0xb9
[<c0102e73>] syscall_call+0x7/0xb
[<0081e410>] 0x81e410
=======================
cinfo t CB7DA040 0 6942 6941 (NOTLB)
e5fcdee8 00000046 ce7d9284 cb7da040 ce7d9284 cb7da078 cb7da040 e5fcdf10
cb7da040 13a973f5 000060e3 00000078 00000000 c2c25030 c1805700 d483013c
0021fd53 13a97513 000060e3 d4830030 e5fcdf04 00000005 00000000 e5fcdefc
Call Trace:
[<c012e32b>] ptrace_stop+0xf8/0x17f
[<c012e41c>] ptrace_notify+0x6a/0x92
[<c01066a9>] do_syscall_trace+0xd4/0x1eb
[<c0102f66>] syscall_exit_work+0x16/0x1b
[<00878410>] 0x878410
=======================


[1]
includes these
8c08540f8755c451d8b96ea14cfe796bc3cd712d [PATCH] clean up __set_page_dirty_nobuffers()
55e829af06681e5d731c03ba04febbd1c76ca293 [PATCH] io-accounting: write accounting
e08748ce01e02f0ec154b141f392ccb9555333f4 [PATCH] io-accounting: write-cancel accounting
fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions
3e67c0987d7567ad666641164a153dca9a43b11d [PATCH] truncate: clear page dirtiness before running try_to_free_buffers()
5f2a105d5e33a038a717995d2738434f9c25aed2 [PATCH] truncate: dirty memory accounting fix
8368e328dfe1c534957051333a87b3210a12743b Clean up and export cancel_dirty_page() to modules
7658cc289288b8ae7dd2c2224549a048431222b3 VM: Fix nasty and subtle race in shared mmap'ed page writeback
7c3ab7381e79dfc7db14a67c6f4f3285664e1ec2 [PATCH] io-accounting: core statistics
cb876f451455b6187a7d69de2c112c45ec4b7f99 Fix up CIFS for "test_clear_page_dirty()" removal
7dfb71030f7636a0d65200158113c37764552f93 [PATCH] Add include/linux/freezer.h and move definitions from sched.h
46d2277c796f9f4937bfa668c40b2e3f43e93dd0 Clean up and make try_to_free_buffers() not race with dirty
e61c90188b9956edae1105eef361d8981a352fcd [PATCH] optimize o_direct on block devices
5fcf7bb73f66cc1c4ad90788b0f367c4d6852b75 [PATCH] read_zero_pagealigned() locking fix
ffaa82008f1aad52a6d3979f49d2a76c2928b60f Fix reiserfs after "test_clear_page_dirty()" removal
d0e671a932cb9c653b27393cec26aec012a8d97e [PATCH] Fix JFS after clear_page_dirty() removal
9280f6822c2d7112b47107251fce307aefb31f35 [PATCH] fuse: remove clear_page_dirty() call
921320210bd2ec4f17053d283355b73048ac0e56 [PATCH] Fix XFS after clear_page_dirty() removal

[2]
dt pattern=iot incr=variable records=32768 lbs=65536 bs=65536 of=dtfile log=dtfile.log.direct.random passes=1 procs=2 iotype=random flags=direct
http://home.comcast.net/~SCSIguy/SCSI_FAQ/RMiller_Tools/ftp/dt/dt-source.tar.gz

--


2007-01-06 04:39:55

by Sami Farin

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Sat, Jan 06, 2007 at 04:39:07 +0200, Sami Farin wrote:
> Linux 2.6.19.1 SMP [2] on Pentium D...
> I was running dt-15.14 [2] and I ran
> "cinfo datafile" (it does mincore()).
> Well it went OK but when I ran "strace cinfo datafile"...:
> 04:18:48.062466 mincore(0x37f1f000, 2147266560,

Forgot to do "git-whatchanged mm/mincore.c"...
Looks like git and 2.6.19.2 review patch include a fix for mincore.
Maybe it fixes this issue.

--

2007-01-06 21:10:55

by Hugh Dickins

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Sat, 6 Jan 2007, Sami Farin wrote:

> Linux 2.6.19.1 SMP [2] on Pentium D...
> I was running dt-15.14 [2] and I ran
> "cinfo datafile" (it does mincore()).
> Well it went OK but when I ran "strace cinfo datafile"...:
> 04:18:48.062466 mincore(0x37f1f000, 2147266560,

You rightly noted in a followup that there have been changes to
mincore, but I doubt they have any bearing on this: I think the
BUG just happened at the same time as your mincore.

> ...
> 2007-01-06 04:19:03.788181500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
> 2007-01-06 04:19:03.788221500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
> 2007-01-06 04:19:03.788223500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
> 2007-01-06 04:19:03.788224500 <4> [<c0103dcb>] show_trace+0x12/0x14
> 2007-01-06 04:19:03.788225500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
> 2007-01-06 04:19:03.788227500 <4> [<c01546a6>] cancel_dirty_page+0x7e/0x80
> 2007-01-06 04:19:03.788228500 <4> [<c01546c2>] truncate_complete_page+0x1a/0x47
> 2007-01-06 04:19:03.788229500 <4> [<c0154854>] truncate_inode_pages_range+0x114/0x2ae
> 2007-01-06 04:19:03.788245500 <4> [<c0154a08>] truncate_inode_pages+0x1a/0x1c
> 2007-01-06 04:19:03.788247500 <4> [<c0269244>] fs_flushinval_pages+0x40/0x77
> 2007-01-06 04:19:03.788248500 <4> [<c026d48c>] xfs_write+0x8c4/0xb68
> 2007-01-06 04:19:03.788250500 <4> [<c0268b14>] xfs_file_aio_write+0x7e/0x95
> 2007-01-06 04:19:03.788251500 <4> [<c016d66c>] do_sync_write+0xca/0x119
> 2007-01-06 04:19:03.788265500 <4> [<c016d842>] vfs_write+0x187/0x18c
> 2007-01-06 04:19:03.788267500 <4> [<c016d8e8>] sys_write+0x3d/0x64
> 2007-01-06 04:19:03.788268500 <4> [<c0102e73>] syscall_call+0x7/0xb
> 2007-01-06 04:19:03.788269500 <4> [<001cf410>] 0x1cf410
> 2007-01-06 04:19:03.788289500 <4> =======================

So... XFS uses truncate_inode_pages when serving the write system call.
That's very inventive, and now it looks like Linus' cancel_dirty_page
and new warning have caught it out. VM people expect it to be called
either when freeing an inode no longer in use, or when doing a truncate,
after ensuring that all pages mapped into userspace have been taken out.

>
> funny that when stracing, mincore does not return?
>
> $ time cinfo dtfile-2091
> dtfile-2091: 524285 pages, 0 pages cached (0.00%)
>
> real 0m0.049s
> user 0m0.003s
> sys 0m0.046s
>
> safari 6941 29.9 10.8 2098768 108788 pts/2 D+ 04:20 3:41 strace -vfttT cinfo dtfile-2091
>
> strace D C179A000 0 6941 8737 6942 2089 (NOTLB)
> e7dddd90 00000046 c172e240 c179a000 c1731880 c17c51a0 c17b5be0 c1771c20
> e9b1f740 00000086 d55dd74c e7dddd6c 00000000 c02666b8 c17fd700 c2c2513c
> 0000fdac b2203610 0000616f c2c25030 c17fd700 e7ddddf8 c17d31f0 e7dddd9c
> Call Trace:
> [<c04b4568>] io_schedule+0x26/0x30
> [<c014cdad>] sync_page+0x3d/0x48
> [<c04b47b1>] __wait_on_bit_lock+0x45/0x67
> [<c014d556>] __lock_page+0x88/0x95
> [<c014e5a7>] filemap_nopage+0x1f4/0x386
> [<c015b24c>] do_no_page+0x82/0x2fa
> [<c015b8a3>] __handle_mm_fault+0x1fe/0x2eb
> [<c0159a99>] get_user_pages+0xc7/0x2e5
> [<c015bb8a>] access_process_vm+0x74/0x116
> [<c010639e>] arch_ptrace+0x388/0x539
> [<c012ad7f>] sys_ptrace+0x58/0xb9
> [<c0102e73>] syscall_call+0x7/0xb
> [<0081e410>] 0x81e410
> =======================
> cinfo t CB7DA040 0 6942 6941 (NOTLB)
> e5fcdee8 00000046 ce7d9284 cb7da040 ce7d9284 cb7da078 cb7da040 e5fcdf10
> cb7da040 13a973f5 000060e3 00000078 00000000 c2c25030 c1805700 d483013c
> 0021fd53 13a97513 000060e3 d4830030 e5fcdf04 00000005 00000000 e5fcdefc
> Call Trace:
> [<c012e32b>] ptrace_stop+0xf8/0x17f
> [<c012e41c>] ptrace_notify+0x6a/0x92
> [<c01066a9>] do_syscall_trace+0xd4/0x1eb
> [<c0102f66>] syscall_exit_work+0x16/0x1b
> [<00878410>] 0x878410
> =======================
>
>
> [1]
> includes these
> 8c08540f8755c451d8b96ea14cfe796bc3cd712d [PATCH] clean up __set_page_dirty_nobuffers()
> 55e829af06681e5d731c03ba04febbd1c76ca293 [PATCH] io-accounting: write accounting
> e08748ce01e02f0ec154b141f392ccb9555333f4 [PATCH] io-accounting: write-cancel accounting
> fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions
> 3e67c0987d7567ad666641164a153dca9a43b11d [PATCH] truncate: clear page dirtiness before running try_to_free_buffers()
> 5f2a105d5e33a038a717995d2738434f9c25aed2 [PATCH] truncate: dirty memory accounting fix
> 8368e328dfe1c534957051333a87b3210a12743b Clean up and export cancel_dirty_page() to modules
> 7658cc289288b8ae7dd2c2224549a048431222b3 VM: Fix nasty and subtle race in shared mmap'ed page writeback
> 7c3ab7381e79dfc7db14a67c6f4f3285664e1ec2 [PATCH] io-accounting: core statistics
> cb876f451455b6187a7d69de2c112c45ec4b7f99 Fix up CIFS for "test_clear_page_dirty()" removal
> 7dfb71030f7636a0d65200158113c37764552f93 [PATCH] Add include/linux/freezer.h and move definitions from sched.h
> 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 Clean up and make try_to_free_buffers() not race with dirty
> e61c90188b9956edae1105eef361d8981a352fcd [PATCH] optimize o_direct on block devices
> 5fcf7bb73f66cc1c4ad90788b0f367c4d6852b75 [PATCH] read_zero_pagealigned() locking fix
> ffaa82008f1aad52a6d3979f49d2a76c2928b60f Fix reiserfs after "test_clear_page_dirty()" removal
> d0e671a932cb9c653b27393cec26aec012a8d97e [PATCH] Fix JFS after clear_page_dirty() removal
> 9280f6822c2d7112b47107251fce307aefb31f35 [PATCH] fuse: remove clear_page_dirty() call
> 921320210bd2ec4f17053d283355b73048ac0e56 [PATCH] Fix XFS after clear_page_dirty() removal

Gosh. Might be better to reproduce it on 2.6.20-rc3;
but I think we have to hand this over to some XFS people.

Hugh

>
> [2]
> dt pattern=iot incr=variable records=32768 lbs=65536 bs=65536 of=dtfile log=dtfile.log.direct.random passes=1 procs=2 iotype=random flags=direct
> http://home.comcast.net/~SCSIguy/SCSI_FAQ/RMiller_Tools/ftp/dt/dt-source.tar.gz

2007-01-07 02:34:29

by Sami Farin

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Sat, Jan 06, 2007 at 21:11:07 +0000, Hugh Dickins wrote:
> On Sat, 6 Jan 2007, Sami Farin wrote:
>
> > Linux 2.6.19.1 SMP [2] on Pentium D...
> > I was running dt-15.14 [2] and I ran
> > "cinfo datafile" (it does mincore()).
> > Well it went OK but when I ran "strace cinfo datafile"...:
> > 04:18:48.062466 mincore(0x37f1f000, 2147266560,
>
> You rightly noted in a followup that there have been changes to
> mincore, but I doubt they have any bearing on this: I think the
> BUG just happened at the same time as your mincore.

BUG does not happen if I do not do "strace cinfo dtfile" with O_DIRECT
test file. It's easy to reproduce.
Without strace BUG does not happen.

Now I got it again, with also the mincore patch applied:

01:48:42.831060 mincore(0x37ff1000, 2147254272, ^[

2007-01-07 01:48:51.480531500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
2007-01-07 01:48:51.480532500 <4> [<c0103cff>] dump_trace+0x215/0x21a
2007-01-07 01:48:51.480557500 <4> [<c0103da7>] show_trace_log_lvl+0x1a/0x30
2007-01-07 01:48:51.480559500 <4> [<c0103dcf>] show_trace+0x12/0x14
2007-01-07 01:48:51.480560500 <4> [<c0103ecc>] dump_stack+0x19/0x1b
2007-01-07 01:48:51.480561500 <4> [<c0155616>] cancel_dirty_page+0x7e/0x80
2007-01-07 01:48:51.480562500 <4> [<c0155632>] truncate_complete_page+0x1a/0x47
2007-01-07 01:48:51.480563500 <4> [<c01557c4>] truncate_inode_pages_range+0x114/0x2ae
2007-01-07 01:48:51.480564500 <4> [<c0155978>] truncate_inode_pages+0x1a/0x1c
2007-01-07 01:48:51.480574500 <4> [<c026a558>] fs_flushinval_pages+0x40/0x77
2007-01-07 01:48:51.480575500 <4> [<c026e7a8>] xfs_write+0x8c4/0xb68
2007-01-07 01:48:51.480576500 <4> [<c0269e28>] xfs_file_aio_write+0x7e/0x95
2007-01-07 01:48:51.480577500 <4> [<c016e5d0>] do_sync_write+0xca/0x119
2007-01-07 01:48:51.480578500 <4> [<c016e7a6>] vfs_write+0x187/0x18c
2007-01-07 01:48:51.480579500 <4> [<c016e84c>] sys_write+0x3d/0x64
2007-01-07 01:48:51.480589500 <4> [<c0102e77>] syscall_call+0x7/0xb
2007-01-07 01:48:51.480590500 <4> [<00bed410>] 0xbed410

$ /sbin/sysctl -n vm.dirty_expire_centisecs
999

cancel_dirty_page would be more useful if it executed WARN_ON
at max once per 10s or something instead of five times out of 2^32 or 2^64
errors... I mean, user might think program/kernel started to work OK when
BUGs stop if he/she doesn't check cancel_dirty_page() function.


--- linux-2.6.19/mm/truncate.c.bak 2007-01-01 19:10:00.627310000 +0200
+++ linux-2.6.19/mm/truncate.c 2007-01-07 02:35:53.786636897 +0200
@@ -55,11 +55,16 @@ void cancel_dirty_page(struct page *page
{
/* If we're cancelling the page, it had better not be mapped any more */
if (page_mapped(page)) {
- static unsigned int warncount;
+ static int first = 1;
+ static unsigned long lastwarn;

- WARN_ON(++warncount < 5);
+ if (unlikely(first) || time_after(jiffies, (lastwarn + 10*HZ))) {
+ first = 0;
+ lastwarn = jiffies;
+ WARN_ON(42);
+ }
}
-
+
if (TestClearPageDirty(page)) {
struct address_space *mapping = page->mapping;
if (mapping && mapping_cap_account_dirty(mapping)) {


> > ...
> > 2007-01-06 04:19:03.788181500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
> > 2007-01-06 04:19:03.788221500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
> > 2007-01-06 04:19:03.788223500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
> > 2007-01-06 04:19:03.788224500 <4> [<c0103dcb>] show_trace+0x12/0x14
> > 2007-01-06 04:19:03.788225500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
> > 2007-01-06 04:19:03.788227500 <4> [<c01546a6>] cancel_dirty_page+0x7e/0x80
> > 2007-01-06 04:19:03.788228500 <4> [<c01546c2>] truncate_complete_page+0x1a/0x47
> > 2007-01-06 04:19:03.788229500 <4> [<c0154854>] truncate_inode_pages_range+0x114/0x2ae
> > 2007-01-06 04:19:03.788245500 <4> [<c0154a08>] truncate_inode_pages+0x1a/0x1c
> > 2007-01-06 04:19:03.788247500 <4> [<c0269244>] fs_flushinval_pages+0x40/0x77
> > 2007-01-06 04:19:03.788248500 <4> [<c026d48c>] xfs_write+0x8c4/0xb68
> > 2007-01-06 04:19:03.788250500 <4> [<c0268b14>] xfs_file_aio_write+0x7e/0x95
> > 2007-01-06 04:19:03.788251500 <4> [<c016d66c>] do_sync_write+0xca/0x119
> > 2007-01-06 04:19:03.788265500 <4> [<c016d842>] vfs_write+0x187/0x18c
> > 2007-01-06 04:19:03.788267500 <4> [<c016d8e8>] sys_write+0x3d/0x64
> > 2007-01-06 04:19:03.788268500 <4> [<c0102e73>] syscall_call+0x7/0xb
> > 2007-01-06 04:19:03.788269500 <4> [<001cf410>] 0x1cf410
> > 2007-01-06 04:19:03.788289500 <4> =======================
>
> So... XFS uses truncate_inode_pages when serving the write system call.
> That's very inventive, and now it looks like Linus' cancel_dirty_page
> and new warning have caught it out. VM people expect it to be called
> either when freeing an inode no longer in use, or when doing a truncate,
> after ensuring that all pages mapped into userspace have been taken out.

Maybe XFS people can tell is their code doing something unwise?

And about my earlier xfs_freeze issue... =)

...
> Gosh. Might be better to reproduce it on 2.6.20-rc3;

Well those were not all the patches =)
I have spent so many days with 2.6.18 and 2.6.19 after getting
new mobo that I rather use this 2.6.19.x for as long as it works.

But why doesn't somebody want to reproduce with latest kernel...
Just running dt and "strace cinfo dtfile*" needed...

dt pattern=iot incr=variable records=32768 bs=65536 of=dtfile log=dtfile.log.txt passes=1 procs=2 iotype=random flags=direct capacity=2G

Note: I was not able to reproduce with capacity=512M.
I have 1 GB of RAM and 4 GB of swap (HIGHMEM PAE kernel).

> but I think we have to hand this over to some XFS people.
>
> Hugh

--

2007-01-07 22:24:06

by David Chinner

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Sat, Jan 06, 2007 at 09:11:07PM +0000, Hugh Dickins wrote:
> On Sat, 6 Jan 2007, Sami Farin wrote:
>
> > Linux 2.6.19.1 SMP [2] on Pentium D...
> > I was running dt-15.14 [2] and I ran
> > "cinfo datafile" (it does mincore()).
> > Well it went OK but when I ran "strace cinfo datafile"...:
> > 04:18:48.062466 mincore(0x37f1f000, 2147266560,
>
> You rightly noted in a followup that there have been changes to
> mincore, but I doubt they have any bearing on this: I think the
> BUG just happened at the same time as your mincore.
>
> > ...
> > 2007-01-06 04:19:03.788181500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page()
> > 2007-01-06 04:19:03.788221500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
> > 2007-01-06 04:19:03.788223500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
> > 2007-01-06 04:19:03.788224500 <4> [<c0103dcb>] show_trace+0x12/0x14
> > 2007-01-06 04:19:03.788225500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
> > 2007-01-06 04:19:03.788227500 <4> [<c01546a6>] cancel_dirty_page+0x7e/0x80
> > 2007-01-06 04:19:03.788228500 <4> [<c01546c2>] truncate_complete_page+0x1a/0x47
> > 2007-01-06 04:19:03.788229500 <4> [<c0154854>] truncate_inode_pages_range+0x114/0x2ae
> > 2007-01-06 04:19:03.788245500 <4> [<c0154a08>] truncate_inode_pages+0x1a/0x1c
> > 2007-01-06 04:19:03.788247500 <4> [<c0269244>] fs_flushinval_pages+0x40/0x77
> > 2007-01-06 04:19:03.788248500 <4> [<c026d48c>] xfs_write+0x8c4/0xb68
> > 2007-01-06 04:19:03.788250500 <4> [<c0268b14>] xfs_file_aio_write+0x7e/0x95
> > 2007-01-06 04:19:03.788251500 <4> [<c016d66c>] do_sync_write+0xca/0x119
> > 2007-01-06 04:19:03.788265500 <4> [<c016d842>] vfs_write+0x187/0x18c
> > 2007-01-06 04:19:03.788267500 <4> [<c016d8e8>] sys_write+0x3d/0x64
> > 2007-01-06 04:19:03.788268500 <4> [<c0102e73>] syscall_call+0x7/0xb
> > 2007-01-06 04:19:03.788269500 <4> [<001cf410>] 0x1cf410
> > 2007-01-06 04:19:03.788289500 <4> =======================
>
> So... XFS uses truncate_inode_pages when serving the write system call.

Only when you are doing direct I/O. XFS does direct writes without
the i_mutex held, so it has to invalidate the range of cached pages
while holding it's own locks to ensure direct I/O cache semantics are
kept.

> That's very inventive,

Not really - been doing it for years.

> and now it looks like Linus' cancel_dirty_page
> and new warning have caught it out. VM people expect it to be called
> either when freeing an inode no longer in use, or when doing a truncate,
> after ensuring that all pages mapped into userspace have been taken out.

Ok, so we are punching a hole in the middle of the address space
because we are doing direct I/O on it and need to invalidate the
cache.

How are you supposed to invalidate a range of pages in a mapping for
this case, then? invalidate_mapping_pages() would appear to be the
candidate (the generic code uses this), but it _skips_ pages that
are already mapped. invalidate_mapping_pages() then advises you to
use truncate_inode_pages():

/**
* invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
* @mapping: the address_space which holds the pages to invalidate
* @start: the offset 'from' which to invalidate
* @end: the offset 'to' which to invalidate (inclusive)
*
* This function only removes the unlocked pages, if you want to
* remove all the pages of one inode, you must call truncate_inode_pages.
*
* invalidate_mapping_pages() will not block on IO activity. It will not
* invalidate pages which are dirty, locked, under writeback or mapped into
* pagetables.
*/

We want to remove all pages within the range given, so, as directed by
the comment here, we use truncate_inode_pages(). Says nothing about
mappings needing to be removed first so I guess that's where we've
been caught.....

I think we can use invalidate_inode_pages2_range(), but that doesn't
handle partial page invalidations. I think this will be ok, but it's
going to need some serious fsx testing on blocksize != page size
configs.

So, am I correct in assuming we should be calling invalidate_inode_pages2_range()
instead of truncate_inode_pages()?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-01-07 22:48:23

by Andrew Morton

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Mon, 8 Jan 2007 09:23:41 +1100
David Chinner <[email protected]> wrote:

> How are you supposed to invalidate a range of pages in a mapping for
> this case, then? invalidate_mapping_pages() would appear to be the
> candidate (the generic code uses this), but it _skips_ pages that
> are already mapped.

unmap_mapping_range()?

> So, am I correct in assuming we should be calling invalidate_inode_pages2_range()
> instead of truncate_inode_pages()?

That would be conventional.

2007-01-07 23:05:00

by David Chinner

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Sun, Jan 07, 2007 at 02:48:12PM -0800, Andrew Morton wrote:
> On Mon, 8 Jan 2007 09:23:41 +1100
> David Chinner <[email protected]> wrote:
>
> > How are you supposed to invalidate a range of pages in a mapping for
> > this case, then? invalidate_mapping_pages() would appear to be the
> > candidate (the generic code uses this), but it _skips_ pages that
> > are already mapped.
>
> unmap_mapping_range()?

/me looks at how it's used in invalidate_inode_pages2_range() and
decides it's easier not to call this directly.

> > So, am I correct in assuming we should be calling invalidate_inode_pages2_range()
> > instead of truncate_inode_pages()?
>
> That would be conventional.

.... in that case the following patch should fix the warning:

---
fs/xfs/linux-2.6/xfs_fs_subr.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_fs_subr.c 2006-12-12 12:05:17.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c 2007-01-08 09:30:22.056571711 +1100
@@ -21,6 +21,8 @@ int fs_noerr(void) { return 0; }
int fs_nosys(void) { return ENOSYS; }
void fs_noval(void) { return; }

+#define XFS_OFF_TO_PCSIZE(off) \
+ (((off) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT)
void
fs_tosspages(
bhv_desc_t *bdp,
@@ -32,7 +34,9 @@ fs_tosspages(
struct inode *ip = vn_to_inode(vp);

if (VN_CACHED(vp))
- truncate_inode_pages(ip->i_mapping, first);
+ invalidate_inode_pages2_range(ip->i_mapping,
+ XFS_OFF_TO_PCSIZE(first),
+ XFS_OFF_TO_PCSIZE(last));
}

void
@@ -49,7 +53,9 @@ fs_flushinval_pages(
if (VN_TRUNC(vp))
VUNTRUNCATE(vp);
filemap_write_and_wait(ip->i_mapping);
- truncate_inode_pages(ip->i_mapping, first);
+ invalidate_inode_pages2_range(ip->i_mapping,
+ XFS_OFF_TO_PCSIZE(first),
+ XFS_OFF_TO_PCSIZE(last));
}
}


--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-01-08 13:08:15

by Sami Farin

[permalink] [raw]
Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page()

On Mon, Jan 08, 2007 at 10:04:36 +1100, David Chinner wrote:
> On Sun, Jan 07, 2007 at 02:48:12PM -0800, Andrew Morton wrote:
> > On Mon, 8 Jan 2007 09:23:41 +1100
> > David Chinner <[email protected]> wrote:
> >
> > > How are you supposed to invalidate a range of pages in a mapping for
> > > this case, then? invalidate_mapping_pages() would appear to be the
> > > candidate (the generic code uses this), but it _skips_ pages that
> > > are already mapped.
> >
> > unmap_mapping_range()?
>
> /me looks at how it's used in invalidate_inode_pages2_range() and
> decides it's easier not to call this directly.
>
> > > So, am I correct in assuming we should be calling invalidate_inode_pages2_range()
> > > instead of truncate_inode_pages()?
> >
> > That would be conventional.
>
> .... in that case the following patch should fix the warning:

I tried dt+strace+cinfo with this patch applied and got no warnings.
Thanks for quick fix.

--
Do what you love because life is too short for anything else.