2002-11-20 16:00:23

by Mark Haverkamp

Subject: Call trace at mm/page-writeback.c in 2.5.47

While running a memory stress workload test on a 16-processor NUMA
system, I received a number of call traces like the following:

buffer layer error at mm/page-writeback.c:559
Pass this trace through ksymoops for reporting
Call Trace:
[<c013f1fb>] __set_page_dirty_buffers+0x3b/0x150
[<c012d746>] zap_pte_range+0x1d6/0x2c0
[<c0183401>] do_get_write_access+0x4a1/0x4d0
[<c012d89c>] zap_pmd_range+0x6c/0x80
[<c012d8f0>] unmap_page_range+0x40/0x60
[<c012da0f>] zap_page_range+0xff/0x180
[<c012e76a>] vmtruncate_list+0x5a/0x80
[<c012e835>] vmtruncate+0xa5/0x150
[<c015b456>] inode_setattr+0x56/0x120
[<c0179087>] ext3_setattr+0x167/0x1d0
[<c015b696>] notify_change+0x106/0x1d9
[<c01433a8>] do_truncate+0x58/0x80
[<c01213d0>] tasklet_hi_action+0x80/0xd0
[<c01210cb>] do_softirq+0x5b/0xc0
[<c0143916>] sys_ftruncate64+0x106/0x120
[<c0108d73>] syscall_call+0x7/0xb

The system did not crash and continues to run. If someone wants to look
into this and needs more information, let me know.
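The trace runs from sys_ftruncate64 through do_truncate, notify_change, ext3_setattr and vmtruncate down to zap_pte_range, which hands a dirty pte's state to __set_page_dirty_buffers. The userspace sketch below exercises that same syscall path: it dirties a MAP_SHARED mapping and then truncates the file while the mapping is still live, so the kernel has to zap dirty ptes. It is only an illustration under stated assumptions. It is not bash-shared-mapping (the workload named later in the thread), it is not claimed to reproduce the warning, and the helper name dirty_then_truncate() is hypothetical. On 32-bit x86, a large-file-aware ftruncate() typically reaches sys_ftruncate64 as in the trace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: size the file to `pages` pages, map it MAP_SHARED,
 * dirty every page through the mapping, then truncate the file to zero
 * while the dirty mapping still exists. Returns 0 on success, -1 on error. */
static int dirty_then_truncate(int fd, size_t pages)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0)
        return -1;
    size_t len = pages * (size_t)psz;

    if (ftruncate(fd, (off_t)len) != 0)      /* extend the file with zeros */
        return -1;

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    for (size_t off = 0; off < len; off += (size_t)psz)
        map[off] = 0xaa;                     /* sets the pte dirty bit */

    /* Truncating now forces the kernel to zap the dirty ptes; the
     * mapping must not be touched after this point (SIGBUS). */
    int rc = ftruncate(fd, 0);
    munmap(map, len);
    return rc;
}
```

A caller might pass fileno(tmpfile()) as fd; placing the file on an ext3 mount would match the reporter's setup more closely.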

Thanks,
Mark.

--
Mark Haverkamp <[email protected]>


2002-11-20 19:00:09

by Andrew Morton

Subject: Re: Call trace at mm/page-writeback.c in 2.5.47

Mark Haverkamp wrote:
>
> While running a memory stress workload test on a 16-processor NUMA
> system, I received a number of call traces like the following:

What is the workload? And in which journalling mode was ext3
being used?

Was the workload actually being run against ext3?

> buffer layer error at mm/page-writeback.c:559
> Pass this trace through ksymoops for reporting
> Call Trace:
> [<c013f1fb>] __set_page_dirty_buffers+0x3b/0x150
> [<c012d746>] zap_pte_range+0x1d6/0x2c0
> [<c0183401>] do_get_write_access+0x4a1/0x4d0
> [<c012d89c>] zap_pmd_range+0x6c/0x80

A non-uptodate page mapped into pagetables. I _think_ I
can see how that can happen. If the workload was, say,
bash-shared-mapping...

If it is reproducible, does the removal of the ClearPageUptodate
statement from mm/truncate.c:truncate_complete_page() make it
go away?
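The reasoning behind this suggestion can be modelled in a few lines: if a page's uptodate flag is cleared while a dirty pte still maps it, the later transfer of the pte's dirty bit reaches a page that is not uptodate, which is what the check at page-writeback.c:559 complains about. The sketch below is a hypothetical userspace model, not kernel code; the struct and all model_* names are invented for illustration, and the ordering in run_scenario() is an assumed race, not one the thread establishes. Indeed, the follow-up report shows the traces persist with ClearPageUptodate removed, so the real cause lies elsewhere; the model only shows why removing it was a sensible first experiment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page cache page; not the kernel's struct page. */
struct model_page {
    bool uptodate;   /* models PG_uptodate */
    bool dirty;      /* models PG_dirty */
    bool mapped;     /* a pte still references this page */
};

static int buffer_errors;   /* times the sanity check fired */

/* Models the check described in the thread: dirtying a page whose
 * contents are not uptodate is reported as a buffer layer error. */
static void model_set_page_dirty_buffers(struct model_page *p)
{
    if (!p->uptodate)
        buffer_errors++;
    p->dirty = true;
}

/* Models truncate_complete_page(), optionally with ClearPageUptodate. */
static void model_truncate_complete_page(struct model_page *p, bool clear_uptodate)
{
    if (clear_uptodate)
        p->uptodate = false;
}

/* Models zap_pte_range() moving a dirty pte's state onto the page. */
static void model_zap_dirty_pte(struct model_page *p)
{
    if (p->mapped) {
        model_set_page_dirty_buffers(p);
        p->mapped = false;
    }
}

/* Assumed ordering: the page is still mapped when truncate clears its
 * uptodate flag. Returns how many times the check fired. */
static int run_scenario(bool clear_uptodate)
{
    struct model_page page = { .uptodate = true, .dirty = false, .mapped = true };

    buffer_errors = 0;
    model_truncate_complete_page(&page, clear_uptodate);
    model_zap_dirty_pte(&page);
    return buffer_errors;
}
```

In this model, run_scenario(true) trips the check once and run_scenario(false) does not.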

Thanks.

2002-11-20 21:06:46

by Mark Haverkamp

Subject: Re: Call trace at mm/page-writeback.c in 2.5.47

On Wed, 2002-11-20 at 11:07, Andrew Morton wrote:
> Mark Haverkamp wrote:
> >
> > While running a memory stress workload test on a 16-processor NUMA
> > system, I received a number of call traces like the following:
>
> What is the workload? And in which journalling mode was ext3
> being used?

I am using bash-shared-mapping and the ext3 journaling mode was the
default.

> Was the workload actually being run against ext3?

Yes.

> > buffer layer error at mm/page-writeback.c:559
> > Pass this trace through ksymoops for reporting
> > Call Trace:
> > [<c013f1fb>] __set_page_dirty_buffers+0x3b/0x150
> > [<c012d746>] zap_pte_range+0x1d6/0x2c0
> > [<c0183401>] do_get_write_access+0x4a1/0x4d0
> > [<c012d89c>] zap_pmd_range+0x6c/0x80
>
> A non-uptodate page mapped into pagetables. I _think_ I
> can see how that can happen. If the workload was, say,
> bash-shared-mapping...
>
> If it is reproducible, does the removal of the ClearPageUptodate
> statement from mm/truncate.c:truncate_complete_page() make it
> go away?

I get about 10 of these each time I run, usually all at once after a few
minutes of run time. Then no more.

I tried your suggestion and still got the call traces:

buffer layer error at mm/page-writeback.c:559
Pass this trace through ksymoops for reporting
Call Trace:
[<c0142fcb>] __set_page_dirty_buffers+0x3b/0x180
[<c012fec6>] zap_pte_range+0x1d6/0x2c0
[<c0188188>] do_get_write_access+0x508/0x530
[<c013001c>] zap_pmd_range+0x6c/0x80
[<c0130070>] unmap_page_range+0x40/0x60
[<c01301a3>] zap_page_range+0x113/0x1e0
[<c013109a>] vmtruncate_list+0x5a/0x80
[<c013116f>] vmtruncate+0xaf/0x170
[<c0161189>] inode_setattr+0x59/0x130
[<c017fa5a>] ext3_setattr+0x16a/0x1e0
[<c01613d6>] notify_change+0x106/0x216
[<c0147648>] do_truncate+0x58/0x80
[<c01494cc>] vfs_write+0xbc/0x180
[<c0122d6b>] do_softirq+0x5b/0xc0
[<c0147bb6>] sys_ftruncate64+0x106/0x120
[<c0108ecf>] syscall_call+0x7/0xb

Mark.


--

Mark Haverkamp <[email protected]>

2002-11-20 21:55:11

by Andrew Morton

Subject: Re: Call trace at mm/page-writeback.c in 2.5.47

Mark Haverkamp wrote:
>
> On Wed, 2002-11-20 at 11:07, Andrew Morton wrote:
> > Mark Haverkamp wrote:
> > >
> > > While running a memory stress workload test on a 16-processor NUMA
> > > system, I received a number of call traces like the following:
> >
> > What is the workload? And in which journalling mode was ext3
> > being used?
>
> I am using bash-shared-mapping and the ext3 journaling mode was the
> default.

OK, thanks. The fact that we actually survive this, on ext3, on
a 16p NUMA-Q is fairly encouraging.

> ...
>
> I get about 10 of these each time I run, usually all at once after a few
> minutes of run time. Then no more.

The warning shuts itself up after 10 messages.
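A self-limiting warning of this kind explains why the reporter sees a burst of about ten traces and then nothing: a static counter stops further output once the limit is reached, even though the underlying condition may keep occurring. The sketch below is a hypothetical userspace illustration of that pattern; the limit of 10 comes from the message above, but the function name limited_buffer_warning() and its exact behaviour are assumptions, not the kernel's buffer_error() implementation.

```c
#include <assert.h>
#include <stdio.h>

#define WARN_LIMIT 10        /* limit taken from the message above */

static int warnings_printed; /* persists across calls, like a static in the kernel */

/* Hypothetical rate-limited warning: prints the first WARN_LIMIT reports
 * and silently drops the rest. Returns 1 if printed, 0 if suppressed. */
static int limited_buffer_warning(const char *file, int line)
{
    if (warnings_printed >= WARN_LIMIT)
        return 0;
    warnings_printed++;
    fprintf(stderr, "buffer layer error at %s:%d\n", file, line);
    return 1;
}
```

Calling it 25 times prints only the first ten messages, so a quiet log after the burst does not mean the condition stopped.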

> I tried your suggestion and still got the call traces:
>
> buffer layer error at mm/page-writeback.c:559

I shall attempt to reproduce this, thanks.