On Fri, 2004-02-06 at 00:51, Marcelo Tosatti wrote:
> On Tue, 3 Feb 2004, Stian Jordet wrote:
>
> > Hello,
> >
> > I have a server which was running 2.4.18 and 2.4.19 for almost 200 days
> > each, without problems. After an upgrade to 2.4.22, the box hasn't been
> > up for 30 days in a row. This started in early November. I have captured
> > oopses with both 2.4.23 and 2.6.1, which I have sent decoded to the list,
> > but have never gotten any reply.
> >
> > I have run memtest86 on the box, no errors. What else can be the
> > problem? I could of course go back to 2.4.19, which I know worked fine,
> > but there have been some security holes fixed since then...
> >
> > Any thoughts?
>
> Stian,
>
> I have seen your 2.4.x oopses and they seemed odd. The faults were
> happening in different functions (mostly inside VM "freeing"), due to
> what seems to be random crap in memory:
>
> <1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
> c0132e86
> *pde = 00000000
>
> eax: 00000000 ebx: 00000009 ecx: 000001d2 edx: 00000012
> esi: 00000000 edi: c17e38c0 ebp: c1047a00 esp: c86cbdb4
>
> >>EIP; c0132e86 <sync_page_buffers+e/a4> <=====
>
> >>edi; c17e38c0 <_end+14b5844/bd23f84>
> >>ebp; c1047a00 <_end+d19984/bd23f84>
> >>esp; c86cbdb4 <_end+839dd38/bd23f84>
>
> Trace; c0132fdc <try_to_free_buffers+c0/ec>
>
> Code; c0132e86 <sync_page_buffers+e/a4>
> 00000000 <_EIP>:
> Code; c0132e86 <sync_page_buffers+e/a4> <=====
> 0: f6 43 18 06 testb $0x6,0x18(%ebx) <=====
> Code; c0132e8a <sync_page_buffers+12/a4>
> 4: 74 7c je 82 <_EIP+0x82> c0132f08
> <sync_page_buffers+90/a4>
> Code; c0132e8c <sync_page_buffers+14/a4>
> 6: b8 07 00 00 00 mov $0x7,%eax
> Code; c0132e91 <sync_page_buffers+19/a4>
>
>
>
>
> <1>Unable to handle kernel NULL pointer dereference at virtual address
> 00000028
> c015e3a2
> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c015e3a2>] Not tainted
> EFLAGS: 00010203
>
> eax: 0100004d ebx: 00000000 ecx: 000001d2 edx: 00000000
>
> Code; c015e3a2 <journal_try_to_free_buffers+5a/98>
> 00000000 <_EIP>:
> Code; c015e3a2 <journal_try_to_free_buffers+5a/98> <=====
> 0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
> Code; c015e3a5 <journal_try_to_free_buffers+5d/98>
> 3: f6 42 19 04 testb $0x4,0x19(%edx)
> Code; c015e3a9 <journal_try_to_free_buffers+61/98>
> 7: 74 17 je 20 <_EIP+0x20> c015e3c2
> <journal_try_to_free_buffers+7a/98>
>
> And other similar oopses.
>
> Are you sure there is nothing messing up the hardware?
>
> How long did you run memtest86? It can sometimes take a long time for
> errors to show up.
>
> The 2.6.x oopses on the same hardware are also a useful source of
> information.
Marcelo,
sorry for getting back to you so insanely late. This was a production
server, and I have now moved the services it was running to another
box, so I could run a more exhaustive memtest86. It has now run for two
days without any errors. Of course there could be other flaky hardware,
but since I don't know any way to test it, and the oops occurs at
two-to-four-week intervals, it's quite time-consuming to find out. I'm not
even sure if it will oops without the typical load it used to have.
Anyway, thank you very much for at least answering me. Much appreciated
:)
Best regards,
Stian
Btw, here is one of the 2.6.x oopses as well (as you requested).
Best regards,
Stian
On Tue, 2 Mar 2004, Stian Jordet wrote:
> Btw, here is one of the 2.6.x oopses as well (as you requested).
Stian,
This sounds like bad hardware. Did I already ask you to try memtest86?
Quoting Marcelo Tosatti <[email protected]>:
>
>
> On Tue, 2 Mar 2004, Stian Jordet wrote:
>
> > Btw, here is one of the 2.6.x oopses as well (as you requested).
>
> Stian,
>
> This sounds like bad hardware. Did I already ask you to try memtest86?
>
Yup, and I think I wrote that I had it running for almost two days with no
errors. Oh well. Thanks for looking into this :) I guess I'll try to afford a
new server (very) soon.
Best regards,
Stian
Marcelo and Stian,
I have also seen this oops related to low-memory situations. I think
ext3 allocates some data, gets a NULL return, sets something to NULL, and
then later it is dereferenced in kswapd.
Anyone have a patch for this problem?
Thanks
-steve
On Tue, 2004-03-02 at 04:03, Stian Jordet wrote:
> On Fri, 2004-02-06 at 00:51, Marcelo Tosatti wrote:
> > On Tue, 3 Feb 2004, Stian Jordet wrote:
> >
> > > Hello,
> > >
> > > I have a server which was running 2.4.18 and 2.4.19 for almost 200 days
> > > each, without problems. After an upgrade to 2.4.22, the box hasn't been
> > > up for 30 days in a row. This started in early November. I have captured
> > > oopses with both 2.4.23 and 2.6.1, which I have sent decoded to the list,
> > > but have never gotten any reply.
> > >
> > > I have run memtest86 on the box, no errors. What else can be the
> > > problem? I could of course go back to 2.4.19, which I know worked fine,
> > > but there have been some security holes fixed since then...
> > >
> > > Any thoughts?
> >
> > Stian,
> >
> > I have seen your 2.4.x oopses and they seemed odd. The faults were
> > happening in different functions (mostly inside VM "freeing"), due to
> > what seems to be random crap in memory:
> >
> > <1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
> > c0132e86
> > *pde = 00000000
> >
> > eax: 00000000 ebx: 00000009 ecx: 000001d2 edx: 00000012
> > esi: 00000000 edi: c17e38c0 ebp: c1047a00 esp: c86cbdb4
> >
> > >>EIP; c0132e86 <sync_page_buffers+e/a4> <=====
> >
> > >>edi; c17e38c0 <_end+14b5844/bd23f84>
> > >>ebp; c1047a00 <_end+d19984/bd23f84>
> > >>esp; c86cbdb4 <_end+839dd38/bd23f84>
> >
> > Trace; c0132fdc <try_to_free_buffers+c0/ec>
> >
> > Code; c0132e86 <sync_page_buffers+e/a4>
> > 00000000 <_EIP>:
> > Code; c0132e86 <sync_page_buffers+e/a4> <=====
> > 0: f6 43 18 06 testb $0x6,0x18(%ebx) <=====
> > Code; c0132e8a <sync_page_buffers+12/a4>
> > 4: 74 7c je 82 <_EIP+0x82> c0132f08
> > <sync_page_buffers+90/a4>
> > Code; c0132e8c <sync_page_buffers+14/a4>
> > 6: b8 07 00 00 00 mov $0x7,%eax
> > Code; c0132e91 <sync_page_buffers+19/a4>
> >
> >
> >
> >
> > <1>Unable to handle kernel NULL pointer dereference at virtual address
> > 00000028
> > c015e3a2
> > *pde = 00000000
> > Oops: 0000
> > CPU: 0
> > EIP: 0010:[<c015e3a2>] Not tainted
> > EFLAGS: 00010203
> >
> > eax: 0100004d ebx: 00000000 ecx: 000001d2 edx: 00000000
> >
> > Code; c015e3a2 <journal_try_to_free_buffers+5a/98>
> > 00000000 <_EIP>:
> > Code; c015e3a2 <journal_try_to_free_buffers+5a/98> <=====
> > 0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
> > Code; c015e3a5 <journal_try_to_free_buffers+5d/98>
> > 3: f6 42 19 04 testb $0x4,0x19(%edx)
> > Code; c015e3a9 <journal_try_to_free_buffers+61/98>
> > 7: 74 17 je 20 <_EIP+0x20> c015e3c2
> > <journal_try_to_free_buffers+7a/98>
> >
> > And other similar oopses.
> >
> > Are you sure there is nothing messing up the hardware?
> >
> > How long did you run memtest86? It can sometimes take a long time for
> > errors to show up.
> >
> > The 2.6.x oopses on the same hardware are also a useful source of
> > information.
>
> Marcelo,
>
> sorry for getting back to you so insanely late. This was a production
> server, and I have now moved the services it was running to another
> box, so I could run a more exhaustive memtest86. It has now run for two
> days without any errors. Of course there could be other flaky hardware,
> but since I don't know any way to test it, and the oops occurs at
> two-to-four-week intervals, it's quite time-consuming to find out. I'm not
> even sure if it will oops without the typical load it used to have.
>
> Anyway, thank you very much for at least answering me. Much appreciated
> :)
>
> Best regards,
> Stian
>
Marcelo, Stian, Steve,
I also have seen this, more than a dozen times -- always related
to writes to my ext3 partition concurrent with heavy swapping
due to memory pressure. Although my oopses are usually not NULL
dereferences, but simply bad addresses. In the two or three cases
where I've actually traced the oops back to the code, (search
the April archives for my name) it's been a corrupted pointer,
always in the high part of the word, but not always the same
part. For weeks, I thought that it was flakey RAM, but since
days of memtest86 didn't fail I went back to trying to strip
more stuff out of the kernel. Since the last cut, I haven't had
a single oops, for approx. 3 weeks of uptime. Previous pattern
was 1 to 3 days between oopses.
I didn't really learn much that would help you. However, if you
are trying to reproduce the failure more rapidly, my experience
suggests that it is necessary to run a memory hog
concurrent with an I/O-heavy process (e.g., a kernel compile). Either one
alone may run for days with no failure.
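For concreteness, here is a minimal sketch of the sort of memory hog I mean
(my own illustration, not code from any of these reports; HOG_MB is a made-up
knob that you would size above physical RAM). Run it alongside a kernel
compile and kill it with Ctrl-C when done:

/* Minimal memory-hog sketch: allocate more than physical RAM and keep
 * touching every page, so the VM is forced into its freeing and
 * swapping paths while the concurrent I/O load runs. */
#include <stdio.h>
#include <stdlib.h>

#define HOG_MB 1024UL	/* assumption: pick something larger than RAM */

int main(void)
{
	size_t bytes = HOG_MB * 1024UL * 1024UL;
	char *buf = malloc(bytes);
	size_t off;

	if (!buf) {
		perror("malloc");
		return 1;
	}
	for (;;) {
		/* touch one byte per page, over and over */
		for (off = 0; off < bytes; off += 4096)
			buf[off]++;
	}
	return 0;
}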
Let me know if I can help.
-Chris
On Mon, Jun 14, 2004 at 10:07:05AM -0700, Steven Dake wrote:
> Marcelo and Stian,
>
> I have also seen this oops related to low-memory situations. I think
> ext3 allocates some data, gets a NULL return, sets something to NULL, and
> then later it is dereferenced in kswapd.
>
> Anyone have a patch for this problem?
>
> Thanks
> -steve
>
> On Tue, 2004-03-02 at 04:03, Stian Jordet wrote:
> > On Fri, 2004-02-06 at 00:51, Marcelo Tosatti wrote:
> > > On Tue, 3 Feb 2004, Stian Jordet wrote:
> > >
> > > > Hello,
> > > >
> > > > I have a server which was running 2.4.18 and 2.4.19 for almost 200 days
> > > > each, without problems. After an upgrade to 2.4.22, the box hasn't been
> > > > up for 30 days in a row. This started in early November. I have captured
> > > > oopses with both 2.4.23 and 2.6.1, which I have sent decoded to the list,
> > > > but have never gotten any reply.
> > > >
> > > > I have run memtest86 on the box, no errors. What else can be the
> > > > problem? I could of course go back to 2.4.19, which I know worked fine,
> > > > but there have been some security holes fixed since then...
> > > >
> > > > Any thoughts?
> > >
> > > Stian,
> > >
> > > I have seen your 2.4.x oopses and they seemed odd. The faults were
> > > happening in different functions (mostly inside VM "freeing"), due to
> > > what seems to be random crap in memory:
> > >
> > > <1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
> > > c0132e86
> > > *pde = 00000000
> > >
> > > eax: 00000000 ebx: 00000009 ecx: 000001d2 edx: 00000012
> > > esi: 00000000 edi: c17e38c0 ebp: c1047a00 esp: c86cbdb4
> > >
> > > >>EIP; c0132e86 <sync_page_buffers+e/a4> <=====
> > >
> > > >>edi; c17e38c0 <_end+14b5844/bd23f84>
> > > >>ebp; c1047a00 <_end+d19984/bd23f84>
> > > >>esp; c86cbdb4 <_end+839dd38/bd23f84>
> > >
> > > Trace; c0132fdc <try_to_free_buffers+c0/ec>
> > >
> > > Code; c0132e86 <sync_page_buffers+e/a4>
> > > 00000000 <_EIP>:
> > > Code; c0132e86 <sync_page_buffers+e/a4> <=====
> > > 0: f6 43 18 06 testb $0x6,0x18(%ebx) <=====
> > > Code; c0132e8a <sync_page_buffers+12/a4>
> > > 4: 74 7c je 82 <_EIP+0x82> c0132f08
> > > <sync_page_buffers+90/a4>
> > > Code; c0132e8c <sync_page_buffers+14/a4>
> > > 6: b8 07 00 00 00 mov $0x7,%eax
> > > Code; c0132e91 <sync_page_buffers+19/a4>
> > >
> > >
> > >
> > >
> > > <1>Unable to handle kernel NULL pointer dereference at virtual address
> > > 00000028
> > > c015e3a2
> > > *pde = 00000000
> > > Oops: 0000
> > > CPU: 0
> > > EIP: 0010:[<c015e3a2>] Not tainted
> > > EFLAGS: 00010203
> > >
> > > eax: 0100004d ebx: 00000000 ecx: 000001d2 edx: 00000000
> > >
> > > Code; c015e3a2 <journal_try_to_free_buffers+5a/98>
> > > 00000000 <_EIP>:
> > > Code; c015e3a2 <journal_try_to_free_buffers+5a/98> <=====
> > > 0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
> > > Code; c015e3a5 <journal_try_to_free_buffers+5d/98>
> > > 3: f6 42 19 04 testb $0x4,0x19(%edx)
> > > Code; c015e3a9 <journal_try_to_free_buffers+61/98>
> > > 7: 74 17 je 20 <_EIP+0x20> c015e3c2
> > > <journal_try_to_free_buffers+7a/98>
> > >
> > > And other similar oopses.
> > >
> > > Are you sure there is nothing messing up the hardware?
> > >
> > > How long did you run memtest86? It can sometimes take a long time for
> > > errors to show up.
> > >
> > > The 2.6.x oopses on the same hardware are also a useful source of
> > > information.
> >
> > Marcelo,
> >
> > sorry for getting back to you so insanely late. This was a production
> > server, and I have now moved the services it was running to another
> > box, so I could run a more exhaustive memtest86. It has now run for two
> > days without any errors. Of course there could be other flaky hardware,
> > but since I don't know any way to test it, and the oops occurs at
> > two-to-four-week intervals, it's quite time-consuming to find out. I'm not
> > even sure if it will oops without the typical load it used to have.
> >
> > Anyway, thank you very much for at least answering me. Much appreciated
> > :)
> >
> > Best regards,
> > Stian
> >
On Mon, Jun 14, 2004 at 10:07:05AM -0700, Steven Dake wrote:
> Marcelo and Stian,
>
> I have also seen this oops related to low-memory situations. I think
> ext3 allocates some data, gets a NULL return, sets something to NULL, and
> then later it is dereferenced in kswapd.
>
> Anyone have a patch for this problem?
Steven,
From what I remember, Stian's oopses were happening in random places in the VM freeing
routines. That makes me believe what he was seeing was some kind of hardware issue,
because otherwise the oopses would be happening in the same place (if it were
a software bug). The codepaths which he saw trying to access invalid addresses are
executed flawlessly by all 2.4.x mainline users. He was also seeing oopses with v2.6.
Assuming his HW is not faulty, I can think of some driver corrupting his memory.
Do you have any traces of the oopses you are seeing?
Stian, you told us you switched servers now; I assume the problem is gone?
Are you still running v2.4 on that server?
Thanks!
On Tue, 2004-06-15 at 10:16 -0300, Marcelo Tosatti wrote:
> Stian, you told us you switched servers now; I assume the problem is gone?
> Are you still running v2.4 on that server?
I switched servers, and the oops is gone. The old server is still
running, but without the nightly memory-intensive perl script, and I have
never seen an oops since I stopped running that. Both servers are
running 2.6 now.
Best regards,
Stian
On Tue, 2004-06-15 at 06:16, Marcelo Tosatti wrote:
> On Mon, Jun 14, 2004 at 10:07:05AM -0700, Steven Dake wrote:
> > Marcelo and Stian,
> >
> > > I have also seen this oops related to low-memory situations. I think
> > > ext3 allocates some data, gets a NULL return, sets something to NULL, and
> > > then later it is dereferenced in kswapd.
> >
> > Anyone have a patch for this problem?
> Steven,
>
> From what I remember, Stian's oopses were happening in random places in the VM freeing
> routines. That makes me believe what he was seeing was some kind of hardware issue,
> because otherwise the oopses would be happening in the same place (if it were
> a software bug). The codepaths which he saw trying to access invalid addresses are
> executed flawlessly by all 2.4.x mainline users. He was also seeing oopses with v2.6.
>
> Assuming his HW is not faulty, I can think of some driver corrupting his memory.
>
> Do you have any traces of the oopses you are seeing?
>
> Stian, you told us you switched servers now; I assume the problem is gone?
> Are you still running v2.4 on that server?
>
Marcelo,
Stian responded saying he upgraded to 2.6 and also removed the
memory-intensive script, and his problems went away. I suspect removing
the memory-intensive script did the trick. 2.6 could also have fixed it,
who knows?
After reading lkml, I found about 4-5 oopses in the same function at the
same location. In some of the bug reports, people report they were using
memory heavily (a low-memory situation).
I have tracked down the problem to a NULL dereference during a buffer
cache rebalance (which occurs during low-memory situations). Here is
the info I have. Unfortunately I don't know much about how the VM handles
low-memory situations with the VFS, so if you have any ideas, it would
be helpful. :)
Unable to handle kernel NULL pointer dereference at virtual address
00000028
<4> printing eip:
<4>c018aa67
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU: 2
<4>EIP: 0010:[<c018aa67>] Not tainted
<4>EFLAGS: 00010203
<4>eax: 00000000 ebx: 00000000 ecx: c0380490 edx: c217a540
<4>esi: f2757a80 edi: 00000001 ebp: 00000000 esp: f7bd1ee8
<4>ds: 0018 es: 0018 ss: 0018
<4>Process kswapd (pid: 11, stackpage=f7bd1000)
<4>Stack: 00000000 00000000 c217a600 00000000 00000202 00000000 c217a540 000001d0
<4> c217a540 000102ec c018144f f7abe400 c217a540 000001d0 c0155006 c217a540
<4> 000001d0 f7bd0000 00000000 c0147f0a c217a540 000001d0 f7bd0000 f7bd0000
<4>Call Trace: [<c018144f>] [<c0155006>] [<c0147f0a>]
[<c014832b>] [<c0148398>]
<4> [<c0148431>] [<c014847f>] [<c0148593>] [<c0105000>]
[<c010578a>] [<c01484f8>]
<4>
<4>Code: 8b 5b 28 f6 40 19 02 75 47 39 f3 75 f1 c6 05 80 24 38 c0 01
>>EIP; c018aa67 <journal_try_to_free_buffers+45/f4> <=====
Trace; c018144f <ext3_releasepage+2d/32>
Trace; c0155006 <try_to_release_page+4e/78>
Trace; c0147f0a <shrink_cache+270/4f6>
Trace; c014832b <shrink_caches+61/98>
Trace; c0148398 <try_to_free_pages+36/54>
Trace; c0148431 <kswapd_balance_pgdat+51/8a>
Trace; c014847f <kswapd_balance+15/2c>
Trace; c0148593 <kswapd+9b/b6>
Trace; c0105000 <_stext+0/0>
Trace; c010578a <kernel_thread+2e/40>
Trace; c01484f8 <kswapd+0/b6>
The problem is that in this function:
int journal_try_to_free_buffers(journal_t *journal,
				struct page *page, int gfp_mask)
{
	struct buffer_head *bh;
	struct buffer_head *tmp;
	int locked_or_dirty = 0;
	int call_ttfb = 1;

	J_ASSERT(PageLocked(page));

	bh = page->buffers;
	tmp = bh;
	spin_lock(&journal_datalist_lock);
	do {
		struct buffer_head *p = tmp;

		tmp = tmp->b_this_page;
		...
tmp (page->buffers) above is NULL. b_this_page is at offset 0x28 (the
accessed address in the oops). This means that page->buffers is set to
NULL by some other routine, which results in the oops.
I read the page allocation code
(ext3_readpage->block_read_full_page->create_empty_buffers->create_buffers),
and it appears that it is not possible for the allocation path to leave
page->buffers at zero. I am having difficulty reproducing this and cannot
debug further, however. Can page->buffers be set to zero somewhere else?
Perhaps kswapd and some other thread are racing on the free?
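To make the offset arithmetic concrete, here is a tiny user-space sketch (my
own illustration; fake_buffer_head is a padded stand-in, not the real 2.4
struct buffer_head layout):

/* Why a fault at virtual address 0x28 points at a NULL struct pointer:
 * the faulting address is just the field offset added to a NULL base.
 * The padding is an assumption chosen so the field lands at 0x28,
 * matching the decoded oops. */
#include <stdio.h>
#include <stddef.h>

struct fake_buffer_head {
	char pad[0x28];				/* assumption: filler only */
	struct fake_buffer_head *b_this_page;	/* lands at offset 0x28 */
};

int main(void)
{
	printf("offsetof(b_this_page) = 0x%lx\n",
	       (unsigned long) offsetof(struct fake_buffer_head, b_this_page));
	/* So "tmp = tmp->b_this_page" with tmp == NULL loads from
	 * address 0x28, matching "virtual address 00000028" above. */
	return 0;
}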
Thanks
-steve
> Thanks!
On Tue, Jun 15, 2004 at 10:56:38AM -0700, Steven Dake wrote:
> On Tue, 2004-06-15 at 06:16, Marcelo Tosatti wrote:
> > On Mon, Jun 14, 2004 at 10:07:05AM -0700, Steven Dake wrote:
> > > Marcelo and Stian,
> > >
> > > I have also seen this oops related to low-memory situations. I think
> > > ext3 allocates some data, gets a NULL return, sets something to NULL, and
> > > then later it is dereferenced in kswapd.
> > >
> > > Anyone have a patch for this problem?
>
> > Steven,
> >
> > From what I remember, Stian's oopses were happening in random places in the VM freeing
> > routines. That makes me believe what he was seeing was some kind of hardware issue,
> > because otherwise the oopses would be happening in the same place (if it were
> > a software bug). The codepaths which he saw trying to access invalid addresses are
> > executed flawlessly by all 2.4.x mainline users. He was also seeing oopses with v2.6.
> >
> > Assuming his HW is not faulty, I can think of some driver corrupting his memory.
> >
> > Do you have any traces of the oopses you are seeing?
> >
> > Stian, you told us you switched servers now; I assume the problem is gone?
> > Are you still running v2.4 on that server?
> >
>
> Marcelo,
>
> Stian responded saying he upgraded to 2.6 and also removed the
> memory-intensive script, and his problems went away. I suspect removing
> the memory-intensive script did the trick. 2.6 could also have fixed it,
> who knows?
>
> After reading lkml, I found about 4-5 oopses in the same function at the
> same location. In some of the bug reports, people report they were using
> memory heavily (a low-memory situation).
>
> I have tracked down the problem to a NULL dereference during a buffer
> cache rebalance (which occurs during low-memory situations). Here is
> the info I have. Unfortunately I don't know much about how the VM handles
> low-memory situations with the VFS, so if you have any ideas, it would
> be helpful. :)
>
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000028
> <4> printing eip:
> <4>c018aa67
> <1>*pde = 00000000
> <4>Oops: 0000
> <4>CPU: 2
> <4>EIP: 0010:[<c018aa67>] Not tainted
> <4>EFLAGS: 00010203
> <4>eax: 00000000 ebx: 00000000 ecx: c0380490 edx: c217a540
> <4>esi: f2757a80 edi: 00000001 ebp: 00000000 esp: f7bd1ee8
> <4>ds: 0018 es: 0018 ss: 0018
> <4>Process kswapd (pid: 11, stackpage=f7bd1000)
> <4>Stack: 00000000 00000000 c217a600 00000000 00000202 00000000 c217a540 000001d0
> <4> c217a540 000102ec c018144f f7abe400 c217a540 000001d0 c0155006 c217a540
> <4> 000001d0 f7bd0000 00000000 c0147f0a c217a540 000001d0 f7bd0000 f7bd0000
> <4>Call Trace: [<c018144f>] [<c0155006>] [<c0147f0a>]
> [<c014832b>] [<c0148398>]
> <4> [<c0148431>] [<c014847f>] [<c0148593>] [<c0105000>]
> [<c010578a>] [<c01484f8>]
> <4>
> <4>Code: 8b 5b 28 f6 40 19 02 75 47 39 f3 75 f1 c6 05 80 24 38 c0 01
>
>
> >>EIP; c018aa67 <journal_try_to_free_buffers+45/f4> <=====
> Trace; c018144f <ext3_releasepage+2d/32>
> Trace; c0155006 <try_to_release_page+4e/78>
> Trace; c0147f0a <shrink_cache+270/4f6>
> Trace; c014832b <shrink_caches+61/98>
> Trace; c0148398 <try_to_free_pages+36/54>
> Trace; c0148431 <kswapd_balance_pgdat+51/8a>
> Trace; c014847f <kswapd_balance+15/2c>
> Trace; c0148593 <kswapd+9b/b6>
> Trace; c0105000 <_stext+0/0>
> Trace; c010578a <kernel_thread+2e/40>
> Trace; c01484f8 <kswapd+0/b6>
>
> The problem is that in this function:
>
> int journal_try_to_free_buffers(journal_t *journal,
> 				struct page *page, int gfp_mask)
> {
> 	struct buffer_head *bh;
> 	struct buffer_head *tmp;
> 	int locked_or_dirty = 0;
> 	int call_ttfb = 1;
>
> 	J_ASSERT(PageLocked(page));
>
> 	bh = page->buffers;
> 	tmp = bh;
> 	spin_lock(&journal_datalist_lock);
> 	do {
> 		struct buffer_head *p = tmp;
>
> 		tmp = tmp->b_this_page;
> 		...
>
> tmp (page->buffers) above is NULL. b_this_page is at offset 0x28 (the
> accessed address in the oops). This means that page->buffers is set to
> NULL by some other routine, which results in the oops.
>
> I read the page allocation code
> (ext3_readpage->block_read_full_page->create_empty_buffers->create_buffers),
> and it appears that it is not possible for the allocation path to leave
> page->buffers at zero. I am having difficulty reproducing this and cannot
> debug further, however. Can page->buffers be set to zero somewhere else?
> Perhaps kswapd and some other thread are racing on the free?
Steve,
Hum, I'm starting to believe we might have an issue here.
Searching lkml archives I find other similar oopses at the same place
(trying to access 00000028, tmp->b_this_page), as you said.
However, I wonder what other kernel codepath could remove the page buffers
under us; the page MUST be locked here. In the backtrace above, the page
is locked by shrink_cache(). And with the page locked, we guarantee the VM
freeing routines (shrink_cache) won't try to mess with the page.
Can you reproduce the oopsen?
Stephen, Andrew, do you have any idea how the buffers could have vanished
under us with the page locked? That should not be possible.
I don't see how this "page->buffers = NULL" could be caused by a hardware problem,
which is usually a one- or two-bit flip.
Thanks everyone
Marcelo Tosatti <[email protected]> wrote:
>
> > tmp (page->buffers) above is NULL. b_this_page is at offset 0x28 (the
> > accessed address in the oops). This means that page->buffers is set to
> > NULL by some other routine, which results in the oops.
> >
> > I read the page allocation code
> > (ext3_readpage->block_read_full_page->create_empty_buffers->create_buffers),
> > and it appears that it is not possible for the allocation path to leave
> > page->buffers at zero. I am having difficulty reproducing this and cannot
> > debug further, however. Can page->buffers be set to zero somewhere else?
> > Perhaps kswapd and some other thread are racing on the free?
>
> Steve,
>
> Hum, I'm starting to believe we might have an issue here.
>
> Searching lkml archives I find other similar oopses at the same place
> (trying to access 00000028, tmp->b_this_page), as you said.
>
> However, I wonder what other kernel codepath could remove the page buffers
> under us; the page MUST be locked here. In the backtrace above, the page
> is locked by shrink_cache(). And with the page locked, we guarantee the VM
> freeing routines (shrink_cache) won't try to mess with the page.
>
> Can you reproduce the oopsen?
>
> Stephen, Andrew, do you have any idea how the buffers could have vanished
> under us with the page locked? That should not be possible.
>
> I don't see how this "page->buffers = NULL" could be caused by a hardware problem,
> which is usually a one- or two-bit flip.
It's a bit odd. The page is definitely locked, and definitely had non-null
->buffers a few tens of instructions beforehand.
Is this an SMP machine?
One possibility is that we died on the second pass around the loop:
page->buffers points at a buffer_head which has a NULL ->b_this_page. But
I cannot suggest how ->b_this_page could have been zapped.
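A small user-space sketch of the two cases (simplified stand-in structs, not
the jbd code itself) shows why both die on the very same load:

/* Both corruption cases end up dereferencing NULL at the same field:
 *   case A: page->buffers itself is NULL        -> dies on the first pass
 *   case B: the first bh has a NULL b_this_page -> dies on the second pass
 * The structs are simplified stand-ins, not the real 2.4 ones. */
#include <stdio.h>

struct bh { struct bh *b_this_page; };
struct pg { struct bh *buffers; };

static void walk(const char *name, struct pg *page)
{
	struct bh *bh = page->buffers;
	struct bh *tmp = bh;
	int pass = 0;

	do {
		pass++;
		if (!tmp) {
			printf("%s: would oops on pass %d (NULL->b_this_page)\n",
			       name, pass);
			return;
		}
		tmp = tmp->b_this_page;	/* the load that faults in the oops */
	} while (tmp != bh);
	printf("%s: ring walked cleanly\n", name);
}

int main(void)
{
	struct bh broken = { NULL };	/* b_this_page clobbered to NULL */
	struct pg case_a = { NULL };	/* page->buffers clobbered to NULL */
	struct pg case_b = { &broken };

	walk("case A", &case_a);
	walk("case B", &case_b);
	return 0;
}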
On Thu, Jun 17, 2004 at 08:08:59PM -0700, Andrew Morton wrote:
> Marcelo Tosatti <[email protected]> wrote:
> >
> > > tmp (page->buffers) above is NULL. b_this_page is at offset 0x28 (the
> > > accessed address in the oops). This means that page->buffers is set to
> > > NULL by some other routine, which results in the oops.
> > >
> > > I read the page allocation code
> > > (ext3_readpage->block_read_full_page->create_empty_buffers->create_buffers),
> > > and it appears that it is not possible for the allocation path to leave
> > > page->buffers at zero. I am having difficulty reproducing this and cannot
> > > debug further, however. Can page->buffers be set to zero somewhere else?
> > > Perhaps kswapd and some other thread are racing on the free?
> >
> > Steve,
> >
> > Hum, I'm starting to believe we might have an issue here.
> >
> > Searching lkml archives I find other similar oopses at the same place
> > (trying to access 00000028, tmp->b_this_page), as you said.
> >
> > However, I wonder what other kernel codepath could remove the page buffers
> > under us; the page MUST be locked here. In the backtrace above, the page
> > is locked by shrink_cache(). And with the page locked, we guarantee the VM
> > freeing routines (shrink_cache) won't try to mess with the page.
> >
> > Can you reproduce the oopsen?
> >
> > Stephen, Andrew, do you have any idea how the buffers could have vanished
> > under us with the page locked? That should not be possible.
> >
> > I don't see how this "page->buffers = NULL" could be caused by a hardware problem,
> > which is usually a one- or two-bit flip.
>
> It's a bit odd. The page is definitely locked, and definitely had non-null
> ->buffers a few tens of instructions beforehand.
>
> Is this an SMP machine?
Steven, did you see the NULL ->b_this_page on SMP or UP?
Stian Jordet had an SMP server, but he also was seeing oopses with v2.6:
kernel BUG at mm/page_alloc.c:201!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[free_pages_bulk+482/512] Not tainted
EIP is at free_pages_bulk+0x1e2/0x200
eax: 00000001 ebx: c00609c8 ecx: 00000000 edx: 666026a5
esi: 666026a4 edi: ffffffff ebp: 33301352 esp: c86d5d90
Process mrtg (pid: 26804, threadinfo=c86d4000 task=c9b860c0)
Call Trace:
[free_hot_cold_page+217/240] free_hot_cold_page+0xd9/0xf0
[do_generic_mapping_read+714/1008] do_generic_mapping_read+0x2ca/0x3f0
[file_read_actor+0/256] file_read_actor+0x0/0x100
[__generic_file_aio_read+454/512] __generic_file_aio_read+0x1c6/0x200
[file_read_actor+0/256] file_read_actor+0x0/0x100
[generic_file_aio_read+91/128] generic_file_aio_read+0x5b/0x80
[do_sync_read+137/192] do_sync_read+0x89/0xc0
[do_page_fault+300/1328] do_page_fault+0x12c/0x530
[do_brk+324/560] do_brk+0x144/0x230
[vfs_read+184/304] vfs_read+0xb8/0x130
[sys_read+66/112] sys_read+0x42/0x70
[syscall_call+7/11] syscall_call+0x7/0xb
and different oopses on v2.4, including sync_page_buffers (also NULL+offset access):
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
c0132e86
eax: 00000000 ebx: 00000009 ecx: 000001d2 edx: 00000012
esi: 00000000 edi: c17e38c0 ebp: c1047a00 esp: c86cbdb4
>>EIP; c0132e86 <sync_page_buffers+e/a4> <=====
Trace; c0132fdc <try_to_free_buffers+c0/ec>
Code; c0132e86 <sync_page_buffers+e/a4>
00000000 <_EIP>:
Code; c0132e86 <sync_page_buffers+e/a4> <=====
0: f6 43 18 06 testb $0x6,0x18(%ebx) <=====
Code; c0132e8a <sync_page_buffers+12/a4>
and the journal_try_to_free_buffers() one:
Unable to handle kernel NULL pointer dereference at virtual address 00000028
c015e3a2
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c015e3a2>] Not tainted
EFLAGS: 00010203
eax: 0100004d ebx: 00000000 ecx: 000001d2 edx: 00000000
Code; c015e3a2 <journal_try_to_free_buffers+5a/98>
00000000 <_EIP>:
Code; c015e3a2 <journal_try_to_free_buffers+5a/98> <=====
0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
Code; c015e3a5 <journal_try_to_free_buffers+5d/98>
He upgraded the box and stopped seeing the crashes, running
recent v2.6.
However, he also mentioned that his crashes started after upgrading
from v2.4.19->2.4.22. Should search the diff between them looking for
anything suspicious.
I can't figure out from the archived reports if this is UP or SMP only.
Frank van Maarseveen has also seen the journal_try_to_free_buffers() NULL
b_this_page. Frank, were you running SMP or UP when you reported the oops
with 2.4.23?
> One possibility is that we died on the second pass around the loop:
> page->buffers points at a buffer_head which has a NULL ->b_this_page. But
> I cannot suggest how ->b_this_page could have been zapped.
Oh, yes, indeed.
Maybe adding this (untested) to v2.4 mainline helps? Comments?
--- transaction.c.orig 2004-06-19 15:21:32.861148560 -0300
+++ transaction.c 2004-06-19 15:23:18.214132472 -0300
@@ -1694,6 +1694,24 @@
return 0;
}
+void debug_page(struct page *p)
+{
+ struct buffer_head *bh;
+
+ bh = p->buffers;
+
+ printk(KERN_ERR "%s: page index:%u count:%d flags:%x\n", __FUNCTION__,
+ ,p->index , atomic_read(&p->count), p->flags);
+
+ do {
+ printk(KERN_ERR "%s: bh b_next:%p blocknr:%u b_list:%u state:%x\n",
+ __FUNCTION__, bh->b_next, bh->b_blocknr, bh->b_list,
+ bh->b_state);
+ bh = bh->b_this_page;
+ } while (bh);
+}
+
+
/**
* int journal_try_to_free_buffers() - try to free page buffers.
@@ -1752,6 +1770,11 @@
do {
struct buffer_head *p = tmp;
+ if (!unlikely(tmp)) {
+ debug_page(page);
+ BUG();
+ }
+
tmp = tmp->b_this_page;
if (buffer_jbd(p))
if (!__journal_try_to_free_buffer(p, &locked_or_dirty))
Marcelo Tosatti <[email protected]> wrote:
>
> Maybe adding this (untested) to v2.4 mainline helps? Comments?
It would be helpful.
> --- transaction.c.orig 2004-06-19 15:21:32.861148560 -0300
> +++ transaction.c 2004-06-19 15:23:18.214132472 -0300
> @@ -1694,6 +1694,24 @@
> return 0;
> }
>
> +void debug_page(struct page *p)
> +{
> + struct buffer_head *bh;
> +
> + bh = p->buffers;
> +
> + printk(KERN_ERR "%s: page index:%u count:%d flags:%x\n", __FUNCTION__,
> + ,p->index , atomic_read(&p->count), p->flags);
^
> +
> + do {
> + printk(KERN_ERR "%s: bh b_next:%p blocknr:%u b_list:%u state:%x\n",
> + __FUNCTION__, bh->b_next, bh->b_blocknr, bh->b_list,
> + bh->b_state);
> + bh = bh->b_this_page;
> + } while (bh);
> +}
> +
you'll want to make this a while (bh) {} loop, to handle the
page->buffers==NULL case.
On Sat, Jun 19, 2004 at 04:48:49PM -0300, Marcelo Tosatti wrote:
>
> However, he also mentioned that his crashes started after upgrading
> from v2.4.19->2.4.22. Should search the diff between them looking for
> anything suspicious.
>
> I can't figure out from the archived reports if this is UP or SMP only.
>
> Frank van Maarseveen has also seen the journal_try_to_free_buffers() NULL
> b_this_page. Frank, were you running SMP or UP when you reported the oops
> with 2.4.23?
UP
--
Frank
On Sat, Jun 19, 2004 at 09:50:13PM +0200, Frank van Maarseveen wrote:
> On Sat, Jun 19, 2004 at 04:48:49PM -0300, Marcelo Tosatti wrote:
> >
> > However, he also mentioned that his crashes started after upgrading
> > from v2.4.19->2.4.22. Should search the diff between them looking for
> > anything suspicious.
> >
> > I can't figure out from the archived reports if this is UP or SMP only.
> >
> > Frank van Maarseveen has also seen the journal_try_to_free_buffers() NULL
> > b_this_page. Frank, were you running SMP or UP when you reported the oops
> > with 2.4.23?
>
> UP
Hi Frank,
Has the oops happened again? What kernel are you running now?
Thanks!
On Sat, Jun 19, 2004 at 07:17:11PM -0300, Marcelo Tosatti wrote:
>
> Has the oops happened again? What kernel are you running now?
no new oopses.
The machine has used 2.4.21 for a while, then 2.4.25 and now 2.4.26. The
problem _seems_ to show up here only in 2.4.22, 2.4.23 and 2.4.24.
--
Frank
Hi Marcelo,
On Sat, Jun 19, 2004 at 04:48:49PM -0300, Marcelo Tosatti wrote:
> + if (!unlikely(tmp)) {
I think you meant "if (unlikely(!tmp))" here. However, this does not make
a big difference.
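For the record, a tiny user-space sketch of why; unlikely() is written here
in the usual __builtin_expect form, which is an assumption about the exact
macro of that era. The two spellings test the same condition and differ only
in the hint given to the compiler:

/* !unlikely(tmp) and unlikely(!tmp) evaluate to the same truth value;
 * only the expected outcome handed to the optimizer differs. */
#include <stdio.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

int main(void)
{
	void *tmp = NULL;

	printf("!unlikely(tmp) = %d\n", (int) !unlikely(tmp));	/* 1 */
	printf("unlikely(!tmp) = %d\n", (int) unlikely(!tmp));	/* 1 */
	tmp = &tmp;
	printf("!unlikely(tmp) = %d\n", (int) !unlikely(tmp));	/* 0 */
	printf("unlikely(!tmp) = %d\n", (int) unlikely(!tmp));	/* 0 */
	return 0;
}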
Regards,
willy
Hi,
On Thu, 2004-06-17 at 14:16, Marcelo Tosatti wrote:
> Stephen, Andrew, do you have any idea how the buffers could have vanished
> under us with the page locked? That should not be possible.
No, especially not on UP as Frank reported.
> I don't see how this "page->buffers = NULL" could be caused by a hardware problem,
> which is usually a one- or two-bit flip.
We don't know for sure that it's page->buffers. If we have gone round
the bh->b_this_page loop already, we could have ended up following the
pointers either to an invalid bh, or to one that's not on the current
page. So it could also be the previous buffer's b_this_page that got
clobbered, rather than page->buffers.
That's possible in this case, but it's still a bit surprising that we'd
*always* get a NULL pointer rather than some other random pointer as a
result.
The buffer-ring debug patch that you posted looks like the obvious way
to dig further into this. If that doesn't get anywhere, we can also trap
the case where following bh->b_this_page gives us a buffer whose b_page
is on a different page.
--Stephen
Hi Stephen,
On Mon, Jun 21, 2004 at 04:06:50PM +0100, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2004-06-17 at 14:16, Marcelo Tosatti wrote:
>
> > Stephen, Andrew, do you have any idea how the buffers could have vanished
> > under us with the page locked? That should not be possible.
>
> No, especially not on UP as Frank reported.
>
> > I don't see how this "page->buffers = NULL" could be caused by a hardware problem,
> > which is usually a one- or two-bit flip.
>
> We don't know for sure that it's page->buffers. If we have gone round
> the bh->b_this_page loop already, we could have ended up following the
> pointers either to an invalid bh, or to one that's not on the current
> page. So it could also be the previous buffer's b_this_page that got
> clobbered, rather than page->buffers.
>
> That's possible in this case, but it's still a bit surprising that we'd
> *always* get a NULL pointer rather than some other random pointer as a
> result.
I don't remember seeing any case which was not a NULL pointer dereference.
> The buffer-ring debug patch that you posted looks like the obvious way
> to dig further into this. If that doesn't get anywhere, we can also trap
> the case where following bh->b_this_page gives us a buffer whose b_page
> is on a different page.
Fine. Just printing out bh->b_page at debug_page() will allow us to verify that, yes?
--- transaction.c.orig 2004-06-21 12:50:01.090082264 -0300
+++ transaction.c 2004-06-21 12:50:45.574319632 -0300
@@ -1704,9 +1704,9 @@ void debug_page(struct page *p)
p->index, atomic_read(&p->count), p->flags);
while (bh) {
- printk(KERN_ERR "%s: bh b_next:%p blocknr:%lu b_list:%u state:%lx\n",
+ printk(KERN_ERR "%s: bh b_next:%p blocknr:%lu b_list:%u state:%lx b_page:%p\n",
__FUNCTION__, bh->b_next, bh->b_blocknr, bh->b_list,
- bh->b_state);
+ bh->b_state, bh->b_page);
bh = bh->b_this_page;
}
}
Hi,
On Mon, 2004-06-21 at 16:53, Marcelo Tosatti wrote:
> > The buffer-ring debug patch that you posted looks like the obvious way
> > to dig further into this. If that doesn't get anyway, we can also trap
> > the case where following bh->b_this_page gives us a buffer whose b_page
> > is on a different page.
>
> Fine. Just printing out bh->b_page at debug_page() will allow us to verify that, yes?
For most cases, yes. There are basically three corruption cases ---
b_this_page leads us to an oops, an infinite loop, or a loop including a
bogus page. Making the b_this_page ring walks trap on any bad
b_page would help in the latter two cases, but if we're always getting
the first case, just extending the existing debug patch would be fine.
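A user-space sketch of such a trap (simplified stand-in structs;
MAX_BUFS_PER_PAGE is an assumed cap, not a real kernel constant):

/* Walk the b_this_page ring, verifying each buffer's b_page points back
 * at the page we started from, and cap the walk so a corrupted ring
 * cannot loop forever. */
#include <stdio.h>

#define MAX_BUFS_PER_PAGE 64	/* assumption: generous upper bound */

struct pg;
struct bh {
	struct bh *b_this_page;
	struct pg *b_page;
};
struct pg { struct bh *buffers; };

static int check_buffer_ring(struct pg *page)
{
	struct bh *bh = page->buffers;
	struct bh *tmp = bh;
	int n = 0;

	if (!bh) {
		printf("bad: page->buffers is NULL\n");
		return -1;
	}
	do {
		if (tmp->b_page != page) {
			printf("bad: bh %p has b_page %p, expected %p\n",
			       (void *)tmp, (void *)tmp->b_page, (void *)page);
			return -1;
		}
		tmp = tmp->b_this_page;
	} while (tmp && tmp != bh && ++n < MAX_BUFS_PER_PAGE);

	if (tmp != bh) {
		printf("bad: ring does not close back on page->buffers\n");
		return -1;
	}
	printf("ring ok (%d buffers)\n", n + 1);
	return 0;
}

int main(void)
{
	struct pg page, other;
	struct bh b1, b2;

	b1.b_this_page = &b2; b1.b_page = &page;
	b2.b_this_page = &b1; b2.b_page = &other;	/* bogus b_page */
	page.buffers = &b1;
	other.buffers = NULL;

	return check_buffer_ring(&page);
}

The same checks, minus the demo main(), could be folded into the
debug_page() walk.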
--Stephen