LinuxLists.cc - Re: Make sure we populate the initroot filesystem late enough

Subject: Re: Make sure we populate the initroot filesystem late enough

On Sun, 25 Feb 2007, David Woodhouse wrote:
>
> One side-effect of this patch is to move the call to free_initrd() much
> later in the init sequence, potentially after other memory management
> code is assuming it's already been freed.

Hmm. No, I don't think that should be a problem. free_initmem() only
happens at the very end, after do_basic_setup() has been run, which includes
all the initcall stuff.

However, it's an interesting observation. How sure are you that it's this
commit that triggers it? You say "This seems to be what's triggering ..",
I'm wondering how firm that is..

Linus

2007-02-26 00:44:37

On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
> Hmm. No, I don't think that should be a problem. free_initmem() only
> happens at the very end, after do_basic_setup() has been run, which includes
> all the initcall stuff.

> However, it's an interesting observation. How sure are you that it's this
> commit that triggers it? You say "This seems to be what's triggering ..",
> I'm wondering how firm that is..

I found it with git-bisect. The Fedora kernel has been broken on this
particular 512MiB Mac Mini for a while, and now I've reverted the patch
it seems to be fine again. So I'm fairly sure. I'll be surer in a few
minutes once the full RPM build has finished with the patch reverted.

Of course, it could easily be an entirely separate bug which by some
bizarre coincidence is just triggered by this.

--
dwmw2

2007-02-26 01:17:13

On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
> Hmm. No, I don't think that should be a problem. free_initmem() only
> happens at the very end, after do_basic_setup() has been run, which
> includes all the initcall stuff.

I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
even this hack seems sufficient to 'fix' it:

--- arch/powerpc/mm/init_32.c 2007-02-25 20:06:54.000000000 -0500
+++ arch/powerpc/mm/init_32.c.not 2007-02-25 20:06:41.000000000 -0500
@@ -243,13 +243,14 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
 	if (start < end)
-		printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+		printk ("NOT Freeing initrd memory: %ldKiB would be freed\n", (end - start) >> 10);
+	return;
 	for (; start < end; start += PAGE_SIZE) {
 		ClearPageReserved(virt_to_page(start));
 		init_page_count(virt_to_page(start));
 		free_page(start);
 		totalram_pages++;
 	}
 }
 #endif

--
dwmw2

2007-02-26 03:45:55

On Sun, 25 Feb 2007, David Woodhouse wrote:
>
> I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
> even this hack seems sufficient to 'fix' it:

Ok. Clearly something is using that memory. That said, I *suspect* that
the commit that you bisected to is just showing the problem indirectly.
The ordering shouldn't make any difference, but it can obviously make a
huge difference in various allocation patterns etc, thus just showing a
pre-existing problem more clearly..

Can you try adding something like

memset(start, 0xf0, end - start);

to before the return? That might give a better idea of exactly what is
using it after it's free'd, hopefully by having the user trigger some more
spectacular oops..

It is, of course, also entirely possible that the rootfs unpacking change
really *was* buggy, and I am just missing something totally obvious. The
memset() might still make it more obvious, though. Maybe.

>  	if (start < end)
> -		printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +		printk ("NOT Freeing initrd memory: %ldKiB would be freed\n", (end - start) >> 10);

.. so adding the "memset()" here would be what I'm suggesting ..

> +	return;

.. and you might as well leave the return there, so that nobody else comes
along and re-uses the memory. That should just improve on the chances of
the memset() hopefully catching the problem..

Linus "I don't see anything wrong" Torvalds
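
The poisoning idea above can be modelled in user space. What follows is only a sketch, not code from this thread: `poison_region` and `first_clobbered` are hypothetical helper names, and the buffer stands in for the initrd pages. Note that in the kernel function `start` is an `unsigned long`, so the memset() call as quoted would need a cast to a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xf0	/* same pattern as suggested in the thread */

/* Fill [buf, buf + len) with the poison pattern. */
void poison_region(unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, POISON_BYTE, len);
}

/*
 * Return the offset of the first byte that no longer holds the poison
 * pattern, or len if the whole region is still intact.
 */
size_t first_clobbered(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != POISON_BYTE)
			return i;
	return len;
}
```

If the region is kept out of the allocator after poisoning, any byte that later differs from the pattern points at a stray writer; if the pattern survives, nothing scribbled on the region.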

2007-02-26 04:00:37

On Sun, 2007-02-25 at 19:45 -0800, Linus Torvalds wrote:
> Ok. Clearly something is using that memory. That said, I *suspect* that
> the commit that you bisected to is just showing the problem indirectly.
> The ordering shouldn't make any difference, but it can obviously make a
> huge difference in various allocation patterns etc, thus just showing a
> pre-existing problem more clearly..

Indeed.

> Can you try adding something like
>
> memset(start, 0xf0, end - start);

Yeah, I did that before giving up on it for the day and going in search
of dinner. It changes the failure mode to a BUG() in
cache_free_debugcheck(), at line 2876 of mm/slab.c

It smells like the pages weren't actually reserved in the first place
and we were blithely allocating them. The only problem with that theory
is that the initrd doesn't seem to be getting corrupted -- and if we
were handing out its pages like that then surely _something_ would have
scribbled on it before we tried to read it.

When I head back in tomorrow morning I'll instrument free_initrd_mem()
to check that the PageReserved bit was actually set on each page, before
clearing it. And I'll make the page allocation routines check whether
they're giving out pages between initrd_start and initrd_end, etc.

--
dwmw2
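
The planned instrumentation (check that each page was reserved before clearing the bit) could look roughly like the user-space model below. It is a sketch under stated assumptions: `struct model_page` and `release_and_count_unreserved` are hypothetical names that only imitate PageReserved() and ClearPageReserved(); nothing here is taken from the actual kernel change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; stands in for the kernel's page flags. */
struct model_page {
	bool reserved;	/* models PageReserved() */
};

/*
 * Clear the reserved flag on every page in pages[0..npages), as the free
 * loop would, and return how many pages were NOT reserved beforehand.
 * A non-zero result means the region was never fully reserved.
 */
size_t release_and_count_unreserved(struct model_page *pages, size_t npages)
{
	size_t i, unreserved = 0;

	assert(pages != NULL || npages == 0);
	for (i = 0; i < npages; i++) {
		if (!pages[i].reserved)
			unreserved++;
		pages[i].reserved = false;	/* models ClearPageReserved() */
	}
	return unreserved;
}
```

Counting rather than stopping at the first unreserved page keeps the free path unchanged while still reporting how much of the region was in an unexpected state.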

2007-02-26 04:13:54

On Sun, 25 Feb 2007, David Woodhouse wrote:
>
> > Can you try adding something like
> >
> > memset(start, 0xf0, end - start);
>
> Yeah, I did that before giving up on it for the day and going in search
> of dinner. It changes the failure mode to a BUG() in
> cache_free_debugcheck(), at line 2876 of mm/slab.c

Ok, that's just strange.

One obvious thing to do would be to remove all the "__initdata" entries in
mm/slab.c.. But I'd also like to see the full backtrace for the BUG_ON(),
in case that gives any clues at all.

> It smells like the pages weren't actually reserved in the first place
> and we were blithely allocating them. The only problem with that theory
> is that the initrd doesn't seem to be getting corrupted -- and if we
> were handing out its pages like that then surely _something_ would have
> scribbled on it before we tried to read it.

Yeah, I don't think it's necessarily initrd itself, I'd be more inclined
to think that the reason you see this change with the initrd unpacking is
simply that it does a lot of allocations for the initrd files, so I think
it is only indirectly involved - just because it ends up being a slab
user.

> When I head back in tomorrow morning I'll instrument free_initrd_mem()
> to check that the PageReserved bit was actually set on each page, before
> clearing it. And I'll make the page allocation routines check whether
> they're giving out pages between initrd_start and initrd_end, etc.

Sounds like a sane plan.

Linus

2007-02-26 07:00:07

by William Lee Irwin III

On Sun, Feb 25, 2007 at 11:01:06PM -0500, David Woodhouse wrote:
> Yeah, I did that before giving up on it for the day and going in search
> of dinner. It changes the failure mode to a BUG() in
> cache_free_debugcheck(), at line 2876 of mm/slab.c
> It smells like the pages weren't actually reserved in the first place
> and we were blithely allocating them. The only problem with that theory
> is that the initrd doesn't seem to be getting corrupted -- and if we
> were handing out its pages like that then surely _something_ would have
> scribbled on it before we tried to read it.
> When I head back in tomorrow morning I'll instrument free_initrd_mem()
> to check that the PageReserved bit was actually set on each page, before
> clearing it. And I'll make the page allocation routines check whether
> they're giving out pages between initrd_start and initrd_end, etc.

Another few things to try would be inserting checks in page_alloc.c for
pages in that specific range before some flag set in free_initrd_mem()
is set, and (conflicting with that, though easily reconciled) unmapping
initrd memory in free_initrd_mem() instead of freeing it.

-- wli
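
The allocator-side check described above (complain if a page inside the range is handed out before a "freed" flag is set) might be sketched as follows. This is a user-space illustration only: `struct guarded_range` and `alloc_violates_guard` are hypothetical names, and the range is treated as half-open, matching the `start < end` loop in free_initrd_mem().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a range that must not be allocated yet. */
struct guarded_range {
	unsigned long start;	/* first address of the guarded range */
	unsigned long end;	/* one past the last guarded address */
	bool released;		/* set once the range has been freed */
};

/* True if handing out the page at addr would violate the guard. */
bool alloc_violates_guard(const struct guarded_range *g, unsigned long addr)
{
	assert(g != NULL && g->start <= g->end);
	if (g->released)
		return false;	/* range is back in the pool; anything goes */
	return addr >= g->start && addr < g->end;
}
```

Setting `released` at the point where the region is freed is what reconciles this check with normal operation afterwards.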

2007-02-26 15:54:55

On Sun, 2007-02-25 at 20:17 -0500, David Woodhouse wrote:
> On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
> > Hmm. No, I don't think that should be a problem. free_initmem() only
> > happens at the very end, after do_basic_setup() has been run, which
> > includes all the initcall stuff.
>
> I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
> even this hack seems sufficient to 'fix' it:

Could be a powerpc specific bug in initrd handling... I'm still
traveling so I can't really look at it right now, but I wouldn't be
surprised if some of that code did indeed bitrot.

Ben.

> --- arch/powerpc/mm/init_32.c 2007-02-25 20:06:54.000000000 -0500
> +++ arch/powerpc/mm/init_32.c.not 2007-02-25 20:06:41.000000000 -0500
> @@ -243,13 +243,14 @@ void free_initmem(void)
>  #ifdef CONFIG_BLK_DEV_INITRD
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
>  	if (start < end)
> -		printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +		printk ("NOT Freeing initrd memory: %ldKiB would be freed\n", (end - start) >> 10);
> +	return;
>  	for (; start < end; start += PAGE_SIZE) {
>  		ClearPageReserved(virt_to_page(start));
>  		init_page_count(virt_to_page(start));
>  		free_page(start);
>  		totalram_pages++;
>  	}
>  }
>  #endif
>
>

2007-02-26 15:55:24

On Sun, 2007-02-25 at 23:01 -0500, David Woodhouse wrote:

> Yeah, I did that before giving up on it for the day and going in search
> of dinner. It changes the failure mode to a BUG() in
> cache_free_debugcheck(), at line 2876 of mm/slab.c
>
> It smells like the pages weren't actually reserved in the first place
> and we were blithely allocating them. The only problem with that theory
> is that the initrd doesn't seem to be getting corrupted -- and if we
> were handing out its pages like that then surely _something_ would have
> scribbled on it before we tried to read it.
>
> When I head back in tomorrow morning I'll instrument free_initrd_mem()
> to check that the PageReserved bit was actually set on each page, before
> clearing it. And I'll make the page allocation routines check whether
> they're giving out pages between initrd_start and initrd_end, etc.

And check that we didn't end up stupidly having the initrd share a page
with something else ... (like not aligned end or such thingy).

Ben.
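
The alignment concern above can be made concrete with a small check. This is a sketch, assuming 4 KiB pages; `MODEL_PAGE_SIZE`, `range_is_page_aligned` and `tail_slack` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* True if both ends of [start, end) fall on page boundaries. */
bool range_is_page_aligned(unsigned long start, unsigned long end)
{
	assert(start <= end);
	return (start & (MODEL_PAGE_SIZE - 1)) == 0 &&
	       (end & (MODEL_PAGE_SIZE - 1)) == 0;
}

/*
 * Bytes between end and the next page boundary. A non-zero value means
 * the last page of the range also covers memory beyond end, so freeing
 * that whole page could release something that is still in use.
 */
unsigned long tail_slack(unsigned long end)
{
	unsigned long rem = end & (MODEL_PAGE_SIZE - 1);

	return rem ? MODEL_PAGE_SIZE - rem : 0;
}
```

A loop that frees page by page while `start < end` will free the final partial page in full, which is why an unaligned `end` is worth ruling out.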

2007-02-26 16:00:56

by Segher Boessenkool

> And check that we didn't end up stupidly having the initrd share a page
> with something else ... (like not aligned end or such thingy).

David tested that yesterday, it's not the case. Too bad,
would have been too easy ;-)

Segher

2007-02-26 16:24:32

On Sun, 2007-02-25 at 20:13 -0800, Linus Torvalds wrote:
>
> On Sun, 25 Feb 2007, David Woodhouse wrote:
> >
> > > Can you try adding something like
> > >
> > > memset(start, 0xf0, end - start);
> >
> > Yeah, I did that before giving up on it for the day and going in search
> > of dinner. It changes the failure mode to a BUG() in
> > cache_free_debugcheck(), at line 2876 of mm/slab.c
>
> Ok, that's just strange.

In this case I hadn't left the 'return' in free_initrd_mem(). I was
poisoning the pages and then returning them to the pool as usual.

If I poison the pages and _don't_ return them to the pool, it boots
fine. PageReserved is set on every page in the initrd region; total
page_count() is equal to the number of pages (which doesn't
_necessarily_ mean that page_count() for every page is equal to 1 but
it's a strong hint that that's the case).

Looking in /dev/mem after it boots, I see that my poison is still
present throughout the whole region.

> One obvious thing to do would be to remove all the "__initdata" entries in
> mm/slab.c..

This is biting us long before we call free_initmem().

> But I'd also like to see the full backtrace for the BUG_ON(),
> in case that gives any clues at all.

I'll see if I can find a camera.

> > It smells like the pages weren't actually reserved in the first place
> > and we were blithely allocating them. The only problem with that theory
> > is that the initrd doesn't seem to be getting corrupted -- and if we
> > were handing out its pages like that then surely _something_ would have
> > scribbled on it before we tried to read it.
>
> Yeah, I don't think it's necessarily initrd itself, I'd be more inclined
> to think that the reason you see this change with the initrd unpacking is
> simply that it does a lot of allocations for the initrd files, so I think
> it is only indirectly involved - just because it ends up being a slab
> user.

Whatever happens, initrd as a 'slab user' is fine. The crashes happen
_later_, when someone else is using the memory which used to belong to
the initrd. In that 'BUG at slab.c:2876' I mentioned above, r3 was
within the initrd region. As I said, I'll try to find a camera.

--
dwmw2
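
The caveat above about the summed page_count() is easy to show with a toy model: a total equal to the number of pages does not prove that each page has a count of exactly 1. The sketch below uses plain integers in place of real page reference counts; `sum_counts` and `every_count_is_one` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum of all per-page counts in counts[0..n). */
long sum_counts(const int *counts, size_t n)
{
	long total = 0;
	size_t i;

	assert(counts != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += counts[i];
	return total;
}

/* True only if every individual count is exactly 1. */
bool every_count_is_one(const int *counts, size_t n)
{
	size_t i;

	assert(counts != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (counts[i] != 1)
			return false;
	return true;
}
```

For example, counts of {0, 2, 1} sum to 3 across 3 pages yet fail the per-page check, so the per-page test is the stronger of the two.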

2007-02-26 16:45:13

by Milton Miller

On Feb 27, 2007, at 2:24 AM, David Woodhouse wrote:
> On Sun, 2007-02-25 at 20:13 -0800, Linus Torvalds wrote:
>> On Sun, 25 Feb 2007, David Woodhouse wrote:
>>>> Can you try adding something like
>>>>
>>>> memset(start, 0xf0, end - start);
>>>
>>> Yeah, I did that before giving up on it for the day and going in
>>> search of dinner. It changes the failure mode to a BUG() in
>>> cache_free_debugcheck(), at line 2876 of mm/slab.c
>>
>> Ok, that's just strange.
>
> In this case I hadn't left the 'return' in free_initrd_mem(). I was
> poisoning the pages and then returning them to the pool as usual.
>
> If I poison the pages and _don't_ return them to the pool, it boots
> fine. PageReserved is set on every page in the initrd region; total
> page_count() is equal to the number of pages (which doesn't
> _necessarily_ mean that page_count() for every page is equal to 1 but
> it's a strong hint that that's the case).
>
> Looking in /dev/mem after it boots, I see that my poison is still
> present throughout the whole region.
>
>> One obvious thing to do would be to remove all the "__initdata"
>> entries in mm/slab.c..
>
> This is biting us long before we call free_initmem().
>
>> But I'd also like to see the full backtrace for the BUG_ON(),
>> in case that gives any clues at all.
>
> I'll see if I can find a camera.
>
>>> It smells like the pages weren't actually reserved in the first place
>>> and we were blithely allocating them. The only problem with that
>>> theory is that the initrd doesn't seem to be getting corrupted -- and
>>> if we were handing out its pages like that then surely _something_
>>> would have scribbled on it before we tried to read it.
>>
>> Yeah, I don't think it's necessarily initrd itself, I'd be more
>> inclined to think that the reason you see this change with the initrd
>> unpacking is simply that it does a lot of allocations for the initrd
>> files, so I think it is only indirectly involved - just because it
>> ends up being a slab user.
>
> Whatever happens, initrd as a 'slab user' is fine. The crashes happen
> _later_, when someone else is using the memory which used to belong to
> the initrd. In that 'BUG at slab.c:2876' I mentioned above, r3 was
> within the initrd region. As I said, I'll try to find a camera.

Just a thought,

Any chance you are using one of the unusual code paths, like the
bootloader moving the initrd or using a kernel-crash region?

milton

2007-02-26 19:27:52

by john stultz

On Sun, 2007-02-25 at 19:00 -0500, David Woodhouse wrote:
> On Mon, 2006-12-11 at 20:59 +0000, Linux Kernel Mailing List wrote:
> >
> > Make sure we populate the initroot filesystem late enough
>
> This seems to be what's triggering the apparent memory corruption we've
> been seeing recently -- in the case of the Fedora kernel it manifests
> itself as a BUG() in cache_alloc_refill() when the pmac ide driver
> initialises.
>
> Another report was at http://lkml.org/lkml/2006/12/17/4
>
> We've been seeing it on a Mac Mini too, and I managed to reproduce it on
> my shinybook this evening by booting with 'mem=512M'.

Just for reference (as it's not in the thread linked above), this issue
disappeared for me after some config changes (I somehow changed my
selection when I backtracked and then moved forward w/ git bisect).

I've not been able to reproduce it since, but I know others (BCC'ed on
this note) have seen it and might prod them to come forth with details
(and broken .config files)

thanks
-john

2007-02-26 20:52:14

by Kumar Gala

On Feb 26, 2007, at 9:51 AM, Benjamin Herrenschmidt wrote:

> On Sun, 2007-02-25 at 20:17 -0500, David Woodhouse wrote:
>> On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
>>> Hmm. No, I don't think that should be a problem. free_initmem() only
>>> happens at the very end, after do_basic_setup() has been run, which
>>> includes all the initcall stuff.
>>
>> I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
>> even this hack seems sufficient to 'fix' it:
>
> Could be a powerpc specific bug in initrd handling... I'm still
> traveling so I can't really look at it right now, but I wouldn't be
> surprised if some of that code did indeed bitrot.
>
> Ben.

Could there be some issue with initrd getting reserved properly via
prom_init.c? I know we make sure there are memreserves in the fdt
for initrd on embedded ppc.

- k

2007-02-26 20:57:19

On Mon, 2007-02-26 at 10:44 -0600, Milton Miller wrote:
> Any chance you are using one of the unusual code paths, like the
> bootloader moving the initrd or using a kernel crash region?

I'm doing nothing special. And I'm less sure now about the trigger. I
built a Fedora 7 test 2 install tree with the patch reverted, and
managed to boot and install.... but now when I boot the _same_ machine
with the same CD, it fails.

Now I'm starting to wonder if it's something the firmware sets up to DMA
to a certain region of memory, which makes it non-deterministic. And the
other things we're blaming are only making a difference because they
change the layout of what we have in memory.

--
dwmw2

2007-02-26 21:17:49

On Mon, 26 Feb 2007, David Woodhouse wrote:
>
> Now I'm starting to wonder if it's something the firmware sets up to DMA
> to a certain region of memory, which makes it non-deterministic. And the
> other things we're blaming are only making a difference because they
> change the layout of what we have in memory.

USB controller issues? We used to have these really hard-to-debug problems
with the USB controller being active and having had the BIOS set up the
command queues etc. Really subtle. It's why we now have PCI quirks for
shutting up (most) USB controllers very early.

If there is some USB controller that we miss, or that sets up its command
chain to some unexpected area (so that USB is active and corrupting memory
even very early on), that could explain it.

Linus

2007-02-26 22:52:38

by TBBle

On Mon, Feb 26, 2007 at 11:27:47AM -0800, john stultz wrote:
> On Sun, 2007-02-25 at 19:00 -0500, David Woodhouse wrote:
> > On Mon, 2006-12-11 at 20:59 +0000, Linux Kernel Mailing List wrote:
> > >
> > > Make sure we populate the initroot filesystem late enough
> >
> > This seems to be what's triggering the apparent memory corruption we've
> > been seeing recently -- in the case of the Fedora kernel it manifests
> > itself as a BUG() in cache_alloc_refill() when the pmac ide driver
> > initialises.
> >
> > Another report was at http://lkml.org/lkml/2006/12/17/4
> >
> > We've been seeing it on a Mac Mini too, and I managed to reproduce it on
> > my shinybook this evening by booting with 'mem=512M'.
>
> Just for reference (as it's not in the thread linked above), this issue
> disappeared for me after some config changes (I somehow changed my
> selection when I backtracked and then moved forward w/ git bisect).
>
> I've not been able to reproduce it since, but I know others (BCC'ed on
> this note) have seen it and might prod them to come forth with details
> (and broken .config files)

In my case, disabling CPU_FREQ_PMAC made the failure go away.
After reverting this patch, CPU_FREQ_PMAC is once again operating
successfully, so far.

--
-----------------------------------------------------------
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
On-hiatus Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[email protected]

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
-- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------

2007-02-27 06:48:46

> USB controller issues? We used to have these really hard-to-debug problems
> with the USB controller being active and having had the BIOS set up the
> command queues etc. Really subtle. It's why we now have PCI quirks for
> shutting up (most) USB controllers very early.

On powermacs or powerbooks, the USB controller is shut down by the
firmware when we call the "quiesce" OF call from prom_init.c, which
happens before the kernel relocates itself to 0 and takes over memory.
Unless we fucked up something in there, I wouldn't expect that to be the
cause.

> If there is some USB controller that we miss, or that sets up its command
> chain to some unexpected area (so that USB is active and corrupting memory
> even very early on), that could explain it.

Had we already set up the OHCI controller when the crash happened? Maybe
we broke something subtle in the USB stack?

Ben.

2007-02-27 07:04:22

> > I've not been able to reproduce it since, but I know others (BCC'ed on
> > this note) have seen it and might prod them to come forth with details
> > (and broken .config files)
>
> In my case, disabling CPU_FREQ_PMAC made the failure go away.
> After reverting this patch, CPU_FREQ_PMAC is once again operating
> successfully, so far.

Hrm.. which cpufreq method is used on both your machines ? If it's the
one involving the PMU, it does involve a full hard reset of the
processor (with appropriate cache flushes etc...), maybe something's
going wrong in that area....

Ben.

2007-02-27 11:58:34

by Segher Boessenkool

>>> I've not been able to reproduce it since, but I know others (BCC'ed
>>> on this note) have seen it and might prod them to come forth with
>>> details (and broken .config files)
>>
>> In my case, disabling CPU_FREQ_PMAC made the failure go away.
>> After reverting this patch, CPU_FREQ_PMAC is once again operating
>> successfully, so far.
>
> Hrm.. which cpufreq method is used on both your machines ? If it's the
> one involving the PMU, it does involve a full hard reset of the
> processor (with appropriate cache flushes etc...), maybe something's
> going wrong in that area....

It's most likely a red herring, lots of config changes
make the bug go away on some kernel versions (but not
on others); the problem is very sensitive to changes in
memory layout.

Segher

2007-02-28 06:49:18

> It's most likely a red herring, lots of config changes
> make the bug go away on some kernel versions (but not
> on others); the problem is very sensitive to changes in
> memory layout.

I wouldn't be that sure ... I've had problems in the past with PMU based
cpufreq... looks like flushing all caches and hard-resetting the
processor on the fly when there can be pending DMAs might be a source of
trouble... especially on CPUs that don't have working cache flush HW
assist.

Ben.

2007-02-28 10:13:29