2003-06-10 20:05:50

by Robert Schwebel

Subject: init does not run on 405GP system

Hi,

I'm currently porting u-boot and Linux to an IBM 405GP based board. The
problem is that init does not seem to run and gives no output. Up to the
point where init should make some noise, the kernel boots smoothly on the
serial console: I see all output, and the NFS root is mounted via an Intel
82559 network chip. The kernel threads are also running; I see kupdated &
friends being put on the run queue from time to time.

I have replaced /sbin/init with a statically linked "hello world", which
also gives no output. My impression is that the code of the init ELF
binary is never run. When I switch on the SHOW_SYSCALLS macro in
arch/ppc/kernel/entry.S I see the system calls for open(), dup(), dup()
and execve() which come from init/main.c. Opening the console works, and
so does the execve() of /sbin/init. When I follow the path of execution
up to load_elf_binary() in fs/binfmt_elf.c I can even see the correct
code being loaded and pointed to by elf_entry in that file. But there is
never any output from init, nor does anything happen when I replace init
with a piece of code which should immediately trigger a null pointer
exception.
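
One way to take libc and stdio out of the picture for such a test is an
init that calls write(2) directly. The following is only a minimal sketch,
not the program used above: the names write_all() and test_init() are made
up for illustration, and it assumes the kernel has already opened the
console as file descriptor 1 (the open()/dup()/dup() sequence from
init/main.c).

```c
#include <string.h>
#include <unistd.h>

/* Write the whole string to fd using write(2) only (no stdio buffering).
 * Returns 0 on success, -1 if write() fails or makes no progress.
 * write_all() is a hypothetical helper name, not taken from the thread. */
int write_all(int fd, const char *msg)
{
    size_t left = strlen(msg);

    while (left > 0) {
        ssize_t n = write(fd, msg, left);
        if (n <= 0)
            return -1;
        msg += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Body of a throwaway init: announce that userspace is alive, then sleep
 * forever, because PID 1 must never exit. Hypothetical name as well. */
void test_init(void)
{
    write_all(1, "test init: userspace is alive\n");
    for (;;)
        pause();
}
```

Calling test_init() from main() and linking with -static should give a
binary that can be dropped in as /sbin/init. If even this stays silent,
the C library is probably not the culprit.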

Nevertheless, the kernel runs smoothly. I can ping the machine, I can
even floodping it with 0% packet loss. The only problem is that no
userspace is running.

Has anybody seen something like this before?

- Kernel is 2.4.21-rc2 with bitkeeper from 20030515 plus board port
- userland was tested with Debian bootdisks, Denx 4xx boot image and
others
- toolchain is the Debian powerpc-linux cross toolchain.

Robert
--
Dipl.-Ing. Robert Schwebel | http://www.pengutronix.de
Pengutronix - Linux Solutions for Science and Industry
Braunschweiger Str. 79, 31134 Hildesheim, Germany
Handelsregister: Amtsgericht Hildesheim, HRA 2686
Phone: +49-5121-28619-0 | Fax: +49-5121-28619-4


2003-06-11 06:44:19

by Denis Vlasenko

Subject: Re: init does not run on 405GP system

On 10 June 2003 23:16, Robert Schwebel wrote:
> Hi,
>
> I'm currently porting u-boot and Linux to an IBM 405GP based board. The
> problem is now that init seems not to be running and does not give any
> output. Up to that point where init should make some noise the kernel
> boots smoothly (serial console), I see all output and NFS-Root is
> mounted via an Intel 82559 network chip. The kernel threads are also
> running, I see kupdated & friends being put into the run queue from time
> to time.
>
> I have replaced /sbin/init by a statically linked "hello world" (which
> also does not give any output). My impression is that the binary code of
> the init ELF binary is never run. When I switch on the SHOW_SYSCALLS
> macro in arch/ppc/kernel/entry.S I see the system calls for open(),
> dup(), dup() and execve() which come from init/main.c. Opening the
> console works, execve() to /sbin/init as well. When I follow the path of
> execution up to load_elf_binary() in fs/binfmt_elf.c I can even see the
> correct code being loaded and pointed to by elf_entry in that file. But
> there is never any output from init, nor does something happen when I
> replace init by a piece of code which should immediately make a zero
> pointer exception.
>
> Nevertheless, the kernel runs smoothly. I can ping the machine, I can
> even floodping it with 0% packet loss. Only that there is no userspace
> running.
>
> Has anybody seen something like this before?

Yes.

I once tried to run a 686-based libc on a 486; init was rained upon
by SIGILLs 'coz it had 586+ instructions. No output on the screen
whatsoever.
--
vda

2003-06-11 06:59:17

by Robert Schwebel

Subject: Re: init does not run on 405GP system

On Wed, Jun 11, 2003 at 09:53:04AM +0300, Denis Vlasenko wrote:
> I once tried to run 686 based libc on a 486, init was rained upon
> by SIGILLs 'coz it had 586+ instructions. No output on the screen
> whatsoever.

I've tried it with the DENX busybox root image, which has definitely been
tested extensively on PPC4xx, but it does not work either.

Robert

2003-06-11 13:56:32

by Jocelyn Mayer

Subject: Re: init does not run on 405GP system


> On Wed, Jun 11, 2003 at 09:53:04AM +0300, Denis Vlasenko wrote:
> > I once tried to run 686 based libc on a 486, init was rained upon
> > by SIGILLs 'coz it had 586+ instructions. No output on the screen
> > whatsoever.
>
> I've tried it with the DENX busybox rootimage which is definitely tested
> extensively on PPC4xx, but it does not work.
>
> Robert
>
Hi,

You may have the wrong compilation options.
Please send me your test program and I'll test it on a PPC403 board.
From my tests, it seems that the gcc 3.xx specs are quite buggy for those
embedded processors and need to be patched to produce correct code.

Regards.

--
Jocelyn Mayer

2003-06-12 19:00:14

by Robert Schwebel

Subject: Re: init does not run on 405GP system

On Tue, Jun 10, 2003 at 04:10:47PM +0200, Robert Schwebel wrote:
> I'm currently porting u-boot and Linux to an IBM 405GP based board.
> The problem is now that init seems not to be running and does not give
> any output.

We have some new information about what happens on that machine, and the
more I learn, the stranger it gets :-(

In fs/exec.c:setup_arg_pages() there is a call to put_dirty_page(). After
that call the argv and env data should be found on the user stack at
0x7fffffda. The same data can also be inspected through a kernel mapping
generated with kmap(page), plus an offset of 0xfba. Both addresses should
point to the same piece of physical RAM. I have printed out the contents
of the TLBs and they look correct:

...
TLB 7, v = 1, sz = 1, flags = 0x0110, EPN 0x7ffff000, RPN 0x0014a000
...
TLB 63, v = 1, sz = 7, flags = 0x0300, EPN 0xc1000000, RPN 0x01000000
...
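
As a cross-check of the dump above, the translation can be redone by hand.
The sketch below assumes the PPC40x TLB size encoding (page size =
1 KB * 4^sz, so sz = 1 is a 4 KB page and sz = 7 a 16 MB page) and
deliberately ignores the TID/PID match and the access flags. The struct
and function names are invented for illustration.

```c
#include <stdint.h>

/* One TLB entry, with fields named after the columns of the dump.
 * The struct itself is hypothetical, not a kernel data structure. */
struct tlb_entry {
    int      valid;  /* v   */
    unsigned sz;     /* sz: page size code */
    uint32_t epn;    /* effective (virtual) page base address */
    uint32_t rpn;    /* real (physical) page base address */
};

/* Assumed PPC40x encoding: 1 KB * 4^sz. */
uint32_t tlb_page_size(unsigned sz)
{
    return (uint32_t)1024 << (2 * sz);
}

/* If the entry is valid and covers ea, store the physical address in *pa
 * and return 1; otherwise return 0 and leave *pa untouched. */
int tlb_translate(const struct tlb_entry *e, uint32_t ea, uint32_t *pa)
{
    uint32_t mask = tlb_page_size(e->sz) - 1;   /* in-page offset bits */

    if (!e->valid || (ea & ~mask) != (e->epn & ~mask))
        return 0;
    *pa = (e->rpn & ~mask) | (ea & mask);
    return 1;
}
```

With entry 7 from the dump, 0x7fffffda translates to physical 0x0014afda,
i.e. offset 0xfda into the page at 0x0014a000. That offset is worth
comparing against the one used on the kernel side of the comparison.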

Nevertheless, when I set a breakpoint at the location after the
put_dirty_page() call, I see different memory through the two mappings.
The kernel mapping contains the correct content:

"init",0,"HOME=/",0,"TERM=linux",0,"/sbin/init"

But the user mapping (0x7fffffda) shows garbage. The memory is writable:
I can place something in it by writing after the put_dirty_page() call.
When I write a "unique" pattern to that place, stop the processor with
the BDI and read out a complete memory dump, I can no longer find the
pattern. This looks like a caching problem, but I'm not entirely sure.
I've tried an invalidate_dcache_range() on the user space mapping
addresses, without success.
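
For reference, the "search the dump for the pattern" step can be done with
a few lines of C once the dump has been read into a buffer. This is only a
sketch under that assumption; find_pattern() is a made-up helper, and how
the dump is obtained from the BDI is not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of pat (plen bytes) inside
 * buf (len bytes), or -1 if it does not occur or plen is 0.
 * Plain byte-wise scan; good enough for a one-off check of a RAM dump. */
long find_pattern(const unsigned char *buf, size_t len,
                  const unsigned char *pat, size_t plen)
{
    size_t i;

    if (plen == 0 || plen > len)
        return -1;
    for (i = 0; i + plen <= len; i++) {
        if (memcmp(buf + i, pat, plen) == 0)
            return (long)i;
    }
    return -1;
}
```

If the dump starts at physical address 0, a hit at offset N means the
pattern reached RAM at physical address N. No hit at all would be
consistent with the data still sitting in a cache line that was never
written back, though other explanations are possible.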

-+-+-

What could be happening here? Is the cache handling code bulletproof? I'm
running out of ideas.

Kernel is still 2.4.21-rc2-ppc20030515 plus port to the board in
question.

Robert