2002-07-17 15:52:41

by Matthew Wilcox

Subject: 2.5.26 broken on headless boxes


On a headless box with both CONFIG_VT_CONSOLE and CONFIG_SERIAL_CONSOLE
defined, I get:

Freeing unused kernel memory: 452k freed
visual_init: sw = 00000000, conswitchp = 00000000, currcons = 0, init = 1
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c01b775f
*pde = 37868001
*pte = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01b775f>] Not tainted
EFLAGS: 00010286
eax: 00000000 ebx: c03dc9a0 ecx: 00000001 edx: c02f2cb0
esi: 00000000 edi: 00000000 ebp: 00000000 esp: c3d45e9c
ds: 0018 es: 0018 ss: 0018
Process init (pid: 1, threadinfo=c3d44000 task=c3d4f300)
Stack: 00000000 f784a000 00000000 f78518f0 f7907600 c01baa05 00000000 00000000
00000001 c01ac03c f784a000 f7871da0 c3d44000 f7871da0 00000000 f78518f0
f786da20 01021e80 f784a000 f7910160 c0145a93 f786da20 c3d45f80 f7a06000
Call Trace: [<c01baa05>] [<c01ac03c>] [<c0145a93>] [<c0144ee7>] [<c0146383>]
[<c013c34a>] [<c013b011>] [<c013af26>] [<c013b317>] [<c0108893>]

Code: 83 38 00 75 09 57 e8 36 ed ff ff 83 c4 04 8b 86 00 cc 3d c0
<0>Kernel panic: Attempted to kill init!

I put an `if (!sw) return;' in visual_init and the panic persists, so
it's not entirely obvious to me what's going on. Someone more intimate
with the console system want to comment?

--
Revolutions do not require corporate support.


2002-07-18 01:03:23

by William Lee Irwin III

Subject: Re: 2.5.26 broken on headless boxes

On Wed, Jul 17, 2002 at 04:55:38PM +0100, Matthew Wilcox wrote:
> On a headless box with both CONFIG_VT_CONSOLE and CONFIG_SERIAL_CONSOLE
> defined, I get:
> Freeing unused kernel memory: 452k freed
> visual_init: sw = 00000000, conswitchp = 00000000, currcons = 0, init = 1
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c01b775f
> *pde = 37868001
> *pte = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c01b775f>] Not tainted
> EFLAGS: 00010286

Could you reproduce this and get maybe a backtrace and a line number?


Thanks,
Bill

2002-07-18 09:43:49

by Tobias Rittweiler

Subject: Re[3]: 2.5.26 broken on headless boxes

Hello William,

Thursday, July 18, 2002, 3:06:17 AM, you wrote:

>> [... kernelpanic ...]

> Could you reproduce this and get maybe a backtrace and a line number?

I just want to say that I get a similar panic (I hope it has the
same cause and that I am not mixing things up); I describe the
trouble 2.5.26 is causing below:

-snip-
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c025a9d7
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c025a9d7>] Not tainted
EFLAGS: 00010202
eax: 00000000 ebx: c039e060 ecx: 00000013 edx: c02db5d4
esi: 00000014 edi: 00000000 ebp: c02dc85c esp: c11f3f98
ds: 0018 es: 0018 ss: 0018
Process init (pid: 1, threadinfo=c11f2000 task=c11f0040)
Stack: c02f86cc c02dc85c 00000014 00000000 c02dffc8 c039df41 0008e000 00000005
c039e000 c02f8618 c039df20 c11c5360 00000000 c02589ac c031ebe8 00000000
c02e0746 c11f2000 c02e077f c0105073 00010f00 c02dffc8 c01056a4 00000000
Call Trace: [<c02589ac>] [<c0105073>] [<c01056a4>]

Code: 83 38 00 75 f4 0f b7 c1 c3 8b 54 24 04 8b 4c 24 08 31 c0 49
<0>Kernel panic: Attempted to kill init!
-snap-

Sorry, I am just a guy with free time to test dev kernels and to
report bugs when I find some, but I do not really know what you mean
by "backtrace" or how I should find the relevant line number.
Please do not hold back seemingly trivial information.

Now to the 2.5.26 misadventure story:
I patched from 2.5.25 to .26, and at first it would not compile
until I created a symlink from /usr/include/asm-generic to
/usr/src/linux/include/asm-generic (/usr/src/linux is itself a symlink
to /usr/src/linux-2.5.26). It seemed to me as if the makefile
ignores include/ in the kernel source directory, although that file
looks correct. I do not rule out a mistake of my own, though (if someone
wants to recommend using the whole tarball: I tried that already).

After creating the symlink, it compiled using the old config file from
2.5.25. The image even booted, but init stopped at the point where the
gettys should be started. So I modified /etc/inittab to
use the ordinary console instead of the VCs, and then the kernel
booted successfully and I could work.

This looks as if the VCs are broken, but I am not at all
sure, because nobody has complained about them so far.

To make sure, I built a completely new kernel (not just
reusing the old config file), but while booting the resulting kernel image I
got the panic above. If somebody wants, I can create a diff between the
config file of the kernel where the VCs seem to be broken and the
one of the kernel causing the panic.

--
cheers,
Tobias

http://freebits.org

2002-07-18 10:43:12

by William Lee Irwin III

Subject: Re: 2.5.26 broken on headless boxes

Thursday, July 18, 2002, 3:06:17 AM, you wrote:
>> Could you reproduce this and get maybe a backtrace and a line number?

On Thu, Jul 18, 2002 at 11:48:00AM +0200, Tobias Rittweiler wrote:
> I just want to say that I get a similar panic (I hope it has the
> same cause and that I am not mixing things up); I describe the
> trouble 2.5.26 is causing below:
>
> -snip-
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c025a9d7

Please, decode your oopses.


Cheers,
Bill

2002-07-18 12:25:42

by Tobias Rittweiler

Subject: Re[2]: 2.5.26 broken on headless boxes

Hello William,

Thursday, July 18, 2002, 12:46:04 PM, you wrote:

WLII> Please, decode your oopses.

Ville Herva already pointed me to this, thanks.

=======================================================
ksymoops 2.4.5 on i686 2.5.25. Options used
-v linux-2.5.26/vmlinux (specified)
-K (specified)
-L (specified)
-O (specified)
-m linux-2.5.26/System.map (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000000
c025a9d7
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c025a9d7>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000000 ebx: c039e060 ecx: 00000013 edx: c02db5d4
esi: 00000014 edi: 00000000 ebp: c02dc85c esp: c11f3f98
ds: 0018 es: 0018 ss: 0018
Stack: c02f86cc c02dc85c 00000014 00000000 c02dffc8 c039df41 0008e000 00000005
c039e000 c02f8618 c039df20 c11c5360 00000000 c02589ac c031ebe8 00000000
c02e0746 c11f2000 c02e077f c0105073 00010f00 c02dffc8 c01056a4 00000000
Call Trace: [<c02589ac>] [<c0105073>] [<c01056a4>]
Code: 83 38 00 75 f4 0f b7 c1 c3 8b 54 24 04 8b 4c 24 08 31 c0 49


>>EIP; c025a9d7 <find_next_offset+1f/28> <=====

>>ebx; c039e060 <llc_offset_table+60/f0>
>>edx; c02db5d4 <llc_reject_state_transitions+9c/d8>
>>ebp; c02dc85c <llc_conn_state_table+20/60>
>>esp; c11f3f98 <END_OF_CODE+e55c28/????>

Trace; c02589ac <mac_indicate+0/224>
Trace; c0105073 <init+2b/178>
Trace; c01056a4 <kernel_thread+28/38>

Code; c025a9d7 <find_next_offset+1f/28>
00000000 <_EIP>:
Code; c025a9d7 <find_next_offset+1f/28> <=====
0: 83 38 00 cmpl $0x0,(%eax) <=====
Code; c025a9da <find_next_offset+22/28>
3: 75 f4 jne fffffff9 <_EIP+0xfffffff9> c025a9d0 <find_next_offset+18/28>
Code; c025a9dc <find_next_offset+24/28>
5: 0f b7 c1 movzwl %cx,%eax
Code; c025a9df <find_next_offset+27/28>
8: c3 ret
Code; c025a9e0 <llc_find_offset+0/4b>
9: 8b 54 24 04 mov 0x4(%esp,1),%edx
Code; c025a9e4 <llc_find_offset+4/4b>
d: 8b 4c 24 08 mov 0x8(%esp,1),%ecx
Code; c025a9e8 <llc_find_offset+8/4b>
11: 31 c0 xor %eax,%eax
Code; c025a9ea <llc_find_offset+a/4b>
13: 49 dec %ecx

<0>Kernel panic: Attempted to kill init!
=======================================================

--
cheers,
Tobias

http://freebits.org

2002-07-18 12:43:07

by Andi Kleen

Subject: Re: 2.5.26 broken on headless boxes

Matthew Wilcox writes:

> On a headless box with both CONFIG_VT_CONSOLE and CONFIG_SERIAL_CONSOLE
> defined, I get:

[...]

I also see similar problems on x86-64 in 2.5.25. The kernel quickly crashes
when trying to return from opost_write() because something below has zeroed
out the stack (with serial console and vga console and early console enabled)
I have not tried it with 2.5.26 yet.

-Andi

2002-07-18 13:26:47

by Matthew Wilcox

Subject: Re: 2.5.26 broken on headless boxes

On Wed, Jul 17, 2002 at 06:06:17PM -0700, William Lee Irwin III wrote:
> Could you reproduce this and get maybe a backtrace and a line number?

100% reproducible...

ksymoops 2.4.5 on i686 2.4.14. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.14/ (default)
-m System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Reading Oops report from the terminal
Unable to handle kernel NULL pointer dereference at virtual address 00000004
c01b7695
*pde = 37867001
Oops: 0000
CPU: 0
EIP: 0010:[<c01b7695>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000001 ebx: 00000000 ecx: 00000000 edx: f7906600
esi: 00000000 edi: c03dcc00 ebp: 00000000 esp: c3d45e7c
ds: 0018 es: 0018 ss: 0018
Stack: f7906600 00000001 c03dc9a0 00000000 00000000 c01b7773 00000000 00000001
00000000 f784a000 00000000 f78518f0 f7906600 c01baa25 00000000 00000000
00000001 c01ac08c f784a000 f7871da0 c3d44000 f7871da0 00000000 f78518f0
Call Trace: [<c01b7773>] [<c01baa25>] [<c01ac08c>] [<c0145a83>] [<c0144ed7>]
[<c0146373>] [<c013c33a>] [<c013b001>] [<c013af16>] [<c013b307>] [<c0108893>]

Code: 8b 41 04 ff d0 8b 34 3b 83 c4 08 66 83 7e 22 00 75 22 8a 86


>>EIP; c01b7695 <visual_init+85/e0> <=====

>>edx; f7906600 <END_OF_CODE+37502e5c/????>
>>edi; c03dcc00 <vc_cons+0/fc>
>>esp; c3d45e7c <END_OF_CODE+39426d8/????>

Trace; c01b7773 <vc_allocate+83/140>
Trace; c01baa25 <con_open+19/88>
Trace; c01ac08c <tty_open+20c/394>
Trace; c0145a83 <link_path_walk+683/874>
Trace; c0144ed7 <permission+27/2c>
Trace; c0146373 <may_open+5f/2ac>
Trace; c013c33a <chrdev_open+66/98>
Trace; c013b001 <dentry_open+e1/1b0>
Trace; c013af16 <filp_open+52/5c>
Trace; c013b307 <sys_open+37/74>
Trace; c0108893 <syscall_call+7/b>

Code; c01b7695 <visual_init+85/e0>
00000000 <_EIP>:
Code; c01b7695 <visual_init+85/e0> <=====
0: 8b 41 04 mov 0x4(%ecx),%eax <=====
Code; c01b7698 <visual_init+88/e0>
3: ff d0 call *%eax
Code; c01b769a <visual_init+8a/e0>
5: 8b 34 3b mov (%ebx,%edi,1),%esi
Code; c01b769d <visual_init+8d/e0>
8: 83 c4 08 add $0x8,%esp
Code; c01b76a0 <visual_init+90/e0>
b: 66 83 7e 22 00 cmpw $0x0,0x22(%esi)
Code; c01b76a5 <visual_init+95/e0>
10: 75 22 jne 34 <_EIP+0x34> c01b76c9 <visual_init+b9/e0>
Code; c01b76a7 <visual_init+97/e0>
12: 8a 86 00 00 00 00 mov 0x0(%esi),%al

<0>Kernel panic: Attempted to kill init!
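
A note on reading the decode above: the faulting instruction is `mov 0x4(%ecx),%eax` with ecx = 00000000, which matches the reported fault address 00000004. That is what a load of the second pointer-sized member through a NULL structure pointer looks like on 32-bit x86. The sketch below only illustrates the arithmetic; `struct consw_model32` and its member order are assumptions made for illustration, not the kernel's actual definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the first two slots of a console-switch table on
 * 32-bit x86. Each function pointer is stored as a 4-byte value so the
 * layout is the same on any host. The member order is an assumption. */
struct consw_model32 {
    uint32_t con_startup; /* assumed first member, offset 0 */
    uint32_t con_init;    /* assumed second member, offset 4 */
};

/* Address the CPU reads when loading a member through a base pointer. */
static uint32_t member_load_address(uint32_t base, size_t member_offset)
{
    return base + (uint32_t)member_offset;
}

/* With a NULL table pointer (base 0), loading con_init reads address 4. */
static uint32_t con_init_fault_address(void)
{
    return member_load_address(0u, offsetof(struct consw_model32, con_init));
}
```

Under these assumptions `con_init_fault_address()` evaluates to 4, the same value as the "virtual address 00000004" line in the oops.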



--
Revolutions do not require corporate support.

2002-07-18 20:16:04

by William Lee Irwin III

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 02:29:46PM +0100, Matthew Wilcox wrote:
>>>EIP; c01b7695 <visual_init+85/e0> <=====
>>>edx; f7906600 <END_OF_CODE+37502e5c/????>
>>>edi; c03dcc00 <vc_cons+0/fc>
>>>esp; c3d45e7c <END_OF_CODE+39426d8/????>
> Trace; c01b7773 <vc_allocate+83/140>
> Trace; c01baa25 <con_open+19/88>
> Trace; c01ac08c <tty_open+20c/394>
> Trace; c0145a83 <link_path_walk+683/874>
> Trace; c0144ed7 <permission+27/2c>
> Trace; c0146373 <may_open+5f/2ac>
> Trace; c013c33a <chrdev_open+66/98>
> Trace; c013b001 <dentry_open+e1/1b0>
> Trace; c013af16 <filp_open+52/5c>
> Trace; c013b307 <sys_open+37/74>
> Trace; c0108893 <syscall_call+7/b>

This is the 4th one of these I've seen in the last two days. Any chance
of being able to compile with -g and get an addr2line on the EIP? I've
tried to reproduce it myself, but haven't gotten it to happen yet.

Thanks,
Bill

2002-07-18 20:14:08

by William Lee Irwin III

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 02:29:39PM +0200, Tobias Rittweiler wrote:
> >>EIP; c025a9d7 <find_next_offset+1f/28> <=====
> >>ebx; c039e060 <llc_offset_table+60/f0>
> >>edx; c02db5d4 <llc_reject_state_transitions+9c/d8>
> >>ebp; c02dc85c <llc_conn_state_table+20/60>
> >>esp; c11f3f98 <END_OF_CODE+e55c28/????>
> Trace; c02589ac <mac_indicate+0/224>
> Trace; c0105073 <init+2b/178>
> Trace; c01056a4 <kernel_thread+28/38>

This looks pretty ugly, you're probably going to have to find someone
who's dealt with the LLC stack.


Cheers,
Bill

2002-07-18 20:29:08

by Matthew Wilcox

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 01:18:57PM -0700, William Lee Irwin III wrote:
> On Thu, Jul 18, 2002 at 02:29:46PM +0100, Matthew Wilcox wrote:
> >>>EIP; c01b7695 <visual_init+85/e0> <=====
> >>>edx; f7906600 <END_OF_CODE+37502e5c/????>
> >>>edi; c03dcc00 <vc_cons+0/fc>
> >>>esp; c3d45e7c <END_OF_CODE+39426d8/????>
> > Trace; c01b7773 <vc_allocate+83/140>
> > Trace; c01baa25 <con_open+19/88>
> > Trace; c01ac08c <tty_open+20c/394>
> > Trace; c0145a83 <link_path_walk+683/874>
> > Trace; c0144ed7 <permission+27/2c>
> > Trace; c0146373 <may_open+5f/2ac>
> > Trace; c013c33a <chrdev_open+66/98>
> > Trace; c013b001 <dentry_open+e1/1b0>
> > Trace; c013af16 <filp_open+52/5c>
> > Trace; c013b307 <sys_open+37/74>
> > Trace; c0108893 <syscall_call+7/b>
>
> This is the 4th one of these I've seen in the last two days. Any chance
> of being able to compile with -g and get an addr2line on the EIP? I've
> tried to reproduce it myself, but haven't gotten it to happen yet.

seems fairly obvious what's happening with a couple of printks...

printk("visual_init: sw = %p, conswitchp = %p, currcons = %d, init = %d\n",
sw, conswitchp, currcons, init);

gets me the interesting fact that sw & conswitchp are both NULL.
later on, we call:
sw->con_init(vc_cons[currcons].d, init);
which seems like it would be the exact cause, no?

now whether putting a:

if (!sw)
return;

call into visual_init or whether we should determine earlier never to
call visual_init, I don't know. The people who know about the console
have been conspicuously silent so far...
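
The guard being discussed can be sketched in isolation as follows. This is a minimal user-space model, not kernel code: `struct consw_model`, `struct vc_data_model`, `visual_init_model` and `dummy_con_init` are hypothetical stand-ins, and the error return value is an addition for illustration (the snippet quoted above just returns).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the shape matters. */
struct vc_data_model { int initialized; };

struct consw_model {
    void (*con_init)(struct vc_data_model *vc, int init);
};

/* Trivial driver hook used to exercise the non-NULL path. */
static void dummy_con_init(struct vc_data_model *vc, int init)
{
    (void)init;
    vc->initialized = 1;
}

/* Returns 0 on success, -1 when no console driver is available. */
static int visual_init_model(const struct consw_model *sw,
                             struct vc_data_model *vc, int init)
{
    if (!sw || !sw->con_init)
        return -1;          /* the guard discussed above */
    sw->con_init(vc, init); /* would fault here if sw were NULL */
    return 0;
}
```

Note that the first message in this thread reports the panic persisting with such a guard in place, so in the real code the callers presumably also need to cope with a console that was never initialized; reporting the failure to the caller, as this model does, is one way to make that possible.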

--
Revolutions do not require corporate support.

2002-07-18 20:35:01

by William Lee Irwin III

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 01:18:57PM -0700, William Lee Irwin III wrote:
>> This is the 4th one of these I've seen in the last two days. Any chance
>> of being able to compile with -g and get an addr2line on the EIP? I've
>> tried to reproduce it myself, but haven't gotten it to happen yet.

On Thu, Jul 18, 2002 at 09:32:08PM +0100, Matthew Wilcox wrote:
> seems fairly obvious what's happening with a couple of printks...
> printk("visual_init: sw = %p, conswitchp = %p, currcons = %d, init = %d\n",
> sw, conswitchp, currcons, init);
> gets me the interesting fact that sw & conswitchp are both NULL.
> later on, we call:
> sw->con_init(vc_cons[currcons].d, init);
> which seems like it would be the exact cause, no?

Ugh, I should have been able to see this somehow...

On Thu, Jul 18, 2002 at 09:32:08PM +0100, Matthew Wilcox wrote:
> now whether putting a:
> if (!sw)
> return;
> call into visual_init or whether we should determine earlier never to
> call visual_init, I don't know. The people who know about the console
> have been conspicuously silent so far...

To heck with waiting for them, if you can't boot because of it, I'd say
push the patch.


Cheers,
Bill

2002-07-18 21:02:24

by Petr Vandrovec

Subject: Re: 2.5.26 broken on headless boxes

On 18 Jul 02 at 21:32, Matthew Wilcox wrote:
> On Thu, Jul 18, 2002 at 01:18:57PM -0700, William Lee Irwin III wrote:
> > On Thu, Jul 18, 2002 at 02:29:46PM +0100, Matthew Wilcox wrote:
> > >>>EIP; c01b7695 <visual_init+85/e0> <=====
> > >>>edx; f7906600 <END_OF_CODE+37502e5c/????>
> > >>>edi; c03dcc00 <vc_cons+0/fc>
> > >>>esp; c3d45e7c <END_OF_CODE+39426d8/????>
> > > Trace; c01b7773 <vc_allocate+83/140>
> > > Trace; c01baa25 <con_open+19/88>
> > > Trace; c01ac08c <tty_open+20c/394>
> > > Trace; c0145a83 <link_path_walk+683/874>
> > > Trace; c0144ed7 <permission+27/2c>
> > > Trace; c0146373 <may_open+5f/2ac>
> > > Trace; c013c33a <chrdev_open+66/98>
> > > Trace; c013b001 <dentry_open+e1/1b0>
> > > Trace; c013af16 <filp_open+52/5c>
> > > Trace; c013b307 <sys_open+37/74>
> > > Trace; c0108893 <syscall_call+7/b>
> >
> > This is the 4th one of these I've seen in the last two days. Any chance
> > of being able to compile with -g and get an addr2line on the EIP? I've
> > tried to reproduce it myself, but haven't gotten it to happen yet.
>
> seems fairly obvious what's happening with a couple of printks...
>
> printk("visual_init: sw = %p, conswitchp = %p, currcons = %d, init = %d\n",
> sw, conswitchp, currcons, init);
>
> gets me the interesting fact that sw & conswitchp are both NULL.
> later on, we call:
> sw->con_init(vc_cons[currcons].d, init);
> which seems like it would be the exact cause, no?
>
> now whether putting a:
>
> if (!sw)
> return;
>
> call into visual_init or whether we should determine earlier never to
> call visual_init, I don't know. The people who know about the console
> have been conspicuously silent so far...

You have enabled CONFIG_VT without CONFIG_VGA_CONSOLE and
CONFIG_DUMMY_CONSOLE. That is an illegal configuration.

To fix the oopses, either enable 'Framebuffer devices' under the 'Console
drivers' section (you do not have to enable any fbdev driver, just
check this option...), or disable CONFIG_VT. See arch/*/kernel/setup.c
for an explanation: no code in the VT subsystem expects conswitchp == NULL,
but a couple of architectures sometimes leave conswitchp uninitialized.

It would be possible to add an error return path to visual_init, but
I think that adding

conswitchp = &dummy_con;
+ #else
+ #error No console defined with CONFIG_VT enabled
#endif
#endif

at the end of setup.c will work the same way, since opening /dev/tty* will
never succeed with your config anyway, even with the error path added.
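
The compile-time check suggested here can be shown as a standalone sketch. The CONFIG_* macros are defined by hand below purely for illustration (in a real build they come from the kernel configuration), and `struct consw_model`, `DEFAULT_CONSW` and `default_consw` are hypothetical names, not what setup.c contains.

```c
/* Hypothetical configuration switches, defined here only for illustration. */
#define CONFIG_VT 1
#define CONFIG_DUMMY_CONSOLE 1
/* #define CONFIG_VGA_CONSOLE 1 */

struct consw_model { const char *name; };

#ifdef CONFIG_VGA_CONSOLE
static const struct consw_model vga_con_model = { "vga" };
#endif
#ifdef CONFIG_DUMMY_CONSOLE
static const struct consw_model dummy_con_model = { "dummy" };
#endif

/* Pick a default console driver, or refuse to build if CONFIG_VT is set
 * and no driver is available to back the virtual terminals. */
#ifdef CONFIG_VT
# if defined(CONFIG_VGA_CONSOLE)
#  define DEFAULT_CONSW (&vga_con_model)
# elif defined(CONFIG_DUMMY_CONSOLE)
#  define DEFAULT_CONSW (&dummy_con_model)
# else
#  error "No console defined with CONFIG_VT enabled"
# endif
#else
# define DEFAULT_CONSW ((const struct consw_model *)0)
#endif

static const struct consw_model *default_consw(void)
{
    return DEFAULT_CONSW;
}
```

With both driver macros removed and CONFIG_VT left defined, this file stops compiling at the `#error` line, which turns the bad configuration into a build failure instead of a boot-time oops.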
Best regards,
Petr Vandrovec

2002-07-18 21:16:02

by Matthew Wilcox

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 11:04:54PM +0200, Petr Vandrovec wrote:
> You have enabled CONFIG_VT without CONFIG_VGA_CONSOLE and
> CONFIG_DUMMY_CONSOLE. That is an illegal configuration.

Huh. So CONFIG_VT_CONSOLE is not enough any more? I really do think
this should be documented in Config.help.

> To fix the oopses, either enable 'Framebuffer devices' under the 'Console
> drivers' section (you do not have to enable any fbdev driver, just
> check this option...), or disable CONFIG_VT. See arch/*/kernel/setup.c
> for an explanation: no code in the VT subsystem expects conswitchp == NULL,
> but a couple of architectures sometimes leave conswitchp uninitialized.

well, this is on x86 ...

--
Revolutions do not require corporate support.

2002-07-18 21:39:50

by Petr Vandrovec

Subject: Re: 2.5.26 broken on headless boxes

On 18 Jul 02 at 22:18, Matthew Wilcox wrote:
> On Thu, Jul 18, 2002 at 11:04:54PM +0200, Petr Vandrovec wrote:
> > You have enabled CONFIG_VT without CONFIG_VGA_CONSOLE and
> > CONFIG_DUMMY_CONSOLE. That is an illegal configuration.
>
> Huh. So CONFIG_VT_CONSOLE is not enough any more? I really do think
> this should be documented in Config.help.

CONFIG_VT_CONSOLE works the other way around. If you set CONFIG_VT_CONSOLE,
/dev/console can be displayed on your VTs (on your screen).

CONFIG_VGA_CONSOLE/CONFIG_DUMMY_CONSOLE determines whether your VT can
be created at all - maybe the _CONSOLE suffix is misleading - without
at least one display device, virtual terminals cannot be built.
I always thought that CONFIG_DUMMY_CONSOLE could not be unset, but
apparently it can...

And BTW, when did such a configuration last work for you? It does not
look to me like it should ever have worked.

> > To fix the oopses, either enable 'Framebuffer devices' under the 'Console
> > drivers' section (you do not have to enable any fbdev driver, just
> > check this option...), or disable CONFIG_VT. See arch/*/kernel/setup.c
> > for an explanation: no code in the VT subsystem expects conswitchp == NULL,
> > but a couple of architectures sometimes leave conswitchp uninitialized.
>
> well, this is on x86 ...
Petr Vandrovec

2002-07-18 21:42:21

by Matthew Wilcox

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 11:42:18PM +0200, Petr Vandrovec wrote:
> CONFIG_VGA_CONSOLE/CONFIG_DUMMY_CONSOLE determines whether your VT can
> be created at all - maybe the _CONSOLE suffix is misleading - without
> at least one display device, virtual terminals cannot be built.
> I always thought that CONFIG_DUMMY_CONSOLE could not be unset, but
> apparently it can...
>
> And BTW, when did such a configuration last work for you? It does not
> look to me like it should ever have worked.

erm, 2.5.25 worked, and i didn't change the .config between 2.5.25 and
2.5.26 (just ran make oldconfig).

--
Revolutions do not require corporate support.

2002-07-18 22:13:36

by Petr Vandrovec

Subject: Re: 2.5.26 broken on headless boxes

On 18 Jul 02 at 22:45, Matthew Wilcox wrote:
> On Thu, Jul 18, 2002 at 11:42:18PM +0200, Petr Vandrovec wrote:
> > CONFIG_VGA_CONSOLE/CONFIG_DUMMY_CONSOLE determines whether your VT can
> > be created at all - maybe the _CONSOLE suffix is misleading - without
> > at least one display device, virtual terminals cannot be built.
> > I always thought that CONFIG_DUMMY_CONSOLE could not be unset, but
> > apparently it can...
> >
> > And BTW, when did such a configuration last work for you? It does not
> > look to me like it should ever have worked.
>
> erm, 2.5.25 worked, and i didn't change the .config between 2.5.25 and
> 2.5.26 (just ran make oldconfig).

Got it. It was introduced by the console.c:1.13 change from jsimmons. Before
this change we did not register the tty driver when there was no console:

con_init says:
	const char* display_desc = NULL;
	if (conswitchp) display_desc = conswitchp->con_startup();
	if (!display_desc) {
		fg_console = 0;
		return;
	}
	...
	if (tty_register_driver(&console_driver)) ...

so we did not register the VT subsystem and the panic did not happen:
instead you got 'cannot open initial console' or something
like that...

But after the change, tty_register_driver is invoked (through vty_init)
unconditionally from tty_io.c, where it depends only on CONFIG_VT.

So a quick untested fix is

--- a/drivers/char/console.c Tue Jul 16 01:22:31 2002
+++ b/drivers/char/console.c Fri Jul 19 00:12:01 2002
@@ -2487,6 +2487,9 @@

 int __init vty_init(void)
 {
+	if (!conswitchp) {
+		return 0;
+	}
 	memset(&console_driver, 0, sizeof(struct tty_driver));
 	console_driver.magic = TTY_DRIVER_MAGIC;
 	console_driver.name = "vc/%d";

But I'll leave the final decision to James; maybe he wants to support
VT without an underlying console, and testing almost the same condition
in two places looks suspicious to me. Either we need the blank timer
and console, or we do not. But registering one half in vty_init
and the second half in con_init?
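
The change in registration order described in this message can be modelled in a few lines. This is a simplified user-space sketch under stated assumptions: the `*_model` names and the `tty_driver_registered` flag are hypothetical, and each function keeps only the control flow quoted above, not the real initialization work.

```c
#include <stddef.h>

struct consw_model { const char *(*con_startup)(void); };

/* Hypothetical global mirroring conswitchp; NULL means no display driver. */
static const struct consw_model *conswitchp_model = NULL;
static int tty_driver_registered = 0;

/* Old flow as quoted: con_init returns before registering the tty driver
 * when there is no console driver or its startup reports nothing. */
static void con_init_old_model(void)
{
    const char *display_desc = NULL;
    if (conswitchp_model)
        display_desc = conswitchp_model->con_startup();
    if (!display_desc)
        return;
    tty_driver_registered = 1;
}

/* New flow as described: registration no longer depends on the console. */
static void vty_init_new_model(void)
{
    tty_driver_registered = 1;
}

/* Proposed quick fix: skip registration when no console driver exists. */
static void vty_init_fixed_model(void)
{
    if (!conswitchp_model)
        return;
    tty_driver_registered = 1;
}
```

With `conswitchp_model` left NULL, only `vty_init_new_model` leaves the driver registered. In the model that corresponds to the state in which opening a VT later reaches visual_init with no console driver behind it.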
Best regards,
Petr Vandrovec

2002-07-18 22:17:45

by Arnaldo Carvalho de Melo

Subject: Re: 2.5.26 broken on headless boxes

Em Fri, Jul 19, 2002 at 12:16:19AM +0200, Petr Vandrovec escreveu:
> so we did not register the VT subsystem and the panic did not happen:
> instead you got 'cannot open initial console' or something
> like that...

Hummm, maybe this is what was biting me yesterday... I'll check RSN,
as soon as I arrive at home 8)

- Arnaldo

2002-07-18 22:58:14

by James Simmons

Subject: Re: 2.5.26 broken on headless boxes


> But I'll leave the final decision to James; maybe he wants to support
> VT without an underlying console, and testing almost the same condition
> in two places looks suspicious to me. Either we need the blank timer
> and console, or we do not. But registering one half in vty_init
> and the second half in con_init?

This problem arises from the code being halfway to its final
form. The idea was to register a console if we have a display, which
is all we need for printk. When we detect a keyboard to attach to a display
that has none, we would register a tty device at that point. The idea was
to allow for a setup like having one keyboard and one display as your
normal tty and an extra display as the printk display. The code will be
changing a lot over the next few weeks.


2002-07-18 23:04:17

by James Simmons

Subject: Re: 2.5.26 broken on headless boxes


> I also see similar problems on x86-64 in 2.5.25. The kernel quickly crashes
> when trying to return from opost_write() because something below has zeroed
> out the stack (with serial console and vga console and early console enabled)
> I have not tried it with 2.5.26 yet.

It is the result of registering the console device first for printk and
then later registering the tty device. Eventually I would like to be able to
have VT_CONSOLE independent of CONFIG_VT so we could have a lightweight
printk. The goal is to register the tty device once we find a keyboard of some
kind.

2002-07-19 09:42:26

by Andi Kleen

Subject: Re: 2.5.26 broken on headless boxes

On Thu, Jul 18, 2002 at 04:07:06PM -0700, James Simmons wrote:
>
> > I also see similar problems on x86-64 in 2.5.25. The kernel quickly crashes
> > when trying to return from opost_write() because something below has zeroed
> > out the stack (with serial console and vga console and early console enabled)
> > I have not tried it with 2.5.26 yet.
>
> It is the result of registering the console device first for printk and
> then later registering the tty device. Eventually I would like to be able to
> have VT_CONSOLE independent of CONFIG_VT so we could have a lightweight
> printk. The goal is to register the tty device once we find a keyboard of some
> kind.


Could you explain to me how this causes a stack overwrite with the
current kernel ? I would like to fix this ASAP because it is a showstopper
for me in 2.5 right now.

-Andi