2003-01-10 16:52:35

by Derek Atkins

Subject: Linus BK tree crashes with PANIC: INIT: segmentation violation

Hi,

I've been trying to get a current 2.5 kernel up and running but I've
hit a wall. When I run my machine with a current kernel I get the
following message to my terminal, repeated ad nauseam:

PANIC: INIT: segmentation violation at 0x804a08c (code)! sleeping for 30 seconds!

I've been working off of Linus' BK repository and was finally able to
get a "working" kernel when I backed up to approximately 2002-12-30
(which is sometime between 2.5.53 and 2.5.54). That kernel works just
fine.

Sometime between December 30 and January 1 a patch was added that
causes the kernel to go into an infinite loop of Oopses. Then
sometime later the behavior changed to the INIT problem I mentioned
above.

Does anyone have any clue how to deal with this? I can assure you
that it is NOT a hardware problem (otherwise why would the same
hardware work with 2.4 and 2.5.53+?) The only change being made here
is the kernel.

I have not had a chance to back out each and every ChangeSet
individually between Dec 30 and Jan 1 to figure out what was causing the
stream of oopses -- nor am I even confident that that would lead me to
a solution for the "PANIC: INIT" problem.

In case anyone cares, the most recent ChangeSet from my
confirmed-working (2.5.53+) tree is labeled:

1.1004 02/12/30 13:47:09 [email protected] +2 -0
Make x86 platform choice strings more easily selectable

However I have not guaranteed that this is the Changeset just before
it failed (I'm not enough of a bk guru to figure out how to pull down
one changeset at a time).

Any advice would be greatly appreciated.... I'd be more than happy to
try things out for people if you have tests you want me to run.

Thanks!

-derek

PS: I am not subscribed directly so please CC me on your replies.

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
[email protected] PGP key available


2003-01-10 21:11:37

by Tomas Szepe

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

> [[email protected]]
>
> PANIC: INIT: segmentation violation at 0x804a08c (code)! sleeping for 30 seconds!

I'm seeing the same problem with vanilla 2.5.56 booting in vmware
workstation 3.2.0.

--
Tomas Szepe <[email protected]>

2003-01-10 22:04:18

by Linus Torvalds

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

In article <[email protected]>,
>
>I've been trying to get a current 2.5 kernel up and running but I've
>hit a wall. When I run my machine with a current kernel I get the
>following message to my terminal, repeated ad nauseam:
>
> PANIC: INIT: segmentation violation at 0x804a08c (code)! sleeping for 30 seconds!

Hmm.. Can you try to pinpoint more exactly the change that caused it?

>In case anyone cares, the most recent ChangeSet from my
>confirmed-working (2.5.53+) tree is labeled:
>
> 1.1004 02/12/30 13:47:09 [email protected] +2 -0
> Make x86 platform choice strings more easily selectable
>
>However I have not guaranteed that this is the Changeset just before
>it failed (I'm not enough of a bk guru to figure out how to pull down
>one changeset at a time).

Don't pull one at a time - instead just get my latest BK tree, and then
you can do

bk clone -ql -rXXXX linus-BK test-tree

to get a tree with the top-of-tree being XXXX.

Together with "bk revtool" you can traverse the merge tree to decide on
interesting points you want to back up further with. If, for example,
the kernel still doesn't work at XXXX, you can then do a

cd test-tree
bk revtool
.. find an interesting spot YYYY ...
bk undo -aYYYY

to clip some changes from the test-tree to see if that helps.

Linus

2003-01-10 22:44:44

by Linus Torvalds

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation


[ Larry added to cc in case he has some magic ways to make BK more easily
do the "split the set in half" problem for doing binary searches on BK
trees ]

On 10 Jan 2003, Derek Atkins wrote:
>
> Linus Torvalds <[email protected]> writes:
>
> > Hmm.. Can you try to pinpoint more exactly the change that caused it?
>
> Well, which change-point do you want? Do you want the change-point
> from the string-of-oopses to the "PANIC: INIT: ..."? Or the
> change-point from a working kernel to the string-of-oopses? I already
> computed the latter point. I have not computed the former.

Well, both are interesting. Nothing says that the problems have to be
related, but it's certainly not inconceivable, and as such both points end
up being interesting.

> 1) PANIC: INIT: ...
> 2) String of Oopses
> 3) Working Tree.
>
> The changeover from 2-3 is approximately December 30 (see my previous
> post).

I was hoping for an exact changeset, your post didn't seem to be 100% sure.

Anyway, the one you pinpointed ("Make x86 platform choice strings more
easily selectable" top-of-tree is working), is followed by a patch by
Christoph Hellwig ("Missed one 'try_inc_mod_count'") which almost certainly
isn't the cause of your trouble. So I'd like you to go forward a bit.

For example, if you know that your (2) happens before 2.5.54, then you can
do

bk clone -ql -rv2.5.54 linux-BK test-tree
bk changes
.. look for the one you already know is ok: it's called
"1.911.13.50" in the full 2.5.54 tree ..

Now you can either just visually look at what got merged afterwards:

bk revtool
.. look for it in the tree - now you see what followed it
and merge points are interesting places to look at.

Then you can figure out some interesting places to look for (depends on
your hardware - if you have an aic7xxx controller, one obvious interesting
place to look at is the merge of the aic7xxx changes by Justin Gibbs).

If you care about full BK magic, you can also see a full list of changes
between two versions by doing

bk set -n -d -rXXXX -rYYYY | bk -R prs -h -d':I: <:P:@:HOST:>\n$each(:C:){\t(:C:)\n}\n' -

to see every changeset that happened between XXXX and YYYY.

(In case you care: the "bk set" will spit out all keys that are the "set
difference" between what is in XXXX and YYYY, and then the "bk prs" part
will print out those keys in a readable manner according to the
description string given by "-d").

So with XXXX being 1.911.13.50 and YYYY being top-of-tree (you can use '+'
for this), you get:

1.858.2.3 <[email protected]>
Added in Radeon PCI ids into pci_ids.h from radeon.h. IGA fbdev uses C99 now.

1.865.2.1 <[email protected]>
Merge maxwell.earthlink.net:/usr/src/linus-2.5
into maxwell.earthlink.net:/usr/src/fbdev-2.5

1.865.2.2 <[email protected]>
Fixes from the PPC guys. Lots of small fixes.
....

and you can look for things that look interesting and decide to try a
kernel with that instead..

I don't know what the right way to make bk show a "halfway point" between
two releases is, since a BK tree is not just a linear collection. The
above will give all points between two releases, but won't help you much
decide _which_ point is the best one to start with (unless the description
makes you suspicious).
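[ Editorial sketch, not part of the original mail: if the changesets between the known-good and known-bad points can be flattened into one ordered list (an assumption, since a BK tree is not linear as noted above), the "split the set in half" search looks roughly like this. The function name first_bad and the is_good callback (standing in for "build and boot this revision") are hypothetical, and the search only works if every revision after the first bad one is also bad. ]

```python
from typing import Callable, Sequence


def first_bad(revs: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest failing revision in an ordered list.

    Assumes revs[0] is known good, revs[-1] is known bad, and that
    once a revision is bad all later ones are bad too.
    is_good is a hypothetical callback: build, boot, report success.
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] good, revs[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revs[mid]):
            lo = mid  # failure was introduced after mid
        else:
            hi = mid  # failure was introduced at or before mid
    return revs[hi]
```

[ Each call to is_good costs one kernel build and boot, so N candidate changesets need about log2(N) test kernels rather than N. ]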

> Question: how do I determine the "global" id for a particular
> revision? My 1.1004 is certainly not the same as yours (which is why
> I gave the changelog and timestamp information).

The "global ID" is the key of a changeset, and you can get it with ':KEY:'
in the bk prs description (ie in the above command, you can add a :KEY:\n
to the beginning of the string that '-d' specifies, and you'll see
something like

[email protected]|ChangeSet|20021210013840|58905
1.858.2.3 <[email protected]>
Added in Radeon PCI ids into pci_ids.h from radeon.h. IGA fbdev uses C99 now.

[email protected]|ChangeSet|20021210063345|36692
1.865.2.1 <[email protected]>
Merge maxwell.earthlink.net:/usr/src/linus-2.5
into maxwell.earthlink.net:/usr/src/fbdev-2.5

[email protected]|ChangeSet|20021211155159|31301
1.865.2.2 <[email protected]>
Fixes from the PPC guys. Lots of small fixes.

...

instead, where the "[email protected]|ChangeSet|20021211155159|31301"
things are the global keys that are stable across merges. They are kind of
awkward to use for day-to-day stuff, though, for obvious reasons.
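[ Editorial sketch, not part of the original mail: the keys shown above have four '|'-separated fields. A minimal parser, assuming the third field is a YYYYMMDDhhmmss timestamp (which matches the examples) and leaving the last field uninterpreted. The names ChangeSetKey and parse_key and the field labels are hypothetical, not BK terminology. ]

```python
from datetime import datetime
from typing import NamedTuple


class ChangeSetKey(NamedTuple):
    author: str      # user@host part
    path: str        # "ChangeSet" in the examples above
    stamp: datetime  # parsed from the 14-digit third field
    tail: str        # trailing number, kept as an opaque string


def parse_key(key: str) -> ChangeSetKey:
    """Split a key of the form author|path|YYYYMMDDhhmmss|number."""
    author, path, stamp, tail = key.split("|")
    return ChangeSetKey(author, path,
                        datetime.strptime(stamp, "%Y%m%d%H%M%S"), tail)
```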

Linus

2003-01-10 22:58:36

by Tomas Szepe

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

> [[email protected]]
>
> I was hoping for an exact changeset, your post didn't seem to be 100% sure.
>
> Anyway, the one you pinpointed ("Make x86 platform choice strings more
> easily selectable" top-of-tree is working), is followed by a patch by
> Christoph Hellwig ("Missed one 'try_inc_mod_count'") which almost certainly
> isn't the cause of your trouble. So I'd like you to go forward a bit.
>
> For example, if you know that your (2) happens before 2.5.54, then you can
> do
>
> bk clone -ql -rv2.5.54 linux-BK test-tree
> bk changes
> .. look for the one you already know is ok: it's called
> "1.911.13.50" in the full 2.5.54 tree ..

Or you can just do everything "by hand" using these two together:
http://linux.bkbits.net:8080/linux-2.5
ftp://ftp.nl.linux.org/pub/linux/bk2patch/v2.5

--
Tomas Szepe <[email protected]>

2003-01-11 04:58:14

by Derek Atkins

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

Ok, I'll work on tracking down the exact patches where the changeovers
from 1->2 and 2->3 occur. It's going to take me some time to
continually back out patches and rebuild, but I'll work on it.
Hopefully it will only take a couple days.

Any hints on ways to "move forward" as opposed to just moving
backwards would be useful too.

Thanks,

-derek

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
[email protected] PGP key available

2003-01-11 05:29:47

by Linus Torvalds

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation


On 11 Jan 2003, Derek Atkins wrote:
>
> Ok, I'll work on tracking down the exact patches where the changeovers
> from 1->2 and 2->3 occur. It's going to take me some time to
> continually back out patches and rebuild, but I'll work on it.
> Hopefully it will only take a couple days.
>
> Any hints on ways to "move forward" as opposed to just moving
> backwards would be useful too.

None really good - just keep a (local) full tree around, and when you want
to move forward you do a "bk pull" from that tree (or clone a whole new
base tree to work with).

Linus

2003-01-11 22:11:00

by Derek Atkins

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

Linus Torvalds <[email protected]> writes:

> > Well, which change-point do you want? Do you want the change-point
> > from the string-of-oopses to the "PANIC: INIT: ..."? Or the
> > change-point from a working kernel to the string-of-oopses? I already
> > computed the latter point. I have not computed the former.
>
> Well, both are interesting. Nothing says that the problems have to be
> related, but it's certainly not inconceivable, and as such both points end
> up being interesting.
>
> > 1) PANIC: INIT: ...
> > 2) String of Oopses
> > 3) Working Tree.
> >
> > The changeover from 2-3 is approximately December 30 (see my previous
> > post).
>
> I was hoping for an exact changeset, your post didn't seem to be 100% sure.

Ok, I've found the culprit for the "2-3" changepoint (the
string-of-oopses). Now that I've figured this one out, I'll go work
out the 1-2 changepoint. The string of oopses is a result of the
attached ChangeSet (a change to kernel/kallsyms.c:kallsyms_lookup()).

It is quite clearly a problem with the "grab name" loop around line
49. In particular, it appears that "*name" might be a "negative"
number which is sending the strncpy() into lala land. Even if you
force *name to be unsigned (as a later patch does), it still
sign-extends the negative number and you get a HUGE positive number.

I'm trying to find a better way to do this, and I'll try pulling a
change forward before I work backwards to search for the 1-2
changepoint. In looking at this patch, I believe that you do NOT need
the extra "namebuf" argument -- you just need to handle the loop
properly (and return the correct string).
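[ Editorial sketch, not part of the original mail: the sign-extension effect described above can be reproduced outside the kernel. This assumes "char" is signed and the length ends up in a 32-bit unsigned type, as on i386. It does not reproduce the actual kallsyms code or the later patch; the function names are hypothetical. ctypes is used only to mimic C integer widths. ]

```python
import ctypes


def widen_signed_char(byte: int) -> int:
    """Read a byte as a signed 8-bit char, then convert to unsigned 32-bit.

    Mimics casting the already-read char value to unsigned: the value
    is negative before the cast, so the result is huge.
    """
    as_char = ctypes.c_int8(byte).value        # 0x90 becomes -112
    return ctypes.c_uint32(as_char).value      # -112 becomes 2**32 - 112


def widen_unsigned_char(byte: int) -> int:
    """Read the same byte as an unsigned 8-bit char before widening."""
    as_uchar = ctypes.c_uint8(byte).value      # 0x90 stays 144
    return ctypes.c_uint32(as_uchar).value
```

[ For the byte 0x90, widen_signed_char gives 4294967184 while widen_unsigned_char gives 144. Bytes below 0x80 come out the same either way, which is why small counts would not show the problem. ]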

More in a bit...

-derek

[email protected], 2002-12-25 21:46:32-06:00, [email protected]
kbuild: Stem compression for kallsyms

This patch implements simple stem compression for the kallsyms symbol
table. Each symbol has as first byte a count on how many characters
are identical to the previous symbol. This compresses the often
common repetitive prefixes (like subsys_) fairly effectively.

On a fairly full featured monolithic i386 kernel this saves about 60k in
the kallsyms symbol table.

The changes are very simple, so the 60k are not shabby.

One visible change is that the caller of kallsyms_lookup has to pass in
a buffer now, because it has to be modified. I added an arbitrary
127 character limit to it.

Still >210k left in the symbol table unfortunately. Another idea would be to
delta encode the addresses in 16bits (functions are all likely to be smaller
than 64K). This would especially help on 64bit hosts. Not done yet, however.

No, before someone asks, I don't want to use zlib for that. Far too fragile
during an oops, and overkill too, and it would require linking it into all
kernels.
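
[ Editorial sketch, not part of the original mail or ChangeSet: the stem compression described above, where each entry starts with one byte counting the characters shared with the previous symbol, can be illustrated as follows. This is not the kernel's table layout. The function names and the sample symbols in the note below are hypothetical, and symbols are assumed to be ASCII. ]

```python
from typing import Iterable, List


def stem_compress(symbols: Iterable[str]) -> List[bytes]:
    """Encode each symbol as <shared-prefix count byte><remaining chars>."""
    out: List[bytes] = []
    prev = ""
    for sym in symbols:
        # The count must fit in one byte, so cap the shared prefix at 255.
        limit = min(len(prev), len(sym), 255)
        shared = 0
        while shared < limit and prev[shared] == sym[shared]:
            shared += 1
        out.append(bytes([shared]) + sym[shared:].encode("ascii"))
        prev = sym
    return out


def stem_expand(entries: Iterable[bytes]) -> List[str]:
    """Rebuild the symbols; each one depends on the previous result."""
    out: List[str] = []
    prev = ""
    for entry in entries:
        shared = entry[0]  # indexing bytes yields an int in 0..255
        sym = prev[:shared] + entry[1:].decode("ascii")
        out.append(sym)
        prev = sym
    return out
```

[ With the made-up symbols subsys_init, subsys_register and subsys_remove, the second entry stores a count of 7 plus "register" and the third a count of 9 plus "move". Because every entry depends on the one before it, a lookup has to rebuild names from the start of the table into a scratch buffer, which fits the changeset's remark that the caller now passes one in. ]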

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
[email protected] PGP key available

2003-01-12 01:40:54

by Derek Atkins

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

Linus,

Linus Torvalds <[email protected]> writes:

> > 1) PANIC: INIT: ...
> > 2) String of Oopses
> > 3) Working Tree.
> >
> > The changeover from 2-3 is approximately December 30 (see my previous
> > post).
>
> I was hoping for an exact changeset, your post didn't seem to be 100% sure.

The 'String of oopses' was a red herring. It was fixed sometime in early
January. The PANIC: INIT: problem, however, is real, and was introduced
by the following ChangeSet on January 7:

D 1.972 03/01/07 10:08:55-08:00 [email protected] 15824 15815 2/0/1
P ChangeSet
C Move x86 signal handler return stub to the vsyscall page,
C and stop honoring the SA_RESTORER information.
C
C This will prepare us for alternate signal handler returns.

If I build a kernel WITH this changeset it fails; if I build a kernel
at 1.971 (bjorn_helgaas's i810/i830 AGP patches) the kernel works just
fine.

So, something in your changes to kernel/sysenter.c or kernel/signal.c
causes INIT to PANIC and fail.

So, what would you like me to test, now?

-derek

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
[email protected] PGP key available

2003-01-12 03:58:52

by Linus Torvalds

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation


On 11 Jan 2003, Derek Atkins wrote:
>
> The 'String of oopses' was a red herring. It was fixed sometime in early
> January. The PANIC: INIT: problem, however, is real, and was introduced
> by the following ChangeSet on January 7:
>
> D 1.972 03/01/07 10:08:55-08:00 [email protected] 15824 15815 2/0/1
> P ChangeSet
> C Move x86 signal handler return stub to the vsyscall page,
> C and stop honoring the SA_RESTORER information.
> C
> C This will prepare us for alternate signal handler returns.

Interesting.

I was afraid that somebody would actually be _using_ the SA_RESTORER thing
for some totally private version of signal handler return, but I was
hoping that wouldn't be the case.

SA_RESTORER was always a bit broken.. The functionality can trivially be
restored (suggested untested patch appended), but it makes it very hard
to improve on signal handling, since old binaries that use SA_RESTORER
will force our hand.

Oh, well. Can you verify whether this fixes it for you? And thanks for
hunting down the exact changeset.

Btw, what version of "init" are you running? It would be interesting to
see what it actually does, and obviously none of the machines I have
around have that init..

Linus

----
===== arch/i386/kernel/signal.c 1.25 vs edited =====
--- 1.25/arch/i386/kernel/signal.c Tue Jan 7 10:08:52 2003
+++ edited/arch/i386/kernel/signal.c Sat Jan 11 19:59:57 2003
@@ -350,6 +350,7 @@
static void setup_frame(int sig, struct k_sigaction *ka,
sigset_t *set, struct pt_regs * regs)
{
+ void *restorer;
struct sigframe *frame;
int err = 0;

@@ -378,8 +379,12 @@
if (err)
goto give_sigsegv;

+ restorer = (void *) (fix_to_virt(FIX_VSYSCALL) + 32);
+ if (ka->sa.sa_flags & SA_RESTORER)
+ restorer = ka->sa.sa_restorer;
+
/* Set up to return from userspace. */
- err |= __put_user(fix_to_virt(FIX_VSYSCALL) + 32, &frame->pretcode);
+ err |= __put_user(restorer, &frame->pretcode);

/*
* This is popl %eax ; movl $,%eax ; int $0x80
@@ -422,6 +427,7 @@
static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
sigset_t *set, struct pt_regs * regs)
{
+ void *restorer;
struct rt_sigframe *frame;
int err = 0;

@@ -456,7 +462,10 @@
goto give_sigsegv;

/* Set up to return from userspace. */
- err |= __put_user(fix_to_virt(FIX_VSYSCALL) + 64, &frame->pretcode);
+ restorer = (void *) (fix_to_virt(FIX_VSYSCALL) + 64);
+ if (ka->sa.sa_flags & SA_RESTORER)
+ restorer = ka->sa.sa_restorer;
+ err |= __put_user(restorer, &frame->pretcode);

/*
* This is movl $,%eax ; int $0x80

2003-01-12 04:18:55

by Derek Atkins

Subject: Re: Linus BK tree crashes with PANIC: INIT: segmentation violation

Linus Torvalds <[email protected]> writes:

> Oh, well. Can you verify whether this fixes it for you? And thanks for
> hunting down the exact changeset.

Thanks. This patch does indeed fix it for me. And you're welcome --
I learned a lot about bitkeeper in the process of tracking this down.
FTR, BK *DID* make it a lot easier to visualize the graph so I could
walk backwards and track it down.

> Btw, what version of "init" are you running? It would be interesting to
> see what it actually does, and obviously none of the machines I have
> around have that init..

Red Hat's SysVinit-2.84-2.

> Linus

-derek
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
[email protected] PGP key available