2007-06-13 19:59:23

by Michal Piotrowski

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

Hi all,

Here is a list of some known regressions in 2.6.22-rc4.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions



Networking

Subject : commit 9093bbb2d96d0184f037cea9b4e952a44ebe7c32 broke the bonding driver
References : http://lkml.org/lkml/2007/6/13/65
Submitter : Dan Aloni <[email protected]>
Handled-By : Stephen Hemminger <[email protected]>
Status : Unknown



Sparc64

Subject : 2.6.22-rc broke X on Ultra5
References : http://lkml.org/lkml/2007/5/22/78
Submitter : Mikael Pettersson <[email protected]>
Handled-By : David Miller <[email protected]>
Status : problem is being debugged



Suspend

Subject : hibernate(?) fails totally - regression
References : http://lkml.org/lkml/2007/6/1/401
Submitter : David Greaves <[email protected]>
Handled-By : Rafael J. Wysocki <[email protected]>
Caused-By : Tejun Heo <[email protected]>
commit 9666f4009c22f6520ac3fb8a19c9e32ab973e828
Status : problem is being debugged



TTY

Subject : OOPS (NULL pointer dereference) in v2.6.22-rc3
References : http://lkml.org/lkml/2007/6/1/389
http://bugzilla.kernel.org/show_bug.cgi?id=8473
http://bugzilla.kernel.org/show_bug.cgi?id=8574
Submitter : Alex Riesen <[email protected]>
Status : problem is being debugged



x86-64

Subject : x86-64 2.6.22-rc2 random segfaults
References : http://lkml.org/lkml/2007/5/24/275
Submitter : Ioan Ionita <[email protected]>
Status : Unknown



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


2007-06-13 20:22:15

by Björn Steinbrink

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

On 2007.06.13 21:57:56 +0200, Michal Piotrowski wrote:
> TTY
>
> Subject : OOPS (NULL pointer dereference) in v2.6.22-rc3
> References : http://lkml.org/lkml/2007/6/1/389
> http://bugzilla.kernel.org/show_bug.cgi?id=8473
> http://bugzilla.kernel.org/show_bug.cgi?id=8574
> Submitter : Alex Riesen <[email protected]>
> Status : problem is being debugged

Patch available at: http://lkml.org/lkml/2007/6/8/490

Björn

2007-06-13 20:50:56

by Michal Piotrowski

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

On 13/06/07, Björn Steinbrink <[email protected]> wrote:
> On 2007.06.13 21:57:56 +0200, Michal Piotrowski wrote:
> > TTY
> >
> > Subject : OOPS (NULL pointer dereference) in v2.6.22-rc3
> > References : http://lkml.org/lkml/2007/6/1/389
> > http://bugzilla.kernel.org/show_bug.cgi?id=8473
> > http://bugzilla.kernel.org/show_bug.cgi?id=8574
> > Submitter : Alex Riesen <[email protected]>
> > Status : problem is being debugged
>
> Patch available at: http://lkml.org/lkml/2007/6/8/490

Thanks for letting me know.

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-13 22:25:43

by Mark Fortescue

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

Hi all,

The random seg faults on x86_64 are interesting as I have been getting
random illegal instruction faults on sparc (sun4c) with 2.6.22-rc3. I have
not yet tried to track it down. All I know at present is that it is not a
problem on 2.6.20.9.

Regards
Mark Fortescue.

On Wed, 13 Jun 2007, Michal Piotrowski wrote:

> Hi all,
>
> Here is a list of some known regressions in 2.6.22-rc4.
>
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
>
>
>
> Networking
>
> Subject : commit 9093bbb2d96d0184f037cea9b4e952a44ebe7c32 broke the
> bonding driver
> References : http://lkml.org/lkml/2007/6/13/65
> Submitter : Dan Aloni <[email protected]>
> Handled-By : Stephen Hemminger <[email protected]>
> Status : Unknown
>
>
>
> Sparc64
>
> Subject : 2.6.22-rc broke X on Ultra5
> References : http://lkml.org/lkml/2007/5/22/78
> Submitter : Mikael Pettersson <[email protected]>
> Handled-By : David Miller <[email protected]>
> Status : problem is being debugged
>
>
>
> Suspend
>
> Subject : hibernate(?) fails totally - regression
> References : http://lkml.org/lkml/2007/6/1/401
> Submitter : David Greaves <[email protected]>
> Handled-By : Rafael J. Wysocki <[email protected]>
> Caused-By : Tejun Heo <[email protected]>
> commit 9666f4009c22f6520ac3fb8a19c9e32ab973e828
> Status : problem is being debugged
>
>
>
> TTY
>
> Subject : OOPS (NULL pointer dereference) in v2.6.22-rc3
> References : http://lkml.org/lkml/2007/6/1/389
> http://bugzilla.kernel.org/show_bug.cgi?id=8473
> http://bugzilla.kernel.org/show_bug.cgi?id=8574
> Submitter : Alex Riesen <[email protected]>
> Status : problem is being debugged
>
>
>
> x86-64
>
> Subject : x86-64 2.6.22-rc2 random segfaults
> References : http://lkml.org/lkml/2007/5/24/275
> Submitter : Ioan Ionita <[email protected]>
> Status : Unknown
>
>
>
> Regards,
> Michal
>
> --
> LOG
> http://www.stardust.webpages.pl/log/

2007-06-14 01:59:59

by William Lee Irwin III

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

On Wed, Jun 13, 2007 at 11:25:20PM +0100, Mark Fortescue wrote:
> The random seg faults on x86_64 are interesting as I have been getting
> random illegal instruction faults on sparc (sun4c) with 2.6.22-rc3. I have
> not yet tried to track it down. All I know at present is that it is not a
> problem on 2.6.20.9.

Very interesting. Any hints as to how to test or how long to wait
before the illegal instructions happen?


-- wli

2007-06-14 10:30:58

by Mark Fortescue

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

Hi All,

They appear as soon as simpleinit starts up. Sometimes I get to a login
prompt before seeing any. Other times, commands in the simpleinit rc
script fail.

They do appear to be random. If a command fails, you re-run the command
and it is OK. Commands seen to fail are basic (depmod, rm, cat ...).

The tests I did used the same binaries with both the OK and problem kernels,
so it is not a change to the application code; it is definitely a kernel
issue.

Regards
Mark Fortescue.

On Wed, 13 Jun 2007, William Lee Irwin III wrote:

> On Wed, Jun 13, 2007 at 11:25:20PM +0100, Mark Fortescue wrote:
>> The random seg faults on x86_64 are interesting as I have been getting
>> random illegal instruction faults on sparc (sun4c) with 2.6.22-rc3. I have
>> not yet tried to track it down. All I know at present is that it is not a
>> problem on 2.6.20.9.
>
> Very interesting. Any hints as to how to test or how long to wait
> before the illegal instructions happen?
>
>
> -- wli
>

2007-06-14 14:21:24

by William Lee Irwin III

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

On Thu, Jun 14, 2007 at 11:30:25AM +0100, Mark Fortescue wrote:
> They appear as soon as simpleinit starts up. Sometimes I get to a login
> prompt before seeing any. Other times, commands in the simpleinit rc
> script fail.
> They do appear to be random. If a command fails, you re-run the command
> and it is OK. Commands seen to fail are basic (depmod, rm, cat ...).
> The tests I did used the same binaries with both the OK and problem kernels,
> so it is not a change to the application code; it is definitely a kernel
> issue.

This sounds like it may be addressed by benh's ptep_set_access_flags()
fixes. Those fixes are still in -mm, hopefully to hit mainline by 2.6.22.


-- wli

2007-06-14 14:57:42

by Mark Fortescue

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3


Benh's ptep_set_access_flags() patch needs to be applied in order to get
anywhere with sun4c for all kernels >= linux-2.6.15. If not applied, you
will be lucky to get sash running as your init and even that will have
very limited capabilities before it locks up the processor (power up
reset required).

It has been applied to both the kernels I used for testing so this
problem is independent of the ptep_set_access_flags patch but that
does not mean that it is not a related issue.

I will try to get some testing done over the weekend to narrow down
when the random illegal instructions first occur.

If I start with 2.6.21 and that is OK, then I should be able to narrow
the issue down without too much trouble. If it is between 2.6.20 and
2.6.21 then it will be a right pig as there are a large number of commits
that don't compile for sun4c between these two. What I am hoping is that
it first occurs in 2.6.22-rc2, as per the x86_64 report.
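
(The narrowing-down described above amounts to a bisection over the commit
history in which commits that cannot be built for sun4c are skipped rather
than judged good or bad. The following is a minimal Python sketch of that
idea, not the procedure actually used in this thread. The commit names, the
first_bad_candidates() function and the test callback are hypothetical, and
the sketch assumes a linear history in which every commit after the first
bad one is also bad.)

```python
from typing import Callable, List

GOOD, BAD, SKIP = "good", "bad", "skip"


def first_bad_candidates(commits: List[str],
                         test: Callable[[str], str]) -> List[str]:
    """Bisect an oldest-to-newest commit list for the first bad commit.

    commits[0] must be known good and commits[-1] known bad.
    test(commit) returns GOOD, BAD, or SKIP (e.g. the commit does not build).
    The result is a single commit when every probe could be judged, or a
    short range when untestable commits sit next to the good/bad boundary.
    """
    lo, hi = 0, len(commits) - 1      # lo: newest known good, hi: oldest known bad
    skipped = set()
    while True:
        # Untested commits strictly between the good and bad ends.
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break
        mid = candidates[len(candidates) // 2]
        verdict = test(commits[mid])
        if verdict == GOOD:
            lo = mid
        elif verdict == BAD:
            hi = mid
        else:
            skipped.add(mid)          # cannot judge it, try a neighbour instead
    # The first bad commit is somewhere in (lo, hi]; any skipped commits
    # left in that range cannot be ruled out.
    return commits[lo + 1:hi + 1]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]   # hypothetical commit names

    def probe(commit: str) -> str:
        if commit == "c5":                   # pretend c5 does not compile
            return SKIP
        return BAD if int(commit[1:]) >= 6 else GOOD

    print(first_bad_candidates(history, probe))
```

(On a real tree, git bisect together with git bisect skip plays the same
role: a skipped commit is left out of the search, and the final answer may
be a small range of commits instead of a single one.)
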

I am going to have to put a 'reset' button onto my test system as power up
resets are bad news on this old hardware and almost all kernel failures
result in a processor lockup. I have even had to make BUG reports 'panic'
as those that I have had during kernel fault location are terminal to a
sun4c (they cause a processor lockup).

Regards
Mark Fortescue.

On Thu, 14 Jun 2007, William Lee Irwin III wrote:

> On Thu, Jun 14, 2007 at 11:30:25AM +0100, Mark Fortescue wrote:
>> They appear as soon as simpleinit starts up. Sometimes I get to a login
>> prompt before seeing any. Other times, commands in the simpleinit rc
>> script fail.
>> They do appear to be random. If a command fails, you re-run the command
>> and it is OK. Commands seen to fail are basic (depmod, rm, cat ...).
>> The tests I did used the same binaries with both the OK and problem kernels,
>> so it is not a change to the application code; it is definitely a kernel
>> issue.
>
> This sounds like it may be addressed by benh's ptep_set_access_flags()
> fixes. Those fixes are still in -mm, hopefully to hit mainline by 2.6.22.
>
>
> -- wli
>

2007-06-14 15:01:49

by William Lee Irwin III

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

On Thu, Jun 14, 2007 at 03:57:25PM +0100, Mark Fortescue wrote:
> Benh's ptep_set_access_flags() patch needs to be applied in order to get
> anywhere with sun4c for all kernels >= linux-2.6.15. If not applied, you
> will be lucky to get sash running as your init and even that will have
> very limited capabilities before it locks up the processor (power up
> reset required).
> It has been applied to both the kernels I used for testing so this
> problem is independent of the ptep_set_access_flags patch but that
> does not mean that it is not a related issue.
> I will try to get some testing done over the weekend to narrow down
> when the random illegal instructions first occur.
> If I start with 2.6.21 and that is OK, then I should be able to narrow
> the issue down without too much trouble. If it is between 2.6.20 and
> 2.6.21 then it will be a right pig as there are a large number of commits
> that don't compile for sun4c between these two. What I am hoping is that
> it first occurs in 2.6.22-rc2, as per the x86_64 report.

Sounds like I'll be digging through my hardware stockpiles this weekend
to find a functional sun4c box.


-- wli

2007-06-14 21:56:09

by Stephen Hemminger

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

On Wed, 13 Jun 2007 21:57:56 +0200
Michal Piotrowski <[email protected]> wrote:

> Hi all,
>
> Here is a list of some known regressions in 2.6.22-rc4.
>
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
>
>
>
> Networking
>
> Subject : commit 9093bbb2d96d0184f037cea9b4e952a44ebe7c32 broke the bonding driver
> References : http://lkml.org/lkml/2007/6/13/65
> Submitter : Dan Aloni <[email protected]>
> Handled-By : Stephen Hemminger <[email protected]>
> Status : Unknown
>
>

Patch available (to bonding).

2007-06-15 23:27:59

by Linus Torvalds

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3



On Wed, 13 Jun 2007, Michal Piotrowski wrote:
>
> Subject : hibernate(?) fails totally - regression
> References : http://lkml.org/lkml/2007/6/1/401
> Submitter : David Greaves <[email protected]>
> Handled-By : Rafael J. Wysocki <[email protected]>
> Caused-By : Tejun Heo <[email protected]>
> commit 9666f4009c22f6520ac3fb8a19c9e32ab973e828
> Status : problem is being debugged

Ahh. This is fixed (fix by Tejun, confirmed by David), and the fix has
been merged. It's commit bc90ba093a, in case anybody cares.

Linus

2007-06-17 11:35:55

by Mark Fortescue

Subject: Re: [2/2] 2.6.22-rc4: known regressions v3

Hi all,

I have been investigating the random invalid instruction occurrences on
sparc32 (sun4c) and identified that the problem was introduced
pre-v2.6.22-rc1. v2.6.21 is OK. The first time I have observed the issue
so far is after commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66:
Increase slab redzone to 64bits.

Prior to this commit there appears to be a problem with the memory
management (depmod -a causes the system to run out of memory!) that may be
masking the issue. As a result of this, I am going to try to find the
'last known good' commit after v2.6.21 to see if this helps narrow down
the cause.

Regards
Mark Fortescue.

On Thu, 14 Jun 2007, William Lee Irwin III wrote:

> On Thu, Jun 14, 2007 at 11:30:25AM +0100, Mark Fortescue wrote:
>> They appear as soon as simpleinit starts up. Sometimes I get to a login
>> prompt before seeing any. Other times, commands in the simpleinit rc
>> script fail.
>> They do appear to be random. If a command fails, you re-run the command
>> and it is OK. Commands seen to fail are basic (depmod, rm, cat ...).
>> The tests I did used the same binaries with both the OK and problem kernels,
>> so it is not a change to the application code; it is definitely a kernel
>> issue.
>
> This sounds like it may be addressed by benh's ptep_set_access_flags()
> fixes. Those fixes are still in -mm, hopefully to hit mainline by 2.6.22.
>
>
> -- wli
>