2007-10-24 04:19:26

by Linus Torvalds

Subject: Linux v2.6.24-rc1


This may count as one of the biggest -rc releases ever. It's humongous.
Usually the compressed -rc1 diffs are in the 3-5MB range, with occasional
smaller ones, and the occasional ones that top 6M, but this one is
*eleven* megs.

I'd blame the x86 renames (and the watchdog ones), but the thing is, it's
absolutely huge even when I generate the diff with git turning all those
renames into relatively small rename diffs (which I don't do for the
public diffs, since I expect that git people use git to get the changes,
and non-git people won't have tools that understand a diff that involves
renames).

In short, we just had an unusually large amount of not just x86 merges,
but also tons of new drivers (wireless networking stands out, but is by no
means the only thing - we've got dvb, regular wired network, mmc etc all
joining in), and a fair amount of architecture stuff, filesystems,
networking etc too.

So there's just lots of new stuff. The diffstat is ten thousand lines
long, and weighing in at comfortably over half a megabyte it is way over
the limits of this - or any sane - mailing list. The shortlog is barely
shorter, weighing in at "just" 8461 lines and almost 400k. The full
changelog (which I'm still producing for y'all, since people told me they
actually care last time I asked) is 4 megs.

In other words, I don't even know where to start. The big noticeable thing
is the x86 merge, and I think we all fervently hope that it won't cause
any issues. So far it's been pretty smooth sailing. Knock wood.

Less smooth have been the scatter-gather changes to the block layer, but
they are hopefully all in reasonable shape by now too. And the VM changes?
I honestly hope nobody even notices. Same goes for some of the VFS layer
changes that affected basically every filesystem (although in mostly very
straightforward ways).

Just for fun, I'd really encourage git users to just try the

git shortlog v2.6.23..

thing, it really is quite impressive.

Linus


2007-10-24 04:53:03

by Willy Tarreau

Subject: Re: Linux v2.6.24-rc1

On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> Just for fun, I'd really encourage git users to just try the
>
> git shortlog v2.6.23..
>
> thing, it really is quite impressive.

Impressive, indeed! At least it's a great testimonial for GIT and the
workflow it permits, but from a user's perspective, so many changes at
once may look frightening!

Regards,
Willy

2007-10-24 05:22:53

by Dave Young

Subject: Re: Linux v2.6.24-rc1

Hi,
build failed on my pc:

arch/x86/kernel/built-in.o(.text+0x1b192): In function
`smp_send_nmi_allbutself':
: undefined reference to `genapic'

Regards
dave

2007-10-24 07:23:44

by Ingo Molnar

Subject: Re: Linux v2.6.24-rc1


* Dave Young <[email protected]> wrote:

> Hi,
> build failed on my pc:
>
> arch/x86/kernel/built-in.o(.text+0x1b192): In function
> `smp_send_nmi_allbutself':
> : undefined reference to `genapic'

please send us the .config you are using. Chances are that the patch
below will fix the build breakage for you.

Ingo

--------------------->
Subject: x86: fix CONFIG_KEXEC build breakage
From: Mike Galbraith <[email protected]>

X86_32 build fix to commit 62a31a03b3d2a9d20e7a073e2cd9b27bfb7d6a3f

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/crash.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/arch/x86/kernel/crash.c
===================================================================
--- linux.orig/arch/x86/kernel/crash.c
+++ linux/arch/x86/kernel/crash.c
@@ -25,7 +25,7 @@
 #include <linux/kdebug.h>
 #include <asm/smp.h>
 
-#ifdef X86_32
+#ifdef CONFIG_X86_32
 #include <mach_ipi.h>
 #else
 #include <asm/mach_apic.h>
@@ -41,7 +41,7 @@ static int crash_nmi_callback(struct not
 			unsigned long val, void *data)
 {
 	struct pt_regs *regs;
-#ifdef X86_32
+#ifdef CONFIG_X86_32
 	struct pt_regs fixed_regs;
 #endif
 	int cpu;
@@ -60,7 +60,7 @@ static int crash_nmi_callback(struct not
 		return NOTIFY_STOP;
 	local_irq_disable();
 
-#ifdef X86_32
+#ifdef CONFIG_X86_32
 	if (!user_mode_vm(regs)) {
 		crash_fixup_ss_esp(&fixed_regs, regs);
 		regs = &fixed_regs;

2007-10-24 07:32:19

by Ohad Ben-Cohen

Subject: Re: Linux v2.6.24-rc1

Hi Ingo,

On 10/24/07, Ingo Molnar <[email protected]> wrote:
> * Dave Young <[email protected]> wrote:
> > build failed on my pc:
>
> please send us the .config you are using. Chances are that the patch
> below will fix the build breakage for you.

I had the same issue, which is now fixed with your patch.

Thanks,
Ohad.

(.config attached)


Attachments:
(No filename) (347.00 B)
.config (82.47 kB)

2007-10-24 07:34:14

by Dave Young

Subject: Re: Linux v2.6.24-rc1

On 10/24/07, Ingo Molnar <[email protected]> wrote:
>
> * Dave Young <[email protected]> wrote:
>
> > Hi,
> > build failed on my pc:
> >
> > arch/x86/kernel/built-in.o(.text+0x1b192): In function
> > `smp_send_nmi_allbutself':
> > : undefined reference to `genapic'
>
> please send us the .config you are using. Chances are that the patch
> below will fix the build breakage for you.
>
Hi,
Yes, I have tried it, works fine.

Part of my .config:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc1
# Wed Oct 24 13:02:24 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
----cut----
> Ingo
>
> --------------------->
> Subject: x86: fix CONFIG_KEXEC build breakage
> From: Mike Galbraith <[email protected]>
>
> X86_32 build fix to commit 62a31a03b3d2a9d20e7a073e2cd9b27bfb7d6a3f
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/kernel/crash.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Index: linux/arch/x86/kernel/crash.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/crash.c
> +++ linux/arch/x86/kernel/crash.c
> @@ -25,7 +25,7 @@
> #include <linux/kdebug.h>
> #include <asm/smp.h>
>
> -#ifdef X86_32
> +#ifdef CONFIG_X86_32
> #include <mach_ipi.h>
> #else
> #include <asm/mach_apic.h>
> @@ -41,7 +41,7 @@ static int crash_nmi_callback(struct not
> unsigned long val, void *data)
> {
> struct pt_regs *regs;
> -#ifdef X86_32
> +#ifdef CONFIG_X86_32
> struct pt_regs fixed_regs;
> #endif
> int cpu;
> @@ -60,7 +60,7 @@ static int crash_nmi_callback(struct not
> return NOTIFY_STOP;
> local_irq_disable();
>
> -#ifdef X86_32
> +#ifdef CONFIG_X86_32
> if (!user_mode_vm(regs)) {
> crash_fixup_ss_esp(&fixed_regs, regs);
> regs = &fixed_regs;
>

2007-10-24 07:45:27

by Paolo Ornati

Subject: kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask'

CC drivers/kvm/kvm_main.o
drivers/kvm/kvm_main.c: In function 'kvm_flush_remote_tlbs':
drivers/kvm/kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask'
make[2]: *** [drivers/kvm/kvm_main.o] Error 1
make[1]: *** [drivers/kvm] Error 2
make: *** [drivers] Error 2

-----------

"smp_call_function_mask" is defined only on "CONFIG_SMP" but kvm uses it
unconditionally, oops!

--
Paolo Ornati
Linux 2.6.23-ge8b8c977 on x86_64

2007-10-24 07:56:17

by Jeff Garzik

Subject: Re: kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask'

Paolo Ornati wrote:
> CC drivers/kvm/kvm_main.o
> drivers/kvm/kvm_main.c: In function 'kvm_flush_remote_tlbs':
> drivers/kvm/kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask'
> make[2]: *** [drivers/kvm/kvm_main.o] Error 1
> make[1]: *** [drivers/kvm] Error 2
> make: *** [drivers] Error 2
>
> -----------
>
> "smp_call_function_mask" is defined only on "CONFIG_SMP" but kvm uses it
> unconditionally, oops!

Yep, posted a build fix for this the other day...

Jeff



2007-10-24 08:04:22

by Ingo Molnar

Subject: Re: Linux v2.6.24-rc1


* Linus Torvalds <[email protected]> wrote:

> Just for fun, I'd really encourage git users to just try the
>
> git shortlog v2.6.23..
>
> thing, it really is quite impressive.

what is also impressive is:

$ git shortlog v2.6.23.. | grep \):$ | wc -l
756

756 individual contributors. Wow!

Ingo

2007-10-24 08:05:07

by Christoph Hellwig

Subject: Re: Linux v2.6.24-rc1

On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> In short, we just had an unusually large amount of not just x86 merges,

Btw, can we please finish up this merge a little more before we freeze
2.6.24? The way we currently have leftovers of arch/i386/ and arch/x86_64/
is quite a nightmare and not how the other architectures were merged.

Thomas, what again prevents us from just killing these leftovers?

2007-10-24 08:12:30

by Jens Axboe

Subject: Re: Linux v2.6.24-rc1

On Wed, Oct 24 2007, Ingo Molnar wrote:
>
> * Dave Young <[email protected]> wrote:
>
> > Hi,
> > build failed on my pc:
> >
> > arch/x86/kernel/built-in.o(.text+0x1b192): In function
> > `smp_send_nmi_allbutself':
> > : undefined reference to `genapic'
>
> please send us the .config you are using. Chances are that the patch
> below will fix the build breakage for you.

The patch worked for me (I had the same error, .config attached). Thanks
Ingo!

--
Jens Axboe


Attachments:
(No filename) (482.00 B)
config (35.26 kB)

2007-10-24 08:15:17

by Cong Wang

Subject: [Git Patch] arch/um/drivers/ubd_kern.c: fix a building error


Fix this uml building error:
arch/um/drivers/ubd_kern.c: In function 'do_ubd_request':
arch/um/drivers/ubd_kern.c:1118: error: implicit declaration of function 'sg_page'
arch/um/drivers/ubd_kern.c:1118: warning: passing argument 6 of 'prepare_request' makes pointer from integer without a cast
make[1]: *** [arch/um/drivers/ubd_kern.o] Error 1
make: *** [arch/um/drivers] Error 2

Signed-off-by: WANG Cong <[email protected]>

---
arch/um/drivers/ubd_kern.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 3a8cd3d..440ed25 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -35,6 +35,7 @@
 #include "linux/genhd.h"
 #include "linux/spinlock.h"
 #include "linux/platform_device.h"
+#include "linux/scatterlist.h"
 #include "asm/segment.h"
 #include "asm/uaccess.h"
 #include "asm/irq.h"

2007-10-24 10:17:43

by Ingo Molnar

Subject: Re: Linux v2.6.24-rc1, x86 arch code quality, unifications


* Christoph Hellwig <[email protected]> wrote:

> On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> > In short, we just had an unusually large amount of not just x86 merges,
>
> Btw, can we please finish up this merge a little more before we freeze
> 2.6.24? The way we currently have leftovers of arch/i386/ and
> arch/x86_64/ is quite a nightmare and not how the other architectures
> were merged.
>
> Thomas, what again prevents us from just killing these leftovers?

to answer that question one should first be aware of the fundamental
code quality problems that the unified x86 architecture has inherited
from the split i386 and x86_64 architectures.

To get objective and automated metrics about code quality, i've
constructed a table of "coding style errors per one thousand lines of
source code" numbers with the help of the latest checkpatch.pl. The
codebases i measured are the pre-merge i386 and x86_64 tree, the
post-merge arch/x86 unified architecture, and i've also added a handful
of other architectures and selected core subsystems, as comparison:

codebase      | errors | lines of code | errors/KLOC
              |        |               | (smaller is better)
--------------|--------|---------------|--------------------
arch/i386/    |   5717 |         73865 |  77.3
arch/x86_64/  |   2993 |         31155 |  96.0
arch/x86/     |   8504 |        114654 |  74.1
..............|........|...............|....................
arch/ia64/    |   1779 |         64022 |  27.7
arch/mips/    |   2110 |         94692 |  22.2
arch/sparc64/ |   1387 |         49253 |  28.1
..............|........|...............|....................
kernel/       |    762 |         83540 |   9.1
kernel/time/  |     15 |          4191 |   3.5
kernel/irq/   |      1 |          2317 |   0.4
mm/           |    464 |         46324 |  10.0
net/core      |    176 |         24413 |   7.2
..............|........|...............|....................

a couple of observations. Firstly, it is plainly obvious that the x86_64
and i386 architectures were in a dreadful state of code quality before
the unification. Their code quality was almost an order of magnitude
worse than that of the core kernel (!) - and their code quality was
significantly worse than that of a couple of other, comparable
architectures. (we knew this when we started the x86 unification effort
- but i suspect it's even more apparent via the hard numbers in this
table.)

( Note: code metrics should be taken with a grain of salt, as they
often over-simplify the picture, but in this particular situation the
trends are clear and the numbers match my personal impressions of
code quality and robustness of these codebases. )

paradoxically, the x86_64 architecture had _worse_ code quality
than the "legacy" 32-bit code - so much for the "newer code must be
better" misconception. The first, mechanical round of unifications thus
brought a net degradation in quality - but we've reversed that trend in
2.6.24-rc1 already, via unifications and cleanups, as can be seen
from the table. (and we did that while adding new features like
high-resolution timers and dynticks to the x86-64bit architecture in
v2.6.24-rc1 - or the new IOMMU code. So the x86 architecture is not
standing still at all while the unification is going on.)

so to answer your question: full unification is no easy task and it is
not automatic at all. The x86_64 tree has diverged from the i386 tree in
the past 5 years due to their illogical, forced separation and a
resulting bitrot. The two architectures have grown different sets of
cleanliness problems and different sets of functions with arbitrary
differences that often cover the same functionality. It's all compounded
by the fact that the 64-bit code is in worse shape than the 32-bit - so
it's not like we could just pick the 64-bit code and use that as the
unified code. The 32-bit code is also used about 8-10 times more
frequently than the 64-bit code. So there is no easy "just unify it"
path.

The new maintainers of the x86 architecture (Thomas, Peter and me), and
many other x86 developers are highly motivated to improve the x86
architecture's code quality and unify the heck out of it, and there are
some real improvements in 2.6.24-rc1 already, but we _must_ be (and are)
working on this carefully. So we do unifications on a case by case
basis, with the highest priority being to not introduce "unification
regressions". The x86 architecture is the most common Linux architecture
after all - and users care much more about having a working kernel than
they care about cleanups and unifications. So yes, we agree with you,
but please be patient! :-) This cannot be realistically finished in
v2.6.24, without upsetting the codebase.

Ingo

2007-10-24 11:03:35

by Jens Axboe

Subject: Re: [Git Patch] arch/um/drivers/ubd_kern.c: fix a building error

On Wed, Oct 24 2007, WANG Cong wrote:
>
> Fix this uml building error:
> arch/um/drivers/ubd_kern.c: In function 'do_ubd_request':
> arch/um/drivers/ubd_kern.c:1118: error: implicit declaration of function 'sg_page'
> arch/um/drivers/ubd_kern.c:1118: warning: passing argument 6 of 'prepare_request' makes pointer from integer without a cast
> make[1]: *** [arch/um/drivers/ubd_kern.o] Error 1
> make: *** [arch/um/drivers] Error 2
>
> Signed-off-by: WANG Cong <[email protected]>

Thanks, applied!

--
Jens Axboe

2007-10-24 11:06:16

by Sam Ravnborg

Subject: Re: Linux v2.6.24-rc1

On Wed, Oct 24, 2007 at 09:04:51AM +0100, Christoph Hellwig wrote:
> On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> > In short, we just had an unusually large amount of not just x86 merges,
>
> Btw, can we please finish up this merge a little more before we freeze
> 2.6.24? The way we currently have leftovers of arch/i386/ and arch/x86_64/
> is quite a nightmare and not how the other architectures were merged.

If these 10 files give you nightmares then poor soul ;-)

Anyway - the primary issue is the two defconfig files and the Kconfig stuff.
For defconfig we can borrow the solution from powerpc, where
the defconfig file is selected based on architecture.
Something like a
DEFCONFIG := defconfig_$(ARCH)

and then stuff them in configs/ directory.

The Kconfig stuff could be handled by special casing in scripts/kconfig.
We cannot just do the more obvious thing, which is to source the files, since
they have conflicting choice symbols.
That could be fixed but requires a bit more work - we need
to track down all relevant usages of the choice symbols and rename them.
We could also teach kconfig to allow duplicate symbol names in two choices
but Roman Zippel has not done that yet.

The Makefile stuff is trivial to merge.


None of the above is preventing us - it is more a list of what needs to be done.

Sam

2007-10-24 11:31:04

by Ingo Molnar

Subject: [git pull] x86 arch updates


Linus, please pull the latest x86 git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

it contains two build fixes and an oops-printout-ugliness fix.

Thanks!

Ingo

------------------>
Alexey Dobriyan (1):
x86: fix bogus KERN_ALERT on oops

Jeff Garzik (1):
x86: lguest build fix

Mike Galbraith (1):
x86: fix CONFIG_KEXEC build breakage

kernel/crash.c | 6 +++---
lguest/boot.c | 1 +
mm/fault_32.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)

2007-10-24 11:49:03

by Jeff Garzik

Subject: Re: [git pull] x86 arch updates

Ingo Molnar wrote:
> Linus, please pull the latest x86 git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
>
> it contains two build fixes and an oops-printout-ugliness fix.

Any chance you can kill the warnings that appear on !CONFIG_SMP,
!LOCAL_APIC ? :)

http://marc.info/?l=linux-kernel&m=119317911305274&w=2
http://marc.info/?l=linux-kernel&m=119317911105258&w=2

It is /likely/ that my fixes are not upstream-worthy, these patches
mainly illustrate the problem, not necessarily the best solution.

Jeff



2007-10-24 12:04:18

by Ingo Molnar

Subject: Re: [git pull] x86 arch updates


* Jeff Garzik <[email protected]> wrote:

> Ingo Molnar wrote:
> >Linus, please pull the latest x86 git tree from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
> >
> >it contains two build fixes and an oops-printout-ugliness fix.
>
> Any chance you can kill the warnings that appear on !CONFIG_SMP,
> !LOCAL_APIC ? :)
>
> http://marc.info/?l=linux-kernel&m=119317911305274&w=2
> http://marc.info/?l=linux-kernel&m=119317911105258&w=2

thx, i have picked them up. Please keep the warnings fixes coming, they
are much appreciated! I've missed real kernel bugs due to compiler noise
way too many times.

Ingo

2007-10-24 12:12:44

by Ingo Molnar

Subject: Re: Linux v2.6.24-rc1


* Sam Ravnborg <[email protected]> wrote:

> On Wed, Oct 24, 2007 at 09:04:51AM +0100, Christoph Hellwig wrote:
> > On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> > > In short, we just had an unusually large amount of not just x86 merges,
> >
> > Btw, can we please finish up this merge a little more before we freeze
> > 2.6.24? The way we currently have leftovers of arch/i386/ and arch/x86_64/
> > is quite a nightmare and not how the other architectures were merged.
>
> If these 10 files give you nightmares then poor soul ;-)
>
> Anyway - the primary issue is the two defconfig files and the Kconfig stuff.
> For defconfig we can borrow the solution from powerpc, where
> the defconfig file is selected based on architecture.
> Something like a
> DEFCONFIG := defconfig_$(ARCH)
>
> and then stuff them in configs/ directory.
>
> The Kconfig stuff could be handled by special casing in scripts/kconfig.
> We cannot just do the more obvious thing, which is to source the files, since
> they have conflicting choice symbols.
> That could be fixed but requires a bit more work - we need
> to track down all relevant usages of the choice symbols and rename them.
> We could also teach kconfig to allow duplicate symbol names in two choices
> but Roman Zippel has not done that yet.
>
> The Makefile stuff is trivial to merge.

yes. But even Makefile merging can be surprisingly nontrivial at times:
we had bugs in earlier versions of the unification due to link ordering
and silent init section dependencies in the code. When we unified the
makefiles certain init code broke because the initcall ordering changed.
That's why we went for the "stupid, mechanical unification" approach first
- to always have a 100% correct fallback position that people can bisect
to.

Ingo

2007-10-24 12:20:28

by Sam Ravnborg

Subject: Re: Linux v2.6.24-rc1

On Wed, Oct 24, 2007 at 02:12:18PM +0200, Ingo Molnar wrote:
> > The Makefile stuff is trivial to merge.
>
> yes. But even Makefile merging can be surprisingly nontrivial at times:
> we had bugs in earlier versions of the unification due to link ordering
> and silent init section dependencies in the code. When we unified the
> makefiles certain init code broke because the initcall ordering changed.
> That's why we went for the "stupid, mechanical unification" approach first
> - to always have a 100% correct fallback position that people can bisect
> to.

By trivial I meant something like:
ifeq ($(ARCH),x86_64)
include arch/x86/Makefile_64
else
include arch/x86/Makefile_32
endif

And common stuff could be put before/after the include.

Sam

2007-10-24 13:25:56

by Romano Giannetti

Subject: 2.6.24-rc1 fails with lockup and BUG:


Hi,

2.6.24-rc1 fails for me. I have the feeling it is network-related, but
I am not sure, so I am sending this message just to the list.
The same failure was present in git-5734-gd85714d; I sent
a message to the list about it, but it seems it never arrived. I hope this
one gets through. My system is a Toshiba Satellite A305-S5077, dual-core Pentium.

The symptoms are quite strange. At boot, NetworkManager fails to activate
my eth0 (r8169). Just stopping and restarting NM makes it work.

Then, after one, two, or at most three successful suspend-to-RAM/resume
cycles, everything goes awry. Note that I do not know whether s2ram is the
cause or simply a way to trigger the bug faster.

The suspend-to-RAM fails with the message:

"gnome-power-manager: (romano) DBUS timed out, but recovering"

and a number of processes go into D state (their sysrq-t traces are a
few lines down). Now I cannot create new windows or run sudo (sudo
anything goes into D limbo), and I cannot even do a clean shutdown. When I
try, the system loops forever saying:

BUG: soft lockup - CPU#0 stuck for 11s! [ifconfig: 7481]

and sysrq-b is the only option.

Complete dmesg, config, etc at:
http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_1/

Thanks,

Romano

PS sorry for the disclaimer, I cannot stop it (¡!)

nmbd D ca9cbea4 0 5464 1
c256eaa0 00000086 00000002 ca9cbea4 ca9cbe9c 00000000 c256ebdc c17fba80
c250e900 c0426080 c0426080 c22f67d0 c01773a3 00000010 c0426080 00013bab
00000000 000000ff 00000000 00000000 00000000 c03bc514 c03bc51c c03bc518
Call Trace:
[cache_alloc_refill+115/1280] cache_alloc_refill+0x73/0x500
[__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
[mutex_lock+20/32] mutex_lock+0x14/0x20
[sock_ioctl+0/560] sock_ioctl+0x0/0x230
[dev_ioctl+200/1312] dev_ioctl+0xc8/0x520
[sock_init_data+108/384] sock_init_data+0x6c/0x180
[inet_create+413/832] inet_create+0x19d/0x340
[inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
[d_alloc+265/400] d_alloc+0x109/0x190
[d_instantiate+59/80] d_instantiate+0x3b/0x50
[udp_ioctl+0/160] udp_ioctl+0x0/0xa0
[inet_ioctl+58/192] inet_ioctl+0x3a/0xc0
[sock_ioctl+207/560] sock_ioctl+0xcf/0x230
[sock_ioctl+0/560] sock_ioctl+0x0/0x230
[do_ioctl+43/144] do_ioctl+0x2b/0x90
[sys_socket+41/80] sys_socket+0x29/0x50
[vfs_ioctl+92/656] vfs_ioctl+0x5c/0x290
[sys_ioctl+61/112] sys_ioctl+0x3d/0x70
[sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
=======================

x-session-man D c2c49d28 0 5774 5246
c1c4d000 00200082 00000002 c2c49d28 c2c49d20 00000000 c1c4d13c c1803a80
c2c25c80 c0426080 c0426080 cab12550 c011d3c2 c03a13a0 c0426080 00016269
00000000 000000ff 00000000 00000000 00000000 c03bc514 c03bc51c c03bc518
Call Trace:
[enqueue_task+18/48] enqueue_task+0x12/0x30
[__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
[mutex_lock+20/32] mutex_lock+0x14/0x20
[rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
[netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
[netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
[__wake_up_sync+65/128] __wake_up_sync+0x41/0x80
[sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[__wake_up+62/96] __wake_up+0x3e/0x60
[netlink_insert+197/320] netlink_insert+0xc5/0x140
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[sys_sendto+307/384] sys_sendto+0x133/0x180
[move_addr_to_user+95/112] move_addr_to_user+0x5f/0x70
[sys_getsockname+205/208] sys_getsockname+0xcd/0xd0
[__netlink_create+97/176] __netlink_create+0x61/0xb0
[inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
[d_alloc+265/400] d_alloc+0x109/0x190
[d_instantiate+59/80] d_instantiate+0x3b/0x50
[sock_attach_fd+128/192] sock_attach_fd+0x80/0xc0
[sys_socketcall+408/640] sys_socketcall+0x198/0x280
[sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
[clip_device_event+16/160] clip_device_event+0x10/0xa0
=======================

ip D cb573d28 0 7487 7486
c1dc7aa0 00000082 00000002 cb573d28 cb573d20 00000000 c1dc7bdc c1803a80
c27ed740 c0426080 c0426080 001280d2 c218eeb0 c218ef54 c0426080 00010bb9
00000000 000000ff 00000000 00000000 00000000 c03bc514 c03bc51c c03bc518
Call Trace:
[__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
[mutex_lock+20/32] mutex_lock+0x14/0x20
[rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
[netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
[netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
[do_lookup+101/400] do_lookup+0x65/0x190
[sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
[<f88fc042>] __ext3_journal_dirty_metadata+0x22/0x60 [ext3]
[<f88de999>] journal_get_write_access+0x29/0x40 [jbd]
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[do_generic_mapping_read+871/1120] do_generic_mapping_read+0x367/0x460
[file_read_actor+0/272] file_read_actor+0x0/0x110
[current_fs_time+19/32] current_fs_time+0x13/0x20
[touch_atime+122/288] touch_atime+0x7a/0x120
[find_lock_page+47/160] find_lock_page+0x2f/0xa0
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[sys_sendto+307/384] sys_sendto+0x133/0x180
[__do_fault+423/896] __do_fault+0x1a7/0x380
[unmap_vmas+847/1360] unmap_vmas+0x34f/0x550
[handle_mm_fault+248/1536] handle_mm_fault+0xf8/0x600
[sys_socketcall+408/640] sys_socketcall+0x198/0x280
[sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
=======================
ip D c1c033c0 0 7506 5140
c1c47000 00000082 c1c033c0 c1c033c0 c0181c11 c210bc6c c1c4713c c1803a80
c2d99c80 c0426080 c0426080 c2cc1e00 c21da65c 00000000 c0426080 54652911
c3d936bc c2174f28 f88ff2e9 c2174f28 c019af72 c03bc514 c03bc51c c03bc518
Call Trace:
[permission+97/256] permission+0x61/0x100
[__find_get_block+130/400] __find_get_block+0x82/0x190
[__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
[mutex_lock+20/32] mutex_lock+0x14/0x20
[rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
[netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
[netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
[sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
[<f88fc042>] __ext3_journal_dirty_metadata+0x22/0x60 [ext3]
[<f88de999>] journal_get_write_access+0x29/0x40 [jbd]
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[<f88de4c2>] journal_stop+0xf2/0x1a0 [jbd]
[prio_tree_insert+485/592] prio_tree_insert+0x1e5/0x250
[__wake_up+62/96] __wake_up+0x3e/0x60
[find_lock_page+47/160] find_lock_page+0x2f/0xa0
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[sys_sendto+307/384] sys_sendto+0x133/0x180
[__do_fault+423/896] __do_fault+0x1a7/0x380
[__alloc_pages+79/880] __alloc_pages+0x4f/0x370
[handle_mm_fault+248/1536] handle_mm_fault+0xf8/0x600
[sys_socketcall+408/640] sys_socketcall+0x198/0x280
[sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
[clip_device_event+16/160] clip_device_event+0x10/0xa0
=======================
sudo D c1c033c0 0 7530 6462
cacfdaa0 00000086 c1c033c0 c1c033c0 c0181c11 c210bc6c cacfdbdc c17fba80
c2c25900 c0426080 c0426080 c2cc1e00 c21e69dc 00000000 c0426080 db0505f3
c3d937fc c2174f28 f88ff2e9 c2174f28 c019af72 c03bc514 c03bc51c c03bc518
Call Trace:
[permission+97/256] permission+0x61/0x100
[__find_get_block+130/400] __find_get_block+0x82/0x190
[__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
[mutex_lock+20/32] mutex_lock+0x14/0x20
[rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
[netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
[netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
[do_lookup+101/400] do_lookup+0x65/0x190
[sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
[<f88fc042>] __ext3_journal_dirty_metadata+0x22/0x60 [ext3]
[<f88de999>] journal_get_write_access+0x29/0x40 [jbd]
[mntput_no_expire+27/128] mntput_no_expire+0x1b/0x80
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[__wake_up+62/96] __wake_up+0x3e/0x60
[find_lock_page+47/160] find_lock_page+0x2f/0xa0
[copy_from_user+46/112] copy_from_user+0x2e/0x70
[sys_sendto+307/384] sys_sendto+0x133/0x180
[__do_fault+423/896] __do_fault+0x1a7/0x380
[handle_mm_fault+248/1536] handle_mm_fault+0xf8/0x600
[sys_socketcall+408/640] sys_socketcall+0x198/0x280
[sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
=======================




--
This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation.

2007-10-24 14:28:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc1 fails with lockup and BUG:


* Romano Giannetti <[email protected]> wrote:

> 2.6.23-rc1 fails for me. I have the sensation it is network-related,
> but I am not sure, so I send this message just to the list. This same
> failure was present in git-5734-gd85714d, I sent a message to the list
> but it seems it never arrived. I hope this will pass through. My
> system is a toshiba satellite A305-S5077, dual core pentium.

could you turn on these in your .config:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LIST=y
CONFIG_FRAME_POINTER=y
CONFIG_DEBUG_SLAB=y

and please post the resulting dmesg output - does lockdep notice any
lockup reason? (your backtrace suggests some mutex stuff so it might as
well detect it)

Ingo

2007-10-24 15:53:22

by Romano Giannetti

[permalink] [raw]
Subject: Re: 2.6.24-rc1 fails with lockup and BUG:


On Wed, 2007-10-24 at 16:27 +0200, Ingo Molnar wrote:

>
> CONFIG_PROVE_LOCKING=y
> CONFIG_DEBUG_LIST=y
> CONFIG_FRAME_POINTER=y
> CONFIG_DEBUG_SLAB=y
>
> and please post the resulting dmesg output - does lockdep notice any
> lockup reason? (your backtrace suggests some mutex stuff so it might as
> well detect it)
>

Done. The results are at:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/

in the syslog-after-failed-suspend.txt file. After the failed suspend
(at line 15766) there were a bunch of things in D-state. I have left
the file intact.

At line 17646 there is:

WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()

I waited a bit and then, on an already-opened root shell, did
s2ram -f -p -m (line 17811)

and then a lot more things happened, and I am somewhat lost.

Hope this could be useful to you.

Romano

--
Sorry for the disclaimer --- I cannot stop it!




2007-10-24 15:55:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc1 fails with lockup and BUG:


* Romano Giannetti <[email protected]> wrote:

> Done. The results are at:
>
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/
>
> in the syslog-after-failed-suspend.txt file. After the failed suspend
> (at line 15766) there were a bunch of things in D-state. I have
> left the file intact.
>
> At line 17646 there is:
>
> WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()

hm, this lockdep warning caused lockdep to turn itself off - hence we
won't get to the really interesting warnings. We'll try to come up with a
solution for this.

Ingo

2007-10-24 16:11:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.24-rc1 fails with lockup and BUG:

On Wed, 2007-10-24 at 17:55 +0200, Ingo Molnar wrote:
> * Romano Giannetti <[email protected]> wrote:
>
> > Done. The results are at:
> >
> > http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/
> >
> > in the syslog-after-failed-suspend.txt file. After the failed suspend
> > (at line 15766) there were a bunch of things in D-state. I have
> > left the file intact.
> >
> > At line 17646 there is:
> >
> > WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
>
> hm, this lockdep warning caused lockdep to turn itself off - hence we
> won't get to the really interesting warnings. We'll try to come up with a
> solution for this.

Does this help?

---
Subject: lockdep: invalid irq usage

this function can be called from hardirq context.

Signed-off-by: Peter Zijlstra <[email protected]>
---

Index: linux-2.6-2/kernel/sched_debug.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_debug.c
+++ linux-2.6-2/kernel/sched_debug.c
@@ -80,6 +80,7 @@ print_task(struct seq_file *m, struct rq
 static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 {
 	struct task_struct *g, *p;
+	unsigned long flags;

 	SEQ_printf(m,
 	"\nrunnable tasks:\n"
@@ -88,7 +89,7 @@ static void print_rq(struct seq_file *m,
 	"------------------------------------------------------"
 	"----------------------------------------------------\n");

-	read_lock_irq(&tasklist_lock);
+	read_lock_irqsave(&tasklist_lock, flags);

 	do_each_thread(g, p) {
 		if (!p->se.on_rq || task_cpu(p) != rq_cpu)
@@ -97,7 +98,7 @@ static void print_rq(struct seq_file *m,
 		print_task(m, rq, p);
 	} while_each_thread(g, p);

-	read_unlock_irq(&tasklist_lock);
+	read_unlock_irqrestore(&tasklist_lock, flags);
 }

 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
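
One way to read the patch above: read_unlock_irq() unconditionally
re-enables interrupts, which is wrong if print_rq() was entered with
interrupts already off (for example from hardirq context), while the
irqsave/irqrestore pair puts back whatever state the caller had. What
follows is a minimal userspace sketch of that difference, not kernel
code. Every toy_* name, and the boolean that stands in for the CPU
interrupt flag, is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CPU interrupt-enable flag; not a kernel API. */
static bool toy_irqs_enabled = true;

/* Models read_lock_irq(): unconditionally disables interrupts. */
static void toy_lock_irq(void)
{
	toy_irqs_enabled = false;
}

/* Models read_unlock_irq(): unconditionally re-enables interrupts. */
static void toy_unlock_irq(void)
{
	toy_irqs_enabled = true;
}

/* Models read_lock_irqsave(): remembers the old state, then disables. */
static void toy_lock_irqsave(bool *flags)
{
	*flags = toy_irqs_enabled;
	toy_irqs_enabled = false;
}

/* Models read_unlock_irqrestore(): restores the state saved at lock time. */
static void toy_unlock_irqrestore(bool flags)
{
	toy_irqs_enabled = flags;
}

/* Interrupt state after an _irq lock/unlock pair entered in the given state. */
bool toy_state_after_irq_pair(bool initially_enabled)
{
	toy_irqs_enabled = initially_enabled;
	toy_lock_irq();
	toy_unlock_irq();
	return toy_irqs_enabled;
}

/* Interrupt state after an _irqsave/_irqrestore pair entered in the given state. */
bool toy_state_after_irqsave_pair(bool initially_enabled)
{
	bool flags;

	toy_irqs_enabled = initially_enabled;
	toy_lock_irqsave(&flags);
	toy_unlock_irqrestore(flags);
	return toy_irqs_enabled;
}

/* Self-check; call it from main() to exercise both variants. */
void toy_selfcheck(void)
{
	/* Entered with interrupts off: the _irq pair turns them back on. */
	assert(toy_state_after_irq_pair(false) == true);
	/* The _irqsave pair leaves them off, as the caller expects. */
	assert(toy_state_after_irqsave_pair(false) == false);
}
```

In this model both variants behave the same when the section is
entered with interrupts enabled. They only differ in the
interrupts-off case, which is the case the patch description names.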


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-24 16:59:00

by Joseph Fannin

[permalink] [raw]
Subject: Re: 2.6.24-rc1 fails with lockup and BUG:

On Wed, Oct 24, 2007 at 03:25:44PM +0200, Romano Giannetti wrote:
>
> Hi,
>
> 2.6.23-rc1 fails for me. I have the sensation it is network-related, but
> I am not sure, so I send this message just to the list.
> This same failure was present in git-5734-gd85714d, I sent
> a message to the list but it seems it never arrived. I hope this will
> pass through. My system is a toshiba satellite A305-S5077, dual core pentium.
>
> The symptoms are quite strange. At boot, NetworkManager fails to activate
> my eth0 (r8169). Just stopping/restarting NM will make it works.


Denis V. Lunev wrote a patch for the NetworkManager thing a day or two
ago (which DaveM has queued).

Since netlink is involved in the traces you sent, this might do something
for the other too.

The patch I received follows:


> Revert to original netlink behavior. Do not reply with ACK if the
> netlink dump has been successfully started.

> libnl has been broken by commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0
> The following command reproduces the problem:
> /nl-route-get 192.168.1.1

> Signed-off-by: Denis V. Lunev <[email protected]>



diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 98e313e..44a8b41 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1565,7 +1565,10 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,

 	netlink_dump(sk);
 	sock_put(sk);
-	return 0;
+
+	/* We successfully started a dump, by returning -EINTR we
+	 * signal not to send ACK even if it was requested */
+	return -EINTR;
 }

 void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err)
@@ -1619,17 +1622,21 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,

 		/* Only requests are handled by the kernel */
 		if (!(nlh->nlmsg_flags & NLM_F_REQUEST))
-			goto skip;
+			goto ack;

 		/* Skip control messages */
 		if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
-			goto skip;
+			goto ack;

 		err = cb(skb, nlh);
-skip:
+		if (err == -EINTR)
+			goto skip;
+
+ack:
 		if (nlh->nlmsg_flags & NLM_F_ACK || err)
 			netlink_ack(skb, nlh, err);

+skip:
 		msglen = NLMSG_ALIGN(nlh->nlmsg_len);
 		if (msglen > skb->len)
 			msglen = skb->len;
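
The goto reshuffle in the second hunk is easier to follow as a truth
function. The sketch below models, for a single message, whether the
patched netlink_rcv_skb() would end up calling netlink_ack(). It is a
userspace illustration only: toy_sends_ack() and the TOY_* constants
are made-up names, the constant values are assumed to mirror
<linux/netlink.h>, and the sketch assumes err starts at 0 for each
message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy copies of the netlink flag/type values (assumed, see lead-in). */
enum {
	TOY_NLM_F_REQUEST  = 0x01,
	TOY_NLM_F_ACK      = 0x04,
	TOY_NLMSG_MIN_TYPE = 0x10,
};

/*
 * Models the patched branch structure for one message.
 * cb_err is what the callback would return if it gets invoked.
 * Returns true when netlink_ack() would be called.
 */
bool toy_sends_ack(unsigned int flags, unsigned int type, int cb_err)
{
	int err = 0;	/* assumption: reset to 0 for each message */

	/* Non-requests and control messages bypass the callback ("goto ack"). */
	if ((flags & TOY_NLM_F_REQUEST) && type >= TOY_NLMSG_MIN_TYPE) {
		err = cb_err;
		if (err == -EINTR)
			return false;	/* dump started: "goto skip", no ACK */
	}

	/* "ack:" label: reply if an ACK was requested or an error occurred. */
	return (flags & TOY_NLM_F_ACK) || err != 0;
}

/* Self-check; call it from main() to exercise the interesting cases. */
void toy_selfcheck(void)
{
	unsigned int req_ack = TOY_NLM_F_REQUEST | TOY_NLM_F_ACK;

	/* A started dump suppresses the ACK even though one was requested. */
	assert(!toy_sends_ack(req_ack, 0x1a, -EINTR));
	/* An ordinary successful request with NLM_F_ACK still gets its ACK. */
	assert(toy_sends_ack(req_ack, 0x1a, 0));
}
```

The message type 0x1a in the self-check is an arbitrary value at or
above TOY_NLMSG_MIN_TYPE, picked only so the callback path is taken.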





--
Joseph Fannin
[email protected]

2007-10-24 18:19:50

by Giacomo Catenazzi

[permalink] [raw]
Subject: Re: Linux v2.6.24-rc1


On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at virtual address 3d15b925


In the latest git, I see the following BUGs in various programs. It seems
reproducible, but sometimes I get a hard lockup on poweroff.

ciao
cate


vivi: open called (minor=0)
vivi: close called (minor=0, users=0)
BUG: unable to handle kernel paging request at virtual address 3d15b925
printing eip: 3d15b925 *pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: fuse tuner tea5767 tda8290 tuner_simple mt20xx bttv ir_common videobuf_dma_sg btcx_risc tveeprom floppy

Pid: 3389, comm: icedove-bin Not tainted (2.6.24-rc1-gc9927c2b #17)
EIP: 0060:[<3d15b925>] EFLAGS: 00210206 CPU: 0
EIP is at 0x3d15b925
EAX: c4c7a2bc EBX: c4c7a36c ECX: 00000000 EDX: 00000000
ESI: c4c7a2bc EDI: 00000000 EBP: 00000000 ESP: c495ee00
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process icedove-bin (pid: 3389, ti=c495e000 task=c4876570 task.ti=c495e000)
Stack: 00000001 0000002e 00000000 00006c01 b1171795 00000000 c4c7a2bc 00008068
c495ee78 c017c362 00000000 00000000 c495ee78 00008068 00000000 c4c7a2bc
c017c53a c24058c0 00000001 c4c79f14 00000008 222281a4 22222222 22222222
Call Trace:
[<c017c362>] inode_setattr+0x53/0x152
[<c017c53a>] notify_change+0xd9/0x2a3
[<c0169b51>] do_truncate+0x5e/0x75
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c0170d4b>] permission+0xc5/0xde
[<c0171e07>] may_open+0x155/0x1b3
[<c0173ba4>] open_namei+0x6b/0x5f9
[<c01693e3>] do_filp_open+0x25/0x40
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c016943e>] do_sys_open+0x40/0xc5
[<c01694fe>] sys_open+0x1c/0x20
[<c0103fde>] sysenter_past_esp+0x5f/0x85
=======================
Code: Bad EIP value.
EIP: [<3d15b925>] 0x3d15b925 SS:ESP 0068:c495ee00
BUG: unable to handle kernel paging request at virtual address 3d15b925
printing eip: 3d15b925 *pde = 00000000
Oops: 0000 [#2] SMP
Modules linked in: fuse tuner tea5767 tda8290 tuner_simple mt20xx bttv ir_common videobuf_dma_sg btcx_risc tveeprom floppy

Pid: 3696, comm: audacious Tainted: G D (2.6.24-rc1-gc9927c2b #17)
EIP: 0060:[<3d15b925>] EFLAGS: 00210202 CPU: 1
EIP is at 0x3d15b925
EAX: c4c8969c EBX: c4c8974c ECX: 00000000 EDX: 00000000
ESI: c4c8969c EDI: 00000000 EBP: 00000000 ESP: c48abe00
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process audacious (pid: 3696, ti=c48ab000 task=c24b2030 task.ti=c48ab000)
Stack: 00000b8f 0000007b c43a007b c04900d8 b5b87b8f 00000000 c4c8969c 00008068
c48abe78 c017c362 00000000 00000000 c48abe78 00008068 00000000 c4c8969c
c017c53a c24058c0 00000001 c4c8a414 00000008 222281a4 22222222 22222222
Call Trace:
[<c017c362>] inode_setattr+0x53/0x152
[<c017c53a>] notify_change+0xd9/0x2a3
[<c0169b51>] do_truncate+0x5e/0x75
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c0170d4b>] permission+0xc5/0xde
[<c0171e07>] may_open+0x155/0x1b3
[<c0173ba4>] open_namei+0x6b/0x5f9
[<c01693e3>] do_filp_open+0x25/0x40
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c016943e>] do_sys_open+0x40/0xc5
[<c01694fe>] sys_open+0x1c/0x20
[<c0103fde>] sysenter_past_esp+0x5f/0x85
=======================
Code: Bad EIP value.
EIP: [<3d15b925>] 0x3d15b925 SS:ESP 0068:c48abe00
BUG: unable to handle kernel paging request at virtual address 3d15b925
printing eip: 3d15b925 *pde = 00000000
Oops: 0000 [#3] SMP
Modules linked in: fuse tuner tea5767 tda8290 tuner_simple mt20xx bttv ir_common videobuf_dma_sg btcx_risc tveeprom floppy

Pid: 3154, comm: akregator Tainted: G D (2.6.24-rc1-gc9927c2b #17)
EIP: 0060:[<3d15b925>] EFLAGS: 00010206 CPU: 1
EIP is at 0x3d15b925
EAX: c2b830cc EBX: c2b8317c ECX: 00000000 EDX: 00000000
ESI: c2b830cc EDI: 00000000 EBP: 00000000 ESP: c3a15e00
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process akregator (pid: 3154, ti=c3a15000 task=c24ae030 task.ti=c3a15000)
Stack: c298c2bc c0172b52 c482ee94 c02fc0ca c3a46005 00000000 c2b830cc 00008068
c3a15e78 c017c362 00000000 00000000 c3a15e78 00008068 00000000 c2b830cc
c017c53a c24058c0 00000708 c4763694 00000008 222281a4 22222222 22222222
Call Trace:
[<c0172b52>] __link_path_walk+0xac3/0xc4f
[<c02fc0ca>] sock_def_readable+0x12/0x68
[<c017c362>] inode_setattr+0x53/0x152
[<c017c53a>] notify_change+0xd9/0x2a3
[<c0169b51>] do_truncate+0x5e/0x75
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c0170d4b>] permission+0xc5/0xde
[<c0171e07>] may_open+0x155/0x1b3
[<c0173ba4>] open_namei+0x6b/0x5f9
[<c01209da>] scheduler_tick+0xdf/0x125
[<c01693e3>] do_filp_open+0x25/0x40
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c016943e>] do_sys_open+0x40/0xc5
[<c01694fe>] sys_open+0x1c/0x20
[<c0103fde>] sysenter_past_esp+0x5f/0x85
=======================
Code: Bad EIP value.
EIP: [<3d15b925>] 0x3d15b925 SS:ESP 0068:c3a15e00
BUG: unable to handle kernel paging request at virtual address 3d15b925
printing eip: 3d15b925 *pde = 00000000
Oops: 0000 [#4] SMP
Modules linked in: fuse tuner tea5767 tda8290 tuner_simple mt20xx bttv ir_common videobuf_dma_sg btcx_risc tveeprom floppy

Pid: 3681, comm: icedove-bin Tainted: G D (2.6.24-rc1-gc9927c2b #17)
EIP: 0060:[<3d15b925>] EFLAGS: 00210206 CPU: 0
EIP is at 0x3d15b925
EAX: c4c7a0cc EBX: c4c7a17c ECX: 00000000 EDX: 00000000
ESI: c4c7a0cc EDI: 00000000 EBP: 00000000 ESP: c4a49e00
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process icedove-bin (pid: 3681, ti=c4a49000 task=c391cab0 task.ti=c4a49000)
Stack: 00000001 00000051 00000000 000082ca b5dde790 00000000 c4c7a0cc 00008068
c4a49e78 c017c362 00000000 00000000 c4a49e78 00008068 00000000 c4c7a0cc
c017c53a c24058c0 00000001 c4c79e94 00000008 222281a4 22222222 22222222
Call Trace:
[<c017c362>] inode_setattr+0x53/0x152
[<c017c53a>] notify_change+0xd9/0x2a3
[<c0169b51>] do_truncate+0x5e/0x75
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c0170d4b>] permission+0xc5/0xde
[<c0171e07>] may_open+0x155/0x1b3
[<c0173ba4>] open_namei+0x6b/0x5f9
[<c01693e3>] do_filp_open+0x25/0x40
[<c0169130>] get_unused_fd_flags+0x52/0xc1
[<c016943e>] do_sys_open+0x40/0xc5
[<c01694fe>] sys_open+0x1c/0x20
[<c0103fde>] sysenter_past_esp+0x5f/0x85
=======================
Code: Bad EIP value.
EIP: [<3d15b925>] 0x3d15b925 SS:ESP 0068:c4a49e00

2007-10-24 19:44:36

by Ingo Molnar

[permalink] [raw]
Subject: [patch] portman2x4.c: fix boot hang

Subject: portman2x4.c: fix boot hang
From: Ingo Molnar <[email protected]>

when booting an allyesconfig bzImage kernel the bootup hangs in the
portman2x4 driver (on a box that does not have this hardware), at:

Pid: 1, comm: swapper
EIP: 0060:[<c02f763c>] CPU: 0
EIP is at parport_pc_read_status+0x4/0x8
EFLAGS: 00000202 Not tainted (2.6.23-rc9 #904)
EAX: f7e57a7f EBX: 00000010 ECX: c2b808c0 EDX: 00000379
ESI: f7cb8230 EDI: 00000010 EBP: f7cb8230 DS: 007b ES: 007b FS: 0000
CR0: 8005003b CR2: fff9c000 CR3: 007ec000 CR4: 00000690
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c04613de>] portman_flush_input+0xde/0x12c
[<c0461a24>] snd_portman_probe+0x368/0x484
[<c02fbb8c>] __device_attach+0x0/0x8
[<c02fce68>] platform_drv_probe+0xc/0x10
[<c02fba6c>] driver_probe_device+0x74/0x194
[<c0587174>] klist_next+0x38/0x70
[<c02fbb8c>] __device_attach+0x0/0x8
[<c02faea1>] bus_for_each_drv+0x35/0x68
[<c02fbc22>] device_attach+0x72/0x78

the reason is that portman_probe returns an inconsistent error code of
1 or 2, while snd_portman_probe only recognizes negative error codes.

with this fixed the probe fails as it should and the bootup continues.

Signed-off-by: Ingo Molnar <[email protected]>
---
sound/drivers/portman2x4.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/sound/drivers/portman2x4.c
===================================================================
--- linux.orig/sound/drivers/portman2x4.c
+++ linux/sound/drivers/portman2x4.c
@@ -467,7 +467,7 @@ static int portman_probe(struct parport
 	/* Check for ESTB to be clear */
 	/* 4 */
 	if ((parport_read_status(p) & ESTB) == ESTB)
-		return 1; /* CODE 1 - Strobe Failure. */
+		return -EIO; /* CODE 1 - Strobe Failure. */

 	/* Set for RXDATA0 where no damage will be done. */
 	/* 5 */
@@ -475,7 +475,7 @@ static int portman_probe(struct parport

 	/* 6 */
 	if ((parport_read_status(p) & ESTB) != ESTB)
-		return 1; /* CODE 1 - Strobe Failure. */
+		return -EIO; /* CODE 1 - Strobe Failure. */

 	/* 7 */
 	parport_write_control(p, 0); /* Reset Strobe=0. */
@@ -491,7 +491,7 @@ static int portman_probe(struct parport
 	 */
 	/* 9 */
 	if ((parport_read_status(p) & TXEMPTY) == 0)
-		return 2;
+		return -EIO;

 	/* Return OK status. */
 	return 0;
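
The failure mode described in the changelog can be reproduced in a
few lines: a caller that only tests for negative values treats a
positive "error" code as success and carries on. This is a hedged
userspace sketch, not the driver code. toy_probe_old(),
toy_probe_new() and toy_caller_continues() are hypothetical names,
and the caller's "< 0" check is an assumption based on the changelog
wording.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old behaviour: failure reported as a positive code, like "return 1". */
static int toy_probe_old(bool hw_present)
{
	return hw_present ? 0 : 1;
}

/* Patched behaviour: failure reported as a negative errno. */
static int toy_probe_new(bool hw_present)
{
	return hw_present ? 0 : -EIO;
}

/*
 * A caller that, as the changelog says of snd_portman_probe, only
 * recognizes negative values as errors (assumed check: "< 0").
 * Returns true if it goes on to use the device.
 */
bool toy_caller_continues(int probe_result)
{
	if (probe_result < 0)
		return false;
	return true;
}

/* Self-check; call it from main() to see both behaviours. */
void toy_selfcheck(void)
{
	/* No hardware, old code: the positive code slips through. */
	assert(toy_caller_continues(toy_probe_old(false)));
	/* No hardware, patched code: the probe is correctly rejected. */
	assert(!toy_caller_continues(toy_probe_new(false)));
}
```

In the real driver, carrying on after a failed probe is what leads to
the hang in portman_flush_input shown in the trace. The sketch only
models the return-code check, not the hang itself.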

2007-10-24 20:12:45

by Frans Pop

[permalink] [raw]
Subject: Re: [patch] portman2x4.c: fix boot hang

Ingo Molnar wrote:
> if ((parport_read_status(p) & ESTB) == ESTB)
> - return 1; /* CODE 1 - Strobe Failure. */
> + return -EIO; /* CODE 1 - Strobe Failure. */
>
> /* Set for RXDATA0 where no damage will be done. */
> /* 5 */
> @@ -475,7 +475,7 @@ static int portman_probe(struct parport
>
> /* 6 */
> if ((parport_read_status(p) & ESTB) != ESTB)
> - return 1; /* CODE 1 - Strobe Failure. */
> + return -EIO; /* CODE 1 - Strobe Failure. */

Why are you keeping the "CODE 1" comment? Just "Strobe Failure" as comment
would seem more consistent with the change.

2007-10-24 21:29:28

by Sam Ravnborg

[permalink] [raw]
Subject: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs

Hi Ingo.

This is the first step in getting rid of the two directories.
I had to make some very minor modifications in common files
to make it work out - but nothing really hackish.

If you & Thomas + hpa are OK with the changes they can be
pulled from:

git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86.git

As this is mostly renames I have attached a git -M diff only.

The remaining stuff is Kconfig files.

Before looking into these I am hoping someone could
step in and make the two Kconfig.debug
files 100% equal - because then I can fix the kconfig
stuff and finally kill the two directories.

Sam


commit 4aaac9bda3be500750347129ee13d63e80bf4b9f
Author: Sam Ravnborg <[email protected]>
Date: Wed Oct 24 23:00:06 2007 +0200

x86: move defconfig files for i386 and x86_64 to x86

With some small changes to kconfig makefile we can now
locate the defconfig files for i386 and x86_64 in
the configs/ subdirectory under x86.

Signed-off-by: Sam Ravnborg <[email protected]>

commit f745ab20e4697829100edfe29035d491f7efdc42
Author: Sam Ravnborg <[email protected]>
Date: Wed Oct 24 22:44:11 2007 +0200

x86: move i386 and x86_64 Makefiles to arch/x86

Moving the ARCH specific Makefiles for i386 and x86_64
required a little bit of tweaking in the top-level Makefile.
But this is one of the final steps to get rid of the
x86_64 and i386 directories.

Signed-off-by: Sam Ravnborg <[email protected]>


git diff -M --stat:

Makefile | 7 +++++--
arch/{i386/Makefile => x86/Makefile_32} | 4 ++--
arch/{i386/Makefile.cpu => x86/Makefile_32.cpu} | 0
arch/{x86_64/Makefile => x86/Makefile_64} | 2 +-
.../{i386/defconfig => x86/configs/i386_defconfig} | 0
.../defconfig => x86/configs/x86_64_defconfig} | 0
scripts/kconfig/Makefile | 6 +++---
7 files changed, 11 insertions(+), 8 deletions(-)

git diff -M:

diff --git a/Makefile b/Makefile
index 2a47290..8816060 100644
--- a/Makefile
+++ b/Makefile
@@ -196,6 +196,9 @@ CROSS_COMPILE ?=
UTS_MACHINE := $(ARCH)
SRCARCH := $(ARCH)

+# for i386 and x86_64 we use SRCARCH equal to x86
+SRCARCH := $(if $(filter x86_64 i386,$(SRCARCH)),x86,$(SRCARCH))
+
KCONFIG_CONFIG ?= .config

# SHELL used by kbuild
@@ -418,7 +421,7 @@ ifeq ($(config-targets),1)
# Read arch specific Makefile to set KBUILD_DEFCONFIG as needed.
# KBUILD_DEFCONFIG may point out an alternative default configuration
# used for 'make defconfig'
-include $(srctree)/arch/$(ARCH)/Makefile
+include $(srctree)/arch/$(SRCARCH)/Makefile
export KBUILD_DEFCONFIG

config %config: scripts_basic outputmakefile FORCE
@@ -497,7 +500,7 @@ else
KBUILD_CFLAGS += -O2
endif

-include $(srctree)/arch/$(ARCH)/Makefile
+include $(srctree)/arch/$(SRCARCH)/Makefile

ifdef CONFIG_FRAME_POINTER
KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
diff --git a/arch/i386/Makefile b/arch/x86/Makefile_32
similarity index 98%
rename from arch/i386/Makefile
rename to arch/x86/Makefile_32
index f5b9a37..c0b81d0 100644
--- a/arch/i386/Makefile
+++ b/arch/x86/Makefile_32
@@ -1,5 +1,5 @@
#
-# i386/Makefile
+# i386 Makefile
#
# This file is included by the global makefile so that you can add your own
# architecture-specific flags and dependencies. Remember to do have actions
@@ -46,7 +46,7 @@ KBUILD_CFLAGS += -pipe -msoft-float -mregparm=3 -freg-struct-return
KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)

# CPU-specific tuning. Anything which can be shared with UML should go here.
-include $(srctree)/arch/i386/Makefile.cpu
+include $(srctree)/arch/x86/Makefile_32.cpu

# temporary until string.h is fixed
cflags-y += -ffreestanding
diff --git a/arch/i386/Makefile.cpu b/arch/x86/Makefile_32.cpu
similarity index 100%
rename from arch/i386/Makefile.cpu
rename to arch/x86/Makefile_32.cpu
diff --git a/arch/x86_64/Makefile b/arch/x86/Makefile_64
similarity index 99%
rename from arch/x86_64/Makefile
rename to arch/x86/Makefile_64
index 20eb69b..52adc8c 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86/Makefile_64
@@ -1,5 +1,5 @@
#
-# x86_64/Makefile
+# x86_64 Makefile
#
# This file is included by the global makefile so that you can add your own
# architecture-specific flags and dependencies. Remember to do have actions
diff --git a/arch/i386/defconfig b/arch/x86/configs/i386_defconfig
similarity index 100%
rename from arch/i386/defconfig
rename to arch/x86/configs/i386_defconfig
diff --git a/arch/x86_64/defconfig b/arch/x86/configs/x86_64_defconfig
similarity index 100%
rename from arch/x86_64/defconfig
rename to arch/x86/configs/x86_64_defconfig
diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index 83c5e76..fbf39cc 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -60,12 +60,12 @@ defconfig: $(obj)/conf
ifeq ($(KBUILD_DEFCONFIG),)
$< -d arch/$(ARCH)/Kconfig
else
- @echo *** Default configuration is based on '$(KBUILD_DEFCONFIG)'
- $(Q)$< -D arch/$(ARCH)/configs/$(KBUILD_DEFCONFIG) arch/$(ARCH)/Kconfig
+ @echo "*** Default configuration is based on '$(KBUILD_DEFCONFIG)'"
+ $(Q)$< -D arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG) arch/$(ARCH)/Kconfig
endif

%_defconfig: $(obj)/conf
- $(Q)$< -D arch/$(ARCH)/configs/$@ arch/$(ARCH)/Kconfig
+ $(Q)$< -D arch/$(SRCARCH)/configs/$@ arch/$(ARCH)/Kconfig

# Help text used by make help
help:
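
The SRCARCH line added to the top-level Makefile can be read as a
small mapping function: i386 and x86_64 both map to x86, and every
other value passes through unchanged. The C sketch below restates
that mapping for illustration only. toy_srcarch() is a hypothetical
name and is not part of kbuild.

```c
#include <assert.h>
#include <string.h>

/*
 * Mirrors $(if $(filter x86_64 i386,$(SRCARCH)),x86,$(SRCARCH)).
 * $(filter ...) matches whole words, hence the exact strcmp() tests.
 */
const char *toy_srcarch(const char *arch)
{
	if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0)
		return "x86";
	return arch;
}

/* Self-check; call it from main() to exercise the mapping. */
void toy_selfcheck(void)
{
	assert(strcmp(toy_srcarch("i386"), "x86") == 0);
	assert(strcmp(toy_srcarch("x86_64"), "x86") == 0);
	assert(strcmp(toy_srcarch("powerpc"), "powerpc") == 0);
}
```

Per the hunks above, the arch Makefile and the defconfig files are
looked up under arch/$(SRCARCH), while the Kconfig file is still read
from arch/$(ARCH)/Kconfig.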

2007-10-24 21:30:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] portman2x4.c: fix boot hang


* Frans Pop <[email protected]> wrote:

> Ingo Molnar wrote:
> > if ((parport_read_status(p) & ESTB) == ESTB)
> > - return 1; /* CODE 1 - Strobe Failure. */
> > + return -EIO; /* CODE 1 - Strobe Failure. */
> >
> > /* Set for RXDATA0 where no damage will be done. */
> > /* 5 */
> > @@ -475,7 +475,7 @@ static int portman_probe(struct parport
> >
> > /* 6 */
> > if ((parport_read_status(p) & ESTB) != ESTB)
> > - return 1; /* CODE 1 - Strobe Failure. */
> > + return -EIO; /* CODE 1 - Strobe Failure. */
>
> Why are you keeping the "CODE 1" comment? Just "Strobe Failure" as
> comment would seem more consistent with the change.

looking at other uses of 'CODE' suggested that 'CODE 1' is not the
return code - so i left it alone.

Ingo

2007-10-24 22:51:05

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs

On Wed, 24 Oct 2007 23:30:52 +0200 Sam Ravnborg wrote:

> Hi Ingo.
>
> This is first step in getting rid of the two directories.
> I had to do some very minor modifications in common files
> to let it work out - but nothing really hackish.
>
> If you & Thomas + hpa are OK with the changes they can be
> pulled from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86.git
>
> As this is mostly renames I have attached a git -M diff only.
>
> The remaining stuff is Kconfig files.
>
> Before looking into these I am hoping someone could
> step in and make the two Kconfig.debug
> files 100% equal - because then I can fix the kconfig
> stuff and finally kill the two directories.

Uh, maybe I jumped too far. I merged the 2 x86 Kconfig.debug files
into arch/x86/Kconfig.debug....

---

From: Randy Dunlap <[email protected]>

Merge i386/Kconfig.debug and x86_64/Kconfig.debug into x86/Kconfig.debug,
using "depends on X86_32" or X86_64 when needed.

Signed-off-by: Randy Dunlap <[email protected]>
---
arch/i386/Kconfig.debug | 88 ----------------------------------
arch/x86/Kconfig.debug | 119 ++++++++++++++++++++++++++++++++++++++++++++++
arch/x86_64/Kconfig.debug | 61 -----------------------
3 files changed, 119 insertions(+), 149 deletions(-)

--- linux-2.6.24-rc1.orig/arch/i386/Kconfig.debug
+++ /dev/null
@@ -1,88 +0,0 @@
-menu "Kernel hacking"
-
-config TRACE_IRQFLAGS_SUPPORT
- bool
- default y
-
-source "lib/Kconfig.debug"
-
-config EARLY_PRINTK
- bool "Early printk" if EMBEDDED && DEBUG_KERNEL
- default y
- help
- Write kernel log output directly into the VGA buffer or to a serial
- port.
-
- This is useful for kernel debugging when your machine crashes very
- early before the console code is initialized. For normal operation
- it is not recommended because it looks ugly and doesn't cooperate
- with klogd/syslogd or the X server. You should normally N here,
- unless you want to debug such a crash.
-
-config DEBUG_STACKOVERFLOW
- bool "Check for stack overflows"
- depends on DEBUG_KERNEL
- help
- This option will cause messages to be printed if free stack space
- drops below a certain limit.
-
-config DEBUG_STACK_USAGE
- bool "Stack utilization instrumentation"
- depends on DEBUG_KERNEL
- help
- Enables the display of the minimum amount of free stack which each
- task has ever had available in the sysrq-T and sysrq-P debug output.
-
- This option will slow down process creation somewhat.
-
-comment "Page alloc debug is incompatible with Software Suspend on i386"
- depends on DEBUG_KERNEL && HIBERNATION
-
-config DEBUG_PAGEALLOC
- bool "Debug page memory allocations"
- depends on DEBUG_KERNEL && !HIBERNATION && !HUGETLBFS
- help
- Unmap pages from the kernel linear mapping after free_pages().
- This results in a large slowdown, but helps to find certain types
- of memory corruptions.
-
-config DEBUG_RODATA
- bool "Write protect kernel read-only data structures"
- depends on DEBUG_KERNEL
- help
- Mark the kernel read-only data as write-protected in the pagetables,
- in order to catch accidental (and incorrect) writes to such const
- data. This option may have a slight performance impact because a
- portion of the kernel code won't be covered by a 2MB TLB anymore.
- If in doubt, say "N".
-
-config 4KSTACKS
- bool "Use 4Kb for kernel stacks instead of 8Kb"
- depends on DEBUG_KERNEL
- help
- If you say Y here the kernel will use a 4Kb stacksize for the
- kernel stack attached to each process/thread. This facilitates
- running more threads on a system and also reduces the pressure
- on the VM subsystem for higher order allocations. This option
- will also use IRQ stacks to compensate for the reduced stackspace.
-
-config X86_FIND_SMP_CONFIG
- bool
- depends on X86_LOCAL_APIC || X86_VOYAGER
- default y
-
-config X86_MPPARSE
- bool
- depends on X86_LOCAL_APIC && !X86_VISWS
- default y
-
-config DOUBLEFAULT
- default y
- bool "Enable doublefault exception handler" if EMBEDDED
- help
- This option allows trapping of rare doublefault exceptions that
- would otherwise cause a system to silently reboot. Disabling this
- option saves about 4k and might cause you much additional grey
- hair.
-
-endmenu
--- /dev/null
+++ linux-2.6.24-rc1/arch/x86/Kconfig.debug
@@ -0,0 +1,119 @@
+menu "Kernel hacking"
+
+config TRACE_IRQFLAGS_SUPPORT
+ def_bool y
+
+source "lib/Kconfig.debug"
+
+config EARLY_PRINTK
+ bool "Early printk" if EMBEDDED && DEBUG_KERNEL
+ default y
+ depends on X86_32
+ help
+ Write kernel log output directly into the VGA buffer or to a serial
+ port.
+
+ This is useful for kernel debugging when your machine crashes very
+ early before the console code is initialized. For normal operation
+ it is not recommended because it looks ugly and doesn't cooperate
+ with klogd/syslogd or the X server. You should normally N here,
+ unless you want to debug such a crash.
+
+config DEBUG_STACKOVERFLOW
+ bool "Check for stack overflows"
+ depends on DEBUG_KERNEL
+ help
+ This option will cause messages to be printed if free stack space
+ drops below a certain limit.
+
+config DEBUG_STACK_USAGE
+ bool "Stack utilization instrumentation"
+ depends on DEBUG_KERNEL
+ help
+ Enables the display of the minimum amount of free stack which each
+ task has ever had available in the sysrq-T and sysrq-P debug output.
+
+ This option will slow down process creation somewhat.
+
+comment "Page alloc debug is incompatible with Software Suspend on i386"
+ depends on DEBUG_KERNEL && HIBERNATION
+ depends on X86_32
+
+config DEBUG_PAGEALLOC
+ bool "Debug page memory allocations"
+ depends on DEBUG_KERNEL && !HIBERNATION && !HUGETLBFS
+ depends on X86_32
+ help
+ Unmap pages from the kernel linear mapping after free_pages().
+ This results in a large slowdown, but helps to find certain types
+ of memory corruptions.
+
+config DEBUG_RODATA
+ bool "Write protect kernel read-only data structures"
+ depends on DEBUG_KERNEL
+ help
+ Mark the kernel read-only data as write-protected in the pagetables,
+ in order to catch accidental (and incorrect) writes to such const
+ data. This option may have a slight performance impact because a
+ portion of the kernel code won't be covered by a 2MB TLB anymore.
+ If in doubt, say "N".
+
+config 4KSTACKS
+ bool "Use 4Kb for kernel stacks instead of 8Kb"
+ depends on DEBUG_KERNEL
+ depends on X86_32
+ help
+ If you say Y here the kernel will use a 4Kb stacksize for the
+ kernel stack attached to each process/thread. This facilitates
+ running more threads on a system and also reduces the pressure
+ on the VM subsystem for higher order allocations. This option
+ will also use IRQ stacks to compensate for the reduced stackspace.
+
+config X86_FIND_SMP_CONFIG
+ def_bool y
+ depends on X86_LOCAL_APIC || X86_VOYAGER
+ depends on X86_32
+
+config X86_MPPARSE
+ def_bool y
+ depends on X86_LOCAL_APIC && !X86_VISWS
+ depends on X86_32
+
+config DOUBLEFAULT
+ default y
+ bool "Enable doublefault exception handler" if EMBEDDED
+ depends on X86_32
+ help
+ This option allows trapping of rare doublefault exceptions that
+ would otherwise cause a system to silently reboot. Disabling this
+ option saves about 4k and might cause you much additional grey
+ hair.
+
+config IOMMU_DEBUG
+ bool "Enable IOMMU debugging"
+ depends on IOMMU && DEBUG_KERNEL
+ depends on X86_64
+ help
+ Force the IOMMU to on even when you have less than 4GB of
+ memory and add debugging code. On overflow always panic. And
+ allow to enable IOMMU leak tracing. Can be disabled at boot
+ time with iommu=noforce. This will also enable scatter gather
+ list merging. Currently not recommended for production
+ code. When you use it make sure you have a big enough
+ IOMMU/AGP aperture. Most of the options enabled by this can
+ be set more finegrained using the iommu= command line
+ options. See Documentation/x86_64/boot-options.txt for more
+ details.
+
+config IOMMU_LEAK
+ bool "IOMMU leak tracing"
+ depends on DEBUG_KERNEL
+ depends on IOMMU_DEBUG
+ help
+ Add a simple leak tracer to the IOMMU code. This is useful when you
+ are debugging a buggy device driver that leaks IOMMU mappings.
+
+#config X86_REMOTE_DEBUG
+# bool "kgdb debugging stub"
+
+endmenu
--- linux-2.6.24-rc1.orig/arch/x86_64/Kconfig.debug
+++ /dev/null
@@ -1,61 +0,0 @@
-menu "Kernel hacking"
-
-config TRACE_IRQFLAGS_SUPPORT
- bool
- default y
-
-source "lib/Kconfig.debug"
-
-config DEBUG_RODATA
- bool "Write protect kernel read-only data structures"
- depends on DEBUG_KERNEL
- help
- Mark the kernel read-only data as write-protected in the pagetables,
- in order to catch accidental (and incorrect) writes to such const data.
- This option may have a slight performance impact because a portion
- of the kernel code won't be covered by a 2MB TLB anymore.
- If in doubt, say "N".
-
-config IOMMU_DEBUG
- depends on IOMMU && DEBUG_KERNEL
- bool "Enable IOMMU debugging"
- help
- Force the IOMMU to on even when you have less than 4GB of
- memory and add debugging code. On overflow always panic. And
- allow to enable IOMMU leak tracing. Can be disabled at boot
- time with iommu=noforce. This will also enable scatter gather
- list merging. Currently not recommended for production
- code. When you use it make sure you have a big enough
- IOMMU/AGP aperture. Most of the options enabled by this can
- be set more finegrained using the iommu= command line
- options. See Documentation/x86_64/boot-options.txt for more
- details.
-
-config IOMMU_LEAK
- bool "IOMMU leak tracing"
- depends on DEBUG_KERNEL
- depends on IOMMU_DEBUG
- help
- Add a simple leak tracer to the IOMMU code. This is useful when you
- are debugging a buggy device driver that leaks IOMMU mappings.
-
-config DEBUG_STACKOVERFLOW
- bool "Check for stack overflows"
- depends on DEBUG_KERNEL
- help
- This option will cause messages to be printed if free stack space
- drops below a certain limit.
-
-config DEBUG_STACK_USAGE
- bool "Stack utilization instrumentation"
- depends on DEBUG_KERNEL
- help
- Enables the display of the minimum amount of free stack which each
- task has ever had available in the sysrq-T and sysrq-P debug output.
-
- This option will slow down process creation somewhat.
-
-#config X86_REMOTE_DEBUG
-# bool "kgdb debugging stub"
-
-endmenu

2007-10-25 05:19:34

by Theodore Ts'o

Subject: 2.6.24-rc1 doesn't build...

I can't seem to get 2.6.24-rc1 to build:

...
LD .tmp_vmlinux1
arch/x86/kernel/built-in.o: In function `smp_send_nmi_allbutself':
/usr/projects/linux/linux-2.6/arch/x86/kernel/crash.c:85: undefined reference to `genapic'
make: *** [.tmp_vmlinux1] Error 1

Has anyone else seen this?

- Ted


Attachments:
(No filename) (306.00 B)
.config (64.07 kB)

2007-10-25 05:31:45

by Kamalesh Babulal

Subject: Re: 2.6.24-rc1 doesn't build...

Theodore Tso wrote:
> I can't seem to get 2.6.24-rc1 to build:
>
> ...
> LD .tmp_vmlinux1
> arch/x86/kernel/built-in.o: In function `smp_send_nmi_allbutself':
> /usr/projects/linux/linux-2.6/arch/x86/kernel/crash.c:85: undefined reference to `genapic'
> make: *** [.tmp_vmlinux1] Error 1
>
> Has anyone else seen this?
>
> - Ted
Hi Ted,

The patch for this build issue is available at http://lkml.org/lkml/2007/10/24/128.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

2007-10-25 06:14:18

by Yinghai Lu

Subject: Re: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs

On 10/24/07, Randy Dunlap <[email protected]> wrote:
> On Wed, 24 Oct 2007 23:30:52 +0200 Sam Ravnborg wrote:
>
> > Hi Ingo.
> >
> > This is first step in getting rid of the two directories.
> > I had to do some very minor modifications in common files
> > to let it work out - but nothing really hackish.
> >
> > If you & Thomas + hpa are OK with the changes they can be
> > pulled from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86.git
> >
> > As this is mostly renames I have attached a git -M diff only.
> >
> > The remaining stuff is Kconfig files.
> >
> > Before looking into these I am hoping someone could
> > step in and make the two Kconfig.debug
> > files 100% equal - because then I can fix the kconfig
> > stuff and finally kill the two directories.
>
> Uh, maybe I jumped too far. I merged the 2 x86 Kconfig.debug files
> into arch/x86/Kconfig.debug....
>
> ---
>
> From: Randy Dunlap <[email protected]>
>
> Merge i386/Kconfig.debug and x86_64/Kconfig.debug into x86/Kconfig.debug,
> using "depends on X86_32" or X86_64 when needed.
>
> Signed-off-by: Randy Dunlap <[email protected]>
> ---
> arch/i386/Kconfig.debug | 88 ----------------------------------
> arch/x86/Kconfig.debug | 119 ++++++++++++++++++++++++++++++++++++++++++++++
> arch/x86_64/Kconfig.debug | 61 -----------------------
> 3 files changed, 119 insertions(+), 149 deletions(-)
>
> --- linux-2.6.24-rc1.orig/arch/i386/Kconfig.debug
> +++ /dev/null
> @@ -1,88 +0,0 @@
> -menu "Kernel hacking"
> -
> -config TRACE_IRQFLAGS_SUPPORT
> - bool
> - default y
> -
> -source "lib/Kconfig.debug"
> -
> -config EARLY_PRINTK
> - bool "Early printk" if EMBEDDED && DEBUG_KERNEL
> - default y
> - help
> - Write kernel log output directly into the VGA buffer or to a serial
> - port.
...
> --- /dev/null
> +++ linux-2.6.24-rc1/arch/x86/Kconfig.debug
> @@ -0,0 +1,119 @@
> +menu "Kernel hacking"
> +
> +config TRACE_IRQFLAGS_SUPPORT
> + def_bool y
> +
> +source "lib/Kconfig.debug"
> +
> +config EARLY_PRINTK
> + bool "Early printk" if EMBEDDED && DEBUG_KERNEL
> + default y
> + depends on X86_32
> + help
> + Write kernel log output directly into the VGA buffer or to a serial
> + port.
> +
> + This is useful for kernel debugging when your machine crashes very
> + early before the console code is initialized. For normal operation
> + it is not recommended because it looks ugly and doesn't cooperate
> + with klogd/syslogd or the X server. You should normally N here,
> + unless you want to debug such a crash.
> +
...
x86_64/Kconfig has EARLY_PRINTK too:

config EARLY_PRINTK
bool
default y


YH

2007-10-25 09:52:19

by Takashi Iwai

Subject: Re: [patch] portman2x4.c: fix boot hang

At Wed, 24 Oct 2007 21:44:17 +0200,
Ingo Molnar wrote:
>
> Subject: portman2x4.c: fix boot hang
> From: Ingo Molnar <[email protected]>
>
> when booting an allyesconfig bzImage kernel the bootup hangs in the
> portman2x4 driver (on a box that does not have this hardware), at:
>
> Pid: 1, comm: swapper
> EIP: 0060:[<c02f763c>] CPU: 0
> EIP is at parport_pc_read_status+0x4/0x8
> EFLAGS: 00000202 Not tainted (2.6.23-rc9 #904)
> EAX: f7e57a7f EBX: 00000010 ECX: c2b808c0 EDX: 00000379
> ESI: f7cb8230 EDI: 00000010 EBP: f7cb8230 DS: 007b ES: 007b FS: 0000
> CR0: 8005003b CR2: fff9c000 CR3: 007ec000 CR4: 00000690
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> [<c04613de>] portman_flush_input+0xde/0x12c
> [<c0461a24>] snd_portman_probe+0x368/0x484
> [<c02fbb8c>] __device_attach+0x0/0x8
> [<c02fce68>] platform_drv_probe+0xc/0x10
> [<c02fba6c>] driver_probe_device+0x74/0x194
> [<c0587174>] klist_next+0x38/0x70
> [<c02fbb8c>] __device_attach+0x0/0x8
> [<c02faea1>] bus_for_each_drv+0x35/0x68
> [<c02fbc22>] device_attach+0x72/0x78
>
> the reason is an inconsistent error return code of 1 or 2, while
> snd_portman_probe only recognizes negative error codes.
>
> with this fixed the probe fails as it should and the bootup continues.
>
> Signed-off-by: Ingo Molnar <[email protected]>

Thanks. But isn't the patch below easier?


Takashi

diff -r 09088524dd7f drivers/portman2x4.c
--- a/drivers/portman2x4.c Thu Oct 25 11:46:24 2007 +0200
+++ b/drivers/portman2x4.c Thu Oct 25 11:49:17 2007 +0200
@@ -668,7 +668,7 @@ static int __devinit snd_portman_probe_p
parport_release(pardev);
parport_unregister_device(pardev);

- return res;
+ return res ? -EIO : 0;
}

static void __devinit snd_portman_attach(struct parport *p)
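
The one-line change above can be pictured with a small user-space sketch. It assumes, as the report describes, a low-level helper that signals failure with a small positive code (1 or 2) and a caller that treats only negative values as errors. All function names here are hypothetical stand-ins, not the driver's own.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the low-level probe helper: returns 0 on
 * success and a small positive code on failure, as in the report. */
static int probe_port_raw(int hw_present)
{
	return hw_present ? 0 : 2;
}

/* Normalize at the boundary, in the spirit of the patch above:
 * any non-zero result becomes -EIO. */
static int probe_port(int hw_present)
{
	int res = probe_port_raw(hw_present);

	assert(res >= 0);	/* the raw helper never returns a negative value */
	return res ? -EIO : 0;
}

/* Caller-side check that, like the probe routine described above,
 * only recognizes negative values as errors. */
static int probe_failed(int ret)
{
	return ret < 0;
}
```

With the raw helper, a missing device yields 2, which `probe_failed()` does not treat as an error; after normalization the same case yields `-EIO` and the probe fails as it should.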

2007-10-25 10:18:59

by Ingo Molnar

Subject: Re: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs


* Sam Ravnborg <[email protected]> wrote:

> Hi Ingo.
>
> This is first step in getting rid of the two directories. I had to do
> some very minor modifications in common files to let it work out - but
> nothing really hackish.
>
> If you & Thomas + hpa are OK with the changes they can be pulled from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86.git
>
> As this is mostly renames I have attached a git -M diff only.

wow, cool stuff!

> The remaining stuff is Kconfig files.
>
> Before looking into these I am hoping someone could step in and make
> the two Kconfig.debug files 100% equal - because then I can fix the
> kconfig stuff and finally kill the two directories.

How about mechanically unifying them by adding:

depends on X86_32

and:

depends on X86_64

lines? Can you see any problem with such an approach?

Ingo

2007-10-25 10:53:57

by Sam Ravnborg

Subject: Re: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs

On Thu, Oct 25, 2007 at 12:18:31PM +0200, Ingo Molnar wrote:
>
> * Sam Ravnborg <[email protected]> wrote:
>
> > Hi Ingo.
> >
> > This is first step in getting rid of the two directories. I had to do
> > some very minor modifications in common files to let it work out - but
> > nothing really hackish.
> >
> > If you & Thomas + hpa are OK with the changes they can be pulled from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86.git
> >
> > As this is mostly renames I have attached a git -M diff only.
>
> wow, cool stuff!
>
> > The remaining stuff is Kconfig files.
> >
> > Before looking into these I am hoping someone could step in and make
> > the two Kconfig.debug files 100% equal - because then I can fix the
> > kconfig stuff and finally kill the two directories.
>
> How about mechanically unifying them by adding:
>
> depends on X86_32
>
> and:
>
> depends on X86_64
>
> lines? Can you see any problem with such an approach?
Not really.
But when we touch Kconfig.debug we should check that it reflects
reality. I would be surprised if we did not miss an option or two.
I am tempted to just go forward with the version Randy made where
he merged the two.
If I get time I will look at it tonight and send out a series of patches
for review.


We have to get rid of the directories - just consider the poor victim
who has her/his code reviewed by hch after he has had his nightmare ;-)

Sam

2007-10-25 10:55:27

by Sam Ravnborg

Subject: Re: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs

> ...
> in x86_64/Kconfig has EARLY_PRINTK too
>
> config EARLY_PRINTK
> bool
> default y

I noticed this too. So on x86_64 it was unconditionally enabled,
whereas with i386 (and the merged files) it is an option that
can be turned off.

Randy - did you realise this when you did the merge?

Sam

2007-10-25 13:17:32

by edz_mania

Subject: Re: Linux v2.6.24-rc1

-- Linus Torvalds wrote :

This may count as one of the biggest -rc releases ever. It's humongous.
Usually the compressed -rc1 diffs are in the 3-5MB range, with occasional
smaller ones, and the occasional ones that top 6M, but this one is
*eleven* megs.

I'd blame the x86 renames (and the watchdog ones), but the thing is, it's
absolutely huge even when I generate the diff with git turning all those
renames into relatively small rename diffs (which I don't do for the
public diffs, since I expect that git people use git to get the changes,
and non-git people won't have tools that understand a diff that involves
renames).

In short, we just had an unusually large amount of not just x86 merges,
but also tons of new drivers (wireless networking stands out, but is by no
means the only thing - we've got dvb, regular wired network, mmc etc all
joining in), and a fair amount of architecture stuff, filesystems,
networking etc too.

So there's just lots of new stuff. The diffstat is ten thousand lines
long, and weighing in at comfortably over half a megabyte it is way over
the limits of this - or any sane - mailing list. The shortlog is barely
shorter, weighing in at "just" 8461 lines and almost 400k. The full
changelog (which I'm still producing for y'all, since people told me they
actually care last time I asked) is 4 megs.

In other words, I don't even know where to start. The big noticeable thing
is the x86 merge, and I think we all fervently hope that it won't cause
any issues. So far it's been pretty smooth sailing. Knock wood.

Less smooth has the scatter-gather changes to the block layer been, but
they are hopefully all in reasonable shape by now too. And the VM changes?
I honestly hope nobody even notices. Same goes for some of the VFS layer
changes that affected basically every filesystem (although in mostly very
straightforward ways).

Just for fun, I'd really encourage git users to just try the

git shortlog v2.6.23..

thing, it really is quite impressive.

Linus

2007-10-25 15:46:05

by Randy Dunlap

Subject: Re: [RFC - GIT pull] first step to get rid of x86_64 and i386 dirs

On Thu, 25 Oct 2007 12:56:52 +0200 Sam Ravnborg wrote:

> > ...
> > in x86_64/Kconfig has EARLY_PRINTK too
> >
> > config EARLY_PRINTK
> > bool
> > default y
>
> I noticed this too. So on x86_64 it was unconditionally enabled,
> whereas with i386 (and the merged files) it is an option that
> can be turned off.
>
> Randy - did you realise this when you did the merge?

No, I missed that.

---
~Randy

2007-10-26 05:21:17

by Ken'ichi Ohmichi

Subject: [PATCH] Dump filtering supports x86_64 sparsemem(Re: Linux v2.6.24-rc1)


Hi,

This patch adds the symbol "init_level4_pgt" to the vmcoreinfo data so
that makedumpfile (the dump filtering command) supports the x86_64
sparsemem kernel of linux-2.6.24.

makedumpfile creates a small dumpfile by excluding pages that are
unnecessary for the analysis. It checks the attributes in the page
structures to distinguish necessary pages from unnecessary ones. To check
them, makedumpfile reads the vmcoreinfo data, which carries only the
minimum debugging information needed for dump filtering.

For older x86_64 kernels (linux-2.6.23 or before), makedumpfile translates
the virtual address of a page structure into a physical address by
subtracting PAGE_OFFSET from the virtual address. This translation does not
work for the linux-2.6.24 sparsemem kernel, because its page structures are
in the virtual memmap area. makedumpfile has to translate their virtual
addresses by walking the 4-level page tables, and for that it needs the
symbol "init_level4_pgt".


Thanks
Ken'ichi Ohmichi

---
Signed-off-by: Ken'ichi Ohmichi <[email protected]>

---
diff -rpuN a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
--- a/arch/x86/kernel/machine_kexec_64.c 2007-10-26 11:05:34.000000000 +0900
+++ b/arch/x86/kernel/machine_kexec_64.c 2007-10-26 11:16:24.000000000 +0900
@@ -233,6 +233,8 @@ NORET_TYPE void machine_kexec(struct kim

void arch_crash_save_vmcoreinfo(void)
{
+ VMCOREINFO_SYMBOL(init_level4_pgt);
+
#ifdef CONFIG_ARCH_DISCONTIGMEM_ENABLE
VMCOREINFO_SYMBOL(node_data);
VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
_
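
The two translation schemes contrasted in the description above can be sketched in user-space C. This is only an illustration under stated assumptions: 4 KiB pages with 9 index bits per level, and a placeholder direct-map base (`PAGE_OFFSET_ASSUMED`) that should be taken from the kernel actually being analysed. The names are hypothetical and are not makedumpfile's own; a real walk would also read each table entry from the dump, starting at init_level4_pgt, which this sketch does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 layout: 4 KiB pages, 9 index bits per level.
 * PAGE_OFFSET_ASSUMED is a placeholder for the direct-map base;
 * take the real value from the kernel being analysed. */
#define PAGE_OFFSET_ASSUMED 0xffff810000000000ULL
#define PAGE_SHIFT      12
#define PMD_SHIFT       21
#define PUD_SHIFT       30
#define PGDIR_SHIFT     39
#define INDEX_MASK      0x1ffULL

/* Old scheme: valid only for addresses inside the direct mapping. */
static uint64_t direct_map_to_phys(uint64_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET_ASSUMED);
	return vaddr - PAGE_OFFSET_ASSUMED;
}

/* Table indices that a 4-level walk would use for one address. */
struct pt_indices {
	unsigned int pgd, pud, pmd, pte;
	uint64_t page_offset;
};

/* New scheme, first step only: split the address into the four
 * table indices plus the byte offset within the page. */
static struct pt_indices split_vaddr(uint64_t vaddr)
{
	struct pt_indices ix;

	ix.pgd = (unsigned int)((vaddr >> PGDIR_SHIFT) & INDEX_MASK);
	ix.pud = (unsigned int)((vaddr >> PUD_SHIFT) & INDEX_MASK);
	ix.pmd = (unsigned int)((vaddr >> PMD_SHIFT) & INDEX_MASK);
	ix.pte = (unsigned int)((vaddr >> PAGE_SHIFT) & INDEX_MASK);
	ix.page_offset = vaddr & ((1ULL << PAGE_SHIFT) - 1);
	return ix;
}
```

The subtraction works only when the page structures sit in the direct mapping. For addresses in the virtual memmap area there is no fixed offset, so the top-level table is needed as the starting point of the walk.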




2007-10-26 05:57:59

by Romano Giannetti

Subject: Re: 2.6.24-rc1 fails with lockup and BUG:


On Wed, 2007-10-24 at 18:11 +0200, Peter Zijlstra wrote:
> On Wed, 2007-10-24 at 17:55 +0200, Ingo Molnar wrote:
> >
> > hm, this lockdep warning caused lockdep to turn itself off - hence we
> > wont get to the really interesting warnings. We'll try to come up with a
> > solution for this.
>
> Does this help?

I tried this, but although I have the D-state processes, I cannot see
any debug trace now. Results are at:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_3/

Can I try anything more? This is quite a show-stopper for me... and
before trying to bisect 11Mbyte of patches...

Romano


--
Sorry for the disclaimer --- ¡I cannot stop it!



--
La presente comunicación tiene carácter confidencial y es para el exclusivo uso del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, le informamos que cualquier forma de distribución, reproducción o uso de esta comunicación y/o de la información contenida en la misma están estrictamente prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por favor, notifíquelo inmediatamente al remitente contestando a este mensaje y proceda a continuación a destruirlo. Gracias por su colaboración.

This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation.

2007-10-26 05:59:59

by Romano Giannetti

Subject: Re: 2.6.24-rc1 fails with lockup and BUG:


On Wed, 2007-10-24 at 12:44 -0400, Joseph Fannin wrote:
> On Wed, Oct 24, 2007 at 03:25:44PM +0200, Romano Giannetti wrote:
> >
> >
> Denis V. Lunev wrote a patch for the NetworkManager thing a day or two
> ago (which DaveM has queued).
>
> Since netlink is involved in the traces you sent, this might do something
> for the other too.

This *does* fix the "network manager needs to be restarted at boot" part
of the problem, but leaves the worst one as is (the failed suspend to RAM
and the bugs that follow).

Romano

--
Sorry for the disclaimer --- ¡I cannot stop it!




2007-10-26 06:38:19

by Ingo Molnar

Subject: Re: 2.6.24-rc1 fails with lockup - /sbin/ifconfig / inet_ioctl() / dev_close() / rtl8169_down()


* Romano Giannetti <[email protected]> wrote:

> > Does this help?
>
> I tried this, but although I have the D-state processes, I cannot see
> any debug trace now. Results are at:
>
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_3/
>
> Can I try anything more? This is quite a show-stopper for me... and
> before trying to bisect 11Mbyte of patches...

hm, from your log it appears that lockdep did not find anything, still
the hang does trigger.

it's /sbin/ifconfig and inet_ioctl() / dev_close() / rtl8169_down() that
seems to be hanging. I've extracted the relevant backtrace below. I've
Cc:-ed people who might have a better idea about what's going on.

Ingo

------------------>
ifconfig S c0476f80 0 7226 7166
cbb67df0 00000046 c02f3f97 c0476f80 cbb67dc0 c01489d5 c2b81550 c2b8168c
c1cf7b80 00000000 c30bd250 00000000 cbb67dd0 00000282 cbb67e00 c0476f80
cbb67df0 c0132618 00004232 00000000 00000282 cbb67e00 00004232 f884e000
Call Trace:
[schedule_timeout+72/192] schedule_timeout+0x48/0xc0
[schedule_timeout_interruptible+21/32] schedule_timeout_interruptible+0x15/0x20
[msleep_interruptible+39/64] msleep_interruptible+0x27/0x40
[<f88422f0>] rtl8169_down+0xb0/0xd0 [r8169]
[<f88424cf>] rtl8169_close+0x1f/0xb0 [r8169]
[dev_close+71/96] dev_close+0x47/0x60
[dev_change_flags+125/384] dev_change_flags+0x7d/0x180
[devinet_ioctl+1225/1632] devinet_ioctl+0x4c9/0x660
[inet_ioctl+107/144] inet_ioctl+0x6b/0x90
[sock_ioctl+208/544] sock_ioctl+0xd0/0x220
[do_ioctl+40/128] do_ioctl+0x28/0x80
[vfs_ioctl+87/640] vfs_ioctl+0x57/0x280
[sys_ioctl+57/96] sys_ioctl+0x39/0x60
[sysenter_past_esp+95/165] sysenter_past_esp+0x5f/0xa5
=======================

2007-10-26 16:50:16

by Stephen Hemminger

Subject: Re: 2.6.24-rc1 fails with lockup - /sbin/ifconfig / inet_ioctl() / dev_close() / rtl8169_down()

On Fri, 26 Oct 2007 08:37:33 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Romano Giannetti <[email protected]> wrote:
>
> > > Does this help?
> >
> > I tried this, but although I have the D-state processes, I cannot see
> > any debug trace now. Results are at:
> >
> > http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_3/
> >
> > Can I try anything more? This is quite a show-stopper for me... and
> > before trying to bisect 11Mbyte of patches...
>
> hm, from your log it appears that lockdep did not find anything, still
> the hang does trigger.
>
> it's /sbin/ifconfig and inet_ioctl() / dev_close() / rtl8169_down() that
> seems to be hanging. I've extracted the relevant backtrace below. I've
> Cc:-ed people who might have a better idea about what's going on.
>
> Ingo

Are you building with NAPI enabled or not? It looks like the following
might help the non-NAPI case.

--- a/drivers/net/r8169.c 2007-10-24 21:38:43.000000000 -0700
+++ b/drivers/net/r8169.c 2007-10-26 09:46:07.000000000 -0700
@@ -2989,13 +2989,16 @@ static void rtl8169_down(struct net_devi
{
struct rtl8169_private *tp = netdev_priv(dev);
void __iomem *ioaddr = tp->mmio_addr;
- unsigned int poll_locked = 0;
unsigned int intrmask;

rtl8169_delete_timer(dev);

netif_stop_queue(dev);

+#ifdef CONFIG_R8169_NAPI
+ napi_disable(&tp->napi);
+#endif
+
core_down:
spin_lock_irq(&tp->lock);

@@ -3009,11 +3012,6 @@ core_down:

synchronize_irq(dev->irq);

- if (!poll_locked) {
- napi_disable(&tp->napi);
- poll_locked++;
- }
-
/* Give a racing hard_start_xmit a few cycles to complete. */
synchronize_sched(); /* FIXME: should this be synchronize_irq()? */





--
Stephen Hemminger <[email protected]>

2007-10-26 17:56:47

by Ingo Molnar

Subject: Re: 2.6.24-rc1 fails with lockup - /sbin/ifconfig / inet_ioctl() / dev_close() / rtl8169_down()


* Stephen Hemminger <[email protected]> wrote:

> > hm, from your log it appears that lockdep did not find anything,
> > still the hang does trigger.
> >
> > it's /sbin/ifconfig and inet_ioctl() / dev_close() / rtl8169_down()
> > that seems to be hanging. I've extracted the relevant backtrace
> > below. I've Cc:-ed people who might have a better idea about what's
> > going on.
>
> Are you building with NAPI enabled or not. Looks like the following
> might help the non-napi case.

the config from:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_1/

suggests that NAPI was disabled:

CONFIG_R8169=m
# CONFIG_R8169_NAPI is not set
CONFIG_R8169_VLAN=y

Ingo

2007-10-26 18:35:14

by Stephen Hemminger

Subject: [PATCH] r8169: don't call napi_disable if not doing NAPI

Don't call napi_disable if not configured.
And make sure that any misuse of napi_xxx in future fails
with a compile error.

Signed-off-by: Stephen Hemminger <[email protected]>

--- a/drivers/net/r8169.c 2007-10-24 21:38:43.000000000 -0700
+++ b/drivers/net/r8169.c 2007-10-26 11:27:02.000000000 -0700
@@ -392,7 +392,9 @@ struct rtl8169_private {
void __iomem *mmio_addr; /* memory map physical address */
struct pci_dev *pci_dev; /* Index of PCI device */
struct net_device *dev;
+#ifdef CONFIG_R8169_NAPI
struct napi_struct napi;
+#endif
spinlock_t lock; /* spin lock flag */
u32 msg_enable;
int chipset;
@@ -2989,13 +2991,16 @@ static void rtl8169_down(struct net_devi
{
struct rtl8169_private *tp = netdev_priv(dev);
void __iomem *ioaddr = tp->mmio_addr;
- unsigned int poll_locked = 0;
unsigned int intrmask;

rtl8169_delete_timer(dev);

netif_stop_queue(dev);

+#ifdef CONFIG_R8169_NAPI
+ napi_disable(&tp->napi);
+#endif
+
core_down:
spin_lock_irq(&tp->lock);

@@ -3009,11 +3014,6 @@ core_down:

synchronize_irq(dev->irq);

- if (!poll_locked) {
- napi_disable(&tp->napi);
- poll_locked++;
- }
-
/* Give a racing hard_start_xmit a few cycles to complete. */
synchronize_sched(); /* FIXME: should this be synchronize_irq()? */
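
The "fails with a compile error" point in the description above can be illustrated with a small stand-alone sketch. It assumes a hypothetical `DEMO_NAPI` macro in place of CONFIG_R8169_NAPI and hypothetical `demo_*` types in place of the driver's structures; it is not the driver code itself.

```c
#include <assert.h>

/* Hypothetical names throughout; build with -DDEMO_NAPI to model
 * CONFIG_R8169_NAPI=y. */
#ifdef DEMO_NAPI
struct demo_napi {
	int enabled;
};

static void demo_napi_disable(struct demo_napi *n)
{
	n->enabled = 0;
}
#endif

struct demo_private {
#ifdef DEMO_NAPI
	struct demo_napi napi;	/* exists only when polling is built in */
#endif
	int queue_stopped;
};

/* Teardown path: the polling context is touched only when it exists.
 * An unguarded use of tp->napi would not compile without DEMO_NAPI. */
static void demo_down(struct demo_private *tp)
{
	tp->queue_stopped = 1;
#ifdef DEMO_NAPI
	demo_napi_disable(&tp->napi);
#endif
}

/* Small driver for the sketch: returns 1 once the queue is stopped. */
static int demo_run(void)
{
	struct demo_private tp = { 0 };

	demo_down(&tp);
#ifdef DEMO_NAPI
	assert(tp.napi.enabled == 0);
#endif
	return tp.queue_stopped;
}
```

Because the member is absent from the structure in the non-polling build, any later code that touches it outside the guard is rejected by the compiler instead of hanging at run time.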

2007-10-26 20:24:21

by Francois Romieu

Subject: Re: [PATCH] r8169: don't call napi_disable if not doing NAPI

Stephen Hemminger <[email protected]> :
> Don't call napi_disable if not configured.
> And make sure that any misuse of napi_xxx in future fails
> with a compile error.

Disable napi polling early and remove the useless poll_locked logic.

>
> Signed-off-by: Stephen Hemminger <[email protected]>

Acked-by: Francois Romieu <[email protected]>

--
Ueimor

2007-10-28 22:18:46

by Romano Giannetti

Subject: Re: [PATCH] r8169: don't call napi_disable if not doing NAPI


On Fri, 2007-10-26 at 11:33 -0700, Stephen Hemminger wrote:
> Don't call napi_disable if not configured.
> And make sure that any misuse of napi_xxx in future fails
> with a compile error.

Will test as soon as possible (I have been without internet over the
weekend). Thanks.

As a bonus, I tried a few more things and got a report from lockdep. You
can find it here:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_5/

patching and compiling now.

Romano

--
Sorry for the disclaimer --- ¡I cannot stop it!




2007-10-29 08:56:56

by Romano Giannetti

Subject: Re: [PATCH] r8169: don't call napi_disable if not doing NAPI


On Fri, 2007-10-26 at 11:33 -0700, Stephen Hemminger wrote:
> Don't call napi_disable if not configured.
> And make sure that any misuse of napi_xxx in future fails
> with a compile error.
>
> Signed-off-by: Stephen Hemminger <[email protected]>
>

This fixes the problem for me (at least, after 8 suspend/resume cycles).
Thanks.

Tested-by: Romano Giannetti <[email protected]>

--
Sorry for the disclaimer --- ¡I cannot stop it!




2007-12-04 10:09:02

by Ingo Molnar

Subject: [Bug 9246] On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at virtual address 3d15b925


hi,

* Giacomo Catenazzi <[email protected]> wrote:

> On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at
> virtual address 3d15b925
>
> With the latest git, I see the following BUGs in various programs. It
> seems reproducible, but sometimes I get a hard lockup on poweroff.

do you still get this with more recent kernels? We had a number of fixes
for memory corruptors since -rc1 - perhaps one of them took care of your
problem as well.

Ingo

2007-12-04 16:47:51

by Giacomo Catenazzi

Subject: Re: [Bug 9246] On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at virtual address 3d15b925

Ingo Molnar wrote:
> hi,
>
> * Giacomo Catenazzi <[email protected]> wrote:
>
>> On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at
>> virtual address 3d15b925
>>
>> With the latest git, I see the following BUGs in various programs. It
>> seems reproducible, but sometimes I get a hard lockup on poweroff.
>
> do you still get this with more recent kernels? We had a number of fixes
> for memory corruptors since -rc1 - perhaps one of them took care of your
> problem as well.

No, the problem was solved a few days after the report.

Thanks,
cate

2007-12-04 19:49:31

by Rafael J. Wysocki

Subject: Re: [Bug 9246] On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at virtual address 3d15b925

On Tuesday, 4 of December 2007, Giacomo A. Catenazzi wrote:
> Ingo Molnar wrote:
> > hi,
> >
> > * Giacomo Catenazzi <[email protected]> wrote:
> >
> >> On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at
> >> virtual address 3d15b925
> >>
> >> With the latest git, I see the following BUGs in various programs. It
> >> seems reproducible, but sometimes I get a hard lockup on poweroff.
> >
> > do you still get this with more recent kernels? We had a number of fixes
> > for memory corruptors since -rc1 - perhaps one of them took care of your
> > problem as well.
>
> > No, the problem was solved a few days after the report.

Can you point me to the fix, please?

Rafael

2007-12-05 09:27:19

by Giacomo Catenazzi

Subject: Re: [Bug 9246] On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at virtual address 3d15b925

Rafael J. Wysocki wrote:
> On Tuesday, 4 of December 2007, Giacomo A. Catenazzi wrote:
>> Ingo Molnar wrote:
>>> hi,
>>>
>>> * Giacomo Catenazzi <[email protected]> wrote:
>>>
>>>> On 2.6.24-rc1-gc9927c2b BUG: unable to handle kernel paging request at
>>>> virtual address 3d15b925
>>>>
>>>> With the latest git, I see the following BUGs in various programs. It
>>>> seems reproducible, but sometimes I get a hard lockup on poweroff.
>>> do you still get this with more recent kernels? We had a number of fixes
>>> for memory corruptors since -rc1 - perhaps one of them took care of your
>>> problem as well.
>> No, the problem was solved a few days after the report.
>
> Can you point me to the fix, please?

Unfortunately not. There was too much chaos in that period:
I think I ran into two or three different kernel bugs
(and a Debian keyboard bug) in one or two days. Usually
I find such important bugs only a few times per year,
and never together.

I also tried git-bisect, but too many runs, too many non-compiling
commits, a bad environment (the Debian bug) and the quick fix of the
kernel bug ;-) stopped me from searching further.

ciao
cate