2012-05-21 00:00:45

by Linus Torvalds

Subject: Linux 3.4 released

I just pushed out the 3.4 release.

Nothing really exciting happened since -rc7, although the workaround
for a linker bug on x86 is larger than I'd have liked at this stage,
and sticks out like a sore thumb in the diffstat. That said, it's not
like even that patch was really all that scary.

In fact, I think the 3.4 release cycle as a whole has been fairly
calm. Sure, I always wish for the -rc's to calm down more quickly than
they ever seem to do, but I think on the whole we didn't have any big
disruptive events, which is just how I like it. Let's hope the 3.5
merge window is a calm one too.

Linus

--- ShortLog since 3.4-rc7 ---

Alan Cox (2):
tty: Fix LED error return
x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable

Alexander Clouter (1):
crypto: mv_cesa requires on CRYPTO_HASH to build

Alexander Graf (3):
KVM: PPC: Book3S: PR: Handle EMUL_ASSIST
KVM: PPC: Fix PR KVM on POWER7 bare metal
KVM: PPC: Book3S: PR: Fix hsrr code

Amit Shah (2):
virtio: console: tell host of open ports after resume from s3/s4
virtio: balloon: let host know of updated balloon size before
module removal

Asai Thambi S P (1):
mtip32xx: release the semaphore on an error path

Barry Song (1):
ARM: PRIMA2: fix irq domain size and IRQ mask of internal
interrupt controller

Benjamin Herrenschmidt (1):
powerpc/kvm: Fix VSID usage in 64-bit "PR" KVM

Bernd Schubert (1):
bio allocation failure due to bio_get_nr_vecs()

Bernhard Kohl (1):
target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups

Brian Austin (1):
ASoC: cs42l73: Sync digital mixer kcontrols to allow for 0dB

Chris Metcalf (3):
arch/tile: fix up some issues in calling do_work_pending()
arch/tile: apply commit 74fca9da0 to the compat signal handling as well
tilegx: enable SYSCALL_WRAPPERS support

Cyrill Gorcunov (1):
fs, proc: fix ABBA deadlock in case of execution attempt of
map_files/ entries

Dan Carpenter (2):
[media] fintek-cir: change || to &&
openvswitch: checking wrong variable in queue_userspace_packet()

Dan Williams (1):
cdc_ether: add Novatel USB551L device IDs for FLAG_WWAN

David Ahern (1):
perf stat: handle ENXIO error for perf_event_open

David S. Miller (1):
bonding: Fix LACPDU rx_dropped commit.

Eric Dumazet (2):
pch_gbe: fix transmit races
pktgen: fix module unload for good

Geert Uytterhoeven (1):
ptp_pch: Add missing #include <linux/slab.h>

Greg Kroah-Hartman (1):
perf: Turn off compiler warnings for flex and bison generated files

Guennadi Liakhovetski (1):
[media] V4L: soc-camera: protect hosts during probing from
overzealous user-space

Gustavo Padovan (1):
Bluetooth: notify userspace of security level change

H Hartley Sweeten (2):
[media] media: videobuf2-dma-contig: quiet sparse noise about
plain integer as NULL pointer
[media] media: videobuf2-dma-contig: include header for exported symbols

H. Peter Anvin (3):
x86, realmode: 16-bit real-mode code support for relocs tool
x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
x86, relocs: When printing an error, say relative or absolute

Hugh Dickins (1):
memcg,thp: fix res_counter:96 regression

Igor Mammedov (1):
sched: Fix KVM and ia64 boot crash due to sched_groups circular
linked list assumption

James Bottomley (2):
[PARISC] fix PA1.1 oops on boot
[PARISC] fix panic on prefetch(NULL) on PA7300LC

Jan Beulich (1):
x86: Fix section annotation of acpi_map_cpu2node()

Janusz Krzysztofik (1):
mtd: ams-delta: fix request_mem_region() failure

Jean-François Moine (1):
[media] gspca - sonixj: Fix a zero divide in isoc interrupt

Jeff Layton (1):
cifs: fix misspelling of "forcedirectio"

Jeff Moyer (1):
block: don't mark buffers beyond end of disk as mapped

Jesper Juhl (1):
dac960: Remove unused variables from DAC960_CreateProcEntries()

Jiri Kosina (1):
genirq: export handle_edge_irq() and irq_to_desc()

Johan Hedberg (1):
Bluetooth: mgmt: Fix device_connected sending order

John David Anglin (1):
[PARISC] fix crash in flush_icache_page_asm on PA1.1

Jonathan Brassow (1):
MD: Add del_timer_sync to mddev_suspend (fix nasty panic)

Jonathan Corbet (1):
[media] marvell-cam: fix an ARM build error

Josh Cartwright (1):
jffs2: Fix lock acquisition order bug in gc path

Jozsef Kadlecsik (1):
netfilter: ipset: fix hash size checking in kernel

Larry Finger (1):
rtlwifi: fix for race condition when firmware is cached

Laurent Pinchart (1):
[media] media: vb2-memops: Export vb2_get_vma symbol

Linus Torvalds (2):
proc: move fd symlink i_mode calculations into tid_fd_revalidate()
Linux 3.4

Luis Henriques (1):
[media] rc: Postpone ISR registration

Mark Brown (1):
ASoC: wm8994: Fix AIF2ADC power down

Mauro Carvalho Chehab (1):
[media] dvb_frontend: fix a regression with DVB-S zig-zag

Michael S. Tsirkin (1):
virtio_net: invoke softirqs after __napi_schedule

Mike Snitzer (1):
dm thin: fix table output when pool target disables discard
passdown internally

Ming Lei (1):
usbnet: fix skb traversing races during unlink(v2)

Namhyung Kim (1):
perf build-id: Fix filename size calculation

NeilBrown (2):
md/raid10: set dev_sectors properly when resizing devices in array.
md/raid10: fix transcription error in calc_sectors conversion.

Nicholas Bellinger (1):
target: Fix bug in handling of FILEIO + block_device resize ops

Paul Gortmaker (1):
frv: delete incorrect task prototypes causing compile fail

Paul Mackerras (1):
KVM: PPC: Book3S HV: Fix bug leading to deadlock in guest HPT updates

Peter De Schrijver (1):
ARM: tegra: Fix flow controller accesses

Rafael J. Wysocki (1):
ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold

Rajkumar Kasirajan (1):
drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01

Russell King (2):
Fix blkdev.h build errors when BLOCK=n
ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS

Srivatsa S. Bhat (3):
x86/microcode: Ensure that module is only loaded on supported Intel CPUs
parisc/CPU hotplug: Add missing call to notify_cpu_starting()
mn10300/CPU hotplug: Add missing call to notify_cpu_starting()

Stephane Eranian (1):
perf stat: Fix case where guest/host monitoring is not supported by kernel

Steven Rostedt (1):
tracing: Do not enable function event with enable

Subramaniam Chanderashekarapuram (1):
remoteproc: fix off-by-one bug in __rproc_free_vrings

Sylwester Nawrocki (3):
[media] V4L: Schedule V4L2_CID_HCENTER, V4L2_CID_VCENTER
controls for removal
[media] s5p-fimc: Fix locking in subdev set_crop op
[media] s5p-fimc: Correct memory allocation for VIDIOC_CREATE_BUFS

Takashi Iwai (1):
ALSA: hda/idt - Fix power-map for speaker-pins with some HP laptops

Tejun Heo (1):
block: fix buffer overflow when printing partition UUIDs

Tony Luck (1):
x86/mce: Only restart instruction after machine check recovery
if it is safe

Tushar Dave (1):
e1000: Prevent reset task killing itself.

Vinod Koul (2):
dmaengine: pl330: dont complete descriptor for cyclic dma
dmaengine: fix cyclic dma usage

Vitaly Andrianov (1):
ARM: 7418/1: LPAE: fix access flag setup in mem_type_table

Will Deacon (2):
ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path

Will Newton (1):
mtd: fix oops in dataflash driver

Willy Tarreau (1):
tcp: do_tcp_sendpages() must try to push data out on oom conditions

majianpeng (1):
slub: missing test for partial pages flush work in flush_all()


2012-05-21 11:38:06

by Josh Boyer

Subject: Re: Linux 3.4 released

On Sun, May 20, 2012 at 8:00 PM, Linus Torvalds
<[email protected]> wrote:
> I just pushed out the 3.4 release.

I see patch-3.4 on http://www.kernel.org/pub/linux/kernel/v3.x/ but
I can't find the full tarball release. Was that skipped or did I miss
it somewhere?

josh

2012-05-21 15:20:36

by Linus Torvalds

Subject: Re: Linux 3.4 released

On Mon, May 21, 2012 at 4:38 AM, Josh Boyer <[email protected]> wrote:
>
> I see patch-3.4 on http://www.kernel.org/pub/linux/kernel/v3.x/ but
> I can't find the full tarball release. Was that skipped or did I miss
> it somewhere?

My release-script generates both automatically, but it seems to not
have shown up. I might have screwed something up, or it might never
have made it due to the network trouble we had yesterday, and I didn't
check errors from the script.

Anyway, I re-did the tar-ball a few minutes ago, and now it seems to be there.

Sorry for the bother,

Linus

2012-05-21 16:41:05

by Josh Boyer

Subject: Re: Linux 3.4 released

On Mon, May 21, 2012 at 11:20 AM, Linus Torvalds
<[email protected]> wrote:
> On Mon, May 21, 2012 at 4:38 AM, Josh Boyer <[email protected]> wrote:
>>
>> I see patch-3.4 on http://www.kernel.org/pub/linux/kernel/v3.x/ but
>> I can't find the full tarball release. Was that skipped or did I miss
>> it somewhere?
>
> My release-script generates both automatically, but it seems to not
> have shown up. I might have screwed something up, or it might never
> have made it due to the network trouble we had yesterday, and I didn't
> check errors from the script.
>
> Anyway, I re-did the tar-ball a few minutes ago, and now it seems to be there.


Yep. Thanks!

josh

2012-05-21 18:13:40

by Tobias Klausmann

Subject: Re: Linux 3.4 released

Hello there, got a build error while compiling linux-3.4:

drivers/scsi/lpfc/lpfc_scsi.c: In function ‘lpfc_bg_setup_bpl’:
drivers/scsi/lpfc/lpfc_scsi.c:1900:11: error: unused variable ‘rc’
[-Werror=unused-variable]
drivers/scsi/lpfc/lpfc_scsi.c: In function ‘lpfc_bg_setup_bpl_prot’:
drivers/scsi/lpfc/lpfc_scsi.c:2037:11: error: unused variable ‘rc’
[-Werror=unused-variable]
drivers/scsi/lpfc/lpfc_scsi.c: In function ‘lpfc_bg_setup_sgl’:
drivers/scsi/lpfc/lpfc_scsi.c:2256:11: error: unused variable ‘rc’
[-Werror=unused-variable]
drivers/scsi/lpfc/lpfc_scsi.c: In function ‘lpfc_bg_setup_sgl_prot’:
drivers/scsi/lpfc/lpfc_scsi.c:2386:11: error: unused variable ‘rc’
[-Werror=unused-variable]
cc1: all warnings being treated as errors

This is caused by the args defined in the Makefile for the driver:
ccflags-y += -Werror
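
For illustration only: one common way this kind of warning arises is a local variable that is declared unconditionally but used only inside a conditionally compiled block. Whether that is what happens in lpfc_scsi.c is an assumption; nothing above confirms it. The sketch below uses hypothetical names (DEMO_DEBUG, demo_debug_check, demo_setup) and keeps the variable inside the block that uses it, so it should build cleanly under -Wall -Werror with or without the macro defined.

```c
#include <assert.h>

#ifdef DEMO_DEBUG
/* Hypothetical debug hook; only compiled when DEMO_DEBUG is defined. */
static int demo_debug_check(int count)
{
	return count < 0 ? -1 : 0;
}
#endif

/* Hypothetical setup routine: returns twice the entry count. */
int demo_setup(int count)
{
#ifdef DEMO_DEBUG
	/* rc lives inside the same #ifdef as its only use, so it is never
	 * left unused when the debug code is compiled out. */
	int rc = demo_debug_check(count);

	if (rc)
		return rc;
#endif
	assert(count >= 0);
	return count * 2;
}
```

Calling demo_setup(3) returns 6 in either configuration; the point is only that no declaration is left dangling when the debug block is compiled out.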

Thought this might be helpful

Greetings
Tobias Klausmann

2012-05-22 15:30:56

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Switching from self-compiled kernel 3.2.17 to a self-compiled kernel 3.4.0,
a notebook HP Pavilion dv7 gets hard locked with a kernel panic, when trying to
start a web-cam video viewer (guvcview) for the built-in USB web-cam.

Please find attached a (hand-typed) screen-shot of the text-console and the
kernel config.

By the way, thank you for all the great work on Linux.
--
Best regards,
Jörg-Volker.


Attachments:
panic-screen-3.4.0 (2.69 kB)
config-3.4.0 (62.58 kB)

2012-05-22 15:53:52

by Tejun Heo

Subject: Re: Linux 3.4 released

Hello,

On Tue, May 22, 2012 at 05:30:37PM +0200, Jörg-Volker Peetz wrote:
> Switching from self-compiled kernel 3.2.17 to a self-compiled kernel 3.4.0,
> a notebook HP Pavilion dv7 gets hard locked with a kernel panic, when trying to
> start a web-cam video viewer (guvcview) for the built-in USB web-cam.
>
> Please find attached a (hand-typed) screen-shot of the text-console and the
> kernel config.
>
> By the way, thank you for all the great work on Linux.
> --
> Best regards,
> Jörg-Volker.

> BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
...
> Code: 8b 7c 24 50 48 83 c4 58 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 0f 31 c0 48 89 fa 48 89 ce 40 80 e6 00 83 e1 04 48 0f 45 c6 <48> 8b 70 08 65 8b 3c 25 60 cc 00 00 e9 b9 fc ff ff 66 0f 1f 84
> RIP [<ffffffff8103ed46>] delayed_work_timer_fn+0x16/0x30

So, that looks like get_work_cwq() returning NULL and then
delayed_work_timer_fn() trying to dereference it. Either work item is
being corrupted (e.g. freed early) or somebody is mucking with the
work item embedded in a delayed work item.
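
The fault address of 0x8 fits that reading: loading a member through a NULL struct pointer faults at the member's offset. A minimal userspace sketch, assuming a mock two-pointer struct (mock_cwq, member_fault_address and null_wq_fault_address are hypothetical names, not the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock layout: two pointers, so the second sits at offset sizeof(void *). */
struct mock_cwq {
	void *gcwq;
	void *wq;
};

/* Address the CPU would touch when reading a member at `offset` through
 * a struct pointer whose value is `base`. */
uintptr_t member_fault_address(uintptr_t base, size_t offset)
{
	return base + offset;
}

/* Fault address for reading ->wq through a NULL mock_cwq pointer. */
uintptr_t null_wq_fault_address(void)
{
	return member_fault_address(0, offsetof(struct mock_cwq, wq));
}
```

With 8-byte pointers, as on x86-64, null_wq_fault_address() evaluates to 8, which is the address reported in the oops.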

Something like the following may reveal the offending work function.

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5abf42f..adc1057 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1101,6 +1101,10 @@ static void delayed_work_timer_fn(unsigned long __data)
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

+	if (!cwq)
+		printk("XXX delayed_work_timer_fn: NULL cwq, fn=%pf\n",
+		       dwork->work.func);
+
 	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
 }

2012-05-22 16:53:03

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Tejun Heo wrote, on 05/22/12 17:53:
> Hello,
>
> On Tue, May 22, 2012 at 05:30:37PM +0200, Jörg-Volker Peetz wrote:
>> Switching from self-compiled kernel 3.2.17 to a self-compiled kernel 3.4.0,
>> a notebook HP Pavilion dv7 gets hard locked with a kernel panic, when trying to
>> start a web-cam video viewer (guvcview) for the built-in USB web-cam.
>>
>> Please find attached a (hand-typed) screen-shot of the text-console and the
>> kernel config.
>>
>> By the way, thank you for all the great work on Linux.
>> --
>> Best regards,
>> Jörg-Volker.
>
>> BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
> ...
>> Code: 8b 7c 24 50 48 83 c4 58 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 0f 31 c0 48 89 fa 48 89 ce 40 80 e6 00 83 e1 04 48 0f 45 c6 <48> 8b 70 08 65 8b 3c 25 60 cc 00 00 e9 b9 fc ff ff 66 0f 1f 84
>> RIP [<ffffffff8103ed46>] delayed_work_timer_fn+0x16/0x30
>
> So, that looks like get_work_cwq() returning NULL and then
> delayed_work_timer_fn() trying to dereference it. Either work item is
> being corrupted (e.g. freed early) or somebody is mucking with the
> work item embedded in a delayed work item.
>
> Something like the following may reveal the offending work function.
>
> Thanks.
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 5abf42f..adc1057 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1101,6 +1101,10 @@ static void delayed_work_timer_fn(unsigned long __data)
> struct delayed_work *dwork = (struct delayed_work *)__data;
> struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
>
> + if (!cwq)
> + printk("XXX delayed_work_timer_fn: NULL cwq, fn=%pf\n",
> + dwork->work.func);
> +
> __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
> }
>

Hello,

I tried the above patch but was not able to see a line beginning with "XXX", not
on the text-console nor in any log-file. After the hard-lock, I can see only the
console-screen which now changed slightly:

BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8103ed60>] delayed_work_timer_fn+0x30/0x60
PGD 214fbc067 PUD 211c50067 PMD 0
Oops: 0000 [#1] SMP
CPU 1

...

Call Trace:
<IRQ>
[<ffffffff8103ed30>] ? __queue_work+0x320/0x320
[<ffffffff810342c6>] ? run_timer_softirq+0x106/0x220
[<ffffffff8105cc34>] ? tick_handle_oneshot_broadcast+0xb5/0xe0
[<ffffffff8102f19d>] ? __do_softirq+0x8d/0x110
[<ffffffff81069589>] ? handle_irq_event_percpu+0x79/0x140
[<ffffffff813dfa8c>] ? call_softirq+0x1c/0x26
[<ffffffff81003a8d>] ? do_softirq+0x4d/0x80
[<ffffffff8102f495>] ? irq_exit+0xa5/0xb0
[<ffffffff8100372b>] ? do_IRQ+0x5b/0xd0
[<ffffffff813de227>] ? common_interrupt+0x67/0x67
<EOI>
[<ffffffff81009e60>] ? default_idle+0x20/0x40
[<ffffffff81009ff8>] ? amd_e400_idle+0xa8/0xf0
[<ffffffff8100a7a6>] ? cpu_idle+0xb6/0xd0

...

After that I have to press the power button until the computer switches off.
What else could I do?
--
Best regards,
Jörg-Volker.

2012-05-22 17:03:28

by Tejun Heo

Subject: Re: Linux 3.4 released

On Tue, May 22, 2012 at 06:52:49PM +0200, Jörg-Volker Peetz wrote:
> I tried the above patch but was not able to see a line beginning with "XXX", not
> on the text-console nor in any log-file. After the hard-lock, I can see only the
> console-screen which now changed slightly:
>
> BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff8103ed60>] delayed_work_timer_fn+0x30/0x60

Oh, &cwq->wq is at offset 8 so cwq should have been -8. Maybe I'm
just confused. Can you please try the following instead?

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5abf42f..14babfe 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1096,10 +1096,16 @@ queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(queue_work_on);

+#include <linux/uaccess.h>
 static void delayed_work_timer_fn(unsigned long __data)
 {
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
+	unsigned long v;
+
+	if (probe_kernel_read(&v, &cwq->wq, sizeof(v)))
+		printk("XXX delayed_work_timer_fn: cwq %p, fn=%pf\n",
+		       cwq, dwork->work.func);

 	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
 }

2012-05-22 18:26:26

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Tejun Heo wrote, on 05/22/12 19:03:
> On Tue, May 22, 2012 at 06:52:49PM +0200, Jörg-Volker Peetz wrote:
>> I tried the above patch but was not able to see a line beginning with "XXX", not
>> on the text-console nor in any log-file. After the hard-lock, I can see only the
>> console-screen which now changed slightly:
>>
>> BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
>> IP: [<ffffffff8103ed60>] delayed_work_timer_fn+0x30/0x60
>
> Oh, &cwq->wq is at offset 8 so cwq should have been -8. Maybe I'm
> just confused. Can you please try the following instead?
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 5abf42f..14babfe 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1096,10 +1096,16 @@ queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
> }
> EXPORT_SYMBOL_GPL(queue_work_on);
>
> +#include <linux/uaccess.h>
> static void delayed_work_timer_fn(unsigned long __data)
> {
> struct delayed_work *dwork = (struct delayed_work *)__data;
> struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
> + unsigned long v;
> +
> + if (probe_kernel_read(&v, &cwq->wq, sizeof(v)))
> + printk("XXX delayed_work_timer_fn: cwq %p, fn=%pf\n",
> + cwq, dwork->work.func);
>
> __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
> }

Also with this second patch I wasn't able to see any output beginning with "XXX
delayed_work_timer_fn:". It should appear in the system log or on the text-console?

The screen dump starts with:

BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff........>] delayed_work_timer_fn+0x31/0x70

I didn't find the time to type more. Or did I miss the essential?
--
Best regards,
Jörg-Volker.

2012-05-22 18:35:53

by Tejun Heo

Subject: Re: Linux 3.4 released

Hello,

On Tue, May 22, 2012 at 11:26 AM, Jörg-Volker Peetz <[email protected]> wrote:
> Also with this second patch I wasn't able to see any output beginning with "XXX
> delayed_work_timer_fn:". It should appear in the system log or on the text-console?

Hmmm... it should appear on the console but your printk setting might
be different. Can you please prepend KERN_CRIT before the printk
format string? So that it looks like printk(KERN_CRIT "XXX: ...").
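
To illustrate why the level matters, here is a rough userspace model of the console filter: a message reaches the console only if its level is numerically lower than the console log level. It assumes the 3.4-era "<N>" string prefix for levels; DEMO_KERN_CRIT, msg_level and reaches_console are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_KERN_CRIT "<2>"	/* stand-in mirroring KERN_CRIT's level, 2 */

/* Return the level encoded as a "<N>" prefix, or default_level if the
 * message carries no prefix. */
int msg_level(const char *msg, int default_level)
{
	assert(msg != NULL);
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return default_level;
}

/* Non-zero if the message would pass the console filter: lower numbers
 * are more severe, and only levels below console_loglevel are shown. */
int reaches_console(const char *msg, int default_level, int console_loglevel)
{
	return msg_level(msg, default_level) < console_loglevel;
}
```

In this model, with a default message level of 4 and a console log level of 4, an unprefixed message is filtered out while one prefixed with DEMO_KERN_CRIT gets through.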

Thanks.

--
tejun

2012-05-22 19:50:28

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Tejun Heo wrote, on 05/22/12 20:35:
> Hello,
>
> On Tue, May 22, 2012 at 11:26 AM, Jörg-Volker Peetz <[email protected]> wrote:
>> Also with this second patch I wasn't able to see any output beginning with "XXX
>> delayed_work_timer_fn:". It should appear in the system log or on the text-console?
>
> Hmmm... it should appear on the console but your printk setting might
> be different. Can you please prepend KERN_CRIT before the printk
> format string? So that it looks like printk(KERN_CRIT "XXX: ...").
>
> Thanks.
>

Hello,

no, changing to printk(KERN_CRIT "XXX: ...", ...) in kernel/workqueue.c didn't
print anything I could read. Would it appear before the "panic screen"? But I'm
not able to scroll back.
The Panic screen starts with

BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8103eef1>] delayed_work_timer_fn+0x13/0x70

and the Call Trace with

Call Trace:
<IRQ>
[<ffffffff81056fbf>] ? ktime_get+0x5f/0xf0
[<ffffffff810342c6>] ? run_timer_softirq+0x106/0x220
[<ffffffff8105cc44>] ? tick_handle_oneshot_broadcast+0xc4/0xe0

Do I need special settings in the kernel config?

--
Best regards,
Jörg-Volker.

2012-05-23 06:34:33

by Yong Zhang

Subject: Re: Linux 3.4 released

On Tue, May 22, 2012 at 09:50:10PM +0200, Jörg-Volker Peetz wrote:
> Tejun Heo wrote, on 05/22/12 20:35:
> > Hello,
> >
> > On Tue, May 22, 2012 at 11:26 AM, Jörg-Volker Peetz <[email protected]> wrote:
> >> Also with this second patch I wasn't able to see any output beginning with "XXX
> >> delayed_work_timer_fn:". It should appear in the system log or on the text-console?
> >
> > Hmmm... it should appear on the console but your printk setting might
> > be different. Can you please prepend KERN_CRIT before the printk
> > format string? So that it looks like printk(KERN_CRIT "XXX: ...").
> >
> > Thanks.
> >
>
> Hello,
>
> no, changing to printk(KERN_CRIT "XXX: ...", ...) in kernel/workqueue.c didn't
> print anything I could read. Would it appear before the "panic screen"? But I'm
> not able to scroll back.
> The Panic screen starts with
>
> BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff8103eef1>] delayed_work_timer_fn+0x13/0x70
>
> and the Call Trace with
>
> Call Trace:
> <IRQ>
> [<ffffffff81056fbf>] ? ktime_get+0x5f/0xf0
> [<ffffffff810342c6>] ? run_timer_softirq+0x106/0x220
> [<ffffffff8105cc44>] ? tick_handle_oneshot_broadcast+0xc4/0xe0
>
> Do I need special settings in the kernel config?

Maybe you can enable CONFIG_DEBUG_OBJECTS_WORK and give another try.

Thanks,
Yong

>
> --
> Best regards,
> Jörg-Volker.

--
Only stand for myself

2012-05-23 12:33:23

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Yong Zhang wrote, on 05/23/12 08:34:
> On Tue, May 22, 2012 at 09:50:10PM +0200, Jörg-Volker Peetz wrote:
>> Tejun Heo wrote, on 05/22/12 20:35:
>>> Hello,
>>>
>>> On Tue, May 22, 2012 at 11:26 AM, Jörg-Volker Peetz <[email protected]> wrote:
>>>> Also with this second patch I wasn't able to see any output beginning with "XXX
>>>> delayed_work_timer_fn:". It should appear in the system log or on the text-console?
>>>
>>> Hmmm... it should appear on the console but your printk setting might
>>> be different. Can you please prepend KERN_CRIT before the printk
>>> format string? So that it looks like printk(KERN_CRIT "XXX: ...").
>>>
>>> Thanks.
>>>
>>
>> Hello,
>>
>> no, changing to printk(KERN_CRIT "XXX: ...", ...) in kernel/workqueue.c didn't
>> print anything I could read. Would it appear before the "panic screen"? But I'm
>> not able to scroll back.
>> The Panic screen starts with
>>
>> BUG: Unable to handle kernel NULL pointer dereference at 0000000000000008
>> IP: [<ffffffff8103eef1>] delayed_work_timer_fn+0x13/0x70
>>
>> and the Call Trace with
>>
>> Call Trace:
>> <IRQ>
>> [<ffffffff81056fbf>] ? ktime_get+0x5f/0xf0
>> [<ffffffff810342c6>] ? run_timer_softirq+0x106/0x220
>> [<ffffffff8105cc44>] ? tick_handle_oneshot_broadcast+0xc4/0xe0
>>
>> Do I need special settings in the kernel config?
>
> Maybe you can enable CONFIG_DEBUG_OBJECTS_WORK and give another try.
>
> Thanks,
> Yong
>

Hello,

With Tejun Heo's patch applied and CONFIG_DEBUG_OBJECTS_WORK enabled via "make
nconfig", there is still no output line beginning with "XXX".

--
Best regards,
Jörg-Volker.

2012-05-23 18:25:04

by Tejun Heo

Subject: Re: Linux 3.4 released

Hello, Jörg-Volker.

Please always use reply-to-all.

On Tue, May 22, 2012 at 09:50:10PM +0200, Jörg-Volker Peetz wrote:
> no, changing to printk(KERN_CRIT "XXX: ...", ...) in kernel/workqueue.c didn't
> print anything I could read. Would it appear before the "panic screen"? But I'm
> not able to scroll back.
> The Panic screen starts with

It should appear right above BUG:. Can you please try the following
instead?

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5abf42f..57c33ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1096,10 +1096,18 @@ queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(queue_work_on);

+#include <linux/uaccess.h>
 static void delayed_work_timer_fn(unsigned long __data)
 {
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
+	unsigned long v;
+
+	if (probe_kernel_read(&v, &cwq->wq, sizeof(v))) {
+		printk(KERN_CRIT "XXX delayed_work_timer_fn: cwq %p, fn=%pf\n",
+		       cwq, dwork->work.func);
+		return;
+	}

 	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
 }

2012-05-23 19:57:01

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Tejun Heo wrote, on 05/23/12 20:24:
> Hello, Jörg-Volker.
>
> Please always use reply-to-all.
>
> On Tue, May 22, 2012 at 09:50:10PM +0200, Jörg-Volker Peetz wrote:
>> no, changing to printk(KERN_CRIT "XXX: ...", ...) in kernel/workqueue.c didn't
>> print anything I could read. Would it appear before the "panic screen"? But I'm
>> not able to scroll back.
>> The Panic screen starts with
>
> It should appear right above BUG:. Can you please try the following
> instead?
>
> Thanks.
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 5abf42f..57c33ef 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1096,10 +1096,18 @@ queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
> }
> EXPORT_SYMBOL_GPL(queue_work_on);
>
> +#include <linux/uaccess.h>
> static void delayed_work_timer_fn(unsigned long __data)
> {
> struct delayed_work *dwork = (struct delayed_work *)__data;
> struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
> + unsigned long v;
> +
> + if (probe_kernel_read(&v, &cwq->wq, sizeof(v))) {
> + printk(KERN_CRIT "XXX delayed_work_timer_fn: cwq %p, fn=%pf\n",
> + cwq, dwork->work.func);
> + return;
> + }
>
> __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
> }

Hello Tejun,

thank you for bearing with me. I applied this patch. When starting "guvcview"
now, the computer seems to work normally, i.e. this program shows the view from
the built-in USB web-cam on the screen. In the kernel log file it says:

May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
(null), fn=hdmi_repoll_eld

(without line-break).

By the way, don't know if this is related, I have a phenomenon with a spurious
interrupt with every linux version I've used before on this notebook. Half a
minute after starting the system the computer produces approx. 220 lines like

... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503

Now with 3.4.0, I see an additional message right before (the minute before) the
"XXX ..." line:

...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
last cmd=0x003f0900

--
Best regards,
Jörg-Volker.

2012-05-23 20:27:05

by Tejun Heo

Subject: Re: Linux 3.4 released

Cc'ing Takashi. Hi!

On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
> May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
> (null), fn=hdmi_repoll_eld

So, we have the winner.

Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
workqueue code dereference %NULL pointer. It *looks* like something
is corrupting the work item while it's queued. It could be a
workqueue bug but I don't think that's likely - the code has been
stable for quite some time now. I glanced through the code and
nothing stands out. Does something ring a bell?

> (without line-break).
>
> By the way, don't know if this is related, I have a phenomenon with a spurious
> interrupt with every linux version I've used before on this notebook. Half a
> minute after starting the system the computer produces approx. 220 lines like
>
> ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
>
> Now with 3.4.0, I see an additional message right before (the minute before) the
> "XXX ..." line:
>
> ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
> last cmd=0x003f0900

These too seem to be for you, Takashi. :)

Thanks.

--
tejun

2012-05-25 07:25:15

by Takashi Iwai

Subject: Re: Linux 3.4 released

At Wed, 23 May 2012 13:26:57 -0700,
Tejun Heo wrote:
>
> Cc'ing Takashi. Hi!

Also Cc'ed Fengguang, who worked on ELD stuff.

> On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
> > May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
> > (null), fn=hdmi_repoll_eld
>
> So, we have the winner.
>
> Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
> workqueue code dereference %NULL pointer. It *looks* like something
> is corrupting the work item while it's queued. It could be a
> workqueue bug but I don't think that's likely - the code has been
> stable for quite some time now. I glanced through the code and
> nothing stands out. Does something ring a bell?

I also don't know of this problem. My initial thought was that the
work struct placed right after sink_eld in struct hdmi_spec_per_pin is
overwritten wrongly by reading some ELD data. But I failed to spot
out the bug...
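
That hypothesis can be sketched in userspace with hypothetical names (demo_per_pin, demo_copy_eld, DEMO_ELD_SIZE; the real struct layout is not shown in this thread): when a fixed-size buffer sits directly before another member, an unclamped copy into the buffer would run into that member, while a clamped copy leaves it intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ELD_SIZE 16	/* arbitrary size chosen for the sketch */

/* Mock layout: a data buffer followed directly by work-item state. */
struct demo_per_pin {
	unsigned char eld[DEMO_ELD_SIZE];
	void *work_data;
};

/* Copy at most sizeof(pin->eld) bytes, so the member that follows the
 * buffer cannot be overwritten. Returns the number of bytes copied. */
size_t demo_copy_eld(struct demo_per_pin *pin, const unsigned char *src,
		     size_t len)
{
	assert(pin != NULL && src != NULL);
	if (len > sizeof(pin->eld))
		len = sizeof(pin->eld);
	memcpy(pin->eld, src, len);
	return len;
}
```

If the length check were missing and the source were longer than the buffer, the copy would spill into work_data, which is the kind of corruption suspected here. This only illustrates the idea; it does not show that the driver actually does this.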

Reading back through the thread, the problem seems triggered via usb
video cam. I wonder how this is connected to the HDMI audio.

To get things straight: does this bug happen even without HDMI, DP or
DVI cable plugged, i.e. only with the laptop without connecting to the
external digital output?


> > (without line-break).
> >
> > By the way, don't know if this is related, I have a phenomenon with a spurious
> > interrupt with every linux version I've used before on this notebook. Half a
> > minute after starting the system the computer produces approx. 220 lines like
> >
> > ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
> >
> > Now with 3.4.0, I see an additional message right before (the minute before) the
> > "XXX ..." line:
> >
> > ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
> > last cmd=0x003f0900
>
> These too seem to be for you, Takashi. :)

This means essentially the codec communication got stalled. This is a
bad signal. It happens often with a wrong HD-audio verb, but often
with a bad IRQ, whatever.

I'd need alsa-info.sh output (run with --no-upload option) for further
analysis.


thanks,

Takashi

2012-05-25 15:33:44

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Hello,

Takashi Iwai wrote, on 05/25/12 09:25:
> At Wed, 23 May 2012 13:26:57 -0700,
> Tejun Heo wrote:
>>
>> Cc'ing Takashi. Hi!
>
> Also Cc'ed Fengguang, who worked on ELD stuff.
>
>> On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
>>> May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
>>> (null), fn=hdmi_repoll_eld
>>
>> So, we have the winner.
>>
>> Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
>> workqueue code dereference %NULL pointer. It *looks* like something
>> is corrupting the work item while it's queued. It could be a
>> workqueue bug but I don't think that's likely - the code has been
>> stable for quite some time now. I glanced through the code and
>> nothing stands out. Does something ring a bell?
>
> I also don't know of this problem. My initial thought was that the
> work struct placed right after sink_eld in struct hdmi_spec_per_pin is
> overwritten wrongly by reading some ELD data. But I failed to spot
> out the bug...
>
> Reading back through the thread, the problem seems triggered via usb
> video cam. I wonder how this is connected to the HDMI audio.
>
> To get things straight: does this bug happen even without HDMI, DP or
> DVI cable plugged, i.e. only with the laptop without connecting to the
> external digital output?
>
Yes, it happens without any HDMI cable plugged in. The notebook is only
connected to an Ethernet cable and the power cable. I'll append /var/log/dmesg;
it also contains the kernel command line with "radeon.audio=1".

The computer has two graphics chips:
ATI Mobility Radeon HD 4200 integrated graphics (non-free firmware R600_rlc.bin)
ATI Mobility Radeon HD 5470 graphics (512MB) (non-free firmware CEDAR_*.bin)
During booting, the discrete GPU is switched off using vga switcheroo:

$ mount -t debugfs none /sys/kernel/debug
$ echo -n OFF > /sys/kernel/debug/vgaswitcheroo/switch
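
The two commands above assume that debugfs is not mounted yet and that the
switch file exists. A minimal sketch of a more defensive variant follows; the
function names debugfs_mounted and switcheroo_off are made up for illustration,
and the paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any tool).

# Succeeds if the given mounts table (normally /proc/mounts) lists a
# filesystem of type debugfs.
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "$1"
}

# Prints the current state of the given switch file, then writes OFF to it.
# Fails early if the file is missing or not writable.
switcheroo_off() {
    switch_file="$1"
    if [ ! -w "$switch_file" ]; then
        echo "cannot write $switch_file (debugfs mounted? running as root?)" >&2
        return 1
    fi
    cat "$switch_file"
    printf 'OFF' > "$switch_file"
}
```

As root, this would be used as
"debugfs_mounted /proc/mounts || mount -t debugfs none /sys/kernel/debug"
followed by "switcheroo_off /sys/kernel/debug/vgaswitcheroo/switch".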

For the sound kernel module the following options are set in
/etc/modprobe.d/alsa-base.conf:

options snd-hda-intel model=hp-dv7-4000 enable_msi=1
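
Whether these options actually took effect can be checked once the module is
loaded. A minimal sketch, assuming the driver exposes its parameters under
/sys/module/snd_hda_intel/parameters (sysfs uses underscores where modprobe.d
accepts dashes); show_module_params is a made-up helper name, and the exact
formatting of array parameters such as "model" may differ.

```shell
#!/bin/sh
# Hypothetical helper: print every readable parameter file in the given
# directory as name=value, one per line.
show_module_params() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir (module not loaded?)" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -r "$f" ] || continue          # skips write-only or missing entries
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Called as "show_module_params /sys/module/snd_hda_intel/parameters", the lines
for enable_msi and model should then reflect the values from alsa-base.conf.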

>
>>> (without line-break).
>>>
>>> By the way, don't know if this is related, I have a phenomenon with a spurious
>>> interrupt with every linux version I've used before on this notebook. Half a
>>> minute after starting the system the computer produces approx. 220 lines like
>>>
>>> ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
>>>
>>> Now with 3.4.0, I see an additional message right before (the minute before) the
>>> "XXX ..." line:
>>>
>>> ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
>>> last cmd=0x003f0900
>>
>> These too seem to be for you, Takashi. :)
>
> This means essentially the codec communication got stalled. This is a
> bad signal. It happens often with a wrong HD-audio verb, but often
> with a bad IRQ, whatever.
>
> I'd need alsa-info.sh output (run with --no-upload option) for further
> analysis.
>
>
> thanks,
>
> Takashi

My first attempt to run the alsa-info.sh script with the plain 3.4 kernel
produced the same kernel oops, freezing the notebook (and /tmp is mounted on
tmpfs). Therefore I applied the patch from Tejun to produce a usable output,
which I attach as well. As you will notice, it contains the line beginning with
"XXX" due to Tejun's patch.
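
With /tmp on tmpfs, a report written there does not survive a freeze. A minimal
sketch that keeps the report on persistent storage instead; it assumes that the
script version at hand accepts an "--output FILE" option next to "--no-upload"
(worth checking with "--help"), and collect_alsa_info is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical wrapper: run alsa-info.sh without uploading and keep the
# report on persistent storage so that it survives a later freeze.
# Assumption: the script accepts "--output FILE" together with "--no-upload".
collect_alsa_info() {
    script="$1"     # path to alsa-info.sh
    outfile="$2"    # destination outside of tmpfs, e.g. in the home directory
    bash "$script" --no-upload --output "$outfile" || return 1
    sync            # flush the report to disk before anything else happens
}
```

For example: collect_alsa_info ./alsa-info.sh "$HOME/alsa-info.txt"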

Thank you all for your help.
--
Best regards,
Jörg-Volker.


Attachments:
alsa-info-hp-pavilion-dv7.txt (49.00 kB)
dmesg_hp-pavilion-dv7-linux-3.4.0 (42.26 kB)

2012-05-25 16:06:15

by Takashi Iwai

Subject: Re: Linux 3.4 released

At Fri, 25 May 2012 17:33:11 +0200,
Jörg-Volker Peetz wrote:
>
> Hello,
>
> Takashi Iwai wrote, on 05/25/12 09:25:
> > At Wed, 23 May 2012 13:26:57 -0700,
> > Tejun Heo wrote:
> >>
> >> Cc'ing Takashi. Hi!
> >
> > Also Cc'ed Fengguang, who worked on ELD stuff.
> >
> >> On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
> >>> May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
> >>> (null), fn=hdmi_repoll_eld
> >>
> >> So, we have the winner.
> >>
> >> Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
> >> workqueue code dereference %NULL pointer. It *looks* like something
> >> is corrupting the work item while it's queued. It could be a
> >> workqueue bug but I don't think that's likely - the code has been
> >> stable for quite some time now. I glanced through the code and
> >> nothing stands out. Does something ring a bell?
> >
> > I also don't know of this problem. My initial thought was that the
> > work struct placed right after sink_eld in struct hdmi_spec_per_pin is
> > overwritten wrongly by reading some ELD data. But I failed to spot
> > out the bug...
> >
> > Reading back through the thread, the problem seems triggered via usb
> > video cam. I wonder how this is connected to the HDMI audio.
> >
> > To get things straight: does this bug happen even without HDMI, DP or
> > DVI cable plugged, i.e. only with the laptop without connecting to the
> > external digital output?
> >
> yes it happens without any HDMI cable plugged. The notebook is only connected to
> an ethernet cable and the power cable. I'll append /var/log/dmesg, it also
> contains the kernel command line with "radeon.audio=1".
>
> The computer has two graphic chips:
> ATI Mobility Radeon HD 4200 integrated graphics (non-free firmware R600_rlc.bin)
> ATI Mobility Radeon HD 5470 graphic (512MB) (non-free firmware CEDAR_*.bin)
> During booting, the discrete GPU is switched off using vga switcheroo:
>
> $ mount -t debugfs none /sys/kernel/debug
> $ echo -n OFF > /sys/kernel/debug/vgaswitcheroo/switch

This explains the codec stall, at least. Disabling the D-GPU also
disables the HD-audio controller. Once it's disabled, even accessing
the PCI device may trigger an Oops. It's a known problem.

Support for vga-switcheroo in HD-audio was added recently, and I
sent a pull request to Linus today. Try the latest Linus tree and
pull the hda-switcheroo tag of the sound git tree onto it:
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/hda-switcheroo

I'm not sure whether this is related to the workq Oops, though.
At least, you can try without disabling the D-GPU to check whether you
see the same workq problem.


> For the sound kernel module the following options are set in
> /etc/modprobe.d/alsa-base.conf:
>
> options snd-hda-intel model=hp-dv7-4000 enable_msi=1
>
> >
> >>> (without line-break).
> >>>
> >>> By the way, don't know if this is related, I have a phenomenon with a spurious
> >>> interrupt with every linux version I've used before on this notebook. Half a
> >>> minute after starting the system the computer produces approx. 220 lines like
> >>>
> >>> ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
> >>>
> >>> Now with 3.4.0, I see an additional message right before (the minute before) the
> >>> "XXX ..." line:
> >>>
> >>> ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
> >>> last cmd=0x003f0900
> >>
> >> These too seem to be for you, Takashi. :)
> >
> > This means essentially the codec communication got stalled. This is a
> > bad signal. It happens often with a wrong HD-audio verb, but often
> > with a bad IRQ, whatever.
> >
> > I'd need alsa-info.sh output (run with --no-upload option) for further
> > analysis.
> >
> >
> > thanks,
> >
> > Takashi
>
> My first try to run the alsa-info.sh script with the plain 3.4 kernel produced
> the same kernel oops freezing the notebook (and /tmp is mounted on tmpfs).
> Therefore I applied the patch from Tejun to produce a usable output.
> I attach it also. As you will notice, it contains the line beginning with "XXX"
> due to Tejun's patch.

Run alsa-info.sh without disabling the D-GPU if you run it on a 3.4 or
earlier kernel.


thanks,

Takashi

2012-05-25 18:42:05

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Takashi Iwai wrote, on 05/25/12 18:06:
> At Fri, 25 May 2012 17:33:11 +0200,
> Jörg-Volker Peetz wrote:
>>
>> Hello,
>>
>> Takashi Iwai wrote, on 05/25/12 09:25:
>>> At Wed, 23 May 2012 13:26:57 -0700,
>>> Tejun Heo wrote:
>>>>
>>>> Cc'ing Takashi. Hi!
>>>
>>> Also Cc'ed Fengguang, who worked on ELD stuff.
>>>
>>>> On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
>>>>> May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
>>>>> (null), fn=hdmi_repoll_eld
>>>>
>>>> So, we have the winner.
>>>>
>>>> Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
>>>> workqueue code dereference %NULL pointer. It *looks* like something
>>>> is corrupting the work item while it's queued. It could be a
>>>> workqueue bug but I don't think that's likely - the code has been
>>>> stable for quite some time now. I glanced through the code and
>>>> nothing stands out. Does something ring a bell?
>>>
>>> I also don't know of this problem. My initial thought was that the
>>> work struct placed right after sink_eld in struct hdmi_spec_per_pin is
>>> overwritten wrongly by reading some ELD data. But I failed to spot
>>> out the bug...
>>>
>>> Reading back through the thread, the problem seems triggered via usb
>>> video cam. I wonder how this is connected to the HDMI audio.
>>>
>>> To get things straight: does this bug happen even without HDMI, DP or
>>> DVI cable plugged, i.e. only with the laptop without connecting to the
>>> external digital output?
>>>
>> yes it happens without any HDMI cable plugged. The notebook is only connected to
>> an ethernet cable and the power cable. I'll append /var/log/dmesg, it also
>> contains the kernel command line with "radeon.audio=1".
>>
>> The computer has two graphic chips:
>> ATI Mobility Radeon HD 4200 integrated graphics (non-free firmware R600_rlc.bin)
>> ATI Mobility Radeon HD 5470 graphic (512MB) (non-free firmware CEDAR_*.bin)
>> During booting, the discrete GPU is switched off using vga switcheroo:
>>
>> $ mount -t debugfs none /sys/kernel/debug
>> $ echo -n OFF > /sys/kernel/debug/vgaswitcheroo/switch
>
> This explains the codec stall, at least. Disabling the D-GPU also
> disables the HD-audio controller. Once when it's disabled, even
> accessing the PCI may trigger an Oops. It's a known problem.
>
> The support of vga-switcheroo for HD-audio was recently added, and I
> sent a pull request to Linus today. Try the latest Linus tree and
> pull sound git tree hda-switcheroo tag onto it:
> git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/hda-switcheroo
>

I will try that and report the result. Is it OK if I use Tejun's patch on
top of this in order to avoid a freeze?

> I'm not sure whether this is related with the workq Oops, though.
> At least, you can try without disabling D-GPU to check whether you see
> the same workq problem.
>
Simply switching on the discrete GPU with

$ echo -n ON > /sys/kernel/debug/vgaswitcheroo/switch

after it has been switched off results in the same oops and the output of
alsa-info.sh differs only in a few lines (see the attached diff-file).

>
>> For the sound kernel module the following options are set in
>> /etc/modprobe.d/alsa-base.conf:
>>
>> options snd-hda-intel model=hp-dv7-4000 enable_msi=1
>>
>>>
>>>>> (without line-break).
>>>>>
>>>>> By the way, don't know if this is related, I have a phenomenon with a spurious
>>>>> interrupt with every linux version I've used before on this notebook. Half a
>>>>> minute after starting the system the computer produces approx. 220 lines like
>>>>>
>>>>> ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
>>>>>
>>>>> Now with 3.4.0, I see an additional message right before (the minute before) the
>>>>> "XXX ..." line:
>>>>>
>>>>> ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
>>>>> last cmd=0x003f0900
>>>>
>>>> These too seem to be for you, Takashi. :)
>>>
>>> This means essentially the codec communication got stalled. This is a
>>> bad signal. It happens often with a wrong HD-audio verb, but often
>>> with a bad IRQ, whatever.
>>>
>>> I'd need alsa-info.sh output (run with --no-upload option) for further
>>> analysis.
>>>
>>>
>>> thanks,
>>>
>>> Takashi
>>
>> My first try to run the alsa-info.sh script with the plain 3.4 kernel produced
>> the same kernel oops freezing the notebook (and /tmp is mounted on tmpfs).
>> Therefore I applied the patch from Tejun to produce a usable output.
>> I attach it also. As you will notice, it contains the line beginning with "XXX"
>> due to Tejun's patch.
>
> Get alsa-info.sh without disabling D-GPU if you run it on 3.4 or
> earlier kernel.
>
For the case without mounting debugfs and, thus, with both GPUs active, the
output of alsa-info.sh is also attached. It doesn't trigger the oops, and the
viewer for the built-in USB camera also works without triggering the oops.
>
> thanks,
>
> Takashi
--
Best regards,
Jörg-Volker.


Attachments:
alsa-info.txt-ddis-switched-on.diff (0.99 kB)
alsa-info.txt-both-gpus (23.30 kB)

2012-05-27 13:04:00

by Jörg-Volker Peetz

Subject: Re: Linux 3.4 released

Hello,

Meanwhile I tried the vga-switcheroo support for HD-audio from your git tree,
without success.

Since I'm not familiar with git, I'll describe what I did:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git reset --keep v3.4
$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/hda-switcheroo

This resulted in a merge conflict in the file
drivers/gpu/drm/i915/intel_ringbuffer.c, which I didn't resolve since both of my
GPUs are AMD Radeon chips.

$ cp -p /boot/config-3.4.0 .config
$ make oldconfig
$ make
$ make modules_install

After installing the kernel and system map (I don't use an initrd), I restarted
the notebook. Before starting X, when I call the script which should mount
debugfs and switch off the discrete GPU, the system freezes.
I haven't written down all of the text on the console. It seems to be a sequence
of bugs, but I couldn't scroll back. The last stack trace begins with

... wq_worker_sleeping+0x8/0x80
... __schedule+0x35b/0x510
... do_exit+0x542/0x830

--
Best regards,
Jörg-Volker.

2012-05-28 05:16:53

by Takashi Iwai

Subject: Re: Linux 3.4 released

At Sun, 27 May 2012 15:03:41 +0200,
Jörg-Volker Peetz wrote:
>
> Hello,
>
> meanwhile I tried the support of vga-switcheroo for HD-audio from your git tree
> without success.
>
> Since I'm not familiar with git I'll describe what I did:
>
> $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> $ cd linux
> $ git reset --keep v3.4
> $ git pull git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
> tags/hda-switcheroo

Pulling the sound git tree branch onto 3.4 (or onto a tree without a proper
update of the DRM tree) is known to be broken, because such a tree misses
a few fix commits in the VGA switcheroo part.

Linus has already merged the HD-audio vga-switcheroo support into his tree.
So just pull Linus' tree and use it as is.
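
In git terms, using the mainline tree as is could look like the sketch below.
The repository URL is the one from the quoted commands; the run and
fetch_mainline helpers and the DRY_RUN variable are made up for illustration,
and "git -C" needs a reasonably recent git.

```shell
#!/bin/sh
# Hypothetical helpers; set DRY_RUN=1 to print the commands instead of
# running them.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone Linus' tree into the given directory, or fast-forward an existing
# clean checkout of it. Nothing from the sound tree is pulled on top.
fetch_mainline() {
    dir="$1"
    url=git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    if [ -d "$dir/.git" ]; then
        run git -C "$dir" pull --ff-only "$url" master
    else
        run git clone "$url" "$dir"
    fi
}
```

A checkout that still carries a half-finished merge will refuse the
fast-forward; in that case a fresh clone into a new directory is the simpler
route.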

If it still causes the problem, get the debugfs output of vga switcheroo
and the kernel messages before switching off the D-GPU.
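
Collecting that information could look like the following sketch; the
capture_state name and the output file names are made up, and the switch path
is the one used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: save the vga_switcheroo state and the kernel log
# to a directory on persistent storage before the D-GPU is switched off.
capture_state() {
    switch_file="$1"   # e.g. /sys/kernel/debug/vgaswitcheroo/switch
    outdir="$2"        # should not live on tmpfs
    mkdir -p "$outdir" || return 1
    cat "$switch_file" > "$outdir/vgaswitcheroo.txt" || return 1
    # The kernel log may be unreadable for unprivileged users; keep going.
    dmesg > "$outdir/dmesg-before.txt" 2>/dev/null ||
        echo "warning: could not read the kernel log" >&2
    sync               # make sure both files reach the disk before a freeze
}
```

For example, as root:
capture_state /sys/kernel/debug/vgaswitcheroo/switch "$HOME/switcheroo-report"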


Takashi

2012-06-06 11:12:14

by Jörg-Volker Peetz

Subject: freeze hard-lock with 3.5-rc1 with dynpm for radeon GPU [was Re: Linux 3.4 released]

Takashi Iwai wrote, on 05/28/12 07:16:
> At Sun, 27 May 2012 15:03:41 +0200,
> Jörg-Volker Peetz wrote:
>>
>> Hello,
>>
>> meanwhile I tried the support of vga-switcheroo for HD-audio from your git tree
>> without success.
>>
>> Since I'm not familiar with git I'll describe what I did:
>>
>> $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> $ cd linux
>> $ git reset --keep v3.4
>> $ git pull git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
>> tags/hda-switcheroo
>
> Pulling sound git tree branch onto 3.4 (or a tree without a proper
> update of DRM tree) is known to be broken because it misses a few fix
> commits in the VGA switcheroo part.
>
> Linus already merged the HD-audio vga-switcheroo support in his tree.
> So, just pull Linus tree and use it as is.
>
> If it still causes the problem, get debugfs output of vga switcheroo
> and the kernel messages before switching off D-GPU.
>
>
> Takashi

Hello Takashi, hello David,

I have managed to test 3.5-rc1 on my HP Pavilion dv7 notebook with two AMD GPUs.
The problem I had with the USB webcam is fixed:
using vgaswitcheroo to switch off the discrete GPU and the discrete HDMI audio via
echo -n OFF > /sys/kernel/debug/vgaswitcheroo/switch
works, and after starting X I'm able to activate the built-in USB webcam with
guvcview. Thank you very much.

But now I have another regression: trying to use the power management method
"dynpm" via
echo -n dynpm > /sys/class/drm/card0/device/power_method
even before mounting debugfs makes the machine freeze with at least two call
traces on the console screen. Only the last two lines of one call trace and the
whole of the last one are visible on the console (the machine is frozen). It
says (typed by hand and therefore not complete):

...
---[ end trace a926a4156be75305 ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff81045097>] kthread_data+0x7/0x10
PGD 1585067 PUD 1586067 PMD 0
Oops: 0000 [#2] SMP
CPU 1

...

Call Trace:
[< ... >] ? wq_worker_sleeping+0x8/0x80
... ? __schedule+0x363/0x520
... ? do_exit+0x552/0x850
... ? oops_end+0x67/0x90
... ? no_context+0x24e/0x279
... ? do_page_fault+0x2bb/0x460

...
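
Since the write to power_method locks up the machine, it may help to record the
previous setting first. A minimal sketch; set_power_method is a made-up helper,
and the two accepted values, dynpm and profile, are what the radeon driver is
understood to support.

```shell
#!/bin/sh
# Hypothetical helper: print the current radeon power method, then write
# the requested one to the given sysfs file.
set_power_method() {
    pm_file="$1"    # e.g. /sys/class/drm/card0/device/power_method
    new="$2"
    case "$new" in
        dynpm|profile) ;;
        *) echo "unsupported power method: $new" >&2; return 1 ;;
    esac
    cat "$pm_file" || return 1   # show the value that is about to be replaced
    sync                         # flush pending writes in case of a freeze
    printf '%s' "$new" > "$pm_file"
}
```

Redirecting the output to a file on disk, for example with
"set_power_method /sys/class/drm/card0/device/power_method dynpm >> pm.log",
keeps the old value available after a reboot.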

What to do next?
--
Best regards,
Jörg-Volker.




2012-08-09 06:48:07

by Jörg-Volker Peetz

Subject: 3.5 kernel NULL pointer dereference net_tx_action

Dear maintainers,

With kernel 3.5 on Debian x86_64 and wpa_supplicant 1.0 on an MSI laptop, trying
to start the wireless network adapter results in a kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000082
IP: [<ffffffff81325e70>] net_tx_action+0xd0/0xd0
PGD 392aa067 PUD 390b7067 PMD 0
Oops: 0002 [#1]
CPU 0
Modules linked in: snd_atiixp_modem snd_atiixp snd_ac97_codec snd_pcm
snd_page_alloc ac97_bus arc4 rt2500pci eeprom_93cx6 snd_seq rt2x00pci
snd_seq_device snd_timer snd rt2x00lib mac80211 8250_pci soundcore 8139too
cfg80211 8250 pcmcia mii serial_core sdhci_pci psmouse sdhci k8temp

Pid: 2617, comm: wpa_supplicant Not tainted 3.5.0 #1 MICRO-STAR INT'L CO.,LTD
MS-1013
RIP: 0010:[<ffffffff81325e70>] [<ffffffff81325e70>] net_tx_action+0xd0/0xd0
RSP: 0018:ffff88003862fc20 EFLAGS: 00010086
RAX: ffff880039326ef8 RBX: ffff88003932b6c0 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 0000000000000002
RBP: 00000000000000b8 R08: ffffffff813aeee0 R09: ffff88003911aa00
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003932b6c1
R13: 0000000000000000 R14: ffff8800384251d0 R15: ffff8800384244e0
FS: 00007f6acab12700(0000) GS:ffffffff8153d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000082 CR3: 000000003972f000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process wpa_supplicant (pid: 2617, threadinfo ffff88003862e000, task
ffff88003911aa00)
Stack:
ffffffffa010189d ffff88003a12c578 ffff88003932b6c0 0000000000000001
0000000000000246 ffff8800397f2a78 0000000000000000 0000000000000001
0000000000000000 ffff8800384244e0 ffffffffa0101939 ffff8800397f2a58
Call Trace:
[<ffffffffa010189d>] ? ieee80211_propagate_queue_wake+0xfd/0x110 [mac80211]
[<ffffffffa0101939>] ? ieee80211_wake_queue_by_reason+0x9/0x10 [mac80211]
[<ffffffffa0128c2d>] ? rt2x00queue_flush_queue+0x6d/0xd0 [rt2x00lib]
[<ffffffffa01277e0>] ? rt2x00mac_flush+0x20/0x50 [rt2x00lib]
[<ffffffffa00f1fe2>] ? __ieee80211_recalc_idle+0x202/0x220 [mac80211]
[<ffffffffa00f280c>] ? ieee80211_do_open+0x19c/0x7f0 [mac80211]
[<ffffffff8132aa6e>] ? __dev_open+0x8e/0xe0
[<ffffffff8132acfa>] ? __dev_change_flags+0x9a/0x180
[<ffffffff8132ae90>] ? dev_change_flags+0x20/0x70
[<ffffffff81374857>] ? devinet_ioctl+0x667/0x7e0
[<ffffffff8131722d>] ? sock_ioctl+0x5d/0x260
[<ffffffff810a47df>] ? do_vfs_ioctl+0x8f/0x550
[<ffffffff810a8496>] ? d_kill+0xb6/0x110
[<ffffffff810a8995>] ? dput+0x65/0x130
[<ffffffff810a4ce9>] ? sys_ioctl+0x49/0x90
[<ffffffff813a87e0>] ? system_call_fastpath+0x16/0x1b
Code: db 48 89 df 75 d5 48 83 c4 08 5b 5d c3 be 96 0b 00 00 48 c7 c7 6c bd 4a 81
e8 9d e3 cf ff e9 75 ff ff ff 0f 1f 84 00 00 00 00 00 <0f> ba af 80 00 00 00 00
19 c0 85 c0 74 02 f3 c3 53 9c 5b fa 48
RIP [<ffffffff81325e70>] net_tx_action+0xd0/0xd0
RSP <ffff88003862fc20>
CR2: 0000000000000082
---[ end trace 3590f1d9f55c8367 ]---
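
To see at a glance which modules appear in such a call trace, the bracketed
module names can be pulled out of the text. A minimal sketch; oops_modules is a
made-up helper that reads the oops from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print each module named at the end of a call-trace
# line of the form "[<addr>] ? symbol+0xOFF/0xSIZE [module]" once, in
# order of first appearance. Lines without a trailing [module] are ignored.
oops_modules() {
    sed -n 's|^.*\[<[0-9a-f]*>\] .*[+]0x[0-9a-f]*/0x[0-9a-f]* \[\([^]]*\)\] *$|\1|p' |
        awk '!seen[$0]++'
}
```

Fed with the trace above, this prints mac80211 and rt2x00lib, which suggests
looking at the wireless stack and the rt2x00 driver first.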

Sometimes the machine freezes.

I couldn't find any hints on the web.
Any clues?
Further information about the system can be provided on request.

--
Best regards,
Jörg-Volker.




2012-08-10 08:45:39

by Jörg-Volker Peetz

Subject: Re: 3.5 kernel NULL pointer dereference net_tx_action

With stable release 3.5.1 this is cured and WLAN is working flawlessly again.
Many thanks to the maintainers.
--
Jörg-Volker Peetz.