2011-03-28 15:52:49

by Christopher

[permalink] [raw]
Subject: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

Hi

Kernel 2.6.38 has been freezing ever since rc1 on my asus p7p55d.
After booting up, the kernel seems to use a lot of cpu-time and after
20-30 seconds
the system comes to a complete halt.

I just finished bisecting 2.6.37 - 2.6.38-rc1 and found this commit to
be responsible:

59ff878ffb26bc0be812ca8295799164f413ae88

I attached a patch that reverts this commit, although I don't quite
understand how
it could cause a complete system-lockup. A friend of mine owns exactly the same
motherboard, cpu and ram as me but is not affected.

I noticed this in my logs, but I don't think it's related:

WARNING: at drivers/pci/dmar.c:634 warn_invalid_dmar+0x8a/0xa0()
Hardware name: System Product Name
Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
BIOS vendor: American Megatrends Inc.; Ver: 2003 ; Product Version:
System Version
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.38.2-ARCH #1
Call Trace:
[<ffffffff8105c07a>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff81627c47>] ? __early_set_fixmap+0x96/0x9d
[<ffffffff8105c10a>] ? warn_slowpath_fmt_taint+0x3a/0x40
[<ffffffff81627c47>] ? __early_set_fixmap+0x96/0x9d
[<ffffffff812337ea>] ? warn_invalid_dmar+0x8a/0xa0
[<ffffffff81628143>] ? early_iounmap+0xf4/0x13d
[<ffffffff8163d2b7>] ? check_zero_address+0xd4/0x116
[<ffffffff81276bad>] ? acpi_get_table_with_size+0x53/0xb4
[<ffffffff813bb55a>] ? _etext+0x0/0x256aa6
[<ffffffff8163d310>] ? detect_intel_iommu+0x17/0x84
[<ffffffff81619865>] ? pci_iommu_alloc+0x47/0x72
[<ffffffff81627a7c>] ? mem_init+0x19/0xec
[<ffffffff81612ace>] ? start_kernel+0x203/0x40d
[<ffffffff81612347>] ? x86_64_start_reservations+0x132/0x136
[<ffffffff8161244c>] ? x86_64_start_kernel+0x101/0x110
---[ end trace a7919e7f17c0a725 ]---


I'd be glad to help debug this.
Hope I'm not doing anything wrong, this is my first time posting to lkml.

Regards
Chris

--- a/sound/core/pcm_lib.c 2011-03-27 20:37:20.000000000 +0200
+++ b/sound/core/pcm_lib.c 2011-03-28 17:14:26.000000000 +0200
@@ -374,25 +374,8 @@
(unsigned long)runtime->hw_ptr_base);
}

- if (runtime->no_period_wakeup) {
- /*
- * Without regular period interrupts, we have to check
- * the elapsed time to detect xruns.
- */
- jdelta = jiffies - runtime->hw_ptr_jiffies;
- if (jdelta < runtime->hw_ptr_buffer_jiffies / 2)
- goto no_delta_check;
- hdelta = jdelta - delta * HZ / runtime->rate;
- while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
- delta += runtime->buffer_size;
- hw_base += runtime->buffer_size;
- if (hw_base >= runtime->boundary)
- hw_base = 0;
- new_hw_ptr = hw_base + pos;
- hdelta -= runtime->hw_ptr_buffer_jiffies;
- }
+ if (runtime->no_period_wakeup)
goto no_delta_check;
- }

/* something must be really wrong */
if (delta >= runtime->buffer_size + runtime->period_size) {


2011-03-31 11:57:32

by Clemens Ladisch

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

Christopher K. wrote:
> Kernel 2.6.38 has been freezing ever since rc1 on my asus p7p55d.
> After booting up, the kernel seems to use a lot of cpu-time and after
> 20-30 seconds
> the system comes to a complete halt.
>
> I just finished bisecting 2.6.37 - 2.6.38-rc1 and found this commit to
> be responsible:
>
> 59ff878ffb26bc0be812ca8295799164f413ae88

Please try this patch:

--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -425,6 +425,8 @@ static int snd_pcm_hw_params(struct snd_
runtime->no_period_wakeup =
(params->info & SNDRV_PCM_INFO_NO_PERIOD_WAKEUP) &&
(params->flags & SNDRV_PCM_HW_PARAMS_NO_PERIOD_WAKEUP);
+ runtime->hw_ptr_buffer_jiffies =
+ max(1uL, (runtime->buffer_size * HZ) / runtime->rate);

bits = snd_pcm_format_physical_width(runtime->format);
runtime->sample_bits = bits;
@@ -866,8 +868,6 @@ static void snd_pcm_post_start(struct sn
struct snd_pcm_runtime *runtime = substream->runtime;
snd_pcm_trigger_tstamp(substream);
runtime->hw_ptr_jiffies = jiffies;
- runtime->hw_ptr_buffer_jiffies = (runtime->buffer_size * HZ) /
- runtime->rate;
runtime->status->state = state;
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK &&
runtime->silence_size > 0)

2011-03-31 16:09:14

by Christopher

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

On 31 March 2011 13:58, Clemens Ladisch <[email protected]> wrote:
> Please try this patch:

I'm afraid the patch you provided did not change anything for me. I
applied it to 2.6.38.2,
still freezes 15 - 20 seconds after startup (Could be around the time
pulseaudio initializes).
Is there any way to provide you with more debug information?

Onboard: VIA VT1828S

HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: irq 47 for MSI/MSI-X
HDA Intel 0000:00:1b.0: setting latency timer to 64
nvidia: module license 'NVIDIA' taints kernel.
hda-codec: no NID for mapping control Independent HP:0:0
HDA Intel 0000:01:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
hda_intel: Disable MSI for Nvidia chipset
HDA Intel 0000:01:00.1: setting latency timer to 64
...
hda_intel: Disable MSI for Nvidia chipset
nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nvidia 0000:01:00.0: setting latency timer to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 270.30 Fri Feb 25
14:34:55 PST 2011

I use a Nvidia gtx 460, could it be related (as it features sound capabilities)?
Using nouveau does not change anything, neither does 2.6.39-rc1.

2011-03-31 21:50:48

by Kelly Anderson

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

On 03/31/11 10:09, Christopher K. wrote:
> On 31 March 2011 13:58, Clemens Ladisch<[email protected]> wrote:
>> Please try this patch:
> I'm afraid the patch you provided did not change anything for me. I
> applied it to 2.6.38.2,
> still freezes 15 - 20 seconds after startup (Could be around the time
> pulseaudio initializes).
> Is there any way to provide you with more debug information?
>
> Onboard: VIA VT1828S
>
> HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
> HDA Intel 0000:00:1b.0: irq 47 for MSI/MSI-X
> HDA Intel 0000:00:1b.0: setting latency timer to 64
> nvidia: module license 'NVIDIA' taints kernel.
> hda-codec: no NID for mapping control Independent HP:0:0
> HDA Intel 0000:01:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
> hda_intel: Disable MSI for Nvidia chipset
> HDA Intel 0000:01:00.1: setting latency timer to 64
> ...
> hda_intel: Disable MSI for Nvidia chipset
> nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> nvidia 0000:01:00.0: setting latency timer to 64
> NVRM: loading NVIDIA UNIX x86_64 Kernel Module 270.30 Fri Feb 25
> 14:34:55 PST 2011
>
> I use a Nvidia gtx 460, could it be related (as it features sound capabilities)?
> Using nouveau does not change anything, neither does 2.6.39-rc1.
Clemens,

Just to let you know. I'm using a gtx460 also. I haven't had a chance
to reboot into the newly patched kernel yet. It is interesting that we
both have gtx460's. I have a machine with a gtx430 that doesn't seem to
exhibit the problem.

cat /etc/modprobe.d/snd_hda_intel.conf

options snd-hda-intel probe_mask=-1,0xa

2011-04-01 01:34:05

by Kelly Anderson

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

On 03/31/11 15:50, Kelly Anderson wrote:
>
> Just to let you know. I'm using a gtx460 also. I haven't had a
> chance to reboot into the newly patched kernel yet. It is interesting
> that we both have gtx460's. I have a machine with a gtx430 that
> doesn't seem to exhibit the problem.
>
> cat /etc/modprobe.d/snd_hda_intel.conf
>
> options snd-hda-intel probe_mask=-1,0xa
>
OK, I debugged the problem and came up with a solution. The bottom line
is that it's a problem with a signed to unsigned compare. First the fix:

--- ./sound/core/pcm_lib.c.orig 2011-03-27 12:37:20.000000000 -0600
+++ ./sound/core/pcm_lib.c 2011-03-31 19:01:35.392739127 -0600
@@ -379,17 +379,17 @@ static int snd_pcm_update_hw_ptr0(struct
* Without regular period interrupts, we have to check
* the elapsed time to detect xruns.
*/
- jdelta = jiffies - runtime->hw_ptr_jiffies;
- if (jdelta < runtime->hw_ptr_buffer_jiffies / 2)
+ jdelta = (snd_pcm_sframes_t)(jiffies - runtime->hw_ptr_jiffies);
+ if (jdelta < (snd_pcm_sframes_t)runtime->hw_ptr_buffer_jiffies / 2)
goto no_delta_check;
hdelta = jdelta - delta * HZ / runtime->rate;
- while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
+ while (hdelta >
(snd_pcm_sframes_t)runtime->hw_ptr_buffer_jiffies / 2 + 1) {
delta += runtime->buffer_size;
hw_base += runtime->buffer_size;
if (hw_base >= runtime->boundary)
hw_base = 0;
new_hw_ptr = hw_base + pos;
- hdelta -= runtime->hw_ptr_buffer_jiffies;
+ hdelta -= (snd_pcm_sframes_t)runtime->hw_ptr_buffer_jiffies;
}
goto no_delta_check;
}

And here's the patch I used to debug the problem:

--- ./sound/core/pcm_lib.c.orig 2011-03-27 12:37:20.000000000 -0600
+++ ./sound/core/pcm_lib.c 2011-03-31 18:44:23.701411841 -0600
@@ -379,18 +379,31 @@ static int snd_pcm_update_hw_ptr0(struct
* Without regular period interrupts, we have to check
* the elapsed time to detect xruns.
*/
+ int loopCount = 0;
+ BUG_ON(runtime->hw_ptr_jiffies < 0);
+ BUG_ON(runtime->hw_ptr_jiffies == 0);
+ BUG_ON(runtime->hw_ptr_jiffies > jiffies);
jdelta = jiffies - runtime->hw_ptr_jiffies;
if (jdelta < runtime->hw_ptr_buffer_jiffies / 2)
goto no_delta_check;
hdelta = jdelta - delta * HZ / runtime->rate;
- while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
+ printk(KERN_INFO "Before loop: jdelta=%ld, hdelta=%ld
runtime->hw_ptr_buffer_jiffies=%lu\n"
+ , jdelta , hdelta ,
runtime->hw_ptr_buffer_jiffies);
+ while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
delta += runtime->buffer_size;
hw_base += runtime->buffer_size;
if (hw_base >= runtime->boundary)
hw_base = 0;
new_hw_ptr = hw_base + pos;
- hdelta -= runtime->hw_ptr_buffer_jiffies;
+ hdelta -= runtime->hw_ptr_buffer_jiffies;
+ if (++loopCount >= 1024*1024*4)
+ {
+ printk(KERN_INFO "stopped the madness at 4 thousand\n");
+ break;
+ }
}
+ printk(KERN_INFO "After loop: jdelta=%ld, hdelta=%ld
runtime->hw_ptr_buffer_jiffies=%lu\n"
+ , jdelta , hdelta ,
runtime->hw_ptr_buffer_jiffies);
goto no_delta_check;
}


The kernel output with the testing patch applied:

Mar 31 18:37:02 speedy kernel: Before loop: jdelta=352, hdelta=352
runtime->hw_ptr_buffer_jiffies=371
Mar 31 18:37:02 speedy kernel: stopped the madness at 4 million
Mar 31 18:37:02 speedy kernel: After loop: jdelta=352,
hdelta=-1556086432 runtime->hw_ptr_buffer_jiffies=371

2011-04-01 03:27:08

by Christopher

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

> On 03/31/11 15:50, Kelly Anderson wrote:
>>
>> Just to let you know.  I'm using a gtx460 also.  I haven't had a chance to
>> reboot into the newly patched kernel yet.  It is interesting that we both
>> have gtx460's.  I have a machine with a gtx430 that doesn't seem to exhibit
>> the problem.
>>
>> cat /etc/modprobe.d/snd_hda_intel.conf
>>
>> options snd-hda-intel probe_mask=-1,0xa
>>
> OK, I debugged the problem and came up with a solution.  The bottom line is
> that it's a problem with a signed to unsigned compare.  First the fix:

I can confirm that this patch fixes the issue. Thanks a lot!

2011-04-01 07:59:07

by Clemens Ladisch

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

Kelly Anderson wrote:
> OK, I debugged the problem and came up with a solution. The bottom line
> is that it's a problem with a signed to unsigned compare.

Aaargh, I didn't suspect this.

> First the fix:
>
> --- ./sound/core/pcm_lib.c.orig 2011-03-27 12:37:20.000000000 -0600
> +++ ./sound/core/pcm_lib.c 2011-03-31 19:01:35.392739127 -0600
> @@ -379,17 +379,17 @@ static int snd_pcm_update_hw_ptr0(struct
> * Without regular period interrupts, we have to check
> * the elapsed time to detect xruns.
> */
> - jdelta = jiffies - runtime->hw_ptr_jiffies;
> - if (jdelta < runtime->hw_ptr_buffer_jiffies / 2)
> + jdelta = (snd_pcm_sframes_t)(jiffies - runtime->hw_ptr_jiffies);
> + if (jdelta < (snd_pcm_sframes_t)runtime->hw_ptr_buffer_jiffies / 2)
> goto no_delta_check;
> hdelta = jdelta - delta * HZ / runtime->rate;
> - while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
> + while (hdelta > (snd_pcm_sframes_t)runtime->hw_ptr_buffer_jiffies / 2 + 1) {
> delta += runtime->buffer_size;
> hw_base += runtime->buffer_size;
> if (hw_base >= runtime->boundary)
> hw_base = 0;
> new_hw_ptr = hw_base + pos;
> - hdelta -= runtime->hw_ptr_buffer_jiffies;
> + hdelta -= (snd_pcm_sframes_t)runtime->hw_ptr_buffer_jiffies;
> }
> goto no_delta_check;
> }

Most of these casts have no effect; only the loop condition is
important. Please test the patch below.

Kelly, to apply this patch I need a Signed-off-by tag from you (see
Documentation/SubmittingPatches).

--8<---------------------------------------------------------------->8--
ALSA: pcm: fix infinite loop in snd_pcm_update_hw_ptr0()

When period interrupts are disabled, snd_pcm_update_hw_ptr0() compares
the current time against the time estimated for the current hardware
pointer to detect xruns. The somewhat fuzzy threshold in the while loop
makes it possible that hdelta becomes negative; the comparison being
done with unsigned types then makes the loop go through the entire 2^63
negative range, and, depending on the value, never reaching an unsigned
value that is small enough to stop the loop. Doing this with interrupts
disabled results in the machine locking up.

To prevent this, ensure that the loop condition uses signed types for
both operands so that the comparison is correctly done.

Many thanks to Kelly Anderson for debugging this.

Reported-by: Nix <[email protected]>
Reported-by: "Christopher K." <[email protected]>
Reported-by: Kelly Anderson <[email protected]>
[cl: remove unneeded casts; use a temp variable]
Signed-off-by: Clemens Ladisch <[email protected]>

--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -375,6 +375,7 @@ static int snd_pcm_update_hw_ptr0(struct snd_pcm_substream *substream,
}

if (runtime->no_period_wakeup) {
+ snd_pcm_sframes_t xrun_threshold;
/*
* Without regular period interrupts, we have to check
* the elapsed time to detect xruns.
@@ -383,7 +384,8 @@ static int snd_pcm_update_hw_ptr0(struct snd_pcm_substream *substream,
if (jdelta < runtime->hw_ptr_buffer_jiffies / 2)
goto no_delta_check;
hdelta = jdelta - delta * HZ / runtime->rate;
+ xrun_threshold = runtime->hw_ptr_buffer_jiffies / 2 + 1;
- while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
+ while (hdelta > xrun_threshold) {
delta += runtime->buffer_size;
hw_base += runtime->buffer_size;
if (hw_base >= runtime->boundary)

2011-04-01 09:14:39

by Kelly Anderson

[permalink] [raw]
Subject: Re: Linux 2.6.38 freeze because of sound/core/pcm_lib.c commit 59ff878ffb26bc0be812ca8295799164f413ae88

On 04/01/11 01:47, Clemens Ladisch wrote:
>
> Kelly, to apply this patch I need a Signed-off-by tag from you (see
> Documentation/SubmittingPatches).
It works fine Clemens.


--8<---------------------------------------------------------------->8--

ALSA: pcm: fix infinite loop in snd_pcm_update_hw_ptr0()

When period interrupts are disabled, snd_pcm_update_hw_ptr0() compares
the current time against the time estimated for the current hardware
pointer to detect xruns. The somewhat fuzzy threshold in the while loop
makes it possible that hdelta becomes negative; the comparison being
done with unsigned types then makes the loop go through the entire 263
negative range, and, depending on the value, never reaching an unsigned
value that is small enough to stop the loop. Doing this with interrupts
disabled results in the machine locking up.

To prevent this, ensure that the loop condition uses signed types for
both operands so that the comparison is correctly done.

Many thanks to Kelly Anderson for debugging this.

Reported-by: Nix <[email protected]>
Reported-by: "Christopher K." <[email protected]>
Reported-by: Kelly Anderson <[email protected]>
[cl: remove unneeded casts; use a temp variable]
Signed-off-by: Clemens Ladisch <[email protected]>
Signed-off-by: Kelly Anderson <[email protected]>

--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -375,6 +375,7 @@ static int snd_pcm_update_hw_ptr0(struct
snd_pcm_substream *substream,
}

if (runtime->no_period_wakeup) {
+ snd_pcm_sframes_t xrun_threshold;
/*
* Without regular period interrupts, we have to check
* the elapsed time to detect xruns.
@@ -383,7 +384,8 @@ static int snd_pcm_update_hw_ptr0(struct
snd_pcm_substream *substream,
if (jdelta < runtime->hw_ptr_buffer_jiffies / 2)
goto no_delta_check;
hdelta = jdelta - delta * HZ / runtime->rate;
+ xrun_threshold = runtime->hw_ptr_buffer_jiffies / 2 + 1;
- while (hdelta > runtime->hw_ptr_buffer_jiffies / 2 + 1) {
+ while (hdelta > xrun_threshold) {
delta += runtime->buffer_size;
hw_base += runtime->buffer_size;
if (hw_base >= runtime->boundary)
--