2017-12-14 09:57:34

by Pavel Machek

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Tue 2017-11-28 08:00:20, Takashi Iwai wrote:
> On Mon, 27 Nov 2017 19:44:12 +0100,
> Pavel Machek wrote:
> >
> > On Mon 2017-11-27 19:35:32, Takashi Iwai wrote:
> > > On Mon, 27 Nov 2017 19:31:51 +0100,
> > > Pavel Machek wrote:
> > > >
> > > > On Mon 2017-11-27 17:33:28, Takashi Iwai wrote:
> > > > > On Mon, 27 Nov 2017 17:30:11 +0100,
> > > > > Pavel Machek wrote:
> > > > > >
> > > > > > On Wed 2017-11-15 12:11:20, Pavel Machek wrote:
> > > > > > > On Wed 2017-11-15 11:43:34, Takashi Iwai wrote:
> > > > > > > > On Wed, 15 Nov 2017 11:05:33 +0100,
> > > > > > > > Pavel Machek wrote:
> > > > > > > > >
> > > > > > > > > Hi!
> > > > > > > > >
> > > > > > > > > There are some sound problems in 4.14.0-next-20171114:
> > > > > > > > >
> > > > > > > > > mplayer shows pictures from video, but does not play.
> > > > > > > > >
> > > > > > > > > vlc plays video, but w/o sound
> > > > > > > > >
> > > > > > > > > mpg123 works.
> > > > > > > > >
> > > > > > > > > Hw is thinkpad X60. Any ideas?
> > > > > > > >
> > > > > > > > Nothing comes to my mind for now, sorry.
> > > > > > > >
> > > > > > > > Is it a regression, right? There are only few changes in both
> > > > > > > > HD-audio and ALSA core sides, and they should be fairly harmless.
> > > > > > >
> > > > > > > Regression: yes. Reproducible: not sure. It went away after a reboot
> > > > > > > :-(.
> > > > > >
> > > > > > Reappeared, 4.15-rc1.
> > > > > >
> > > > > > [ 40.473822] PM: suspend exit
> > > > > > [ 40.526027] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > HW doesn't fully claim to support it.
> > > > > > [ 40.569765] e1000e: eth1 NIC Link is Down
> > > > > > [ 40.578257] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > HW doesn't fully claim to support it.
> > > > > > [ 40.648476] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > HW doesn't fully claim to support it.
> > > > > > [ 40.737339] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > HW doesn't fully claim to support it.
> > > > > > [ 43.018955] wlan0: authenticate with 00:00:00:00:00:01
> > > > > > [ 43.019072] wlan0: send auth to 00:00:00:00:00:01 (try 1/3)
> > > > > > [ 43.023955] wlan0: authenticated
> > > > > > [ 43.031721] wlan0: associate with 00:00:00:00:00:01 (try 1/3)
> > > > > > [ 43.039733] wlan0: RX AssocResp from 00:00:00:00:00:01 (capab=0x401
> > > > > > status=0 aid=1)
> > > > > > [ 43.042712] wlan0: associated
> > > > > > [ 480.662456] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is
> > > > > > activated for card #0. Suggest a bigger bdl_pos_adj.
> > > > >
> > > > > This message is often superfluous, so don't take this too seriously.
> > > > >
> > > > >
> > > > > > pavel@amd:~$
> > > > > >
> > > > > > Again, mplayer has problems, mpg123 works. This time mplayer started
> > > > > > playing video (w/o sound) after long delay.
> > > > > >
> > > > > > Uh. huh. And now problems appeared in mpg123, too, and then went away
> > > > > > in mpg123 _and_ mplayer. Interesting.
> > > > > >
> > > > > > I suspect some pulseaudio fun. chromium always has sound problems,
> > > > > > then I restart chromium and everything is ok. But something changed in
> > > > > > -next and 4.15-rc1, because mplayer did not have problems before.
> > > > >
> > > > > Hm, there is no code change at all in sound/*. If it happens only in
> > > > > linux-next, it must be something else...
> > > >
> > > > It happened first in -next, now it is in 4.15-rc1.
> > >
> > > So you meant a possible regression between 4.14 and 4.15-rc1?
> >
> > Yes.
>
> Hm, as far as I see, the only significant difference is the commit
> 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> ALSA: pcm: update tstamp only if audio_tstamp changed
>
> Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> ALSA: hda - Fix yet remaining issue with vmaster 0dB
> initialization
> is basically for fixing a previous wrong fix, and it should influence
> on all use cases, not only for a specific application.

Happened again, this time on -rc3. It is more than "audio is silent"
-- apps behave strangely. Let me test with
20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.

Hmm. This is the 4th regression this release cycle :-(.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-12-19 01:59:28

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Thu, Dec 14, 2017 at 10:57:30AM +0100, Pavel Machek wrote:
> On Tue 2017-11-28 08:00:20, Takashi Iwai wrote:
> > On Mon, 27 Nov 2017 19:44:12 +0100,
> > Pavel Machek wrote:
> > >
> > > On Mon 2017-11-27 19:35:32, Takashi Iwai wrote:
> > > > On Mon, 27 Nov 2017 19:31:51 +0100,
> > > > Pavel Machek wrote:
> > > > >
> > > > > On Mon 2017-11-27 17:33:28, Takashi Iwai wrote:
> > > > > > On Mon, 27 Nov 2017 17:30:11 +0100,
> > > > > > Pavel Machek wrote:
> > > > > > >
> > > > > > > On Wed 2017-11-15 12:11:20, Pavel Machek wrote:
> > > > > > > > On Wed 2017-11-15 11:43:34, Takashi Iwai wrote:
> > > > > > > > > On Wed, 15 Nov 2017 11:05:33 +0100,
> > > > > > > > > Pavel Machek wrote:
> > > > > > > > > >
> > > > > > > > > > Hi!
> > > > > > > > > >
> > > > > > > > > > There are some sound problems in 4.14.0-next-20171114:
> > > > > > > > > >
> > > > > > > > > > mplayer shows pictures from video, but does not play.
> > > > > > > > > >
> > > > > > > > > > vlc plays video, but w/o sound
> > > > > > > > > >
> > > > > > > > > > mpg123 works.
> > > > > > > > > >
> > > > > > > > > > Hw is thinkpad X60. Any ideas?
> > > > > > > > >
> > > > > > > > > Nothing comes to my mind for now, sorry.
> > > > > > > > >
> > > > > > > > > Is it a regression, right? There are only few changes in both
> > > > > > > > > HD-audio and ALSA core sides, and they should be fairly harmless.
> > > > > > > >
> > > > > > > > Regression: yes. Reproducible: not sure. It went away after a reboot
> > > > > > > > :-(.
> > > > > > >
> > > > > > > Reappeared, 4.15-rc1.
> > > > > > >
> > > > > > > [ 40.473822] PM: suspend exit
> > > > > > > [ 40.526027] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > HW doesn't fully claim to support it.
> > > > > > > [ 40.569765] e1000e: eth1 NIC Link is Down
> > > > > > > [ 40.578257] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > HW doesn't fully claim to support it.
> > > > > > > [ 40.648476] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > HW doesn't fully claim to support it.
> > > > > > > [ 40.737339] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > HW doesn't fully claim to support it.
> > > > > > > [ 43.018955] wlan0: authenticate with 00:00:00:00:00:01
> > > > > > > [ 43.019072] wlan0: send auth to 00:00:00:00:00:01 (try 1/3)
> > > > > > > [ 43.023955] wlan0: authenticated
> > > > > > > [ 43.031721] wlan0: associate with 00:00:00:00:00:01 (try 1/3)
> > > > > > > [ 43.039733] wlan0: RX AssocResp from 00:00:00:00:00:01 (capab=0x401
> > > > > > > status=0 aid=1)
> > > > > > > [ 43.042712] wlan0: associated
> > > > > > > [ 480.662456] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is
> > > > > > > activated for card #0. Suggest a bigger bdl_pos_adj.
> > > > > >
> > > > > > This message is often superfluous, so don't take this too seriously.
> > > > > >
> > > > > >
> > > > > > > pavel@amd:~$
> > > > > > >
> > > > > > > Again, mplayer has problems, mpg123 works. This time mplayer started
> > > > > > > playing video (w/o sound) after long delay.
> > > > > > >
> > > > > > > Uh. huh. And now problems appeared in mpg123, too, and then went away
> > > > > > > in mpg123 _and_ mplayer. Interesting.
> > > > > > >
> > > > > > > I suspect some pulseaudio fun. chromium always has sound problems,
> > > > > > > then I restart chromium and everything is ok. But something changed in
> > > > > > > -next and 4.15-rc1, because mplayer did not have problems before.
> > > > > >
> > > > > > Hm, there is no code change at all in sound/*. If it happens only in
> > > > > > linux-next, it must be something else...
> > > > >
> > > > > It happened first in -next, now it is in 4.15-rc1.
> > > >
> > > > So you meant a possible regression between 4.14 and 4.15-rc1?
> > >
> > > Yes.
> >
> > Hm, as far as I see, the only significant difference is the commit
> > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> > ALSA: pcm: update tstamp only if audio_tstamp changed
> >
> > Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> > ALSA: hda - Fix yet remaining issue with vmaster 0dB
> > initialization
> > is basically for fixing a previous wrong fix, and it should influence
> > on all use cases, not only for a specific application.
>
> Happened again, this time on -rc3. It is more than "audio is silent"
> -- apps behave strangely. Let me test with
> 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.
>
> Hmm. This is 4th regression this release cycle :-(.


Today I jumped to 4.15-rc4 from 4.14-rc6, and have noticed some oddities
with audio in youtube under firefox which I never experienced before.

If I pause the playback, the audio seems to infinitely loop on whatever
is in the dma buffer. Resuming playback works but now the expected
audio has repeated pops and clicks mixed in with it.

Even closing firefox doesn't seem to stop the looping buffer...

Machine is an x61s 1.8ghz thinkpad, x86_64, debian stretch, .config attached.

This for me is a 4.15 blocker, and I presume it's related to Pavel's
experience as the x60 isn't much different AFAIK.

Regards,
Vito Caputo


Attachments:
v4.15-rc6.config (108.05 kB)

2017-12-19 04:46:40

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Mon, Dec 18, 2017 at 06:06:58PM -0800, Vito Caputo wrote:
> On Thu, Dec 14, 2017 at 10:57:30AM +0100, Pavel Machek wrote:
> > On Tue 2017-11-28 08:00:20, Takashi Iwai wrote:
> > > On Mon, 27 Nov 2017 19:44:12 +0100,
> > > Pavel Machek wrote:
> > > >
> > > > On Mon 2017-11-27 19:35:32, Takashi Iwai wrote:
> > > > > On Mon, 27 Nov 2017 19:31:51 +0100,
> > > > > Pavel Machek wrote:
> > > > > >
> > > > > > On Mon 2017-11-27 17:33:28, Takashi Iwai wrote:
> > > > > > > On Mon, 27 Nov 2017 17:30:11 +0100,
> > > > > > > Pavel Machek wrote:
> > > > > > > >
> > > > > > > > On Wed 2017-11-15 12:11:20, Pavel Machek wrote:
> > > > > > > > > On Wed 2017-11-15 11:43:34, Takashi Iwai wrote:
> > > > > > > > > > On Wed, 15 Nov 2017 11:05:33 +0100,
> > > > > > > > > > Pavel Machek wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi!
> > > > > > > > > > >
> > > > > > > > > > > There are some sound problems in 4.14.0-next-20171114:
> > > > > > > > > > >
> > > > > > > > > > > mplayer shows pictures from video, but does not play.
> > > > > > > > > > >
> > > > > > > > > > > vlc plays video, but w/o sound
> > > > > > > > > > >
> > > > > > > > > > > mpg123 works.
> > > > > > > > > > >
> > > > > > > > > > > Hw is thinkpad X60. Any ideas?
> > > > > > > > > >
> > > > > > > > > > Nothing comes to my mind for now, sorry.
> > > > > > > > > >
> > > > > > > > > > Is it a regression, right? There are only few changes in both
> > > > > > > > > > HD-audio and ALSA core sides, and they should be fairly harmless.
> > > > > > > > >
> > > > > > > > > Regression: yes. Reproducible: not sure. It went away after a reboot
> > > > > > > > > :-(.
> > > > > > > >
> > > > > > > > Reappeared, 4.15-rc1.
> > > > > > > >
> > > > > > > > [ 40.473822] PM: suspend exit
> > > > > > > > [ 40.526027] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > [ 40.569765] e1000e: eth1 NIC Link is Down
> > > > > > > > [ 40.578257] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > [ 40.648476] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > [ 40.737339] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > [ 43.018955] wlan0: authenticate with 00:00:00:00:00:01
> > > > > > > > [ 43.019072] wlan0: send auth to 00:00:00:00:00:01 (try 1/3)
> > > > > > > > [ 43.023955] wlan0: authenticated
> > > > > > > > [ 43.031721] wlan0: associate with 00:00:00:00:00:01 (try 1/3)
> > > > > > > > [ 43.039733] wlan0: RX AssocResp from 00:00:00:00:00:01 (capab=0x401
> > > > > > > > status=0 aid=1)
> > > > > > > > [ 43.042712] wlan0: associated
> > > > > > > > [ 480.662456] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is
> > > > > > > > activated for card #0. Suggest a bigger bdl_pos_adj.
> > > > > > >
> > > > > > > This message is often superfluous, so don't take this too seriously.
> > > > > > >
> > > > > > >
> > > > > > > > pavel@amd:~$
> > > > > > > >
> > > > > > > > Again, mplayer has problems, mpg123 works. This time mplayer started
> > > > > > > > playing video (w/o sound) after long delay.
> > > > > > > >
> > > > > > > > Uh. huh. And now problems appeared in mpg123, too, and then went away
> > > > > > > > in mpg123 _and_ mplayer. Interesting.
> > > > > > > >
> > > > > > > > I suspect some pulseaudio fun. chromium always has sound problems,
> > > > > > > > then I restart chromium and everything is ok. But something changed in
> > > > > > > > -next and 4.15-rc1, because mplayer did not have problems before.
> > > > > > >
> > > > > > > Hm, there is no code change at all in sound/*. If it happens only in
> > > > > > > linux-next, it must be something else...
> > > > > >
> > > > > > It happened first in -next, now it is in 4.15-rc1.
> > > > >
> > > > > So you meant a possible regression between 4.14 and 4.15-rc1?
> > > >
> > > > Yes.
> > >
> > > Hm, as far as I see, the only significant difference is the commit
> > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> > > ALSA: pcm: update tstamp only if audio_tstamp changed
> > >
> > > Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> > > ALSA: hda - Fix yet remaining issue with vmaster 0dB
> > > initialization
> > > is basically for fixing a previous wrong fix, and it should influence
> > > on all use cases, not only for a specific application.
> >
> > Happened again, this time on -rc3. It is more than "audio is silent"
> > -- apps behave strangely. Let me test with
> > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.
> >
> > Hmm. This is 4th regression this release cycle :-(.
>
>
> Today I jumped to 4.15-rc4 from 4.14-rc6, and have noticed some oddities
> with audio in youtube under firefox which I never experienced before.
>
> If I pause the playback, the audio seems to infinitely loop on whatever
> is in the dma buffer. Resuming playback works but now the expected
> audio has repeated pops and clicks mixed in with it.
>
> Even closing firefox doesn't seem to stop the looping buffer...
>
> Machine is an x61s 1.8ghz thinkpad, x86_64, debian stretch, .config attached.
>
> This for me is a 4.15 blocker, and I presume it's related to Pavel's
> experience as the x60 isn't much different AFAIK.
>

Just reproduced this, it seems to be trivial to repro and doesn't
actually require pausing or anything. Simply watching a youtube video
causes the audio to get messed up after a short period.

I monitored `journalctl --dmesg --follow` while reproducing this and saw
this line appear at the very moment the audio got messed up:

kernel: Monitor-Mwait will be used to enter C-3 state

Prior to that, everything seemed to be working fine.
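
For anyone who wants to catch that moment without watching the log by eye, here is a minimal sketch in Python. It is an illustration only, not something used in this thread: the script name `watch_marker.py` and the helpers `find_marker` and `main` are made up for the example, and it assumes the kernel messages arrive one per line.

```python
import sys
import time

# The kernel message that coincided with the audio breaking (taken from the report above).
MARKER = "Monitor-Mwait will be used to enter C-3 state"


def find_marker(lines, marker=MARKER):
    """Yield (line_number, text) for every input line that contains marker."""
    for number, line in enumerate(lines, start=1):
        if marker in line:
            yield number, line.rstrip("\n")


def main(path):
    """Scan a saved log file, or stdin when path is "-", and report each hit."""
    if path == "-":
        stream = sys.stdin
    else:
        stream = open(path, encoding="utf-8", errors="replace")
    with stream:
        for number, line in find_marker(stream):
            # Wall-clock time lets the hit be compared with when the audio broke.
            print(f"{time.strftime('%H:%M:%S')} line {number}: {line}", flush=True)


# Only runs when a path (or "-") is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It could be fed with `journalctl --dmesg --follow | python3 watch_marker.py -`; the printed time can then be compared with the moment the audio starts to misbehave.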

Regards,
Vito Caputo

2017-12-19 23:05:22

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Mon, Dec 18, 2017 at 08:54:15PM -0800, Vito Caputo wrote:
> On Mon, Dec 18, 2017 at 06:06:58PM -0800, Vito Caputo wrote:
> > On Thu, Dec 14, 2017 at 10:57:30AM +0100, Pavel Machek wrote:
> > > On Tue 2017-11-28 08:00:20, Takashi Iwai wrote:
> > > > On Mon, 27 Nov 2017 19:44:12 +0100,
> > > > Pavel Machek wrote:
> > > > >
> > > > > On Mon 2017-11-27 19:35:32, Takashi Iwai wrote:
> > > > > > On Mon, 27 Nov 2017 19:31:51 +0100,
> > > > > > Pavel Machek wrote:
> > > > > > >
> > > > > > > On Mon 2017-11-27 17:33:28, Takashi Iwai wrote:
> > > > > > > > On Mon, 27 Nov 2017 17:30:11 +0100,
> > > > > > > > Pavel Machek wrote:
> > > > > > > > >
> > > > > > > > > On Wed 2017-11-15 12:11:20, Pavel Machek wrote:
> > > > > > > > > > On Wed 2017-11-15 11:43:34, Takashi Iwai wrote:
> > > > > > > > > > > On Wed, 15 Nov 2017 11:05:33 +0100,
> > > > > > > > > > > Pavel Machek wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi!
> > > > > > > > > > > >
> > > > > > > > > > > > There are some sound problems in 4.14.0-next-20171114:
> > > > > > > > > > > >
> > > > > > > > > > > > mplayer shows pictures from video, but does not play.
> > > > > > > > > > > >
> > > > > > > > > > > > vlc plays video, but w/o sound
> > > > > > > > > > > >
> > > > > > > > > > > > mpg123 works.
> > > > > > > > > > > >
> > > > > > > > > > > > Hw is thinkpad X60. Any ideas?
> > > > > > > > > > >
> > > > > > > > > > > Nothing comes to my mind for now, sorry.
> > > > > > > > > > >
> > > > > > > > > > > Is it a regression, right? There are only few changes in both
> > > > > > > > > > > HD-audio and ALSA core sides, and they should be fairly harmless.
> > > > > > > > > >
> > > > > > > > > > Regression: yes. Reproducible: not sure. It went away after a reboot
> > > > > > > > > > :-(.
> > > > > > > > >
> > > > > > > > > Reappeared, 4.15-rc1.
> > > > > > > > >
> > > > > > > > > [ 40.473822] PM: suspend exit
> > > > > > > > > [ 40.526027] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > [ 40.569765] e1000e: eth1 NIC Link is Down
> > > > > > > > > [ 40.578257] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > [ 40.648476] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > [ 40.737339] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > [ 43.018955] wlan0: authenticate with 00:00:00:00:00:01
> > > > > > > > > [ 43.019072] wlan0: send auth to 00:00:00:00:00:01 (try 1/3)
> > > > > > > > > [ 43.023955] wlan0: authenticated
> > > > > > > > > [ 43.031721] wlan0: associate with 00:00:00:00:00:01 (try 1/3)
> > > > > > > > > [ 43.039733] wlan0: RX AssocResp from 00:00:00:00:00:01 (capab=0x401
> > > > > > > > > status=0 aid=1)
> > > > > > > > > [ 43.042712] wlan0: associated
> > > > > > > > > [ 480.662456] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is
> > > > > > > > > activated for card #0. Suggest a bigger bdl_pos_adj.
> > > > > > > >
> > > > > > > > This message is often superfluous, so don't take this too seriously.
> > > > > > > >
> > > > > > > >
> > > > > > > > > pavel@amd:~$
> > > > > > > > >
> > > > > > > > > Again, mplayer has problems, mpg123 works. This time mplayer started
> > > > > > > > > playing video (w/o sound) after long delay.
> > > > > > > > >
> > > > > > > > > Uh. huh. And now problems appeared in mpg123, too, and then went away
> > > > > > > > > in mpg123 _and_ mplayer. Interesting.
> > > > > > > > >
> > > > > > > > > I suspect some pulseaudio fun. chromium always has sound problems,
> > > > > > > > > then I restart chromium and everything is ok. But something changed in
> > > > > > > > > -next and 4.15-rc1, because mplayer did not have problems before.
> > > > > > > >
> > > > > > > > Hm, there is no code change at all in sound/*. If it happens only in
> > > > > > > > linux-next, it must be something else...
> > > > > > >
> > > > > > > It happened first in -next, now it is in 4.15-rc1.
> > > > > >
> > > > > > So you meant a possible regression between 4.14 and 4.15-rc1?
> > > > >
> > > > > Yes.
> > > >
> > > > Hm, as far as I see, the only significant difference is the commit
> > > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> > > > ALSA: pcm: update tstamp only if audio_tstamp changed
> > > >
> > > > Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> > > > ALSA: hda - Fix yet remaining issue with vmaster 0dB
> > > > initialization
> > > > is basically for fixing a previous wrong fix, and it should influence
> > > > on all use cases, not only for a specific application.
> > >
> > > Happened again, this time on -rc3. It is more than "audio is silent"
> > > -- apps behave strangely. Let me test with
> > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.
> > >
> > > Hmm. This is 4th regression this release cycle :-(.
> >
> >
> > Today I jumped to 4.15-rc4 from 4.14-rc6, and have noticed some oddities
> > with audio in youtube under firefox which I never experienced before.
> >
> > If I pause the playback, the audio seems to infinitely loop on whatever
> > is in the dma buffer. Resuming playback works but now the expected
> > audio has repeated pops and clicks mixed in with it.
> >
> > Even closing firefox doesn't seem to stop the looping buffer...
> >
> > Machine is an x61s 1.8ghz thinkpad, x86_64, debian stretch, .config attached.
> >
> > This for me is a 4.15 blocker, and I presume it's related to Pavel's
> > experience as the x60 isn't much different AFAIK.
> >
>
> Just reproduced this, it seems to be trivial to repro and doesn't
> actually require pausing or anything. Simply watching a youtube video
> causes the audio to get messed up after a short period.
>
> I monitored `journalctl --dmesg --follow` while reproducing this and saw
> this line appear at the very moment the audio got messed up:
>
> kernel: Monitor-Mwait will be used to enter C-3 state
>
> Prior to that, everything seemed to be working fine.
>

After a lengthy bisect, I ended up with this being the bad commit:

> Regards,
> Vito Caputo


Have not tested a revert yet, but it's reliably reproducible.

I mess up the audio by pulling the AC power while listening to music, causing
things to run from battery.

Addressed to tglx since it's his commit...

Regards,
Vito Caputo

2017-12-19 23:22:16

by Pavel Machek

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

Hi!

> > > > > > > > > > Reappeared, 4.15-rc1.
> > > > > > > > > >
> > > > > > > > > > [ 40.473822] PM: suspend exit
> > > > > > > > > > [ 40.526027] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > [ 40.569765] e1000e: eth1 NIC Link is Down
> > > > > > > > > > [ 40.578257] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > [ 40.648476] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > [ 40.737339] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > [ 43.018955] wlan0: authenticate with 00:00:00:00:00:01
> > > > > > > > > > [ 43.019072] wlan0: send auth to 00:00:00:00:00:01 (try 1/3)
> > > > > > > > > > [ 43.023955] wlan0: authenticated
> > > > > > > > > > [ 43.031721] wlan0: associate with 00:00:00:00:00:01 (try 1/3)
> > > > > > > > > > [ 43.039733] wlan0: RX AssocResp from 00:00:00:00:00:01 (capab=0x401
> > > > > > > > > > status=0 aid=1)
> > > > > > > > > > [ 43.042712] wlan0: associated
> > > > > > > > > > [ 480.662456] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is
> > > > > > > > > > activated for card #0. Suggest a bigger bdl_pos_adj.
> > > > > > > > >
> > > > > > > > > This message is often superfluous, so don't take this too seriously.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > pavel@amd:~$
> > > > > > > > > >
> > > > > > > > > > Again, mplayer has problems, mpg123 works. This time mplayer started
> > > > > > > > > > playing video (w/o sound) after long delay.
> > > > > > > > > >
> > > > > > > > > > Uh. huh. And now problems appeared in mpg123, too, and then went away
> > > > > > > > > > in mpg123 _and_ mplayer. Interesting.
> > > > > > > > > >
> > > > > > > > > > I suspect some pulseaudio fun. chromium always has sound problems,
> > > > > > > > > > then I restart chromium and everything is ok. But something changed in
> > > > > > > > > > -next and 4.15-rc1, because mplayer did not have problems before.
> > > > > > > > >
> > > > > > > > > Hm, there is no code change at all in sound/*. If it happens only in
> > > > > > > > > linux-next, it must be something else...
> > > > > > > >
> > > > > > > > It happened first in -next, now it is in 4.15-rc1.
> > > > > > >
> > > > > > > So you meant a possible regression between 4.14 and 4.15-rc1?
> > > > > >
> > > > > > Yes.
> > > > >
> > > > > Hm, as far as I see, the only significant difference is the commit
> > > > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> > > > > ALSA: pcm: update tstamp only if audio_tstamp changed
> > > > >
> > > > > Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> > > > > ALSA: hda - Fix yet remaining issue with vmaster 0dB
> > > > > initialization
> > > > > is basically for fixing a previous wrong fix, and it should influence
> > > > > on all use cases, not only for a specific application.
> > > >
> > > > Happened again, this time on -rc3. It is more than "audio is silent"
> > > > -- apps behave strangely. Let me test with
> > > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.
> > > >
> > > > Hmm. This is 4th regression this release cycle :-(.
> > >
> > >
> > > Today I jumped to 4.15-rc4 from 4.14-rc6, and have noticed some oddities
> > > with audio in youtube under firefox which I never experienced before.
> > >
> > > If I pause the playback, the audio seems to infinitely loop on whatever
> > > is in the dma buffer. Resuming playback works but now the expected
> > > audio has repeated pops and clicks mixed in with it.
> > >
> > > Even closing firefox doesn't seem to stop the looping buffer...
> > >
> > > Machine is an x61s 1.8ghz thinkpad, x86_64, debian stretch, .config attached.
> > >
> > > This for me is a 4.15 blocker, and I presume it's related to Pavel's
> > > experience as the x60 isn't much different AFAIK.
> > >
> >
> > Just reproduced this, it seems to be trivial to repro and doesn't
> > actually require pausing or anything. Simply watching a youtube video
> > causes the audio to get messed up after a short period.
> >
> > I monitored `journalctl --dmesg --follow` while reproducing this and saw
> > this line appear at the very moment the audio got messed up:
> >
> > kernel: Monitor-Mwait will be used to enter C-3 state
> >
> > Prior to that, everything seemed to be working fine.
> >
>
> After a lengthy bisect, I ended up with this being the bad commit:

You forgot to mention commit id :-).

> > Regards,
> > Vito Caputo
>
>
> Have not tested a revert yet, but it's reliably reproducible.
>
> I mess up the audio by pulling the AC power while listening to music, causing
> things to run from battery.
>
> Addressed to tglx since it's his commit...

Thanks for all the debugging!

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-12-20 00:28:33

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> Hi!
>
> > > > > > > > > > > Reappeared, 4.15-rc1.
> > > > > > > > > > >
> > > > > > > > > > > [ 40.473822] PM: suspend exit
> > > > > > > > > > > [ 40.526027] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > > [ 40.569765] e1000e: eth1 NIC Link is Down
> > > > > > > > > > > [ 40.578257] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > > [ 40.648476] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > > [ 40.737339] sdhci-pci 0000:15:00.2: Will use DMA mode even though
> > > > > > > > > > > HW doesn't fully claim to support it.
> > > > > > > > > > > [ 43.018955] wlan0: authenticate with 00:00:00:00:00:01
> > > > > > > > > > > [ 43.019072] wlan0: send auth to 00:00:00:00:00:01 (try 1/3)
> > > > > > > > > > > [ 43.023955] wlan0: authenticated
> > > > > > > > > > > [ 43.031721] wlan0: associate with 00:00:00:00:00:01 (try 1/3)
> > > > > > > > > > > [ 43.039733] wlan0: RX AssocResp from 00:00:00:00:00:01 (capab=0x401
> > > > > > > > > > > status=0 aid=1)
> > > > > > > > > > > [ 43.042712] wlan0: associated
> > > > > > > > > > > [ 480.662456] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is
> > > > > > > > > > > activated for card #0. Suggest a bigger bdl_pos_adj.
> > > > > > > > > >
> > > > > > > > > > This message is often superfluous, so don't take this too seriously.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > pavel@amd:~$
> > > > > > > > > > >
> > > > > > > > > > > Again, mplayer has problems, mpg123 works. This time mplayer started
> > > > > > > > > > > playing video (w/o sound) after long delay.
> > > > > > > > > > >
> > > > > > > > > > > Uh. huh. And now problems appeared in mpg123, too, and then went away
> > > > > > > > > > > in mpg123 _and_ mplayer. Interesting.
> > > > > > > > > > >
> > > > > > > > > > > I suspect some pulseaudio fun. chromium always has sound problems,
> > > > > > > > > > > then I restart chromium and everything is ok. But something changed in
> > > > > > > > > > > -next and 4.15-rc1, because mplayer did not have problems before.
> > > > > > > > > >
> > > > > > > > > > Hm, there is no code change at all in sound/*. If it happens only in
> > > > > > > > > > linux-next, it must be something else...
> > > > > > > > >
> > > > > > > > > It happened first in -next, now it is in 4.15-rc1.
> > > > > > > >
> > > > > > > > So you meant a possible regression between 4.14 and 4.15-rc1?
> > > > > > >
> > > > > > > Yes.
> > > > > >
> > > > > > Hm, as far as I see, the only significant difference is the commit
> > > > > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> > > > > > ALSA: pcm: update tstamp only if audio_tstamp changed
> > > > > >
> > > > > > Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> > > > > > ALSA: hda - Fix yet remaining issue with vmaster 0dB
> > > > > > initialization
> > > > > > is basically for fixing a previous wrong fix, and it should influence
> > > > > > on all use cases, not only for a specific application.
> > > > >
> > > > > Happened again, this time on -rc3. It is more than "audio is silent"
> > > > > -- apps behave strangely. Let me test with
> > > > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.
> > > > >
> > > > > Hmm. This is 4th regression this release cycle :-(.
> > > >
> > > >
> > > > Today I jumped to 4.15-rc4 from 4.14-rc6, and have noticed some oddities
> > > > with audio in youtube under firefox which I never experienced before.
> > > >
> > > > If I pause the playback, the audio seems to infinitely loop on whatever
> > > > is in the dma buffer. Resuming playback works but now the expected
> > > > audio has repeated pops and clicks mixed in with it.
> > > >
> > > > Even closing firefox doesn't seem to stop the looping buffer...
> > > >
> > > > Machine is an x61s 1.8ghz thinkpad, x86_64, debian stretch, .config attached.
> > > >
> > > > This for me is a 4.15 blocker, and I presume it's related to Pavel's
> > > > experience as the x60 isn't much different AFAIK.
> > > >
> > >
> > > Just reproduced this, it seems to be trivial to repro and doesn't
> > > actually require pausing or anything. Simply watching a youtube video
> > > causes the audio to get messed up after a short period.
> > >
> > > I monitored `journalctl --dmesg --follow` while reproducing this and saw
> > > this line appear at the very moment the audio got messed up:
> > >
> > > kernel: Monitor-Mwait will be used to enter C-3 state
> > >
> > > Prior to that, everything seemed to be working fine.
> > >
> >
> > After a lengthy bisect, I ended up with this being the bad commit:
>
> You forgot to mention commit id :-).
>

That is very strange, anyhow:

commit fdba46ffb4c203b6e6794163493fd310f98bb4be
Author: Thomas Gleixner <[email protected]>
Date: Wed Sep 13 23:29:27 2017 +0200

x86/apic: Get rid of multi CPU affinity


Will try reverting soon, just a bit busy today out in the desert and the sun
is going down so my solar panel is useless.

Regards,
Vito Caputo

2017-12-20 00:33:53

by Thomas Gleixner

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Tue, 19 Dec 2017, [email protected] wrote:
> On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> > You forgot to mention commit id :-).
> >
>
> That is very strange, anyhow:
>
> commit fdba46ffb4c203b6e6794163493fd310f98bb4be
> Author: Thomas Gleixner <[email protected]>
> Date: Wed Sep 13 23:29:27 2017 +0200
>
> x86/apic: Get rid of multi CPU affinity
>
>
> Will try reverting soon, just a bit busy today out in the desert and the sun
> is going down so my solar panel is useless.

The revert is not trivial.

What is the exact problem and how do you reproduce that?

Thanks,

tglx


2017-12-20 00:52:08

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Wed, Dec 20, 2017 at 01:33:45AM +0100, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, [email protected] wrote:
> > On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> > > You forgot to mention commit id :-).
> > >
> >
> > That is very strange, anyhow:
> >
> > commit fdba46ffb4c203b6e6794163493fd310f98bb4be
> > Author: Thomas Gleixner <[email protected]>
> > Date: Wed Sep 13 23:29:27 2017 +0200
> >
> > x86/apic: Get rid of multi CPU affinity
> >
> >
> > Will try reverting soon, just a bit busy today out in the desert and the sun
> > is going down so my solar panel is useless.
>
> The revert is not trivial.
>
> What is the exact problem and how do you reproduce that?
>

Dang.

Ostensibly the problem is that audio playback keeps looping whatever seems
to be stuck in a DMA buffer when I pause the audio.

I also saw once during the bisect, on a 'bad' commit, a reproduction of
this which hung everything doing IO until the ata1 reset happened:

[ 36.606657] Monitor-Mwait will be used to enter C-3 state
[ 36.628663] do_IRQ: 0.35 No irq handler for vector
[ 37.875724] do_IRQ: 0.194 No irq handler for vector
[ 69.099090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 69.099165] ata1.00: failed command: FLUSH CACHE EXT
[ 69.099211] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 6
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 69.099285] ata1.00: status: { DRDY }
[ 69.099309] ata1: hard resetting link
[ 69.406185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 69.409255] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 69.409259] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 69.409261] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 69.409433] ata1.00: supports DRM functions and may not be fully accessible
[ 69.409819] ata1.00: NCQ Send/Recv Log not supported
[ 69.411644] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 69.411649] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 69.411654] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 69.411842] ata1.00: supports DRM functions and may not be fully accessible
[ 69.412242] ata1.00: NCQ Send/Recv Log not supported
[ 69.412719] ata1.00: configured for UDMA/133
[ 69.412721] ata1.00: retrying FLUSH 0xea Emask 0x4
[ 69.412964] ata1: EH complete

But that is more difficult to reproduce, it doesn't seem to happen
regularly. In fact, I thought that was independent from the audio problem
at first, but now it's become clear they're all related.

The 'bad' commits always showed the 'do_IRQ: No irq handler for vector'
lines. Sometimes they first appeared on the console when I yanked the
power cord while playing music in the quick repro cycle, without running
X or anything.

What I've been doing to reproduce this is simply boot to multi-user.target,
login, run `cmus`, and play a song. Next pull the AC power from the
thinkpad x61s.

Immediately the audio gets messed up, pausing the audio doesn't pause it,
it just loops the last tiny buffer contents.

That's all I've got right now, but it doesn't seem to be limited to the
audio problem as shown by the ata1 reset. Also I've searched through all
my stored journals for anything resembling that ata1 problem, and there's
not a single occurrence going back ~300 boots across a handful of kernel
versions.

If you'd like to prep a patch for me to test, I'm happy to test, but I'm
not sure how online I'll be for the next 24 hours. It's some kind of
miracle we're communicating as-is, I'm in the middle of the damn desert but
somehow there's a cell signal.

Thanks,
Vito Caputo

2017-12-23 05:29:18

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Wed, Dec 20, 2017 at 01:33:45AM +0100, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, [email protected] wrote:
> > On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> > > You forgot to mention commit id :-).
> > >
> >
> > That is very strange, anyhow:
> >
> > commit fdba46ffb4c203b6e6794163493fd310f98bb4be
> > Author: Thomas Gleixner <[email protected]>
> > Date: Wed Sep 13 23:29:27 2017 +0200
> >
> > x86/apic: Get rid of multi CPU affinity
> >
> >
> > Will try reverting soon, just a bit busy today out in the desert and the sun
> > is going down so my solar panel is useless.
>
> The revert is not trivial.
>
> What is the exact problem and how do you reproduce that?
>
> Thanks,
>

So I had some time today to poke at this some more. Since it looks to
be easily reproduced by simply pulling the AC power while playing music
or doing IO, and dmesg clearly reports using mwait, I tried booting with
idle=nomwait to see if that made any difference. It didn't, the same
thing still occurs.

In trying to make sense of this totally unfamiliar apic code and better
understand these changes, I came across this comment which seemed a bit
telling:

void flat_vector_allocation_domain(int cpu, struct cpumask *retmask,
                                   const struct cpumask *mask)
{
        /*
         * Careful. Some cpus do not strictly honor the set of cpus
         * specified in the interrupt destination when using lowest
         * priority interrupt delivery mode.
         *
         * In particular there was a hyperthreading cpu observed to
         * deliver interrupts to the wrong hyperthread when only one
         * hyperthread was specified in the interrupt destination.
         */
        cpumask_clear(retmask);
        cpumask_bits(retmask)[0] = APIC_ALL_CPUS;
}

It's this allocation domain mask hook which has been bypassed by the
offending commit. The existing approach is more robust in the face of
relaxed adherence to destination cpumasks since it's all-inclusive,
whereas the new code is exclusive to a specific cpu.

Is it possible what I'm observing is just another manifestation of
what's being described in that comment? This is a core 2 duo, so not
hyper-threaded. But maybe something funny happens when switching
cstates in response to interrupts - like maybe the wrong cpu can be used
if it can save power vs. powering up another? Just thinking out loud
here.

In any case, 4.15-rc4 is quite unusable on my machine because of this.

Pavel, do you observe the same behavior on your x60, WRT AC power?

I've dropped Takashi from the CC list as this pretty clearly isn't a
sound-specific problem.

Thanks,
Vito Caputo

2017-12-23 20:30:02

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Fri, Dec 22, 2017 at 09:37:01PM -0800, [email protected] wrote:
> On Wed, Dec 20, 2017 at 01:33:45AM +0100, Thomas Gleixner wrote:
> > On Tue, 19 Dec 2017, [email protected] wrote:
> > > On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> > > > You forgot to mention commit id :-).
> > > >
> > >
> > > That is very strange, anyhow:
> > >
> > > commit fdba46ffb4c203b6e6794163493fd310f98bb4be
> > > Author: Thomas Gleixner <[email protected]>
> > > Date: Wed Sep 13 23:29:27 2017 +0200
> > >
> > > x86/apic: Get rid of multi CPU affinity
> > >
> > >
> > > Will try reverting soon, just a bit busy today out in the desert and the sun
> > > is going down so my solar panel is useless.
> >
> > The revert is not trivial.
> >
> > What is the exact problem and how do you reproduce that?
> >
> > Thanks,
> >
>
> So I had some time today to poke at this some more. Since it looks to
> be easily reproduced by simply pulling the AC power while playing music
> or doing IO, and dmesg clearly reports using mwait, I tried booting with
> idle=nomwait to see if that made any difference. It didn't, the same
> thing still occurs.
>
> In trying to make sense of this totally unfamiliar apic code and better
> understand these changes, I came across this comment which seemed a bit
> telling:
>
> void flat_vector_allocation_domain(int cpu, struct cpumask *retmask,
>                                    const struct cpumask *mask)
> {
>         /*
>          * Careful. Some cpus do not strictly honor the set of cpus
>          * specified in the interrupt destination when using lowest
>          * priority interrupt delivery mode.
>          *
>          * In particular there was a hyperthreading cpu observed to
>          * deliver interrupts to the wrong hyperthread when only one
>          * hyperthread was specified in the interrupt destination.
>          */
>         cpumask_clear(retmask);
>         cpumask_bits(retmask)[0] = APIC_ALL_CPUS;
> }
>
> It's this allocation domain mask hook which has been bypassed by the
> offending commit. The existing approach is more robust in the face of
> relaxed adherence to destination cpumasks since it's all-inclusive,
> whereas the new code is exclusive to a specific cpu.
>
> Is it possible what I'm observing is just another manifestation of
> what's being described in that comment? This is a core 2 duo, so not
> hyper-threaded. But maybe something funny happens when switching
> cstates in response to interrupts - like maybe the wrong cpu can be used
> if it can save power vs. powering up another? Just thinking out loud
> here.
>
> In any case, 4.15-rc4 is quite unusable on my machine because of this.
>

Some more food for thought:

Added the following instrumentation:

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 93edc2236282..7034eda4d494 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -228,6 +228,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_and(vector_searchmask, vector_searchmask, mask);
BUG_ON(apic->cpu_mask_to_apicid(vector_searchmask, irqdata,
&d->cfg.dest_apicid));
+
+ printk("allocated vector=%i maskfirst=%i\n", d->cfg.vector, cpumask_first(vector_searchmask));
+
return 0;
}

This is what I see:

Upon playing song in cmus (on AC power since boot):
Dec 22 22:26:52 iridesce kernel: allocated vector=35 maskfirst=1

Yank AC:
Dec 22 22:27:14 iridesce kernel: allocated vector=51 maskfirst=1
Dec 22 22:27:15 iridesce kernel: do_IRQ: 0.35 No irq handler for vector

So CPU 0 vector 35 got an interrupt when maskfirst=1 for 35 as seen in
the added printk.

It seems like the affinity changes are assuming a strict adherence to
the CPU mask when the underlying hardware is treating it more as a hint.
Perhaps handlers still need to be maintained on all CPUs in a given apic
domain, regardless of what the masks are configured as, to cover these
situations.

Regards,
Vito Caputo

2017-12-23 20:33:41

by Thomas Gleixner

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Sat, 23 Dec 2017, [email protected] wrote:
> On Fri, Dec 22, 2017 at 09:37:01PM -0800, [email protected] wrote:
> Added the following instrumentation:
>
> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> index 93edc2236282..7034eda4d494 100644
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -228,6 +228,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
> cpumask_and(vector_searchmask, vector_searchmask, mask);
> BUG_ON(apic->cpu_mask_to_apicid(vector_searchmask, irqdata,
> &d->cfg.dest_apicid));
> +
> + printk("allocated vector=%i maskfirst=%i\n", d->cfg.vector, cpumask_first(vector_searchmask));
> +
> return 0;
> }
>
> This is what I see:
>
> Upon playing song in cmus (on AC power since boot):
> Dec 22 22:26:52 iridesce kernel: allocated vector=35 maskfirst=1
>
> Yank AC:
> Dec 22 22:27:14 iridesce kernel: allocated vector=51 maskfirst=1
> Dec 22 22:27:15 iridesce kernel: do_IRQ: 0.35 No irq handler for vector
>
> So CPU 0 vector 35 got an interrupt when maskfirst=1 for 35 as seen in
> the added printk.
>
> It seems like the affinity changes are assuming a strict adherence to
> the CPU mask when the underlying hardware is treating it more as a hint.
> Perhaps handlers still need to be maintained on all CPUs in a given apic
> domain, regardless of what the masks are configured as, to cover these
> situations.

That's odd. I'll have a look after the holidays.

Merry Christmas!

Thanks,

tglx

2017-12-24 16:08:17

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Sat, Dec 23, 2017 at 09:33:37PM +0100, Thomas Gleixner wrote:
> On Sat, 23 Dec 2017, [email protected] wrote:
> > On Fri, Dec 22, 2017 at 09:37:01PM -0800, [email protected] wrote:
> > Added the following instrumentation:
> >
> > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> > index 93edc2236282..7034eda4d494 100644
> > --- a/arch/x86/kernel/apic/vector.c
> > +++ b/arch/x86/kernel/apic/vector.c
> > @@ -228,6 +228,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
> > cpumask_and(vector_searchmask, vector_searchmask, mask);
> > BUG_ON(apic->cpu_mask_to_apicid(vector_searchmask, irqdata,
> > &d->cfg.dest_apicid));
> > +
> > + printk("allocated vector=%i maskfirst=%i\n", d->cfg.vector, cpumask_first(vector_searchmask));
> > +
> > return 0;
> > }
> >
> > This is what I see:
> >
> > Upon playing song in cmus (on AC power since boot):
> > Dec 22 22:26:52 iridesce kernel: allocated vector=35 maskfirst=1
> >
> > Yank AC:
> > Dec 22 22:27:14 iridesce kernel: allocated vector=51 maskfirst=1
> > Dec 22 22:27:15 iridesce kernel: do_IRQ: 0.35 No irq handler for vector
> >
> > So CPU 0 vector 35 got an interrupt when maskfirst=1 for 35 as seen in
> > the added printk.
> >
> > It seems like the affinity changes are assuming a strict adherence to
> > the CPU mask when the underlying hardware is treating it more as a hint.
> > Perhaps handlers still need to be maintained on all CPUs in a given apic
> > domain, regardless of what the masks are configured as, to cover these
> > situations.
>
> That's odd. I'll have a look after the holidays.
>

Ok, just FYI I've reproduced it on rc5 as well.

I may be offline a bit at the start of the new year, in case you've got
something for me to test and I'm unresponsive.

Regards,
Vito Caputo

2017-12-25 10:09:04

by Pavel Machek

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

Hi!

> > > > > > > It happened first in -next, now it is in 4.15-rc1.
> > > > > >
> > > > > > So you meant a possible regression between 4.14 and 4.15-rc1?
> > > > >
> > > > > Yes.
> > > >
> > > > Hm, as far as I see, the only significant difference is the commit
> > > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> > > > ALSA: pcm: update tstamp only if audio_tstamp changed
> > > >
> > > > Another change d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb
> > > > ALSA: hda - Fix yet remaining issue with vmaster 0dB
> > > > initialization
> > > > is basically for fixing a previous wrong fix, and it should influence
> > > > on all use cases, not only for a specific application.
> > >
> > > Happened again, this time on -rc3. It is more than "audio is silent"
> > > -- apps behave strangely. Let me test with
> > > 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 reverted.
> > >
> > > Hmm. This is 4th regression this release cycle :-(.
> >
> >
> > Today I jumped to 4.15-rc4 from 4.14-rc6, and have noticed some oddities
> > with audio in youtube under firefox which I never experienced before.
> >
> > If I pause the playback, the audio seems to infinitely loop on whatever
> > is in the dma buffer. Resuming playback works but now the expected
> > audio has repeated pops and clicks mixed in with it.
> >
> > Even closing firefox doesn't seem to stop the looping buffer...
> >
> > Machine is an x61s 1.8ghz thinkpad, x86_64, debian stretch, .config attached.
> >
> > This for me is a 4.15 blocker, and I presume it's related to Pavel's
> > experience as the x60 isn't much different AFAIK.
> >
>
> Just reproduced this, it seems to be trivial to repro and doesn't
> actually require pausing or anything. Simply watching a youtube video
> causes the audio to get messed up after a short period.

Hmm. Yes, I do experience something similar, but that has been there
for long time now. Audio does not work with chromium (looping weirdly)
for first few minutes after boot. Then I restart chromium and it
starts to work ok...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-12-25 10:12:23

by Pavel Machek

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

Hi!

> It's this allocation domain mask hook which has been bypassed by the
> offending commit. The existing approach is more robust in the face of
> relaxed adherence to destination cpumasks since it's all-inclusive,
> whereas the new code is exclusive to a specific cpu.
>
> Is it possible what I'm observing is just another manifestation of
> what's being described in that comment? This is a core 2 duo, so not
> hyper-threaded. But maybe something funny happens when switching
> cstates in response to interrupts - like maybe the wrong cpu can be used
> if it can save power vs. powering up another? Just thinking out loud
> here.
>
> In any case, 4.15-rc4 is quite unusable on my machine because of this.
>
> Pavel, do you observe the same behavior on your x60, WRT AC power?

No.. no Monitor-Mwait messages, and audio works if I unplug AC.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-12-28 10:41:52

by Thomas Gleixner

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Sun, 24 Dec 2017, [email protected] wrote:
> On Sat, Dec 23, 2017 at 09:33:37PM +0100, Thomas Gleixner wrote:
> > > It seems like the affinity changes are assuming a strict adherence to
> > > the CPU mask when the underlying hardware is treating it more as a hint.
> > > Perhaps handlers still need to be maintained on all CPUs in a given apic
> > > domain, regardless of what the masks are configured as, to cover these
> > > situations.
> >
> > That's odd. I'll have a look after the holidays.
> >
>
> Ok, just FYI I've reproduced it on rc5 as well.
>
> I may be offline a bit at the start of the new year, in case you've got
> something for me to test and I'm unresponsive.

Can you try the patch below?

Thanks,

tglx

8<---------------
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = flat_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */

.disable_esr = 0,
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -105,7 +105,7 @@ static struct apic apic_default __ro_aft
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = default_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,

--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster _
.apic_id_valid = x2apic_apic_id_valid,
.apic_id_registered = x2apic_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */

.disable_esr = 0,

2017-12-28 18:22:21

by Vito Caputo

Subject: Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

On Thu, Dec 28, 2017 at 11:41:45AM +0100, Thomas Gleixner wrote:
> On Sun, 24 Dec 2017, [email protected] wrote:
> > On Sat, Dec 23, 2017 at 09:33:37PM +0100, Thomas Gleixner wrote:
> > > > It seems like the affinity changes are assuming a strict adherence to
> > > > the CPU mask when the underlying hardware is treating it more as a hint.
> > > > Perhaps handlers still need to be maintained on all CPUs in a given apic
> > > > domain, regardless of what the masks are configured as, to cover these
> > > > situations.
> > >
> > > That's odd. I'll have a look after the holidays.
> > >
> >
> > Ok, just FYI I've reproduced it on rc5 as well.
> >
> > I may be offline a bit at the start of the new year, in case you've got
> > something for me to test and I'm unresponsive.
>
> Can you try the patch below?
>

Looks fixed so far, I'll try living in 4.15-rc5 now and will report back
if anything goes sideways.

Thanks Thomas!


> Thanks,
>
> tglx
>
> 8<---------------
> --- a/arch/x86/kernel/apic/apic_flat_64.c
> +++ b/arch/x86/kernel/apic/apic_flat_64.c
> @@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_
> .apic_id_valid = default_apic_id_valid,
> .apic_id_registered = flat_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> .irq_dest_mode = 1, /* logical */
>
> .disable_esr = 0,
> --- a/arch/x86/kernel/apic/probe_32.c
> +++ b/arch/x86/kernel/apic/probe_32.c
> @@ -105,7 +105,7 @@ static struct apic apic_default __ro_aft
> .apic_id_valid = default_apic_id_valid,
> .apic_id_registered = default_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> /* logical delivery broadcast to all CPUs: */
> .irq_dest_mode = 1,
>
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster _
> .apic_id_valid = x2apic_apic_id_valid,
> .apic_id_registered = x2apic_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> .irq_dest_mode = 1, /* logical */
>
> .disable_esr = 0,

Subject: [tip:x86/urgent] x86/apic: Switch all APICs to Fixed delivery mode

Commit-ID: 45fa8d89192e4e8e801e67dac3394d6597613e07
Gitweb: https://git.kernel.org/tip/45fa8d89192e4e8e801e67dac3394d6597613e07
Author: Thomas Gleixner <[email protected]>
AuthorDate: Thu, 28 Dec 2017 11:33:33 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 29 Dec 2017 00:21:04 +0100

x86/apic: Switch all APICs to Fixed delivery mode

Some of the APIC incarnations are operating in lowest priority delivery
mode. This worked as long as the vector management code allocated the same
vector on all possible CPUs for each interrupt.

Lowest priority delivery mode does not necessarily respect the affinity
setting and may redirect to some other online CPU. This was documented
somewhere in the old code and the conversion to single target delivery
failed to update the delivery mode of the affected APIC drivers, which
results in spurious interrupts on some of the affected CPU/Chipset
combinations.

Switch the APIC drivers over to Fixed delivery mode and remove all
leftovers of lowest priority delivery mode.

As a consequence of this change, the apic::irq_delivery_mode field is now
pointless, but this needs to be cleaned up in a separate patch.

Fixes: fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity")
Reported-by: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: [email protected]
Cc: Pavel Machek <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
---
arch/x86/kernel/apic/apic_flat_64.c | 2 +-
arch/x86/kernel/apic/apic_noop.c | 2 +-
arch/x86/kernel/apic/msi.c | 8 ++------
arch/x86/kernel/apic/probe_32.c | 2 +-
arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
drivers/pci/host/pci-hyperv.c | 8 ++------
6 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index aa85690..25a8702 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_init = {
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = flat_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */

.disable_esr = 0,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 7b659c4..5078b5c 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -110,7 +110,7 @@ struct apic apic_noop __ro_after_init = {
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = noop_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 9b18be7..ce503c9 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -39,17 +39,13 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
((apic->irq_dest_mode == 0) ?
MSI_ADDR_DEST_MODE_PHYSICAL :
MSI_ADDR_DEST_MODE_LOGICAL) |
- ((apic->irq_delivery_mode != dest_LowestPrio) ?
- MSI_ADDR_REDIRECTION_CPU :
- MSI_ADDR_REDIRECTION_LOWPRI) |
+ MSI_ADDR_REDIRECTION_CPU |
MSI_ADDR_DEST_ID(cfg->dest_apicid);

msg->data =
MSI_DATA_TRIGGER_EDGE |
MSI_DATA_LEVEL_ASSERT |
- ((apic->irq_delivery_mode != dest_LowestPrio) ?
- MSI_DATA_DELIVERY_FIXED :
- MSI_DATA_DELIVERY_LOWPRI) |
+ MSI_DATA_DELIVERY_FIXED |
MSI_DATA_VECTOR(cfg->vector);
}

diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index fa22017..02e8acb 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -105,7 +105,7 @@ static struct apic apic_default __ro_after_init = {
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = default_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 622f13c..8b04234 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
.apic_id_valid = x2apic_apic_id_valid,
.apic_id_registered = x2apic_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */

.disable_esr = 0,
diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 0fe3ea1..e7d9447 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -985,9 +985,7 @@ static u32 hv_compose_msi_req_v1(
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode =
- (apic->irq_delivery_mode == dest_LowestPrio) ?
- dest_LowestPrio : dest_Fixed;
+ int_pkt->int_desc.delivery_mode = dest_Fixed;

/*
* Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
@@ -1008,9 +1006,7 @@ static u32 hv_compose_msi_req_v2(
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode =
- (apic->irq_delivery_mode == dest_LowestPrio) ?
- dest_LowestPrio : dest_Fixed;
+ int_pkt->int_desc.delivery_mode = dest_Fixed;

/*
* Create MSI w/ dummy vCPU set targeting just one vCPU, overwritten

Subject: [tip:x86/urgent] x86/apic: Switch all APICs to Fixed delivery mode

Commit-ID: a31e58e129f73ab5b04016330b13ed51fde7a961
Gitweb: https://git.kernel.org/tip/a31e58e129f73ab5b04016330b13ed51fde7a961
Author: Thomas Gleixner <[email protected]>
AuthorDate: Thu, 28 Dec 2017 11:33:33 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 29 Dec 2017 14:20:48 +0100

x86/apic: Switch all APICs to Fixed delivery mode

Some of the APIC incarnations are operating in lowest priority delivery
mode. This worked as long as the vector management code allocated the same
vector on all possible CPUs for each interrupt.

Lowest priority delivery mode does not necessarily respect the affinity
setting and may redirect to some other online CPU. This was documented
somewhere in the old code and the conversion to single target delivery
failed to update the delivery mode of the affected APIC drivers, which
results in spurious interrupts on some of the affected CPU/Chipset
combinations.

Switch the APIC drivers over to Fixed delivery mode and remove all
leftovers of lowest priority delivery mode.

Switching to Fixed delivery mode is not a problem on these CPUs because the
kernel already uses Fixed delivery mode for IPIs. The reason for this is
that the SDM explicitly forbids lowest prio mode for IPIs. The reason is
obvious: If the irq routing does not honor destination targets in lowest
prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be
a fatal problem in many cases.

As a consequence of this change, the apic::irq_delivery_mode field is now
pointless, but this needs to be cleaned up in a separate patch.

Fixes: fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity")
Reported-by: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: [email protected]
Cc: Pavel Machek <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
---
arch/x86/kernel/apic/apic_flat_64.c | 2 +-
arch/x86/kernel/apic/apic_noop.c | 2 +-
arch/x86/kernel/apic/msi.c | 8 ++------
arch/x86/kernel/apic/probe_32.c | 2 +-
arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
drivers/pci/host/pci-hyperv.c | 8 ++------
6 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index aa85690..25a8702 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_init = {
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = flat_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */

.disable_esr = 0,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 7b659c4..5078b5c 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -110,7 +110,7 @@ struct apic apic_noop __ro_after_init = {
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = noop_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 9b18be7..ce503c9 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -39,17 +39,13 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
((apic->irq_dest_mode == 0) ?
MSI_ADDR_DEST_MODE_PHYSICAL :
MSI_ADDR_DEST_MODE_LOGICAL) |
- ((apic->irq_delivery_mode != dest_LowestPrio) ?
- MSI_ADDR_REDIRECTION_CPU :
- MSI_ADDR_REDIRECTION_LOWPRI) |
+ MSI_ADDR_REDIRECTION_CPU |
MSI_ADDR_DEST_ID(cfg->dest_apicid);

msg->data =
MSI_DATA_TRIGGER_EDGE |
MSI_DATA_LEVEL_ASSERT |
- ((apic->irq_delivery_mode != dest_LowestPrio) ?
- MSI_DATA_DELIVERY_FIXED :
- MSI_DATA_DELIVERY_LOWPRI) |
+ MSI_DATA_DELIVERY_FIXED |
MSI_DATA_VECTOR(cfg->vector);
}

diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index fa22017..02e8acb 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -105,7 +105,7 @@ static struct apic apic_default __ro_after_init = {
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = default_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 622f13c..8b04234 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
.apic_id_valid = x2apic_apic_id_valid,
.apic_id_registered = x2apic_apic_id_registered,

- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */

.disable_esr = 0,
diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 0fe3ea1..e7d9447 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -985,9 +985,7 @@ static u32 hv_compose_msi_req_v1(
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode =
- (apic->irq_delivery_mode == dest_LowestPrio) ?
- dest_LowestPrio : dest_Fixed;
+ int_pkt->int_desc.delivery_mode = dest_Fixed;

/*
* Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
@@ -1008,9 +1006,7 @@ static u32 hv_compose_msi_req_v2(
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode =
- (apic->irq_delivery_mode == dest_LowestPrio) ?
- dest_LowestPrio : dest_Fixed;
+ int_pkt->int_desc.delivery_mode = dest_Fixed;

/*
* Create MSI w/ dummy vCPU set targeting just one vCPU, overwritten

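For readers following the msi.c hunk, here is a minimal userspace sketch of
what the composed MSI message looks like once the delivery mode is hardwired
to Fixed. The macro names mirror the ones in the hunk, but their values are
written out locally from the x86 MSI message layout, which is an assumption
of this sketch and not a copy of the kernel headers. struct msi_words and
compose_fixed_msi() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel macros (assumed values, see note above). */
#define MSI_ADDR_BASE_LO            0xfee00000u
#define MSI_ADDR_DEST_MODE_LOGICAL  (1u << 2)
#define MSI_ADDR_REDIRECTION_CPU    (0u << 3)   /* deliver to the named CPU */
#define MSI_ADDR_REDIRECTION_LOWPRI (1u << 3)   /* no longer used */
#define MSI_ADDR_DEST_ID(d)         (((uint32_t)(d) << 12) & 0x000ff000u)

#define MSI_DATA_VECTOR(v)          ((uint32_t)(v) & 0xffu)
#define MSI_DATA_DELIVERY_FIXED     (0u << 8)
#define MSI_DATA_DELIVERY_LOWPRI    (1u << 8)   /* no longer used */
#define MSI_DATA_LEVEL_ASSERT       (1u << 14)
#define MSI_DATA_TRIGGER_EDGE       (0u << 15)

/* Hypothetical container for the two 32-bit words of an MSI message. */
struct msi_words {
	uint32_t address_lo;
	uint32_t data;
};

/*
 * Hypothetical helper: compose the message the way the patched code does,
 * i.e. without consulting any per-driver delivery mode. Logical destination
 * mode is assumed here, matching the .irq_dest_mode = 1 drivers in the patch.
 */
static struct msi_words compose_fixed_msi(uint8_t dest_apicid, uint8_t vector)
{
	struct msi_words m;

	/* Vectors 0-15 are not valid for APIC delivered interrupts. */
	assert(vector >= 16);

	m.address_lo = MSI_ADDR_BASE_LO |
		       MSI_ADDR_DEST_MODE_LOGICAL |
		       MSI_ADDR_REDIRECTION_CPU |
		       MSI_ADDR_DEST_ID(dest_apicid);

	m.data = MSI_DATA_TRIGGER_EDGE |
		 MSI_DATA_LEVEL_ASSERT |
		 MSI_DATA_DELIVERY_FIXED |
		 MSI_DATA_VECTOR(vector);

	return m;
}
```

With a destination APIC ID of 1 and vector 0x31 this gives address_lo
0xfee01004 and data 0x4031. Neither the redirection hint bit in the address
nor the lowest priority bit in the data word is set, so the chipset has no
license to pick a different CPU.
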
2017-12-29 13:35:43

by Thomas Gleixner

Subject: Re: [tip:x86/urgent] x86/apic: Switch all APICs to Fixed delivery mode

On Fri, 29 Dec 2017, tip-bot for Thomas Gleixner wrote:

> Commit-ID: a31e58e129f73ab5b04016330b13ed51fde7a961
> Gitweb: https://git.kernel.org/tip/a31e58e129f73ab5b04016330b13ed51fde7a961
> Author: Thomas Gleixner <[email protected]>
> AuthorDate: Thu, 28 Dec 2017 11:33:33 +0100
> Committer: Thomas Gleixner <[email protected]>
> CommitDate: Fri, 29 Dec 2017 14:20:48 +0100
>
> x86/apic: Switch all APICs to Fixed delivery mode

Note, the patch itself is unchanged. I merely amended the changelog to
point out that Fixed delivery mode is already used on this kind of system,
so the risk of this change is very low.

Thanks,

tglx

> Some of the APIC incarnations are operating in lowest priority delivery
> mode. This worked as long as the vector management code allocated the same
> vector on all possible CPUs for each interrupt.
>
> Lowest priority delivery mode does not necessarily respect the affinity
> setting and may redirect to some other online CPU. This was documented
> somewhere in the old code, but the conversion to single target delivery
> failed to update the delivery mode of the affected APIC drivers, which
> results in spurious interrupts on some of the affected CPU/chipset
> combinations.
>
> Switch the APIC drivers over to Fixed delivery mode and remove all
> leftovers of lowest priority delivery mode.
>
> Switching to Fixed delivery mode is not a problem on these CPUs because the
> kernel already uses Fixed delivery mode for IPIs. The reason for this is
> that the SDM explicitly forbids lowest prio mode for IPIs. The reason is
> obvious: If the irq routing does not honor destination targets in lowest
> prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be
> a fatal problem in many cases.
>
> As a consequence of this change, the apic::irq_delivery_mode field is now
> pointless, but this needs to be cleaned up in a separate patch.
>
> Fixes: fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity")
> Reported-by: [email protected]
> Signed-off-by: Thomas Gleixner <[email protected]>
> Tested-by: [email protected]
> Cc: Pavel Machek <[email protected]>
> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
> ---
> arch/x86/kernel/apic/apic_flat_64.c | 2 +-
> arch/x86/kernel/apic/apic_noop.c | 2 +-
> arch/x86/kernel/apic/msi.c | 8 ++------
> arch/x86/kernel/apic/probe_32.c | 2 +-
> arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
> drivers/pci/host/pci-hyperv.c | 8 ++------
> 6 files changed, 8 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
> index aa85690..25a8702 100644
> --- a/arch/x86/kernel/apic/apic_flat_64.c
> +++ b/arch/x86/kernel/apic/apic_flat_64.c
> @@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_init = {
> .apic_id_valid = default_apic_id_valid,
> .apic_id_registered = flat_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> .irq_dest_mode = 1, /* logical */
>
> .disable_esr = 0,
> diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
> index 7b659c4..5078b5c 100644
> --- a/arch/x86/kernel/apic/apic_noop.c
> +++ b/arch/x86/kernel/apic/apic_noop.c
> @@ -110,7 +110,7 @@ struct apic apic_noop __ro_after_init = {
> .apic_id_valid = default_apic_id_valid,
> .apic_id_registered = noop_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> /* logical delivery broadcast to all CPUs: */
> .irq_dest_mode = 1,
>
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index 9b18be7..ce503c9 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -39,17 +39,13 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> ((apic->irq_dest_mode == 0) ?
> MSI_ADDR_DEST_MODE_PHYSICAL :
> MSI_ADDR_DEST_MODE_LOGICAL) |
> - ((apic->irq_delivery_mode != dest_LowestPrio) ?
> - MSI_ADDR_REDIRECTION_CPU :
> - MSI_ADDR_REDIRECTION_LOWPRI) |
> + MSI_ADDR_REDIRECTION_CPU |
> MSI_ADDR_DEST_ID(cfg->dest_apicid);
>
> msg->data =
> MSI_DATA_TRIGGER_EDGE |
> MSI_DATA_LEVEL_ASSERT |
> - ((apic->irq_delivery_mode != dest_LowestPrio) ?
> - MSI_DATA_DELIVERY_FIXED :
> - MSI_DATA_DELIVERY_LOWPRI) |
> + MSI_DATA_DELIVERY_FIXED |
> MSI_DATA_VECTOR(cfg->vector);
> }
>
> diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
> index fa22017..02e8acb 100644
> --- a/arch/x86/kernel/apic/probe_32.c
> +++ b/arch/x86/kernel/apic/probe_32.c
> @@ -105,7 +105,7 @@ static struct apic apic_default __ro_after_init = {
> .apic_id_valid = default_apic_id_valid,
> .apic_id_registered = default_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> /* logical delivery broadcast to all CPUs: */
> .irq_dest_mode = 1,
>
> diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
> index 622f13c..8b04234 100644
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
> .apic_id_valid = x2apic_apic_id_valid,
> .apic_id_registered = x2apic_apic_id_registered,
>
> - .irq_delivery_mode = dest_LowestPrio,
> + .irq_delivery_mode = dest_Fixed,
> .irq_dest_mode = 1, /* logical */
>
> .disable_esr = 0,
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 0fe3ea1..e7d9447 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -985,9 +985,7 @@ static u32 hv_compose_msi_req_v1(
> int_pkt->wslot.slot = slot;
> int_pkt->int_desc.vector = vector;
> int_pkt->int_desc.vector_count = 1;
> - int_pkt->int_desc.delivery_mode =
> - (apic->irq_delivery_mode == dest_LowestPrio) ?
> - dest_LowestPrio : dest_Fixed;
> + int_pkt->int_desc.delivery_mode = dest_Fixed;
>
> /*
> * Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
> @@ -1008,9 +1006,7 @@ static u32 hv_compose_msi_req_v2(
> int_pkt->wslot.slot = slot;
> int_pkt->int_desc.vector = vector;
> int_pkt->int_desc.vector_count = 1;
> - int_pkt->int_desc.delivery_mode =
> - (apic->irq_delivery_mode == dest_LowestPrio) ?
> - dest_LowestPrio : dest_Fixed;
> + int_pkt->int_desc.delivery_mode = dest_Fixed;
>
> /*
> * Create MSI w/ dummy vCPU set targeting just one vCPU, overwritten
>