Hi Daniel,
sorry for the late reply. I'm just back from vacation.
At Tue, 25 Sep 2012 13:20:05 +0800,
Daniel J Blueman wrote:
>
> On my Macbook with a discrete Nvidia GPU, there is a race between
> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> reliably causing a kernel oops [2].
>
> Introducing a delay of ~1s between the calls prevents this. When the
> second 'OFF' write path executes, it looks like struct azx at
> card->private_data hasn't yet been allocated yet [3], so there is
> likely some locking missing.
It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
card->private_data causes Oops). Could you check the patch like below
and see whether you get a kernel warning (but no Oops) or the problem
gets fixed by shifting the assignment of pci drvdata?
thanks,
Takashi
---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..152f9e1 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2609,9 +2609,15 @@ static void azx_vs_set_state(struct pci_dev *pci,
enum vga_switcheroo_state state)
{
struct snd_card *card = pci_get_drvdata(pci);
- struct azx *chip = card->private_data;
+ struct azx *chip;
bool disabled;
+ if (WARN_ON(!card))
+ return;
+
+ chip = card->private_data;
+ if (WARN_ON(!chip))
+ return;
if (chip->init_failed)
return;
@@ -3314,6 +3320,7 @@ static int __devinit azx_probe(struct pci_dev *pci,
}
snd_card_set_dev(card, &pci->dev);
+ pci_set_drvdata(pci, card);
err = azx_create(card, pci, dev, pci_id->driver_data, &chip);
if (err < 0)
@@ -3340,8 +3347,6 @@ static int __devinit azx_probe(struct pci_dev *pci,
goto out_free;
}
- pci_set_drvdata(pci, card);
-
if (pci_dev_run_wake(pci))
pm_runtime_put_noidle(&pci->dev);
@@ -3350,6 +3355,7 @@ static int __devinit azx_probe(struct pci_dev *pci,
out_free:
snd_card_free(card);
+ pci_set_drvdata(pci, NULL);
return err;
}
On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
> At Tue, 25 Sep 2012 13:20:05 +0800,
> Daniel J Blueman wrote:
>> On my Macbook with a discrete Nvidia GPU, there is a race between
>> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> reliably causing a kernel oops [2].
>>
>> Introducing a delay of ~1s between the calls prevents this. When the
>> second 'OFF' write path executes, it looks like struct azx at
>> card->private_data hasn't yet been allocated yet [3], so there is
>> likely some locking missing.
>
> It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> card->private_data causes Oops). Could you check the patch like below
> and see whether you get a kernel warning (but no Oops) or the problem
> gets fixed by shifting the assignment of pci drvdata?
[...]
Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
though we see unexpected 0x0 responses in the response ring buffer
[1], which we don't see when there's a >~1.5s delay between IGD and
OFF.
Thanks,
Daniel
--- [1]
snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
snd_hda_intel 0000:00:1b.0: irq 55 for MSI/MSI-X
vga_switcheroo: enabled
input: HDA Intel PCH Headphone as
/devices/pci0000:00/0000:00:1b.0/sound/card0/input11
snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
{echo IGD >/sys/kernel/debug/vgaswitcheroo/switch}
{echo OFF >/sys/kernel/debug/vgaswitcheroo/switch}
hda_intel: Disabling MSI
hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
hda-intel: Disabling 0000:01:00.1 via VGA-switcheroo
VGA switcheroo: switched nouveau off
[drm] nouveau 0000:01:00.0: Disabling display...
[drm] nouveau 0000:01:00.0: Disabling fbcon...
[drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)...
[drm] nouveau 0000:01:00.0: Evicting buffers...
[drm] nouveau 0000:01:00.0: Idling channels...
[drm] nouveau 0000:01:00.0: Suspending GPU objects...
[drm] nouveau 0000:01:00.0: And we're gone!
hda-intel: spurious response 0x0:0x0, last cmd=0x1f0004
{repeats 220 times}
hda-intel: spurious response 0x0:0x0, last cmd=0x1f0004
HDMI: failed to get afg sub nodes
--
Daniel J Blueman
At Tue, 9 Oct 2012 00:34:09 +0800,
Daniel J Blueman wrote:
>
> On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
> > At Tue, 25 Sep 2012 13:20:05 +0800,
> > Daniel J Blueman wrote:
> >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> reliably causing a kernel oops [2].
> >>
> >> Introducing a delay of ~1s between the calls prevents this. When the
> >> second 'OFF' write path executes, it looks like struct azx at
> >> card->private_data hasn't yet been allocated yet [3], so there is
> >> likely some locking missing.
> >
> > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> > card->private_data causes Oops). Could you check the patch like below
> > and see whether you get a kernel warning (but no Oops) or the problem
> > gets fixed by shifting the assignment of pci drvdata?
> [...]
>
> Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> though we see unexpected 0x0 responses in the response ring buffer
> [1], which we don't see when there's a >~1.5s delay between IGD and
> OFF.
If the previous patch fixed, it means that the switching occurred
during the device was being probed. Maybe a better approach to
register the VGA switcheroo after the proper initialization.
The patch below is a revised one. Please give it a try.
Takashi
---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..dcbfd29 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -501,6 +501,7 @@ struct azx {
/* VGA-switcheroo setup */
unsigned int use_vga_switcheroo:1;
+ unsigned int vga_switcheroo_registered:1;
unsigned int init_failed:1; /* delayed init failed */
unsigned int disabled:1; /* disabled by VGA-switcher */
@@ -2684,14 +2685,20 @@ static const struct vga_switcheroo_client_ops azx_vs_ops = {
static int __devinit register_vga_switcheroo(struct azx *chip)
{
+ int err;
+
if (!chip->use_vga_switcheroo)
return 0;
/* FIXME: currently only handling DIS controller
* is there any machine with two switchable HDMI audio controllers?
*/
- return vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
+ err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
VGA_SWITCHEROO_DIS,
chip->bus != NULL);
+ if (err < 0)
+ return err;
+ chip->vga_switcheroo_registered = 1;
+ return 0;
}
#else
#define init_vga_switcheroo(chip) /* NOP */
@@ -2713,7 +2720,8 @@ static int azx_free(struct azx *chip)
if (use_vga_switcheroo(chip)) {
if (chip->disabled && chip->bus)
snd_hda_unlock_devices(chip->bus);
- vga_switcheroo_unregister_client(chip->pci);
+ if (chip->vga_switcheroo_registered)
+ vga_switcheroo_unregister_client(chip->pci);
}
if (chip->initialized) {
@@ -3067,14 +3075,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
}
ok:
- err = register_vga_switcheroo(chip);
- if (err < 0) {
- snd_printk(KERN_ERR SFX
- "Error registering VGA-switcheroo client\n");
- azx_free(chip);
- return err;
- }
-
err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
if (err < 0) {
snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3345,6 +3345,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
if (pci_dev_run_wake(pci))
pm_runtime_put_noidle(&pci->dev);
+ err = register_vga_switcheroo(chip);
+ if (err < 0) {
+ snd_printk(KERN_ERR SFX
+ "Error registering VGA-switcheroo client\n");
+ goto out_free;
+ }
+
dev++;
return 0;
At Tue, 09 Oct 2012 12:04:08 +0200,
Takashi Iwai wrote:
>
> At Tue, 9 Oct 2012 00:34:09 +0800,
> Daniel J Blueman wrote:
> >
> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> > > Daniel J Blueman wrote:
> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> > >> reliably causing a kernel oops [2].
> > >>
> > >> Introducing a delay of ~1s between the calls prevents this. When the
> > >> second 'OFF' write path executes, it looks like struct azx at
> > >> card->private_data hasn't yet been allocated yet [3], so there is
> > >> likely some locking missing.
> > >
> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> > > card->private_data causes Oops). Could you check the patch like below
> > > and see whether you get a kernel warning (but no Oops) or the problem
> > > gets fixed by shifting the assignment of pci drvdata?
> > [...]
> >
> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> > though we see unexpected 0x0 responses in the response ring buffer
> > [1], which we don't see when there's a >~1.5s delay between IGD and
> > OFF.
>
> If the previous patch fixed, it means that the switching occurred
> during the device was being probed. Maybe a better approach to
> register the VGA switcheroo after the proper initialization.
>
> The patch below is a revised one. Please give it a try.
Also, it's not clear which card spews the spurious response.
Apply the patch below in addition.
thanks,
Takashi
---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..9a0a29d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -829,8 +829,9 @@ static void azx_update_rirb(struct azx *chip)
smp_wmb();
chip->rirb.cmds[addr]--;
} else
- snd_printk(KERN_ERR SFX "spurious response %#x:%#x, "
+ snd_printk(KERN_ERR SFX "%s: spurious response %#x:%#x, "
"last cmd=%#08x\n",
+ pci_name(chip->pci),
res, res_ex,
chip->last_cmd[addr]);
}
On 9 October 2012 18:07, Takashi Iwai <[email protected]> wrote:
> At Tue, 09 Oct 2012 12:04:08 +0200,
> Takashi Iwai wrote:
>>
>> At Tue, 9 Oct 2012 00:34:09 +0800,
>> Daniel J Blueman wrote:
>> >
>> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
>> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> > > Daniel J Blueman wrote:
>> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> > >> reliably causing a kernel oops [2].
>> > >>
>> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> > >> second 'OFF' write path executes, it looks like struct azx at
>> > >> card->private_data hasn't yet been allocated yet [3], so there is
>> > >> likely some locking missing.
>> > >
>> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> > > card->private_data causes Oops). Could you check the patch like below
>> > > and see whether you get a kernel warning (but no Oops) or the problem
>> > > gets fixed by shifting the assignment of pci drvdata?
>> > [...]
>> >
>> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> > though we see unexpected 0x0 responses in the response ring buffer
>> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> > OFF.
>>
>> If the previous patch fixed, it means that the switching occurred
>> during the device was being probed. Maybe a better approach to
>> register the VGA switcheroo after the proper initialization.
>>
>> The patch below is a revised one. Please give it a try.
>
> Also, it's not clear which card spews the spurious response.
> Apply the patch below in addition.
[...]
hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
$ lspci -s :1:0.1
01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
It's the NVIDIA device which presumably hasn't completed it's
transition to D3 at the time the OFF is executed.
Thanks,
Daniel
--
Daniel J Blueman
At Tue, 9 Oct 2012 19:23:56 +0800,
Daniel J Blueman wrote:
>
> On 9 October 2012 18:07, Takashi Iwai <[email protected]> wrote:
> > At Tue, 09 Oct 2012 12:04:08 +0200,
> > Takashi Iwai wrote:
> >>
> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> Daniel J Blueman wrote:
> >> >
> >> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> > > Daniel J Blueman wrote:
> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> > >> reliably causing a kernel oops [2].
> >> > >>
> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> > >> likely some locking missing.
> >> > >
> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> > > card->private_data causes Oops). Could you check the patch like below
> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> > [...]
> >> >
> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> > OFF.
> >>
> >> If the previous patch fixed, it means that the switching occurred
> >> during the device was being probed. Maybe a better approach to
> >> register the VGA switcheroo after the proper initialization.
> >>
> >> The patch below is a revised one. Please give it a try.
> >
> > Also, it's not clear which card spews the spurious response.
> > Apply the patch below in addition.
> [...]
>
> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> $ lspci -s :1:0.1
> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
>
> It's the NVIDIA device which presumably hasn't completed it's
> transition to D3 at the time the OFF is executed.
OK, then could you try the patch below on the top of previous two
patches?
Takashi
---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..676c64d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2639,13 +2640,13 @@ static void azx_vs_set_state(struct pci_dev *pci,
disabled ? "Disabling" : "Enabling",
pci_name(chip->pci));
if (disabled) {
+ snd_hda_lock_devices(chip->bus);
azx_suspend(&pci->dev);
chip->disabled = true;
- snd_hda_lock_devices(chip->bus);
} else {
- snd_hda_unlock_devices(chip->bus);
chip->disabled = false;
azx_resume(&pci->dev);
+ snd_hda_unlock_devices(chip->bus);
}
}
}
On 9 October 2012 21:04, Takashi Iwai <[email protected]> wrote:
> At Tue, 9 Oct 2012 19:23:56 +0800,
> Daniel J Blueman wrote:
>> On 9 October 2012 18:07, Takashi Iwai <[email protected]> wrote:
>> > At Tue, 09 Oct 2012 12:04:08 +0200,
>> > Takashi Iwai wrote:
>> >> At Tue, 9 Oct 2012 00:34:09 +0800,
>> >> Daniel J Blueman wrote:
>> >> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
>> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> >> > > Daniel J Blueman wrote:
>> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> >> > >> reliably causing a kernel oops [2].
>> >> > >>
>> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> >> > >> second 'OFF' write path executes, it looks like struct azx at
>> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
>> >> > >> likely some locking missing.
>> >> > >
>> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> >> > > card->private_data causes Oops). Could you check the patch like below
>> >> > > and see whether you get a kernel warning (but no Oops) or the problem
>> >> > > gets fixed by shifting the assignment of pci drvdata?
>> >> > [...]
>> >> >
>> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> >> > though we see unexpected 0x0 responses in the response ring buffer
>> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> >> > OFF.
>> >>
>> >> If the previous patch fixed, it means that the switching occurred
>> >> during the device was being probed. Maybe a better approach to
>> >> register the VGA switcheroo after the proper initialization.
>> >>
>> >> The patch below is a revised one. Please give it a try.
>> >
>> > Also, it's not clear which card spews the spurious response.
>> > Apply the patch below in addition.
>> [...]
>>
>> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
>> $ lspci -s :1:0.1
>> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
>>
>> It's the NVIDIA device which presumably hasn't completed it's
>> transition to D3 at the time the OFF is executed.
>
> OK, then could you try the patch below on the top of previous two
> patches?
The first IGD switcheroo command fails to switch to the integrated GPU:
# cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS:+:Pwr:0000:01:00.0
1:IGD: :Pwr:0000:00:02.0
2:DIS-Audio: :Pwr:0000:01:00.1
# echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
vga_switcheroo: client 1 refused switch
I also instrumented snd_hda_lock_devices, but none of the failure
paths are being taken, which would leave inconsistent state, as the
return value isn't checked.
Thanks,
Daniel
--
Daniel J Blueman
At Tue, 9 Oct 2012 22:26:40 +0800,
Daniel J Blueman wrote:
>
> On 9 October 2012 21:04, Takashi Iwai <[email protected]> wrote:
> > At Tue, 9 Oct 2012 19:23:56 +0800,
> > Daniel J Blueman wrote:
> >> On 9 October 2012 18:07, Takashi Iwai <[email protected]> wrote:
> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
> >> > Takashi Iwai wrote:
> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> >> Daniel J Blueman wrote:
> >> >> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> >> > > Daniel J Blueman wrote:
> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> >> > >> reliably causing a kernel oops [2].
> >> >> > >>
> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> >> > >> likely some locking missing.
> >> >> > >
> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> >> > > card->private_data causes Oops). Could you check the patch like below
> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> >> > [...]
> >> >> >
> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> >> > OFF.
> >> >>
> >> >> If the previous patch fixed, it means that the switching occurred
> >> >> during the device was being probed. Maybe a better approach to
> >> >> register the VGA switcheroo after the proper initialization.
> >> >>
> >> >> The patch below is a revised one. Please give it a try.
> >> >
> >> > Also, it's not clear which card spews the spurious response.
> >> > Apply the patch below in addition.
> >> [...]
> >>
> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> >> $ lspci -s :1:0.1
> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> >>
> >> It's the NVIDIA device which presumably hasn't completed it's
> >> transition to D3 at the time the OFF is executed.
> >
> > OK, then could you try the patch below on the top of previous two
> > patches?
>
> The first IGD switcheroo command fails to switch to the integrated GPU:
>
> # cat /sys/kernel/debug/vgaswitcheroo/switch
> 0:DIS:+:Pwr:0000:01:00.0
> 1:IGD: :Pwr:0000:00:02.0
> 2:DIS-Audio: :Pwr:0000:01:00.1
> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
> vga_switcheroo: client 1 refused switch
>
> I also instrumented snd_hda_lock_devices, but none of the failure
> paths are being taken, which would leave inconsistent state, as the
> return value isn't checked.
Hm, right, the return value of snd_hda_lock_devices() isn't checked,
but I don't understand how this results like above.
Basically switching is protected by mutex in vga_switcheroo.c, so the
whole operation in the client side should be serialized.
In anyway, try the patch below cleanly, and see the spurious message
error coming up at which timing.
thanks,
Takashi
---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 6833835..4518b05 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -501,6 +501,7 @@ struct azx {
/* VGA-switcheroo setup */
unsigned int use_vga_switcheroo:1;
+ unsigned int vga_switcheroo_registered:1;
unsigned int init_failed:1; /* delayed init failed */
unsigned int disabled:1; /* disabled by VGA-switcher */
@@ -1597,6 +1598,8 @@ static int DELAYED_INIT_MARK azx_codec_create(struct azx *chip, const char *mode
int c, codecs, err;
int max_slots;
+ printk(KERN_DEBUG "XXX %s: azx_codec_create entered\n",
+ pci_name(chip->pci));
memset(&bus_temp, 0, sizeof(bus_temp));
bus_temp.private_data = chip;
bus_temp.modelname = model;
@@ -1673,6 +1676,8 @@ static int DELAYED_INIT_MARK azx_codec_create(struct azx *chip, const char *mode
snd_printk(KERN_ERR SFX "no codecs initialized\n");
return -ENXIO;
}
+ printk(KERN_DEBUG "XXX %s: azx_codec_create done\n",
+ pci_name(chip->pci));
return 0;
}
@@ -2615,6 +2620,8 @@ static void azx_vs_set_state(struct pci_dev *pci,
return;
disabled = (state == VGA_SWITCHEROO_OFF);
+ printk(KERN_DEBUG "XXX %s: chip->disabled=%d, disabled=%d\n",
+ pci_name(chip->pci), chip->disabled, disabled);
if (chip->disabled == disabled)
return;
@@ -2638,14 +2645,26 @@ static void azx_vs_set_state(struct pci_dev *pci,
disabled ? "Disabling" : "Enabling",
pci_name(chip->pci));
if (disabled) {
+ printk(KERN_DEBUG "XXX %s: azx_suspend...\n",
+ pci_name(chip->pci));
azx_suspend(&pci->dev);
+ printk(KERN_DEBUG "XXX %s: azx_suspend done...\n",
+ pci_name(chip->pci));
chip->disabled = true;
- snd_hda_lock_devices(chip->bus);
+ if (snd_hda_lock_devices(chip->bus))
+ snd_printk(KERN_WARNING SFX
+ "Cannot lock devices!\n");
} else {
snd_hda_unlock_devices(chip->bus);
chip->disabled = false;
+ printk(KERN_DEBUG "XXX %s: azx_resume...\n",
+ pci_name(chip->pci));
azx_resume(&pci->dev);
+ printk(KERN_DEBUG "XXX %s: azx_resume done...\n",
+ pci_name(chip->pci));
}
+ printk(KERN_DEBUG "XXX %s: switching finished: disabled=%d\n",
+ pci_name(chip->pci), chip->disabled);
}
}
@@ -2683,14 +2702,20 @@ static const struct vga_switcheroo_client_ops azx_vs_ops = {
static int __devinit register_vga_switcheroo(struct azx *chip)
{
+ int err;
+
if (!chip->use_vga_switcheroo)
return 0;
/* FIXME: currently only handling DIS controller
* is there any machine with two switchable HDMI audio controllers?
*/
- return vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
+ err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
VGA_SWITCHEROO_DIS,
chip->bus != NULL);
+ if (err < 0)
+ return err;
+ chip->vga_switcheroo_registered = 1;
+ return 0;
}
#else
#define init_vga_switcheroo(chip) /* NOP */
@@ -2712,7 +2737,8 @@ static int azx_free(struct azx *chip)
if (use_vga_switcheroo(chip)) {
if (chip->disabled && chip->bus)
snd_hda_unlock_devices(chip->bus);
- vga_switcheroo_unregister_client(chip->pci);
+ if (chip->vga_switcheroo_registered)
+ vga_switcheroo_unregister_client(chip->pci);
}
if (chip->initialized) {
@@ -3062,14 +3088,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
}
ok:
- err = register_vga_switcheroo(chip);
- if (err < 0) {
- snd_printk(KERN_ERR SFX
- "Error registering VGA-switcheroo client\n");
- azx_free(chip);
- return err;
- }
-
err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
if (err < 0) {
snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3340,6 +3358,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
if (pci_dev_run_wake(pci))
pm_runtime_put_noidle(&pci->dev);
+ err = register_vga_switcheroo(chip);
+ if (err < 0) {
+ snd_printk(KERN_ERR SFX
+ "Error registering VGA-switcheroo client\n");
+ goto out_free;
+ }
+
dev++;
return 0;
On 10 October 2012 20:34, Takashi Iwai <[email protected]> wrote:
> At Tue, 9 Oct 2012 22:26:40 +0800,
> Daniel J Blueman wrote:
>> On 9 October 2012 21:04, Takashi Iwai <[email protected]> wrote:
>> > At Tue, 9 Oct 2012 19:23:56 +0800,
>> > Daniel J Blueman wrote:
>> >> On 9 October 2012 18:07, Takashi Iwai <[email protected]> wrote:
>> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
>> >> > Takashi Iwai wrote:
>> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
>> >> >> Daniel J Blueman wrote:
>> >> >> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
>> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> >> >> > > Daniel J Blueman wrote:
>> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> >> >> > >> reliably causing a kernel oops [2].
>> >> >> > >>
>> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
>> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
>> >> >> > >> likely some locking missing.
>> >> >> > >
>> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> >> >> > > card->private_data causes Oops). Could you check the patch like below
>> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
>> >> >> > > gets fixed by shifting the assignment of pci drvdata?
>> >> >> > [...]
>> >> >> >
>> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> >> >> > though we see unexpected 0x0 responses in the response ring buffer
>> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> >> >> > OFF.
>> >> >>
>> >> >> If the previous patch fixed, it means that the switching occurred
>> >> >> during the device was being probed. Maybe a better approach to
>> >> >> register the VGA switcheroo after the proper initialization.
>> >> >>
>> >> >> The patch below is a revised one. Please give it a try.
>> >> >
>> >> > Also, it's not clear which card spews the spurious response.
>> >> > Apply the patch below in addition.
>> >> [...]
>> >>
>> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
>> >> $ lspci -s :1:0.1
>> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
>> >>
>> >> It's the NVIDIA device which presumably hasn't completed it's
>> >> transition to D3 at the time the OFF is executed.
>> >
>> > OK, then could you try the patch below on the top of previous two
>> > patches?
>>
>> The first IGD switcheroo command fails to switch to the integrated GPU:
>>
>> # cat /sys/kernel/debug/vgaswitcheroo/switch
>> 0:DIS:+:Pwr:0000:01:00.0
>> 1:IGD: :Pwr:0000:00:02.0
>> 2:DIS-Audio: :Pwr:0000:01:00.1
>> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
>> vga_switcheroo: client 1 refused switch
>>
>> I also instrumented snd_hda_lock_devices, but none of the failure
>> paths are being taken, which would leave inconsistent state, as the
>> return value isn't checked.
>
> Hm, right, the return value of snd_hda_lock_devices() isn't checked,
> but I don't understand how this results like above.
> Basically switching is protected by mutex in vga_switcheroo.c, so the
> whole operation in the client side should be serialized.
>
> In anyway, try the patch below cleanly, and see the spurious message
> error coming up at which timing.
[...]
The patch _does_ address the issue. A recent update to my Macbook
firmware misleadingly broke i915 switching, but since I can reproduce
the oops without the IGD switching completing with the stock kernel,
and consistently can't without [1], the patch is good.
Tested-by: Daniel J Blueman <[email protected]>
Thanks Takashi!
Daniel
--- [1]
snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
snd_hda_intel 0000:00:1b.0: irq 54 for MSI/MSI-X
XXX 0000:00:1b.0: azx_codec_create entered
vga_switcheroo: enabled
XXX 0000:00:1b.0: azx_codec_create done
input: HDA Intel PCH Headphone as
/devices/pci0000:00/0000:00:1b.0/sound/card0/input9
snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
hda_intel: Disabling MSI
hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
XXX 0000:01:00.1: azx_codec_create entered
XXX 0000:01:00.1: azx_codec_create done
input: HDA NVidia HDMI/DP,pcm=8 as
/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
input: HDA NVidia HDMI/DP,pcm=7 as
/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
input: HDA NVidia HDMI/DP,pcm=3 as
/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
vga_switcheroo: client 1 refused switch
i915: switched off
--
Daniel J Blueman
At Fri, 12 Oct 2012 23:08:14 +0800,
Daniel J Blueman wrote:
>
> On 10 October 2012 20:34, Takashi Iwai <[email protected]> wrote:
> > At Tue, 9 Oct 2012 22:26:40 +0800,
> > Daniel J Blueman wrote:
> >> On 9 October 2012 21:04, Takashi Iwai <[email protected]> wrote:
> >> > At Tue, 9 Oct 2012 19:23:56 +0800,
> >> > Daniel J Blueman wrote:
> >> >> On 9 October 2012 18:07, Takashi Iwai <[email protected]> wrote:
> >> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
> >> >> > Takashi Iwai wrote:
> >> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> >> >> Daniel J Blueman wrote:
> >> >> >> > On 8 October 2012 20:58, Takashi Iwai <[email protected]> wrote:
> >> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> >> >> > > Daniel J Blueman wrote:
> >> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> >> >> > >> reliably causing a kernel oops [2].
> >> >> >> > >>
> >> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> >> >> > >> likely some locking missing.
> >> >> >> > >
> >> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> >> >> > > card->private_data causes Oops). Could you check the patch like below
> >> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> >> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> >> >> > [...]
> >> >> >> >
> >> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> >> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> >> >> > OFF.
> >> >> >>
> >> >> >> If the previous patch fixed, it means that the switching occurred
> >> >> >> during the device was being probed. Maybe a better approach to
> >> >> >> register the VGA switcheroo after the proper initialization.
> >> >> >>
> >> >> >> The patch below is a revised one. Please give it a try.
> >> >> >
> >> >> > Also, it's not clear which card spews the spurious response.
> >> >> > Apply the patch below in addition.
> >> >> [...]
> >> >>
> >> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> >> >> $ lspci -s :1:0.1
> >> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> >> >>
> >> >> It's the NVIDIA device which presumably hasn't completed it's
> >> >> transition to D3 at the time the OFF is executed.
> >> >
> >> > OK, then could you try the patch below on the top of previous two
> >> > patches?
> >>
> >> The first IGD switcheroo command fails to switch to the integrated GPU:
> >>
> >> # cat /sys/kernel/debug/vgaswitcheroo/switch
> >> 0:DIS:+:Pwr:0000:01:00.0
> >> 1:IGD: :Pwr:0000:00:02.0
> >> 2:DIS-Audio: :Pwr:0000:01:00.1
> >> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
> >> vga_switcheroo: client 1 refused switch
> >>
> >> I also instrumented snd_hda_lock_devices, but none of the failure
> >> paths are being taken, which would leave inconsistent state, as the
> >> return value isn't checked.
> >
> > Hm, right, the return value of snd_hda_lock_devices() isn't checked,
> > but I don't understand how this results like above.
> > Basically switching is protected by mutex in vga_switcheroo.c, so the
> > whole operation in the client side should be serialized.
> >
> > In anyway, try the patch below cleanly, and see the spurious message
> > error coming up at which timing.
> [...]
>
> The patch _does_ address the issue. A recent update to my Macbook
> firmware misleadingly broke i915 switching, but since I can reproduce
> the oops without the IGD switching completing with the stock kernel,
> and consistently can't without [1], the patch is good.
>
> Tested-by: Daniel J Blueman <[email protected]>
OK, then I'm going to apply the patch to 3.7 tree (with Cc to
stable).
thanks,
Takashi
>
> Thanks Takashi!
> Daniel
>
> --- [1]
>
> snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
> snd_hda_intel 0000:00:1b.0: irq 54 for MSI/MSI-X
> XXX 0000:00:1b.0: azx_codec_create entered
> vga_switcheroo: enabled
> XXX 0000:00:1b.0: azx_codec_create done
> input: HDA Intel PCH Headphone as
> /devices/pci0000:00/0000:00:1b.0/sound/card0/input9
> snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
> hda_intel: Disabling MSI
> hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
> XXX 0000:01:00.1: azx_codec_create entered
> XXX 0000:01:00.1: azx_codec_create done
> input: HDA NVidia HDMI/DP,pcm=8 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
> input: HDA NVidia HDMI/DP,pcm=7 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
> input: HDA NVidia HDMI/DP,pcm=3 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
> vga_switcheroo: client 1 refused switch
> i915: switched off
> --
> Daniel J Blueman
>