2011-06-14 16:52:12

by Daniel J Blueman

[permalink] [raw]
Subject: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 14 June 2011 13:23, Eric Anholt <[email protected]> wrote:
> On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <[email protected]> wrote:
>> Hi Eric,
>>
>> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>> whenever you hit the top-left of the screen to show all windows) are
>> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>> I see no 'missed IRQ' kernel messages.
>>
>> As this addresses a significant usability regression, are you happy to
>> add it to the 3.0-rc queue? I think it has very good value in -stable
>> also (assuming correctness). What do you think?
>
> This one had significant performance impacts, and later hacks in this
> series worked around the problem to approximately the same level of
> success with less impact, and we don't actually have a justification of
> why any of them work. We were still hoping to come up with some clue,
> and haven't yet.

True; that is quite heavy handed delay looping.

It's a pity the usual Intel font didn't make it to the programmer's
reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
so we don't need to read it here.

It would be good to get this into -rc4. -stable probably needs some additional
tweaks.

Signed-off-by: Daniel J Blueman <[email protected]>
---
drivers/gpu/drm/i915/i915_irq.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index b9fafe3..9a98c1b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1827,6 +1827,12 @@ int ironlake_irq_postinstall(struct drm_device *dev)
ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
}

+ if (IS_GEN6(dev))
+ /* allow blitter user interrupt to generate a MSI write from
+ the ISR */
+ I915_WRITE(GEN6_BLITTER_HWSTAM,
+ 0xffffffff & ~GEN6_BLITTER_USER_INTERRUPT);
+
return 0;
}

--
1.7.4.1


2011-06-14 17:23:22

by Chris Wilson

[permalink] [raw]
Subject: Re: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On Wed, 15 Jun 2011 00:51:47 +0800, Daniel J Blueman <[email protected]> wrote:
> True; that is quite heavy handed delay looping.
>
> It's a pity the usual Intel font didn't make it to the programmer's
> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
> so we don't need to read it here.
>
> It would be good to get this into -rc4. -stable probably needs some additional
> tweaks.
>
> Signed-off-by: Daniel J Blueman <[email protected]>

Tested-by: Chris Wilson <[email protected]>

Should apply as is and be equally effective for stable.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2011-06-15 03:24:42

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 15 June 2011 10:06, Eric Anholt <[email protected]> wrote:
> On Wed, 15 Jun 2011 00:51:47 +0800, Daniel J Blueman <[email protected]> wrote:
>> On 14 June 2011 13:23, Eric Anholt <[email protected]> wrote:
>> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <[email protected]> wrote:
>> >> Hi Eric,
>> >>
>> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>> >> whenever you hit the top-left of the screen to show all windows) are
>> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>> >> I see no 'missed IRQ' kernel messages.
>> >>
>> >> As this addresses a significant usability regression, are you happy to
>> >> add it to the 3.0-rc queue? I think it has very good value in -stable
>> >> also (assuming correctness). What do you think?
>> >
>> > This one had significant performance impacts, and later hacks in this
>> > series worked around the problem to approximately the same level of
>> > success with less impact, and we don't actually have a justification of
>> > why any of them work. ?We were still hoping to come up with some clue,
>> > and haven't yet.
>>
>> True; that is quite heavy handed delay looping.
>>
>> It's a pity the usual Intel font didn't make it to the programmer's
>> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
>> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
>> so we don't need to read it here.
>>
>> It would be good to get this into -rc4. -stable probably needs some additional
>> tweaks.
>>
>> Signed-off-by: Daniel J Blueman <[email protected]>

> So you're saying that our interrupts at the top-level IMR are triggered
> by the write to the status page of the lower-level ring? ?That's
> surprising to me. ?Or do you think that this write is just happening to
> trigger serialization so the interrupt comes after the DWORD write of
> the seqno by the ring? ?(hw folks just recently indicated that our
> particular code is not expected to serialize the interrupt after the
> seqno store, unless we had an MI_FLUSH_DWORD in between)

When ISR bits not masked by the hardware status mask register change,
a write is generated with the ISR contents to the status page, so I
believe that the blitter command streamer wasn't generating an
interrupt when an MI_USER_INTERRUPT command was issued. The interrupt
handler would kick in for other interrupts, read all the IIRs and
notice the bit set and wake the ring interrupt queue anyway.

I guess we could test this by observing that the BLT user interrupt
IIR bit is always not set on it's own (ie another interrupt woke us)
in the interrupt handler.

Thanks,
Daniel

> This patch has now passed 7000 iterations of the testcase that had a
> ~.5% failure rate before.
>
> Tested-by: Eric Anholt <[email protected]>
--
Daniel J Blueman

2011-06-15 04:54:41

by Ben Widawsky

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On Wed, Jun 15, 2011 at 12:51:47AM +0800, Daniel J Blueman wrote:
> On 14 June 2011 13:23, Eric Anholt <[email protected]> wrote:
> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <[email protected]> wrote:
> >> Hi Eric,
> >>
> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
> >> whenever you hit the top-left of the screen to show all windows) are
> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
> >> I see no 'missed IRQ' kernel messages.
> >>
> >> As this addresses a significant usability regression, are you happy to
> >> add it to the 3.0-rc queue? I think it has very good value in -stable
> >> also (assuming correctness). What do you think?
> >
> > This one had significant performance impacts, and later hacks in this
> > series worked around the problem to approximately the same level of
> > success with less impact, and we don't actually have a justification of
> > why any of them work. We were still hoping to come up with some clue,
> > and haven't yet.
>
> True; that is quite heavy handed delay looping.
>
> It's a pity the usual Intel font didn't make it to the programmer's
> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
> so we don't need to read it here.
>
> It would be good to get this into -rc4. -stable probably needs some additional
> tweaks.
>
> Signed-off-by: Daniel J Blueman <[email protected]>
> ---
> drivers/gpu/drm/i915/i915_irq.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index b9fafe3..9a98c1b 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1827,6 +1827,12 @@ int ironlake_irq_postinstall(struct drm_device *dev)
> ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
> }
>
> + if (IS_GEN6(dev))
> + /* allow blitter user interrupt to generate a MSI write from
> + the ISR */
> + I915_WRITE(GEN6_BLITTER_HWSTAM,
> + 0xffffffff & ~GEN6_BLITTER_USER_INTERRUPT);
> +
> return 0;
> }
>

I wish the docs said that that this hwstam unmasked MI_USER_INTERRUPTS
parsed by the Blitter Command Parser, instead of the Render Command
parser.

I was just about to write an email about how this is just making the
same interrupt happen twice, when I realized that the docs make no
sense, and this must be a copy-paste doc bug.

This patch sounds good to me. Two small comments:
1. The HWSTAM is touched in preinstall already, why not move this there?
2. I'd prefer you read the register even though as you say it isn't
necessary. It just makes the code self-documented by doing it that way.

Ben

2011-06-15 05:04:53

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 15 June 2011 12:43, Ben Widawsky <[email protected]> wrote:
> On Wed, Jun 15, 2011 at 12:51:47AM +0800, Daniel J Blueman wrote:
>> On 14 June 2011 13:23, Eric Anholt <[email protected]> wrote:
>> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <[email protected]> wrote:
>> >> Hi Eric,
>> >>
>> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>> >> whenever you hit the top-left of the screen to show all windows) are
>> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>> >> I see no 'missed IRQ' kernel messages.
>> >>
>> >> As this addresses a significant usability regression, are you happy to
>> >> add it to the 3.0-rc queue? I think it has very good value in -stable
>> >> also (assuming correctness). What do you think?
>> >
>> > This one had significant performance impacts, and later hacks in this
>> > series worked around the problem to approximately the same level of
>> > success with less impact, and we don't actually have a justification of
>> > why any of them work. ?We were still hoping to come up with some clue,
>> > and haven't yet.
>>
>> True; that is quite heavy handed delay looping.
>>
>> It's a pity the usual Intel font didn't make it to the programmer's
>> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
>> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
>> so we don't need to read it here.
>>
>> It would be good to get this into -rc4. -stable probably needs some additional
>> tweaks.
>>
>> Signed-off-by: Daniel J Blueman <[email protected]>
>> ---
>> ?drivers/gpu/drm/i915/i915_irq.c | ? ?6 ++++++
>> ?1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index b9fafe3..9a98c1b 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -1827,6 +1827,12 @@ int ironlake_irq_postinstall(struct drm_device *dev)
>> ? ? ? ? ? ? ? ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
>> ? ? ? }
>>
>> + ? ? if (IS_GEN6(dev))
>> + ? ? ? ? ? ? /* allow blitter user interrupt to generate a MSI write from
>> + ? ? ? ? ? ? ? ?the ISR */
>> + ? ? ? ? ? ? I915_WRITE(GEN6_BLITTER_HWSTAM,
>> + ? ? ? ? ? ? ? ? ? ? 0xffffffff & ~GEN6_BLITTER_USER_INTERRUPT);
>> +
>> ? ? ? return 0;
>> ?}
>>
>
> I wish the docs said that that this hwstam unmasked MI_USER_INTERRUPTS
> parsed by the Blitter Command Parser, instead of the Render Command
> parser.
>
> I was just about to write an email about how this is just making the
> same interrupt happen twice, when I realized that the docs make no
> sense, and this must be a copy-paste doc bug.
>
> This patch sounds good to me. Two small comments:
> 1. The HWSTAM is touched in preinstall already, why not move this there?
> 2. I'd prefer you read the register even though as you say it isn't
> necessary. It just makes the code self-documented by doing it that way.

The render HWSTAM is tweaked in preinstall, but we need to tweak the
blitter HWSTAM (new to gen6).

To me, it makes sense to reset the blitter HWSTAM register to what the
driver expects, in case anything before the i915 module loads and
adjusts it for a particular purpose (including debug); the render
HWSTAM is set this way too. I could add a comment to both perhaps?

Updating the blitter HWSTAM in the postinstall was a marginally safer
choice, as there'll not be any potential race with a blitter user
interrupt getting emitted before we're ready (which wouldn't have been
tested), but the risk is probably so low that it could just go into
the preinstall.

What do you think?

Daniel
--
Daniel J Blueman

2011-06-15 15:17:42

by Ben Widawsky

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On Wed, Jun 15, 2011 at 01:04:51PM +0800, Daniel J Blueman wrote:
> On 15 June 2011 12:43, Ben Widawsky <[email protected]> wrote:
> > On Wed, Jun 15, 2011 at 12:51:47AM +0800, Daniel J Blueman wrote:
> >> On 14 June 2011 13:23, Eric Anholt <[email protected]> wrote:
> >> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <[email protected]> wrote:
> >> >> Hi Eric,
> >> >>
> >> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
> >> >> whenever you hit the top-left of the screen to show all windows) are
> >> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
> >> >> I see no 'missed IRQ' kernel messages.
> >> >>
> >> >> As this addresses a significant usability regression, are you happy to
> >> >> add it to the 3.0-rc queue? I think it has very good value in -stable
> >> >> also (assuming correctness). What do you think?
> >> >
> >> > This one had significant performance impacts, and later hacks in this
> >> > series worked around the problem to approximately the same level of
> >> > success with less impact, and we don't actually have a justification of
> >> > why any of them work. ?We were still hoping to come up with some clue,
> >> > and haven't yet.
> >>
> >> True; that is quite heavy handed delay looping.
> >>
> >> It's a pity the usual Intel font didn't make it to the programmer's
> >> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
> >> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
> >> so we don't need to read it here.
> >>
> >> It would be good to get this into -rc4. -stable probably needs some additional
> >> tweaks.
> >>
> >> Signed-off-by: Daniel J Blueman <[email protected]>
> >> ---
> >> ?drivers/gpu/drm/i915/i915_irq.c | ? ?6 ++++++
> >> ?1 files changed, 6 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >> index b9fafe3..9a98c1b 100644
> >> --- a/drivers/gpu/drm/i915/i915_irq.c
> >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> >> @@ -1827,6 +1827,12 @@ int ironlake_irq_postinstall(struct drm_device *dev)
> >> ? ? ? ? ? ? ? ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
> >> ? ? ? }
> >>
> >> + ? ? if (IS_GEN6(dev))
> >> + ? ? ? ? ? ? /* allow blitter user interrupt to generate a MSI write from
> >> + ? ? ? ? ? ? ? ?the ISR */
> >> + ? ? ? ? ? ? I915_WRITE(GEN6_BLITTER_HWSTAM,
> >> + ? ? ? ? ? ? ? ? ? ? 0xffffffff & ~GEN6_BLITTER_USER_INTERRUPT);
> >> +
> >> ? ? ? return 0;
> >> ?}
> >>
> >
> > I wish the docs said that that this hwstam unmasked MI_USER_INTERRUPTS
> > parsed by the Blitter Command Parser, instead of the Render Command
> > parser.
> >
> > I was just about to write an email about how this is just making the
> > same interrupt happen twice, when I realized that the docs make no
> > sense, and this must be a copy-paste doc bug.
> >
> > This patch sounds good to me. Two small comments:
> > 1. The HWSTAM is touched in preinstall already, why not move this there?
> > 2. I'd prefer you read the register even though as you say it isn't
> > necessary. It just makes the code self-documented by doing it that way.
>
> The render HWSTAM is tweaked in preinstall, but we need to tweak the
> blitter HWSTAM (new to gen6).
>
> To me, it makes sense to reset the blitter HWSTAM register to what the
> driver expects, in case anything before the i915 module loads and
> adjusts it for a particular purpose (including debug); the render
> HWSTAM is set this way too. I could add a comment to both perhaps?

Well on that note, the docs clearly state only 1 bit can be unmasked at
a time. Not sure if that applies to masking as well, but if it does,
that would be not good.

I'm fine with a comment. Seeing the current masking doesn't make it
clear to me what is happening/why. Not sure I understand the reason for
saving the read is in this case.

>
> Updating the blitter HWSTAM in the postinstall was a marginally safer
> choice, as there'll not be any potential race with a blitter user
> interrupt getting emitted before we're ready (which wouldn't have been
> tested), but the risk is probably so low that it could just go into
> the preinstall.
>
> What do you think?

I think there is no risk since this command could only be executed if
the driver was up. I'd just like all HWSTAM writes in a single place. I
don't have any preference which place.

>
> Daniel
Ben

2011-06-15 17:12:11

by Kenneth Graunke

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 06/14/2011 09:51 AM, Daniel J Blueman wrote:
> On 14 June 2011 13:23, Eric Anholt<[email protected]> wrote:
>> On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman<[email protected]> wrote:
>>> Hi Eric,
>>>
>>> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>>> whenever you hit the top-left of the screen to show all windows) are
>>> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>>> I see no 'missed IRQ' kernel messages.
>>>
>>> As this addresses a significant usability regression, are you happy to
>>> add it to the 3.0-rc queue? I think it has very good value in -stable
>>> also (assuming correctness). What do you think?
>>
>> This one had significant performance impacts, and later hacks in this
>> series worked around the problem to approximately the same level of
>> success with less impact, and we don't actually have a justification of
>> why any of them work. We were still hoping to come up with some clue,
>> and haven't yet.
>
> True; that is quite heavy handed delay looping.
>
> It's a pity the usual Intel font didn't make it to the programmer's
> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
> so we don't need to read it here.
>
> It would be good to get this into -rc4. -stable probably needs some additional
> tweaks.
>
> Signed-off-by: Daniel J Blueman<[email protected]>
> ---
> drivers/gpu/drm/i915/i915_irq.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index b9fafe3..9a98c1b 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1827,6 +1827,12 @@ int ironlake_irq_postinstall(struct drm_device *dev)
> ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
> }
>
> + if (IS_GEN6(dev))
> + /* allow blitter user interrupt to generate a MSI write from
> + the ISR */
> + I915_WRITE(GEN6_BLITTER_HWSTAM,
> + 0xffffffff& ~GEN6_BLITTER_USER_INTERRUPT);
> +
> return 0;
> }
>

Tested-by: Kenneth Graunke <[email protected]>

With i915.semaphores=0 on my Lenovo T420s, I reliably saw missed IRQs
from the blitter when using GNOME Shell or running GLBenchmark
2.0/Egypt. Applying this patch fixes the issue, making my system much
more responsive.

Thanks, Daniel!

2011-06-16 02:45:55

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 15 June 2011 23:16, Ben Widawsky <[email protected]> wrote:
> On Wed, Jun 15, 2011 at 01:04:51PM +0800, Daniel J Blueman wrote:
>> On 15 June 2011 12:43, Ben Widawsky <[email protected]> wrote:
>> > On Wed, Jun 15, 2011 at 12:51:47AM +0800, Daniel J Blueman wrote:
>> >> On 14 June 2011 13:23, Eric Anholt <[email protected]> wrote:
>> >> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <[email protected]> wrote:
>> >> >> Hi Eric,
>> >> >>
>> >> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>> >> >> whenever you hit the top-left of the screen to show all windows) are
>> >> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>> >> >> I see no 'missed IRQ' kernel messages.
>> >> >>
>> >> >> As this addresses a significant usability regression, are you happy to
>> >> >> add it to the 3.0-rc queue? I think it has very good value in -stable
>> >> >> also (assuming correctness). What do you think?
>> >> >
>> >> > This one had significant performance impacts, and later hacks in this
>> >> > series worked around the problem to approximately the same level of
>> >> > success with less impact, and we don't actually have a justification of
>> >> > why any of them work. ?We were still hoping to come up with some clue,
>> >> > and haven't yet.
>> >>
>> >> True; that is quite heavy handed delay looping.
>> >>
>> >> It's a pity the usual Intel font didn't make it to the programmer's
>> >> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
>> >> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
>> >> so we don't need to read it here.
>> >>
>> >> It would be good to get this into -rc4. -stable probably needs some additional
>> >> tweaks.
>> >>
>> >> Signed-off-by: Daniel J Blueman <[email protected]>
>> >> ---
>> >> ?drivers/gpu/drm/i915/i915_irq.c | ? ?6 ++++++
>> >> ?1 files changed, 6 insertions(+), 0 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> >> index b9fafe3..9a98c1b 100644
>> >> --- a/drivers/gpu/drm/i915/i915_irq.c
>> >> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> >> @@ -1827,6 +1827,12 @@ int ironlake_irq_postinstall(struct drm_device *dev)
>> >> ? ? ? ? ? ? ? ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
>> >> ? ? ? }
>> >>
>> >> + ? ? if (IS_GEN6(dev))
>> >> + ? ? ? ? ? ? /* allow blitter user interrupt to generate a MSI write from
>> >> + ? ? ? ? ? ? ? ?the ISR */
>> >> + ? ? ? ? ? ? I915_WRITE(GEN6_BLITTER_HWSTAM,
>> >> + ? ? ? ? ? ? ? ? ? ? 0xffffffff & ~GEN6_BLITTER_USER_INTERRUPT);
>> >> +
>> >> ? ? ? return 0;
>> >> ?}
>> >>
>> >
>> > I wish the docs said that that this hwstam unmasked MI_USER_INTERRUPTS
>> > parsed by the Blitter Command Parser, instead of the Render Command
>> > parser.
>> >
>> > I was just about to write an email about how this is just making the
>> > same interrupt happen twice, when I realized that the docs make no
>> > sense, and this must be a copy-paste doc bug.
>> >
>> > This patch sounds good to me. Two small comments:
>> > 1. The HWSTAM is touched in preinstall already, why not move this there?
>> > 2. I'd prefer you read the register even though as you say it isn't
>> > necessary. It just makes the code self-documented by doing it that way.
>>
>> The render HWSTAM is tweaked in preinstall, but we need to tweak the
>> blitter HWSTAM (new to gen6).
>>
>> To me, it makes sense to reset the blitter HWSTAM register to what the
>> driver expects, in case anything before the i915 module loads and
>> adjusts it for a particular purpose (including debug); the render
>> HWSTAM is set this way too. I could add a comment to both perhaps?
>
> Well on that note, the docs clearly state only 1 bit can be unmasked at
> a time. Not sure if that applies to masking as well, but if it does,
> that would be not good.

When a bit is unmasked, logic in the silicon will generate a write
from a particular functional block; multiple writes in the same cycles
may generate undefined behaviour (though would need multiple pending
interrupts). Masking would be safe, since there is no side-effect, and
this is typical.

> I'm fine with a comment. Seeing the current masking doesn't make it
> clear to me what is happening/why. Not sure I understand the reason for
> saving the read is in this case.
>
>>
>> Updating the blitter HWSTAM in the postinstall was a marginally safer
>> choice, as there'll not be any potential race with a blitter user
>> interrupt getting emitted before we're ready (which wouldn't have been
>> tested), but the risk is probably so low that it could just go into
>> the preinstall.
>>
>> What do you think?
>
> I think there is no risk since this command could only be executed if
> the driver was up. I'd just like all HWSTAM writes in a single place. I
> don't have any preference which place.

Ok, I'll prepare a patch that will read-modify-write the register and
place with the render HWSTAM write. Note that the render HWSTAM
unmasks multiple bits with impunity (probably fine if no pending
interrupts), but fixing that is outside the scope of what I want to
get into 3.0-rc4.

Thanks,
Daniel
--
Daniel J Blueman

2011-06-16 03:45:28

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 16 June 2011 00:38, Eric Anholt <[email protected]> wrote:
> On Wed, 15 Jun 2011 13:04:51 +0800, Daniel J Blueman <[email protected]> wrote:
>> The render HWSTAM is tweaked in preinstall, but we need to tweak the
>> blitter HWSTAM (new to gen6).
>
> This still doesn't *really* make sense -- HWSTAM is supposed to be
> masking updates to the status page's copy of the IIR, which we never
> read, and not be involved in masking updates to the MMIO I[IS]R. ?So it
> seems to me that this is just happening to get lucky and serialize in
> the hardware for the way that we do actually read IIR (through MMIO).
> And hey, we should be using the status page copy instead of MMIO some
> day anyway, so that's more reason to do this patch even if we don't like
> workarounds.

I see we're checking only the MMIO IIR in the interrupt handler.
Perhaps we should come up with a way to prove that we're not
immediately seeing the correct state in the MMIO IIR when the
interrupt handler is fired without the unmasking. We could also check
if we get only one interrupt bit set (ie the interrupt occurred for
what we wanted and not something else) when we issue a
MI_USER_INTERRUPT to the blitter ring, to see if there is some
unexpected behaviour on gen6.

What do you think?

Also, perhaps I add a comment into the patch to show it's a workaround, right?

>> To me, it makes sense to reset the blitter HWSTAM register to what the
>> driver expects, in case anything before the i915 module loads and
>> adjusts it for a particular purpose (including debug); the render
>> HWSTAM is set this way too. I could add a comment to both perhaps?
>>
>> Updating the blitter HWSTAM in the postinstall was a marginally safer
>> choice, as there'll not be any potential race with a blitter user
>> interrupt getting emitted before we're ready (which wouldn't have been
>> tested), but the risk is probably so low that it could just go into
>> the preinstall.
>
> The GPU is idle before our driver shows up, so there's no risk (there's
> a bunch of leftover paranoia in the driver from the DRI1 days, none of
> which ever made much sense).
--
Daniel J Blueman