2020-04-30 22:14:46

by Jason A. Donenfeld

[permalink] [raw]
Subject: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

Sometimes it's not okay to use SIMD registers, the conditions for which
have changed subtly from kernel release to kernel release. Usually the
pattern is to check for may_use_simd() and then fallback to using
something slower in the unlikely case SIMD registers aren't available.
So, this patch fixes up i915's accelerated memcpy routines to fallback
to boring memcpy if may_use_simd() is false.

Cc: [email protected]
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
drivers/gpu/drm/i915/i915_memcpy.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c
index fdd550405fd3..7c0e022586bc 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/i915/i915_memcpy.c
@@ -24,6 +24,7 @@

#include <linux/kernel.h>
#include <asm/fpu/api.h>
+#include <asm/simd.h>

#include "i915_memcpy.h"

@@ -38,6 +39,12 @@ static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
#ifdef CONFIG_AS_MOVNTDQA
static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
{
+ if (unlikely(!may_use_simd())) {
+ memcpy(dst, src, len);
+ return;
+ }
+
+
kernel_fpu_begin();

while (len >= 4) {
@@ -67,6 +74,11 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)

static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
{
+ if (unlikely(!may_use_simd())) {
+ memcpy(dst, src, len);
+ return;
+ }
+
kernel_fpu_begin();

while (len >= 4) {
--
2.26.2


Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
> Sometimes it's not okay to use SIMD registers, the conditions for which
> have changed subtly from kernel release to kernel release. Usually the
> pattern is to check for may_use_simd() and then fallback to using
> something slower in the unlikely case SIMD registers aren't available.
> So, this patch fixes up i915's accelerated memcpy routines to fallback
> to boring memcpy if may_use_simd() is false.

That would indicate that these functions are used from IRQ/softirq which
break otherwise if the kernel is also using the registers. The crypto
code uses it for that purpose.

So
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>

May I ask how large the memcpy can be? I'm asking in case it is large
and an explicit rescheduling point might be needed.

> Cc: [email protected]
> Signed-off-by: Jason A. Donenfeld <[email protected]>

Sebastian

2020-05-01 11:37:03

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

From: Sebastian Andrzej Siewior
> Sent: 01 May 2020 11:42
> On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
>
> That would indicate that these functions are used from IRQ/softirq which
> break otherwise if the kernel is also using the registers. The crypto
> code uses it for that purpose.
>
> So
> Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
>
> May I ask how large the memcpy can be? I'm asking in case it is large
> and an explicit rescheduling point might be needed.

It is also quite likely that a 'rep movs' copy will be at least just as
fast on modern hardware.

Clearly if you are copying to/from PCIe memory you need the largest
resisters possible - but I think the graphics buffers are mapped cached?
(Otherwise I wouldn't see 3ms 'spins' while it invalidates the
entire screen buffer cache.)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-05-01 18:12:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
> Sometimes it's not okay to use SIMD registers, the conditions for which
> have changed subtly from kernel release to kernel release. Usually the
> pattern is to check for may_use_simd() and then fallback to using
> something slower in the unlikely case SIMD registers aren't available.
> So, this patch fixes up i915's accelerated memcpy routines to fallback
> to boring memcpy if may_use_simd() is false.

Err, why does i915 implements its own uncached memcpy instead of relying
on core functionality to start with?

2020-05-01 21:56:16

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

On Fri, May 1, 2020 at 4:42 AM Sebastian Andrzej Siewior
<[email protected]> wrote:
> Reviewed-by: Sebastian Andrzej Siewior <[email protected]>

Thanks.

>
> May I ask how large the memcpy can be? I'm asking in case it is large
> and an explicit rescheduling point might be needed.

Yea I was worried about that too. I'm not an i915 developer, but so
far as I can tell:

- The path from intel_engine_cmd_parser is <= 256 KiB for "known
users", so that's rather large.
- The path from perf_memcpy is either 4k, 64k, or 4M, depending on the
type of object, so that seems gigantic, but I think that might be
selftest code.
- The path from compress_page appears to be PAGE_SIZE, so 4k, which
meshes with the limits we set agreed on few weeks ago for the crypto
stuff.
- The path from guc_read_update_log_buffer appears to be 8k, 32k, 2M,
or 8M, depending on the type of object, so that seems absurdly huge
and doesn't appear to be selftest code either like the other case.

I have no doubt the i915 developers will jump in here waving their
arms, but either way, it sure seems to me like you might have a point.

2020-05-01 21:58:04

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

On Fri, May 1, 2020 at 12:07 PM Christoph Hellwig <[email protected]> wrote:
>
> On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
>
> Err, why does i915 implements its own uncached memcpy instead of relying
> on core functionality to start with?

I was wondering the same. It sure does seem like this ought to be more
generalized functionality, with a name that represents the type of
transfer it's optimized for (wc or similar).

2020-05-03 20:24:20

by Chris Wilson

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

Quoting Christoph Hellwig (2020-05-01 19:07:31)
> On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
>
> Err, why does i915 implements its own uncached memcpy instead of relying
> on core functionality to start with?

What is this core functionality that provides movntqda?
-Chris

2020-05-03 20:32:16

by Chris Wilson

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
> Sometimes it's not okay to use SIMD registers, the conditions for which
> have changed subtly from kernel release to kernel release. Usually the
> pattern is to check for may_use_simd() and then fallback to using
> something slower in the unlikely case SIMD registers aren't available.
> So, this patch fixes up i915's accelerated memcpy routines to fallback
> to boring memcpy if may_use_simd() is false.
>
> Cc: [email protected]

The same argument as on the previous submission is that we return to the
caller if we can't use movntqda as their fallback path should be faster
than uncached memcpy.
-Chris

2020-05-03 20:37:08

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

On Sun, May 3, 2020 at 2:30 PM Chris Wilson <[email protected]> wrote:
>
> Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
> >
> > Cc: [email protected]
>
> The same argument as on the previous submission is that we return to the
> caller if we can't use movntqda as their fallback path should be faster
> than uncached memcpy.

Oh, THAT's what you meant before. Okay, will follow up.

2020-05-04 16:05:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

On Sun, May 03, 2020 at 09:20:19PM +0100, Chris Wilson wrote:
> > Err, why does i915 implements its own uncached memcpy instead of relying
> > on core functionality to start with?
>
> What is this core functionality that provides movntqda?

A sensible name might be memcpy_uncached or mempcy_nontemporal.
But the important point is that this should be arch code with a common
fallback rather than hacking it up in drivers.

2020-05-04 16:20:04

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

From: Christoph Hellwig
> Sent: 04 May 2020 17:03
>
> On Sun, May 03, 2020 at 09:20:19PM +0100, Chris Wilson wrote:
> > > Err, why does i915 implements its own uncached memcpy instead of relying
> > > on core functionality to start with?
> >
> > What is this core functionality that provides movntqda?
>
> A sensible name might be memcpy_uncached or mempcy_nontemporal.
> But the important point is that this should be arch code with a common
> fallback rather than hacking it up in drivers.

More the point, you are trying to do a copy where:
1) The kernel isn't expected to read the data - so can bypass the cache.
and maybe:
2) The data needs flushing from the cache to actual memory.
and maybe:
3) The cache lines need invalidating.

The fallbacks depend on the required behaviour.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)