2021-08-24 17:37:29

by Alyssa Rosenzweig

Subject: [PATCH v2 0/4] drm/panfrost: Bug fixes for lock_region

Chris Morgan reported UBSAN errors in panfrost and tracked them down to
the size computation in lock_region. This calculation is overcomplicated
(cargo culted from kbase) and can be simplified with kernel helpers and
some mathematical identities. The first patch in the series rewrites the
calculation in a form avoiding undefined behaviour; Chris confirms it
placates UBSAN.

While researching this function, I noticed a pair of other potential
bugs: Bifrost can lock more than 4GiB at a time, but must lock at least
32KiB at a time, and the driver accounted for neither. The latter
patches in the series handle these cases.

In review of v1 of this series, Steven pointed out a fourth potential
bug: rounding down the iova can truncate the lock region. v2 adds a new
patch for this case.

The size computation was unit-tested in userspace. Relevant code below,
just missing some copypaste definitions for fls64/clamp/etc:

#define MIN_LOCK (1ULL << 12)
#define MAX_LOCK (1ULL << 48)

struct {
	uint64_t size;
	uint8_t encoded;
} tests[] = {
	/* Clamping */
	{ 0, 11 },
	{ 1, 11 },
	{ 2, 11 },
	{ 4095, 11 },
	/* Power of two */
	{ 4096, 11 },
	/* Round up */
	{ 4097, 12 },
	{ 8192, 12 },
	{ 16384, 13 },
	{ 16385, 14 },
	/* Maximum */
	{ ~0ULL, 47 },
};

static uint8_t region_width(uint64_t size)
{
	size = clamp(size, MIN_LOCK, MAX_LOCK);
	return fls64(size - 1) - 1;
}

int main(int argc, char **argv)
{
	for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
		uint64_t test = tests[i].size;
		uint8_t expected = tests[i].encoded;
		uint8_t actual = region_width(test);

		assert(expected == actual);
	}
}

Changes in v2:

* New patch for non-aligned lock addresses
* Commit message improvements
* Add Steven's tags

Alyssa Rosenzweig (4):
drm/panfrost: Simplify lock_region calculation
drm/panfrost: Use u64 for size in lock_region
drm/panfrost: Clamp lock region to Bifrost minimum
drm/panfrost: Handle non-aligned lock addresses

drivers/gpu/drm/panfrost/panfrost_mmu.c | 32 ++++++++++--------------
drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
2 files changed, 15 insertions(+), 19 deletions(-)

--
2.30.2


2021-08-24 17:37:31

by Alyssa Rosenzweig

Subject: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

When locking memory, the base address is rounded down to the nearest
page. The current code does not adjust the size in this case,
truncating the lock region:

   Input:    [----size----]
   Round: [----size----]

To fix the truncation, extend the lock region by the amount rounded off.

   Input:    [----size----]
   Round: [-------size----]

This bug is difficult to hit under current assumptions: since the size
of the lock region is stored as a ceil(log2), the truncation must cause
us to cross a power-of-two boundary. This is possible, for example if
the caller tries to lock 65535 bytes starting at iova 0xCAFE0010. The
existing code rounds down the iova to 0xCAFE0000 and rounds up the lock
region to 65536 bytes, locking until 0xCAFF0000. This fails to lock the
last 15 bytes.

In practice, the current callers pass PAGE_SIZE-aligned inputs, avoiding
the bug, so this does not need to be backported. Still, that is a happy
accident rather than a precondition of lock_region, so let's do the
right thing to future-proof.

Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reported-by: Steven Price <[email protected]>
---
drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index dfe5f1d29763..14be32497ec3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -63,6 +63,9 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 	u8 region_width;
 	u64 region = iova & PAGE_MASK;
 
+	/* After rounding the address down, extend the size to lock the end. */
+	size += (iova - region);
+
 	/* The size is encoded as ceil(log2) minus 1, which may be calculated
 	 * with fls. The size must be clamped to hardware bounds.
 	 */
--
2.30.2

2021-08-24 17:38:23

by Alyssa Rosenzweig

Subject: [PATCH v2 1/4] drm/panfrost: Simplify lock_region calculation

In lock_region, simplify the calculation of the region_width parameter.
This field is the size, but encoded as ceil(log2(size)) - 1.
ceil(log2(size)) may be computed directly as fls(size - 1). However, we
want the 64-bit version, fls64, as the amount to lock can exceed 32
bits.

This avoids undefined (and completely wrong) behaviour when locking all
memory (size ~0). In this case, the old code would "round up" ~0 to the
nearest page, overflowing to 0. Since fls(0) == 0, this would calculate
a region width of 10 + 0 = 10, and the code would then shift by
(region_width - 11) = -1. As shifting by a negative number is undefined,
UBSAN flags the bug. Even if it were defined, the behaviour would still
be wrong: instead of locking all memory, almost none would be locked.

The new form of the calculation corrects this special case and avoids
the undefined behaviour.

Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reported-and-tested-by: Chris Morgan <[email protected]>
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: <[email protected]>
---
drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..f6e02d0392f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 {
 	u8 region_width;
 	u64 region = iova & PAGE_MASK;
-	/*
-	 * fls returns:
-	 * 1 .. 32
-	 *
-	 * 10 + fls(num_pages)
-	 * results in the range (11 .. 42)
-	 */
-
-	size = round_up(size, PAGE_SIZE);
 
-	region_width = 10 + fls(size >> PAGE_SHIFT);
-	if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
-		/* not pow2, so must go up to the next pow2 */
-		region_width += 1;
-	}
+	/* The size is encoded as ceil(log2) minus 1, which may be calculated
+	 * with fls. The size must be clamped to hardware bounds.
+	 */
+	size = max_t(u64, size, PAGE_SIZE);
+	region_width = fls64(size - 1) - 1;
 	region |= region_width;
 
 	/* Lock the region that needs to be updated */
--
2.30.2

2021-08-24 17:57:37

by Alyssa Rosenzweig

Subject: [PATCH v2 2/4] drm/panfrost: Use u64 for size in lock_region

Mali virtual addresses are 48-bit. Use a u64 instead of size_t to ensure
we can express the "lock everything" condition as ~0ULL without
overflow. This code was silently broken on any platform where size_t is
narrower than 48 bits; in particular, it was broken on the 32-bit armv7
platforms (mainly RK3288) that remain in use with Panfrost.

Signed-off-by: Alyssa Rosenzweig <[email protected]>
Suggested-by: Rob Herring <[email protected]>
Tested-by: Chris Morgan <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: <[email protected]>
---
drivers/gpu/drm/panfrost/panfrost_mmu.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index f6e02d0392f4..3a795273e505 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -58,7 +58,7 @@ static int write_cmd(struct panfrost_device *pfdev, u32 as_nr, u32 cmd)
 }
 
 static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
-			u64 iova, size_t size)
+			u64 iova, u64 size)
 {
 	u8 region_width;
 	u64 region = iova & PAGE_MASK;
@@ -78,7 +78,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 
 
 static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
-				      u64 iova, size_t size, u32 op)
+				      u64 iova, u64 size, u32 op)
 {
 	if (as_nr < 0)
 		return 0;
@@ -95,7 +95,7 @@ static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
 
 static int mmu_hw_do_operation(struct panfrost_device *pfdev,
 			       struct panfrost_mmu *mmu,
-			       u64 iova, size_t size, u32 op)
+			       u64 iova, u64 size, u32 op)
 {
 	int ret;
 
@@ -112,7 +112,7 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
 	u64 transtab = cfg->arm_mali_lpae_cfg.transtab;
 	u64 memattr = cfg->arm_mali_lpae_cfg.memattr;
 
-	mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+	mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, AS_COMMAND_FLUSH_MEM);
 
 	mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), transtab & 0xffffffffUL);
 	mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), transtab >> 32);
@@ -128,7 +128,7 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
 
 static void panfrost_mmu_disable(struct panfrost_device *pfdev, u32 as_nr)
 {
-	mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+	mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, AS_COMMAND_FLUSH_MEM);
 
 	mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), 0);
 	mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), 0);
@@ -242,7 +242,7 @@ static size_t get_pgsize(u64 addr, size_t size)
 
 static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
 				     struct panfrost_mmu *mmu,
-				     u64 iova, size_t size)
+				     u64 iova, u64 size)
 {
 	if (mmu->as < 0)
 		return;
--
2.30.2

2021-08-24 17:58:58

by Alyssa Rosenzweig

Subject: [PATCH v2 3/4] drm/panfrost: Clamp lock region to Bifrost minimum

When locking a region, we currently clamp the region to PAGE_SIZE as the
minimum lock size. While this is valid for Midgard, it is invalid for
Bifrost, where the minimum lock region size is 8x larger than the 4KiB
page size. Add a hardware definition for the minimum lock region size
(corresponding to KBASE_LOCK_REGION_MIN_SIZE_LOG2 in kbase) and respect
it.

Signed-off-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Chris Morgan <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Cc: <[email protected]>
---
drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +-
drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 3a795273e505..dfe5f1d29763 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -66,7 +66,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 	/* The size is encoded as ceil(log2) minus 1, which may be calculated
 	 * with fls. The size must be clamped to hardware bounds.
 	 */
-	size = max_t(u64, size, PAGE_SIZE);
+	size = max_t(u64, size, AS_LOCK_REGION_MIN_SIZE);
 	region_width = fls64(size - 1) - 1;
 	region |= region_width;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 1940ff86e49a..6c5a11ef1ee8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -316,6 +316,8 @@
 #define AS_FAULTSTATUS_ACCESS_TYPE_READ (0x2 << 8)
 #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE (0x3 << 8)
 
+#define AS_LOCK_REGION_MIN_SIZE (1ULL << 15)
+
 #define gpu_write(dev, reg, data) writel(data, dev->iomem + reg)
 #define gpu_read(dev, reg) readl(dev->iomem + reg)

--
2.30.2

2021-08-24 22:40:44

by Rob Herring

Subject: Re: [PATCH v2 0/4] drm/panfrost: Bug fixes for lock_region

On Tue, Aug 24, 2021 at 12:30 PM Alyssa Rosenzweig
<[email protected]> wrote:
>
> Chris Morgan reported UBSAN errors in panfrost and tracked them down to
> the size computation in lock_region. This calculation is overcomplicated
> (cargo culted from kbase) and can be simplified with kernel helpers and
> some mathematical identities. The first patch in the series rewrites the
> calculation in a form avoiding undefined behaviour; Chris confirms it
> placates UBSAN.
>
> While researching this function, I noticed a pair of other potential
> bugs: Bifrost can lock more than 4GiB at a time, but must lock at least
> 32KiB at a time. The latter patches in the series handle these cases.
>
> In review of v1 of this series, Steven pointed out a fourth potential
> bug: rounding down the iova can truncate the lock region. v2 adds a new
> patch for this case.
>
> The size computation was unit-tested in userspace. Relevant code below,
> just missing some copypaste definitions for fls64/clamp/etc:
>
> #define MIN_LOCK (1ULL << 12)
> #define MAX_LOCK (1ULL << 48)
>
> struct {
> 	uint64_t size;
> 	uint8_t encoded;
> } tests[] = {
> 	/* Clamping */
> 	{ 0, 11 },
> 	{ 1, 11 },
> 	{ 2, 11 },
> 	{ 4095, 11 },
> 	/* Power of two */
> 	{ 4096, 11 },
> 	/* Round up */
> 	{ 4097, 12 },
> 	{ 8192, 12 },
> 	{ 16384, 13 },
> 	{ 16385, 14 },
> 	/* Maximum */
> 	{ ~0ULL, 47 },
> };
>
> static uint8_t region_width(uint64_t size)
> {
> 	size = clamp(size, MIN_LOCK, MAX_LOCK);
> 	return fls64(size - 1) - 1;
> }
>
> int main(int argc, char **argv)
> {
> 	for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
> 		uint64_t test = tests[i].size;
> 		uint8_t expected = tests[i].encoded;
> 		uint8_t actual = region_width(test);
>
> 		assert(expected == actual);
> 	}
> }
>
> Changes in v2:
>
> * New patch for non-aligned lock addresses
> * Commit message improvements.
> * Add Steven's tags.
>
> Alyssa Rosenzweig (4):
> drm/panfrost: Simplify lock_region calculation
> drm/panfrost: Use u64 for size in lock_region
> drm/panfrost: Clamp lock region to Bifrost minimum
> drm/panfrost: Handle non-aligned lock addresses
>
> drivers/gpu/drm/panfrost/panfrost_mmu.c | 32 ++++++++++--------------
> drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
> 2 files changed, 15 insertions(+), 19 deletions(-)

For the series,

Reviewed-by: Rob Herring <[email protected]>

2021-08-25 09:05:21

by Steven Price

Subject: Re: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

On 24/08/2021 18:30, Alyssa Rosenzweig wrote:
> When locking memory, the base address is rounded down to the nearest
> page. The current code does not adjust the size in this case,
> truncating the lock region:
>
>    Input:    [----size----]
>    Round: [----size----]
>
> To fix the truncation, extend the lock region by the amount rounded off.
>
>    Input:    [----size----]
>    Round: [-------size----]
>
> This bug is difficult to hit under current assumptions: since the size
> of the lock region is stored as a ceil(log2), the truncation must cause
> us to cross a power-of-two boundary. This is possible, for example if
> the caller tries to lock 65535 bytes starting at iova 0xCAFE0010. The
> existing code rounds down the iova to 0xCAFE0000 and rounds up the lock
> region to 65536 bytes, locking until 0xCAFF0000. This fails to lock the
> last 15 bytes.
>
> In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding
> the bug. Therefore this doesn't need to be backported. Still, that's a
> happy accident and not a precondition of lock_region, so we let's do the
> right thing to future proof.

Actually it's worse than that due to the hardware behaviour, the spec
states (for LOCKADDR_BASE):

> Only the upper bits of the address are used. The address is aligned to a
> multiple of the region size, so a variable number of low-order bits are
> ignored, depending on the selected region size. It is recommended that software
> ensures that these low bits in the address are cleared, to avoid confusion.

It appears that indeed this has caused confusion ;)

So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size
= 0x30000) the region width gets rounded up (to 0x40000) which causes
the start address to be effectively rounded down (by the hardware) to
0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000.

Interestingly (unless my reading of this is wrong) that means to lock
0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking
*at least* 0x00000000-0x200000000 (i.e. locking the first 8GB).

This appears to be broken in kbase (which actually does zero out the low
bits of the address) - I've raised a bug internally so hopefully someone
will tell me if I've read the spec completely wrong here.

Steve

> Signed-off-by: Alyssa Rosenzweig <[email protected]>
> Reported-by: Steven Price <[email protected]>
> ---
> drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index dfe5f1d29763..14be32497ec3 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -63,6 +63,9 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
> u8 region_width;
> u64 region = iova & PAGE_MASK;
>
> +	/* After rounding the address down, extend the size to lock the end. */
> +	size += (iova - region);
> +
> 	/* The size is encoded as ceil(log2) minus 1, which may be calculated
> 	 * with fls. The size must be clamped to hardware bounds.
> 	 */
>

2021-08-25 09:06:13

by Steven Price

Subject: Re: [PATCH v2 1/4] drm/panfrost: Simplify lock_region calculation

On 24/08/2021 18:30, Alyssa Rosenzweig wrote:
> In lock_region, simplify the calculation of the region_width parameter.
> This field is the size, but encoded as ceil(log2(size)) - 1.
> ceil(log2(size)) may be computed directly as fls(size - 1). However, we
> want to use the 64-bit versions as the amount to lock can exceed
> 32-bits.
>
> This avoids undefined (and completely wrong) behaviour when locking all
> memory (size ~0). In this case, the old code would "round up" ~0 to the
> nearest page, overflowing to 0. Since fls(0) == 0, this would calculate
> a region width of 10 + 0 = 10. But then the code would shift by
> (region_width - 11) = -1. As shifting by a negative number is undefined,
> UBSAN flags the bug. Of course, even if it were defined the behaviour is
> wrong, instead of locking all memory almost none would get locked.
>
> The new form of the calculation corrects this special case and avoids
> the undefined behaviour.
>
> Signed-off-by: Alyssa Rosenzweig <[email protected]>
> Reported-and-tested-by: Chris Morgan <[email protected]>
> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
> Cc: <[email protected]>

Reviewed-by: Steven Price <[email protected]>

> ---
> drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++--------------
> 1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 0da5b3100ab1..f6e02d0392f4 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
> {
> u8 region_width;
> u64 region = iova & PAGE_MASK;
> - /*
> - * fls returns:
> - * 1 .. 32
> - *
> - * 10 + fls(num_pages)
> - * results in the range (11 .. 42)
> - */
> -
> - size = round_up(size, PAGE_SIZE);
>
> - region_width = 10 + fls(size >> PAGE_SHIFT);
> - if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
> - /* not pow2, so must go up to the next pow2 */
> - region_width += 1;
> - }
> +	/* The size is encoded as ceil(log2) minus 1, which may be calculated
> +	 * with fls. The size must be clamped to hardware bounds.
> +	 */
> + size = max_t(u64, size, PAGE_SIZE);
> + region_width = fls64(size - 1) - 1;
> region |= region_width;
>
> /* Lock the region that needs to be updated */
>

2021-08-25 14:10:55

by Alyssa Rosenzweig

Subject: Re: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

> > In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding
> > the bug. Therefore this doesn't need to be backported. Still, that's a
> > happy accident and not a precondition of lock_region, so we let's do the
> > right thing to future proof.
>
> Actually it's worse than that due to the hardware behaviour, the spec
> states (for LOCKADDR_BASE):
>
> > Only the upper bits of the address are used. The address is aligned to a
> > multiple of the region size, so a variable number of low-order bits are
> > ignored, depending on the selected region size. It is recommended that software
> > ensures that these low bits in the address are cleared, to avoid confusion.
>
> It appears that indeed this has caused confusion ;)
>
> So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size
> = 0x30000) the region width gets rounded up (to 0x40000) which causes
> the start address to be effectively rounded down (by the hardware) to
> 0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000.
>
> Interestingly (unless my reading of this is wrong) that means to lock
> 0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking
> *at least* 0x00000000-0x200000000 (i.e. locking the first 8GB).
>
> This appears to be broken in kbase (which actually does zero out the low
> bits of the address) - I've raised a bug internally so hopefully someone
> will tell me if I've read the spec completely wrong here.

Horrifying, and not what I wanted to read my last day before 2 weeks of
leave. Let's drop this patch, hopefully by the time I'm back, your
friends in GPU can confirm that's a spec bug and not an actual
hardware/driver one...

Can you apply the other 3 patches in the mean time? Thanks :-)

2021-08-25 15:35:50

by Alyssa Rosenzweig

Subject: Re: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

> > Horrifying, and not what I wanted to read my last day before 2 weeks of
> > leave. Let's drop this patch, hopefully by the time I'm back, your
> > friends in GPU can confirm that's a spec bug and not an actual
> > hardware/driver one...
> >
> > Can you apply the other 3 patches in the mean time? Thanks :-)
> >
>
> Yeah, sure. I'll push the first 3 to drm-misc-next-fixes (should land in
> v5.15).
>
> It's interesting that if my (new) reading of the spec is correct then
> kbase has been horribly broken in this respect forever. So clearly it
> can't be something that crops up very often. It would have been good if
> the spec could have included wording such as "naturally aligned" if
> that's what was intended.

Indeed. Fingers crossed this is a mix-up. Although the text you quoted
seems pretty clear unfortunately :|

> Enjoy your holiday!

Thanks!

2021-08-25 16:01:55

by Steven Price

Subject: Re: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

On 25/08/2021 15:07, Alyssa Rosenzweig wrote:
>>> In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding
>>> the bug. Therefore this doesn't need to be backported. Still, that's a
>>> happy accident and not a precondition of lock_region, so we let's do the
>>> right thing to future proof.
>>
>> Actually it's worse than that due to the hardware behaviour, the spec
>> states (for LOCKADDR_BASE):
>>
>>> Only the upper bits of the address are used. The address is aligned to a
>>> multiple of the region size, so a variable number of low-order bits are
>>> ignored, depending on the selected region size. It is recommended that software
>>> ensures that these low bits in the address are cleared, to avoid confusion.
>>
>> It appears that indeed this has caused confusion ;)
>>
>> So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size
>> = 0x30000) the region width gets rounded up (to 0x40000) which causes
>> the start address to be effectively rounded down (by the hardware) to
>> 0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000.
>>
>> Interestingly (unless my reading of this is wrong) that means to lock
>> 0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking
>> *at least* 0x00000000-0x200000000 (i.e. locking the first 8GB).
>>
>> This appears to be broken in kbase (which actually does zero out the low
>> bits of the address) - I've raised a bug internally so hopefully someone
>> will tell me if I've read the spec completely wrong here.
>
> Horrifying, and not what I wanted to read my last day before 2 weeks of
> leave. Let's drop this patch, hopefully by the time I'm back, your
> friends in GPU can confirm that's a spec bug and not an actual
> hardware/driver one...
>
> Can you apply the other 3 patches in the mean time? Thanks :-)
>

Yeah, sure. I'll push the first 3 to drm-misc-next-fixes (should land in
v5.15).

It's interesting that if my (new) reading of the spec is correct then
kbase has been horribly broken in this respect forever. So clearly it
can't be something that crops up very often. It would have been good if
the spec could have included wording such as "naturally aligned" if
that's what was intended.

Enjoy your holiday!

Thanks,
Steve