2023-12-20 05:29:24

by Jiajun Xie

Subject: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug

From: Jiajun Xie <[email protected]>

The bug happens when the highest bit of holebegin is 1. Suppose
holebegin is 0x8000000111111000: after the (arithmetic) shift, hba
would be 0xfff8000000111111, and vma_interval_tree_foreach would
then fail the lookup or produce the wrong result.

erroneous call sequence, e.g.:
- mmap(..., offset=0x8000000111111000)
|- syscall(mmap, ... unsigned long, off):
|- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);

here pgoff is correctly shifted to 0x8000000111111,
but passing 0x8000000111111000 as holebegin to unmap
then causes the wrong result, as shown below:

- unmap_mapping_range(..., loff_t const holebegin)
|- pgoff_t hba = holebegin >> PAGE_SHIFT;
/* hba = 0xfff8000000111111 unexpectedly */

Making holebegin unsigned before the shift fixes the bug.

Signed-off-by: Jiajun Xie <[email protected]>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5c757fba8..6e0712d06 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3624,8 +3624,8 @@ EXPORT_SYMBOL_GPL(unmap_mapping_pages);
void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows)
{
- pgoff_t hba = holebegin >> PAGE_SHIFT;
- pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ pgoff_t hba = (pgoff_t)(holebegin) >> PAGE_SHIFT;
+ pgoff_t hlen = ((pgoff_t)(holelen) + PAGE_SIZE - 1) >> PAGE_SHIFT;

/* Check for overflow. */
if (sizeof(holelen) > sizeof(hlen)) {
--
2.34.1



2023-12-20 17:53:50

by Andrew Morton

Subject: Re: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug

On Wed, 20 Dec 2023 13:28:39 +0800 "jiajun.xie" <[email protected]> wrote:

> From: Jiajun Xie <[email protected]>
>
> The bug happens when the highest bit of holebegin is 1. Suppose
> holebegin is 0x8000000111111000: after the (arithmetic) shift, hba
> would be 0xfff8000000111111, and vma_interval_tree_foreach would
> then fail the lookup or produce the wrong result.
>
> erroneous call sequence, e.g.:
> - mmap(..., offset=0x8000000111111000)
> |- syscall(mmap, ... unsigned long, off):
> |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
>
> here pgoff is correctly shifted to 0x8000000111111,
> but passing 0x8000000111111000 as holebegin to unmap
> then causes the wrong result, as shown below:
>
> - unmap_mapping_range(..., loff_t const holebegin)
> |- pgoff_t hba = holebegin >> PAGE_SHIFT;
> /* hba = 0xfff8000000111111 unexpectedly */
>
> Making holebegin unsigned before the shift fixes the bug.
>

Thanks. Are you able to describe the runtime effects of this
(obviously bad, but it's good to spell it out) and under what
circumstances it occurs?

2023-12-21 05:40:33

by Jiajun Xie

Subject: Re: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug

On Thu, Dec 21, 2023 at 1:53 AM Andrew Morton <[email protected]> wrote:
>
> On Wed, 20 Dec 2023 13:28:39 +0800 "jiajun.xie" <[email protected]> wrote:
>
> > From: Jiajun Xie <[email protected]>
> >
> > The bug happens when the highest bit of holebegin is 1. Suppose
> > holebegin is 0x8000000111111000: after the (arithmetic) shift, hba
> > would be 0xfff8000000111111, and vma_interval_tree_foreach would
> > then fail the lookup or produce the wrong result.
> >
> > erroneous call sequence, e.g.:
> > - mmap(..., offset=0x8000000111111000)
> > |- syscall(mmap, ... unsigned long, off):
> > |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
> >
> > here pgoff is correctly shifted to 0x8000000111111,
> > but passing 0x8000000111111000 as holebegin to unmap
> > then causes the wrong result, as shown below:
> >
> > - unmap_mapping_range(..., loff_t const holebegin)
> > |- pgoff_t hba = holebegin >> PAGE_SHIFT;
> > /* hba = 0xfff8000000111111 unexpectedly */
> >
> > Making holebegin unsigned before the shift fixes the bug.
> >
>
> Thanks. Are you able to describe the runtime effects of this
> (obviously bad, but it's good to spell it out) and under what
> circumstances it occurs?

Thanks for the quick reply.

The issue happens in heterogeneous computing, where the
device (e.g. a gpu) and the host share the same virtual address space.

A simple workflow pattern which hits the issue is:
/* host */
1. userspace first mmaps a file-backed VA range at a specified offset.
e.g. (offset=0x800..., mmap returns: va_a)
2. write some data to the corresponding system page
e.g. (va_a = 0xAABB)
/* device */
3. a gpu workload touches the VA, triggers a gpu fault and notifies the host.
/* host */
4. the host receives the gpu fault notification, then it will:
4.1 unmap the host pages and also take care of the cpu tlb
(using unmap_mapping_range with offset=0x800...)
4.2 migrate the system page to the device
4.3 set up the device page table and resolve the device fault.
/* device */
5. the gpu workload continues; it accesses va_a and gets 0xAABB.
6. the gpu workload continues; it writes 0xBBCC to va_a.
/* host */
7. userspace accesses va_a; as expected, it will:
7.1 trigger a cpu vm fault.
7.2 the driver handles the fault and migrates the gpu-local page back to the host.
8. userspace then correctly gets 0xBBCC from va_a.
9. done

But in step 4.1, if we hit the bug this patch mentions, userspace
would never trigger the cpu fault and would still read the old value: 0xAABB.

2023-12-21 22:08:31

by Andrew Morton

Subject: Re: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug

On Thu, 21 Dec 2023 13:40:11 +0800 Jiajun Xie <[email protected]> wrote:

> > (obviously bad, but it's good to spell it out) and under what
> > circumstances it occurs?
>
> Thanks for the quick reply.
>
> The issue happens in heterogeneous computing, where the
> device (e.g. a gpu) and the host share the same virtual address space.
>
> A simple workflow pattern which hits the issue is:
> /* host */
> 1. userspace first mmaps a file-backed VA range at a specified offset.
> e.g. (offset=0x800..., mmap returns: va_a)
> 2. write some data to the corresponding system page
> e.g. (va_a = 0xAABB)
> /* device */
> 3. a gpu workload touches the VA, triggers a gpu fault and notifies the host.
> /* host */
> 4. the host receives the gpu fault notification, then it will:
> 4.1 unmap the host pages and also take care of the cpu tlb
> (using unmap_mapping_range with offset=0x800...)
> 4.2 migrate the system page to the device
> 4.3 set up the device page table and resolve the device fault.
> /* device */
> 5. the gpu workload continues; it accesses va_a and gets 0xAABB.
> 6. the gpu workload continues; it writes 0xBBCC to va_a.
> /* host */
> 7. userspace accesses va_a; as expected, it will:
> 7.1 trigger a cpu vm fault.
> 7.2 the driver handles the fault and migrates the gpu-local page back to the host.
> 8. userspace then correctly gets 0xBBCC from va_a.
> 9. done
>
> But in step 4.1, if we hit the bug this patch mentions, userspace
> would never trigger the cpu fault and would still read the old value: 0xAABB.

Thanks. Based on the above, I added cc:stable to the changelog so the
fix will be backported into earlier kernels (it looks like that's 20+
years worth!). And I pasted the above text into that changelog.