From: Jiajun Xie <[email protected]>
The bug happens when highest bit of holebegin is 1, suppose
holebign is 0x8000000111111000, after shift, hba would be
0xfff8000000111111, then vma_interval_tree_foreach would look
it up fail or leads to the wrong result.
error call seq e.g.:
- mmap(..., offset=0x8000000111111000)
|- syscall(mmap, ... unsigned long, off):
|- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
here pgoff is correctly shifted to 0x8000000111111,
but pass 0x8000000111111000 as holebegin to unmap
would then cause terrible result, as shown below:
- unmap_mapping_range(..., loff_t const holebegin)
|- pgoff_t hba = holebegin >> PAGE_SHIFT;
/* hba = 0xfff8000000111111 unexpectedly */
turn holebegin to be unsigned first would fix the bug.
Signed-off-by: Jiajun Xie <[email protected]>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 5c757fba8..6e0712d06 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3624,8 +3624,8 @@ EXPORT_SYMBOL_GPL(unmap_mapping_pages);
void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows)
{
- pgoff_t hba = holebegin >> PAGE_SHIFT;
- pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ pgoff_t hba = (pgoff_t)(holebegin) >> PAGE_SHIFT;
+ pgoff_t hlen = ((pgoff_t)(holelen) + PAGE_SIZE - 1) >> PAGE_SHIFT;
/* Check for overflow. */
if (sizeof(holelen) > sizeof(hlen)) {
--
2.34.1
On Wed, 20 Dec 2023 13:28:39 +0800 "jiajun.xie" <[email protected]> wrote:
> From: Jiajun Xie <[email protected]>
>
> The bug happens when highest bit of holebegin is 1, suppose
> holebign is 0x8000000111111000, after shift, hba would be
> 0xfff8000000111111, then vma_interval_tree_foreach would look
> it up fail or leads to the wrong result.
>
> error call seq e.g.:
> - mmap(..., offset=0x8000000111111000)
> |- syscall(mmap, ... unsigned long, off):
> |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
>
> here pgoff is correctly shifted to 0x8000000111111,
> but pass 0x8000000111111000 as holebegin to unmap
> would then cause terrible result, as shown below:
>
> - unmap_mapping_range(..., loff_t const holebegin)
> |- pgoff_t hba = holebegin >> PAGE_SHIFT;
> /* hba = 0xfff8000000111111 unexpectedly */
>
> turn holebegin to be unsigned first would fix the bug.
>
Thanks. Are you able to describe the runtime effects of this
(obviously bad, but it's good to spell it out) and under what
circumstances it occurs?
On Thu, Dec 21, 2023 at 1:53 AM Andrew Morton <[email protected]> wrote:
>
> On Wed, 20 Dec 2023 13:28:39 +0800 "jiajun.xie" <[email protected]> wrote:
>
> > From: Jiajun Xie <[email protected]>
> >
> > The bug happens when highest bit of holebegin is 1, suppose
> > holebign is 0x8000000111111000, after shift, hba would be
> > 0xfff8000000111111, then vma_interval_tree_foreach would look
> > it up fail or leads to the wrong result.
> >
> > error call seq e.g.:
> > - mmap(..., offset=0x8000000111111000)
> > |- syscall(mmap, ... unsigned long, off):
> > |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
> >
> > here pgoff is correctly shifted to 0x8000000111111,
> > but pass 0x8000000111111000 as holebegin to unmap
> > would then cause terrible result, as shown below:
> >
> > - unmap_mapping_range(..., loff_t const holebegin)
> > |- pgoff_t hba = holebegin >> PAGE_SHIFT;
> > /* hba = 0xfff8000000111111 unexpectedly */
> >
> > turn holebegin to be unsigned first would fix the bug.
> >
>
> Thanks. Are you able to describe the runtime effects of this
> (obviously bad, but it's good to spell it out) and under what
> circumstances it occurs?
Thanks for the quick reply.
The issue happens in Heterogeneous computing, where the
device(e.g. gpu) and host share the same virtual address space.
A simple workflow pattern which hit the issue is:
/* host */
1. userspace first mmap a file backed VA range with specified offset.
e.g. (offset=0x800..., mmap return: va_a)
2. write some data to the corresponding sys page
e.g. (va_a = 0xAABB)
/* device */
3. gpu workload touches VA, triggers gpu fault and notify the host.
/* host */
4. reviced gpu fault notification, then it will:
4.1 unmap host pages and also takes care of cpu tlb
(use unmap_mapping_range with offset=0x800...)
4.2 migrate sys page to device
4.3 setup device page table and resolve device fault.
/* device */
5. gpu workload continued, it accessed va_a and got 0xAABB.
6. gpu workload continued, it wrote 0xBBCC to va_a.
/* host */
7. userspace access va_a, as expected, it will:
7.1 trigger cpu vm fault.
7.2 driver handling fault to migrate gpu local page to host.
8. userspace then could correctly get 0xBBCC from va_a
9. done
But in step 4.1, if we hitted the bug this patch mentioned, then user space
would never trigger cpu fault, and still get the old value: 0xAABB.
On Thu, 21 Dec 2023 13:40:11 +0800 Jiajun Xie <[email protected]> wrote:
> > (obviously bad, but it's good to spell it out) and under what
> > circumstances it occurs?
>
> Thanks for the quick reply.
>
> The issue happens in Heterogeneous computing, where the
> device(e.g. gpu) and host share the same virtual address space.
>
> A simple workflow pattern which hit the issue is:
> /* host */
> 1. userspace first mmap a file backed VA range with specified offset.
> e.g. (offset=0x800..., mmap return: va_a)
> 2. write some data to the corresponding sys page
> e.g. (va_a = 0xAABB)
> /* device */
> 3. gpu workload touches VA, triggers gpu fault and notify the host.
> /* host */
> 4. reviced gpu fault notification, then it will:
> 4.1 unmap host pages and also takes care of cpu tlb
> (use unmap_mapping_range with offset=0x800...)
> 4.2 migrate sys page to device
> 4.3 setup device page table and resolve device fault.
> /* device */
> 5. gpu workload continued, it accessed va_a and got 0xAABB.
> 6. gpu workload continued, it wrote 0xBBCC to va_a.
> /* host */
> 7. userspace access va_a, as expected, it will:
> 7.1 trigger cpu vm fault.
> 7.2 driver handling fault to migrate gpu local page to host.
> 8. userspace then could correctly get 0xBBCC from va_a
> 9. done
>
> But in step 4.1, if we hitted the bug this patch mentioned, then user space
> would never trigger cpu fault, and still get the old value: 0xAABB.
Thanks. Based on the above, I added cc:stable to the changelog so the
fix will be backported into earlier kernels (it looks like that's 20+
years worth!). And I pasted the above text into that changelog.