2023-12-07 16:12:53

by Ryan Roberts

Subject: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

In preparation for supporting anonymous multi-size THP, improve
folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
passed to it. In this case, all contained pages are accounted using the
order-0 folio (or base page) scheme.

Reviewed-by: Yu Zhao <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Tested-by: Kefeng Wang <[email protected]>
Tested-by: John Hubbard <[email protected]>
Signed-off-by: Ryan Roberts <[email protected]>
---
mm/rmap.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 2a1e45e6419f..846fc79f3ca9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
* This means the inc-and-test can be bypassed.
* The folio does not have to be locked.
*
- * If the folio is large, it is accounted as a THP. As the folio
+ * If the folio is pmd-mappable, it is accounted as a THP. As the folio
* is new, it's assumed to be mapped exclusively by a single process.
*/
void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address)
{
- int nr;
+ int nr = folio_nr_pages(folio);

- VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+ VM_BUG_ON_VMA(address < vma->vm_start ||
+ address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
__folio_set_swapbacked(folio);
+ __folio_set_anon(folio, vma, address, true);

- if (likely(!folio_test_pmd_mappable(folio))) {
+ if (likely(!folio_test_large(folio))) {
/* increment count (starts at -1) */
atomic_set(&folio->_mapcount, 0);
- nr = 1;
+ SetPageAnonExclusive(&folio->page);
+ } else if (!folio_test_pmd_mappable(folio)) {
+ int i;
+
+ for (i = 0; i < nr; i++) {
+ struct page *page = folio_page(folio, i);
+
+ /* increment count (starts at -1) */
+ atomic_set(&page->_mapcount, 0);
+ SetPageAnonExclusive(page);
+ }
+
+ atomic_set(&folio->_nr_pages_mapped, nr);
} else {
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
- nr = folio_nr_pages(folio);
+ SetPageAnonExclusive(&folio->page);
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
}

__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
- __folio_set_anon(folio, vma, address, true);
- SetPageAnonExclusive(&folio->page);
}

/**
--
2.25.1
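For readers outside mm, the three accounting paths in the patch above can be modeled in plain userspace C. This is a simplified sketch, not kernel code: `struct folio_model`, `MAX_PAGES`, and the field layout are stand-ins invented for illustration (the real state lives in `struct folio`/`struct page`, where the mapcounts are atomics that start at -1).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel constants; COMPOUND_MAPPED mirrors 1 << 23. */
#define MAX_PAGES       512
#define COMPOUND_MAPPED 0x800000

/* Simplified model of the folio fields the function touches. */
struct folio_model {
	int nr_pages;             /* 1 for order-0, >1 for large folios */
	bool pmd_mappable;
	int mapcount[MAX_PAGES];  /* per-page _mapcount (kernel: starts at -1) */
	int entire_mapcount;      /* folio->_entire_mapcount (kernel: starts at -1) */
	int nr_pages_mapped;      /* folio->_nr_pages_mapped */
};

/* Mirrors the branch structure of folio_add_new_anon_rmap(). */
static int add_new_anon_rmap(struct folio_model *f)
{
	int nr = f->nr_pages;

	if (nr == 1) {
		/* small folio: a single per-page mapcount */
		f->mapcount[0] = 0;
	} else if (!f->pmd_mappable) {
		/* large but not pmd-mappable: account every page individually */
		for (int i = 0; i < nr; i++)
			f->mapcount[i] = 0;
		f->nr_pages_mapped = nr;
	} else {
		/* pmd-mappable: one entire mapping for the whole folio */
		f->entire_mapcount = 0;
		f->nr_pages_mapped = COMPOUND_MAPPED;
	}
	return nr; /* contribution to the NR_ANON_MAPPED counter */
}
```

The middle branch is the one the patch adds: a large-but-not-pmd-mappable folio is counted page by page, exactly like nr order-0 folios, rather than as a single THP.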


2024-01-13 22:42:50

by Jiri Olsa

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> In preparation for supporting anonymous multi-size THP, improve
> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> passed to it. In this case, all contained pages are accounted using the
> order-0 folio (or base page) scheme.
>
> Reviewed-by: Yu Zhao <[email protected]>
> Reviewed-by: Yin Fengwei <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> Reviewed-by: Barry Song <[email protected]>
> Tested-by: Kefeng Wang <[email protected]>
> Tested-by: John Hubbard <[email protected]>
> Signed-off-by: Ryan Roberts <[email protected]>
> ---
> mm/rmap.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2a1e45e6419f..846fc79f3ca9 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> * This means the inc-and-test can be bypassed.
> * The folio does not have to be locked.
> *
> - * If the folio is large, it is accounted as a THP. As the folio
> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
> * is new, it's assumed to be mapped exclusively by a single process.
> */
> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> unsigned long address)
> {
> - int nr;
> + int nr = folio_nr_pages(folio);
>
> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> + VM_BUG_ON_VMA(address < vma->vm_start ||
> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma);

hi,
I'm hitting this bug (console output below) with adding uprobe
on simple program like:

$ cat up.c
int main(void)
{
return 0;
}

# bpftrace -e 'uprobe:/home/jolsa/up:_start {}'

$ ./up

it's on top of current linus tree master:
052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

before this patch it seems to work, I can send my .config if needed

thanks,
jirka


---
[ 147.562264][ T719] vma ffff888166134e68 start 0000000000401000 end 0000000000402000 mm ffff88817cf2a840
[ 147.562264][ T719] prot 25 anon_vma ffff88817b6818e0 vm_ops ffffffff83475ec0
[ 147.562264][ T719] pgoff 1 file ffff888168d01240 private_data 0000000000000000
[ 147.562264][ T719] flags: 0x75(read|exec|mayread|maywrite|mayexec)
[ 147.571660][ T719] ------------[ cut here ]------------
[ 147.572319][ T719] kernel BUG at mm/rmap.c:1412!
[ 147.572825][ T719] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 147.573792][ T719] CPU: 3 PID: 719 Comm: up Not tainted 6.7.0+ #273 faf755a6fc44b54f4ff1c207411fbd9df5a3968d
[ 147.574831][ T719] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[ 147.575652][ T719] RIP: 0010:folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.576164][ T719] Code: c7 c6 20 d2 38 83 48 89 df e8 c0 4a fb ff 0f 0b 48 89 ef e8 16 ab 08 00 4c 3b 65 00 0f 83 cd fd ff ff 48 89 ef e8 34 44 fb ff f7 c3 ff 0f 00 00 0f 85 de fe ff ff be 08 00 00 00 48 89 df
[ 147.577609][ T719] RSP: 0018:ffff88815759f568 EFLAGS: 00010286
[ 147.578140][ T719] RAX: 00000000000000fa RBX: ffffea00053eef40 RCX: 0000000000000000
[ 147.578825][ T719] RDX: 0000000000000000 RSI: ffffffff81289b44 RDI: ffffffff872ff1a0
[ 147.579513][ T719] RBP: ffff888166134e68 R08: 0000000000000001 R09: ffffed102aeb3e5f
[ 147.580198][ T719] R10: ffff88815759f2ff R11: 0000000000000000 R12: 0000000000401020
[ 147.580886][ T719] R13: 0000000000000001 R14: ffffea00053eef40 R15: ffffea00053eef40
[ 147.581566][ T719] FS: 0000000000000000(0000) GS:ffff88842ce00000(0000) knlGS:0000000000000000
[ 147.582263][ T719] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 147.582724][ T719] CR2: 00005634f0ffe880 CR3: 000000010c0f8002 CR4: 0000000000770ef0
[ 147.583304][ T719] PKRU: 55555554
[ 147.583586][ T719] Call Trace:
[ 147.583869][ T719] <TASK>
[ 147.584122][ T719] ? die+0x32/0x80
[ 147.584422][ T719] ? do_trap+0x12f/0x220
[ 147.584800][ T719] ? folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.585411][ T719] ? folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.585891][ T719] ? do_error_trap+0xa7/0x160
[ 147.586349][ T719] ? folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.586879][ T719] ? handle_invalid_op+0x2c/0x40
[ 147.587354][ T719] ? folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.587892][ T719] ? exc_invalid_op+0x29/0x40
[ 147.588352][ T719] ? asm_exc_invalid_op+0x16/0x20
[ 147.588847][ T719] ? preempt_count_sub+0x14/0xc0
[ 147.589335][ T719] ? folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.589899][ T719] ? folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.590437][ T719] __replace_page+0x364/0xb40
[ 147.590918][ T719] ? __pfx___replace_page+0x10/0x10
[ 147.591412][ T719] ? __pfx_lock_release+0x10/0x10
[ 147.591910][ T719] ? do_raw_spin_trylock+0xcd/0x120
[ 147.592555][ T719] ? __pfx_vma_alloc_folio+0x10/0x10
[ 147.593095][ T719] ? preempt_count_add+0x6e/0xc0
[ 147.593612][ T719] ? preempt_count_sub+0x14/0xc0
[ 147.594143][ T719] uprobe_write_opcode+0x3f6/0x820
[ 147.594616][ T719] ? __pfx_uprobe_write_opcode+0x10/0x10
[ 147.595125][ T719] ? preempt_count_sub+0x14/0xc0
[ 147.595551][ T719] ? up_write+0x125/0x2f0
[ 147.596014][ T719] install_breakpoint.isra.0+0xe5/0x470
[ 147.596635][ T719] uprobe_mmap+0x37b/0x8d0
[ 147.598111][ T719] ? __pfx_uprobe_mmap+0x10/0x10
[ 147.598561][ T719] mmap_region+0xa02/0x1220
[ 147.599013][ T719] ? rcu_is_watching+0x34/0x60
[ 147.599602][ T719] ? lock_acquired+0xbf/0x670
[ 147.600024][ T719] ? __pfx_mmap_region+0x10/0x10
[ 147.600458][ T719] ? security_mmap_addr+0x20/0x60
[ 147.600909][ T719] ? get_unmapped_area+0x169/0x1f0
[ 147.601353][ T719] do_mmap+0x425/0x660
[ 147.601739][ T719] vm_mmap_pgoff+0x15e/0x2b0
[ 147.602156][ T719] ? __pfx_vm_mmap_pgoff+0x10/0x10
[ 147.602597][ T719] ? __pfx_get_random_u64+0x10/0x10
[ 147.603059][ T719] elf_load+0xdc/0x3a0
[ 147.603433][ T719] load_elf_binary+0x6f6/0x22b0
[ 147.603889][ T719] ? __pfx_load_elf_binary+0x10/0x10
[ 147.604385][ T719] ? __pfx_lock_acquired+0x10/0x10
[ 147.604952][ T719] bprm_execve+0x494/0xc80
[ 147.605379][ T719] ? __pfx_bprm_execve+0x10/0x10
[ 147.605843][ T719] do_execveat_common.isra.0+0x24f/0x330
[ 147.606358][ T719] __x64_sys_execve+0x52/0x60
[ 147.606797][ T719] do_syscall_64+0x87/0x1b0
[ 147.607148][ T719] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 147.607630][ T719] RIP: 0033:0x7faa9b0bdb4b
[ 147.608732][ T719] Code: Unable to access opcode bytes at 0x7faa9b0bdb21.
[ 147.609318][ T719] RSP: 002b:00007ffff9921708 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 147.610046][ T719] RAX: ffffffffffffffda RBX: 00005634f1964990 RCX: 00007faa9b0bdb4b
[ 147.610727][ T719] RDX: 00005634f1966d20 RSI: 00005634f19612c0 RDI: 00005634f1964990
[ 147.611528][ T719] RBP: 00007ffff9921800 R08: 0000000000000001 R09: 0000000000000001
[ 147.612192][ T719] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000ffffffff
[ 147.612829][ T719] R13: 00005634f1964990 R14: 00005634f19612c0 R15: 00005634f1966d20
[ 147.613479][ T719] </TASK>
[ 147.613787][ T719] Modules linked in: intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel kvm_intel rapl iTiTCO_vendor_support i2c_i801 i2c_smbus lpc_ich drm loop drm_panel_orientation_quirks zram
[ 147.615630][ T719] ---[ end trace 0000000000000000 ]---
[ 147.616253][ T719] RIP: 0010:folio_add_new_anon_rmap+0x2cc/0x8f0
[ 147.616714][ T719] Code: c7 c6 20 d2 38 83 48 89 df e8 c0 4a fb ff 0f 0b 48 89 ef e8 16 ab 08 00 4c 3b 65 00 0f 83 cd fd ff ff 48 89 ef e8 34 44 fb ff f7 c3 ff 0f 00 00 0f 85 de fe ff ff be 08 00 00 00 48 89 df
[ 147.618160][ T719] RSP: 0018:ffff88815759f568 EFLAGS: 00010286
[ 147.618594][ T719] RAX: 00000000000000fa RBX: ffffea00053eef40 RCX: 0000000000000000
[ 147.619318][ T719] RDX: 0000000000000000 RSI: ffffffff81289b44 RDI: ffffffff872ff1a0
[ 147.619930][ T719] RBP: ffff888166134e68 R08: 0000000000000001 R09: ffffed102aeb3e5f
[ 147.620577][ T719] R10: ffff88815759f2ff R11: 0000000000000000 R12: 0000000000401020
[ 147.621236][ T719] R13: 0000000000000001 R14: ffffea00053eef40 R15: ffffea00053eef40
[ 147.621894][ T719] FS: 0000000000000000(0000) GS:ffff88842ce00000(0000) knlGS:0000000000000000
[ 147.622596][ T719] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 147.623186][ T719] CR2: 00007faa9b0bdb21 CR3: 000000010c0f8002 CR4: 0000000000770ef0
[ 147.623960][ T719] PKRU: 55555554
[ 147.624331][ T719] note: up[719] exited with preempt_count 1
[ 147.624953][ T719] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 147.625898][ T719] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 719, name: up
[ 147.626672][ T719] preempt_count: 0, expected: 0
[ 147.627945][ T719] RCU nest depth: 1, expected: 0
[ 147.628410][ T719] INFO: lockdep is turned off.
[ 147.628898][ T719] CPU: 3 PID: 719 Comm: up Tainted: G D 6.7.0+ #273 faf755a6fc44b54f4ff1c207411fbd9df5a3968d
[ 147.629954][ T719] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[ 147.630838][ T719] Call Trace:
[ 147.631185][ T719] <TASK>
[ 147.631514][ T719] dump_stack_lvl+0x15d/0x180
[ 147.631973][ T719] __might_resched+0x270/0x3b0
[ 147.636533][ T719] exit_signals+0x1d/0x460
[ 147.636947][ T719] do_exit+0x27f/0x13b0
[ 147.637368][ T719] ? __pfx__printk+0x10/0x10
[ 147.637827][ T719] ? __pfx_do_exit+0x10/0x10
[ 147.638238][ T719] make_task_dead+0xd9/0x240
[ 147.638610][ T719] rewind_stack_and_make_dead+0x17/0x20
[ 147.639064][ T719] RIP: 0033:0x7faa9b0bdb4b
[ 147.639445][ T719] Code: Unable to access opcode bytes at 0x7faa9b0bdb21.
[ 147.640015][ T719] RSP: 002b:00007ffff9921708 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 147.640694][ T719] RAX: ffffffffffffffda RBX: 00005634f1964990 RCX: 00007faa9b0bdb4b
[ 147.641407][ T719] RDX: 00005634f1966d20 RSI: 00005634f19612c0 RDI: 00005634f1964990
[ 147.642133][ T719] RBP: 00007ffff9921800 R08: 0000000000000001 R09: 0000000000000001
[ 147.642911][ T719] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000ffffffff
[ 147.643685][ T719] R13: 00005634f1964990 R14: 00005634f19612c0 R15: 00005634f1966d20
[ 147.644454][ T719] </TASK>
[ 147.644819][ T719] ------------[ cut here ]------------

2024-01-14 17:34:14

by David Hildenbrand

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On 13.01.24 23:42, Jiri Olsa wrote:
> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>> In preparation for supporting anonymous multi-size THP, improve
>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>> passed to it. In this case, all contained pages are accounted using the
>> order-0 folio (or base page) scheme.
>>
>> Reviewed-by: Yu Zhao <[email protected]>
>> Reviewed-by: Yin Fengwei <[email protected]>
>> Reviewed-by: David Hildenbrand <[email protected]>
>> Reviewed-by: Barry Song <[email protected]>
>> Tested-by: Kefeng Wang <[email protected]>
>> Tested-by: John Hubbard <[email protected]>
>> Signed-off-by: Ryan Roberts <[email protected]>
>> ---
>> mm/rmap.c | 28 ++++++++++++++++++++--------
>> 1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2a1e45e6419f..846fc79f3ca9 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>> * This means the inc-and-test can be bypassed.
>> * The folio does not have to be locked.
>> *
>> - * If the folio is large, it is accounted as a THP. As the folio
>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
>> * is new, it's assumed to be mapped exclusively by a single process.
>> */
>> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>> unsigned long address)
>> {
>> - int nr;
>> + int nr = folio_nr_pages(folio);
>>
>> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>> + VM_BUG_ON_VMA(address < vma->vm_start ||
>> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>
> hi,
> I'm hitting this bug (console output below) with adding uprobe
> on simple program like:
>
> $ cat up.c
> int main(void)
> {
> return 0;
> }
>
> # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>
> $ ./up
>
> it's on top of current linus tree master:
> 052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>
> before this patch it seems to work, I can send my .config if needed

bpf only inserts a small folio, so no magic there.

It was:
VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
And now it is
VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma);

I think this change is sane. As long as the address is aligned to full pages
(which it better should be)

Staring at uprobe_write_opcode, likely vaddr isn't aligned ...

Likely (hopefully) that is not an issue for __folio_set_anon(), because linear_page_index()
will mask these bits off.


Would the following change fix it for you?

From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <[email protected]>
Date: Sun, 14 Jan 2024 18:32:57 +0100
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/events/uprobes.c | 2 +-
mm/rmap.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 485bb0389b488..929e98c629652 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
}
}

- ret = __replace_page(vma, vaddr, old_page, new_page);
+ ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
if (new_page)
put_page(new_page);
put_old:
diff --git a/mm/rmap.c b/mm/rmap.c
index f5d43edad529a..a903db4df6b97 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
{
int nr = folio_nr_pages(folio);

+ VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_VMA(address < vma->vm_start ||
address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
--
2.43.0



--
Cheers,

David / dhildenb


2024-01-14 20:55:36

by Jiri Olsa

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
> On 13.01.24 23:42, Jiri Olsa wrote:
> > On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> > > In preparation for supporting anonymous multi-size THP, improve
> > > folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> > > passed to it. In this case, all contained pages are accounted using the
> > > order-0 folio (or base page) scheme.
> > >
> > > Reviewed-by: Yu Zhao <[email protected]>
> > > Reviewed-by: Yin Fengwei <[email protected]>
> > > Reviewed-by: David Hildenbrand <[email protected]>
> > > Reviewed-by: Barry Song <[email protected]>
> > > Tested-by: Kefeng Wang <[email protected]>
> > > Tested-by: John Hubbard <[email protected]>
> > > Signed-off-by: Ryan Roberts <[email protected]>
> > > ---
> > > mm/rmap.c | 28 ++++++++++++++++++++--------
> > > 1 file changed, 20 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 2a1e45e6419f..846fc79f3ca9 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> > > * This means the inc-and-test can be bypassed.
> > > * The folio does not have to be locked.
> > > *
> > > - * If the folio is large, it is accounted as a THP. As the folio
> > > + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
> > > * is new, it's assumed to be mapped exclusively by a single process.
> > > */
> > > void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> > > unsigned long address)
> > > {
> > > - int nr;
> > > + int nr = folio_nr_pages(folio);
> > > - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> > > + VM_BUG_ON_VMA(address < vma->vm_start ||
> > > + address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> >
> > hi,
> > I'm hitting this bug (console output below) with adding uprobe
> > on simple program like:
> >
> > $ cat up.c
> > int main(void)
> > {
> > return 0;
> > }
> >
> > # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
> >
> > $ ./up
> >
> > it's on top of current linus tree master:
> > 052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
> >
> > before this patch it seems to work, I can send my .config if needed
>
> bpf only inserts a small folio, so no magic there.
>
> It was:
> VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> And now it is
> VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>
> I think this change is sane. As long as the address is aligned to full pages
> (which it better should be)
>
> Staring at uprobe_write_opcode, likely vaddr isn't aligned ...
>
> Likely (hopefully) that is not an issue for __folio_set_anon(), because linear_page_index()
> will mask these bits off.
>
>
> Would the following change fix it for you?

great, that fixes it for me, you can add my

Tested-by: Jiri Olsa <[email protected]>

thanks,
jirka

>
> From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <[email protected]>
> Date: Sun, 14 Jan 2024 18:32:57 +0100
> Subject: [PATCH] tmp
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/events/uprobes.c | 2 +-
> mm/rmap.c | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 485bb0389b488..929e98c629652 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> }
> }
> - ret = __replace_page(vma, vaddr, old_page, new_page);
> + ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
> if (new_page)
> put_page(new_page);
> put_old:
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f5d43edad529a..a903db4df6b97 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> {
> int nr = folio_nr_pages(folio);
> + VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
> VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_VMA(address < vma->vm_start ||
> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> --
> 2.43.0
>
>
>
> --
> Cheers,
>
> David / dhildenb
>

2024-01-15 08:50:42

by Ryan Roberts

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On 14/01/2024 20:55, Jiri Olsa wrote:
> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
>> On 13.01.24 23:42, Jiri Olsa wrote:
>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>>>> In preparation for supporting anonymous multi-size THP, improve
>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>> passed to it. In this case, all contained pages are accounted using the
>>>> order-0 folio (or base page) scheme.
>>>>
>>>> Reviewed-by: Yu Zhao <[email protected]>
>>>> Reviewed-by: Yin Fengwei <[email protected]>
>>>> Reviewed-by: David Hildenbrand <[email protected]>
>>>> Reviewed-by: Barry Song <[email protected]>
>>>> Tested-by: Kefeng Wang <[email protected]>
>>>> Tested-by: John Hubbard <[email protected]>
>>>> Signed-off-by: Ryan Roberts <[email protected]>
>>>> ---
>>>> mm/rmap.c | 28 ++++++++++++++++++++--------
>>>> 1 file changed, 20 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 2a1e45e6419f..846fc79f3ca9 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>>> * This means the inc-and-test can be bypassed.
>>>> * The folio does not have to be locked.
>>>> *
>>>> - * If the folio is large, it is accounted as a THP. As the folio
>>>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
>>>> * is new, it's assumed to be mapped exclusively by a single process.
>>>> */
>>>> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>> unsigned long address)
>>>> {
>>>> - int nr;
>>>> + int nr = folio_nr_pages(folio);
>>>> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>> + VM_BUG_ON_VMA(address < vma->vm_start ||
>>>> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>
>>> hi,
>>> I'm hitting this bug (console output below) with adding uprobe
>>> on simple program like:
>>>
>>> $ cat up.c
>>> int main(void)
>>> {
>>> return 0;
>>> }
>>>
>>> # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>
>>> $ ./up
>>>
>>> it's on top of current linus tree master:
>>> 052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>
>>> before this patch it seems to work, I can send my .config if needed

Thanks for the bug report!

>>
>> bpf only inserts a small folio, so no magic there.
>>
>> It was:
>> VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>> And now it is
>> VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>
>> I think this change is sane. As long as the address is aligned to full pages
>> (which it better should be)
>>
>> Staring at uprobe_write_opcode, likely vaddr isn't aligned ...
>>
>> Likely (hopefully) that is not an issue for __folio_set_anon(), because linear_page_index()
>> will mask these bits off.
>>
>>
>> Would the following change fix it for you?

And thanks for fixing my mess so quickly, David.

FWIW, I agree with your diagnosis. One small suggestion below.

>
> great, that fixes it for me, you can add my
>
> Tested-by: Jiri Olsa <[email protected]>
>
> thanks,
> jirka
>
>>
>> From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <[email protected]>
>> Date: Sun, 14 Jan 2024 18:32:57 +0100
>> Subject: [PATCH] tmp
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> kernel/events/uprobes.c | 2 +-
>> mm/rmap.c | 1 +
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 485bb0389b488..929e98c629652 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>> }
>> }
>> - ret = __replace_page(vma, vaddr, old_page, new_page);
>> + ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
>> if (new_page)
>> put_page(new_page);
>> put_old:
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index f5d43edad529a..a903db4df6b97 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>> {
>> int nr = folio_nr_pages(folio);
>> + VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);

nit: Is it worth also adding this to __folio_add_anon_rmap() so that
folio_add_anon_rmap_ptes() and folio_add_anon_rmap_pmd() also benefit?

Regardless:

Reviewed-by: Ryan Roberts <[email protected]>

>> VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>> VM_BUG_ON_VMA(address < vma->vm_start ||
>> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>> --
>> 2.43.0
>>
>>
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>


2024-01-15 09:38:24

by David Hildenbrand

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

>>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>>> index 485bb0389b488..929e98c629652 100644
>>> --- a/kernel/events/uprobes.c
>>> +++ b/kernel/events/uprobes.c
>>> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>> }
>>> }
>>> - ret = __replace_page(vma, vaddr, old_page, new_page);
>>> + ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
>>> if (new_page)
>>> put_page(new_page);
>>> put_old:
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index f5d43edad529a..a903db4df6b97 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>> {
>>> int nr = folio_nr_pages(folio);
>>> + VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
>
> nit: Is it worth also adding this to __folio_add_anon_rmap() so that
> folio_add_anon_rmap_ptes() and folio_add_anon_rmap_pmd() also benefit?
>

Yes, same thoughts. I just included it here so we'd catch it if something
still goes wrong.

I'll split that change out either way.


> Regardless:
>
> Reviewed-by: Ryan Roberts <[email protected]>

Thanks!

--
Cheers,

David / dhildenb


2024-01-24 11:19:26

by Jiri Olsa

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
> Ryan Roberts <[email protected]> writes:
>
> > On 14/01/2024 20:55, Jiri Olsa wrote:
> >> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
> >>> On 13.01.24 23:42, Jiri Olsa wrote:
> >>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> >>>>> In preparation for supporting anonymous multi-size THP, improve
> >>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> >>>>> passed to it. In this case, all contained pages are accounted using the
> >>>>> order-0 folio (or base page) scheme.
> >>>>>
> >>>>> Reviewed-by: Yu Zhao <[email protected]>
> >>>>> Reviewed-by: Yin Fengwei <[email protected]>
> >>>>> Reviewed-by: David Hildenbrand <[email protected]>
> >>>>> Reviewed-by: Barry Song <[email protected]>
> >>>>> Tested-by: Kefeng Wang <[email protected]>
> >>>>> Tested-by: John Hubbard <[email protected]>
> >>>>> Signed-off-by: Ryan Roberts <[email protected]>
> >>>>> ---
> >>>>> mm/rmap.c | 28 ++++++++++++++++++++--------
> >>>>> 1 file changed, 20 insertions(+), 8 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>> index 2a1e45e6419f..846fc79f3ca9 100644
> >>>>> --- a/mm/rmap.c
> >>>>> +++ b/mm/rmap.c
> >>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> >>>>> * This means the inc-and-test can be bypassed.
> >>>>> * The folio does not have to be locked.
> >>>>> *
> >>>>> - * If the folio is large, it is accounted as a THP. As the folio
> >>>>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
> >>>>> * is new, it's assumed to be mapped exclusively by a single process.
> >>>>> */
> >>>>> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> >>>>> unsigned long address)
> >>>>> {
> >>>>> - int nr;
> >>>>> + int nr = folio_nr_pages(folio);
> >>>>> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> >>>>> + VM_BUG_ON_VMA(address < vma->vm_start ||
> >>>>> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> >>>>
> >>>> hi,
> >>>> I'm hitting this bug (console output below) with adding uprobe
> >>>> on simple program like:
> >>>>
> >>>> $ cat up.c
> >>>> int main(void)
> >>>> {
> >>>> return 0;
> >>>> }
> >>>>
> >>>> # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
> >>>>
> >>>> $ ./up
> >>>>
> >>>> it's on top of current linus tree master:
> >>>> 052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
> >>>>
> >>>> before this patch it seems to work, I can send my .config if needed
> >
> > Thanks for the bug report!
>
> I just hit the same bug in our CI, but can't find the fix in -next. Is
> this in the queue somewhere?

we hit it as well, but I can see the fix in linux-next/master

4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages

jirka

2024-01-24 11:45:19

by Sven Schnelle

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

Ryan Roberts <[email protected]> writes:

> [...]
>
> Thanks for the bug report!

I just hit the same bug in our CI, but can't find the fix in -next. Is
this in the queue somewhere?

Thanks
Sven

2024-01-24 12:16:03

by Ryan Roberts

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On 24/01/2024 11:19, Jiri Olsa wrote:
> On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
>> Ryan Roberts <[email protected]> writes:
>>
>>> [...]
>>>
>>> Thanks for the bug report!
>>
>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>> this in the queue somewhere?
>
> we hit it as well, but I can see the fix in linux-next/master
>
> 4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages

Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
having this change in your kernel? Could you please send over the full bug log?

>
> jirka


2024-01-24 12:16:28

by Jiri Olsa

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On Wed, Jan 24, 2024 at 12:02:53PM +0000, Ryan Roberts wrote:
> On 24/01/2024 11:19, Jiri Olsa wrote:
> > On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
> >> Ryan Roberts <[email protected]> writes:
> >>
> >>> [...]
> >>>
> >>> Thanks for the bug report!
> >>
> >> I just hit the same bug in our CI, but can't find the fix in -next. Is
> >> this in the queue somewhere?
> >
> > we hit it as well, but I can see the fix in linux-next/master
> >
> > 4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>
> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
> having this change in your kernel? Could you please send over the full bug log?

ah sorry.. I meant the change fixes the problem for us, it just did not
yet propagate through the merge cycle into bpf trees.. but I can see it
in the linux-next tree, so it's probably just a matter of time

jirka

2024-01-24 12:17:40

by Ryan Roberts

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On 24/01/2024 12:06, Jiri Olsa wrote:
> On Wed, Jan 24, 2024 at 12:02:53PM +0000, Ryan Roberts wrote:
>> On 24/01/2024 11:19, Jiri Olsa wrote:
>>> On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
>>>> Ryan Roberts <[email protected]> writes:
>>>>
>>>>> [...]
>>>>>
>>>>> Thanks for the bug report!
>>>>
>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>> this in the queue somewhere?
>>>
>>> we hit it as well, but I can see the fix in linux-next/master
>>>
>>> 4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>
>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
>> having this change in your kernel? Could you please send over the full bug log?
>
> ah sorry.. I meant the change fixes the problem for us, it just did not
> yet propagate through the merge cycle into bpf trees.. but I can see it
> in the linux-next tree, so it's probably just a matter of time

OK great! How about you, Sven? Do you have this change in your kernel? Hopefully
it should fix your problem.

>
> jirka


2024-01-24 12:42:20

by Ryan Roberts

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

On 24/01/2024 12:28, Sven Schnelle wrote:
> Hi Ryan,
>
> Ryan Roberts <[email protected]> writes:
>
>>>>>>>>>> [...]
>>>>>>>
>>>>>>> Thanks for the bug report!
>>>>>>
>>>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>>>> this in the queue somewhere?
>>>>>
>>>>> we hit it as well, but I can see the fix in linux-next/master
>>>>>
>>>>> 4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>>>
>>>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
>>>> having this change in your kernel? Could you please send over the full bug log?
>>>
>>> ah sorry.. I meant the change fixes the problem for us, it just did not
>>> yet propagate through the merge cycle into bpf trees.. but I can see it
>>> in the linux-next tree, so it's probably just a matter of time
>>
>> OK great! How about you, Sven? Do you have this change in your kernel? Hopefully
>> it should fix your problem.
>
> Same here - the fix makes uprobes work again, I just didn't see it in
> torvalds' master nor in today's linux-next. But Jiri is right,
> it's in linux-next/master; I simply failed to find it there. So everything
> should be ok.

Great!

2024-01-24 13:02:56

by Sven Schnelle

Subject: Re: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

Hi Ryan,

Ryan Roberts <[email protected]> writes:

>>>>>>>>> [...]
>>>>>>
>>>>>> Thanks for the bug report!
>>>>>
>>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>>> this in the queue somewhere?
>>>>
>>>> we hit it as well, but I can see the fix in linux-next/master
>>>>
>>>> 4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>>
>>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
>>> having this change in your kernel? Could you please send over the full bug log?
>>
>> ah sorry.. I meant the change fixes the problem for us, it just did not
>> yet propagate through the merge cycle into bpf trees.. but I can see it
>> in the linux-next tree, so it's probably just a matter of time
>
> OK great! How about you, Sven? Do you have this change in your kernel? Hopefully
> it should fix your problem.

Same here - the fix makes uprobes work again, I just didn't see it in
torvalds' master nor in today's linux-next. But Jiri is right,
it's in linux-next/master; I simply failed to find it there. So everything
should be ok.