2021-11-10 10:59:03

by Qi Zheng

Subject: [PATCH v3 00/15] Free user PTE page table pages

Hi,

This patch series aims to free user PTE page table pages when all PTE entries
are empty.

The story begins with some malloc libraries (e.g. jemalloc or tcmalloc) that
allocate a large amount of virtual address space with mmap() and do not unmap
it. When they want to release physical memory, they use madvise(MADV_DONTNEED)
instead. But madvise() does not free the page tables, so a process that touches
an enormous virtual address space can end up with a large number of page tables.

The following figures are a memory usage snapshot of one process that we
actually observed on our servers:

VIRT: 55t
RES: 590g
VmPTE: 110g

As we can see, the PTE page tables occupy 110g, while RES is only 590g. In
theory, the process needs only about 1.2g of PTE page tables to map that much
physical memory (each 4K PTE page table page maps 2M of virtual address space,
so 590g / 2M * 4K ~= 1.2g). The reason the PTE page tables occupy so much
memory is that madvise(MADV_DONTNEED) only clears the PTEs and frees the
physical memory, but does not free the PTE page table pages themselves. So we
can free those empty PTE page tables to save memory. In the above case, we can
save about 108g of memory (best case), and the larger the difference between
VIRT and RES, the more memory we save.

In this patch series, we add a pte_refcount field to the struct page of the
PTE page table page to track how many users it has. Similar to the page
refcount mechanism, a user of a PTE page table must hold a refcount on it
before accessing it. The PTE page table page is freed when the last refcount
is dropped.
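
For reference, the intended usage pattern looks roughly like the sketch below.
It uses the pte_try_get()/pte_put() helpers introduced later in this series;
walk_pte_table() itself is only a hypothetical caller, not code taken from any
particular patch:

static void walk_pte_table(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
        spinlock_t *ptl;
        pte_t *pte;

        /* Pin the PTE page table page before touching its entries. */
        if (pte_try_get(pmd) != TRYGET_SUCCESSED)
                return; /* pmd is none/huge, or the page is being freed */

        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        /* ... walk the PTE entries under the PTE lock ... */
        pte_unmap_unlock(pte, ptl);

        /* Drop the reference; this may free the page if the refcount hits 0. */
        pte_put(mm, pmd, addr);
}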

Testing:

The following pseudocode shows the effect of the optimization:

mmap 50G
while (1) {
for (; i < 1024 * 25; i++) {
touch 2M memory
madvise MADV_DONTNEED 2M
}
}
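
A minimal userspace reproducer along these lines could look like the sketch
below (illustrative only; it mirrors the pseudocode above rather than the
exact test program used):

#include <string.h>
#include <sys/mman.h>

#define STEP    (2UL << 20)     /* 2M  */
#define SIZE    (50UL << 30)    /* 50G */

int main(void)
{
        char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        unsigned long i;

        if (buf == MAP_FAILED)
                return 1;

        while (1) {
                for (i = 0; i < 1024 * 25; i++) {       /* 25600 * 2M = 50G */
                        char *p = buf + i * STEP;

                        memset(p, 1, STEP);             /* touch 2M */
                        madvise(p, STEP, MADV_DONTNEED);/* free the 2M again */
                }
        }
        return 0;
}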

As we can see, the memory usage of VmPTE is reduced:

          before       after
VIRT     50.0 GB     50.0 GB
RES       3.1 MB      3.6 MB
VmPTE  102640 kB      248 kB

I have also tested the stability with LTP[1] for several weeks and have not
seen any crashes so far.

Page fault performance can be affected by the allocation/freeing of PTE page
table pages. The following are the results of a micro benchmark[2]:

root@~# perf stat -e page-faults --repeat 5 ./multi-fault $threads:

threads    before (pf/min)     after (pf/min)
      1         32,085,255     31,880,833 (-0.64%)
      8        101,674,967    100,588,311 (-1.17%)
     16        113,207,000    112,801,832 (-0.36%)

(pf/min is the number of page faults per minute.)

Page fault performance is ~1% slower than before.

And there are no obvious changes in perf hot spots:

before:
19.29% [kernel] [k] clear_page_rep
16.12% [kernel] [k] do_user_addr_fault
9.57% [kernel] [k] _raw_spin_unlock_irqrestore
6.16% [kernel] [k] get_page_from_freelist
5.03% [kernel] [k] __handle_mm_fault
3.53% [kernel] [k] __rcu_read_unlock
3.45% [kernel] [k] handle_mm_fault
3.38% [kernel] [k] down_read_trylock
2.74% [kernel] [k] free_unref_page_list
2.17% [kernel] [k] up_read
1.93% [kernel] [k] charge_memcg
1.73% [kernel] [k] try_charge_memcg
1.71% [kernel] [k] __alloc_pages
1.69% [kernel] [k] ___perf_sw_event
1.44% [kernel] [k] get_mem_cgroup_from_mm

after:
18.19% [kernel] [k] clear_page_rep
16.28% [kernel] [k] do_user_addr_fault
8.39% [kernel] [k] _raw_spin_unlock_irqrestore
5.12% [kernel] [k] get_page_from_freelist
4.81% [kernel] [k] __handle_mm_fault
4.68% [kernel] [k] down_read_trylock
3.80% [kernel] [k] handle_mm_fault
3.59% [kernel] [k] get_mem_cgroup_from_mm
2.49% [kernel] [k] free_unref_page_list
2.41% [kernel] [k] up_read
2.16% [kernel] [k] charge_memcg
1.92% [kernel] [k] __rcu_read_unlock
1.88% [kernel] [k] ___perf_sw_event
1.70% [kernel] [k] pte_get_unless_zero

This series is based on next-20211108.

Comments and suggestions are welcome.

Thanks,
Qi.

[1] https://github.com/linux-test-project/ltp
[2] https://lore.kernel.org/lkml/[email protected]/2-multi-fault-all.c

Changelog in v2 -> v3:
- Refactored this patch series:
- [PATCH v3 6/15]: Introduce the new dummy helpers first
- [PATCH v3 7-12/15]: Convert each subsystem individually
- [PATCH v3 13/15]: Implement the actual logic to the dummy helpers
Thanks to David and Jason for their advice.
- Add a document.

Changelog in v1 -> v2:
- Change pte_install() to pmd_install().
- Fix some typos and code style problems.
- Split [PATCH v1 5/7] into [PATCH v2 4/9], [PATCH v2 5/9], [PATCH v2 6/9]
and [PATCH v2 7/9].

Qi Zheng (15):
mm: do code cleanups to filemap_map_pmd()
mm: introduce is_huge_pmd() helper
mm: move pte_offset_map_lock() to pgtable.h
mm: rework the parameter of lock_page_or_retry()
mm: add pmd_installed_type return for __pte_alloc() and other friends
mm: introduce refcount for user PTE page table page
mm/pte_ref: add support for user PTE page table page allocation
mm/pte_ref: initialize the refcount of the withdrawn PTE page table
page
mm/pte_ref: add support for the map/unmap of user PTE page table page
mm/pte_ref: add support for page fault path
mm/pte_ref: take a refcount before accessing the PTE page table page
mm/pte_ref: update the pmd entry in move_normal_pmd()
mm/pte_ref: free user PTE page table pages
Documentation: add document for pte_ref
mm/pte_ref: use mmu_gather to free PTE page table pages

Documentation/vm/pte_ref.rst | 216 ++++++++++++++++++++++++++++++++++++
arch/x86/Kconfig | 2 +-
fs/proc/task_mmu.c | 24 +++-
fs/userfaultfd.c | 9 +-
include/linux/huge_mm.h | 10 +-
include/linux/mm.h | 170 ++++-------------------------
include/linux/mm_types.h | 6 +-
include/linux/pagemap.h | 8 +-
include/linux/pgtable.h | 152 +++++++++++++++++++++++++-
include/linux/pte_ref.h | 146 +++++++++++++++++++++++++
include/linux/rmap.h | 2 +
kernel/events/uprobes.c | 2 +
mm/Kconfig | 4 +
mm/Makefile | 4 +-
mm/damon/vaddr.c | 12 +-
mm/debug_vm_pgtable.c | 5 +-
mm/filemap.c | 45 +++++---
mm/gup.c | 25 ++++-
mm/hmm.c | 5 +-
mm/huge_memory.c | 3 +-
mm/internal.h | 4 +-
mm/khugepaged.c | 21 +++-
mm/ksm.c | 6 +-
mm/madvise.c | 21 +++-
mm/memcontrol.c | 12 +-
mm/memory-failure.c | 11 +-
mm/memory.c | 254 ++++++++++++++++++++++++++++++++-----------
mm/mempolicy.c | 6 +-
mm/migrate.c | 54 ++++-----
mm/mincore.c | 7 +-
mm/mlock.c | 1 +
mm/mmu_gather.c | 40 +++----
mm/mprotect.c | 11 +-
mm/mremap.c | 14 ++-
mm/page_vma_mapped.c | 4 +
mm/pagewalk.c | 15 ++-
mm/pgtable-generic.c | 1 +
mm/pte_ref.c | 141 ++++++++++++++++++++++++
mm/rmap.c | 10 ++
mm/swapfile.c | 3 +
mm/userfaultfd.c | 40 +++++--
41 files changed, 1186 insertions(+), 340 deletions(-)
create mode 100644 Documentation/vm/pte_ref.rst
create mode 100644 include/linux/pte_ref.h
create mode 100644 mm/pte_ref.c

--
2.11.0


2021-11-10 10:59:04

by Qi Zheng

Subject: [PATCH v3 13/15] mm/pte_ref: free user PTE page table pages

This commit introduces the CONFIG_FREE_USER_PTE option and
implements the actual logic behind the pte_ref dummy helpers.

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/mm.h | 2 ++
include/linux/pgtable.h | 3 +-
include/linux/pte_ref.h | 53 ++++++++++++++++++++++++----
mm/Kconfig | 4 +++
mm/debug_vm_pgtable.c | 2 ++
mm/memory.c | 15 ++++++++
mm/pte_ref.c | 91 +++++++++++++++++++++++++++++++++++++++++++++----
7 files changed, 156 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 753a9435e0d0..18fbf9e0996a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -437,6 +437,7 @@ extern pgprot_t protection_map[16];
* @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
* @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
* @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ * @FAULT_FLAG_PTE_GET: A reference on the PTE page table page has been taken.
*
* About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
* whether we would allow page faults to retry by specifying these two
@@ -468,6 +469,7 @@ enum fault_flag {
FAULT_FLAG_REMOTE = 1 << 7,
FAULT_FLAG_INSTRUCTION = 1 << 8,
FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
+ FAULT_FLAG_PTE_GET = 1 << 10,
};

/*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c8f045705c1e..6ac51d58f11a 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -480,7 +480,6 @@ static inline pte_t ptep_get_lockless(pte_t *ptep)
}
#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long address,
@@ -491,6 +490,8 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
return pmd;
}
#endif /* __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR */
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#ifndef __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR
static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
unsigned long address,
diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h
index b6d8335bdc59..8a26eaba83ef 100644
--- a/include/linux/pte_ref.h
+++ b/include/linux/pte_ref.h
@@ -8,6 +8,7 @@
#define _LINUX_PTE_REF_H

#include <linux/pgtable.h>
+#include <linux/page-flags.h>

enum pte_tryget_type {
TRYGET_SUCCESSED,
@@ -16,12 +17,49 @@ enum pte_tryget_type {
TRYGET_FAILED_HUGE_PMD,
};

-bool pte_get_unless_zero(pmd_t *pmd);
-enum pte_tryget_type pte_try_get(pmd_t *pmd);
void pte_put_vmf(struct vm_fault *vmf);
+enum pte_tryget_type pte_try_get(pmd_t *pmd);
+bool pte_get_unless_zero(pmd_t *pmd);
+
+#ifdef CONFIG_FREE_USER_PTE
+void free_user_pte_table(struct mm_struct *mm, pmd_t *pmdp, unsigned long addr);

static inline void pte_ref_init(pgtable_t pte, pmd_t *pmd, int count)
{
+ pte->pmd = pmd;
+ atomic_set(&pte->pte_refcount, count);
+}
+
+static inline pmd_t *pte_to_pmd(pte_t *pte)
+{
+ return virt_to_page(pte)->pmd;
+}
+
+static inline void pte_update_pmd(pmd_t old_pmd, pmd_t *new_pmd)
+{
+ pmd_pgtable(old_pmd)->pmd = new_pmd;
+}
+
+static inline void pte_get_many(pmd_t *pmd, unsigned int nr)
+{
+ pgtable_t pte = pmd_pgtable(*pmd);
+
+ VM_BUG_ON(!PageTable(pte));
+ atomic_add(nr, &pte->pte_refcount);
+}
+
+static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr, unsigned int nr)
+{
+ pgtable_t pte = pmd_pgtable(*pmd);
+
+ VM_BUG_ON(!PageTable(pte));
+ if (atomic_sub_and_test(nr, &pte->pte_refcount))
+ free_user_pte_table(mm, pmd, addr & PMD_MASK);
+}
+#else
+static inline void pte_ref_init(pgtable_t pte, pmd_t *pmd, int count)
+{
}

static inline pmd_t *pte_to_pmd(pte_t *pte)
@@ -37,6 +75,12 @@ static inline void pte_get_many(pmd_t *pmd, unsigned int nr)
{
}

+static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr, unsigned int nr)
+{
+}
+#endif /* CONFIG_FREE_USER_PTE */
+
/*
* pte_get - Increment refcount for the PTE page table.
* @pmd: a pointer to the pmd entry corresponding to the PTE page table.
@@ -66,11 +110,6 @@ static inline pte_t *pte_tryget_map_lock(struct mm_struct *mm, pmd_t *pmd,
return pte_offset_map_lock(mm, pmd, address, ptlp);
}

-static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, unsigned int nr)
-{
-}
-
/*
* pte_put - Decrement refcount for the PTE page table.
* @mm: the mm_struct of the target address space.
diff --git a/mm/Kconfig b/mm/Kconfig
index 5c5508fafcec..44549d287869 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -898,6 +898,10 @@ config IO_MAPPING
config SECRETMEM
def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED

+config FREE_USER_PTE
+ def_bool y
+ depends on X86_64
+
source "mm/damon/Kconfig"

endmenu
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 52f006654664..757dd84ee254 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1049,8 +1049,10 @@ static void __init destroy_args(struct pgtable_debug_args *args)
/* Free page table entries */
if (args->start_ptep) {
pte_put(args->mm, args->start_pmdp, args->vaddr);
+#ifndef CONFIG_FREE_USER_PTE
pte_free(args->mm, args->start_ptep);
mm_dec_nr_ptes(args->mm);
+#endif
}

if (args->start_pmdp) {
diff --git a/mm/memory.c b/mm/memory.c
index e360ecd37a71..4d1ede78d1b0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -219,6 +219,17 @@ static void check_sync_rss_stat(struct task_struct *task)

#endif /* SPLIT_RSS_COUNTING */

+#ifdef CONFIG_FREE_USER_PTE
+static inline void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
+ unsigned long addr)
+{
+ /*
+ * We should never reach here since the PTE page tables are
+ * dynamically freed.
+ */
+ BUG();
+}
+#else
/*
* Note: this doesn't free the actual pages themselves. That
* has been handled earlier when unmapping all the memory regions.
@@ -231,6 +242,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
pte_free_tlb(tlb, token, addr);
mm_dec_nr_ptes(tlb->mm);
}
+#endif /* CONFIG_FREE_USER_PTE */

static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
unsigned long addr, unsigned long end,
@@ -4631,6 +4643,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
goto retry;
vmf->orig_pte = *vmf->pte;

+ if (IS_ENABLED(CONFIG_FREE_USER_PTE))
+ vmf->flags |= FAULT_FLAG_PTE_GET;
+
/*
* some architectures can have larger ptes than wordsize,
* e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
diff --git a/mm/pte_ref.c b/mm/pte_ref.c
index de109905bc8f..728e61cea25e 100644
--- a/mm/pte_ref.c
+++ b/mm/pte_ref.c
@@ -7,7 +7,10 @@

#include <linux/pte_ref.h>
#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <asm/tlbflush.h>

+#ifdef CONFIG_FREE_USER_PTE
/*
* pte_get_unless_zero - Increment refcount for the PTE page table
* unless it is zero.
@@ -15,7 +18,10 @@
*/
bool pte_get_unless_zero(pmd_t *pmd)
{
- return true;
+ pgtable_t pte = pmd_pgtable(*pmd);
+
+ VM_BUG_ON(!PageTable(pte));
+ return atomic_inc_not_zero(&pte->pte_refcount);
}

/*
@@ -32,12 +38,20 @@ bool pte_get_unless_zero(pmd_t *pmd)
*/
enum pte_tryget_type pte_try_get(pmd_t *pmd)
{
- if (unlikely(pmd_none(*pmd)))
- return TRYGET_FAILED_NONE;
- if (unlikely(is_huge_pmd(*pmd)))
- return TRYGET_FAILED_HUGE_PMD;
+ int retval = TRYGET_SUCCESSED;
+ pmd_t pmdval;

- return TRYGET_SUCCESSED;
+ rcu_read_lock();
+ pmdval = READ_ONCE(*pmd);
+ if (unlikely(pmd_none(pmdval)))
+ retval = TRYGET_FAILED_NONE;
+ else if (unlikely(is_huge_pmd(pmdval)))
+ retval = TRYGET_FAILED_HUGE_PMD;
+ else if (!pte_get_unless_zero(&pmdval))
+ retval = TRYGET_FAILED_ZERO;
+ rcu_read_unlock();
+
+ return retval;
}

/*
@@ -52,4 +66,69 @@ enum pte_tryget_type pte_try_get(pmd_t *pmd)
*/
void pte_put_vmf(struct vm_fault *vmf)
{
+ if (!(vmf->flags & FAULT_FLAG_PTE_GET))
+ return;
+ vmf->flags &= ~FAULT_FLAG_PTE_GET;
+
+ pte_put(vmf->vma->vm_mm, vmf->pmd, vmf->address);
+}
+#else
+bool pte_get_unless_zero(pmd_t *pmd)
+{
+ return true;
+}
+
+enum pte_tryget_type pte_try_get(pmd_t *pmd)
+{
+ if (unlikely(pmd_none(*pmd)))
+ return TRYGET_FAILED_NONE;
+
+ if (unlikely(is_huge_pmd(*pmd)))
+ return TRYGET_FAILED_HUGE_PMD;
+
+ return TRYGET_SUCCESSED;
+}
+
+void pte_put_vmf(struct vm_fault *vmf)
+{
+}
+#endif /* CONFIG_FREE_USER_PTE */
+
+#ifdef CONFIG_DEBUG_VM
+static void pte_free_debug(pmd_t pmd)
+{
+ pte_t *ptep = (pte_t *)pmd_page_vaddr(pmd);
+ int i = 0;
+
+ for (i = 0; i < PTRS_PER_PTE; i++)
+ BUG_ON(!pte_none(*ptep++));
+}
+#else
+static inline void pte_free_debug(pmd_t pmd)
+{
+}
+#endif
+
+static void pte_free_rcu(struct rcu_head *rcu)
+{
+ struct page *page = container_of(rcu, struct page, rcu_head);
+
+ pgtable_pte_page_dtor(page);
+ __free_page(page);
+}
+
+void free_user_pte_table(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
+{
+ struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+ spinlock_t *ptl;
+ pmd_t pmdval;
+
+ ptl = pmd_lock(mm, pmd);
+ pmdval = pmdp_huge_get_and_clear(mm, addr, pmd);
+ flush_tlb_range(&vma, addr, addr + PMD_SIZE);
+ spin_unlock(ptl);
+
+ pte_free_debug(pmdval);
+ mm_dec_nr_ptes(mm);
+ call_rcu(&pmd_pgtable(pmdval)->rcu_head, pte_free_rcu);
}
--
2.11.0

2021-11-10 10:59:11

by Qi Zheng

Subject: [PATCH v3 14/15] Documentation: add document for pte_ref

This commit adds a document for pte_ref under Documentation/vm/.

Signed-off-by: Qi Zheng <[email protected]>
---
Documentation/vm/pte_ref.rst | 212 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 212 insertions(+)
create mode 100644 Documentation/vm/pte_ref.rst

diff --git a/Documentation/vm/pte_ref.rst b/Documentation/vm/pte_ref.rst
new file mode 100644
index 000000000000..c5323a263464
--- /dev/null
+++ b/Documentation/vm/pte_ref.rst
@@ -0,0 +1,212 @@
+.. _pte_ref:
+
+============================================================================
+pte_ref: Tracking references to each user PTE page table page
+============================================================================
+
+.. contents:: :local:
+
+1. Preface
+==========
+
+Nowadays, in pursuit of high performance, applications mostly use
+high-performance user-mode memory allocators such as jemalloc or tcmalloc.
+These memory allocators use ``madvise(MADV_DONTNEED or MADV_FREE)`` to release
+physical memory for the following reasons::
+
+ First of all, we should hold as few write locks of mmap_lock as possible,
+ since the mmap_lock semaphore has long been a contention point in the memory
+ management subsystem. mmap()/munmap() hold the write lock, while
+ madvise(MADV_DONTNEED or MADV_FREE) holds the read lock, so using madvise()
+ instead of munmap() to release physical memory reduces contention on the
+ mmap_lock.
+
+ Secondly, after using madvise() to release physical memory, there is no need
+ to build VMAs and allocate page tables again when the same virtual addresses
+ are accessed later, which also saves some time.
+
+The following table shows the largest amount of user PTE page table memory
+that can be allocated by a single user process on a 32-bit and a 64-bit system.
+
++---------------------------+--------+---------+
+| | 32-bit | 64-bit |
++===========================+========+=========+
+| user PTE page table pages | 3 MiB | 512 GiB |
++---------------------------+--------+---------+
+| user PMD page table pages | 3 KiB | 1 GiB |
++---------------------------+--------+---------+
+
+(For 32-bit, assume a 3G user address space and a 4K page size; for 64-bit,
+assume a 48-bit address width and a 4K page size.)
+
+After using ``madvise()``, everything looks good, but as the table above shows,
+a single process can create a large number of PTE page tables on a 64-bit
+system, since neither ``MADV_DONTNEED`` nor ``MADV_FREE`` releases page table
+memory. And until the process exits or calls ``munmap()``, the kernel cannot
+reclaim these pages even if the PTE page tables no longer map anything.
+
+Therefore, we decided to introduce a reference count to manage the PTE page
+table life cycle, so that free PTE page table memory in the system can be
+released dynamically.
+
+2. The reference count of user PTE page table pages
+===================================================
+
+We introduce two members for the ``struct page`` of the user PTE page table
+page::
+
+ union {
+ pgtable_t pmd_huge_pte; /* protected by page->ptl */
+ pmd_t *pmd; /* PTE page only */
+ };
+ union {
+ struct mm_struct *pt_mm; /* x86 pgds only */
+ atomic_t pt_frag_refcount; /* powerpc */
+ atomic_t pte_refcount; /* PTE page only */
+ };
+
+The ``pmd`` member records the pmd entry that maps the user PTE page table
+page, and the ``pte_refcount`` member keeps track of how many references are
+held on the user PTE page table page.
+
+The following hold a reference on the user PTE page table page::
+
+ Any !pte_none() entry, such as a regular page table entry that maps a
+ physical page, a swap entry, a migration entry, etc.
+
+ Any visitor to the PTE page table entries, such as a page table walker.
+
+Any ``!pte_none()`` entry or visitor can be regarded as a user of its PTE page
+table page. When the ``pte_refcount`` drops to 0, no one is using the PTE page
+table page any more, so the now-empty PTE page table page can be released back
+to the system.
+
+3. Competitive relationship
+===========================
+
+Currently, user page tables are only released by ``free_pgtables()`` when the
+process exits or ``unmap_region()`` is called (e.g. the ``munmap()`` path). So
+other threads only need to ensure mutual exclusion with these paths to
+guarantee that the page table is not released. For example::
+
+ thread A thread B
+ page table walker munmap
+ ================= ======
+
+ mmap_read_lock()
+ if (!pte_none() && pte_present() && !pmd_trans_unstable()) {
+ pte_offset_map_lock()
+ *walk page table*
+ pte_unmap_unlock()
+ }
+ mmap_read_unlock()
+
+ mmap_write_lock_killable()
+ detach_vmas_to_be_unmapped()
+ unmap_region()
+ --> free_pgtables()
+
+But after we introduce the reference count for the user PTE page table page,
+this existing balance is broken: the page can be released at any time when its
+``pte_refcount`` drops to 0. Therefore, the following case may happen::
+
+ thread A thread B thread C
+ page table walker madvise(MADV_DONTNEED) page fault
+ ================= ====================== ==========
+
+ mmap_read_lock()
+ if (!pte_none() && pte_present() && !pmd_trans_unstable()) {
+
+ mmap_read_lock()
+ unmap_page_range()
+ --> zap_pte_range()
+ *the pte_refcount is reduced to 0*
+ --> *free PTE page table page*
+
+ /* broken!! */ mmap_read_lock()
+ pte_offset_map_lock()
+
+As we can see, threads A, B and C all hold the read lock of mmap_lock, so they
+can execute concurrently. When thread B releases the PTE page table page, the
+value in the corresponding pmd entry becomes unstable: it may be none, a huge
+pmd, or map a new PTE page table page again. This can cause system chaos and
+even a panic.
+
+So, as described in the section "The reference count of user PTE page table
+pages", we need to take a reference on the PTE page table page before walking
+the page table; then the system becomes orderly again::
+
+ thread A thread B
+ page table walker madvise(MADV_DONTNEED)
+ ================= ======================
+
+ mmap_read_lock()
+ if (!pte_none() && pte_present() && !pmd_trans_unstable()) {
+ pte_try_get()
+ --> pte_get_unless_zero
+ *if successfully, then:*
+
+ mmap_read_lock()
+ unmap_page_range()
+ --> zap_pte_range()
+ *the pte_refcount is reduced to 1*
+
+ pte_offset_map_lock()
+ *walk page table*
+ pte_unmap_unlock()
+ pte_put()
+ --> *the pte_refcount is reduced to 0*
+ --> *free PTE page table page*
+
+There is also a lockless scenario (such as fast GUP). Fortunately, we don't
+need any additional operations to keep the system in order. Take fast GUP as
+an example::
+
+ thread A thread B
+ fast GUP madvise(MADV_DONTNEED)
+ ======== ======================
+
+ get_user_pages_fast_only()
+ --> local_irq_save();
+ *free PTE page table page*
+ --> unhook page
+ /* The CPU running thread A has disabled
+ * local interrupts and cannot respond to
+ * the IPI, so thread B blocks here */
+ TLB invalidate page
+ gup_pgd_range();
+ local_irq_restore();
+ *free page*
+
+4. Helpers
+==========
+
++---------------------+-------------------------------------------------+
+| pte_ref_init | Initialize the pte_refcount and pmd |
++---------------------+-------------------------------------------------+
+| pte_to_pmd | Get the corresponding pmd |
++---------------------+-------------------------------------------------+
+| pte_update_pmd | Update the corresponding pmd |
++---------------------+-------------------------------------------------+
+| pte_get | Increment a pte_refcount |
++---------------------+-------------------------------------------------+
+| pte_get_many | Add a value to a pte_refcount |
++---------------------+-------------------------------------------------+
+| pte_get_unless_zero | Increment a pte_refcount unless it is 0 |
++---------------------+-------------------------------------------------+
+| pte_try_get | Try to increment a pte_refcount |
++---------------------+-------------------------------------------------+
+| pte_tryget_map | Try to increment a pte_refcount before |
+| | pte_offset_map() |
++---------------------+-------------------------------------------------+
+| pte_tryget_map_lock | Try to increment a pte_refcount before |
+| | pte_offset_map_lock() |
++---------------------+-------------------------------------------------+
+| pte_put | Decrement a pte_refcount |
++---------------------+-------------------------------------------------+
+| pte_put_many | Sub a value to a pte_refcount |
++---------------------+-------------------------------------------------+
+| pte_put_vmf | Decrement a pte_refcount in the page fault path |
++---------------------+-------------------------------------------------+
--
2.11.0

2021-11-10 10:59:15

by Qi Zheng

Subject: [PATCH v3 02/15] mm: introduce is_huge_pmd() helper

Currently the following judgment is repeated in several places in the
code:

is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)

which determines whether *pmd is a huge pmd, so introduce an is_huge_pmd()
helper to deduplicate these checks.

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/huge_mm.h | 10 +++++++---
mm/huge_memory.c | 3 +--
mm/memory.c | 5 ++---
mm/mprotect.c | 2 +-
mm/mremap.c | 3 +--
5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f280f33ff223..b37a89180846 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -199,8 +199,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
#define split_huge_pmd(__vma, __pmd, __address) \
do { \
pmd_t *____pmd = (__pmd); \
- if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \
- || pmd_devmap(*____pmd)) \
+ if (is_huge_pmd(*____pmd)) \
__split_huge_pmd(__vma, __pmd, __address, \
false, NULL); \
} while (0)
@@ -232,11 +231,16 @@ static inline int is_swap_pmd(pmd_t pmd)
return !pmd_none(pmd) && !pmd_present(pmd);
}

+static inline int is_huge_pmd(pmd_t pmd)
+{
+ return is_swap_pmd(pmd) || pmd_trans_huge(pmd) || pmd_devmap(pmd);
+}
+
/* mmap_lock must be held on entry */
static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
struct vm_area_struct *vma)
{
- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
+ if (is_huge_pmd(*pmd))
return __pmd_trans_huge_lock(pmd, vma);
else
return NULL;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e5483347291c..e76ee2e1e423 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1832,8 +1832,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
{
spinlock_t *ptl;
ptl = pmd_lock(vma->vm_mm, pmd);
- if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
- pmd_devmap(*pmd)))
+ if (likely(is_huge_pmd(*pmd)))
return ptl;
spin_unlock(ptl);
return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index 855486fff526..b00cd60fc368 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1146,8 +1146,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
src_pmd = pmd_offset(src_pud, addr);
do {
next = pmd_addr_end(addr, end);
- if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)
- || pmd_devmap(*src_pmd)) {
+ if (is_huge_pmd(*src_pmd)) {
int err;
VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
@@ -1441,7 +1440,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+ if (is_huge_pmd(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
__split_huge_pmd(vma, pmd, addr, false, NULL);
else if (zap_huge_pmd(tlb, vma, pmd, addr))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e552f5e0ccbd..2d5064a4631c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -257,7 +257,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
mmu_notifier_invalidate_range_start(&range);
}

- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+ if (is_huge_pmd(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE) {
__split_huge_pmd(vma, pmd, addr, false, NULL);
} else {
diff --git a/mm/mremap.c b/mm/mremap.c
index 002eec83e91e..c6e9da09dd0a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -532,8 +532,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
if (!new_pmd)
break;
- if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
- pmd_devmap(*old_pmd)) {
+ if (is_huge_pmd(*old_pmd)) {
if (extent == HPAGE_PMD_SIZE &&
move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
old_pmd, new_pmd, need_rmap_locks))
--
2.11.0

2021-11-10 10:59:32

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v3 03/15] mm: move pte_offset_map_lock() to pgtable.h

pte_offset_map() is in include/linux/pgtable.h, so move its
friend pte_offset_map_lock() to pgtable.h as well.

pte_lockptr() is required by pte_offset_map_lock(), so
also move {pte,pmd,pud}_lockptr() to pgtable.h.

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/mm.h | 149 ------------------------------------------------
include/linux/pgtable.h | 149 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 149 insertions(+), 149 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7e4a9e7d807..706da081b9f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2284,70 +2284,6 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
}
#endif /* CONFIG_MMU */

-#if USE_SPLIT_PTE_PTLOCKS
-#if ALLOC_SPLIT_PTLOCKS
-void __init ptlock_cache_init(void);
-extern bool ptlock_alloc(struct page *page);
-extern void ptlock_free(struct page *page);
-
-static inline spinlock_t *ptlock_ptr(struct page *page)
-{
- return page->ptl;
-}
-#else /* ALLOC_SPLIT_PTLOCKS */
-static inline void ptlock_cache_init(void)
-{
-}
-
-static inline bool ptlock_alloc(struct page *page)
-{
- return true;
-}
-
-static inline void ptlock_free(struct page *page)
-{
-}
-
-static inline spinlock_t *ptlock_ptr(struct page *page)
-{
- return &page->ptl;
-}
-#endif /* ALLOC_SPLIT_PTLOCKS */
-
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
- return ptlock_ptr(pmd_page(*pmd));
-}
-
-static inline bool ptlock_init(struct page *page)
-{
- /*
- * prep_new_page() initialize page->private (and therefore page->ptl)
- * with 0. Make sure nobody took it in use in between.
- *
- * It can happen if arch try to use slab for page table allocation:
- * slab code uses page->slab_cache, which share storage with page->ptl.
- */
- VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
- if (!ptlock_alloc(page))
- return false;
- spin_lock_init(ptlock_ptr(page));
- return true;
-}
-
-#else /* !USE_SPLIT_PTE_PTLOCKS */
-/*
- * We use mm->page_table_lock to guard all pagetable pages of the mm.
- */
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
- return &mm->page_table_lock;
-}
-static inline void ptlock_cache_init(void) {}
-static inline bool ptlock_init(struct page *page) { return true; }
-static inline void ptlock_free(struct page *page) {}
-#endif /* USE_SPLIT_PTE_PTLOCKS */
-
static inline void pgtable_init(void)
{
ptlock_cache_init();
@@ -2370,20 +2306,6 @@ static inline void pgtable_pte_page_dtor(struct page *page)
dec_lruvec_page_state(page, NR_PAGETABLE);
}

-#define pte_offset_map_lock(mm, pmd, address, ptlp) \
-({ \
- spinlock_t *__ptl = pte_lockptr(mm, pmd); \
- pte_t *__pte = pte_offset_map(pmd, address); \
- *(ptlp) = __ptl; \
- spin_lock(__ptl); \
- __pte; \
-})
-
-#define pte_unmap_unlock(pte, ptl) do { \
- spin_unlock(ptl); \
- pte_unmap(pte); \
-} while (0)
-
#define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd))

#define pte_alloc_map(mm, pmd, address) \
@@ -2397,58 +2319,6 @@ static inline void pgtable_pte_page_dtor(struct page *page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
NULL: pte_offset_kernel(pmd, address))

-#if USE_SPLIT_PMD_PTLOCKS
-
-static struct page *pmd_to_page(pmd_t *pmd)
-{
- unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
- return virt_to_page((void *)((unsigned long) pmd & mask));
-}
-
-static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
- return ptlock_ptr(pmd_to_page(pmd));
-}
-
-static inline bool pmd_ptlock_init(struct page *page)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- page->pmd_huge_pte = NULL;
-#endif
- return ptlock_init(page);
-}
-
-static inline void pmd_ptlock_free(struct page *page)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
-#endif
- ptlock_free(page);
-}
-
-#define pmd_huge_pte(mm, pmd) (pmd_to_page(pmd)->pmd_huge_pte)
-
-#else
-
-static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
- return &mm->page_table_lock;
-}
-
-static inline bool pmd_ptlock_init(struct page *page) { return true; }
-static inline void pmd_ptlock_free(struct page *page) {}
-
-#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
-
-#endif
-
-static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
-{
- spinlock_t *ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
- return ptl;
-}
-
static inline bool pgtable_pmd_page_ctor(struct page *page)
{
if (!pmd_ptlock_init(page))
@@ -2465,25 +2335,6 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
dec_lruvec_page_state(page, NR_PAGETABLE);
}

-/*
- * No scalability reason to split PUD locks yet, but follow the same pattern
- * as the PMD locks to make it easier if we decide to. The VM should not be
- * considered ready to switch to split PUD locks yet; there may be places
- * which need to be converted from page_table_lock.
- */
-static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
-{
- return &mm->page_table_lock;
-}
-
-static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
-{
- spinlock_t *ptl = pud_lockptr(mm, pud);
-
- spin_lock(ptl);
- return ptl;
-}
-
extern void __init pagecache_init(void);
extern void __init free_area_init_memoryless_node(int nid);
extern void free_initmem(void);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e24d2c992b11..c8f045705c1e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -84,6 +84,141 @@ static inline unsigned long pud_index(unsigned long address)
#define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
#endif

+#if USE_SPLIT_PTE_PTLOCKS
+#if ALLOC_SPLIT_PTLOCKS
+void __init ptlock_cache_init(void);
+extern bool ptlock_alloc(struct page *page);
+extern void ptlock_free(struct page *page);
+
+static inline spinlock_t *ptlock_ptr(struct page *page)
+{
+ return page->ptl;
+}
+#else /* ALLOC_SPLIT_PTLOCKS */
+static inline void ptlock_cache_init(void)
+{
+}
+
+static inline bool ptlock_alloc(struct page *page)
+{
+ return true;
+}
+
+static inline void ptlock_free(struct page *page)
+{
+}
+
+static inline spinlock_t *ptlock_ptr(struct page *page)
+{
+ return &page->ptl;
+}
+#endif /* ALLOC_SPLIT_PTLOCKS */
+
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return ptlock_ptr(pmd_page(*pmd));
+}
+
+static inline bool ptlock_init(struct page *page)
+{
+ /*
+ * prep_new_page() initialize page->private (and therefore page->ptl)
+ * with 0. Make sure nobody took it in use in between.
+ *
+ * It can happen if arch try to use slab for page table allocation:
+ * slab code uses page->slab_cache, which share storage with page->ptl.
+ */
+ VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
+ if (!ptlock_alloc(page))
+ return false;
+ spin_lock_init(ptlock_ptr(page));
+ return true;
+}
+
+#else /* !USE_SPLIT_PTE_PTLOCKS */
+/*
+ * We use mm->page_table_lock to guard all pagetable pages of the mm.
+ */
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return &mm->page_table_lock;
+}
+static inline void ptlock_cache_init(void) {}
+static inline bool ptlock_init(struct page *page) { return true; }
+static inline void ptlock_free(struct page *page) {}
+#endif /* USE_SPLIT_PTE_PTLOCKS */
+
+#if USE_SPLIT_PMD_PTLOCKS
+
+static struct page *pmd_to_page(pmd_t *pmd)
+{
+ unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
+ return virt_to_page((void *)((unsigned long) pmd & mask));
+}
+
+static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return ptlock_ptr(pmd_to_page(pmd));
+}
+
+static inline bool pmd_ptlock_init(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ page->pmd_huge_pte = NULL;
+#endif
+ return ptlock_init(page);
+}
+
+static inline void pmd_ptlock_free(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+#endif
+ ptlock_free(page);
+}
+
+#define pmd_huge_pte(mm, pmd) (pmd_to_page(pmd)->pmd_huge_pte)
+
+#else
+
+static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return &mm->page_table_lock;
+}
+
+static inline bool pmd_ptlock_init(struct page *page) { return true; }
+static inline void pmd_ptlock_free(struct page *page) {}
+
+#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
+
+#endif
+
+static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
+{
+ spinlock_t *ptl = pmd_lockptr(mm, pmd);
+ spin_lock(ptl);
+ return ptl;
+}
+
+/*
+ * No scalability reason to split PUD locks yet, but follow the same pattern
+ * as the PMD locks to make it easier if we decide to. The VM should not be
+ * considered ready to switch to split PUD locks yet; there may be places
+ * which need to be converted from page_table_lock.
+ */
+static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
+{
+ return &mm->page_table_lock;
+}
+
+static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
+{
+ spinlock_t *ptl = pud_lockptr(mm, pud);
+
+ spin_lock(ptl);
+ return ptl;
+}
+
#ifndef pte_offset_kernel
static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
{
@@ -102,6 +237,20 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
#define pte_unmap(pte) ((void)(pte)) /* NOP */
#endif

+#define pte_offset_map_lock(mm, pmd, address, ptlp) \
+({ \
+ spinlock_t *__ptl = pte_lockptr(mm, pmd); \
+ pte_t *__pte = pte_offset_map(pmd, address); \
+ *(ptlp) = __ptl; \
+ spin_lock(__ptl); \
+ __pte; \
+})
+
+#define pte_unmap_unlock(pte, ptl) do { \
+ spin_unlock(ptl); \
+ pte_unmap(pte); \
+} while (0)
+
/* Find an entry in the second-level page table.. */
#ifndef pmd_offset
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
--
2.11.0

2021-11-10 10:59:33

by Qi Zheng

Subject: [PATCH v3 05/15] mm: add pmd_installed_type return for __pte_alloc() and other friends

When we call __pte_alloc() or one of its friends, a huge pmd might
be created by a different thread. This is why pmd_trans_unstable()
currently has to be called after __pte_alloc() or one of its friends
returns.

This patch adds a pmd_installed_type return value to __pte_alloc() and
its friends, so that we can detect the huge pmd from the return value
instead of calling pmd_trans_unstable() again.

This patch has no functional change; it is just preparation for the
following patches.
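
For example, the new calling convention looks roughly like the sketch below
(example_fault_path() is a hypothetical caller used only for illustration; it
mirrors the do_anonymous_page() hunk in this patch):

static vm_fault_t example_fault_path(struct vm_fault *vmf)
{
        int alloc_ret = pte_alloc(vmf->vma->vm_mm, vmf->pmd);

        if (alloc_ret < 0)                      /* allocation failed (-ENOMEM) */
                return VM_FAULT_OOM;
        if (unlikely(alloc_ret == INSTALLED_HUGE_PMD))
                return 0;                       /* see comment in handle_pte_fault() */

        /*
         * alloc_ret == INSTALLED_PTE: a PTE page table is in place, so the
         * PTEs can now be mapped without calling pmd_trans_unstable() again.
         */
        return 0;
}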

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/mm.h | 20 +++++++++++++++++---
mm/debug_vm_pgtable.c | 2 +-
mm/filemap.c | 11 +++++++----
mm/gup.c | 2 +-
mm/internal.h | 3 ++-
mm/memory.c | 39 ++++++++++++++++++++++++++-------------
mm/migrate.c | 17 ++---------------
mm/mremap.c | 2 +-
mm/userfaultfd.c | 24 +++++++++++++++---------
9 files changed, 72 insertions(+), 48 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 706da081b9f8..52f36fde2f11 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2306,13 +2306,27 @@ static inline void pgtable_pte_page_dtor(struct page *page)
dec_lruvec_page_state(page, NR_PAGETABLE);
}

-#define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd))
+enum pmd_installed_type {
+ INSTALLED_PTE,
+ INSTALLED_HUGE_PMD,
+};
+
+static inline int pte_alloc(struct mm_struct *mm, pmd_t *pmd)
+{
+ if (unlikely(pmd_none(*(pmd))))
+ return __pte_alloc(mm, pmd);
+ if (unlikely(is_huge_pmd(*pmd)))
+ return INSTALLED_HUGE_PMD;
+
+ return INSTALLED_PTE;
+}
+#define pte_alloc pte_alloc

#define pte_alloc_map(mm, pmd, address) \
- (pte_alloc(mm, pmd) ? NULL : pte_offset_map(pmd, address))
+ (pte_alloc(mm, pmd) < 0 ? NULL : pte_offset_map(pmd, address))

#define pte_alloc_map_lock(mm, pmd, address, ptlp) \
- (pte_alloc(mm, pmd) ? \
+ (pte_alloc(mm, pmd) < 0 ? \
NULL : pte_offset_map_lock(mm, pmd, address, ptlp))

#define pte_alloc_kernel(pmd, address) \
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 228e3954b90c..b8322c55e65d 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1170,7 +1170,7 @@ static int __init init_args(struct pgtable_debug_args *args)
args->start_pmdp = pmd_offset(args->pudp, 0UL);
WARN_ON(!args->start_pmdp);

- if (pte_alloc(args->mm, args->pmdp)) {
+ if (pte_alloc(args->mm, args->pmdp) < 0) {
pr_err("Failed to allocate pte entries\n");
ret = -ENOMEM;
goto error;
diff --git a/mm/filemap.c b/mm/filemap.c
index ff8d19b7ce1d..23363f8ddbbe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3217,12 +3217,15 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
}
}

- if (pmd_none(*vmf->pmd))
- pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
+ if (pmd_none(*vmf->pmd)) {
+ int ret = pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);

- /* See comment in handle_pte_fault() */
- if (pmd_devmap_trans_unstable(vmf->pmd))
+ if (unlikely(ret == INSTALLED_HUGE_PMD))
+ goto out;
+ } else if (pmd_devmap_trans_unstable(vmf->pmd)) {
+ /* See comment in handle_pte_fault() */
goto out;
+ }

return false;

diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..2def775232a3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -699,7 +699,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
} else {
spin_unlock(ptl);
split_huge_pmd(vma, pmd, address);
- ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
+ ret = pte_alloc(mm, pmd) < 0 ? -ENOMEM : 0;
}

return ret ? ERR_PTR(ret) :
diff --git a/mm/internal.h b/mm/internal.h
index 3b79a5c9427a..474d6e3443f8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -67,7 +67,8 @@ bool __folio_end_writeback(struct folio *folio);

void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
unsigned long floor, unsigned long ceiling);
-void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
+enum pmd_installed_type pmd_install(struct mm_struct *mm, pmd_t *pmd,
+ pgtable_t *pte);

static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
{
diff --git a/mm/memory.c b/mm/memory.c
index bec6a5d5ee7c..8a39c0e58324 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -437,8 +437,10 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
}

-void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
+enum pmd_installed_type pmd_install(struct mm_struct *mm, pmd_t *pmd,
+ pgtable_t *pte)
{
+ int ret = INSTALLED_PTE;
spinlock_t *ptl = pmd_lock(mm, pmd);

if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
@@ -459,20 +461,26 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
pmd_populate(mm, pmd, *pte);
*pte = NULL;
+ } else if (is_huge_pmd(*pmd)) {
+ /* See comment in handle_pte_fault() */
+ ret = INSTALLED_HUGE_PMD;
}
spin_unlock(ptl);
+
+ return ret;
}

int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
{
+ enum pmd_installed_type ret;
pgtable_t new = pte_alloc_one(mm);
if (!new)
return -ENOMEM;

- pmd_install(mm, pmd, &new);
+ ret = pmd_install(mm, pmd, &new);
if (new)
pte_free(mm, new);
- return 0;
+ return ret;
}

int __pte_alloc_kernel(pmd_t *pmd)
@@ -1813,7 +1821,7 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr,

/* Allocate the PTE if necessary; takes PMD lock once only. */
ret = -ENOMEM;
- if (pte_alloc(mm, pmd))
+ if (pte_alloc(mm, pmd) < 0)
goto out;

while (pages_to_write_in_pmd) {
@@ -3713,6 +3721,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
struct page *page;
vm_fault_t ret = 0;
pte_t entry;
+ int alloc_ret;

/* File mapping without ->vm_ops ? */
if (vma->vm_flags & VM_SHARED)
@@ -3728,11 +3737,11 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
*
* Here we only have mmap_read_lock(mm).
*/
- if (pte_alloc(vma->vm_mm, vmf->pmd))
+ alloc_ret = pte_alloc(vma->vm_mm, vmf->pmd);
+ if (alloc_ret < 0)
return VM_FAULT_OOM;
-
/* See comment in handle_pte_fault() */
- if (unlikely(pmd_trans_unstable(vmf->pmd)))
+ if (unlikely(alloc_ret == INSTALLED_HUGE_PMD))
return 0;

/* Use the zero-page for reads */
@@ -4023,6 +4032,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
}

if (pmd_none(*vmf->pmd)) {
+ int alloc_ret;
+
if (PageTransCompound(page)) {
ret = do_set_pmd(vmf, page);
if (ret != VM_FAULT_FALLBACK)
@@ -4030,14 +4041,16 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
}

if (vmf->prealloc_pte)
- pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);
- else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd)))
- return VM_FAULT_OOM;
- }
+ alloc_ret = pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);
+ else
+ alloc_ret = pte_alloc(vma->vm_mm, vmf->pmd);

- /* See comment in handle_pte_fault() */
- if (pmd_devmap_trans_unstable(vmf->pmd))
+ if (unlikely(alloc_ret != INSTALLED_PTE))
+ return alloc_ret < 0 ? VM_FAULT_OOM : 0;
+ } else if (pmd_devmap_trans_unstable(vmf->pmd)) {
+ /* See comment in handle_pte_fault() */
return 0;
+ }

vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
vmf->address, &vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index cf25b00f03c8..bdfdfd3b50be 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2731,21 +2731,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
goto abort;

- /*
- * Use pte_alloc() instead of pte_alloc_map(). We can't run
- * pte_offset_map() on pmds where a huge pmd might be created
- * from a different thread.
- *
- * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when
- * parallel threads are excluded by other means.
- *
- * Here we only have mmap_read_lock(mm).
- */
- if (pte_alloc(mm, pmdp))
- goto abort;
-
- /* See the comment in pte_alloc_one_map() */
- if (unlikely(pmd_trans_unstable(pmdp)))
+ /* See the comment in do_anonymous_page() */
+ if (unlikely(pte_alloc(mm, pmdp) != INSTALLED_PTE))
goto abort;

if (unlikely(anon_vma_prepare(vma)))
diff --git a/mm/mremap.c b/mm/mremap.c
index c6e9da09dd0a..fc5c56858883 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -551,7 +551,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
continue;
}

- if (pte_alloc(new_vma->vm_mm, new_pmd))
+ if (pte_alloc(new_vma->vm_mm, new_pmd) < 0)
break;
move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
new_pmd, new_addr, need_rmap_locks);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0780c2a57ff1..2cea08e7f076 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -592,15 +592,21 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
err = -EEXIST;
break;
}
- if (unlikely(pmd_none(dst_pmdval)) &&
- unlikely(__pte_alloc(dst_mm, dst_pmd))) {
- err = -ENOMEM;
- break;
- }
- /* If an huge pmd materialized from under us fail */
- if (unlikely(pmd_trans_huge(*dst_pmd))) {
- err = -EFAULT;
- break;
+
+ if (unlikely(pmd_none(dst_pmdval))) {
+ int ret = __pte_alloc(dst_mm, dst_pmd);
+
+ /*
+ * If there is not enough memory or an huge pmd
+ * materialized from under us
+ */
+ if (unlikely(ret < 0)) {
+ err = -ENOMEM;
+ break;
+ } else if (unlikely(ret == INSTALLED_HUGE_PMD)) {
+ err = -EFAULT;
+ break;
+ }
}

BUG_ON(pmd_none(*dst_pmd));
--
2.11.0

2021-11-10 11:00:00

by Qi Zheng

Subject: [PATCH v3 08/15] mm/pte_ref: initialize the refcount of the withdrawn PTE page table page

When we split a PMD-mapped THP into a PTE-mapped THP, we should initialize
the refcount of the withdrawn PTE page table page to HPAGE_PMD_NR, since
after the split each of its HPAGE_PMD_NR !pte_none() entries holds one
reference. This ensures that the PTE page table page can be released once
it becomes empty (the refcount drops to 0).

Signed-off-by: Qi Zheng <[email protected]>
---
mm/pgtable-generic.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4e640baf9794..523053e09dfa 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -186,6 +186,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
struct page, lru);
if (pmd_huge_pte(mm, pmdp))
list_del(&pgtable->lru);
+ pte_ref_init(pgtable, pmdp, HPAGE_PMD_NR);
return pgtable;
}
#endif
--
2.11.0

2021-11-10 11:00:49

by Qi Zheng

Subject: [PATCH v3 15/15] mm/pte_ref: use mmu_gather to free PTE page table pages

In unmap_region() and other paths, we can reuse @tlb to free the
PTE page tables, which reduces the number of TLB flushes.

Signed-off-by: Qi Zheng <[email protected]>
---
Documentation/vm/pte_ref.rst | 58 +++++++++++++++++++++++---------------------
arch/x86/Kconfig | 2 +-
include/linux/pte_ref.h | 34 ++++++++++++++++++++------
mm/madvise.c | 4 +--
mm/memory.c | 4 +--
mm/mmu_gather.c | 40 +++++++++++++-----------------
mm/pte_ref.c | 13 +++++++---
7 files changed, 90 insertions(+), 65 deletions(-)

diff --git a/Documentation/vm/pte_ref.rst b/Documentation/vm/pte_ref.rst
index c5323a263464..d304c0bfaae1 100644
--- a/Documentation/vm/pte_ref.rst
+++ b/Documentation/vm/pte_ref.rst
@@ -183,30 +183,34 @@ GUP as an example::
4. Helpers
==========

-+---------------------+-------------------------------------------------+
-| pte_ref_init | Initialize the pte_refcount and pmd |
-+---------------------+-------------------------------------------------+
-| pte_to_pmd | Get the corresponding pmd |
-+---------------------+-------------------------------------------------+
-| pte_update_pmd | Update the corresponding pmd |
-+---------------------+-------------------------------------------------+
-| pte_get | Increment a pte_refcount |
-+---------------------+-------------------------------------------------+
-| pte_get_many | Add a value to a pte_refcount |
-+---------------------+-------------------------------------------------+
-| pte_get_unless_zero | Increment a pte_refcount unless it is 0 |
-+---------------------+-------------------------------------------------+
-| pte_try_get | Try to increment a pte_refcount |
-+---------------------+-------------------------------------------------+
-| pte_tryget_map | Try to increment a pte_refcount before |
-| | pte_offset_map() |
-+---------------------+-------------------------------------------------+
-| pte_tryget_map_lock | Try to increment a pte_refcount before |
-| | pte_offset_map_lock() |
-+---------------------+-------------------------------------------------+
-| pte_put | Decrement a pte_refcount |
-+---------------------+-------------------------------------------------+
-| pte_put_many | Sub a value to a pte_refcount |
-+---------------------+-------------------------------------------------+
-| pte_put_vmf | Decrement a pte_refcount in the page fault path |
-+---------------------+-------------------------------------------------+
++---------------------+------------------------------------------------------+
+| pte_ref_init | Initialize the pte_refcount and pmd |
++---------------------+------------------------------------------------------+
+| pte_to_pmd | Get the corresponding pmd |
++---------------------+------------------------------------------------------+
+| pte_update_pmd | Update the corresponding pmd |
++---------------------+------------------------------------------------------+
+| pte_get | Increment a pte_refcount |
++---------------------+------------------------------------------------------+
+| pte_get_many | Add a value to a pte_refcount |
++---------------------+------------------------------------------------------+
+| pte_get_unless_zero | Increment a pte_refcount unless it is 0 |
++---------------------+------------------------------------------------------+
+| pte_try_get | Try to increment a pte_refcount |
++---------------------+------------------------------------------------------+
+| pte_tryget_map | Try to increment a pte_refcount before |
+| | pte_offset_map() |
++---------------------+------------------------------------------------------+
+| pte_tryget_map_lock | Try to increment a pte_refcount before |
+| | pte_offset_map_lock() |
++---------------------+------------------------------------------------------+
+| __pte_put | Decrement a pte_refcount |
++---------------------+------------------------------------------------------+
+| __pte_put_many | Sub a value to a pte_refcount |
++---------------------+------------------------------------------------------+
+| pte_put | Decrement a pte_refcount(without tlb parameter) |
++---------------------+------------------------------------------------------+
+| pte_put_many | Sub a value to a pte_refcount(without tlb parameter) |
++---------------------+------------------------------------------------------+
+| pte_put_vmf | Decrement a pte_refcount in the page fault path |
++---------------------+------------------------------------------------------+
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ca5bfe83ec61..69ea13437947 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -233,7 +233,7 @@ config X86
select HAVE_PCI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
- select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT
+ select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT || FREE_USER_PTE
select HAVE_POSIX_CPU_TIMERS_TASK_WORK
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h
index 8a26eaba83ef..dc3923bb38f6 100644
--- a/include/linux/pte_ref.h
+++ b/include/linux/pte_ref.h
@@ -22,7 +22,8 @@ enum pte_tryget_type pte_try_get(pmd_t *pmd);
bool pte_get_unless_zero(pmd_t *pmd);

#ifdef CONFIG_FREE_USER_PTE
-void free_user_pte_table(struct mm_struct *mm, pmd_t *pmdp, unsigned long addr);
+void free_user_pte_table(struct mmu_gather *tlb, struct mm_struct *mm,
+ pmd_t *pmd, unsigned long addr);

static inline void pte_ref_init(pgtable_t pte, pmd_t *pmd, int count)
{
@@ -48,14 +49,21 @@ static inline void pte_get_many(pmd_t *pmd, unsigned int nr)
atomic_add(nr, &pte->pte_refcount);
}

-static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, unsigned int nr)
+static inline void __pte_put_many(struct mmu_gather *tlb, struct mm_struct *mm,
+ pmd_t *pmd, unsigned long addr,
+ unsigned int nr)
{
pgtable_t pte = pmd_pgtable(*pmd);

VM_BUG_ON(!PageTable(pte));
if (atomic_sub_and_test(nr, &pte->pte_refcount))
- free_user_pte_table(mm, pmd, addr & PMD_MASK);
+ free_user_pte_table(tlb, mm, pmd, addr & PMD_MASK);
+}
+
+static inline void __pte_put(struct mmu_gather *tlb, struct mm_struct *mm,
+ pmd_t *pmd, unsigned long addr)
+{
+ __pte_put_many(tlb, mm, pmd, addr, 1);
}
#else
static inline void pte_ref_init(pgtable_t pte, pmd_t *pmd, int count)
@@ -75,8 +83,14 @@ static inline void pte_get_many(pmd_t *pmd, unsigned int nr)
{
}

-static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, unsigned int nr)
+static inline void __pte_put_many(struct mmu_gather *tlb, struct mm_struct *mm,
+ pmd_t *pmd, unsigned long addr,
+ unsigned int nr)
+{
+}
+
+static inline void __pte_put(struct mmu_gather *tlb, struct mm_struct *mm,
+ pmd_t *pmd, unsigned long addr)
{
}
#endif /* CONFIG_FREE_USER_PTE */
@@ -110,6 +124,12 @@ static inline pte_t *pte_tryget_map_lock(struct mm_struct *mm, pmd_t *pmd,
return pte_offset_map_lock(mm, pmd, address, ptlp);
}

+static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr, unsigned int nr)
+{
+ __pte_put_many(NULL, mm, pmd, addr, nr);
+}
+
/*
* pte_put - Decrement refcount for the PTE page table.
* @mm: the mm_struct of the target address space.
@@ -120,7 +140,7 @@ static inline pte_t *pte_tryget_map_lock(struct mm_struct *mm, pmd_t *pmd,
*/
static inline void pte_put(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
- pte_put_many(mm, pmd, addr, 1);
+ __pte_put(NULL, mm, pmd, addr);
}

#endif
diff --git a/mm/madvise.c b/mm/madvise.c
index 5cf2832abb98..b51254305bb2 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -477,7 +477,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,

arch_leave_lazy_mmu_mode();
pte_unmap_unlock(orig_pte, ptl);
- pte_put(vma->vm_mm, pmd, start);
+ __pte_put(tlb, vma->vm_mm, pmd, start);
if (pageout)
reclaim_pages(&page_list);
cond_resched();
@@ -710,7 +710,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(orig_pte, ptl);
if (nr_put)
- pte_put_many(mm, pmd, start, nr_put);
+ __pte_put_many(tlb, mm, pmd, start, nr_put);
cond_resched();
next:
return 0;
diff --git a/mm/memory.c b/mm/memory.c
index 4d1ede78d1b0..1bdae3b0f877 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1469,7 +1469,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
}

if (nr_put)
- pte_put_many(mm, pmd, start, nr_put);
+ __pte_put_many(tlb, mm, pmd, start, nr_put);

return addr;
}
@@ -1515,7 +1515,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
if (pte_try_get(pmd))
goto next;
next = zap_pte_range(tlb, vma, pmd, addr, next, details);
- pte_put(tlb->mm, pmd, addr);
+ __pte_put(tlb, tlb->mm, pmd, addr);
next:
cond_resched();
} while (pmd++, addr = next, addr != end);
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 1b9837419bf9..1bd9fa889421 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -134,42 +134,42 @@ static void __tlb_remove_table_free(struct mmu_table_batch *batch)
*
*/

-static void tlb_remove_table_smp_sync(void *arg)
+static void tlb_remove_table_rcu(struct rcu_head *head)
{
- /* Simply deliver the interrupt */
+ __tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
}

-static void tlb_remove_table_sync_one(void)
+static void tlb_remove_table_free(struct mmu_table_batch *batch)
{
- /*
- * This isn't an RCU grace period and hence the page-tables cannot be
- * assumed to be actually RCU-freed.
- *
- * It is however sufficient for software page-table walkers that rely on
- * IRQ disabling.
- */
- smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+ call_rcu(&batch->rcu, tlb_remove_table_rcu);
}

-static void tlb_remove_table_rcu(struct rcu_head *head)
+static void tlb_remove_table_one_rcu(struct rcu_head *head)
{
- __tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
+ struct page *page = container_of(head, struct page, rcu_head);
+
+ __tlb_remove_table(page);
}

-static void tlb_remove_table_free(struct mmu_table_batch *batch)
+static void tlb_remove_table_one(void *table)
{
- call_rcu(&batch->rcu, tlb_remove_table_rcu);
+ pgtable_t page = (pgtable_t)table;
+
+ call_rcu(&page->rcu_head, tlb_remove_table_one_rcu);
}

#else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */

-static void tlb_remove_table_sync_one(void) { }
-
static void tlb_remove_table_free(struct mmu_table_batch *batch)
{
__tlb_remove_table_free(batch);
}

+static void tlb_remove_table_one(void *table)
+{
+ __tlb_remove_table(table);
+}
+
#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */

/*
@@ -187,12 +187,6 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
}
}

-static void tlb_remove_table_one(void *table)
-{
- tlb_remove_table_sync_one();
- __tlb_remove_table(table);
-}
-
static void tlb_table_flush(struct mmu_gather *tlb)
{
struct mmu_table_batch **batch = &tlb->batch;
diff --git a/mm/pte_ref.c b/mm/pte_ref.c
index 728e61cea25e..f9650ad23c7c 100644
--- a/mm/pte_ref.c
+++ b/mm/pte_ref.c
@@ -8,6 +8,8 @@
#include <linux/pte_ref.h>
#include <linux/mm.h>
#include <linux/hugetlb.h>
+#include <asm/pgalloc.h>
+#include <asm/tlb.h>
#include <asm/tlbflush.h>

#ifdef CONFIG_FREE_USER_PTE
@@ -117,7 +119,8 @@ static void pte_free_rcu(struct rcu_head *rcu)
__free_page(page);
}

-void free_user_pte_table(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
+void free_user_pte_table(struct mmu_gather *tlb, struct mm_struct *mm,
+ pmd_t *pmd, unsigned long addr)
{
struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
spinlock_t *ptl;
@@ -125,10 +128,14 @@ void free_user_pte_table(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)

ptl = pmd_lock(mm, pmd);
pmdval = pmdp_huge_get_and_clear(mm, addr, pmd);
- flush_tlb_range(&vma, addr, addr + PMD_SIZE);
+ if (!tlb)
+ flush_tlb_range(&vma, addr, addr + PMD_SIZE);
+ else
+ pte_free_tlb(tlb, pmd_pgtable(pmdval), addr);
spin_unlock(ptl);

pte_free_debug(pmdval);
mm_dec_nr_ptes(mm);
- call_rcu(&pmd_pgtable(pmdval)->rcu_head, pte_free_rcu);
+ if (!tlb)
+ call_rcu(&pmd_pgtable(pmdval)->rcu_head, pte_free_rcu);
}
--
2.11.0
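
A minimal usage sketch (not part of the patch, not compile-tested) of how a
process page table walker is expected to pair the helpers declared above; it
assumes pte_tryget_map_lock() returns NULL when the PTE page has already been
freed or is being freed:

#include <linux/mm.h>
#include <linux/pte_ref.h>

static void walk_one_pte_table(struct mm_struct *mm, pmd_t *pmd,
                               unsigned long addr)
{
        spinlock_t *ptl;
        pte_t *pte;

        /* Takes a reference on the PTE page, then maps and locks it. */
        pte = pte_tryget_map_lock(mm, pmd, addr, &ptl);
        if (!pte)
                return;

        /* ... operate on the PTEs while holding the PTL ... */

        pte_unmap_unlock(pte, ptl);
        /* Drop the reference taken by pte_tryget_map_lock(). */
        pte_put(mm, pmd, addr);
}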

2021-11-10 12:58:41

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On Wed, Nov 10, 2021 at 06:54:13PM +0800, Qi Zheng wrote:

> In this patch series, we add a pte_refcount field to the struct page of page
> table to track how many users of PTE page table. Similar to the mechanism of
> page refcount, the user of PTE page table should hold a refcount to it before
> accessing. The PTE page table page will be freed when the last refcount is
> dropped.

So, this approach basically adds two atomics on every PTE map

If I have it right the reason that zap cannot clean the PTEs today is
because zap cannot obtain the mmap lock due to a lock ordering issue
with the inode lock vs mmap lock.

If it could obtain the mmap lock then it could do the zap using the
write side as unmapping a vma does.

Rather than adding a new "lock" to every PTE I wonder if it would be
more efficient to break up the mmap lock and introduce a specific
rwsem for the page table itself, in addition to the PTL. Currently the
mmap lock is protecting both the vma list and the page table.

I think that would allow the lock ordering issue to be resolved and
zap could obtain a page table rwsem.

Compared to two atomics per PTE this would just be two atomics per
page table walk operation; it is conceptually a lot simpler, and would
allow freeing all the page table levels, not just PTEs.

?

Jason

2021-11-10 13:27:30

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 10.11.21 13:56, Jason Gunthorpe wrote:
> On Wed, Nov 10, 2021 at 06:54:13PM +0800, Qi Zheng wrote:
>
>> In this patch series, we add a pte_refcount field to the struct page of page
>> table to track how many users of PTE page table. Similar to the mechanism of
>> page refcount, the user of PTE page table should hold a refcount to it before
>> accessing. The PTE page table page will be freed when the last refcount is
>> dropped.
>
> So, this approach basically adds two atomics on every PTE map
>
> If I have it right the reason that zap cannot clean the PTEs today is
> because zap cannot obtain the mmap lock due to a lock ordering issue
> with the inode lock vs mmap lock.

There are different ways to zap: madvise(DONTNEED) vs
fallocate(PUNCH_HOLE). It depends on "from where" we're actually
coming: a process page table walker or the rmap.

The way locking currently works doesn't allow removing a page table
just by holding the mmap lock, not even in write mode. You'll also need
to hold the respective rmap locks -- which implies that reclaiming page
tables crossing VMAs is "problematic". Take a look at khugepaged which
has to play quite some tricks to remove a page table.

And there are other ways we can create empty page tables via the rmap,
like reclaim/writeback, although they are mostly a secondary concern.

>
> If it could obtain the mmap lock then it could do the zap using the
> write side as unmapping a vma does.
>
> Rather than adding a new "lock" to ever PTE I wonder if it would be
> more efficient to break up the mmap lock and introduce a specific
> rwsem for the page table itself, in addition to the PTL. Currently the
> mmap lock is protecting both the vma list and the page table.

There is the rmap side of things as well. At least the rmap won't
reclaim alloc/free page tables, but it will walk page tables while
holding the respective rmap lock.

>
> I think that would allow the lock ordering issue to be resolved and
> zap could obtain a page table rwsem.
>
> Compared to two atomics per PTE this would just be two atomic per
> page table walk operation, it is conceptually a lot simpler, and would
> allow freeing all the page table levels, not just PTEs.

Another alternative is to not do it in the kernel automatically, but
instead have a madvise(MADV_CLEANUP_PGTABLE) mechanism that will get
called by user space explicitly once it's reasonable. While this will
work for the obvious madvise(DONTNEED) users -- like memory allocators
-- that zap memory, it's a bit more complicated once shared memory is
involved and we're fallocate(PUNCH_HOLE) memory. But it would at least
work for many use cases that want to optimize memory consumption for
sparse memory mappings.

Note that PTEs are the biggest memory consumer. On x86-64, a 1 TiB area
will consume 2 GiB of PTE tables and only 4 MiB of PMD tables. So PTEs
are most certainly the most important piece.
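
(For reference, the arithmetic behind those numbers, assuming 4 KiB pages and
8-byte entries, i.e. 512 entries per table: PTE tables cover 2 MiB each, so
1 TiB needs 1 TiB / 2 MiB = 524288 tables * 4 KiB = 2 GiB; PMD tables cover
1 GiB each, so 1 TiB needs 1024 tables * 4 KiB = 4 MiB.)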

--
Thanks,

David / dhildenb

2021-11-10 13:56:46

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/10/21 8:56 PM, Jason Gunthorpe wrote:
> On Wed, Nov 10, 2021 at 06:54:13PM +0800, Qi Zheng wrote:
>
>> In this patch series, we add a pte_refcount field to the struct page of page
>> table to track how many users of PTE page table. Similar to the mechanism of
>> page refcount, the user of PTE page table should hold a refcount to it before
>> accessing. The PTE page table page will be freed when the last refcount is
>> dropped.
>
> So, this approach basically adds two atomics on every PTE map
>
> If I have it right the reason that zap cannot clean the PTEs today is
> because zap cannot obtain the mmap lock due to a lock ordering issue
> with the inode lock vs mmap lock.

Currently, both MADV_DONTNEED and MADV_FREE obtain the read side of
mmap_lock instead of write side, which is the reason that
jemalloc/tcmalloc prefer to use madvise() to release physical memory.

>
> If it could obtain the mmap lock then it could do the zap using the
> write side as unmapping a vma does.

Even if it obtains the write side of mmap_lock, how do we make sure that
all the page table entries are empty? By traversing all 512 entries every time?
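
For context, the 512-entry scan being discussed would look roughly like the
sketch below (not from this series, not compile-tested; PTRS_PER_PTE is 512
on x86-64 with 4 KiB pages):

#include <linux/mm.h>

static bool pte_table_is_empty(struct mm_struct *mm, pmd_t *pmd,
                               unsigned long addr)
{
        spinlock_t *ptl;
        pte_t *pte;
        bool empty = true;
        int i;

        /* Map and lock the whole PTE table that covers this PMD range. */
        pte = pte_offset_map_lock(mm, pmd, addr & PMD_MASK, &ptl);
        for (i = 0; i < PTRS_PER_PTE; i++) {
                if (!pte_none(pte[i])) {
                        empty = false;
                        break;
                }
        }
        pte_unmap_unlock(pte, ptl);

        return empty;
}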

>
> Rather than adding a new "lock" to ever PTE I wonder if it would be
> more efficient to break up the mmap lock and introduce a specific
> rwsem for the page table itself, in addition to the PTL. Currently the
> mmap lock is protecting both the vma list and the page table.

Now each level of page table has its own spin lock. Can you explain the
working mechanism of this special rwsem more clearly?

If we can reduce the protection range of mmap_lock, that is indeed a great
thing, but I think it is very difficult, and it still would not solve the
problem of how to check that all entries in the page table page are
empty.

>
> I think that would allow the lock ordering issue to be resolved and
> zap could obtain a page table rwsem.
>
> Compared to two atomics per PTE this would just be two atomic per
> page table walk operation, it is conceptually a lot simpler, and would
> allow freeing all the page table levels, not just PTEs.

The reason why only the PTE pages are released now is that they are the
biggest consumer. This reference count could actually be used for other
levels of page tables as well.

>
> ?
>
> Jason
>

2021-11-10 14:03:38

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/10/21 9:25 PM, David Hildenbrand wrote:
> On 10.11.21 13:56, Jason Gunthorpe wrote:
>> On Wed, Nov 10, 2021 at 06:54:13PM +0800, Qi Zheng wrote:
>>
>>> In this patch series, we add a pte_refcount field to the struct page of page
>>> table to track how many users of PTE page table. Similar to the mechanism of
>>> page refcount, the user of PTE page table should hold a refcount to it before
>>> accessing. The PTE page table page will be freed when the last refcount is
>>> dropped.
>>
>> So, this approach basically adds two atomics on every PTE map
>>
>> If I have it right the reason that zap cannot clean the PTEs today is
>> because zap cannot obtain the mmap lock due to a lock ordering issue
>> with the inode lock vs mmap lock.
>
> There are different ways to zap: madvise(DONTNEED) vs
> fallocate(PUNCH_HOLE). It depends on "from where" we're actually
> comming: a process page table walker or the rmap.
>
> The way locking currently works doesn't allow to remove a page table
> just by holding the mmap lock, not even in write mode. You'll also need
> to hold the respective rmap locks -- which implies that reclaiming apge
> tables crossing VMAs is "problematic". Take a look at khugepaged which
> has to play quite some tricks to remove a page table.
>
> And there are other ways we can create empty page tables via the rmap,
> like reclaim/writeback, although they are rather a secondary concern mostly.
>
>>
>> If it could obtain the mmap lock then it could do the zap using the
>> write side as unmapping a vma does.
>>
>> Rather than adding a new "lock" to ever PTE I wonder if it would be
>> more efficient to break up the mmap lock and introduce a specific
>> rwsem for the page table itself, in addition to the PTL. Currently the
>> mmap lock is protecting both the vma list and the page table.
>
> There is the rmap side of things as well. At least the rmap won't
> reclaim alloc/free page tables, but it will walk page tables while
> holding the respective rmap lock.
>
>>
>> I think that would allow the lock ordering issue to be resolved and
>> zap could obtain a page table rwsem.
>>
>> Compared to two atomics per PTE this would just be two atomic per
>> page table walk operation, it is conceptually a lot simpler, and would
>> allow freeing all the page table levels, not just PTEs.
>
> Another alternative is to not do it in the kernel automatically, but
> instead have a madvise(MADV_CLEANUP_PGTABLE) mechanism that will get
> called by user space explicitly once it's reasonable. While this will
> work for the obvious madvise(DONTNEED) users -- like memory allocators
> -- that zap memory, it's a bit more complicated once shared memory is
> involved and we're fallocate(PUNCH_HOLE) memory. But it would at least
> work for many use cases that want to optimize memory consumption for
> sparse memory mappings.
>
> Note that PTEs are the biggest memory consumer. On x86-64, a 1 TiB area
> will consume 2 GiB of PTE tables and only 4 MiB of PMD tables. So PTEs
> are most certainly the most important part piece.
>

Totally agree!

Thanks,
Qi

2021-11-10 14:41:01

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH v3 14/15] Documentation: add document for pte_ref

Qi Zheng <[email protected]> writes:

> This commit adds document for pte_ref under `Documentation/vm/`.

Thanks for documenting this work!

> Signed-off-by: Qi Zheng <[email protected]>
> ---
> Documentation/vm/pte_ref.rst | 212 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 212 insertions(+)
> create mode 100644 Documentation/vm/pte_ref.rst

When you add a new RST file, you also need to add it to the associated
index.rst file or it won't be included in the docs build. Instead,
you'll get the "not included in any toctree" warning that you surely saw
when you tested the docs build with this file :)

> diff --git a/Documentation/vm/pte_ref.rst b/Documentation/vm/pte_ref.rst
> new file mode 100644
> index 000000000000..c5323a263464
> --- /dev/null
> +++ b/Documentation/vm/pte_ref.rst
> @@ -0,0 +1,212 @@
> +.. _pte_ref:

Do you need this label anywhere? If not, I'd leave it out.

Thanks,

jon

2021-11-10 14:42:42

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On Wed, Nov 10, 2021 at 02:25:50PM +0100, David Hildenbrand wrote:
> On 10.11.21 13:56, Jason Gunthorpe wrote:
> > On Wed, Nov 10, 2021 at 06:54:13PM +0800, Qi Zheng wrote:
> >
> >> In this patch series, we add a pte_refcount field to the struct page of page
> >> table to track how many users of PTE page table. Similar to the mechanism of
> >> page refcount, the user of PTE page table should hold a refcount to it before
> >> accessing. The PTE page table page will be freed when the last refcount is
> >> dropped.
> >
> > So, this approach basically adds two atomics on every PTE map
> >
> > If I have it right the reason that zap cannot clean the PTEs today is
> > because zap cannot obtain the mmap lock due to a lock ordering issue
> > with the inode lock vs mmap lock.
>
> There are different ways to zap: madvise(DONTNEED) vs
> fallocate(PUNCH_HOLE). It depends on "from where" we're actually
> comming: a process page table walker or the rmap.

AFAIK rmap is the same issue, it can't lock the mmap_sem

> The way locking currently works doesn't allow to remove a page table
> just by holding the mmap lock, not even in write mode.

I'm not sure I understand this. If the goal is to free the PTE tables
then the main concern is use-after-free on page table walkers (which
this series is addressing). Ignoring bugs, we have only three ways to
read the page table:

- Fully locked. Under the PTLs (gup slow is an example)
- Semi-locked. Under the read mmap lock and no PTLs (hmm is an example)
- hw-locked. Barriered with TLB flush (gup fast is an example)

#1 should be completely safe as the PTLs will protect everything
#2 is safe so long as the write side is held during any layout changes
#3 interacts with the TLB flush, and is also safe with zap

rmap itself is a #1 page table walker, i.e. it gets the PTLs under
page_vma_mapped_walk().

The sin we have committed here is that both the mmap lock and the PTLs
are being used to protect the page table itself with a very
complicated dual semantic.

Splitting the sleeping mmap lock into 'covers vma' and 'covers page
tables' lets us solve the lock ordering and semi-locked can become
more fully locked by the new lock, instead of by abusing mmap sem.

I'd suggest to make this new lock a special rwsem which allows either
concurrent read access OR concurrent PTL access, but not both. This
way we don't degrade performance of the split PTLs, *and* when
something needs to change the page table structure it has a way to
properly exclude all the #2 lockless readers.

So every touch to the page table starts by obtaining this new lock,
depending on the access mode to be used (PTL vs lockless read).

We can keep the existing THP logic where a leaf PMD can be transformed
to a non-leaf PMD in the semi-locked case, but the case where a
non-leaf PMD is transformed to a leaf PMD has to take the lock.

Jason

2021-11-10 15:40:16

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 10.11.21 15:38, Jason Gunthorpe wrote:
> On Wed, Nov 10, 2021 at 02:25:50PM +0100, David Hildenbrand wrote:
>> On 10.11.21 13:56, Jason Gunthorpe wrote:
>>> On Wed, Nov 10, 2021 at 06:54:13PM +0800, Qi Zheng wrote:
>>>
>>>> In this patch series, we add a pte_refcount field to the struct page of page
>>>> table to track how many users of PTE page table. Similar to the mechanism of
>>>> page refcount, the user of PTE page table should hold a refcount to it before
>>>> accessing. The PTE page table page will be freed when the last refcount is
>>>> dropped.
>>>
>>> So, this approach basically adds two atomics on every PTE map
>>>
>>> If I have it right the reason that zap cannot clean the PTEs today is
>>> because zap cannot obtain the mmap lock due to a lock ordering issue
>>> with the inode lock vs mmap lock.
>>
>> There are different ways to zap: madvise(DONTNEED) vs
>> fallocate(PUNCH_HOLE). It depends on "from where" we're actually
>> comming: a process page table walker or the rmap.
>
> AFAIK rmap is the same issue, it can't lock the mmap_sem
>
>> The way locking currently works doesn't allow to remove a page table
>> just by holding the mmap lock, not even in write mode.
>
> I'm not sure I understand this. If the goal is to free the PTE tables
> then the main concern is use-after free on page table walkers (which
> this series is addressing). Ignoring bugs, we have only three ways to
> read the page table:

Yes, use-after-free and reuse-while-freeing are the two challenges AFAIK.

>
> - Fully locked. Under the PTLs (gup slow is an example)
> - Semi-locked. Under the read mmap lock and no PTLs (hmm is an example)
> - hw-locked. Barriered with TLB flush (gup fast is an example)

Three additions as far as I can tell:

1. Fully locked currently needs the read mmap lock OR the rmap lock in
read. PTLs on their own are not sufficient AFAICT.
2. #1 and #2 can currently only walk within VMA ranges.
3. We can theoretically walk page tables outside VMA ranges with the
write mmap lock, because page tables get removed with the mmap lock in
read mode and heavy-weight operations (VMA layout, khugepaged) are
performed under the write mmap lock.

The rmap locks protect from modifications where we want to exclude rmap
walkers similarly to how we grab the mmap lock in write, where the PTLs
are not sufficient.

See mm/mremap.c:move_ptes() as an example which performs VMA layout +
page table modifications. See khugepaged, which doesn't perform VMA layout
modifications but page table modifications.

>
> #1 should be completely safe as the PTLs will protect eveything
> #2 is safe so long as the write side is held during any layout changes
> #3 interacts with the TLB flush, and is also safe with zap
>
> rmap itself is a #1 page table walker, ie it gets the PTLs under
> page_vma_mapped_walk().

When you talk about PTLs, do you mean only PTE-PTLs or also PMD-PTLs?

Because the PMD-PTLs are usually not taken in case we know there is a
page table (nothing would currently change it without heavy locking).
And if they are taken, they are only held while allocating/checking a
PMDE, not while actually *using* the page table that's mapped in that entry.

For example, walk_page_range() requires the mmap lock in read and grabs
the PTE-PTLs.

>
> The sin we have comitted here is that both the mmap lock and the PTLs
> are being used to protect the page table itself with a very
> complicated dual semantic.
>
> Splitting the sleeping mmap lock into 'covers vma' and 'covers page
> tables' lets us solve the lock ordering and semi-locked can become
> more fully locked by the new lock, instead of by abusing mmap sem.

It would still be fairly coarse-grained locking; I am not sure if that
is a step in the right direction. If you want to modify *some* page
table in your process you have to exclude each and every page table walker.
Or did I misinterpret what you were saying?

>
> I'd suggest to make this new lock a special rwsem which allows either
> concurrent read access OR concurrent PTL access, but not both. This

I looked into such a lock recently in a similar context and something like
that does not exist yet (and fairness will be challenging). You either
have a single writer or multiple readers. I'd be interested if someone
knows of something like that.


--
Thanks,

David / dhildenb

2021-11-10 16:39:34

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On Wed, Nov 10, 2021 at 04:37:14PM +0100, David Hildenbrand wrote:

> > - Fully locked. Under the PTLs (gup slow is an example)
> > - Semi-locked. Under the read mmap lock and no PTLs (hmm is an example)
> > - hw-locked. Barriered with TLB flush (gup fast is an example)
>
> Three additions as far as I can tell:
>
> 1. Fully locked currently needs the read mmap lock OR the rmap lock in
> read. PTLs on their own are not sufficient AFAIKT.

I think the reality is we don't have any fully locked walkers.. Even
gup slow is semi-lockless until it reaches lower levels, then it takes
the PTLs. (I forgot about that detail!)

The mmap lock is being used to protect the higher levels of the page
table. It is part of its complicated dual purpose.

> 2. #1 and #2 can currently only walk within VMA ranges.

AFAICT this is an artifact of re-using the mmap lock to protect the
page table and not being able to obtain the mmap lock in all the
places the page table structure is manipulated.

> 3. We can theoretically walk page tables outside VMA ranges with the
> write mmap lock, because page tables get removed with the mmap lock in
> read mode and heavy-weight operations (VMA layout, khugepaged) are
> performed under the write mmap lock.

Yes, again, an artifact of the current locking.

> The rmap locks protect from modifications where we want to exclude rmap
> walkers similarly to how we grab the mmap lock in write, where the PTLs
> are not sufficient.

It is the same kind of dual-purpose locking as the mmap sem :(

> > #1 should be completely safe as the PTLs will protect eveything
> > #2 is safe so long as the write side is held during any layout changes
> > #3 interacts with the TLB flush, and is also safe with zap
> >
> > rmap itself is a #1 page table walker, ie it gets the PTLs under
> > page_vma_mapped_walk().
>
> When you talk about PTLs, do you mean only PTE-PTLs or also PMD-PTLs?

Ah, here I was thinking about a lock that can protect all the
levels. Today we are abusing the mmap lock to act as the pud_lock, for
instance.

> Because the PMD-PTLs re usually not taken in case we know there is a
> page table (nothing would currently change it without heavy
> locking).

This only works with the lockless walkers, and relies on the read mmap
sem/etc to also mean 'a PTE table cannot become a leaf PMD'

> For example, walk_page_range() requires the mmap lock in read and grabs
> the PTE-PTLs.

Yes, that is a semi-locked reader.

> It would still be a fairly coarse-grained locking, I am not sure if that
> is a step into the right direction. If you want to modify *some* page
> table in your process you have exclude each and every page table walker.
> Or did I mis-interpret what you were saying?

That is one possible design, it favours fast walking and penalizes
mutation. We could also stick a lock in the PMD (instead of a
refcount) and still logically be using a lock instead of a refcount
scheme. Remember modify here is "want to change a table pointer into a
leaf pointer" so it isn't an every day activity..

There is some advantage with this thinking because it harmonizes well
with the other stuff that wants to convert tables into leafs, but has
to deal with complicated locking.

On the other hand, refcounts are a degenerate kind of rwsem and only
help with freeing pages. It also puts more atomics in normal fast
paths since we are refcounting each PTE, not read locking the PMD.

Perhaps the ideal thing would be to stick a rwsem in the PMD. read
means a table cannot become a leaf. I don't know if there is space
for another atomic in the PMD level, and we'd have to use a hitching
post/hashed waitq scheme too since there surely isn't room for a waitq
too..

I wouldn't be so quick to say one is better than the other, but at
least let's have thought about a locking solution before merging
refcounts :)

Jason

2021-11-10 16:50:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On Wed, Nov 10, 2021 at 04:37:14PM +0100, David Hildenbrand wrote:
> > I'd suggest to make this new lock a special rwsem which allows either
> > concurrent read access OR concurrent PTL access, but not both. This
>
> I looked into such a lock recently in similar context and something like
> that does not exist yet (and fairness will be challenging). You either
> have a single reader or multiple writer. I'd be interested if someone
> knows of something like that.

We've talked about having such a lock before for filesystems which want
to permit either many direct-IO accesses or many buffered-IO accesses, but
want to exclude a mixture of direct-IO and buffered-IO. The name we came
up with for such a lock was the red-blue lock. Either Team Red has the
lock, or Team Blue has the lock (or it's free). Indicate free with value
zero, Team Red with positive numbers and Team Blue with negative numbers.
If we need to indicate an opposing team is waiting for the semaphore,
we can use a high bit (1 << 30) to indicate no new team members can
acquire the lock. Not sure whether anybody ever coded it up.

2021-11-10 16:54:05

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 10.11.21 17:49, Matthew Wilcox wrote:
> On Wed, Nov 10, 2021 at 04:37:14PM +0100, David Hildenbrand wrote:
>>> I'd suggest to make this new lock a special rwsem which allows either
>>> concurrent read access OR concurrent PTL access, but not both. This
>>
>> I looked into such a lock recently in similar context and something like
>> that does not exist yet (and fairness will be challenging). You either
>> have a single reader or multiple writer. I'd be interested if someone
>> knows of something like that.
>
> We've talked about having such a lock before for filesystems which want
> to permit either many direct-IO accesses or many buffered-IO accesses, but
> want to exclude a mixture of direct-IO and buffered-IO. The name we came
> up with for such a lock was the red-blue lock. Either Team Red has the
> lock, or Team Blue has the lock (or it's free). Indicate free with velue
> zero, Team Red with positive numbers and Team Blue with negative numbers.
> If we need to indicate an opposing team is waiting for the semaphore,
> we can use a high bit (1 << 30) to indicate no new team members can
> acquire the lock. Not sure whether anybody ever coded it up.

Interesting, thanks for sharing!

My excessive google search didn't reveal anything back then (~3 months
ago) -- only questions on popular coding websites asking essentially for
the same thing without any helpful replies. But maybe I used the wrong
keywords (e.g., "multiple reader, multiple writer", I certainly didn't
search for "red-blue lock", but I do like the name :) ).

Fairness might still be the biggest issue, but I am certainly no locking
expert.

--
Thanks,

David / dhildenb


2021-11-10 16:56:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On Wed, Nov 10, 2021 at 05:53:57PM +0100, David Hildenbrand wrote:
> On 10.11.21 17:49, Matthew Wilcox wrote:
> > On Wed, Nov 10, 2021 at 04:37:14PM +0100, David Hildenbrand wrote:
> >>> I'd suggest to make this new lock a special rwsem which allows either
> >>> concurrent read access OR concurrent PTL access, but not both. This
> >>
> >> I looked into such a lock recently in similar context and something like
> >> that does not exist yet (and fairness will be challenging). You either
> >> have a single reader or multiple writer. I'd be interested if someone
> >> knows of something like that.
> >
> > We've talked about having such a lock before for filesystems which want
> > to permit either many direct-IO accesses or many buffered-IO accesses, but
> > want to exclude a mixture of direct-IO and buffered-IO. The name we came
> > up with for such a lock was the red-blue lock. Either Team Red has the
> > lock, or Team Blue has the lock (or it's free). Indicate free with velue
> > zero, Team Red with positive numbers and Team Blue with negative numbers.
> > If we need to indicate an opposing team is waiting for the semaphore,
> > we can use a high bit (1 << 30) to indicate no new team members can
> > acquire the lock. Not sure whether anybody ever coded it up.
>
> Interesting, thanks for sharing!
>
> My excessive google search didn't reveal anything back then (~3 months
> ago) -- only questions on popular coding websites asking essentially for
> the same thing without any helpful replies. But maybe I used the wrong
> keywords (e.g., "multiple reader, multiple writer", I certainly didn't
> search for "red-blue lock", but I do like the name :) ).
>
> Fairness might still be the biggest issue, but I am certainly no locking
> expert.

Fairness could use the same basic logic as the writer-preferred-over-reader
handling in the rwsem.

The atomic implementation would be with atomic_dec_unless_positive()
and atomic_inc_unless_negative(); without fairness it looks
straightforward.
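
To make that concrete, an unfair trylock-only sketch (illustration only, not
from the thread; it uses the encoding Matthew described -- 0 is free, positive
counts are red holders, negative counts are blue holders -- and ignores
waiters/fairness entirely):

#include <linux/atomic.h>

struct rb_lock {
        atomic_t cnt;   /* > 0: red holders, < 0: blue holders, 0: free */
};

/* Red side: only succeeds while no blue holder is present. */
static inline bool red_trylock(struct rb_lock *l)
{
        return atomic_inc_unless_negative(&l->cnt);
}

static inline void red_unlock(struct rb_lock *l)
{
        atomic_dec(&l->cnt);
}

/* Blue side: only succeeds while no red holder is present. */
static inline bool blue_trylock(struct rb_lock *l)
{
        return atomic_dec_unless_positive(&l->cnt);
}

static inline void blue_unlock(struct rb_lock *l)
{
        atomic_inc(&l->cnt);
}

A fair version would additionally need something like the waiter bit Matthew
mentions, plus a way to park and wake waiters.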

Jason

2021-11-10 17:37:56

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

>> It would still be a fairly coarse-grained locking, I am not sure if that
>> is a step into the right direction. If you want to modify *some* page
>> table in your process you have exclude each and every page table walker.
>> Or did I mis-interpret what you were saying?
>
> That is one possible design, it favours fast walking and penalizes
> mutation. We could also stick a lock in the PMD (instead of a
> refcount) and still logically be using a lock instead of a refcount
> scheme. Remember modify here is "want to change a table pointer into a
> leaf pointer" so it isn't an every day activity..

It will be if we reclaim an empty PTE page table somewhat frequently, as
soon as it turns empty. This not only happens when zapping, but also
during writeback/swapping. So while writing back / swapping you might be
left with empty page tables to reclaim.

Of course, this is the current approach. Another approach that doesn't
require additional refcounts is scanning page tables for empty ones and
reclaiming them. This scanning can either be triggered manually from
user space or automatically from the kernel.

>
> There is some advantage with this thinking because it harmonizes well
> with the other stuff that wants to convert tables into leafs, but has
> to deal with complicated locking.
>
> On the other hand, refcounts are a degenerate kind of rwsem and only
> help with freeing pages. It also puts more atomics in normal fast
> paths since we are refcounting each PTE, not read locking the PMD.
>
> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
> means a table cannot be come a leaf. I don't know if there is space
> for another atomic in the PMD level, and we'd have to use a hitching
> post/hashed waitq scheme too since there surely isn't room for a waitq
> too..
>
> I wouldn't be so quick to say one is better than the other, but at
> least let's have thought about a locking solution before merging
> refcounts :)

Yes, absolutely. I can see the beauty in the current approach, because
it just reclaims "automatically" once possible -- page table empty and
nobody is walking it. The downside is that it doesn't always make sense
to reclaim an empty page table immediately once it turns empty.

Also, it adds complexity for something that is only a problem in some
corner cases -- sparse memory mappings, especially relevant for some
memory allocators after freeing a lot of memory or running VMs with
memory ballooning after inflating the balloon. Some of these use cases
might be good with just triggering page table reclaim manually from user
space.

--
Thanks,

David / dhildenb


2021-11-10 17:49:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On Wed, Nov 10, 2021 at 06:37:46PM +0100, David Hildenbrand wrote:

> It will be if we somewhat frequent when reclaim an empty PTE page table
> as soon as it turns empty.

What do you think is the frequency of unlocked page table walks that
this would have to block on?

> Also, it adds complexity for something that is only a problem in some
> corner cases -- sparse memory mappings, especially relevant for some
> memory allocators after freeing a lot of memory or running VMs with
> memory ballooning after inflating the balloon. Some of these use cases
> might be good with just triggering page table reclaim manually from user
> space.

Right, this is why it would be nice if the complexity could address
more than one problem, like the existing complex locking around the
thp stuff..

Jason

2021-11-11 03:58:25

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>> is a step into the right direction. If you want to modify *some* page
>>> table in your process you have exclude each and every page table walker.
>>> Or did I mis-interpret what you were saying?
>>
>> That is one possible design, it favours fast walking and penalizes
>> mutation. We could also stick a lock in the PMD (instead of a
>> refcount) and still logically be using a lock instead of a refcount
>> scheme. Remember modify here is "want to change a table pointer into a
>> leaf pointer" so it isn't an every day activity..
>
> It will be if we somewhat frequent when reclaim an empty PTE page table
> as soon as it turns empty. This not only happens when zapping, but also
> during writeback/swapping. So while writing back / swapping you might be
> left with empty page tables to reclaim.
>
> Of course, this is the current approach. Another approach that doesn't
> require additional refcounts is scanning page tables for empty ones and
> reclaiming them. This scanning can either be triggered manually from
> user space or automatically from the kernel.

Whether we introduce a special rwsem or scan for empty page
tables, there are two problems as follows:

#1. When to trigger the scanning or releasing?
#2. Every time we release a 4K page table page, 512 page table
entries need to be scanned.

For #1, if the scanning is triggered manually from user space, the
kernel is relatively passive, and the user does not fully know the best
timing to scan. If the scanning is triggered automatically from the
kernel, that is great. But the timing is not easy to determine: is it
scanned and reclaimed on every zap or try_to_unmap?

For #2, refcount has advantages.

>
>>
>> There is some advantage with this thinking because it harmonizes well
>> with the other stuff that wants to convert tables into leafs, but has
>> to deal with complicated locking.
>>
>> On the other hand, refcounts are a degenerate kind of rwsem and only
>> help with freeing pages. It also puts more atomics in normal fast
>> paths since we are refcounting each PTE, not read locking the PMD.
>>
>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>> means a table cannot be come a leaf. I don't know if there is space
>> for another atomic in the PMD level, and we'd have to use a hitching
>> post/hashed waitq scheme too since there surely isn't room for a waitq
>> too..
>>
>> I wouldn't be so quick to say one is better than the other, but at
>> least let's have thought about a locking solution before merging
>> refcounts :)
>
> Yes, absolutely. I can see the beauty in the current approach, because
> it just reclaims "automatically" once possible -- page table empty and
> nobody is walking it. The downside is that it doesn't always make sense
> to reclaim an empty page table immediately once it turns empty.
>
> Also, it adds complexity for something that is only a problem in some
> corner cases -- sparse memory mappings, especially relevant for some
> memory allocators after freeing a lot of memory or running VMs with
> memory ballooning after inflating the balloon. Some of these use cases
> might be good with just triggering page table reclaim manually from user
> space.
>

Yes, this is indeed a problem. Perhaps some flags can be introduced so
that the release of page table pages can be delayed in some cases,
similar to the lazyfree mechanism in MADV_FREE?

Thanks,
Qi

2021-11-11 05:40:25

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 14/15] Documentation: add document for pte_ref



On 11/10/21 10:39 PM, Jonathan Corbet wrote:
> Qi Zheng <[email protected]> writes:
>
>> This commit adds document for pte_ref under `Documentation/vm/`.
>
> Thanks for documenting this work!
>
>> Signed-off-by: Qi Zheng <[email protected]>
>> ---
>> Documentation/vm/pte_ref.rst | 212 +++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 212 insertions(+)
>> create mode 100644 Documentation/vm/pte_ref.rst
>
> When you add a new RST file, you also need to add it to the associated
> index.rst file or it won't be included in the docs build. Instead,
> you'll get the "not included in any toctree" warning that you surely saw
> when you tested the docs build with this file :)

OK, I will add it to the associated index.rst in the next version.

>
>> diff --git a/Documentation/vm/pte_ref.rst b/Documentation/vm/pte_ref.rst
>> new file mode 100644
>> index 000000000000..c5323a263464
>> --- /dev/null
>> +++ b/Documentation/vm/pte_ref.rst
>> @@ -0,0 +1,212 @@
>> +.. _pte_ref:
>
> Do you need this label anywhere? If not, I'd leave it out.

I will remove this label in the next version.

Thanks,
Qi

>
> Thanks,
>
> jon
>

2021-11-11 09:22:27

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 11.11.21 04:58, Qi Zheng wrote:
>
>
> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>> is a step into the right direction. If you want to modify *some* page
>>>> table in your process you have exclude each and every page table walker.
>>>> Or did I mis-interpret what you were saying?
>>>
>>> That is one possible design, it favours fast walking and penalizes
>>> mutation. We could also stick a lock in the PMD (instead of a
>>> refcount) and still logically be using a lock instead of a refcount
>>> scheme. Remember modify here is "want to change a table pointer into a
>>> leaf pointer" so it isn't an every day activity..
>>
>> It will be if we somewhat frequent when reclaim an empty PTE page table
>> as soon as it turns empty. This not only happens when zapping, but also
>> during writeback/swapping. So while writing back / swapping you might be
>> left with empty page tables to reclaim.
>>
>> Of course, this is the current approach. Another approach that doesn't
>> require additional refcounts is scanning page tables for empty ones and
>> reclaiming them. This scanning can either be triggered manually from
>> user space or automatically from the kernel.
>
> Whether it is introducing a special rwsem or scanning an empty page
> table, there are two problems as follows:
>
> #1. When to trigger the scanning or releasing?

For example when reclaiming memory, when scanning page tables in
khugepaged, or triggered by user space (note that this is the approach I
originally looked into). But it certainly requires more locking thought
to avoid stopping essentially any page table walker.

> #2. Every time to release a 4K page table page, 512 page table
> entries need to be scanned.

It would happen only when we actually trigger reclaim of page tables
(again, someone has to trigger it), so it's barely an issue.

For example, khugepaged already scans the page tables either way.

>
> For #1, if the scanning is triggered manually from user space, the
> kernel is relatively passive, and the user does not fully know the best
> timing to scan. If the scanning is triggered automatically from the
> kernel, that is great. But the timing is not easy to confirm, is it
> scanned and reclaimed every time zap or try_to_unmap?
>
> For #2, refcount has advantages.
>
>>
>>>
>>> There is some advantage with this thinking because it harmonizes well
>>> with the other stuff that wants to convert tables into leafs, but has
>>> to deal with complicated locking.
>>>
>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>> help with freeing pages. It also puts more atomics in normal fast
>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>
>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>> means a table cannot be come a leaf. I don't know if there is space
>>> for another atomic in the PMD level, and we'd have to use a hitching
>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>> too..
>>>
>>> I wouldn't be so quick to say one is better than the other, but at
>>> least let's have thought about a locking solution before merging
>>> refcounts :)
>>
>> Yes, absolutely. I can see the beauty in the current approach, because
>> it just reclaims "automatically" once possible -- page table empty and
>> nobody is walking it. The downside is that it doesn't always make sense
>> to reclaim an empty page table immediately once it turns empty.
>>
>> Also, it adds complexity for something that is only a problem in some
>> corner cases -- sparse memory mappings, especially relevant for some
>> memory allocators after freeing a lot of memory or running VMs with
>> memory ballooning after inflating the balloon. Some of these use cases
>> might be good with just triggering page table reclaim manually from user
>> space.
>>
>
> Yes, this is indeed a problem. Perhaps some flags can be introduced so
> that the release of page table pages can be delayed in some cases.
> Similar to the lazyfree mechanism in MADV_FREE?

The issue AFAIU is that once your refcount hits 0 (no more references,
no more entries), the longer you delay the reclaim, the longer others
have to wait to populate a fresh page table because the "page table
to be reclaimed" is still stuck around. You'd have to keep the refcount
increased for a while, and only drop it after a while. But when? And
how? IMHO it's not trivial, but maybe there is an easy way to achieve it.


--
Thanks,

David / dhildenb


2021-11-11 11:08:48

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/11/21 5:22 PM, David Hildenbrand wrote:
> On 11.11.21 04:58, Qi Zheng wrote:
>>
>>
>> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>>> is a step into the right direction. If you want to modify *some* page
>>>>> table in your process you have exclude each and every page table walker.
>>>>> Or did I mis-interpret what you were saying?
>>>>
>>>> That is one possible design, it favours fast walking and penalizes
>>>> mutation. We could also stick a lock in the PMD (instead of a
>>>> refcount) and still logically be using a lock instead of a refcount
>>>> scheme. Remember modify here is "want to change a table pointer into a
>>>> leaf pointer" so it isn't an every day activity..
>>>
>>> It will be if we somewhat frequent when reclaim an empty PTE page table
>>> as soon as it turns empty. This not only happens when zapping, but also
>>> during writeback/swapping. So while writing back / swapping you might be
>>> left with empty page tables to reclaim.
>>>
>>> Of course, this is the current approach. Another approach that doesn't
>>> require additional refcounts is scanning page tables for empty ones and
>>> reclaiming them. This scanning can either be triggered manually from
>>> user space or automatically from the kernel.
>>
>> Whether it is introducing a special rwsem or scanning an empty page
>> table, there are two problems as follows:
>>
>> #1. When to trigger the scanning or releasing?
>
> For example when reclaiming memory, when scanning page tables in
> khugepaged, or triggered by user space (note that this is the approach I
> originally looked into). But it certainly requires more locking thought
> to avoid stopping essentially any page table walker.
>
>> #2. Every time to release a 4K page table page, 512 page table
>> entries need to be scanned.
>
> It would happen only when actually trigger reclaim of page tables
> (again, someone has to trigger it), so it's barely an issue.
>
> For example, khugepaged already scans the page tables either way.
>
>>
>> For #1, if the scanning is triggered manually from user space, the
>> kernel is relatively passive, and the user does not fully know the best
>> timing to scan. If the scanning is triggered automatically from the
>> kernel, that is great. But the timing is not easy to confirm, is it
>> scanned and reclaimed every time zap or try_to_unmap?
>>
>> For #2, refcount has advantages.
>>
>>>
>>>>
>>>> There is some advantage with this thinking because it harmonizes well
>>>> with the other stuff that wants to convert tables into leafs, but has
>>>> to deal with complicated locking.
>>>>
>>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>>> help with freeing pages. It also puts more atomics in normal fast
>>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>>
>>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>>> means a table cannot be come a leaf. I don't know if there is space
>>>> for another atomic in the PMD level, and we'd have to use a hitching
>>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>>> too..
>>>>
>>>> I wouldn't be so quick to say one is better than the other, but at
>>>> least let's have thought about a locking solution before merging
>>>> refcounts :)
>>>
>>> Yes, absolutely. I can see the beauty in the current approach, because
>>> it just reclaims "automatically" once possible -- page table empty and
>>> nobody is walking it. The downside is that it doesn't always make sense
>>> to reclaim an empty page table immediately once it turns empty.
>>>
>>> Also, it adds complexity for something that is only a problem in some
>>> corner cases -- sparse memory mappings, especially relevant for some
>>> memory allocators after freeing a lot of memory or running VMs with
>>> memory ballooning after inflating the balloon. Some of these use cases
>>> might be good with just triggering page table reclaim manually from user
>>> space.
>>>
>>
>> Yes, this is indeed a problem. Perhaps some flags can be introduced so
>> that the release of page table pages can be delayed in some cases.
>> Similar to the lazyfree mechanism in MADV_FREE?
>
> The issue AFAIU is that once your refcount hits 0 (no more references,
> no more entries), the longer you wait with reclaim, the longer others
> have to wait for populating a fresh page table because the "page table
> to be reclaimed" is still stuck around. You'd have to keep the refcount
> increased for a while, and only drop it after a while. But when? And
> how? IMHO it's not trivial, but maybe there is an easy way to achieve it.
>

For running VMs with memory ballooning after inflating the balloon, is
this a hot path? Even if it is, it already has to deal with the release and
reallocation of physical pages. The overhead after introducing
pte_refcount is that we also need to release and re-allocate the page table
page. But 2MB of physical pages corresponds to only 4KiB of PTE page table
pages. So maybe the overhead is not big.

In fact, the performance test shown in the cover letter covers this case:

test program:
https://lore.kernel.org/lkml/[email protected]/2-multi-fault-all.c

Thanks,
Qi

>

2021-11-11 11:19:28

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 11.11.21 12:08, Qi Zheng wrote:
>
>
> On 11/11/21 5:22 PM, David Hildenbrand wrote:
>> On 11.11.21 04:58, Qi Zheng wrote:
>>>
>>>
>>> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>>>> is a step into the right direction. If you want to modify *some* page
>>>>>> table in your process you have exclude each and every page table walker.
>>>>>> Or did I mis-interpret what you were saying?
>>>>>
>>>>> That is one possible design, it favours fast walking and penalizes
>>>>> mutation. We could also stick a lock in the PMD (instead of a
>>>>> refcount) and still logically be using a lock instead of a refcount
>>>>> scheme. Remember modify here is "want to change a table pointer into a
>>>>> leaf pointer" so it isn't an every day activity..
>>>>
>>>> It will be if we somewhat frequent when reclaim an empty PTE page table
>>>> as soon as it turns empty. This not only happens when zapping, but also
>>>> during writeback/swapping. So while writing back / swapping you might be
>>>> left with empty page tables to reclaim.
>>>>
>>>> Of course, this is the current approach. Another approach that doesn't
>>>> require additional refcounts is scanning page tables for empty ones and
>>>> reclaiming them. This scanning can either be triggered manually from
>>>> user space or automatically from the kernel.
>>>
>>> Whether it is introducing a special rwsem or scanning an empty page
>>> table, there are two problems as follows:
>>>
>>> #1. When to trigger the scanning or releasing?
>>
>> For example when reclaiming memory, when scanning page tables in
>> khugepaged, or triggered by user space (note that this is the approach I
>> originally looked into). But it certainly requires more locking thought
>> to avoid stopping essentially any page table walker.
>>
>>> #2. Every time to release a 4K page table page, 512 page table
>>> entries need to be scanned.
>>
>> It would happen only when actually trigger reclaim of page tables
>> (again, someone has to trigger it), so it's barely an issue.
>>
>> For example, khugepaged already scans the page tables either way.
>>
>>>
>>> For #1, if the scanning is triggered manually from user space, the
>>> kernel is relatively passive, and the user does not fully know the best
>>> timing to scan. If the scanning is triggered automatically from the
>>> kernel, that is great. But the timing is not easy to confirm, is it
>>> scanned and reclaimed every time zap or try_to_unmap?
>>>
>>> For #2, refcount has advantages.
>>>
>>>>
>>>>>
>>>>> There is some advantage with this thinking because it harmonizes well
>>>>> with the other stuff that wants to convert tables into leafs, but has
>>>>> to deal with complicated locking.
>>>>>
>>>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>>>> help with freeing pages. It also puts more atomics in normal fast
>>>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>>>
>>>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>>>> means a table cannot be come a leaf. I don't know if there is space
>>>>> for another atomic in the PMD level, and we'd have to use a hitching
>>>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>>>> too..
>>>>>
>>>>> I wouldn't be so quick to say one is better than the other, but at
>>>>> least let's have thought about a locking solution before merging
>>>>> refcounts :)
>>>>
>>>> Yes, absolutely. I can see the beauty in the current approach, because
>>>> it just reclaims "automatically" once possible -- page table empty and
>>>> nobody is walking it. The downside is that it doesn't always make sense
>>>> to reclaim an empty page table immediately once it turns empty.
>>>>
>>>> Also, it adds complexity for something that is only a problem in some
>>>> corner cases -- sparse memory mappings, especially relevant for some
>>>> memory allocators after freeing a lot of memory or running VMs with
>>>> memory ballooning after inflating the balloon. Some of these use cases
>>>> might be good with just triggering page table reclaim manually from user
>>>> space.
>>>>
>>>
>>> Yes, this is indeed a problem. Perhaps some flags can be introduced so
>>> that the release of page table pages can be delayed in some cases.
>>> Similar to the lazyfree mechanism in MADV_FREE?
>>
>> The issue AFAIU is that once your refcount hits 0 (no more references,
>> no more entries), the longer you wait with reclaim, the longer others
>> have to wait for populating a fresh page table because the "page table
>> to be reclaimed" is still stuck around. You'd have to keep the refcount
>> increased for a while, and only drop it after a while. But when? And
>> how? IMHO it's not trivial, but maybe there is an easy way to achieve it.
>>
>
> For running VMs with memory ballooning after inflating the balloon, is
> this a hot behavior? Even if it is, it is already facing the release and
> reallocation of physical pages. The overhead after introducing
> pte_refcount is that we need to release and re-allocate page table page.
> But 2MB physical pages only corresponds to 4KiB of PTE page table page.
> So maybe the overhead is not big.

The cases that come to my mind are

a) Swapping on shared memory with concurrent access
b) Reclaim on file-backed memory with concurrent access
c) Free page reporting as implemented by virtio-balloon

In all of these cases, you can have someone immediately re-access the
page table and re-populate it.

For something mostly static (balloon inflation, memory allocator), it's
not that big of a deal I guess.

--
Thanks,

David / dhildenb


2021-11-11 12:00:51

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/11/21 7:19 PM, David Hildenbrand wrote:
> On 11.11.21 12:08, Qi Zheng wrote:
>>
>>
>> On 11/11/21 5:22 PM, David Hildenbrand wrote:
>>> On 11.11.21 04:58, Qi Zheng wrote:
>>>>
>>>>
>>>> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>>>>> is a step into the right direction. If you want to modify *some* page
>>>>>>> table in your process you have exclude each and every page table walker.
>>>>>>> Or did I mis-interpret what you were saying?
>>>>>>
>>>>>> That is one possible design, it favours fast walking and penalizes
>>>>>> mutation. We could also stick a lock in the PMD (instead of a
>>>>>> refcount) and still logically be using a lock instead of a refcount
>>>>>> scheme. Remember modify here is "want to change a table pointer into a
>>>>>> leaf pointer" so it isn't an every day activity..
>>>>>
>>>>> It will be if we somewhat frequent when reclaim an empty PTE page table
>>>>> as soon as it turns empty. This not only happens when zapping, but also
>>>>> during writeback/swapping. So while writing back / swapping you might be
>>>>> left with empty page tables to reclaim.
>>>>>
>>>>> Of course, this is the current approach. Another approach that doesn't
>>>>> require additional refcounts is scanning page tables for empty ones and
>>>>> reclaiming them. This scanning can either be triggered manually from
>>>>> user space or automatically from the kernel.
>>>>
>>>> Whether it is introducing a special rwsem or scanning an empty page
>>>> table, there are two problems as follows:
>>>>
>>>> #1. When to trigger the scanning or releasing?
>>>
>>> For example when reclaiming memory, when scanning page tables in
>>> khugepaged, or triggered by user space (note that this is the approach I
>>> originally looked into). But it certainly requires more locking thought
>>> to avoid stopping essentially any page table walker.
>>>
>>>> #2. Every time to release a 4K page table page, 512 page table
>>>> entries need to be scanned.
>>>
>>> It would happen only when actually trigger reclaim of page tables
>>> (again, someone has to trigger it), so it's barely an issue.
>>>
>>> For example, khugepaged already scans the page tables either way.
>>>
>>>>
>>>> For #1, if the scanning is triggered manually from user space, the
>>>> kernel is relatively passive, and the user does not fully know the best
>>>> timing to scan. If the scanning is triggered automatically from the
>>>> kernel, that is great. But the timing is not easy to confirm, is it
>>>> scanned and reclaimed every time zap or try_to_unmap?
>>>>
>>>> For #2, refcount has advantages.
>>>>
>>>>>
>>>>>>
>>>>>> There is some advantage with this thinking because it harmonizes well
>>>>>> with the other stuff that wants to convert tables into leafs, but has
>>>>>> to deal with complicated locking.
>>>>>>
>>>>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>>>>> help with freeing pages. It also puts more atomics in normal fast
>>>>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>>>>
>>>>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>>>>> means a table cannot be come a leaf. I don't know if there is space
>>>>>> for another atomic in the PMD level, and we'd have to use a hitching
>>>>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>>>>> too..
>>>>>>
>>>>>> I wouldn't be so quick to say one is better than the other, but at
>>>>>> least let's have thought about a locking solution before merging
>>>>>> refcounts :)
>>>>>
>>>>> Yes, absolutely. I can see the beauty in the current approach, because
>>>>> it just reclaims "automatically" once possible -- page table empty and
>>>>> nobody is walking it. The downside is that it doesn't always make sense
>>>>> to reclaim an empty page table immediately once it turns empty.
>>>>>
>>>>> Also, it adds complexity for something that is only a problem in some
>>>>> corner cases -- sparse memory mappings, especially relevant for some
>>>>> memory allocators after freeing a lot of memory or running VMs with
>>>>> memory ballooning after inflating the balloon. Some of these use cases
>>>>> might be good with just triggering page table reclaim manually from user
>>>>> space.
>>>>>
>>>>
>>>> Yes, this is indeed a problem. Perhaps some flags can be introduced so
>>>> that the release of page table pages can be delayed in some cases.
>>>> Similar to the lazyfree mechanism in MADV_FREE?
>>>
>>> The issue AFAIU is that once your refcount hits 0 (no more references,
>>> no more entries), the longer you wait with reclaim, the longer others
>>> have to wait for populating a fresh page table because the "page table
>>> to be reclaimed" is still stuck around. You'd have to keep the refcount
>>> increased for a while, and only drop it after a while. But when? And
>>> how? IMHO it's not trivial, but maybe there is an easy way to achieve it.
>>>
>>
>> For running VMs with memory ballooning after inflating the balloon, is
>> this a hot behavior? Even if it is, it is already facing the release and
>> reallocation of physical pages. The overhead after introducing
>> pte_refcount is that we need to release and re-allocate page table page.
>> But 2MB physical pages only corresponds to 4KiB of PTE page table page.
>> So maybe the overhead is not big.
>
> The cases that come to my mind are
>
> a) Swapping on shared memory with concurrent access
> b) Reclaim on file-backed memory with concurrent access
> c) Free page reporting as implemented by virtio-balloon
>
> In all of these cases, you can have someone immediately re-access the
> page table and re-populate it.

In the performance test shown in the cover letter, we repeatedly performed
touch and madvise(MADV_DONTNEED) operations, which simulates the cases
you describe above.

We did find a small performance regression, but I think it is
acceptable, and no new perf hotspots appeared.

>
> For something mostly static (balloon inflation, memory allocator), it's
> not that big of a deal I guess.
>

2021-11-11 12:20:40

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 11.11.21 13:00, Qi Zheng wrote:
>
>
> On 11/11/21 7:19 PM, David Hildenbrand wrote:
>> On 11.11.21 12:08, Qi Zheng wrote:
>>>
>>>
>>> On 11/11/21 5:22 PM, David Hildenbrand wrote:
>>>> On 11.11.21 04:58, Qi Zheng wrote:
>>>>>
>>>>>
>>>>> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>>>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>>>>>> is a step into the right direction. If you want to modify *some* page
>>>>>>>> table in your process you have exclude each and every page table walker.
>>>>>>>> Or did I mis-interpret what you were saying?
>>>>>>>
>>>>>>> That is one possible design, it favours fast walking and penalizes
>>>>>>> mutation. We could also stick a lock in the PMD (instead of a
>>>>>>> refcount) and still logically be using a lock instead of a refcount
>>>>>>> scheme. Remember modify here is "want to change a table pointer into a
>>>>>>> leaf pointer" so it isn't an every day activity..
>>>>>>
>>>>>> It will be if we somewhat frequent when reclaim an empty PTE page table
>>>>>> as soon as it turns empty. This not only happens when zapping, but also
>>>>>> during writeback/swapping. So while writing back / swapping you might be
>>>>>> left with empty page tables to reclaim.
>>>>>>
>>>>>> Of course, this is the current approach. Another approach that doesn't
>>>>>> require additional refcounts is scanning page tables for empty ones and
>>>>>> reclaiming them. This scanning can either be triggered manually from
>>>>>> user space or automatically from the kernel.
>>>>>
>>>>> Whether it is introducing a special rwsem or scanning an empty page
>>>>> table, there are two problems as follows:
>>>>>
>>>>> #1. When to trigger the scanning or releasing?
>>>>
>>>> For example when reclaiming memory, when scanning page tables in
>>>> khugepaged, or triggered by user space (note that this is the approach I
>>>> originally looked into). But it certainly requires more locking thought
>>>> to avoid stopping essentially any page table walker.
>>>>
>>>>> #2. Every time to release a 4K page table page, 512 page table
>>>>> entries need to be scanned.
>>>>
>>>> It would happen only when actually trigger reclaim of page tables
>>>> (again, someone has to trigger it), so it's barely an issue.
>>>>
>>>> For example, khugepaged already scans the page tables either way.
>>>>
>>>>>
>>>>> For #1, if the scanning is triggered manually from user space, the
>>>>> kernel is relatively passive, and the user does not fully know the best
>>>>> timing to scan. If the scanning is triggered automatically from the
>>>>> kernel, that is great. But the timing is not easy to confirm, is it
>>>>> scanned and reclaimed every time zap or try_to_unmap?
>>>>>
>>>>> For #2, refcount has advantages.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> There is some advantage with this thinking because it harmonizes well
>>>>>>> with the other stuff that wants to convert tables into leafs, but has
>>>>>>> to deal with complicated locking.
>>>>>>>
>>>>>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>>>>>> help with freeing pages. It also puts more atomics in normal fast
>>>>>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>>>>>
>>>>>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>>>>>> means a table cannot be come a leaf. I don't know if there is space
>>>>>>> for another atomic in the PMD level, and we'd have to use a hitching
>>>>>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>>>>>> too..
>>>>>>>
>>>>>>> I wouldn't be so quick to say one is better than the other, but at
>>>>>>> least let's have thought about a locking solution before merging
>>>>>>> refcounts :)
>>>>>>
>>>>>> Yes, absolutely. I can see the beauty in the current approach, because
>>>>>> it just reclaims "automatically" once possible -- page table empty and
>>>>>> nobody is walking it. The downside is that it doesn't always make sense
>>>>>> to reclaim an empty page table immediately once it turns empty.
>>>>>>
>>>>>> Also, it adds complexity for something that is only a problem in some
>>>>>> corner cases -- sparse memory mappings, especially relevant for some
>>>>>> memory allocators after freeing a lot of memory or running VMs with
>>>>>> memory ballooning after inflating the balloon. Some of these use cases
>>>>>> might be good with just triggering page table reclaim manually from user
>>>>>> space.
>>>>>>
>>>>>
>>>>> Yes, this is indeed a problem. Perhaps some flags can be introduced so
>>>>> that the release of page table pages can be delayed in some cases.
>>>>> Similar to the lazyfree mechanism in MADV_FREE?
>>>>
>>>> The issue AFAIU is that once your refcount hits 0 (no more references,
>>>> no more entries), the longer you wait with reclaim, the longer others
>>>> have to wait for populating a fresh page table because the "page table
>>>> to be reclaimed" is still stuck around. You'd have to keep the refcount
>>>> increased for a while, and only drop it after a while. But when? And
>>>> how? IMHO it's not trivial, but maybe there is an easy way to achieve it.
>>>>
>>>
>>> For running VMs with memory ballooning after inflating the balloon, is
>>> this a hot behavior? Even if it is, it is already facing the release and
>>> reallocation of physical pages. The overhead after introducing
>>> pte_refcount is that we need to release and re-allocate page table page.
>>> But 2MB physical pages only corresponds to 4KiB of PTE page table page.
>>> So maybe the overhead is not big.
>>
>> The cases that come to my mind are
>>
>> a) Swapping on shared memory with concurrent access
>> b) Reclaim on file-backed memory with concurrent access
>> c) Free page reporting as implemented by virtio-balloon
>>
>> In all of these cases, you can have someone immediately re-access the
>> page table and re-populate it.
>
> In the performance test shown on the cover, we repeatedly performed
> touch and madvise(MADV_DONTNEED) actions, which simulated the case
> you said above.
>
> We did find a small amount of performance regression, but I think it is
> acceptable, and no new perf hotspots have been added.

That test always accesses 2MiB and does it from a single thread. Things
might (IMHO will) look different when only accessing individual pages
and doing the access from one or multiple separate threads. That's
essentially what a), b) and c) do; they don't follow the pattern you
measured. What you measured rather matches a typical memory allocator.


--
Thanks,

David / dhildenb


2021-11-11 12:32:51

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/11/21 8:20 PM, David Hildenbrand wrote:
> On 11.11.21 13:00, Qi Zheng wrote:
>>
>>
>> On 11/11/21 7:19 PM, David Hildenbrand wrote:
>>> On 11.11.21 12:08, Qi Zheng wrote:
>>>>
>>>>
>>>> On 11/11/21 5:22 PM, David Hildenbrand wrote:
>>>>> On 11.11.21 04:58, Qi Zheng wrote:
>>>>>>
>>>>>>
>>>>>> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>>>>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>>>>>>> is a step into the right direction. If you want to modify *some* page
>>>>>>>>> table in your process you have exclude each and every page table walker.
>>>>>>>>> Or did I mis-interpret what you were saying?
>>>>>>>>
>>>>>>>> That is one possible design, it favours fast walking and penalizes
>>>>>>>> mutation. We could also stick a lock in the PMD (instead of a
>>>>>>>> refcount) and still logically be using a lock instead of a refcount
>>>>>>>> scheme. Remember modify here is "want to change a table pointer into a
>>>>>>>> leaf pointer" so it isn't an every day activity..
>>>>>>>
>>>>>>> It will be if we somewhat frequent when reclaim an empty PTE page table
>>>>>>> as soon as it turns empty. This not only happens when zapping, but also
>>>>>>> during writeback/swapping. So while writing back / swapping you might be
>>>>>>> left with empty page tables to reclaim.
>>>>>>>
>>>>>>> Of course, this is the current approach. Another approach that doesn't
>>>>>>> require additional refcounts is scanning page tables for empty ones and
>>>>>>> reclaiming them. This scanning can either be triggered manually from
>>>>>>> user space or automatically from the kernel.
>>>>>>
>>>>>> Whether it is introducing a special rwsem or scanning an empty page
>>>>>> table, there are two problems as follows:
>>>>>>
>>>>>> #1. When to trigger the scanning or releasing?
>>>>>
>>>>> For example when reclaiming memory, when scanning page tables in
>>>>> khugepaged, or triggered by user space (note that this is the approach I
>>>>> originally looked into). But it certainly requires more locking thought
>>>>> to avoid stopping essentially any page table walker.
>>>>>
>>>>>> #2. Every time to release a 4K page table page, 512 page table
>>>>>> entries need to be scanned.
>>>>>
>>>>> It would happen only when actually trigger reclaim of page tables
>>>>> (again, someone has to trigger it), so it's barely an issue.
>>>>>
>>>>> For example, khugepaged already scans the page tables either way.
>>>>>
>>>>>>
>>>>>> For #1, if the scanning is triggered manually from user space, the
>>>>>> kernel is relatively passive, and the user does not fully know the best
>>>>>> timing to scan. If the scanning is triggered automatically from the
>>>>>> kernel, that is great. But the timing is not easy to confirm, is it
>>>>>> scanned and reclaimed every time zap or try_to_unmap?
>>>>>>
>>>>>> For #2, refcount has advantages.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> There is some advantage with this thinking because it harmonizes well
>>>>>>>> with the other stuff that wants to convert tables into leafs, but has
>>>>>>>> to deal with complicated locking.
>>>>>>>>
>>>>>>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>>>>>>> help with freeing pages. It also puts more atomics in normal fast
>>>>>>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>>>>>>
>>>>>>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>>>>>>> means a table cannot be come a leaf. I don't know if there is space
>>>>>>>> for another atomic in the PMD level, and we'd have to use a hitching
>>>>>>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>>>>>>> too..
>>>>>>>>
>>>>>>>> I wouldn't be so quick to say one is better than the other, but at
>>>>>>>> least let's have thought about a locking solution before merging
>>>>>>>> refcounts :)
>>>>>>>
>>>>>>> Yes, absolutely. I can see the beauty in the current approach, because
>>>>>>> it just reclaims "automatically" once possible -- page table empty and
>>>>>>> nobody is walking it. The downside is that it doesn't always make sense
>>>>>>> to reclaim an empty page table immediately once it turns empty.
>>>>>>>
>>>>>>> Also, it adds complexity for something that is only a problem in some
>>>>>>> corner cases -- sparse memory mappings, especially relevant for some
>>>>>>> memory allocators after freeing a lot of memory or running VMs with
>>>>>>> memory ballooning after inflating the balloon. Some of these use cases
>>>>>>> might be good with just triggering page table reclaim manually from user
>>>>>>> space.
>>>>>>>
>>>>>>
>>>>>> Yes, this is indeed a problem. Perhaps some flags can be introduced so
>>>>>> that the release of page table pages can be delayed in some cases.
>>>>>> Similar to the lazyfree mechanism in MADV_FREE?
>>>>>
>>>>> The issue AFAIU is that once your refcount hits 0 (no more references,
>>>>> no more entries), the longer you wait with reclaim, the longer others
>>>>> have to wait for populating a fresh page table because the "page table
>>>>> to be reclaimed" is still stuck around. You'd have to keep the refcount
>>>>> increased for a while, and only drop it after a while. But when? And
>>>>> how? IMHO it's not trivial, but maybe there is an easy way to achieve it.
>>>>>
>>>>
>>>> For running VMs with memory ballooning after inflating the balloon, is
>>>> this a hot behavior? Even if it is, it is already facing the release and
>>>> reallocation of physical pages. The overhead after introducing
>>>> pte_refcount is that we need to release and re-allocate page table page.
>>>> But 2MB physical pages only corresponds to 4KiB of PTE page table page.
>>>> So maybe the overhead is not big.
>>>
>>> The cases that come to my mind are
>>>
>>> a) Swapping on shared memory with concurrent access
>>> b) Reclaim on file-backed memory with concurrent access
>>> c) Free page reporting as implemented by virtio-balloon
>>>
>>> In all of these cases, you can have someone immediately re-access the
>>> page table and re-populate it.
>>
>> In the performance test shown on the cover, we repeatedly performed
>> touch and madvise(MADV_DONTNEED) actions, which simulated the case
>> you said above.
>>
>> We did find a small amount of performance regression, but I think it is
>> acceptable, and no new perf hotspots have been added.
>
> That test always accesses 2MiB and does it from a single thread. Things
> might (IMHO will) look different when only accessing individual pages
> and doing the access from one/multiple separate threads (that's what

No, it includes multi-threading:

while (1) {
        char *c;
        char *start = mmap_area[cpu];
        char *end = mmap_area[cpu] + FAULT_LENGTH;
        pthread_barrier_wait(&barrier);
        //printf("fault into %p-%p\n",start, end);

        for (c = start; c < end; c += PAGE_SIZE)
                *c = 0;

        pthread_barrier_wait(&barrier);
        for (i = 0; cpu==0 && i < num; i++)
                madvise(mmap_area[i], FAULT_LENGTH, MADV_DONTNEED);
        pthread_barrier_wait(&barrier);
}

The thread on cpu0 uses madvise(MADV_DONTNEED) to release the physical
memory touched by the threads on the other CPUs.

> a),b) and c) essentially do, they don't do it in the pattern you
> measured. what you measured matches rather a typical memory allocator).
>
>

2021-11-11 12:51:43

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages

On 11.11.21 13:32, Qi Zheng wrote:
>
>
> On 11/11/21 8:20 PM, David Hildenbrand wrote:
>> On 11.11.21 13:00, Qi Zheng wrote:
>>>
>>>
>>> On 11/11/21 7:19 PM, David Hildenbrand wrote:
>>>> On 11.11.21 12:08, Qi Zheng wrote:
>>>>>
>>>>>
>>>>> On 11/11/21 5:22 PM, David Hildenbrand wrote:
>>>>>> On 11.11.21 04:58, Qi Zheng wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/11/21 1:37 AM, David Hildenbrand wrote:
>>>>>>>>>> It would still be a fairly coarse-grained locking, I am not sure if that
>>>>>>>>>> is a step into the right direction. If you want to modify *some* page
>>>>>>>>>> table in your process you have exclude each and every page table walker.
>>>>>>>>>> Or did I mis-interpret what you were saying?
>>>>>>>>>
>>>>>>>>> That is one possible design, it favours fast walking and penalizes
>>>>>>>>> mutation. We could also stick a lock in the PMD (instead of a
>>>>>>>>> refcount) and still logically be using a lock instead of a refcount
>>>>>>>>> scheme. Remember modify here is "want to change a table pointer into a
>>>>>>>>> leaf pointer" so it isn't an every day activity..
>>>>>>>>
>>>>>>>> It will be if we somewhat frequent when reclaim an empty PTE page table
>>>>>>>> as soon as it turns empty. This not only happens when zapping, but also
>>>>>>>> during writeback/swapping. So while writing back / swapping you might be
>>>>>>>> left with empty page tables to reclaim.
>>>>>>>>
>>>>>>>> Of course, this is the current approach. Another approach that doesn't
>>>>>>>> require additional refcounts is scanning page tables for empty ones and
>>>>>>>> reclaiming them. This scanning can either be triggered manually from
>>>>>>>> user space or automatically from the kernel.
>>>>>>>
>>>>>>> Whether it is introducing a special rwsem or scanning an empty page
>>>>>>> table, there are two problems as follows:
>>>>>>>
>>>>>>> #1. When to trigger the scanning or releasing?
>>>>>>
>>>>>> For example when reclaiming memory, when scanning page tables in
>>>>>> khugepaged, or triggered by user space (note that this is the approach I
>>>>>> originally looked into). But it certainly requires more locking thought
>>>>>> to avoid stopping essentially any page table walker.
>>>>>>
>>>>>>> #2. Every time to release a 4K page table page, 512 page table
>>>>>>> entries need to be scanned.
>>>>>>
>>>>>> It would happen only when actually trigger reclaim of page tables
>>>>>> (again, someone has to trigger it), so it's barely an issue.
>>>>>>
>>>>>> For example, khugepaged already scans the page tables either way.
>>>>>>
>>>>>>>
>>>>>>> For #1, if the scanning is triggered manually from user space, the
>>>>>>> kernel is relatively passive, and the user does not fully know the best
>>>>>>> timing to scan. If the scanning is triggered automatically from the
>>>>>>> kernel, that is great. But the timing is not easy to confirm, is it
>>>>>>> scanned and reclaimed every time zap or try_to_unmap?
>>>>>>>
>>>>>>> For #2, refcount has advantages.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is some advantage with this thinking because it harmonizes well
>>>>>>>>> with the other stuff that wants to convert tables into leafs, but has
>>>>>>>>> to deal with complicated locking.
>>>>>>>>>
>>>>>>>>> On the other hand, refcounts are a degenerate kind of rwsem and only
>>>>>>>>> help with freeing pages. It also puts more atomics in normal fast
>>>>>>>>> paths since we are refcounting each PTE, not read locking the PMD.
>>>>>>>>>
>>>>>>>>> Perhaps the ideal thing would be to stick a rwsem in the PMD. read
>>>>>>>>> means a table cannot be come a leaf. I don't know if there is space
>>>>>>>>> for another atomic in the PMD level, and we'd have to use a hitching
>>>>>>>>> post/hashed waitq scheme too since there surely isn't room for a waitq
>>>>>>>>> too..
>>>>>>>>>
>>>>>>>>> I wouldn't be so quick to say one is better than the other, but at
>>>>>>>>> least let's have thought about a locking solution before merging
>>>>>>>>> refcounts :)
>>>>>>>>
>>>>>>>> Yes, absolutely. I can see the beauty in the current approach, because
>>>>>>>> it just reclaims "automatically" once possible -- page table empty and
>>>>>>>> nobody is walking it. The downside is that it doesn't always make sense
>>>>>>>> to reclaim an empty page table immediately once it turns empty.
>>>>>>>>
>>>>>>>> Also, it adds complexity for something that is only a problem in some
>>>>>>>> corner cases -- sparse memory mappings, especially relevant for some
>>>>>>>> memory allocators after freeing a lot of memory or running VMs with
>>>>>>>> memory ballooning after inflating the balloon. Some of these use cases
>>>>>>>> might be good with just triggering page table reclaim manually from user
>>>>>>>> space.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, this is indeed a problem. Perhaps some flags can be introduced so
>>>>>>> that the release of page table pages can be delayed in some cases.
>>>>>>> Similar to the lazyfree mechanism in MADV_FREE?
>>>>>>
>>>>>> The issue AFAIU is that once your refcount hits 0 (no more references,
>>>>>> no more entries), the longer you wait with reclaim, the longer others
>>>>>> have to wait for populating a fresh page table because the "page table
>>>>>> to be reclaimed" is still stuck around. You'd have to keep the refcount
>>>>>> increased for a while, and only drop it after a while. But when? And
>>>>>> how? IMHO it's not trivial, but maybe there is an easy way to achieve it.
>>>>>>
>>>>>
>>>>> For running VMs with memory ballooning after inflating the balloon, is
>>>>> this a hot behavior? Even if it is, it is already facing the release and
>>>>> reallocation of physical pages. The overhead after introducing
>>>>> pte_refcount is that we need to release and re-allocate page table page.
>>>>> But 2MB physical pages only corresponds to 4KiB of PTE page table page.
>>>>> So maybe the overhead is not big.
>>>>
>>>> The cases that come to my mind are
>>>>
>>>> a) Swapping on shared memory with concurrent access
>>>> b) Reclaim on file-backed memory with concurrent access
>>>> c) Free page reporting as implemented by virtio-balloon
>>>>
>>>> In all of these cases, you can have someone immediately re-access the
>>>> page table and re-populate it.
>>>
>>> In the performance test shown on the cover, we repeatedly performed
>>> touch and madvise(MADV_DONTNEED) actions, which simulated the case
>>> you said above.
>>>
>>> We did find a small amount of performance regression, but I think it is
>>> acceptable, and no new perf hotspots have been added.
>>
>> That test always accesses 2MiB and does it from a single thread. Things
>> might (IMHO will) look different when only accessing individual pages
>> and doing the access from one/multiple separate threads (that's what
>
> No, it includes multi-threading:
>

Oh sorry, I totally skipped [2].

> while (1) {
> char *c;
> char *start = mmap_area[cpu];
> char *end = mmap_area[cpu] + FAULT_LENGTH;
> pthread_barrier_wait(&barrier);
> //printf("fault into %p-%p\n",start, end);
>
> for (c = start; c < end; c += PAGE_SIZE)
> *c = 0;
>
> pthread_barrier_wait(&barrier);
> for (i = 0; cpu==0 && i < num; i++)
> madvise(mmap_area[i], FAULT_LENGTH, MADV_DONTNEED);
> pthread_barrier_wait(&barrier);
> }
>
> Thread on cpu0 will use madvise(MADV_DONTNEED) to release the physical
> memory of threads on other cpu.
>

I'll have a more detailed look at the benchmark. At a quick glance, it
looks like each thread is also accessing a full 2MiB range, one page at
a time, and one thread is zapping the whole 2MiB range. A single CPU
only accesses memory within one 2MiB range IIRC.

Having multiple threads just access individual pages within a single
2MiB region, and having one thread zap that memory (e.g., to simulate
swapout), could be another benchmark.

We have to make sure to run with THP disabled (e.g., using
madvise(MADV_NOHUGEPAGE) on the complete mapping in the benchmark
eventually), because otherwise you might just be populating+zapping THPs
if they would otherwise be allowed in the environment.
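
To make that concrete, a minimal sketch of such a benchmark could look like
the following. This is only an illustration, not the benchmark from [2]:
REGION_SIZE, NR_THREADS and the barrier scheme are assumptions, the touch
and zap phases alternate rather than racing, and error handling is omitted
(build with -pthread).

/*
 * Sketch only: several threads touch individual pages of one shared
 * 2MiB region (so they all share a single PTE page table), while one
 * thread zaps the whole region with MADV_DONTNEED. The mapping is
 * marked MADV_NOHUGEPAGE so THP cannot hide the PTE-level behaviour.
 */
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE     (2UL << 20)
#define NR_THREADS      4

static char *region;
static pthread_barrier_t barrier;

static void *toucher(void *arg)
{
        long idx = (long)arg;
        unsigned long page = sysconf(_SC_PAGESIZE);
        unsigned long off;

        for (;;) {
                pthread_barrier_wait(&barrier);
                /* touch every NR_THREADS-th page, interleaved between threads */
                for (off = idx * page; off < REGION_SIZE; off += NR_THREADS * page)
                        region[off] = 0;
                pthread_barrier_wait(&barrier);
        }
        return NULL;
}

static void *zapper(void *arg)
{
        (void)arg;
        for (;;) {
                pthread_barrier_wait(&barrier);
                pthread_barrier_wait(&barrier);
                /* "swapout": zap the whole region, then the touchers refault it */
                madvise(region, REGION_SIZE, MADV_DONTNEED);
        }
        return NULL;
}

int main(void)
{
        pthread_t threads[NR_THREADS + 1];
        long i;

        region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        madvise(region, REGION_SIZE, MADV_NOHUGEPAGE);

        pthread_barrier_init(&barrier, NULL, NR_THREADS + 1);
        for (i = 0; i < NR_THREADS; i++)
                pthread_create(&threads[i], NULL, toucher, (void *)i);
        pthread_create(&threads[NR_THREADS], NULL, zapper, NULL);

        pthread_join(threads[0], NULL); /* loops forever */
        return 0;
}

Letting the zapper run concurrently with the touchers, or zapping only part
of the region, would get even closer to the reclaim cases a), b) and c) above.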

--
Thanks,

David / dhildenb


2021-11-11 13:01:52

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v3 00/15] Free user PTE page table pages



On 11/11/21 8:51 PM, David Hildenbrand wrote:

>>>>
>>>> In the performance test shown on the cover, we repeatedly performed
>>>> touch and madvise(MADV_DONTNEED) actions, which simulated the case
>>>> you said above.
>>>>
>>>> We did find a small amount of performance regression, but I think it is
>>>> acceptable, and no new perf hotspots have been added.
>>>
>>> That test always accesses 2MiB and does it from a single thread. Things
>>> might (IMHO will) look different when only accessing individual pages
>>> and doing the access from one/multiple separate threads (that's what
>>
>> No, it includes multi-threading:
>>
>
> Oh sorry, I totally skipped [2].
>
>> while (1) {
>> char *c;
>> char *start = mmap_area[cpu];
>> char *end = mmap_area[cpu] + FAULT_LENGTH;
>> pthread_barrier_wait(&barrier);
>> //printf("fault into %p-%p\n",start, end);
>>
>> for (c = start; c < end; c += PAGE_SIZE)
>> *c = 0;
>>
>> pthread_barrier_wait(&barrier);
>> for (i = 0; cpu==0 && i < num; i++)
>> madvise(mmap_area[i], FAULT_LENGTH, MADV_DONTNEED);
>> pthread_barrier_wait(&barrier);
>> }
>>
>> Thread on cpu0 will use madvise(MADV_DONTNEED) to release the physical
>> memory of threads on other cpu.
>>
>
> I'll have a more detailed look at the benchmark. On a quick glimpse,

Thank you for your time :)

> looks like the threads are also accessing a full 2MiB range, one page at
> a time, and one thread is zapping the whole 2MiB range. A single CPU
> only accesses memory within one 2MiB range IIRC.
>
> Having multiple threads just access individual pages within a single 2
> MiB region, and having one thread zap that memory (e.g., simulate
> swapout) could be another benchmark.

LGTM, I will simulate more scenarios for testing.

>
> We have to make sure to run with THP disabled (e.g., using
> madvise(MADV_NOHUGEPAGE) on the complete mapping in the benchmark
> eventually), because otherwise you might just be populating+zapping THPs
> if they would otherwise be allowed in the environment.

Yes, I turned off THP during testing:

root@~$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

>

--
Thanks,
Qi

2021-11-11 13:47:44

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3 02/15] mm: introduce is_huge_pmd() helper

Hi Qi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]
[also build test ERROR on tip/perf/core tip/x86/core linus/master v5.15 next-20211111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Qi-Zheng/Free-user-PTE-page-table-pages/20211110-185837
base: https://github.com/hnaz/linux-mm master
config: ia64-defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/ce86336fbabb116520ad01162faf5c8d4a1ce124
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qi-Zheng/Free-user-PTE-page-table-pages/20211110-185837
git checkout ce86336fbabb116520ad01162faf5c8d4a1ce124
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=ia64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

mm/memory.c: In function 'copy_pmd_range':
>> mm/memory.c:1149:21: error: implicit declaration of function 'is_huge_pmd'; did you mean 'is_hugepd'? [-Werror=implicit-function-declaration]
1149 | if (is_huge_pmd(*src_pmd)) {
| ^~~~~~~~~~~
| is_hugepd
In file included from <command-line>:
In function 'zap_pmd_range',
inlined from 'zap_pud_range' at mm/memory.c:1499:10,
inlined from 'zap_p4d_range' at mm/memory.c:1520:10,
inlined from 'unmap_page_range' at mm/memory.c:1541:10:
include/linux/compiler_types.h:335:45: error: call to '__compiletime_assert_304' declared with attribute error: BUILD_BUG failed
335 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:316:25: note: in definition of macro '__compiletime_assert'
316 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:335:9: note: in expansion of macro '_compiletime_assert'
335 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
include/linux/huge_mm.h:328:27: note: in expansion of macro 'BUILD_BUG'
328 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
mm/memory.c:1444:44: note: in expansion of macro 'HPAGE_PMD_SIZE'
1444 | if (next - addr != HPAGE_PMD_SIZE)
| ^~~~~~~~~~~~~~
cc1: some warnings being treated as errors
--
mm/mprotect.c: In function 'change_pmd_range':
>> mm/mprotect.c:260:21: error: implicit declaration of function 'is_huge_pmd'; did you mean 'is_hugepd'? [-Werror=implicit-function-declaration]
260 | if (is_huge_pmd(*pmd)) {
| ^~~~~~~~~~~
| is_hugepd
In file included from <command-line>:
In function 'change_pmd_range',
inlined from 'change_pud_range' at mm/mprotect.c:307:12,
inlined from 'change_p4d_range' at mm/mprotect.c:327:12,
inlined from 'change_protection_range' at mm/mprotect.c:352:12:
include/linux/compiler_types.h:335:45: error: call to '__compiletime_assert_298' declared with attribute error: BUILD_BUG failed
335 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:316:25: note: in definition of macro '__compiletime_assert'
316 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:335:9: note: in expansion of macro '_compiletime_assert'
335 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
include/linux/huge_mm.h:328:27: note: in expansion of macro 'BUILD_BUG'
328 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
mm/mprotect.c:261:44: note: in expansion of macro 'HPAGE_PMD_SIZE'
261 | if (next - addr != HPAGE_PMD_SIZE) {
| ^~~~~~~~~~~~~~
cc1: some warnings being treated as errors
--
mm/mremap.c: In function 'move_page_tables':
>> mm/mremap.c:535:21: error: implicit declaration of function 'is_huge_pmd'; did you mean 'is_hugepd'? [-Werror=implicit-function-declaration]
535 | if (is_huge_pmd(*old_pmd)) {
| ^~~~~~~~~~~
| is_hugepd
In file included from <command-line>:
include/linux/compiler_types.h:335:45: error: call to '__compiletime_assert_304' declared with attribute error: BUILD_BUG failed
335 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:316:25: note: in definition of macro '__compiletime_assert'
316 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:335:9: note: in expansion of macro '_compiletime_assert'
335 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
include/linux/huge_mm.h:328:27: note: in expansion of macro 'BUILD_BUG'
328 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
mm/mremap.c:536:39: note: in expansion of macro 'HPAGE_PMD_SIZE'
536 | if (extent == HPAGE_PMD_SIZE &&
| ^~~~~~~~~~~~~~
cc1: some warnings being treated as errors


vim +1149 mm/memory.c

  1132	
  1133	static inline int
  1134	copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
  1135	               pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
  1136	               unsigned long end)
  1137	{
  1138	        struct mm_struct *dst_mm = dst_vma->vm_mm;
  1139	        struct mm_struct *src_mm = src_vma->vm_mm;
  1140	        pmd_t *src_pmd, *dst_pmd;
  1141	        unsigned long next;
  1142	
  1143	        dst_pmd = pmd_alloc(dst_mm, dst_pud, addr);
  1144	        if (!dst_pmd)
  1145	                return -ENOMEM;
  1146	        src_pmd = pmd_offset(src_pud, addr);
  1147	        do {
  1148	                next = pmd_addr_end(addr, end);
> 1149	                if (is_huge_pmd(*src_pmd)) {
  1150	                        int err;
  1151	                        VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
  1152	                        err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
  1153	                                            addr, dst_vma, src_vma);
  1154	                        if (err == -ENOMEM)
  1155	                                return -ENOMEM;
  1156	                        if (!err)
  1157	                                continue;
  1158	                        /* fall through */
  1159	                }
  1160	                if (pmd_none_or_clear_bad(src_pmd))
  1161	                        continue;
  1162	                if (copy_pte_range(dst_vma, src_vma, dst_pmd, src_pmd,
  1163	                                   addr, next))
  1164	                        return -ENOMEM;
  1165	        } while (dst_pmd++, src_pmd++, addr = next, addr != end);
  1166	        return 0;
  1167	}
  1168	
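
For context: the ia64 defconfig builds without CONFIG_TRANSPARENT_HUGEPAGE,
so the new is_huge_pmd() helper is apparently not visible there, and once the
compiler can no longer constant-fold that check it also stops eliminating the
HPAGE_PMD_SIZE users, which is where the BUILD_BUG() failures come from. A
plausible shape for the helper (a hedged sketch of what a !THP-safe
definition could look like, not necessarily the fix used in the series)
would be:

/*
 * Sketch only; the exact definition in the series may differ. The
 * assumption is that is_huge_pmd() wraps the usual swap/huge/devmap
 * PMD checks and needs a constant-false variant when THP is disabled,
 * so that callers guarded by it are still dead-code eliminated.
 */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline bool is_huge_pmd(pmd_t pmd)
{
        return is_swap_pmd(pmd) || pmd_trans_huge(pmd) || pmd_devmap(pmd);
}
#else
static inline bool is_huge_pmd(pmd_t pmd)
{
        return false;
}
#endif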

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (8.89 kB)
.config.gz (19.59 kB)

2021-11-14 14:43:59

by kernel test robot

[permalink] [raw]
Subject: [mm/pte_ref] afcc9fb874: kernel_BUG_at_include/linux/pte_ref.h



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: afcc9fb8741f26773a381ac1e159e0172344b7d5 ("[PATCH v3 13/15] mm/pte_ref: free user PTE page table pages")
url: https://github.com/0day-ci/linux/commits/Qi-Zheng/Free-user-PTE-page-table-pages/20211110-185837
base: https://github.com/hnaz/linux-mm master
patch link: https://lore.kernel.org/linux-doc/[email protected]

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------+------------+------------+
| | e249f0fa9a | afcc9fb874 |
+------------------------------------------+------------+------------+
| boot_successes | 16 | 0 |
| boot_failures | 0 | 14 |
| kernel_BUG_at_include/linux/pte_ref.h | 0 | 14 |
| invalid_opcode:#[##] | 0 | 14 |
| RIP:destroy_args | 0 | 14 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 14 |
+------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 7.245922][ T1] kernel BUG at include/linux/pte_ref.h:56!
[ 7.269161][ T1] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 7.271019][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc7-mm1-00448-gafcc9fb8741f #1
[ 7.273761][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 7.276418][ T1] RIP: 0010:destroy_args (include/linux/pte_ref.h:56 include/linux/pte_ref.h:123 mm/debug_vm_pgtable.c:1051)
[ 7.277992][ T1] Code: 6b 58 4c 8b 2b 49 8b 3c 24 e8 c6 38 b4 fe 48 c1 e0 06 48 03 05 aa eb 4c ff 8b 50 30 81 e2 00 02 00 f0 81 fa 00 00 00 f0 74 02 <0f> 0b f0 83 68 20 01 75 15 48 89 ea 4c 89 e6 4c 89 ef 48 81 e2 00
All code
========
0: 6b 58 4c 8b imul $0xffffff8b,0x4c(%rax),%ebx
4: 2b 49 8b sub -0x75(%rcx),%ecx
7: 3c 24 cmp $0x24,%al
9: e8 c6 38 b4 fe callq 0xfffffffffeb438d4
e: 48 c1 e0 06 shl $0x6,%rax
12: 48 03 05 aa eb 4c ff add -0xb31456(%rip),%rax # 0xffffffffff4cebc3
19: 8b 50 30 mov 0x30(%rax),%edx
1c: 81 e2 00 02 00 f0 and $0xf0000200,%edx
22: 81 fa 00 00 00 f0 cmp $0xf0000000,%edx
28: 74 02 je 0x2c
2a:* 0f 0b ud2 <-- trapping instruction
2c: f0 83 68 20 01 lock subl $0x1,0x20(%rax)
31: 75 15 jne 0x48
33: 48 89 ea mov %rbp,%rdx
36: 4c 89 e6 mov %r12,%rsi
39: 4c 89 ef mov %r13,%rdi
3c: 48 rex.W
3d: 81 .byte 0x81
3e: e2 00 loop 0x40

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: f0 83 68 20 01 lock subl $0x1,0x20(%rax)
7: 75 15 jne 0x1e
9: 48 89 ea mov %rbp,%rdx
c: 4c 89 e6 mov %r12,%rsi
f: 4c 89 ef mov %r13,%rdi
12: 48 rex.W
13: 81 .byte 0x81
14: e2 00 loop 0x16
[ 7.283473][ T1] RSP: 0000:ffffc90000013da0 EFLAGS: 00010206
[ 7.285295][ T1] RAX: ffffea0000000000 RBX: ffffc90000013dc8 RCX: 0000000000000000
[ 7.287675][ T1] RDX: 00000000f0000200 RSI: ffffffff823848b5 RDI: 0000000000000000
[ 7.290056][ T1] RBP: 000024b4af3bd000 R08: 0000000000000001 R09: 0000000000000040
[ 7.292449][ T1] R10: ffff88842fc2fb60 R11: ffffc90000013d00 R12: ffff88812da63000
[ 7.294926][ T1] R13: ffff88810ca08c00 R14: 0000000140000067 R15: 0000000000000027
[ 7.297349][ T1] FS: 0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[ 7.300020][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.301949][ T1] CR2: 0000000000000000 CR3: 0000000002612000 CR4: 00000000000006f0
[ 7.304153][ T1] Call Trace:
[ 7.306975][ T1] <TASK>
[ 7.307966][ T1] debug_vm_pgtable (mm/debug_vm_pgtable.c:1334)
[ 7.309435][ T1] ? init_args (mm/debug_vm_pgtable.c:1241)
[ 7.310773][ T1] do_one_initcall (init/main.c:1303)
[ 7.312212][ T1] kernel_init_freeable (init/main.c:1377 init/main.c:1394 init/main.c:1413 init/main.c:1618)
[ 7.313728][ T1] ? rest_init (init/main.c:1499)
[ 7.315002][ T1] kernel_init (init/main.c:1509)
[ 7.316368][ T1] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 7.317692][ T1] </TASK>
[ 7.318697][ T1] Modules linked in:
[ 7.320060][ T1] ---[ end trace 1f2bbe378e842286 ]---
[ 7.321766][ T1] RIP: 0010:destroy_args (include/linux/pte_ref.h:56 include/linux/pte_ref.h:123 mm/debug_vm_pgtable.c:1051)
[ 7.323325][ T1] Code: 6b 58 4c 8b 2b 49 8b 3c 24 e8 c6 38 b4 fe 48 c1 e0 06 48 03 05 aa eb 4c ff 8b 50 30 81 e2 00 02 00 f0 81 fa 00 00 00 f0 74 02 <0f> 0b f0 83 68 20 01 75 15 48 89 ea 4c 89 e6 4c 89 ef 48 81 e2 00
All code
========
0: 6b 58 4c 8b imul $0xffffff8b,0x4c(%rax),%ebx
4: 2b 49 8b sub -0x75(%rcx),%ecx
7: 3c 24 cmp $0x24,%al
9: e8 c6 38 b4 fe callq 0xfffffffffeb438d4
e: 48 c1 e0 06 shl $0x6,%rax
12: 48 03 05 aa eb 4c ff add -0xb31456(%rip),%rax # 0xffffffffff4cebc3
19: 8b 50 30 mov 0x30(%rax),%edx
1c: 81 e2 00 02 00 f0 and $0xf0000200,%edx
22: 81 fa 00 00 00 f0 cmp $0xf0000000,%edx
28: 74 02 je 0x2c
2a:* 0f 0b ud2 <-- trapping instruction
2c: f0 83 68 20 01 lock subl $0x1,0x20(%rax)
31: 75 15 jne 0x48
33: 48 89 ea mov %rbp,%rdx
36: 4c 89 e6 mov %r12,%rsi
39: 4c 89 ef mov %r13,%rdi
3c: 48 rex.W
3d: 81 .byte 0x81
3e: e2 00 loop 0x40

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: f0 83 68 20 01 lock subl $0x1,0x20(%rax)
7: 75 15 jne 0x1e
9: 48 89 ea mov %rbp,%rdx
c: 4c 89 e6 mov %r12,%rsi
f: 4c 89 ef mov %r13,%rdi
12: 48 rex.W
13: 81 .byte 0x81
14: e2 00 loop 0x16


To reproduce:

# build kernel
cd linux
cp config-5.15.0-rc7-mm1-00448-gafcc9fb8741f .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (7.22 kB)
config-5.15.0-rc7-mm1-00448-gafcc9fb8741f (119.14 kB)
job-script (4.57 kB)
dmesg.xz (11.36 kB)