2021-06-11 19:50:28

by Nathan Chancellor

Subject: vmemmap alloc failure in hot_add_req()

Hi all,

I am occasionally seeing a kernel warning when running virtual machines
in Hyper-V, which usually happens a minute or so after boot. It does not
happen on every boot and it is reproducible on at least v5.10. I think
it might have something to do with constant reboots, which I do when
testing various kernels.

The stack trace is as follows:

[ 49.215291] kworker/0:1: vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), nodemask=(null),cpuset=/,mems_allowed=0
[ 49.215299] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.13.0-rc5 #1
[ 49.215301] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 11/01/2019
[ 49.215302] Workqueue: events hot_add_req [hv_balloon]
[ 49.215307] Call Trace:
[ 49.215310] dump_stack+0x76/0x94
[ 49.215314] warn_alloc.cold+0x78/0xdc
[ 49.215316] ? __alloc_pages+0x200/0x230
[ 49.215319] vmemmap_alloc_block+0x86/0xdc
[ 49.215323] vmemmap_populate+0x10e/0x31c
[ 49.215324] __populate_section_memmap+0x38/0x4e
[ 49.215326] sparse_add_section+0x12c/0x1cf
[ 49.215329] __add_pages+0xa9/0x130
[ 49.215330] add_pages+0x12/0x60
[ 49.215333] add_memory_resource+0x180/0x300
[ 49.215335] __add_memory+0x3b/0x80
[ 49.215336] add_memory+0x2e/0x50
[ 49.215337] hot_add_req+0x3fc/0x5a0 [hv_balloon]
[ 49.215340] process_one_work+0x214/0x3e0
[ 49.215342] worker_thread+0x4d/0x3d0
[ 49.215344] ? process_one_work+0x3e0/0x3e0
[ 49.215345] kthread+0x133/0x150
[ 49.215347] ? kthread_associate_blkcg+0xc0/0xc0
[ 49.215348] ret_from_fork+0x22/0x30
[ 49.215351] Mem-Info:
[ 49.215352] active_anon:251 inactive_anon:140868 isolated_anon:0
active_file:47497 inactive_file:88505 isolated_file:0
unevictable:8 dirty:14 writeback:0
slab_reclaimable:12013 slab_unreclaimable:11403
mapped:131701 shmem:12671 pagetables:3140 bounce:0
free:41388 free_pcp:37 free_cma:0
[ 49.215355] Node 0 active_anon:1004kB inactive_anon:563472kB active_file:189988kB inactive_file:354020kB unevictable:32kB isolated(anon):0kB isolated(file):0kB mapped:526804kB dirty:56kB writeback:0kB shmem:50684kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:5904kB pagetables:12560kB all_unreclaimable? no
[ 49.215358] Node 0 DMA free:6496kB min:480kB low:600kB high:720kB reserved_highatomic:0KB active_anon:0kB inactive_anon:3120kB active_file:2584kB inactive_file:2792kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 49.215361] lowmem_reserve[]: 0 1384 1384 1384 1384
[ 49.215364] Node 0 DMA32 free:159056kB min:44572kB low:55712kB high:66852kB reserved_highatomic:0KB active_anon:1004kB inactive_anon:560352kB active_file:187004kB inactive_file:350864kB unevictable:32kB writepending:56kB present:1555760kB managed:1432388kB mlocked:32kB bounce:0kB free_pcp:172kB local_pcp:0kB free_cma:0kB
[ 49.215367] lowmem_reserve[]: 0 0 0 0 0
[ 49.215369] Node 0 DMA: 17*4kB (UM) 13*8kB (M) 10*16kB (M) 3*32kB (ME) 3*64kB (UME) 4*128kB (UME) 1*256kB (E) 2*512kB (UE) 2*1024kB (ME) 1*2048kB (E) 0*4096kB = 6508kB
[ 49.215377] Node 0 DMA32: 8061*4kB (UME) 5892*8kB (UME) 2449*16kB (UME) 604*32kB (UME) 207*64kB (UME) 49*128kB (UM) 7*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 159716kB
[ 49.215388] 148696 total pagecache pages
[ 49.215388] 0 pages in swap cache
[ 49.215389] Swap cache stats: add 0, delete 0, find 0/0
[ 49.215390] Free swap = 0kB
[ 49.215390] Total swap = 0kB
[ 49.215391] 392939 pages RAM
[ 49.215391] 0 pages HighMem/MovableOnly
[ 49.215391] 31002 pages reserved
[ 49.215392] 0 pages cma reserved
[ 49.215393] 0 pages hwpoisoned

Is this a known issue, or am I doing something wrong? I only noticed
this because, when I compile something intensive in the VM such as LLVM,
the VM sometimes runs out of memory even though the host has plenty of
free memory. I am not sure whether this warning is related to that issue.

I do not have much experience with Hyper-V so it is possible I do not
have something configured properly or there is some other issue going
on. Let me know if there is any further information I can provide or
help debug in any way.

Cheers,
Nathan


2021-06-12 03:44:08

by Nathan Chancellor

Subject: Re: vmemmap alloc failure in hot_add_req()

On Fri, Jun 11, 2021 at 12:48:26PM -0700, Nathan Chancellor wrote:
> Hi all,
>
> I am occasionally seeing a kernel warning when running virtual machines
> in Hyper-V, which usually happens a minute or so after boot. It does not
> happen on every boot and it is reproducible on at least v5.10. I think
> it might have something to do with constant reboots, which I do when
> testing various kernels.
>
> The stack trace is as follows:
>
> [ 49.215291] kworker/0:1: vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), nodemask=(null),cpuset=/,mems_allowed=0
> [ 49.215299] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.13.0-rc5 #1
> [ 49.215301] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 11/01/2019
> [ 49.215302] Workqueue: events hot_add_req [hv_balloon]
> [ 49.215307] Call Trace:
> [ 49.215310] dump_stack+0x76/0x94
> [ 49.215314] warn_alloc.cold+0x78/0xdc
> [ 49.215316] ? __alloc_pages+0x200/0x230
> [ 49.215319] vmemmap_alloc_block+0x86/0xdc
> [ 49.215323] vmemmap_populate+0x10e/0x31c
> [ 49.215324] __populate_section_memmap+0x38/0x4e
> [ 49.215326] sparse_add_section+0x12c/0x1cf
> [ 49.215329] __add_pages+0xa9/0x130
> [ 49.215330] add_pages+0x12/0x60
> [ 49.215333] add_memory_resource+0x180/0x300
> [ 49.215335] __add_memory+0x3b/0x80
> [ 49.215336] add_memory+0x2e/0x50
> [ 49.215337] hot_add_req+0x3fc/0x5a0 [hv_balloon]
> [ 49.215340] process_one_work+0x214/0x3e0
> [ 49.215342] worker_thread+0x4d/0x3d0
> [ 49.215344] ? process_one_work+0x3e0/0x3e0
> [ 49.215345] kthread+0x133/0x150
> [ 49.215347] ? kthread_associate_blkcg+0xc0/0xc0
> [ 49.215348] ret_from_fork+0x22/0x30
> [ 49.215351] Mem-Info:
> [ 49.215352] active_anon:251 inactive_anon:140868 isolated_anon:0
> active_file:47497 inactive_file:88505 isolated_file:0
> unevictable:8 dirty:14 writeback:0
> slab_reclaimable:12013 slab_unreclaimable:11403
> mapped:131701 shmem:12671 pagetables:3140 bounce:0
> free:41388 free_pcp:37 free_cma:0
> [ 49.215355] Node 0 active_anon:1004kB inactive_anon:563472kB active_file:189988kB inactive_file:354020kB unevictable:32kB isolated(anon):0kB isolated(file):0kB mapped:526804kB dirty:56kB writeback:0kB shmem:50684kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:5904kB pagetables:12560kB all_unreclaimable? no
> [ 49.215358] Node 0 DMA free:6496kB min:480kB low:600kB high:720kB reserved_highatomic:0KB active_anon:0kB inactive_anon:3120kB active_file:2584kB inactive_file:2792kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 49.215361] lowmem_reserve[]: 0 1384 1384 1384 1384
> [ 49.215364] Node 0 DMA32 free:159056kB min:44572kB low:55712kB high:66852kB reserved_highatomic:0KB active_anon:1004kB inactive_anon:560352kB active_file:187004kB inactive_file:350864kB unevictable:32kB writepending:56kB present:1555760kB managed:1432388kB mlocked:32kB bounce:0kB free_pcp:172kB local_pcp:0kB free_cma:0kB
> [ 49.215367] lowmem_reserve[]: 0 0 0 0 0
> [ 49.215369] Node 0 DMA: 17*4kB (UM) 13*8kB (M) 10*16kB (M) 3*32kB (ME) 3*64kB (UME) 4*128kB (UME) 1*256kB (E) 2*512kB (UE) 2*1024kB (ME) 1*2048kB (E) 0*4096kB = 6508kB
> [ 49.215377] Node 0 DMA32: 8061*4kB (UME) 5892*8kB (UME) 2449*16kB (UME) 604*32kB (UME) 207*64kB (UME) 49*128kB (UM) 7*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 159716kB
> [ 49.215388] 148696 total pagecache pages
> [ 49.215388] 0 pages in swap cache
> [ 49.215389] Swap cache stats: add 0, delete 0, find 0/0
> [ 49.215390] Free swap = 0kB
> [ 49.215390] Total swap = 0kB
> [ 49.215391] 392939 pages RAM
> [ 49.215391] 0 pages HighMem/MovableOnly
> [ 49.215391] 31002 pages reserved
> [ 49.215392] 0 pages cma reserved
> [ 49.215393] 0 pages hwpoisoned
>
> Is this a known issue and/or am I doing something wrong? I only noticed
> this because there are times when I am compiling something intensive in
> the VM such as LLVM and the VM runs out of memory even though I have
> plenty of free memory on the host but I am not sure if this warning is
> related to that issue.

I just had one of these events happen; journalctl output is attached. I
have no idea what the "unhandled message: type:" messages mean, but I guess
they correspond to the switch statement in balloon_onchannelcallback().

> I do not have much experience with Hyper-V so it is possible I do not
> have something configured properly or there is some other issue going
> on. Let me know if there is any further information I can provide or
> help debug in any way.

Cheers,
Nathan


Attachments:
(No filename) (4.75 kB)
journalctl.log (74.12 kB)

2021-06-14 07:39:42

by David Hildenbrand

Subject: Re: vmemmap alloc failure in hot_add_req()

On 12.06.21 04:11, Hillf Danton wrote:
> On Fri, 11 Jun 2021 12:48:26 -0700 Nathan Chancellor wrote:
>> Hi all,
>>
>> I am occasionally seeing a kernel warning when running virtual machines
>> in Hyper-V, which usually happens a minute or so after boot. It does not
>> happen on every boot and it is reproducible on at least v5.10. I think
>> it might have something to do with constant reboots, which I do when
>> testing various kernels.
>>
>> The stack trace is as follows:
>>
>> [ 49.215291] kworker/0:1: vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), nodemask=(null),cpuset=/,mems_allowed=0
>> [ 49.215299] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.13.0-rc5 #1
>> [ 49.215301] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 11/01/2019
>> [ 49.215302] Workqueue: events hot_add_req [hv_balloon]
>
> Apart from order:9 (mm Cced), events_unbound is the right workqueue instead
> because the report shows the risk that hot_add_req could block other pending
> events longer than thought. Any special reason for the events wq?
>
>> [ 49.215307] Call Trace:
>> [ 49.215310] dump_stack+0x76/0x94
>> [ 49.215314] warn_alloc.cold+0x78/0xdc
>> [ 49.215316] ? __alloc_pages+0x200/0x230
>> [ 49.215319] vmemmap_alloc_block+0x86/0xdc
>> [ 49.215323] vmemmap_populate+0x10e/0x31c
>> [ 49.215324] __populate_section_memmap+0x38/0x4e
>> [ 49.215326] sparse_add_section+0x12c/0x1cf
>> [ 49.215329] __add_pages+0xa9/0x130
>> [ 49.215330] add_pages+0x12/0x60
>> [ 49.215333] add_memory_resource+0x180/0x300
>> [ 49.215335] __add_memory+0x3b/0x80
>> [ 49.215336] add_memory+0x2e/0x50
>> [ 49.215337] hot_add_req+0x3fc/0x5a0 [hv_balloon]
>> [ 49.215340] process_one_work+0x214/0x3e0
>> [ 49.215342] worker_thread+0x4d/0x3d0
>> [ 49.215344] ? process_one_work+0x3e0/0x3e0
>> [ 49.215345] kthread+0x133/0x150
>> [ 49.215347] ? kthread_associate_blkcg+0xc0/0xc0
>> [ 49.215348] ret_from_fork+0x22/0x30
>> [ 49.215351] Mem-Info:
>> [ 49.215352] active_anon:251 inactive_anon:140868 isolated_anon:0
>> active_file:47497 inactive_file:88505 isolated_file:0
>> unevictable:8 dirty:14 writeback:0
>> slab_reclaimable:12013 slab_unreclaimable:11403
>> mapped:131701 shmem:12671 pagetables:3140 bounce:0
>> free:41388 free_pcp:37 free_cma:0
>> [ 49.215355] Node 0 active_anon:1004kB inactive_anon:563472kB active_file:189988kB inactive_file:354020kB unevictable:32kB isolated(anon):0kB isolated(file):0kB mapped:526804kB dirty:56kB writeback:0kB shmem:50684kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:5904kB pagetables:12560kB all_unreclaimable? no
>> [ 49.215358] Node 0 DMA free:6496kB min:480kB low:600kB high:720kB reserved_highatomic:0KB active_anon:0kB inactive_anon:3120kB active_file:2584kB inactive_file:2792kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [ 49.215361] lowmem_reserve[]: 0 1384 1384 1384 1384
>> [ 49.215364] Node 0 DMA32 free:159056kB min:44572kB low:55712kB high:66852kB reserved_highatomic:0KB active_anon:1004kB inactive_anon:560352kB active_file:187004kB inactive_file:350864kB unevictable:32kB writepending:56kB present:1555760kB managed:1432388kB mlocked:32kB bounce:0kB free_pcp:172kB local_pcp:0kB free_cma:0kB
>> [ 49.215367] lowmem_reserve[]: 0 0 0 0 0
>> [ 49.215369] Node 0 DMA: 17*4kB (UM) 13*8kB (M) 10*16kB (M) 3*32kB (ME) 3*64kB (UME) 4*128kB (UME) 1*256kB (E) 2*512kB (UE) 2*1024kB (ME) 1*2048kB (E) 0*4096kB = 6508kB
>> [ 49.215377] Node 0 DMA32: 8061*4kB (UME) 5892*8kB (UME) 2449*16kB (UME) 604*32kB (UME) 207*64kB (UME) 49*128kB (UM) 7*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 159716kB
>> [ 49.215388] 148696 total pagecache pages
>> [ 49.215388] 0 pages in swap cache
>> [ 49.215389] Swap cache stats: add 0, delete 0, find 0/0
>> [ 49.215390] Free swap = 0kB
>> [ 49.215390] Total swap = 0kB
>> [ 49.215391] 392939 pages RAM
>> [ 49.215391] 0 pages HighMem/MovableOnly
>> [ 49.215391] 31002 pages reserved
>> [ 49.215392] 0 pages cma reserved
>> [ 49.215393] 0 pages hwpoisoned
>>
>> Is this a known issue and/or am I doing something wrong? I only noticed
>> this because there are times when I am compiling something intensive in
>> the VM such as LLVM and the VM runs out of memory even though I have
>> plenty of free memory on the host but I am not sure if this warning is
>> related to that issue.

Hi,

Is hotplugged memory getting onlined automatically (either from user
space via a udev script or via the kernel, for example, with
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE)?

If it's not getting onlined, you can easily spot after hotplug, e.g., via
"lsmem", that there are quite a few offline memory blocks.

Note that x86_64 code will fall back from populating huge pages to
populating base pages for the vmemmap; this can happen easily when under
memory pressure.

If adding memory would fail completely, you'd see another "hot_add
memory failed error is ..." error message from hyper-v in the kernel
log. If that doesn't show up, it's simply suboptimal, but hotplugging
memory still succeeded.


Note: we could support "memmap_on_memory" in some cases (e.g., no memory
holes in hotadded range) when hotplugging memory blocks via hyper-v,
which would result in this warning triggering less frequently.

--
Thanks,

David / dhildenb
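
A rough way to check the onlining question without "lsmem" is to read the memory block state files in sysfs directly. The following is a minimal sketch, not something posted in the thread. It assumes the usual Linux layout under /sys/devices/system/memory (one memoryN/state file per block, plus auto_online_blocks for the kernel's default policy); the function names are made up for the example, and the root directory is a parameter so the code can be pointed at a test directory.

```python
from collections import Counter
from pathlib import Path

SYSFS_MEMORY = "/sys/devices/system/memory"  # usual location on Linux


def memory_block_states(root=SYSFS_MEMORY):
    """Count memory blocks by state, e.g. {'online': 12, 'offline': 4}."""
    counts = Counter()
    for state_file in Path(root).glob("memory[0-9]*/state"):
        counts[state_file.read_text().strip()] += 1
    return counts


def auto_online_policy(root=SYSFS_MEMORY):
    """Return the contents of auto_online_blocks, or None if it is absent."""
    path = Path(root) / "auto_online_blocks"
    return path.read_text().strip() if path.exists() else None


if __name__ == "__main__":
    print("auto_online_blocks:", auto_online_policy())
    for state, count in sorted(memory_block_states().items()):
        print(f"{state}: {count} block(s)")
```

If blocks show up as "offline" after a hot-add, the memory was added but never onlined, which is the situation described above.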

2021-06-17 02:44:35

by Nathan Chancellor

Subject: Re: vmemmap alloc failure in hot_add_req()

Hi David,

On 6/14/2021 12:38 AM, David Hildenbrand wrote:
> On 12.06.21 04:11, Hillf Danton wrote:
>> On Fri, 11 Jun 2021 12:48:26 -0700 Nathan Chancellor wrote:
>>> Hi all,
>>>
>>> I am occasionally seeing a kernel warning when running virtual machines
>>> in Hyper-V, which usually happens a minute or so after boot. It does not
>>> happen on every boot and it is reproducible on at least v5.10. I think
>>> it might have something to do with constant reboots, which I do when
>>> testing various kernels.
>>>
>>> The stack trace is as follows:
>>>
>>> [   49.215291] kworker/0:1: vmemmap alloc failure: order:9,
>>> mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL),
>>> nodemask=(null),cpuset=/,mems_allowed=0
>>> [   49.215299] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted
>>> 5.13.0-rc5 #1
>>> [   49.215301] Hardware name: Microsoft Corporation Virtual
>>> Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 11/01/2019
>>> [   49.215302] Workqueue: events hot_add_req [hv_balloon]
>>
>> Apart from order:9 (mm Cced), events_unbound is the right workqueue
>> instead
>> because the report shows the risk that hot_add_req could block other
>> pending
>> events longer than thought. Any special reason for the events wq?
>>
>>> [   49.215307] Call Trace:
>>> [   49.215310]  dump_stack+0x76/0x94
>>> [   49.215314]  warn_alloc.cold+0x78/0xdc
>>> [   49.215316]  ? __alloc_pages+0x200/0x230
>>> [   49.215319]  vmemmap_alloc_block+0x86/0xdc
>>> [   49.215323]  vmemmap_populate+0x10e/0x31c
>>> [   49.215324]  __populate_section_memmap+0x38/0x4e
>>> [   49.215326]  sparse_add_section+0x12c/0x1cf
>>> [   49.215329]  __add_pages+0xa9/0x130
>>> [   49.215330]  add_pages+0x12/0x60
>>> [   49.215333]  add_memory_resource+0x180/0x300
>>> [   49.215335]  __add_memory+0x3b/0x80
>>> [   49.215336]  add_memory+0x2e/0x50
>>> [   49.215337]  hot_add_req+0x3fc/0x5a0 [hv_balloon]
>>> [   49.215340]  process_one_work+0x214/0x3e0
>>> [   49.215342]  worker_thread+0x4d/0x3d0
>>> [   49.215344]  ? process_one_work+0x3e0/0x3e0
>>> [   49.215345]  kthread+0x133/0x150
>>> [   49.215347]  ? kthread_associate_blkcg+0xc0/0xc0
>>> [   49.215348]  ret_from_fork+0x22/0x30
>>> [   49.215351] Mem-Info:
>>> [   49.215352] active_anon:251 inactive_anon:140868 isolated_anon:0
>>>                  active_file:47497 inactive_file:88505 isolated_file:0
>>>                  unevictable:8 dirty:14 writeback:0
>>>                  slab_reclaimable:12013 slab_unreclaimable:11403
>>>                  mapped:131701 shmem:12671 pagetables:3140 bounce:0
>>>                  free:41388 free_pcp:37 free_cma:0
>>> [   49.215355] Node 0 active_anon:1004kB inactive_anon:563472kB
>>> active_file:189988kB inactive_file:354020kB unevictable:32kB
>>> isolated(anon):0kB isolated(file):0kB mapped:526804kB dirty:56kB
>>> writeback:0kB shmem:50684kB shmem_thp: 0kB shmem_pmdmapped: 0kB
>>> anon_thp: 0kB writeback_tmp:0kB kernel_stack:5904kB
>>> pagetables:12560kB all_unreclaimable? no
>>> [   49.215358] Node 0 DMA free:6496kB min:480kB low:600kB high:720kB
>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:3120kB
>>> active_file:2584kB inactive_file:2792kB unevictable:0kB
>>> writepending:0kB present:15996kB managed:15360kB mlocked:0kB
>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>> [   49.215361] lowmem_reserve[]: 0 1384 1384 1384 1384
>>> [   49.215364] Node 0 DMA32 free:159056kB min:44572kB low:55712kB
>>> high:66852kB reserved_highatomic:0KB active_anon:1004kB
>>> inactive_anon:560352kB active_file:187004kB inactive_file:350864kB
>>> unevictable:32kB writepending:56kB present:1555760kB
>>> managed:1432388kB mlocked:32kB bounce:0kB free_pcp:172kB
>>> local_pcp:0kB free_cma:0kB
>>> [   49.215367] lowmem_reserve[]: 0 0 0 0 0
>>> [   49.215369] Node 0 DMA: 17*4kB (UM) 13*8kB (M) 10*16kB (M) 3*32kB
>>> (ME) 3*64kB (UME) 4*128kB (UME) 1*256kB (E) 2*512kB (UE) 2*1024kB
>>> (ME) 1*2048kB (E) 0*4096kB = 6508kB
>>> [   49.215377] Node 0 DMA32: 8061*4kB (UME) 5892*8kB (UME) 2449*16kB
>>> (UME) 604*32kB (UME) 207*64kB (UME) 49*128kB (UM) 7*256kB (M) 1*512kB
>>> (M) 0*1024kB 0*2048kB 0*4096kB = 159716kB
>>> [   49.215388] 148696 total pagecache pages
>>> [   49.215388] 0 pages in swap cache
>>> [   49.215389] Swap cache stats: add 0, delete 0, find 0/0
>>> [   49.215390] Free swap  = 0kB
>>> [   49.215390] Total swap = 0kB
>>> [   49.215391] 392939 pages RAM
>>> [   49.215391] 0 pages HighMem/MovableOnly
>>> [   49.215391] 31002 pages reserved
>>> [   49.215392] 0 pages cma reserved
>>> [   49.215393] 0 pages hwpoisoned
>>>
>>> Is this a known issue and/or am I doing something wrong? I only noticed
>>> this because there are times when I am compiling something intensive in
>>> the VM such as LLVM and the VM runs out of memory even though I have
>>> plenty of free memory on the host but I am not sure if this warning is
>>> related to that issue.
>
> Hi,
>
> Is hotplugged memory getting onlined automatically (either from user
> space via a udev script or via the kernel, for example, with
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE)?

It does look like this kernel configuration has
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.

> If it's not getting onlined, you can easily spot after hotplug, e.g., via
> "lsmem", that there are quite a few offline memory blocks.
>
> Note that x86_64 code will fall back from populating huge pages to
> populating base pages for the vmemmap; this can happen easily when under
> memory pressure.

Not sure if it is relevant or not but this warning can show up within a
minute of startup without me doing anything in particular.

> If adding memory would fail completely, you'd see another "hot_add
> memory failed error is ..." error message from hyper-v in the kernel
> log. If that doesn't show up, it's simply suboptimal, but hotplugging
> memory still succeeded.

I did notice that from the code in hv_balloon.c but I do not think I
have ever seen that message in my logs.

> Note: we could support "memmap_on_memory" in some cases (e.g., no memory
> holes in hotadded range) when hotplugging memory blocks via hyper-v,
> which would result in this warning triggering less frequently.

Cheers,
Nathan

2021-06-17 08:44:01

by David Hildenbrand

Subject: Re: vmemmap alloc failure in hot_add_req()

> It does look like this kernel configuration has
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.

Okay, so then it's most likely really more of an issue with fragmented
physical memory -- which is suboptimal but not a show blocker in your setup.

(there are still cases where memory onlining can fail, especially with
kasan running, but these are rather corner cases)

>
>> If it's not getting onlined, you can easily spot after hotplug, e.g., via
>> "lsmem", that there are quite a few offline memory blocks.
>>
>> Note that x86_64 code will fall back from populating huge pages to
>> populating base pages for the vmemmap; this can happen easily when under
>> memory pressure.
>
> Not sure if it is relevant or not but this warning can show up within a
> minute of startup without me doing anything in particular.

I remember that Hyper-V will start with a certain (configured) boot VM
memory size and once the guest is up and running, use memory stats of
the guest to decide whether to add (hotplug) or remove (balloon inflate)
memory from the VM.

So this could just be Hyper-V trying to apply its heuristics.

>
>> If adding memory would fail completely, you'd see another "hot_add
>> memory failed error is ..." error message from hyper-v in the kernel
>> log. If that doesn't show up, it's simply suboptimal, but hotplugging
>> memory still succeeded.
>
> I did notice that from the code in hv_balloon.c but I do not think I
> have ever seen that message in my logs.

Okay, so at least hotplugging memory is working.

--
Thanks,

David / dhildenb
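
The distinction drawn above (a vmemmap warning alone versus an actual hot-add failure) can be checked mechanically against a saved kernel log. The sketch below is an illustration only. It searches for the two message fragments quoted in this thread, "vmemmap alloc failure" and "hot_add memory failed error is"; the function name and verdict strings are made up for the example, and the log file is assumed to have been saved beforehand, for instance with "journalctl -k > kernel.log".

```python
import sys

# Message fragments as quoted in this thread.
WARN = "vmemmap alloc failure"
FAIL = "hot_add memory failed error is"


def summarize(lines):
    """Return (warnings, failures, verdict) for an iterable of log lines."""
    warnings = failures = 0
    for line in lines:
        if WARN in line:
            warnings += 1
        if FAIL in line:
            failures += 1
    if failures:
        verdict = "hot-add failed at least once"
    elif warnings:
        verdict = "warning only: no hot-add failure logged"
    else:
        verdict = "no hot-add problems logged"
    return warnings, failures, verdict


# Usage: python3 scan.py kernel.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        warnings, failures, verdict = summarize(log)
    print(f"{warnings} vmemmap warning(s), {failures} hot_add failure(s)")
    print(verdict)
```

A "warning only" result corresponds to the case described above: suboptimal, but hotplugging memory still succeeded.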

2021-06-18 04:24:50

by Michael Kelley (LINUX)

Subject: RE: vmemmap alloc failure in hot_add_req()

From: David Hildenbrand <[email protected]> Sent: Thursday, June 17, 2021 1:43 AM
>
> > It does look like this kernel configuration has
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.
>
> Okay, so then it's most likely really more of an issue with fragmented
> physical memory -- which is suboptimal but not a show blocker in your setup.
>
> (there are still cases where memory onlining can fail, especially with
> kasan running, but these are rather corner cases)
>
> >
> >> If it's not getting onlined, you can easily spot after hotplug, e.g., via
> >> "lsmem", that there are quite a few offline memory blocks.
> >>
> >> Note that x86_64 code will fall back from populating huge pages to
> >> populating base pages for the vmemmap; this can happen easily when under
> >> memory pressure.
> >
> > Not sure if it is relevant or not but this warning can show up within a
> > minute of startup without me doing anything in particular.
>
> I remember that Hyper-V will start with a certain (configured) boot VM
> memory size and once the guest is up and running, use memory stats of
> the guest to decide whether to add (hotplug) or remove (balloon inflate)
> memory from the VM.
>
> So this could just be Hyper-V trying to apply its heuristics.

Nathan --

Could you clarify if your VM is running in the context of the Windows
Subsystem for Linux (WSL) v2 feature in Windows 10? Or are you
running a "traditional" VM created using the Hyper-V Manager UI
or PowerShell?

If the latter, how do you have the memory configuration set up? In
the UI, first you can specify the RAM allocated to the VM. Then
separately, you can enable the "Dynamic Memory" feature, in which
case you also specify a "Minimum RAM" and "Maximum RAM". It
looks like you must have the "Dynamic Memory" feature enabled
since the original stack trace includes the hot_add_req() function
from the hv_balloon driver.

The Dynamic Memory feature is generally used only when you
need to allow Hyper-V to manage the allocation of physical memory
across multiple VMs. Dynamic Memory is essentially Hyper-V's way of
allowing memory overcommit. If you don't need that capability,
turning off Dynamic Memory and just specifying the amount of
memory you want to assign to the VM is the best course of action.

With Dynamic Memory enabled, you may have encountered a
situation where the memory needs of the VM grew very quickly,
and the Hyper-V balloon driver got into a situation where it needed
to allocate memory in order to add memory, and it couldn't. If
you want to continue to use the Dynamic Memory feature, then
you probably need to increase the initial amount of RAM assigned
to the VM (the "RAM" setting in the Hyper-V Manager UI).

Michael

>
> >
> >> If adding memory would fail completely, you'd see another "hot_add
> >> memory failed error is ..." error message from hyper-v in the kernel
> >> log. If that doesn't show up, it's simply suboptimal, but hotplugging
> >> memory still succeeded.
> >
> > I did notice that from the code in hv_balloon.c but I do not think I
> > have ever seen that message in my logs.
>
> Okay, so at least hotplugging memory is working.
>
> --
> Thanks,
>
> David / dhildenb

2021-06-19 07:04:59

by Nathan Chancellor

Subject: Re: vmemmap alloc failure in hot_add_req()

Hi Michael,

On 6/17/2021 5:16 PM, Michael Kelley wrote:
> From: David Hildenbrand <[email protected]> Sent: Thursday, June 17, 2021 1:43 AM
>>
>>> It does look like this kernel configuration has
>>> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.
>>
>> Okay, so then it's most likely really more of an issue with fragmented
>> physical memory -- which is suboptimal but not a show blocker in your setup.
>>
>> (there are still cases where memory onlining can fail, especially with
>> kasan running, but these are rather corner cases)
>>
>>>
>>>> If it's not getting onlined, you can easily spot after hotplug, e.g., via
>>>> "lsmem", that there are quite a few offline memory blocks.
>>>>
>>>> Note that x86_64 code will fall back from populating huge pages to
>>>> populating base pages for the vmemmap; this can happen easily when under
>>>> memory pressure.
>>>
>>> Not sure if it is relevant or not but this warning can show up within a
>>> minute of startup without me doing anything in particular.
>>
>> I remember that Hyper-V will start with a certain (configured) boot VM
>> memory size and once the guest is up and running, use memory stats of
>> the guest to decide whether to add (hotplug) or remove (balloon inflate)
>> memory from the VM.
>>
>> So this could just be Hyper-V trying to apply its heuristics.
>
> Nathan --
>
> Could you clarify if your VM is running in the context of the Windows
> Subsystem for Linux (WSL) v2 feature in Windows 10? Or are you
> running a "traditional" VM created using the Hyper-V Manager UI
> or PowerShell?

This is a traditional VM created using the Hyper-V Manager.

> If the latter, how do you have the memory configuration set up? In
> the UI, first you can specify the RAM allocated to the VM. Then
> separately, you can enable the "Dynamic Memory" feature, in which
> case you also specify a "Minimum RAM" and "Maximum RAM". It
> looks like you must have the "Dynamic Memory" feature enabled
> since the original stack trace includes the hot_add_req() function
> from the hv_balloon driver.

That is correct. I believe Dynamic Memory is the default setting, so I
just left it as it was. The startup memory for this virtual machine is
2GB, as it is a lightweight Arch Linux Xfce4 configuration. Aside from
occasionally compiling software, it mostly sits idle because it is
mainly there for testing kernels.

> The Dynamic Memory feature is generally used only when you
> need to allow Hyper-V to manage the allocation of physical memory
> across multiple VMs. Dynamic Memory is essentially Hyper-V's way of
> allowing memory overcommit. If you don't need that capability,
> turning off Dynamic Memory and just specifying the amount of
> memory you want to assign to the VM is the best course of action.

Ack. My workstation was occasionally memory constrained, so I figured
relying on the Dynamic Memory feature would make sense. I upgraded my
RAM today, so I will probably just end up disabling the Dynamic Memory
feature and allocating the memory up front.

> With Dynamic Memory enabled, you may have encountered a
> situation where the memory needs of the VM grew very quickly,
> and the Hyper-V balloon driver got into a situation where it needed
> to allocate memory in order to add memory, and it couldn't. If
> you want to continue to use the Dynamic Memory feature, then
> you probably need to increase the initial amount of RAM assigned
> to the VM (the "RAM" setting in the Hyper-V Manager UI).

I will keep that in mind and see if I can find a good number.

Thanks for the reply!

Cheers,
Nathan