2013-06-24 19:02:03

by Vinod, Chegu

[permalink] [raw]
Subject: kvm_intel: Could not allocate 42 bytes percpu data


Hello,

Lots (~700+) of the following messages are showing up in the dmesg of a
3.10-rc1 based kernel (Host OS is running on a large socket count box
with HT-on).

[ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from
reserved chunk failed
[ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data

... also call traces like the following...

[ 101.852136] ffffc901ad5aa090 ffff88084675dd08 ffffffff81633743
ffff88084675ddc8
[ 101.860889] ffffffff81145053 ffffffff81f3fa78 ffff88084809dd40
ffff8907d1cfd2e8
[ 101.869466] ffff8907d1cfd280 ffff88087fffdb08 ffff88084675c010
ffff88084675dfd8
[ 101.878190] Call Trace:
[ 101.880953] [<ffffffff81633743>] dump_stack+0x19/0x1e
[ 101.886679] [<ffffffff81145053>] pcpu_alloc+0x9a3/0xa40
[ 101.892754] [<ffffffff81145103>] __alloc_reserved_percpu+0x13/0x20
[ 101.899733] [<ffffffff810b2d7f>] load_module+0x35f/0x1a70
[ 101.905835] [<ffffffff8163ad6e>] ? do_page_fault+0xe/0x10
[ 101.911953] [<ffffffff810b467b>] SyS_init_module+0xfb/0x140
[ 101.918287] [<ffffffff8163f542>] system_call_fastpath+0x16/0x1b
[ 101.924981] kvm_intel: Could not allocate 42 bytes percpu data


Wondering if anyone else has seen this with the recent [3.10] based
kernels, especially on larger boxes?

There was a similar issue reported earlier (where modules were being
loaded per cpu without checking if an instance was already loaded or
being loaded). That issue seems to have been addressed in the recent
past (e.g. https://lkml.org/lkml/2013/1/24/659 along with a couple of
follow-on cleanups). Is the above yet another variant of the original
issue, or perhaps some race condition that got exposed when there are a
lot more threads?

Vinod



2013-06-24 22:52:50

by Prarit Bhargava

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data



On 06/24/2013 03:01 PM, Chegu Vinod wrote:
>
> Hello,
>
> Lots (~700+) of the following messages are showing up in the dmesg of a 3.10-rc1
> based kernel (Host OS is running on a large socket count box with HT-on).
>
> [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from reserved
> chunk failed
> [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data

On 3.10? Geez. I thought we had fixed this. I'll grab a big machine and see
if I can debug.

Rusty -- any ideas off the top of your head?
>
> ... also call traces like the following...
>
> [ 101.852136] ffffc901ad5aa090 ffff88084675dd08 ffffffff81633743 ffff88084675ddc8
> [ 101.860889] ffffffff81145053 ffffffff81f3fa78 ffff88084809dd40 ffff8907d1cfd2e8
> [ 101.869466] ffff8907d1cfd280 ffff88087fffdb08 ffff88084675c010 ffff88084675dfd8
> [ 101.878190] Call Trace:
> [ 101.880953] [<ffffffff81633743>] dump_stack+0x19/0x1e
> [ 101.886679] [<ffffffff81145053>] pcpu_alloc+0x9a3/0xa40
> [ 101.892754] [<ffffffff81145103>] __alloc_reserved_percpu+0x13/0x20
> [ 101.899733] [<ffffffff810b2d7f>] load_module+0x35f/0x1a70
> [ 101.905835] [<ffffffff8163ad6e>] ? do_page_fault+0xe/0x10
> [ 101.911953] [<ffffffff810b467b>] SyS_init_module+0xfb/0x140
> [ 101.918287] [<ffffffff8163f542>] system_call_fastpath+0x16/0x1b
> [ 101.924981] kvm_intel: Could not allocate 42 bytes percpu data
>
>
> Wondering if anyone else has seen this with the recent [3.10] based kernels,
> especially on larger boxes?
>
> There was a similar issue reported earlier (where modules were being loaded per
> cpu without checking if an instance was already loaded or being loaded). That
> issue seems to have been addressed in the recent past (e.g.
> https://lkml.org/lkml/2013/1/24/659 along with a couple of follow-on cleanups).
> Is the above yet another variant of the original issue, or perhaps some race
> condition that got exposed when there are a lot more threads?

Hmm ... not sure but yeah, that's the likely culprit.

P.

2013-06-27 01:28:16

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data

On Mon, Jun 24, 2013 at 06:52:44PM -0400, Prarit Bhargava wrote:
>
>
> On 06/24/2013 03:01 PM, Chegu Vinod wrote:
> >
> > Hello,
> >
> > Lots (~700+) of the following messages are showing up in the dmesg of a 3.10-rc1
> > based kernel (Host OS is running on a large socket count box with HT-on).
> >
> > [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from reserved
> > chunk failed
> > [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data
>
> On 3.10? Geez. I thought we had fixed this. I'll grab a big machine and see
> if I can debug.
>
> Rusty -- any ideas off the top of your head?

As far as my limited understanding goes, the reserved space set up by
arch code for percpu allocations is limited and subject to exhaustion.

It would be best if the allocator could handle the allocation, but
otherwise, switching vmx.c to dynamic allocations for the percpu
regions is an option (see commit 013f6a5d3dd9e4).

A similar conversion should work for these two larger data structures:

static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
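For illustration, a hedged sketch of what such a conversion could look like. The variable name mirrors the existing vmx.c percpu data, but the init/exit wiring here is an assumption for this sketch, not the actual code of commit 013f6a5d3dd9e4:

```c
/* Sketch: replacing a static DEFINE_PER_CPU variable with a dynamically
 * allocated percpu region, so the module no longer draws from the small
 * reserved percpu chunk at load time. Illustrative, not the real vmx.c. */
#include <linux/percpu.h>
#include <linux/list.h>

/* Before: static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu); */
static struct list_head __percpu *loaded_vmcss_on_cpu;

static int vmx_percpu_init(void)
{
	int cpu;

	/* Allocated from the dynamic percpu area, not the reserved chunk. */
	loaded_vmcss_on_cpu = alloc_percpu(struct list_head);
	if (!loaded_vmcss_on_cpu)
		return -ENOMEM;

	for_each_possible_cpu(cpu)
		INIT_LIST_HEAD(per_cpu_ptr(loaded_vmcss_on_cpu, cpu));
	return 0;
}

static void vmx_percpu_exit(void)
{
	free_percpu(loaded_vmcss_on_cpu);
}
```

Accesses would then go through per_cpu_ptr()/this_cpu_ptr() on the allocated pointer instead of per_cpu() on the static variable.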


> >
> > ... also call traces like the following...
> >
> > [ 101.852136] ffffc901ad5aa090 ffff88084675dd08 ffffffff81633743 ffff88084675ddc8
> > [ 101.860889] ffffffff81145053 ffffffff81f3fa78 ffff88084809dd40 ffff8907d1cfd2e8
> > [ 101.869466] ffff8907d1cfd280 ffff88087fffdb08 ffff88084675c010 ffff88084675dfd8
> > [ 101.878190] Call Trace:
> > [ 101.880953] [<ffffffff81633743>] dump_stack+0x19/0x1e
> > [ 101.886679] [<ffffffff81145053>] pcpu_alloc+0x9a3/0xa40
> > [ 101.892754] [<ffffffff81145103>] __alloc_reserved_percpu+0x13/0x20
> > [ 101.899733] [<ffffffff810b2d7f>] load_module+0x35f/0x1a70
> > [ 101.905835] [<ffffffff8163ad6e>] ? do_page_fault+0xe/0x10
> > [ 101.911953] [<ffffffff810b467b>] SyS_init_module+0xfb/0x140
> > [ 101.918287] [<ffffffff8163f542>] system_call_fastpath+0x16/0x1b
> > [ 101.924981] kvm_intel: Could not allocate 42 bytes percpu data
> >
> >
> > Wondering if anyone else has seen this with the recent [3.10] based kernels,
> > especially on larger boxes?
>
> > There was a similar issue reported earlier (where modules were being loaded per
> > cpu without checking if an instance was already loaded or being loaded). That
> > issue seems to have been addressed in the recent past (e.g.
> > https://lkml.org/lkml/2013/1/24/659 along with a couple of follow-on cleanups).
> > Is the above yet another variant of the original issue, or perhaps some race
> > condition that got exposed when there are a lot more threads?
>
> Hmm ... not sure but yeah, that's the likely culprit.
>
> P.

2013-07-01 08:53:07

by Rusty Russell

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data

Chegu Vinod <[email protected]> writes:
> Hello,
>
> Lots (~700+) of the following messages are showing up in the dmesg of a
> 3.10-rc1 based kernel (Host OS is running on a large socket count box
> with HT-on).
>
> [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from
> reserved chunk failed
> [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data

Woah, weird....

Oh. Shit. Um, this is embarrassing.

Thanks,
Rusty.
===
module: do percpu allocation after uniqueness check. No, really!

v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
percpu memory on large machines:

Now we have a new state MODULE_STATE_UNFORMED, we can insert the
module into the list (and thus guarantee its uniqueness) before we
allocate the per-cpu region.

In my defence, it didn't actually say the patch did this. Just that
we "can".

This patch actually *does* it.

Signed-off-by: Rusty Russell <[email protected]>
Tested-by: Noone it seems.

diff --git a/kernel/module.c b/kernel/module.c
index cab4bce..fa53db8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2927,7 +2927,6 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
{
/* Module within temporary copy. */
struct module *mod;
- Elf_Shdr *pcpusec;
int err;

mod = setup_load_info(info, flags);
@@ -2942,17 +2941,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
err = module_frob_arch_sections(info->hdr, info->sechdrs,
info->secstrings, mod);
if (err < 0)
- goto out;
+ return ERR_PTR(err);

- pcpusec = &info->sechdrs[info->index.pcpu];
- if (pcpusec->sh_size) {
- /* We have a special allocation for this section. */
- err = percpu_modalloc(mod,
- pcpusec->sh_size, pcpusec->sh_addralign);
- if (err)
- goto out;
- pcpusec->sh_flags &= ~(unsigned long)SHF_ALLOC;
- }
+ /* We will do a special allocation for per-cpu sections later. */
+ info->sechdrs[info->index.pcpu].sh_flags &= ~(unsigned long)SHF_ALLOC;

/* Determine total sizes, and put offsets in sh_entsize. For now
this is done generically; there doesn't appear to be any
@@ -2963,17 +2955,22 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
/* Allocate and move to the final place */
err = move_module(mod, info);
if (err)
- goto free_percpu;
+ return ERR_PTR(err);

/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
kmemleak_load_module(mod, info);
return mod;
+}

-free_percpu:
- percpu_modfree(mod);
-out:
- return ERR_PTR(err);
+static int alloc_module_percpu(struct module *mod, struct load_info *info)
+{
+ Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu];
+ if (!pcpusec->sh_size)
+ return 0;
+
+ /* We have a special allocation for this section. */
+ return percpu_modalloc(mod, pcpusec->sh_size, pcpusec->sh_addralign);
}

/* mod is no longer valid after this! */
@@ -3237,6 +3234,11 @@ static int load_module(struct load_info *info, const char __user *uargs,
}
#endif

+ /* To avoid stressing percpu allocator, do this once we're unique. */
+ err = alloc_module_percpu(mod, info);
+ if (err)
+ goto unlink_mod;
+
/* Now module is in final location, initialize linked lists, etc. */
err = module_unload_init(mod);
if (err)

2013-07-02 01:13:24

by Vinod, Chegu

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data

On 6/30/2013 11:22 PM, Rusty Russell wrote:
> Chegu Vinod <[email protected]> writes:
>> Hello,
>>
>> Lots (~700+) of the following messages are showing up in the dmesg of a
>> 3.10-rc1 based kernel (Host OS is running on a large socket count box
>> with HT-on).
>>
>> [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from
>> reserved chunk failed
>> [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data
> Woah, weird....
>
> Oh. Shit. Um, this is embarrassing.
>
> Thanks,
> Rusty.


Thanks for your response!

> ===
> module: do percpu allocation after uniqueness check. No, really!
>
> v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
> percpu memory on large machines:
>
> Now we have a new state MODULE_STATE_UNFORMED, we can insert the
> module into the list (and thus guarantee its uniqueness) before we
> allocate the per-cpu region.
>
> In my defence, it didn't actually say the patch did this. Just that
> we "can".
>
> This patch actually *does* it.
>
> Signed-off-by: Rusty Russell <[email protected]>
> Tested-by: Noone it seems.

Your following "updated" fix seems to be working fine on the larger
socket count machine with HT-on.

Thx
Vinod
>
> diff --git a/kernel/module.c b/kernel/module.c
> index cab4bce..fa53db8 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2927,7 +2927,6 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> {
> /* Module within temporary copy. */
> struct module *mod;
> - Elf_Shdr *pcpusec;
> int err;
>
> mod = setup_load_info(info, flags);
> @@ -2942,17 +2941,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> err = module_frob_arch_sections(info->hdr, info->sechdrs,
> info->secstrings, mod);
> if (err < 0)
> - goto out;
> + return ERR_PTR(err);
>
> - pcpusec = &info->sechdrs[info->index.pcpu];
> - if (pcpusec->sh_size) {
> - /* We have a special allocation for this section. */
> - err = percpu_modalloc(mod,
> - pcpusec->sh_size, pcpusec->sh_addralign);
> - if (err)
> - goto out;
> - pcpusec->sh_flags &= ~(unsigned long)SHF_ALLOC;
> - }
> + /* We will do a special allocation for per-cpu sections later. */
> + info->sechdrs[info->index.pcpu].sh_flags &= ~(unsigned long)SHF_ALLOC;
>
> /* Determine total sizes, and put offsets in sh_entsize. For now
> this is done generically; there doesn't appear to be any
> @@ -2963,17 +2955,22 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> /* Allocate and move to the final place */
> err = move_module(mod, info);
> if (err)
> - goto free_percpu;
> + return ERR_PTR(err);
>
> /* Module has been copied to its final place now: return it. */
> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> kmemleak_load_module(mod, info);
> return mod;
> +}
>
> -free_percpu:
> - percpu_modfree(mod);
> -out:
> - return ERR_PTR(err);
> +static int alloc_module_percpu(struct module *mod, struct load_info *info)
> +{
> + Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu];
> + if (!pcpusec->sh_size)
> + return 0;
> +
> + /* We have a special allocation for this section. */
> + return percpu_modalloc(mod, pcpusec->sh_size, pcpusec->sh_addralign);
> }
>
> /* mod is no longer valid after this! */
> @@ -3237,6 +3234,11 @@ static int load_module(struct load_info *info, const char __user *uargs,
> }
> #endif
>
> + /* To avoid stressing percpu allocator, do this once we're unique. */
> + err = alloc_module_percpu(mod, info);
> + if (err)
> + goto unlink_mod;
> +
> /* Now module is in final location, initialize linked lists, etc. */
> err = module_unload_init(mod);
> if (err)

2013-07-02 12:23:37

by Rusty Russell

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data

Chegu Vinod <[email protected]> writes:
> On 6/30/2013 11:22 PM, Rusty Russell wrote:
>> Chegu Vinod <[email protected]> writes:
>>> Hello,
>>>
>>> Lots (~700+) of the following messages are showing up in the dmesg of a
>>> 3.10-rc1 based kernel (Host OS is running on a large socket count box
>>> with HT-on).
>>>
>>> [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from
>>> reserved chunk failed
>>> [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data
>> Woah, weird....
>>
>> Oh. Shit. Um, this is embarrassing.
>>
>> Thanks,
>> Rusty.
>
>
> Thanks for your response!
>
>> ===
>> module: do percpu allocation after uniqueness check. No, really!
>>
>> v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
>> percpu memory on large machines:
>>
>> Now we have a new state MODULE_STATE_UNFORMED, we can insert the
>> module into the list (and thus guarantee its uniqueness) before we
>> allocate the per-cpu region.
>>
>> In my defence, it didn't actually say the patch did this. Just that
>> we "can".
>>
>> This patch actually *does* it.
>>
>> Signed-off-by: Rusty Russell <[email protected]>
>> Tested-by: Noone it seems.
>
> Your following "updated" fix seems to be working fine on the larger
> socket count machine with HT-on.

OK, did you definitely revert every other workaround?

If so, please give me a Tested-by: line...

Thanks,
Rusty.

2013-07-02 16:34:53

by Vinod, Chegu

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data

On 7/1/2013 10:49 PM, Rusty Russell wrote:
> Chegu Vinod <[email protected]> writes:
>> On 6/30/2013 11:22 PM, Rusty Russell wrote:
>>> Chegu Vinod <[email protected]> writes:
>>>> Hello,
>>>>
>>>> Lots (~700+) of the following messages are showing up in the dmesg of a
>>>> 3.10-rc1 based kernel (Host OS is running on a large socket count box
>>>> with HT-on).
>>>>
>>>> [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from
>>>> reserved chunk failed
>>>> [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data
>>> Woah, weird....
>>>
>>> Oh. Shit. Um, this is embarrassing.
>>>
>>> Thanks,
>>> Rusty.
>>
>> Thanks for your response!
>>
>>> ===
>>> module: do percpu allocation after uniqueness check. No, really!
>>>
>>> v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
>>> percpu memory on large machines:
>>>
>>> Now we have a new state MODULE_STATE_UNFORMED, we can insert the
>>> module into the list (and thus guarantee its uniqueness) before we
>>> allocate the per-cpu region.
>>>
>>> In my defence, it didn't actually say the patch did this. Just that
>>> we "can".
>>>
>>> This patch actually *does* it.
>>>
>>> Signed-off-by: Rusty Russell <[email protected]>
>>> Tested-by: Noone it seems.
>> Your following "updated" fix seems to be working fine on the larger
>> socket count machine with HT-on.
> OK, did you definitely revert every other workaround?

Yes, no other workarounds were in place when your change was tested.

>
> If so, please give me a Tested-by: line...

FYI: the actual verification of your change was done by my esteemed
colleague Jim Hull (cc'd), who had access to this larger socket count box.

Tested-by: Jim Hull <[email protected]>

Thanks
Vinod


>
> Thanks,
> Rusty.

2013-07-03 01:16:05

by Rusty Russell

[permalink] [raw]
Subject: Re: kvm_intel: Could not allocate 42 bytes percpu data

Chegu Vinod <[email protected]> writes:
> On 7/1/2013 10:49 PM, Rusty Russell wrote:
>> Chegu Vinod <[email protected]> writes:
>>> On 6/30/2013 11:22 PM, Rusty Russell wrote:
>>>> module: do percpu allocation after uniqueness check. No, really!
>>>>
>>>> v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
>>>> percpu memory on large machines:
>>>>
>>>> Now we have a new state MODULE_STATE_UNFORMED, we can insert the
>>>> module into the list (and thus guarantee its uniqueness) before we
>>>> allocate the per-cpu region.
>>>>
>>>> In my defence, it didn't actually say the patch did this. Just that
>>>> we "can".
>>>>
>>>> This patch actually *does* it.
>>>>
>>>> Signed-off-by: Rusty Russell <[email protected]>
>>>> Tested-by: Noone it seems.
>>> Your following "updated" fix seems to be working fine on the larger
>>> socket count machine with HT-on.
>> OK, did you definitely revert every other workaround?
>
> Yes, no other workarounds were in place when your change was tested.
>
>>
>> If so, please give me a Tested-by: line...
>
> FYI: the actual verification of your change was done by my esteemed
> colleague Jim Hull (cc'd), who had access to this larger socket count box.
>
> Tested-by: Jim Hull <[email protected]>

Thanks, I've put this in my -next tree, and CC'd stable.

Cheers,
Rusty.