2018-12-26 14:23:24

by Fengguang Wu

Subject: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM

From: Fan Du <[email protected]>

This is a hack to enumerate PMEM as NUMA nodes.
It is needed for current BIOSes that do not yet populate the ACPI HMAT table.

WARNING: back up your data first. This is mutually exclusive with the
libnvdimm subsystem and can destroy ndctl-managed namespaces.

Signed-off-by: Fan Du <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
---
arch/x86/kernel/e820.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/kernel/e820.c 2018-12-23 19:20:34.587078783 +0800
+++ linux/arch/x86/kernel/e820.c 2018-12-23 19:20:34.587078783 +0800
@@ -403,7 +403,8 @@ static int __init __append_e820_table(st
 		/* Ignore the entry on 64-bit overflow: */
 		if (start > end && likely(size))
 			return -1;
-
+		if (type == E820_TYPE_PMEM)
+			type = E820_TYPE_RAM;
 		e820__range_add(start, size, type);
 
 		entry++;
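The effect of the two added lines can be modelled in user space. This is a minimal sketch, not kernel code: model_e820_entry and model_append_e820_table() are hypothetical names, and only the type values (1 for RAM, 7 for PMEM) are taken from the kernel's enum e820_type. It keeps the same overflow check as __append_e820_table() and relabels PMEM entries as RAM at the point where they would be handed to e820__range_add().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes mirroring enum e820_type in arch/x86/include/asm/e820/types.h. */
enum {
	E820_TYPE_RAM  = 1,
	E820_TYPE_PMEM = 7,
};

/* Hypothetical, simplified stand-in for struct boot_e820_entry. */
struct model_e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/*
 * Copy firmware entries into out[], applying the same two rules as the
 * patched __append_e820_table(): give up on an entry whose end overflows
 * 64 bits, and report PMEM ranges as RAM. Returns the number of entries
 * written, or -1 on overflow.
 */
int model_append_e820_table(const struct model_e820_entry *in, size_t n,
			    struct model_e820_entry *out, size_t cap)
{
	assert(n <= cap);
	for (size_t i = 0; i < n; i++) {
		uint64_t start = in[i].addr;
		uint64_t size = in[i].size;
		uint64_t end = start + size - 1;
		uint32_t type = in[i].type;

		/* Ignore the entry on 64-bit overflow (the kernel returns -1). */
		if (start > end && size)
			return -1;

		/* The hack: PMEM is presented to the rest of the kernel as RAM. */
		if (type == E820_TYPE_PMEM)
			type = E820_TYPE_RAM;

		out[i].addr = start;
		out[i].size = size;
		out[i].type = type;
	}
	return (int)n;
}
```

Because the relabelling happens before e820__range_add(), later consumers of the e820 table see the range only as ordinary RAM; this is presumably also why the commit message warns that the hack cannot coexist with libnvdimm.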




2018-12-27 17:09:47

by Matthew Wilcox

Subject: Re: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM

On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
> From: Fan Du <[email protected]>
>
> This is a hack to enumerate PMEM as NUMA nodes.
> It is needed for current BIOSes that do not yet populate the ACPI HMAT table.
>
> WARNING: back up your data first. This is mutually exclusive with the
> libnvdimm subsystem and can destroy ndctl-managed namespaces.

Why depend on firmware to present this "correctly"? It seems to me like
less effort all around to have ndctl label some namespaces as being for
this kind of use.

2018-12-27 18:15:41

by Fengguang Wu

Subject: Re: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM

On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
>On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
>> From: Fan Du <[email protected]>
>>
>> This is a hack to enumerate PMEM as NUMA nodes.
>> It is needed for current BIOSes that do not yet populate the ACPI HMAT table.
>>
>> WARNING: back up your data first. This is mutually exclusive with the
>> libnvdimm subsystem and can destroy ndctl-managed namespaces.
>
>Why depend on firmware to present this "correctly"? It seems to me like
>less effort all around to have ndctl label some namespaces as being for
>this kind of use.

Dave Hansen may be better placed to answer your question. He posted
patches to let PMEM NUMA nodes coexist with libnvdimm and ndctl:

[PATCH 0/9] Allow persistent memory to be used like normal RAM
https://lkml.org/lkml/2018/10/23/9

That depends on a future BIOS, so we did this quick hack to test out
PMEM NUMA nodes with the existing BIOS.

Thanks,
Fengguang
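One way to check whether such a hack took effect is to look at /proc/iomem, where the kernel names E820 PMEM ranges "Persistent Memory" and RAM ranges "System RAM". The sketch below is an illustration under that assumption; parse_iomem_line() and count_iomem_ranges() are hypothetical helpers, and only top-level (unindented) lines are considered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one top-level /proc/iomem line such as
 * "100000000-17fffffff : System RAM". On success, stores the range and
 * its name (without the trailing newline) and returns 1; returns 0 for
 * indented child lines or anything that does not match.
 */
int parse_iomem_line(const char *line, unsigned long long *start,
		     unsigned long long *end, char *name, size_t name_len)
{
	const char *sep;

	assert(name_len > 0);
	if (line[0] == ' ')
		return 0;	/* child resource, e.g. "  Kernel code" */
	if (sscanf(line, "%llx-%llx", start, end) != 2)
		return 0;
	sep = strstr(line, " : ");
	if (!sep)
		return 0;
	sep += 3;
	/* Copy the name up to (not including) any newline. */
	snprintf(name, name_len, "%.*s", (int)strcspn(sep, "\n"), sep);
	return 1;
}

/* Count top-level ranges in an iomem-formatted stream whose name is want. */
int count_iomem_ranges(FILE *f, const char *want)
{
	char line[256], name[128];
	unsigned long long s, e;
	int count = 0;

	while (fgets(line, sizeof(line), f))
		if (parse_iomem_line(line, &s, &e, name, sizeof(name)) &&
		    strcmp(name, want) == 0)
			count++;
	return count;
}
```

For example, passing a stream opened on /proc/iomem together with "Persistent Memory" would be expected to return 0 once the hack is applied. Without root privileges the addresses may read as zero, but the range names are still shown.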

2018-12-27 18:15:41

by Dan Williams

Subject: Re: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM

On Wed, Dec 26, 2018 at 8:11 PM Fengguang Wu <[email protected]> wrote:
>
> On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
> >On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
> >> From: Fan Du <[email protected]>
> >>
> >> This is a hack to enumerate PMEM as NUMA nodes.
> >> It is needed for current BIOSes that do not yet populate the ACPI HMAT table.
> >>
> >> WARNING: back up your data first. This is mutually exclusive with the
> >> libnvdimm subsystem and can destroy ndctl-managed namespaces.
> >
> >Why depend on firmware to present this "correctly"? It seems to me like
> >less effort all around to have ndctl label some namespaces as being for
> >this kind of use.
>
> Dave Hansen may be better placed to answer your question. He posted
> patches to let PMEM NUMA nodes coexist with libnvdimm and ndctl:
>
> [PATCH 0/9] Allow persistent memory to be used like normal RAM
> https://lkml.org/lkml/2018/10/23/9
>
> That depends on a future BIOS, so we did this quick hack to test out
> PMEM NUMA nodes with the existing BIOS.

No, it does not depend on a future BIOS.

Willy, have a look here [1], here [2], and here [3] for the
work-in-progress ndctl takeover approach (actually 'daxctl' in this
case).

[1]: https://lkml.org/lkml/2018/10/23/9
[2]: https://lkml.org/lkml/2018/10/31/243
[3]: https://lists.01.org/pipermail/linux-nvdimm/2018-November/018677.html

2018-12-28 07:19:05

by Yang Shi

Subject: Re: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM

On Wed, Dec 26, 2018 at 9:13 PM Dan Williams <[email protected]> wrote:
>
> On Wed, Dec 26, 2018 at 8:11 PM Fengguang Wu <[email protected]> wrote:
> >
> > On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
> > >On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
> > >> From: Fan Du <[email protected]>
> > >>
> > >> This is a hack to enumerate PMEM as NUMA nodes.
> > >> It is needed for current BIOSes that do not yet populate the ACPI HMAT table.
> > >>
> > >> WARNING: back up your data first. This is mutually exclusive with the
> > >> libnvdimm subsystem and can destroy ndctl-managed namespaces.
> > >
> > >Why depend on firmware to present this "correctly"? It seems to me like
> > >less effort all around to have ndctl label some namespaces as being for
> > >this kind of use.
> >
> > Dave Hansen may be better placed to answer your question. He posted
> > patches to let PMEM NUMA nodes coexist with libnvdimm and ndctl:
> >
> > [PATCH 0/9] Allow persistent memory to be used like normal RAM
> > https://lkml.org/lkml/2018/10/23/9
> >
> > That depends on a future BIOS, so we did this quick hack to test out
> > PMEM NUMA nodes with the existing BIOS.
>
> No, it does not depend on a future BIOS.

That is correct. We already have Dave's patches plus Dan's patch (which
adds the target_node field) working on our machine, which has SRAT.

Thanks,
Yang

2018-12-28 15:23:05

by Fengguang Wu

Subject: Re: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM

On Thu, Dec 27, 2018 at 11:32:06AM -0800, Yang Shi wrote:
>On Wed, Dec 26, 2018 at 9:13 PM Dan Williams <[email protected]> wrote:
>>
>> On Wed, Dec 26, 2018 at 8:11 PM Fengguang Wu <[email protected]> wrote:
>> >
>> > On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
>> > >On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
>> > >> From: Fan Du <[email protected]>
>> > >>
>> > >> This is a hack to enumerate PMEM as NUMA nodes.
>> > >> It is needed for current BIOSes that do not yet populate the ACPI HMAT table.
>> > >>
>> > >> WARNING: back up your data first. This is mutually exclusive with the
>> > >> libnvdimm subsystem and can destroy ndctl-managed namespaces.
>> > >
>> > >Why depend on firmware to present this "correctly"? It seems to me like
>> > >less effort all around to have ndctl label some namespaces as being for
>> > >this kind of use.
>> >
>> > Dave Hansen may be better placed to answer your question. He posted
>> > patches to let PMEM NUMA nodes coexist with libnvdimm and ndctl:
>> >
>> > [PATCH 0/9] Allow persistent memory to be used like normal RAM
>> > https://lkml.org/lkml/2018/10/23/9
>> >
>> > That depends on a future BIOS, so we did this quick hack to test out
>> > PMEM NUMA nodes with the existing BIOS.
>>
>> No, it does not depend on a future BIOS.
>
>That is correct. We already have Dave's patches plus Dan's patch (which
>adds the target_node field) working on our machine, which has SRAT.

Thanks for the correction. It looks like my perception was out of date.
So we can follow Dave and Dan's patches to create the PMEM NUMA nodes.

Thanks,
Fengguang
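Once PMEM NUMA nodes exist, by either route, they would typically show up as nodes that have memory but no CPUs. A minimal sketch for finding such nodes, assuming the has_memory and has_cpu node-list files under /sys/devices/system/node/ and at most 64 nodes; parse_node_list() and read_node_list() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a node list such as "0-1,3" (the range-list format used by files
 * under /sys/devices/system/node/) into a bitmask. Node IDs above 63 are
 * rejected to keep the sketch simple. Returns 0 on success, -1 on bad input.
 */
int parse_node_list(const char *s, unsigned long long *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;
	while (*s && *s != '\n') {
		char *endp;
		long lo = strtol(s, &endp, 10), hi = lo;

		if (endp == s || lo < 0 || lo > 63)
			return -1;
		s = endp;
		if (*s == '-') {	/* a range such as "0-1" */
			hi = strtol(s + 1, &endp, 10);
			if (endp == s + 1 || hi < lo || hi > 63)
				return -1;
			s = endp;
		}
		for (long n = lo; n <= hi; n++)
			*mask |= 1ULL << n;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}

/* Read and parse one node-list file, e.g. /sys/devices/system/node/has_cpu. */
int read_node_list(const char *path, unsigned long long *mask)
{
	char buf[256];
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, sizeof(buf), f) != NULL;
	fclose(f);
	return ok ? parse_node_list(buf, mask) : -1;
}
```

Reading both files with read_node_list() and computing mem & ~cpu gives the memory-only nodes; for has_memory = "0-1,3" and has_cpu = "0-1", that is node 3.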
