2013-04-11 14:56:01

by Yinghai Lu

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On Thu, Apr 11, 2013 at 5:26 AM, Thomas Renninger <[email protected]> wrote:
> Currently ranges are passed via kernel boot parameters:
> memmap=exactmap memmap=X#Y memmap=
>
> Pass them via e820 table directly instead.

How do we handle the references to "saved_max_pfn" in the kernel?

The kernel needs saved_max_pfn, derived from the old e820 map, in
drivers/char/mem.c::read_oldmem().

mips and powerpc pass it on the command line via "savemaxmem=".

Should x86 use that too?

Thanks

Yinghai


2013-04-11 15:08:04

by H. Peter Anvin

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On 04/11/2013 07:55 AM, Yinghai Lu wrote:
> On Thu, Apr 11, 2013 at 5:26 AM, Thomas Renninger <[email protected]> wrote:
>> Currently ranges are passed via kernel boot parameters:
>> memmap=exactmap memmap=X#Y memmap=
>>
>> Pass them via e820 table directly instead.
>
> how to address "saved_max_pfn" referring in kernel?
>
> kernel need to use saved_max_pfn from old e820 in
> drivers/char/mem.c::read_oldmem()
>
> mips and powerpc they are passing that from command line "savemaxmem="
>
> x86 should use that too?
>

Oh bloody hell, yet another f-ing "max_pfn" variable.

The *only* one that makes any kind of sense is max_low_pfn (marking the
cutoff to highmem)... pretty much the rest of them are just plain wrong.

And I don't mean "mildly annoying", I mean "catastrophically wrong
semantics". In this case, it introduces a completely arbitrary
distinction between a nonmemory range below a high water mark and a
nonmemory range above that high water mark. In fact, from reading the
code it seems pretty clear that the device will blindly assume that
anything below saved_max_pfn is memory and will try to map it
cachable... which will #MC on quite a few machines.

This kind of crap HAS TO STOP. Memory is discontiguous, deal with it
and deal with it properly.
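
To make the objection concrete, here is a minimal user-space sketch, not
the kernel code, contrasting a single high-water-mark test with a lookup
in a list of discontiguous ranges. The struct and function names are
hypothetical, and the ranges in the note below are made up for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RAM range of the old kernel, in page frames. */
struct pfn_range {
    uint64_t start_pfn;   /* first page frame in the range */
    uint64_t end_pfn;     /* one past the last page frame */
};

/* High-water-mark test: every frame below the mark is treated as memory. */
static bool below_high_water_mark(uint64_t pfn, uint64_t saved_max_pfn)
{
    return pfn < saved_max_pfn;
}

/* Range test: only frames inside a listed range count as memory. */
static bool pfn_in_ranges(uint64_t pfn, const struct pfn_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
            return true;
    return false;
}
```

With RAM at page frames [0x0, 0x9f) and [0x100, 0x80000) and a mark of
0x80000, frame 0xa0 lies in a hole: the high-water-mark test accepts it,
while the range test rejects it.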

I also have to admit that I don't see the difference between /dev/mem
and /dev/oldmem, as the former allows access to memory ranges outside
the ones used by the current kernel, which is what the oldmem device
seems to be intended to do.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-04-12 12:24:47

by Thomas Renninger

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On Thursday, April 11, 2013 07:55:57 AM Yinghai Lu wrote:
> On Thu, Apr 11, 2013 at 5:26 AM, Thomas Renninger <[email protected]> wrote:
> > Currently ranges are passed via kernel boot parameters:
> > memmap=exactmap memmap=X#Y memmap=
> >
> > Pass them via e820 table directly instead.
>
> how to address "saved_max_pfn" referring in kernel?
Yes, this patch won't work as is, because it leaves out the previously
usable memory entirely.
I have to rework it and also pass these ranges, as discussed, via a
KDUMP_RESERVED or, even better, a KDUMP_MEMORY e820 type.
KDUMP_RESERVED could at some point be used for reserved memory inside
the crash kernel range, if that proves useful.

Can the other patches get applied already if they are fine?

> kernel need to use saved_max_pfn from old e820 in
> drivers/char/mem.c::read_oldmem()
>
> mips and powerpc they are passing that from command line "savemaxmem="
>
> x86 should use that too?
I could add that.
But this cannot be cleaned up yet, because the kernel has to stay
compatible with old kexec-tools that do not pass this parameter, at
least for quite some time.

Thomas

2013-04-12 14:31:58

by Vivek Goyal

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On Thu, Apr 11, 2013 at 08:06:50AM -0700, H. Peter Anvin wrote:
> On 04/11/2013 07:55 AM, Yinghai Lu wrote:
> > On Thu, Apr 11, 2013 at 5:26 AM, Thomas Renninger <[email protected]> wrote:
> >> Currently ranges are passed via kernel boot parameters:
> >> memmap=exactmap memmap=X#Y memmap=
> >>
> >> Pass them via e820 table directly instead.
> >
> > how to address "saved_max_pfn" referring in kernel?
> >
> > kernel need to use saved_max_pfn from old e820 in
> > drivers/char/mem.c::read_oldmem()
> >
> > mips and powerpc they are passing that from command line "savemaxmem="
> >
> > x86 should use that too?
> >
>
> Oh bloody hell, yet another f-ing "max_pfn" variable.
>
> The *only* one that makes any kind of sense is max_low_pfn (marking the
> cutoff to highmem)... pretty much the rest of them are just plain wrong.
>
> And I don't mean "mildly annoying", I mean "catastrophically wrong
> semantics". In this case, it introduces a completely arbitrary
> distinction between a nonmemory range below a high water mark and a
> nonmemory range above that high water mark. In fact, from reading the
> code it seems pretty clear that the device will blindly assume that
> anything below saved_max_pfn is memory and will try to map it
> cachable... which will #MC on quite a few machines.
>
> This kind of crap HAS TO STOP. Memory is discontiguous, deal with it
> and deal with it properly.

Agreed. saved_max_pfn is a bad idea. Passing all the mappable memory of
the old kernel as "RESERVED" (or KDUMP_RESERVED or KDUMP_MEM or whatever)
to the next kernel in the e820 map sounds better. The next kernel can
then allow access to the RESERVED ranges through the /dev/oldmem interface.

For backward compatibility with old kexec-tools we can probably retain
saved_max_pfn for some time. We can set saved_max_pfn to the end of the
memory range including "RESERVED" regions. This will be overwritten
if old kexec-tools have passed the parameter on the command line. Also,
whenever the user passes saved_max_pfn on the command line, we can do a
WARN_ONCE() asking them to upgrade kexec-tools and letting them know that
saved_max_pfn will be deprecated.

For the issue of doing ioremap() on everything as cacheable, we should be
able to modify copy_oldmem_page() so that it goes through the memory map
and checks whether the given pfn is mappable or not and what flags should
be used to map it.

I think this will again be a problem with old kexec-tools. Maybe we can
check for the presence of at least one "KDUMP_RESERVED" range in the
memory map. If none is present, we know old kexec-tools were used, and in
that case we can ioremap() every pfn blindly. We can do a WARN_ONCE() and
ask the user to upgrade kexec-tools, and after some time do away with this
hack in copy_oldmem_page() as well as remove saved_max_pfn.
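
The fallback policy described above could look roughly like the following
user-space sketch. It is not kernel code: the entry layout, the
SKETCH_KDUMP_RESERVED value, the 4 KiB page size, and all function names
are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12            /* assumption: 4 KiB pages */
#define SKETCH_KDUMP_RESERVED 0x100u    /* hypothetical value, not a real e820 type */

/* Hypothetical memory map entry: byte address, byte size, type. */
struct sketch_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

enum oldmem_decision { OLDMEM_REFUSE, OLDMEM_MAP_AS_RAM, OLDMEM_LEGACY_BLIND };

/* Decide how a read of one old-kernel page frame should be handled. */
static enum oldmem_decision
oldmem_decide(const struct sketch_entry *map, size_t n, uint64_t pfn)
{
    uint64_t start = pfn << SKETCH_PAGE_SHIFT;
    uint64_t end = start + (1ull << SKETCH_PAGE_SHIFT);
    int seen_kdump = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i].type != SKETCH_KDUMP_RESERVED)
            continue;
        seen_kdump = 1;
        /* The whole page must lie inside the range. */
        if (start >= map[i].addr && end <= map[i].addr + map[i].size)
            return OLDMEM_MAP_AS_RAM;
    }
    /* No KDUMP_RESERVED entry at all: assume old kexec-tools, map blindly. */
    return seen_kdump ? OLDMEM_REFUSE : OLDMEM_LEGACY_BLIND;
}

/* Default saved_max_pfn: end of the highest KDUMP_RESERVED range. */
static uint64_t default_saved_max_pfn(const struct sketch_entry *map, size_t n)
{
    uint64_t max_end = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i].type == SKETCH_KDUMP_RESERVED &&
            map[i].addr + map[i].size > max_end)
            max_end = map[i].addr + map[i].size;
    return max_end >> SKETCH_PAGE_SHIFT;
}
```

In this sketch the legacy case is the one where a WARN_ONCE() would fire,
and a real implementation would also have to pick mapping flags per range
rather than return a single "map as RAM" answer.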
>
> I also have to admit that I don't see the difference between /dev/mem
> and /dev/oldmem, as the former allows access to memory ranges outside
> the ones used by the current kernel, which is what the oldmem device
> seems to be intended to do.
>

I think one difference seems to be that /dev/mem assumes that validly
accessed memory is already mapped in the kernel, while /dev/oldmem assumes
it is not mapped and creates temporary mappings explicitly.

Thanks
Vivek

2013-04-12 14:57:31

by H. Peter Anvin

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On 04/12/2013 07:31 AM, Vivek Goyal wrote:
>>
>> I also have to admit that I don't see the difference between /dev/mem
>> and /dev/oldmem, as the former allows access to memory ranges outside
>> the ones used by the current kernel, which is what the oldmem device
>> seems to be intended to do.
>
> I think one difference seems to be that /dev/mem assumes that validly
> accessed memory is already mapped in kernel while /dev/oldmem assumes
> it is not mapped and creates temporary mappings explicitly.
>

Dave Hansen has been working on fixing /dev/mem for HIGHMEM.

-hpa

2013-04-12 22:17:41

by Dave Hansen

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On 04/12/2013 07:56 AM, H. Peter Anvin wrote:
> On 04/12/2013 07:31 AM, Vivek Goyal wrote:
>>> I also have to admit that I don't see the difference between /dev/mem
>>> and /dev/oldmem, as the former allows access to memory ranges outside
>>> the ones used by the current kernel, which is what the oldmem device
>>> seems to be intended to do.

It varies from arch to arch of course.

But, /dev/mem has restrictions on it, like CONFIG_STRICT_DEVMEM or the
ARCH_HAS_VALID_PHYS_ADDR_RANGE. There's a lot of stuff that depends on
it, *and* folks have tried to fix it up so that it's not _as_ blatant of
a way to completely screw your system.

/dev/mem also tries to be nice to arches that have restrictions like:

> /*
> * On ia64 if a page has been mapped somewhere as
> * uncached, then it must also be accessed uncached
> * by the kernel or data corruption may occur
> */

I think /dev/oldmem isn't so nice and could actually cause some real
problems if used on ia64 where the cached/uncached accesses are mixed.

2013-04-12 23:18:38

by H. Peter Anvin

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

Yes... That is one reason I think it is a real problem.


Dave Hansen <[email protected]> wrote:

>On 04/12/2013 07:56 AM, H. Peter Anvin wrote:
>> On 04/12/2013 07:31 AM, Vivek Goyal wrote:
>>>> I also have to admit that I don't see the difference between /dev/mem
>>>> and /dev/oldmem, as the former allows access to memory ranges outside
>>>> the ones used by the current kernel, which is what the oldmem device
>>>> seems to be intended to do.
>
>It varies from arch to arch of course.
>
>But, /dev/mem has restrictions on it, like CONFIG_STRICT_DEVMEM or the
>ARCH_HAS_VALID_PHYS_ADDR_RANGE. There's a lot of stuff that depends on
>it, *and* folks have tried to fix it up so that it's not _as_ blatant of
>a way to completely screw your system.
>
>/dev/mem also tries to be nice to arches that have restrictions like:
>
>> /*
>> * On ia64 if a page has been mapped somewhere as
>> * uncached, then it must also be accessed uncached
>> * by the kernel or data corruption may occur
>> */
>
>I think /dev/oldmem isn't so nice and could actually cause some real
>problems if used on ia64 where the cached/uncached accesses are mixed.

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-15 04:53:29

by Hatayama, Daisuke

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

(2013/04/13 7:17), Dave Hansen wrote:
> On 04/12/2013 07:56 AM, H. Peter Anvin wrote:
>> On 04/12/2013 07:31 AM, Vivek Goyal wrote:
>>>> I also have to admit that I don't see the difference between /dev/mem
>>>> and /dev/oldmem, as the former allows access to memory ranges outside
>>>> the ones used by the current kernel, which is what the oldmem device
>>>> seems to be intended to do.
>
> It varies from arch to arch of course.
>
> But, /dev/mem has restrictions on it, like CONFIG_STRICT_DEVMEM or the
> ARCH_HAS_VALID_PHYS_ADDR_RANGE. There's a lot of stuff that depends on
> it, *and* folks have tried to fix it up so that it's not _as_ blatant of
> a way to completely screw your system.
>
> /dev/mem also tries to be nice to arches that have restrictions like:
>
>> /*
>> * On ia64 if a page has been mapped somewhere as
>> * uncached, then it must also be accessed uncached
>> * by the kernel or data corruption may occur
>> */
>
> I think /dev/oldmem isn't so nice and could actually cause some real
> problems if used on ia64 where the cached/uncached accesses are mixed.

This sounds as if the x86 cache mechanism has no such issue. Is that
correct? If so, what is the difference between the ia64 and x86 cache
mechanisms?

--
Thanks.
HATAYAMA, Daisuke

2013-04-15 05:57:15

by Dave Hansen

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

On 04/14/2013 09:52 PM, HATAYAMA Daisuke wrote:
> This sounds like there's no such issue on x86 cache mechanism. Is it
> correct? If so, what is the difference between ia64 and x86 cache
> mechanisms?

I'm just going by the code comments:

drivers/char/mem.c
> /*
> * On ia64 if a page has been mapped somewhere as uncached, then
> * it must also be accessed uncached by the kernel or data
> * corruption may occur.
> */

2013-04-15 07:58:47

by Hatayama, Daisuke

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

(2013/04/15 14:58), Dave Hansen wrote:
> On 04/14/2013 09:52 PM, HATAYAMA Daisuke wrote:
>> This sounds like there's no such issue on x86 cache mechanism. Is it
>> correct? If so, what is the difference between ia64 and x86 cache
>> mechanisms?
>
> I'm just going by the code comments:
>
> drivers/char/mem.c
>> /*
>> * On ia64 if a page has been mapped somewhere as uncached, then
>> * it must also be accessed uncached by the kernel or data
>> * corruption may occur.
>> */

I think it is reasonable, in terms of design complexity, to choose cached
or uncached access according to whether the target memory is RAM or a
device. If we care about page-level granularity, things get more
complicated, since memory typing is done per page. How large would a table
representing the memory types of all the target pages become, and how and
when would we create it? (I don't know ia64, but I guess caching on ia64
is also controlled per page, just like on x86...)
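
As a rough back-of-the-envelope answer to the size question, the sketch
below compares a per-page type table with a range-based one. The function
names are hypothetical, and the figures in the note (1 TiB of old-kernel
memory, 4 KiB pages, one byte per page, 128 ranges of 20 bytes each) are
assumptions chosen for illustration.

```c
#include <stdint.h>

/* Bytes needed to store one type entry for every page of old memory. */
static uint64_t per_page_table_bytes(uint64_t mem_bytes, uint64_t page_size,
                                     uint64_t bytes_per_entry)
{
    return (mem_bytes / page_size) * bytes_per_entry;
}

/* Bytes needed to store one entry per contiguous range of the same type. */
static uint64_t range_table_bytes(uint64_t nr_ranges, uint64_t bytes_per_range)
{
    return nr_ranges * bytes_per_range;
}
```

Under those assumptions the per-page table needs 2^28 bytes (256 MiB),
while the range table needs 2560 bytes, which is why a range-based
description is the more practical starting point.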

--
Thanks.
HATAYAMA, Daisuke

2013-04-15 14:50:28

by H. Peter Anvin

Subject: Re: [PATCH 5/5] kexec: X86: Pass memory ranges via e820 table instead of memmap= boot parameter

This is also true on some x86 systems.

Dave Hansen <[email protected]> wrote:

>On 04/14/2013 09:52 PM, HATAYAMA Daisuke wrote:
>> This sounds like there's no such issue on x86 cache mechanism. Is it
>> correct? If so, what is the difference between ia64 and x86 cache
>> mechanisms?
>
>I'm just going by the code comments:
>
>drivers/char/mem.c
>> /*
>> * On ia64 if a page has been mapped somewhere as uncached, then
>> * it must also be accessed uncached by the kernel or data
>> * corruption may occur.
>> */

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.