2020-10-29 21:32:33

by Sudarshan Rajagopalan

Subject: mm/memblock: export memblock_{start/end}_of_DRAM

Hello all,

We have a use case where a module driver adds certain memory blocks using
add_memory_driver_managed(), so that it can perform memory hotplug
operations on these blocks. In general, these memory blocks aren’t
physically added later; they are part of the actual RAM that the system
booted up with. That is, we set the ‘mem=’ cmdline parameter to limit
the memory and later add the remaining blocks using the add_memory*()
variants.

The basic idea is to have the driver own and manage certain memory
blocks for hotplug operations.

For the driver to be able to know how much memory was limited and how
much is actually present, we take the delta of ‘bootmem physical end
address’ and ‘memblock_end_of_DRAM’. The 'bootmem physical end address'
is obtained by scanning the reg values in the ‘memory’ DT node and
determining the max {addr,size}. Since our driver is getting modularized,
we won’t have access to memblock_end_of_DRAM (i.e. end address of all
memory blocks after ‘mem=’ is applied).
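As a rough illustration of the calculation described above, the delta can
be sketched in plain C. This is only a sketch under assumptions: the
struct mem_reg type and the names dt_phys_end() and limited_delta() are
hypothetical, "max {addr,size}" is read here as the highest addr + size,
and the limited end address is passed in as a plain number rather than
taken from any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One {addr,size} pair from a 'memory' DT node (hypothetical type). */
struct mem_reg {
    uint64_t addr;
    uint64_t size;
};

/* Highest end address (addr + size) over all entries; 0 if there are none. */
uint64_t dt_phys_end(const struct mem_reg *regs, size_t n)
{
    uint64_t end = 0;

    assert(regs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t e = regs[i].addr + regs[i].size;

        if (e > end)
            end = e;
    }
    return end;
}

/* Bytes cut off by "mem=": DT end minus the limited end, or 0 if none. */
uint64_t limited_delta(uint64_t dt_end, uint64_t limited_end)
{
    return dt_end > limited_end ? dt_end - limited_end : 0;
}
```

Note that this only compares two end addresses; it says nothing about
what lies between them.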

So we are checking whether the memblock_{start/end}_of_DRAM() symbols can
be exported. Also, this information can be obtained by userspace by doing
‘cat /proc/iomem’ and grepping for ‘System RAM’. So we are wondering: if
userspace can have access to such info, can we allow kernel module
drivers to have access by exporting memblock_{start/end}_of_DRAM()?

Or are there any other ways in which a module driver can get the end
address of system memory block?


Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project


2020-10-30 06:46:22

by David Hildenbrand

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
> Hello all,
>

Hi!

> We have a usecase where a module driver adds certain memory blocks using
> add_memory_driver_managed(), so that it can perform memory hotplug
> operations on these blocks. In general, these memory blocks aren’t
> something that gets physically added later, but is part of actual RAM
> that system booted up with. Meaning – we set the ‘mem=’ cmdline
> parameter to limit the memory and later add the remaining ones using
> add_memory*() variants.
>
> The basic idea is to have driver have ownership and manage certain
> memory blocks for hotplug operations.

So, in summary, you're still abusing the memory hot(un)plug
infrastructure from your driver - just not in as severe a way as before.
And I'll tell you why, so you might understand why exposing this API is
not really a good idea and why your driver wouldn't - for example - be
upstream material.

Don't get me wrong, what you are doing might be ok in your context, but
it's simply not universally applicable in our current model.

Ordinary system RAM works differently from many other devices (like PCI
devices), where *something* senses the device and exposes it to the
system, and some available driver binds to it and owns the memory.

Memory is detected by a driver and added to the system via e.g.,
add_memory_driver_managed(). Memory devices are created and the memory
is directly handed off to the system, to be used as system RAM as soon
as memory devices are onlined. There is no driver that "binds" memory
like other devices - it's rather the core (buddy) that uses/owns that
memory immediately after device creation.

>
> For the driver be able to know how much memory was limited and how much
> actually present, we take the delta of ‘bootmem physical end address’
> and ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is
> obtained by scanning the reg values in ‘memory’ DT node and determining
> the max {addr,size}. Since our driver is getting modularized, we won’t
> have access to memblock_end_of_DRAM (i.e. end address of all memory
> blocks after ‘mem=’ is applied).

What you do with "mem=" is force memory detection to ignore some of its
detected memory.

>
> So checking if memblock_{start/end}_of_DRAM() symbols can be exported?
> Also, this information can be obtained by userspace by doing ‘cat
> /proc/iomem’ and greping for ‘System RAM’. So wondering if userspace can

Not correct: with "mem=", cat /proc/iomem only shows *detected* + added
system RAM, not the unmodified detection.

> have access to such info, can we allow kernel module drivers have access
> by exporting memblock_{start/end}_of_DRAM().
>
> Or are there any other ways where a module driver can get the end
> address of system memory block?

And here is our problem: You disabled *detection* of that memory by the
responsible driver (here: core). Now your driver wants to know what
would have been detected. Assume you have a memory hole in that region -
it would not work by simply looking at start/end. Your driver is not
the one doing the detection.
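To make the memory-hole point concrete, here is a small C sketch that
compares the start/end span of a set of RAM regions with the memory that
is actually present. Everything in it is illustrative: the ram_region
type and both function names are hypothetical, regions are assumed to be
non-overlapping half-open ranges [start, end), and the addresses in the
usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A half-open physical range [start, end) (hypothetical type). */
struct ram_region {
    uint64_t start;
    uint64_t end;
};

/* Span from the lowest start to the highest end, holes included. */
uint64_t span_bytes(const struct ram_region *r, size_t n)
{
    uint64_t lo = UINT64_MAX, hi = 0;

    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (r[i].start < lo)
            lo = r[i].start;
        if (r[i].end > hi)
            hi = r[i].end;
    }
    return hi - lo;
}

/* Sum of the region sizes, i.e. memory that is really there. */
uint64_t present_bytes(const struct ram_region *r, size_t n)
{
    uint64_t sum = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += r[i].end - r[i].start;
    return sum;
}
```

For two regions [0x0, 0x1000) and [0x3000, 0x4000), span_bytes() gives
0x4000 while present_bytes() gives 0x2000: the difference is the hole
that a start/end pair alone cannot describe.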

Another issue is: when using such memory for KVM guests, there is no
mechanism that tracks ownership of that memory - imagine another driver
wanting to use that memory. This really only works in special environments.

Yet another issue: you cannot assume that memblock data will stay around
after boot. While we do it right now for arm64, that might change at
some point. This is also one of the reasons why we don't export any real
memblock data to drivers.


When using "mem=" you have to know the exact layout of your system RAM
and manually communicate to the right places what that layout looks
like: here, to your driver.

The clean way of doing things today is to allocate RAM and use it for
guests - e.g., using hugetlb/gigantic pages. As I said, there are other
techniques coming up to deal with minimizing struct page overhead - if
that's what you're concerned with (I still don't know why you're
removing the memory from the host when giving it to the guest).

--
Thanks,

David / dhildenb

2020-10-30 08:43:16

by Mike Rapoport

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> Hello all,
>
> We have a usecase where a module driver adds certain memory blocks using
> add_memory_driver_managed(), so that it can perform memory hotplug
> operations on these blocks. In general, these memory blocks aren’t something
> that gets physically added later, but is part of actual RAM that system
> booted up with. Meaning – we set the ‘mem=’ cmdline parameter to limit the
> memory and later add the remaining ones using add_memory*() variants.
>
> The basic idea is to have driver have ownership and manage certain memory
> blocks for hotplug operations.
>
> For the driver be able to know how much memory was limited and how much
> actually present, we take the delta of ‘bootmem physical end address’ and
> ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is obtained by
> scanning the reg values in ‘memory’ DT node and determining the max
> {addr,size}. Since our driver is getting modularized, we won’t have access
> to memblock_end_of_DRAM (i.e. end address of all memory blocks after ‘mem=’
> is applied).
>
> So checking if memblock_{start/end}_of_DRAM() symbols can be exported? Also,
> this information can be obtained by userspace by doing ‘cat /proc/iomem’ and
> greping for ‘System RAM’. So wondering if userspace can have access to such
> info, can we allow kernel module drivers have access by exporting
> memblock_{start/end}_of_DRAM().

These functions cannot be exported, not because we want to hide this
information from the modules but because it is unsafe to use them.
On most architectures these functions are __init so they are discarded
after boot anyway. Besides, the memory configuration known to memblock
might not be accurate in many cases as David explained in his reply.

> Or are there any other ways where a module driver can get the end address of
> system memory block?

What do you mean by "system memory block"? There could be a lot of
interpretations if you take into account memory hotplug, "mem=" option,
reserved and firmware memory.

I'd suggest you describe the entire use case in more detail. Having
the complete picture would help find a proper solution.

> Sudarshan
>

--
Sincerely yours,
Mike.

2020-10-31 09:23:08

by Christoph Hellwig

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
>
> What do you mean by "system memory block"? There could be a lot of
> interpretations if you take into account memory hotplug, "mem=" option,
> reserved and firmware memory.
>
> I'd suggest you to describe the entire use case in more detail. Having
> the complete picture would help finding a proper solution.

I think we need the code for the driver trying to do this as an RFC
submission. Everything else is rather pointless.

2020-10-31 10:07:48

by David Hildenbrand

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On 31.10.20 10:18, Christoph Hellwig wrote:
> On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
>>
>> What do you mean by "system memory block"? There could be a lot of
>> interpretations if you take into account memory hotplug, "mem=" option,
>> reserved and firmware memory.
>>
>> I'd suggest you to describe the entire use case in more detail. Having
>> the complete picture would help finding a proper solution.
>
> I think we need the code for the driver trying to do this as an RFC
> submission. Everything else is rather pointless.

Sharing RFCs is most probably not what people want when developing
advanced hypervisor features :)

@Sudarshan, I recommend looking at the slides of the KVM Forum talk from
yesterday

https://kvmforum2020.sched.com/event/eE40/towards-an-alternative-memory-architecture-joao-martins-oracle?iframe=no

It contains a nice summary of the state of the art, and how "mem=",
devdax, and dax_hmat can be used to tackle the issue in a hypervisor.

--
Thanks,

David / dhildenb

2020-11-03 08:41:09

by Christoph Hellwig

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On Sat, Oct 31, 2020 at 11:05:45AM +0100, David Hildenbrand wrote:
> On 31.10.20 10:18, Christoph Hellwig wrote:
> > On Fri, Oct 30, 2020 at 10:38:42AM +0200, Mike Rapoport wrote:
> > > What do you mean by "system memory block"? There could be a lot of
> > > interpretations if you take into account memory hotplug, "mem=" option,
> > > reserved and firmware memory.
> > >
> > > I'd suggest you to describe the entire use case in more detail. Having
> > > the complete picture would help finding a proper solution.
> >
> > I think we need the code for the driver trying to do this as an RFC
> > submission. Everything else is rather pointless.
>
> Sharing RFCs is most probably not what people want when developing advanced
> hypervisor features :)

Well, if they can't even do that it really has no relevance for kernel
development.

2020-11-03 12:55:46

by Sudarshan Rajagopalan

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On 2020-10-29 23:41, David Hildenbrand wrote:
> On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
>> Hello all,
>>
>
> Hi!
>

Hi David.. thanks for the response as always.

>> We have a usecase where a module driver adds certain memory blocks
>> using
>> add_memory_driver_managed(), so that it can perform memory hotplug
>> operations on these blocks. In general, these memory blocks aren’t
>> something that gets physically added later, but is part of actual RAM
>> that system booted up with. Meaning – we set the ‘mem=’ cmdline
>> parameter to limit the memory and later add the remaining ones using
>> add_memory*() variants.
>>
>> The basic idea is to have driver have ownership and manage certain
>> memory blocks for hotplug operations.
>
> So, in summary, you're still abusing the memory hot(un)plug
> infrastructure from your driver - just not in a severe way as before.
> And I'll tell you why, so you might understand why exposing this API
> is not really a good idea and why your driver wouldn't - for example -
> be upstream material.
>
> Don't get me wrong, what you are doing might be ok in your context,
> but it's simply not universally applicable in our current model.
>
> Ordinary system RAM works different than many other devices (like PCI
> devices) whereby *something* senses the device and exposes it to the
> system, and some available driver binds to it and owns the memory.
>
> Memory is detected by a driver and added to the system via e.g.,
> add_memory_driver_managed(). Memory devices are created and the memory
> is directly handed off to the system, to be used as system RAM as soon
> as memory devices are onlined. There is no driver that "binds" memory
> like other devices - it's rather the core (buddy) that uses/owns that
> memory immediately after device creation.
>

I see.. and I agree that drivers are meant to *sense* that something
changed or was newly added, so that a driver can check whether it is
the one responsible for, or compatible with, handling this entity and
bind to it. So I guess what it boils down to is - a driver that uses
memory hotplug _cannot_ add/remove or have ownership of memblock boot
memory, only of RAM blocks that are newly added later on.

I was trying to mimic the detection and addition of extra RAM by
limiting the System RAM with "mem=XGB", as though the system had booted
with XGB of boot memory, and later adding the remaining blocks (forcing
detection and adding) using add_memory_driver_managed(). These remaining
blocks are calculated as 'physical end addr of boot memory' -
'memblock_end_of_DRAM'. The "physical end addr of boot memory", i.e. the
actual RAM that the bootloader reports to the kernel, can be obtained by
scanning the 'memory' DT node.

>>
>> For the driver be able to know how much memory was limited and how
>> much
>> actually present, we take the delta of ‘bootmem physical end address’
>> and ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is
>> obtained by scanning the reg values in ‘memory’ DT node and
>> determining
>> the max {addr,size}. Since our driver is getting modularized, we won’t
>> have access to memblock_end_of_DRAM (i.e. end address of all memory
>> blocks after ‘mem=’ is applied).
>
> What you do with "mem=" is force memory detection to ignore some of
> its detected memory.
>
>>
>> So checking if memblock_{start/end}_of_DRAM() symbols can be exported?
>> Also, this information can be obtained by userspace by doing ‘cat
>> /proc/iomem’ and greping for ‘System RAM’. So wondering if userspace
>> can
>
> Not correct: with "mem=", cat /proc/iomem only shows *detected* +
> added system RAM, not the unmodified detection.
>

That's correct - I meant that 'memblock_end_of_DRAM' along with "mem="
can be calculated using 'cat /proc/iomem', which shows "detected plus
added" System RAM, and not the remaining undetected memory which got
stripped off due to "mem=XGB". Basically, the 'memblock_end_of_DRAM'
address with 'mem=XGB' is {end addr of boot RAM - XGB}.. which would be
the same as the end address of "System RAM" shown in /proc/iomem.

The reasoning for this is - if userspace can have access to such info
and calculate the memblock end address, why not let drivers have this
info using memblock_end_of_DRAM()?
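For reference, the userspace calculation mentioned above could look
roughly like the following C sketch. It assumes the usual
"start-end : name" line layout of /proc/iomem with an inclusive end
address; the function name system_ram_end() is hypothetical, and it
parses a text buffer so that it can be tried on sample input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the exclusive end address of the highest range named exactly
 * "System RAM" in iomem-formatted text, or 0 if there is no such range.
 */
unsigned long long system_ram_end(const char *text)
{
    unsigned long long best = 0;
    const char *line = text;

    assert(text != NULL);
    while (*line) {
        unsigned long long start, end;
        char name[64];
        const char *nl = strchr(line, '\n');

        /* Leading indentation of nested entries is skipped by %llx. */
        if (sscanf(line, "%llx-%llx : %63[^\n]", &start, &end, name) == 3 &&
            strcmp(name, "System RAM") == 0 && end + 1 > best)
            best = end + 1;

        if (!nl)
            break;
        line = nl + 1;
    }
    return best;
}
```

A real caller would first read /proc/iomem into a buffer. Two caveats:
unprivileged readers see the addresses zeroed out, and the exact name
match deliberately skips ranges such as "System RAM (kmem)", so only
entries named plain "System RAM" are counted.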

>> have access to such info, can we allow kernel module drivers have
>> access
>> by exporting memblock_{start/end}_of_DRAM().
>>
>> Or are there any other ways where a module driver can get the end
>> address of system memory block?
>
> And here is our problem: You disabled *detection* of that memory by
> the responsible driver (here: core). Now your driver wants to know
> what would have been detected. Assume you have memory hole in that
> region - it would not work by simply looking at start/end. Your
> driver is not the one doing the detection.
>

Regarding the memory hole - the driver can inspect the 'memory' DT node
that the kernel gets from ABL, from the RAM partition table, to see
whether any such holes exist. I agree that if such holes exist, hot
adding will fail since whole blocks of the memory block size need to be
added.
The same issue will arise if a RAM slot is added and a driver senses it
and only knows the start/end of this RAM slot (though such holes
generally don't exist in RAM slots).

This is again something specific to our target, where we make sure there
are no such holes in the topmost memory which is stripped off by "mem="
and later added by the driver. I agree this is not universal,
upstream-type material, but it's a method that drivers can utilize.

> Another issue is: when using such memory for KVM guests, there is no
> mechanism that tracks ownership of that memory - imagine another
> driver wanting to use that memory. This really only works in special
> environments.
>
> Yet another issue: you cannot assume that memblock data will stay
> around after boot. While we do it right now for arm64, that might
> change at some point. This is also one of the reasons why we don't
> export any real memblock data to drivers.
>
>
> When using "mem=" you have to know the exact layout of your system RAM
> and communicate the right places how that layout looks like manually:
> here, to your driver.
>

I agree the issues mentioned here with this approach are valid from
upstream POV, but we aren't trying to make a generic driver for this
usecase and upstream it, but rather have it tailor made for our usecase
alone where we know the layout of the System RAM (max bootmemory, no
holes etc) and we utilize "mem=" and memory hotplug so that driver can
add and have ownership of the remaining memory for later hotplug
operations.

> The clean way of doing things today is to allocate RAM and use it for
> guests - e.g., using hugetlb/gigantic pages. As I said, there are
> other techniques coming up to deal with minimizing struct page
> overhead - if that's what you're concerned with (I still don't know
> why you're removing the memory from the host when giving it to the
> guest).

The overhead of struct page with hugetlb is a valid concern, but we have
other use cases outside of inter-VM sharing where we rely on memory
hotplugging. In general, we want a way to be able to add/remove and
offline/online memory which is part of boot memory. With all the tools
available - "mem=", "/proc/iomem", the "memory" DT node and the memory
hotplug framework - a driver can still achieve this, and the tools that
are present now do allow it.

Keeping the inter-VM memory sharing aside, would it be okay if
memblock_end_of_DRAM() were exported? Like I mentioned before, there can
be a userspace service that calculates this using 'cat /proc/iomem' and
delivers it to the driver via a sysfs node. So I don't see any harm in
exporting this info to drivers. I agree other memblock info shouldn't be
exposed to drivers. But I see no harm for memblock_end_of_DRAM().

I will be glad to share more info about the usecase where we use this
approach if that would help, and I can check and get back on how much we
can share since this is a proprietary usecase for Qualcomm.


Sudarshan


2020-11-03 13:33:16

by Sudarshan Rajagopalan

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On 2020-10-30 01:38, Mike Rapoport wrote:
> On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
>> Hello all,
>>
>> We have a usecase where a module driver adds certain memory blocks
>> using
>> add_memory_driver_managed(), so that it can perform memory hotplug
>> operations on these blocks. In general, these memory blocks aren’t
>> something
>> that gets physically added later, but is part of actual RAM that
>> system
>> booted up with. Meaning – we set the ‘mem=’ cmdline parameter to limit
>> the
>> memory and later add the remaining ones using add_memory*() variants.
>>
>> The basic idea is to have driver have ownership and manage certain
>> memory
>> blocks for hotplug operations.
>>
>> For the driver be able to know how much memory was limited and how
>> much
>> actually present, we take the delta of ‘bootmem physical end address’
>> and
>> ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is obtained
>> by
>> scanning the reg values in ‘memory’ DT node and determining the max
>> {addr,size}. Since our driver is getting modularized, we won’t have
>> access
>> to memblock_end_of_DRAM (i.e. end address of all memory blocks after
>> ‘mem=’
>> is applied).
>>
>> So checking if memblock_{start/end}_of_DRAM() symbols can be exported?
>> Also,
>> this information can be obtained by userspace by doing ‘cat
>> /proc/iomem’ and
>> greping for ‘System RAM’. So wondering if userspace can have access to
>> such
>> info, can we allow kernel module drivers have access by exporting
>> memblock_{start/end}_of_DRAM().
>
> These functions cannot be exported not because we want to hide this
> information from the modules but because it is unsafe to use them.
> On most architectures these functions are __init so they are discarded
> after boot anyway. Besides, the memory configuration known to memblock
> might not be accurate in many cases as David explained in his reply.
>

I don't see how information contained in memblock_{start/end}_of_DRAM()
is considered hidden if the information can be obtained using 'cat
/proc/iomem'. The memory resource manager adds these blocks either in
"System RAM", "reserved", "Kernel data/code" etc. Inspecting this, one
could determine the start and end of the memblocks.

I agree on the part that it's __init annotated and could be removed after
boot. This is something that the driver can be wary of too.

>> Or are there any other ways where a module driver can get the end
>> address of
>> system memory block?
>
> What do you mean by "system memory block"? There could be a lot of
> interpretations if you take into account memory hotplug, "mem=" option,
> reserved and firmware memory.

I meant the physical end address of memblock. The equivalent of
memblock_end_of_DRAM.

>
> I'd suggest you to describe the entire use case in more detail. Having
> the complete picture would help finding a proper solution.

The use case in general is to have a way to add/remove and online/offline
certain memory blocks which are part of boot memory. We do this by
limiting the memory using "mem=" and later adding the remaining blocks
using add_memory_driver_managed().

>
>> Sudarshan
>>
>
> --
> Sincerely yours,
> Mike.


Sudarshan


2020-11-03 16:55:52

by Mike Rapoport

Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM

On Mon, Nov 02, 2020 at 06:51:25PM -0800, Sudarshan Rajagopalan wrote:
> On 2020-10-30 01:38, Mike Rapoport wrote:
> > On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:
> > > Hello all,
> > >
> > > We have a usecase where a module driver adds certain memory blocks
> > > using
> > > add_memory_driver_managed(), so that it can perform memory hotplug
> > > operations on these blocks. In general, these memory blocks aren’t
> > > something
> > > that gets physically added later, but is part of actual RAM that
> > > system
> > > booted up with. Meaning – we set the ‘mem=’ cmdline parameter to
> > > limit the
> > > memory and later add the remaining ones using add_memory*() variants.
> > >
> > > The basic idea is to have driver have ownership and manage certain
> > > memory
> > > blocks for hotplug operations.
> > >
> > > For the driver be able to know how much memory was limited and how
> > > much
> > > actually present, we take the delta of ‘bootmem physical end
> > > address’ and
> > > ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is
> > > obtained by
> > > scanning the reg values in ‘memory’ DT node and determining the max
> > > {addr,size}. Since our driver is getting modularized, we won’t have
> > > access
> > > to memblock_end_of_DRAM (i.e. end address of all memory blocks after
> > > ‘mem=’
> > > is applied).
> > >
> > > So checking if memblock_{start/end}_of_DRAM() symbols can be
> > > exported? Also,
> > > this information can be obtained by userspace by doing ‘cat
> > > /proc/iomem’ and
> > > greping for ‘System RAM’. So wondering if userspace can have access
> > > to such
> > > info, can we allow kernel module drivers have access by exporting
> > > memblock_{start/end}_of_DRAM().
> >
> > These functions cannot be exported not because we want to hide this
> > information from the modules but because it is unsafe to use them.
> > > On most architectures these functions are __init so they are discarded
> > > after boot anyway. Besides, the memory configuration known to memblock
> > > might not be accurate in many cases as David explained in his reply.
> >
>
> I don't see how information contained in memblock_{start/end}_of_DRAM() is
> considered hidden if the information can be obtained using 'cat
> /proc/iomem'. The memory resource manager adds these blocks either in
> "System RAM", "reserved", "Kernel data/code" etc. Inspecting this, one could
> determine whats the start and end of memblocks.

I'm not saying that the memblock data is considered hidden. On most
systems it is simply not present after boot. And even if it is not
discarded, it might not be accurate on any arch except arm64.

> I agree on the part that it's __init annotated and could be removed after
> boot. This is something that the driver can be wary of too.
>
> > > Or are there any other ways where a module driver can get the end
> > > address of
> > > system memory block?
> >
> > What do you mean by "system memory block"? There could be a lot of
> > interpretations if you take into account memory hotplug, "mem=" option,
> > reserved and firmware memory.
>
> I meant the physical end address of memblock. The equivalent of
> memblock_end_of_DRAM.

> > I'd suggest you to describe the entire use case in more detail. Having
> > the complete picture would help finding a proper solution.
>
> The usecase in general is have a way to add/remove and online/offline
> certain memory blocks which are part of boot. We do this by limiting the
> memory using "mem=" and later add the remaining blocks using
> add_memory_driver_managed().

I think such infrastructure should be part of core mm rather than an
external out-of-tree driver.

> Sudarshan
>
--
Sincerely yours,
Mike.