2014-07-24 07:42:12

by Zhang Zhen

Subject: [PATCH] memory-hotplug: add sysfs zone_index attribute

Currently memory-hotplug has two limits:
1. If the memory block is in ZONE_NORMAL, you can change it to
ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
2. If the memory block is in ZONE_MOVABLE, you can change it to
ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.

Without this patch, we don't know which zone a memory block is in.
So we don't know which memory block is adjacent to ZONE_MOVABLE or
ZONE_NORMAL.

With this patch, we can easily tell that newly added memory is placed
in ZONE_NORMAL (ZONE_DMA on powerpc, ZONE_HIGHMEM on x86_32).

Also update the related documentation.

Signed-off-by: Zhang Zhen <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-memory | 9 +++++++++
Documentation/memory-hotplug.txt | 4 +++-
drivers/base/memory.c | 15 ++++++++++++++-
3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index 7405de2..39d3423 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -61,6 +61,15 @@ Users: hotplug memory remove tools
http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils


+What: /sys/devices/system/memory/memoryX/zone_index
+Date: July 2014
+Contact: Zhang Zhen <[email protected]>
+Description:
+ The file /sys/devices/system/memory/memoryX/zone_index
+ is read-only and is designed to show which zone this memory block is in.
+Users: hotplug memory remove tools
+ http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils
+
What: /sys/devices/system/memoryX/nodeY
Date: October 2009
Contact: Linux Memory Management list <[email protected]>
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 45134dc..07019133 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -155,6 +155,7 @@ Under each memory block, you can see 4 files:
/sys/devices/system/memory/memoryXXX/phys_device
/sys/devices/system/memory/memoryXXX/state
/sys/devices/system/memory/memoryXXX/removable
+/sys/devices/system/memory/memoryXXX/zone_index

'phys_index' : read-only and contains memory block id, same as XXX.
'state' : read-write
@@ -170,6 +171,8 @@ Under each memory block, you can see 4 files:
block is removable and a value of 0 indicates that
it is not removable. A memory block is removable only if
every section in the block is removable.
+'zone_index' : read-only: designed to show which zone this memory block
+ is in.

NOTE:
These directories/files appear after physical memory hotplug phase.
@@ -408,7 +411,6 @@ node if necessary.
- allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
sysctl or new control file.
- showing memory block and physical device relationship.
- - showing memory block is under ZONE_MOVABLE or not
- test and make it better memory offlining.
- support HugeTLB page migration and offlining.
- memmap removing at memory offline.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 89f752d..3434d97 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -373,11 +373,23 @@ static ssize_t show_phys_device(struct device *dev,
return sprintf(buf, "%d\n", mem->phys_device);
}

+static ssize_t show_mem_zone_index(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_block *mem = to_memory_block(dev);
+ struct page *first_page;
+ struct zone *zone;
+
+ first_page = pfn_to_page(mem->start_section_nr << PFN_SECTION_SHIFT);
+ zone = page_zone(first_page);
+ return sprintf(buf, "%s\n", zone->name);
+}
+
static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
-
+static DEVICE_ATTR(zone_index, 0444, show_mem_zone_index, NULL);
/*
* Block size attribute stuff
*/
@@ -521,6 +533,7 @@ static struct attribute *memory_memblk_attrs[] = {
&dev_attr_state.attr,
&dev_attr_phys_device.attr,
&dev_attr_removable.attr,
+ &dev_attr_zone_index.attr,
NULL
};

--
1.8.1.2


2014-07-24 17:59:53

by Dave Hansen

Subject: Re: [PATCH] memory-hotplug: add sysfs zone_index attribute

On 07/24/2014 12:41 AM, Zhang Zhen wrote:
> Currently memory-hotplug has two limits:
> 1. If the memory block is in ZONE_NORMAL, you can change it to
> ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
> 2. If the memory block is in ZONE_MOVABLE, you can change it to
> ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
>
> Without this patch, we don't know which zone a memory block is in.
> So we don't know which memory block is adjacent to ZONE_MOVABLE or
> ZONE_NORMAL.
>
> On the other hand, with this patch, we can easy to know newly added
> memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA, for x86_32,
> ZONE_HIGHMEM).

A section can contain more than one zone. This interface will lie about
such sections, which is quite unfortunate.

I'd really much rather see an interface that has a section itself
enumerate to which zones it may be changed. The way you have it now,
any user has to know the rules that you've laid out above. If the
kernel changed those restrictions, we'd have to teach every application
about the change in restrictions.
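
As a rough illustration of what such an enumerating interface would compute, here is a minimal userspace sketch in C. It is a toy model, not kernel code: it only encodes the two adjacency rules quoted above, treats "adjacent" as "a neighbouring block is already in the target zone", and every name in it (zone_kind, CAN_NORMAL, CAN_MOVABLE, model_valid_zones) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum zone_kind { Z_DMA, Z_DMA32, Z_NORMAL, Z_MOVABLE };

#define CAN_NORMAL  (1u << 0)
#define CAN_MOVABLE (1u << 1)

/* Return 1 if the block before or after block i lies in zone z. */
static int has_neighbour_in(const enum zone_kind *zones, size_t n,
                            size_t i, enum zone_kind z)
{
    if (i > 0 && zones[i - 1] == z)
        return 1;
    if (i + 1 < n && zones[i + 1] == z)
        return 1;
    return 0;
}

/*
 * Bitmask of zones that block i may be onlined to, using only the
 * two rules from the patch description. Blocks in other zones get 0.
 */
unsigned model_valid_zones(const enum zone_kind *zones, size_t n, size_t i)
{
    unsigned mask = 0;

    assert(i < n);
    if (zones[i] == Z_NORMAL) {
        mask |= CAN_NORMAL;
        if (has_neighbour_in(zones, n, i, Z_MOVABLE))
            mask |= CAN_MOVABLE;
    } else if (zones[i] == Z_MOVABLE) {
        mask |= CAN_MOVABLE;
        if (has_neighbour_in(zones, n, i, Z_NORMAL))
            mask |= CAN_NORMAL;
    }
    return mask;
}
```

With the layout { DMA, DMA32, Normal, Normal, Movable }, block 2 may only stay in Normal, while blocks 3 and 4 sit on the Normal/Movable boundary and may go either way. If the kernel's rules changed, only this computation would change, not the applications reading its result.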


2014-07-25 02:39:53

by Zhang Zhen

Subject: Re: [PATCH] memory-hotplug: add sysfs zone_index attribute

On 2014/7/25 1:59, Dave Hansen wrote:
> On 07/24/2014 12:41 AM, Zhang Zhen wrote:
>> Currently memory-hotplug has two limits:
>> 1. If the memory block is in ZONE_NORMAL, you can change it to
>> ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
>> 2. If the memory block is in ZONE_MOVABLE, you can change it to
>> ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
>>
>> Without this patch, we don't know which zone a memory block is in.
>> So we don't know which memory block is adjacent to ZONE_MOVABLE or
>> ZONE_NORMAL.
>>
>> On the other hand, with this patch, we can easy to know newly added
>> memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA, for x86_32,
>> ZONE_HIGHMEM).
>
> A section can contain more than one zone. This interface will lie about
> such sections, which is quite unfortunate.
>
1. In arch_add_memory(), x86_64 adds the pages of a new memory block to ZONE_NORMAL by default
(ZONE_DMA on powerpc, ZONE_HIGHMEM on x86_32).

2. In __offline_pages(), test_pages_in_a_zone() guarantees that the pages of the memory block
we are trying to offline are all in the same zone. If a section contains more than one zone,
the memory block cannot be offlined.

Based on these two points, I think all pages of a memory block are in one zone, and so are all
the sections of a memory block.
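
The same-zone check from point 2 can be pictured with a small userspace analogue in C. This is only a sketch of the idea, not the kernel's test_pages_in_a_zone(): the per-page zone array and the function name pages_in_one_zone are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy analogue of a same-zone check: zone_of_page[] holds a
 * hypothetical zone id per page, and [start, end) is the page
 * range of one memory block. Returns 1 if every page in the
 * range has the same zone id as the first page, else 0.
 */
int pages_in_one_zone(const int *zone_of_page, size_t start, size_t end)
{
    assert(start < end);
    for (size_t i = start + 1; i < end; i++) {
        if (zone_of_page[i] != zone_of_page[start])
            return 0;
    }
    return 1;
}
```

Note that a check like this only runs when offlining is attempted; it says nothing about blocks that were populated at boot and never offlined.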

Could you please explain in detail in which case a section can contain more than one zone?

Thanks for your comments!

> I'd really much rather see an interface that has a section itself
> enumerate to which zones it may be changed. The way you have it now,
> any user has to know the rules that you've laid out above. If the
> kernel changed those restrictions, we'd have to teach every application
> about the change in restrictions.
>

This interface is designed to show which zone a memory block is in. If the kernel changes those
restrictions, this interface does not need to change.
For example, on an x86_64 machine booted with "mem=400M" and with 2GiB of memory installed,
the sysfs files give the following sample output:
# cat block_size_bytes
8000000
# cat memory0/zone_index
DMA
# cat memory1/zone_index
DMA32
# cat memory2/zone_index
DMA32
# cat memory3/zone_index
DMA32
# echo 0x20000000 > probe
# cat memory4/zone_index
Normal
# echo online > memory4/state
# cat memory4/zone_index
Normal

# echo offline > memory4/state
# echo online_movable > memory4/state
# cat memory4/zone_index
Movable
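
The block numbers in this session follow from simple arithmetic, sketched below in userspace C. block_size_bytes is printed in hexadecimal without a "0x" prefix, so 8000000 means 0x8000000 bytes (128MiB), and the address written to probe maps to memoryX with X = address / block size. The helper names parse_block_size and block_index are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse the text of block_size_bytes, which is hex without "0x". */
uint64_t parse_block_size(const char *text)
{
    return strtoull(text, NULL, 16);
}

/* Index X of the memoryX directory that covers phys_addr. */
uint64_t block_index(uint64_t phys_addr, uint64_t block_size)
{
    assert(block_size != 0);
    return phys_addr / block_size;
}
```

For the values above, block_index(0x20000000, parse_block_size("8000000")) is 4, which is why the probed memory shows up as memory4. Booting with mem=400M leaves 400MiB / 128MiB = 3.125 blocks' worth of memory, which matches memory0 through memory3 being present before the probe.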

Thanks!

Best regards!

2014-07-25 07:19:52

by Zhang Zhen

Subject: Re: [PATCH] memory-hotplug: add sysfs zone_index attribute

On 2014/7/25 10:39, Zhang Zhen wrote:
> On 2014/7/25 1:59, Dave Hansen wrote:
>> On 07/24/2014 12:41 AM, Zhang Zhen wrote:
>>> Currently memory-hotplug has two limits:
>>> 1. If the memory block is in ZONE_NORMAL, you can change it to
>>> ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
>>> 2. If the memory block is in ZONE_MOVABLE, you can change it to
>>> ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
>>>
>>> Without this patch, we don't know which zone a memory block is in.
>>> So we don't know which memory block is adjacent to ZONE_MOVABLE or
>>> ZONE_NORMAL.
>>>
>>> On the other hand, with this patch, we can easy to know newly added
>>> memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA, for x86_32,
>>> ZONE_HIGHMEM).
>>
>> A section can contain more than one zone. This interface will lie about
>> such sections, which is quite unfortunate.

Hi Dave,

You are right, I only considered memory blocks added after the machine had booted.
Take an x86_64 machine booted with "mem=400M" and with 2GiB of memory installed.
Sample output of the sysfs files:
# cat block_size_bytes
8000000
# cat memory0/zone_index
DMA

Here memory0 contains both ZONE_DMA and ZONE_DMA32, but only DMA is reported.
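
To make the memory0 case concrete, here is a minimal userspace sketch in C, not kernel code. It assumes the usual x86_64 boot-time layout, where ZONE_DMA covers the first 16MiB and ZONE_DMA32 extends up to 4GiB, together with the 128MiB block size shown by block_size_bytes. It ignores hot-added memory, which is placed in ZONE_NORMAL by default regardless of its address. All names are hypothetical.

```c
#include <stdint.h>

#define MiB        (1024ULL * 1024ULL)
#define BLOCK_SIZE (128 * MiB)   /* block_size_bytes = 0x8000000 */
#define DMA_END    (16 * MiB)    /* assumed end of ZONE_DMA on x86_64 */
#define DMA32_END  (4096 * MiB)  /* assumed end of ZONE_DMA32 */

enum zone_kind { Z_DMA, Z_DMA32, Z_NORMAL };

/* Boot-time zone of a physical address under the assumed layout. */
static enum zone_kind zone_of_addr(uint64_t addr)
{
    if (addr < DMA_END)
        return Z_DMA;
    if (addr < DMA32_END)
        return Z_DMA32;
    return Z_NORMAL;
}

/* What a lookup of the block's first page alone would report. */
enum zone_kind first_page_zone(unsigned block)
{
    return zone_of_addr((uint64_t)block * BLOCK_SIZE);
}

/* 1 if the first and last byte of the block lie in different zones. */
int block_spans_zones(unsigned block)
{
    uint64_t start = (uint64_t)block * BLOCK_SIZE;
    uint64_t last = start + BLOCK_SIZE - 1;

    return zone_of_addr(start) != zone_of_addr(last);
}
```

In this model block 0 starts in ZONE_DMA but ends at 128MiB, well past the 16MiB boundary, so a first-page lookup reports DMA for a block that is mostly DMA32. Block 1 lies entirely in DMA32 and is reported correctly.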
>>
> 1. In arch_add_memory(), x86_64 add the new pages of the new memory block default to
> ZONE_NORMAL (for powerpc, ZONE_DMA, for x86_32, ZONE_HIGHMEM).
>
> 2. In __offline_pages(), test_pages_in_a_zone() guaranteed the pages of a memory block
> we try to offline are in the same zone. If a section contains more than one zone,
> the memory block can not be offlined.
>
> Based on the above two points, i think the pages of a memory block are in one zone, and the sections
> of a memory block are in one zone.
>
> Could you please explain in detail what is the case a section can contain more than one zone ?
>
> Thanks for your comments!
>
>> I'd really much rather see an interface that has a section itself
>> enumerate to which zones it may be changed. The way you have it now,
>> any user has to know the rules that you've laid out above. If the
>> kernel changed those restrictions, we'd have to teach every application
>> about the change in restrictions.
>>
You are right here too: we should add an interface that shows which zones a memory block may
be changed to, so that users do not need to know the rules above.
I will send a new version.

Thank you very much !

>
> This interface is designed to show which zone a memory block is in. If the kernel changed those
> restrictions, this interface doesn't need to change.
> For a x86_64 machine booted with "mem=400M" and with 2GiB memory installed.
> Sample output of the sysfs files:
> # cat block_size_bytes
> 8000000
> # cat memory0/zone_index
> DMA
> # cat memory1/zone_index
> DMA32
> # cat memory2/zone_index
> DMA32
> # cat memory3/zone_index
> DMA32
> # echo 0x20000000 > probe
> # cat memory4/zone_index
> Normal
> # echo online > memory4/state
> # cat memory4/zone_index
> Normal
>
> # echo offline > memory4/state
> # echo online_movable > memory4/state
> # cat memory4/zone_index
> Movable
>
> Thanks!
>
> Best regards!