2015-12-21 03:13:06

by Joonsoo Kim

Subject: [RFC] theoretical race between memory hotplug and pfn iterator

Hello, memory-hotplug folks.

I found theoretical races between memory hotplug and pfn iterators.
For example, a pfn iterator works something like the one below.

for (pfn = zone_start_pfn; pfn < zone_end_pfn; pfn++) {
        if (!pfn_valid(pfn))
                continue;

        page = pfn_to_page(pfn);
        /* Do whatever we want */
}

The hot-add sequence is something like the following.

1) add memmap (after this, pfn_valid() returns true)
2) memmap_init_zone()

So, if a pfn iterator runs between 1) and 2), it could access
uninitialized page information.

This problem could be solved by re-ordering initialization steps.
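
To make the window concrete, here is a minimal single-threaded userspace sketch of the two orderings. It is only a toy model, not kernel code: every `toy_*` name, `NR_PFNS` and `POISON` are made up for illustration, and the "iterator" is simply called at the point where a concurrent one could run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PFNS 8
#define POISON  0xAA    /* stands in for garbage in a freshly allocated memmap */

struct toy_page {
        unsigned char flags;    /* POISON until the page is initialized */
};

static struct toy_page toy_memmap[NR_PFNS];
static bool toy_present[NR_PFNS];       /* what the toy pfn_valid() consults */

static bool toy_pfn_valid(size_t pfn)
{
        return pfn < NR_PFNS && toy_present[pfn];
}

/* The pfn iterator: count valid pfns whose page is still uninitialized. */
static int toy_scan_uninitialized(void)
{
        int bad = 0;

        for (size_t pfn = 0; pfn < NR_PFNS; pfn++) {
                if (!toy_pfn_valid(pfn))
                        continue;
                if (toy_memmap[pfn].flags == POISON)
                        bad++;
        }
        return bad;
}

static void toy_reset(void)
{
        memset(toy_memmap, POISON, sizeof(toy_memmap));
        memset(toy_present, 0, sizeof(toy_present));
}

static void toy_mark_present(void)      /* models step 1) */
{
        for (size_t pfn = 0; pfn < NR_PFNS; pfn++)
                toy_present[pfn] = true;
}

static void toy_init_pages(void)        /* models step 2) */
{
        for (size_t pfn = 0; pfn < NR_PFNS; pfn++)
                toy_memmap[pfn].flags = 0;
}

/* Order described above: publish first, initialize second. */
static int toy_hot_add_publish_first(void)
{
        int seen;

        toy_reset();
        toy_mark_present();
        seen = toy_scan_uninitialized();        /* iterator runs in the window */
        toy_init_pages();
        return seen;
}

/* Re-ordered: initialize first, publish second. */
static int toy_hot_add_init_first(void)
{
        int seen;

        toy_reset();
        toy_init_pages();
        seen = toy_scan_uninitialized();        /* no pfn is valid yet */
        toy_mark_present();
        assert(toy_scan_uninitialized() == 0);  /* and none is bad afterwards */
        return seen;
}
```

In this model toy_hot_add_publish_first() reports every pfn as uninitialized, while toy_hot_add_init_first() reports none. On a real SMP system the re-ordering would presumably also need the initialization to be visible to other CPUs before pfn_valid() starts returning true, which this sketch does not model.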

Hot-remove also has a problem. If memory is hot-removed after
pfn_valid() succeeds in a pfn iterator, access to the page would cause a
NULL dereference because hot-remove frees the corresponding memmap. There
is no guard against this free in any pfn iterator.

This problem can be solved by inserting get_online_mems() in all pfn
iterators, but this looks error-prone for future usage. Another idea is
to delay freeing the corresponding memmap until a synchronization point
such as system suspend. That would guarantee that no pfn iterator is
running. Does anyone have a better idea?
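
As a rough illustration of the first option, the sketch below models the guard in single-threaded userspace C. It is not the kernel implementation: the `toy_*` functions only mimic the shape of get_online_mems() and its counterpart put_online_mems(), a plain counter stands in for the real lock, and the hot-remove attempt is injected by hand in the middle of the scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS 4

struct toy_page {
        int refcount;
};

static struct toy_page toy_pages[NR_PFNS];
static struct toy_page *toy_memmap = toy_pages; /* NULL once "freed" */
static int toy_readers;         /* iterators currently inside the guard */

static void toy_get_online_mems(void)
{
        toy_readers++;
}

static void toy_put_online_mems(void)
{
        assert(toy_readers > 0);
        toy_readers--;
}

/*
 * Hot-remove may free the memmap only while no iterator is inside the
 * guard. A real lock would make the remover wait; the toy just fails.
 */
static bool toy_try_hot_remove(void)
{
        if (toy_readers > 0)
                return false;
        toy_memmap = NULL;
        return true;
}

static bool toy_pfn_valid(size_t pfn)
{
        return toy_memmap != NULL && pfn < NR_PFNS;
}

/*
 * Guarded iterator. A hot-remove is attempted halfway through to mimic
 * the race; *removed_during reports whether it got through.
 * Returns the number of pages visited.
 */
static int toy_guarded_scan(bool *removed_during)
{
        int visited = 0;

        *removed_during = false;
        toy_get_online_mems();
        for (size_t pfn = 0; pfn < NR_PFNS; pfn++) {
                if (!toy_pfn_valid(pfn))
                        continue;
                if (pfn == NR_PFNS / 2)
                        *removed_during = toy_try_hot_remove();
                toy_memmap[pfn].refcount++;     /* safe: memmap is pinned */
                visited++;
        }
        toy_put_online_mems();
        return visited;
}
```

The weakness mentioned above is visible here too: the protection exists only because toy_guarded_scan() remembers to take the guard, and an iterator that forgets it gets no protection at all.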

Btw, I tried memory hot-remove with QEMU 2.5.5 but it didn't work. I
followed the sequence in doc/memory-hotplug. Do you have any comments on this?

Thanks.


2015-12-21 07:02:54

by Zhu Guihua

Subject: Re: [RFC] theoretical race between memory hotplug and pfn iterator


On 12/21/2015 11:15 AM, Joonsoo Kim wrote:
> Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
> followed sequences in doc/memory-hotplug. Do you have any comment on this?

I tried memory hot-remove with QEMU 2.5.5 and RHEL 7, and it works well.
Could you provide more details, such as the guest version and the error log?

Thanks,
Zhu

>
> Thanks.


2015-12-21 07:15:50

by Joonsoo Kim

Subject: Re: [RFC] theoretical race between memory hotplug and pfn iterator

On Mon, Dec 21, 2015 at 03:00:08PM +0800, Zhu Guihua wrote:
>
> On 12/21/2015 11:15 AM, Joonsoo Kim wrote:
> >Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
> >followed sequences in doc/memory-hotplug. Do you have any comment on this?
>
> I tried memory hot remove with qemu 2.5.5 and RHEL 7, it works well.
> Maybe you can provide more details, such as guest version, err log.

I'm testing with QEMU 2.5.5 and linux-next-20151209, with the following
two patches reverted.

"mm/memblock.c: use memblock_insert_region() for the empty array"
"mm-memblock-use-memblock_insert_region-for-the-empty-array-checkpatch-fixes"

When I type "device_del dimm1" in the QEMU monitor, there is no error log
in the kernel, and the command appears to have no effect. I added a log
message to acpi_memory_device_remove(), but nothing is printed either. Is
there another way to check whether the device_del event is actually
delivered to the kernel?

I launch QEMU with the following command.
./qemu-system-x86_64-recent -enable-kvm -smp 8 -m 4096,slots=16,maxmem=8G ...

Thanks.

2015-12-21 08:03:28

by Zhu Guihua

Subject: Re: [RFC] theoretical race between memory hotplug and pfn iterator


On 12/21/2015 03:17 PM, Joonsoo Kim wrote:
> On Mon, Dec 21, 2015 at 03:00:08PM +0800, Zhu Guihua wrote:
>> On 12/21/2015 11:15 AM, Joonsoo Kim wrote:
>>> Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
>>> followed sequences in doc/memory-hotplug. Do you have any comment on this?
>> I tried memory hot remove with qemu 2.5.5 and RHEL 7, it works well.
>> Maybe you can provide more details, such as guest version, err log.
> I'm testing with qemu 2.5.5 and linux-next-20151209 with reverting
> following two patches.
>
> "mm/memblock.c: use memblock_insert_region() for the empty array"
> "mm-memblock-use-memblock_insert_region-for-the-empty-array-checkpatch-fixes"
>
> When I type "device_del dimm1" in qemu monitor, there is no err log in
> kernel and it looks like command has no effect. I inserted log to
> acpi_memory_device_remove() but there is no message, too. Is there
> another way to check that device_del event is actually transmitted to kernel?

You can use udev to monitor the memory device remove event (udevadm monitor).

>
> I launch the qemu with following command.
> ./qemu-system-x86_64-recent -enable-kvm -smp 8 -m 4096,slots=16,maxmem=8G ...
>
> Thanks.


2015-12-21 12:09:43

by Joonsoo Kim

Subject: Re: [RFC] theoretical race between memory hotplug and pfn iterator

2015-12-21 17:00 GMT+09:00 Zhu Guihua <[email protected]>:
>
> On 12/21/2015 03:17 PM, Joonsoo Kim wrote:
>>
>> On Mon, Dec 21, 2015 at 03:00:08PM +0800, Zhu Guihua wrote:
>>>
>>> On 12/21/2015 11:15 AM, Joonsoo Kim wrote:
>>>>
>>>> Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
>>>> followed sequences in doc/memory-hotplug. Do you have any comment on
>>>> this?
>>>
>>> I tried memory hot remove with qemu 2.5.5 and RHEL 7, it works well.
>>> Maybe you can provide more details, such as guest version, err log.
>>
>> I'm testing with qemu 2.5.5 and linux-next-20151209 with reverting
>> following two patches.
>>
>> "mm/memblock.c: use memblock_insert_region() for the empty array"
>>
>> "mm-memblock-use-memblock_insert_region-for-the-empty-array-checkpatch-fixes"
>>
>> When I type "device_del dimm1" in qemu monitor, there is no err log in
>> kernel and it looks like command has no effect. I inserted log to
>> acpi_memory_device_remove() but there is no message, too. Is there
>> another way to check that device_del event is actually transmitted to
>> kernel?
>
>
> You can use udev to monitor memory device remove event. (udevadm monitor)
>

I have tried it, but there is no message when I type the hot-remove command.

Thanks.