2014-02-05 16:29:12

by Nathan Zimmer

[permalink] [raw]
Subject: [RFC] Move the memory_notifier out of the memory_hotplug lock

There are a few spots where we call memory_notifier. This doesn't need to be
done under lock since memory_notify is just a blocking_notifier_call_chain
with it's own locking mechanism.

This RFC is a follow on to the one I submitted earlier.
(move register_memory_resource out of the lock_memory_hotplug, commit ac13c46)
Most of our time is being spend under the memory hotplug lock in particular
online_pages() so it makes sense to move out everything that can be easily
moved out.

However perf pointed me to a spot to work on, setup_zone_migrate_reserve.
In fact most of the time is spent there. Since that is going to require
more reading and time I will start by whittling out some easy pieces.

cc: Andrew Morton <[email protected]>
cc: Tang Chen <[email protected]>
cc: Wen Congyang <[email protected]>
cc: Toshi Kani <[email protected]>
cc: Yasuaki Ishimatsu <[email protected]>
cc: Xishi Qiu <[email protected]>
cc: Cody P Schafer <[email protected]>
cc: "Rafael J. Wysocki" <[email protected]>
cc: David Rientjes <[email protected]>
cc: Jiang Liu <[email protected]>
Cc: Hedi Berriche <[email protected]>
Cc: Mike Travis <[email protected]>
[email protected] (open list:MEMORY MANAGEMENT)
[email protected] (open list)

---
mm/memory_hotplug.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 62a0cd1..a3cbd14 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -985,12 +985,12 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
if (need_zonelists_rebuild)
zone_pcp_reset(zone);
mutex_unlock(&zonelists_mutex);
+ unlock_memory_hotplug();
printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n",
(unsigned long long) pfn << PAGE_SHIFT,
(((unsigned long long) pfn + nr_pages)
<< PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
- unlock_memory_hotplug();
return ret;
}

@@ -1016,9 +1016,10 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ

writeback_set_ratelimit();

+ unlock_memory_hotplug();
+
if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);
- unlock_memory_hotplug();

return 0;
}
@@ -1601,8 +1602,8 @@ repeat:
vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();

- memory_notify(MEM_OFFLINE, &arg);
unlock_memory_hotplug();
+ memory_notify(MEM_OFFLINE, &arg);
return 0;

failed_removal:
--
1.8.2.1


2014-02-05 20:29:34

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] Move the memory_notifier out of the memory_hotplug lock

On Wed, 5 Feb 2014, Nathan Zimmer wrote:

> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 62a0cd1..a3cbd14 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -985,12 +985,12 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> if (need_zonelists_rebuild)
> zone_pcp_reset(zone);
> mutex_unlock(&zonelists_mutex);
> + unlock_memory_hotplug();
> printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n",
> (unsigned long long) pfn << PAGE_SHIFT,
> (((unsigned long long) pfn + nr_pages)
> << PAGE_SHIFT) - 1);
> memory_notify(MEM_CANCEL_ONLINE, &arg);
> - unlock_memory_hotplug();
> return ret;
> }
>
> @@ -1016,9 +1016,10 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>
> writeback_set_ratelimit();
>
> + unlock_memory_hotplug();
> +
> if (onlined_pages)
> memory_notify(MEM_ONLINE, &arg);
> - unlock_memory_hotplug();
>
> return 0;
> }

That looks a little problematic, what happens if a nid is being brought
online and a registered callback does something like allocate resources
for the arg->status_change_nid and the above two hunks of this patch end
up racing?

Before, a registered callback would be guaranteed to see either a
MEMORY_CANCEL_ONLINE or MEMORY_ONLINE after it has already done
MEMORY_GOING_ONLINE.

With your patch, we could race and see one cpu doing MEMORY_GOING_ONLINE,
another cpu doing MEMORY_GOING_ONLINE, and then MEMORY_ONLINE and
MEMORY_CANCEL_ONLINE in either order.

So I think this patch will break most registered callbacks that actually
depend on lock_memory_hotplug(), it's a coarse lock for that reason.

2014-02-05 23:10:42

by Nathan Zimmer

[permalink] [raw]
Subject: Re: [RFC] Move the memory_notifier out of the memory_hotplug lock

On 02/05/2014 02:29 PM, David Rientjes wrote:
> On Wed, 5 Feb 2014, Nathan Zimmer wrote:
>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 62a0cd1..a3cbd14 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -985,12 +985,12 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>> if (need_zonelists_rebuild)
>> zone_pcp_reset(zone);
>> mutex_unlock(&zonelists_mutex);
>> + unlock_memory_hotplug();
>> printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n",
>> (unsigned long long) pfn << PAGE_SHIFT,
>> (((unsigned long long) pfn + nr_pages)
>> << PAGE_SHIFT) - 1);
>> memory_notify(MEM_CANCEL_ONLINE, &arg);
>> - unlock_memory_hotplug();
>> return ret;
>> }
>>
>> @@ -1016,9 +1016,10 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>>
>> writeback_set_ratelimit();
>>
>> + unlock_memory_hotplug();
>> +
>> if (onlined_pages)
>> memory_notify(MEM_ONLINE, &arg);
>> - unlock_memory_hotplug();
>>
>> return 0;
>> }
> That looks a little problematic, what happens if a nid is being brought
> online and a registered callback does something like allocate resources
> for the arg->status_change_nid and the above two hunks of this patch end
> up racing?
>
> Before, a registered callback would be guaranteed to see either a
> MEMORY_CANCEL_ONLINE or MEMORY_ONLINE after it has already done
> MEMORY_GOING_ONLINE.
>
> With your patch, we could race and see one cpu doing MEMORY_GOING_ONLINE,
> another cpu doing MEMORY_GOING_ONLINE, and then MEMORY_ONLINE and
> MEMORY_CANCEL_ONLINE in either order.
>
> So I think this patch will break most registered callbacks that actually
> depend on lock_memory_hotplug(), it's a coarse lock for that reason.

Since the argument being passed in is the pfn and size it would be an issue
only if two threads attepted to online the same piece of memory. Right?

That seems very unlikely but if it can happen it needs to be protected against.

Nate


2014-02-05 23:20:13

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] Move the memory_notifier out of the memory_hotplug lock

On Wed, 5 Feb 2014, Nathan Zimmer wrote:

> > That looks a little problematic, what happens if a nid is being brought
> > online and a registered callback does something like allocate resources
> > for the arg->status_change_nid and the above two hunks of this patch end
> > up racing?
> >
> > Before, a registered callback would be guaranteed to see either a
> > MEMORY_CANCEL_ONLINE or MEMORY_ONLINE after it has already done
> > MEMORY_GOING_ONLINE.
> >
> > With your patch, we could race and see one cpu doing MEMORY_GOING_ONLINE,
> > another cpu doing MEMORY_GOING_ONLINE, and then MEMORY_ONLINE and
> > MEMORY_CANCEL_ONLINE in either order.
> >
> > So I think this patch will break most registered callbacks that actually
> > depend on lock_memory_hotplug(), it's a coarse lock for that reason.
>
> Since the argument being passed in is the pfn and size it would be an issue
> only if two threads attepted to online the same piece of memory. Right?
>

No, I'm referring to registered callbacks that provide a resource for
arg->status_change_nid. An example would be the callbacks I added to the
slub allocator in slab_memory_callback(). If we are now able to get a
racy MEM_GOING_ONLINE -> MEM_GOING_ONLINE -> MEM_ONLINE ->
MEM_CANCEL_ONLINE, which is possible with your patch _and_ the node being
successfully onlined at the end, then we get a NULL pointer dereference
because the kmem_cache_node for each slab cache has been freed.

> That seems very unlikely but if it can happen it needs to be protected
> against.
>

The protection for registered memory online or offline callbacks is
lock_memory_hotplug() which is eliminated with your patch, the locking for
memory_notify() that you're citing is irrelevant.

2014-02-06 16:09:44

by Nathan Zimmer

[permalink] [raw]
Subject: Re: [RFC] Move the memory_notifier out of the memory_hotplug lock

On Wed, Feb 05, 2014 at 03:20:07PM -0800, David Rientjes wrote:
> On Wed, 5 Feb 2014, Nathan Zimmer wrote:
>
> > > That looks a little problematic, what happens if a nid is being brought
> > > online and a registered callback does something like allocate resources
> > > for the arg->status_change_nid and the above two hunks of this patch end
> > > up racing?
> > >
> > > Before, a registered callback would be guaranteed to see either a
> > > MEMORY_CANCEL_ONLINE or MEMORY_ONLINE after it has already done
> > > MEMORY_GOING_ONLINE.
> > >
> > > With your patch, we could race and see one cpu doing MEMORY_GOING_ONLINE,
> > > another cpu doing MEMORY_GOING_ONLINE, and then MEMORY_ONLINE and
> > > MEMORY_CANCEL_ONLINE in either order.
> > >
> > > So I think this patch will break most registered callbacks that actually
> > > depend on lock_memory_hotplug(), it's a coarse lock for that reason.
> >
> > Since the argument being passed in is the pfn and size it would be an issue
> > only if two threads attepted to online the same piece of memory. Right?
> >
>
> No, I'm referring to registered callbacks that provide a resource for
> arg->status_change_nid. An example would be the callbacks I added to the
> slub allocator in slab_memory_callback(). If we are now able to get a
> racy MEM_GOING_ONLINE -> MEM_GOING_ONLINE -> MEM_ONLINE ->
> MEM_CANCEL_ONLINE, which is possible with your patch _and_ the node being
> successfully onlined at the end, then we get a NULL pointer dereference
> because the kmem_cache_node for each slab cache has been freed.
>
Ok I think I see now. In my testing I had only been onlining parts of nodes.
So all nodes were already had at least some memory online from the beginning.

> > That seems very unlikely but if it can happen it needs to be protected
> > against.
> >
>
> The protection for registered memory online or offline callbacks is
> lock_memory_hotplug() which is eliminated with your patch, the locking for
> memory_notify() that you're citing is irrelevant.

Would the race still exist if we left the position of the locks alone and
broke it up by nid, something like this?


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ee37657..e797e21 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -913,7 +913,9 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
int ret;
struct memory_notify arg;

- lock_memory_hotplug();
+ nid = page_to_nid(pfn_to_page(pfn));
+
+ lock_memory_hotplug(nid);
/*
* This doesn't need a lock to do pfn_to_page().
* The section can't be removed here because of the
@@ -923,19 +925,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ

if ((zone_idx(zone) > ZONE_NORMAL || online_type == ONLINE_MOVABLE) &&
!can_online_high_movable(zone)) {
- unlock_memory_hotplug();
+ unlock_memory_hotplug(nid);
return -1;
}

if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
- unlock_memory_hotplug();
+ unlock_memory_hotplug(nid);
return -1;
}
}
if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
- unlock_memory_hotplug();
+ unlock_memory_hotplug(nid);
return -1;
}
}
@@ -947,13 +949,11 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
arg.nr_pages = nr_pages;
node_states_check_changes_online(nr_pages, zone, &arg);

- nid = page_to_nid(pfn_to_page(pfn));
-
ret = memory_notify(MEM_GOING_ONLINE, &arg);
ret = notifier_to_errno(ret);
if (ret) {
memory_notify(MEM_CANCEL_ONLINE, &arg);
- unlock_memory_hotplug();
+ unlock_memory_hotplug(nid);
return ret;
}
/*
@@ -978,7 +978,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
(((unsigned long long) pfn + nr_pages)
<< PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
- unlock_memory_hotplug();
+ unlock_memory_hotplug(nid);
return ret;
}

@@ -1006,7 +1006,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ

if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);
- unlock_memory_hotplug();
+ unlock_memory_hotplug(nid);

return 0;
}

2014-02-07 09:30:20

by Jiang Liu

[permalink] [raw]
Subject: Re: [RFC] Move the memory_notifier out of the memory_hotplug lock

Hi Nathan,
I feel some registered memory hotplug notification callbacks
may have an assumption that they will be serialized by the memory
hotplug framework. If we relax the lock semantics, we need to scan
all those callbacks and make sure they are safe.
And it's easy for user to trigger concurrent online/offline
requests through sysfs interfaces. So it would be better to keep
strict lock semantics.
Thanks!
Gerry

On 2014/2/7 0:09, Nathan Zimmer wrote:
> On Wed, Feb 05, 2014 at 03:20:07PM -0800, David Rientjes wrote:
>> On Wed, 5 Feb 2014, Nathan Zimmer wrote:
>>
>>>> That looks a little problematic, what happens if a nid is being brought
>>>> online and a registered callback does something like allocate resources
>>>> for the arg->status_change_nid and the above two hunks of this patch end
>>>> up racing?
>>>>
>>>> Before, a registered callback would be guaranteed to see either a
>>>> MEMORY_CANCEL_ONLINE or MEMORY_ONLINE after it has already done
>>>> MEMORY_GOING_ONLINE.
>>>>
>>>> With your patch, we could race and see one cpu doing MEMORY_GOING_ONLINE,
>>>> another cpu doing MEMORY_GOING_ONLINE, and then MEMORY_ONLINE and
>>>> MEMORY_CANCEL_ONLINE in either order.
>>>>
>>>> So I think this patch will break most registered callbacks that actually
>>>> depend on lock_memory_hotplug(), it's a coarse lock for that reason.
>>>
>>> Since the argument being passed in is the pfn and size it would be an issue
>>> only if two threads attepted to online the same piece of memory. Right?
>>>
>>
>> No, I'm referring to registered callbacks that provide a resource for
>> arg->status_change_nid. An example would be the callbacks I added to the
>> slub allocator in slab_memory_callback(). If we are now able to get a
>> racy MEM_GOING_ONLINE -> MEM_GOING_ONLINE -> MEM_ONLINE ->
>> MEM_CANCEL_ONLINE, which is possible with your patch _and_ the node being
>> successfully onlined at the end, then we get a NULL pointer dereference
>> because the kmem_cache_node for each slab cache has been freed.
>>
> Ok I think I see now. In my testing I had only been onlining parts of nodes.
> So all nodes were already had at least some memory online from the beginning.
>
>>> That seems very unlikely but if it can happen it needs to be protected
>>> against.
>>>
>>
>> The protection for registered memory online or offline callbacks is
>> lock_memory_hotplug() which is eliminated with your patch, the locking for
>> memory_notify() that you're citing is irrelevant.
>
> Would the race still exist if we left the position of the locks alone and
> broke it up by nid, something like this?
>
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ee37657..e797e21 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -913,7 +913,9 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> int ret;
> struct memory_notify arg;
>
> - lock_memory_hotplug();
> + nid = page_to_nid(pfn_to_page(pfn));
> +
> + lock_memory_hotplug(nid);
> /*
> * This doesn't need a lock to do pfn_to_page().
> * The section can't be removed here because of the
> @@ -923,19 +925,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>
> if ((zone_idx(zone) > ZONE_NORMAL || online_type == ONLINE_MOVABLE) &&
> !can_online_high_movable(zone)) {
> - unlock_memory_hotplug();
> + unlock_memory_hotplug(nid);
> return -1;
> }
>
> if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
> if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
> - unlock_memory_hotplug();
> + unlock_memory_hotplug(nid);
> return -1;
> }
> }
> if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
> if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
> - unlock_memory_hotplug();
> + unlock_memory_hotplug(nid);
> return -1;
> }
> }
> @@ -947,13 +949,11 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> arg.nr_pages = nr_pages;
> node_states_check_changes_online(nr_pages, zone, &arg);
>
> - nid = page_to_nid(pfn_to_page(pfn));
> -
> ret = memory_notify(MEM_GOING_ONLINE, &arg);
> ret = notifier_to_errno(ret);
> if (ret) {
> memory_notify(MEM_CANCEL_ONLINE, &arg);
> - unlock_memory_hotplug();
> + unlock_memory_hotplug(nid);
> return ret;
> }
> /*
> @@ -978,7 +978,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> (((unsigned long long) pfn + nr_pages)
> << PAGE_SHIFT) - 1);
> memory_notify(MEM_CANCEL_ONLINE, &arg);
> - unlock_memory_hotplug();
> + unlock_memory_hotplug(nid);
> return ret;
> }
>
> @@ -1006,7 +1006,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>
> if (onlined_pages)
> memory_notify(MEM_ONLINE, &arg);
> - unlock_memory_hotplug();
> + unlock_memory_hotplug(nid);
>
> return 0;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>