LinuxLists.cc - [PATCH] Fix race in set_max_huge_pages for multiple updaters of nr_huge

2005-11-18 17:51:56

Subject: [PATCH] Fix race in set_max_huge_pages for multiple updaters of nr_huge_pages

If there are multiple updaters to /proc/sys/vm/nr_hugepages
simultaneously it is possible for the nr_huge_pages variable to become
incorrect. There is no locking in the set_max_huge_pages function
around alloc_fresh_huge_page which is able to update nr_huge_pages. Two
callers to alloc_fresh_huge_page could race against each other as could
a call to alloc_fresh_huge_page and a call to update_and_free_page.
This patch just expands the area covered by the hugetlb_lock to cover
the call into alloc_fresh_huge_page. I'm not sure how we could say that
a sysctl section is performance critical where more specific locking
would be needed.

My reproducer was to run a couple copies of the following script
simultaneously

while [ true ]; do
echo 1000 > /proc/sys/vm/nr_hugepages
echo 500 > /proc/sys/vm/nr_hugepages
echo 750 > /proc/sys/vm/nr_hugepages
echo 100 > /proc/sys/vm/nr_hugepages
echo 0 > /proc/sys/vm/nr_hugepages
done

and then watch /proc/meminfo and eventually you will see things like

HugePages_Total: 100
HugePages_Free: 109

After applying the patch all seemed well.

Signed-off-by: Eric Paris <[email protected]>
----

hugetlb.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

--- linux-2.6.14.2/mm/hugetlb.c.old
+++ linux-2.6.14.2/mm/hugetlb.c
@@ -22,6 +22,7 @@
static struct list_head hugepage_freelists[MAX_NUMNODES];
static unsigned int nr_huge_pages_node[MAX_NUMNODES];
static unsigned int free_huge_pages_node[MAX_NUMNODES];
+/* This lock protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages */
static DEFINE_SPINLOCK(hugetlb_lock);

static void enqueue_huge_page(struct page *page)
@@ -172,10 +173,13 @@
static unsigned long set_max_huge_pages(unsigned long count)
{
while (count > nr_huge_pages) {
- struct page *page = alloc_fresh_huge_page();
- if (!page)
- return nr_huge_pages;
+ struct page *page;
spin_lock(&hugetlb_lock);
+ page = alloc_fresh_huge_page();
+ if (!page) {
+ spin_unlock(&hugetlb_lock);
+ return nr_huge_pages;
+ }
enqueue_huge_page(page);
spin_unlock(&hugetlb_lock);
}

2005-11-19 03:10:15

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] Fix race in set_max_huge_pages for multiple updaters of nr_huge_pages

Eric Paris <[email protected]> wrote:
>
> If there are multiple updaters to /proc/sys/vm/nr_hugepages
> simultaneously it is possible for the nr_huge_pages variable to become
> incorrect. There is no locking in the set_max_huge_pages function
> around alloc_fresh_huge_page which is able to update nr_huge_pages. Two
> callers to alloc_fresh_huge_page could race against each other as could
> a call to alloc_fresh_huge_page and a call to update_and_free_page.
> This patch just expands the area covered by the hugetlb_lock to cover
> the call into alloc_fresh_huge_page. I'm not sure how we could say that
> a sysctl section is performance critical where more specific locking
> would be needed.
>
> My reproducer was to run a couple copies of the following script
> simultaneously
>
> while [ true ]; do
> echo 1000 > /proc/sys/vm/nr_hugepages
> echo 500 > /proc/sys/vm/nr_hugepages
> echo 750 > /proc/sys/vm/nr_hugepages
> echo 100 > /proc/sys/vm/nr_hugepages
> echo 0 > /proc/sys/vm/nr_hugepages
> done
>
> and then watch /proc/meminfo and eventually you will see things like
>
> HugePages_Total: 100
> HugePages_Free: 109
>
> After applying the patch all seemed well.
>
> Signed-off-by: Eric Paris <[email protected]>
> ----
>
> hugetlb.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> --- linux-2.6.14.2/mm/hugetlb.c.old
> +++ linux-2.6.14.2/mm/hugetlb.c
> @@ -22,6 +22,7 @@
> static struct list_head hugepage_freelists[MAX_NUMNODES];
> static unsigned int nr_huge_pages_node[MAX_NUMNODES];
> static unsigned int free_huge_pages_node[MAX_NUMNODES];
> +/* This lock protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages */
> static DEFINE_SPINLOCK(hugetlb_lock);
>
> static void enqueue_huge_page(struct page *page)
> @@ -172,10 +173,13 @@
> static unsigned long set_max_huge_pages(unsigned long count)
> {
> while (count > nr_huge_pages) {
> - struct page *page = alloc_fresh_huge_page();
> - if (!page)
> - return nr_huge_pages;
> + struct page *page;
> spin_lock(&hugetlb_lock);
> + page = alloc_fresh_huge_page();
> + if (!page) {
> + spin_unlock(&hugetlb_lock);
> + return nr_huge_pages;
> + }
> enqueue_huge_page(page);
> spin_unlock(&hugetlb_lock);
> }

Nope, alloc_fresh_huge_page() does a GFP_HIGHUSER allocation, which can
sleep and may not be called inside spinlock. You would have seen a spew of
might_sleep() warnings if this was tested with the appropriate kernel
debugging options.

How about this?

--- devel/mm/hugetlb.c~hugetlb-fix-race-in-set_max_huge_pages-for-multiple-updaters-of-nr_huge_pages 2005-11-18 19:04:10.000000000 -0800
+++ devel-akpm/mm/hugetlb.c 2005-11-18 19:07:26.000000000 -0800
@@ -22,6 +22,10 @@ unsigned long max_huge_pages;
static struct list_head hugepage_freelists[MAX_NUMNODES];
static unsigned int nr_huge_pages_node[MAX_NUMNODES];
static unsigned int free_huge_pages_node[MAX_NUMNODES];
+
+/*
+ * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
+ */
static DEFINE_SPINLOCK(hugetlb_lock);

static void enqueue_huge_page(struct page *page)
@@ -61,8 +65,10 @@ static struct page *alloc_fresh_huge_pag
HUGETLB_PAGE_ORDER);
nid = (nid + 1) % num_online_nodes();
if (page) {
+ spin_lock(&hugetlb_lock);
nr_huge_pages++;
nr_huge_pages_node[page_to_nid(page)]++;
+ spin_unlock(&hugetlb_lock);
}
return page;
}
_

2005-11-19 03:25:10

by William Lee Irwin III

[permalink] [raw]

Subject: Re: [PATCH] Fix race in set_max_huge_pages for multiple updaters of nr_huge_pages

On Fri, Nov 18, 2005 at 07:09:50PM -0800, Andrew Morton wrote:
> Nope, alloc_fresh_huge_page() does a GFP_HIGHUSER allocation, which can
> sleep and may not be called inside spinlock. You would have seen a spew of
> might_sleep() warnings if this was tested with the appropriate kernel
> debugging options.
> How about this?

Looks good. My reply got stuck behind a queue of other things to do
since it needed a correction.

Acked-by: William Irwin <[email protected]>

-- wli