2013-04-26 12:13:15

by Zhang Yi

Subject: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

Hi,

At 2013-04-26 04:52:31,"Thomas Gleixner" <[email protected]> wrote:
>
>Unfortunately this did not work out very well.
>
>1. Your patch now lacks a proper changelog which explains the change
>
>2. Your patch lacks any newline characters as you can see below
>

I am so sorry for my mistakes. : )




The futex keys of processes that share a futex are determined by the page offset,
mapping host, and mapping index of the user-space address.
User applications that place futexes in a hugepage may therefore hit futex-key conflicts.
Assume there are two or more futexes in different normal pages of the same hugepage,
and each futex has the same offset within its normal page; then all of these futexes have the same futex key.
In that case, a wakeup meant for one futex can go to a waiter on another, so the futexes may not work correctly.

This patch adds the index of the normal page within the compound page to the futex-key offset.
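To make the collision concrete, here is a small userspace model of the shared key. It is only a sketch: struct model_key, key_before() and key_after() are hypothetical stand-ins, not kernel code, and it assumes 4 KiB base pages and 2 MiB huge pages. Every subpage of one huge page takes pgoff from the head page, so before the patch only the in-page offset tells them apart.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT  12UL                      /* assumption: 4 KiB base pages */
#define MODEL_PAGE_SIZE   (1UL << MODEL_PAGE_SHIFT)
#define MODEL_HPAGE_ORDER 9UL                       /* assumption: 2 MiB huge pages */

/* Simplified, hypothetical stand-in for the shared part of union futex_key. */
struct model_key {
    uintptr_t inode;        /* mapping host */
    unsigned long pgoff;    /* index taken from the head page */
    unsigned long offset;   /* position inside the base page */
};

/* Pre-patch key for a futex 'pos' bytes into a huge-page-backed file:
 * all subpages of one huge page share the head page's index. */
static struct model_key key_before(uintptr_t inode, unsigned long pos)
{
    struct model_key k;

    assert(pos % 4 == 0);   /* futex words are u32-aligned */
    k.inode = inode;
    k.pgoff = pos >> (MODEL_PAGE_SHIFT + MODEL_HPAGE_ORDER);
    k.offset = pos & (MODEL_PAGE_SIZE - 1);
    return k;
}

/* Patched key: the subpage index is folded into the high bits of offset. */
static struct model_key key_after(uintptr_t inode, unsigned long pos)
{
    struct model_key k = key_before(inode, pos);
    unsigned long subpage = (pos >> MODEL_PAGE_SHIFT) &
                            ((1UL << MODEL_HPAGE_ORDER) - 1);

    k.offset |= subpage << MODEL_PAGE_SHIFT;
    return k;
}

static int key_equal(struct model_key a, struct model_key b)
{
    return a.inode == b.inode && a.pgoff == b.pgoff && a.offset == b.offset;
}
```

Two futexes at byte 0 and byte PAGE_SIZE of the same huge page compare equal under key_before() but not under key_after().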

Steps to reproduce the bug:
1. The 1st thread maps a file on hugetlbfs, uses the returned address as the 1st mutex's
address, and uses the returned address plus PAGE_SIZE as the 2nd mutex's address.
2. The 1st thread initializes the two mutexes with the pshared attribute and locks both of them.
3. The 1st thread creates the 2nd thread, which blocks on the 1st mutex.
4. The 1st thread creates the 3rd thread, which blocks on the 2nd mutex.
5. The 1st thread unlocks the 2nd mutex; the 3rd thread cannot take the 2nd mutex and
may block forever.
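The reproduction steps can be sketched in C roughly as follows. This is a hypothetical harness, not the test used by the author: map_hugetlbfs_file() and run_scenario() are illustrative names, the hugetlbfs path passed to map_hugetlbfs_file() is a placeholder, and a 2-second timeout stands in for "may block forever". On an affected kernel one would pass memory from map_hugetlbfs_file(); on ordinary shared memory run_scenario() is expected to return 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Step 1 (hypothetical helper): map a file on a hugetlbfs mount.
 * 'len' must be a multiple of the huge page size (e.g. 2 MiB). */
void *map_hugetlbfs_file(const char *path, size_t len)
{
    void *p;
    int fd = open(path, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

struct waiter {
    pthread_mutex_t *mutex;
    int acquired;
};

/* Block on one mutex, but give up after ~2 s instead of hanging forever. */
static void *wait_on_mutex(void *arg)
{
    struct waiter *w = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;
    if (pthread_mutex_timedlock(w->mutex, &deadline) == 0) {
        w->acquired = 1;
        pthread_mutex_unlock(w->mutex);
    }
    return NULL;
}

/* Steps 2-5 on memory at 'base'; the 2nd mutex sits 'page_size' bytes later.
 * Returns 1 if the 3rd thread obtained the 2nd mutex, 0 if it timed out. */
int run_scenario(void *base, size_t page_size)
{
    pthread_mutex_t *m1 = base;
    pthread_mutex_t *m2 = (pthread_mutex_t *)((char *)base + page_size);
    struct waiter w2 = { m1, 0 }, w3 = { m2, 0 };
    pthread_mutexattr_t attr;
    pthread_t t2, t3;

    assert(page_size >= sizeof(pthread_mutex_t));
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m1, &attr);
    pthread_mutex_init(m2, &attr);
    pthread_mutexattr_destroy(&attr);
    pthread_mutex_lock(m1);
    pthread_mutex_lock(m2);

    pthread_create(&t2, NULL, wait_on_mutex, &w2);
    pthread_create(&t3, NULL, wait_on_mutex, &w3);
    sleep(1);                  /* crude: let both waiters block in the kernel */
    pthread_mutex_unlock(m2);  /* should wake the 3rd thread */

    pthread_join(t3, NULL);
    pthread_mutex_unlock(m1);  /* release the 2nd thread as well */
    pthread_join(t2, NULL);
    pthread_mutex_destroy(m1);
    pthread_mutex_destroy(m2);
    return w3.acquired;
}
```

The sleep() is only a heuristic for "both waiters are blocked"; a more careful harness would need a way to confirm the waiters are actually in futex_wait before unlocking.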


Signed-off-by: Zhang Yi <[email protected]>
Tested-by: Ma Chenggong <[email protected]>
Reviewed-by: Liu Dong <[email protected]>
Reviewed-by: Cui Yunfeng <[email protected]>
Reviewed-by: Lu Zhongjun <[email protected]>
Reviewed-by: Jiang Biao <[email protected]>


diff -uprN orig/linux3.9-rc7/include/linux/futex.h new/linux3.9-rc7/include/linux/futex.h
--- orig/linux3.9-rc7/include/linux/futex.h 2013-04-15 00:45:16.000000000 +0000
+++ new/linux3.9-rc7/include/linux/futex.h 2013-04-19 16:33:58.725880000 +0000
@@ -19,7 +19,7 @@ handle_futex_death(u32 __user *uaddr, st
* The key type depends on whether it's a shared or private mapping.
* Don't rearrange members without looking at hash_futex().
*
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
+ * There are three components in offset:
* We use the two low order bits of offset to tell what is the kind of key :
* 00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
* (no reference on an inode or mm)
@@ -27,6 +27,9 @@ handle_futex_death(u32 __user *uaddr, st
* mapped on a file (reference on the underlying inode)
* 10 : Shared futex (PTHREAD_PROCESS_SHARED)
* (but private mapping on an mm, and reference taken on it)
+ * Bits 2 to (PAGE_SHIFT-1) indicate the offset of the futex in its page.
+ * The remaining high-order bits indicate the subpage index if the page
+ * is a subpage of a compound page.
*/

#define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */
@@ -36,17 +39,17 @@ union futex_key {
struct {
unsigned long pgoff;
struct inode *inode;
- int offset;
+ long offset;
} shared;
struct {
unsigned long address;
struct mm_struct *mm;
- int offset;
+ long offset;
} private;
struct {
unsigned long word;
void *ptr;
- int offset;
+ long offset;
} both;
};

diff -uprN orig/linux3.9-rc7/kernel/futex.c new/linux3.9-rc7/kernel/futex.c
--- orig/linux3.9-rc7/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
+++ new/linux3.9-rc7/kernel/futex.c 2013-04-19 16:24:05.629143000 +0000
@@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
}
}

+/*
+* Get subpage index in compound page, for futex_key.
+*/
+static inline int get_page_compound_index(struct page *page)
+{
+ struct page *head_page;
+ if (PageHead(page))
+ return 0;
+
+ head_page = compound_head(page);
+ if (compound_order(head_page) >= MAX_ORDER)
+ return page_to_pfn(page) - page_to_pfn(head_page);
+ else
+ return page - compound_head(page);
+}
+
/**
* get_futex_key() - Get parameters which are the keys for a futex
* @uaddr: virtual address of the futex
@@ -228,7 +244,8 @@ static void drop_futex_key_refs(union fu
* The key words are stored in *key on success.
*
* For shared mappings, it's (page->index, file_inode(vma->vm_file),
- * offset_within_page). For private mappings, it's (uaddr, current->mm).
+ * page_compound_index and offset_within_page).
+ * For private mappings, it's (uaddr, current->mm).
* We can usually work out the index without swapping in the page.
*
* lock_page() might sleep, the caller should not hold a spinlock.
@@ -239,7 +256,7 @@ get_futex_key(u32 __user *uaddr, int fsh
unsigned long address = (unsigned long)uaddr;
struct mm_struct *mm = current->mm;
struct page *page, *page_head;
- int err, ro = 0;
+ int err, ro = 0, comp_idx = 0;

/*
* The futex address must be "naturally" aligned.
@@ -299,6 +316,7 @@ again:
* freed from under us.
*/
if (page != page_head) {
+ comp_idx = get_page_compound_index(page);
get_page(page_head);
put_page(page);
}
@@ -311,6 +329,7 @@ again:
#else
page_head = compound_head(page);
if (page != page_head) {
+ comp_idx = get_page_compound_index(page);
get_page(page_head);
put_page(page);
}
@@ -363,7 +382,9 @@ again:
key->private.mm = mm;
key->private.address = address;
} else {
- key->both.offset |= FUT_OFF_INODE; /* inode-based key */
+ /* inode-based key */
+ key->both.offset |= ((long)comp_idx << PAGE_SHIFT)
+ | FUT_OFF_INODE;
key->shared.inode = page_head->mapping->host;
key->shared.pgoff = page_head->index;
}
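The three-part offset described in the futex.h comment can be modeled in userspace as below. This is a sketch assuming 4 KiB pages; pack_offset() and the accessor helpers are hypothetical names, and only FUT_OFF_INODE mirrors a definition from the header.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12UL  /* assumption: 4 KiB base pages */
#define FUT_OFF_INODE    1UL   /* bit 0, mirroring include/linux/futex.h */

/* Pack: bits 0-1 = key kind, bits 2..PAGE_SHIFT-1 = futex offset in its
 * page, remaining high bits = subpage index in the compound page. */
static unsigned long pack_offset(unsigned long kind, unsigned long off_in_page,
                                 unsigned long compound_idx)
{
    assert(kind < 4);
    assert(off_in_page % 4 == 0);  /* u32 alignment keeps bits 0-1 free */
    assert(off_in_page < (1UL << MODEL_PAGE_SHIFT));
    return (compound_idx << MODEL_PAGE_SHIFT) | off_in_page | kind;
}

static unsigned long kind_of(unsigned long offset)
{
    return offset & 3UL;
}

static unsigned long off_in_page_of(unsigned long offset)
{
    return offset & ((1UL << MODEL_PAGE_SHIFT) - 1) & ~3UL;
}

static unsigned long compound_idx_of(unsigned long offset)
{
    return offset >> MODEL_PAGE_SHIFT;
}
```

For example, an inode-based futex at offset 0x40 in the fourth subpage (index 3) packs to 0x3041.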


2013-04-26 18:26:14

by Thomas Gleixner

Subject: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

Zhang,

On Fri, 26 Apr 2013, Zhang Yi wrote:
> At 2013-04-26 04:52:31,"Thomas Gleixner" <[email protected]> wrote:
> >
> >Unfortunately this did not work out very well.
> >
> >1. Your patch now lacks a proper changelog which explains the change
> >
> >2. Your patch lacks any newline characters as you can see below
> >
>
> I am so sorry for my mistakes. : )

Nothing to worry about. We all make mistakes! :)

> The futex-keys of processes share futex determined by page-offset, mapping-host, and
> mapping-index of the user space address.
> User applications using hugepage for futex may lead to futex-key conflict.
> Assume there are two or more futexes in different normal pages of the hugepage,
> and each futex has the same offset in its normal page, causing all the futexes have the same futex-key.

Nit-pick: Please format changelog text with a linebreak around 78
characters. So it looks like this:

The futex-keys of processes share futex determined by page-offset,
mapping-host, and mapping-index of the user space address. User
applications using hugepage for futex may lead to futex-key conflict.

Assume there are two or more futexes in different normal pages of the
hugepage, and each futex has the same offset in its normal page,
causing all the futexes have the same futex-key.

> In that case, futex may not work well.

Very nice detective work!

> diff -uprN orig/linux3.9-rc7/include/linux/futex.h new/linux3.9-rc7/include/linux/futex.h
> --- orig/linux3.9-rc7/include/linux/futex.h 2013-04-15 00:45:16.000000000 +0000
> +++ new/linux3.9-rc7/include/linux/futex.h 2013-04-19 16:33:58.725880000 +0000

The canonical diff for patch submission is

diff -uprN linux3.9-rc7.orig/ linux3.9-rc7/

That results in a patch which can be applied with "patch -p1" from the
kernel base directory and that's how all our scripts work.

Yours needs to be applied with -p2, so it requires manual
interaction.

You can verify that by cd'ing into the kernel tree base directory and
running "patch -p1 < your.patch".

You might have a look at quilt or simply use git, which will do the
right thing for you and in both cases you do not need a separate
kernel tree to diff against.

> #define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */
> @@ -36,17 +39,17 @@ union futex_key {
> struct {
> unsigned long pgoff;
> struct inode *inode;
> - int offset;
> + long offset;

unsigned long please, offset can't be negative. The "int" type of
offset was silly already.
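A quick way to see how wide the offset field needs to be: once the subpage index is shifted above the in-page offset, the value is simply the futex's byte offset within the huge page. The sketch below (hypothetical helpers) computes that maximum. With 4 KiB base pages even a 1 GiB page still fits in an int, but a larger configuration such as 16 GiB pages on 64 KiB base pages (used here purely as an example) does not.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: largest value of (subpage index << page_shift) |
 * (in-page offset) for a huge page of 2^hpage_shift bytes. */
static unsigned long long max_key_offset(unsigned int page_shift,
                                         unsigned int hpage_shift)
{
    unsigned long long max_idx;

    assert(page_shift <= hpage_shift && hpage_shift < 64);
    max_idx = (1ULL << (hpage_shift - page_shift)) - 1;
    return (max_idx << page_shift) | ((1ULL << page_shift) - 1);
}

/* Would that value fit in the old "int offset" field? */
static int fits_in_int(unsigned int page_shift, unsigned int hpage_shift)
{
    return max_key_offset(page_shift, hpage_shift) <= (unsigned long long)INT_MAX;
}
```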

> +/*
> +* Get subpage index in compound page, for futex_key.
> +*/
> +static inline int get_page_compound_index(struct page *page)
> +{
> + struct page *head_page;
> + if (PageHead(page))
> + return 0;

If you look at the callsite, then you'll see that this is only called
when page != page_head. And page_head = compound_head(page). So you
don't need to double check that.

> +
> + head_page = compound_head(page);

Again. The head page is already known, so you can hand it into the
function.

> + if (compound_order(head_page) >= MAX_ORDER)
> + return page_to_pfn(page) - page_to_pfn(head_page);
> + else
> + return page - compound_head(page);
> +}
> +

Now instead of returning that value, I'd rather hand the futex key
pointer to the function and let the function add the index
value. Something like:

static void key_add_compound_idx(key, page, page_head)
{
...
}

That makes the code simpler and easier to read.
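A userspace sketch of the helper shape suggested above, using hypothetical stand-in types (struct model_page, struct model_key) rather than the real struct page and union futex_key, and assuming 4 KiB base pages. The memmap_contiguous flag models the choice the patch makes with compound_order(head) >= MAX_ORDER: a struct page pointer difference for ordinary compound pages, and a page_to_pfn() difference otherwise, presumably because the struct pages of very large compound pages are not guaranteed to be contiguous.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12UL  /* assumption: 4 KiB base pages */

/* Hypothetical stand-ins for struct page and union futex_key. */
struct model_page { unsigned long pfn; };
struct model_key  { unsigned long offset; };

/* The suggested shape: pass the key and both pages, update the key in place. */
static void model_key_add_compound_idx(struct model_key *key,
                                       const struct model_page *page,
                                       const struct model_page *page_head,
                                       int memmap_contiguous)
{
    unsigned long idx;

    assert(page != page_head);  /* the call site already guarantees this */
    if (memmap_contiguous)
        idx = (unsigned long)(page - page_head);  /* struct page pointer difference */
    else
        idx = page->pfn - page_head->pfn;          /* like a page_to_pfn() difference */
    key->offset |= idx << MODEL_PAGE_SHIFT;
}
```

Both branches agree when the model's struct pages are laid out contiguously, which is what the test below checks.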

Thanks,

tglx

2013-05-07 12:24:05

by Zhang Yi

Subject: RE: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Saturday, April 27, 2013 2:26 AM
> To: Zhang Yi
> Cc: [email protected]; 'Peter Zijlstra'; 'Darren Hart'; 'Ingo Molnar'; 'Dave Hansen';
> [email protected];
> [email protected]
> Subject: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage


The futex keys of processes that share a futex are determined by the
page offset, mapping host, and mapping index of the user-space address.
User applications that place futexes in a hugepage may therefore hit
futex-key conflicts.

Assume there are two or more futexes in different normal pages of the
same hugepage, and each futex has the same offset within its normal
page; then all of these futexes have the same futex key.


Steps to reproduce the bug:
1. The 1st thread maps a file on hugetlbfs, uses the returned address
as the 1st mutex's address, and uses the returned address plus
PAGE_SIZE as the 2nd mutex's address.
2. The 1st thread initializes the two mutexes with the pshared
attribute and locks both of them.
3. The 1st thread creates the 2nd thread, which blocks on the 1st
mutex.
4. The 1st thread creates the 3rd thread, which blocks on the 2nd
mutex.
5. The 1st thread unlocks the 2nd mutex; the 3rd thread cannot take
the 2nd mutex and may block forever.


Signed-off-by: Zhang Yi <[email protected]>
Tested-by: Ma Chenggong <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Darren Hart <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Reviewed-by: Liu Dong <[email protected]>
Reviewed-by: Cui Yunfeng <[email protected]>
Reviewed-by: Lu Zhongjun <[email protected]>
Reviewed-by: Jiang Biao <[email protected]>


diff -uprN linux3.9-orig/include/linux/futex.h linux3.9/include/linux/futex.h
--- linux3.9-orig/include/linux/futex.h 2013-04-15 00:45:16.000000000 +0000
+++ linux3.9/include/linux/futex.h 2013-04-27 08:59:58.932078000 +0000
@@ -19,7 +19,7 @@ handle_futex_death(u32 __user *uaddr, st
* The key type depends on whether it's a shared or private mapping.
* Don't rearrange members without looking at hash_futex().
*
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
+ * There are three components in offset:
* We use the two low order bits of offset to tell what is the kind of key :
* 00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
* (no reference on an inode or mm)
@@ -27,6 +27,9 @@ handle_futex_death(u32 __user *uaddr, st
* mapped on a file (reference on the underlying inode)
* 10 : Shared futex (PTHREAD_PROCESS_SHARED)
* (but private mapping on an mm, and reference taken on it)
+ * Bits 2 to (PAGE_SHIFT-1) indicate the offset of the futex in its page.
+ * The remaining high-order bits indicate the subpage index if the page
+ * is a subpage of a compound page.
*/

#define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */
@@ -36,17 +39,17 @@ union futex_key {
struct {
unsigned long pgoff;
struct inode *inode;
- int offset;
+ unsigned long offset;
} shared;
struct {
unsigned long address;
struct mm_struct *mm;
- int offset;
+ unsigned long offset;
} private;
struct {
unsigned long word;
void *ptr;
- int offset;
+ unsigned long offset;
} both;
};

diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
--- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
+++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
@@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
}
}

+/*
+* Get subpage index in compound page, and add it to futex_key.
+*/
+static void key_add_compound_idx(union futex_key *key,
+ struct page *head_page, struct page *page)
+{
+ int compound_idx;
+
+ if (compound_order(head_page) >= MAX_ORDER)
+ compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
+ else
+ compound_idx = page - head_page;
+
+ key->both.offset |= compound_idx << PAGE_SHIFT;
+}
+
/**
* get_futex_key() - Get parameters which are the keys for a futex
* @uaddr: virtual address of the futex
@@ -228,7 +244,8 @@ static void drop_futex_key_refs(union fu
* The key words are stored in *key on success.
*
* For shared mappings, it's (page->index, file_inode(vma->vm_file),
- * offset_within_page). For private mappings, it's (uaddr, current->mm).
+ * page_compound_index and offset_within_page).
+ * For private mappings, it's (uaddr, current->mm).
* We can usually work out the index without swapping in the page.
*
* lock_page() might sleep, the caller should not hold a spinlock.
@@ -366,6 +383,8 @@ again:
key->both.offset |= FUT_OFF_INODE; /* inode-based key */
key->shared.inode = page_head->mapping->host;
key->shared.pgoff = page_head->index;
+ if (page != page_head)
+ key_add_compound_idx(key, page_head, page);
}

get_futex_key_refs(key);


2013-05-07 12:34:50

by Zhang Yi

Subject: RE: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

It was fine when I sent the mail to myself, but something went wrong when sending it to you.
Please ignore this mail; I will check it and send it again.



2013-05-07 12:44:10

by Zhang Yi

Subject: RE: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

Hi,

The futex keys of processes that share a futex are determined by the
page offset, mapping host, and mapping index of the user-space address.
User applications that place futexes in a hugepage may therefore hit
futex-key conflicts.

Assume there are two or more futexes in different normal pages of the
same hugepage, and each futex has the same offset within its normal
page; then all of these futexes have the same futex key.


Steps to reproduce the bug:
1. The 1st thread maps a file on hugetlbfs, uses the returned address
as the 1st mutex's address, and uses the returned address plus
PAGE_SIZE as the 2nd mutex's address.
2. The 1st thread initializes the two mutexes with the pshared
attribute and locks both of them.
3. The 1st thread creates the 2nd thread, which blocks on the 1st
mutex.
4. The 1st thread creates the 3rd thread, which blocks on the 2nd
mutex.
5. The 1st thread unlocks the 2nd mutex; the 3rd thread cannot take
the 2nd mutex and may block forever.


Signed-off-by: Zhang Yi <[email protected]>
Tested-by: Ma Chenggong <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Darren Hart <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Reviewed-by: Liu Dong <[email protected]>
Reviewed-by: Cui Yunfeng <[email protected]>
Reviewed-by: Lu Zhongjun <[email protected]>
Reviewed-by: Jiang Biao <[email protected]>


diff -uprN linux3.9-orig/include/linux/futex.h linux3.9/include/linux/futex.h
--- linux3.9-orig/include/linux/futex.h 2013-04-15 00:45:16.000000000 +0000
+++ linux3.9/include/linux/futex.h 2013-04-27 08:59:58.932078000 +0000
@@ -19,7 +19,7 @@ handle_futex_death(u32 __user *uaddr, st
* The key type depends on whether it's a shared or private mapping.
* Don't rearrange members without looking at hash_futex().
*
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
+ * There are three components in offset:
* We use the two low order bits of offset to tell what is the kind of key :
* 00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
* (no reference on an inode or mm)
@@ -27,6 +27,9 @@ handle_futex_death(u32 __user *uaddr, st
* mapped on a file (reference on the underlying inode)
* 10 : Shared futex (PTHREAD_PROCESS_SHARED)
* (but private mapping on an mm, and reference taken on it)
+ * Bits 2 to (PAGE_SHIFT-1) indicate the offset of the futex in its page.
+ * The remaining high-order bits indicate the subpage index if the page
+ * is a subpage of a compound page.
*/

#define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */
@@ -36,17 +39,17 @@ union futex_key {
struct {
unsigned long pgoff;
struct inode *inode;
- int offset;
+ unsigned long offset;
} shared;
struct {
unsigned long address;
struct mm_struct *mm;
- int offset;
+ unsigned long offset;
} private;
struct {
unsigned long word;
void *ptr;
- int offset;
+ unsigned long offset;
} both;
};

diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
--- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
+++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
@@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
}
}

+/*
+* Get subpage index in compound page, and add it to futex_key.
+*/
+static void key_add_compound_idx(union futex_key *key,
+ struct page *head_page, struct page *page)
+{
+ int compound_idx;
+
+ if (compound_order(head_page) >= MAX_ORDER)
+ compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
+ else
+ compound_idx = page - head_page;
+
+ key->both.offset |= (unsigned long)compound_idx << PAGE_SHIFT;
+}
+
/**
* get_futex_key() - Get parameters which are the keys for a futex
* @uaddr: virtual address of the futex
@@ -228,7 +244,8 @@ static void drop_futex_key_refs(union fu
* The key words are stored in *key on success.
*
* For shared mappings, it's (page->index, file_inode(vma->vm_file),
- * offset_within_page). For private mappings, it's (uaddr, current->mm).
+ * page_compound_index and offset_within_page).
+ * For private mappings, it's (uaddr, current->mm).
* We can usually work out the index without swapping in the page.
*
* lock_page() might sleep, the caller should not hold a spinlock.
@@ -366,6 +383,8 @@ again:
key->both.offset |= FUT_OFF_INODE; /* inode-based key */
key->shared.inode = page_head->mapping->host;
key->shared.pgoff = page_head->index;
+ if (page != page_head)
+ key_add_compound_idx(key, page_head, page);
}

get_futex_key_refs(key);


2013-05-07 15:20:16

by Mel Gorman

Subject: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> --- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
> +++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
> @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> }
> }
>
> +/*
> +* Get subpage index in compound page, and add it into futex_key.
> +*/
> +static void key_add_compound_idx(union futex_key *key,
> + struct page *head_page, struct page *page)
> +{
> + int compound_idx;
> +
> + if (compound_order(head_page) >= MAX_ORDER)
> + compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> + else
> + compound_idx = page - head_page;
> +
> + key->both.offset |= compound_idx << PAGE_SHIFT;
> +}
> +

This implicitly assumes it is dealing with a hugetlbfs page. Today, it
is the case that an inode-based futex with PageCompound is a hugetlbfs
page but that could change in the future if THP ever backs files. This
would then break again except it would be harder to fix because THP pages
can be collapsed underneath you after the futex key has been generated.

As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
land? Something like the following untested and only partial diff? Is the
use of PageCompound in the futex path like this going to be problematic?

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 16e4e9a..f9c33d3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -348,6 +348,17 @@ static inline int hstate_index(struct hstate *h)
return h - hstates;
}

+pgoff_t __basepage_index(struct page *page);
+
+/* Return page->index in PAGE_SIZE units */
+static inline pgoff_t basepage_index(struct page *page)
+{
+ if (!PageCompound(page))
+ return page->index;
+
+ return __basepage_index(page);
+}
+
#else
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
@@ -365,6 +376,10 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
{
return 1;
}
+static inline pgoff_t basepage_index(struct page *page)
+{
+ return page->index;
+}
#define hstate_index_to_shift(index) 0
#define hstate_index(h) 0
#endif
diff --git a/kernel/futex.c b/kernel/futex.c
index b26dcfc..97beb5d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -61,6 +61,7 @@
#include <linux/nsproxy.h>
#include <linux/ptrace.h>
#include <linux/sched/rt.h>
+#include <linux/hugetlb.h>

#include <asm/futex.h>

@@ -365,7 +366,7 @@ again:
} else {
key->both.offset |= FUT_OFF_INODE; /* inode-based key */
key->shared.inode = page_head->mapping->host;
- key->shared.pgoff = page_head->index;
+ key->shared.pgoff = basepage_index(page_head);
}

get_futex_key_refs(key);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a12f5b..ddbad35 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -690,6 +690,23 @@ int PageHuge(struct page *page)
}
EXPORT_SYMBOL_GPL(PageHuge);

+pgoff_t __basepage_index(struct page *page)
+{
+ struct page *page_head = compound_head(page);
+ pgoff_t index = page_index(page_head);
+ int compound_idx;
+
+ if (!PageHuge(page_head))
+ return page_index(page);
+
+ if (compound_order(page_head) >= MAX_ORDER)
+ compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
+ else
+ compound_idx = page - head_page;
+
+ return (index << page_hstate(page_head)->order) + compound_idx;
+}
+
static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
{
struct page *page;

2013-05-07 15:25:05

by Thomas Gleixner

Subject: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage



On Tue, 7 May 2013, Mel Gorman wrote:

> On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > --- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
> > +++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
> > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> > }
> > }
> >
> > +/*
> > +* Get subpage index in compound page, and add it into futex_key.
> > +*/
> > +static void key_add_compound_idx(union futex_key *key,
> > + struct page *head_page, struct page *page)
> > +{
> > + int compound_idx;
> > +
> > + if (compound_order(head_page) >= MAX_ORDER)
> > + compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > + else
> > + compound_idx = page - head_page;
> > +
> > + key->both.offset |= compound_idx << PAGE_SHIFT;
> > +}
> > +
>
> This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> is the case that an inode-based futex with PageCompound is a hugetlbfs
> page but that could change in the future if THP ever backs files. This
> would then break again except it would be harder to fix because THP pages
> can be collapsed underneath you after the futex key has been generated.
>
> As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
> land? Something like the following untested and only partial diff? Is the
> use of PageCompound in the futex path like this going to be problematic?

Why should it?

> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 16e4e9a..f9c33d3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -348,6 +348,17 @@ static inline int hstate_index(struct hstate *h)
> return h - hstates;
> }
>
> +pgoff_t __basepage_index(struct page *page);
> +
> +/* Return page->index in PAGE_SIZE units */
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> + if (!PageCompound(page))
> + return page->index;
> +
> + return __basepage_index(page);
> +}
> +
> #else
> struct hstate {};
> #define alloc_huge_page_node(h, nid) NULL
> @@ -365,6 +376,10 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
> {
> return 1;
> }
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> + return page->index;
> +}
> #define hstate_index_to_shift(index) 0
> #define hstate_index(h) 0
> #endif
> diff --git a/kernel/futex.c b/kernel/futex.c
> index b26dcfc..97beb5d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -61,6 +61,7 @@
> #include <linux/nsproxy.h>
> #include <linux/ptrace.h>
> #include <linux/sched/rt.h>
> +#include <linux/hugetlb.h>
>
> #include <asm/futex.h>
>
> @@ -365,7 +366,7 @@ again:
> } else {
> key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> key->shared.inode = page_head->mapping->host;
> - key->shared.pgoff = page_head->index;
> + key->shared.pgoff = basepage_index(page_head);

That wants to be basepage_index(page), right?

> }
>
> get_futex_key_refs(key);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1a12f5b..ddbad35 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -690,6 +690,23 @@ int PageHuge(struct page *page)
> }
> EXPORT_SYMBOL_GPL(PageHuge);
>
> +pgoff_t __basepage_index(struct page *page)
> +{
> + struct page *page_head = compound_head(page);
> + pgoff_t index = page_index(page_head);
> + int compound_idx;
> +
> + if (!PageHuge(page_head))
> + return page_index(page);
> +
> + if (compound_order(page_head) >= MAX_ORDER)
> + compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
> + else
> + compound_idx = page - head_page;
> +
> + return (index << page_hstate(page_head)->order) + compound_idx;
> +}
> +
> static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
> {
> struct page *page;
>

2013-05-07 15:54:18

by Mel Gorman

Subject: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

On Tue, May 07, 2013 at 05:24:57PM +0200, Thomas Gleixner wrote:
>
>
> On Tue, 7 May 2013, Mel Gorman wrote:
>
> > On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > > --- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
> > > +++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
> > > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> > > }
> > > }
> > >
> > > +/*
> > > +* Get subpage index in compound page, and add it into futex_key.
> > > +*/
> > > +static void key_add_compound_idx(union futex_key *key,
> > > + struct page *head_page, struct page *page)
> > > +{
> > > + int compound_idx;
> > > +
> > > + if (compound_order(head_page) >= MAX_ORDER)
> > > + compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > > + else
> > > + compound_idx = page - head_page;
> > > +
> > > + key->both.offset |= compound_idx << PAGE_SHIFT;
> > > +}
> > > +
> >
> > This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> > is the case that an inode-based futex with PageCompound is a hugetlbfs
> > page but that could change in the future if THP ever backs files. This
> > would then break again except it would be harder to fix because THP pages
> > can be collapsed underneath you after the futex key has been generated.
> >
> > As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
> > land? Something like the following untested and only partial diff? Is the
> > use of PageCompound in the futex path like this going to be problematic?
>
> Why should it?
>

The comment for it states that it is "generally not used in hot code
paths", but it is a lightweight flags check on cache lines that should
already have been fetched. I doubt that the overhead of this check versus
page_head == page is noticeable.

> > @@ -365,7 +366,7 @@ again:
> > } else {
> > key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> > key->shared.inode = page_head->mapping->host;
> > - key->shared.pgoff = page_head->index;
> > + key->shared.pgoff = basepage_index(page_head);
>
> That wants to be basepage_index(page), right?
>

BAH, yes.

--
Mel Gorman
SUSE Labs

2013-05-10 09:09:29

by zhang.yi20

Subject: Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage



Mel Gorman <[email protected]> wrote on 2013/05/07 23:20:07:

>
> Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
>
> On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > --- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
> > +++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
> > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> > }
> > }
> >
> > +/*
> > +* Get subpage index in compound page, and add it into futex_key.
> > +*/
> > +static void key_add_compound_idx(union futex_key *key,
> > + struct page *head_page, struct page *page)
> > +{
> > + int compound_idx;
> > +
> > + if (compound_order(head_page) >= MAX_ORDER)
> > + compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > + else
> > + compound_idx = page - head_page;
> > +
> > + key->both.offset |= compound_idx << PAGE_SHIFT;
> > +}
> > +
>
> This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> is the case that an inode-based futex with PageCompound is a hugetlbfs
> page but that could change in the future if THP ever backs files. This
> would then break again except it would be harder to fix because THP pages
> can be collapsed underneath you after the futex key has been generated.
>
> As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs

I think we should.
For example, user applications that want high performance may use DPDK,
which uses hugetlbfs.


Should I rework the patch like the following code, and test it?

> land? Something like the following untested and only partial diff? Is the
> use of PageCompound in the futex path like this going to be problematic?
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 16e4e9a..f9c33d3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -348,6 +348,17 @@ static inline int hstate_index(struct hstate *h)
> return h - hstates;
> }
>
> +pgoff_t __basepage_index(struct page *page);
> +
> +/* Return page->index in PAGE_SIZE units */
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> + if (!PageCompound(page))
> + return page->index;
> +
> + return __basepage_index(page);
> +}
> +
> #else
> struct hstate {};
> #define alloc_huge_page_node(h, nid) NULL
> @@ -365,6 +376,10 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
> {
> return 1;
> }
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> + return page->index;
> +}
> #define hstate_index_to_shift(index) 0
> #define hstate_index(h) 0
> #endif
> diff --git a/kernel/futex.c b/kernel/futex.c
> index b26dcfc..97beb5d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -61,6 +61,7 @@
> #include <linux/nsproxy.h>
> #include <linux/ptrace.h>
> #include <linux/sched/rt.h>
> +#include <linux/hugetlb.h>
>
> #include <asm/futex.h>
>
> @@ -365,7 +366,7 @@ again:
> } else {
> key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> key->shared.inode = page_head->mapping->host;
> - key->shared.pgoff = page_head->index;
> + key->shared.pgoff = basepage_index(page_head);
> }
>
> get_futex_key_refs(key);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1a12f5b..ddbad35 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -690,6 +690,23 @@ int PageHuge(struct page *page)
> }
> EXPORT_SYMBOL_GPL(PageHuge);
>
> +pgoff_t __basepage_index(struct page *page)
> +{
> + struct page *page_head = compound_head(page);
> + pgoff_t index = page_index(page_head);
> + int compound_idx;
> +
> + if (!PageHuge(page_head))
> + return page_index(page);
> +
> + if (compound_order(page_head) >= MAX_ORDER)
> + compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
> + else
> + compound_idx = page - head_page;
> +
> + return (index << page_hstate(page_head)->order) + compound_idx;
> +}
> +
> static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
> {
> struct page *page;

2013-05-10 09:42:32

by Mel Gorman

Subject: Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage

On Fri, May 10, 2013 at 05:08:30PM +0800, [email protected] wrote:
>
>
> Mel Gorman <[email protected]> wrote on 2013/05/07 23:20:07:
>
> >
> > Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
> >
> > On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > > --- linux3.9-orig/kernel/futex.c 2013-04-15 00:45:16.000000000 +0000
> > > +++ linux3.9/kernel/futex.c 2013-05-06 16:24:40.403525000 +0000
> > > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> > > }
> > > }
> > >
> > > +/*
> > > +* Get subpage index in compound page, and add it into futex_key.
> > > +*/
> > > +static void key_add_compound_idx(union futex_key *key,
> > > + struct page *head_page, struct page *page)
> > > +{
> > > + int compound_idx;
> > > +
> > > + if (compound_order(head_page) >= MAX_ORDER)
> > > + compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > > + else
> > > + compound_idx = page - head_page;
> > > +
> > > + key->both.offset |= compound_idx << PAGE_SHIFT;
> > > +}
> > > +
> >
> > This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> > is the case that an inode-based futex with PageCompound is a hugetlbfs
> > page but that could change in the future if THP ever backs files. This
> > would then break again except it would be harder to fix because THP pages
> > can be collapsed underneath you after the futex key has been generated.
> >
> > As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
>
> I think we should.
> For example, user applications that want high performance may use DPDK,
> which uses hugetlbfs.
>
>
> Should I rework the patch like the following code, and test it?
>

Yes please, making sure to fix the bug Thomas pointed out.

--
Mel Gorman
SUSE Labs