2010-06-17 13:53:31

by Dan Carpenter

[permalink] [raw]
Subject: [patch] x86, pat: freeing invalid memtype messages

Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
error message in free_memtype() if rbt_memtype_erase() returns NULL.
The problem is that if CONFIG_X86_PAT is enabled, we use a different
implimentation of rbt_memtype_erase() that always returns NULL.

I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
made free_memtype() check for that instead.

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205

Signed-off-by: Dan Carpenter <[email protected]>

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index acc15b2..81b7735 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -359,10 +359,10 @@ int free_memtype(u64 start, u64 end)
entry = rbt_memtype_erase(start, end);
spin_unlock(&memtype_lock);

- if (!entry) {
+ if (IS_ERR(entry)) {
printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n",
current->comm, current->pid, start, end);
- return -EINVAL;
+ return PTR_ERR(entry);
}

kfree(entry);
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index f537087..90e5cbe 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -236,8 +236,10 @@ struct memtype *rbt_memtype_erase(u64 start, u64 end)
struct memtype *data;

data = memtype_rb_exact_match(&memtype_rbroot, start, end);
- if (!data)
+ if (!data) {
+ data = ERR_PTR(-EINVAL);
goto out;
+ }

rb_erase(&data->rb, &memtype_rbroot);
out:


2010-06-17 16:19:28

by Marcin Slusarz

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Thu, Jun 17, 2010 at 03:45:59PM +0200, Dan Carpenter wrote:
> Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
> error message in free_memtype() if rbt_memtype_erase() returns NULL.
> The problem is that if CONFIG_X86_PAT is enabled, we use a different
> implimentation of rbt_memtype_erase() that always returns NULL.
>
> I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
> made free_memtype() check for that instead.
>
> Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205
>
> Signed-off-by: Dan Carpenter <[email protected]>

This patch is probably ok, but it does not address my bug.
I have CONFIG_X86_PAT=y, so rbt_memtype_erase does not always return NULL.

> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index acc15b2..81b7735 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -359,10 +359,10 @@ int free_memtype(u64 start, u64 end)
> entry = rbt_memtype_erase(start, end);
> spin_unlock(&memtype_lock);
>
> - if (!entry) {
> + if (IS_ERR(entry)) {
> printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n",
> current->comm, current->pid, start, end);
> - return -EINVAL;
> + return PTR_ERR(entry);
> }
>
> kfree(entry);
> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
> index f537087..90e5cbe 100644
> --- a/arch/x86/mm/pat_rbtree.c
> +++ b/arch/x86/mm/pat_rbtree.c
> @@ -236,8 +236,10 @@ struct memtype *rbt_memtype_erase(u64 start, u64 end)
> struct memtype *data;
>
> data = memtype_rb_exact_match(&memtype_rbroot, start, end);
> - if (!data)
> + if (!data) {
> + data = ERR_PTR(-EINVAL);
> goto out;
> + }
>
> rb_erase(&data->rb, &memtype_rbroot);
> out:
>
>

2010-06-17 16:33:55

by Dan Carpenter

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Thu, Jun 17, 2010 at 06:17:28PM +0200, Marcin Slusarz wrote:
> On Thu, Jun 17, 2010 at 03:45:59PM +0200, Dan Carpenter wrote:
> > Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
> > error message in free_memtype() if rbt_memtype_erase() returns NULL.
> > The problem is that if CONFIG_X86_PAT is enabled, we use a different
> > implimentation of rbt_memtype_erase() that always returns NULL.
> >
> > I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
> > made free_memtype() check for that instead.
> >
> > Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205
> >
> > Signed-off-by: Dan Carpenter <[email protected]>
>
> This patch is probably ok, but it does not address my bug.
> I have CONFIG_X86_PAT=y, so rbt_memtype_erase does not always return NULL.
>

Uh. Yeah. Crap. I'm an idiot.

regrds,
dan carpenter

2010-06-18 01:58:51

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On 06/18/2010 12:17 AM, Marcin Slusarz wrote:
> On Thu, Jun 17, 2010 at 03:45:59PM +0200, Dan Carpenter wrote:
>> Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
>> error message in free_memtype() if rbt_memtype_erase() returns NULL.
>> The problem is that if CONFIG_X86_PAT is enabled, we use a different
>> implimentation of rbt_memtype_erase() that always returns NULL.
>>
>> I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
>> made free_memtype() check for that instead.
>>
>> Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205
>>
>> Signed-off-by: Dan Carpenter<[email protected]>
>
> This patch is probably ok, but it does not address my bug.
> I have CONFIG_X86_PAT=y, so rbt_memtype_erase does not always return NULL.

The reason for the warning "swapper:1 freeing invalid memtype \
bf799000-bf79a000"
could be two callers reserved "bf799000 - bf79a000". The two callers has
the same reserve area and same memtype, so the sencond caller will also
success to reserve "bf799000 - bf79a000".

But at the free stage, if one caller freed "bf799000 - bf79a000", then
another caller is trying to free "bf799000 - bf79a000", can not find it
in the rbtree, so pop up an invalid memtype.

>
>> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
>> index acc15b2..81b7735 100644
>> --- a/arch/x86/mm/pat.c
>> +++ b/arch/x86/mm/pat.c
>> @@ -359,10 +359,10 @@ int free_memtype(u64 start, u64 end)
>> entry = rbt_memtype_erase(start, end);
>> spin_unlock(&memtype_lock);
>>
>> - if (!entry) {
>> + if (IS_ERR(entry)) {
>> printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n",
>> current->comm, current->pid, start, end);
>> - return -EINVAL;
>> + return PTR_ERR(entry);
>> }
>>
>> kfree(entry);
>> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
>> index f537087..90e5cbe 100644
>> --- a/arch/x86/mm/pat_rbtree.c
>> +++ b/arch/x86/mm/pat_rbtree.c
>> @@ -236,8 +236,10 @@ struct memtype *rbt_memtype_erase(u64 start, u64 end)
>> struct memtype *data;
>>
>> data = memtype_rb_exact_match(&memtype_rbroot, start, end);
>> - if (!data)
>> + if (!data) {
>> + data = ERR_PTR(-EINVAL);
>> goto out;
>> + }
>>
>> rb_erase(&data->rb,&memtype_rbroot);
>> out:
>>
>>
>

2010-06-18 06:48:05

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On 06/18/2010 12:17 AM, Marcin Slusarz wrote:
> On Thu, Jun 17, 2010 at 03:45:59PM +0200, Dan Carpenter wrote:
>> Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
>> error message in free_memtype() if rbt_memtype_erase() returns NULL.
>> The problem is that if CONFIG_X86_PAT is enabled, we use a different
>> implimentation of rbt_memtype_erase() that always returns NULL.
>>
>> I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
>> made free_memtype() check for that instead.
>>
>> Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205
>>
>> Signed-off-by: Dan Carpenter<[email protected]>
>
> This patch is probably ok, but it does not address my bug.
> I have CONFIG_X86_PAT=y, so rbt_memtype_erase does not always return NULL.

Could you please try boot with kernel parameter "debugpat", and
show me the output of reserve_memtype/free_memtype ?

>
>> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
>> index acc15b2..81b7735 100644
>> --- a/arch/x86/mm/pat.c
>> +++ b/arch/x86/mm/pat.c
>> @@ -359,10 +359,10 @@ int free_memtype(u64 start, u64 end)
>> entry = rbt_memtype_erase(start, end);
>> spin_unlock(&memtype_lock);
>>
>> - if (!entry) {
>> + if (IS_ERR(entry)) {
>> printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n",
>> current->comm, current->pid, start, end);
>> - return -EINVAL;
>> + return PTR_ERR(entry);
>> }
>>
>> kfree(entry);
>> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
>> index f537087..90e5cbe 100644
>> --- a/arch/x86/mm/pat_rbtree.c
>> +++ b/arch/x86/mm/pat_rbtree.c
>> @@ -236,8 +236,10 @@ struct memtype *rbt_memtype_erase(u64 start, u64 end)
>> struct memtype *data;
>>
>> data = memtype_rb_exact_match(&memtype_rbroot, start, end);
>> - if (!data)
>> + if (!data) {
>> + data = ERR_PTR(-EINVAL);
>> goto out;
>> + }
>>
>> rb_erase(&data->rb,&memtype_rbroot);
>> out:
>>
>>
>

2010-06-18 17:59:29

by Marcin Slusarz

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Fri, Jun 18, 2010 at 02:47:45PM +0800, Xiaotian Feng wrote:
> On 06/18/2010 12:17 AM, Marcin Slusarz wrote:
> > On Thu, Jun 17, 2010 at 03:45:59PM +0200, Dan Carpenter wrote:
> >> Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
> >> error message in free_memtype() if rbt_memtype_erase() returns NULL.
> >> The problem is that if CONFIG_X86_PAT is enabled, we use a different
> >> implimentation of rbt_memtype_erase() that always returns NULL.
> >>
> >> I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
> >> made free_memtype() check for that instead.
> >>
> >> Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205
> >>
> >> Signed-off-by: Dan Carpenter<[email protected]>
> >
> > This patch is probably ok, but it does not address my bug.
> > I have CONFIG_X86_PAT=y, so rbt_memtype_erase does not always return NULL.
>
> Could you please try boot with kernel parameter "debugpat", and
> show me the output of reserve_memtype/free_memtype ?
>

reserve_memtype added 0xfed00000-0xfed01000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf780000-0xbf78d000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf79a000-0xbf79d000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf780000-0xbf781000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf7dc000-0xbf7dd000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf780000-0xbf781000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf780000-0xbf781000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf78f000-0xbf790000, track uncached-minus, req uncached-minus, ret uncached-minus
reserve_memtype added 0xbf78f000-0xbf790000, track uncached-minus, req uncached-minus, ret uncached-minus
free_memtype request 0xbf798000-0xbf799000
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
free_memtype request 0xbf798000-0xbf799000
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
free_memtype request 0xbf798000-0xbf799000
reserve_memtype added 0xbf798000-0xbf799000, track uncached-minus, req uncached-minus, ret uncached-minus
free_memtype request 0xbf798000-0xbf799000
reserve_memtype added 0xbf799000-0xbf79a000, track uncached-minus, req uncached-minus, ret uncached-minus
free_memtype request 0xbf799000-0xbf79a000
reserve_memtype added 0xbf799000-0xbf79a000, track uncached-minus, req uncached-minus, ret uncached-minus
swapper:1 freeing invalid memtype bf799000-bf79a000


http://kadu.net/~joi/kernel/2010.06.09/2.6.35-rc3-debugpat.txt

Marcin

2010-06-21 10:56:55

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On 06/19/2010 01:57 AM, Marcin Slusarz wrote:
> On Fri, Jun 18, 2010 at 02:47:45PM +0800, Xiaotian Feng wrote:
>> On 06/18/2010 12:17 AM, Marcin Slusarz wrote:
>>> On Thu, Jun 17, 2010 at 03:45:59PM +0200, Dan Carpenter wrote:
>>>> Commit 20413f27163 "x86, pat: Fix memory leak in free_memtype" added an
>>>> error message in free_memtype() if rbt_memtype_erase() returns NULL.
>>>> The problem is that if CONFIG_X86_PAT is enabled, we use a different
>>>> implimentation of rbt_memtype_erase() that always returns NULL.
>>>>
>>>> I've modified rbt_memtype_erase() to return an ERR_PTR() on errors and
>>>> made free_memtype() check for that instead.
>>>>
>>>> Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=16205
>>>>
>>>> Signed-off-by: Dan Carpenter<[email protected]>
>>>
>>> This patch is probably ok, but it does not address my bug.
>>> I have CONFIG_X86_PAT=y, so rbt_memtype_erase does not always return NULL.
>>
>> Could you please try boot with kernel parameter "debugpat", and
>> show me the output of reserve_memtype/free_memtype ?
>>
>>
>
>
> http://kadu.net/~joi/kernel/2010.06.09/2.6.35-rc3-debugpat.txt

That's quite weird, from the above log:

[ 1.787891] reserve_memtype added 0xbf799000-0xbf79a000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 1.791029] free_memtype request 0xbf799000-0xbf79a000
[ 1.791822] reserve_memtype added 0xbf799000-0xbf79a000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 1.794998] swapper:1 freeing invalid memtype bf799000-bf79a000
[ 1.795795] reserve_memtype added 0xbf799000-0xbf79a000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 1.798979] free_memtype request 0xbf799000-0xbf79a000
[ 1.799775] Overlap at 0xbf799000-0xbf79a000

[ 22.271353] reserve_memtype added 0xd0a40000-0xd0a50000, track write-combining, req write-combining, ret write-combining
[ 22.275707] free_memtype request 0xd0a40000-0xd0a50000
[ 23.209570] reserve_memtype added 0xd0a40000-0xd0a50000, track write-combining, req write-combining, ret write-combining
[ 23.213888] X:2538 freeing invalid memtype d0a40000-d0a50000
[ 23.214065] reserve_memtype added 0xd0a40000-0xd0a50000, track write-combining, req write-combining, ret write-combining
[ 23.218415] free_memtype request 0xd0a40000-0xd0a50000
[ 26.028404] Overlap at 0xd0a40000-0xd0a50000

So it looks like after we free_memtype, reserve the same area again, then free_memtype again showed us the invalid memtype (was not found in rbtree).
But the third time reserve_memtype found overlap (it's in rbtree)...

I guess there might be something wrong between the augmented rbtree insert/remove ..

(Cc'ed Peter)

>
> Marcin
>

2010-06-21 11:02:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:

> I guess there might be something wrong between the augmented rbtree insert/remove ..

The easiest thing is to revert that change and try again, the next step
would be to print the full RB tree on each modification and look where
it goes wrong.

That said, I did print my fair share of (augmented) RB trees while
playing with scheduler patches and I can't remember it ever having
messed up like that.

2010-06-21 11:07:47

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
> On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
>
>> I guess there might be something wrong between the augmented rbtree insert/remove ..
>
> The easiest thing is to revert that change and try again, the next step
> would be to print the full RB tree on each modification and look where
> it goes wrong.
>
> That said, I did print my fair share of (augmented) RB trees while
> playing with scheduler patches and I can't remember it ever having
> messed up like that.
He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
performance damage" patch ;-)

2010-06-21 15:35:08

by Marcin Slusarz

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
> On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
> > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
> >
> >> I guess there might be something wrong between the augmented rbtree insert/remove ..
> >
> > The easiest thing is to revert that change and try again, the next step
> > would be to print the full RB tree on each modification and look where
> > it goes wrong.
> >
> > That said, I did print my fair share of (augmented) RB trees while
> > playing with scheduler patches and I can't remember it ever having
> > messed up like that.
> He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
> performance damage" patch ;-)

I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
Thanks.

Marcin

2010-06-21 15:41:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
> On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
> > On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
> > > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
> > >
> > >> I guess there might be something wrong between the augmented rbtree insert/remove ..
> > >
> > > The easiest thing is to revert that change and try again, the next step
> > > would be to print the full RB tree on each modification and look where
> > > it goes wrong.
> > >
> > > That said, I did print my fair share of (augmented) RB trees while
> > > playing with scheduler patches and I can't remember it ever having
> > > messed up like that.
> > He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
> > performance damage" patch ;-)
>
> I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
> to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
> Thanks.

Oh neat, so it actually fixes a bug in the previous augmented rb-tree
implementation?

2010-06-21 17:55:09

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
> > On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
> > > On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
> > > > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
> > > >
> > > >> I guess there might be something wrong between the augmented rbtree insert/remove ..
> > > >
> > > > The easiest thing is to revert that change and try again, the next step
> > > > would be to print the full RB tree on each modification and look where
> > > > it goes wrong.
> > > >
> > > > That said, I did print my fair share of (augmented) RB trees while
> > > > playing with scheduler patches and I can't remember it ever having
> > > > messed up like that.
> > > He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
> > > performance damage" patch ;-)
> >
> > I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
> > to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
> > Thanks.
>
> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
> implementation?

When I was reviewing your fix, it looked like that prior to your fix we
were re-augmenting only at points where we do the tree rotations/color
change and at the points of node insertion/removal. I don't think we
were re-augmenting all the parent nodes in the path of the selected-node
that is going to replace the deleted node.

Perhaps we were hitting this issue here.

thanks,
suresh

2010-06-21 18:08:54

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, Jun 21, 2010 at 10:54 AM, Suresh Siddha
<[email protected]> wrote:
> On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
>> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
>> > On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
>> > > On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
>> > > > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
>> > > >
>> > > >> I guess there might be something wrong between the augmented rbtree insert/remove ..
>> > > >
>> > > > The easiest thing is to revert that change and try again, the next step
>> > > > would be to print the full RB tree on each modification and look where
>> > > > it goes wrong.
>> > > >
>> > > > That said, I did print my fair share of (augmented) RB trees while
>> > > > playing with scheduler patches and I can't remember it ever having
>> > > > messed up like that.
>> > > He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
>> > > performance damage" patch ;-)
>> >
>> > I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
>> > to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
>> > Thanks.
>>
>> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
>> implementation?
>
> When I was reviewing your fix, it looked like that prior to your fix we
> were re-augmenting only at points where we do the tree rotations/color
> change and at the points of node insertion/removal. I don't think we
> were re-augmenting all the parent nodes in the path of the selected-node
> that is going to replace the deleted node.
>
> Perhaps we were hitting this issue here.
>

rb_erase was calling the augment callback with successor_parent_cb.
That should be doing proper re-augmenting on delete.

May be we are hitting the problem with not-initializing
subtree_max_end on insert? That was fixed in a later patch.

Thanks,
Venki

2010-06-21 18:38:33

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, Jun 21, 2010 at 11:08 AM, Venkatesh Pallipadi <[email protected]> wrote:
> On Mon, Jun 21, 2010 at 10:54 AM, Suresh Siddha
> <[email protected]> wrote:
>> On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
>>> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
>>> > On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
>>> > > On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
>>> > > > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
>>> > > >
>>> > > >> I guess there might be something wrong between the augmented rbtree insert/remove ..
>>> > > >
>>> > > > The easiest thing is to revert that change and try again, the next step
>>> > > > would be to print the full RB tree on each modification and look where
>>> > > > it goes wrong.
>>> > > >
>>> > > > That said, I did print my fair share of (augmented) RB trees while
>>> > > > playing with scheduler patches and I can't remember it ever having
>>> > > > messed up like that.
>>> > > He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
>>> > > performance damage" patch ;-)
>>> >
>>> > I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
>>> > to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
>>> > Thanks.
>>>
>>> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
>>> implementation?
>>
>> When I was reviewing your fix, it looked like that prior to your fix we
>> were re-augmenting only at points where we do the tree rotations/color
>> change and at the points of node insertion/removal. I don't think we
>> were re-augmenting all the parent nodes in the path of the selected-node
>> that is going to replace the deleted node.
>>
>> Perhaps we were hitting this issue here.
>>
>
> rb_erase was calling the augment callback with successor_parent_cb.
> That should be doing proper re-augmenting on delete.
>
> May be we are hitting the problem with not-initializing
> subtree_max_end on insert? That was fixed in a later patch.

Here's the patch I was referring to
http://marc.info/?l=linux-mm-commits&m=127654225011850&w=2

Marcin: Can you try this patch without Peter's patch and see whether
there are any issues with that. Just to make sure we don't have issues
wuth underlying augmented rbtree algorithm that somehow got fixed or
masked for the time being with Peter's change.

Thanks,
Venki

2010-06-21 18:44:03

by Marcin Slusarz

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, Jun 21, 2010 at 11:38:27AM -0700, Venkatesh Pallipadi wrote:
> On Mon, Jun 21, 2010 at 11:08 AM, Venkatesh Pallipadi <[email protected]> wrote:
> > On Mon, Jun 21, 2010 at 10:54 AM, Suresh Siddha
> > <[email protected]> wrote:
> >> On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
> >>> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
> >>> > On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
> >>> > > On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
> >>> > > > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
> >>> > > >
> >>> > > >> I guess there might be something wrong between the augmented rbtree insert/remove ..
> >>> > > >
> >>> > > > The easiest thing is to revert that change and try again, the next step
> >>> > > > would be to print the full RB tree on each modification and look where
> >>> > > > it goes wrong.
> >>> > > >
> >>> > > > That said, I did print my fair share of (augmented) RB trees while
> >>> > > > playing with scheduler patches and I can't remember it ever having
> >>> > > > messed up like that.
> >>> > > He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
> >>> > > performance damage" patch ;-)
> >>> >
> >>> > I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
> >>> > to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
> >>> > Thanks.
> >>>
> >>> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
> >>> implementation?
> >>
> >> When I was reviewing your fix, it looked like that prior to your fix we
> >> were re-augmenting only at points where we do the tree rotations/color
> >> change and at the points of node insertion/removal. I don't think we
> >> were re-augmenting all the parent nodes in the path of the selected-node
> >> that is going to replace the deleted node.
> >>
> >> Perhaps we were hitting this issue here.
> >>
> >
> > rb_erase was calling the augment callback with successor_parent_cb.
> > That should be doing proper re-augmenting on delete.
> >
> > May be we are hitting the problem with not-initializing
> > subtree_max_end on insert? That was fixed in a later patch.
>
> Here's the patch I was referring to
> http://marc.info/?l=linux-mm-commits&m=127654225011850&w=2
>
> Marcin: Can you try this patch without Peter's patch and see whether
> there are any issues with that. Just to make sure we don't have issues
> wuth underlying augmented rbtree algorithm that somehow got fixed or
> masked for the time being with Peter's change.

I tested this patch few days ago and it produced even more "invalid memtype" messages...

Marcin

2010-06-21 18:58:53

by Marcin Slusarz

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, Jun 21, 2010 at 08:41:54PM +0200, Marcin Slusarz wrote:
> On Mon, Jun 21, 2010 at 11:38:27AM -0700, Venkatesh Pallipadi wrote:
> > On Mon, Jun 21, 2010 at 11:08 AM, Venkatesh Pallipadi <[email protected]> wrote:
> > > On Mon, Jun 21, 2010 at 10:54 AM, Suresh Siddha
> > > <[email protected]> wrote:
> > >> On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
> > >>> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
> > >>> > On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
> > >>> > > On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
> > >>> > > > On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
> > >>> > > >
> > >>> > > >> I guess there might be something wrong between the augmented rbtree insert/remove ..
> > >>> > > >
> > >>> > > > The easiest thing is to revert that change and try again, the next step
> > >>> > > > would be to print the full RB tree on each modification and look where
> > >>> > > > it goes wrong.
> > >>> > > >
> > >>> > > > That said, I did print my fair share of (augmented) RB trees while
> > >>> > > > playing with scheduler patches and I can't remember it ever having
> > >>> > > > messed up like that.
> > >>> > > He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
> > >>> > > performance damage" patch ;-)
> > >>> >
> > >>> > I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
> > >>> > to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
> > >>> > Thanks.
> > >>>
> > >>> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
> > >>> implementation?
> > >>
> > >> When I was reviewing your fix, it looked like that prior to your fix we
> > >> were re-augmenting only at points where we do the tree rotations/color
> > >> change and at the points of node insertion/removal. I don't think we
> > >> were re-augmenting all the parent nodes in the path of the selected-node
> > >> that is going to replace the deleted node.
> > >>
> > >> Perhaps we were hitting this issue here.
> > >>
> > >
> > > rb_erase was calling the augment callback with successor_parent_cb.
> > > That should be doing proper re-augmenting on delete.
> > >
> > > May be we are hitting the problem with not-initializing
> > > subtree_max_end on insert? That was fixed in a later patch.
> >
> > Here's the patch I was referring to
> > http://marc.info/?l=linux-mm-commits&m=127654225011850&w=2
> >
> > Marcin: Can you try this patch without Peter's patch and see whether
> > there are any issues with that. Just to make sure we don't have issues
> > wuth underlying augmented rbtree algorithm that somehow got fixed or
> > masked for the time being with Peter's change.
>
> I tested this patch few days ago and it produced even more "invalid memtype" messages...

http://kadu.net/~joi/kernel/2010.06.09/2.6.35-rc3-debugpat-x86-proper-init-of-memtype-subtree_max_end.txt

Marcin

2010-06-22 02:46:07

by Xiaotian Feng

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On 06/22/2010 01:54 AM, Suresh Siddha wrote:
> On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
>> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
>>> On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
>>>> On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
>>>>> On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
>>>>>
>>>>>> I guess there might be something wrong between the augmented rbtree insert/remove ..
>>>>>
>>>>> The easiest thing is to revert that change and try again, the next step
>>>>> would be to print the full RB tree on each modification and look where
>>>>> it goes wrong.
>>>>>
>>>>> That said, I did print my fair share of (augmented) RB trees while
>>>>> playing with scheduler patches and I can't remember it ever having
>>>>> messed up like that.
>>>> He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
>>>> performance damage" patch ;-)
>>>
>>> I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249 from -tip)
>>> to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype" messages.
>>> Thanks.
>>
>> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
>> implementation?
>
> When I was reviewing your fix, it looked like that prior to your fix we
> were re-augmenting only at points where we do the tree rotations/color
> change and at the points of node insertion/removal. I don't think we
> were re-augmenting all the parent nodes in the path of the selected-node
> that is going to replace the deleted node.
>
> Perhaps we were hitting this issue here.

Were it from a insert without any rotations/color changes?

This case is performing insert A/remove A/ 2nd insert A/ 2nd remove
A/3rd insert A. And the 2nd remove
shows us the invalid memtype. 3rd insert shows us it is in the rbtree.
All I can image is that get_subtree_max_end
in memtype_rb_lowest_match returned stale value.

It looks like we don't re-augment the parent nodes if there aren't any
rotations/color changes
in the rb_insert_color().

>
> thanks,
> suresh
>
>

2010-06-22 03:52:26

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [patch] x86, pat: freeing invalid memtype messages

On Mon, Jun 21, 2010 at 7:45 PM, Xiaotian Feng <[email protected]> wrote:
> On 06/22/2010 01:54 AM, Suresh Siddha wrote:
>>
>> On Mon, 2010-06-21 at 08:41 -0700, Peter Zijlstra wrote:
>>>
>>> On Mon, 2010-06-21 at 17:33 +0200, Marcin Slusarz wrote:
>>>>
>>>> On Mon, Jun 21, 2010 at 07:07:27PM +0800, Xiaotian Feng wrote:
>>>>>
>>>>> On 06/21/2010 07:02 PM, Peter Zijlstra wrote:
>>>>>>
>>>>>> On Mon, 2010-06-21 at 18:56 +0800, Xiaotian Feng wrote:
>>>>>>
>>>>>>> I guess there might be something wrong between the augmented rbtree
>>>>>>> insert/remove ..
>>>>>>
>>>>>> The easiest thing is to revert that change and try again, the next
>>>>>> step
>>>>>> would be to print the full RB tree on each modification and look where
>>>>>> it goes wrong.
>>>>>>
>>>>>> That said, I did print my fair share of (augmented) RB trees while
>>>>>> playing with scheduler patches and I can't remember it ever having
>>>>>> messed up like that.
>>>>>
>>>>> He's using 2.6.35-rc2+, without your "rbtree: Undo augmented trees
>>>>> performance damage" patch ;-)
>>>>
>>>> I applied it manually (commit 2463eb8b3093995e09a0d41b3d78ee0cf5fb4249
>>>> from -tip)
>>>> to 2.6.35-rc3 and it fixed both acpi's and nouveau's "invalid memtype"
>>>> messages.
>>>> Thanks.
>>>
>>> Oh neat, so it actually fixes a bug in the previous augmented rb-tree
>>> implementation?
>>
>> When I was reviewing your fix, it looked like that prior to your fix we
>> were re-augmenting only at points where we do the tree rotations/color
>> change and at the points of node insertion/removal. I don't think we
>> were re-augmenting all the parent nodes in the path of the selected-node
>> that is going to replace the deleted node.
>>
>> Perhaps we were hitting this issue here.
>
> Were it from a insert without any rotations/color changes?
>
> This case is performing insert A/remove A/ 2nd insert A/ 2nd remove A/3rd
> insert A. And the 2nd remove
> shows us the invalid memtype. ?3rd insert shows us it is in the rbtree. All
> I can image is that get_subtree_max_end
> in memtype_rb_lowest_match returned stale value.
>
> It looks like we don't re-augment the parent nodes if there aren't any
> rotations/color changes
> in the rb_insert_color().


Nice detective work! Yes. That kind of hints that max_end
initialization is the problem. The patch I pointed to here
http://marc.info/?l=linux-mm-commits&m=127654225011850&w=2
will have a side-effect without Peter's change and make this problem worse.
This minimal patch here
https://bugzilla.kernel.org/show_bug.cgi?id=16092 comment #2
on a kernel without Peter's change should not have this problem with
insert however. As any new node starts at 0 max_end, updates itself to
memtype->end and calls augment on its parent in rb_insert_color
augment callback.

Thanks,
Venki