2018-01-12 16:23:49

by Joseph Salisbury

Subject: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

Hi Vikas,

A kernel bug report was opened against Ubuntu [0].  After a kernel
bisect, it was found that reverting the following commit resolved this bug:

commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
Author: Vikas Shivappa <[email protected]>
Date:   Tue Aug 15 18:00:43 2017 -0700

    x86/intel_rdt/cqm: Improve limbo list processing


The regression was introduced as of v4.14-rc1 and still exists with
current mainline.  The trace with v4.15-rc7 is in comment #44[1].

I was hoping to get your feedback, since you are the patch author.  Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?


Thanks,

Joe
[0] http://pad.lv/1733662
[1]
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/44



2018-01-14 11:35:52

by Thomas Gleixner

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On Fri, 12 Jan 2018, Joseph Salisbury wrote:

> Hi Vikas,
>
> A kernel bug report was opened against Ubuntu [0].  After a kernel
> bisect, it was found that reverting the following commit resolved this bug:
>
> commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
> Author: Vikas Shivappa <[email protected]>
> Date:   Tue Aug 15 18:00:43 2017 -0700
>
>     x86/intel_rdt/cqm: Improve limbo list processing
>
>
> The regression was introduced as of v4.14-rc1 and still exists with
> current mainline.  The trace with v4.15-rc7 is in comment #44[1].
>
> I was hoping to get your feedback, since you are the patch author.  Do
> you think gathering any additional data will help diagnose this issue,
> or would it be best to submit a revert request?

That stinks like a use after free. Can you run with KASAN enabled?

Thanks,

tglx

2018-01-16 13:09:54

by Thomas Gleixner

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing


Vikas, Fenghua can you please look at that ASAP?

On Sun, 14 Jan 2018, Thomas Gleixner wrote:

> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
>
> > Hi Vikas,
> >
> > A kernel bug report was opened against Ubuntu [0].  After a kernel
> > bisect, it was found that reverting the following commit resolved this bug:
> >
> > commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
> > Author: Vikas Shivappa <[email protected]>
> > Date:   Tue Aug 15 18:00:43 2017 -0700
> >
> >     x86/intel_rdt/cqm: Improve limbo list processing
> >
> >
> > The regression was introduced as of v4.14-rc1 and still exists with
> > current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> >
> > I was hoping to get your feedback, since you are the patch author.  Do
> > you think gathering any additional data will help diagnose this issue,
> > or would it be best to submit a revert request?
>
> That stinks like a use after free. Can you run with KASAN enabled?
>
> Thanks,
>
> tglx

2018-01-16 16:41:15

by Joseph Salisbury

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> Vikas on vacation until end of the month. Fenghua will look into this
> issue.
>
> On Jan 16, 2018, at 5:09 AM, Thomas Gleixner <[email protected]
> <mailto:[email protected]>> wrote:
>
>>
>> Vikas, Fenghua can you please look at that ASAP?
>>
>> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
>>
>>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
>>>
>>>> Hi Vikas,
>>>>
>>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
>>>> bisect, it was found that reverting the following commit resolved
>>>> this bug:
>>>>
>>>> commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
>>>> Author: Vikas Shivappa <[email protected]
>>>> <mailto:[email protected]>>
>>>> Date:   Tue Aug 15 18:00:43 2017 -0700
>>>>
>>>>     x86/intel_rdt/cqm: Improve limbo list processing
>>>>
>>>>
>>>> The regression was introduced as of v4.14-rc1 and still exists with
>>>> current mainline.  The trace with v4.15-rc7 is in comment #44[1].
>>>>
>>>> I was hoping to get your feedback, since you are the patch author.  Do
>>>> you think gathering any additional data will help diagnose this issue,
>>>> or would it be best to submit a revert request?
>>>
>>> That stinks like a use after free. Can you run with KASAN enabled?
>>>
>>> Thanks,
>>>
>>> tglx


Here is some data with KASAN enabled:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51

Are there any specific logs you would like to see, or specific actions
executed?

Thanks,

Joe





2018-01-16 18:09:41

by Thomas Gleixner

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On Tue, 16 Jan 2018, Joseph Salisbury wrote:
> On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> > Vikas on vacation until end of the month. Fenghua will look into this
> > issue.
> >
> > On Jan 16, 2018, at 5:09 AM, Thomas Gleixner <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >>
> >> Vikas, Fenghua can you please look at that ASAP?
> >>
> >> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
> >>
> >>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> >>>
> >>>> Hi Vikas,
> >>>>
> >>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
> >>>> bisect, it was found that reverting the following commit resolved
> >>>> this bug:
> >>>>
> >>>> commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
> >>>> Author: Vikas Shivappa <[email protected]
> >>>> <mailto:[email protected]>>
> >>>> Date:   Tue Aug 15 18:00:43 2017 -0700
> >>>>
> >>>>     x86/intel_rdt/cqm: Improve limbo list processing
> >>>>
> >>>>
> >>>> The regression was introduced as of v4.14-rc1 and still exists with
> >>>> current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> >>>>
> >>>> I was hoping to get your feedback, since you are the patch author.  Do
> >>>> you think gathering any additional data will help diagnose this issue,
> >>>> or would it be best to submit a revert request?
> >>>
> >>> That stinks like a use after free. Can you run with KASAN enabled?
> >>>
> >>> Thanks,
> >>>
> >>> tglx
>
>
> Here is some data with KASAN enabled:
> https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
>
> Are there any specific logs you would like to see, or specific actions
> executed?

No, the KASAN output is pretty clear where the issue is.

Thanks,

tglx

2018-01-16 18:34:18

by Fenghua Yu

Subject: RE: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

> From: Thomas Gleixner [mailto:[email protected]]
> On Tue, 16 Jan 2018, Joseph Salisbury wrote:
> > On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> > > Vikas on vacation until end of the month. Fenghua will look into
> > > this issue.
> > >
> > > On Jan 16, 2018, at 5:09 AM, Thomas Gleixner <[email protected]
> > > <mailto:[email protected]>> wrote:
> > >
> > >>
> > >> Vikas, Fenghua can you please look at that ASAP?
> > >>
> > >> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
> > >>
> > >>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> > >>>
> > >>>> Hi Vikas,
> > >>>>
> > >>>> A kernel bug report was opened against Ubuntu [0].  After a
> > >>>> kernel bisect, it was found that reverting the following commit
> > >>>> resolved this bug:
> > >>>>
> > >>>> commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
> > >>>> Author: Vikas Shivappa <[email protected]
> > >>>> <mailto:[email protected]>>
> > >>>> Date:   Tue Aug 15 18:00:43 2017 -0700
> > >>>>
> > >>>>     x86/intel_rdt/cqm: Improve limbo list processing
> > >>>>
> > >>>>
> > >>>> The regression was introduced as of v4.14-rc1 and still exists
> > >>>> with current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> > >>>>
> > >>>> I was hoping to get your feedback, since you are the patch
> > >>>> author.  Do you think gathering any additional data will help
> > >>>> diagnose this issue, or would it be best to submit a revert request?
> > >>>
> > >>> That stinks like a use after free. Can you run with KASAN enabled?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> tglx
> >
> >
> > Here is some data with KASAN enabled:
> > https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
> >
> > Are there any specific logs you would like to see, or specific actions
> > executed?
>
> No, the KASAN output is pretty clear where the issue is.
>
> Thanks,
>
> tglx

Is this a Haswell-specific issue?

I ran the following test in an endless loop without any issue on Broadwell
with 4.15.0-rc6 and rdt mounted:

for ((;;)); do
	for ((i=1;i<88;i++)); do
		echo 0 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo | wc
	for ((i=1;i<88;i++)); do
		echo 1 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo | wc
done

I'm looking for a Haswell machine to reproduce the issue.

Thanks.

-Fenghua

2018-01-16 19:00:05

by Thomas Gleixner

Subject: RE: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On Tue, 16 Jan 2018, Yu, Fenghua wrote:
> > From: Thomas Gleixner [mailto:[email protected]]
> Is this a Haswell specific issue?
>
> I run the following test forever without issue on Broadwell and 4.15.0-rc6 with rdt mounted:
> for ((;;)) do
> for ((i=1;i<88;i++)) do
> echo 0 >/sys/devices/system/cpu/cpu$i/online
> done
> echo "online cpus:"
> grep processor /proc/cpuinfo |wc
> for ((i=1;i<88;i++)) do
> echo 1 >/sys/devices/system/cpu/cpu$i/online
> done
> echo "online cpus:"
> grep processor /proc/cpuinfo|wc
> done
>
> I'm finding a Haswell to reproduce the issue.

Come on. This is crystal clear from the KASAN trace. And the fix is simple enough.

You simply do not run into it because on your machine
is_llc_occupancy_enabled() is false...

Thanks,

tglx

8<--------------------

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 88dcf8479013..99442370de40 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		 */
 		if (static_branch_unlikely(&rdt_mon_enable_key))
 			rmdir_mondata_subdir_allrdtgrp(r, d->id);
-		kfree(d->ctrl_val);
-		kfree(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		list_del(&d->list);
 		if (is_mbm_enabled())
 			cancel_delayed_work(&d->mbm_over);
@@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			cancel_delayed_work(&d->cqm_limbo);
 		}
 
+		kfree(d->ctrl_val);
+		kfree(d->rmid_busy_llc);
+		kfree(d->mbm_total);
+		kfree(d->mbm_local);
 		kfree(d);
 		return;
 	}

Subject: [tip:x86/urgent] x86/intel_rdt/cqm: Prevent use after free

Commit-ID: d47924417319e3b6a728c0b690f183e75bc2a702
Gitweb: https://git.kernel.org/tip/d47924417319e3b6a728c0b690f183e75bc2a702
Author: Thomas Gleixner <[email protected]>
AuthorDate: Tue, 16 Jan 2018 19:59:59 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 17 Jan 2018 11:56:47 +0100

x86/intel_rdt/cqm: Prevent use after free

intel_rdt_offline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.

BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510

Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510

Do the teardown first and then free memory.

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vikas Shivappa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "Roderick W. Smith" <[email protected]>
Cc: [email protected]
Cc: Fenghua Yu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos

---
arch/x86/kernel/cpu/intel_rdt.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 88dcf84..9944237 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		 */
 		if (static_branch_unlikely(&rdt_mon_enable_key))
 			rmdir_mondata_subdir_allrdtgrp(r, d->id);
-		kfree(d->ctrl_val);
-		kfree(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		list_del(&d->list);
 		if (is_mbm_enabled())
 			cancel_delayed_work(&d->mbm_over);
@@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			cancel_delayed_work(&d->cqm_limbo);
 		}
 
+		kfree(d->ctrl_val);
+		kfree(d->rmid_busy_llc);
+		kfree(d->mbm_total);
+		kfree(d->mbm_local);
 		kfree(d);
 		return;
 	}

2018-01-17 20:38:03

by Joseph Salisbury

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Yu, Fenghua wrote:
>>> From: Thomas Gleixner [mailto:[email protected]]
>> Is this a Haswell specific issue?
>>
>> I run the following test forever without issue on Broadwell and 4.15.0-rc6 with rdt mounted:
>> for ((;;)) do
>> for ((i=1;i<88;i++)) do
>> echo 0 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo |wc
>> for ((i=1;i<88;i++)) do
>> echo 1 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo|wc
>> done
>>
>> I'm finding a Haswell to reproduce the issue.
> Come on. This is crystal clear from the KASAN trace. And the fix is simple enough.
>
> You simply do not run into it because on your machine
>
> is_llc_occupancy_enabled() is false...
>
> Thanks,
>
> tglx
>
> 8<--------------------
>
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 88dcf8479013..99442370de40 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> */
> if (static_branch_unlikely(&rdt_mon_enable_key))
> rmdir_mondata_subdir_allrdtgrp(r, d->id);
> - kfree(d->ctrl_val);
> - kfree(d->rmid_busy_llc);
> - kfree(d->mbm_total);
> - kfree(d->mbm_local);
> list_del(&d->list);
> if (is_mbm_enabled())
> cancel_delayed_work(&d->mbm_over);
> @@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> cancel_delayed_work(&d->cqm_limbo);
> }
>
> + kfree(d->ctrl_val);
> + kfree(d->rmid_busy_llc);
> + kfree(d->mbm_total);
> + kfree(d->mbm_local);
> kfree(d);
> return;
> }

Thanks, Thomas.  I'll build some test kernels and have your patch tested
out.


Thanks,


Joe



2018-01-17 22:19:16

by Joseph Salisbury

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Yu, Fenghua wrote:
>>> From: Thomas Gleixner [mailto:[email protected]]
>> Is this a Haswell specific issue?
>>
>> I run the following test forever without issue on Broadwell and 4.15.0-rc6 with rdt mounted:
>> for ((;;)) do
>> for ((i=1;i<88;i++)) do
>> echo 0 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo |wc
>> for ((i=1;i<88;i++)) do
>> echo 1 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo|wc
>> done
>>
>> I'm finding a Haswell to reproduce the issue.
> Come on. This is crystal clear from the KASAN trace. And the fix is simple enough.
>
> You simply do not run into it because on your machine
>
> is_llc_occupancy_enabled() is false...
>
> Thanks,
>
> tglx
>
> 8<--------------------
>
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 88dcf8479013..99442370de40 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> */
> if (static_branch_unlikely(&rdt_mon_enable_key))
> rmdir_mondata_subdir_allrdtgrp(r, d->id);
> - kfree(d->ctrl_val);
> - kfree(d->rmid_busy_llc);
> - kfree(d->mbm_total);
> - kfree(d->mbm_local);
> list_del(&d->list);
> if (is_mbm_enabled())
> cancel_delayed_work(&d->mbm_over);
> @@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> cancel_delayed_work(&d->cqm_limbo);
> }
>
> + kfree(d->ctrl_val);
> + kfree(d->rmid_busy_llc);
> + kfree(d->mbm_total);
> + kfree(d->mbm_local);
> kfree(d);
> return;
> }

Hi Thomas,

Testing shows that your patch resolves the bug.  Thanks
for the assistance!  Is this something you could submit to mainline?

Thanks,


Joe


2018-01-17 22:56:57

by Thomas Gleixner

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On Wed, 17 Jan 2018, Joseph Salisbury wrote:
> On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
>
> Testing of your patch shows that your patch resolves the bug.  Thanks
> for the assistance!  Is this something you could submit to mainline?

Already there :)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d47924417319e3b6a728c0b690f183e75bc2a702

Tagged for stable.

Thanks,

tglx

2018-01-17 23:01:09

by Joseph Salisbury

Subject: Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

On 01/17/2018 05:55 PM, Thomas Gleixner wrote:
> On Wed, 17 Jan 2018, Joseph Salisbury wrote:
>> On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
>>
>> Testing of your patch shows that your patch resolves the bug.  Thanks
>> for the assistance!  Is this something you could submit to mainline?
> Already there :)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d47924417319e3b6a728c0b690f183e75bc2a702
>
> Tagged for stable.
>
> Thanks,
>
> tglx

Thanks so much!