2022-11-09 11:16:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v8] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os

On Wed, Nov 09, 2022 at 06:51:58PM +0800, Zheng Wang wrote:
> Gts may be freed in gru_check_chiplet_assignment.
> The caller still use it after that, UAF happens.

I do not understand what this text means, sorry. Can you try to make it
more descriptive?

>
> Fix it by introducing a return value to see if it's in error path or not.
> Free the gts in caller if gru_check_chiplet_assignment check failed.

Please wrap all of your changelog text at 72 columns, you have 2
paragraphs with different wrappings.

>
> Fixes: 55484c45dbec ("gru: allow users to specify gru chiplet 2")
> Signed-off-by: Zheng Wang <[email protected]>
> Acked-by: Dimitri Sivanich <[email protected]>
> ---
> v8:
> - remove tested-by tag suggested by Greg
>
> v7:
> - fix some spelling problems suggested by Greg, change kernel test robot from reported-by tag to tested-by tag
>
> v6:
> - remove unused var checked by kernel test robot
>
> v5:
> - fix logical issue and remove unnecessary variable suggested by Dimitri Sivanich
>
> v4:
> - use VM_FAULT_NOPAGE as failure code in gru_fault and -EINVAL in other functions suggested by Yejian
>
> v3:
> - add preempt_enable and use VM_FAULT_NOPAGE as failure code suggested by Yejian
>
> v2:
> - commit message changes suggested by Greg
>
> v1: https://lore.kernel.org/lkml/CAJedcCzY72jqgF-pCPtx66vXXwdPn-KMagZnqrxcpWw1NxTLaA@mail.gmail.com/
> ---
> drivers/misc/sgi-gru/grufault.c | 14 ++++++++++++--
> drivers/misc/sgi-gru/grumain.c | 16 ++++++++++++----
> drivers/misc/sgi-gru/grutables.h | 2 +-
> 3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index d7ef61e602ed..bdd515d33225 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -656,7 +656,9 @@ int gru_handle_user_call_os(unsigned long cb)
> if (ucbnum >= gts->ts_cbr_au_count * GRU_CBR_AU_SIZE)
> goto exit;
>
> - gru_check_context_placement(gts);
> + ret = gru_check_context_placement(gts);
> + if (ret)
> + goto err;
>
> /*
> * CCH may contain stale data if ts_force_cch_reload is set.
> @@ -677,6 +679,10 @@ int gru_handle_user_call_os(unsigned long cb)
> exit:
> gru_unlock_gts(gts);
> return ret;
> +err:
> + gru_unlock_gts(gts);
> + gru_unload_context(gts, 1);
> + return -EINVAL;
> }
>
> /*
> @@ -874,7 +880,11 @@ int gru_set_context_option(unsigned long arg)
> } else {
> gts->ts_user_blade_id = req.val1;
> gts->ts_user_chiplet_id = req.val0;
> - gru_check_context_placement(gts);
> + if (gru_check_context_placement(gts)) {
> + gru_unlock_gts(gts);
> + gru_unload_context(gts, 1);
> + return -EINVAL;
> + }
> }
> break;
> case sco_gseg_owner:
> diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
> index 9afda47efbf2..beba69fc3cd7 100644
> --- a/drivers/misc/sgi-gru/grumain.c
> +++ b/drivers/misc/sgi-gru/grumain.c
> @@ -716,9 +716,10 @@ static int gru_check_chiplet_assignment(struct gru_state *gru,
> * chiplet. Misassignment can occur if the process migrates to a different
> * blade or if the user changes the selected blade/chiplet.
> */
> -void gru_check_context_placement(struct gru_thread_state *gts)
> +int gru_check_context_placement(struct gru_thread_state *gts)
> {
> struct gru_state *gru;
> + int ret = 0;
>
> /*
> * If the current task is the context owner, verify that the
> @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> */
> gru = gts->ts_gru;
> if (!gru || gts->ts_tgid_owner != current->tgid)
> - return;
> + return ret;

Why does this check return "all is good!" ?

Shouldn't that be an error?

thanks,

greg k-h


2022-11-09 12:17:42

by Zheng Hacker

[permalink] [raw]
Subject: Re: [PATCH v8] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os

Greg KH <[email protected]> 于2022年11月9日周三 19:12写道:
>
> On Wed, Nov 09, 2022 at 06:51:58PM +0800, Zheng Wang wrote:
> > Gts may be freed in gru_check_chiplet_assignment.
> > The caller still use it after that, UAF happens.
>
> I do not understand what this text means, sorry. Can you try to make it
> more descriptive?
>

Sorry for my unclear description. The new candidate is as follows:
In some bad situation, the gts may be freed gru_check_chiplet_assignment.
The call chain can be gru_unload_context->gru_free_gru_context->gts_drop
and kfree finally. However, the caller didn't know if the gts is freed
or not and
use it afterwards. This will trigger a Use after Free bug.

> >
> > Fix it by introducing a return value to see if it's in error path or not.
> > Free the gts in caller if gru_check_chiplet_assignment check failed.
>
> Please wrap all of your changelog text at 72 columns, you have 2
> paragraphs with different wrappings.
>

Get it, sorry for that.

> > /*
> > * If the current task is the context owner, verify that the
> > @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> > */
> > gru = gts->ts_gru;
> > if (!gru || gts->ts_tgid_owner != current->tgid)
> > - return;
> > + return ret;
>
> Why does this check return "all is good!" ?
>
> Shouldn't that be an error?
>
This check is something like "if the gts has been initiiazed properly".
If it's not, I thinks we shouldn't treat the gts like something very
bad happend. Because in the later request, the gts can still have a
chance to be configed/updated properly. This is different from "it's
too bad so we have to unload gts right now". This is just my personal
point of view. Besides, the caller of this function have token it into
consider. In gru_fault, it will try again and in gru_handle_user_call_os,
it will return -EAGAIN. In gru_set_context_option, it will be fine
because there won't be any operation on gts->ts_gru or gts->ts_tgid_owner

Best regards,

Zheng Wang

2022-11-09 12:34:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v8] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os

On Wed, Nov 09, 2022 at 08:04:04PM +0800, Zheng Hacker wrote:
> Greg KH <[email protected]> 于2022年11月9日周三 19:12写道:
> > > /*
> > > * If the current task is the context owner, verify that the
> > > @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> > > */
> > > gru = gts->ts_gru;
> > > if (!gru || gts->ts_tgid_owner != current->tgid)
> > > - return;
> > > + return ret;
> >
> > Why does this check return "all is good!" ?
> >
> > Shouldn't that be an error?
> >
> This check is something like "if the gts has been initiiazed properly".
> If it's not, I thinks we shouldn't treat the gts like something very
> bad happend. Because in the later request, the gts can still have a
> chance to be configed/updated properly. This is different from "it's
> too bad so we have to unload gts right now". This is just my personal
> point of view. Besides, the caller of this function have token it into
> consider. In gru_fault, it will try again and in gru_handle_user_call_os,
> it will return -EAGAIN. In gru_set_context_option, it will be fine
> because there won't be any operation on gts->ts_gru or gts->ts_tgid_owner

Then you need to document it why this is "success" as it is not obvious
at all.

thanks,

greg k-h

2022-11-09 13:17:29

by Zheng Hacker

[permalink] [raw]
Subject: Re: [PATCH v8] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os

Greg KH <[email protected]> 于2022年11月9日周三 20:09写道:
>
> On Wed, Nov 09, 2022 at 08:04:04PM +0800, Zheng Hacker wrote:
> > Greg KH <[email protected]> 于2022年11月9日周三 19:12写道:
> > > > /*
> > > > * If the current task is the context owner, verify that the
> > > > @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> > > > */
> > > > gru = gts->ts_gru;
> > > > if (!gru || gts->ts_tgid_owner != current->tgid)
> > > > - return;
> > > > + return ret;
> > >
> > > Why does this check return "all is good!" ?
> > >
> > > Shouldn't that be an error?
> > >
> > This check is something like "if the gts has been initiiazed properly".
> > If it's not, I thinks we shouldn't treat the gts like something very
> > bad happend. Because in the later request, the gts can still have a
> > chance to be configed/updated properly. This is different from "it's
> > too bad so we have to unload gts right now". This is just my personal
> > point of view. Besides, the caller of this function have token it into
> > consider. In gru_fault, it will try again and in gru_handle_user_call_os,
> > it will return -EAGAIN. In gru_set_context_option, it will be fine
> > because there won't be any operation on gts->ts_gru or gts->ts_tgid_owner
>
> Then you need to document it why this is "success" as it is not obvious
> at all.
>
Oh yes. I will append a comment to document it with the code in the
next version of patch.

Best regards,
Zheng Wang