We've observed crashes due to an empty CPU mask in
hyperv_flush_tlb_others(). The mask evidently changes between the
cpumask_empty() call at the beginning of the function and the point
where it is actually used.

One theory is that an interrupt arrives in between and some code path
ends up changing the mask. Move the check to after interrupts have been
disabled to see if that fixes the issue.
Signed-off-by: Wei Liu <[email protected]>
Cc: [email protected]
---
arch/x86/hyperv/mmu.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index 5208ba49c89a..2c87350c1fb0 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -66,11 +66,17 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 	if (!hv_hypercall_pg)
 		goto do_native;

-	if (cpumask_empty(cpus))
-		return;
-
 	local_irq_save(flags);

+	/*
+	 * Only check the mask _after_ interrupt has been disabled to avoid the
+	 * mask changing under our feet.
+	 */
+	if (cpumask_empty(cpus)) {
+		local_irq_restore(flags);
+		return;
+	}
+
 	flush_pcpu = (struct hv_tlb_flush **)
 			this_cpu_ptr(hyperv_pcpu_input_arg);

--
2.20.1
From: Wei Liu <[email protected]> Sent: Tuesday, January 5, 2021 9:51 AM
Reviewed-by: Michael Kelley <[email protected]>
On Tue, Jan 05, 2021 at 06:20:05PM +0000, Michael Kelley wrote:
> Reviewed-by: Michael Kelley <[email protected]>
>
Applied to hyperv-fixes.
Wei.
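
For readers following the thread, the ordering change can be illustrated with a small user-space analogy. This is only a sketch under stated assumptions, not kernel code: a POSIX signal stands in for the interrupt, sigprocmask() stands in for local_irq_save()/local_irq_restore(), and demo_mask, demo_irq(), demo_setup() and the two flush_*() helpers are hypothetical names invented for this example. It models only the theory in the commit message (a local interrupt changing the mask); it says nothing about other CPUs modifying the mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the cpumask: one bit per CPU. */
static volatile sig_atomic_t demo_mask;

/* Hypothetical "interrupt": a handler that empties the mask. */
static void demo_irq(int sig)
{
	(void)sig;
	demo_mask = 0;
}

/* Install the handler and set the initial mask value. */
static void demo_setup(int initial_mask)
{
	struct sigaction sa;

	sa.sa_handler = demo_irq;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);
	demo_mask = initial_mask;
}

/*
 * Old ordering: check the mask, then block the signal.
 * Returns the mask value seen at the point of use, or -1 on early return.
 */
static int flush_check_before_block(void)
{
	sigset_t block, old;
	int seen;

	if (demo_mask == 0)
		return -1;

	/* The "interrupt" lands in the window between check and use. */
	raise(SIGUSR1);

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &old);
	seen = demo_mask;		/* use: the mask is now empty */
	sigprocmask(SIG_SETMASK, &old, NULL);
	return seen;
}

/*
 * New ordering: block the signal first, then check and use the mask.
 * Returns the mask value seen at the point of use, or -1 on early return.
 */
static int flush_check_after_block(void)
{
	sigset_t block, old;
	int seen;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &old);

	/* The "interrupt" is raised but stays pending while blocked. */
	raise(SIGUSR1);

	if (demo_mask == 0) {
		sigprocmask(SIG_SETMASK, &old, NULL);
		return -1;
	}
	seen = demo_mask;		/* use: same value the check saw */

	/* Unblocking delivers the pending signal, after the use. */
	sigprocmask(SIG_SETMASK, &old, NULL);
	return seen;
}
```

Calling demo_setup() with a non-empty mask and then each helper in turn shows the difference: with the old ordering the value at the point of use is empty even though the check passed, while with the new ordering the check and the use see the same value and the handler only runs once the signal is unblocked.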