Date: Mon, 14 Apr 2014 14:01:54 +0100
From: Will Deacon <will.deacon@arm.com>
To: Ding Tianhong <dingtianhong@huawei.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>, Sukie Peng <Sukie.Peng@arm.com>,
        "huxinwei@huawei.com" <huxinwei@huawei.com>,
        "linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: Flush the process's mm context TLB entries when
 switching
Message-ID: <20140414130154.GE3530@arm.com>
References: <534BCE80.3090406@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <534BCE80.3090406@huawei.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org

Hi Ding,

On Mon, Apr 14, 2014 at 01:03:12PM +0100, Ding Tianhong wrote:
> I hit a problem when migrating a process using the following steps:
> 
> 1) The process was already running on core 0.
> 2) Set the CPU affinity of the process to 0x02 to move it to core 1;
>    this works fine.
> 3) Set the CPU affinity of the process to 0x01 to move it back to core 0;
>    the problem occurs and the process is killed.

[...]

> It is a very strange problem: the PC and LR are both 0, and the esr is
> 0x83000006, which is the exception class used for MMU faults generated by
> instruction accesses and for synchronous external aborts, including
> synchronous parity errors.
> 
> I tried to fix the problem by invalidating the process's TLB entries when
> switching; this makes the context stale so that a new one is picked, and then
> it works well.
> 
> So I think that in some situations, after a process switch, modifying the TLB
> entries on the new core does not make the other cores invalidate their old TLB
> entries in the inner shareable caches, and if the process is then scheduled on
> another core, the old TLB entries may cause MMU faults.

Yes, it sounds like you don't have your TLBs configured correctly. Can you
confirm that your EL3 firmware is configuring TLB broadcasting correctly
please?
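
As an aside, the esr value quoted above can be decoded by hand. A minimal
sketch in C, assuming the ESR_ELx field layout from the ARMv8 architecture
(EC in bits [31:26], IL in bit 25, fault status code in bits [5:0]). The
helper names are hypothetical and are not kernel APIs:

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only (not kernel APIs).
 * Field positions are assumed from the ARMv8 ESR_ELx layout.
 */

/* Exception class: bits [31:26]. */
static inline uint32_t esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Instruction length bit: bit 25 (1 means a 32-bit instruction). */
static inline uint32_t esr_il(uint32_t esr)
{
	return (esr >> 25) & 0x1;
}

/* Fault status code for instruction/data aborts: bits [5:0]. */
static inline uint32_t esr_fsc(uint32_t esr)
{
	return esr & 0x3f;
}
```

For 0x83000006 this gives EC = 0x20 (instruction abort from a lower
exception level), IL = 1 and a fault status code of 0x06 (translation
fault, level 2).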

> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  arch/arm64/kernel/process.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 6391485..d7d8439 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -283,6 +283,13 @@ static void tls_thread_switch(struct task_struct *next)
>  	: : "r" (tpidr), "r" (tpidrro));
>  }
>  
> +static void tlb_flush_thread(struct task_struct *prev)
> +{
> +	/* Flush the prev task's TLB entries */
> +	if (prev->mm)
> +		flush_tlb_mm(prev->mm);
> +}
> +
>  /*
>   * Thread switching.
>   */
> @@ -296,6 +303,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
>  	hw_breakpoint_thread_switch(next);
>  	contextidr_thread_switch(next);
>  
> +	tlb_flush_thread(prev);

NAK to the patch -- the architecture certainly doesn't require this, and
it's a huge hammer for what is more likely a firmware initialisation issue.

Will