Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751730AbdFIQOY convert rfc822-to-8bit (ORCPT ); Fri, 9 Jun 2017 12:14:24 -0400 Received: from mail.kernel.org ([198.145.29.99]:58070 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751548AbdFIQOW (ORCPT ); Fri, 9 Jun 2017 12:14:22 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 81C4D239FD Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=robh@kernel.org MIME-Version: 1.0 In-Reply-To: <1497016569.3536.84.camel@pengutronix.de> References: <20170608041124.4624-1-chris.packham@alliedtelesis.co.nz> <20170608041124.4624-2-chris.packham@alliedtelesis.co.nz> <1497014062.3536.52.camel@pengutronix.de> <1497016569.3536.84.camel@pengutronix.de> From: Rob Herring Date: Fri, 9 Jun 2017 11:13:59 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: DMA-safety for edac_atomic_scrub on ARM To: =?UTF-8?Q?Jan_L=C3=BCbbe?= Cc: Borislav Petkov , "linux-arm-kernel@lists.infradead.org" , linux-edac@vger.kernel.org, Mauro Carvalho Chehab , "linux-kernel@vger.kernel.org" , Chris Packham , "kernel@pengutronix.de" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4437 Lines: 101 On Fri, Jun 9, 2017 at 8:56 AM, Jan Lübbe wrote: > Hi, > > I've CCed Rob as the original author of the ARM EDAC scrub function. > > On Fr, 2017-06-09 at 15:14 +0200, Jan Lübbe wrote: >> [...] >> > + mci->scrub_mode = SCRUB_SW_SRC; >> I'm not sure if this works as expected ARM as it is currently >> implemented, but that's a topic for a different mail. > > Some more background as I understand it so far: > > Configuring a EDAC MC to scrub_mode = SCRUB_SW_SRC causes the common > handler code to run a per-arch software scrub function. It is used by > several EADC drivers on ARM. > > drivers/edac/edac_mc.c: >> if (mci->scrub_mode == SCRUB_SW_SRC) { >> /* >> * Some memory controllers (called MCs below) can remap >> * memory so that it is still available at a different >> * address when PCI devices map into memory. >> * MC's that can't do this, lose the memory where PCI >> * devices are mapped. This mapping is MC-dependent >> * and so we call back into the MC driver for it to >> * map the MC page to a physical (CPU) page which can >> * then be mapped to a virtual page - which can then >> * be scrubbed. >> */ >> remapped_page = mci->ctl_page_to_phys ? >> mci->ctl_page_to_phys(mci, page_frame_number) : >> page_frame_number; >> >> edac_mc_scrub_block(remapped_page, >> offset_in_page, grain); >> } > > edac_mc_scrub_block() then basically checks if it actually hit a valid > PFN, maps the page with kmap_atomic and runs edac_atomic_scrub(). > For ARM this is implemented in arch/arm/include/asm/edac.h: >> /* >> * ECC atomic, DMA, SMP and interrupt safe scrub function. >> * Implements the per arch edac_atomic_scrub() that EDAC use for software >> * ECC scrubbing. It reads memory and then writes back the original >> * value, allowing the hardware to detect and correct memory errors. >> */ >> >> static inline void edac_atomic_scrub(void *va, u32 size) >> { >> #if __LINUX_ARM_ARCH__ >= 6 >> unsigned int *virt_addr = va; >> unsigned int temp, temp2; >> unsigned int i; >> >> for (i = 0; i < size / sizeof(*virt_addr); i++, virt_addr++) { >> /* Very carefully read and write to memory atomically >> * so we are interrupt, DMA and SMP safe. >> */ >> __asm__ __volatile__("\n" >> "1: ldrex %0, [%2]\n" >> " strex %1, %0, [%2]\n" >> " teq %1, #0\n" >> " bne 1b\n" >> : "=&r"(temp), "=&r"(temp2) >> : "r"(virt_addr) >> : "cc"); >> } >> #endif >> } > > The comment "ECC atomic, DMA, SMP and interrupt safe scrub function" > seems to be copied from the initial implementation on x86, first to > powerpc, to mips and later to arm. Yes. > On ARM, other bus masters are usually not coherent to the CPUs. This > means that exclusive loads/stores only affect (at most) the sharable > domain, which is usually a subset of the SoC. Maybe usually, but that's entirely system dependent. For the system that needed this (and maybe still the only 32-bit one), it was coherent i/o. Good luck doing 10G networking with a non-coherent master. > Consequently, when a bus master outside of the shareable domain (such as > a ethernet controller) writes to the same memory after the ldrex, the > strex will simply succeed and write stale data to the CPU cache, which > can later overwrite the data written by the ethernet controller. > > I'm not sure if it is actually possible to implement this in a DMA-safe > way on ARM without stopping all other bus masters. Generically, probably not. And stopping bus masters is probably not going to work. Bottom line is systems have to be designed for this or bus masters and drivers audited that they could deal with this. Many bus masters are not going to both repeatedly read the same location and write to it. It's a rare problem on an already rare occurrence. Rob