From: Vineet Gupta
To: Eugeniy Paltsev, linux-snps-arc@lists.infradead.org
CC: linux-kernel@vger.kernel.org, Alexey Brodkin
Subject: Re: [PATCH 1/2] ARCv2: LIB: memeset: fix doing prefetchw outside of buffer
Date: Tue, 15 Jan 2019 01:09:13 +0000
References: <20190114151649.32726-1-Eugeniy.Paltsev@synopsys.com>

On 1/14/19 7:17 AM, Eugeniy Paltsev wrote:
> The current ARCv2 memset implementation may execute the 'prefetchw'
> instruction for an address which lies outside of the memset area,
> leaving one modified (dirty) cache line outside of the memset area.
> This may lead to data corruption if that area is used for DMA I/O.
>
> Another issue is that the current implementation may execute the
> 'prealloc' instruction for an L1 cache line which doesn't fully
> belong to the memset area when the L1 D$ line length is 128B. That
> also leads to data corruption.
>
> Fix the use of the prefetchw/prealloc instructions for the 64B L1
> data cache line case (the default) and don't use prefetch*
> instructions at all for the other possible L1 data cache line
> lengths (32B and 128B).
>
> Signed-off-by: Eugeniy Paltsev
> ---
>  arch/arc/lib/memset-archs.S | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arc/lib/memset-archs.S b/arch/arc/lib/memset-archs.S
> index 62ad4bcb841a..c7717832336f 100644
> --- a/arch/arc/lib/memset-archs.S
> +++ b/arch/arc/lib/memset-archs.S
> @@ -7,11 +7,32 @@
>   */
>
>  #include <linux/linkage.h>
> +#include <asm/cache.h>
>
>  #undef PREALLOC_NOT_AVAIL
>
> +/*
> + * The memset implementation below is optimized to use the prefetchw
> + * and prealloc instructions in the case of a CPU with a 64B L1 data
> + * cache line (L1_CACHE_SHIFT == 6). If you want to implement an
> + * optimized memset for other possible L1 data cache line lengths
> + * (32B and 128B) you should rewrite the code, carefully checking
> + * that no prefetchw/prealloc instruction is issued for an L1 cache
> + * line which doesn't belong to the memset area.

Good point. FWIW, it is possible to support those non-common line
lengths by using L1_CACHE_SHIFT etc. in the asm code below, but I
agree it's not worth the trouble.

> + */
> +#if L1_CACHE_SHIFT != 6
> +# define PREALLOC_INSTR(...)
> +# define PREFETCHW_INSTR(...)
> +#else /* L1_CACHE_SHIFT != 6 */
> +# define PREFETCHW_INSTR(...)	prefetchw __VA_ARGS__
> +# ifdef PREALLOC_NOT_AVAIL
> +# define PREALLOC_INSTR(...)	prefetchw __VA_ARGS__
> +# else
> +# define PREALLOC_INSTR(...)	prealloc __VA_ARGS__
> +# endif
> +#endif /* L1_CACHE_SHIFT != 6 */
> +
>  ENTRY_CFI(memset)
> -	prefetchw [r0]		; Prefetch the write location
> +	PREFETCHW_INSTR([r0])	; Prefetch the first write location
> 	mov.f	0, r2
> ;;; if size is zero
> 	jz.d	[blink]
> @@ -48,11 +69,7 @@ ENTRY_CFI(memset)
>
> 	lpnz	@.Lset64bytes
> 	;; LOOP START
> -#ifdef PREALLOC_NOT_AVAIL
> -	prefetchw [r3, 64]	;Prefetch the next write location
> -#else
> -	prealloc  [r3, 64]
> -#endif
> +	PREALLOC_INSTR([r3, 64])	;Prefetch the next write location

These are not solving the issue - I'd break this up and move these
bits to your next patch.

> #ifdef CONFIG_ARC_HAS_LL64
> 	std.ab	r4, [r3, 8]
> 	std.ab	r4, [r3, 8]
> @@ -85,7 +102,6 @@ ENTRY_CFI(memset)
> 	lsr.f	lp_count, r2, 5	;Last remaining max 124 bytes
> 	lpnz	.Lset32bytes
> 	;; LOOP START
> -	prefetchw [r3, 32]	;Prefetch the next write location

So the real fix for the issue at hand is this one line. I'll keep it
for the fix and beef up the changelog. The thing is, the existing code
already skipped the last 64B from the main loop (and thus avoided
prefetching the next line), but then the last-32B loop reintroduced a
prefetchw, spoiling the party. That prefetchw was pointless anyway.

-Vineet