From: Arnd Bergmann
Date: Wed, 8 Aug 2018 23:51:53 +0200
Subject: Re: framebuffer corruption due to overlapping stp instructions on arm64
To: Mikulas Patocka
Cc: Catalin Marinas, Richard.Earnshaw@arm.com, Thomas Petazzoni, Joao Pinto, GNU C Library, Ard Biesheuvel, Jingoo Han, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, neko@bakuhatsu.net, linux-pci, Linux ARM
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 8, 2018 at 8:25 PM Mikulas Patocka wrote:
> On Wed, 8 Aug 2018, Arnd Bergmann wrote:
>
> > On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas wrote:
> > >
> > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote:
> > > > On 08/08/18 15:12, Mikulas Patocka wrote:
> > > > > On Wed, 8 Aug 2018, Catalin Marinas wrote:
> > > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote:
> > > - failing to write a few bytes
> > > - writing a few bytes that were written 16 bytes before
> > > - writing a few bytes that were written 16 bytes after
> > >
> > > > The overlapping writes in memcpy never write different values to the
> > > > same location, so I still feel this must be some sort of HW issue, not a
> > > > SW one.
> > >
> > > So do I (my interpretation is that it combines or rather skips some of
> > > the writes to the same 16-byte address as it ignores the data strobes).
> >
> > Maybe it just always writes to the wrong location, 16 bytes apart for one of
> > the stp instructions. Since we are usually dealing with a pair of overlapping
> > 'stp', both unaligned, that could explain both the missing bytes (we write
> > data to the wrong place, but overwrite it with the correct data right away)
> > and the extra copy (we write it to the wrong place, but then write the correct
> > data to the correct place as well).
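The misdirected-store hypothesis in the quoted paragraph can be illustrated with a small Python model. This is a sketch under stated assumptions, not the glibc or kernel memcpy: it assumes a 16..32 byte copy is done as one 16-byte store to the start and one 16-byte store to the end of the destination, so the two overlap for lengths below 32. The names `store16` and `copy_16_to_32` and the `head_shift`/`tail_shift` fault knobs are hypothetical. In this model, shifting the first store up by 16 bytes produces both symptoms described above: bytes at the start that are never written, and bytes past the end that hold data belonging 16 bytes earlier.

```python
def store16(dst: bytearray, addr: int, data: bytes) -> None:
    """One 16-byte store (standing in for an stp of two 8-byte registers)."""
    # Bounds check so a slice assignment can never resize the buffer.
    if len(data) != 16 or addr < 0 or addr + 16 > len(dst):
        raise IndexError("16-byte store out of range")
    dst[addr:addr + 16] = data


def copy_16_to_32(dst: bytearray, dst_off: int, src: bytes,
                  head_shift: int = 0, tail_shift: int = 0) -> None:
    """Copy 16..32 bytes as two possibly overlapping 16-byte stores.

    The head store covers [0, 16) and the tail store covers [n - 16, n),
    so they overlap whenever n < 32. head_shift/tail_shift (hypothetical)
    move a store by that many bytes to model a misdirected write; both
    are 0 for a correct copy.
    """
    n = len(src)
    if not 16 <= n <= 32:
        raise ValueError("this model only covers 16..32 byte copies")
    head = src[:16]        # both halves are loaded before either store,
    tail = src[n - 16:]    # loosely mirroring a pair of ldp instructions
    store16(dst, dst_off + head_shift, head)         # first store
    store16(dst, dst_off + n - 16 + tail_shift, tail)  # second store


if __name__ == "__main__":
    src = bytes(range(0x40, 0x58))   # a 24-byte copy
    good = bytearray(64)             # destination placed at offset 16
    copy_16_to_32(good, 16, src)
    print(good[16:40] == src)        # True

    bad = bytearray(64)
    copy_16_to_32(bad, 16, src, head_shift=16)
    # dst[0..7] never written; dst[24..31] holds the data meant for dst[8..15]
    print(bad[16:24].hex(), bad[40:48].hex())
```

The destination is placed 16 bytes into a larger zero-filled buffer so that a shifted store stays in bounds and its effect outside the intended range stays visible.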
> >
> > This sounds a bit like what the original ARM CPUs did on unaligned
> > memory access, where a single aligned 4-byte location was accessed,
> > but the bytes swapped around.
> >
> > There may be a few more things worth trying out or analysing from
> > the recorded past failures to understand more about how it goes
> > wrong:
> >
> > - For which data lengths does it fail? Having two overlapping
> >   unaligned stp is something that only happens for 16..96 byte
> >   memcpy.
>
> If you want to research the corruptions in detail, I uploaded a file
> containing 7k corruptions here:
> http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/

Nice! I already found a couple of things:

- Failure to copy always happens at the *end* of a 16-byte aligned
  physical address range. It misses between 1 and 7 bytes, never 8 or
  more, and fewer missed bytes are more common:

    count  bytes missed
      279  7
      389  6
      484  5
      683  4
      741  3
      836  2
      946  1

- The first byte that fails to get copied is always 16 bytes after the
  memcpy target. Since we only observe it at the end of the 16-byte
  range, this happens specifically for target addresses ending in 0x9
  (7 bytes missed) through 0xf (1 byte missed).

- Out of 7445 corruptions, 4358 were of the kind that misses a copy at
  the end of a 16-byte area. They were for copies between 41 and 64
  bytes, more toward the larger end of the scale (note that with your
  test program, smaller memcpys happen more frequently than larger
  ones):
    count  length
       47  0x29
       36  0x2a
       47  0x2b
       23  0x2c
       29  0x2d
       31  0x2e
       36  0x2f
       46  0x30
       45  0x31
       51  0x32
       62  0x33
       64  0x34
       77  0x35
       91  0x36
       90  0x37
      100  0x38
      100  0x39
      209  0x3a
      279  0x3b
      366  0x3c
      498  0x3d
      602  0x3e
      682  0x3f
      747  0x40

- All corruptions with data copied to the wrong place happened for
  copies between 33 and 47 bytes, mostly toward the smaller end of the
  scale:

    count  length
      391  0x21
      360  0x22
      319  0x23
      273  0x24
      273  0x25
      241  0x26
      224  0x27
      221  0x28
      231  0x29
      208  0x2a
      163  0x2b
       86  0x2c
       63  0x2d
       33  0x2e
        1  0x2f

- One common (but not the only, still investigating) case for data
  getting written to the wrong place is:
  * corruption starts 16 bytes after the memcpy start
  * corrupt bytes are the same as the bytes written to the start
  * start address ends in 0x1 through 0x7
  * length of corruption is at most (memcpy length - 32), always
    between 1 and 7

Arnd
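The arithmetic behind these observations can be cross-checked with a short Python sketch. The counts are copied verbatim from the tables above; the helper names `bytes_missed` and `matches_wrong_place` are hypothetical encodings of the bullet points, not code from the test program or the uploaded corruption file. The per-kind totals should come out as 4358 and 3087, which add up to the 7445 corruptions stated in the message.

```python
"""Cross-check the corruption statistics and encode the observed patterns.

All counts are copied from the tables in the message; nothing is re-measured.
"""

# Bytes missed at the end of a 16-byte block -> occurrences.
MISSED_BYTES = {7: 279, 6: 389, 5: 484, 4: 683, 3: 741, 2: 836, 1: 946}

# memcpy length -> occurrences, "failure to copy" kind (0x29..0x40).
MISSED_BY_LEN = dict(zip(range(0x29, 0x41), [
    47, 36, 47, 23, 29, 31, 36, 46,
    45, 51, 62, 64, 77, 91, 90, 100,
    100, 209, 279, 366, 498, 602, 682, 747,
]))

# memcpy length -> occurrences, "written to the wrong place" kind (0x21..0x2f).
WRONG_PLACE_BY_LEN = dict(zip(range(0x21, 0x30), [
    391, 360, 319, 273, 273, 241, 224, 221,
    231, 208, 163, 86, 63, 33, 1,
]))


def bytes_missed(dst_addr: int) -> int:
    """Bytes expected to be missed for a memcpy target, per the observed pattern.

    The first missed byte is dst_addr + 16 and the gap runs up to the next
    16-byte boundary. Returns 0 for low nibbles outside 0x9..0xf, which did
    not show up in the data.
    """
    low = (dst_addr + 16) & 0xF
    return 16 - low if low >= 0x9 else 0


def matches_wrong_place(dst_addr: int, src: bytes, got: bytes) -> bool:
    """Hypothetical check for the common 'wrong place' case.

    Corruption starts 16 bytes into the destination, repeats the first bytes
    of the source, the target ends in 0x1..0x7, and the corrupt length is
    between 1 and min(7, n - 32).
    """
    n = len(src)
    bad = [i for i in range(n) if got[i] != src[i]]
    if not bad:
        return False
    start, end = bad[0], bad[-1] + 1
    length = end - start
    return (start == 16
            and got[start:end] == src[:length]
            and 0x1 <= (dst_addr & 0xF) <= 0x7
            and 1 <= length <= min(7, n - 32))


if __name__ == "__main__":
    missed = sum(MISSED_BYTES.values())
    by_len = sum(MISSED_BY_LEN.values())
    wrong = sum(WRONG_PLACE_BY_LEN.values())
    print(missed, by_len, wrong, by_len + wrong)  # 4358 4358 3087 7445
```

Treating the corrupt range as running from the first to the last differing byte is a simplification; a byte inside that range that happens to match the source would still be counted as part of the corruption.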