Subject: Re: framebuffer corruption due to overlapping stp instructions on arm64
To: Mikulas Patocka, Ard Biesheuvel
Cc: Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, linux-arm-kernel
References: <20180803094129.GB17798@arm.com>
From: Robin Murphy
Message-ID: <99fff4fe-afa9-f12f-a518-472a9dd1c530@arm.com>
Date: Mon, 6 Aug 2018 13:42:20 +0100
X-Mailing-List:
linux-kernel@vger.kernel.org

On 06/08/18 11:25, Mikulas Patocka wrote:
[...]
>> None of this explains why some transactions fail to make it across
>> entirely. The overlapping writes in question write the same data to
>> the memory locations that are covered by both, and so the ordering in
>> which the transactions are received should not affect the outcome.
>
> You're right that the corruption couldn't be explained just by reordering
> writes. My hypothesis is that the PCIe controller tries to disambiguate
> the overlapping writes, but the disambiguation logic was not tested and it
> is buggy. If there's a barrier between the overlapping writes, the PCIe
> controller won't see any overlapping writes, so it won't trigger the
> faulty disambiguation logic and it works.
>
> Could the ARM engineers look if there's some chicken bit in Cortex-A72
> that could insert barriers between non-cached writes automatically?

I don't think there is, and even if there was I imagine it would have a
pretty hideous effect on non-coherent DMA buffers and the various other
places in which we have Normal-NC mappings of actual system RAM.

> I observe these kinds of corruptions:
> - failing to write a few bytes

That could potentially be explained by the reordering/atomicity issues
Matt mentioned, i.e. the load is observing part of the store, before the
store has fully completed.

> - writing a few bytes that were written 16 bytes before
> - writing a few bytes that were written 16 bytes after

Those sound more like the interconnect or root complex ignoring the byte
strobes on an unaligned burst, of which I think the simplistic view would
be "it's broken".

FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x
Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and
it's still happily flickering pixels in the corner of the console after
nearly an hour (in parallel with some iperf3 just to ensure plenty of
PCIe traffic).
I would strongly suspect this issue is particular to Armada 8k, so it's
probably one for the Marvell folks to take a closer look at - I believe
some previous interconnect issues on those SoCs were actually fixable in
firmware.

Robin.
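
For readers following the overlap argument quoted above, here is a minimal sketch in plain C. It is an illustration only, not the test program from this thread: `copy_overlapping` and `overlap_order_is_irrelevant` are hypothetical names, and the 16-byte chunks merely stand in for what a pair of stp instructions would store. Both chunks carry the same source data for the bytes they share, so in ordinary memory either store order should give the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes (16 <= n <= 32) as two 16-byte chunks.
 * The second chunk ends at dst + n, so it overlaps the first by 32 - n
 * bytes. If tail_first is nonzero, the chunks are stored in reverse order. */
static void copy_overlapping(unsigned char *dst, const unsigned char *src,
                             size_t n, int tail_first)
{
    unsigned char head[16], tail[16];

    assert(n >= 16 && n <= 32);

    /* Load both chunks before storing anything. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    if (tail_first) {
        memcpy(dst + n - 16, tail, 16);
        memcpy(dst, head, 16);
    } else {
        memcpy(dst, head, 16);
        memcpy(dst + n - 16, tail, 16);
    }
}

/* Returns 1 if both store orders reproduce src for every n in 16..32. */
static int overlap_order_is_irrelevant(void)
{
    unsigned char src[32], a[32], b[32];
    size_t i, n;

    for (i = 0; i < sizeof(src); i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (n = 16; n <= 32; n++) {
        memset(a, 0, sizeof(a));
        memset(b, 0, sizeof(b));
        copy_overlapping(a, src, n, 0);   /* head first, then tail */
        copy_overlapping(b, src, n, 1);   /* tail first, then head */
        if (memcmp(a, src, n) != 0 || memcmp(b, src, n) != 0)
            return 0;
    }
    return 1;
}
```

Calling `overlap_order_is_irrelevant()` from a small `main` checks the property on normal cached memory. It says nothing about how an interconnect or root complex handles the resulting bus transactions, which is where the corruption discussed above would have to come from.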