Subject: Re: [PATCH] arm32: fix flushcache syscall with device address
From: James Morse
To: Jonathan Cameron
Cc: Will Deacon, Tian Tao, catalin.marinas@arm.com, linux-kernel@vger.kernel.org, linuxarm@huawei.com, linux-arm-kernel@lists.infradead.org, gregkh@linuxfoundation.org, tglx@linutronix.de, info@metux.net, allison@lohutok.net
Date: Tue, 21 Apr 2020 17:50:22 +0100
References: <1587456514-61156-1-git-send-email-tiantao6@hisilicon.com> <20200421081239.GA15439@willie-the-truck> <20200421121651.000009f0@Huawei.com>
In-Reply-To: <20200421121651.000009f0@Huawei.com>

Hi guys,

(Subject Nit: arm64, as that is what your patch modifies)

On 21/04/2020 12:16, Jonathan Cameron wrote:
> On Tue, 21 Apr 2020 09:12:39 +0100
> Will Deacon wrote:
>
>> On Tue, Apr 21, 2020 at 04:08:34PM +0800, Tian Tao wrote:
>>> An issue has been observed on our Kungpeng916 systems when using a PCI
>>> express GPU. This occurs when a 32 bit application running on a 64 bit
>>> kernel issues a cache flush operation to a memory address that is in
>>> a PCI BAR of the GPU.The results in an illegal operation and
>>> subsequent crash.
>>
>> A kernel crash? If so, please can you include the log here?
>
> Deploying my finest copy typing from the image Tian Tao sent out
>
> KERNEL: /root/vmlinux-4.19.36-3patch-00228-debuginfo
> DUMPFILE: vmcore [PARTIAL DUMP]
> CPUS: 64
> DATE: Fri Mar 20 06:59:56 2020
> UPTIME: 07:01:01
> LOAD AVERAGE: 33.76, 35.45, 35.79
> TASKS: 59447
> NODENAME: cpus-new-ondemand-0509
> RELEASE: 4.19.36-3patch-0228
> VERSION: #4 SMP Fri Feb 28 15:18:51 UTC 2020
> MACHINE: aarch64 (unknown MHz)
> MEMORY: 255.7 GB
> PANIC: "kernel panic - not syncing: Asynchronous SError Interrupt"
> PID: 175108
> COMMAND: "UnityMain"
> TASK: ffff80a96999dd00 [THREAD_INFO: ffff80a96999dd00]
> CPU: 62
> STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 175108 TASK: ffff80a96999dd00 CPU: 62 COMMAND: "UnityMain"
> #0 [ffff000194e1b920] machine_kexec at ffff0000080a265c
> #1 [ffff000194e1b980] __crash_kexec at ffff0000081b3ba8
> #2 [ffff000194e1bb10] panic at ffff0000080ecc98
> #3 [ffff000194e1bbf0] nmi_panic at ffff0000080ec7f4
> #4 [ffff000194e1bc10] arm64_serror_panic at fff00000809019c
> #5 [ffff000194e1bc30] do_serror at ffff00000809039c
> #6 [ffff000194e1bd90] el1_error at ffff000008083e50
> #7 [ffff000194e1bda0] __flush_icache_range at ffff0000080a9ec4
> #8 [ffff000194e1be60] el0_svc_common at fff0000080977d8
> #9 [ffff000194e1bea0] el0_svc_compat_handler at ffff0000080979b4
> #10 [ffff000194e1bff0] el0_svc_compat at ffff0000008083874
>
> PC: c90fe7f8 LR: c90ff09c SP: d2afa8e0 PSTATE: 800b0010
> X12: c56e96e4 X11: d2afaa48 X10: d0ff1000 X9: d2afab68
> x8: 000000d6 X7: 000f0002 X6: d3c61840 X5: d3c61001
> X4: d3c03000 X3: 0004d54a x2: 00000000 x1: d3c61040
> X0: d3c61000
>
>
> New advanced test for Mavis Beacon teaches typing.

Thanks for doing that!

> In summary this is all we have to hand...

Jumping back to the patch:

On 21/04/2020 09:08, Tian Tao wrote:
> diff --git a/arch/arm64/kernel/sys_compat.c b/arch/arm64/kernel/sys_compat.c
> index 3c18c24..6c07944 100644
> --- a/arch/arm64/kernel/sys_compat.c
> +++ b/arch/arm64/kernel/sys_compat.c
> @@ -32,6 +94,13 @@ __do_compat_cache_op(unsigned long start, unsigned long end)
>  		if (fatal_signal_pending(current))
>  			return 0;
>
> +		/* do not flush page table is non-cacheable */
> +		if (!__check_pt_cacheable(start)) {
> +			cond_resched();
> +			start += chunk;
> +			continue;
> +		}

The Arm ARM expects this to work, so we can't just skip it!

D4.4.8's "Effects of instructions that operate by VA to the PoC" has:
| For Device memory and Normal memory that is Inner Non-cacheable, Outer Non-cacheable,
| these instructions must affect the caches of all PEs in the Outer Shareable shareability
| domain of the PE on which the instruction is operating.

What does aarch64 user-space do in the same situation? Surely that goes
down in flames too!?

I think the real problem here is you've given this kind of mapping to
user-space. If the device behind the mapping can respond like this, you
must trust user-space not to poke it inappropriately.

If it's not got all the properties of memory: please don't pretend it's
memory. If it's a device, it probably needs to be carefully managed by a
dedicated driver.

Do you know where the abort comes from and why? (The interconnect,
PCIE-root-complex, or from the GPU itself?)
Is it the wrong attributes? Too large an eviction from the cache? Wrong
alignment for this BAR? It can't handle cache-maintenance?

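For anyone following along, the effect of the hunk can be modelled in
user-space. This is only a rough sketch, not kernel code: it assumes the
loop walks the range in page-sized chunks (as the 'start += chunk' in the
hunk suggests), and MODEL_PAGE_SIZE, struct walk_stats, model_cache_op()
and demo_is_cacheable() are all made-up names for illustration. It counts
the chunks that would get cache maintenance against the chunks the
proposed check would step over:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE; 4 KiB is an assumption. */
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical predicate: true when the chunk starting at addr is cacheable. */
typedef bool (*cacheable_fn)(unsigned long addr);

struct walk_stats {
	size_t flushed;	/* chunks that would receive cache maintenance */
	size_t skipped;	/* chunks the proposed check would step over */
};

/*
 * Illustrative predicate only: pretend everything at or above 0x3000 is a
 * Device (non-cacheable) mapping such as a PCI BAR.
 */
bool demo_is_cacheable(unsigned long addr)
{
	return addr < 0x3000UL;
}

/*
 * User-space model of a chunked walk over [start, end). No cache
 * maintenance is performed here; the function only records which chunks
 * the proposed check would skip.
 */
struct walk_stats model_cache_op(unsigned long start, unsigned long end,
				 cacheable_fn is_cacheable)
{
	struct walk_stats stats = { 0, 0 };

	while (start < end) {
		unsigned long remaining = end - start;
		unsigned long chunk = remaining < MODEL_PAGE_SIZE ?
				      remaining : MODEL_PAGE_SIZE;

		if (is_cacheable(start))
			stats.flushed++;	/* maintenance would be issued */
		else
			stats.skipped++;	/* the patch would 'continue' here */

		start += chunk;
	}

	return stats;
}
```

For a three-page range whose last page is treated as Device memory,
model_cache_op(0x1000, 0x4000, demo_is_cacheable) reports two chunks
flushed and one skipped. That skipped chunk is the maintenance the
architecture text quoted above says must still take effect.
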
Thanks,

James