Date: Wed, 15 Nov 2017 09:12:33 -0600 (CST)
From: Manoj Iyer
To: Shanker Donthineni, James Morse
cc: Manoj Iyer, Will Deacon, Marc Zyngier, linux-arm-kernel@lists.infradead.org, Catalin Marinas, Ard Biesheuvel, Matt Fleming, Christoffer Dall, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Subject: Re: [3/3] arm64: Add software workaround for Falkor erratum 1041
References: <1509679664-3749-4-git-send-email-shankerd@codeaurora.org> <5A04369A.2020405@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 10 Nov 2017, Manoj Iyer
wrote:

> On Thu, 9 Nov 2017, Manoj Iyer wrote:
>
>>
>> James,
>>
>> Looks like my VM test raised a false alarm. I retested stock Artful 4.13
>> kernel (No erratum 1041 patches applied).
>>
>
> James, an update on the crash (false alarm). We suspect this is a firmware
> crash due to a possible fw bug. Once this is addressed I will be able to send
> you the test results you requested on VM start/stop with the erratum 1041
> patches applied.
>

James/Shanker,

I can report that VM start/stop/restart tests worked with the patches
applied to Ubuntu 4.13 (Artful) kernel on the qdf2400 hardware.

Host: Ubuntu 4.13 with Erratum 1041 patches applied
Guest: Stock Ubuntu 4.13 kernel

- create 20 vms one at a time

10 iterations of:
- Stop (virsh destroy) 20 VMs one at a time
- Start (virsh start) 20 VMs one at a time.

Tested-by: Manoj Iyer

>
>> Host: Ubuntu Artful 4.13 kernel with *no* erratum 1041 patches applied.
>> Guest: Ubuntu Zesty (4.10) kernel.
>>
>> - Created 20 VMs one at a time
>>
>> In a loop:
>> - Stop (virsh destroy) 20 VMs one at a time
>> - Start (virsh start) 20 VMs one at a time.
>>
>> And, I am able to reproduce the system reset issue I previously reported. I
>> think the problem I reported with VMs might have nothing to do with the
>> erratum 1041 patches, and probably needs to be root caused separately.
>>
>> With stock 4.13 kernel (no erratum 1041 patches applied):
>>
>> awrep6 login: [ 461.881379] ACPI CPPC: PCC check channel failed. Status=0
>> [ 462.051194] ACPI CPPC: PCC check channel failed. Status=0
>> [ 462.223137] ACPI CPPC: PCC check channel failed. Status=0
>> [ 462.633790] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.231971] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.403163] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.822936] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.995222] ACPI CPPC: PCC check channel failed. Status=0
>> [ 464.130962] ACPI CPPC: PCC check channel failed. Status=0
>> [ 464.258973] ACPI CPPC: PCC check channel failed. Status=0
>> [ 465.283028] ACPI CPPC: PCC check channel failed. Status=0
>>
>>
>> SYS_DBG: Running SDI image (immediate mode)
>> SYS_DBG: Ram Dump Init
>> SYS_DBG: Failed to init SD card
>> SYS_DBG: Resetting system!
>>
>>
>> On Thu, 9 Nov 2017, Manoj Iyer wrote:
>>
>>>
>>>
>>>
>>> On Thu, 9 Nov 2017, Manoj Iyer wrote:
>>>
>>>>
>>>> James,
>>>>
>>>> (sorry for top-posting)
>>>>
>>>> Applied 3 patches to Ubuntu Artful Kernel ( 4.13.0-16-generic )
>>>>
>>>> - Start 20 VMs one at a time
>>>>
>>>> In a loop:
>>>> - Stop (virsh destroy) 20 VMs one at a time
>>>> - Start (virsh start) 20 VMs one at a time.
>>>
>>> Fixing some confusion I might have introduced in my prev email.
>>>
>>> - Applied all 3 patches to Ubuntu Artful Kernel ( 4.13.0-16-generic )
>>>
>>> - Created 20 VMs one at a time
>>>
>>> In a loop:
>>> - Stop (virsh destroy) 20 VMs one at a time
>>> - Start (virsh start) 20 VMs one at a time.
>>>
>>>>
>>>> The system resets itself after starting the last VM on the 1st loop
>>>> displaying the following:
>>>>
>>>> awrep6 login: [ 603.349141] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 603.765101] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 603.937389] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 608.285495] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 608.289481] ACPI CPPC: PCC check channel failed. Status=0
>>>>
>>>> SYS_DBG: Running SDI image (immediate mode)
>>>> SYS_DBG: Ram Dump Init
>>>> SYS_DBG: Failed to init SD card
>>>> SYS_DBG: Resetting system!
>>>>
>>>> Followed by the following messages on system reboot:
>>>> [ 6.616891] BERT: Error records from previous boot:
>>>> [ 6.621655] [Hardware Error]: event severity: fatal
>>>> [ 6.626516] [Hardware Error]: imprecise tstamp: 0000-00-00 00:00:00
>>>> [ 6.632851] [Hardware Error]: Error 0, type: fatal
>>>> [ 6.637713] [Hardware Error]: section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
>>>> [ 6.646045] [Hardware Error]: section length: 0x238
>>>> [ 6.651082] [Hardware Error]: 00000000: 72724502 5220726f 6f736165 6e55206e .Error Reason Un
>>>> [ 6.659761] [Hardware Error]: 00000010: 776f6e6b 0000006e 00000000 00000000 known...........
>>>> [ 6.668442] [Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>> [ 6.677122] [Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
>>>>
>>>>
>>>> On Thu, 9 Nov 2017, James Morse wrote:
>>>>
>>>>> Hi Manoj,
>>>>>
>>>>> On 08/11/17 19:05, Manoj Iyer wrote:
>>>>>> On Thu, 2 Nov 2017, Shanker Donthineni wrote:
>>>>>>> The ARM architecture defines the memory locations that are permitted
>>>>>>> to be accessed as the result of a speculative instruction fetch from
>>>>>>> an exception level for which all stages of translation are disabled.
>>>>>>> Specifically, the core is permitted to speculatively fetch from the
>>>>>>> 4KB region containing the current program counter and next 4KB.
>>>>>>>
>>>>>>> When translation is changed from enabled to disabled for the running
>>>>>>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>>>>>>> Falkor core may errantly speculatively access memory locations outside
>>>>>>> of the 4KB region permitted by the architecture. The errant memory
>>>>>>> access may lead to one of the following unexpected behaviors.
>>>>>
>>>>>> I applied the 3 patches to Ubuntu 4.13.0-16-generic (Artful) kernel and
>>>>>> ran stress-ng cpu tests on QDF2400 server
>>>>>
>>>>> [...]
>>>>>
>>>>>> Where stress-ng would spawn N workers and test cpu offline/online, perform
>>>>>> matrix operations, do rapid context switches, and anonymous mmaps. Although
>>>>>> I was not able to reproduce the erratum on the stock 4.13 kernel using the
>>>>>> same test case, the patched kernel did not seem to introduce any
>>>>>> regressions either. I ran the stress-ng tests for over 8hrs and found the
>>>>>> system to be stable.
>>>>>
>>>>>
>>>>> Could you throw kexec and KVM into the mix? This issue only shows up when we
>>>>> disable the MMU, which we almost never do.
>>>>>
>>>>> For CPU offline/online we make the PSCI 'offline' call with the MMU enabled.
>>>>> When the CPU comes back firmware has reset the EL2/EL1 SCTLR from a higher
>>>>> exception level, so it won't hit this issue.
>>>>>
>>>>> One place we do this is kexec, where we drop into purgatory with the MMU
>>>>> disabled.
>>>>>
>>>>> The other is KVM unloading itself to return to the hyp stub. You can stress
>>>>> this by starting and stopping a VM. When the number of VMs reaches 0 KVM
>>>>> should unload via 'kvm_arch_hardware_disable()'.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> James
>>>>>
>>>>>
>>>>
>>>> --
>>>> ============================
>>>> Manoj Iyer
>>>> Ubuntu/Canonical
>>>> ARM Servers - Cloud
>>>> ============================
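
For anyone wanting to repeat the VM start/stop test described in this thread, the loop could be scripted roughly as below. This is only a sketch, not the script that was actually used: the function names and the domain names (vm1 .. vm20) are made up for illustration, and it assumes the 20 guests are already defined in libvirt. The only commands taken from the thread are "virsh destroy" and "virsh start".

```python
import subprocess
from typing import Callable, List, Sequence


def virsh_cycle_commands(domains: Sequence[str], iterations: int) -> List[List[str]]:
    """Build the command list: per iteration, destroy every VM, then start every VM."""
    commands: List[List[str]] = []
    for _ in range(iterations):
        # Stop the guests one at a time (virsh destroy is a hard power-off).
        for name in domains:
            commands.append(["virsh", "destroy", name])
        # Start them again one at a time.
        for name in domains:
            commands.append(["virsh", "start", name])
    return commands


def run_cycle(domains: Sequence[str], iterations: int,
              runner: Callable[..., object] = subprocess.run) -> None:
    """Run the commands in order; with check=True a failing virsh call raises."""
    for command in virsh_cycle_commands(domains, iterations):
        runner(command, check=True)


# Hypothetical usage, matching the 20 VMs and 10 iterations reported above:
# run_cycle([f"vm{i}" for i in range(1, 21)], iterations=10)
```

Because every guest is destroyed before any is restarted, the running VM count drops to 0 once per iteration, which is the condition James describes for KVM to unload via 'kvm_arch_hardware_disable()'. The runner argument is there only so the ordering can be checked without a libvirt host.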