Received: by 10.223.164.221 with SMTP id h29csp3810235wrb; Tue, 10 Oct 2017 08:33:39 -0700 (PDT) X-Received: by 10.101.64.72 with SMTP id h8mr12753733pgp.110.1507649619882; Tue, 10 Oct 2017 08:33:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1507649619; cv=none; d=google.com; s=arc-20160816; b=cdJwaYIPRqP5CmoN6shtdUTJ/f+j6ut9pg8oSVtl30ifV+PAExiFm6CPChMZdx6zMN RX0zcsalmRUAJzKgv4ratawP1AFufR4V//Es2S6NhbS3NAa92Ct//bHTbrZoCrt5hMKN WUYkTe2qU2iqFaoxfdwYCkJA72GZXaFrDLqMKY4ts3bw2MG6TvpavdGaUEFDgUdD/do7 wcBRnYXzuAVMFFala0+2v+zMLLWycSf2YqURAI1wz/58XFadA8GpeNQ8mHRi7CL5HoNB yhQYluu/5nLipbhO1qGZMBu8YZ26MedRRcHKtMV5FDoCghVePTwVxmKZ/4JQuTbkmS8/ HABg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature:dkim-signature :arc-authentication-results; bh=tU0dDrdmOUNz0cGLSg5AOnotzqbQg7iR/PUODcP+yX0=; b=hRDXnbGVgaItQxQP+52x87XLm/xVENEndtejJi6lcBSzJXfYpeFKuMfFvsmTM9v9Pv eJHnNd81SAONqRS0wuoaqXysrmD66KmVHr5+jaBCHUeIpimz5jQdOrp2AJAcy0VzKSg2 1EuCbC0QXOTtlRfd97bcbkhky7w5wggjEIUzybbClQziKsfuflwC3c0BKg8xmObLPzuk Kt3QIG254dMhIFYhPtdxZvNJG8agY/y+58wOOcvAa+yoD/azdHZAp08LNkCZbnkkGBio j3JMWA4QNnThpFaWg3H4gGIBiSRT5jSCfbf5cafjrqOBtNN0KoQUABNf7ampoLb9AtmS xW1Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@google.com header.s=20161025 header.b=AFmWFmzJ; dkim=fail header.i=@chromium.org header.s=google header.b=IRIK4OSP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z3si6303768pgr.215.2017.10.10.08.33.25; Tue, 10 Oct 2017 08:33:39 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@google.com header.s=20161025 header.b=AFmWFmzJ; dkim=fail header.i=@chromium.org header.s=google header.b=IRIK4OSP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932464AbdJJPdH (ORCPT + 99 others); Tue, 10 Oct 2017 11:33:07 -0400 Received: from mail-wm0-f49.google.com ([74.125.82.49]:45712 "EHLO mail-wm0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932365AbdJJPdC (ORCPT ); Tue, 10 Oct 2017 11:33:02 -0400 Received: by mail-wm0-f49.google.com with SMTP id q124so6341576wmb.0 for ; Tue, 10 Oct 2017 08:33:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=tU0dDrdmOUNz0cGLSg5AOnotzqbQg7iR/PUODcP+yX0=; b=AFmWFmzJXmlnUI7uPM3fhIcM2bRMpsgWmC0WwVz9DoOIuJAkBKUxwkjsT+WzodNg6w A4ZolOK73zQtaXWCXa0J3XLJjHF0VGmeTPEjbnyw0/URy9Tssk5dQ7hhabhjy9uHEKp7 chATwPXmujrKPt97mCozMy7sSSu49sONL4o13DHWAqcqX7AixGdjNfg4Ve9cfg/HQH+P ra99rAH0ATwSzHFLwKvy/qc7LpnqsBjikWZWslEOaf16ARCrRpliPUyqtfxwDYYlPNTu UinTR5zwQdeFbNLwdkyfnHGsEfZuNRKR25Z1iWEjFa37csQ1H9Z7BAEaJybcXjPCBCtn C4jg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=tU0dDrdmOUNz0cGLSg5AOnotzqbQg7iR/PUODcP+yX0=; b=IRIK4OSPDlJnAe+Z4b5kilX1SJokCnC8rgZ/djO0ReLVHENpKnytwdcp30Mw2dXV4j FfWvysAMAccQLMASVSXNw00uetqTWcEnBP4ADiZvRed/7Au7U75LbTHojOxhEtyM19bo hFhEMIOsouUhxO1LLFtA8XXcUOR19ZEkQZ6CA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=tU0dDrdmOUNz0cGLSg5AOnotzqbQg7iR/PUODcP+yX0=; b=dcYQMwQACuUxhPE5vEZXHq6sB49V5scQL8m9/mFiI6vzLjVBJVN308DuCEtXYA2H4k QKF3hewkCYK6Ay3IxdEAZ4Q0M17HtQX3uE1pJVkIs60ny8W+uDfEw4URU7Gi3MfS6YHM HnEMyWsHzCEFO+pHi3Xh7Aw0Md1TGLU42X7MUmivUjfQKtHJdkU5uHJt8epTgEFMPBPy jQ6wN9W7oWiLrruk/2vYk0cU4WGew9ml7y/n26xmCkM5E4rqh/blaqx9lbDObrqDI4L4 hWCK6+ljMNcc8AIvwklGWGBnfLKfasoUQxi9VW7HAXGlUGnkYw5H8SoKzdF3cDQlNMAr IX9w== X-Gm-Message-State: AMCzsaUQAUKFU2XUsZ+AAdT4wrNerJCvMg0W6WtbN/rk1xeDFeGF39ok lUfMXeXR+QUvmthSU/CKvFCYdhQL2eKd4yxsqIROiw== X-Google-Smtp-Source: AOwi7QD402HbQnjWeiF6AkKF+YSIpL3nadh8AyHfL/zwlsqSw+TffUsIx6ASebThdmuafQGdQKtARneqwvE0gOriKwc= X-Received: by 10.223.195.131 with SMTP id p3mr4829795wrf.89.1507649580945; Tue, 10 Oct 2017 08:33:00 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.5.73 with HTTP; Tue, 10 Oct 2017 08:33:00 -0700 (PDT) In-Reply-To: <6beb7b8c-f602-112b-73b2-22188dcea28e@nvidia.com> References: <20170908205011.77986-1-briannorris@chromium.org> <02aa65e7-e967-055b-2af3-2e9b6ef77935@nvidia.com> <20170919171401.GA10968@google.com> <20170920061317.GB13616@google.com> <6beb7b8c-f602-112b-73b2-22188dcea28e@nvidia.com> From: Shawn N Date: Tue, 10 Oct 2017 08:33:00 -0700 X-Google-Sender-Auth: ye_mBHBovzgJ0jFL3Q0o1PUnh-U Message-ID: Subject: Re: [PATCH v3] platform/chrome: Use proper protocol transfer function To: Jon Hunter Cc: Olof Johansson , Benson Leung , Lee Jones , linux-kernel@vger.kernel.org, Doug Anderson , Brian Norris , Brian Norris , Gwendal Grignou , Enric Balletbo , Tomeu Vizoso , "linux-tegra@vger.kernel.org" , amstan@chromium.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 10, 2017 at 6:35 AM, Jon Hunter wrote: > > On 26/09/17 00:15, Shawn N wrote: >> On Wed, Sep 20, 2017 at 1:22 PM, Shawn N wrote: >>> On Tue, Sep 19, 2017 at 11:13 PM, Brian Norris wrote: >>>> Hi, >>>> >>>> On Tue, Sep 19, 2017 at 11:05:38PM -0700, Shawn N wrote: >>>>> This is failing because our EC_CMD_GET_PROTOCOL_INFO host command is >>>>> getting messed up, or the reply buffer is getting corrupted somehow. >>>>> >>>>> ec_dev->proto_version = >>>>> min(EC_HOST_REQUEST_VERSION, >>>>> fls(proto_info->protocol_versions) - 1); >>>>> >>> >>> Checking this closer, the first host command we send after we boot the >>> kernel (EC_CMD_GET_PROTOCOL_INFO) is failing due to protocol error >>> (see 'SPI rx bad data' / 'SPI not ready' on the EC console). Since >>> this doesn't seem to happen on the Chromium OS nyan_big release >>> kernel, I suggest to hook up a logic analyzer and see if the SPI >>> master is doing something bad. >>> >>> The error handling in cros_ec_cmd_xfer_spi() is completely wrong and >>> we return -EAGAIN / EC_RES_IN_PROGRESS, which the caller interprets >>> "the host command was received by the EC and is currently being >>> handled, poll status until completion". So the caller polls status >>> with EC_CMD_GET_COMMS_STATUS, sees no host command is in progress >>> (which is interpreted to mean "the host command I sent previously has >>> now successfully completed"), and returns success. The problem here is >>> that the initial host command was never received at all, and no reply >>> was ever received, so our reply data is all zero. >>> >>> Two things need to be fixed here: >>> >>> 1) Find out why the first host command after boot is failing. Probe >>> SPI pins and see what's going on. >>> 2) Fix error handling so we properly return an error (or properly >>> retry the entire command) when a protocol error occurs (I made some >>> attempt in https://chromium-review.googlesource.com/385080/, probably >>> I should revisit that). >> >> The below patch will fix error handling and will make things mostly >> work on nyan_big, because we'll fall back to V2 protocol after the >> initial failure. But we should still investigate why we're getting >> errors on the first host command. We aren't seeing these errors when >> we send commands from firmware, so I suspect something is wrong in >> kernel SPI HW initialization that causes the first command to fail. > I have been looking into this a bit more and it appears to be timing > related. I found that enabling some debug in the Tegra SPI driver the > problem would go away and seems that adding a delay before sending the > SPI message would also workaround the problem. > > Looking back at the Tegra Linux test history [0], it appears that after > v4.10-rc5 [1] I start to see the following message which is the first > indication of some SPI issues ... > > cros-ec-spi spi32766.0: packet too long (249 bytes, expected 4) > > I attempted to bisect this, but I was not successful because each time > I would ended up somewhere different. I found that even with v4.9 I may > see the issue 1 in 20 times and so I realised that I am not even sure > when the problem really started or if has always been there. It seems > that for older kernels it is harder to reproduce. I am wondering now if > some timing has changed somewhere causing us to see the problem more > frequently with newer kernels? > > I see the the cros-ec binding defines the following ... > > "google,cros-ec-spi-pre-delay: Some implementations of the EC need a > little time to wake up from sleep before they can receive SPI transfers > at a high clock rate. This property specifies the delay, in usecs, > between the assertion of the CS to the start of the first clock pulse." > > I found that adding the following also worked around the problem ... > > diff --git a/arch/arm/boot/dts/tegra124-nyan.dtsi b/arch/arm/boot/dts/tegra124-nyan.dtsi > index 5cf987b5401e..0baa6bfc0f36 100644 > --- a/arch/arm/boot/dts/tegra124-nyan.dtsi > +++ b/arch/arm/boot/dts/tegra124-nyan.dtsi > @@ -317,6 +317,7 @@ > interrupts = ; > reg = <0>; > > + google,cros-ec-spi-pre-delay = <10>; > google,cros-ec-spi-msg-delay = <2000>; > > i2c-tunnel { > > I have tried 50 boots with the above and I have seen no SPI failures > on boot. > > I did look to see if it is possible to probe the SPI signals with a > scope but from the schematics I am not sure if they are accessible or > buried in the PCB. > > Is it possible that Tegra is sending the SPI message too soon for the > EC? I have not worked much with the cros_ec stm32 SPI host interface, but I think it's possible. Also note that we define "google,cros-ec-spi-pre-delay = <30>;" for the veyron family of devices, which also use an stm32-family embedded controller. Some of the folks on cc worked more on veyron and may have more insight. I'm still not clear on why we see an error only on the first transaction after boot. In this case, the embedded controller previously handled host commands from firmware just fine, and the handoff between firmware and the kernel shouldn't be significant, from the EC point of view. If host command timing is consistent from the master, I would expect to see some constant error rate, eg. some chance any host command will fail, rather than the first host command always failing. > > Cheers > Jon > > [0] https://nvtb.github.io/linux/ > [1] https://nvtb.github.io/linux/test_v4.10-rc5/20170123023102/boot/tegra124-nyan-big/tegra124-nyan-big/tegra_defconfig_log.txt > > -- > nvpublic From 1580878091966470673@xxx Tue Oct 10 13:40:33 +0000 2017 X-GM-THRID: 1578006392712496529 X-Gmail-Labels: Inbox,Category Forums