Message-ID: <1552410856.3083.28.camel@HansenPartnership.com>
Subject: Re: [PATCH] tpm: Make timeout logic simpler and more robust
From: James Bottomley
To: Mimi Zohar, Jarkko Sakkinen, Peter Hüwe
Cc: Calvin Owens, Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman,
    linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com
Date: Tue, 12 Mar 2019 10:14:16 -0700
In-Reply-To: <1552409969.24794.68.camel@linux.ibm.com>
References: <358e89ed2b766d51b5f57abf31ab7a925ac63379.1552348123.git.calvinowens@fb.com>
    <1552350463.23859.8.camel@HansenPartnership.com>
    <20190312125028.GC9243@linux.intel.com>
    <1552401766.3083.3.camel@HansenPartnership.com>
    <1552409969.24794.68.camel@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2019-03-12 at 12:59 -0400, Mimi Zohar wrote:
> On Tue, 2019-03-12 at 07:42 -0700, James Bottomley wrote:
> > On Tue, 2019-03-12 at 14:50 +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 11, 2019 at 05:27:43PM -0700, James Bottomley wrote:
> > > > On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> > > > > We're having lots of problems with TPM commands timing out,
> > > > > and we're seeing these problems across lots of different
> > > > > hardware (both v1/v2).
> > > > >
> > > > > I instrumented the driver to collect latency data, but I
> > > > > wasn't able to find any specific timeout to fix: it seems
> > > > > like many of them are too aggressive. So I tried replacing
> > > > > all the timeout logic with a single universal long timeout,
> > > > > and found that makes our TPMs 100% reliable.
> > > > >
> > > > > Given that this timeout logic is very complex, problematic,
> > > > > and appears to serve no real purpose, I propose simply
> > > > > deleting all of it.
> > > >
> > > > "no real purpose" is a bit strong given that all these timeouts
> > > > are standards mandated. The purpose stated by the standards is
> > > > that there needs to be a way of differentiating the TPM crashed
> > > > from the TPM is taking a very long time to respond. For a
> > > > normally functioning TPM it looks complex and unnecessary, but
> > > > for a malfunctioning one it's a lifesaver.
> > >
> > > Standards should be only followed when they make practical sense
> > > and ignored when not. The range is only up to 2s anyway.
> >
> > I don't disagree ... and I'm certainly not going to defend the TCG
> > because I do think the complexity of some of its standards
> > contributed to the lack of use of TPM 1.2.
> >
> > However, I am saying we should root cause this problem rather than
> > take a blind shot at the apparent timeout complexity. My timeout
> > instability is definitely related to the polling adjustments, so
> > it's not unreasonable to think Facebook's might be as well.
>
> James, I thought Peter sent you a tis "debug" tool to help you debug
> the problem you're seeing. Whatever happened?

No, I haven't seen one. I have tried to debug the problem, but it's
really odd: my TPM is a polled Nuvoton (so no IRQ line). If you poll
the data ready bit on my TPM too often, it simply drops off the bus and
every TPM operation after that times out. The only way to recover is to
reboot.

James
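
To make the failure mode described above concrete, here is a minimal
Python simulation, not driver code. Everything in it is hypothetical:
the FlakyTpm model, the 5 ms minimum gap between status reads, and the
50 ms command latency are illustrative assumptions, not measurements
from any real part. The 2000 ms timeout only echoes the "up to 2s"
range mentioned in the thread. It shows why a longer timeout alone may
not help if the polling rate is what upsets the device.

```python
class FakeClock:
    """Deterministic millisecond clock so the sketch runs instantly."""

    def __init__(self):
        self.now_ms = 0

    def sleep(self, ms):
        self.now_ms += ms


class FlakyTpm:
    """Hypothetical device model (an assumption, not a real TPM).

    The command completes at ready_at_ms, but the device wedges for good
    if its status bit is read twice within min_gap_ms.
    """

    def __init__(self, clock, ready_at_ms, min_gap_ms):
        self.clock = clock
        self.ready_at_ms = ready_at_ms
        self.min_gap_ms = min_gap_ms
        self.last_read_ms = None
        self.wedged = False

    def data_ready(self):
        now = self.clock.now_ms
        if (self.last_read_ms is not None
                and now - self.last_read_ms < self.min_gap_ms):
            self.wedged = True  # "drops off the bus"; never recovers
        self.last_read_ms = now
        return (not self.wedged) and now >= self.ready_at_ms


def wait_for_data(tpm, clock, poll_interval_ms, timeout_ms):
    """Poll until the data ready bit is set or the deadline passes.

    Returns True if data became ready, False on timeout.
    """
    deadline = clock.now_ms + timeout_ms
    while True:
        if tpm.data_ready():
            return True
        if clock.now_ms >= deadline:
            return False
        clock.sleep(poll_interval_ms)


if __name__ == "__main__":
    # Gentle polling: every 10 ms, well above the assumed 5 ms minimum gap.
    clock = FakeClock()
    print(wait_for_data(FlakyTpm(clock, 50, 5), clock, 10, 2000))  # True

    # Aggressive polling: every 1 ms wedges the model on the second read,
    # so even a 2000 ms timeout expires without a response.
    clock = FakeClock()
    print(wait_for_data(FlakyTpm(clock, 50, 5), clock, 1, 2000))   # False
```

In this model the overall timeout and the polling interval are
independent knobs: raising the first does nothing once the second has
wedged the device, which is one reason to look for the root cause
before changing the timeouts.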