Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754930AbcJEUJX convert rfc822-to-8bit (ORCPT ); Wed, 5 Oct 2016 16:09:23 -0400 Received: from mga01.intel.com ([192.55.52.88]:38304 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753977AbcJEUJW (ORCPT ); Wed, 5 Oct 2016 16:09:22 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.31,302,1473145200"; d="scan'208";a="1049957557" From: "Winkler, Tomas" To: Jason Gunthorpe CC: Jarkko Sakkinen , "tpmdd-devel@lists.sourceforge.net" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH] tpm: don't destroy chip device prematurely Thread-Topic: [PATCH] tpm: don't destroy chip device prematurely Thread-Index: AQHSHJY7jYiWgZcWb0y7/x76d69Sl6CUwviAgAC3bgCAANJWAIAAMLYAgAEU7gCAAMAwAIAAgWOg///ptoCAAKl48IAAhHEAgABWmQA= Date: Wed, 5 Oct 2016 20:09:17 +0000 Message-ID: <5B8DA87D05A7694D9FA63FD143655C1B542F54E9@hasmsx108.ger.corp.intel.com> References: <20161002101755.GA25844@intel.com> <20161002102455.GA27464@intel.com> <20161002212126.GA25872@obsidianresearch.com> <5B8DA87D05A7694D9FA63FD143655C1B542F466B@hasmsx108.ger.corp.intel.com> <20161003124836.GE9990@intel.com> <20161004051946.GA10572@intel.com> <20161004164738.GA17149@obsidianresearch.com> <5B8DA87D05A7694D9FA63FD143655C1B542F4C92@hasmsx108.ger.corp.intel.com> <20161004231057.GA20062@obsidianresearch.com> <5B8DA87D05A7694D9FA63FD143655C1B542F5084@hasmsx108.ger.corp.intel.com> <20161005171132.GE18636@obsidianresearch.com> In-Reply-To: <20161005171132.GE18636@obsidianresearch.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ctpclassification: CTP_IC x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYzQ0ZmRlMjEtOWRlYS00NWU1LWFhY2MtNjBlNGY1ZGIzZDBmIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6IkE4UTczcnNaZFo5WTNyY1ozZFVzNlcrbGVRVkhWTmlrdkFwNlhNS2R5alU9In0= x-originating-ip: [10.184.70.11] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5785 Lines: 138 > > On Wed, Oct 05, 2016 at 07:48:59AM +0000, Winkler, Tomas wrote: > > > > down_write(&chip->ops_sem); > > > > chip->ops = NULL; > > > > up_write(&chip->ops_sem); > > > > > > No, that is wrong as well, another thread can issue a TPM command > > > between the device_del and the ops = NULL. Presumably that will fail > > > the same as tpm2_shutdown does. > > > > Right, but that's why we need the TPM_CHIP_FLAG_REGISTERED bit to > > stay. > > How does that help? > > We already null ops to signal the driver is removed, but we don't check ops in > all places today. Notably sysfs doesn't do the test, and relies on the hard fence > device_del provides. > > So at a minimum to go forward with this approach you have to fix sysfs to be > safe - which I don't think is worthwhile.. Yes, if this approach is taken this has to go across the whole stack, there is no question about it. > > Second, tmp2_shutdown only assure that the tpm state is saved, we are > > taking too hardline here. If another command is issued, this is a > > problem of the upper layers and it has to be fixed in the upper layer. > > We can't fix it at the upper layers, this is classic removal race. Whatever > happens it must not cause the kernel to malfunction. > > > This is the problem statement form the commit message: > > > > ' But with the introduction of runtime power management it will result > > in shutting down the parent device while it still in use.' > > > https://sourceforge.net/p/tpmdd/mailman/message/35395799/ > > Great, your commit message should include the klog message this patch is > fixing. > It could, but that patch was not merged yet, and I believe even if the issue is exposed only with runtime_pm currently, we have a bug in design even w/o runtime pm. > > > But again, the real bug is in design, where a device is used after > > > device_del() is called. > > No, I don't think so.. I do :) > > > > What code actually fails? I don't see anything in the runtime pm > > > patch that relies on chip->dev at all. > > > > "chip-dev.parent'' > > dev_get_drvdata(&chip->dev); > > Also chip->dev.name because we can call dev_log/etc(&chip->dev..) Yes, of course but this was not the point of disagreement > These are all fine, obviously. Todays kernel retains those values across > device_del and we set those values in tpmm_chip_alloc/etc. So correct values > are present as long as the chip memory exists. tpm continues to hold a kref on > the chip so the memory will be around. I'm not saying they are not, but calling deep into the tpm stack and even to the parent device with unutilized device is not sane. > 'dev' values are only being used for clarity and convenience, if ever the kernel > changes behavior and nulls those values during device_del then we will copy > the values into chip and stop using dev. No algorithm needs to change, and we > don't need a registered 'dev' > to function. Frankly I would suggest to stay with the device, it let you add tpm spec attributes (sysfs) if needed to the stack, you cannot hang those on the air. You should distinguish between chip->dev and cdev though, those are not the same things. > However, I find such a possible change to be deeply unlikely. If you look > around the kernel I think you will find that many subsystems subtly depend on > these same invariants for corectness during various unregister races. Yes, but the code around is aware that it's unregistering, you have device_del, put_device pair enclosing device removal in the same function., unlike tmp2_shutdown which requires most of the stack functional. > > > What code fails and why? > > > > device_del(dev) > > bus_remove_device(dev) > > device_release_driver(dev) > > __device_release_driver(dev) > > pm_runtime_reinit(dev) { > > if (dev->parent) > > > > pm_runtime_put(dev->parent); > > Eh?? The full code is: > > void pm_runtime_reinit(struct device *dev) { if (!pm_runtime_enabled(dev)) { > if (dev->power.irq_safe) { > if (dev->parent) > pm_runtime_put(dev->parent); > > And irq_safe is only set by pm_runtime_irq_safe(). I can't find any place that > looks like that is called on chip->dev > > Is there some other PM path where dev->parent becomes invovled? Of course, the power management utilize the device hierarch, it assumes there is power dependencies between parents and child devices, such as bus controllers and the devices on that bus. > Are you just guessing this solves a problem, or were you able to reproduce > Jarkko's report? No, guesses are not my style :), this solves the issue, as you see this was also validated by Jarrko on his setup. > Considering that Jarkko cannot reliably reproduce the original bug, I'm deeply > skeptical this patch actually does anything more than fiddle with timing > around some kind of undiscovered race condition scenario. > > Even if that pm_runtime_put is happening, why doesn't the > > + pm_runtime_get_sync(chip->dev.parent); > > The runtime_pm patch adds to tpm_transmit take care of bringing the device > back? Unfortunately not because, because it gets out of sync and what is actually happening is that idle callback is called and device is put to idle between send and receive. Now we can find a trick to fix this, but this would be rather w/o while we know what the real issue is. > I'm still not hearing from you an explanation for what is actually happening to > cause the 325 error.. What does that error code decode to anyhow? Error code 0x145 RC_AUTH_CONTEXT, but the error code may vary, basically we lost context because the device went to the idle state. Thanks Tomas