Subject: Re: [RFC] IPMI state machine regression
From: Corey Minyard
Reply-To: minyard@acm.org
To: Andrew Banman, Corey Minyard
Cc: Arnd Bergmann, Greg Kroah-Hartman, justin.ernst@hpe.com, rja@hpe.com, frank.ramsay@hpe.com, openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Date: Thu, 23 Aug 2018 11:22:58 -0500
References: <20180821221443.hhgcnzw6xttaih3i@linux-tqvx.americas.hpqcorp.net> <20180822162352.q7qc2udqabbqxdya@linux-tqvx>
In-Reply-To: <20180822162352.q7qc2udqabbqxdya@linux-tqvx>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/22/2018 11:23 AM, Andrew Banman wrote:
> On Wed, Aug 22, 2018 at 11:14:52AM -0500, Corey Minyard wrote:
>> On 08/21/2018 05:14 PM, Andrew Banman wrote:
>>> Dear IPMI supporters,
>>>
>>> We observe a window in IPMI BT's opportunistic get capabilities request,
>>> wherein
>>> GET_DEVICE_GUID and GET_DEVICE_ID requests may start while the BT state
>>> machine is in WR_CONSUME. Following this, the 0xD5 error code is forced in
>>> bt_start_transaction, IPMI fails to initialize, and the interface is torn down.
>>> There is no mechanism to retry bringing up the interface in open() /dev/ipmi.
>>> This leaves IPMI hosed until you reload modules. Looks to happen after we call
>>> schedule().
>> When was the latest kernel where this worked properly? Also, what hardware
>> is this?
> This is UV4.
>
> First known bad commit, but I am not sure if the timing issue predates
> it:
>
> commit aa9c9ab2443e3b9562c6c7cfc245a9e43b557d14
> Author: Jeremy Kerr
> Date: Fri Aug 25 15:47:24 2017 +0800
>
>     ipmi: allow dynamic BMC version information
>
> Hits less frequently with older kernels so I didn't see it until
> recently when it became more frequent.

Ok, that's for the crash, which makes sense. But that's an easy problem
to fix. I would like a "Tested-by" on that, if you get to test it, though I
was able to simulate various failures there to test it out.

So reading between the lines ("more frequent") I'm guessing that this
still happened with older kernels, but is becoming annoying with newer
kernels.

I would guess recent changes cause it to happen more often because of
changes in the way the upper layer interacts with the lower layers: you
will have more messages at startup, and the timing is somewhat different.

The BT code itself hasn't changed much in over 10 years. Nothing that looks
like it would cause an issue like this. So I would guess this is an issue
that has been around for a while.

I don't have any real hardware with a BT interface, just the one in qemu,
but I've never seen it there.

It actually looks like the state machine is working ok. But the BMC is
responding to a "Get Device ID" command with:

    Recv:: 1c 08 d5

That's an error response with D5, which is "Cannot execute command.
Command, or request parameter(s), not supported in present state."

That's an error response from your BMC. That particular command shouldn't
ever respond with that error, so I think the bug here is with your BMC.

-corey
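
For reference, the three bytes in that Recv line can be decoded by hand. Below
is a minimal sketch in Python, assuming the debug line prints the NetFn/LUN
byte first, then the command byte, then the completion code (that layout is an
assumption about the driver's debug output, not something stated above). The
helper name decode_ipmi_response and the two small lookup tables are
hypothetical and only cover the values discussed in this thread; the numbers
are the ones I understand the IPMI spec to assign.

```python
# Hypothetical helper for illustration only; not part of the kernel driver.
# Assumed byte layout: [NetFn << 2 | LUN, command, completion code, data...]

# App NetFn commands mentioned in this thread (request 0x06, response 0x07).
APP_COMMANDS = {
    0x01: "Get Device ID",
    0x08: "Get Device GUID",
}

# Only the completion codes relevant here; the spec defines many more.
COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xD5: ("Cannot execute command. Command, or request parameter(s), "
           "not supported in present state."),
}


def decode_ipmi_response(hex_bytes):
    """Split a 'Recv' style hex dump into its header fields."""
    raw = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is skipped
    if len(raw) < 3:
        raise ValueError("need at least NetFn/LUN, command, completion code")

    netfn = raw[0] >> 2       # upper six bits
    lun = raw[0] & 0x03       # lower two bits
    cmd = raw[1]
    ccode = raw[2]

    # Command names are only looked up for the App NetFn pair.
    if netfn in (0x06, 0x07):
        cmd_name = APP_COMMANDS.get(cmd, "unknown")
    else:
        cmd_name = "unknown"

    return {
        "netfn": netfn,
        "lun": lun,
        "is_response": bool(netfn & 1),  # odd NetFn values are responses
        "command": cmd_name,
        "completion_code": ccode,
        "completion_text": COMPLETION_CODES.get(ccode, "unknown"),
        "data": raw[3:],
    }


if __name__ == "__main__":
    for key, value in decode_ipmi_response("1c 08 d5").items():
        print(key, "=", value)
```

With this numbering, 0x1c splits into NetFn 0x07 (App response) and LUN 0, and
the completion code is 0xD5 as described. Command byte 0x08 comes out as Get
Device GUID (Get Device ID would be 0x01), so the dump may belong to the GUID
request rather than the ID request. Either way it is an error response coming
back from the BMC.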