From: "Durrant, Paul"
To: Roger Pau Monné
CC: "linux-kernel@vger.kernel.org", "xen-devel@lists.xenproject.org",
    "Juergen Gross", Stefano Stabellini, "Boris Ostrovsky"
Subject: RE: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
Date: Mon, 9 Dec 2019 14:41:40 +0000
References: <20191205140123.3817-1-pdurrant@amazon.com>
    <20191205140123.3817-3-pdurrant@amazon.com>
    <20191209113926.GS980@Air-de-Roger>
    <19b5c2fa36b842e58bbdddd602c4e672@EX13D32EUC003.ant.amazon.com>
    <20191209122537.GV980@Air-de-Roger>
    <54e3cd3a42d8418d9a36388315deab13@EX13D32EUC003.ant.amazon.com>
    <20191209142852.GW980@Air-de-Roger>
In-Reply-To: <20191209142852.GW980@Air-de-Roger>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Roger Pau Monné
> Sent: 09 December 2019 14:29
> To: Durrant, Paul
> Cc: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org;
> Juergen Gross; Stefano Stabellini; Boris Ostrovsky
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced to closed
>
> On Mon, Dec 09, 2019 at 12:40:47PM +0000, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné
> > > Sent: 09 December 2019 12:26
> > > To: Durrant, Paul
> > > Cc: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > Juergen Gross; Stefano Stabellini; Boris Ostrovsky
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > forced to closed
> > >
> > > On Mon, Dec 09, 2019 at 12:01:38PM +0000, Durrant, Paul wrote:
> > > > > -----Original Message-----
> > > > > From: Roger Pau Monné
> > > > > Sent: 09 December 2019 11:39
> > > > > To: Durrant, Paul
> > > > > Cc: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > > > Juergen Gross; Stefano Stabellini; Boris Ostrovsky
> > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > > > forced to closed
> > > > >
> > > > > On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
> > > > > > Only force state to closed in the case when the toolstack may
> > > > > > need to clean up. This can be detected by checking whether the
> > > > > > state in xenstore has been set to closing prior to device
> > > > > > removal.
> > > > >
> > > > > I'm not sure I see the point of this, I would expect that a
> > > > > failure to probe or the removal of the device would leave the
> > > > > xenbus state as closed, which is consistent with the actual
> > > > > driver state.
> > > > >
> > > > > Can you explain what's the benefit of leaving a device without a
> > > > > driver in such unknown state?
> > > > >
> > > >
> > > > If probe fails then I think it should leave the state alone.
> > > > If the state is moved to closed then basically you just killed
> > > > that connection to the guest (as the frontend will normally close
> > > > down when it sees this change) so, if the probe failure was due to
> > > > a bug in blkback or, e.g., a transient resource issue then it's
> > > > game over as far as that guest goes.
> > >
> > > But the connection can be restarted by switching the backend to the
> > > init state again.
> >
> > Too late. The frontend saw closed and you already lost.
> >
> > > >
> > > > The ultimate goal here is PV backend re-load that is completely
> > > > transparent to the guest. Modifying anything in xenstore
> > > > compromises that so we need to be careful.
> > >
> > > That's a fine goal, but not switching to closed state in
> > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > without a matching backend and with the state not set to closed.
> > >
> >
> > Why is this a problem? With this series fully applied a (block)
> > backend can come and go without needing to change the state. Relying
> > on guests to DTRT is not a sustainable option for a cloud deployment.
> >
> > > Ie: that would be fine if you explicitly state this is some kind of
> > > internal blkback reload, but not for the general case where blkback
> > > has been unbound. I think we need some way to differentiate a blkback
> > > reload vs. an unbind.
> > >
> >
> > Why do we need that though? Why is it advantageous for a backend to go
> > to closed? No PV backends cope with an unbind as-is, and a toolstack
> > initiated unplug will always set state to 5 anyway. So TBH any state
> > transition done directly in the xenbus code looks wrong to me anyway
> > (but appears to be a necessary evil to keep the toolstack working in
> > the event it spawns a backend where there is actually no driver
> > present, or it doesn't come online).
>
> IMO the normal flow for unbind would be to attempt to close open
> connections and then remove the driver: leaving frontends connected
> without any attached backends is not correct, and will just block the
> guest frontend until requests start timing out.
>
> I can see the reasoning for doing that for the purpose of updating a
> blkback module without guests noticing, but I would prefer that
> leaving connections open was an option that could be given when
> unbinding (or maybe a driver option in sysfs?), so that the default
> behaviour would be to try to close everything when unbinding if
> possible.

Well unbind is pretty useless now IMO since bind doesn't work, and a
transition straight to closed is just plain wrong anyway. But, we could
have a flag that the backend driver sets to say that it supports
transparent re-bind that gates this code. Would that make you feel more
comfortable?

If you want unbind to actually do a proper unplug then that's extra work
and not really something I want to tackle (and re-bind would still need
to be toolstack initiated as something would have to re-create the
xenstore area).

  Paul

>
> Thanks, Roger.
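
For illustration only, the gating flag suggested above might look roughly
like the following C sketch. It is not the code from the patch series:
struct sketch_backend_driver, its allow_rebind field and
should_force_closed() are hypothetical names, and the sketch assumes that
drivers which do not set the flag keep the existing behaviour of always
forcing the state to closed on removal. The state values are those of
enum xenbus_state in Xen's public io/xenbus.h, where 5 is Closing and 6
is Closed.

```c
#include <stdbool.h>

/* Values mirror enum xenbus_state in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical stand-in for a backend driver; not the kernel's struct. */
struct sketch_backend_driver {
    /* Hypothetical flag: driver claims it supports transparent re-bind. */
    bool allow_rebind;
};

/*
 * Decide whether device removal should force the xenstore state to
 * Closed.
 *
 * Assumption: without the flag, removal always forces Closed. With the
 * flag, Closed is forced only if the state in xenstore was already set
 * to Closing before removal (i.e. the toolstack asked for an unplug and
 * may need to clean up); otherwise the state is left untouched so the
 * frontend does not see the backend go away.
 */
static bool should_force_closed(const struct sketch_backend_driver *drv,
                                enum xenbus_state state_in_xenstore)
{
    if (!drv->allow_rebind)
        return true;

    return state_in_xenstore == XenbusStateClosing;
}
```

With allow_rebind set and the xenstore state still Connected, the helper
returns false and nothing in xenstore changes. Once the state has been
set to Closing, it returns true and the state is forced to Closed.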