Date: Wed, 17 Jun 2020 10:38:50 +0200
From: Roger Pau Monné
To: Anchal Agarwal
CC: Boris Ostrovsky, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    hpa@zytor.com, x86@kernel.org, jgross@suse.com, linux-pm@vger.kernel.org,
    linux-mm@kvack.org, "Kamata, Munehisa", sstabellini@kernel.org,
    konrad.wilk@oracle.com, axboe@kernel.dk, davem@davemloft.net,
    rjw@rjwysocki.net, len.brown@intel.com, pavel@ucw.cz, peterz@infradead.org,
    "Valentin, Eduardo", "Singh, Balbir", xen-devel@lists.xenproject.org,
    vkuznets@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    "Woodhouse, David", benh@kernel.crashing.org
Subject: Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
Message-ID: <20200617083850.GX735@Air-de-Roger>
References: <7FD7505E-79AA-43F6-8D5F-7A2567F333AB@amazon.com>
    <20200604070548.GH1195@Air-de-Roger>
    <20200616214925.GA21684@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
    <20200616223003.GA28769@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
In-Reply-To: <20200616223003.GA28769@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 16, 2020 at 10:30:03PM +0000, Anchal Agarwal wrote:
> On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> > > > > From: Munehisa Kamata
> > > > > +        xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > +                "the device may become inconsistent state");
> > > >
> > > > Leaving the device in this state is quite bad, as it's in a closed
> > > > state and with the queues frozen. You should make an attempt to
> > > > restore things to a working state.
> > > >
> > > > You mean if backend closed after timeout? Is there a way to know that?
> > > > I understand it's not good to leave it in this state however, I am still
> > > > trying to find if there is a good way to know if backend is still
> > > > connected after timeout.
> > > > Hence the message "the device may become inconsistent state". I didn't
> > > > see a timeout even once on my end, so that's why I may be looking for an
> > > > alternate perspective here. Maybe needing to thaw everything back
> > > > intentionally is one thing I could think of.
> > >
> > > You can manually force this state, and then check that it will behave
> > > correctly. I would expect that on a failure to disconnect from the
> > > backend you should switch the frontend to the 'Init' state in order to
> > > try to reconnect to the backend when possible.
> > >
> > From what I understand, forcing manually means failing the freeze without
> > disconnecting and trying to revive the connection by unfreezing the
> > queues->reconnecting to backend [which never got disconnected]. Maybe even
> > tearing down things manually, because I am not sure what state the frontend
> > will see if the backend fails to disconnect at any point in time. I assumed
> > connected. Then again, if it's "CONNECTED" I may not need to tear down
> > everything and start from Initialising state, because that may not work.
> >
> > So I am not so sure about the backend's state. Let's say xen_blkif_disconnect
> > fails; I don't see it getting handled in the backend, so what will the
> > backend's state be?
> > Will it still switch xenbus state to 'Closed'? If not, what will the frontend
> > see if it tries to read the backend's state through xenbus_read_driver_state?
> >
> > So the flow would be like:
> > Frontend marks XenbusStateClosing
> > Backend marks its state as XenbusStateClosing
> > Frontend marks XenbusStateClosed
> > Backend disconnects, calls xen_blkif_disconnect
> > Backend fails to disconnect, the above function returns EBUSY
> > What will be the state of the backend here?
> > Frontend did not tear down the rings if the backend does not switch the
> > state to 'Closed' in case of failure.
> >
> > If the backend stays in CONNECTED state, then even if we mark it Initialised
> > in the frontend, the backend won't be calling connect().
> > {From reading code in frontend_changed}
> > IMU, Initialising will fail since backend dev->state != XenbusStateClosed, plus
> > we did not tear down anything, so calling talk_to_blkback may not be needed.
> >
> > Does that sound correct?
> Sent that too quickly. I also meant to add that the XenbusStateInitialised
> state should be ok only if we expect the backend will stay in "Connected"
> state. Also, I experimented with that notion. I am a little worried about the
> correctness here.
> Can the backend come to an Unknown state somehow?

Not really, there's no such thing as an Unknown state. There are no
guarantees about what a backend can do really, so it could indeed
switch to an unrecognized state, but that would be a bug in the
backend.

Roger.
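
The recovery path discussed in this thread (if the backend does not close
before the timeout, do not leave the queues frozen; go back to an initial
state so a reconnect can be attempted) can be sketched as a small standalone
model. This is only an illustration under stated assumptions, not code from
the patch or from xen-blkfront: the enum, the struct and model_freeze() are
hypothetical names, and the real driver would go through the xenbus state
machinery rather than a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; NOT the kernel's enum xenbus_state. */
enum conn_state { ST_INITIALISING, ST_CONNECTED, ST_CLOSING, ST_CLOSED };

/* Hypothetical stand-in for the frontend's per-device bookkeeping. */
struct frontend_model {
    enum conn_state state;
    bool queues_frozen;
};

/*
 * Model of a freeze attempt. backend_closed tells whether the backend
 * reached its closed state before the (imaginary) timeout expired.
 * Returns 0 on success and -1 on timeout.
 */
int model_freeze(struct frontend_model *fe, bool backend_closed)
{
    assert(fe != NULL);

    /* Freezing starts by stopping I/O and announcing the close. */
    fe->queues_frozen = true;
    fe->state = ST_CLOSING;

    if (backend_closed) {
        fe->state = ST_CLOSED;
        return 0;
    }

    /*
     * Timeout: roll back instead of staying closed with frozen queues.
     * Assumption: returning to the initial state lets a later reconnect
     * attempt start from a known point.
     */
    fe->queues_frozen = false;
    fe->state = ST_INITIALISING;
    return -1;
}
```

Calling model_freeze() with backend_closed set to false leaves the model
unfrozen and in ST_INITIALISING, which is the "restore things to a working
state" outcome asked for above. Whether the real backend would then accept a
reconnect is exactly the open question in this thread.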