fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehgtdorredttdejnecuhfhrohhmpeforghrvghk ucforghrtgiihihkohifshhkihdqifpkrhgvtghkihcuoehmrghrmhgrrhgvkhesihhnvh hishhisghlvghthhhinhhgshhlrggsrdgtohhmqeenucggtffrrghtthgvrhhnpeeiieeh jeegteeggeeigffhkeekieefjeduhedvfffhiefgkefhvdevfeejffdvfeenucevlhhush htvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehmrghrmhgrrhgvkhes ihhnvhhishhisghlvghthhhinhhgshhlrggsrdgtohhm X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 Oct 2021 06:32:44 -0400 (EDT) Date: Fri, 29 Oct 2021 12:32:40 +0200 From: Marek =?utf-8?Q?Marczykowski-G=C3=B3recki?= To: Juergen Gross Cc: xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org, Boris Ostrovsky , Stefano Stabellini , stable@vger.kernel.org Subject: Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done Message-ID: References: <20211028105952.10011-1-jgross@suse.com> <27e7619a-a797-5c46-9f9f-015ab488e31c@suse.com> <63a474ea-9e5d-4515-ca99-1d56f52b7673@suse.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MZykTqTxNZ8yEQ39" Content-Disposition: inline In-Reply-To: <63a474ea-9e5d-4515-ca99-1d56f52b7673@suse.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --MZykTqTxNZ8yEQ39 Content-Type: text/plain; protected-headers=v1; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Date: Fri, 29 Oct 2021 12:32:40 +0200 From: Marek =?utf-8?Q?Marczykowski-G=C3=B3recki?= To: Juergen Gross Cc: xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org, Boris Ostrovsky , Stefano Stabellini , stable@vger.kernel.org Subject: Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done On Fri, Oct 29, 2021 at 12:22:18PM +0200, Juergen Gross wrote: > On 29.10.21 11:57, Marek 
Marczykowski-G=C3=B3recki wrote: > > On Fri, Oct 29, 2021 at 06:48:44AM +0200, Juergen Gross wrote: > > > On 28.10.21 22:16, Marek Marczykowski-G=C3=B3recki wrote: > > > > On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote: > > > > > When running as PVH or HVM guest with actual memory < max memory = the > > > > > hypervisor is using "populate on demand" in order to allow the gu= est > > > > > to balloon down from its maximum memory size. For this to work > > > > > correctly the guest must not touch more memory pages than its tar= get > > > > > memory size as otherwise the PoD cache will be exhausted and the = guest > > > > > is crashed as a result of that. > > > > >=20 > > > > > In extreme cases ballooning down might not be finished today befo= re > > > > > the init process is started, which can consume lots of memory. > > > > >=20 > > > > > In order to avoid random boot crashes in such cases, add a late i= nit > > > > > call to wait for ballooning down having finished for PVH/HVM gues= ts. > > > > >=20 > > > > > Cc: > > > > > Reported-by: Marek Marczykowski-G=C3=B3recki > > > > > Signed-off-by: Juergen Gross > > > >=20 > > > > It may happen that initial balloon down fails (state=3D=3DBP_ECANCE= LED). In > > > > that case, it waits indefinitely. I think it should rather report a > > > > failure (and panic? it's similar to OOM before PID 1 starts, so rat= her > > > > hard to recover), instead of hanging. > > >=20 > > > Okay, I can add something like that. I'm thinking of issuing a failure > > > message in case of credit not having changed for 1 minute and panic() > > > after two more minutes. Is this fine? > >=20 > > Isn't it better to get a state from balloon_thread()? If the balloon > > fails it won't really try anymore (until 3600s timeout), so waiting in > > that state doesn't help. And reporting the failure earlier may be more > > user friendly. Or maybe there is something that could wakeup the thread > > earlier, that I don't see? 
Hot plugging more RAM is rather unlikely at > > this stage... >=20 > Waking up the thread would be easy, but probably that wouldn't really > help. Waking it up alone no. I was thinking what could wake it up - if nothing, then definitely waiting wouldn't help. You explained that just below: > The idea was that maybe a Xen admin would see the guest not booting up > further and then adding some more memory to the guest (this should wake > up the balloon thread again). >=20 > I agree that stopping to wait for ballooning to finish in case of it > having failed is probably a sensible thing to do. Additionally I could > add a boot parameter to control the timeout after the fail message and > the panic(). Right, that would make sense: it's basically a time admin has to plug in more memory to the VM. > What do you think? --=20 Best Regards, Marek Marczykowski-G=C3=B3recki Invisible Things Lab --MZykTqTxNZ8yEQ39 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAmF7zckACgkQ24/THMrX 1yzVxQf8DUP3H4Z8NAkaoDVpZJ/oOnqtiuVAYq6eLO+LusUGWgfcjgF6QoxCvMCA yuUmLc5NaZf9LNeqxNm6v+TOeS4g1ZYgAGtU1r2WeujWFD8uXD/3v1pWyWhLTFEs BGKjjDJDWBGVYSXh64P7UKxG9fZQH+uiEy9Agfv3Gy0PhvtH8WqpD/kTi5lN/z0G VQ5jDWcoZgluFddglJx35OXfEEjo8qaojvbl86cYBTFuiRfrVTEo411oh8yl4cHh sWC+L0F5cngKl6M53q+Rpo3Q5Ohabd/INmhZEGOgQ7P8VXlxhlCkVUA5bWjR1rt8 KdylFCdtRVHctkF+FWP82EqQEHjmtw== =7a/r -----END PGP SIGNATURE----- --MZykTqTxNZ8yEQ39--
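
For illustration only, the behaviour converged on in this thread (report the
failure as soon as the balloon thread gives up, then allow a configurable
grace period before panic()) could be modelled roughly as below. This is a
userspace sketch, not the patch: only BP_ECANCELED is a name taken from the
discussion. The other enum values, struct balloon_wait, balloon_wait_tick(),
the panic_timeout_s field standing in for the proposed boot parameter, and
the remaining_pages argument are all hypothetical, and the function is
assumed to be called once per second by the waiting init call.

```c
#include <assert.h>
#include <stdbool.h>

/* Balloon states; only BP_ECANCELED appears in the thread, the rest are
 * placeholders for "still working" and "finished". */
enum bp_state { BP_DONE, BP_WAIT, BP_ECANCELED };

/* What the waiting init call should do after each one-second tick. */
enum wait_action { WAIT_CONTINUE, WAIT_FINISHED, WAIT_WARN, WAIT_PANIC };

/* Hypothetical bookkeeping for the wait loop. */
struct balloon_wait {
	unsigned int panic_timeout_s; /* assumed boot parameter: grace period */
	unsigned int failed_for_s;    /* seconds elapsed since the warning */
	bool warned;                  /* failure message already issued? */
};

/*
 * Decide the next step. remaining_pages is an assumed counter of pages
 * still to be ballooned out; zero or less means ballooning is complete.
 */
enum wait_action balloon_wait_tick(struct balloon_wait *w,
				   enum bp_state state, long remaining_pages)
{
	assert(w);

	/* Target reached, e.g. because the admin added memory: stop waiting. */
	if (remaining_pages <= 0)
		return WAIT_FINISHED;

	/* Ballooning is still making attempts: reset any failure tracking. */
	if (state != BP_ECANCELED) {
		w->failed_for_s = 0;
		w->warned = false;
		return WAIT_CONTINUE;
	}

	/* First tick in the failed state: report it right away. */
	if (!w->warned) {
		w->warned = true;
		w->failed_for_s = 0;
		return WAIT_WARN;
	}

	/* Still failed: count down the grace period, then give up. */
	w->failed_for_s++;
	if (w->failed_for_s >= w->panic_timeout_s)
		return WAIT_PANIC;

	return WAIT_CONTINUE;
}
```

In this model a caller would log a message on WAIT_WARN, call panic() on
WAIT_PANIC, and return from the init call on WAIT_FINISHED. Keeping the
grace period separate from failure detection reflects the point made above:
once the balloon thread has failed, waiting only helps if someone adds
memory to the VM in the meantime.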