Received: by 2002:ab2:6857:0:b0:1ef:ffd0:ce49 with SMTP id l23csp1024608lqp; Fri, 22 Mar 2024 03:23:01 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWAR41pCVoFwft4WdAwl+9thksHRgVrX2FuZz7nDKaXjghBuRV9CDVXPbUV3jwsA8+HkCZeNhoEhmHFQrxRF6w4AV8joIGPq1ECaToiCA== X-Google-Smtp-Source: AGHT+IGU0VxIaOkV9JNnfDtuoTyeY7FnbVze7S5SK0BIGKfofkJPiAbDxWlNRFrDO3Bj2/6oXuFN X-Received: by 2002:a19:6b17:0:b0:513:d5ec:afb with SMTP id d23-20020a196b17000000b00513d5ec0afbmr1583087lfa.40.1711102981162; Fri, 22 Mar 2024 03:23:01 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711102981; cv=pass; d=google.com; s=arc-20160816; b=wrsBJ77d85omHEUg5pMTrJrNh9GPwbKhRSSLiYtC4vZGw7N+oPzwcDVhzbgZNUqXI6 fqQ9UWkt+MWepSQFGJRrb65b7SFuBBSRQG0y2t9W3gPoc4IPWP29GKLKazbV8ok1ZYv/ 8gay0TzX4wAZFOMvoFw1OAVyX5MGOcDuegH/LvkPPCPfrzQYYMF2NW2e2DX1DiBlItyw Xgnq1ywNtvG+Ap8rClC9+tFPcmFoSfkozBL6lR8JlvBa3G9hr5lJazjPDavrk7FlycPX 81X9+cNsFOpeuwX1QdqSSEqdBXofkaGuMc+yyxbVijVlU4zyBs+pBuReq1Ub0SEuD1+W 9l+A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id:dkim-signature; bh=t4EtCupj1AZsWtQ2aOS+eqQ4iFxWWbTe9gpwh+nYbgo=; fh=LmNTj5PTnRDzp8NbjzEIiNKSegmT1zUozEReYRF4qk4=; b=KMNw6wLDJqBbIbuWv8TXGKV8Jz+A3PhgqEYDLdiATwmfqC2CT3+gv6Rvc4Z4wCsSbr 09iq9EYaFDkd0i4FiZTTEVKJj/K4nsT2gIXiQOS3bj9H//Kc1yk/kXtRHQ2pDMrnviTQ FT9j477ocvm3ZrU7Sm4a7UQZghcdGp/kRo5D2WyGgcX5GFqGhOGEO1TAdl2ZlEzJIQZ2 4EbGmBhkPjtt1cNjSB0j52bXztFnFhe8QZoiD36QjRfsg/9buDKNpVfVBsxbBtmx/+8a xCwa9BaqqxUpycPneDmnJAOUPbbxMDC9LZTFPv3uHVNsTLBQIPx7WXev6Mi4JcF+0pd6 jipw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=OCumWId4; arc=pass (i=1 spf=pass spfdomain=linux.ibm.com dkim=pass dkdomain=ibm.com dmarc=pass fromdomain=linux.ibm.com); spf=pass (google.com: domain of linux-kernel+bounces-111286-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-111286-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id g25-20020a1709067c5900b00a46fdbe8518si784742ejp.744.2024.03.22.03.23.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Mar 2024 03:23:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-111286-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=OCumWId4; arc=pass (i=1 spf=pass spfdomain=linux.ibm.com dkim=pass dkdomain=ibm.com dmarc=pass fromdomain=linux.ibm.com); spf=pass (google.com: domain of linux-kernel+bounces-111286-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-111286-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 7A96B1F23952 for ; Fri, 22 Mar 2024 10:22:52 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0F90B3B2BF; Fri, 22 Mar 2024 10:22:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="OCumWId4" Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6CDF39AC5; Fri, 22 Mar 2024 10:22:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.156.1 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711102950; cv=none; b=qYtJ3u2Rj386j7gmsPnHdzFI+Aoa5w+SyhY2lufc+/L+1tQiuQJwDHKdGwkZVMPHgOVz1kbuiU6m/um8v1IXi++A4wZpQUx81YdbjS+4pKF8O/zbpdqXqE3qE4x9p6UMGsPk8LOTL8R1zBv8IQ5DxACkWn5c0hJycFbKQnaYBiM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711102950; c=relaxed/simple; bh=vMwYb4avxUZsb6Vz24fjsEAOtzod3kHSa/OcSBljTCc=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=WJG/2ful8/knSRCOb4wjRU3toYRCy50kp3jGStHAIHyZsjETcjnzngE9RZ8jGuXpGXy4xVDR3cr3011J+EeVOpW773Klpd3YWqYD2BgZDf+T4YzEFAtDlooJ3dk7IAhGZPp0UPoIO9KWqbnlFMOQ5p8ikGkD/HmLY2RzTcCnz9U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=OCumWId4; arc=none smtp.client-ip=148.163.156.1 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42M7D0oO029705; Fri, 22 Mar 2024 10:22:22 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=t4EtCupj1AZsWtQ2aOS+eqQ4iFxWWbTe9gpwh+nYbgo=; b=OCumWId4GftXCC+x8XAN+4LzM5nfQc2Vt/51ofVBaheILkrNgywAF5nxxOT3ugMMfwbA 1j5by1l1X8hD/dPcSokAsNhZQLcXz8nqpLC+Cu9zYl0eS1zXrKF4DX7eTP2hP4jZOhlx o1dJkNZqgSVbr4Ls/DqVHnLHaHi16bwnPxADpGBOxYuNj2PuyOuROQrTgkRAByE28Erw oTIUczGUPaIVhc+8YtXnQkXdcDeqTeLiLoc8RaopVRYflvAYekQkr+c+ijVVQ1lpoUAl njxfj2pDb7jDT7OCs/USg4OVicIPam4oA9xB1eNAGfqoXjxMcH3Oy0ne/uQchOaBQKIP AA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x15d30c1v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 22 Mar 2024 10:22:21 +0000 Received: from m0360083.ppops.net (m0360083.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42MAMLB0015406; Fri, 22 Mar 2024 10:22:21 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x15d30c1s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 22 Mar 2024 10:22:21 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42M8kq8w015722; Fri, 22 Mar 2024 10:22:20 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3x0x15k3nh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 22 Mar 2024 10:22:20 +0000 Received: from smtpav07.fra02v.mail.ibm.com (smtpav07.fra02v.mail.ibm.com [10.20.54.106]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42MAMEqP47382980 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 22 Mar 2024 10:22:16 GMT Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9EE7620067; Fri, 22 Mar 2024 10:22:14 +0000 (GMT) Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5DC812004F; Fri, 22 Mar 2024 10:22:14 +0000 (GMT) Received: from [9.152.224.222] (unknown [9.152.224.222]) by smtpav07.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 22 Mar 2024 10:22:14 +0000 (GMT) Message-ID: Date: Fri, 22 Mar 2024 11:22:13 +0100 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v1 2/2] s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests Content-Language: en-US To: David Hildenbrand , linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Andrew Morton , Peter Xu , Alexander Gordeev , Sven Schnelle , Gerald Schaefer , Andrea Arcangeli , kvm@vger.kernel.org, linux-s390@vger.kernel.org References: <20240321215954.177730-1-david@redhat.com> <20240321215954.177730-3-david@redhat.com> From: Christian Borntraeger In-Reply-To: <20240321215954.177730-3-david@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: ztFlzz80v3ERALXfSy3FTl3bPX6Bi31k X-Proofpoint-ORIG-GUID: 3N0heez5LomRZyHLGJD7rvjWaDPGsK2B X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-22_06,2024-03-21_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 phishscore=0 clxscore=1011 priorityscore=1501 impostorscore=0 bulkscore=0 adultscore=0 spamscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2403210000 definitions=main-2403220074 Am 21.03.24 um 22:59 schrieb David Hildenbrand: > commit fa41ba0d08de ("s390/mm: avoid empty zero pages for KVM guests to > avoid postcopy hangs") introduced an undesired side effect when combined > with memory ballooning and VM migration: memory part of the inflated > memory balloon will consume memory. > > Assuming we have a 100GiB VM and inflated the balloon to 40GiB. Our VM > will consume ~60GiB of memory. If we now trigger a VM migration, > hypervisors like QEMU will read all VM memory. As s390x does not support > the shared zeropage, we'll end up allocating for all previously-inflated > memory part of the memory balloon: 50 GiB. So we might easily > (unexpectedly) crash the VM on the migration source. > > Even worse, hypervisors like QEMU optimize for zeropage migration to not > consume memory on the migration destination: when migrating a > "page full of zeroes", on the migration destination they check whether the > target memory is already zero (by reading the destination memory) and avoid > writing to the memory to not allocate memory: however, s390x will also > allocate memory here, implying that also on the migration destination, we > will end up allocating all previously-inflated memory part of the memory > balloon. > > This is especially bad if actual memory overcommit was not desired, when > memory ballooning is used for dynamic VM memory resizing, setting aside > some memory during boot that can be added later on demand. Alternatives > like virtio-mem that would avoid this issue are not yet available on > s390x. > > There could be ways to optimize some cases in user space: before reading > memory in an anonymous private mapping on the migration source, check via > /proc/self/pagemap if anything is already populated. Similarly check on > the migration destination before reading. While that would avoid > populating tables full of shared zeropages on all architectures, it's > harder to get right and performant, and requires user space changes. > > Further, with posctopy live migration we must place a page, so there, > "avoid touching memory to avoid allocating memory" is not really > possible. (Note that a previously we would have falsely inserted > shared zeropages into processes using UFFDIO_ZEROPAGE where > mm_forbids_zeropage() would have actually forbidden it) > > PV is currently incompatible with memory ballooning, and in the common > case, KVM guests don't make use of storage keys. Instead of zapping > zeropages when enabling storage keys / PV, that turned out to be > problematic in the past, let's do exactly the same we do with KSM pages: > trigger unsharing faults to replace the shared zeropages by proper > anonymous folios. > > What about added latency when enabling storage kes? Having a lot of > zeropages in applicable environments (PV, legacy guests, unittests) is > unexpected. Further, KSM could today already unshare the zeropages > and unmerging KSM pages when enabling storage kets would unshare the > KSM-placed zeropages in the same way, resulting in the same latency. > > Signed-off-by: David Hildenbrand Nice work. Looks good to me and indeed it fixes the memory over-consumption that you mentioned. Reviewed-by: Christian Borntraeger Tested-by: Christian Borntraeger (can also be seen with virsh managedsave; virsh start) I guess its too invasive for stable, but I would say it is real fix.