From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Christian Borntraeger,
    Janosch Frank, Claudio Imbrenda, Heiko Carstens, Vasily Gorbik,
    Andrew Morton, Peter Xu, Alexander Gordeev, Sven Schnelle,
    Gerald Schaefer, Andrea Arcangeli, kvm@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [PATCH v3 0/2] s390/mm: shared zeropage + KVM fixes
Date: Thu, 11 Apr 2024 18:14:39 +0200
Message-ID: <20240411161441.910170-1-david@redhat.com>

This series fixes one issue with uffd + shared zeropages on s390x and
makes sure that "ordinary" KVM guests can make use of shared zeropages
again.

userfaultfd could currently end up mapping shared zeropages into
processes that forbid shared zeropages. This only applies to s390x,
where it is relevant for handling PV guests and guests that use storage
keys correctly. Fix it by placing a zeroed folio instead of the shared
zeropage during UFFDIO_ZEROPAGE.

I stumbled over this issue while looking into a customer scenario that
is using:

(1) Memory ballooning for dynamic resizing. Start a VM with, say,
    100 GiB and inflate the balloon during boot to 60 GiB. The VM has
    ~40 GiB available and additional memory can be "fake hotplugged" to
    the VM later on demand by deflating the balloon. Actual memory
    overcommit is not desired, so physical memory would only be moved
    between VMs.

(2) Live migration of VMs between sites to evacuate servers in case of
    emergency.

Without the shared zeropage, during (2), the VM would suddenly consume
100 GiB on the migration source and destination. On the migration
source, where we don't expect memory overcommit, we could easily end up
crashing the VM during migration.

Independent of that, memory handed back to the hypervisor using "free
page reporting" would end up consuming actual memory after the
migration on the destination, not getting freed up until reused+freed
again.

While there might be ways to optimize parts of this in QEMU, we really
should just support the shared zeropage again for ordinary VMs.

We only expect legacy guests to make use of storage keys, so let's only
forbid and unshare zeropages once storage keys or PV get enabled.

To not break userfaultfd like we did in the past, don't zap the shared
zeropages, but instead trigger unsharing faults, just like we do for
unsharing KSM pages in break_ksm(); a rough sketch follows below.
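The following is only a minimal, illustrative sketch of what triggering
such an unsharing fault for a single address could look like;
break_zeropage() is a hypothetical helper name, while handle_mm_fault(),
FAULT_FLAG_UNSHARE and FAULT_FLAG_REMOTE are the existing interfaces the
series builds on (the actual gmap walk in patch #2 differs in detail):

#include <linux/mm.h>

/*
 * Hypothetical helper, for illustration only: force an unsharing fault
 * on one address so that a mapped shared zeropage gets replaced by an
 * exclusive, zeroed anonymous folio, similar in spirit to break_ksm().
 * The caller must hold the mmap lock.
 */
static int break_zeropage(struct vm_area_struct *vma, unsigned long addr)
{
	vm_fault_t ret;

	/*
	 * FAULT_FLAG_UNSHARE breaks the sharing without mapping the new
	 * folio writable; it is a NOP for anything already exclusive.
	 */
	ret = handle_mm_fault(vma, addr,
			      FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE, NULL);

	return (ret & VM_FAULT_ERROR) ? -EFAULT : 0;
}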
Unsharing faults will simply replace the shared zeropage with a zeroed
anonymous folio. We can already trigger the same fault path using GUP
when trying to long-term pin a shared zeropage, but also when unmerging
a KSM-placed zeropage, so this is nothing new.

Patch #1 was tested on x86-64 by forcing mm_forbids_zeropage() to be 1
and running the uffd selftests.

Patch #2 was tested on s390x: the live migration scenario now works as
expected, and kvm-unit-tests that trigger usage of skeys work well,
whereby I can see detection and unsharing of shared zeropages.

Further (this was broken in v2), I tested that the shared zeropage is
no longer populated after skeys are used -- that is, that
mm_forbids_zeropage() works as expected:

  ./s390x-run s390x/skey.elf \
    -no-shutdown \
    -chardev socket,id=monitor,path=/var/tmp/mon,server,nowait \
    -mon chardev=monitor,mode=readline

Then, in another shell:

  # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
  Rss:               31484 kB
  # echo "dump-guest-memory tmp" | sudo nc -U /var/tmp/mon
  ...
  # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
  Rss:              160452 kB

  -> Reading guest memory does not populate the shared zeropage

Doing the same with selftest.elf (no skeys):

  # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
  Rss:               30900 kB
  # echo "dump-guest-memory tmp" | sudo nc -U /var/tmp/mon
  ...
  # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
  Rss:               30924 kB

  -> Reading guest memory does populate the shared zeropage

Based on s390/features. Andrew agreed that both patches can go via the
s390x tree.

v2 -> v3:
* "mm/userfaultfd: don't place zeropages when zeropages are disallowed"
  -> Fix wrong mm_forbids_zeropage check
* "s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests"
  -> Fix wrong mm_forbids_zeropage define

v1 -> v2:
* "mm/userfaultfd: don't place zeropages when zeropages are disallowed"
  -> Minor "ret" handling tweaks
* "s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests"
  -> Added Fixes: tag

Cc: Christian Borntraeger
Cc: Janosch Frank
Cc: Claudio Imbrenda
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Andrew Morton
Cc: Peter Xu
Cc: Alexander Gordeev
Cc: Sven Schnelle
Cc: Gerald Schaefer
Cc: Andrea Arcangeli
Cc: kvm@vger.kernel.org
Cc: linux-s390@vger.kernel.org

David Hildenbrand (2):
  mm/userfaultfd: don't place zeropages when zeropages are disallowed
  s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests

 arch/s390/include/asm/gmap.h        |   2 +-
 arch/s390/include/asm/mmu.h         |   5 +
 arch/s390/include/asm/mmu_context.h |   1 +
 arch/s390/include/asm/pgtable.h     |  16 ++-
 arch/s390/kvm/kvm-s390.c            |   4 +-
 arch/s390/mm/gmap.c                 | 163 +++++++++++++++++++++-------
 mm/userfaultfd.c                    |  34 ++++++
 7 files changed, 178 insertions(+), 47 deletions(-)

-- 
2.44.0