From: Axel Rasmussen
Date: Thu, 11 May 2023 13:40:16 -0700
Subject: Re: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl
To: Mike Kravetz
Cc: Alexander Viro, Andrew Morton, Christian Brauner, David Hildenbrand, Hongchen Zhang, Huang Ying, James Houghton, "Liam R. Howlett", Miaohe Lin, "Mike Rapoport (IBM)", Nadav Amit, Naoya Horiguchi, Peter Xu, Shuah Khan, ZhangPeng, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org

On Thu, May 11, 2023 at 1:29 PM Mike Kravetz wrote:
>
> On 05/11/23 11:24, Axel Rasmussen wrote:
> > The basic idea here is to "simulate" memory poisoning for VMs. A VM
> > running on some host might encounter a memory error, after which some
> > page(s) are poisoned (i.e., future accesses SIGBUS). They expect that
> > once poisoned, pages can never become "un-poisoned". So, when we live
> > migrate the VM, we need to preserve the poisoned status of these pages.
> >
> > When live migrating, we try to get the guest running on its new host as
> > quickly as possible. So, we start it running before all memory has been
> > copied, and before we're certain which pages should be poisoned or not.
> >
> > So the basic way to use this new feature is:
> >
> > - On the new host, the guest's memory is registered with userfaultfd, in
> >   either MISSING or MINOR mode (doesn't really matter for this purpose).
> > - On any first access, we get a userfaultfd event. At this point we can
> >   communicate with the old host to find out if the page was poisoned.
>
> Just curious, what is this communication channel with the old host?

James can probably describe it in more detail / more correctly than I can.

My (possibly wrong :) ) understanding is: On the source machine we maintain a bitmap indicating which pages are clean, dirty (meaning modified after the initial "precopy" of memory to the target machine), or poisoned. Eventually the entire bitmap is sent to the target machine, but this takes some time (maybe seconds on large machines). After this point, though, we have all the information we need, so we no longer need to communicate with the source to find out the status of pages (although there may still be some memory contents left to copy over).

In the meantime, I think the target machine can also ask the source machine about the status of individual pages (for quick on-demand paging).

As for the underlying mechanism, it's an internal protocol, but the publicly available thing it's most similar to is probably gRPC [1]. At a really basic level, we send binary serialized protocol buffers [2] over the network in a request/response fashion.

[1] https://grpc.io/
[2] https://protobuf.dev/

> --
> Mike Kravetz
>
> > - If so, we can respond with a UFFDIO_SIGBUS - this places a swap marker
> >   so any future accesses will SIGBUS. Because the pte is now "present",
> >   future accesses won't generate more userfaultfd events, they'll just
> >   SIGBUS directly.
> >
> > UFFDIO_SIGBUS does not handle unmapping previously-present PTEs. This
> > isn't needed, because during live migration we want to intercept
> > all accesses with userfaultfd (not just writes, so WP mode isn't useful
> > for this). So whether minor or missing mode is being used (or both), the
> > PTE won't be present in any case, so handling that case isn't needed.
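The target-side flow described in this thread has two parts. Memory is registered with userfaultfd, and the source's per-page status decides how each first-access fault is answered. That decision step can be sketched in plain C.

This is a minimal illustration under stated assumptions. It is not the patch or the actual migration code. It assumes the full status map from the source has already arrived, and that the three states (clean, dirty, poisoned) are packed two bits per page. All identifiers here (struct page_state_map, psm_init, psm_set, psm_get, resolve_fault, and the enums) are hypothetical.

No ioctls are issued:

- ACTION_SIGBUS only marks where a handler would issue the proposed UFFDIO_SIGBUS.
- The other two actions stand for resolving the fault with an existing ioctl, such as UFFDIO_CONTINUE (MINOR mode) or UFFDIO_COPY (MISSING mode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-page states, as reported by the source host. */
enum page_state {
    PAGE_CLEAN = 0,    /* unchanged since the initial precopy */
    PAGE_DIRTY = 1,    /* modified after precopy; contents must be re-sent */
    PAGE_POISONED = 2, /* hit a memory error; must stay poisoned forever */
};

/* Hypothetical outcome for a userfaultfd fault on the target host. */
enum fault_action {
    ACTION_MAP_PRECOPIED, /* precopied contents are valid (e.g. UFFDIO_CONTINUE) */
    ACTION_FETCH_PAGE,    /* fetch fresh contents from the source (e.g. UFFDIO_COPY) */
    ACTION_SIGBUS,        /* where the proposed UFFDIO_SIGBUS would be issued */
};

/* Assumed encoding: two bits per page, four pages per byte. */
struct page_state_map {
    size_t npages;
    uint8_t *bits;
};

/* Allocate a map with every page initially CLEAN; returns 0 on success. */
static int psm_init(struct page_state_map *m, size_t npages)
{
    m->npages = npages;
    m->bits = calloc((npages + 3) / 4, 1);
    return m->bits ? 0 : -1;
}

static void psm_free(struct page_state_map *m)
{
    free(m->bits);
    m->bits = NULL;
    m->npages = 0;
}

/* Overwrite the two-bit state of one page, leaving its neighbours intact. */
static void psm_set(struct page_state_map *m, size_t page, enum page_state s)
{
    assert(page < m->npages);
    unsigned shift = (unsigned)(page % 4) * 2;
    uint8_t *byte = &m->bits[page / 4];
    *byte = (uint8_t)((*byte & ~(0x3u << shift)) | ((unsigned)s << shift));
}

/* Read back the two-bit state of one page. */
static enum page_state psm_get(const struct page_state_map *m, size_t page)
{
    assert(page < m->npages);
    unsigned shift = (unsigned)(page % 4) * 2;
    return (enum page_state)((m->bits[page / 4] >> shift) & 0x3u);
}

/* Decide how the fault handler should resolve a first access to 'page'. */
static enum fault_action resolve_fault(const struct page_state_map *m, size_t page)
{
    switch (psm_get(m, page)) {
    case PAGE_POISONED:
        return ACTION_SIGBUS;
    case PAGE_DIRTY:
        return ACTION_FETCH_PAGE;
    case PAGE_CLEAN:
    default:
        return ACTION_MAP_PRECOPIED;
    }
}
```

Answering poisoned pages with a SIGBUS marker, rather than with page contents, means later accesses SIGBUS directly, as the quoted description explains. If a fault arrives before the full map does, a real handler would instead need the per-page query to the source that the reply mentions.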