From: Axel Rasmussen
Date: Thu, 11 May 2023 14:05:23 -0700
Subject: Re: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl
To: Mike Kravetz
Cc: Alexander Viro, Andrew Morton, Christian Brauner, David Hildenbrand,
    Hongchen Zhang, Huang Ying, James Houghton, "Liam R. Howlett",
    Miaohe Lin, "Mike Rapoport (IBM)", Nadav Amit, Naoya Horiguchi,
    Peter Xu, Shuah Khan, ZhangPeng, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org, Jiaqi Yan

On Thu, May 11, 2023 at 1:40 PM Axel Rasmussen wrote:
>
> On Thu, May 11, 2023 at 1:29 PM Mike Kravetz wrote:
> >
> > On 05/11/23 11:24, Axel Rasmussen wrote:

Apologies for the noise; I should have CC'ed +Jiaqi on this series too,
since he is working on other parts of the memory poisoning / recovery
stuff internally.

> > > The basic idea here is to "simulate" memory poisoning for VMs. A VM
> > > running on some host might encounter a memory error, after which some
> > > page(s) are poisoned (i.e., future accesses SIGBUS). Guests expect that
> > > once poisoned, pages can never become "un-poisoned". So, when we live
> > > migrate the VM, we need to preserve the poisoned status of these pages.
> > >
> > > When live migrating, we try to get the guest running on its new host as
> > > quickly as possible. So, we start it running before all memory has been
> > > copied, and before we're certain which pages should be poisoned or not.
> > >
> > > So the basic way to use this new feature is:
> > >
> > > - On the new host, the guest's memory is registered with userfaultfd, in
> > >   either MISSING or MINOR mode (doesn't really matter for this purpose).
> > > - On any first access, we get a userfaultfd event. At this point we can
> > >   communicate with the old host to find out if the page was poisoned.
> >
> > Just curious, what is this communication channel with the old host?
>
> James can probably describe it in more detail / more correctly than I
> can. My (possibly wrong :) ) understanding is:
>
> On the source machine we maintain a bitmap indicating which pages are
> clean, dirty (meaning, modified after the initial "precopy" of memory
> to the target machine), or poisoned. Eventually the entire bitmap is
> sent to the target machine, but this takes some time (maybe seconds on
> large machines). After this point, though, we have all the information
> we need; we no longer need to communicate with the source to find out
> the status of pages (although there may still be some memory contents
> to finish copying over).
>
> In the meantime, I think the target machine can also ask the source
> machine about the status of individual pages (for quick on-demand
> paging).
>
> As for the underlying mechanism, it's an internal protocol, but the
> publicly available thing it's most similar to is probably gRPC [1]. At
> a really basic level, we send binary serialized protocol buffers [2]
> over the network in a request / response fashion.
>
> [1] https://grpc.io/
> [2] https://protobuf.dev/
>
> > --
> > Mike Kravetz
> >
> > > - If so, we can respond with UFFDIO_SIGBUS, which places a swap marker
> > >   so any future accesses will SIGBUS. Because the PTE is now "present",
> > >   future accesses won't generate more userfaultfd events; they'll just
> > >   SIGBUS directly.
> > >
> > > UFFDIO_SIGBUS does not handle unmapping previously-present PTEs. This
> > > isn't needed, because during live migration we want to intercept all
> > > accesses with userfaultfd (not just writes, so WP mode isn't useful
> > > for this).
> > > So whether MINOR or MISSING mode is being used (or both), the PTE
> > > won't be present in any case, so handling that case isn't needed.
> > >
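To make the target-side flow described in this thread concrete, here is a toy Python model. It is not kernel or VMM code, and it issues no ioctls. It assumes a fault handler that consults the source's page-status bitmap once it has arrived, falls back to per-page queries before that, and installs a poison marker for poisoned pages. All names (PageState, TargetPageTable, query_source, receive_bitmap) are hypothetical; the comments only mark where a real handler would use the proposed UFFDIO_SIGBUS, or UFFDIO_COPY / UFFDIO_CONTINUE for ordinary pages.

```python
from enum import Enum


class PageState(Enum):
    """Hypothetical per-page status tracked by the source host."""
    CLEAN = 0
    DIRTY = 1
    POISONED = 2


class TargetPageTable:
    """Toy model of the target host's view of guest memory.

    A page is either absent (an access raises a userfaultfd event),
    populated with data, or holds a poison marker (an access SIGBUSes
    without generating another userfaultfd event).
    """

    def __init__(self, query_source):
        # query_source(page) -> PageState: hypothetical on-demand request
        # to the source host, used until the full status bitmap arrives.
        self.query_source = query_source
        self.status_bitmap = None  # page -> PageState, once received
        self.ptes = {}             # page -> "data" or "poison"
        self.uffd_events = 0

    def receive_bitmap(self, bitmap):
        # After this, no per-page round trips to the source are needed.
        self.status_bitmap = bitmap

    def _lookup_state(self, page):
        if self.status_bitmap is not None:
            return self.status_bitmap[page]
        return self.query_source(page)

    def access(self, page):
        pte = self.ptes.get(page)
        if pte == "poison":
            return "SIGBUS"  # marker installed: fails directly, no event
        if pte == "data":
            return "ok"
        # Page not present: userfaultfd delivers an event to the handler.
        self.uffd_events += 1
        if self._lookup_state(page) is PageState.POISONED:
            # Stand-in for resolving the fault with UFFDIO_SIGBUS.
            self.ptes[page] = "poison"
            return "SIGBUS"
        # Not poisoned: a real handler would ensure the contents are up to
        # date (re-fetching DIRTY pages), then resolve with UFFDIO_COPY
        # (MISSING mode) or UFFDIO_CONTINUE (MINOR mode). Here we only
        # mark the page as populated.
        self.ptes[page] = "data"
        return "ok"
```

For example, with source = {0: PageState.CLEAN, 1: PageState.POISONED} and mem = TargetPageTable(lambda p: source[p]), calling mem.access(1) twice returns "SIGBUS" both times, but mem.uffd_events is only 1. This mirrors the point above: once the marker is in place, later accesses SIGBUS without involving userfaultfd again.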