From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Paolo Bonzini, Andrew Morton, David Hildenbrand,
    "Dr . David Alan Gilbert", Andrea Arcangeli,
    Linux MM Mailing List, Sean Christopherson
Subject: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE
Date: Wed, 22 Jun 2022 17:36:53 -0400
Message-Id: <20220622213656.81546-2-peterx@redhat.com>
In-Reply-To: <20220622213656.81546-1-peterx@redhat.com>
References: <20220622213656.81546-1-peterx@redhat.com>

We have had FAULT_FLAG_INTERRUPTIBLE, but it was never applied to GUP.
One issue is that not all GUP paths are able to handle the delivery of
signals other than SIGKILL.

That's not ideal for the GUP users who are actually able to handle these
cases, like KVM.

KVM uses GUP extensively when faulting in guest pages, and it already
has infrastructure to retry a page fault at a later time.  Allowing GUP
to be interrupted by generic signals can make KVM-related threads more
responsive.  For example:

  (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
      e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
      generated to kick the vcpus out of kernel context immediately,

  (2) SIGINT: which can be used by interactive hypervisor users to stop
      a virtual machine with Ctrl-C without any delays/hangs,

  (3) SIGTRAP: which grants GDB capability even during page faults that
      are stuck for a long time.

Normally the hypervisor is able to receive these signals properly, but
not if we're stuck in a GUP for a long time for whatever reason.  That
happens easily with a stuck postcopy migration, e.g. when a temporary
network failure happens: some vcpu threads can then hang forever waiting
for the pages.  With the new FOLL_INTERRUPTIBLE, we can allow GUP users
like KVM to selectively enable the ability to trap these signals.

Signed-off-by: Peter Xu
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 33 +++++++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..ebdf8a6b86c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_INTERRUPTIBLE  0x100000 /* allow interrupts from generic signals */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..ad74b137d363 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (locked)
+	if (locked) {
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+		/*
+		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
+		 * (at least) killable.  It also mostly means we're not
+		 * with NOWAIT.  Otherwise ignore FOLL_INTERRUPTIBLE since
+		 * it won't make a lot of sense to be used alone.
+		 */
+		if (*flags & FOLL_INTERRUPTIBLE)
+			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
+	}
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 	if (*flags & FOLL_TRIED) {
@@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
 
+/*
+ * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
+ * specified, it'll also respond to generic signals.  The caller of GUP
+ * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
+ */
+static bool gup_signal_pending(unsigned int flags)
+{
+	if (fatal_signal_pending(current))
+		return true;
+
+	if (!(flags & FOLL_INTERRUPTIBLE))
+		return false;
+
+	return signal_pending(current);
+}
+
 /*
  * Please note that this function, unlike __get_user_pages will not
  * return 0 for nr_pages > 0 without FOLL_NOWAIT
@@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
 		 * Repeat on the address that fired VM_FAULT_RETRY
 		 * with both FAULT_FLAG_ALLOW_RETRY and
 		 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
-		 * by fatal signals, so we need to check it before we
+		 * by fatal signals or even common signals, depending on
+		 * the caller's request.  So we need to check it before we
 		 * start trying again otherwise it can loop forever.
 		 */
-
-		if (fatal_signal_pending(current)) {
+		if (gup_signal_pending(flags)) {
 			if (!pages_done)
 				pages_done = -EINTR;
 			break;
-- 
2.32.0
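
For readers who want to check the signal policy without building a kernel, the following is a minimal user-space sketch of the decision made by gup_signal_pending() and of the value returned when the retry loop bails out. struct task_model, gup_signal_pending_model() and gup_interrupted_result() are hypothetical names introduced here for illustration only; the FOLL_INTERRUPTIBLE value and the control flow are taken from the patch, and everything else is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag value taken from the patch. */
#define FOLL_INTERRUPTIBLE 0x100000u

/* Hypothetical stand-in for the pending-signal state of `current`. */
struct task_model {
	bool fatal_pending;	/* models fatal_signal_pending(current) */
	bool any_pending;	/* models signal_pending(current) */
};

/* Same control flow as gup_signal_pending(), with the task passed in. */
bool gup_signal_pending_model(const struct task_model *t, unsigned int flags)
{
	/* Fatal signals always interrupt GUP. */
	if (t->fatal_pending)
		return true;

	/* Without the opt-in flag, non-fatal signals are ignored. */
	if (!(flags & FOLL_INTERRUPTIBLE))
		return false;

	return t->any_pending;
}

/*
 * Models the bail-out in the retry loop: pages already handled are
 * reported as-is, and -EINTR is returned only if nothing was done yet.
 */
long gup_interrupted_result(long pages_done)
{
	assert(pages_done >= 0);
	return pages_done ? pages_done : -EINTR;
}
```

In this model, a pending SIGUSR1 (any_pending set, fatal_pending clear) interrupts the walk only when the caller passed FOLL_INTERRUPTIBLE, while a fatal signal interrupts it regardless of the flag.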