From: Jann Horn
Date: Mon, 18 Sep 2023 19:12:56 +0200
Subject: KVM nonblocking MMU notifier with KVM_GUEST_USES_PFN looks racy [but is currently unused]
To: Paolo Bonzini, David Woodhouse
Cc: kernel list, KVM list, Linux-MM, Michal Hocko, Sean Christopherson
X-Mailing-List: linux-kernel@vger.kernel.org

Hi!

I haven't tested this and might be missing something, but I think that
the MMU notifier for KVM_GUEST_USES_PFN pfncache is currently a bit
broken. Except that nothing seems to actually use KVM_GUEST_USES_PFN, so
currently it's not actually a problem?

gfn_to_pfn_cache_invalidate_start() contains the following:

	/*
	 * If the OOM reaper is active, then all vCPUs should have
	 * been stopped already, so perform the request without
	 * KVM_REQUEST_WAIT and be sad if any needed to be IPI'd.
	 */
	if (!may_block)
		req &= ~KVM_REQUEST_WAIT;

	called = kvm_make_vcpus_request_mask(kvm, req, vcpu_bitmap);

	WARN_ON_ONCE(called && !may_block);

The comment explains that we rely on OOM reaping only happening when a
process is sufficiently far into being stopped that it is no longer
executing vCPUs, but from what I can tell, that's not what the caller
actually guarantees. Especially on the path from the process_mrelease()
syscall (if we're dealing with a process whose mm is not shared with
other processes), we only check that the target process has
SIGNAL_GROUP_EXIT set. From what I can tell, that does imply that
delivery of a fatal signal has begun, but doesn't even imply that the
CPU running the target process has been IPI'd, let alone that the
target process has died or anything like that.

But I also don't see any reason why gfn_to_pfn_cache_invalidate_start()
actually has to do anything special for non-blocking invalidation - from
what I can tell, nothing in there can block, basically everything runs
with preemption disabled. The first half of the function holds a
spinlock; the second half is basically a call to
kvm_make_vcpus_request_mask(), which disables preemption across the
whole function with get_cpu()/put_cpu(). A synchronous IPI spins until
the IPI has been acked but that doesn't count as sleeping.
(And the rest of the OOM reaping code will do stuff like synchronous
IPIs for its TLB flushes, too.)

So maybe you/I can just rip out the special-casing of nonblocking mode
from gfn_to_pfn_cache_invalidate_start() to fix this?

Relevant call paths for the theoretical race:

sys_kill
  prepare_kill_siginfo
  kill_something_info
    kill_proc_info
      rcu_read_lock
      kill_pid_info
        rcu_read_lock
        group_send_sig_info [PIDTYPE_TGID]
          do_send_sig_info
            lock_task_sighand [task->sighand->siglock]
            send_signal_locked
              __send_signal_locked
                prepare_signal
                legacy_queue
                signalfd_notify
                sigaddset(&pending->signal, sig)
                complete_signal
                  signal->flags = SIGNAL_GROUP_EXIT [mrelease will work starting here]
                  for each thread:
                    sigaddset(&t->pending.signal, SIGKILL)
                    signal_wake_up [IPI happens here]
            unlock_task_sighand [task->sighand->siglock]
        rcu_read_unlock
      rcu_read_unlock

sys_process_mrelease
  find_lock_task_mm
    spin_lock(&p->alloc_lock)
  task_will_free_mem
    SIGNAL_GROUP_EXIT suffices
    PF_EXITING suffices if singlethreaded?
  task_unlock
  mmap_read_lock_killable
  __oom_reap_task_mm
    for each private non-PFNMAP/MIXED VMA:
      tlb_gather_mmu
      mmu_notifier_invalidate_range_start_nonblock
        __mmu_notifier_invalidate_range_start
          mn_hlist_invalidate_range_start
            kvm_mmu_notifier_invalidate_range_start [as ops->invalidate_range_start]
              gfn_to_pfn_cache_invalidate_start
                [loop over gfn_to_pfn_cache instances]
                  if overlap and KVM_GUEST_USES_PFN [UNUSED]: evict_vcpus=true
                [if evict_vcpus]
                  kvm_make_vcpus_request_mask
              __kvm_handle_hva_range
      unmap_page_range
      mmu_notifier_invalidate_range_end
      tlb_finish_mmu
  mmap_read_unlock
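
For illustration, the userspace side of the two call paths above could
be driven with something like the sketch below. This is untested against
KVM and does not create a VM or a pfncache, so it cannot trigger the
WARN by itself; it only shows the ordering "kill(SIGKILL), then
process_mrelease() right away". The helper name kill_then_mrelease() is
made up, the spinning child is merely a stand-in for a thread that would
be in KVM_RUN, and the fallback syscall numbers assume the generic
syscall table (x86-64, arm64 and most others).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: generic syscall numbers, only used if libc lacks them. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/*
 * Hypothetical helper: SIGKILL a busy child and immediately ask the
 * kernel to reap its memory, without waiting for the child to exit.
 * Returns 0 if process_mrelease() succeeded, a negative errno otherwise.
 * The child's wait status is stored in *wstatus.
 */
int kill_then_mrelease(int *wstatus)
{
	assert(wstatus != NULL);

	pid_t child = fork();
	if (child < 0)
		return -errno;
	if (child == 0) {
		for (;;)
			;	/* stay on a CPU, like a running vCPU thread would */
	}

	int ret = 0;
	int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
	if (pidfd < 0)
		ret = -errno;

	/* sys_kill path: SIGNAL_GROUP_EXIT is set in complete_signal(). */
	kill(child, SIGKILL);

	/* sys_process_mrelease path: may run before the child has stopped. */
	if (pidfd >= 0) {
		if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
			ret = -errno;
		close(pidfd);
	}

	/* Always reap the child so the caller does not leak a zombie. */
	if (waitpid(child, wstatus, 0) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

The return value is expected to vary from run to run: 0 if the reap
happened while the child still had its mm, -ESRCH if the child had
already dropped its mm by the time process_mrelease() looked, and
-ENOSYS on kernels older than 5.15, which lack the syscall.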