Date: Mon, 14 Nov 2022 12:44:47 +0100
From: Michal Hocko
To: Zhongkun He
Cc: Andrew Morton, corbet@lwn.net, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: Re: [External] Re: [PATCH v2] mm: add new syscall pidfd_set_mempolicy().
References: <20221111084051.2121029-1-hezhongkun.hzk@bytedance.com>
    <20221111112732.30e1696bcd0d5b711c188a9a@linux-foundation.org>

On Mon 14-11-22 00:41:21, Zhongkun He wrote:
> Hi Andrew, thanks for your reply.
>
> > This sounds a bit suspicious. Please share much more detail about
> > these races. If we proceed with this design then mpol_put_async()
> > should have comments which fully describe the need for the async free.
> >
> > How do we *know* that these races are fully prevented with this
> > approach? How do we know that mpol_put_async() won't free the data
> > until the race window has fully passed?
>
> A mempolicy can be either associated with a process or with a VMA.
> All vma manipulation is somewhat protected by a down_read on
> mmap_lock. In process context there is no locking because only
> the process accesses its own state before.

We shouldn't really rely on mmap_sem for this IMO. There is alloc_lock
(aka task lock) that makes sure the policy is stable so that the caller
can atomically take a reference and hold on to the policy. And we do not
do that consistently and this should be fixed. E.g. just looking at some
random places, allowed_mems_nr (relying on get_task_policy) is completely
lockless, and some paths (like fadvise) do not use any of the explicit
(alloc_lock) or implicit (mmap_lock) locking. That means that the
task_work based approach cannot really work in this case, right? Playing
more tricks will not really help long term.

So while your patch tries to work around the current state of the art, I
do not think we really want that. As stated previously, I would much
rather see proper reference counting instead. I thought we had agreed
this would be the first approach unless the resulting performance is
really bad. Have you concluded that to be the case? I do not see any
numbers or notes in the changelog.
-- 
Michal Hocko
SUSE Labs
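For illustration, the pattern argued for above (take the task lock, grab a
reference to the policy, drop the lock, and free only when the last
reference goes away) could look roughly like the userspace sketch below.
This is not kernel code and not the proposed patch: struct policy, struct
task, policy_new(), policy_put(), task_policy_get() and task_policy_swap()
are all hypothetical names, and a C11 atomic_flag spinlock merely stands in
for alloc_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct policy {
	atomic_int refcnt;	/* number of holders, including the task itself */
	int mode;
};

struct task {
	atomic_flag alloc_lock;	/* plays the role of alloc_lock (task lock) */
	struct policy *policy;	/* may be replaced by another thread */
};

static void task_lock_sketch(struct task *t)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&t->alloc_lock,
						 memory_order_acquire))
		;
}

static void task_unlock_sketch(struct task *t)
{
	atomic_flag_clear_explicit(&t->alloc_lock, memory_order_release);
}

static void task_init(struct task *t, struct policy *p)
{
	atomic_flag_clear(&t->alloc_lock);
	t->policy = p;
}

static struct policy *policy_new(int mode)
{
	struct policy *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->refcnt, 1);	/* the creator holds the first reference */
	p->mode = mode;
	return p;
}

/* Drop one reference; free on the last one. Returns 1 if the object was freed. */
static int policy_put(struct policy *p)
{
	if (!p)
		return 0;
	if (atomic_fetch_sub(&p->refcnt, 1) == 1) {
		free(p);
		return 1;
	}
	return 0;
}

/*
 * Take a reference while the lock keeps t->policy stable. After the
 * unlock the caller may use the returned policy even if the task has
 * since switched to a different one.
 */
static struct policy *task_policy_get(struct task *t)
{
	struct policy *p;

	assert(t != NULL);
	task_lock_sketch(t);
	p = t->policy;
	if (p)
		atomic_fetch_add(&p->refcnt, 1);
	task_unlock_sketch(t);
	return p;
}

/* Install a new policy; the caller drops the task's old reference. */
static struct policy *task_policy_swap(struct task *t, struct policy *new)
{
	struct policy *old;

	task_lock_sketch(t);
	old = t->policy;
	t->policy = new;
	task_unlock_sketch(t);
	return old;
}
```

A reader that goes through task_policy_get() never sees the object freed
underneath it: a concurrent task_policy_swap() followed by policy_put()
only drops the task's own reference, and the memory is released when the
reader calls policy_put() on its copy. A lockless reader that loads
t->policy without taking a reference has no such guarantee, which is the
inconsistency the message points out.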