Message-ID: <93ae88a4-1dac-77bf-37f6-f8708a6d83b7@intel.com>
Date: Mon, 15 May 2023 14:36:54 -0700
Subject: Re: [GIT PULL] x86/shstk for 6.4
To: Linus Torvalds
Cc: "Edgecombe, Rick P", "dave.hansen@linux.intel.com", "keescook@chromium.org", "linux-kernel@vger.kernel.org", "x86@kernel.org", "akpm@linux-foundation.org", Peter Zijlstra
From: Dave Hansen

On 5/12/23 14:55, Linus Torvalds wrote:
>> There's always a race there because mm->mm_users can always get bumped
>> after the fork()er checks it.
>
> Ugh.
> I was assuming that if they don't already have a reference to the
> mm, they can't get one (because the '1' comes from 'current->mm', and
> nobody else has that mm).
>
> But maybe that's just not true. Looking around, we have things like
>
>         pages->source_mm = current->mm;
>         mmgrab(pages->source_mm);
>
> that creates a ref to the mm with a grab, and then later it gets used.
>
> So maybe the "no races can happen" is limited to *both* mm_users and
> mm_count being 1. What would increment it in that case, when 'current'
> is obviously busy forking?
>
> That "both are one" might still be the common case for fork(). Hmm?

get_task_mm() is the devil here. It goes right from having a
task_struct to bumping ->mm_users, no ->mm_count needed. It also bumps
->mm_users while holding task_lock(), a spinlock, which means we can't
do something simple like take mmap_lock (a sleeping lock) in there to
avoid racing with fork().

I did hack something together that seems to work for fork() and
get_task_mm(). Basically, we let get_task_mm()'s legacy behavior be the
fast path. But it diverts over to a slow path if get_task_mm() notices
that an mm's refcounts and mmap_lock are consistent with a fork()
happening elsewhere. The slow path releases task_lock() and acquires
mmap_lock so it can sleep until the (potential) fork() is finished.

On the other side, the fork() code checks ->mm_users and ->mm_count. It
can now depend on them being stable because it holds mmap_lock and it
diverts the get_task_mm() callers over to the slow path.

This works for two important cases:

 1. get_task_mm() callers, since they now conditionally take mmap_lock
 2. mmgrab() -> mmget_not_zero() users that later take mmap_lock

I'm also fairly sure it misses some cases outside of those two. The
patch is also quite ugly; the "->task_doing_fast_fork" mechanism is
pure hackery, for instance.

This seems to stay on the fast paths pretty well, even with 'top' or
some other /proc poking going on.
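To make the ordering concrete, here is a rough userspace model of the fast/slow-path split described above. It is a hypothetical sketch, not the patch: the model_* names, the fast_fork flag (standing in for "->task_doing_fast_fork"), pthread mutexes in place of task_lock() and mmap_lock, and the slow path pinning the mm with an ->mm_count reference before dropping task_lock() are all assumptions. It also assumes the forking task is the only task using the mm. The property it tries to show is that nothing takes mmap_lock while holding task_lock(), and that a getter which misses the flag has already bumped ->mm_users by the time fork() reads it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the idea; not kernel code, not the patch. */
struct model_mm {
	atomic_int mm_users;		/* models mm->mm_users */
	atomic_int mm_count;		/* models mm->mm_count */
	atomic_bool fast_fork;		/* stand-in for "->task_doing_fast_fork" */
	pthread_mutex_t mmap_lock;	/* plain mutex standing in for mmap_lock */
};

struct model_task {
	pthread_mutex_t task_lock;	/* stands in for task_lock() */
	struct model_mm *mm;
};

static void model_init(struct model_task *t, struct model_mm *mm)
{
	atomic_init(&mm->mm_users, 1);
	atomic_init(&mm->mm_count, 1);
	atomic_init(&mm->fast_fork, false);
	pthread_mutex_init(&mm->mmap_lock, NULL);
	pthread_mutex_init(&t->task_lock, NULL);
	t->mm = mm;
}

/* Like mmget_not_zero(): take an ->mm_users ref only if it is nonzero. */
static bool model_mmget_not_zero(struct model_mm *mm)
{
	int users = atomic_load(&mm->mm_users);

	while (users != 0) {
		/* On failure, 'users' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mm->mm_users, &users, users + 1))
			return true;
	}
	return false;
}

static void model_mmput(struct model_mm *mm)
{
	atomic_fetch_sub(&mm->mm_users, 1);	/* no teardown in this model */
}

static struct model_mm *model_get_task_mm(struct model_task *t)
{
	struct model_mm *mm;
	bool live;

	pthread_mutex_lock(&t->task_lock);
	mm = t->mm;
	if (mm == NULL || !atomic_load(&mm->fast_fork)) {
		/* Fast path: legacy behavior, bump ->mm_users under task_lock. */
		if (mm != NULL)
			atomic_fetch_add(&mm->mm_users, 1);
		pthread_mutex_unlock(&t->task_lock);
		return mm;
	}

	/* Slow path: pin with ->mm_count (assumed), drop task_lock, sleep on mmap_lock. */
	atomic_fetch_add(&mm->mm_count, 1);
	pthread_mutex_unlock(&t->task_lock);
	pthread_mutex_lock(&mm->mmap_lock);
	live = model_mmget_not_zero(mm);
	pthread_mutex_unlock(&mm->mmap_lock);
	atomic_fetch_sub(&mm->mm_count, 1);	/* like mmdrop(); this model never frees */
	return live ? mm : NULL;
}

/*
 * Called by the forking task, assumed to be the mm's only task. Returns
 * true if fork() may take its cheaper path. Pair with model_fork_end().
 */
static bool model_fork_begin(struct model_task *cur)
{
	struct model_mm *mm = cur->mm;

	pthread_mutex_lock(&mm->mmap_lock);	/* held for the whole fork() */

	/* A getter that saw the flag clear finished its bump before we get task_lock. */
	pthread_mutex_lock(&cur->task_lock);
	atomic_store(&mm->fast_fork, true);
	pthread_mutex_unlock(&cur->task_lock);

	return atomic_load(&mm->mm_users) == 1 &&
	       atomic_load(&mm->mm_count) == 1;
}

static void model_fork_end(struct model_task *cur)
{
	assert(atomic_load(&cur->mm->fast_fork));
	atomic_store(&cur->mm->fast_fork, false);
	pthread_mutex_unlock(&cur->mm->mmap_lock);	/* lets blocked slow-path getters proceed */
}
```

Because a slow-path getter bumps ->mm_count before blocking, a fork() that reads the counts after that bump simply falls back to its normal path. A getter that bumps ->mm_count after fork() has read the counts cannot touch ->mm_users until fork() drops mmap_lock, so the count fork() relied on stays stable.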
In the end, this is balancing the extra cost of the get_task_mm() slow
path against reduced atomic cost in the fork() path. It looks promising
so far.

Is this worth pursuing?