Received: by 2002:a05:7412:bb8d:b0:d7:7d3a:4fe2 with SMTP id js13csp79720rdb; Mon, 14 Aug 2023 10:01:49 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFgVXmZvdQw4DhF27wZgSZAYgygki8kyz3rk8RoOP/Z3+vGHWtv/6DVkR/4K1sYQrNSTBuJ X-Received: by 2002:a17:906:3284:b0:965:6075:d100 with SMTP id 4-20020a170906328400b009656075d100mr8649038ejw.39.1692032508993; Mon, 14 Aug 2023 10:01:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692032508; cv=none; d=google.com; s=arc-20160816; b=k/6SuD+lCNYKrIZE+GKlUR46q0hKqCS/AjA15b+/XPjDD8oomyrn0Ap644NbHCDu11 4j4eVJuSGUt7RhFfae53xpSgoHjf/dyKa/6IWIRba/GZXQOajb1P3n54EFwdpjhVtH6G wHvXy2YcSBCdQnKcMal8iUcZuxm8b4j0JwznxjlC5RDplwdld69qwW4XzT/gf/Hweg6k ITtt5N9JfmwoJ94yOdQl3mBVet2XNtEUJUg8kW7jIZc9x0gvskCcTP7E3zAt5IMm1ZGN GeSPMZ6pcPzU+OQAehHxBgRlvJc6zaAwwqNefS3/giU+GtVakoGw3DHtwNi6IlOVu2/W jteQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :organization:from:references:cc:to:content-language:user-agent :mime-version:date:message-id:dkim-signature; bh=HnwxoJIvuHQj1oL9+UcRHhuzEVRGxsaQ3RBeYdqhfQs=; fh=Eo7iBQfGAUTaq0hJCuOph3Kxcsn6A34dWzNX1ARz+c8=; b=ydQRucWyS7GHxDSthLhSvMcPsFmBBZcXmQXWX/qcsMahuYN4wLNQFl6RvbOMOuiImC UIjbGvluRP1zEgyTa68dckuUZv0WNHD+S/ZMnhy/TuqrApRDqThf4t1F2dA/DZguVtcG FRm0QBZCvnr3Vr9GdX7B6rtftFfj/vkd725MNwvi2ay6b3LVDeRub4jVdd6qy7DqsZqp 1tlMKE/zVff51hsEsuUk4ReboqCLw0zQVgIbSal5k1ly48LISV/blu/KLnQ+B2JZiSaL 2wAWSv55y9gl0G0L8X6bZwzPIC3DywTdOA1wK2+SjcvkFVVDTL+rE2tgGq4P+Q1VLFYw F9oQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=TG4xt7sC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lm21-20020a170906981500b00992acfe04b8si8668958ejb.150.2023.08.14.10.00.42; Mon, 14 Aug 2023 10:01:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=TG4xt7sC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229651AbjHNPkR (ORCPT + 99 others); Mon, 14 Aug 2023 11:40:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40282 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233009AbjHNPjv (ORCPT ); Mon, 14 Aug 2023 11:39:51 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 12ED310DD for ; Mon, 14 Aug 2023 08:39:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1692027540; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HnwxoJIvuHQj1oL9+UcRHhuzEVRGxsaQ3RBeYdqhfQs=; b=TG4xt7sC8vu57205jFsa9oKCarIpg8NxKRawbdvvOqQc9ZnGpi1nZfZreczSCKkzKCnXmb btXqi4XaH0mA+cEoA+Qdlsj5Au6kliSWvJeWkmp41QubrclsRL4NkSnG4gSik0NSNvoFCr XITM5OibbwvGmN8I6kSRHbKEfhCx9xc= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-564-pIwqhlXUMXC-svtXIIL79A-1; Mon, 14 Aug 2023 11:38:59 -0400 X-MC-Unique: pIwqhlXUMXC-svtXIIL79A-1 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-3fe1cdf2024so27132625e9.3 for ; Mon, 14 Aug 2023 08:38:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692027537; x=1692632337; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=HnwxoJIvuHQj1oL9+UcRHhuzEVRGxsaQ3RBeYdqhfQs=; b=Mb/jSRKG2ON2Bo8GuRKByVtv3F8/ItZW4g1N/WPPl6yTDHty0CIkF+ZgXY23SBQ2Fv NPJ5s6mOlBOrbfci9EbGKvgBgK8FF0wTWixIwmqEu6rIRlycyQGZKBO5qJI6LEbpW08z KbA1Ex5ilvXDp3FVdIvwu/amq4b2lxiW/3RREUZvDxw7YALst8HGf+6u9yZjZHmtfmiw D3Shy+B+VPtvAyeLjBalfg2MnAE2a/CtC0gnCFQLzXflRuhHRCaA+okRVmNmVqOx6rar HseQKEYGITuwDPKB59gEDwFeSPAst8VdkOak5jbDh/hDA0gJZ5WHDWf8JNIB3IpVI4zS WPdA== X-Gm-Message-State: AOJu0YwQrifKFiUf63gTqqPmIZ1S3VkHT0YHQyLl9WlaZ0UNbhumXCEE KjVavYJIXRy5Ud3Ngea9iZi+W0JSNEyoTXwCRqFjnSg4LUAdWJEQhMdSz+XVJopxfhXyclLKU1p IIweYdlofyg+EQYwjZT1/5Y5J X-Received: by 2002:a05:600c:280b:b0:3fe:16c8:65fa with SMTP id m11-20020a05600c280b00b003fe16c865famr7911849wmb.4.1692027537004; Mon, 14 Aug 2023 08:38:57 -0700 (PDT) X-Received: by 2002:a05:600c:280b:b0:3fe:16c8:65fa with SMTP id m11-20020a05600c280b00b003fe16c865famr7911834wmb.4.1692027536556; Mon, 14 Aug 2023 08:38:56 -0700 (PDT) Received: from ?IPV6:2003:d8:2f2b:d900:2d94:8433:b532:3418? (p200300d82f2bd9002d948433b5323418.dip0.t-ipconnect.de. [2003:d8:2f2b:d900:2d94:8433:b532:3418]) by smtp.gmail.com with ESMTPSA id l4-20020a1ced04000000b003fe61c33df5sm17737760wmh.3.2023.08.14.08.38.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 14 Aug 2023 08:38:56 -0700 (PDT) Message-ID: <93d50ca7-6c78-eee0-c4ca-8ca3dff2ccd3@redhat.com> Date: Mon, 14 Aug 2023 17:38:55 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Content-Language: en-US To: Mateusz Guzik Cc: linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, brauner@kernel.org, ebiederm@xmission.com, akpm@linux-foundation.org, linux-mm@kvack.org, koct9i@gmail.com, oleg@redhat.com, dave@stgolabs.net References: <20230813123333.1705833-1-mjguzik@gmail.com> <00dd353b-e5aa-69fb-6b52-cb59028ea90a@redhat.com> <88f1f73e-9879-95f9-bbc4-339c43a5c2f1@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH] kernel/fork: stop playing lockless games for exe_file replacement In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 14.08.23 10:54, Mateusz Guzik wrote: > On 8/14/23, David Hildenbrand wrote: >> On 14.08.23 10:21, Mateusz Guzik wrote: >>> On 8/14/23, David Hildenbrand wrote: >>>> On 13.08.23 14:33, Mateusz Guzik wrote: >>>>> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for >>>>> exe_file serialization"). While the commit message does not explain >>>>> *why* the change, clearly the intent was to use mmap_sem less in this >>>>> codepath. I found the original submission [1] which ultimately claims >>>>> it >>>>> cleans things up. >>>> >>>> More details are apparently in v1 of that patch: >>>> >>>> https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/ >>>> >>>> Regarding your patch: adding more mmap_write_lock() where avoidable, I'm >>>> not so sure. >>>> >>> >>> But exe_file access is already synchronized with the semaphore and >>> your own commit added a mmap_read_lock/mmap_read_unlock cycle after >>> the xchg in question to accomodate this requirement. >> >> Yes, we want to handle concurrent fork() ("Don't race with dup_mmap()"), >> thus mmap_read_lock. >> >>> Then mmap_write_lock around the replacement is the obvious thing to do. >> >> Apparently to you. :) >> >> mmap_write_lock will block more than fork. For example, concurrent page >> faults (without new VMA locking), for no apparent reason to me. >> >>> >>>> Your patch doesn't look (to me) like it is removing a lot of complexity. >>>> >>> >>> The code in the current form makes the reader ask what prompts xchg + >>> read-lock instead of mere write-locking. >>> >>> This is not a hot path either and afaics it can only cause contention >>> if userspace is trying to abuse the interface to break the kernel, >>> messing with a processs hard at work (which would be an extra argument >>> to not play games on kernel side). >>> >>> That said, no, it does not remove "a lot of complexity". It does >>> remove some though at no real downside that I can see. >>> >>> But then again, it is on people who insist on xchg to justify it. >> >> Changing it now needs good justification, why we would want to block any >> concurrent MM activity. Maybe there is good justification. >> >> In any case, this commit would have to update the documentation of >> replace_mm_exe_file, that spells out existing locking behavior. >> > > Perhaps it will help if I add that the prctl thingy always had a > troubled relationship with locking. Yes, it's not the first time that I looked at kernel/sys.c and wodnered why some of the toggles don't perform any locking. > > Last time I looked at it was in 2016, where I found that it was doing > down_read to update arg_start/arg_end and others while a consumer in > procfs would read them and assert on their sanity. As only a read-lock > was held, 2 threads could be used to transiently produce a bogus state > as they race with their updates and trigger the BUG. See this commit > (but also excuse weirdly bad english ;)) > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddf1d398e517e660207e2c807f76a90df543a217 > > Moreover check out the following in prctl_set_auxv: > > task_lock(current); > memcpy(mm->saved_auxv, user_auxv, len); > task_unlock(current); > > any thread in the process can reach that codepath while sharing the > same mm, thus this does not lock any updates. Not only that, but a > duplicated memcpy onto the area in prctl_set_mm_map does not even take > that lock and the code to read this does not take any locks. > > [Code duplication and synchronization aside, additional points > deducted for saved_auxv storing always-NULL pointers instead of adding > them on reads.] > > The above exhausts my willingness to argue about this change, I'm just > a passerby. If it is NAKed, I'm dropping the subject. As long as nobody cares about concurrent MM activity being restricted with your change (I suspect we don't care), we're good. -- Cheers, David / dhildenb