Received: by 2002:a05:6a11:4021:0:0:0:0 with SMTP id ky33csp740638pxb; Thu, 23 Sep 2021 09:37:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwiyeALbYsjTRuHLlOEHYlvZ7ORMrU3M58BWloPwNjsnInylRUKlVcHcc/D8QmA9V8gEujU X-Received: by 2002:a02:1081:: with SMTP id 123mr4746383jay.83.1632415049511; Thu, 23 Sep 2021 09:37:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1632415049; cv=none; d=google.com; s=arc-20160816; b=wvNsRhfhKXvtsn1hlQKfuYBvr5lmBZ+jrLfarfJWUXlc+tl4NLWiJfX2B6FVWNnVOq fnksOBmTkIS5uG+B85tmedHiFUbxvqZujR1KThliSC7347jvepZFh530XCBuRBDHKimZ v4exoUxlx10vbPyx6LhbEhCMdb89n95ZV70s6d5fcnuk1s2yvLh5eVY6Sk5ZvSxwkaiV 8oa3uAgxJlJltwJUG6kZbwG3v4IT0fOqz1lwLdVh2VVgF6bF8TsjnKTosYwRcL9NfeHN FIoWcJt+UMRI9k5GBs/ooE3Q3FfZnXAngwWT2CgjlquVZem+mTQS1afIhYLHcO2Sk7ya ZKOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=oRu9ewgFc/39PpVB/GnblVw4+11L82/zU/npG8Ru7+s=; b=luyH9BmXQvcL0fYAWoqaT++T7wytjs2NCNfH2X/m8289xa2Rt6jppGQAeQL4keaYRU Lmxar33ep6BR3FFV1PXePOJdOc/z9igy7sONsakLI8x5KhRyVxqJZDgBLmfEkqVNBQyS trnNBJk6ErLgyy+BiuRkp+TOoq5vyBj+VxEmBg2f61LlGNeE+YLQ23jbSsgMZ7Facuis vKsr/X7QUuhf8ywhSY61NpNrdpHfTE9UgIWudbOMsYFAhDsZ6BaSGv16/JBhi+z5ZUtY /Q9OEhL2jsVbV9EVtlgwm5ZGK/ueMTX5oGalRgnWDUC20rB5fdsSuknsXMiu3htsmZbc eMKQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@colorfullife-com.20210112.gappssmtp.com header.s=20210112 header.b="coG3/AOB"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n22si6897830ioo.36.2021.09.23.09.37.14; Thu, 23 Sep 2021 09:37:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@colorfullife-com.20210112.gappssmtp.com header.s=20210112 header.b="coG3/AOB"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230139AbhIWQhw (ORCPT + 99 others); Thu, 23 Sep 2021 12:37:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49746 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229913AbhIWQhw (ORCPT ); Thu, 23 Sep 2021 12:37:52 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46C57C061574 for ; Thu, 23 Sep 2021 09:36:20 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id t18so19021976wrb.0 for ; Thu, 23 Sep 2021 09:36:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=colorfullife-com.20210112.gappssmtp.com; s=20210112; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=oRu9ewgFc/39PpVB/GnblVw4+11L82/zU/npG8Ru7+s=; b=coG3/AOBNehwTqYXFpr8aL0QNDJ9gGLSLkkP/bOGsthl3rXTCb/+G5CraQ2/T8oisD l96spFmY2gnyN9MZp8sxWZWcXCSwJkyYBw0+lqzrvtFEYqNe/tepfi9/Htr2uESXY7Vb P/8injtF4uDeGKnDTkUZ1S6XV1XEUPhvQ8aymQ+FqT9htzI03lwB537Aq8C6B4feOfI6 rJESv5+rAoRiRT22LiiVZiBOj94AjPJr106wJwlM4Cl31S8dAX/E9VrRqZu9y4au4wTC EFisSdgE5yJxq+tTBRQUSPo6yvv+XUkv9HUiq+zVu8iv08efy3vM0SOd+8yE5wArdjtU LhJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=oRu9ewgFc/39PpVB/GnblVw4+11L82/zU/npG8Ru7+s=; b=CwF1vpgffgwmt6H1eGS/Y+JbQ3ME/mvukVrMPrOIE79Z63I7lsFx5MZpQam2qKuhhe Bwn07hxn0z7Ibp34YnYuoN0DEbLVkWH3u0caHHe6koN1HcapKmgBaMS6IY7Ow/INA4LM s1WjGA7/zzvXvrKD0H4oFVnG2EZ+UgMudPOw4QSNvWY2GKDiHOjwbn1RBABRJtSagzsF w6sd+QZh3rcyngOucf8E8ZEu6a/mQAng1u9Ai8VHa0zJ7Xh+6FRLE/9WrSJThDA5J19O JlOYrFxY5WMDkCqP3GPYCCBk66G0lXNgrhmIhm1YEm2L/wtVZTkfacxAunORykR7f8XP Ecgg== X-Gm-Message-State: AOAM533GlQ4tz+jRkR9tykN1YW5F6WKiQ9EvQwr3cfz46IOLDnv3L7MT QRMC3APQkOlsQKIYEO849d0hnQ== X-Received: by 2002:a05:600c:a08:: with SMTP id z8mr17212585wmp.165.1632414978851; Thu, 23 Sep 2021 09:36:18 -0700 (PDT) Received: from localhost.localdomain (p200300d997223e009514b44ef79a69c4.dip0.t-ipconnect.de. [2003:d9:9722:3e00:9514:b44e:f79a:69c4]) by smtp.googlemail.com with ESMTPSA id k19sm5620352wmr.21.2021.09.23.09.36.17 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 23 Sep 2021 09:36:18 -0700 (PDT) Subject: Re: [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed To: "Eric W. Biederman" , Alexander Mihalicyn Cc: Andrew Morton , "linux-kernel@vger.kernel.org" , Milton Miller , Jack Miller , Pavel Tikhomirov , Davidlohr Bueso , Johannes Weiner , Michal Hocko , Vladimir Davydov , Andrei Vagin , Christian Brauner References: <20210706132259.71740-1-alexander.mikhalitsyn@virtuozzo.com> <20210709181241.cca57cf83c52964b2cd0dcf0@linux-foundation.org> <87y2ab9w8u.fsf@disp2133> From: Manfred Spraul Message-ID: <00ce7bff-e432-8244-1765-12460817baab@colorfullife.com> Date: Thu, 23 Sep 2021 18:36:16 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <87y2ab9w8u.fsf@disp2133> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Eric, I'd like to restart the discussion, the issue should be fixed. On 7/12/21 9:18 PM, Eric W. Biederman wrote: > > Given that this is a bug I think c) is the safest option. > > A couple of suggestions. > 1) We can replace the test "shm_creator != NULL" with > "list_empty(&shp->shm_clist)" Yes, good idea. list_del() already contains WRITE_ONCE() and list_empty() contains READ_ONCE(), i.e. we get clear memory ordering rules. > and remove shm_creator. I do not see how we can remove shm_creator: - thread A: creates new segment, doesn't map it. - thread B: shmctl(,IPC_RMID,). thread B must now locate the lock that protects ->shm_clist. And if the lock is per-thread, then I do not see how we can avoid having a pointer in shp to the lock. > Along with replacing "shm_creator = NULL" with > "list_del_init(&shp->shm_clist)". Correct, list_del_init() is better than shm_create = NULL. > 2) We can update shmat to do "list_del_init(&shp->shm_clist)" > upon shmat. That would be a (tiny) user space visible change: echo 0 > /proc/sys/kernel/shm_rmid_forced shmget() shmat() shmdt() echo 1 > /proc/sys/kernel/shm_rmid_forced exit() Right now: segment is destroyed After your proposal: Segment is not destroyed. I don't think that we should mix that with the bugfix. > The last unmap will still shm_destroy the > shm segment as ns->shm_rmid_forced is set. But what if shm_rmid_forced is modified? > For a multi-threaded process I think this will nicely clean up > the clist, and make it clear that the clist only cares about > those segments that have been created but never attached. > 3) Put a non-reference counted struct ipc_namespace in struct > shmid_kernel, and use it to remove the namespace parameter > from shm_destroy. > > I think that is enough to fix this bug with no changes in semantics, > no additional memory consumed, and an implementation that is easier > to read and perhaps a little faster. I do not see how this solves the list corruption: A thread creates 2 shm segments, then switches the namespace and creates another 2 segment. - corruption 1: in each of the namespaces, one thread calls shmctl(,IPC_RMID,) -> both will operate in parallel on ->shm_clist. - corruption 2: exit_shm() in parallel to one thread - corruption 3: one shmctl(,IPC_RMID,) in parallel to a shmget(). i.e.: we can have list_add() and multiple list_del() in parallel. I don't see how this should work without a lock. With regards to memory usage: I would propose to use task_lock(), that lock exists already. --     Manfred