References: <20210706132259.71740-1-alexander.mikhalitsyn@virtuozzo.com> <20210709181241.cca57cf83c52964b2cd0dcf0@linux-foundation.org> <87y2ab9w8u.fsf@disp2133>
In-Reply-To: <87y2ab9w8u.fsf@disp2133>
From: Alexander Mihalicyn
Date: Mon, 12 Jul 2021 22:27:00 +0300
Subject: Re: [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed
To: "Eric W. Biederman"
Cc: Manfred Spraul, Andrew Morton, "linux-kernel@vger.kernel.org", Pavel Tikhomirov, Davidlohr Bueso, Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrei Vagin, Christian Brauner
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Eric,

On Mon, Jul 12, 2021 at 10:18 PM Eric W. Biederman wrote:
>
> Alexander Mihalicyn writes:
>
> > Hello Manfred,
> >
> > On Sun, Jul 11, 2021 at 2:47 PM Manfred Spraul wrote:
> >>
> >> Hi Alex,
> >>
> >> On Sunday, 11 July 2021, Alexander Mihalicyn wrote:
> >> >
> >> > Hi, Manfred,
> >> >
> >> > On Sun, Jul 11, 2021 at 12:13 PM Manfred Spraul wrote:
> >> > >
> >> > > Hi,
> >> > >
> >> > > On Saturday, 10 July 2021, Alexander Mihalicyn wrote:
> >> > >>
> >> > >> Now, using the setns() syscall, we can construct a situation where the
> >> > >> task->sysvshm.shm_clist list
> >> > >> holds shm items from several (!) IPC namespaces.
> >> > >>
> >> > > Does this imply that locking is affected as well?
> >> > > According to the initial patch, accesses to shm_clist are protected by
> >> > > "the" IPC shm namespace rwsem. This can't work if the list contains
> >> > > objects from several namespaces.
> >> >
> >> > Of course, you are right. I have to rework this part: I can add a check to the
> >> > static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> >> > function and, before adding a new shm to the task list, check that the list
> >> > is empty OR that the item already on the list is from the same namespace as
> >> > current->nsproxy->ipc_ns.
> >> >
> >> Ok. (Sorry, I only have smartphone internet, so I could not check
> >> the patch fully.)
> >>
> >> > >> I've proposed a change which keeps the old behaviour of setns() but
> >> > >> fixes the double free.
> >> > >>
> >> > > Assuming that locking works, I would consider this a namespace design
> >> > > question: do we want to support a task that contains shm objects from
> >> > > several ipc namespaces?
> >> >
> >> > This depends on what we mean by "task contains shm objects from
> >> > several ipc namespaces". There are two meanings:
> >> >
> >> > 1. The task has attached shm objects from different ipc namespaces.
> >> >
> >> > We already support that by design. When we change namespace
> >> > using unshare(CLONE_NEWIPC), even with
> >> > sysctl shm_rmid_forced=1, we do not detach all IPCs from the task!
> >>
> >> OK. Thus shm and sem have different behavior anyway.
> >>
> >> >
> >> > 2. The task's task->sysvshm.shm_clist list has items from different IPC namespaces.
> >> >
> >> > I'm not sure whether we need that or not.
> >> > But I'm ready to prepare a patch
> >> > for whichever of the options we choose:
> >> > a) just add exit_shm(current)+shm_init_task(current);
> >> > b) prepare PATCHv2 with an appropriate check in newseg() to prevent
> >> > adding new items from a different namespace to the list
> >> > c) rework the algorithm so we can safely have items from different
> >> > namespaces in task->sysvshm.shm_clist
> >> >
> >> Before you write something, let's wait for what the others say. I don't
> >> qualify as an shm expert.
> >>
> >> a) is user space visible, without any good excuse
> >
> > Yes, but maybe we decide that this is not so critical?
> > We need more people here :)
>
> It is barely visible. You have to do something very silly
> to see this happening. It is probably ok, but the work to
> verify that nothing cares so that we can safely backport
> the change is probably much more work than just updating
> the list to handle shmid's for multiple namespaces.
>
> >> c) is probably the highest amount of changes
> >
> > Yep, but ok, I will prepare patches fast.
>
> Given that this is a bug I think c) is the safest option.
>
> A couple of suggestions.
> 1) We can replace the test "shm_creator != NULL" with
> "list_empty(&shp->shm_clist)" and remove shm_creator.
>
> Along with replacing "shm_creator = NULL" with
> "list_del_init(&shp->shm_clist)".
>
> 2) We can update shmat to do "list_del_init(&shp->shm_clist)"
> upon shmat. The last unmap will still shm_destroy the
> shm segment as ns->shm_rmid_forced is set.
>
> For a multi-threaded process I think this will nicely clean up
> the clist, and make it clear that the clist only cares about
> those segments that have been created but never attached.
>
> 3) Put a non-reference counted struct ipc_namespace in struct
> shmid_kernel, and use it to remove the namespace parameter
> from shm_destroy.

Thanks for your detailed suggestions! ;)

I will prepare a kernel patch tomorrow, test what happens with CRIU,
and prepare a fix for it as well.
>
> I think that is enough to fix this bug with no changes in semantics,
> no additional memory consumed, and an implementation that is easier
> to read and perhaps a little faster.
>
> Eric

Regards,
Alex
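
To make suggestions 1) to 3) above a bit more concrete, here is a rough userspace model of the idea. It is only a sketch, not kernel code: the list helpers are re-implemented locally to mimic the kernel's list API, every type and function with a _model suffix is hypothetical, locking is left out entirely, and "destroy" only decrements a counter. It shows clist membership standing in for shm_creator (a segment still has a creator while it is linked on a clist), attach dropping the segment from the clist, and destroy using the namespace stored in the segment instead of a namespace parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head; not kernel code. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink an entry and leave it self-linked, so list_empty(entry) is true. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

/* Hypothetical models of an IPC namespace, a shm segment and a task. */
struct ipc_ns_model { int id; int live_segments; };

struct shm_seg_model {
    struct list_head shm_clist;   /* linked on the creator's clist, or self-linked */
    struct ipc_ns_model *ns;      /* suggestion 3: plain, non-refcounted back-pointer */
};

struct task_model { struct list_head shm_clist; };

/* Create a segment in namespace ns and link it on the creating task's clist. */
static void seg_create_model(struct task_model *t, struct ipc_ns_model *ns,
                             struct shm_seg_model *s)
{
    s->ns = ns;
    ns->live_segments++;
    INIT_LIST_HEAD(&s->shm_clist);
    list_add(&s->shm_clist, &t->shm_clist);
}

/* Suggestion 1: list membership plays the role of "shm_creator != NULL". */
static bool seg_has_creator_model(const struct shm_seg_model *s)
{
    return !list_empty(&s->shm_clist);
}

/* Suggestion 2: attaching removes the segment from the creator's clist. */
static void seg_attach_model(struct shm_seg_model *s)
{
    list_del_init(&s->shm_clist);
}

/* Suggestion 3: destroy needs no namespace argument, it uses s->ns. */
static void seg_destroy_model(struct shm_seg_model *s)
{
    assert(s->ns != NULL);
    list_del_init(&s->shm_clist);   /* harmless if already self-linked */
    s->ns->live_segments--;
    s->ns = NULL;
}

/* On task exit, destroy every created-but-never-attached segment,
 * whichever namespace it belongs to. Returns how many were destroyed. */
static int task_exit_model(struct task_model *t)
{
    int destroyed = 0;

    while (!list_empty(&t->shm_clist)) {
        struct list_head *pos = t->shm_clist.next;
        struct shm_seg_model *s = (struct shm_seg_model *)
            ((char *)pos - offsetof(struct shm_seg_model, shm_clist));

        seg_destroy_model(s);
        destroyed++;
    }
    return destroyed;
}
```

In this model a task can create segments in two different namespaces, attach one of them, and then exit: only the never-attached segments are destroyed, and each one is accounted against its own namespace. The real patch would of course have to take the right per-namespace locks, which this sketch deliberately ignores.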