From: Alexander Mihalicyn
Date: Sun, 11 Jul 2021 13:33:22 +0300
Subject: Re: [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed
To: Manfred Spraul
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Milton Miller, Jack Miller,
    Pavel Tikhomirov, Davidlohr Bueso, Johannes Weiner, Michal Hocko,
    Vladimir Davydov, "Eric W. Biederman", Andrei Vagin, Christian Brauner

Hi, Manfred,

On Sun, Jul 11, 2021 at 12:13 PM Manfred Spraul wrote:
>
> Hi,
>
> On Saturday, 10 July 2021, Alexander Mihalicyn wrote:
>>
>> Now, using the setns() syscall, we can construct a situation where the
>> task->sysvshm.shm_clist list contains shm items from several (!) IPC
>> namespaces.
>
> Does this imply that locking is affected as well? According to the
> initial patch, accesses to shm_clist are protected by "the" IPC shm
> namespace rwsem. This can't work if the list contains objects from
> several namespaces.

Of course, you are right. I have to rework this part. I can add a check to

static int newseg(struct ipc_namespace *ns, struct ipc_params *params)

so that, before adding a new shm segment to the task list, we verify that
the list is either empty or that the items already on it belong to the
same namespace as current->nsproxy->ipc_ns (see the rough sketch further
down).

> From ipc/shm.c:
>
>  397         down_read(&shm_ids(ns).rwsem);
>  398         list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
>  399                 shp->shm_creator = NULL;
>  400         /*
>  401          * Only under read lock but we are only called on current
>  402          * so no entry on the list will be shared.
>  403          */
>  404         list_del(&task->sysvshm.shm_clist);
>  405         up_read(&shm_ids(ns).rwsem);
>
> Tasks A and B in the same namespace:
>
> - A: create shm object
> - A: setns()
> - in parallel: B does shmctl(IPC_RMID), A does exit()

Yep.

>>
>> So, the semantics of setns() and unshare() differ here. We can fix
>> this problem by adding analogous calls to exit_shm() and shm_init_task()
>> to
>>
>> static void commit_nsset(struct nsset *nsset)
>> ...
>> #ifdef CONFIG_IPC_NS
>>         if (flags & CLONE_NEWIPC) {
>>                 exit_sem(me);
>> +               shm_init_task(current);
>> +               exit_shm(current);
>>         }
>> #endif
>>
>> with this change the semantics of unshare() and setns() will be equal
>> in terms of the shm_rmid_forced feature.

> Additional advantage: exit_sem() and exit_shm() would appear as pairs,
> both in unshare() and setns().

>> But this may break some applications which use setns() and expect IPC
>> resources not to be destroyed after that syscall. (CRIU uses setns()
>> extensively, and we would have to change our IPC ns C/R implementation
>> a little bit if we take this way of fixing the problem.)
>>
>> I've proposed a change which keeps the old behaviour of setns() but
>> fixes the double free.

> Assuming that locking works, I would consider this as a namespace design
> question: Do we want to support that a task contains shm objects from
> several ipc namespaces?

This depends on what we mean by "a task contains shm objects from several
ipc namespaces". There are two meanings:

1. The task has attached shm objects from different IPC namespaces.

We already support that by design. When we change the namespace using
unshare(CLONE_NEWIPC), even with sysctl shm_rmid_forced=1, we do not
detach all SysV shm segments from the task! Let's look at the exit_shm()
function, which is (validly) called when we do unshare():

        if (shm_may_destroy(ns, shp)) {    <--- (shp->shm_nattch == 0) &&
                                                (ns->shm_rmid_forced ||
                                                 (shp->shm_perm.mode & SHM_DEST))
                shm_lock_by_ptr(shp);
                shm_destroy(ns, shp);
        }

Here everything depends on shp->shm_nattch, which will be non-zero if the
user does something like this:

        int id = shmget(0xAAAA, 4096, IPC_CREAT|0700);
        void *addr = shmat(id, NULL, 0); // <-- map the shm segment into the
                                         //     task address space

        unshare(CLONE_NEWIPC);           // <-- task->sysvshm.shm_clist is
                                         //     cleared! But shm 0xAAAA
                                         //     remains attached.

        id = shmget(0xBBBB, 4096, IPC_CREAT|0700);
                                         // <-- adds an item to
                                         //     task->sysvshm.shm_clist; now
                                         //     the list contains an object
                                         //     only from the new IPC namespace
        addr = shmat(id, NULL, 0);

So, this task->sysvshm.shm_clist list is used only for the shm_rmid_forced
feature. It doesn't affect any mm-related things like /proc/<pid>/maps or
similar.

2. The task->sysvshm.shm_clist list has items from different IPC
namespaces.

I'm not sure whether we need that or not.
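To make the newseg() check mentioned above (option b below) a bit more
concrete, here is a completely untested sketch of what I have in mind. It
assumes we also record the creating namespace in struct shmid_kernel (a
hypothetical shp->ns back-pointer, which does not exist today), and it
leaves aside the locking question you raised (the entries already on the
list may belong to a namespace whose rwsem we are not holding at this
point):

        /* ipc/shm.c -- sketch only; shp->ns is a hypothetical new field */
        static bool shm_clist_same_ns(struct task_struct *task,
                                      struct ipc_namespace *ns)
        {
                struct shmid_kernel *first;

                /* an empty list can accept a segment from any namespace */
                if (list_empty(&task->sysvshm.shm_clist))
                        return true;

                /* all entries are kept in one namespace, so checking the
                 * first one is enough */
                first = list_first_entry(&task->sysvshm.shm_clist,
                                         struct shmid_kernel, shm_clist);
                return first->ns == ns;
        }

and then, early in newseg(), before the new segment is allocated:

        if (!shm_clist_same_ns(current, ns))
                return -EINVAL;

Note that this only prevents mixing namespaces on the list; exit_shm()
would still have to take the rwsem of the namespace the entries actually
belong to (e.g. via the same shp->ns pointer), rather than blindly using
current->nsproxy->ipc_ns.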
But I'm ready to prepare a patch for whichever option we choose:

a) just add exit_shm(current) + shm_init_task(current) to commit_nsset();
b) prepare a PATCHv2 with the appropriate check in newseg() to prevent
   adding items from a different namespace to the list;
c) rework the algorithm so that we can safely have items from different
   namespaces on task->sysvshm.shm_clist;

and, of course, prepare a test case with this particular bug reproducer to
prevent future regressions and increase coverage. (I'm not publishing the
reproducer program directly on the list at the moment because it may not
be fully safe. But I think any of us already knows how to trigger the
problem.)

> Does it work everywhere (/proc/{pid}, ...)?
> --
> Manfred

Thanks,
Alex