From: Alexander Mikhalitsyn
To: linux-kernel@vger.kernel.org
Cc: Alexander Mikhalitsyn, "Eric W.
Biederman", Andrew Morton, Davidlohr Bueso, Greg KH, Andrei Vagin, Pavel Tikhomirov, Vasily Averin, Manfred Spraul, Alexander Mikhalitsyn, stable@vger.kernel.org
Subject: [PATCH 0/2] shm: shm_rmid_forced feature fixes
Date: Thu, 28 Oct 2021 01:43:46 +0300
Message-Id: <20211027224348.611025-1-alexander.mikhalitsyn@virtuozzo.com>

There is a long story behind all of this...

Some time ago I hit a kernel crash after a CRIU restore procedure. Fortunately, because it was a CRIU restore, I had the dump files, could repeat the restore many times, and the crash reproduced easily. After some investigation I constructed a minimal reproducer. The crash turned out to be a use-after-free that happens only if sysctl kernel.shm_rmid_forced = 1.

The key of the problem is that the exit_shm() function does not correctly handle the destruction of shp objects when task->sysvshm.shm_clist contains items from different IPC namespaces. In most cases this list contains items from only one IPC namespace.

Why may this list contain objects from different namespaces? The exit_shm() function is designed to clean up this list whenever a process leaves an IPC namespace. But we made a mistake a long time ago and did not add an exit_shm() call to the setns() syscall path.

The first idea was simply to add this call to the setns() syscall, but that obviously changes the semantics of setns(), and that is a userspace-visible change. So I gave up on this idea.

The first real attempt to address the issue was to skip the forced destroy when we meet an shp object that is not from the current task's IPC namespace [1]. But that was not the best idea, because task->sysvshm.shm_clist was protected by the rwsem that belongs to the current task's IPC namespace. It means that list corruption may occur.

The second approach is to extend exit_shm() to properly handle shp objects from different IPC namespaces [2].
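For illustration only, the situation described above (a task whose shm_clist ends up holding segments from two IPC namespaces because setns() does not call exit_shm()) can be modelled in plain userspace C. This is a sketch under stated assumptions: every *_model type and function below is a hypothetical stand-in, not a kernel structure, and this is not the reproducer mentioned above. model_foreign_segments() simply counts list entries that belong to a namespace other than the task's current one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are the kernel's types. */
struct ipc_ns_model {
    int id;
};

struct shm_seg_model {
    struct ipc_ns_model *ns;     /* namespace the segment was created in */
    struct shm_seg_model *next;  /* link in the creator's shm_clist */
};

struct task_model {
    struct ipc_ns_model *ns;         /* task's current IPC namespace */
    struct shm_seg_model *shm_clist; /* segments created by this task */
};

/* Model of segment creation: the segment belongs to the current namespace
 * and is pushed onto the creating task's list. */
void model_shmget(struct task_model *t, struct shm_seg_model *seg)
{
    assert(t != NULL && seg != NULL);
    seg->ns = t->ns;
    seg->next = t->shm_clist;
    t->shm_clist = seg;
}

/* Model of setns() without an exit_shm() call: the namespace changes,
 * but the list of created segments is left untouched. */
void model_setns(struct task_model *t, struct ipc_ns_model *ns)
{
    t->ns = ns;
}

/* Count list entries that belong to a namespace other than the task's
 * current one. */
size_t model_foreign_segments(const struct task_model *t)
{
    size_t n = 0;

    for (const struct shm_seg_model *s = t->shm_clist; s != NULL; s = s->next)
        if (s->ns != t->ns)
            n++;
    return n;
}
```

In this model, creating one segment, switching namespaces with model_setns(), and creating a second segment leaves one foreign entry on the list, which is the mixed-namespace case that exit_shm() has to cope with.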
Extending exit_shm() this way is really non-trivial. I put a lot of effort into it but did not believe that it was possible to make it fully safe, clean and clear. Thanks to the efforts of Manfred Spraul, a working and elegant solution was designed. Thanks a lot, Manfred!

Eric also suggested a way to address the issue in ("[RFC][PATCH] shm: In shm_exit destroy all created and never attached segments"). Eric's idea was to maintain a list of shm_clists, one per IPC namespace, and to use lock-less lists. But there are some concerns about the extra memory consumption.

An alternative solution, which I suggested, was implemented in ("shm: reset shm_clist on setns but omit forced shm destroy"). The idea is pretty simple: we add an exit_shm() call to setns() but DO NOT destroy shm segments even if sysctl kernel.shm_rmid_forced = 1; we just clean up the task->sysvshm.shm_clist list. This changes the semantics of the setns() syscall a little, but in comparison to the "naive" solution, where we just add exit_shm() without any special exclusions, this looks like a safer option.

[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736

Cc: "Eric W. Biederman"
Cc: Andrew Morton
Cc: Davidlohr Bueso
Cc: Greg KH
Cc: Andrei Vagin
Cc: Pavel Tikhomirov
Cc: Vasily Averin
Cc: Manfred Spraul
Cc: Alexander Mikhalitsyn
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn

Alexander Mikhalitsyn (2):
  ipc: WARN if trying to remove ipc object which is absent
  shm: extend forced shm destroy to support objects from several IPC nses

 include/linux/ipc_namespace.h |  15 +++
 include/linux/sched/task.h    |   2 +-
 include/linux/shm.h           |   2 +-
 ipc/shm.c                     | 170 +++++++++++++++++++++++++---------
 ipc/util.c                    |   6 +-
 5 files changed, 145 insertions(+), 50 deletions(-)

-- 
2.31.1