Subject: Re: [RFC][PATCH] ipc: Remove IPCMNI
To: Davidlohr Bueso, Waiman Long, Michael Kerrisk
Cc: "Eric W. Biederman", "Luis R. Rodriguez", Kees Cook,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    Andrew Morton, Al Viro, Matthew Wilcox, Stanislav Kinsbursky,
    Linux Containers, linux-api@vger.kernel.org
From: Manfred Spraul
Message-ID: <3e201de2-bed2-6f7d-0783-700d095142e0@colorfullife.com>
Date: Thu, 29 Mar 2018 10:47:45 +0200
In-Reply-To: <20180329021409.gcjjrmviw2lckbfk@linux-n805>

Hello together,

On 03/29/2018 04:14 AM, Davidlohr Bueso wrote:
> Cc'ing mtk, Manfred and linux-api.
>
> See below.
>
> On Thu, 15 Mar 2018, Waiman Long wrote:
>
>> On 03/15/2018 03:00 PM, Eric W. Biederman wrote:
>>> Waiman Long writes:
>>>
>>>> On 03/14/2018 08:49 PM, Eric W. Biederman wrote:
>>>>> The define IPCMNI was originally the size of a statically sized
>>>>> array in the kernel and that has long since been removed.  Therefore
>>>>> there is no fundamental reason for IPCMNI.
>>>>>
>>>>> The only remaining use IPCMNI serves is as a convoluted way to format
>>>>> the ipc id to userspace.
>>>>> It does not appear that anything except for
>>>>> the CHECKPOINT_RESTORE code even cares about this variety of assignment
>>>>> and the CHECKPOINT_RESTORE code only cares about this weirdness because
>>>>> it has to restore these peculiar ids.
>>>>>

My assumption is that if an array is recreated, it should get a
different id.

    a=semget(1234,,);
    semctl(a,,IPC_RMID);
    b=semget(1234,,);

Now a!=b.

Rationale: semop() calls only refer to the array by the id. If there is
a stale process in the system that tries to access the "old" array and
the new array has the same id, then the locking gets corrupted.

>>>>> Therefore make the assignment of ipc ids match the description in
>>>>> Advanced Programming in the Unix Environment and assign the next id
>>>>> until INT_MAX is hit then loop around to the lower ids.
>>>>>

Ok, sounds good. That way we really cycle through INT_MAX; right now,
a==b would happen after 128k RMID calls.

>>>>> This can be implemented trivially with the current code using
>>>>> idr_alloc_cyclic.
>>>>>

Is there a performance impact?
Right now, the idr tree is only large if there are lots of objects.
What happens if we have only 1 object, with id=INT_MAX-1?

semop() calls that do not sleep are fairly fast. The same applies for
msgsnd/msgrcv, if the message is small enough.

@Davidlohr:
Do you know if there are applications that frequently call semop()
where it doesn't have to sleep?
From the scalability work that was pushed into the kernel, I assume
that such applications exist.

I have myself only checked postgresql, and postgresql always sleeps
(and this was long ago).

>>>>> To make it possible to keep checkpoint/restore working I have renamed
>>>>> the sysctls from xxx_next_id to xxx_nextid.  That is enough change that
>>>>> a smart CRIU implementation can see that what is exported has changed,
>>>>> and act accordingly.  New kernels will be able to restore the old id's.
>>>>>
>>>>> This code still needs some real world testing to verify my assumptions.
>>>>> And some work with the CRIU implementations to actually add the code
>>>>> that deals with the new form of id assignment.
>>>>>

It means that all existing checkpoint/restore applications will not
work with a new kernel.
Everyone must first update the checkpoint/restore application, then
update the kernel.

Is this acceptable?

--
    Manfred
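
For reference, the semget()/IPC_RMID/semget() sequence discussed in this
message can be sketched as a small userspace check. This is only an
illustration and not part of the patch: the helper name recreate_ids(),
the single-semaphore array, and the 0600 permissions are assumptions made
for the example. Key 1234 is the one used in the snippet in the message.

```c
/* Ask for the X/Open interfaces (key_t, semget) even in strict C modes. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical helper: create a one-semaphore array for `key`, remove it,
 * create it again, and hand both ids back to the caller.
 *
 * Returns 0 if all calls succeeded, -1 otherwise (errno is left as set by
 * the failing call). IPC_EXCL makes the helper fail instead of silently
 * reusing an array that some other process already owns under this key.
 */
int recreate_ids(key_t key, int *first, int *second)
{
    assert(first != NULL && second != NULL);

    int a = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (a == -1)
        return -1;

    /* Remove the first array; its id is stale from here on. */
    if (semctl(a, 0, IPC_RMID) == -1)
        return -1;

    int b = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (b == -1)
        return -1;

    /* Clean up so that the check can be repeated. */
    (void)semctl(b, 0, IPC_RMID);

    *first = a;
    *second = b;
    return 0;
}
```

Calling recreate_ids(1234, &a, &b) and comparing the two values is enough
to see whether a recreated array gets a fresh id on a given kernel.
Running it in a loop would be one way to observe when an id is first
reused, though that depends on the allocation scheme under discussion.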