Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp25877imu; Thu, 8 Nov 2018 04:30:08 -0800 (PST) X-Google-Smtp-Source: AJdET5dRDbNCO9aRfiNAIX6ZCK500xVDsgiIjGIWX5V4jztk9c0mQl341hrQL6HvOOAwoXoqLd6H X-Received: by 2002:a62:2545:: with SMTP id l66-v6mr4359290pfl.207.1541680208521; Thu, 08 Nov 2018 04:30:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541680208; cv=none; d=google.com; s=arc-20160816; b=fkoBU6soFgF2Jox/F0ONJj+6cVcZTBHD4ABALVjDFgGxXRKQjGi4oVaLvMAn7ZOUFG D9O4NXY1svhOSIxst9ga7SlobIN/ld7kgIfVBl0XmuzznasDoM5ZRiG6BG2RXuF2IJUx JAcrPI4g6P5AWA+JLkew3lo9GwCu+X9ibDzRTf25x7s4H+lteFroqT7L+CJmcJvOeq/N xmX+C/JBGU/HiUPWxBjbnc1bMNNUcgGO7rxJl2XW+lNkxRiTCMQ+Thgp0u7RrgwFat8G uIhdK/bTKg+HehHhGFSmo3EVN5LLWp/ZeHkqIrUtwI4X6aZ/UvsZq1P39+/mbZn1MgHi G7VA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=7j5d88Q/iIP3Vh0jE+BHQjAFVnGes3gX6JwGr4dy4Vo=; b=LTw+g7iTRUHQHY2HZRpv0DAa0TAMtYXgX4BqjwT97oy9mSRD0ZS3GUl1uuiB3CKbv0 yEJXtYrVTrwYyB8X8c2LkUDGJNLRbg3n2swG7QWHpRQhyuPjKxIOi0jrFfMBnPxfYYbe nXXVllRhd/8BOkuyAe4HNyjD5YDyF9XEc22fNPVN9STABFp7WUKpC3Kegr38zMgclj8l el1voJ7PQl3twtPLmW5kr+m+XHJKy7s/H6B0p0wHTMDdP2ZRIyZlvrF3PN9JLnBpU9Ux WYWKUc0yArHfLhSdXY7/x3kOzp1ScxaJyjlxLngIqWbE/HVMr4vdI47PDN7TDQgBkaMh s2gg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=gPLqYPgh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 33-v6si4211614pls.173.2018.11.08.04.29.49; Thu, 08 Nov 2018 04:30:08 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=gPLqYPgh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726756AbeKHWDM (ORCPT + 99 others); Thu, 8 Nov 2018 17:03:12 -0500 Received: from bombadil.infradead.org ([198.137.202.133]:34490 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726405AbeKHWDM (ORCPT ); Thu, 8 Nov 2018 17:03:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=7j5d88Q/iIP3Vh0jE+BHQjAFVnGes3gX6JwGr4dy4Vo=; b=gPLqYPghJwIQMH3ovYWGP+v22 XIqlMfpWb0LSguiXXvgdc0KmTNWVYZDlLx7Sc6VLvTnDp4RveEffe8c/HJthaGTM8NED9zkhWvWSr o23oIiCi4yCUXMBNjFxGLZ1FCreDZ8bbVbGulTMJ8I/lkgwdGT9PgXbGJbVYsAqUBpUZj6iAp0cWV oloYl2Vj8jXivd9/YBh1a8TMQx/qa1f9MKBjAkjeiGEywZTU8pd1zKvhu4s29ZiGm0OXzIbcnKc5i JVi3AFofC0tApI13ntGhdUWy/7Zy5FX3LRS11+rReKov5Gry8pSTFiHvRoaKjNUVALjqs0D2g3wKj T2JhgPVTg==; Received: from willy by bombadil.infradead.org with local (Exim 4.90_1 #2 (Red Hat Linux)) id 1gKjPX-0007dc-W7; Thu, 08 Nov 2018 12:27:48 +0000 Date: Thu, 8 Nov 2018 04:27:47 -0800 From: Matthew Wilcox To: David Laight Cc: 'Martin Steigerwald' , Michal Hocko , Daniel Colascione , linux-kernel , "rppt@linux.ibm.com" , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Jonathan Corbet , Andrew Morton , Roman Gushchin , Mike Rapoport , Vlastimil Babka , "Kirill A. Shutemov" , "Dennis Zhou (Facebook)" , Prashant Dhamdhere , "open list:DOCUMENTATION" Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior Message-ID: <20181108122747.GM3074@bombadil.infradead.org> References: <20181031150625.147369-1-dancol@google.com> <20181107160015.GI27423@dhcp22.suse.cz> <4536090.43ZsV6LvYe@merkaba> <0c5610f128fa49fb9d8f7859e6f61b90@AcuMS.aculab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0c5610f128fa49fb9d8f7859e6f61b90@AcuMS.aculab.com> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote: > From: Martin Steigerwald > > Sent: 07 November 2018 17:05 > ... > > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16 > > bits. Right? Could it be raised to 32 bits? I bet it would be a major > > change throughout different parts of the kernel. > > It is probably 15 bits (since -ve pid numbers are used for process groups). > > My guess is that userspace and the system call interface will handle 32bit > (signed) pid numbers. > (I don't remember 'linux emulation' being one of the emulations that > would truncate 32bit pids when one of the BDSs went to 32bit pids.) > The main problem will be that big numbers will mess up the alignment > of printouts from ps and top (etc). > This can be mitigated by only allocating 'big' numbers on systems > that have a lot of pids. > You also really want an O(1) allocator. The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode. n in this case is the highest ID which is still in use. The tree is log_64(n) levels high. It walks to the bottom of the tree and puts a pointer into the tree. If the cursor has wrapped to the beginning of the tree, it may encounter a PID which is still in use; if it does, it does a bitmap scan of that node, and will then walk up the tree, doing a bitmap scan forward at each level until it finds a free PID. So it's not exactly O(log(n)), but it's close enough for all practical purposes. And more importantly, it doesn't touch a lot of cachelines. Two or three at each level of the tree that it accesses. If we went all the way to a 32-bit PID, the tree would grow to 6 levels deep, and worst-case would touch 6 + 5 + 4 levels of the tree (starting with trying to allocate PID 0xffffffff, failing, trying to allocate PID 300, then having to walk all the way forward to find PID 0xe0000000), so that's only 45 cachelines. People care a little too much about O(1)/O(n) behaviour. Cacheline behaviour, and good average-case performance without falling off a cliff in the worst case is much more important.