Date: Wed, 1 Nov 2017 17:32:22 -0400
From: Mike Snitzer
To: "Paul E. McKenney"
Cc: dm-devel@redhat.com, Mikulas Patocka, linux-kernel@vger.kernel.org,
 "Alasdair G. Kergon", Zdenek Kabelac
Subject: Re: SRCU's apparent use of NR_CPUS?
 [was: re: dm: allocate struct mapped_device with kvzalloc]
Message-ID: <20171101213222.GA27306@redhat.com>
References: <20171101154844.GA25792@redhat.com> <20171101162306.GU3659@linux.vnet.ibm.com>
In-Reply-To: <20171101162306.GU3659@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 01 2017 at 12:23pm -0400,
Paul E. McKenney wrote:

> On Wed, Nov 01, 2017 at 11:48:44AM -0400, Mike Snitzer wrote:
> > [cc'ing Paul, and LKML, to get his/others' take on SRCU cpu scaling]
> >
> > On Tue, Oct 31 2017 at 7:33pm -0400,
> > Mikulas Patocka wrote:
> >
> > > The structure srcu_struct can be very big, its size is proportional to the
> > > value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
> > > io_barrier in the struct mapped_device has 84kB in the debugging kernel
> > > and 50kB in the non-debugging kernel. The large size may result in failure
> > > of the function kzalloc_node.
> > >
> > > In order to avoid the allocation failure, we use the function
> > > kvzalloc_node, this function falls back to vmalloc if a large contiguous
> > > chunk of memory is not available. This patch also moves the field
> > > io_barrier to the last position of struct mapped_device - the reason is
> > > that on many processor architectures, short memory offsets result in
> > > smaller code than long memory offsets - on x86-64 it reduces code size by
> > > 320 bytes.
> > >
> > > Note to stable kernel maintainers - the kernels 4.11 and older don't have
> > > the function kvzalloc_node, you can use the function vzalloc_node instead.
> > >
> > > Signed-off-by: Mikulas Patocka
> > > Cc: stable@vger.kernel.org
> >
> > This looks reasonable as a near-term workaround.. BUT:
> > Paul, has there been any discussion about how to make SRCU support
> > dynamically scaling up to NR_CPUS maximum as 'nr_cpus' changes (rather
> > than accounting for worst case of NR_CPUS up-front)?
>
> This is the first I have heard of this being a problem.
>
> For static instances of srcu_struct, life is hard.
>
> But it should not be all that difficult for SRCU to provide an allocator
> for the dynamic cases, which given your kzalloc_node() above is the case
> you are worried about, at least assuming that these allocations happen
> after rcu_init() is invoked (which is pretty early).
>
> My approach would be to move the srcu_struct ->node[] array to its
> own structure, with a pointer from srcu_struct, allowing short-sized
> allocations to be used. (But I do need to check to make sure that there
> are no gotchas, and with RCU there usually are a few.) Obviously some
> -serious- testing would be required -- do you have a range of systems
> to test on?

If you'd like to give it a try, I'd be happy to work on getting you test
coverage. I do have access to a pretty wide range of systems.

What type of testing would you like to see?

(From where I sit as DM maintainer, my testing would be DM-specific:
just loading a DM device would exercise the SRCU code in question.
Please let me know if there is anything more general you'd like done.)

> However, you would still have your potential failure case for systems
> that really did have large numbers of CPUs, some of which really do
> exist in the wild.
>
> > (But I had a quick look at srcutree.h and I'm not seeing explicit use of
> > NR_CPUS, so it is likely occurring via implicit percpu through some
> > member of 'struct srcu_struct', e.g. 'sda'?)
>
> The srcu_struct structure sees NR_CPUS via include/linux/rcu_node_tree.h,
> which sizes the srcu_node array at build time.
>
> The sda pointer references a per-CPU allocation, which I believe already
> is sized to the actual system rather than to NR_CPUS.

OK, thanks for clarifying.

Mike
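
For reference, the layout change Paul describes above (moving the ->node[]
array out of srcu_struct and reaching it through a pointer) can be sketched
in plain userspace C. This is only an illustration under stated assumptions,
not the kernel's code: every toy_* name, MAX_CPUS, and CPUS_PER_NODE are
hypothetical, the one-node-per-16-CPUs geometry is invented for the example,
and calloc() stands in for whatever allocator SRCU would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_NR_CPUS (8192 is the Fedora value
 * quoted in the thread). */
#define MAX_CPUS 8192
/* Hypothetical fan-out, NOT the kernel's real tree geometry. */
#define CPUS_PER_NODE 16
#define MAX_NODES (MAX_CPUS / CPUS_PER_NODE)

/* Hypothetical placeholder for struct srcu_node. */
struct toy_node {
	unsigned long seq[4];
	int lo_cpu;
	int hi_cpu;
};

/* Embedded layout: the array is sized for the build-time maximum, so
 * every instance pays for MAX_CPUS whether or not they exist. */
struct toy_srcu_embedded {
	struct toy_node node[MAX_NODES];
	int nr_nodes;
};

/* Indirect layout: the containing structure stays small and the array
 * is allocated separately, sized for the CPUs actually present. */
struct toy_srcu_indirect {
	struct toy_node *node;
	int nr_nodes;
};

/* Number of nodes needed for nr_cpus CPUs, rounding up. */
int toy_nodes_for(int nr_cpus)
{
	assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS);
	return (nr_cpus + CPUS_PER_NODE - 1) / CPUS_PER_NODE;
}

/* Allocate the node array at run time; returns 0 on success, -1 on
 * allocation failure. */
int toy_srcu_init(struct toy_srcu_indirect *sp, int nr_cpus)
{
	sp->nr_nodes = toy_nodes_for(nr_cpus);
	sp->node = calloc((size_t)sp->nr_nodes, sizeof(*sp->node));
	if (!sp->node) {
		sp->nr_nodes = 0;
		return -1;
	}
	return 0;
}

/* Release the separately allocated node array. */
void toy_srcu_cleanup(struct toy_srcu_indirect *sp)
{
	free(sp->node);
	sp->node = NULL;
	sp->nr_nodes = 0;
}
```

Calling toy_srcu_init(&s, 64) allocates only four nodes, while struct
toy_srcu_embedded always carries MAX_NODES of them. As Paul notes, a system
that really does have close to the maximum number of CPUs would still need
the large allocation; the indirection only helps the common case.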