Subject: Re: Allocation failure of ring buffer for trace
To: Mel Gorman
References: <9631b871-99cc-82bb-363f-9d429b56f5b9@gmail.com> <20171114114633.6ltw7f4y7qwipcqp@suse.de>
Cc: rostedt@goodmis.org, mingo@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, koki.sanagi@us.fujitsu.com, yasu.isimatu@gmail.com
From: YASUAKI ISHIMATSU
Message-ID: <48b66fc4-ef82-983c-1b3d-b9c0a482bc51@gmail.com>
Date: Tue, 14 Nov 2017 10:39:19 -0500
In-Reply-To: <20171114114633.6ltw7f4y7qwipcqp@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/14/2017 06:46 AM, Mel Gorman wrote:
> On Mon, Nov 13, 2017 at 12:48:36PM -0500, YASUAKI ISHIMATSU wrote:
>> When using trace_buf_size= boot option, memory allocation of ring buffer
>> for trace fails as follows:
>>
>> [ ] x86: Booting SMP configuration:
>>
>>
>> In my server, there are 384 CPUs, 512 GB memory and 8 nodes.
>> And "trace_buf_size=100M" is set.
>>
>> When using trace_buf_size=100M, the kernel allocates 100 MB of memory
>> per CPU before calling free_area_init_core(). The kernel tries to
>> allocate 38.4GB (100 MB * 384 CPUs) of memory. But the available memory
>> at this time is about 16GB (2 GB * 8 nodes) due to the following commit:
>>
>>   3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages
>>   if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
>>
>
> 1. What is the use case for such a large trace buffer being allocated at
> boot time?

I'm not sure of the use case. I found the following commit log:

  commit 864b9a393dcb5aed09b8fd31b9bbda0fdda99374
  Author: Michal Hocko
  Date:   Fri Jun 2 14:46:49 2017 -0700

      mm: consider memblock reservations for deferred memory initialization sizing

So I thought similar memory exhaustion may occur with other boot options,
and I reproduced the issue.

> 2. Is disabling CONFIG_DEFERRED_STRUCT_PAGE_INIT at compile time an
> option for you given that it's a custom-built kernel and not a
> distribution kernel?

The issue also occurred on distribution kernels, so we have to fix it.

Thanks,
Yasuaki Ishimatsu

>
> Basically, as the allocation context is within smp_init(), there are no
> opportunities to do the deferred meminit early. Furthermore, the partial
> initialisation of memory occurs before the size of the trace buffers is
> set so there is no opportunity to adjust the amount of memory that is
> pre-initialised. We could potentially catch when memory is low during
> system boot and adjust the amount that is initialised serially but the
> complexity would be high.
> Given that deferred meminit is basically a minor
> optimisation that only affects very large machines and trace_buf_size being
> used is somewhat specialised, I think the most straight-forward option is
> to go back to serialised meminit if trace_buf_size is specified like this;
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 710143741eb5..6ef0ab13f774 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -558,6 +558,19 @@ void drain_local_pages(struct zone *zone);
>
>  void page_alloc_init_late(void);
>
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +extern void __init disable_deferred_meminit(void);
> +extern void page_alloc_init_late_prepare(void);
> +#else
> +static inline void disable_deferred_meminit(void)
> +{
> +}
> +
> +static inline void page_alloc_init_late_prepare(void)
> +{
> +}
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> +
>  /*
>   * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
>   * GFP flags are used before interrupts are enabled. Once interrupts are
> diff --git a/init/main.c b/init/main.c
> index 0ee9c6866ada..0248b8b5bc3a 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1058,6 +1058,8 @@ static noinline void __init kernel_init_freeable(void)
>  	do_pre_smp_initcalls();
>  	lockup_detector_init();
>
> +	page_alloc_init_late_prepare();
> +
>  	smp_init();
>  	sched_init_smp();
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 752e5daf0896..cfa7175ff093 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1115,6 +1115,13 @@ static int __init set_buf_size(char *str)
>  	if (buf_size == 0)
>  		return 0;
>  	trace_buf_size = buf_size;
> +
> +	/*
> +	 * The size of buffers are unpredictable so initialise all memory
> +	 * before the allocation attempt occurs.
> +	 */
> +	disable_deferred_meminit();
> +
>  	return 1;
>  }
>  __setup("trace_buf_size=", set_buf_size);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 77e4d3c5c57b..4dd0e153b0f2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -290,6 +290,19 @@ EXPORT_SYMBOL(nr_online_nodes);
>  int page_group_by_mobility_disabled __read_mostly;
>
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +bool __initdata deferred_meminit_disabled;
> +
> +/*
> + * Allow deferred meminit to be disabled by subsystems that require large
> + * allocations before the memory allocator is fully initialised. It should
> + * only be used in cases where the size of the allocation may not fit into
> + * the 2G per node that is allocated serially.
> + */
> +void __init disable_deferred_meminit(void)
> +{
> +	deferred_meminit_disabled = true;
> +}
> +
>  static inline void reset_deferred_meminit(pg_data_t *pgdat)
>  {
>  	unsigned long max_initialise;
> @@ -1567,6 +1580,23 @@ static int __init deferred_init_memmap(void *data)
>  }
>  #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +/*
> + * Serialised init of remaining memory if large buffers of unknown size
> + * are required that might fail before parallelised meminit can start
> + */
> +void __init page_alloc_init_late_prepare(void)
> +{
> +	int nid;
> +
> +	if (!deferred_meminit_disabled)
> +		return;
> +
> +	for_each_node_state(nid, N_MEMORY)
> +		deferred_init_memmap(NODE_DATA(nid));
> +}
> +#endif
> +
>  void __init page_alloc_init_late(void)
>  {
>  	struct zone *zone;