Subject: Re: ODEBUG: Out of memory. ODEBUG disabled
From: Qian Cai
Date: Tue, 13 Nov 2018 19:55:55 -0500
To: linux kernel
Cc: Jan Stancek, Andrew Morton, Du Changbin, Christian Borntraeger
X-Mailing-List: linux-kernel@vger.kernel.org

> On Nov 12, 2018, at 11:33 PM, Qian Cai wrote:
>
>> On Nov 10, 2018, at 9:11 AM, Qian Cai wrote:
>>
>> On 11/10/18 at 8:59 AM, Waiman Long wrote:
>>
>>> On 11/09/2018 08:45 PM, Qian Cai wrote:
>>>>> Sent: Friday, November 09, 2018 at 5:08 PM
>>>>> From: "Waiman Long"
>>>>> To: "Qian Cai", "Yang Shi"
>>>>> Cc: "open list", "Thomas Gleixner", "Arnd Bergmann", "Joel Fernandes (Google)", "Zhong Jiang"
>>>>> Subject: Re: ODEBUG: Out of memory. ODEBUG disabled
>>>>>
>>>>> On 11/09/2018 04:51 PM, Qian Cai wrote:
>>>>>>> On Nov 9, 2018, at 4:42 PM, Yang Shi wrote:
>>>>>>>
>>>>>>> On 11/9/18 1:36 PM, Qian Cai wrote:
>>>>>>>> It is a bit annoying on this aarch64 server with 64 CPUs that is
>>>>>>>> booting the latest mainline (3541833fd1f2) causes object debugging
>>>>>>>> always running out of memory.
>>>>>>> May you please paste the detail failure log?
>>>>>> I assume you mean dmesg.
>>>>>>
>>>>>> Here is the dmesg for 64 CPUs,
>>>>>> https://paste.ubuntu.com/p/BnhvXXhn7k/
>>>>>>>> I have to boot the kernel with only 16 CPUs instead (nr_cpus=16)
>>>>>>>> to make it work. Is it expected that object debugging is not going
>>>>>>>> to work with large machines?
>>>>>>> I don't think so. I'm supposed it works well with large CPU number on x86.
>>>>>> Here is the one with nr_cpus workaround,
>>>>>> https://paste.ubuntu.com/p/qMpd2CCPSV/
>>>>> The debugobjects code have a set of 1024 statically allocated debug
>>>>> objects that can be used in early boot before the slab memory allocator
>>>>> is initialized. Apparently, the system may have used up all the
>>>>> statically allocated objects. Try double ODEBUG_POOL_SIZE to see if it
>>>>> helps.
>>>> Great, you are right. Doubling the size makes it work. Does it make sense
>>>> to have a kconfig option instead?
>>>
>>> First, I think you need to figure out what your system needed to use up
>>> so many debug objects in early boot. If there is a legitimate reason for
>>> this behavior, we can talk about having a kconfig option to increase that.
>> Anybody else not getting ODEBUG OOM with more than 64-CPU? As
>> mentioned, restricting to 16-CPU works fine. How can I figure out why the
>> system uses so much debug objects?
> On another aarch64 server with 256-CPU, even double the size of
> ODEBUG_POOL_SIZE, i.e., 2048 will get "ODEBUG: Out of memory. ODEBUG
> disabled".

OK, here is the problem. To get aarch64 working, the initial
ODEBUG_POOL_SIZE needs to be:

  64 CPUs:  2048
  256 CPUs: 8192 (4096 is too small)

Commit 97dd552eb23c added:

+ * Increase the thresholds for allocating and freeing objects
+ * according to the number of possible CPUs available in the system.
+ */
+ debug_objects_pool_size += num_possible_cpus() * 32;

Why the magic number 32? It needs to be bigger than that for aarch64.
Solving for the per-CPU factor k for which 1024 + N x k equals the
working pool size plus N x 32 (N = number of CPUs):

  (2048 + 64 x 32 - 1024) / 64 = 48    (works on 64 CPUs)
  (4096 + 256 x 32 - 1024) / 256 = 44  (does not work on 256 CPUs)
  (8192 + 256 x 32 - 1024) / 256 = 60  (works on 256 CPUs)