Date: Wed, 23 Oct 2019 09:31:43 +0100
From: Mel Gorman
To: Michal Hocko
Cc: Waiman Long, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Johannes Weiner, Roman Gushchin,
 Vlastimil Babka, Konstantin Khlebnikov, Jann Horn, Song Liu,
 Greg Kroah-Hartman, Rafael Aquini, Mel Gorman
Subject: Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo
Message-ID: <20191023083143.GC3016@techsingularity.net>
References: <20191022162156.17316-1-longman@redhat.com> <20191022165745.GT9379@dhcp22.suse.cz>
In-Reply-To:
 <20191022165745.GT9379@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 22, 2019 at 06:57:45PM +0200, Michal Hocko wrote:
> [Cc Mel]
>
> On Tue 22-10-19 12:21:56, Waiman Long wrote:
> > The pagetypeinfo_showfree_print() function prints out the number of
> > free blocks for each of the page orders and migrate types. The current
> > code just iterates each of the free lists to get counts. There are
> > bug reports about hard lockup panics when reading the /proc/pagetypeinfo
> > file just because it took too long to iterate all the free lists within
> > a zone while holding the zone lock with irq disabled.
> >
> > Given the fact that /proc/pagetypeinfo is readable by all, the possibility
> > of crashing a system by the simple act of reading /proc/pagetypeinfo
> > by any user is a security problem that needs to be addressed.
>
> Should we make the file 0400? It is a useful thing when debugging but
> not something regular users would really need for life.
>

I think this would be useful in general. The information is not that
useful outside of debugging. Even then it's only useful when trying to
get a handle on why a path like compaction is taking too long.

> > There is a free_area structure associated with each page order. There
> > is also a nr_free count within the free_area for all the different
> > migration types combined. Tracking the number of free list entries
> > for each migration type will probably add some overhead to the fast
> > paths like moving pages from one migration type to another which may
> > not be desirable.
>
> Have you tried to measure that overhead?
>

I would prefer this option not be taken. It would increase the cost of
watermark calculations, which is a relatively fast path.

> > we can actually skip iterating the list of one of the migration types
> > and use nr_free to compute the missing count.
> > Since MIGRATE_MOVABLE
> > is usually the largest one on large memory systems, this is the one
> > to be skipped. Since the printing order is migration-type => order, we
> > will have to store the counts in an internal 2D array before printing
> > them out.
> >
> > Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> > zone lock for too long blocking out other zone lock waiters from being
> > run. This can be problematic for systems with a large amount of memory.
> > So a check is added to temporarily release the lock and reschedule if
> > more than 64k of list entries have been iterated for each order. With
> > a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> > entries before releasing the lock.
>
> But you are still iterating through the whole free_list at once so if it
> gets really large then this is still possible. I think it would be
> preferable to use per migratetype nr_free if it doesn't cause any
> regressions.
>

I think it will. The patch as it is contains the overhead within the
reader of the pagetypeinfo proc file, which is a non-critical path. The
page allocator paths, on the other hand, are very important.

-- 
Mel Gorman
SUSE Labs
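
For readers following the thread, the approach in the quoted patch
description can be sketched in user space roughly as below. This is not
the patch and not kernel code. struct free_area_model, count_free(),
worst_case_entries(), NR_TYPES and SKIP_TYPE are hypothetical names made
up for the illustration, and the zone lock and the actual list walk are
not modelled. Only the 64k batch size and the MAX_ORDER of 11 come from
the thread.

```c
#include <assert.h>

/* Illustration only: a user-space model, not kernel code. */
#define MAX_ORDER   11        /* value quoted in the thread */
#define NR_TYPES    4         /* assumed number of migrate types */
#define SKIP_TYPE   1         /* stands in for MIGRATE_MOVABLE */
#define BATCH_LIMIT 65536UL   /* "64k" list entries per order */

/* Hypothetical stand-in for the per-order free_area structure. */
struct free_area_model {
    unsigned long list_len[NR_TYPES]; /* length of each free list */
    unsigned long nr_free;            /* combined count, all types */
};

/*
 * Fill counts[type][order]. Every list except SKIP_TYPE is "walked"
 * (modelled here by reading its length); the skipped type is derived
 * as nr_free minus the sum of the others. Returns how many list
 * entries would have been walked.
 */
static unsigned long count_free(const struct free_area_model area[MAX_ORDER],
                                unsigned long counts[NR_TYPES][MAX_ORDER])
{
    unsigned long walked = 0;

    for (int order = 0; order < MAX_ORDER; order++) {
        unsigned long others = 0;

        for (int type = 0; type < NR_TYPES; type++) {
            if (type == SKIP_TYPE)
                continue;
            counts[type][order] = area[order].list_len[type];
            others += counts[type][order];
            walked += counts[type][order];
        }

        /* nr_free covers all types, so the remainder is the skipped one. */
        assert(area[order].nr_free >= others);
        counts[SKIP_TYPE][order] = area[order].nr_free - others;
    }
    return walked;
}

/* Upper bound on entries walked if each order is capped at BATCH_LIMIT. */
static unsigned long worst_case_entries(void)
{
    return BATCH_LIMIT * MAX_ORDER;
}
```

The counts go into a 2D array because the output is printed by
migration type first and order second, while the walk runs order by
order. With the numbers above, worst_case_entries() gives 65536 * 11 =
720896, which matches the "about 700k" figure in the quoted text. The
trade-off discussed in the thread is that all of this cost stays in the
/proc/pagetypeinfo reader rather than in the page allocator paths.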