Date: Tue, 9 Mar 2021 17:32:08 +0100
From: Michal Hocko
To: Minchan Kim
Cc: Andrew Morton, linux-mm, LKML, John Dias, David Hildenbrand, Jason Baron
Subject: Re: [PATCH v2] mm: page_alloc: dump migrate-failed pages
References: <20210308202047.1903802-1-minchan@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 09-03-21 08:15:41, Minchan Kim wrote:
> On Tue, Mar 09, 2021 at 10:32:51AM +0100, Michal Hocko wrote:
> > On Mon 08-03-21 12:20:47, Minchan Kim wrote:
> > > alloc_contig_range is usually used on cma area or movable zone.
> > > It's critical if the page migration fails on those areas so
> > > dump more debugging message.
> >
> > I disagree with this statement. alloc_contig_range is not a reliable
> > allocator. Any user, be it CMA or direct users of alloc_contig_range
> > have to deal with allocation failures. Debugging information can be
> > still useful but considering migration failures critical is
> > overstatement to say the least.
>
> Fair enough. Let's change it.
>
> "Currently, debugging CMA allocation failure is too hard
> due to lacking of page information. alloc_contig_range is
> proper place to dump them since it has migrate-failed page
> list."

"Currently, debugging CMA allocation failures is quite limited. The most
common source of these failures seems to be page migration, which doesn't
provide any useful information on the reason for the failure by itself.
alloc_contig_range can report those failures as it holds a list of
migrate-failed pages."

> > > page refcount, mapcount with page flags on dump_page are
> > > helpful information to deduce the culprit. Furthermore,
> > > dump_page_owner was super helpful to find long term pinner
> > > who initiated the page allocation.
> > >
> > > Admin could enable the dump like this(by default, disabled)
> > >
> > > echo "func dump_migrate_failure_pages +p" > control
> > >
> > > Admin could disable it.
> > >
> > > echo "func dump_migrate_failure_pages =_" > control
> >
> > My original idea was to add few pr_debug and -DDYNAMIC_DEBUG_MODULE for
> > page_alloc.c. It makes sense to enable a whole bunch at once though.
> > The naming should better reflect this is alloc_contig_rage related
> > because the above sounds like a generic migration failure thing.
>
> alloc_contig_dump_pages?

Yes, this is clearer.

> > Somebody more familiar with the dynamic debugging infrastructure needs
> > to have a look but from from a quick look it seems ok.
> >
> > Do we really need all the ugly ifdefery, though? Don't we want to have
> > this compiled in all the time and just rely on the static branch managed
> > by the dynamic debugging framework?
>
> I have no further idea to make it simple while we keep the flexibility
> for arguments and print format.
>
> #if defined(CONFIG_DYNAMIC_DEBUG) || \
> 	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> static void alloc_contig_dump_pages(struct list_head *page_list)
> {
> 	static DEFINE_RATELIMIT_STATE(_rs,
> 					DEFAULT_RATELIMIT_INTERVAL,
> 					DEFAULT_RATELIMIT_BURST);
>
> 	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> 					"migrate failure");
> 	if (DYNAMIC_DEBUG_BRANCH(descriptor) && __ratelimit(&_rs)) {
> 		struct page *page;
>
> 		WARN(1, "failed callstack");
> 		list_for_each_entry(page, page_list, lru)
> 			dump_page(page, "migration failure");
> 	}
> }
> #else
> static inline void alloc_contig_dump_pages(struct list_head *page_list)
> {
> }
> #endif

First, you would be much better off dropping the rate limiting. I am
not really convinced it is necessary, as this is a debugging aid
enabled on request. A single list can be large enough to swamp logs, so
why bother?

Also, are all those CONFIG_DYNAMIC_DEBUG* ifdefs necessary? Can we simply
enable DYNAMIC_DEBUG for page_alloc as I've suggested above?

-- 
Michal Hocko
SUSE Labs
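A userspace model of the rate limiting point (not kernel code). struct fake_page, dump_enabled, fake_ratelimit(), dump_list(), dump_failed_pages() and dump_failed_pages_ratelimited() are all hypothetical stand-ins for struct page, the dynamic debug static branch, __ratelimit() and dump_page(), and the ratelimit here is a simplified sketch rather than the kernel's implementation. What it shows: the ratelimit gates calls, not output lines, so a call it lets through still dumps every page on the list, however long. The unlimited variant is compiled in unconditionally and relies only on the runtime switch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct page fields dump_page() reports. */
struct fake_page {
	unsigned long pfn;
	int refcount;
	int mapcount;
	unsigned long flags;
	struct fake_page *next;	/* singly linked, instead of the lru list_head */
};

/* Models the static branch flipped by "echo ... +p > control". */
static bool dump_enabled;

/* Simplified ratelimit state: at most 'burst' calls per window. */
struct fake_ratelimit {
	long interval;	/* window length in arbitrary ticks */
	int burst;	/* calls allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* calls allowed so far in this window */
};

static bool fake_ratelimit(struct fake_ratelimit *rs, long now)
{
	if (now - rs->begin > rs->interval) {	/* window expired: start a new one */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Stand-in for dump_page() over the list: one line per page, returns the count. */
static size_t dump_list(const struct fake_page *list)
{
	size_t n = 0;

	for (const struct fake_page *p = list; p; p = p->next) {
		printf("page pfn:%#lx refcount:%d mapcount:%d flags:%#lx: migration failure\n",
		       p->pfn, p->refcount, p->mapcount, p->flags);
		n++;
	}
	return n;
}

/* Suggested shape: always compiled in, gated only by the runtime switch. */
static size_t dump_failed_pages(const struct fake_page *list)
{
	if (!dump_enabled)
		return 0;
	return dump_list(list);
}

/* v2 shape: the ratelimit decides whether a call proceeds, not how much it prints. */
static size_t dump_failed_pages_ratelimited(struct fake_ratelimit *rs, long now,
					    const struct fake_page *list)
{
	if (!dump_enabled || !fake_ratelimit(rs, now))
		return 0;
	return dump_list(list);
}
```

For example, with burst = 1 and a five-page list, the first ratelimited call dumps all five pages and a second call inside the same window dumps none; the ratelimit never trims a long list.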