Received: by 2002:a25:23cc:0:0:0:0:0 with SMTP id j195csp1498716ybj; Tue, 5 May 2020 22:58:14 -0700 (PDT) X-Google-Smtp-Source: APiQypLf0MtX0VP7V6/6iwOd3Dx4tVkIOTNNCBk8LN5/C51bewDgA67/iRrpE4A3/noZDrq2tRiU X-Received: by 2002:a17:907:1119:: with SMTP id qu25mr5600860ejb.291.1588744694264; Tue, 05 May 2020 22:58:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588744694; cv=none; d=google.com; s=arc-20160816; b=Qwaej89sgdQYWI9R3pY35T4WCoXo1KQdEL6njqqDZ8eB+8LmjLdVmFGnWxgLWiTGE/ kCM26cEkyrfMGNs4Afa6fdFfmGCR9OfLOtQ2QICXyOcUjvbOdKG6rKaVV84XxaJ42gaH Z75XTNjeD3JySxKkYsw9ZhLmMV8tSk0R/Trw45tN6reNm/Kd/IxxSNfLmy7ki2gaxy6R 0dOhA/oxfsXWEl5Zt/o9edPS+7GA6IQHl2qWRk28k1SX/4qicj6Qr/2NpuQ++1exZy1B xRvRu/dKl9NEzRqdHkXFza+arGr2GI20l2byHtMENGtxe12atmv/tTZ7zr9tKxBWurXj BSGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=sl4voKL7gKELkVzevbqQNkOxzEzY5tVWRy9dTEzJtBw=; b=k1HKeJn2unX+VjUyZ2Cg314g/ED6DDqFArOuFyTfooOETmFckL1fWvMsSUnp/iv1+g gKXM2n7suuw7F1GP7AL9cZ1fNn6k8vcsogq70cnanpj4QY6Dg6G0uqVMQ5M0Wnz52wL+ fLi8pYuvUwfz7XUF5plwWG6f+HKlCQAIu9Rjg/hfGp/2TnhAu6xjoDz8RJ7jeupi28UN IxswO9URZjlXZNVBeLM3GQ4t3AQC3GPPtprA6Oyw2zQAQo3wdUdtWilASrQdjwwuUb1l RiTvi1gudjFN7ip7HooI3AuIe7EUi8CgRZhV111FTm1oMDdsKN9WKD5J6QRj/HzBNCv5 hA2A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g22si592613ejr.228.2020.05.05.22.57.30; Tue, 05 May 2020 22:58:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727794AbgEFFxd (ORCPT + 99 others); Wed, 6 May 2020 01:53:33 -0400 Received: from lgeamrelo12.lge.com ([156.147.23.52]:53637 "EHLO lgeamrelo11.lge.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725771AbgEFFxc (ORCPT ); Wed, 6 May 2020 01:53:32 -0400 X-Greylist: delayed 1801 seconds by postgrey-1.27 at vger.kernel.org; Wed, 06 May 2020 01:53:30 EDT Received: from unknown (HELO lgeamrelo02.lge.com) (156.147.1.126) by 156.147.23.52 with ESMTP; 6 May 2020 14:23:27 +0900 X-Original-SENDERIP: 156.147.1.126 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Received: from unknown (HELO localhost) (10.177.220.187) by 156.147.1.126 with ESMTP; 6 May 2020 14:23:27 +0900 X-Original-SENDERIP: 10.177.220.187 X-Original-MAILFROM: iamjoonsoo.kim@lge.com Date: Wed, 6 May 2020 14:23:27 +0900 From: Joonsoo Kim To: "Eric W. Biederman" Cc: Andrew Morton , Linux Memory Management List , LKML , Vlastimil Babka , Laura Abbott , "Aneesh Kumar K . V" , Mel Gorman , Michal Hocko , Johannes Weiner , Roman Gushchin , Minchan Kim , Rik van Riel , Christian Koenig , Huang Rui , "Rafael J . Wysocki" , Pavel Machek , kernel-team@lge.com, Christoph Hellwig , Kexec Mailing List Subject: Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Message-ID: <20200506052327.GA25974@js1304-desktop> References: <1588130803-20527-1-git-send-email-iamjoonsoo.kim@lge.com> <1588130803-20527-4-git-send-email-iamjoonsoo.kim@lge.com> <87h7wzvjko.fsf@x220.int.ebiederm.org> <87ftcfpzjn.fsf@x220.int.ebiederm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87ftcfpzjn.fsf@x220.int.ebiederm.org> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 04, 2020 at 09:03:56AM -0500, Eric W. Biederman wrote: > > I have added in the kexec mailling list. > > Looking at the patch we are discussing it appears that the kexec code > could be doing much better in highmem situations today but is not. Sound great! > > > Joonsoo Kim writes: > > > 2020년 5월 1일 (금) 오후 11:06, Eric W. Biederman 님이 작성: > >> > >> js1304@gmail.com writes: > >> > >> > From: Joonsoo Kim > >> > > >> > Until now, PageHighMem() is used for two different cases. One is to check > >> > if there is a direct mapping for this page or not. The other is to check > >> > the zone of this page, that is, weather it is the highmem type zone or not. > >> > > >> > Now, we have separate functions, PageHighMem() and PageHighMemZone() for > >> > each cases. Use appropriate one. > >> > > >> > Note that there are some rules to determine the proper macro. > >> > > >> > 1. If PageHighMem() is called for checking if the direct mapping exists > >> > or not, use PageHighMem(). > >> > 2. If PageHighMem() is used to predict the previous gfp_flags for > >> > this page, use PageHighMemZone(). The zone of the page is related to > >> > the gfp_flags. > >> > 3. If purpose of calling PageHighMem() is to count highmem page and > >> > to interact with the system by using this count, use PageHighMemZone(). > >> > This counter is usually used to calculate the available memory for an > >> > kernel allocation and pages on the highmem zone cannot be available > >> > for an kernel allocation. > >> > 4. Otherwise, use PageHighMemZone(). It's safe since it's implementation > >> > is just copy of the previous PageHighMem() implementation and won't > >> > be changed. > >> > > >> > I apply the rule #2 for this patch. > >> > >> Hmm. > >> > >> What happened to the notion of deprecating and reducing the usage of > >> highmem? I know that we have some embedded architectures where it is > >> still important but this feels like it flies in the face of that. > > > > AFAIK, deprecating highmem requires some more time and, before then, > > we need to support it. > > But it at least makes sense to look at what we are doing with highmem > and ask if it makes sense. > > >> This part of kexec would be much more maintainable if it had a proper > >> mm layer helper that tested to see if the page matched the passed in > >> gfp flags. That way the mm layer could keep changing and doing weird > >> gyrations and this code would not care. > > > > Good idea! I will do it. > > > >> > >> What would be really helpful is if there was a straight forward way to > >> allocate memory whose physical address fits in the native word size. > >> > >> > >> All I know for certain about this patch is that it takes a piece of code > >> that looked like it made sense, and transfroms it into something I can > >> not easily verify, and can not maintain. > > > > Although I decide to make a helper as you described above, I don't > > understand why you think that a new code isn't maintainable. It is just > > the same thing with different name. Could you elaborate more why do > > you think so? > > Because the current code is already wrong. It does not handle > the general case of what it claims to handle. When the only distinction > that needs to be drawn is highmem or not highmem that is likely fine. > But now you are making it possible to draw more distinctions. At which > point I have no idea which distinction needs to be drawn. > > > The code and the logic is about 20 years old. When it was written I > don't recally taking numa seriously and the kernel only had 3 zones > as I recall (DMA aka the now deprecated GFP_DMA, NORMAL, and HIGH). > > The code attempts to work around limitations of those old zones amd play > nice in a highmem world by allocating memory HIGH memory and not using > it if the memory was above 4G ( on 32bit ). > > Looking the kernel now has GFP_DMA32 so on 32bit with highmem we should > probably be using that, when allocating memory. > From quick investigation, unfortunately, ZONE_DMA32 isn't available on x86 32bit now so using GFP_DMA32 to allocate memory below 4G would not work. Enabling ZONE_DMA32 on x86 32bit would be not simple, so, IMHO, it would be better to leave the code as it is. > > > Further in dealing with this memory management situation we only > have two situations we call kimage_alloc_page. > > For an indirect page which must have a valid page_address(page). > We could probably relax that if we cared to. > > For a general kexec page to store the next kernel in until we switch. > The general pages can be in high memory. > > In a highmem world all of those pages should be below 32bit. > > > > Given that we fundamentally have two situations my sense is that we > should just refactor the code so that we never have to deal with: > > > /* The old page I have found cannot be a > * destination page, so return it if it's > * gfp_flags honor the ones passed in. > */ > if (!(gfp_mask & __GFP_HIGHMEM) && > PageHighMem(old_page)) { > kimage_free_pages(old_page); > continue; > } > > Either we teach kimage_add_entry how to work with high memory pages > (still 32bit accessible) or we teach kimage_alloc_page to notice it is > an indirect page allocation and to always skip trying to reuse the page > it found in that case. > > That way the code does not need to know about forever changing mm internals. Nice! I already have seen your patch and found that above two lines related to HIGHMEM are removed. Thanks for your help. Thanks.