Date: Mon, 21 Nov 2005 17:17:12 +0100
From: Takashi Iwai
To: Hugh Dickins
Cc: Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel
Subject: Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
References: <1132510467.6874.144.camel@mindpipe>

At Mon, 21 Nov 2005 15:46:50 +0000 (GMT),
Hugh Dickins wrote:
>
> On Mon, 21 Nov 2005, Takashi Iwai wrote:
> >
> > Sorry, I still don't figure out how __GFP_COMP solved this problem.
> > Could you enlighten me a bit?
>
> The sequence of problems was this.
>
> Nick's core PageReserved removal patch in 2.6.15-rc1 (and -rc2)
> changed VM_RESERVED vmas never to free their pages on unmapping (e.g.
> on exit) - fine for remap_pfn_range areas, but a leak where others set
> VM_RESERVED; and PageReserved not to inhibit decrementing page count.
>
> In -rc1-mm2 I tried to fix that leak by restoring VM_RESERVED to its
> previous behaviour, and using a different flag VM_UNPAGED, set in
> remap_pfn_range, for the don't-free-when-unmapping behaviour.
>
> But there's then a problem when the underlying page was allocated
> as a high-order page, but the separate individual 0-order constituent
> pages are mapped into userspace by nopage: the page count of the first
> 0-order is raised by allocation, but the following 0-order pages are
> left with page count 0. nopage's get_page raises that to 1,
> zap_pte_range (or whatever it uses to actually do the freeing) lowers
> that to 0, and hence frees the page, even though it's a constituent of
> the not-yet-freed high-order page. (This had not been a problem while
> PageReserved was inhibiting decrementing the page count.)
>
> So another of my patches in -rc1-mm2 made the PageCompound technique
> available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
> get_page and put_page on the later constituents of the high-order
> page get redirected to the first one, and it should work okay again.
>
> Except that I'd missed that you actually have to choose to have your
> high-order pages supplied as compound pages, by passing __GFP_COMP.
> Since I wasn't passing that, they still weren't allocated as compound
> pages, so were still being freed too soon - and the PG_reserved flag
> found while freeing gave rise to the "Bad page state" messages seen.

I see, thanks for the explanation!

Now another question arises: Which is the recommended method for
mmapping RAM pages, the vma nopage callback or remap_pfn_range()?

IIRC, in the earlier versions, the former was recommended because
remap_page_range() with page-reserve was regarded as a hack.
But, looking through these changes, I feel that remap_pfn_range() is
better (easier and more stable) than vma nopage...

> > Isn't it needed for dma_alloc_coherent() (for i386, particularly),
> > too? dma_alloc_coherent() also gets pages with __get_free_pages().
>
> Didn't I deal with that by adding __GFP_COMP in snd_malloc_dev_pages?

Oh yes, I overlooked it. It must be fine.

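To make the page-count sequence Hugh describes above easier to follow,
here is a small userspace toy model. It is only a sketch and not kernel
code: every name in it (toy_page, toy_alloc, toy_get_page, toy_put_page,
toy_tail_freed_early) is hypothetical, and the "compound" argument merely
stands in for what passing __GFP_COMP is described as doing, i.e.
redirecting get/put on later constituents to the first page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of the page counting described in the mail.
 * All names are hypothetical; this only mimics the refcount behaviour. */

#define TOY_ORDER  2                     /* a 4-page "high-order" block */
#define TOY_NPAGES (1 << TOY_ORDER)

struct toy_page {
    int count;                 /* models the page count */
    struct toy_page *head;     /* non-NULL when part of a compound page */
    bool freed;                /* set when the count drops to zero */
};

/* Allocate a high-order block: only the first page gets a count of 1.
 * With "compound" set (modelling __GFP_COMP) every constituent also
 * records the first page as its head. */
static void toy_alloc(struct toy_page *pages, bool compound)
{
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].head = compound ? &pages[0] : NULL;
        pages[i].freed = false;
    }
}

/* Reference counting is redirected to the head page when there is one. */
static struct toy_page *toy_target(struct toy_page *page)
{
    return page->head ? page->head : page;
}

static void toy_get_page(struct toy_page *page)
{
    toy_target(page)->count++;
}

static void toy_put_page(struct toy_page *page)
{
    struct toy_page *t = toy_target(page);

    assert(t->count > 0);
    if (--t->count == 0)
        t->freed = true;       /* the page would be handed back here */
}

/* Map one later constituent (as nopage would) and unmap it again (as
 * zap_pte_range would); report whether that constituent was freed
 * while the high-order block itself is still allocated. */
static bool toy_tail_freed_early(bool compound)
{
    struct toy_page pages[TOY_NPAGES];

    toy_alloc(pages, compound);
    toy_get_page(&pages[1]);   /* without compound: count 0 -> 1 */
    toy_put_page(&pages[1]);   /* without compound: count 1 -> 0, freed */
    return pages[1].freed;
}
```

In this model toy_tail_freed_early(false) reports the premature free of
the second constituent, while toy_tail_freed_early(true) does not,
because the get and put land on the first page, whose count goes from
1 to 2 and back to 1.
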
> And (in a separate patch run past davem and wli first, to be aggregated
> with the sound/core/memalloc patch when I sign off and send to Andrew)
> in the sparc and sparc64 sbus_alloc_consistent.
>
> It's only an issue when the high-order page might be mapped into
> userspace, then its constituents freed up by zap_pte_range; or
> locked down with get_user_pages and later released: when constituents
> of a high-order page pass through common code designed for 0-order pages.
>
> > Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
> > now. It was there just for mmap...
>
> That is so, but we'd prefer you to hold off for now. The way it's
> proceeding is, for 2.6.15 we're not actually removing the Set/Clear
> PageReserved from any of the drivers or from any of the architecture
> initialization; but PageReserved is no longer serving any functional
> purpose, except where PG_reserved acts as a double-check when the page
> is freed, as to whether it's all working right. Which was useful in
> this case, to identify where I'd forgotten all about __GFP_COMP; and
> I fear may prove useful in some other cases too. Retaining this use
> of PG_reserved for now, gives greater confidence in our safety when we
> later advance to removing all the Set/ClearPageReserved hocus-pocus.

OK. Let's fix them right later.


Takashi