Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754251AbbGAPHY (ORCPT ); Wed, 1 Jul 2015 11:07:24 -0400 Received: from mail-qk0-f179.google.com ([209.85.220.179]:34327 "EHLO mail-qk0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752119AbbGAPHT (ORCPT ); Wed, 1 Jul 2015 11:07:19 -0400 Date: Wed, 1 Jul 2015 11:07:08 -0400 From: Jerome Glisse To: Mark Hairgrove Cc: "akpm@linux-foundation.org" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , Linus Torvalds , "joro@8bytes.org" , Mel Gorman , "H. Peter Anvin" , Peter Zijlstra , Andrea Arcangeli , Johannes Weiner , Larry Woodman , Rik van Riel , Dave Airlie , Brendan Conoboy , Joe Donohue , Duncan Poole , Sherry Cheung , Subhash Gutti , John Hubbard , Lucien Dunning , Cameron Buschardt , Arvind Gopalakrishnan , Haggai Eran , Shachar Raindel , Liran Liss , Roland Dreier , Ben Sander , Greg Stoner , John Bridgman , Michael Mantor , Paul Blinzer , Laurent Morichetti , Alexander Deucher , Oded Gabbay , =?iso-8859-1?B?Suly9G1l?= Glisse , Jatin Kumar Subject: Re: [PATCH 06/36] HMM: add HMM page table v2. Message-ID: <20150701150707.GA9313@gmail.com> References: <1432236705-4209-1-git-send-email-j.glisse@gmail.com> <1432236705-4209-7-git-send-email-j.glisse@gmail.com> <20150626163030.GA3748@gmail.com> <20150629144305.GA2173@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4143 Lines: 98 On Tue, Jun 30, 2015 at 07:51:12PM -0700, Mark Hairgrove wrote: > On Mon, 29 Jun 2015, Jerome Glisse wrote: > > [...] > > > > Iterator is what protect against concurrent freeing of the directory so it > > has to return to caller on directory boundary (for 64bits arch with 64bits > > pte it has return every 512 entries). Otherwise pt_iter_fini() would have > > to walk over the whole directory range again just to drop reference and this > > doesn't sound like a good idea. > > I don't understand why it would have to return to the caller to unprotect > the directory. The iterator would simply drop the reference to the > previous directory, take a reference on the next one, and keep searching > for a valid entry. > > Why would pt_iter_fini have to walk over the entire range? The iterator > would keep at most one directory per level referenced. _fini would walk > the per-level ptd array and unprotect each level, the same way it does > now. I think here we are just misunderstanding each other. I am saying that iterator have to return on directory boundary (ie when switching from one directory to the next). The return part is not only for protection it is also by design because iterator function should not test the page table entry as different code path have different synchronization requirement. > > So really with what you are asking it whould be: > > > > hmm_pt_iter_init(&iter, start, end); > > for(next=pt_iter_next(&iter,&ptep); next > { > > // Here ptep is valid until next address. Above you have to call > > // pt_iter_next() to switch to next directory. > > addr = max(start, next - (~HMM_PMD_MASK + 1)); > > for (; addr < next; addr += PAGE_SIZE, ptep++) { > > // access ptep > > } > > } > > > > My point is that internally pt_iter_next() will do the exact same test it is > > doing now btw cur and addr. Just that the addr is no longer explicit but iter > > infer it. > > But this way, the iteration across directories is more efficient because > the iterator can simply walk the directory array. Take a directory that > has one valid entry at the very end. The existing iteration will do this: > > hmm_pt_iter_next(dir_addr[0], end) > Walk up the ptd array > Compute level start and end and compare them to dir_addr[0] > Compute dir_addr[1] using addr and pt->mask > Return dir_addr[1] > hmm_pt_iter_update(dir_addr[1]) > Walk up the ptd array, compute level start and end > Compute level index of dir_addr[1] > Read entry for dir_addr[1] > Return NULL > hmm_pt_iter_next(dir_addr[1], end) > ... > And so on 511 times until the last entry is read. > > This is really more suited to a for loop iteration, which it could be if > this were fully contained within the _next call. No, existing code does not necessarily do that. Current use pattern is : for (addr = start; addr < end;) { ptep = hmm_pt_iter_update(iter, addr); if (!ptep) { addr = hmm_pt_iter_next(iter, addr, end); continue; } next = hmm_pt_level_next(pt, addr, end); for (; addr < next; addr += PAGE_SIZE, ptep++) { // Process addr using ptep. } } The inner loop is on directory boundary ie 512 entries on 64bits arch. It is that way because on some case you do not want the iterator to control the address, the outer loop might be accessing several different mirror page table each might have different gap. So you really want to have explicit address provided to the iterator function. Also iterator can not really test for valid entry as locking requirement and synchronization with other thread is different depending on which code path is walking the page table. So testing inside the iterator function is kind of pointless has the performed test might be no longer revealent by the time it returns pointer and address to the caller. Cheers, J?r?me -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/