Date: Sun, 14 Jun 2015 09:36:52 +0200
From: Ingo Molnar
To: Oleg Nesterov
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andy Lutomirski,
    Andrew Morton, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
    Borislav Petkov, "H. Peter Anvin", Linus Torvalds, Thomas Gleixner,
    Waiman Long, "Paul E. McKenney"
Subject: Re: [PATCH 02/12] x86/mm/hotplug: Remove pgd_list use from the memory hotplug code
Message-ID: <20150614073652.GA5923@gmail.com>
References: <1434188955-31397-1-git-send-email-mingo@kernel.org>
    <1434188955-31397-3-git-send-email-mingo@kernel.org>
    <20150613192454.GA1735@redhat.com>
In-Reply-To: <20150613192454.GA1735@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Oleg Nesterov wrote:

> On 06/13, Ingo Molnar wrote:
> >
> > @@ -169,29 +169,40 @@ void sync_global_pgds(unsigned long start, unsigned long end, int removed)
> >
> >          for (address = start; address <= end; address += PGDIR_SIZE) {
> >                  const pgd_t *pgd_ref = pgd_offset_k(address);
> > -                struct page *page;
> > +                struct task_struct *g, *p;
> >
> >                  /*
> > -                 * When it is called after memory hot remove, pgd_none()
> > -                 * returns true. In this case (removed == 1), we must clear
> > -                 * the PGD entries in the local PGD level page.
> > +                 * When this function is called after memory hot remove,
> > +                 * pgd_none() already returns true, but only the reference
> > +                 * kernel PGD has been cleared, not the process PGDs.
> > +                 *
> > +                 * So clear the affected entries in every process PGD as well:
> >                   */
> >                  if (pgd_none(*pgd_ref) && !removed)
> >                          continue;
> >
> > -                spin_lock(&pgd_lock);
> > -                list_for_each_entry(page, &pgd_list, lru) {
> > +                spin_lock(&pgd_lock); /* Implies rcu_read_lock() for the task list iteration: */
>                                             ^^^^^^^^^^^^^^^^^^^^^^^
>
> Hmm, but it doesn't if PREEMPT_RCU? No, no, I do not pretend I understand how it
> actually works ;) But, say, rcu_check_callbacks() can be called from irq and
> since spin_lock() doesn't increment current->rcu_read_lock_nesting this can lead
> to rcu_preempt_qs()?

No, RCU grace periods are still defined by 'heavy' context boundaries such as
context switches, entering idle or user-space mode.

PREEMPT_RCU is like traditional RCU, except that blocking is allowed within the
RCU read critical section - that is why it uses a separate nesting counter
(current->rcu_read_lock_nesting), not the preempt count.

But if a piece of kernel code is non-preemptible, such as a spinlocked region or
an irqs-off region, then those are still natural RCU read lock regions, regardless
of the RCU model, and need no additional RCU locking.

rcu_check_callbacks() can be called from irq context, but only to observe whether
the current CPU is in a quiescent state. If it interrupts a spinlocked region it
won't register a quiescent state.

> > +                for_each_process_thread(g, p) {
> > +                        struct mm_struct *mm;
> >                          pgd_t *pgd;
> >                          spinlock_t *pgt_lock;
> >
> > -                        pgd = (pgd_t *)page_address(page) + pgd_index(address);
> > -                        /* the pgt_lock only for Xen */
> > -                        pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
> > +                        task_lock(p);
> > +                        mm = p->mm;
> > +                        if (!mm) {
> > +                                task_unlock(p);
> > +                                continue;
> > +                        }
>
> Again, you can simplify this code and avoid for_each_process_thread() if you use
> for_each_process() + find_lock_task_mm().

True!

So I looked at this when you first mentioned it but mis-read find_lock_task_mm(),
which as you insist is exactly what this iteration needs to become faster and
simpler.

Thanks for the reminder - I have fixed it, will be part of -v3.

Thanks,

	Ingo
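
For readers following the thread, the for_each_process() + find_lock_task_mm()
pattern discussed above can be modelled in plain userspace C. This is only a toy
sketch under stated assumptions, not the -v3 patch: every toy_* name and
structure is made up for illustration, an int flag stands in for task_lock(),
and NULL-terminated linked lists stand in for the kernel's task lists. It shows
the one property that matters here: each thread group is visited once, and the
helper returns the first thread that still has an mm, with its lock held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm_struct and task_struct. */
struct toy_mm {
    int id;
};

struct toy_task {
    struct toy_mm *mm;              /* NULL if this thread has no mm */
    int locked;                     /* models task_lock()/task_unlock() */
    struct toy_task *next_thread;   /* next thread in the same group, or NULL */
    struct toy_task *next_process;  /* next group leader, or NULL */
};

void toy_task_lock(struct toy_task *t)
{
    assert(!t->locked);
    t->locked = 1;
}

void toy_task_unlock(struct toy_task *t)
{
    assert(t->locked);
    t->locked = 0;
}

/*
 * Model of find_lock_task_mm(): walk the threads of p's group and return
 * the first one with a non-NULL mm, with its lock still held. Returns
 * NULL (with nothing locked) if no thread in the group has an mm.
 */
struct toy_task *toy_find_lock_task_mm(struct toy_task *p)
{
    struct toy_task *t;

    for (t = p; t; t = t->next_thread) {
        toy_task_lock(t);
        if (t->mm)
            return t;
        toy_task_unlock(t);
    }
    return NULL;
}

/*
 * Model of the simplified iteration: one step per thread group rather
 * than one per thread. Returns how many groups had an mm to work on.
 */
int toy_count_mms(struct toy_task *leaders)
{
    struct toy_task *g, *p;
    int visited = 0;

    for (g = leaders; g; g = g->next_process) {
        p = toy_find_lock_task_mm(g);
        if (!p)
            continue;   /* no thread in this group has an mm */

        /* The real loop would update the PGD entry of p->mm here. */
        visited++;

        toy_task_unlock(p);
    }
    return visited;
}
```

With this shape the caller does its work once per thread group instead of once
per thread, which is the simplification suggested above. A group whose threads
all have a NULL mm is skipped without any lock left held.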