Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756463AbbDJRvW (ORCPT ); Fri, 10 Apr 2015 13:51:22 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60452 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751689AbbDJRvT (ORCPT ); Fri, 10 Apr 2015 13:51:19 -0400 Date: Fri, 10 Apr 2015 19:50:48 +0200 From: Oleg Nesterov To: Kirill Tkhai Cc: linux-kernel@vger.kernel.org, Andrew Morton , Ingo Molnar , Peter Zijlstra , Michal Hocko , Rik van Riel , Ionut Alexa , Peter Hurley , Kirill Tkhai Subject: Re: [PATCH] exit: Use read lock for do_notify_parent() instead of write lock Message-ID: <20150410175048.GA23971@redhat.com> References: <1428602348.12166.29.camel@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1428602348.12166.29.camel@parallels.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3241 Lines: 100 Kirill, I'll try to read this patch tomorrow, currently I am hopelessly buried in user-space programming :/ But I have to admit that so far I dislike this patch very much. It adds a lot of complications and for what? Yes, yes, yes. tasklist_lock is another BKL and must die. We need the per-process lock. Until then I do not think the hacks like this make any sense, unless you have the "real" workload with before/after performance numbers. On 04/09, Kirill Tkhai wrote: > > I suggest to execute do_notify_parent() under read_lock(). It allows more tasks > to use it in parallel. Read lock gives enough guarantees for us: child's parent > won't change during the notification. Well, write_unlock() + read_lock() is not nice too... > include/asm-generic/qrwlock.h: > static inline void queue_reduce_locked_write_to_read(struct qrwlock *lock) > { > smp_mb__before_atomic(); > atomic_add(_QR_BIAS - _QW_LOCKED, &lock->cnts); > } Yes, downgrade() will be better. Still, this only removes do_notify_parent() from the write_lock'ed section. (lets ignore kill_orphaned_pgrp(), we want to make will_become_orphaned_pgrp lockless. Look at get_signal). And this changes the rules: currently ->exit_state is stable under read_lock, except -> EXIT_DEAD transition. OK, this is probably fine, but we need to recheck. At least this was certainly wrong some time before iirc. > @@ -594,7 +597,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead) > > write_lock_irq(&tasklist_lock); > forget_original_parent(tsk, &dead); > + tsk->exit_state = EXIT_NOTIFY; > + write_unlock_irq(&tasklist_lock); And unless I missed something this EXIT_NOTIFY turns the concurrent do_wait() into the busy-wait loop. Now suppose that CONFIG_SMP=n and the rt parent preempts the exiting child right after it drops tasklist: deadlock? > + read_lock(&tasklist_lock); > if (group_dead) > kill_orphaned_pgrp(tsk->group_leader, NULL); > > @@ -612,13 +618,14 @@ static void exit_notify(struct task_struct *tsk, int group_dead) > } > > tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; This needs WRITE_ONCE(). Otherwise gcc can do, say, tsk->exit_state = EXIT_ZOMBIE; if (autoreap) tsk->exit_state = EXIT_DEAD; which will lead to kernel crash (both parent and child can release this task). > - if (tsk->exit_state == EXIT_DEAD) > + smp_wmb(); /* Pairs with read_lock() in do_wait() */ Why? this barries looks unnecessary. OTOH. We need to set EXIT_XXX before __wake_up_parent(). OK, OK, we do not because of the busy-wait loop, but busy-wait is not an option. > @@ -1317,6 +1324,13 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, > return 0; > } > > + if (unlikely(exit_state == EXIT_NOTIFY)) { > + if (wo->wo_flags & WNOHANG) > + return 0; > + read_unlock(&tasklist_lock); > + return -REPEAT_DOWAIT; > + } No, no, no. If you do something like this, please (ab)use wo->notask_error. And wait_consider_task() should continue after that, Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/