Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932664AbbDMQeX (ORCPT ); Mon, 13 Apr 2015 12:34:23 -0400 Received: from relay.parallels.com ([195.214.232.42]:54850 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932461AbbDMQeV (ORCPT ); Mon, 13 Apr 2015 12:34:21 -0400 Message-ID: <1428942852.1467.30.camel@parallels.com> Subject: Re: [PATCH] exit: Use read lock for do_notify_parent() instead of write lock From: Kirill Tkhai To: Oleg Nesterov CC: , Andrew Morton , Ingo Molnar , Peter Zijlstra , Michal Hocko , Rik van Riel , Ionut Alexa , Peter Hurley , "Kirill Tkhai" Date: Mon, 13 Apr 2015 19:34:12 +0300 In-Reply-To: <20150410175048.GA23971@redhat.com> References: <1428602348.12166.29.camel@parallels.com> <20150410175048.GA23971@redhat.com> Organization: Parallels Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.12.9-1+b1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Originating-IP: [10.24.40.85] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5157 Lines: 144 Hi, Oleg, thanks for your review. В Пт, 10/04/2015 в 19:50 +0200, Oleg Nesterov пишет: > Kirill, > > I'll try to read this patch tomorrow, currently I am hopelessly buried > in user-space programming :/ > > But I have to admit that so far I dislike this patch very much. It adds > a lot of complications and for what? > > Yes, yes, yes. tasklist_lock is another BKL and must die. We need the > per-process lock. Until then I do not think the hacks like this make > any sense, unless you have the "real" workload with before/after > performance numbers. I don't think the complication is very huge. We add one rule about exit_state. Yes, the state becomes unstable under read_lock(), but only wait_consider_task() is affected by this. Ok, what do you mean when you're speaking about killing tasklist_lock? Can't we leave it for fork() and __unhash_process() only, but change other places which lock it for write? Every of the places will get rid of it by its own way. EXIT_NOTIFY is for do_exit(). Or you want to kill it completelly? I didn't test the patch on special workload or large SMP systems. The results for 4 CPU box (kernel compilation): [origin] 1)534.37user 32.15system 2:29.32elapsed 379%CPU (0avgtext+0avgdata 142488maxresident)k 0inputs+724264outputs (0major+23852891minor)pagefaults 0swaps 2)534.85user 32.81system 2:28.67elapsed 381%CPU (0avgtext+0avgdata 142476maxresident)k 0inputs+724264outputs (0major+23853531minor)pagefaults 0swaps [patched] 1)531.65user 32.69system 2:27.41elapsed 382%CPU (0avgtext+0avgdata 142580maxresident)k 0inputs+724256outputs (0major+23854620minor)pagefaults 0swaps 2)530.92user 32.51system 2:28.18elapsed 380%CPU (0avgtext+0avgdata 142544maxresident)k 0inputs+724256outputs (0major+23852925minor)pagefaults 0swaps My test machine has HDD, so it's not the best test for the patch. I'll try something else later. But I don't expect exciting results on workloads like this. > > On 04/09, Kirill Tkhai wrote: > > > > I suggest to execute do_notify_parent() under read_lock(). It allows more tasks > > to use it in parallel. Read lock gives enough guarantees for us: child's parent > > won't change during the notification. > > Well, write_unlock() + read_lock() is not nice too... > > > include/asm-generic/qrwlock.h: > > static inline void queue_reduce_locked_write_to_read(struct qrwlock *lock) > > { > > smp_mb__before_atomic(); > > atomic_add(_QR_BIAS - _QW_LOCKED, &lock->cnts); > > } > > Yes, downgrade() will be better. > > Still, this only removes do_notify_parent() from the write_lock'ed section. Yeah, but the plan is to go successively to removing write lock from every place it's used, except of hashing in fork() and unhashing in __unhash_process(). > (lets ignore kill_orphaned_pgrp(), we want to make will_become_orphaned_pgrp > lockless. Look at get_signal). > > And this changes the rules: currently ->exit_state is stable under read_lock, > except -> EXIT_DEAD transition. OK, this is probably fine, but we need to > recheck. At least this was certainly wrong some time before iirc. > > > @@ -594,7 +597,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead) > > > > write_lock_irq(&tasklist_lock); > > forget_original_parent(tsk, &dead); > > + tsk->exit_state = EXIT_NOTIFY; > > + write_unlock_irq(&tasklist_lock); > > And unless I missed something this EXIT_NOTIFY turns the concurrent > do_wait() into the busy-wait loop. > > Now suppose that CONFIG_SMP=n and the rt parent preempts the exiting > child right after it drops tasklist: deadlock? You sure, thank. We need to disable preemption there. > > + read_lock(&tasklist_lock); > > if (group_dead) > > kill_orphaned_pgrp(tsk->group_leader, NULL); > > > > @@ -612,13 +618,14 @@ static void exit_notify(struct task_struct *tsk, int group_dead) > > } > > > > tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; > > This needs WRITE_ONCE(). Otherwise gcc can do, say, > > tsk->exit_state = EXIT_ZOMBIE; > if (autoreap) > tsk->exit_state = EXIT_DEAD; > > which will lead to kernel crash (both parent and child can release this > task). Ah, thanks. > > > > - if (tsk->exit_state == EXIT_DEAD) > > + smp_wmb(); /* Pairs with read_lock() in do_wait() */ > > Why? this barries looks unnecessary. Sure, it's unnecessary for do_wait(). > OTOH. We need to set EXIT_XXX before __wake_up_parent(). OK, OK, we do not > because of the busy-wait loop, but busy-wait is not an option. > > > @@ -1317,6 +1324,13 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, > > return 0; > > } > > > > + if (unlikely(exit_state == EXIT_NOTIFY)) { > > + if (wo->wo_flags & WNOHANG) > > + return 0; > > + read_unlock(&tasklist_lock); > > + return -REPEAT_DOWAIT; > > + } > > No, no, no. If you do something like this, please (ab)use wo->notask_error. > And wait_consider_task() should continue after that, Kirill -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/