From: KOSAKI Motohiro
To: Salman Qazi
Subject: Re: A possible sys_wait* bug
Cc: kosaki.motohiro@jp.fujitsu.com, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Oleg Nesterov, Roland McGrath, Ingo Molnar, Peter Zijlstra
Message-Id: <20100701093621.DA24.A69D9226@jp.fujitsu.com>
Date: Thu, 1 Jul 2010 09:47:30 +0900 (JST)
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

(cc to some core developers)

Is anyone tracking this issue? This seems to be a security issue.

> One of our internal workloads ran into a problem with waitpid. A
> simple repro case is as follows:
>
>
> #include <assert.h>
> #include <errno.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> #define NUM_CPUS 4
>
> void *thread_code(void *args)
> {
> 	int j;
> 	int pid2;
> 	for (j = 0; j < 1000; j++) {
> 		pid2 = fork();
> 		if (pid2 == 0)
> 			while(1) { sleep(1000); }
> 	}
>
> 	while (1) {
> 		int status;
> 		if (waitpid(-1, &status, WNOHANG)) {
> 			printf("! %d\n", errno);
> 		}
>
> 	}
> 	exit(0);
>
> }
>
> /*
>  * non-blocking waitpids in tight loop, with many children to go through,
>  * done on multiple threads, so that they can "pass the torch" to each other
>  * and eliminate the window that a writer has to get in.
>  *
>  * This maximizes the holding of the tasklist_lock in read mode, starving
>  * any attempts to take the lock in the write mode.
>  */
> int main(int argc, char **argv)
> {
> 	int i;
> 	pthread_attr_t attr;
> 	pthread_t threads[NUM_CPUS];
> 	for (i = 0; i < NUM_CPUS; i++) {
> 		assert(!pthread_attr_init(&attr));
> 		assert(!pthread_create(&threads[i], &attr, thread_code, NULL));
> 	}
> 	while(1) { sleep(1000);}
> 	return 0;
> }
>
>
> Basically, it is possible for readers to continuously hold
> tasklist_lock (theoretically forever, as they pass from one to another),
> preventing the writer from taking that lock. This typically causes a
> lockup on a CPU where a task is attempting to do a fork() or exit(),
> resulting in the NMI watchdog firing.
>
> Yes, WNOHANG is being used. And I agree that this is an inefficient
> use of wait(). However, I think it should be possible to produce the
> same effect without WNOHANG on a sufficiently large number of threads:
> by having it so that at least one thread always has the reader lock.
>
> I think the most direct approach to the problem is to have the
> readers-writer locks be writer biased (i.e. as soon as a writer
> contends, we do not permit any new readers). However, all suggestions
> are welcome.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
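
For illustration only, the "writer biased" behaviour proposed in the quoted
mail (no new readers are admitted once a writer contends) can be sketched in
userspace with a pthread mutex and condition variable. This is not the
kernel's rwlock_t and not a proposed patch; struct wb_rwlock and every wb_*
name below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical writer-biased reader-writer lock (userspace sketch only). */
struct wb_rwlock {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	int active_readers;	/* readers currently holding the lock */
	int waiting_writers;	/* writers blocked in wb_write_lock() */
	bool writer_active;	/* true while a writer holds the lock */
};

#define WB_RWLOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, false }

/* Non-blocking read acquire: refused while a writer is active OR waiting. */
static inline bool wb_read_trylock(struct wb_rwlock *l)
{
	bool ok;

	pthread_mutex_lock(&l->mu);
	ok = !l->writer_active && l->waiting_writers == 0;
	if (ok)
		l->active_readers++;
	pthread_mutex_unlock(&l->mu);
	return ok;
}

/* Blocking read acquire: new readers queue behind any contending writer. */
static inline void wb_read_lock(struct wb_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	while (l->writer_active || l->waiting_writers > 0)
		pthread_cond_wait(&l->cv, &l->mu);
	l->active_readers++;
	pthread_mutex_unlock(&l->mu);
}

static inline void wb_read_unlock(struct wb_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	assert(l->active_readers > 0);
	if (--l->active_readers == 0)
		pthread_cond_broadcast(&l->cv);	/* last reader out: wake writers */
	pthread_mutex_unlock(&l->mu);
}

/* Write acquire: announce contention first, then wait for readers to drain. */
static inline void wb_write_lock(struct wb_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	l->waiting_writers++;
	while (l->writer_active || l->active_readers > 0)
		pthread_cond_wait(&l->cv, &l->mu);
	l->waiting_writers--;
	l->writer_active = true;
	pthread_mutex_unlock(&l->mu);
}

static inline void wb_write_unlock(struct wb_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	l->writer_active = false;
	pthread_cond_broadcast(&l->cv);	/* wake both waiting writers and readers */
	pthread_mutex_unlock(&l->mu);
}
```

With a lock like this, readers that keep "passing the torch" can no longer
hold off a writer indefinitely: once waiting_writers is non-zero, the existing
readers drain and no new ones get in. The trade-off is that a steady stream of
writers can now starve readers instead.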