Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp2881406pxf; Sun, 14 Mar 2021 15:58:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzip686mU8V6AyGt5/jA/AN+tliImstI0zxbLoWMGVIagEYYWzkXmu62K8ItGTwNk0FtRNj X-Received: by 2002:a17:906:7102:: with SMTP id x2mr20703086ejj.355.1615762729087; Sun, 14 Mar 2021 15:58:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1615762729; cv=none; d=google.com; s=arc-20160816; b=OamPMcU0aKlfNUJtMDYJ3AYHfx4rHYAJY46EnPE8I2j229l/Bu2Ozcwwqb08gQAACC TQgupoJJPLRkRRLWX3b6ZIWuLMc9texvBkxUFN6++ee6oDuDFNxAgef1+JY0sjrmxbWk s4NQXw5VuH9KszIT9aXd3DDhx3VrpGSsJJ36Hns1Qrat/xNgKnFy7t7nupdnERMIklpD omjOXr4qqNDGFTez2qVVLlwZQnW9E3WmMm1AIWXNRLRoi1j3Hszg9N5eF/rAW3UjYTKe oPMzwmHCOXujsEobtpWDJmPER8QCzU3It9+VNNJIduH0yd09G3ArXBs2n6w3uqOSGcHD mGHQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=/A5T1Q6K2KWCLF+oeCpv7x6JDQNAanmaN/8ukpT8uD4=; b=MDvmK0N7+kEtqz4BGAJk3quqGccV7ofL+XBwVVdn8NSkOPTn3IDLc350lVxxcBGIrW U0SBxT1cRLxj+C7Ly1BCNrfZvM7oXTQ3PFbkLk3vnD+CyL+2vbY4ok9N5bxybm0UasOw yV74kkPKwaMM2t2UYPiSd1AyT8mAba+/ZNvW+NBbiOkxW/vyMZLgcU9Bn9i23y8yZKiD 0/74EsdpRvGDoh/APEGK1wx1SoqMebeQm2u562GGw5sI7Jf513pRQnaxLQhPD/Vf+EDm 5LMANC1Sz4waO0CPNabmXB23GCKtGUMiRibNsf6j5jX+WZkjwT5vawz6V9eGpekjfBuA v70g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a21si9427200ejr.620.2021.03.14.15.58.26; Sun, 14 Mar 2021 15:58:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229488AbhCNW5Q (ORCPT + 99 others); Sun, 14 Mar 2021 18:57:16 -0400 Received: from mx1.riseup.net ([198.252.153.129]:46668 "EHLO mx1.riseup.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229459AbhCNW4n (ORCPT ); Sun, 14 Mar 2021 18:56:43 -0400 Received: from fews1.riseup.net (fews1-pn.riseup.net [10.0.1.83]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Sectigo RSA Domain Validation Secure Server CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id 4DzFKk2sKQzDq75; Sun, 14 Mar 2021 15:56:42 -0700 (PDT) X-Riseup-User-ID: 60716A24EDC7AEC8D03E29D45FACB7C19534F981C90079DB16679013E7E132F1 Received: from [127.0.0.1] (localhost [127.0.0.1]) by fews1.riseup.net (Postfix) with ESMTPSA id 4DzFKj2lk9z5wFg; Sun, 14 Mar 2021 15:56:41 -0700 (PDT) From: Jim Newsome To: Andrew Morton Cc: Oleg Nesterov , "Eric W . Biederman" , Christian Brauner , linux-kernel@vger.kernel.org, Jim Newsome Subject: [PATCH v6] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Date: Sun, 14 Mar 2021 17:56:32 -0500 Message-Id: <20210314225632.6759-1-jnewsome@torproject.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds a special-case when waiting on a pid (via waitpid, waitid, wait4, etc) to avoid doing an O(n) scan of children and tracees, and instead do an O(1) lookup. This improves performance when waiting on a pid from a thread group with many children and/or tracees. Time to fork and then call waitpid on the child, from a task that already has N children [1]: N | Before | After -----|---------|------ 1 | 74 us | 74 us 20 | 72 us | 75 us 100 | 83 us | 77 us 500 | 99 us | 74 us 1000 | 179 us | 75 us 5000 | 804 us | 79 us 8000 | 1268 us | 78 us [1]: https://lkml.org/lkml/2021/3/12/1567 This can make a substantial performance improvement for applications with a thread that has many children or tracees and frequently needs to wait on them. Tools that use ptrace to intercept syscalls for a large number of processes are likely to fall into this category. In particular this patch was developed while building a ptrace-based second generation of the Shadow emulator [2], for which it allows us to avoid quadratic scaling (without having to use a workaround that introduces a ~40% performance penalty) [3]. Other examples of tools that fall into this category which this patch may help include User Mode Linux [4] and DetTrace [5]. [2]: https://shadow.github.io/ [3]: https://github.com/shadow/shadow/issues/1134#issuecomment-798992292 [4]: https://en.wikipedia.org/wiki/User-mode_Linux [5]: https://github.com/dettrace/dettrace Signed-off-by: James Newsome --- v5: https://lkml.org/lkml/2021/3/12/1134 * Switched back to explicitly looking up by tgid and then pid. * Added further motivation and context in the patch description. kernel/exit.c | 69 +++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 59 insertions(+), 10 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 04029e35e69a..65c862c604a7 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1439,9 +1439,50 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent) TASK_INTERRUPTIBLE, p); } +static bool is_effectively_child(struct wait_opts *wo, bool ptrace, + struct task_struct *target) +{ + struct task_struct *parent = + !ptrace ? target->real_parent : target->parent; + + return current == parent || (!(wo->wo_flags & __WNOTHREAD) && + same_thread_group(current, parent)); +} + +/* + * Optimization for waiting on PIDTYPE_PID. No need to iterate through child + * and tracee lists to find the target task. + */ +static int do_wait_pid(struct wait_opts *wo) +{ + bool ptrace; + struct task_struct *target; + int retval; + + target = pid_task(wo->wo_pid, PIDTYPE_PID); + if (!target) + return 0; + + ptrace = false; + if (thread_group_leader(target) && + is_effectively_child(wo, ptrace, target)) { + retval = wait_consider_task(wo, ptrace, target); + if (retval) + return retval; + } + + ptrace = true; + if (target->ptrace && is_effectively_child(wo, ptrace, target)) { + retval = wait_consider_task(wo, ptrace, target); + if (retval) + return retval; + } + + return 0; +} + static long do_wait(struct wait_opts *wo) { - struct task_struct *tsk; int retval; trace_sched_process_wait(wo->wo_pid); @@ -1463,19 +1504,27 @@ static long do_wait(struct wait_opts *wo) set_current_state(TASK_INTERRUPTIBLE); read_lock(&tasklist_lock); - tsk = current; - do { - retval = do_wait_thread(wo, tsk); - if (retval) - goto end; - retval = ptrace_do_wait(wo, tsk); + if (wo->wo_type == PIDTYPE_PID) { + retval = do_wait_pid(wo); if (retval) goto end; + } else { + struct task_struct *tsk = current; + + do { + retval = do_wait_thread(wo, tsk); + if (retval) + goto end; - if (wo->wo_flags & __WNOTHREAD) - break; - } while_each_thread(current, tsk); + retval = ptrace_do_wait(wo, tsk); + if (retval) + goto end; + + if (wo->wo_flags & __WNOTHREAD) + break; + } while_each_thread(current, tsk); + } read_unlock(&tasklist_lock); notask: -- 2.30.1