Received: by 2002:a05:6a10:2785:0:0:0:0 with SMTP id ia5csp2985113pxb; Tue, 12 Jan 2021 03:36:34 -0800 (PST) X-Google-Smtp-Source: ABdhPJyP2/q+szM0adWrEmHef2nDdXdUmJ9mKrj5xYVSJi3NwdSdedJ7Ql9fugb/YBdCJ7j3PFwG X-Received: by 2002:a17:906:8594:: with SMTP id v20mr2835515ejx.470.1610451394338; Tue, 12 Jan 2021 03:36:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1610451394; cv=none; d=google.com; s=arc-20160816; b=GqJ8WZaQ5AiHM7tg3zava/BG2monM/2Mvz3Q1ksPQkts9lV7Ok9CBqiDgfspiKX3Gq 2eZwDv6Xxkof0rzlgCvIPCNDicrxIZwQaKOo5EFQvOc/kNi9YEZ10IF9ZtWN9a5Pnedq 1KTLKiJmrVq4oivl2iJHLTgGfOa7cXW1ZxR29wguVhhVyV+gLbQd+dSf+Un4NSw1mStq DlxyxknECYxHUfnzrRsfO8pI++n+3KjLvU3Z1uaOVE2lSIHBMASuo8mo0ikyJWITTkAO 95PZ+uSHGUYginlQMWY8UggLjAZW21Sx4k0G080HxZq2migkl881dWBsnNEbMSahpi5O 3wTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=poqEsJKUqbi5q6gJHl8uXM58SBSZyDT+q0k8szBsA2w=; b=O+anQnVIAGULUYithIyvhZfnHC4HKv+tCe+RyQ4WvUIPI9orL706ZewBhOGJZLicY7 Zwb6S6mbbzpMcFdtLuS6U/N2PxPJOm7tqXsGNThq+TY+xzUjwCwiQYENEOpz0fGGYnxu bE9ckKkbMl5BJfIJTU2pHzjBnMB+/gDw0DuorzgQqdmLrnxvwMXGm6RvJMeF45MjU/Wv 66enE4f8RcGyElqlY+2L9zyTJE0qgSYzDZEZ4uzSbQ8BnAAge7Dyq1YZXdDNwDGEQ2DU gV8sdGdQpB/MEshOGMtKN3Z6J9tk7lyFvbYfs1Mie+d2JC0BS6YyiPsritBQXP/gTNvl wUFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=kDk2nqge; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s20si1084932eji.387.2021.01.12.03.36.09; Tue, 12 Jan 2021 03:36:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=kDk2nqge; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391965AbhALHPS (ORCPT + 99 others); Tue, 12 Jan 2021 02:15:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391911AbhALHPR (ORCPT ); Tue, 12 Jan 2021 02:15:17 -0500 Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22B1DC061575; Mon, 11 Jan 2021 23:14:37 -0800 (PST) Received: by mail-yb1-xb30.google.com with SMTP id r63so1270246ybf.5; Mon, 11 Jan 2021 23:14:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=poqEsJKUqbi5q6gJHl8uXM58SBSZyDT+q0k8szBsA2w=; b=kDk2nqge0hKimd8X+Qyg7wIl2wcFF8bwwqg63vWK0sXmCj7coW6GbO272s0waLNUTV 6Uyk7xxB5iTMfoII5kNC8vYJHXOomxU9bfoR/rZOzWU+b1bGCgK4wB1Okh5640UQmk1j FZvuYw1b7tu9GDF8kUfeFQISaPc9ycpPU6a+2NKSci9WQvgTj95w7NR7B1vPgXG58ksn uLyzskJKMAERW5ZBlc+slnslah4AqpfcU4Mb+UkQmRTGtORIRfBOhXK2xCAWvzDh4HB2 TJOkVNnltdM7MT9+H3kkc25ITkOtBV7C67Zfw9LIiDP503gAsXcukaFv4IIRRku2WSdN ms2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=poqEsJKUqbi5q6gJHl8uXM58SBSZyDT+q0k8szBsA2w=; b=VN+ezHm52YxB6CVVo51gbg9CJovJI+p/AOzR0gFQnmpORV22XD7LDZRDJN29RlHb9a +jMhmvrra2Th0WZrUo4Mchz5HFS14j4aVYK8XdngMceupMKAXmYTN05YHEACnqKYvHG1 ZyOsyOKy/XSkmo3KCcls0PA8V6M0Ielv4IsM4deOS4Kt/KdMj3pmeIBeR8luVGXGzYbn m7cUj+FpfusPCTTPKxiFQEoLZBCcTOl8F2+wMLN/ek2Yn/rw+WkhHOllr5lnb28Ih5v2 V4lBXdt0OtnM7nl6TFLj7j9MZm3Wv21Kdt1cM19S+xARS7hfjiUp9fOZyfXiQ6WWB936 bxnA== X-Gm-Message-State: AOAM5329fzPG1/weyDYmViWM1ft80zP2BLY+nvWoSp9OXJXlFHpLTO16 eEBoiGWVI9bXSznBl6iaKQ/9SVtzEERNnK4+CEk= X-Received: by 2002:a25:f40e:: with SMTP id q14mr355322ybd.230.1610435676396; Mon, 11 Jan 2021 23:14:36 -0800 (PST) MIME-Version: 1.0 References: <20210108231950.3844417-1-songliubraving@fb.com> <20210108231950.3844417-5-songliubraving@fb.com> <352FED72-11B3-44F0-9B1C-92552AEB4AE8@fb.com> In-Reply-To: From: Andrii Nakryiko Date: Mon, 11 Jan 2021 23:14:25 -0800 Message-ID: Subject: Re: [PATCH bpf-next 4/4] bpf: runqslower: use task local storage To: Yonghong Song Cc: Song Liu , bpf , Networking , "linux-kernel@vger.kernel.org" , "mingo@redhat.com" , "peterz@infradead.org" , "ast@kernel.org" , "daniel@iogearbox.net" , "andrii@kernel.org" , "john.fastabend@gmail.com" , "kpsingh@chromium.org" , Kernel Team , "haoluo@google.com" Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 11, 2021 at 7:24 PM Yonghong Song wrote: > > > > On 1/11/21 2:54 PM, Song Liu wrote: > > > > > >> On Jan 11, 2021, at 9:49 AM, Yonghong Song wrote: > >> > >> > >> > >> On 1/8/21 3:19 PM, Song Liu wrote: > >>> Replace hashtab with task local storage in runqslower. This improves the > >>> performance of these BPF programs. The following table summarizes average > >>> runtime of these programs, in nanoseconds: > >>> task-local hash-prealloc hash-no-prealloc > >>> handle__sched_wakeup 125 340 3124 > >>> handle__sched_wakeup_new 2812 1510 2998 > >>> handle__sched_switch 151 208 991 > >>> Note that, task local storage gives better performance than hashtab for > >>> handle__sched_wakeup and handle__sched_switch. On the other hand, for > >>> handle__sched_wakeup_new, task local storage is slower than hashtab with > >>> prealloc. This is because handle__sched_wakeup_new accesses the data for > >>> the first time, so it has to allocate the data for task local storage. > >>> Once the initial allocation is done, subsequent accesses, as those in > >>> handle__sched_wakeup, are much faster with task local storage. If we > >>> disable hashtab prealloc, task local storage is much faster for all 3 > >>> functions. > >>> Signed-off-by: Song Liu > >>> --- > >>> tools/bpf/runqslower/runqslower.bpf.c | 26 +++++++++++++++----------- > >>> 1 file changed, 15 insertions(+), 11 deletions(-) > >>> diff --git a/tools/bpf/runqslower/runqslower.bpf.c b/tools/bpf/runqslower/runqslower.bpf.c > >>> index 1f18a409f0443..c4de4179a0a17 100644 > >>> --- a/tools/bpf/runqslower/runqslower.bpf.c > >>> +++ b/tools/bpf/runqslower/runqslower.bpf.c > >>> @@ -11,9 +11,9 @@ const volatile __u64 min_us = 0; > >>> const volatile pid_t targ_pid = 0; > >>> struct { > >>> - __uint(type, BPF_MAP_TYPE_HASH); > >>> - __uint(max_entries, 10240); > >>> - __type(key, u32); > >>> + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); > >>> + __uint(map_flags, BPF_F_NO_PREALLOC); > >>> + __type(key, int); > >>> __type(value, u64); > >>> } start SEC(".maps"); > >>> @@ -25,15 +25,19 @@ struct { > >>> /* record enqueue timestamp */ > >>> __always_inline > >>> -static int trace_enqueue(u32 tgid, u32 pid) > >>> +static int trace_enqueue(struct task_struct *t) > >>> { > >>> - u64 ts; > >>> + u32 pid = t->pid; > >>> + u64 ts, *ptr; > >>> if (!pid || (targ_pid && targ_pid != pid)) > >>> return 0; > >>> ts = bpf_ktime_get_ns(); > >>> - bpf_map_update_elem(&start, &pid, &ts, 0); > >>> + ptr = bpf_task_storage_get(&start, t, 0, > >>> + BPF_LOCAL_STORAGE_GET_F_CREATE); > >>> + if (ptr) > >>> + *ptr = ts; > >>> return 0; > >>> } > >>> @@ -43,7 +47,7 @@ int handle__sched_wakeup(u64 *ctx) > >>> /* TP_PROTO(struct task_struct *p) */ > >>> struct task_struct *p = (void *)ctx[0]; > >>> - return trace_enqueue(p->tgid, p->pid); > >>> + return trace_enqueue(p); > >>> } > >>> SEC("tp_btf/sched_wakeup_new") > >>> @@ -52,7 +56,7 @@ int handle__sched_wakeup_new(u64 *ctx) > >>> /* TP_PROTO(struct task_struct *p) */ > >>> struct task_struct *p = (void *)ctx[0]; > >>> - return trace_enqueue(p->tgid, p->pid); > >>> + return trace_enqueue(p); > >>> } > >>> SEC("tp_btf/sched_switch") > >>> @@ -70,12 +74,12 @@ int handle__sched_switch(u64 *ctx) > >>> /* ivcsw: treat like an enqueue event and store timestamp */ > >>> if (prev->state == TASK_RUNNING) > >>> - trace_enqueue(prev->tgid, prev->pid); > >>> + trace_enqueue(prev); > >>> pid = next->pid; > >>> /* fetch timestamp and calculate delta */ > >>> - tsp = bpf_map_lookup_elem(&start, &pid); > >>> + tsp = bpf_task_storage_get(&start, next, 0, 0); > >>> if (!tsp) > >>> return 0; /* missed enqueue */ > >> > >> Previously, hash table may overflow so we may have missed enqueue. > >> Here with task local storage, is it possible to add additional pid > >> filtering in the beginning of handle__sched_switch such that > >> missed enqueue here can be treated as an error? > > > > IIUC, hashtab overflow is not the only reason of missed enqueue. If the > > wakeup (which calls trace_enqueue) happens before runqslower starts, we > > may still get missed enqueue in sched_switch, no? > > the wakeup won't happen before runqslower starts since runqslower needs > to start to do attachment first and then trace_enqueue() can run. I think Song is right. Given wakeup and sched_switch need to be matched, depending at which exact time we attach BPF programs, we can end up missing wakeup, but not missing sched_switch, no? So it's not an error. > > For the current implementation trace_enqueue() will happen for any non-0 > pid before setting test_progs tgid, and will happen for any non-0 and > test_progs tgid if it is set, so this should be okay if we do filtering > in handle__sched_switch. Maybe you can do an experiment to prove whether > my point is correct or not. > > > > > Thanks, > > Song > >