From: Andrii Nakryiko
Date: Tue, 27 Apr 2021 11:03:31 -0700
Subject: Re: [PATCH bpf-next v5 6/6] selftests/bpf: Add a series of tests for bpf_snprintf
To: Florent Revest
Cc: Rasmus Villemoes, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Yonghong Song, KP Singh, Brendan Jackman, open list
References: <20210419155243.1632274-1-revest@chromium.org> <20210419155243.1632274-7-revest@chromium.org> <2db39f1c-cedd-b9e7-2a15-aef203f068eb@rasmusvillemoes.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 27, 2021 at 2:51 AM Florent Revest wrote:
>
> On Tue, Apr 27, 2021 at 8:35 AM Rasmus Villemoes
> wrote:
> >
> > On 26/04/2021 23.08, Florent Revest wrote:
> > > On Mon, Apr 26, 2021 at 6:19 PM Andrii Nakryiko
> > > wrote:
> > >>
> > >> On Mon, Apr 26, 2021 at 3:10 AM Florent Revest wrote:
> > >>>
> > >>> On Sat, Apr 24, 2021 at 12:38 AM Andrii Nakryiko
> > >>> wrote:
> > >>>>
> > >>>> On Mon, Apr 19, 2021 at 8:52 AM Florent Revest wrote:
> > >>>>>
> > >>>>> The "positive" part tests all format specifiers when things go well.
> > >>>>>
> > >>>>> The "negative" part makes sure that incorrect format strings fail at
> > >>>>> load time.
> > >>>>>
> > >>>>> Signed-off-by: Florent Revest
> > >>>>> ---
> > >>>>>  .../selftests/bpf/prog_tests/snprintf.c | 125 ++++++++++++++++++
> > >>>>>  .../selftests/bpf/progs/test_snprintf.c | 73 ++++++++++
> > >>>>>  .../bpf/progs/test_snprintf_single.c | 20 +++
> > >>>>>  3 files changed, 218 insertions(+)
> > >>>>>  create mode 100644 tools/testing/selftests/bpf/prog_tests/snprintf.c
> > >>>>>  create mode 100644 tools/testing/selftests/bpf/progs/test_snprintf.c
> > >>>>>  create mode 100644 tools/testing/selftests/bpf/progs/test_snprintf_single.c
> > >>>>>
> > >>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/snprintf.c b/tools/testing/selftests/bpf/prog_tests/snprintf.c
> > >>>>> new file mode 100644
> > >>>>> index 000000000000..a958c22aec75
> > >>>>> --- /dev/null
> > >>>>> +++ b/tools/testing/selftests/bpf/prog_tests/snprintf.c
> > >>>>> @@ -0,0 +1,125 @@
> > >>>>> +// SPDX-License-Identifier: GPL-2.0
> > >>>>> +/* Copyright (c) 2021 Google LLC. */
> > >>>>> +
> > >>>>> +#include
> > >>>>> +#include "test_snprintf.skel.h"
> > >>>>> +#include "test_snprintf_single.skel.h"
> > >>>>> +
> > >>>>> +#define EXP_NUM_OUT "-8 9 96 -424242 1337 DABBAD00"
> > >>>>> +#define EXP_NUM_RET sizeof(EXP_NUM_OUT)
> > >>>>> +
> > >>>>> +#define EXP_IP_OUT "127.000.000.001 0000:0000:0000:0000:0000:0000:0000:0001"
> > >>>>> +#define EXP_IP_RET sizeof(EXP_IP_OUT)
> > >>>>> +
> > >>>>> +/* The third specifier, %pB, depends on compiler inlining so don't check it */
> > >>>>> +#define EXP_SYM_OUT "schedule schedule+0x0/"
> > >>>>> +#define MIN_SYM_RET sizeof(EXP_SYM_OUT)
> > >>>>> +
> > >>>>> +/* The third specifier, %p, is a hashed pointer which changes on every reboot */
> > >>>>> +#define EXP_ADDR_OUT "0000000000000000 ffff00000add4e55 "
> > >>>>> +#define EXP_ADDR_RET sizeof(EXP_ADDR_OUT "unknownhashedptr")
> > >>>>> +
> > >>>>> +#define EXP_STR_OUT "str1 longstr"
> > >>>>> +#define EXP_STR_RET sizeof(EXP_STR_OUT)
> > >>>>> +
> > >>>>> +#define EXP_OVER_OUT "%over"
> > >>>>> +#define EXP_OVER_RET 10
> > >>>>> +
> > >>>>> +#define EXP_PAD_OUT " 4 000"
> > >>>>
> > >>>> Roughly 50% of the time I get failure for this test case:
> > >>>>
> > >>>> test_snprintf_positive:FAIL:pad_out unexpected pad_out: actual ' 4
> > >>>> 0000' != expected ' 4 000'
> > >>>>
> > >>>> Re-running this test case immediately passes. Running again most
> > >>>> probably fails. Please take a look.
> > >>>
> > >>> Do you have more information on how to reproduce this ?
> > >>> I spinned up a VM at 87bd9e602 with ./vmtest -s and then run this script:
> > >>>
> > >>> #!/bin/sh
> > >>> for i in `seq 1000`
> > >>> do
> > >>>   ./test_progs -t snprintf
> > >>>   if [ $? -ne 0 ];
> > >>>   then
> > >>>     echo FAILURE
> > >>>     exit 1
> > >>>   fi
> > >>> done
> > >>>
> > >>> The thousand executions passed.
> > >>>
> > >>> This is a bit concerning because your unexpected_pad_out seems to have
> > >>> an extra '0' so it ends up with strlen(pad_out)=11 but
> > >>> sizeof(pad_out)=10. The actual string writing is not really done by
> > >>> our helper code but by the snprintf implementation (str and str_size
> > >>> are only given to snprintf()) so I'd expect the truncation to work
> > >>> well there. I'm a bit puzzled
> > >>
> > >> I'm puzzled too, have no idea. I also can't repro this with vmtest.sh.
> > >> But I can quite reliably reproduce with my local ArchLinux-based qemu
> > >> image with different config (see [0] for config itself). So please try
> > >> with my config and see if that helps to repro. If not, I'll have to
> > >> debug it on my own later.
> > >>
> > >> [0] https://gist.github.com/anakryiko/4b6ae21680842bdeacca8fa99d378048
> > >
> > > I tried that config on the same commit 87bd9e602 (bpf-next/master)
> > > with my debian-based qemu image and I still can't reproduce the issue
> > > :| If I can be of any help let me know, I'd be happy to help
> > >
> >
> > It's not really clear to me if this is before or after the rewrite to
> > use bprintf, but regardless, in those two patches this caught my attention:
>
> I tried to reproduce Andrii's bug both before and after the bprintf
> rewrite but I think he meant before.

I'm running on the latest bpf-next master, but I don't think it's
related to the bprintf change.

>
> >         u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
> > -       enum bpf_printf_mod_type mod[MAX_TRACE_PRINTK_VARARGS];
> > +       u32 *bin_args;
> >         static char buf[BPF_TRACE_PRINTK_SIZE];
> >         unsigned long flags;
> >         int ret;
> >
> > -       ret = bpf_printf_prepare(fmt, fmt_size, args, args, mod,
> > -                                MAX_TRACE_PRINTK_VARARGS);
> > +       ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args,
> > +                                 MAX_TRACE_PRINTK_VARARGS);
> >         if (ret < 0)
> >                 return ret;
> >
> > -       ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
> > -               BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));
> > -       /* snprintf() will not append null for zero-length strings */
> > -       if (ret == 0)
> > -               buf[0] = '\0';
> > +       ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
> >
> >         raw_spin_lock_irqsave(&trace_printk_lock, flags);
> >         trace_bpf_trace_printk(buf);
> >         raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
> >
> > Why isn't the write to buf[] protected by that spinlock? Or put another
> > way, what protects buf[] from concurrent writes?
>
> You're right, that is a bug, I missed that buf was static and thought
> it was just on the stack. That snprintf call should be after the
> raw_spin_lock_irqsave. I'll send a patch. Thank you Rasmus. (before my
> snprintf series, there was a vsprintf after the raw_spin_lock_irqsave)

Can you please also clean up the unnecessary ()s you added in at least
a few places? Thanks.

>
> > Probably the test cases are not run in parallel, but this is the kind of
> > thing that would give those symptoms.
>
> I think it's a separate issue from what Andrii reported though because
> the flaky test exercises the bpf_snprintf helper and this buf spinlock
> bug you just found only affects the bpf_trace_printk helper.
>
> That being said, it does smell a little bit like a concurrency issue
> too, indeed. The bpf_snprintf test program is a raw_tp/sys_enter so it
> attaches to all syscall entries and most likely gets executed many
> more times than necessary and probably on parallel CPUs. The "pad_out"
> buffer they write to is unique and not locked so maybe the test's
> userspace reads pad_out while another CPU is writing on it and if the
> string output goes through a stage where it is " 4 0000" before
> being " 4 000", we might read at the wrong time. That being said, I
> would find it weird that this happens as much as 50% of the time and
> always specifically on that test case.
>
> Andrii could you maybe try changing the prog type to
> "tp/syscalls/sys_enter_nanosleep" on the machine where you can
> reproduce this bug ?

Yes, it helps. I can't repro it easily anymore. I think the right fix,
though, should be to filter by tid, not to change the tracepoint. I
think what's happening is that we see the string right before
bstr_printf does zero-termination with

    end[-1] = '\0';

So in some cases we see the truncated string, and in others we see the
untruncated one.

>
> > Rasmus
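
To make the suspected window concrete, here is a minimal userspace sketch of the "fill the buffer first, terminate with end[-1] afterwards" pattern described above. It is not the kernel's bstr_printf/vsnprintf code: copy_truncated() and its snapshot argument are hypothetical names invented for illustration, and the snapshot stands in for what a concurrent reader on another CPU could observe just before termination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a truncating formatter (NOT kernel code).
 * It copies src into buf without ever writing past buf + size, and only
 * NUL-terminates as the very last step. snapshot must hold size + 1 bytes
 * and records what a concurrent reader could see just before termination.
 */
static void copy_truncated(char *buf, size_t size, const char *src,
			   char *snapshot)
{
	char *str = buf;
	char *end = buf + size;

	assert(buf && src && snapshot && size > 0);

	/* Fill the buffer; the last byte may still hold a payload character. */
	while (*src && str < end)
		*str++ = *src++;

	/* State visible to a reader racing with the writer at this point. */
	memcpy(snapshot, buf, (size_t)(str - buf));
	snapshot[str - buf] = '\0';

	/* Termination happens last: overwrite the final byte if we ran out. */
	if (str < end)
		*str = '\0';
	else
		end[-1] = '\0';
}
```

With a 10-byte buffer and a longer input, the snapshot holds ten payload characters while the final buffer holds nine plus the terminator, which is the same one-character difference as in the failing pad_out comparison. Whether that is really what the test hits is still the hypothesis stated above, not something this sketch proves.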