From: Florent Revest
Date: Tue, 27 Apr 2021 11:50:53 +0200
Subject: Re: [PATCH bpf-next v5 6/6] selftests/bpf: Add a series of tests for bpf_snprintf
To: Rasmus Villemoes
Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Yonghong Song, KP Singh, Brendan Jackman, open list
References: <20210419155243.1632274-1-revest@chromium.org> <20210419155243.1632274-7-revest@chromium.org> <2db39f1c-cedd-b9e7-2a15-aef203f068eb@rasmusvillemoes.dk>
In-Reply-To: <2db39f1c-cedd-b9e7-2a15-aef203f068eb@rasmusvillemoes.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 27, 2021 at 8:35 AM Rasmus Villemoes wrote:
>
> On 26/04/2021 23.08, Florent Revest wrote:
> > On Mon, Apr 26, 2021 at 6:19 PM Andrii Nakryiko
> > wrote:
> >>
> >> On Mon, Apr 26, 2021 at 3:10 AM Florent Revest wrote:
> >>>
> >>> On Sat, Apr 24, 2021 at 12:38 AM Andrii Nakryiko
> >>> wrote:
> >>>>
> >>>> On Mon, Apr 19, 2021 at 8:52 AM Florent Revest wrote:
> >>>>>
> >>>>> The "positive" part tests all format specifiers when things go well.
> >>>>>
> >>>>> The "negative" part makes sure that incorrect format strings fail at
> >>>>> load time.
> >>>>>
> >>>>> Signed-off-by: Florent Revest
> >>>>> ---
> >>>>> .../selftests/bpf/prog_tests/snprintf.c | 125 ++++++++++++++++++
> >>>>> .../selftests/bpf/progs/test_snprintf.c | 73 ++++++++++
> >>>>> .../bpf/progs/test_snprintf_single.c | 20 +++
> >>>>> 3 files changed, 218 insertions(+)
> >>>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/snprintf.c
> >>>>> create mode 100644 tools/testing/selftests/bpf/progs/test_snprintf.c
> >>>>> create mode 100644 tools/testing/selftests/bpf/progs/test_snprintf_single.c
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/snprintf.c b/tools/testing/selftests/bpf/prog_tests/snprintf.c
> >>>>> new file mode 100644
> >>>>> index 000000000000..a958c22aec75
> >>>>> --- /dev/null
> >>>>> +++ b/tools/testing/selftests/bpf/prog_tests/snprintf.c
> >>>>> @@ -0,0 +1,125 @@
> >>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>> +/* Copyright (c) 2021 Google LLC. */
> >>>>> +
> >>>>> +#include
> >>>>> +#include "test_snprintf.skel.h"
> >>>>> +#include "test_snprintf_single.skel.h"
> >>>>> +
> >>>>> +#define EXP_NUM_OUT "-8 9 96 -424242 1337 DABBAD00"
> >>>>> +#define EXP_NUM_RET sizeof(EXP_NUM_OUT)
> >>>>> +
> >>>>> +#define EXP_IP_OUT "127.000.000.001 0000:0000:0000:0000:0000:0000:0000:0001"
> >>>>> +#define EXP_IP_RET sizeof(EXP_IP_OUT)
> >>>>> +
> >>>>> +/* The third specifier, %pB, depends on compiler inlining so don't check it */
> >>>>> +#define EXP_SYM_OUT "schedule schedule+0x0/"
> >>>>> +#define MIN_SYM_RET sizeof(EXP_SYM_OUT)
> >>>>> +
> >>>>> +/* The third specifier, %p, is a hashed pointer which changes on every reboot */
> >>>>> +#define EXP_ADDR_OUT "0000000000000000 ffff00000add4e55 "
> >>>>> +#define EXP_ADDR_RET sizeof(EXP_ADDR_OUT "unknownhashedptr")
> >>>>> +
> >>>>> +#define EXP_STR_OUT "str1 longstr"
> >>>>> +#define EXP_STR_RET sizeof(EXP_STR_OUT)
> >>>>> +
> >>>>> +#define EXP_OVER_OUT "%over"
> >>>>> +#define EXP_OVER_RET 10
> >>>>> +
> >>>>> +#define EXP_PAD_OUT " 4 000"
> >>>>
> >>>> Roughly 50% of the time I get failure for this test case:
> >>>>
> >>>> test_snprintf_positive:FAIL:pad_out unexpected pad_out: actual ' 4
> >>>> 0000' != expected ' 4 000'
> >>>>
> >>>> Re-running this test case immediately passes. Running again most
> >>>> probably fails. Please take a look.
> >>>
> >>> Do you have more information on how to reproduce this ?
> >>> I spinned up a VM at 87bd9e602 with ./vmtest -s and then run this script:
> >>>
> >>> #!/bin/sh
> >>> for i in `seq 1000`
> >>> do
> >>>   ./test_progs -t snprintf
> >>>   if [ $? -ne 0 ];
> >>>   then
> >>>     echo FAILURE
> >>>     exit 1
> >>>   fi
> >>> done
> >>>
> >>> The thousand executions passed.
> >>>
> >>> This is a bit concerning because your unexpected_pad_out seems to have
> >>> an extra '0' so it ends up with strlen(pad_out)=11 but
> >>> sizeof(pad_out)=10. The actual string writing is not really done by
> >>> our helper code but by the snprintf implementation (str and str_size
> >>> are only given to snprintf()) so I'd expect the truncation to work
> >>> well there. I'm a bit puzzled
> >>
> >> I'm puzzled too, have no idea. I also can't repro this with vmtest.sh.
> >> But I can quite reliably reproduce with my local ArchLinux-based qemu
> >> image with different config (see [0] for config itself). So please try
> >> with my config and see if that helps to repro. If not, I'll have to
> >> debug it on my own later.
> >>
> >> [0] https://gist.github.com/anakryiko/4b6ae21680842bdeacca8fa99d378048
> >
> > I tried that config on the same commit 87bd9e602 (bpf-next/master)
> > with my debian-based qemu image and I still can't reproduce the issue
> > :| If I can be of any help let me know, I'd be happy to help
> >
>
> It's not really clear to me if this is before or after the rewrite to
> use bprintf, but regardless, in those two patches this caught my attention:

I tried to reproduce Andrii's bug both before and after the bprintf
rewrite but I think he meant before.
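
The truncation behaviour referred to in the quoted discussion above (output longer than the destination buffer) can be sketched in plain userspace C. This is only an illustration with the standard C snprintf(), not the kernel's implementation and not the selftest itself: format_padded() and its format string are hypothetical, chosen so that the full output is longer than a 10-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): format two integers into a
 * caller-supplied buffer. For the arguments (4, 4) the untruncated output
 * is 14 characters long: "%5d" gives 5, the space gives 1, "%08X" gives 8.
 */
static int format_padded(char *out, size_t size)
{
	assert(size > 0);
	/*
	 * snprintf() stores at most size - 1 characters followed by a NUL,
	 * and returns the length the full output would have had.
	 */
	return snprintf(out, size, "%5d %08X", 4, 4U);
}
```

With a `char out[10]`, format_padded(out, sizeof(out)) returns 14 while strlen(out) is 9, so a single completed call never leaves more than sizeof(out) - 1 characters in the buffer. A longer string seen by a reader would therefore have to come from somewhere else, for example from observing the buffer while it is being written.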
>         u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
> -       enum bpf_printf_mod_type mod[MAX_TRACE_PRINTK_VARARGS];
> +       u32 *bin_args;
>         static char buf[BPF_TRACE_PRINTK_SIZE];
>         unsigned long flags;
>         int ret;
>
> -       ret = bpf_printf_prepare(fmt, fmt_size, args, args, mod,
> -                                MAX_TRACE_PRINTK_VARARGS);
> +       ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args,
> +                                 MAX_TRACE_PRINTK_VARARGS);
>         if (ret < 0)
>                 return ret;
>
> -       ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
> -               BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));
> -       /* snprintf() will not append null for zero-length strings */
> -       if (ret == 0)
> -               buf[0] = '\0';
> +       ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
>
>         raw_spin_lock_irqsave(&trace_printk_lock, flags);
>         trace_bpf_trace_printk(buf);
>         raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
>
> Why isn't the write to buf[] protected by that spinlock? Or put another
> way, what protects buf[] from concurrent writes?

You're right, that is a bug, I missed that buf was static and thought
it was just on the stack. That snprintf call should be after the
raw_spin_lock_irqsave. I'll send a patch. Thank you Rasmus. (before my
snprintf series, there was a vsprintf after the raw_spin_lock_irqsave)

> Probably the test cases are not run in parallel, but this is the kind of
> thing that would give those symptoms.

I think it's a separate issue from what Andrii reported though because
the flaky test exercises the bpf_snprintf helper and this buf spinlock
bug you just found only affects the bpf_trace_printk helper.

That being said, it does smell a little bit like a concurrency issue
too, indeed. The bpf_snprintf test program is a raw_tp/sys_enter so it
attaches to all syscall entries and most likely gets executed many
more times than necessary and probably on parallel CPUs.
The "pad_out" buffer they write to is unique and not locked so maybe
the test's userspace reads pad_out while another CPU is writing to it
and if the string output goes through a stage where it is " 4 0000"
before being " 4 000", we might read at the wrong time.

That being said, I would find it weird that this happens as much as
50% of the time and always specifically on that test case.

Andrii, could you maybe try changing the prog type to
"tp/syscalls/sys_enter_nanosleep" on the machine where you can
reproduce this bug?

> Rasmus
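
The buf[] race Rasmus points out, and the fix of formatting only after taking the lock, can be sketched in userspace. This is a minimal illustration under stated assumptions, not the kernel patch: a pthread mutex stands in for raw_spin_lock_irqsave(), consume() stands in for trace_bpf_trace_printk(), and every name below (buf_lock, locked_printk, run_writers, ...) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 64
#define ITERATIONS 10000

/* Shared state, standing in for the static buf[] and trace_printk_lock. */
static char buf[BUF_SIZE];
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static int mismatches; /* only modified while buf_lock is held */

/* Stand-in for the consumer of buf: checks that buf holds what we wrote. */
static void consume(const char *expected)
{
	if (strcmp(buf, expected) != 0)
		mismatches++;
}

/* Format into buf and consume it under one lock, so writers cannot interleave. */
static void locked_printk(int id)
{
	char expected[BUF_SIZE];

	/* Private copy of the expected text, built outside the lock. */
	snprintf(expected, sizeof(expected), "writer %d says hello", id);

	pthread_mutex_lock(&buf_lock);
	snprintf(buf, sizeof(buf), "writer %d says hello", id);
	consume(expected);
	pthread_mutex_unlock(&buf_lock);
}

static void *writer(void *arg)
{
	int id = *(int *)arg;

	for (int i = 0; i < ITERATIONS; i++)
		locked_printk(id);
	return NULL;
}

/* Run two concurrent writers and return how many corrupted reads were seen. */
static int run_writers(void)
{
	pthread_t threads[2];
	int ids[2] = { 1, 2 };

	mismatches = 0;
	for (int i = 0; i < 2; i++) {
		int err = pthread_create(&threads[i], NULL, writer, &ids[i]);

		assert(err == 0);
		(void)err;
	}
	for (int i = 0; i < 2; i++)
		pthread_join(threads[i], NULL);
	return mismatches;
}
```

Because the write to buf and its consumption sit inside the same critical section, run_writers() is expected to return 0. Moving the snprintf() into buf ahead of pthread_mutex_lock() would reproduce the ordering in the quoted diff, where another writer can overwrite buf between formatting and consumption.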