Message-ID: <666b976a-8873-25e2-66dd-1398682c6cb7@huawei.com>
Date: Wed, 2 Nov 2022 15:19:52 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1
Subject: Re: [PATCH -next] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()
To: Eric Dumazet, Kees Cook
CC: Jakub Kicinski, Daniel Borkmann, Alexander Potapenko, Marco Elver, Dmitry Vyukov,
 Linux MM
References: <20221101040440.3637007-1-zhongbaisong@huawei.com> <20221101210542.724e3442@kernel.org> <202211012121.47D68D0@keescook>
From: zhongbaisong
Organization: huawei
In-Reply-To:
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.174.178.197]
X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To canpemm500005.china.huawei.com (7.192.104.229)
X-CFilter-Loop: Reflected
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/11/2 12:37, Eric Dumazet wrote:
> On Tue, Nov 1, 2022 at 9:27 PM Kees Cook wrote:
>>
>> On Tue, Nov 01, 2022 at 09:05:42PM -0700, Jakub Kicinski wrote:
>>> On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
>>>> On 2022/11/2 0:45, Daniel Borkmann wrote:
>>>>> [ +kfence folks ]
>>>>
>>>> + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
>>>>
>>>> Do you have any suggestions about this problem?
>>>
>>> + Kees who has been sending similar patches for drivers
>>>
>>>>> On 11/1/22 5:04 AM, Baisong Zhong wrote:
>>>>>> Recently, we got a syzkaller problem because of aarch64
>>>>>> alignment fault if KFENCE enabled.
>>>>>>
>>>>>> When the size from user bpf program is an odd number, like
>>>>>> 399, 407, etc, it will cause a misaligned access to the skb shared info,
>>>>>> as seen below:
>>>>>>
>>>>>> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0
>>>>>> net/core/skbuff.c:1032
>>>>>>
>>>>>> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
>>>>>> __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
>>>>>> arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
>>>>>> arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
>>>>>> atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
>>>>>> __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
>>>>>> skb_clone+0xf4/0x214 net/core/skbuff.c:1481
>>>>>> ____bpf_clone_redirect net/core/filter.c:2433 [inline]
>>>>>> bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
>>>>>> bpf_prog_d3839dd9068ceb51+0x80/0x330
>>>>>> bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
>>>>>> bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
>>>>>> bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
>>>>>> bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>>>>> __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>>>>> __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>>>>>
>>>>>> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407,
>>>>>> cache=kmalloc-512
>>>>>>
>>>>>> allocated by task 15074 on cpu 0 at 1342.585390s:
>>>>>> kmalloc include/linux/slab.h:568 [inline]
>>>>>> kzalloc include/linux/slab.h:675 [inline]
>>>>>> bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
>>>>>> bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
>>>>>> bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>>>>> __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>>>>> __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>>>>> __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
>>>>>>
>>>>>> To fix the problem, we round up allocations with kmalloc_size_roundup()
>>>>>> so that build_skb()'s use of ksize() is always aligned and no special
>>>>>> handling of the memory is needed by KFENCE.
>>>>>>
>>>>>> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
>>>>>> Signed-off-by: Baisong Zhong
>>>>>> ---
>>>>>> net/bpf/test_run.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>>>> index 13d578ce2a09..058b67108873 100644
>>>>>> --- a/net/bpf/test_run.c
>>>>>> +++ b/net/bpf/test_run.c
>>>>>> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr
>>>>>> *kattr, u32 user_size,
>>>>>> if (user_size > size)
>>>>>> return ERR_PTR(-EMSGSIZE);
>>>>>> + size = kmalloc_size_roundup(size);
>>>>>> data = kzalloc(size + headroom + tailroom, GFP_USER);
>>>>>
>>>>> The fact that you need to do this roundup on call sites feels broken, no?
>>>>> Was there some discussion / consensus that now all k*alloc() call sites
>>>>> would need to be fixed up? Couldn't this be done transparently in k*alloc()
>>>>> when KFENCE is enabled? I presume there may be lots of other such occasions
>>>>> in the kernel where similar issue triggers, fixing up all call-sites feels
>>>>> like ton of churn compared to api-internal, generic fix.
>>
>> I hope I answer this in more detail here:
>> https://lore.kernel.org/lkml/202211010937.4631CB1B0E@keescook/
>>
>> The problem is that ksize() should never have existed in the first
>> place. :P Every runtime bounds checker has tripped over it, and with
>> the addition of the __alloc_size attribute, I had to start ripping
>> ksize() out: it can't be used to pretend an allocation grew in size.
>> Things need to either preallocate more or go through *realloc() like
>> everything else. Luckily, ksize() is rare.
>>
>> FWIW, the above fix doesn't look correct to me -- I would expect this to
>> be:
>>
>> size_t alloc_size;
>> ...
>> alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
>> data = kzalloc(alloc_size, GFP_USER);
>
> Making sure the struct skb_shared_info is aligned to a cache line does
> not need kmalloc_size_roundup().
>
> What is needed is to adjust @size so that (@size + @headroom) is a
> multiple of SMP_CACHE_BYTES

ok, I'll fix it and send v2.

Thanks.
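
For reference, the adjustment Eric describes can be sketched in plain userspace C. This only illustrates the arithmetic and is not the v2 patch: SKETCH_CACHE_BYTES stands in for the kernel's SMP_CACHE_BYTES (64 is an assumed value), and sketch_align_up() and sketch_adjust_size() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration only; the kernel defines
 * SMP_CACHE_BYTES per architecture. */
#define SKETCH_CACHE_BYTES 64u

/* Round x up to the next multiple of align.
 * align must be a non-zero power of two. */
static size_t sketch_align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Return the smallest size >= the requested one such that
 * (size + headroom) is a multiple of SKETCH_CACHE_BYTES. */
static size_t sketch_adjust_size(size_t size, size_t headroom)
{
	return sketch_align_up(size + headroom, SKETCH_CACHE_BYTES) - headroom;
}
```

With an assumed headroom of 64, a user size of 407 is adjusted to 448, so size + headroom is 512 and whatever is placed right after the data area starts on a cache-line boundary.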