Received: by 2002:a05:7412:518d:b0:e2:908c:2ebd with SMTP id fn13csp451467rdb; Thu, 5 Oct 2023 10:29:36 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHiwy7vJeKJrzxnWT7vOUVnEKrxLVVlhE5VtcWcrT1MXHBDzNbJ2bs0JmgNISKReiHz3gHh X-Received: by 2002:a9d:62d4:0:b0:6be:ea3e:367 with SMTP id z20-20020a9d62d4000000b006beea3e0367mr5636639otk.23.1696526975787; Thu, 05 Oct 2023 10:29:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696526975; cv=none; d=google.com; s=arc-20160816; b=BAmICQ7r/thLxdtLzKYbmN2epfZ7a9Wv+q+pkuQTDEkUKl+C9IybrmMO0zWtO/t38p exJtssa9MvAbD1pKTybejnKJy3tSMXSPUvi81aa/buPmDmWyBB4NzcBCknev9z0b/9rs XJVBHJrQTNmXifIHTxVxbIfCbh5UKKZO/dNB8r/XeQrwVKCm5QWCrrW/ExXddp8NgnL8 JjQfUqQP7A/GADSczZaKTjU7EPKBuXDMH7ymcKgwD4Ag6Kv5oMJD4Pb0lKcSXcuY+Ccn eQV/S+CXoYghKdNR7HtFUPLnEyh4pc6TCWhtjM1cYTll+YQ2PE/qbsbNsVGJvlQS5ox7 Kl+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:references :cc:to:from:content-language:subject:mime-version:date :dkim-signature:message-id; bh=2IylT7nMWAJTSHME82z6aOaySj3++vJi55n7xJOKKCw=; fh=YOxCXTBeQxnRa637EZXRG8jc/wsLw7lbIJOc1iK4hkw=; b=ZdjJ0EYzYM+XnYdr7eLM+7lXmOtC4i0ogfcDqaQMCCjMRLp3D4ISTalgne6N9+PLPy UE4ERDCsTpJjZKADliD5WjGNjRSZ/bLET0k6SuNJ2GtNnP4/ZyxsPm1oUroM20YZknEf xYFzoAZxKG5VlNMu4GDsj/J7u1GzV95SO3S+hj3ykWtkfsNwTEmKqqqNX5y+sA/oOhsO /hvXltC2kEt0JBTvHW+0TlmiZ908bDbb0ZdPxlHyGuiehbQFwEfSvcnjnE6snTTkvqD3 BlYLc2GValwAMuCBw5AaLlnuDpQRMfll2chi0MQjCtV0rsENqu9N2lfL7m1vmLPirvzF LfIQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux.dev header.s=key1 header.b=qFZJ9Wlq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.dev Return-Path: Received: from morse.vger.email (morse.vger.email. [23.128.96.31]) by mx.google.com with ESMTPS id h64-20020a636c43000000b0058962af84easi44586pgc.135.2023.10.05.10.29.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Oct 2023 10:29:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) client-ip=23.128.96.31; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.dev header.s=key1 header.b=qFZJ9Wlq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.dev Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 8DD6D83C5539; Thu, 28 Sep 2023 22:39:45 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230181AbjI2Fiq (ORCPT + 99 others); Fri, 29 Sep 2023 01:38:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229541AbjI2Fip (ORCPT ); Fri, 29 Sep 2023 01:38:45 -0400 Received: from out-209.mta0.migadu.com (out-209.mta0.migadu.com [91.218.175.209]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C35D8199 for ; Thu, 28 Sep 2023 22:38:42 -0700 (PDT) Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1695965920; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2IylT7nMWAJTSHME82z6aOaySj3++vJi55n7xJOKKCw=; b=qFZJ9Wlq8pgzkY2ebidjLO2sKDsFv5UgKFQXKm6YijrYJeteg/9Vq6raUELQsHnvbhLaEL H/7kdOce/CH9ku/oCdYNAnlWS8gQa7mjPcTJIfppvrZiVQeCFkRrLpXNUYgcu9kNkkFr+C bYQ/RUM0EXiahcapixUAdBUPEi7SwIE= Date: Fri, 29 Sep 2023 13:38:33 +0800 MIME-Version: 1.0 Subject: Re: [PATCH v6] net/core: Introduce netdev_core_stats_inc() Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yajun Deng To: Eric Dumazet Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Alexander Lobakin References: <20230928100418.521594-1-yajun.deng@linux.dev> <5d8e302c-a28d-d4f4-eb91-4b54eb89490b@linux.dev> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Thu, 28 Sep 2023 22:39:45 -0700 (PDT) On 2023/9/29 00:32, Yajun Deng wrote: > > On 2023/9/29 00:23, Eric Dumazet wrote: >> On Thu, Sep 28, 2023 at 6:16 PM Yajun Deng wrote: >>> >>> On 2023/9/28 23:44, Eric Dumazet wrote: >>>> On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng >>>> wrote: >>>>> On 2023/9/28 22:18, Eric Dumazet wrote: >>>>>> On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng >>>>>> wrote: >>>>>>> Although there is a kfree_skb_reason() helper function that can >>>>>>> be used to >>>>>>> find the reason why this skb is dropped, but most callers didn't >>>>>>> increase >>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and >>>>>>> rx_otherhost_dropped. >>>>>>> >>>>>>> For the users, people are more concerned about why the dropped >>>>>>> in ip >>>>>>> is increasing. >>>>>>> >>>>>>> Introduce netdev_core_stats_inc() for trace the caller of the >>>>>>> dropped >>>>>>> skb. Also, add __code to netdev_core_stats_alloc(), as it's called >>>>>>> unlinkly. >>>>>>> >>>>>>> Signed-off-by: Yajun Deng >>>>>>> Suggested-by: Alexander Lobakin >>>>>>> --- >>>>>>> v6: merge netdev_core_stats and netdev_core_stats_inc together >>>>>>> v5: Access the per cpu pointer before reach the relevant offset. >>>>>>> v4: Introduce netdev_core_stats_inc() instead of export >>>>>>> dev_core_stats_*_inc() >>>>>>> v3: __cold should be added to the netdev_core_stats_alloc(). >>>>>>> v2: use __cold instead of inline in dev_core_stats(). >>>>>>> v1: >>>>>>> https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/ >>>>>>> --- >>>>>>>     include/linux/netdevice.h | 21 ++++----------------- >>>>>>>     net/core/dev.c            | 17 +++++++++++++++-- >>>>>>>     2 files changed, 19 insertions(+), 19 deletions(-) >>>>>>> >>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>> index 7e520c14eb8c..eb1fa04fbccc 100644 >>>>>>> --- a/include/linux/netdevice.h >>>>>>> +++ b/include/linux/netdevice.h >>>>>>> @@ -4002,32 +4002,19 @@ static __always_inline bool >>>>>>> __is_skb_forwardable(const struct net_device *dev, >>>>>>>            return false; >>>>>>>     } >>>>>>> >>>>>>> -struct net_device_core_stats __percpu >>>>>>> *netdev_core_stats_alloc(struct net_device *dev); >>>>>>> - >>>>>>> -static inline struct net_device_core_stats __percpu >>>>>>> *dev_core_stats(struct net_device *dev) >>>>>>> -{ >>>>>>> -       /* This READ_ONCE() pairs with the write in >>>>>>> netdev_core_stats_alloc() */ >>>>>>> -       struct net_device_core_stats __percpu *p = >>>>>>> READ_ONCE(dev->core_stats); >>>>>>> - >>>>>>> -       if (likely(p)) >>>>>>> -               return p; >>>>>>> - >>>>>>> -       return netdev_core_stats_alloc(dev); >>>>>>> -} >>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset); >>>>>>> >>>>>>>     #define DEV_CORE_STATS_INC(FIELD) \ >>>>>>>     static inline void dev_core_stats_##FIELD##_inc(struct >>>>>>> net_device *dev)                \ >>>>>>> { \ >>>>>>> -       struct net_device_core_stats __percpu >>>>>>> *p;                               \ >>>>>>> - \ >>>>>>> -       p = dev_core_stats(dev); \ >>>>>>> -       if (p) \ >>>>>>> - this_cpu_inc(p->FIELD); \ >>>>>> Note that we were using this_cpu_inc() which implied : >>>>>> - IRQ safety, and >>>>>> - a barrier paired with : >>>>>> >>>>>> net/core/dev.c:10548: storage->rx_dropped += >>>>>> READ_ONCE(core_stats->rx_dropped); >>>>>> net/core/dev.c:10549: storage->tx_dropped += >>>>>> READ_ONCE(core_stats->tx_dropped); >>>>>> net/core/dev.c:10550: storage->rx_nohandler += >>>>>> READ_ONCE(core_stats->rx_nohandler); >>>>>> net/core/dev.c:10551: storage->rx_otherhost_dropped >>>>>> += READ_ONCE(core_stats->rx_otherhost_dropped); >>>>>> >>>>>> >>>>>>> + netdev_core_stats_inc(dev, \ >>>>>>> +                       offsetof(struct net_device_core_stats, >>>>>>> FIELD));         \ >>>>>>>     } >>>>>>>     DEV_CORE_STATS_INC(rx_dropped) >>>>>>>     DEV_CORE_STATS_INC(tx_dropped) >>>>>>>     DEV_CORE_STATS_INC(rx_nohandler) >>>>>>>     DEV_CORE_STATS_INC(rx_otherhost_dropped) >>>>>>> +#undef DEV_CORE_STATS_INC >>>>>>> >>>>>>>     static __always_inline int ____dev_forward_skb(struct >>>>>>> net_device *dev, >>>>>>> struct sk_buff *skb, >>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>>>> index 606a366cc209..88a32c392c1d 100644 >>>>>>> --- a/net/core/dev.c >>>>>>> +++ b/net/core/dev.c >>>>>>> @@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct >>>>>>> rtnl_link_stats64 *stats64, >>>>>>>     } >>>>>>>     EXPORT_SYMBOL(netdev_stats_to_stats64); >>>>>>> >>>>>>> -struct net_device_core_stats __percpu >>>>>>> *netdev_core_stats_alloc(struct net_device *dev) >>>>>>> +static __cold struct net_device_core_stats __percpu >>>>>>> *netdev_core_stats_alloc( >>>>>>> +               struct net_device *dev) >>>>>>>     { >>>>>>>            struct net_device_core_stats __percpu *p; >>>>>>> >>>>>>> @@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu >>>>>>> *netdev_core_stats_alloc(struct net_device >>>>>>>            /* This READ_ONCE() pairs with the cmpxchg() above */ >>>>>>>            return READ_ONCE(dev->core_stats); >>>>>>>     } >>>>>>> -EXPORT_SYMBOL(netdev_core_stats_alloc); >>>>>>> + >>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset) >>>>>>> +{ >>>>>>> +       /* This READ_ONCE() pairs with the write in >>>>>>> netdev_core_stats_alloc() */ >>>>>>> +       struct net_device_core_stats __percpu *p = >>>>>>> READ_ONCE(dev->core_stats); >>>>>>> + >>>>>>> +       if (unlikely(!p)) >>>>>>> +               p = netdev_core_stats_alloc(dev); >>>>>>> + >>>>>>> +       if (p) >>>>>>> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + >>>>>>> offset))++; >>>>>> While here you are using a ++ operation that : >>>>>> >>>>>> - is not irq safe >>>>>> - might cause store-tearing. >>>>>> >>>>>> I would suggest a preliminary patch converting the "unsigned >>>>>> long" fields in >>>>>> struct net_device_core_stats to local_t >>>>> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use >>>>> this_cpu_inc() to increment >>>>> >>>>> net->core_stats") first? But it would allocate memory which breaks on >>>>> PREEMPT_RT. >>>> I think I provided an (untested) alternative. >>>> >>>> unsigned long __percpu *field = (__force unsigned long __percpu *) >>>> ((__force u8 *)p + offset); >>>> this_cpu_inc(field); >>> unsigned long __percpu *field = (__force unsigned long __percpu *) >>> ((__force u8 *)p + offset); >>> this_cpu_inc(*(int *)field); >>> >>> This would compiler success. But I didn't test it. >>> This cold look complex. >> Why exactly ? Not very different from the cast you already had. > Okay, I'll test it. It seems something wrong. "ip -s a" would see the 'dropped' is increasing. But I cann't trace anything by the following cmd. "sudo  python3  /usr/share/bcc/tools/trace netdev_core_stats_inc" If I change back to "(*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; ", I can trace the caller. So the following code would accidentally change somthing. unsigned long __percpu *field = (__force unsigned long __percpu *) ((__force u8 *)p + offset); this_cpu_inc(*field); >> >>> Shoud I base v3? Export dev_core_stats_*_inc() intead of introduce >>> netdev_core_stats_inc(). >>> That would be easy. >> Well, you tell me, but this does not look incremental to me. >> >> I do not think we need 4 different (and maybe more to come if struct >> net_device_core_stats >> grows in the future) functions for some hardly used path.