Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp1794977ybb; Thu, 2 Apr 2020 07:31:14 -0700 (PDT) X-Google-Smtp-Source: APiQypJvCo5+UogZWGmX6GxQmIqeS5dT8l/mw4eFQOi8S1z/vVGFLuVvJHFU2mMJIXluMU7g05Ds X-Received: by 2002:a4a:e38c:: with SMTP id l12mr2942249oov.7.1585837873892; Thu, 02 Apr 2020 07:31:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585837873; cv=none; d=google.com; s=arc-20160816; b=tKIW9+iY3wX/q783b+bmqen9m0TTilJ5e4xTRjycKpFtpcmPTPfqdnHncTCAHYYPpo QtxwbpWEwRe3/cCKCKyF8ikxWXOjtDkzUAWv4yGYZ8hVhih1NeIKKsZtXPAmR1EvREyX 4w5WPGfzsYOZtyLM7d+zg6UlbGfs4mA8lLF17U7Ulcq4CCKSTNbkHlFrdbP2yTl66W3G krAKsivD1sbp3SqlXUFDvqfm4fuN1LlBu4ugdqmP48dqNhYxGadCcNxF5lNhHdDepGxG LH1KC2yjxoJhGbNeq17Qzbhi404P4ISNjDxWlzdovukdttse0TWoCr/Ch6KJjMaFeiJy odGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=/yuqNXBhrtaso9MxkCw5lAj3Qk8T0+fTXX+OKOfEC+o=; b=cJuFxedz87Smnd+DUpnw8T+Zf+x8gw7TUxAzcgvTRtUJ3oHFTHZwg5Z9jDnMtW3RvC PRtYGobDYB1TZI1Gr5KgClPlLwLE3+vrtaGld2AkM00wdoz0vdonsGRrSj35wdSn6o/G TbUXdGSIBhD/B54G0Yp73J/awJjAx0E1jsRY6KXQpcuSUHdddb3lJvjclNuG3x6L6feb 6juqDkLQ6oFzfXCwU9xa9+o9HI3AUXVaAgNCCFfncJ4NoHSI3CiNB7FBOqKBXyfotJ03 zal0QGRbucLGcCzWufwyic3OdL18QhXGQFrafCedIAqIwL4vLEyxhJs2kEIUjpzr6HVv L3mw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cumulusnetworks.com header.s=google header.b=g9qaQ46h; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=cumulusnetworks.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n187si2470843oob.22.2020.04.02.07.30.56; Thu, 02 Apr 2020 07:31:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cumulusnetworks.com header.s=google header.b=g9qaQ46h; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=cumulusnetworks.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388452AbgDBO3W (ORCPT + 99 others); Thu, 2 Apr 2020 10:29:22 -0400 Received: from mail-lf1-f67.google.com ([209.85.167.67]:34923 "EHLO mail-lf1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729123AbgDBO3V (ORCPT ); Thu, 2 Apr 2020 10:29:21 -0400 Received: by mail-lf1-f67.google.com with SMTP id d15so1110840lfn.2 for ; Thu, 02 Apr 2020 07:29:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cumulusnetworks.com; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=/yuqNXBhrtaso9MxkCw5lAj3Qk8T0+fTXX+OKOfEC+o=; b=g9qaQ46hDYtPsVPXOH1pGz1zj4zi6d6GRazQDYgAwpI2ZVqoo0G8OuNosnipEPK5bK xbYjlr+4638diq+pgtE9fPgnlwlBfow5eEiVRNdrFPLyLt4K0A+UBCRYNXlU1UtpNMTQ g6A4h94Ov3QfAoaeCB1XOtmSYAZHyc8MVczI4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=/yuqNXBhrtaso9MxkCw5lAj3Qk8T0+fTXX+OKOfEC+o=; b=VPPJEvAXQ2vvG6NNr4sLPn5aeqr3NMtFUq/jyvwkW7qjFAvZLSN+e7zPEU+z+ldMNb wtz9qC2AZQG8IQJ+C+Qgzw232cig1Fzxv3/kMt4GmspHBALi8UlX3oIGPUsIpdAaJLHD zfrQr8P9DxPILl04WYejHKgUVGA5H/nXwPXOBybtZ4qW0VHLRQ1EoYGHZ8yFsffm6pJc 8HdfXRl1rQG0MEw2BBLTOcBI2iIxdxpAVSzHbmur82pV0oXnj+IMfJvrbutnAAQ2trjx sxGIp+nAFP2QFOr4lbNDBnUOo0xOipe/ab9dDZ7g+6Ii0MiRKJbfwaRq0h0y3cuI/P/c obQw== X-Gm-Message-State: AGi0PuaqAGWKbDWdOK4olsrAwLajmE+FA+LrBXZFZ1C6L45blObZ7Zzx OChFhqQPnylyGP4bAXAJbhAo+w== X-Received: by 2002:ac2:46ed:: with SMTP id q13mr2378225lfo.176.1585837757296; Thu, 02 Apr 2020 07:29:17 -0700 (PDT) Received: from [192.168.0.109] (84-238-136-197.ip.btc-net.bg. [84.238.136.197]) by smtp.gmail.com with ESMTPSA id d19sm3245698lji.95.2020.04.02.07.29.15 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 02 Apr 2020 07:29:16 -0700 (PDT) Subject: Re: [RFC net-next v4 7/9] bridge: mrp: Connect MRP api with the switchev API To: Horatiu Vultur Cc: davem@davemloft.net, jiri@resnulli.us, ivecera@redhat.com, kuba@kernel.org, roopa@cumulusnetworks.com, olteanv@gmail.com, andrew@lunn.ch, UNGLinuxDriver@microchip.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bridge@lists.linux-foundation.org References: <20200327092126.15407-1-horatiu.vultur@microchip.com> <20200327092126.15407-8-horatiu.vultur@microchip.com> <7e85b9fe-f518-0c5a-0891-6f64755407c3@cumulusnetworks.com> <20200401160621.4fq66xwamuhmzxdb@soft-dev3.microsemi.net> <98fb3248-687d-f6c1-cc0b-1c96423d82ca@cumulusnetworks.com> <20200402141827.fnnuvecq2bhunxhp@soft-dev3.microsemi.net> From: Nikolay Aleksandrov Message-ID: <8b720de0-78cc-8e4f-4fb9-dfd7ce5506ab@cumulusnetworks.com> Date: Thu, 2 Apr 2020 17:29:14 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0 MIME-Version: 1.0 In-Reply-To: <20200402141827.fnnuvecq2bhunxhp@soft-dev3.microsemi.net> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/04/2020 17:18, Horatiu Vultur wrote: > The 04/01/2020 19:32, Nikolay Aleksandrov wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >> >> On 01/04/2020 19:06, Horatiu Vultur wrote: >>> The 03/30/2020 19:11, Nikolay Aleksandrov wrote: >>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >>>> >>>> On 27/03/2020 11:21, Horatiu Vultur wrote: >>>>> Implement the MRP api. >>>>> >>>>> In case the HW can't generate MRP Test frames then the SW will try to generate >>>>> the frames. In case that also the SW will fail in generating the frames then a >>>>> error is return to the userspace. The userspace is responsible to generate all >>>>> the other MRP frames regardless if the test frames are generated by HW or SW. >>>>> >>>>> The forwarding/termination of MRP frames is happening in the kernel and is done >>>>> by the MRP instance. The userspace application doesn't do the forwarding. >>>>> >>>>> Signed-off-by: Horatiu Vultur >>>>> --- >>>>> net/bridge/br_mrp.c | 514 ++++++++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 514 insertions(+) >>>>> create mode 100644 net/bridge/br_mrp.c >>>>> >>>> >>>> Hi, >>> >>> Hi Nik, >>> >>>> In general the RCU usage needs more work. >>> >>> Thanks for the detailed review, this is my first time when I use the RCU, >>> so I might need to spend more time on time. >>> >>>> Also I might've missed it, but where do you >>>> handle bridge port delete which is used in mrp ? >>> >>> When a port is deleted, then the userspace application will be notified >>> and then the userspace will remove the MRP instance. Because there is no >>> point to have a MRP instance with only 1 port. And the function that >>> delets the MRP instance is br_mrp_del. >>> >> >> How would you execute br_mrp_del() if the port is already deleted from the bridge ? >> Nothing prevents the port to disappear and then you lose all bridge callbacks. > > The br_mrp_del() is called on the bridge interface and not on the port. > The flow as I see it: the port is deleted from the bridge, the userspace > will be notified by this, the userspace will determine the bridge on > which was the port and then the userspace will make netlink call on the > bridge interface. In this way it would not loose the callback when the > port is deleted from the bridge. > The question is when the bridge is deleted, I can see that first it > deletes the ports and then it would delete the bridge. In this case I am > loosing the callbacks. > Should the function br_dev_delete be exteded for this? > That is correct, but I don't think the above would work. There are pointers to ports in MRP and if they get deleted those ptrs are no longer valid. Even in br_mrp_del() there's code like: /* Reset the ports */ p = rcu_dereference_protected(mrp->p_port, lockdep_is_held(&mrp->lock)); but p could've been deleted before so now br_mrp_del() will deref an invalid ptr. By the way I just noticed another bug in br_mrp_del(): + synchronize_rcu(); + + list_del_rcu(&mrp->list); this is the wrong way around, you should delete it from the list so it can't be found by any new readers and then do synchronize_rcu() before freeing it. Also why do you use the devm_ calls ? You're allocating/freeing the memory so you don't need the managed alloc, just cleanup properly on port/bridge del. >> >>>> Also do you actually need the mrp->lock ? >>> >>> I think I should be fine not to use mrp->lock because already the rtnl >>> lock is taken. >>>>> >>>>> diff --git a/net/bridge/br_mrp.c b/net/bridge/br_mrp.c >>>>> new file mode 100644 >>>>> index 000000000000..f1de792d7a6e >>>>> --- /dev/null >>>>> +++ b/net/bridge/br_mrp.c >>>>> @@ -0,0 +1,514 @@ >>>>> +// SPDX-License-Identifier: GPL-2.0-or-later >>>>> + >>>>> +#include "br_private_mrp.h" >>>>> + >>>>> +static const u8 mrp_test_dmac[ETH_ALEN] = { 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 }; >>>>> + >>>>> +static struct net_bridge_port *br_mrp_get_port(struct net_bridge *br, >>>>> + u32 ifindex) >>>>> +{ >>>>> + struct net_bridge_port *res = NULL; >>>>> + struct net_bridge_port *port; >>>>> + >>>>> + spin_lock_bh(&br->lock); >>>>> + >>>> >>>> This is called under RTNL, you don't need the br->lock. >>> >>> Will be fix in the next patch series. >>>> >>>>> + list_for_each_entry(port, &br->port_list, list) { >>>>> + if (port->dev->ifindex == ifindex) { >>>>> + res = port; >>>>> + break; >>>>> + } >>>>> + } >>>>> + >>>>> + spin_unlock_bh(&br->lock); >>>>> + >>>>> + return res; >>>>> +} >>>>> + >>>>> +static struct br_mrp *br_mrp_find_id(struct net_bridge *br, u32 ring_id) >>>>> +{ >>>>> + struct br_mrp *res = NULL; >>>>> + struct br_mrp *mrp; >>>>> + >>>>> + rcu_read_lock(); >>>>> + >>>> >>>> This is generally a bad pattern because it can hide legitimate bugs and make >>>> it harder to debug. >>> >>> Can you give me a little more details why is a bad pattern? >>> I have tried to read about rcu from here[1][2]. But I couldn't see >>> anything about this. >>> >> >> In general you should know the context the function is used in, you cannot use the >> pointer obtained from this search after the rcu_read_unlock(). If this function is >> ever used in context which doesn't have rcu read lock or the writer lock then you'll >> mask the bug here. If you know it is always called from RCU context then just drop >> these, if not then add the proper lockdep annotations so they can be checked. > > Thanks for the details. I will fix it in the next patch series. >> >>>> >>>>> + list_for_each_entry_rcu(mrp, &br->mrp_list, list) { >>>>> + if (mrp->ring_id == ring_id) { >>>>> + res = mrp; >>>>> + break; >>>>> + } >>>>> + } >>>>> + >>>>> + rcu_read_unlock(); >>>>> + >>>>> + return res; >>>>> +} >>>>> + >>>>> +static struct br_mrp *br_mrp_find_port(struct net_bridge *br, >>>>> + struct net_bridge_port *p) >>>>> +{ >>>>> + struct br_mrp *res = NULL; >>>>> + struct br_mrp *mrp; >>>>> + >>>>> + rcu_read_lock(); >>>>> + >>>>> + list_for_each_entry_rcu(mrp, &br->mrp_list, list) { >>>>> + if (rcu_dereference(mrp->p_port) == p || >>>>> + rcu_dereference(mrp->s_port) == p) { >>>> >>>> rcu_access_pointer() is ok for comparisons >>> >>> Will be fix in the next patch series. >>>> >>>>> + res = mrp; >>>>> + break; >>>>> + } >>>>> + } >>>>> + >>>>> + rcu_read_unlock(); >>>>> + >>>>> + return res; >>>>> +} >>>>> + >>>>> +static int br_mrp_next_seq(struct br_mrp *mrp) >>>>> +{ >>>>> + mrp->seq_id++; >>>>> + return mrp->seq_id; >>>>> +} >>>>> + >>>>> +static struct sk_buff *br_mrp_skb_alloc(struct net_bridge_port *p, >>>>> + const u8 *src, const u8 *dst) >>>>> +{ >>>>> + struct ethhdr *eth_hdr; >>>>> + struct sk_buff *skb; >>>>> + u16 *version; >>>>> + >>>>> + skb = dev_alloc_skb(MRP_MAX_FRAME_LENGTH); >>>>> + if (!skb) >>>>> + return NULL; >>>>> + >>>>> + skb->dev = p->dev; >>>>> + skb->protocol = htons(ETH_P_MRP); >>>>> + skb->priority = MRP_FRAME_PRIO; >>>>> + skb_reserve(skb, sizeof(*eth_hdr)); >>>>> + >>>>> + eth_hdr = skb_push(skb, sizeof(*eth_hdr)); >>>>> + ether_addr_copy(eth_hdr->h_dest, dst); >>>>> + ether_addr_copy(eth_hdr->h_source, src); >>>>> + eth_hdr->h_proto = htons(ETH_P_MRP); >>>>> + >>>>> + version = skb_put(skb, sizeof(*version)); >>>>> + *version = cpu_to_be16(MRP_VERSION); >>>>> + >>>>> + return skb; >>>>> +} >>>>> + >>>>> +static void br_mrp_skb_tlv(struct sk_buff *skb, >>>>> + enum br_mrp_tlv_header_type type, >>>>> + u8 length) >>>>> +{ >>>>> + struct br_mrp_tlv_hdr *hdr; >>>>> + >>>>> + hdr = skb_put(skb, sizeof(*hdr)); >>>>> + hdr->type = type; >>>>> + hdr->length = length; >>>>> +} >>>>> + >>>>> +static void br_mrp_skb_common(struct sk_buff *skb, struct br_mrp *mrp) >>>>> +{ >>>>> + struct br_mrp_common_hdr *hdr; >>>>> + >>>>> + br_mrp_skb_tlv(skb, BR_MRP_TLV_HEADER_COMMON, sizeof(*hdr)); >>>>> + >>>>> + hdr = skb_put(skb, sizeof(*hdr)); >>>>> + hdr->seq_id = cpu_to_be16(br_mrp_next_seq(mrp)); >>>>> + memset(hdr->domain, 0xff, MRP_DOMAIN_UUID_LENGTH); >>>>> +} >>>>> + >>>>> +static struct sk_buff *br_mrp_alloc_test_skb(struct br_mrp *mrp, >>>>> + struct net_device *dev, >>>>> + enum br_mrp_port_role_type port_role) >>>>> +{ >>>>> + struct net_bridge_port *p = br_port_get_rtnl(dev); >>>>> + struct br_mrp_ring_test_hdr *hdr = NULL; >>>>> + struct net_bridge *br = p->br; >>>>> + struct sk_buff *skb = NULL; >>>>> + >>>>> + if (!p) >>>>> + return NULL; >>>>> + >>>>> + br = p->br; >>>>> + >>>>> + skb = br_mrp_skb_alloc(p, p->dev->dev_addr, mrp_test_dmac); >>>>> + if (!skb) >>>>> + return NULL; >>>>> + >>>>> + br_mrp_skb_tlv(skb, BR_MRP_TLV_HEADER_RING_TEST, sizeof(*hdr)); >>>>> + hdr = skb_put(skb, sizeof(*hdr)); >>>>> + >>>>> + hdr->prio = cpu_to_be16(MRP_DEFAULT_PRIO); >>>>> + ether_addr_copy(hdr->sa, p->br->dev->dev_addr); >>>>> + hdr->port_role = cpu_to_be16(port_role); >>>>> + hdr->state = cpu_to_be16(mrp->ring_state); >>>>> + hdr->transitions = cpu_to_be16(mrp->ring_transitions); >>>>> + hdr->timestamp = cpu_to_be32(jiffies_to_msecs(jiffies)); >>>>> + >>>>> + br_mrp_skb_common(skb, mrp); >>>>> + br_mrp_skb_tlv(skb, BR_MRP_TLV_HEADER_END, 0x0); >>>>> + >>>>> + return skb; >>>>> +} >>>>> + >>>>> +static void br_mrp_test_work_expired(struct work_struct *work) >>>>> +{ >>>>> + struct delayed_work *del_work = to_delayed_work(work); >>>>> + struct br_mrp *mrp = container_of(del_work, struct br_mrp, test_work); >>>>> + bool notify_open = false; >>>>> + struct sk_buff *skb; >>>>> + >>>> >>>> Since this runs asynchronously what happens if the port is deleted ? >>> >>> Later I have checks to see if the port is no NULL. Is not good enough? >>> I have these rcu_access_pointer checks and before that I disable the >>> interrupts and get the rcu lock. >>> >> >> That is not safe because you dereference the pointer again after the check >> and it may become NULL between those. You could do ptr = rcu_dereference(); >> if (!ptr) and if non-null then continue accessing that memory through ptr. > > I see, I will try to use rcu_dereference as you suggested. > >> >>>> >>>>> + if (time_before_eq(mrp->test_end, jiffies)) >>>>> + return; >>>>> + >>>>> + if (mrp->test_count_miss < mrp->test_max_miss) { >>>>> + mrp->test_count_miss++; >>>>> + } else { >>>>> + /* Notify that the ring is open only if the ring state is >>>>> + * closed, otherwise it would continue to notify at every >>>>> + * interval. >>>>> + */ >>>>> + if (mrp->ring_state == BR_MRP_RING_STATE_CLOSED) >>>>> + notify_open = true; >>>>> + } >>>>> + >>>>> + local_bh_disable(); >>>>> + rcu_read_lock(); >>>>> + >>>>> + if (!rcu_access_pointer(mrp->p_port) || >>>>> + !rcu_access_pointer(mrp->s_port)) >>>>> + goto out; >>>>> + >>>>> + /* Is it possible here to get call to delete the bridge port? If yes >>>>> + * I need to protect it >>>>> + */ >>>>> + dev_hold(rcu_dereference(mrp->p_port)->dev); >>>>> + >>>> >>>> This looks all wrong, p_port can become NULL here and you'll deref it. >>> >>> By disabling the interrupts and taking the rcu read lock, will I not be >>> sure that no one can access the p_port? >> >> No. You should read more about how RCU operates. > > Yes, I should definetly do that. > >> >>> If is not true, how the p_port can become NULL? >>> >> >> Readers and writer run concurrently. >> >>>> >>>>> + skb = br_mrp_alloc_test_skb(mrp, rcu_dereference(mrp->p_port)->dev, >>>>> + BR_MRP_PORT_ROLE_PRIMARY); >>>>> + if (!skb) >>>>> + goto out; >>>>> + >>>>> + skb_reset_network_header(skb); >>>>> + dev_queue_xmit(skb); >>>>> + >>>>> + if (notify_open && !mrp->ring_role_offloaded) >>>>> + br_mrp_port_open(rcu_dereference(mrp->p_port)->dev, true); >>>>> + >>>>> + dev_put(rcu_dereference(mrp->p_port)->dev); >>>>> + >>>>> + dev_hold(rcu_dereference(mrp->s_port)->dev); >>>>> + >>>> >>>> same here >>>> >>>>> + skb = br_mrp_alloc_test_skb(mrp, rcu_dereference(mrp->s_port)->dev, >>>>> + BR_MRP_PORT_ROLE_SECONDARY); >>>>> + if (!skb) >>>>> + goto out; >>>>> + >>>>> + skb_reset_network_header(skb); >>>>> + dev_queue_xmit(skb); >>>>> + >>>>> + if (notify_open && !mrp->ring_role_offloaded) >>>>> + br_mrp_port_open(rcu_dereference(mrp->s_port)->dev, true); >>>>> + >>>>> + dev_put(rcu_dereference(mrp->s_port)->dev); >>>>> + >>>>> +out: >>>>> + rcu_read_unlock(); >>>>> + local_bh_enable(); >>>>> + >>>>> + queue_delayed_work(system_wq, &mrp->test_work, >>>>> + usecs_to_jiffies(mrp->test_interval)); >>>>> +} >>>>> + >>>>> +/* Adds a new MRP instance. >>>>> + * note: called under rtnl_lock >>>>> + */ >>>>> +int br_mrp_add(struct net_bridge *br, struct br_mrp_instance *instance) >>>>> +{ >>>>> + struct net_bridge_port *p; >>>>> + struct br_mrp *mrp; >>>>> + >>>>> + /* If the ring exists, it is not possible to create another one with the >>>>> + * same ring_id >>>>> + */ >>>>> + mrp = br_mrp_find_id(br, instance->ring_id); >>>>> + if (mrp) >>>>> + return -EINVAL; >>>>> + >>>>> + if (!br_mrp_get_port(br, instance->p_ifindex) || >>>>> + !br_mrp_get_port(br, instance->s_ifindex)) >>>>> + return -EINVAL; >>>>> + >>>>> + mrp = devm_kzalloc(&br->dev->dev, sizeof(struct br_mrp), GFP_KERNEL); >>>>> + if (!mrp) >>>>> + return -ENOMEM; >>>>> + >>>>> + /* I think is not needed because this can be replaced with rtnl lock*/ >>>>> + spin_lock_init(&mrp->lock); >>>>> + spin_lock(&mrp->lock); >>>>> + >>>>> + mrp->br = br; >>>> >>>> Is this field (mrp->br) used anywhere ? >>> >>> Not anymore. I can remove it in the next patch series. >>> >>>> >>>>> + mrp->ring_id = instance->ring_id; >>>>> + >>>>> + p = br_mrp_get_port(br, instance->p_ifindex); >>>>> + p->state = BR_STATE_FORWARDING; >>>>> + p->flags |= BR_MRP_AWARE; >>>>> + rcu_assign_pointer(mrp->p_port, p); >>>>> + >>>>> + p = br_mrp_get_port(br, instance->s_ifindex); >>>>> + p->state = BR_STATE_FORWARDING; >>>>> + p->flags |= BR_MRP_AWARE; >>>>> + rcu_assign_pointer(mrp->s_port, p); >>>>> + >>>>> + br_mrp_switchdev_add(mrp); >>>>> + >>>>> + spin_unlock(&mrp->lock); >>>>> + synchronize_rcu(); >>>> >>>> Why do you need the synchronize here? >>> >>> Actually this shouldn't be after the list_add_tail_rcu? Because I am >>> thinking that some can read the list at the same time I am change it. >> >> That doesn't help, rcu primitives are already safe to run concurrently with readers. >> >>> >>>> >>>>> + >>>>> + list_add_tail_rcu(&mrp->list, &br->mrp_list); >>>>> + INIT_DELAYED_WORK(&mrp->test_work, br_mrp_test_work_expired); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +/* Deletes existing MRP instance. >>>>> + * note: called under rtnl_lock >>>>> + */ >>>>> +int br_mrp_del(struct net_bridge *br, struct br_mrp_instance *instance) >>>>> +{ >>>>> + struct br_mrp *mrp = br_mrp_find_id(br, instance->ring_id); >>>>> + struct net_bridge_port *p; >>>>> + >>>>> + if (!mrp) >>>>> + return -EINVAL; >>>>> + >>>>> + /* Stop sending MRP_Test frames */ >>>>> + cancel_delayed_work(&mrp->test_work); >>>> >>>> cancel_delayed_work_sync() if you'd like to make sure it's stopped and finished (if it was running >>>> during this) >>> >>> Will be fixed in the next patch series. >>>> >>>>> + br_mrp_switchdev_send_ring_test(mrp, 0, 0, 0); >>>>> + >>>>> + spin_lock(&mrp->lock); >>>>> + >>>>> + br_mrp_switchdev_del(mrp); >>>>> + >>>>> + /* Reset the ports */ >>>>> + p = rcu_dereference_protected(mrp->p_port, lockdep_is_held(&mrp->lock)); >>>>> + if (p) { >>>>> + spin_lock(&br->lock); >>>>> + p->state = BR_STATE_FORWARDING; >>>>> + p->flags &= ~BR_MRP_AWARE; >>>>> + br_mrp_port_switchdev_set_state(p, BR_STATE_FORWARDING); >>>>> + rcu_assign_pointer(mrp->p_port, NULL); >>>>> + spin_unlock(&br->lock); >>>>> + } >>>>> + >>>>> + p = rcu_dereference_protected(mrp->s_port, lockdep_is_held(&mrp->lock)); >>>>> + if (p) { >>>>> + spin_lock(&br->lock); >>>>> + p->state = BR_STATE_FORWARDING; >>>>> + p->flags &= ~BR_MRP_AWARE; >>>>> + br_mrp_port_switchdev_set_state(p, BR_STATE_FORWARDING); >>>>> + rcu_assign_pointer(mrp->s_port, NULL); >>>>> + spin_unlock(&br->lock); >>>>> + } >>>>> + >>>>> + /* Destroy the ring */ >>>>> + mrp->br = NULL; >>>>> + >>>>> + spin_unlock(&mrp->lock); >>>>> + synchronize_rcu(); >>>>> + >>>>> + list_del_rcu(&mrp->list); >>>>> + devm_kfree(&br->dev->dev, mrp); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int br_mrp_set_port_state(struct net_bridge_port *p, >>>>> + enum br_mrp_port_state_type state) >>>>> +{ >>>>> + spin_lock(&p->br->lock); >>>>> + >>>>> + if (state == BR_MRP_PORT_STATE_FORWARDING) >>>>> + p->state = BR_STATE_FORWARDING; >>>>> + else >>>>> + p->state = BR_STATE_BLOCKING; >>>>> + >>>>> + br_mrp_port_switchdev_set_state(p, state); >>>>> + >>>>> + spin_unlock(&p->br->lock); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int br_mrp_set_port_role(struct net_bridge_port *p, >>>>> + u32 ring_id, enum br_mrp_port_role_type role) >>>>> +{ >>>>> + struct br_mrp *mrp = br_mrp_find_id(p->br, ring_id); >>>>> + >>>>> + if (!mrp) >>>>> + return -EINVAL; >>>>> + >>>>> + spin_lock(&mrp->lock); >>>>> + >>>>> + if (role == BR_MRP_PORT_ROLE_PRIMARY) >>>>> + rcu_assign_pointer(mrp->p_port, p); >>>>> + if (role == BR_MRP_PORT_ROLE_SECONDARY) >>>>> + rcu_assign_pointer(mrp->s_port, p); >>>>> + >>>>> + br_mrp_port_switchdev_set_role(p, role); >>>>> + >>>>> + spin_unlock(&mrp->lock); >>>>> + synchronize_rcu(); >>>> >>>> Why do you need to synchronize here? >>> >>> Actually this is not needed. >>>> >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int br_mrp_set_ring_state(struct net_bridge *br, u32 ring_id, >>>>> + enum br_mrp_ring_state_type state) >>>>> +{ >>>>> + struct br_mrp *mrp = br_mrp_find_id(br, ring_id); >>>>> + >>>>> + if (!mrp) >>>>> + return -EINVAL; >>>>> + >>>>> + if (mrp->ring_state == BR_MRP_RING_STATE_CLOSED && >>>>> + state != BR_MRP_RING_STATE_CLOSED) >>>>> + mrp->ring_transitions++; >>>>> + >>>>> + mrp->ring_state = state; >>>>> + >>>>> + br_mrp_switchdev_set_ring_state(mrp, state); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int br_mrp_set_ring_role(struct net_bridge *br, u32 ring_id, >>>>> + enum br_mrp_ring_role_type role) >>>>> +{ >>>>> + struct br_mrp *mrp = br_mrp_find_id(br, ring_id); >>>>> + int err; >>>>> + >>>>> + if (!mrp) >>>>> + return -EINVAL; >>>>> + >>>>> + mrp->ring_role = role; >>>>> + >>>>> + /* If there is an error just bailed out */ >>>>> + err = br_mrp_switchdev_set_ring_role(mrp, role); >>>>> + if (err && err != -EOPNOTSUPP) >>>>> + return err; >>>>> + >>>>> + /* Now detect if the HW actually applied the role or not. If the HW >>>>> + * applied the role it means that the SW will not to do those operations >>>>> + * anymore. For example if the role ir MRM then the HW will notify the >>>>> + * SW when ring is open, but if the is not pushed to the HW the SW will >>>>> + * need to detect when the ring is open >>>>> + */ >>>>> + mrp->ring_role_offloaded = err == -EOPNOTSUPP ? 0 : 1; >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int br_mrp_start_test(struct net_bridge *br, u32 ring_id, u32 interval, >>>>> + u8 max_miss, u32 period) >>>>> +{ >>>>> + struct br_mrp *mrp = br_mrp_find_id(br, ring_id); >>>>> + >>>>> + if (!mrp) >>>>> + return -EINVAL; >>>>> + >>>>> + /* Try to push is to the HW and if it fails then continue to generate in >>>>> + * SW and if that also fails then return error >>>>> + */ >>>>> + if (!br_mrp_switchdev_send_ring_test(mrp, interval, max_miss, period)) >>>>> + return 0; >>>>> + >>>>> + mrp->test_interval = interval; >>>>> + mrp->test_end = jiffies + usecs_to_jiffies(period); >>>>> + mrp->test_max_miss = max_miss; >>>>> + mrp->test_count_miss = 0; >>>>> + queue_delayed_work(system_wq, &mrp->test_work, >>>>> + usecs_to_jiffies(interval)); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +/* Process only MRP Test frame. All the other MRP frames are processed by >>>>> + * userspace application >>>>> + * note: already called with rcu_read_lock >>>>> + */ >>>>> +static void br_mrp_mrm_process(struct br_mrp *mrp, struct sk_buff *skb) >>>>> +{ >>>>> + struct br_mrp_tlv_hdr *hdr; >>>>> + >>>>> + hdr = (struct br_mrp_tlv_hdr *)(skb->data + sizeof(uint16_t)); >>>>> + >>>>> + if (hdr->type != BR_MRP_TLV_HEADER_RING_TEST) >>>>> + return; >>>>> + >>>>> + mrp->test_count_miss = 0; >>>>> + >>>>> + br_mrp_port_open(rcu_dereference(mrp->p_port)->dev, false); >>>>> + br_mrp_port_open(rcu_dereference(mrp->s_port)->dev, false); >>>>> +} >>>>> + >>>>> +/* This will just forward the frame to the other mrp ring port(MRC role) or will >>>>> + * not do anything. >>>>> + * note: already called with rcu_read_lock >>>>> + */ >>>>> +static int br_mrp_rcv(struct net_bridge_port *p, >>>>> + struct sk_buff *skb, struct net_device *dev) >>>>> +{ >>>>> + struct net_device *s_dev, *p_dev, *d_dev; >>>>> + struct net_bridge *br; >>>>> + struct sk_buff *nskb; >>>>> + struct br_mrp *mrp; >>>>> + >>>>> + /* If port is disable don't accept any frames */ >>>>> + if (p->state == BR_STATE_DISABLED) >>>>> + return 0; >>>>> + >>>>> + br = p->br; >>>>> + mrp = br_mrp_find_port(br, p); >>>>> + if (unlikely(!mrp)) >>>>> + return 0; >>>>> + >>>>> + /* If the role is MRM then don't forward the frames */ >>>>> + if (mrp->ring_role == BR_MRP_RING_ROLE_MRM) { >>>>> + br_mrp_mrm_process(mrp, skb); >>>>> + return 1; >>>>> + } >>>>> + >>>>> + nskb = skb_clone(skb, GFP_ATOMIC); >>>>> + if (!nskb) >>>>> + return 0; >>>>> + >>>>> + p_dev = rcu_dereference(mrp->p_port)->dev; >>>>> + s_dev = rcu_dereference(mrp->s_port)->dev; >>>>> + >>>> >>>> Not safe, could deref null. >>> >>> Will be fixed in the next patch series. >>> >>>> >>>>> + if (p_dev == dev) >>>>> + d_dev = s_dev; >>>>> + else >>>>> + d_dev = p_dev; >>>>> + >>>>> + nskb->dev = d_dev; >>>>> + skb_push(nskb, ETH_HLEN); >>>>> + dev_queue_xmit(nskb); >>>>> + >>>>> + return 1; >>>>> +} >>>>> + >>>>> +int br_mrp_process(struct net_bridge_port *p, struct sk_buff *skb) >>>>> +{ >>>>> + /* If there is no MRP instance do normal forwarding */ >>>>> + if (unlikely(!(p->flags & BR_MRP_AWARE))) >>>> >>>> Shouldn't this one be likely() ? >>> >>> Yes, this should be likely. >>>> >>>>> + goto out; >>>>> + >>>>> + if (unlikely(skb->protocol == htons(ETH_P_MRP))) >>>>> + return br_mrp_rcv(p, skb, p->dev); >>>>> + >>>>> +out: >>>>> + return 0; >>>>> +} >>>>> >>>> >>> >>> [1] https://lwn.net/Articles/262464/ >>> [2] https://www.kernel.org/doc/html/latest/RCU/listRCU.html >>> >> >