Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp3670273ybk; Tue, 19 May 2020 10:07:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxeGx3VhC67oLxgNoAMrTRAERBPYMTuCL1HBb9zjC54IzwQheyrTJ5t67pzjzrgSxUuj+Ts X-Received: by 2002:a17:906:9419:: with SMTP id q25mr201889ejx.111.1589908079135; Tue, 19 May 2020 10:07:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589908079; cv=none; d=google.com; s=arc-20160816; b=IvEQGD7m/kXYFHk++A5uKUXEbNmu3DAmakoFjYNtv5MWCs/gfHhFCErHRGA7+BrLXw GRbVN2po7mDDe2ZykH5wEG+iGDrk9ySv4jBmRxDKWEHrpVG7mfuCI8NahG7ynUQw7XpB kQEEQQKNkXMDb0QkAWQuKZSO3w+cdFIKmoTMseP6JCivBowpPl55EubWlDvytsAWDtdA E9myH3YWgT0vhms7XjK0TmFuNiBnKCeYlrJW8LkdEzEoZQ3MelfuUbd6XypNbS9dQ/1w soNyEc4bzUKJCsJtxyzTcTO+g3fX5fk/s1VkVLjHBugMtQE7cbiRefJ7otdtdXUI7lIK lAyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:dkim-signature; bh=NjJTMSMaIJStjZ5vLVFN3LuT/tDATqGDyHjkps+SDvM=; b=XJtKjhghh7EZ1zwY9NuYpRpewYDEg6ykvK7oNP5OHF/tt72hkC/ncX0yn8HsJcgKjL tUIU+t5YJFz87rY8/CSBZUUrtnwyKBJFasKTOrjy6esJ1AWR7kp55W1yis0Nm77tYg// hwslN8cDyF5EsbBCAE2G4HuLJ6lISSpQc5X8zB0mYaph1UcClauCfSn1reG/p5+EaSfC mLkx4aKBS3hEbJwUcy2dPXD7oHOVQfLjVWpx+efS5ZMYlpo2e5JZBJAmwnX/EAiwHpR6 McBf2bIM+elHu+LjTYC4qrU38kumlfmXn5E7J+BEktB06r0VRi9CsrziJ3wWE396TD1M FwLw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=GjURddWx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h1si7089edn.474.2020.05.19.10.07.34; Tue, 19 May 2020 10:07:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=GjURddWx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729345AbgESRFt (ORCPT + 99 others); Tue, 19 May 2020 13:05:49 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:44196 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728689AbgESRFs (ORCPT ); Tue, 19 May 2020 13:05:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589907945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NjJTMSMaIJStjZ5vLVFN3LuT/tDATqGDyHjkps+SDvM=; b=GjURddWxUgYJt8+5dBaKdLLy8BMgNWuNSWAv5Pmhv5BUN46XzZM5Xa0FyR5GpBndLzYFWY ZwV96COqQGXBGAYBbI5sY1l1H6IpPAbQggCdgHC6Nw3yZAQBvqdJdJh1wK/hkzhOT2IiRc IcGrWpPh4mK+tq8ERHu0h7qPAsL6v/U= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-489-NK5x37vuPs-6ReA13DmrwA-1; Tue, 19 May 2020 13:05:43 -0400 X-MC-Unique: NK5x37vuPs-6ReA13DmrwA-1 Received: by mail-ej1-f72.google.com with SMTP id cb22so6692329ejb.12 for ; Tue, 19 May 2020 10:05:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=NjJTMSMaIJStjZ5vLVFN3LuT/tDATqGDyHjkps+SDvM=; b=dpaen9be5WXnG6HC+xqSWGeL0ICaCgHHnRxS668BURkqvSxsXjnIPiPD8AuV3S2+/O DCVZt2d58CK2tfexvq+F5/fPc7udtZqcchqLiZrdsvBt53YzQ+M8q/jSIgs92nUUuVoQ LRbkr8hG3sjqDqX0rBQ76W7QbPzO9zGlpGqIcDji1qdmSFEd62lFLyo5Fzgjz5xbw1Rd VD5KnXvhRo6TS+3UCAW8jzLFUmwn7YGjq+jVZDiFMXrDE4p9bYSeJILNoMc4ZA90QueV yOe3TOmLCRQ2XS0IMUymj92NSbV/jiBQOmM8vU688W8ltW26lZUkimYB8C/y9QyufgLe rZjQ== X-Gm-Message-State: AOAM531ZKP4IE4ljg6+YUfYeJw6nGx0vEfBU05fDa89ZK8yMo6t73Ksa 8AFshKv6gYBgqqFiyAWTUcgnoQI4HDN4SzLMfX35EIJB+bpKY0VKA4D7QLkvBzcd/bEEh22Bdlv dLptyCsBat5KYvhzqT1lQu8rq X-Received: by 2002:a50:d6d0:: with SMTP id l16mr19548615edj.317.1589907942127; Tue, 19 May 2020 10:05:42 -0700 (PDT) X-Received: by 2002:a50:d6d0:: with SMTP id l16mr19548579edj.317.1589907941555; Tue, 19 May 2020 10:05:41 -0700 (PDT) Received: from turbo.teknoraver.net (net-188-153-20-146.cust.vodafonedsl.it. [188.153.20.146]) by smtp.gmail.com with ESMTPSA id y13sm63143ejm.107.2020.05.19.10.05.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 May 2020 10:05:41 -0700 (PDT) Date: Tue, 19 May 2020 19:05:34 +0200 From: Matteo Croce To: Russell King - ARM Linux admin Cc: Antoine Tenart , netdev , "gregory.clement@bootlin.com" , LKML , Maxime Chevallier , Nadav Haklai , Thomas Petazzoni , "miquel.raynal@bootlin.com" , Stefan Chulski , Marcin Wojtas , "David S . Miller" , Linux ARM Subject: Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables Message-ID: <20200519190534.78bb8389@turbo.teknoraver.net> In-Reply-To: References: <20190524100554.8606-1-maxime.chevallier@bootlin.com> <20190524100554.8606-4-maxime.chevallier@bootlin.com> <20200423170003.GT25745@shell.armlinux.org.uk> <20200509114518.GB1551@shell.armlinux.org.uk> <20200509195246.GJ1551@shell.armlinux.org.uk> <20200509202050.GK1551@shell.armlinux.org.uk> <20200519095330.GA1551@shell.armlinux.org.uk> Organization: Red Hat X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 19 May 2020 12:05:20 +0200 Matteo Croce wrote: > On Tue, May 19, 2020 at 11:54 AM Russell King - ARM Linux admin > wrote: > > > > On Sat, May 09, 2020 at 09:20:50PM +0100, Russell King - ARM Linux > > admin wrote: > > > On Sat, May 09, 2020 at 08:52:46PM +0100, Russell King - ARM > > > Linux admin wrote: > > > > It is highly likely that 895586d5dc32 is responsible for this > > > > breakage. I've been investigating this afternoon, and what I've > > > > found, comparing a kernel without 895586d5dc32 and with > > > > 895586d5dc32 applied is: > > > > > > > > - The table programmed into the hardware via > > > > mvpp22_rss_fill_table() appears to be identical with or without > > > > the commit. > > > > > > > > - When rxhash is enabled on eth2, mvpp2_rss_port_c2_enable() > > > > reports that c2.attr[0] and c2.attr[2] are written back > > > > containing: > > > > > > > > - with 895586d5dc32, failing: 00200000 40000000 > > > > - without 895586d5dc32, working: 04000000 40000000 > > > > > > > > - When disabling rxhash, c2.attr[0] and c2.attr[2] are written > > > > back as: > > > > > > > > 04000000 00000000 > > > > > > > > The second value represents the MVPP22_CLS_C2_ATTR2_RSS_EN bit, > > > > the first value is the queue number, which comprises two > > > > fields. The high 5 bits are 24:29 and the low three are 21:23 > > > > inclusive. This comes from: > > > > > > > > c2.attr[0] = MVPP22_CLS_C2_ATTR0_QHIGH(qh) | > > > > MVPP22_CLS_C2_ATTR0_QLOW(ql); > > > > #define MVPP22_CLS_C2_ATTR0_QHIGH(qh) (((qh) & 0x1f) > > > > << 24) #define MVPP22_CLS_C2_ATTR0_QLOW(ql) (((ql) & > > > > 0x7) << 21) > > > > > > > > So, the working case gives eth2 a queue id of 4.0, or 32 as per > > > > port->first_rxq, and the non-working case a queue id of 0.1, or > > > > 1. > > > > > > > > The allocation of queue IDs seems to be in mvpp2_port_probe(): > > > > > > > > if (priv->hw_version == MVPP21) > > > > port->first_rxq = port->id * port->nrxqs; > > > > else > > > > port->first_rxq = port->id * > > > > priv->max_port_rxqs; > > > > > > > > Where: > > > > > > > > if (priv->hw_version == MVPP21) > > > > priv->max_port_rxqs = 8; > > > > else > > > > priv->max_port_rxqs = 32; > > > > > > > > Making the port 0 (eth0 / eth1) have port->first_rxq = 0, and > > > > port 1 (eth2) be 32. It seems the idea is that the first 32 > > > > queues belong to port 0, the second 32 queues belong to port 1, > > > > etc. > > > > > > > > mvpp2_rss_port_c2_enable() gets the queue number from it's > > > > parameter, 'ctx', which comes from mvpp22_rss_ctx(port, 0). > > > > This returns port->rss_ctx[0]. > > > > > > > > mvpp22_rss_context_create() is responsible for allocating that, > > > > which it does by looking for an unallocated priv->rss_tables[] > > > > pointer. This table is shared amongst all ports on the CP > > > > silicon. > > > > > > > > When we write the tables in mvpp22_rss_fill_table(), the RSS > > > > table entry is defined by: > > > > > > > > u32 sel = MVPP22_RSS_INDEX_TABLE(rss_ctx) | > > > > MVPP22_RSS_INDEX_TABLE_ENTRY(i); > > > > > > > > where rss_ctx is the context ID (queue number) and i is the > > > > index in the table. > > > > > > > > #define MVPP22_RSS_INDEX_TABLE_ENTRY(idx) (idx) > > > > #define MVPP22_RSS_INDEX_TABLE(idx) ((idx) << 8) > > > > #define MVPP22_RSS_INDEX_QUEUE(idx) ((idx) << 16) > > > > > > > > If we look at what is written: > > > > > > > > - The first table to be written has "sel" values of > > > > 00000000..0000001f, containing values 0..3. This appears to be > > > > for eth1. This is table 0, RX queue number 0. > > > > - The second table has "sel" values of 00000100..0000011f, and > > > > appears to be for eth2. These contain values 0x20..0x23. This > > > > is table 1, RX queue number 0. > > > > - The third table has "sel" values of 00000200..0000021f, and > > > > appears to be for eth3. These contain values 0x40..0x43. This > > > > is table 2, RX queue number 0. > > > > > > > > Okay, so how do queue numbers translate to the RSS table? > > > > There is another table - the RXQ2RSS table, indexed by the > > > > MVPP22_RSS_INDEX_QUEUE field of MVPP22_RSS_INDEX and accessed > > > > through the MVPP22_RXQ2RSS_TABLE register. Before > > > > 895586d5dc32, it was: > > > > > > > > mvpp2_write(priv, MVPP22_RSS_INDEX, > > > > MVPP22_RSS_INDEX_QUEUE(port->first_rxq)); > > > > mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE, > > > > MVPP22_RSS_TABLE_POINTER(port->id)); > > > > > > > > and after: > > > > > > > > mvpp2_write(priv, MVPP22_RSS_INDEX, > > > > MVPP22_RSS_INDEX_QUEUE(ctx)); mvpp2_write(priv, > > > > MVPP22_RXQ2RSS_TABLE, MVPP22_RSS_TABLE_POINTER(ctx)); > > > > > > > > So, before the commit, for eth2, that would've contained '32' > > > > for the index and '1' for the table pointer - mapping queue 32 > > > > to table 1. Remember that this is queue-high.queue-low of 4.0. > > > > > > > > After the commit, we appear to map queue 1 to table 1. That > > > > again looks fine on the face of it. > > > > > > > > Section 9.3.1 of the A8040 manual seems indicate the reason > > > > that the queue number is separated. queue-low seems to always > > > > come from the classifier, whereas queue-high can be from the > > > > ingress physical port number or the classifier depending on the > > > > MVPP2_CLS_SWFWD_PCTRL_REG. > > > > > > > > We set the port bit in MVPP2_CLS_SWFWD_PCTRL_REG, meaning that > > > > queue-high comes from the MVPP2_CLS_SWFWD_P2HQ_REG() > > > > register... and this seems to be where our bug comes from. > > > > > > > > mvpp2_cls_oversize_rxq_set() sets this up as: > > > > > > > > mvpp2_write(port->priv, > > > > MVPP2_CLS_SWFWD_P2HQ_REG(port->id), (port->first_rxq >> > > > > MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS)); > > > > > > > > val = mvpp2_read(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG); > > > > val |= MVPP2_CLS_SWFWD_PCTRL_MASK(port->id); > > > > mvpp2_write(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG, val); > > > > > > > > so, the queue-high for eth2 is _always_ 4, meaning that only > > > > queues 32 through 39 inclusive are available to eth2. Yet, > > > > we're trying to tell the classifier to set queue-high, which > > > > will be ignored, to zero. > > > > > > > > So we end up directing traffic from eth2 not to queue 1, but to > > > > queue 33, and then we tell it to look up queue 33 in the RSS > > > > table. However, RSS table has not been programmed for queue > > > > 33, and so it ends up (presumably) dropping the packets. > > > > > > > > It seems that mvpp22_rss_context_create() doesn't take account > > > > of the fact that the upper 5 bits of the queue ID can't > > > > actually be changed due to the settings in > > > > mvpp2_cls_oversize_rxq_set(), _or_ it seems that > > > > mvpp2_cls_oversize_rxq_set() has been missed in this commit. > > > > Either way, these two functions mutually disagree with what > > > > queue number should be used. > > > > > > > > So, 895586d5dc32 is indeed the cause of this problem. > > > > > > Looking deeper into what mvpp2_cls_oversize_rxq_set() and the MTU > > > validation is doing, it seems that MVPP2_CLS_SWFWD_P2HQ_REG() is > > > used for at least a couple of things. > > > > > > So, with the classifier having had RSS enabled and directing eth2 > > > traffic to queue 1, we can not ignore the fact that we may have > > > packets appearing on queue 32 for this port. > > > > > > One of the things that queue 32 will be used for is if an > > > over-sized packet attempts to egress through eth2 - it seems that > > > the A8040 has the ability to forward frames between its ports. > > > However, afaik we don't support that feature, and the kernel > > > restricts the packet size, so we should never violate the MTU > > > validator and end up with such a packet. In any case, _if_ we > > > were to attempt to transmit an oversized packet, we have no > > > support in the kernel to deal with that appearing in the port's > > > receive queue. > > > > > > Maybe it would be safe to clear the MVPP2_CLS_SWFWD_PCTRL_MASK() > > > bit? > > > > > > My testing seems to confirm my findings above - clearing this bit > > > means that if I enable rxhash on eth2, the interface can then pass > > > traffic, as we are now directing traffic to RX queue 1 rather than > > > queue 33. Traffic still seems to work with rxhash off as well. > > > > > > So, I think it's clear where the problem lies, but not what the > > > correct solution is; someone with more experience of packet > > > classifiers (this one?) needs to look at this - this is my first > > > venture into these things, and certainly the first time I've > > > traced through how this is trying to work (or not)... > > > > This is what I was using here to work around the problem, and what I > > mentioned above. > > > > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c > > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c index > > fd221d88811e..0dd3b65822dd 100644 --- > > a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c +++ > > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c @@ -1058,7 +1058,7 > > @@ void mvpp2_cls_oversize_rxq_set(struct mvpp2_port *port) > > (port->first_rxq >> MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS)); > > > > val = mvpp2_read(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG); > > - val |= MVPP2_CLS_SWFWD_PCTRL_MASK(port->id); > > + val &= ~MVPP2_CLS_SWFWD_PCTRL_MASK(port->id); > > mvpp2_write(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG, val); > > } > > > > > > Hi, > > I will try this change and let you know if it works. > > Thanks > Hi, The patch seems to work. I'm generating traffic with random MAC and IP addresses, to have many flows: # tcpdump -tenni eth2 9a:a9:b1:3a:b1:6b > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.4.0 > 192.168.0.4.0: UDP, length 12 9e:92:fd:f8:7f:0a > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.4.0 > 192.168.0.4.0: UDP, length 12 66:b7:11:8a:c2:1f > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.1.0 > 192.168.0.1.0: UDP, length 12 7a:ba:58:bd:9a:62 > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.1.0 > 192.168.0.1.0: UDP, length 12 7e:78:a9:97:70:3a > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.2.0 > 192.168.0.2.0: UDP, length 12 b2:81:91:34:ce:42 > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.2.0 > 192.168.0.2.0: UDP, length 12 2a:05:52:d0:d9:3f > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.3.0 > 192.168.0.3.0: UDP, length 12 ee:ee:47:35:fa:81 > 00:51:82:11:22:02, ethertype IPv4 (0x0800), length 60: 10.0.0.3.0 > 192.168.0.3.0: UDP, length 12 This is the default rate, with rxhash off: # utraf eth2 tx: 0 bps 0 pps rx: 397.4 Mbps 827.9 Kpps tx: 0 bps 0 pps rx: 396.3 Mbps 825.7 Kpps tx: 0 bps 0 pps rx: 396.6 Mbps 826.3 Kpps tx: 0 bps 0 pps rx: 396.5 Mbps 826.1 Kpps PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9 root 20 0 0 0 0 R 99.7 0.0 7:02.58 ksoftirqd/0 15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1 20 root 20 0 0 0 0 S 0.0 0.0 2:01.48 ksoftirqd/2 25 root 20 0 0 0 0 S 0.0 0.0 0:32.86 ksoftirqd/3 and this with rx hashing enabled: # ethtool -K eth2 rxhash on # utraf eth2 tx: 0 bps 0 pps rx: 456.4 Mbps 950.8 Kpps tx: 0 bps 0 pps rx: 458.4 Mbps 955.0 Kpps tx: 0 bps 0 pps rx: 457.6 Mbps 953.3 Kpps tx: 0 bps 0 pps rx: 462.2 Mbps 962.9 Kpps PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 20 root 20 0 0 0 0 R 0.7 0.0 2:02.34 ksoftirqd/2 25 root 20 0 0 0 0 S 0.3 0.0 0:33.25 ksoftirqd/3 9 root 20 0 0 0 0 S 0.0 0.0 7:52.57 ksoftirqd/0 15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1 The throughput doesn't increase so much, maybe we hit an HW limit of the gigabit port. The interesting thing is how the global CPU usage drops from 25% to 1%. I can't explain this, it could be due to the reduced contention? Regards, -- Matteo Croce per aspera ad upstream