2023-11-04 12:44:36

by Linus Walleij

[permalink] [raw]
Subject: [PATCH net 4/4] net: ethernet: cortina: Handle large frames

The Gemini ethernet controller provides hardware checksumming
for frames up to 1514 bytes including ethernet headers but not
FCS.

If we start sending bigger frames (after first bumping up the MTU
on both interfaces sending and receiveing the frames), truncated
packets start to appear on the target such as in this tcpdump
resulting from ping -s 1474:

23:34:17.241983 14:d6:4d:a8:3c:4f (oui Unknown) > bc:ae:c5:6b:a8:3d (oui Unknown),
ethertype IPv4 (0x0800), length 1514: truncated-ip - 2 bytes missing!
(tos 0x0, ttl 64, id 32653, offset 0, flags [DF], proto ICMP (1), length 1502)
OpenWrt.lan > Fecusia: ICMP echo request, id 1672, seq 50, length 1482

If we bypass the hardware checksumming and provide a software
fallback, everything starts working fine up to the max TX MTU
of 2047 bytes, for example ping -s2000 192.168.1.2:

00:44:29.587598 bc:ae:c5:6b:a8:3d (oui Unknown) > 14:d6:4d:a8:3c:4f (oui Unknown),
ethertype IPv4 (0x0800), length 2042:
(tos 0x0, ttl 64, id 51828, offset 0, flags [none], proto ICMP (1), length 2028)
Fecusia > OpenWrt.lan: ICMP echo reply, id 1683, seq 4, length 2008

The bit enabling to bypass hardware checksum (or any of the
"TSS" bits) are undocumented in the hardware reference manual.
The entire hardware checksum unit appears undocumented. The
conclusion that we need to use the "bypass" bit was found by
trial-and-error.

On the D-Link DIR-685 router this fixes a bug on the conduit
interface to the RTL8366RB DSA switch: as the switch needs to add
space for its tag it increases the MTU on the conduit interface
to 1504 and that means that when the router sends packages
of 1500 bytes these get an extra 4 bytes of DSA tag and the
transfer fails because of the erroneous hardware checksumming,
affecting such basic functionality as the LuCI web inteface.

Suggested-by: Vladimir Oltean <[email protected]>
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <[email protected]>
---
drivers/net/ethernet/cortina/gemini.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 23723c9c0f93..063e58639379 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1145,6 +1145,7 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
dma_addr_t mapping;
unsigned short mtu;
void *buffer;
+ int ret;

mtu = ETH_HLEN;
mtu += netdev->mtu;
@@ -1170,7 +1171,14 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
word3 |= mtu;
}

- if (skb->ip_summed != CHECKSUM_NONE) {
+ if (skb->len >= ETH_FRAME_LEN) {
+ /* Hardware offloaded checksumming isn't working on frames
+ * bigger than 1514 bytes. Perhaps the buffer is only 1518
+ * bytes fitting mach a normal frame and a checksum?
+ * Just bypass on bigger frames.
+ */
+ word1 |= TSS_BYPASS_BIT;
+ } else if (skb->ip_summed != CHECKSUM_NONE) {
int tcp = 0;

if (skb->protocol == htons(ETH_P_IP)) {

--
2.34.1


2023-11-04 14:57:27

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH net 4/4] net: ethernet: cortina: Handle large frames

> @@ -1170,7 +1171,14 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
> word3 |= mtu;
> }
>
> - if (skb->ip_summed != CHECKSUM_NONE) {
> + if (skb->len >= ETH_FRAME_LEN) {
> + /* Hardware offloaded checksumming isn't working on frames
> + * bigger than 1514 bytes. Perhaps the buffer is only 1518
> + * bytes fitting mach a normal frame and a checksum?

mach ?

> + * Just bypass on bigger frames.
> + */
> + word1 |= TSS_BYPASS_BIT;
> + } else if (skb->ip_summed != CHECKSUM_NONE) {

I've never looked at how the network stack does checksums. But looking
at this patch, it made me wounder, how do you tell the stack it needs
to do a software checksum because the hardware cannot? Or for this
driver, is it always calculating a checksum, which is then ignored?
Maybe you can improve performance a little but disabling software
checksum when it is not needed?

Andrew

2023-11-04 15:21:12

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH net 4/4] net: ethernet: cortina: Handle large frames

Hi Linus,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 90b0c2b2edd1adff742c621e246562fbefa11b70]

url: https://github.com/intel-lab-lkp/linux/commits/Linus-Walleij/net-ethernet-cortina-Fix-MTU-max-setting/20231104-204432
base: 90b0c2b2edd1adff742c621e246562fbefa11b70
patch link: https://lore.kernel.org/r/20231104-gemini-largeframe-fix-v1-4-9c5513f22f33%40linaro.org
patch subject: [PATCH net 4/4] net: ethernet: cortina: Handle large frames
config: arc-randconfig-001-20231104 (https://download.01.org/0day-ci/archive/20231104/[email protected]/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231104/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

drivers/net/ethernet/cortina/gemini.c: In function 'gmac_map_tx_bufs':
>> drivers/net/ethernet/cortina/gemini.c:1148:13: warning: unused variable 'ret' [-Wunused-variable]
1148 | int ret;
| ^~~


vim +/ret +1148 drivers/net/ethernet/cortina/gemini.c

1132
1133 static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
1134 struct gmac_txq *txq, unsigned short *desc)
1135 {
1136 struct gemini_ethernet_port *port = netdev_priv(netdev);
1137 struct skb_shared_info *skb_si = skb_shinfo(skb);
1138 unsigned short m = (1 << port->txq_order) - 1;
1139 short frag, last_frag = skb_si->nr_frags - 1;
1140 struct gemini_ethernet *geth = port->geth;
1141 unsigned int word1, word3, buflen;
1142 unsigned short w = *desc;
1143 struct gmac_txdesc *txd;
1144 skb_frag_t *skb_frag;
1145 dma_addr_t mapping;
1146 unsigned short mtu;
1147 void *buffer;
> 1148 int ret;
1149
1150 mtu = ETH_HLEN;
1151 mtu += netdev->mtu;
1152 if (skb->protocol == htons(ETH_P_8021Q))
1153 mtu += VLAN_HLEN;
1154
1155 if (mtu > MTU_SIZE_BIT_MASK) {
1156 netdev_err(netdev, "%s: MTU too big, max size 2047 bytes, capped\n", __func__);
1157 mtu = MTU_SIZE_BIT_MASK;
1158 }
1159
1160 if (skb->len > 65535) {
1161 /* The field for length is only 16 bits */
1162 netdev_err(netdev, "%s: frame too big, max size 65535 bytes\n", __func__);
1163 return -EINVAL;
1164 }
1165
1166 word1 = skb->len;
1167 word3 = SOF_BIT;
1168
1169 if (word1 > mtu) {
1170 word1 |= TSS_MTU_ENABLE_BIT;
1171 word3 |= mtu;
1172 }
1173
1174 if (skb->len >= ETH_FRAME_LEN) {
1175 /* Hardware offloaded checksumming isn't working on frames
1176 * bigger than 1514 bytes. Perhaps the buffer is only 1518
1177 * bytes fitting mach a normal frame and a checksum?
1178 * Just bypass on bigger frames.
1179 */
1180 word1 |= TSS_BYPASS_BIT;
1181 } else if (skb->ip_summed != CHECKSUM_NONE) {
1182 int tcp = 0;
1183
1184 if (skb->protocol == htons(ETH_P_IP)) {
1185 word1 |= TSS_IP_CHKSUM_BIT;
1186 tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
1187 } else { /* IPv6 */
1188 word1 |= TSS_IPV6_ENABLE_BIT;
1189 tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
1190 }
1191
1192 word1 |= tcp ? TSS_TCP_CHKSUM_BIT : TSS_UDP_CHKSUM_BIT;
1193 }
1194
1195 frag = -1;
1196 while (frag <= last_frag) {
1197 if (frag == -1) {
1198 buffer = skb->data;
1199 buflen = skb_headlen(skb);
1200 } else {
1201 skb_frag = skb_si->frags + frag;
1202 buffer = skb_frag_address(skb_frag);
1203 buflen = skb_frag_size(skb_frag);
1204 }
1205
1206 if (frag == last_frag) {
1207 word3 |= EOF_BIT;
1208 txq->skb[w] = skb;
1209 }
1210
1211 mapping = dma_map_single(geth->dev, buffer, buflen,
1212 DMA_TO_DEVICE);
1213 if (dma_mapping_error(geth->dev, mapping))
1214 goto map_error;
1215
1216 txd = txq->ring + w;
1217 txd->word0.bits32 = buflen;
1218 txd->word1.bits32 = word1;
1219 txd->word2.buf_adr = mapping;
1220 txd->word3.bits32 = word3;
1221
1222 word3 &= MTU_SIZE_BIT_MASK;
1223 w++;
1224 w &= m;
1225 frag++;
1226 }
1227
1228 *desc = w;
1229 return 0;
1230
1231 map_error:
1232 while (w != *desc) {
1233 w--;
1234 w &= m;
1235
1236 dma_unmap_page(geth->dev, txq->ring[w].word2.buf_adr,
1237 txq->ring[w].word0.bits.buffer_size,
1238 DMA_TO_DEVICE);
1239 }
1240 return -ENOMEM;
1241 }
1242

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-05 20:57:04

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH net 4/4] net: ethernet: cortina: Handle large frames

On Sat, Nov 4, 2023 at 3:57 PM Andrew Lunn <[email protected]> wrote:

> > + * Just bypass on bigger frames.
> > + */
> > + word1 |= TSS_BYPASS_BIT;
> > + } else if (skb->ip_summed != CHECKSUM_NONE) {
>
> I've never looked at how the network stack does checksums. But looking
> at this patch, it made me wounder, how do you tell the stack it needs
> to do a software checksum because the hardware cannot?

I read up on it: the documentation is in
Documentation/networking/checksum-offloads.rst
and in the header for skbuff, include/linux/skbuff.h

Actually we should check for == CHECKSUM_PARTIAL which means
we need to do the checksum (!= CHECKSUM_NONE is not inclusive)
then I call a software fallback directly from the driver if I need to.

> Or for this
> driver, is it always calculating a checksum, which is then ignored?
> Maybe you can improve performance a little but disabling software
> checksum when it is not needed?

The ping was somehow working without proper checksum
before, but I think I'm doing the right thing now, also tested with
HTTP traffic, check out v2.

Thanks for pointing it out, the patch looks way better now.

Yours,
Linus Walleij