I have been doing some work with tegra3 systems which have a smsc9512
USB network device on them. A couple of issues we found with alignment
of the data (both receive and transmit) and an issue where the automatic
transmit checksum failed.
The smsc95xx driver already takes into account the NET_IP_ALIGN
parameter when setting up the receive packet data, which means
we do not need to worry about aligning the packets in the usbnet
driver.
Adding the EVENT_NO_IP_ALIGN means that the IPv4 header is now
passed to the ip_rcv() routine with the start on an aligned address.
Tested on Raspberry Pi B3.
Signed-off-by: Ben Dooks <[email protected]>
---
drivers/net/usb/smsc95xx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 46385a4b8b98..2b867523cd53 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1296,6 +1296,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
dev->net->features |= NETIF_F_RXCSUM;
dev->net->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM;
+ set_bit(EVENT_NO_IP_ALIGN, &dev->flags);
smsc95xx_init_mac_address(dev);
--
2.19.0
Add a configuration option for the default state of turbo mode
on the smsc95xx networking driver. Some systems it is better
to default this to off as it causes significant increases in
soft-irq load.
Signed-off-by: Ben Dooks <[email protected]>
---
drivers/net/usb/Kconfig | 9 +++++++++
drivers/net/usb/smsc95xx.c | 2 +-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index 418b0904cecb..a32f1a446ce9 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -351,6 +351,15 @@ config USB_NET_SMSC95XX
This option adds support for SMSC LAN95XX based USB 2.0
10/100 Ethernet adapters.
+config USB_NET_SMSC95XX_TURBO
+ bool "Use turbo receive mode by default"
+ depends on USB_NET_SMSC95XX
+ default y
+ help
+ This options sets the default turbo mode settings for the
+ driver's receive path. These can also be altered by the
+ turbo_mode module parameter.
+
config USB_NET_GL620A
tristate "GeneSys GL620USB-A based cables"
depends on USB_USBNET
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 06b4d290784d..fe13bef9579e 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -78,7 +78,7 @@ struct smsc95xx_priv {
struct usbnet *dev;
};
-static bool turbo_mode = true;
+static bool turbo_mode = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TURBO);
module_param(turbo_mode, bool, 0644);
MODULE_PARM_DESC(turbo_mode, "Enable multiple frames per Rx transaction");
--
2.19.0
The manual states that the checksum cannot lie in the last DWORD of the
transmission, so add a basic check for this and fall back to software
checksumming the packet.
This only seems to trigger for ACK packets with no options or data to
return to the other end, and the use of the tx-alignment option makes
it more likely to happen.
Signed-off-by: Ben Dooks <[email protected]>
---
drivers/net/usb/smsc95xx.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index d244357bf1ad..46385a4b8b98 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -2003,6 +2003,20 @@ static u32 smsc95xx_calc_csum_preamble(struct sk_buff *skb)
return (high_16 << 16) | low_16;
}
+/* The CSUM won't work if the checksum lies in the last 4 bytes of the
+ * transmission. This is fairly unlikely, only seems to trigger with some
+ * short TCP ACK packets sent.
+ *
+ * Note, this calculation should probably check for the alignment of the
+ * data as well, but a straight chec for csum being in the last four bytes
+ * of the packet should be ok for now.
+*/
+static bool smsc95xx_can_checksum(struct sk_buff *skb)
+{
+ unsigned int len = skb->len - skb_checksum_start_offset(skb);
+ return skb->csum_offset < (len - (4 + 1));
+}
+
static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
struct sk_buff *skb, gfp_t flags)
{
@@ -2031,7 +2045,8 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
}
if (csum) {
- if (skb->len <= 45) {
+ /* note, csum does not work if csum in last DWORD of packet */
+ if (skb->len <= 45 || !smsc95xx_can_checksum(skb)) {
/* workaround - hardware tx checksum does not work
* properly with extremely small packets */
long csstart = skb_checksum_start_offset(skb);
--
2.19.0
The tegra driver requires alignment of the buffer, so try and
make this better by pushing the buffer start back to an word
aligned address. At the worst this makes memcpy() easier as
it is word aligned, at best it makes sure the usb can directly
map the buffer.
Signed-off-by: Ben Dooks <[email protected]>
[todo - make this configurable]
---
drivers/net/usb/Kconfig | 12 ++++++++++++
drivers/net/usb/smsc95xx.c | 22 ++++++++++++++++++++--
2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index a32f1a446ce9..35bad8bd2e2a 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -360,6 +360,18 @@ config USB_NET_SMSC95XX_TURBO
driver's receive path. These can also be altered by the
turbo_mode module parameter.
+config USB_NET_SMSC95XX_TXALIGN
+ bool "Add bytes to align transmit buffers"
+ depends on USB_NET_SMSC95XX
+ default n
+ help
+ This option makes the tx buffers 32 bit aligned which might
+ help with systems that want tx data aligned to a 32 bit
+ boundary.
+
+ Using this option will mean there may be up to 3 bytes of
+ data per packet sent.
+
config USB_NET_GL620A
tristate "GeneSys GL620USB-A based cables"
depends on USB_USBNET
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index fe13bef9579e..d244357bf1ad 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -78,6 +78,10 @@ struct smsc95xx_priv {
struct usbnet *dev;
};
+static bool align_tx = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TXALIGN);
+module_param(align_tx, bool, 0644);
+MODULE_PARM_DESC(align_tx, "Align TX buffers to word boundaries");
+
static bool turbo_mode = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TURBO);
module_param(turbo_mode, bool, 0644);
MODULE_PARM_DESC(turbo_mode, "Enable multiple frames per Rx transaction");
@@ -2005,10 +2009,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
u32 tx_cmd_a, tx_cmd_b;
+ u32 data_len;
+ uintptr_t align = 0;
/* We do not advertise SG, so skbs should be already linearized */
BUG_ON(skb_shinfo(skb)->nr_frags);
+ if (IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TXALIGN) && align_tx) {
+ align = (uintptr_t)skb->data & 3;
+ if (align)
+ overhead += 4 - align;
+ }
+
/* Make writable and expand header space by overhead if required */
if (skb_cow_head(skb, overhead)) {
/* Must deallocate here as returning NULL to indicate error
@@ -2037,16 +2049,22 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
}
}
+ data_len = skb->len;
+ if (align)
+ skb_push(skb, 4 - align);
+
skb_push(skb, 4);
- tx_cmd_b = (u32)(skb->len - 4);
+ tx_cmd_b = (u32)(data_len);
if (csum)
tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;
cpu_to_le32s(&tx_cmd_b);
memcpy(skb->data, &tx_cmd_b, 4);
skb_push(skb, 4);
- tx_cmd_a = (u32)(skb->len - 8) | TX_CMD_A_FIRST_SEG_ |
+ tx_cmd_a = (u32)(data_len) | TX_CMD_A_FIRST_SEG_ |
TX_CMD_A_LAST_SEG_;
+ if (align)
+ tx_cmd_a |= (4 - align) << 16;
cpu_to_le32s(&tx_cmd_a);
memcpy(skb->data, &tx_cmd_a, 4);
--
2.19.0
Hello!
On 10/2/2018 12:26 PM, Ben Dooks wrote:
> The manual states that the checksum cannot lie in the last DWORD of the
> transmission, so add a basic check for this and fall back to software
> checksumming the packet.
>
> This only seems to trigger for ACK packets with no options or data to
> return to the other end, and the use of the tx-alignment option makes
> it more likely to happen.
>
> Signed-off-by: Ben Dooks <[email protected]>
> ---
> drivers/net/usb/smsc95xx.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index d244357bf1ad..46385a4b8b98 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -2003,6 +2003,20 @@ static u32 smsc95xx_calc_csum_preamble(struct sk_buff *skb)
> return (high_16 << 16) | low_16;
> }
>
> +/* The CSUM won't work if the checksum lies in the last 4 bytes of the
> + * transmission. This is fairly unlikely, only seems to trigger with some
> + * short TCP ACK packets sent.
> + *
> + * Note, this calculation should probably check for the alignment of the
> + * data as well, but a straight chec for csum being in the last four bytes
s/chec/check/?
> + * of the packet should be ok for now.
> +*/
> +static bool smsc95xx_can_checksum(struct sk_buff *skb)
> +{
> + unsigned int len = skb->len - skb_checksum_start_offset(skb);
> + return skb->csum_offset < (len - (4 + 1));
> +}
> +
> static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> struct sk_buff *skb, gfp_t flags)
> {
[...]
MBR, Sergei
On Tue, Oct 02, 2018 at 10:26:42AM +0100, Ben Dooks wrote:
> Add a configuration option for the default state of turbo mode
> on the smsc95xx networking driver. Some systems it is better
> to default this to off as it causes significant increases in
> soft-irq load.
>
> Signed-off-by: Ben Dooks <[email protected]>
> ---
> drivers/net/usb/Kconfig | 9 +++++++++
> drivers/net/usb/smsc95xx.c | 2 +-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
> index 418b0904cecb..a32f1a446ce9 100644
> --- a/drivers/net/usb/Kconfig
> +++ b/drivers/net/usb/Kconfig
> @@ -351,6 +351,15 @@ config USB_NET_SMSC95XX
> This option adds support for SMSC LAN95XX based USB 2.0
> 10/100 Ethernet adapters.
>
> +config USB_NET_SMSC95XX_TURBO
> + bool "Use turbo receive mode by default"
> + depends on USB_NET_SMSC95XX
> + default y
> + help
> + This options sets the default turbo mode settings for the
> + driver's receive path. These can also be altered by the
> + turbo_mode module parameter.
> +
Hi Ben
Is it worth adding a comment here why you would want to turn it off?
To reduce soft-irq load?
Thanks
Andrew
From: Ben Dooks
> Sent: 02 October 2018 10:27
>
> The tegra driver requires alignment of the buffer, so try and
> make this better by pushing the buffer start back to an word
> aligned address. At the worst this makes memcpy() easier as
> it is word aligned, at best it makes sure the usb can directly
> map the buffer.
>
> Signed-off-by: Ben Dooks <[email protected]>
> [todo - make this configurable]
> ---
> drivers/net/usb/Kconfig | 12 ++++++++++++
> drivers/net/usb/smsc95xx.c | 22 ++++++++++++++++++++--
> 2 files changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
...
> +static bool align_tx = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TXALIGN);
> +module_param(align_tx, bool, 0644);
> +MODULE_PARM_DESC(align_tx, "Align TX buffers to word boundaries");
DM doesn't like module parameters.
> static bool turbo_mode = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TURBO);
> module_param(turbo_mode, bool, 0644);
> MODULE_PARM_DESC(turbo_mode, "Enable multiple frames per Rx transaction");
> @@ -2005,10 +2009,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
> int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
> u32 tx_cmd_a, tx_cmd_b;
> + u32 data_len;
> + uintptr_t align = 0;
>
> /* We do not advertise SG, so skbs should be already linearized */
> BUG_ON(skb_shinfo(skb)->nr_frags);
>
> + if (IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TXALIGN) && align_tx) {
> + align = (uintptr_t)skb->data & 3;
> + if (align)
> + overhead += 4 - align;
Better to calculate the pad size once:
align = (-(long)skb->data) & 3;
should do it - and you can unconditionally add it in.
> + }
> +
> /* Make writable and expand header space by overhead if required */
> if (skb_cow_head(skb, overhead)) {
> /* Must deallocate here as returning NULL to indicate error
> @@ -2037,16 +2049,22 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> }
> }
>
> + data_len = skb->len;
> + if (align)
> + skb_push(skb, 4 - align);
> +
> skb_push(skb, 4);
You don't want to call skb_push() twice.
IIRC really horrid things happen if the data has to be copied.
(Actually what happens to the alignment in that case??)
And there is another skb_push() below....
> - tx_cmd_b = (u32)(skb->len - 4);
> + tx_cmd_b = (u32)(data_len);
You don't need the cast here at all (if it was ever needed).
Actually you don't need the new 'data_len' variable.
Just set tx_cmd_b earlier.
...
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On 02/10/18 14:19, David Laight wrote:
> From: Ben Dooks
>> Sent: 02 October 2018 10:27
>>
>> The tegra driver requires alignment of the buffer, so try and
>> make this better by pushing the buffer start back to an word
>> aligned address. At the worst this makes memcpy() easier as
>> it is word aligned, at best it makes sure the usb can directly
>> map the buffer.
>>
>> Signed-off-by: Ben Dooks <[email protected]>
>> [todo - make this configurable]
>> ---
>> drivers/net/usb/Kconfig | 12 ++++++++++++
>> drivers/net/usb/smsc95xx.c | 22 ++++++++++++++++++++--
>> 2 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
> ...
>> +static bool align_tx = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TXALIGN);
>> +module_param(align_tx, bool, 0644);
>> +MODULE_PARM_DESC(align_tx, "Align TX buffers to word boundaries");
>
> DM doesn't like module parameters.
>
>> static bool turbo_mode = IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TURBO);
>> module_param(turbo_mode, bool, 0644);
>> MODULE_PARM_DESC(turbo_mode, "Enable multiple frames per Rx transaction");
>> @@ -2005,10 +2009,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>> bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
>> int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
>> u32 tx_cmd_a, tx_cmd_b;
>> + u32 data_len;
>> + uintptr_t align = 0;
>>
>> /* We do not advertise SG, so skbs should be already linearized */
>> BUG_ON(skb_shinfo(skb)->nr_frags);
>>
>> + if (IS_ENABLED(CONFIG_USB_NET_SMSC95XX_TXALIGN) && align_tx) {
>> + align = (uintptr_t)skb->data & 3;
>> + if (align)
>> + overhead += 4 - align;
>
> Better to calculate the pad size once:
> align = (-(long)skb->data) & 3;
> should do it - and you can unconditionally add it in.
>
>> + }
>> +
>> /* Make writable and expand header space by overhead if required */
>> if (skb_cow_head(skb, overhead)) {
>> /* Must deallocate here as returning NULL to indicate error
>> @@ -2037,16 +2049,22 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>> }
>> }
>>
>> + data_len = skb->len;
>> + if (align)
>> + skb_push(skb, 4 - align);
>> +
>> skb_push(skb, 4);
>
> You don't want to call skb_push() twice.
> IIRC really horrid things happen if the data has to be copied.
> (Actually what happens to the alignment in that case??)
> And there is another skb_push() below....
The driver does it /multiple/ times depending on the path used.
Is it wise to try and make a separate patch to skb_push() once
and also move the tx_cmd_a and tx_cmd_b bit to a single point?
>> - tx_cmd_b = (u32)(skb->len - 4);
>> + tx_cmd_b = (u32)(data_len);
>
> You don't need the cast here at all (if it was ever needed).
> Actually you don't need the new 'data_len' variable.
> Just set tx_cmd_b earlier.
>
> ...
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
>
--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
https://www.codethink.co.uk/privacy.html
The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
put an 8-byte command header onto the packet. It would be easier
to do one skb_push() and then copy the data in once the push is
done.
Signed-off-by: Ben Dooks <[email protected]>
---
drivers/net/usb/smsc95xx.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index cb19aea139d3..813ab93ee2c3 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -2006,6 +2006,7 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
u32 tx_cmd_a, tx_cmd_b;
+ void *ptr;
/* We do not advertise SG, so skbs should be already linearized */
BUG_ON(skb_shinfo(skb)->nr_frags);
@@ -2019,6 +2020,9 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
return NULL;
}
+ tx_cmd_b = (u32)skb->len;
+ tx_cmd_a = tx_cmd_b | TX_CMD_A_FIRST_SEG_ | TX_CMD_A_LAST_SEG_;
+
if (csum) {
if (skb->len <= 45) {
/* workaround - hardware tx checksum does not work
@@ -2035,21 +2039,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
skb_push(skb, 4);
cpu_to_le32s(&csum_preamble);
memcpy(skb->data, &csum_preamble, 4);
+
+ tx_cmd_a += 4;
+ tx_cmd_b += 4;
+ tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;
}
}
- skb_push(skb, 4);
- tx_cmd_b = (u32)(skb->len - 4);
- if (csum)
- tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;
- cpu_to_le32s(&tx_cmd_b);
- memcpy(skb->data, &tx_cmd_b, 4);
-
- skb_push(skb, 4);
- tx_cmd_a = (u32)(skb->len - 8) | TX_CMD_A_FIRST_SEG_ |
- TX_CMD_A_LAST_SEG_;
- cpu_to_le32s(&tx_cmd_a);
- memcpy(skb->data, &tx_cmd_a, 4);
+ ptr = skb_push(skb, 8);
+ tx_cmd_a = cpu_to_le32(tx_cmd_a);
+ tx_cmd_b = cpu_to_le32(tx_cmd_b);
+ memcpy(ptr, &tx_cmd_a, 4);
+ memcpy(ptr+4, &tx_cmd_b, 4);
return skb;
}
--
2.19.0
From: Ben Dooks
> Sent: 02 October 2018 17:56
>
> The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
> put an 8-byte command header onto the packet. It would be easier
> to do one skb_push() and then copy the data in once the push is
> done.
>
> Signed-off-by: Ben Dooks <[email protected]>
> ---
> drivers/net/usb/smsc95xx.c | 25 +++++++++++++------------
> 1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index cb19aea139d3..813ab93ee2c3 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -2006,6 +2006,7 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
> int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
> u32 tx_cmd_a, tx_cmd_b;
> + void *ptr;
It might be useful to define a structure for the header.
You might need to find the 'store unaligned 32bit word' macro though.
(Actually that will probably be better than the memcpy() which might
end up doing memory-memory copies rather than storing the register.)
Although if/when you add the tx alignment that won't be needed because the
header will be aligned.
> /* We do not advertise SG, so skbs should be already linearized */
> BUG_ON(skb_shinfo(skb)->nr_frags);
> @@ -2019,6 +2020,9 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> return NULL;
> }
>
> + tx_cmd_b = (u32)skb->len;
> + tx_cmd_a = tx_cmd_b | TX_CMD_A_FIRST_SEG_ | TX_CMD_A_LAST_SEG_;
> +
> if (csum) {
> if (skb->len <= 45) {
> /* workaround - hardware tx checksum does not work
> @@ -2035,21 +2039,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> skb_push(skb, 4);
> cpu_to_le32s(&csum_preamble);
Not related, but csum_preamble = cpu_to_le32(csum_preamble) is likely to
generate better code (at least for some architectures).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On 2018-10-03 14:36, David Laight wrote:
> From: Ben Dooks
>> Sent: 02 October 2018 17:56
>>
>> The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
>> put an 8-byte command header onto the packet. It would be easier
>> to do one skb_push() and then copy the data in once the push is
>> done.
>>
>> Signed-off-by: Ben Dooks <[email protected]>
>> ---
>> drivers/net/usb/smsc95xx.c | 25 +++++++++++++------------
>> 1 file changed, 13 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
>> index cb19aea139d3..813ab93ee2c3 100644
>> --- a/drivers/net/usb/smsc95xx.c
>> +++ b/drivers/net/usb/smsc95xx.c
>> @@ -2006,6 +2006,7 @@ static struct sk_buff *smsc95xx_tx_fixup(struct
>> usbnet *dev,
>> bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
>> int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM :
>> SMSC95XX_TX_OVERHEAD;
>> u32 tx_cmd_a, tx_cmd_b;
>> + void *ptr;
>
> It might be useful to define a structure for the header.
> You might need to find the 'store unaligned 32bit word' macro though.
> (Actually that will probably be better than the memcpy() which might
> end up doing memory-memory copies rather than storing the register.)
> Although if/when you add the tx alignment that won't be needed because
> the
> header will be aligned.
Ok, might be worth doing.
I did try to do a "u32 tx_cmd[2]" but the code generated ended up
storing
stuff onto the stack before copying into the packet. I agree that
possibly
going to the "put_unaligned" function might be nicer too.
If we did enable tx-align all the time then we'd not have to care about
the
alignment, but I didn't want to do that if possible as that would end up
sending up to 3 bytes extra per packet.
I am trying not too do too many changes at one time to allow roll back.
>> /* We do not advertise SG, so skbs should be already linearized */
>> BUG_ON(skb_shinfo(skb)->nr_frags);
>> @@ -2019,6 +2020,9 @@ static struct sk_buff *smsc95xx_tx_fixup(struct
>> usbnet *dev,
>> return NULL;
>> }
>>
>> + tx_cmd_b = (u32)skb->len;
>> + tx_cmd_a = tx_cmd_b | TX_CMD_A_FIRST_SEG_ | TX_CMD_A_LAST_SEG_;
>> +
>> if (csum) {
>> if (skb->len <= 45) {
>> /* workaround - hardware tx checksum does not work
>> @@ -2035,21 +2039,18 @@ static struct sk_buff
>> *smsc95xx_tx_fixup(struct usbnet *dev,
>> skb_push(skb, 4);
>> cpu_to_le32s(&csum_preamble);
>
> Not related, but csum_preamble = cpu_to_le32(csum_preamble) is likely
> to
> generate better code (at least for some architectures).
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)
On Tue, 2018-10-02 at 10:26 +0100, Ben Dooks wrote:
> The tegra driver requires alignment of the buffer, so try and
> make this better by pushing the buffer start back to an word
> aligned address. At the worst this makes memcpy() easier as
> it is word aligned, at best it makes sure the usb can directly
> map the buffer.
>
> Signed-off-by: Ben Dooks <[email protected]>
> [todo - make this configurable]
[...]
I don't think you need a separate kconfig symbol for this. Aligning RX
buffers to words (or better, cache lines) is almost always a win, so
long as the CPU can handle misaligned fields in the network/transport
headers. You can use #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to
check that.
It seems like NET_IP_ALIGN should be defined to 0 or 2 depending on
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, but NET_IP_ALIGN predates the
latter.
Ben.
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
On Tue, 2018-10-02 at 10:26 +0100, Ben Dooks wrote:
[...]
> @@ -2031,7 +2045,8 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> > }
>
> if (csum) {
> - if (skb->len <= 45) {
> + /* note, csum does not work if csum in last DWORD of packet */
> + if (skb->len <= 45 || !smsc95xx_can_checksum(skb)) {
It would make more sense to move the length check into
smsc95xx_can_checksum() as well.
Ben.
> /* workaround - hardware tx checksum does not work
> * properly with extremely small packets */
> long csstart = skb_checksum_start_offset(skb);
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
From: Ben Dooks <[email protected]>
Date: Tue, 2 Oct 2018 17:56:02 +0100
> - memcpy(skb->data, &tx_cmd_a, 4);
> + ptr = skb_push(skb, 8);
> + tx_cmd_a = cpu_to_le32(tx_cmd_a);
> + tx_cmd_b = cpu_to_le32(tx_cmd_b);
> + memcpy(ptr, &tx_cmd_a, 4);
> + memcpy(ptr+4, &tx_cmd_b, 4);
Even a memcpy() through a void pointer does not guarantee that gcc will
not emit word sized loads and stores.
You must use the get_unaligned()/put_unaligned() facilities to do this
properly.
I also agree that making a proper type and structure instead of using
a void pointer would be better.
On 2018-10-05 22:24, David Miller wrote:
> From: Ben Dooks <[email protected]>
> Date: Tue, 2 Oct 2018 17:56:02 +0100
>
>> - memcpy(skb->data, &tx_cmd_a, 4);
>> + ptr = skb_push(skb, 8);
>> + tx_cmd_a = cpu_to_le32(tx_cmd_a);
>> + tx_cmd_b = cpu_to_le32(tx_cmd_b);
>> + memcpy(ptr, &tx_cmd_a, 4);
>> + memcpy(ptr+4, &tx_cmd_b, 4);
>
> Even a memcpy() through a void pointer does not guarantee that gcc will
> not emit word sized loads and stores.
>
> You must use the get_unaligned()/put_unaligned() facilities to do this
> properly.
Thanks, got a new version of the series just being tested with this.
Should it go into the original, or as a separate change?
>
> I also agree that making a proper type and structure instead of using
> a void pointer would be better.
From: Ben Dooks <[email protected]>
Date: Sat, 06 Oct 2018 12:27:27 +0100
> Thanks, got a new version of the series just being tested with this.
> Should it go into the original, or as a separate change?
Into the original.
From: David Miller
> Sent: 05 October 2018 22:24
>
> From: Ben Dooks <[email protected]>
> Date: Tue, 2 Oct 2018 17:56:02 +0100
>
> > - memcpy(skb->data, &tx_cmd_a, 4);
> > + ptr = skb_push(skb, 8);
> > + tx_cmd_a = cpu_to_le32(tx_cmd_a);
> > + tx_cmd_b = cpu_to_le32(tx_cmd_b);
> > + memcpy(ptr, &tx_cmd_a, 4);
> > + memcpy(ptr+4, &tx_cmd_b, 4);
>
> Even a memcpy() through a void pointer does not guarantee that gcc will
> not emit word sized loads and stores.
True, but only if gcc can 'see' something that would require the
pointer be aligned.
In this case the void pointer comes from an external function
so is fine.
> You must use the get_unaligned()/put_unaligned() facilities to do this
> properly.
>
> I also agree that making a proper type and structure instead of using
> a void pointer would be better.
The structure would need to be marked 'packed' - since its alignment
isn't guaranteed.
Then you don't need to use put_unaligned().
If it wasn't 'packed' then gcc would implement
memcpy(&hdr->tx_cmd_a, &tx_cmd_a, 4) using an aligned write.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)