Occasionally, during the disconnection procedure on XenBus, which
includes hash cache deinitialization, there might be some packets
still in flight on other processors. Handling of these packets includes
hashing and hash cache population, which ultimately results in hash cache
data structure corruption.

In order to avoid this, we prevent hashing of those packets if there
are no queues initialized. In that case, RCU protection of queues guards
the hash cache as well.

Signed-off-by: Igor Druzhinin <[email protected]>
---
Found this while applying the previous patch to our patchqueue. Seems it
never went to the mailing list and, to my knowledge, the problem is still
present. From my recollection, it only happened on stress frontend on/off
test with Windows guests (since only those detach the frontend completely).
So better late than never.
---
drivers/net/xen-netback/hash.c | 2 ++
drivers/net/xen-netback/interface.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/drivers/net/xen-netback/hash.c b/drivers/net/xen-netback/hash.c
index 0ccb021..10d580c 100644
--- a/drivers/net/xen-netback/hash.c
+++ b/drivers/net/xen-netback/hash.c
@@ -454,6 +454,8 @@ void xenvif_init_hash(struct xenvif *vif)
 	if (xenvif_hash_cache_size == 0)
 		return;
 
+	BUG_ON(vif->hash.cache.count);
+
 	spin_lock_init(&vif->hash.cache.lock);
 	INIT_LIST_HEAD(&vif->hash.cache.list);
 }
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 182d677..6da1251 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -153,6 +153,13 @@ static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
 {
 	struct xenvif *vif = netdev_priv(dev);
 	unsigned int size = vif->hash.size;
+	unsigned int num_queues;
+
+	/* If queues are not set up internally - always return 0
+	 * as the packet going to be dropped anyway */
+	num_queues = READ_ONCE(vif->num_queues);
+	if (num_queues < 1)
+		return 0;
 
 	if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE)
 		return fallback(dev, skb, NULL) % dev->real_num_tx_queues;
--
2.7.4
> -----Original Message-----
> From: Igor Druzhinin [mailto:[email protected]]
> Sent: 28 February 2019 14:11
> To: [email protected]; [email protected]; [email protected]
> Cc: Wei Liu <[email protected]>; Paul Durrant <[email protected]>; [email protected]; Igor
> Druzhinin <[email protected]>
> Subject: [PATCH] xen-netback: don't populate the hash cache on XenBus disconnect
>
> Occasionally, during the disconnection procedure on XenBus which
> includes hash cache deinitialization there might be some packets
> still in-flight on other processors. Handling of these packets includes
> hashing and hash cache population that finally results in hash cache
> data structure corruption.
>
> In order to avoid this we prevent hashing of those packets if there
> are no queues initialized. In that case RCU protection of queues guards
> the hash cache as well.
>
> Signed-off-by: Igor Druzhinin <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
From: Igor Druzhinin <[email protected]>
Date: Thu, 28 Feb 2019 14:11:26 +0000
> Occasionally, during the disconnection procedure on XenBus which
> includes hash cache deinitialization there might be some packets
> still in-flight on other processors. Handling of these packets includes
> hashing and hash cache population that finally results in hash cache
> data structure corruption.
>
> In order to avoid this we prevent hashing of those packets if there
> are no queues initialized. In that case RCU protection of queues guards
> the hash cache as well.
>
> Signed-off-by: Igor Druzhinin <[email protected]>
Applied and queued up for -stable, thanks.
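
For readers following the thread, the guard added to xenvif_select_queue() can be modelled in user space. The following is only a rough sketch under stated assumptions, not the driver code: struct vif_model, hash_and_cache() and select_queue() are hypothetical stand-ins, a relaxed C11 atomic load approximates READ_ONCE(), and the final modulo replaces the driver's hash mapping lookup. It shows that when no queues are set up, the function returns 0 before the hashing path (and therefore the hash cache) is touched.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct xenvif; only what the guard needs. */
struct vif_model {
	atomic_uint num_queues;   /* 0 while the frontend is disconnected */
	unsigned int cache_count; /* number of entries in the hash cache */
};

/* Hypothetical stand-in for the hashing path: it populates the cache. */
static unsigned int hash_and_cache(struct vif_model *vif, unsigned int flow)
{
	vif->cache_count++;
	return flow * 2654435761u; /* arbitrary multiplicative hash */
}

/* Model of the patched queue selection. */
static unsigned int select_queue(struct vif_model *vif, unsigned int flow)
{
	/* Read the queue count exactly once (approximates READ_ONCE()). */
	unsigned int num_queues =
		atomic_load_explicit(&vif->num_queues, memory_order_relaxed);

	/* No queues: the packet will be dropped anyway, so skip hashing. */
	if (num_queues < 1)
		return 0;

	/* Simplification: the driver uses a hash mapping table here. */
	return hash_and_cache(vif, flow) % num_queues;
}
```

With num_queues at 0, a call such as select_queue(&vif, 7) returns 0 and leaves cache_count unchanged. Once num_queues is nonzero, the hashing path runs and the cache is populated as before.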