2012-06-06 08:35:35

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] virtio-net: fix a race on 32bit arches

From: Eric Dumazet <[email protected]>

commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
on 32bit arches.

We must use separate syncp for rx and tx path as they can be run at the
same time on different cpus. Thus one sequence increment can be lost and
readers spin forever.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
---
drivers/net/virtio_net.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..f18149a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -42,7 +42,8 @@ module_param(gso, bool, 0444);
#define VIRTNET_DRIVER_VERSION "1.0.0"

struct virtnet_stats {
- struct u64_stats_sync syncp;
+ struct u64_stats_sync tx_syncp;
+ struct u64_stats_sync rx_syncp;
u64 tx_bytes;
u64 tx_packets;

@@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)

hdr = skb_vnet_hdr(skb);

- u64_stats_update_begin(&stats->syncp);
+ u64_stats_update_begin(&stats->rx_syncp);
stats->rx_bytes += skb->len;
stats->rx_packets++;
- u64_stats_update_end(&stats->syncp);
+ u64_stats_update_end(&stats->rx_syncp);

if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
pr_debug("Needs csum!\n");
@@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);

- u64_stats_update_begin(&stats->syncp);
+ u64_stats_update_begin(&stats->tx_syncp);
stats->tx_bytes += skb->len;
stats->tx_packets++;
- u64_stats_update_end(&stats->syncp);
+ u64_stats_update_end(&stats->tx_syncp);

tot_sgs += skb_vnet_hdr(skb)->num_sg;
dev_kfree_skb_any(skb);
@@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
u64 tpackets, tbytes, rpackets, rbytes;

do {
- start = u64_stats_fetch_begin(&stats->syncp);
+ start = u64_stats_fetch_begin(&stats->tx_syncp);
tpackets = stats->tx_packets;
tbytes = stats->tx_bytes;
+ } while (u64_stats_fetch_retry(&stats->tx_syncp, start));
+
+ do {
+ start = u64_stats_fetch_begin(&stats->rx_syncp);
rpackets = stats->rx_packets;
rbytes = stats->rx_bytes;
- } while (u64_stats_fetch_retry(&stats->syncp, start));
+ } while (u64_stats_fetch_retry(&stats->rx_syncp, start));

tot->rx_packets += rpackets;
tot->tx_packets += tpackets;


2012-06-06 08:45:49

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>
>
> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> on 32bit arches.
>
> We must use separate syncp for rx and tx path as they can be run at the
> same time on different cpus. Thus one sequence increment can be lost and
> readers spin forever.
>
> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Stephen Hemminger <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: Jason Wang <[email protected]>
> ---

Just to make clear : even using percpu stats/syncp, we have no guarantee
that write_seqcount_begin() is done with one instruction. [1]

It is OK on x86 if "incl" instruction is generated by the compiler, but
on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
be interrupted.

So if you are 100% sure all paths are safe against preemption/BH, then
this patch is not needed, but a big comment in the code would avoid
adding possible races in the future.

[1] If done with one instruction, we still have a race, since a reader
might see an even sequence and conclude no writer is inside the critical
section. So read values could be wrong.


2012-06-06 09:36:02

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On 06/06/2012 04:45 PM, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet<[email protected]>
>>
>> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
>> on 32bit arches.
>>
>> We must use separate syncp for rx and tx path as they can be run at the
>> same time on different cpus. Thus one sequence increment can be lost and
>> readers spin forever.
>>
>> Signed-off-by: Eric Dumazet<[email protected]>
>> Cc: Stephen Hemminger<[email protected]>
>> Cc: Michael S. Tsirkin<[email protected]>
>> Cc: Jason Wang<[email protected]>
>> ---
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

Thanks for explaing, current virtio-net is safe I think. But the patch
is still needed as my patch would update the statistics in irq.
>
> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2012-06-06 11:14:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 10:45:41AM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <[email protected]>
> >
> > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > on 32bit arches.
> >
> > We must use separate syncp for rx and tx path as they can be run at the
> > same time on different cpus. Thus one sequence increment can be lost and
> > readers spin forever.
> >
> > Signed-off-by: Eric Dumazet <[email protected]>
> > Cc: Stephen Hemminger <[email protected]>
> > Cc: Michael S. Tsirkin <[email protected]>
> > Cc: Jason Wang <[email protected]>
> > ---
>
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

We currently do all stats either on napi callback or from
start_xmit callback.
This makes them safe, yes?

> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
>
>

2012-06-06 13:10:20

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:

> We currently do all stats either on napi callback or from
> start_xmit callback.
> This makes them safe, yes?

Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
include/linux/u64_stats_sync.h section 6)

* 6) If counter might be written by an interrupt, readers should block interrupts.
* (On UP, there is no seqcount_t protection, a reader allowing interrupts could
* read partial values)

Yes, its tricky...

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..705aaa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
u64 tpackets, tbytes, rpackets, rbytes;

do {
- start = u64_stats_fetch_begin(&stats->syncp);
+ start = u64_stats_fetch_begin_bh(&stats->syncp);
tpackets = stats->tx_packets;
tbytes = stats->tx_bytes;
rpackets = stats->rx_packets;
rbytes = stats->rx_bytes;
- } while (u64_stats_fetch_retry(&stats->syncp, start));
+ } while (u64_stats_fetch_retry_bh(&stats->syncp, start));

tot->rx_packets += rpackets;
tot->tx_packets += tpackets;

2012-06-06 14:49:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
>
> > We currently do all stats either on napi callback or from
> > start_xmit callback.
> > This makes them safe, yes?
>
> Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> include/linux/u64_stats_sync.h section 6)
>
> * 6) If counter might be written by an interrupt, readers should block interrupts.
> * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> * read partial values)
>
> Yes, its tricky...

Sounds good, but I have a question: this realies on counters
being atomic on 64 bit.
Would not it be better to always use a seqlock even on 64 bit?
This way counters would actually be correct and in sync.
As it is if we want e.g. average packet size,
we can not rely e.g. on it being bytes/packets.

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..705aaa7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> u64 tpackets, tbytes, rpackets, rbytes;
>
> do {
> - start = u64_stats_fetch_begin(&stats->syncp);
> + start = u64_stats_fetch_begin_bh(&stats->syncp);
> tpackets = stats->tx_packets;
> tbytes = stats->tx_bytes;
> rpackets = stats->rx_packets;
> rbytes = stats->rx_bytes;
> - } while (u64_stats_fetch_retry(&stats->syncp, start));
> + } while (u64_stats_fetch_retry_bh(&stats->syncp, start));
>
> tot->rx_packets += rpackets;
> tot->tx_packets += tpackets;
>

2012-06-06 15:14:37

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin" <[email protected]> wrote:

> Sounds good, but I have a question: this realies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.

Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.

2012-06-06 15:19:16

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> >
> > > We currently do all stats either on napi callback or from
> > > start_xmit callback.
> > > This makes them safe, yes?
> >
> > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > include/linux/u64_stats_sync.h section 6)
> >
> > * 6) If counter might be written by an interrupt, readers should block interrupts.
> > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> > * read partial values)
> >
> > Yes, its tricky...
>
> Sounds good, but I have a question: this realies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

When this stuff was discussed, we chose to have a nop on 64bits.

Your point has little to do with 64bit stats, it was already like that
with 'long int' counters.

Consider average driver doing :

dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;

A concurrent reader can read an updated rx_bytes and a 'previous'
rx_packets one.

'fixing' this requires a lot of work and memory barriers (in all
drivers), for a very litle gain (at most one packet error)

u64_stats_sync was really meant to be 0-cost on 64bit arches.


2012-06-06 16:17:22

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 05:19:04PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> > >
> > > > We currently do all stats either on napi callback or from
> > > > start_xmit callback.
> > > > This makes them safe, yes?
> > >
> > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > > include/linux/u64_stats_sync.h section 6)
> > >
> > > * 6) If counter might be written by an interrupt, readers should block interrupts.
> > > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> > > * read partial values)
> > >
> > > Yes, its tricky...
> >
> > Sounds good, but I have a question: this realies on counters
> > being atomic on 64 bit.
> > Would not it be better to always use a seqlock even on 64 bit?
> > This way counters would actually be correct and in sync.
> > As it is if we want e.g. average packet size,
> > we can not rely e.g. on it being bytes/packets.
>
> When this stuff was discussed, we chose to have a nop on 64bits.
>
> Your point has little to do with 64bit stats, it was already like that
> with 'long int' counters.

Yes, of course.

> Consider average driver doing :
>
> dev->stats.rx_bytes += skb->len;
> dev->stats.rx_packets++;
>
> A concurrent reader can read an updated rx_bytes and a 'previous'
> rx_packets one.
>
> 'fixing' this requires a lot of work and memory barriers (in all
> drivers), for a very litle gain (at most one packet error)
> u64_stats_sync was really meant to be 0-cost on 64bit arches.
>
>

I understand, and not arguing about that.

But why do you say at most 1 packet?

Consider get_stats doing:
u64_stats_update_begin(&stats->syncp);
stats->tx_bytes += skb->len;

on 64 bit at this point
tx_packets might get incremented any number of times, no?

stats->tx_packets++;
u64_stats_update_end(&stats->syncp);

now tx_bytes and tx_packets are out of sync by more than 1.


2012-06-06 17:13:10

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 19:17 +0300, Michael S. Tsirkin wrote:

> But why do you say at most 1 packet?
>
> Consider get_stats doing:
> u64_stats_update_begin(&stats->syncp);
> stats->tx_bytes += skb->len;
>
> on 64 bit at this point
> tx_packets might get incremented any number of times, no?
>
> stats->tx_packets++;
> u64_stats_update_end(&stats->syncp);
>
> now tx_bytes and tx_packets are out of sync by more than 1.

You lost me there.

No idea of what you are thinking about.

There is no atomicity guarantee in SNMP counters. (Ie fetching tx_bytes
and tx_packets in a transaction is not mandatory in any RFC)

As long as there is no cumulative error, its OK.


2012-06-06 18:43:58

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 07:13:02PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 19:17 +0300, Michael S. Tsirkin wrote:
>
> > But why do you say at most 1 packet?
> >
> > Consider get_stats doing:
> > u64_stats_update_begin(&stats->syncp);
> > stats->tx_bytes += skb->len;
> >
> > on 64 bit at this point
> > tx_packets might get incremented any number of times, no?
> >
> > stats->tx_packets++;
> > u64_stats_update_end(&stats->syncp);
> >
> > now tx_bytes and tx_packets are out of sync by more than 1.
>
> You lost me there.
>
> No idea of what you are thinking about.

Sorry about that. This is not a bug. I am saying two things:

1. We are trying to look at counters for purposes of tuning the device.
E.g. if ethtool reports packets and bytes, we'd like to calculate
average packet size by bytes/packets.

If both counters are read atomically the metric becomes more exact.
Not a must but nice to have.

2. 32 bit systems have some overhead because of the seqlock.
virtio could instead simply keep tx counters in the queue structure, and
get the tx lock when they are read.


--
MST

2012-06-06 18:51:14

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 08:14:32AM -0700, Stephen Hemminger wrote:
> On Wed, 6 Jun 2012 17:49:42 +0300
> "Michael S. Tsirkin" <[email protected]> wrote:
>
> > Sounds good, but I have a question: this realies on counters
> > being atomic on 64 bit.
> > Would not it be better to always use a seqlock even on 64 bit?
> > This way counters would actually be correct and in sync.
> > As it is if we want e.g. average packet size,
> > we can not rely e.g. on it being bytes/packets.
>
> This has not been a requirement on real physical devices; therefore
> the added overhead is not really justified.
>
> Many network cards use counters in hardware to count packets/bytes
> and there is no expectation of atomic access there.

BTW for cards that do implement the counters in software,
under xmit lock, is anything wrong with simply taking the xmit lock
when we get the stats instead of the per-cpu trick + seqlock?

2012-06-06 19:03:11

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 05:19:04PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> > >
> > > > We currently do all stats either on napi callback or from
> > > > start_xmit callback.
> > > > This makes them safe, yes?
> > >
> > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > > include/linux/u64_stats_sync.h section 6)
> > >
> > > * 6) If counter might be written by an interrupt, readers should block interrupts.
> > > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> > > * read partial values)
> > >
> > > Yes, its tricky...
> >
> > Sounds good, but I have a question: this realies on counters
> > being atomic on 64 bit.
> > Would not it be better to always use a seqlock even on 64 bit?
> > This way counters would actually be correct and in sync.
> > As it is if we want e.g. average packet size,
> > we can not rely e.g. on it being bytes/packets.
>
> When this stuff was discussed, we chose to have a nop on 64bits.
>
> Your point has little to do with 64bit stats, it was already like that
> with 'long int' counters.
>
> Consider average driver doing :
>
> dev->stats.rx_bytes += skb->len;
> dev->stats.rx_packets++;
>
> A concurrent reader can read an updated rx_bytes and a 'previous'
> rx_packets one.
>
> 'fixing' this requires a lot of work and memory barriers (in all
> drivers), for a very litle gain (at most one packet error)
>
> u64_stats_sync was really meant to be 0-cost on 64bit arches.
>



So for virtio since all counters get incremented from bh we can
ensure they are read atomically, simply but reading them
from the correct CPU with bh disabled.
And then we don't need u64_stats_sync at all.

--
MST

2012-06-06 19:54:07

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 21:51 +0300, Michael S. Tsirkin wrote:

> BTW for cards that do implement the counters in software,
> under xmit lock, is anything wrong with simply taking the xmit lock
> when we get the stats instead of the per-cpu trick + seqlock?
>

I still dont understand why you would do that.

Most modern machines are 64bits, so there is no seqlock overhead,
nothing at all.

If you focus on 32bit hardware, just stick on 32bit counters ?

Note that most u64_stats_sync users are virtual drivers, without xmit
lock (LLTX drivers)


2012-06-06 19:58:20

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 09:54:01PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 21:51 +0300, Michael S. Tsirkin wrote:
>
> > BTW for cards that do implement the counters in software,
> > under xmit lock, is anything wrong with simply taking the xmit lock
> > when we get the stats instead of the per-cpu trick + seqlock?
> >
>
> I still dont understand why you would do that.
>
> Most modern machines are 64bits, so there is no seqlock overhead,
> nothing at all.
>
> If you focus on 32bit hardware, just stick on 32bit counters ?

These wrap around.

> Note that most u64_stats_sync users are virtual drivers, without xmit
> lock (LLTX drivers)
>
>

Absolutely, I am talking about virtio here. I'm not kicking
u64_stats_sync idea I am just saying that simple locking
would work for virtio and might be better as it
gives us a way to get counters atomically.


--
MST

2012-06-06 20:00:07

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 19:57 +0300, Michael S. Tsirkin wrote:

> So for virtio since all counters get incremented from bh we can
> ensure they are read atomically, simply but reading them
> from the correct CPU with bh disabled.
> And then we don't need u64_stats_sync at all.
>

Really ? How are you going to read 64bit stats from foreign cpus on
32bit arches, without additional cost in fast path ?

You should read include/linux/u64_stats_sync.h to fully understand the
issues.


2012-06-06 20:06:16

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 21:43 +0300, Michael S. Tsirkin wrote:

> 1. We are trying to look at counters for purposes of tuning the device.
> E.g. if ethtool reports packets and bytes, we'd like to calculate
> average packet size by bytes/packets.
>
> If both counters are read atomically the metric becomes more exact.
> Not a must but nice to have.
>

metrics are exact right now.

As soon as you read a value, it might already have changed.

Maybe you want to stop_machine() to make sure all the metrics you want
are 'exact' ;)

> 2. 32 bit systems have some overhead because of the seqlock.
> virtio could instead simply keep tx counters in the queue structure, and
> get the tx lock when they are read.
>

But then you need atomic64 stuff, have you an idea of the cost of such
primitives on 32bit ?

3. use 32bit counters on 32bit arches, as many drivers still do ?


2012-06-06 20:08:14

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:

> Absolutely, I am talking about virtio here. I'm not kicking
> u64_stats_sync idea I am just saying that simple locking
> would work for virtio and might be better as it
> gives us a way to get counters atomically.

Which lock do you own in the RX path ?

You'll have to add a lock in fast path. This sounds really a bad choice
to me.

2012-06-06 20:16:27

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
>
> > Absolutely, I am talking about virtio here. I'm not kicking
> > u64_stats_sync idea I am just saying that simple locking
> > would work for virtio and might be better as it
> > gives us a way to get counters atomically.
>
> Which lock do you own in the RX path ?

We can just disable napi, everything is updated from napi callback.

> You'll have to add a lock in fast path. This sounds really a bad choice
> to me.

.ndo_get_stats64 is not data path though, is it?

--
MST

2012-06-06 20:19:45

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 22:08 +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
>
> > Absolutely, I am talking about virtio here. I'm not kicking
> > u64_stats_sync idea I am just saying that simple locking
> > would work for virtio and might be better as it
> > gives us a way to get counters atomically.
>
> Which lock do you own in the RX path ?
>
> You'll have to add a lock in fast path. This sounds really a bad choice
> to me.

You have the NAPI 'lock', so when gathering stats you can synchronise
using napi_disable() ;-)

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2012-06-06 20:20:04

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 10:06:11PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 21:43 +0300, Michael S. Tsirkin wrote:
>
> > 1. We are trying to look at counters for purposes of tuning the device.
> > E.g. if ethtool reports packets and bytes, we'd like to calculate
> > average packet size by bytes/packets.
> >
> > If both counters are read atomically the metric becomes more exact.
> > Not a must but nice to have.
> >
>
> metrics are exact right now.

Yes, but they are not synchronised between themselves.
E.g. you can in theory have a report where #of packets > #of bytes.

I know there's no guarantee they are synchronised
on an arbitrary device but if they are, without
slowing fast path, it's nice.

2012-06-06 20:24:24

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> >
> > > Absolutely, I am talking about virtio here. I'm not kicking
> > > u64_stats_sync idea I am just saying that simple locking
> > > would work for virtio and might be better as it
> > > gives us a way to get counters atomically.
> >
> > Which lock do you own in the RX path ?
>
> We can just disable napi, everything is updated from napi callback.

This is very disruptive, and illegal from ndo_get_stats64()

(ndo_get_stats64() is not allowed to sleep, and I cant see how you are
going to disable napi without sleeping)


2012-06-06 20:26:03

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 21:19 +0100, Ben Hutchings wrote:
> On Wed, 2012-06-06 at 22:08 +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> >
> > > Absolutely, I am talking about virtio here. I'm not kicking
> > > u64_stats_sync idea I am just saying that simple locking
> > > would work for virtio and might be better as it
> > > gives us a way to get counters atomically.
> >
> > Which lock do you own in the RX path ?
> >
> > You'll have to add a lock in fast path. This sounds really a bad choice
> > to me.
>
> You have the NAPI 'lock', so when gathering stats you can synchronise
> using napi_disable() ;-)

Nice, this adds one new bug in network stack.

Really guys, can we stop this thread, please ?


2012-06-06 20:35:10

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> >
> > > Absolutely, I am talking about virtio here. I'm not kicking
> > > u64_stats_sync idea I am just saying that simple locking
> > > would work for virtio and might be better as it
> > > gives us a way to get counters atomically.
> >
> > Which lock do you own in the RX path ?
>
> We can just disable napi, everything is updated from napi callback.

Seriously, though: don't do that; this is going to hurt performance for
minimal benefit.

Ben.

> > You'll have to add a lock in fast path. This sounds really a bad choice
> > to me.
>
> .ndo_get_stats64 is not data path though, is it?
>

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2012-06-06 20:38:08

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 2012-06-06 at 22:24 +0200, Eric Dumazet wrote:

> (ndo_get_stats64() is not allowed to sleep, and I cant see how you are
> going to disable napi without sleeping)
>
>

In case you wonder, take a look at bond_get_stats() in
drivers/net/bonding/bond_main.c


2012-06-06 20:43:43

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 09:35:04PM +0100, Ben Hutchings wrote:
> On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> > > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> > >
> > > > Absolutely, I am talking about virtio here. I'm not kicking
> > > > u64_stats_sync idea I am just saying that simple locking
> > > > would work for virtio and might be better as it
> > > > gives us a way to get counters atomically.
> > >
> > > Which lock do you own in the RX path ?
> >
> > We can just disable napi, everything is updated from napi callback.
>
> Seriously, though: don't do that; this is going to hurt performance for
> minimal benefit.
>
> Ben.

Yea, it doesn't work anyway. Maybe take a xmit lock for tx and keep
using the per-cpu counters for rx. Or does this sound too disruptive
too?

> > > You'll have to add a lock in fast path. This sounds really a bad choice
> > > to me.
> >
> > .ndo_get_stats64 is not data path though, is it?
> >
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.

2012-06-10 06:39:13

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, 06 Jun 2012 10:45:41 +0200, Eric Dumazet <[email protected]> wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <[email protected]>
> >
> > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > on 32bit arches.
> >
> > We must use separate syncp for rx and tx path as they can be run at the
> > same time on different cpus. Thus one sequence increment can be lost and
> > readers spin forever.
> >
> > Signed-off-by: Eric Dumazet <[email protected]>
> > Cc: Stephen Hemminger <[email protected]>
> > Cc: Michael S. Tsirkin <[email protected]>
> > Cc: Jason Wang <[email protected]>
> > ---
>
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

Too fragile; let's keep them separate as per this patch.

Acked-by: Rusty Russell <[email protected]>

Thanks,
Rusty.

2012-06-10 07:03:37

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Sun, Jun 10, 2012 at 04:06:34PM +0930, Rusty Russell wrote:
> On Wed, 06 Jun 2012 10:45:41 +0200, Eric Dumazet <[email protected]> wrote:
> > On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > > From: Eric Dumazet <[email protected]>
> > >
> > > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > > on 32bit arches.
> > >
> > > We must use separate syncp for rx and tx path as they can be run at the
> > > same time on different cpus. Thus one sequence increment can be lost and
> > > readers spin forever.
> > >
> > > Signed-off-by: Eric Dumazet <[email protected]>
> > > Cc: Stephen Hemminger <[email protected]>
> > > Cc: Michael S. Tsirkin <[email protected]>
> > > Cc: Jason Wang <[email protected]>
> > > ---
> >
> > Just to make clear : even using percpu stats/syncp, we have no guarantee
> > that write_seqcount_begin() is done with one instruction. [1]
> >
> > It is OK on x86 if "incl" instruction is generated by the compiler, but
> > on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> > be interrupted.
> >
> > So if you are 100% sure all paths are safe against preemption/BH, then
> > this patch is not needed, but a big comment in the code would avoid
> > adding possible races in the future.
>
> Too fragile; let's keep them separate as per this patch.
>
> Acked-by: Rusty Russell <[email protected]>
>
> Thanks,
> Rusty.

One question though: do we want to lay the structure
out so that the rx sync structure precedes the rx counters?

--
MST

2012-06-10 10:21:20

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Sun, 2012-06-10 at 10:03 +0300, Michael S. Tsirkin wrote:

> One question though: do we want to lay the structure
> out so that the rx sync structure precedes the rx counters?
>

I am not sure its worth having holes in the structure, since its percpu
data.

That would be 8 bytes lost per cpu and per device.

2012-06-10 10:22:31

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Sun, Jun 10, 2012 at 12:21:14PM +0200, Eric Dumazet wrote:
> On Sun, 2012-06-10 at 10:03 +0300, Michael S. Tsirkin wrote:
>
> > One question though: do we want to lay the structure
> > out so that the rx sync structure precedes the rx counters?
> >
>
> I am not sure its worth having holes in the structure, since its percpu
> data.
>
> That would be 8 bytes lost per cpu and per device.

Right, I forgot it's per cpu.

2012-06-10 10:24:47

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

On Wed, Jun 06, 2012 at 10:35:24AM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>
>
> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> on 32bit arches.
>
> We must use separate syncp for rx and tx path as they can be run at the
> same time on different cpus. Thus one sequence increment can be lost and
> readers spin forever.
>
> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Stephen Hemminger <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: Jason Wang <[email protected]>

I'm still thinking about moving tx to take a xmit lock long term,
meanwhile this fix appears appropriate for 3.5.

Acked-by: Michael S. Tsirkin <[email protected]>

Dave, can you pick this up pls?

> ---
> drivers/net/virtio_net.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..f18149a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -42,7 +42,8 @@ module_param(gso, bool, 0444);
> #define VIRTNET_DRIVER_VERSION "1.0.0"
>
> struct virtnet_stats {
> - struct u64_stats_sync syncp;
> + struct u64_stats_sync tx_syncp;
> + struct u64_stats_sync rx_syncp;
> u64 tx_bytes;
> u64 tx_packets;
>
> @@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
>
> hdr = skb_vnet_hdr(skb);
>
> - u64_stats_update_begin(&stats->syncp);
> + u64_stats_update_begin(&stats->rx_syncp);
> stats->rx_bytes += skb->len;
> stats->rx_packets++;
> - u64_stats_update_end(&stats->syncp);
> + u64_stats_update_end(&stats->rx_syncp);
>
> if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> pr_debug("Needs csum!\n");
> @@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
> while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> pr_debug("Sent skb %p\n", skb);
>
> - u64_stats_update_begin(&stats->syncp);
> + u64_stats_update_begin(&stats->tx_syncp);
> stats->tx_bytes += skb->len;
> stats->tx_packets++;
> - u64_stats_update_end(&stats->syncp);
> + u64_stats_update_end(&stats->tx_syncp);
>
> tot_sgs += skb_vnet_hdr(skb)->num_sg;
> dev_kfree_skb_any(skb);
> @@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> u64 tpackets, tbytes, rpackets, rbytes;
>
> do {
> - start = u64_stats_fetch_begin(&stats->syncp);
> + start = u64_stats_fetch_begin(&stats->tx_syncp);
> tpackets = stats->tx_packets;
> tbytes = stats->tx_bytes;
> + } while (u64_stats_fetch_retry(&stats->tx_syncp, start));
> +
> + do {
> + start = u64_stats_fetch_begin(&stats->rx_syncp);
> rpackets = stats->rx_packets;
> rbytes = stats->rx_bytes;
> - } while (u64_stats_fetch_retry(&stats->syncp, start));
> + } while (u64_stats_fetch_retry(&stats->rx_syncp, start));
>
> tot->rx_packets += rpackets;
> tot->tx_packets += tpackets;
>

2012-06-11 03:23:43

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] virtio-net: fix a race on 32bit arches

From: "Michael S. Tsirkin" <[email protected]>
Date: Sun, 10 Jun 2012 13:25:12 +0300

> On Wed, Jun 06, 2012 at 10:35:24AM +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <[email protected]>
>>
>> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
>> on 32bit arches.
>>
>> We must use separate syncp for rx and tx path as they can be run at the
>> same time on different cpus. Thus one sequence increment can be lost and
>> readers spin forever.
>>
>> Signed-off-by: Eric Dumazet <[email protected]>
>> Cc: Stephen Hemminger <[email protected]>
>> Cc: Michael S. Tsirkin <[email protected]>
>> Cc: Jason Wang <[email protected]>
>
> I'm still thinking about moving tx to take a xmit lock long term,
> meanwhile this fix appears appropriate for 3.5.
>
> Acked-by: Michael S. Tsirkin <[email protected]>
>
> Dave, can you pick this up pls?

Done, thanks.