2011-04-24 22:41:36

by Maximilian Engelhardt

[permalink] [raw]
Subject: Kernel crash after using new Intel NIC (igb)

Hello,

some time ago we switched some of our servers to a new networking card that
uses the Intel igb driver. Since that time we see regular kernel crashes.
The crashes happen at very irregular intervals, sometimes after a week uptime,
sometimes after a month or even more. They seem to be independent of the
server load as they also happen in the night when there is low traffic.

The affected server is used as a NAT device with some iptables rules and serves
about 2000 people.

Attached are two logs of the crashes as well as the output of dmesg, lspci,
and /proc/interrupts as well as the used kernel config.

I have no idea what might be wrong but I think it is a kernel bug. Perhaps
someone with more knowledge has a clue.

If needed I can provide additional information or build different kernels.

Greetings,
Maxi


Attachments:
(No filename) (830.00 B)
crash1 (3.58 kB)
crash2 (3.63 kB)
dmesg (42.04 kB)
interrupts (1.50 kB)
lspci (2.35 kB)
config-2.6.37.1 (44.63 kB)
signature.asc (836.00 B)
This is a digitally signed message part.
Download all attachments

2011-04-26 23:34:14

by Wyborny, Carolyn

[permalink] [raw]
Subject: RE: Kernel crash after using new Intel NIC (igb)



>-----Original Message-----
>From: [email protected] [mailto:[email protected]]
>On Behalf Of Maximilian Engelhardt
>Sent: Sunday, April 24, 2011 3:33 PM
>To: [email protected]
>Cc: [email protected]; StuStaNet Vorstand
>Subject: Kernel crash after using new Intel NIC (igb)
>
>Hello,
>
>some time ago we switched some of our servers to a new networking card
>that
>uses the Intel igb driver. Since that time we see regular kernel
>crashes.
>The crashes happen at very irregular intervals, sometimes after a week
>uptime,
>sometimes after a month or even more. They seem to be independent of the
>server load as they also happen in the night when there is low traffic.
>
>The affected server is used as a NAT device with some iptables rules and
>serves
>about 2000 people.
>
>Attached are two logs of the crashes as well as the output of dmesg,
>lspci,
>and /proc/interrupts as well as the used kernel config.
>
>I have no idea what might be wrong but I think it is a kernel bug.
>Perhaps
>someone with more knowledge has a clue.
>
>If needed I can provide additional information or build different
>kernels.
>
>Greetings,
>Maxi

Hello,

I'm sorry you're having crashes since installing our NIC. Thank you for the data. I haven't had a chance to review it carefully yet, but it looks to me like the crashes have us in the stack sometimes and sometimes not. I need to do a bit more research and will need some more information. Can I get an ethtool -i eth# for the device and also lspci -vvv for the platform its installed on.

If you open an issue at SourceForge we will have a place to keep the logs.

I will research this a bit more and get back to you tomorrow my time.

Thanks,

Carolyn
Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2011-04-27 04:24:11

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le lundi 25 avril 2011 à 00:32 +0200, Maximilian Engelhardt a écrit :
> Hello,
>
> some time ago we switched some of our servers to a new networking card that
> uses the Intel igb driver. Since that time we see regular kernel crashes.
> The crashes happen at very irregular intervals, sometimes after a week uptime,
> sometimes after a month or even more. They seem to be independent of the
> server load as they also happen in the night when there is low traffic.
>
> The affected server is used as a NAT device with some iptables rules and serves
> about 2000 people.
>
> Attached are two logs of the crashes as well as the output of dmesg, lspci,
> and /proc/interrupts as well as the used kernel config.
>
> I have no idea what might be wrong but I think it is a kernel bug. Perhaps
> someone with more knowledge has a clue.
>
> If needed I can provide additional information or build different kernels.
>
> Greetings,
> Maxi

Hello Maximilian

We had similar reports in the past that disappeared when adding
"slab_nomerge" to boot parameters. We suspect a memory corruption from
another part of kernel on 64bytes kmemcache objects.

In 2.6.37, inetpeer code uses 64bytes objects. Using slab_nomerge and
SLUB allocator (as you already do), makes sure inetpeer kmemcache wont
be shared by other 64bytes objects in kernel.

In 2.6.38 and up, inetpeer objects are now larger, so you also could try
latest linux-2.6 tree, just to make sure inetpeer code is not faulty.

Thanks

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8145ea9f>] cleanup_once+0x3f/0xa0
PGD 12d82a067 PUD 12ea49067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
last sysfs file: /sys/devices/virtual/vc/vcsa5/uevent
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.37.1 #1 Supermicro X7SB4/E/X7SB4/E
RIP: 0010:[<ffffffff8145ea9f>] [<ffffffff8145ea9f>] cleanup_once+0x3f/0xa0
RSP: 0018:ffff8800cfc03e40 EFLAGS: 00010202
RAX: ffff880128167798 RBX: ffff880128167780 RCX: 0000000000000000
RDX: c398112e00026cf7 RSI: 00000000000001a2 RDI: ffffffff8166ce10
RBP: 0000000000024702 R08: 00000000003d0900 R09: 00040ea8ea5b7700
R10: ffffffff814f312d R11: 0000000000000010 R12: ffffffff8161ffd8
R13: 0000000000000102 R14: ffffffff8174b4e0 R15: ffffffff8161ffd8
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000012fe67000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff8161e000, task ffffffff81638020)
Stack:
ffff8800cfc11f00 0000000111034f87 0000000000024702 ffffffff8145ed68
ffffffff8174a4c0 ffffffff8174a4c0 ffff8800cfc03eb0 ffffffff81044cb8
ffffffff81034079 ffffffff8145ed30 0000000000000000 ffffffff8174b8e0
Call Trace:
<IRQ>
[<ffffffff8145ed68>] ? peer_check_expire+0x38/0x110
[<ffffffff81044cb8>] ? run_timer_softirq+0x138/0x250
[<ffffffff81034079>] ? scheduler_tick+0xd9/0x2e0
[<ffffffff8145ed30>] ? peer_check_expire+0x0/0x110
[<ffffffff8103eb0d>] ? __do_softirq+0x9d/0x130
[<ffffffff8100320c>] ? call_softirq+0x1c/0x30
[<ffffffff8100531d>] ? do_softirq+0x4d/0x80
[<ffffffff8103e9cd>] ? irq_exit+0x8d/0x90
[<ffffffff8101d5ea>] ? smp_apic_timer_interrupt+0x6a/0xa0
[<ffffffff81002cd3>] ? apic_timer_interrupt+0x13/0x20
<EOI>
[<ffffffff8100a93a>] ? mwait_idle+0x6a/0x80
[<ffffffff81001528>] ? cpu_idle+0x58/0xb0
[<ffffffff81698dd3>] ? start_kernel+0x334/0x33f
[<ffffffff8169840d>] ? x86_64_start_kernel+0xf3/0xf7
Code: 00 48 8b 05 84 e3 20 00 48 3d 00 ce 66 81 74 5c 48 8d 58 e8 48 8b 15 31 5e 22 00 2b 53 28 48 39 ea 72 49 48 8b 4b 18 48 8b 53 20 <48> 89 51 08 48 89 0a 48 89 43 18 48 89 43 20 f0 ff 40 14 48 c7
RIP [<ffffffff8145ea9f>] cleanup_once+0x3f/0xa0
RSP <ffff8800cfc03e40>
CR2: 0000000000000008
---[ end trace 904f16191de0663c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D 2.6.37.1 #1
Call Trace:
<IRQ> [<ffffffff814e4152>] ? panic+0xa1/0x19e
[<ffffffff810068eb>] ? oops_end+0x9b/0xa0
[<ffffffff81024523>] ? no_context+0x103/0x270
[<ffffffff81024d10>] ? do_page_fault+0x290/0x430
[<ffffffff813eabd2>] ? __alloc_skb+0x72/0x160
[<ffffffff81262f40>] ? swiotlb_dma_mapping_error+0x10/0x20
[<ffffffff8133e168>] ? igb_alloc_rx_buffers_adv+0x208/0x3a0
[<ffffffff814e780f>] ? page_fault+0x1f/0x30
[<ffffffff8145ea9f>] ? cleanup_once+0x3f/0xa0
[<ffffffff8145ed68>] ? peer_check_expire+0x38/0x110
[<ffffffff81044cb8>] ? run_timer_softirq+0x138/0x250
[<ffffffff81034079>] ? scheduler_tick+0xd9/0x2e0
[<ffffffff8145ed30>] ? peer_check_expire+0x0/0x110
[<ffffffff8103eb0d>] ? __do_softirq+0x9d/0x130
[<ffffffff8100320c>] ? call_softirq+0x1c/0x30
[<ffffffff8100531d>] ? do_softirq+0x4d/0x80
[<ffffffff8103e9cd>] ? irq_exit+0x8d/0x90
[<ffffffff8101d5ea>] ? smp_apic_timer_interrupt+0x6a/0xa0
[<ffffffff81002cd3>] ? apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff8100a93a>] ? mwait_idle+0x6a/0x80
[<ffffffff81001528>] ? cpu_idle+0x58/0xb0
[<ffffffff81698dd3>] ? start_kernel+0x334/0x33f
[<ffffffff8169840d>] ? x86_64_start_kernel+0xf3/0xf7

2011-04-27 04:32:59

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le mercredi 27 avril 2011 à 06:24 +0200, Eric Dumazet a écrit :

> We had similar reports in the past that disappeared when adding
> "slab_nomerge" to boot parameters. We suspect a memory corruption from
> another part of kernel on 64bytes kmemcache objects.
>
> In 2.6.37, inetpeer code uses 64bytes objects. Using slab_nomerge and
> SLUB allocator (as you already do), makes sure inetpeer kmemcache wont
> be shared by other 64bytes objects in kernel.
>

Of course, the right option name is slub_nomerge

vi +2293 Documentation/kernel-parameters.txt

slub_nomerge [MM, SLUB]
Disable merging of slabs with similar size. May be
necessary if there is some reason to distinguish
allocs to different slabs. Debug options disable
merging on their own.
For more information see Documentation/vm/slub.txt.


2011-04-27 11:46:59

by Maximilian Engelhardt

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On Wednesday 27 April 2011 01:34:09 Wyborny, Carolyn wrote:
> >-----Original Message-----
> >From: [email protected] [mailto:[email protected]]
> >On Behalf Of Maximilian Engelhardt
> >Sent: Sunday, April 24, 2011 3:33 PM
> >To: [email protected]
> >Cc: [email protected]; StuStaNet Vorstand
> >Subject: Kernel crash after using new Intel NIC (igb)
> >
> >Hello,
> >
> >some time ago we switched some of our servers to a new networking card
> >that
> >uses the Intel igb driver. Since that time we see regular kernel
> >crashes.
> >The crashes happen at very irregular intervals, sometimes after a week
> >uptime,
> >sometimes after a month or even more. They seem to be independent of the
> >server load as they also happen in the night when there is low traffic.
> >
> >The affected server is used as a NAT device with some iptables rules and
> >serves
> >about 2000 people.
> >
> >Attached are two logs of the crashes as well as the output of dmesg,
> >lspci,
> >and /proc/interrupts as well as the used kernel config.
> >
> >I have no idea what might be wrong but I think it is a kernel bug.
> >Perhaps
> >someone with more knowledge has a clue.
> >
> >If needed I can provide additional information or build different
> >kernels.
> >
> >Greetings,
> >Maxi
>
> Hello,
>
> I'm sorry you're having crashes since installing our NIC. Thank you for
> the data. I haven't had a chance to review it carefully yet, but it looks
> to me like the crashes have us in the stack sometimes and sometimes not.
> I need to do a bit more research and will need some more information. Can
> I get an ethtool -i eth# for the device and also lspci -vvv for the
> platform its installed on.
>
> If you open an issue at SourceForge we will have a place to keep the logs.
>
> I will research this a bit more and get back to you tomorrow my time.
>
> Thanks,
>
> Carolyn
> Carolyn Wyborny
> Linux Development
> LAN Access Division
> Intel Corporation


Hello Carolyn,

Thanks for your response.

I have opened a issue at
https://sourceforge.net/tracker/?func=detail&aid=3293703&group_id=42302&atid=447449
and also posted all information there.


Please not that yesterday I updated the kernel, so I'm now running 2.6.38.4.
Eric Dumazet mentioned on the LKML that this might be a memory corruption that
my be solved with kernel 2.6.38.

I'll report if the crash happens again, but it might take some times as in the
past it happened within the interval of weeks to month.


Here is the output of ethtool (with the new 2.6.38.4 kernel):
$ /sbin/ethtool -i eth0
driver: igb
version: 2.1.0-k2
firmware-version: 1.2-1
bus-info: 0000:05:00.0
$ /sbin/ethtool -i eth1
driver: igb
version: 2.1.0-k2
firmware-version: 1.2-1
bus-info: 0000:05:00.1

The output of lspci -vvv is attached (also with kernel 2.6.38.4 but I guess it
doesn't make any difference)


Greetings,
Maxi


Attachments:
(No filename) (2.83 kB)
lspci_vvv (16.62 kB)
signature.asc (836.00 B)
This is a digitally signed message part.
Download all attachments

2011-04-27 11:52:24

by Maximilian Engelhardt

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Hello Eric,

On Wednesday 27 April 2011 06:32:51 Eric Dumazet wrote:
> Le mercredi 27 avril 2011 à 06:24 +0200, Eric Dumazet a écrit :
> > We had similar reports in the past that disappeared when adding
> > "slab_nomerge" to boot parameters. We suspect a memory corruption from
> > another part of kernel on 64bytes kmemcache objects.
> >
> > In 2.6.37, inetpeer code uses 64bytes objects. Using slab_nomerge and
> > SLUB allocator (as you already do), makes sure inetpeer kmemcache wont
> > be shared by other 64bytes objects in kernel.
>
> Of course, the right option name is slub_nomerge
>
> vi +2293 Documentation/kernel-parameters.txt
>
> slub_nomerge [MM, SLUB]
> Disable merging of slabs with similar size. May be
> necessary if there is some reason to distinguish
> allocs to different slabs. Debug options disable
> merging on their own.
> For more information see Documentation/vm/slub.txt.

thank you for this information. I updated the kernel of the affected server to
version 2.6.38.4 yesterday. I'll report when there are still crashes, but it
might take a while, as in the past they only happened within the interval of
weeks to month.

Greetings,
Maxi


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part.

2011-04-27 12:04:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le mercredi 27 avril 2011 à 13:46 +0200, Maximilian Engelhardt a écrit :

> Hello Carolyn,
>
> Thanks for your response.
>
> I have opened a issue at
> https://sourceforge.net/tracker/?func=detail&aid=3293703&group_id=42302&atid=447449
> and also posted all information there.
>
>
> Please not that yesterday I updated the kernel, so I'm now running 2.6.38.4.
> Eric Dumazet mentioned on the LKML that this might be a memory corruption that
> my be solved with kernel 2.6.38.
>

Hmm, I suggested to use slub_nomerge to make sure inetpeer kmemcache is
not shared with another (possibly mem corruptor) layer.

Please note we were not able to track the bug in 2.6.37 kernels.

> I'll report if the crash happens again, but it might take some times as in the
> past it happened within the interval of weeks to month.
>

Thanks

2011-05-12 21:10:35

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On Wed, Apr 27, 2011 at 01:51:59PM +0200, Maximilian Engelhardt wrote:
>
> thank you for this information. I updated the kernel of the affected server to
> version 2.6.38.4 yesterday. I'll report when there are still crashes, but it
> might take a while, as in the past they only happened within the interval of
> weeks to month.

I've seen this panic on 2.6.38.y as well. It's rare (< 1%), but the
panic consistently happens at the same place.

Are any of the inetpeer.c commits in the 2.6.39 tree, which are not in -stable yet,
good candidates to try?

-Arun

2011-05-12 21:16:00

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le jeudi 12 mai 2011 à 14:10 -0700, Arun Sharma a écrit :
> On Wed, Apr 27, 2011 at 01:51:59PM +0200, Maximilian Engelhardt wrote:
> >
> > thank you for this information. I updated the kernel of the affected server to
> > version 2.6.38.4 yesterday. I'll report when there are still crashes, but it
> > might take a while, as in the past they only happened within the interval of
> > weeks to month.
>
> I've seen this panic on 2.6.38.y as well. It's rare (< 1%), but the
> panic consistently happens at the same place.
>
> Are any of the inetpeer.c commits in the 2.6.39 tree, which are not in -stable yet,
> good candidates to try?
>

Probably not.

What gives slub_nomerge=1 for you ?



2011-05-24 21:33:29

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:
>
> Probably not.
>
> What gives slub_nomerge=1 for you ?
>

It took me a while to get a new kernel on a large enough sample
of machines to get some data.

Like you observed in the other thread, this is unlikely to be a random
memory corruption.

The panics stopped after we moved the list_empty() check under the lock.

--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,11 @@ void __init inet_initpeers(void)
/* Called with or without local BH being disabled. */
static void unlink_from_unused(struct inet_peer *p)
{
+ spin_lock_bh(&unused_peers.lock);
if (!list_empty(&p->unused)) {
- spin_lock_bh(&unused_peers.lock);
list_del_init(&p->unused);
- spin_unlock_bh(&unused_peers.lock);
}
+ spin_unlock_bh(&unused_peers.lock);
}

static int addr_compare(const struct inetpeer_addr *a,

The idea being that the list gets corrupted under some kind of a race
condition. Two threads racing on list_empty() and executing
list_del_init() seems harmless.

There is probably a different race condition that is mitigated by doing
the list_empty() check under the lock.

-Arun

2011-05-25 02:44:37

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le mardi 24 mai 2011 à 14:33 -0700, Arun Sharma a écrit :
> On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:
> >
> > Probably not.
> >
> > What gives slub_nomerge=1 for you ?
> >
>
> It took me a while to get a new kernel on a large enough sample
> of machines to get some data.
>
> Like you observed in the other thread, this is unlikely to be a random
> memory corruption.
>
> The panics stopped after we moved the list_empty() check under the lock.
>
> --- a/net/ipv4/inetpeer.c
> +++ b/net/ipv4/inetpeer.c
> @@ -154,11 +154,11 @@ void __init inet_initpeers(void)
> /* Called with or without local BH being disabled. */
> static void unlink_from_unused(struct inet_peer *p)
> {
> + spin_lock_bh(&unused_peers.lock);
> if (!list_empty(&p->unused)) {
> - spin_lock_bh(&unused_peers.lock);
> list_del_init(&p->unused);
> - spin_unlock_bh(&unused_peers.lock);
> }
> + spin_unlock_bh(&unused_peers.lock);
> }
>
> static int addr_compare(const struct inetpeer_addr *a,
>
> The idea being that the list gets corrupted under some kind of a race
> condition. Two threads racing on list_empty() and executing
> list_del_init() seems harmless.
>
> There is probably a different race condition that is mitigated by doing
> the list_empty() check under the lock.
>

Hmm, thanks for the report. Are you running x86 or another arch ?

We probably need some sort of memory barrier.

However, locking this central lock makes the thing too slow, I'll try to
use an atomic_inc_return on p->refcnt instead. (and then lock
unused_peers.lock if we got a 0->1 transition)

I am testing following patch :


diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 9df4e63..43aacbf 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,9 @@ void __init inet_initpeers(void)
/* Called with or without local BH being disabled. */
static void unlink_from_unused(struct inet_peer *p)
{
- if (!list_empty(&p->unused)) {
- spin_lock_bh(&unused_peers.lock);
- list_del_init(&p->unused);
- spin_unlock_bh(&unused_peers.lock);
- }
+ spin_lock_bh(&unused_peers.lock);
+ list_del_init(&p->unused);
+ spin_unlock_bh(&unused_peers.lock);
}

static int addr_compare(const struct inetpeer_addr *a,
@@ -213,10 +211,11 @@ static int addr_compare(const struct inetpeer_addr *a,
* We exit from this function if number of links exceeds PEER_MAXDEPTH
*/
static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
- struct inet_peer_base *base)
+ struct inet_peer_base *base,
+ int *newrefcnt)
{
struct inet_peer *u = rcu_dereference(base->root);
- int count = 0;
+ int old, new, count = 0;

while (u != peer_avl_empty) {
int cmp = addr_compare(daddr, &u->daddr);
@@ -226,8 +225,16 @@ static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
* distinction between an unused entry (refcnt=0) and
* a freed one.
*/
- if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
- u = NULL;
+ while (1) {
+ old = atomic_read(&u->refcnt);
+ if (old == -1)
+ return NULL;
+ new = old + 1;
+ if (atomic_cmpxchg(&u->refcnt,
+ old, new) == old)
+ break;
+ }
+ *newrefcnt = new;
return u;
}
if (cmp == -1)
@@ -465,14 +472,14 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
struct inet_peer_base *base = family_to_base(daddr->family);
struct inet_peer *p;
unsigned int sequence;
- int invalidated;
+ int invalidated, newrefcnt = 0;

/* Look up for the address quickly, lockless.
* Because of a concurrent writer, we might not find an existing entry.
*/
rcu_read_lock();
sequence = read_seqbegin(&base->lock);
- p = lookup_rcu(daddr, base);
+ p = lookup_rcu(daddr, base, &newrefcnt);
invalidated = read_seqretry(&base->lock, sequence);
rcu_read_unlock();

@@ -480,7 +487,8 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
/* The existing node has been found.
* Remove the entry from unused list if it was there.
*/
- unlink_from_unused(p);
+ if (newrefcnt == 1)
+ unlink_from_unused(p);
return p;
}

@@ -494,10 +502,11 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
write_seqlock_bh(&base->lock);
p = lookup(daddr, stack, base);
if (p != peer_avl_empty) {
- atomic_inc(&p->refcnt);
+ newrefcnt = atomic_inc_return(&p->refcnt);
write_sequnlock_bh(&base->lock);
/* Remove the entry from unused list if it was there. */
- unlink_from_unused(p);
+ if (newrefcnt == 1)
+ unlink_from_unused(p);
return p;
}
p = create ? kmem_cache_alloc(peer_cachep, GFP_ATOMIC) : NULL;

2011-05-25 06:06:12

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On Wed, May 25, 2011 at 04:44:29AM +0200, Eric Dumazet wrote:
>
> Hmm, thanks for the report. Are you running x86 or another arch ?
>

This was on x86.

> We probably need some sort of memory barrier.
>
> However, locking this central lock makes the thing too slow, I'll try to
> use an atomic_inc_return on p->refcnt instead. (and then lock
> unused_peers.lock if we got a 0->1 transition)

Another possibility is to do the list_empty() check twice. Once without
taking the lock and again with the spinlock held.

-Arun

2011-05-25 06:35:37

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le mardi 24 mai 2011 à 23:06 -0700, Arun Sharma a écrit :
> On Wed, May 25, 2011 at 04:44:29AM +0200, Eric Dumazet wrote:
> >
> > Hmm, thanks for the report. Are you running x86 or another arch ?
> >
>
> This was on x86.
>
> > We probably need some sort of memory barrier.
> >
> > However, locking this central lock makes the thing too slow, I'll try to
> > use an atomic_inc_return on p->refcnt instead. (and then lock
> > unused_peers.lock if we got a 0->1 transition)
>
> Another possibility is to do the list_empty() check twice. Once without
> taking the lock and again with the spinlock held.
>

Why ?

list_del_init(&p->unused); (done under lock of course) is safe, you can
call it twice, no problem.

No, the real problem is the (!list_empty(&p->unused) test : It seems to
not always tell the truth if not done under lock.


2011-05-26 17:05:09

by Ben Hutchings

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On Wed, 2011-05-25 at 08:35 +0200, Eric Dumazet wrote:
> Le mardi 24 mai 2011 à 23:06 -0700, Arun Sharma a écrit :
> > On Wed, May 25, 2011 at 04:44:29AM +0200, Eric Dumazet wrote:
> > >
> > > Hmm, thanks for the report. Are you running x86 or another arch ?
> > >
> >
> > This was on x86.
> >
> > > We probably need some sort of memory barrier.
> > >
> > > However, locking this central lock makes the thing too slow, I'll try to
> > > use an atomic_inc_return on p->refcnt instead. (and then lock
> > > unused_peers.lock if we got a 0->1 transition)
> >
> > Another possibility is to do the list_empty() check twice. Once without
> > taking the lock and again with the spinlock held.
> >
>
> Why ?
>
> list_del_init(&p->unused); (done under lock of course) is safe, you can
> call it twice, no problem.
>
> No, the real problem is the (!list_empty(&p->unused) test : It seems to
> not always tell the truth if not done under lock.

Of course not; list modification operations are not atomic.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2011-05-26 19:29:45

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On 5/24/11 11:35 PM, Eric Dumazet wrote:

>> Another possibility is to do the list_empty() check twice. Once without
>> taking the lock and again with the spinlock held.
>>
>
> Why ?
>

Part of the problem is that I don't have a precise understanding of the
race condition that's causing the list to become corrupted.

All I know is that doing it under the lock fixes it. If it's slowing
things down, we do a check outside the lock (since it's cheap). But if
we get the wrong answer, we verify it again under the lock.

> list_del_init(&p->unused); (done under lock of course) is safe, you can
> call it twice, no problem.

Doing it twice is not a problem. But doing it when we shouldn't be doing
it could be the problem.

The list modification under unused_peers.lock looks generally safe. But
the control flow (based on refcnt) done outside the lock might have races.

Eg: inet_putpeer() might find the refcnt go to zero, but before it adds
it to the unused list, another thread may be doing inet_getpeer() and
set refcnt to 1. In the end, we end up with a node that's potentially in
use, but ends up on the unused list.

-Arun

2011-05-26 19:47:32

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le jeudi 26 mai 2011 à 12:30 -0700, Arun Sharma a écrit :
> On 5/24/11 11:35 PM, Eric Dumazet wrote:
>
> >> Another possibility is to do the list_empty() check twice. Once without
> >> taking the lock and again with the spinlock held.
> >>
> >
> > Why ?
> >
>
> Part of the problem is that I don't have a precise understanding of the
> race condition that's causing the list to become corrupted.
>
> All I know is that doing it under the lock fixes it. If it's slowing
> things down, we do a check outside the lock (since it's cheap). But if
> we get the wrong answer, we verify it again under the lock.
>

You dont get the problem. Problem is : We can do the empty() test only
if protected by the lock.

If not locked, result can be wrong. [ false positive or negative ]


> > list_del_init(&p->unused); (done under lock of course) is safe, you can
> > call it twice, no problem.
>
> Doing it twice is not a problem. But doing it when we shouldn't be doing
> it could be the problem.
>
> The list modification under unused_peers.lock looks generally safe. But
> the control flow (based on refcnt) done outside the lock might have races.
>

"might" is not a good word when dealing with this ;)

> Eg: inet_putpeer() might find the refcnt go to zero, but before it adds
> it to the unused list, another thread may be doing inet_getpeer() and
> set refcnt to 1. In the end, we end up with a node that's potentially in
> use, but ends up on the unused list.
>

Did you test my fix ?

Its doing the right thing : Using refcnt as the only marker to say if
the item must be removed from unused list (and lock the central lock
protecting this list only when needed)

Since we already must do an atomic operation on refcnt, using
atomic_inc_return [ or similar full barrier op ] is enough to tell us
the truth.


2011-05-26 21:47:52

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On 5/26/11 12:47 PM, Eric Dumazet wrote:

> You dont get the problem. Problem is : We can do the empty() test only
> if protected by the lock.
>
> If not locked, result can be wrong. [ false positive or negative ]
>


Agreed. Failing to unlink from unused list when we should have sounds wrong.

>> The list modification under unused_peers.lock looks generally safe. But
>> the control flow (based on refcnt) done outside the lock might have races.
>>
>
> "might" is not a good word when dealing with this ;)

Potential race in the current code:

initial refcnt = 1

T1: T2

atomic_dec_and_lock(refcnt)
// refcnt == 0

atomic_add_unless(refcnt)
unlink_from_unused()

list_add_tail(unused)
// T2 using "unused" entry


> Did you test my fix ?

I could try it on one or two machines - but it won't tell us anything
for weeks if not months. Unfortunately my next window to try a new
kernel on a large enough sample is several months away.

>
> Its doing the right thing : Using refcnt as the only marker to say if
> the item must be removed from unused list (and lock the central lock
> protecting this list only when needed)
>
> Since we already must do an atomic operation on refcnt, using
> atomic_inc_return [ or similar full barrier op ] is enough to tell us
> the truth.

Yeah - using the refcnt seems better than list_empty(), but I'm not sure
that your patch addresses the race above.

-Arun

2011-05-26 22:01:38

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le jeudi 26 mai 2011 à 14:48 -0700, Arun Sharma a écrit :
> On 5/26/11 12:47 PM, Eric Dumazet wrote:
>
> > You dont get the problem. Problem is : We can do the empty() test only
> > if protected by the lock.
> >
> > If not locked, result can be wrong. [ false positive or negative ]
> >
>
>
> Agreed. Failing to unlink from unused list when we should have sounds wrong.
>
> >> The list modification under unused_peers.lock looks generally safe. But
> >> the control flow (based on refcnt) done outside the lock might have races.
> >>
> >
> > "might" is not a good word when dealing with this ;)
>
> Potential race in the current code:
>
> initial refcnt = 1
>
> T1: T2
>
> atomic_dec_and_lock(refcnt)
> // refcnt == 0
>
> atomic_add_unless(refcnt)
> unlink_from_unused()
>
> list_add_tail(unused)
> // T2 using "unused" entry
>
>
> > Did you test my fix ?
>
> I could try it on one or two machines - but it won't tell us anything
> for weeks if not months. Unfortunately my next window to try a new
> kernel on a large enough sample is several months away.
>
> >
> > Its doing the right thing : Using refcnt as the only marker to say if
> > the item must be removed from unused list (and lock the central lock
> > protecting this list only when needed)
> >
> > Since we already must do an atomic operation on refcnt, using
> > atomic_inc_return [ or similar full barrier op ] is enough to tell us
> > the truth.
>
> Yeah - using the refcnt seems better than list_empty(), but I'm not sure
> that your patch addresses the race above.

It does.

It becomes

T1: T2
>
> atomic_dec_and_lock(refcnt)
> // refcnt == 0
>
> newref = atomic_add_unless_and_return(refcnt)
>
> list_add_tail(unused)
unlock();
>
> if (newref == 1) {
lock()
unlink_from_unused()
unlock()
}


2011-05-27 00:09:22

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On 5/26/11 3:01 PM, Eric Dumazet wrote:


>> Yeah - using the refcnt seems better than list_empty(), but I'm not sure
>> that your patch addresses the race above.
>
> It does.

True. I can't find any holes in this method and it resolves the "failure
to unlink from unused" case.

Perhaps wrap the while(1) loop into its own primitive in atomic.h or use
an existing primitive?

-Arun

2011-05-27 03:27:18

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le jeudi 26 mai 2011 à 17:09 -0700, Arun Sharma a écrit :
> On 5/26/11 3:01 PM, Eric Dumazet wrote:
>
>
> >> Yeah - using the refcnt seems better than list_empty(), but I'm not sure
> >> that your patch addresses the race above.
> >
> > It does.
>
> True. I can't find any holes in this method and it resolves the "failure
> to unlink from unused" case.
>
> Perhaps wrap the while(1) loop into its own primitive in atomic.h or use
> an existing primitive?
>

Sure, here is a formal submission I cooked.

Thanks

[PATCH] inetpeer: fix race in unused_list manipulations

Several crashes in cleanup_once() were reported in recent kernels.

Commit d6cc1d642de9 (inetpeer: various changes) added a race in
unlink_from_unused().

One way to avoid taking unused_peers.lock before doing the list_empty()
test is to catch 0->1 refcnt transitions, using full barrier atomic
operations variants (atomic_cmpxchg() and atomic_inc_return()) instead
of previous atomic_inc() and atomic_add_unless() variants.

We then call unlink_from_unused() only for the owner of the 0->1
transition.

Add a new atomic_add_unless_return() static helper

With help from Arun Sharma.

Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772

Reported-by: Arun Sharma <[email protected]>
Reported-by: Maximilian Engelhardt <[email protected]>
Reported-by: Yann Dupont <[email protected]>
Reported-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
net/ipv4/inetpeer.c | 42 +++++++++++++++++++++++++++---------------
1 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 9df4e63..ce616d9 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,9 @@ void __init inet_initpeers(void)
/* Called with or without local BH being disabled. */
static void unlink_from_unused(struct inet_peer *p)
{
- if (!list_empty(&p->unused)) {
- spin_lock_bh(&unused_peers.lock);
- list_del_init(&p->unused);
- spin_unlock_bh(&unused_peers.lock);
- }
+ spin_lock_bh(&unused_peers.lock);
+ list_del_init(&p->unused);
+ spin_unlock_bh(&unused_peers.lock);
}

static int addr_compare(const struct inetpeer_addr *a,
@@ -205,6 +203,20 @@ static int addr_compare(const struct inetpeer_addr *a,
u; \
})

+static bool atomic_add_unless_return(atomic_t *ptr, int a, int u, int *newv)
+{
+ int cur, old = atomic_read(ptr);
+
+ while (old != u) {
+ *newv = old + a;
+ cur = atomic_cmpxchg(ptr, old, *newv);
+ if (cur == old)
+ return true;
+ old = cur;
+ }
+ return false;
+}
+
/*
* Called with rcu_read_lock()
* Because we hold no lock against a writer, its quite possible we fall
@@ -213,7 +225,8 @@ static int addr_compare(const struct inetpeer_addr *a,
* We exit from this function if number of links exceeds PEER_MAXDEPTH
*/
static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
- struct inet_peer_base *base)
+ struct inet_peer_base *base,
+ int *newrefcnt)
{
struct inet_peer *u = rcu_dereference(base->root);
int count = 0;
@@ -226,7 +239,7 @@ static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
* distinction between an unused entry (refcnt=0) and
* a freed one.
*/
- if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
+ if (!atomic_add_unless_return(&u->refcnt, 1, -1, newrefcnt))
u = NULL;
return u;
}
@@ -465,22 +478,23 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
struct inet_peer_base *base = family_to_base(daddr->family);
struct inet_peer *p;
unsigned int sequence;
- int invalidated;
+ int invalidated, newrefcnt = 0;

/* Look up for the address quickly, lockless.
* Because of a concurrent writer, we might not find an existing entry.
*/
rcu_read_lock();
sequence = read_seqbegin(&base->lock);
- p = lookup_rcu(daddr, base);
+ p = lookup_rcu(daddr, base, &newrefcnt);
invalidated = read_seqretry(&base->lock, sequence);
rcu_read_unlock();

if (p) {
- /* The existing node has been found.
+found: /* The existing node has been found.
* Remove the entry from unused list if it was there.
*/
- unlink_from_unused(p);
+ if (newrefcnt == 1)
+ unlink_from_unused(p);
return p;
}

@@ -494,11 +508,9 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
write_seqlock_bh(&base->lock);
p = lookup(daddr, stack, base);
if (p != peer_avl_empty) {
- atomic_inc(&p->refcnt);
+ newrefcnt = atomic_inc_return(&p->refcnt);
write_sequnlock_bh(&base->lock);
- /* Remove the entry from unused list if it was there. */
- unlink_from_unused(p);
- return p;
+ goto found;
}
p = create ? kmem_cache_alloc(peer_cachep, GFP_ATOMIC) : NULL;
if (p) {

2011-05-27 07:59:18

by Yann Dupont

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le 27/05/2011 05:27, Eric Dumazet a écrit :
> Le jeudi 26 mai 2011 à 17:09 -0700, Arun Sharma a écrit :
>> On 5/26/11 3:01 PM, Eric Dumazet wrote:
>>
>>
>>>> Yeah - using the refcnt seems better than list_empty(), but I'm not sure
>>>> that your patch addresses the race above.
>>> It does.
>> True. I can't find any holes in this method and it resolves the "failure
>> to unlink from unused" case.
>>
>> Perhaps wrap the while(1) loop into its own primitive in atomic.h or use
>> an existing primitive?
>>
> Sure, here is a formal submission I cooked.
>
> Thanks
>
> [PATCH] inetpeer: fix race in unused_list manipulations

Thanks eric, didn't noticed this thread, nice to see you squashed this bug.
As you said in a previous message, slub_nomerge prevented us from
crashing for 113 days now :)

But of course, THE REAL FIX is much preffered. Will try this patch with
the next -stable update.

Thanks for your efforts,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : [email protected]

2011-05-27 17:43:56

by David Miller

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

From: Eric Dumazet <[email protected]>
Date: Fri, 27 May 2011 05:27:11 +0200

> Sure, here is a formal submission I cooked.
>
> Thanks
>
> [PATCH] inetpeer: fix race in unused_list manipulations

Applied, thanks Eric.

2011-05-27 17:52:04

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On 5/26/11 8:27 PM, Eric Dumazet wrote:

Looks good. Thanks for taking care of this.

> +static bool atomic_add_unless_return(atomic_t *ptr, int a, int u, int *newv)
> +{
> + int cur, old = atomic_read(ptr);
> +
> + while (old != u) {
> + *newv = old + a;
> + cur = atomic_cmpxchg(ptr, old, *newv);
> + if (cur == old)
> + return true;
> + old = cur;
> + }
> + return false;
> +}

This looks very similar to atomic_add_unless(). If we had a

__atomic_add_unless() that returned "old", we could then do:

atomic_add_unless() { return __atomic_add_unless() != u }
atomic_add_unless_return() { return __atomic_add_unless() + a}

-Arun

2011-05-27 19:57:08

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le vendredi 27 mai 2011 à 10:52 -0700, Arun Sharma a écrit :
> On 5/26/11 8:27 PM, Eric Dumazet wrote:
>
> Looks good. Thanks for taking care of this.
>
> > +static bool atomic_add_unless_return(atomic_t *ptr, int a, int u, int *newv)
> > +{
> > + int cur, old = atomic_read(ptr);
> > +
> > + while (old != u) {
> > + *newv = old + a;
> > + cur = atomic_cmpxchg(ptr, old, *newv);
> > + if (cur == old)
> > + return true;
> > + old = cur;
> > + }
> > + return false;
> > +}
>
> This looks very similar to atomic_add_unless(). If we had a
>
> __atomic_add_unless() that returned "old", we could then do:
>
> atomic_add_unless() { return __atomic_add_unless() != u }
> atomic_add_unless_return() { return __atomic_add_unless() + a}
>

Sure !

I preferred to not touch lot of files in kernel (atomic_add_unless() is
defined in several files) because its a stable candidate patch (2.6.36+)

So a cleanup patch for 2.6.40+ is certainly doable, do you want to do
this ?

Thanks


2011-05-27 21:14:27

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On Fri, May 27, 2011 at 09:56:59PM +0200, Eric Dumazet wrote:
> >
> > This looks very similar to atomic_add_unless(). If we had a
> >
> > __atomic_add_unless() that returned "old", we could then do:
> >
> > atomic_add_unless() { return __atomic_add_unless() != u }
> > atomic_add_unless_return() { return __atomic_add_unless() + a}
> >
>
> Sure !
>
> I preferred to not touch lot of files in kernel (atomic_add_unless() is
> defined in several files) because its a stable candidate patch (2.6.36+)
>
> So a cleanup patch for 2.6.40+ is certainly doable, do you want to do
> this ?

The attached works for me for x86_64. Cc'ing Ingo/Thomas for comment.

-Arun

atomic: Refactor atomic_add_unless

Commit 686a7e3 (inetpeer: fix race in unused_list manipulations)
in net-2.6 added a atomic_add_unless_return() variant that tries
to detect 0->1 transitions of an atomic reference count.

This sounds like a generic functionality that could be expressed
in terms of an __atomic_add_unless() that returned the old value
instead of a bool.

Signed-off-by: Arun Sharma <[email protected]>
---
arch/x86/include/asm/atomic.h | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 952a826..bbdbffe 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -221,15 +221,15 @@ static inline int atomic_xchg(atomic_t *v, int new)
}

/**
- * atomic_add_unless - add unless the number is already a given value
+ * __atomic_add_unless - add unless the number is already a given value
* @v: pointer of type atomic_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
*
* Atomically adds @a to @v, so long as @v was not already @u.
- * Returns non-zero if @v was not @u, and zero otherwise.
+ * Returns the old value of v
*/
-static inline int atomic_add_unless(atomic_t *v, int a, int u)
+static inline int __atomic_add_unless(atomic_t *v, int a, int u)
{
int c, old;
c = atomic_read(v);
@@ -241,7 +241,21 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
break;
c = old;
}
- return c != (u);
+ return c;
+}
+
+/**
+ * atomic_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, so long as @v was not already @u.
+ * Returns non-zero if @v was not @u, and zero otherwise.
+ */
+static inline int atomic_add_unless(atomic_t *v, int a, int u)
+{
+ return __atomic_add_unless(v, a, u) != u;
}

#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
--
1.7.4

2011-05-28 05:41:34

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le vendredi 27 mai 2011 à 14:14 -0700, Arun Sharma a écrit :

> The attached works for me for x86_64. Cc'ing Ingo/Thomas for comment.
>
> -Arun
>
> atomic: Refactor atomic_add_unless
>
> Commit 686a7e3 (inetpeer: fix race in unused_list manipulations)
> in net-2.6 added a atomic_add_unless_return() variant that tries
> to detect 0->1 transitions of an atomic reference count.
>
> This sounds like a generic functionality that could be expressed
> in terms of an __atomic_add_unless() that returned the old value
> instead of a bool.
>
> Signed-off-by: Arun Sharma <[email protected]>
> ---
> arch/x86/include/asm/atomic.h | 22 ++++++++++++++++++----
> 1 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index 952a826..bbdbffe 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -221,15 +221,15 @@ static inline int atomic_xchg(atomic_t *v, int new)
> }
>
> /**
> - * atomic_add_unless - add unless the number is already a given value
> + * __atomic_add_unless - add unless the number is already a given value
> * @v: pointer of type atomic_t
> * @a: the amount to add to v...
> * @u: ...unless v is equal to u.
> *
> * Atomically adds @a to @v, so long as @v was not already @u.
> - * Returns non-zero if @v was not @u, and zero otherwise.
> + * Returns the old value of v
> */
> -static inline int atomic_add_unless(atomic_t *v, int a, int u)
> +static inline int __atomic_add_unless(atomic_t *v, int a, int u)
> {
> int c, old;
> c = atomic_read(v);
> @@ -241,7 +241,21 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
> break;
> c = old;
> }
> - return c != (u);
> + return c;
> +}
> +
> +/**
> + * atomic_add_unless - add unless the number is already a given value
> + * @v: pointer of type atomic_t
> + * @a: the amount to add to v...
> + * @u: ...unless v is equal to u.
> + *
> + * Atomically adds @a to @v, so long as @v was not already @u.
> + * Returns non-zero if @v was not @u, and zero otherwise.
> + */
> +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> +{
> + return __atomic_add_unless(v, a, u) != u;
> }
>
> #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)

As I said, atomic_add_unless() has several implementations in various
arches. You must take care of all, not only x86.


2011-05-28 18:04:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)


* Eric Dumazet <[email protected]> wrote:

> > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > +{
> > + return __atomic_add_unless(v, a, u) != u;
> > }
> >
> > #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
>
> As I said, atomic_add_unless() has several implementations in
> various arches. You must take care of all, not only x86.

It's a bit sad to see local hacks to generic facilities committed
upstream like that.

Arun: the x86 bits look good at first sight.

Thanks,

Ingo

2011-05-29 07:33:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le samedi 28 mai 2011 à 20:04 +0200, Ingo Molnar a écrit :
> * Eric Dumazet <[email protected]> wrote:
>
> > > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > > +{
> > > + return __atomic_add_unless(v, a, u) != u;
> > > }
> > >
> > > #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
> >
> > As I said, atomic_add_unless() has several implementations in
> > various arches. You must take care of all, not only x86.
>
> It's a bit sad to see local hacks to generic facilities committed
> upstream like that.
>

Again, its a stable candidate patch, on a file that had many changes in
recent kernels.

I felt this was the right thing to do, to ease stable teams work.

Its not like there is an urgent need for this atomic_add_unless_return()
thing that we have to worry right now.


2011-05-29 07:38:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)


* Eric Dumazet <[email protected]> wrote:

> Le samedi 28 mai 2011 ? 20:04 +0200, Ingo Molnar a ?crit :
> > * Eric Dumazet <[email protected]> wrote:
> >
> > > > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > > > +{
> > > > + return __atomic_add_unless(v, a, u) != u;
> > > > }
> > > >
> > > > #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
> > >
> > > As I said, atomic_add_unless() has several implementations in
> > > various arches. You must take care of all, not only x86.
> >
> > It's a bit sad to see local hacks to generic facilities committed
> > upstream like that.
> >
>
> Again, its a stable candidate patch, on a file that had many changes in
> recent kernels.
>
> I felt this was the right thing to do, to ease stable teams work.

Yes and i do this all the time as well, to make life easier for the
stable team.

What wasnt fine was to not follow up the backportable hack with the
proper patch though. Or did you plan to do that yourself if Arun
fails to complete the generic variant? (in which case my remark is
moot)

Thanks,

Ingo

2011-05-29 07:43:12

by Eric Dumazet

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

Le dimanche 29 mai 2011 à 09:38 +0200, Ingo Molnar a écrit :

> Yes and i do this all the time as well, to make life easier for the
> stable team.
>
> What wasnt fine was to not follow up the backportable hack with the
> proper patch though. Or did you plan to do that yourself if Arun
> fails to complete the generic variant? (in which case my remark is
> moot)
>

I am currently working on seqlock.h split into seqcount.h and seqlock.h,
because I really need this split, but can take care of this work too, no
problem :

I asked Arun if he wanted to make this himself, because initial idea was
coming from him, not because I did not want to make it ;)


2011-05-29 12:33:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)


* Eric Dumazet <[email protected]> wrote:

> I asked Arun if he wanted to make this himself, because initial
> idea was coming from him, not because I did not want to make it ;)

Hey, fair enough and sorry about the fuss! :-)

Thanks,

Ingo

2011-05-30 18:33:59

by Arun Sharma

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)

On 5/29/11 5:33 AM, Ingo Molnar wrote:
>
> * Eric Dumazet<[email protected]> wrote:
>
>> I asked Arun if he wanted to make this himself, because initial
>> idea was coming from him, not because I did not want to make it ;)
>
> Hey, fair enough and sorry about the fuss! :-)

Sounds like there is general consensus that such a cleanup would be
good. I'll try to post a patch that does this cleanup for all archs in
the next couple of days - but I won't have a way of testing it on
anything other than x86_64.

-Arun

2011-05-31 10:50:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: Kernel crash after using new Intel NIC (igb)


* Arun Sharma <[email protected]> wrote:

> On 5/29/11 5:33 AM, Ingo Molnar wrote:
> >
> >* Eric Dumazet<[email protected]> wrote:
> >
> >>I asked Arun if he wanted to make this himself, because initial
> >>idea was coming from him, not because I did not want to make it ;)
> >
> >Hey, fair enough and sorry about the fuss! :-)
>
> Sounds like there is general consensus that such a cleanup would be
> good. I'll try to post a patch that does this cleanup for all archs
> in the next couple of days - but I won't have a way of testing it
> on anything other than x86_64.

-mm is a good place to do such many-arch kind of facility
cleanups/extensions so please Cc: Andrew if you think the patch is
ready to be tested on that level.

Thanks,

Ingo