2003-07-30 14:08:17

by Willy Tarreau

[permalink] [raw]
Subject: [PATCH] 2.4.22-pre9-bk : bonding bug fixes


Hi Marcelo, David, Jeff...

there are still a few bugs in the current bonding driver. I've reported them
several times now, but perhaps not at the right places...

So :
- the first patch fixes a typo in the MODULE_PARM_DESC

- the second one adds a comment and a warning around some code I don't
understand, but which cannot be executed. It's within function
bond_xmit_activebackup, and only executes if bond->mode != ACTIVEBACKUP....

- now the last one fixes a kernel panic due to a cheap hack which was introduced
to determine the source IP address to use with ARP checks. It takes the first
address of the first slave, and puts a lock on it. If there's no address, its
ip_ptr is NULL, and the kernel panics while trying to get the lock. You can
reproduce it easily this way :

# modprobe eth0
# modprobe bonding mode=active-backup miimon=1000
# ip link set bond0 up
# ifenslave bond0 eth0
=> kernel panic !

Now here are the patches. I really hope to get them into 2.4.22, since I'm a
bit fed up with my server panicing each time I try a vanilla new kernel which
I forget to patch...

Cheers,
Willy


======== first one ==========

--- linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c Wed Jul 30 09:49:48 2003
+++ linux-2.4.22-pre9-bk-bond/drivers/net/bonding/bond_main.c Wed Jul 30 15:09:15 2003
@@ -524,7 +524,7 @@
MODULE_PARM(miimon, "i");
MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
MODULE_PARM(use_carrier, "i");
-MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 09 for off, 1 for on (default)");
+MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default)");
MODULE_PARM(mode, "s");
MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor");
MODULE_PARM(arp_interval, "i");



======== second one ==========

--- linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c Wed Jul 30 15:12:11 2003
+++ linux-2.4.22-pre9-bk-bond/drivers/net/bonding/bond_main.c Wed Jul 30 15:31:01 2003
@@ -3281,6 +3281,19 @@
memcpy(&my_ip, the_ip, 4);
}

+ /* w.tarreau - 2003/07/30
+ * I don't understand the logic here :
+ * - this code should be run only if we're NOT in active-backup mode, which
+ * is the only mode for which this function will be called.
+ * - the comment says the code tries to avoid sending broadcasts for ARP
+ * requests when the destination is known. This is obviously wrong since
+ * it will prevent you from changing the dead equipment you were checking
+ * without reloading the bonding driver ! High availability and low
+ * network usage never mix well ...
+ */
+#warning "This code may need a fix !"
+#ifdef HOW_CAN_THIS_BE_CALLED
+
/* if we are sending arp packets and don't know
* the target hw address, save it so we don't need
* to use a broadcast address.
@@ -3302,6 +3315,7 @@
memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
}
}
+#endif

read_lock(&bond->lock);


========= third one ==========


--- linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c Wed Jul 30 15:09:15 2003
+++ linux-2.4.22-pre9-bk-bond/drivers/net/bonding/bond_main.c Wed Jul 30 15:12:11 2003
@@ -1594,11 +1594,14 @@
#endif
bond_set_slave_inactive_flags(new_slave);
}
- read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
- ifap= &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
- ifa = *ifap;
- my_ip = ifa->ifa_address;
- read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
+ if (((struct in_device *)slave_dev->ip_ptr) != NULL) {
+ read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
+ ifap= &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
+ ifa = *ifap;
+ if (ifa != NULL)
+ my_ip = ifa->ifa_address;
+ read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
+ }

/* if there is a primary slave, remember it */
if (primary != NULL) {



2003-07-30 23:55:38

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] 2.4.22-pre9-bk : bonding bug fixes

On Wed, 30 Jul 2003 16:06:58 +0200
Willy Tarreau <[email protected]> wrote:

> there are still a few bugs in the current bonding driver. I've reported them
> several times now, but perhaps not at the right places...

So now we have these few bug fixes, and the backport of the
2.6.x version of the bonding code, both submitted on the same
day in fact :-)

Jeff I'd recommend we put Willy's fixes in if you think they're
OK, then we can think about the 2.6.x backport work for 2.4.23-preX

2003-07-31 00:24:54

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [PATCH] 2.4.22-pre9-bk : bonding bug fixes

>On Wed, 30 Jul 2003 16:06:58 +0200
>Willy Tarreau <[email protected]> wrote:
>
>> there are still a few bugs in the current bonding driver. I've reported them
>> several times now, but perhaps not at the right places...
>
>So now we have these few bug fixes, and the backport of the
>2.6.x version of the bonding code, both submitted on the same
>day in fact :-)
>
>Jeff I'd recommend we put Willy's fixes in if you think they're
>OK, then we can think about the 2.6.x backport work for 2.4.23-preX

I've been looking at Willy's fixes, and the typo (first patch)
and locking fix (third patch) both look good to me. The second patch
(the dead code warning) points out a real problem, in that the code in
question really has no function, but the patch probably doesn't go far
enough for a final solution (the variable that code would set,
arp_target_hw_addr, is referenced in other places, but ends up always
being NULL because the dead code is the only place it was ever set).

A more proper solution would be to simply delete the dead code
and the arp_target_hw_addr variable, and replace the variable
references with NULL. This means that all of the ARP probes sent will
be sent out as broadcasts, which is what's already happening, this
just makes the code clearer. Patch follows (which replaces Willy's
second patch).

Does this sound reasonable to everybody?

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]


--- linux-2.4.22-pre9-bk-wt/drivers/net/bonding/bond_main.c 2003-07-30 17:06:50.000000000 -0700
+++ linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c 2003-07-30 17:08:53.000000000 -0700
@@ -463,7 +463,6 @@
static unsigned long arp_target[MAX_ARP_IP_TARGETS] = { 0, } ;
static int arp_ip_count = 0;
static u32 my_ip = 0;
-char *arp_target_hw_addr = NULL;

static char *primary= NULL;

@@ -596,8 +595,7 @@

for (i = 0; (i<MAX_ARP_IP_TARGETS) && arp_target[i]; i++) {
arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target[i], slave->dev,
- my_ip, arp_target_hw_addr, slave->dev->dev_addr,
- arp_target_hw_addr);
+ my_ip, NULL, slave->dev->dev_addr, NULL);
}
}

@@ -1031,10 +1029,6 @@
}
if (arp_interval> 0) { /* arp interval, in milliseconds. */
del_timer(&bond->arp_timer);
- if (arp_target_hw_addr != NULL) {
- kfree(arp_target_hw_addr);
- arp_target_hw_addr = NULL;
- }
}

if (bond_mode == BOND_MODE_8023AD) {
@@ -3281,28 +3275,6 @@
memcpy(&my_ip, the_ip, 4);
}

- /* if we are sending arp packets and don't know
- * the target hw address, save it so we don't need
- * to use a broadcast address.
- * don't do this if in active backup mode because the slaves must
- * receive packets to stay up, and the only ones they receive are
- * broadcasts.
- */
- if ( (bond_mode != BOND_MODE_ACTIVEBACKUP) &&
- (arp_ip_count == 1) &&
- (arp_interval > 0) && (arp_target_hw_addr == NULL) &&
- (skb->protocol == __constant_htons(ETH_P_IP) ) ) {
- struct ethhdr *eth_hdr =
- (struct ethhdr *) (((char *)skb->data));
- struct iphdr *ip_hdr = (struct iphdr *)(eth_hdr + 1);
-
- if (arp_target[0] == ip_hdr->daddr) {
- arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL);
- if (arp_target_hw_addr != NULL)
- memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
- }
- }
-
read_lock(&bond->lock);

read_lock(&bond->ptrlock);

2003-07-31 18:50:54

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] 2.4.22-pre9-bk : bonding bug fixes

Applied patches 1 and 3, and will forward to Marcelo today.