2005-11-04 23:30:49

by Willy Tarreau

Subject: Linux-2.4.31-hf8

Hi,

This is the eighth hotfix for 2.4.31. OK, I know there was one not long ago,
but a recent fix in IPVS which got merged into -hf7 left a refcnt problem in
ip_vs_conn_expire_now, which can cause mid-term/long-term stability problems.
I took this opportunity to merge a backport from 2.6 of another fix from Yan
Zheng affecting multicast source filters.

There is no other fix; people using neither IPVS nor IPv6 have no reason to upgrade.

I'd like to specially thank Roberto Nibali, Yan Zheng and David Stevens for
their determination in tracking those bugs, and getting them fixed early
(and most of all, taking some time to explain to me what the fixes do!).

Changelog and incremental patch appended. Kernel has been rebuilt on x86.
As usual, 2.4.29-hf18, 2.4.30-hf11 and 2.4.31-hf8 have been released.
You can get them from the usual places:

hotfixes home : http://linux.exosec.net/kernel/2.4-hf/
last version : http://linux.exosec.net/kernel/2.4-hf/LATEST/LATEST/
RSS feed : http://linux.exosec.net/kernel/hf.xml
build results : http://bugsplatter.mine.nu/test/linux-2.4/ (Grant's site)

I hope I did not forget anything, otherwise please bug me.

Regards,
Willy

Changelog from 2.4.31-hf7 to 2.4.31-hf8
---------------------------------------
'+' = added ; '-' = removed

+ 2.4.32-rc2-ip_vs_conn_expire_now-fix_refcnt-dec-1 (Julian Anastasov)

Quoting Roberto Nibali: It is absolutely needed. Without it, people will
really experience a long term problem with hanging templates in IPVS,
manifesting itself depending on time and hardware configuration.
It seems we forgot to fix one place where ip_vs_conn_expire_now is used.
Callers should hold write lock or cp->refcnt (and not forget it). This
results in hanging template entries when expire_nodest_conn is kicking
in and trying to remove all connection entries for a specific
destination. Julian Anastasov created a patch to fix this and asked me
to forward it for inclusion after testing and verification, which took
place over the last 24 hours.
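
The reference-count rule quoted above can be illustrated with a small
userspace sketch in C. It is only a model: it assumes, as the incremental
diff suggests, that ip_vs_conn_expire_now() leaves the caller's reference
untouched. struct conn and all helper names here are hypothetical and are
not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ip_vs_conn: only the refcount is modeled. */
struct conn {
	int refcnt;
};

/* Models a successful lookup: the caller now owns one reference. */
void conn_get(struct conn *cp) { cp->refcnt++; }

/* Models __ip_vs_conn_put(): the caller gives its reference back. */
void conn_put(struct conn *cp) { cp->refcnt--; }

/* Models ip_vs_conn_expire_now() under the assumption stated above:
 * it only schedules expiry and does not drop the caller's reference. */
void conn_expire_now(struct conn *cp) { (void)cp; }

/* -hf7 control flow: the put sits in the else branch only. */
int refs_left_hf7(int expire_nodest_conn)
{
	struct conn c = { 0 };

	conn_get(&c);
	if (expire_nodest_conn)
		conn_expire_now(&c);
	else
		conn_put(&c);
	return c.refcnt;	/* references still held after the drop path */
}

/* -hf8 control flow: the put happens on both paths. */
int refs_left_hf8(int expire_nodest_conn)
{
	struct conn c = { 0 };

	conn_get(&c);
	if (expire_nodest_conn)
		conn_expire_now(&c);
	conn_put(&c);
	return c.refcnt;
}

/* Self-check of the model: only the -hf7 flow with the sysctl set
 * leaves a reference behind. */
void check_refcnt_model(void)
{
	assert(refs_left_hf7(0) == 0);
	assert(refs_left_hf7(1) == 1);
	assert(refs_left_hf8(0) == 0);
	assert(refs_left_hf8(1) == 0);
}
```

In this model, an entry whose count never returns to zero can never be
freed, which is one way to picture the hanging template entries.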

+ 2.4.32-rc2-mcast-filter-1 (Willy Tarreau)

[PATCH-2.4][MCAST]IPv6: small fix for ip6_mc_msfilter(...)
Multicast source filters aren't widely used yet, and that's really
the only feature that's affected if an application actually exercises
this bug, as far as I can tell. An ordinary filter-less multicast join
should still work, and only forwarded multicast traffic making use of
filters and doing empty-source filters with the MSFILTER ioctl would
be at risk of not getting multicast traffic forwarded to them because
the reports generated would not be based on the correct counts.
Initial 2.6 patch by Yan Zheng, bug explanation by David Stevens,
patch ACKed by David.
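
The "correct counts" mentioned above can be pictured with a simplified
userspace model in C. It assumes that the add and del helpers each adjust
one per-filter-mode counter on the group, and that a plain join counts as
an exclude-mode member with no sources; both are assumptions of this
sketch. struct group, the mode constants and set_filter() are hypothetical
names, not the kernel's.

```c
#include <assert.h>

/* Hypothetical filter modes; the values are chosen for this sketch only. */
enum { MODE_INCLUDE = 0, MODE_EXCLUDE = 1 };

/* Hypothetical stand-in for a multicast group entry: one counter per mode. */
struct group {
	int sfcount[2];
};

/* Models the add helper: one more socket uses this filter mode. */
void add_src(struct group *g, int mode) { g->sfcount[mode]++; }

/* Models the del helper: one socket fewer uses this filter mode. */
int del_src(struct group *g, int mode)
{
	if (g->sfcount[mode] == 0)
		return -1;	/* nothing to remove */
	g->sfcount[mode]--;
	return 0;
}

/* Replace a socket's filter. Without the fix (fixed == 0), an empty new
 * source list skips the add but the old filter is still removed. */
void set_filter(struct group *g, int old_mode, int new_mode,
		int numsrc, int fixed)
{
	if (numsrc > 0 || fixed)
		add_src(g, new_mode);
	(void) del_src(g, old_mode);
}

/* One socket joins, then installs an empty-source exclude filter.
 * Returns how many sockets the group still accounts for. */
int members_after_empty_filter(int fixed)
{
	struct group g = { { 0, 0 } };

	add_src(&g, MODE_EXCLUDE);	/* plain, filter-less join */
	set_filter(&g, MODE_EXCLUDE, MODE_EXCLUDE, 0, fixed);
	return g.sfcount[MODE_INCLUDE] + g.sfcount[MODE_EXCLUDE];
}
```

Without the extra add, the model ends up accounting for zero sockets even
though one is still a member; with it, the count stays at one.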

--

Incremental diff from 2.4.31-hf7
--- linux-2.4.31-hf7/Makefile Tue Nov 1 11:08:11 2005
+++ linux-2.4.31-hf8/Makefile Fri Nov 4 23:50:30 2005
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 4
 SUBLEVEL = 31
-EXTRAVERSION = -hf7
+EXTRAVERSION = -hf8
 
 KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)

--- linux-2.4.31-hf7/net/ipv4/igmp.c Tue Nov 1 11:08:11 2005
+++ linux-2.4.31-hf8/net/ipv4/igmp.c Fri Nov 4 23:50:30 2005
@@ -1876,8 +1876,11 @@
 			sock_kfree_s(sk, newpsl, IP_SFLSIZE(newpsl->sl_max));
 			goto done;
 		}
-	} else
-		newpsl = 0;
+	} else {
+		newpsl = NULL;
+		(void) ip_mc_add_src(in_dev, &msf->imsf_multiaddr,
+				     msf->imsf_fmode, 0, NULL, 0);
+	}
 	psl = pmc->sflist;
 	if (psl) {
 		(void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,
--- linux-2.4.31-hf7/net/ipv4/ipvs/ip_vs_core.c Tue Nov 1 11:08:11 2005
+++ linux-2.4.31-hf8/net/ipv4/ipvs/ip_vs_core.c Fri Nov 4 23:50:29 2005
@@ -1111,11 +1111,10 @@
 		if (sysctl_ip_vs_expire_nodest_conn) {
 			/* try to expire the connection immediately */
 			ip_vs_conn_expire_now(cp);
-		} else {
-			/* don't restart its timer, and silently
-			   drop the packet. */
-			__ip_vs_conn_put(cp);
 		}
+		/* don't restart its timer, and silently
+		   drop the packet. */
+		__ip_vs_conn_put(cp);
 		return NF_DROP;
 	}

--- linux-2.4.31-hf7/net/ipv6/mcast.c Tue Nov 1 11:08:11 2005
+++ linux-2.4.31-hf8/net/ipv6/mcast.c Fri Nov 4 23:50:30 2005
@@ -505,8 +505,11 @@
 			sock_kfree_s(sk, newpsl, IP6_SFLSIZE(newpsl->sl_max));
 			goto done;
 		}
-	} else
-		newpsl = 0;
+	} else {
+		newpsl = NULL;
+		(void) ip6_mc_add_src(idev, group, gsf->gf_fmode, 0, NULL, 0);
+	}
+
 	psl = pmc->sflist;
 	if (psl) {
 		(void) ip6_mc_del_src(idev, group, pmc->sfmode,



2005-11-05 07:00:50

by Roberto Nibali

Subject: Re: Linux-2.4.31-hf8

Hello Willy,

> This is the eighth hotfix for 2.4.31. OK, I know there was one not long ago,
> but a recent fix in IPVS which got merged into -hf7 left a refcnt problem in
> ip_vs_conn_expire_now, which can cause mid-term/long-term stability problems.
> I took this opportunity to merge a backport from 2.6 of another fix from Yan
> Zheng affecting multicast source filters.

Well, to be honest, Horms just found another IPVS "issue" :). It seems
we are getting into reviewing 2.4.x IPVS a bit more closely. The problem
is that if you have setups where the persistency timeout is below the
IPVS state machine related FIN_WAIT (not TCP state) timeout (currently
2*60*HZ) persistent templates will not be invalidated and the timer gets
re-set if we still have a valid connection entry hashed. I first
noticed this somewhat aberrant behaviour in 2.2.x kernels but never got
around to looking at it closely because in 2.2.x we had a timer mess.

This issue, however, is absolutely minor since this buglet has been there
for ages already and we never received such a bug report. In fact, it
would be quite unusual to set a persistency timeout below fin_wait in an
LVS_DR setup for production environments. And I didn't see it because I
set the FIN_WAIT to 10*HZ to relax sockets lingering. We can/will queue
it up, together with a small refcnt change for -hf9 and post 2.4.32.
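
As a rough illustration of the condition described above, a setup can be
checked with a comparison like the following C sketch. The tick rate is
assumed to be 100 per second, and the macro and function names are
hypothetical; only the 2*60*HZ default and the 10*HZ setting come from the
text.

```c
#include <assert.h>

/* Assumed tick rate for this sketch; the real HZ depends on the kernel build. */
#define SKETCH_HZ 100UL

/* Default IPVS FIN_WAIT state timeout as quoted in the text: 2*60*HZ. */
#define SKETCH_FIN_WAIT_DEFAULT (2 * 60 * SKETCH_HZ)

/* Returns 1 when the persistency timeout is below the FIN_WAIT timeout,
 * which is the configuration described as affected; 0 otherwise.
 * Both arguments are in ticks. */
int persistence_below_fin_wait(unsigned long persist_timeout,
			       unsigned long fin_wait_timeout)
{
	return persist_timeout < fin_wait_timeout;
}

/* Self-check: a 60 s persistency timeout is below the 120 s default,
 * while a 300 s one is not below a FIN_WAIT lowered to 10*HZ. */
void check_timeout_sketch(void)
{
	assert(persistence_below_fin_wait(60 * SKETCH_HZ,
					  SKETCH_FIN_WAIT_DEFAULT) == 1);
	assert(persistence_below_fin_wait(300 * SKETCH_HZ,
					  10 * SKETCH_HZ) == 0);
}
```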

I take it you read netdev as well, since we will post those patches
there. I'm delighted to see your -hf kernels since lately I have been
told off by a couple of kernel maintainers regarding 2.4.x, which we use
on about 100 of our boxes all over the world. About 300 still run 2.2.x
and are slowly being migrated to the now stable 2.4.x series. Doing business
in the finance sector really calls for stability, which 2.4.x provides.

Have a nice weekend,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

2005-11-05 08:12:03

by Willy Tarreau

Subject: Re: Linux-2.4.31-hf8

Hi Roberto, Hi Marcelo,

On Sat, Nov 05, 2005 at 08:00:37AM +0100, Roberto Nibali wrote:
> Well, to be honest, Horms just found another IPVS "issue" :). It seems
> we are getting into reviewing 2.4.x IPVS a bit more closely. The problem
> is that if you have setups where the persistency timeout is below the
> IPVS state machine related FIN_WAIT (not TCP state) timeout (currently
> 2*60*HZ) persistent templates will not be invalidated and the timer gets
> re-set if we still have a valid connection entry hashed. I first
> noticed this somewhat aberrant behaviour in 2.2.x kernels but never got
> around to looking at it closely because in 2.2.x we had a timer mess.
>
> This issue, however, is absolutely minor since this buglet has been there
> for ages already and we never received such a bug report. In fact, it
> would be quite unusual to set a persistency timeout below fin_wait in an
> LVS_DR setup for production environments. And I didn't see it because I
> set the FIN_WAIT to 10*HZ to relax sockets lingering. We can/will queue
> it up, together with a small refcnt change for -hf9 and post 2.4.32.

I have a feeling that we will have a lot of network related fixes post
2.4.32 (IPVS, IPv6, mcast...). Marcelo, perhaps it would be a good idea
to merge them in early 2.4.33-pre1 so that competent users have enough
time to test them? As Roberto explained, some of the fixes need
hours or days of testing, and some of them are used by only a handful of
people around the world.

> I take it you read netdev as well, since we will post those patches
> there.

OK, I will put my nose there.

> I'm delighted to see your -hf kernels since lately I have been
> told off by a couple of kernel maintainers regarding 2.4.x, which we use
> on about 100 of our boxes all over the world. About 300 still run 2.2.x
> and are slowly being migrated to the now stable 2.4.x series. Doing business
> in the finance sector really calls for stability, which 2.4.x provides.

Working half of my time in the same area, I have considered since 2.4.31
that 2.4 is becoming very stable and ready for production use in those
sensitive environments. Having small kernel updates which don't break
PaX compatibility every two weeks is also a very good thing when
seeking enhanced security on servers ;-)

Regards,
Willy

2005-11-07 14:33:00

by Marcelo Tosatti

Subject: Re: Linux-2.4.31-hf8

On Sat, Nov 05, 2005 at 08:59:15AM +0100, Willy Tarreau wrote:
> Hi Roberto, Hi Marcelo,
>
> On Sat, Nov 05, 2005 at 08:00:37AM +0100, Roberto Nibali wrote:
> > Well, to be honest, Horms just found another IPVS "issue" :). It seems
> > we are getting into reviewing 2.4.x IPVS a bit more closely. The problem
> > is that if you have setups where the persistency timeout is below the
> > IPVS state machine related FIN_WAIT (not TCP state) timeout (currently
> > 2*60*HZ) persistent templates will not be invalidated and the timer gets
> > re-set if we still have a valid connection entry hashed. I first
> > noticed this somewhat aberrant behaviour in 2.2.x kernels but never got
> > around to looking at it closely because in 2.2.x we had a timer mess.
> >
> > This issue, however, is absolutely minor since this buglet has been there
> > for ages already and we never received such a bug report. In fact, it
> > would be quite unusual to set a persistency timeout below fin_wait in an
> > LVS_DR setup for production environments. And I didn't see it because I
> > set the FIN_WAIT to 10*HZ to relax sockets lingering. We can/will queue
> > it up, together with a small refcnt change for -hf9 and post 2.4.32.
>
> I have a feeling that we will have a lot of network related fixes post
> 2.4.32 (IPVS, IPv6, mcast...). Marcelo, perhaps it would be a good idea
> to merge them in early 2.4.33-pre1 so that competent users have enough
> time to test them?

Definitely. Please queue them up, Willy. I will apply Roberto's fix and
release another -rc.

Thanks guys