2009-12-12 05:25:47

by Yinghai Lu

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

On Wed, Dec 9, 2009 at 11:33 PM, Ingo Molnar <[email protected]> wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
>> On Thu, Oct 29, 2009 at 2:03 AM, Yinghai Lu <[email protected]> wrote:
>> > On Thu, Oct 29, 2009 at 12:23 AM, Suresh Jayaraman <[email protected]> wrote:
>> >> On 10/29/2009 01:43 AM, Yinghai Lu wrote:
>> >>> pk12-3214-189-102:~ # mount -t nfs 10.6.75.100:/data/shared/pxeboot /x
>> >>> mount.nfs: rpc.statd is not running but is required for remote locking.
>> >>> mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
>> >>
>> >> rpc.statd on client should have been started by mount.nfs when an nfs
>> >> filesystem is mounted. Is this not happening for some reason or do you
>> >> see any errors in syslog?
>> >>
>> >>>
>> >>> using opensuse11.1
>> >>>
>> >>
>> >> Are you using 11.1 betas? I know of a problem where non-root user mounts
>> >> fail to start rpc.statd in betas that got fixed later:
>> >>
>> >> http://marc.info/?l=linux-nfs&m=122748525624094&w=2
>> >>
>> >> Is the problem seen only recently (after updating to net-next)?
>> >>
>> >
>> > only happens with net-next.
>> >
>> > linus tree and tip tree are ok.
>> >
>>
>> Finally it reached linus tree and tip.
>
> In the quoted text above it's being disputed that it's a kernel
> regression so i guess your best option is to bisect it (if you can).

[linux-2.6]# git bisect good
d9f5950f90292f7cc42834338dfd5f44dc4cc4ca is first bad commit
commit d9f5950f90292f7cc42834338dfd5f44dc4cc4ca
Author: Sridhar Samudrala <[email protected]>
Date: Wed Oct 7 12:24:25 2009 +0000

net: Make UFO on master device independent of attached devices

Now that software UFO is supported, UFO can be enabled on master
devices like bridge, bond even though the attached device doesn't
support this feature in hardware.

This allows UFO to be used between KVM host and guest even when a
physical interface attached to the bridge doesn't support UFO.

Signed-off-by: Sridhar Samudrala <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

:040000 040000 0b2424e2c2f3e6f69701e4da909e9e4c83ce2170 cc819b2752e9475ab1cac137ac99b4399328acd7 M net
[linux-2.6]# git bisect log
git bisect start
# bad: [11bd04f6f35621193311c32e0721142b073a7794] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
git bisect bad 11bd04f6f35621193311c32e0721142b073a7794
# good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect good 22763c5cf3690a681551162c15d34d935308c8d7
# bad: [d7fc02c7bae7b1cf69269992cf880a43a350cdaa] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect bad d7fc02c7bae7b1cf69269992cf880a43a350cdaa
# bad: [cfb3f91af49dff9b50de6929dc4de06100c4cfa8] ixgbe: handle parameters for tx and rx EITR, no div0
git bisect bad cfb3f91af49dff9b50de6929dc4de06100c4cfa8
# bad: [b4a77d0dee11db834bebe0cc78c211cfebf0d924] rt2800pci: add rt2800_regbusy_read() wrapper
git bisect bad b4a77d0dee11db834bebe0cc78c211cfebf0d924
# bad: [b771eee583343782c8b44d2b78cf53c29d0f3303] wl1271: Enable beacon filtering with the stack
git bisect bad b771eee583343782c8b44d2b78cf53c29d0f3303
# bad: [8aa0f64ac3835a6daf84d0b0e07c4c01d7d8eddc] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
git bisect bad 8aa0f64ac3835a6daf84d0b0e07c4c01d7d8eddc
# good: [125b181aec7a67c71234284ecf6d9c729d05deda] staging: Add proper selection of WIRELESS_EXT and WEXT_PRIV
git bisect good 125b181aec7a67c71234284ecf6d9c729d05deda
# bad: [d9f5950f90292f7cc42834338dfd5f44dc4cc4ca] net: Make UFO on master device independent of attached devices
git bisect bad d9f5950f90292f7cc42834338dfd5f44dc4cc4ca
# good: [aaba2b3f8213e1d66e71c351fa7a2b1cbd974d3c] gigaset: allow building without I4L
git bisect good aaba2b3f8213e1d66e71c351fa7a2b1cbd974d3c
# good: [b1c00fe3cf8f54d97d20cdf196145a106f04bd63] dccp ccid-2: Overhaul CCID naming convention 1/2
git bisect good b1c00fe3cf8f54d97d20cdf196145a106f04bd63
# good: [b301e82cf8104cfddbe5452ebe625bab49597c64] IPv6: use ipv6_addr_set_v4mapped()
git bisect good b301e82cf8104cfddbe5452ebe625bab49597c64
# good: [9e8342971d44ce86d8567047f5366fc1c06a75ed] econet: Fix redeclaration of symbol len
git bisect good 9e8342971d44ce86d8567047f5366fc1c06a75ed
# good: [8a6dfd43d1891882f8ca05d73aa7735fb0edae3b] IPv6: Fix 6RD typo
git bisect good 8a6dfd43d1891882f8ca05d73aa7735fb0edae3b
# good: [f86dcc5aa8c7908f2c287e7a211228df599e3e71] udp: dynamically size hash tables at boot time
git bisect good f86dcc5aa8c7908f2c287e7a211228df599e3e71

after reverting that commit d9f5950f90292f7cc42834338dfd5f44dc4cc4ca,
the nfs client works again.

YH


2009-12-12 05:41:13

by David Miller

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

From: Yinghai Lu <[email protected]>
Date: Fri, 11 Dec 2009 21:25:49 -0800

> [linux-2.6]# git bisect good
> d9f5950f90292f7cc42834338dfd5f44dc4cc4ca is first bad commit
> commit d9f5950f90292f7cc42834338dfd5f44dc4cc4ca
> Author: Sridhar Samudrala <[email protected]>
> Date: Wed Oct 7 12:24:25 2009 +0000
>
> net: Make UFO on master device independent of attached devices

Thanks.

If I don't hear anything from Sridhar in a day or two I'll simply
revert that change.

2009-12-13 01:12:07

by Yinghai Lu

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

On Fri, Dec 11, 2009 at 9:41 PM, David Miller <[email protected]> wrote:
> From: Yinghai Lu <[email protected]>
> Date: Fri, 11 Dec 2009 21:25:49 -0800
>
>> [linux-2.6]# git bisect good
>> d9f5950f90292f7cc42834338dfd5f44dc4cc4ca is first bad commit
>> commit d9f5950f90292f7cc42834338dfd5f44dc4cc4ca
>> Author: Sridhar Samudrala <[email protected]>
>> Date: Wed Oct 7 12:24:25 2009 +0000
>>
>> net: Make UFO on master device independent of attached devices
>
> Thanks.
>
> If I don't hear anything from Sridhar in a day or two I'll simply
> revert that change.

It seems reverting that commit doesn't fix the problem.

For the last several bisect steps I was using kexec instead of
booting from BIOS, so the bisect result may be off.

It could be another commit, like

[f86dcc5aa8c7908f2c287e7a211228df599e3e71] udp: dynamically
size hash tables at boot time

YH

2009-12-13 01:57:55

by Yinghai Lu

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

On Sat, Dec 12, 2009 at 5:05 PM, Yinghai Lu <[email protected]> wrote:
>
> [f86dcc5aa8c7908f2c287e7a211228df599e3e71] udp: dynamically
> size hash tables at boot time

commit f86dcc5aa8c7908f2c287e7a211228df599e3e71
Author: Eric Dumazet <[email protected]>
Date: Wed Oct 7 00:37:59 2009 +0000

udp: dynamically size hash tables at boot time

UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain in average. An
incoming frame has to lookup all sockets to find best match, so long
chains hurt latency.

Instead of a fixed size hash table that cant be perfect for every
needs, let UDP stack choose its table size at boot time like tcp/ip
route, using alloc_large_system_hash() helper

Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines :
[ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

causes the problem: the nfs mount fails.

the setup is:
64bit kernel, with all needed drivers built in, booted with
ip=dhcp; the root disk is a 256M ramdisk.
then try to mount nfs....

YH

2009-12-13 02:25:24

by Yinghai Lu

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

On Sat, Dec 12, 2009 at 5:57 PM, Yinghai Lu <[email protected]> wrote:
> On Sat, Dec 12, 2009 at 5:05 PM, Yinghai Lu <[email protected]> wrote:
>>
>> [f86dcc5aa8c7908f2c287e7a211228df599e3e71] udp: dynamically
>> size hash tables at boot time
>
> commit f86dcc5aa8c7908f2c287e7a211228df599e3e71
> Author: Eric Dumazet <[email protected]>
> Date: Wed Oct 7 00:37:59 2009 +0000
>
> udp: dynamically size hash tables at boot time
>
> UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
> several setups.
>
> 4000 active UDP sockets -> 32 sockets per chain in average. An
> incoming frame has to lookup all sockets to find best match, so long
> chains hurt latency.
>
> Instead of a fixed size hash table that cant be perfect for every
> needs, let UDP stack choose its table size at boot time like tcp/ip
> route, using alloc_large_system_hash() helper
>
> Add an optional boot parameter, uhash_entries=x so that an admin can
> force a size between 256 and 65536 if needed, like thash_entries and
> rhash_entries.
>
> dmesg logs two new lines :
> [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
> [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)
>
> Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
> debugging spinlocks.
>
> Signed-off-by: Eric Dumazet <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
>
> causes the problem: the nfs mount fails.
>
> the setup is:
> 64bit kernel, with all needed drivers built in, booted with
> ip=dhcp; the root disk is a 256M ramdisk.
> then try to mount nfs....
>

changing the default entries value from 65536 to 256 makes the
nfs mount work.

YH

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1f95348..57c13c3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2054,7 +2054,7 @@ void udp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */
 
-static __initdata unsigned long uhash_entries;
+static __initdata unsigned long uhash_entries = UDP_HTABLE_SIZE_MIN;
 static int __init set_uhash_entries(char *str)
 {
 	if (!str)

2009-12-13 02:41:55

by Yinghai Lu

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

On Sat, Dec 12, 2009 at 6:25 PM, Yinghai Lu <[email protected]> wrote:
> On Sat, Dec 12, 2009 at 5:57 PM, Yinghai Lu <[email protected]> wrote:
>> On Sat, Dec 12, 2009 at 5:05 PM, Yinghai Lu <[email protected]> wrote:
>>>
>>> [f86dcc5aa8c7908f2c287e7a211228df599e3e71] udp: dynamically
>>> size hash tables at boot time
>>
>> commit f86dcc5aa8c7908f2c287e7a211228df599e3e71
>> Author: Eric Dumazet <[email protected]>
>> Date: Wed Oct 7 00:37:59 2009 +0000
>>
>> udp: dynamically size hash tables at boot time
>>
>> UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
>> several setups.
>>
>> 4000 active UDP sockets -> 32 sockets per chain in average. An
>> incoming frame has to lookup all sockets to find best match, so long
>> chains hurt latency.
>>
>> Instead of a fixed size hash table that cant be perfect for every
>> needs, let UDP stack choose its table size at boot time like tcp/ip
>> route, using alloc_large_system_hash() helper
>>
>> Add an optional boot parameter, uhash_entries=x so that an admin can
>> force a size between 256 and 65536 if needed, like thash_entries and
>> rhash_entries.
>>
>> dmesg logs two new lines :
>> [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
>> [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)
>>
>> Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
>> debugging spinlocks.
>>
>> Signed-off-by: Eric Dumazet <[email protected]>
>> Signed-off-by: David S. Miller <[email protected]>
>>
>> causes the problem: the nfs mount fails.
>>
>> the setup is:
>> 64bit kernel, with all needed drivers built in, booted with
>> ip=dhcp; the root disk is a 256M ramdisk.
>> then try to mount nfs....
>>
>
> changing the default entries value from 65536 to 256 makes the
> nfs mount work.
>
> YH
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1f95348..57c13c3 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2054,7 +2054,7 @@ void udp4_proc_exit(void)
>  }
>  #endif /* CONFIG_PROC_FS */
>
> -static __initdata unsigned long uhash_entries;
> +static __initdata unsigned long uhash_entries = UDP_HTABLE_SIZE_MIN;
>  static int __init set_uhash_entries(char *str)
>  {
>  	if (!str)
>

interesting:

m:~/dump # grep UDP dmesg.txt
[ 28.996034] UDP hash table entries: 65536 (order: 11, 10485760 bytes)
[ 29.032364] UDP-Lite hash table entries: 65536 (order: 11, 10485760 bytes)

will not work

but 32768 will work.

2009-12-13 16:58:38

by Eric Dumazet

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

On 13/12/2009 03:41, Yinghai Lu wrote:
>
>>> causes the problem: the nfs mount fails.
>>>
>>> the setup is:
>>> 64bit kernel, with all needed drivers built in, booted with
>>> ip=dhcp; the root disk is a 256M ramdisk.
>>> then try to mount nfs....
>>>
>>
>> changing the default entries value from 65536 to 256 makes the
>> nfs mount work.
>>
> interesting:
>
> m:~/dump # grep UDP dmesg.txt
> [ 28.996034] UDP hash table entries: 65536 (order: 11, 10485760 bytes)
> [ 29.032364] UDP-Lite hash table entries: 65536 (order: 11, 10485760 bytes)
>
> will not work
>
> but 32768 will work.

Thanks a lot for this work Yinghai !
This last bit helped me a lot ...

Hmm, udp_lib_get_port() assumes it can loop at least one time in :

	for (last = first + udptable->mask + 1;
	     first != last;
	     first++) {
		// unit_work
	}

but if udptable->mask == 65535, loop is not entered at all (since last == first)

We should convert it to a do { } while(...); construct, or use 32bit variables
and (u16) casts.

Thanks

[PATCH] udp: udp_lib_get_port() fix

Now we can have a large udp hash table, udp_lib_get_port() loop
should be converted to a do {} while (cond) form,
or we dont enter it at all if hash table size is exactly 65536.

Reported-by: Yinghai Lu <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1f95348..f0126fd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -216,9 +216,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		 * force rand to be an odd multiple of UDP_HTABLE_SIZE
 		 */
 		rand = (rand | 1) * (udptable->mask + 1);
-		for (last = first + udptable->mask + 1;
-		     first != last;
-		     first++) {
+		last = first + udptable->mask + 1;
+		do {
 			hslot = udp_hashslot(udptable, net, first);
 			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
@@ -238,7 +237,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 				snum += rand;
 			} while (snum != first);
 			spin_unlock_bh(&hslot->lock);
-		}
+		} while (++first != last);
 		goto fail;
 	} else {
 		hslot = udp_hashslot(udptable, net, snum);

2009-12-13 22:29:57

by Yinghai Lu

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

Eric Dumazet wrote:
> On 13/12/2009 03:41, Yinghai Lu wrote:
>>>> causes the problem: the nfs mount fails.
>>>>
>>>> the setup is:
>>>> 64bit kernel, with all needed drivers built in, booted with
>>>> ip=dhcp; the root disk is a 256M ramdisk.
>>>> then try to mount nfs....
>>>>
>>> changing the default entries value from 65536 to 256 makes the
>>> nfs mount work.
>>>
>> interesting:
>>
>> m:~/dump # grep UDP dmesg.txt
>> [ 28.996034] UDP hash table entries: 65536 (order: 11, 10485760 bytes)
>> [ 29.032364] UDP-Lite hash table entries: 65536 (order: 11, 10485760 bytes)
>>
>> will not work
>>
>> but 32768 will work.
>
> Thanks a lot for this work Yinghai !
> This last bit helped me a lot ...
>
> Hmm, udp_lib_get_port() assumes it can loop at least one time in :
>
> 	for (last = first + udptable->mask + 1;
> 	     first != last;
> 	     first++) {
> 		// unit_work
> 	}
>
> but if udptable->mask == 65535, loop is not entered at all (since last == first)
>
> We should convert it to a do { } while(...); construct, or use 32bit variables
> and (u16) casts.
>
> Thanks
>
> [PATCH] udp: udp_lib_get_port() fix
>
> Now we can have a large udp hash table, udp_lib_get_port() loop
> should be converted to a do {} while (cond) form,
> or we dont enter it at all if hash table size is exactly 65536.
>
> Reported-by: Yinghai Lu <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1f95348..f0126fd 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -216,9 +216,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>  		 * force rand to be an odd multiple of UDP_HTABLE_SIZE
>  		 */
>  		rand = (rand | 1) * (udptable->mask + 1);
> -		for (last = first + udptable->mask + 1;
> -		     first != last;
> -		     first++) {
> +		last = first + udptable->mask + 1;
> +		do {
>  			hslot = udp_hashslot(udptable, net, first);
>  			bitmap_zero(bitmap, PORTS_PER_CHAIN);
>  			spin_lock_bh(&hslot->lock);
> @@ -238,7 +237,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>  				snum += rand;
>  			} while (snum != first);
>  			spin_unlock_bh(&hslot->lock);
> -		}
> +		} while (++first != last);
>  		goto fail;
>  	} else {
>  		hslot = udp_hashslot(udptable, net, snum);

thanks. that fixes the problem.

YH

2009-12-14 03:33:05

by David Miller

Subject: Re: nfs broken in net-next? -- now in mainline -- bisected: d9f5950f90292f7cc42834338dfd5f44dc4cc4ca

From: Yinghai Lu <[email protected]>
Date: Sun, 13 Dec 2009 14:28:14 -0800

> Eric Dumazet wrote:
>> [PATCH] udp: udp_lib_get_port() fix
>>
>> Now we can have a large udp hash table, udp_lib_get_port() loop
>> should be converted to a do {} while (cond) form,
>> or we dont enter it at all if hash table size is exactly 65536.
>>
>> Reported-by: Yinghai Lu <[email protected]>
>> Signed-off-by: Eric Dumazet <[email protected]>
...
> thanks. that fixes the problem.

Applied, thanks everyone.