2006-03-18 20:25:18

by Parag Warudkar

Subject: OOPS: 2.6.16-rc6 cpufreq_conservative

I can reproduce the below OOPS by doing
$ modprobe cpufreq_conservative
$ echo conservative > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo conservative > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

Which brings up a question: do we really support different scaling governors
for different CPU cores?

This is a Centrino Duo Laptop.

Unable to handle kernel NULL pointer dereference at virtual address 0000001c
printing eip:
f834e6d0
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: cpufreq_conservative oprofile ntfs snd_pcm_oss
snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device eth1394
ohci1394 i2c_i801 i2c_core hw_random ipw3945 ieee80211 ieee80211_crypt
firmware_class snd_hda_intel snd_hda_codec snd_pcm snd_timer snd
snd_page_alloc i915 drm cpufreq_ondemand speedstep_centrino b44
CPU: 1
EIP: 0060:[<f834e6d0>] Not tainted VLI
EFLAGS: 00010206 (2.6.16-rc6 #3)
EIP is at dbs_check_cpu+0x220/0x400 [cpufreq_conservative]
eax: 00000000 ebx: 00000001 ecx: 00000001 edx: 7840d1bc
esi: 78445730 edi: 00000002 ebp: 78445730 esp: 7a1c1ef0
ds: 007b es: 007b ss: 0068
Process events/1 (pid: 9, threadinfo=7a1c0000 task=7a33fa90)
Stack: <0>7840d1bc 00000002 00000001 7a2d3a80 00000000 f834f704 7a1aa1c0
00000000
f834e938 00000000 00000202 7a1c0000 f834f700 78132869 00000000 00000000
18a60e00 7a1aa1d0 7a1aa1e8 f834e8b0 00000202 7a1c0000 7a1aa1d8 7a1aa1d0
Call Trace:
[<f834e938>] do_dbs_timer+0x88/0xc0 [cpufreq_conservative]
[<78132869>] run_workqueue+0x79/0xf0
[<f834e8b0>] do_dbs_timer+0x0/0xc0 [cpufreq_conservative]
[<78132a38>] worker_thread+0x158/0x180
[<7811b0d0>] default_wake_function+0x0/0x20
[<7811b0d0>] default_wake_function+0x0/0x20
[<781328e0>] worker_thread+0x0/0x180
[<7813664c>] kthread+0xbc/0x100
[<78136590>] kthread+0x0/0x100
[<78101285>] kernel_thread_helper+0x5/0x10
Code: 0f bc c0 83 f8 03 bb 02 00 00 00 0f 4c d8 83 fb 01 77 49 89 ee 8d b6 00
00 00 00 8b 04 9d 04 80 44 78 bf 02 00 00 00 01 f0 8b 00 <8b> 40 1c 89 7c 24
04 c7 04 24 bc d1 40 78 89 04 9d 48 fa 34 f8


2006-03-18 21:40:44

by Linus Torvalds

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative


Gaah. Looking at the assembly code to try to figure out where the oops
happened, one thing is clear: "for_each_cpu_mask()" generates some truly
horrible code. C code that looks simple ends up being tons of complex
assembly language due to inline functions etc.

We've made it way too easy for people to write code that just absolutely
_sucks_.

It's extra sad, because disassembling Parag's oops, it looks like he has
limited NR_CPUS to 2 (appropriate for his setup), yet we literally do
things like:

<code+0>: bsf %eax,%eax
<code+3>: cmp $0x3,%eax
<code+6>: mov $0x2,%ebx
<code+11>: cmovl %eax,%ebx
<code+14>: cmp $0x1,%ebx
<code+17>: ja 0x80495fc
<code+19>: mov %ebp,%esi
<code+21>: lea 0x0(%esi),%esi
<code+27>: mov 0x78448004(,%ebx,4),%eax
<code+34>: mov $0x2,%edi
<code+39>: add %esi,%eax
** <code+41>: mov (%eax),%eax ** oops here **
<code+43>: mov 0x1c(%eax),%eax
<code+46>: mov %edi,0x4(%esp)
<code+50>: movl $0x7840d1bc,(%esp)
<code+57>: mov %eax,0x???(,%ebx,4)

where 90% of that code seems to be just cruft from __next_cpu(): the
"bsf", the comparisons against 3 (n+1), the $0x2 setup for calling
find_next_bit()..

Anyway, I _think_ it's this one:

for_each_online_cpu(j) {
dbs_info = &per_cpu(cpu_dbs_info, j);
requested_freq[j] = dbs_info->cur_policy->cur;
}

where dbs_info->cur_policy seems to be NULL. Maybe. Exactly because 90% of
the instructions come from that generic stuff around it, it's hard to be
sure, and I don't have the same compiler version Parag does.

Linus

On Sat, 18 Mar 2006, Parag Warudkar wrote:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000001c
> printing eip:
> f834e6d0
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT SMP
> Modules linked in: cpufreq_conservative oprofile ntfs snd_pcm_oss
> snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device eth1394
> ohci1394 i2c_i801 i2c_core hw_random ipw3945 ieee80211 ieee80211_crypt
> firmware_class snd_hda_intel snd_hda_codec snd_pcm snd_timer snd
> snd_page_alloc i915 drm cpufreq_ondemand speedstep_centrino b44
> CPU: 1
> EIP: 0060:[<f834e6d0>] Not tainted VLI
> EFLAGS: 00010206 (2.6.16-rc6 #3)
> EIP is at dbs_check_cpu+0x220/0x400 [cpufreq_conservative]
> eax: 00000000 ebx: 00000001 ecx: 00000001 edx: 7840d1bc
> esi: 78445730 edi: 00000002 ebp: 78445730 esp: 7a1c1ef0
> ds: 007b es: 007b ss: 0068
> Process events/1 (pid: 9, threadinfo=7a1c0000 task=7a33fa90)
> Stack: <0>7840d1bc 00000002 00000001 7a2d3a80 00000000 f834f704 7a1aa1c0
> 00000000
> f834e938 00000000 00000202 7a1c0000 f834f700 78132869 00000000 00000000
> 18a60e00 7a1aa1d0 7a1aa1e8 f834e8b0 00000202 7a1c0000 7a1aa1d8 7a1aa1d0
> Call Trace:
> [<f834e938>] do_dbs_timer+0x88/0xc0 [cpufreq_conservative]
> [<78132869>] run_workqueue+0x79/0xf0
> [<f834e8b0>] do_dbs_timer+0x0/0xc0 [cpufreq_conservative]
> [<78132a38>] worker_thread+0x158/0x180
> [<7811b0d0>] default_wake_function+0x0/0x20
> [<7811b0d0>] default_wake_function+0x0/0x20
> [<781328e0>] worker_thread+0x0/0x180
> [<7813664c>] kthread+0xbc/0x100
> [<78136590>] kthread+0x0/0x100
> [<78101285>] kernel_thread_helper+0x5/0x10
> Code: 0f bc c0 83 f8 03 bb 02 00 00 00 0f 4c d8 83 fb 01 77 49 89 ee 8d b6 00
> 00 00 00 8b 04 9d 04 80 44 78 bf 02 00 00 00 01 f0 8b 00 <8b> 40 1c 89 7c 24
> 04 c7 04 24 bc d1 40 78 89 04 9d 48 fa 34 f8
>

2006-03-18 22:09:13

by Parag Warudkar

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Saturday 18 March 2006 16:40, Linus Torvalds wrote:
> Anyway, I _think_ it's this one:
>
> 		for_each_online_cpu(j) {
> 			dbs_info = &per_cpu(cpu_dbs_info, j);
> 			requested_freq[j] = dbs_info->cur_policy->cur;
> 		}
Here is the disassembly of cpufreq_conservative.o:dbs_check_cpu() from my
setup, if it is of any help.

000004b0 <dbs_check_cpu>:
4b0: 55 push %ebp
4b1: bd 00 00 00 00 mov $0x0,%ebp
4b6: 89 e8 mov %ebp,%eax
4b8: 57 push %edi
4b9: 56 push %esi
4ba: 53 push %ebx
4bb: 83 ec 10 sub $0x10,%esp
4be: 8b 54 24 24 mov 0x24(%esp),%edx
4c2: 8b 0c 95 00 00 00 00 mov 0x0(,%edx,4),%ecx
4c9: 01 c8 add %ecx,%eax
4cb: 8b 50 0c mov 0xc(%eax),%edx
4ce: 85 d2 test %edx,%edx
4d0: 0f 84 ba 01 00 00 je 690 <dbs_check_cpu+0x1e0>
4d6: 66 83 3d 10 00 00 00 cmpw $0x0,0x10
4dd: 00
4de: 8b 00 mov (%eax),%eax
4e0: 89 44 24 0c mov %eax,0xc(%esp)
4e4: 0f 84 ae 01 00 00 je 698 <dbs_check_cpu+0x1e8>
4ea: 8b 4c 24 0c mov 0xc(%esp),%ecx
4ee: bf ff ff ff ff mov $0xffffffff,%edi
4f3: 8b 01 mov (%ecx),%eax
4f5: 85 c0 test %eax,%eax
4f7: 0f 85 23 02 00 00 jne 720 <dbs_check_cpu+0x270>
4fd: b8 20 00 00 00 mov $0x20,%eax
502: eb 52 jmp 556 <dbs_check_cpu+0xa6>
504: 8b 14 9d 00 00 00 00 mov 0x0(,%ebx,4),%edx
50b: 89 e8 mov %ebp,%eax
50d: 8b 0d 60 00 00 00 mov 0x60,%ecx
513: 8d 34 02 lea (%edx,%eax,1),%esi
516: b8 00 00 00 00 mov $0x0,%eax
51b: 8d 04 02 lea (%edx,%eax,1),%eax
51e: 8b 50 30 mov 0x30(%eax),%edx
521: 03 50 28 add 0x28(%eax),%edx
524: 85 c9 test %ecx,%ecx
526: 74 03 je 52b <dbs_check_cpu+0x7b>
528: 03 50 08 add 0x8(%eax),%edx
52b: 8b 4e 04 mov 0x4(%esi),%ecx
52e: 89 d0 mov %edx,%eax
530: 89 56 04 mov %edx,0x4(%esi)
533: 29 c8 sub %ecx,%eax
535: 39 f8 cmp %edi,%eax
537: 0f 42 f8 cmovb %eax,%edi
53a: 8d 43 01 lea 0x1(%ebx),%eax
53d: 89 44 24 08 mov %eax,0x8(%esp)
541: b8 02 00 00 00 mov $0x2,%eax
546: 89 44 24 04 mov %eax,0x4(%esp)
54a: 8b 44 24 0c mov 0xc(%esp),%eax
54e: 89 04 24 mov %eax,(%esp)
551: e8 fc ff ff ff call 552 <dbs_check_cpu+0xa2>
556: 83 f8 03 cmp $0x3,%eax
559: bb 02 00 00 00 mov $0x2,%ebx
55e: 0f 4c d8 cmovl %eax,%ebx
561: 83 fb 01 cmp $0x1,%ebx
564: 76 9e jbe 504 <dbs_check_cpu+0x54>
566: 8d 04 bf lea (%edi,%edi,4),%eax
569: 8b 35 50 00 00 00 mov 0x50,%esi
56f: b9 64 00 00 00 mov $0x64,%ecx
574: 8d 04 80 lea (%eax,%eax,4),%eax
577: ba fe ff ff 7f mov $0x7ffffffe,%edx
57c: 8d 1c 85 00 00 00 00 lea 0x0(,%eax,4),%ebx
583: a1 58 00 00 00 mov 0x58,%eax
588: 29 c1 sub %eax,%ecx
58a: 81 fe c0 e0 ff ff cmp $0xffffe0c0,%esi
590: 77 10 ja 5a2 <dbs_check_cpu+0xf2>
592: 8d 96 9f 0f 00 00 lea 0xf9f(%esi),%edx
598: b8 d3 4d 62 10 mov $0x10624dd3,%eax
59d: f7 e2 mul %edx
59f: c1 ea 08 shr $0x8,%edx
5a2: 0f af ca imul %edx,%ecx
5a5: 39 cb cmp %ecx,%ebx
5a7: 0f 83 7b 01 00 00 jae 728 <dbs_check_cpu+0x278>
5ad: 8b 54 24 24 mov 0x24(%esp),%edx
5b1: 31 c0 xor %eax,%eax
5b3: 8b 4c 24 0c mov 0xc(%esp),%ecx
5b7: 89 04 95 00 00 00 00 mov %eax,0x0(,%edx,4)
5be: 8b 01 mov (%ecx),%eax
5c0: 85 c0 test %eax,%eax
5c2: 0f 84 ca 02 00 00 je 892 <dbs_check_cpu+0x3e2>
5c8: 0f bc c0 bsf %eax,%eax
5cb: 83 f8 03 cmp $0x3,%eax
5ce: bb 02 00 00 00 mov $0x2,%ebx
5d3: 0f 4c d8 cmovl %eax,%ebx
5d6: 83 fb 01 cmp $0x1,%ebx
5d9: 77 40 ja 61b <dbs_check_cpu+0x16b>
5db: 89 ee mov %ebp,%esi
5dd: 8d 76 00 lea 0x0(%esi),%esi
5e0: 8b 14 9d 00 00 00 00 mov 0x0(,%ebx,4),%edx
5e7: 01 f2 add %esi,%edx
5e9: 8b 42 04 mov 0x4(%edx),%eax
5ec: 89 42 08 mov %eax,0x8(%edx)
5ef: 8d 43 01 lea 0x1(%ebx),%eax
5f2: bb 02 00 00 00 mov $0x2,%ebx
5f7: 89 44 24 08 mov %eax,0x8(%esp)
5fb: b8 02 00 00 00 mov $0x2,%eax
600: 89 44 24 04 mov %eax,0x4(%esp)
604: 8b 44 24 0c mov 0xc(%esp),%eax
608: 89 04 24 mov %eax,(%esp)
60b: e8 fc ff ff ff call 60c <dbs_check_cpu+0x15c>
610: 83 f8 03 cmp $0x3,%eax
613: 0f 4c d8 cmovl %eax,%ebx
616: 83 fb 01 cmp $0x1,%ebx
619: 76 c5 jbe 5e0 <dbs_check_cpu+0x130>
61b: 8b 54 24 24 mov 0x24(%esp),%edx
61f: 8b 0c 95 08 00 00 00 mov 0x8(,%edx,4),%ecx
626: 8b 54 24 0c mov 0xc(%esp),%edx
62a: 8b 42 18 mov 0x18(%edx),%eax
62d: 39 c1 cmp %eax,%ecx
62f: 74 5f je 690 <dbs_check_cpu+0x1e0>
631: 8b 15 64 00 00 00 mov 0x64,%edx
637: 0f af d0 imul %eax,%edx
63a: b8 1f 85 eb 51 mov $0x51eb851f,%eax
63f: f7 e2 mul %edx
641: b8 05 00 00 00 mov $0x5,%eax
646: c1 ea 05 shr $0x5,%edx
649: 0f 44 d0 cmove %eax,%edx
64c: 8d 04 11 lea (%ecx,%edx,1),%eax
64f: 8b 4c 24 24 mov 0x24(%esp),%ecx
653: 89 04 8d 08 00 00 00 mov %eax,0x8(,%ecx,4)
65a: 8b 4c 24 0c mov 0xc(%esp),%ecx
65e: 8b 51 18 mov 0x18(%ecx),%edx
661: 39 d0 cmp %edx,%eax
663: 76 0d jbe 672 <dbs_check_cpu+0x1c2>
665: 89 d0 mov %edx,%eax
667: 8b 54 24 24 mov 0x24(%esp),%edx
66b: 89 04 95 08 00 00 00 mov %eax,0x8(,%edx,4)
672: 89 44 24 04 mov %eax,0x4(%esp)
676: 8b 4c 24 0c mov 0xc(%esp),%ecx
67a: ba 01 00 00 00 mov $0x1,%edx
67f: 89 54 24 08 mov %edx,0x8(%esp)
683: 89 0c 24 mov %ecx,(%esp)
686: e8 fc ff ff ff call 687 <dbs_check_cpu+0x1d7>
68b: 90 nop
68c: 8d 74 26 00 lea 0x0(%esi),%esi
690: 83 c4 10 add $0x10,%esp
693: 5b pop %ebx
694: 5e pop %esi
695: 5f pop %edi
696: 5d pop %ebp
697: c3 ret
698: a1 00 00 00 00 mov 0x0,%eax
69d: 85 c0 test %eax,%eax
69f: 0f 84 e3 01 00 00 je 888 <dbs_check_cpu+0x3d8>
6a5: 0f bc c0 bsf %eax,%eax
6a8: 83 f8 03 cmp $0x3,%eax
6ab: bb 02 00 00 00 mov $0x2,%ebx
6b0: 0f 4c d8 cmovl %eax,%ebx
6b3: 83 fb 01 cmp $0x1,%ebx
6b6: 77 49 ja 701 <dbs_check_cpu+0x251>
6b8: 89 ee mov %ebp,%esi
6ba: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
6c0: 8b 04 9d 00 00 00 00 mov 0x0(,%ebx,4),%eax
6c7: bf 02 00 00 00 mov $0x2,%edi
6cc: 01 f0 add %esi,%eax
6ce: 8b 00 mov (%eax),%eax
6d0: 8b 40 1c mov 0x1c(%eax),%eax
6d3: 89 7c 24 04 mov %edi,0x4(%esp)
6d7: c7 04 24 00 00 00 00 movl $0x0,(%esp)
6de: 89 04 9d 08 00 00 00 mov %eax,0x8(,%ebx,4)
6e5: 8d 43 01 lea 0x1(%ebx),%eax
6e8: bb 02 00 00 00 mov $0x2,%ebx
6ed: 89 44 24 08 mov %eax,0x8(%esp)
6f1: e8 fc ff ff ff call 6f2 <dbs_check_cpu+0x242>
6f6: 83 f8 03 cmp $0x3,%eax
6f9: 0f 4c d8 cmovl %eax,%ebx
6fc: 83 fb 01 cmp $0x1,%ebx
6ff: 76 bf jbe 6c0 <dbs_check_cpu+0x210>
701: 8b 4c 24 0c mov 0xc(%esp),%ecx
705: bb 01 00 00 00 mov $0x1,%ebx
70a: bf ff ff ff ff mov $0xffffffff,%edi
70f: 66 89 1d 10 00 00 00 mov %bx,0x10
716: 8b 01 mov (%ecx),%eax
718: 85 c0 test %eax,%eax
71a: 0f 84 dd fd ff ff je 4fd <dbs_check_cpu+0x4d>
720: 0f bc c0 bsf %eax,%eax
723: e9 2e fe ff ff jmp 556 <dbs_check_cpu+0xa6>
728: 8b 54 24 24 mov 0x24(%esp),%edx
72c: 8b 04 95 00 00 00 00 mov 0x0(,%edx,4),%eax
733: 40 inc %eax
734: 89 04 95 00 00 00 00 mov %eax,0x0(,%edx,4)
73b: 8b 15 54 00 00 00 mov 0x54,%edx
741: 39 d0 cmp %edx,%eax
743: 0f 82 47 ff ff ff jb 690 <dbs_check_cpu+0x1e0>
749: 8b 4c 24 0c mov 0xc(%esp),%ecx
74d: bf ff ff ff ff mov $0xffffffff,%edi
752: 8b 01 mov (%ecx),%eax
754: 85 c0 test %eax,%eax
756: 0f 85 40 01 00 00 jne 89c <dbs_check_cpu+0x3ec>
75c: b8 20 00 00 00 mov $0x20,%eax
761: 83 f8 03 cmp $0x3,%eax
764: bb 02 00 00 00 mov $0x2,%ebx
769: 0f 4c d8 cmovl %eax,%ebx
76c: 83 fb 01 cmp $0x1,%ebx
76f: 77 64 ja 7d5 <dbs_check_cpu+0x325>
771: 89 ee mov %ebp,%esi
773: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
779: 8d bc 27 00 00 00 00 lea 0x0(%edi),%edi
780: 8b 04 9d 00 00 00 00 mov 0x0(,%ebx,4),%eax
787: 01 f0 add %esi,%eax
789: 8b 50 04 mov 0x4(%eax),%edx
78c: 8b 68 08 mov 0x8(%eax),%ebp
78f: 89 d1 mov %edx,%ecx
791: 29 e9 sub %ebp,%ecx
793: 89 50 08 mov %edx,0x8(%eax)
796: 8d 43 01 lea 0x1(%ebx),%eax
799: 39 f9 cmp %edi,%ecx
79b: 89 44 24 08 mov %eax,0x8(%esp)
79f: 8b 44 24 0c mov 0xc(%esp),%eax
7a3: bd 02 00 00 00 mov $0x2,%ebp
7a8: 0f 42 f9 cmovb %ecx,%edi
7ab: 89 6c 24 04 mov %ebp,0x4(%esp)
7af: 89 04 24 mov %eax,(%esp)
7b2: e8 fc ff ff ff call 7b3 <dbs_check_cpu+0x303>
7b7: 83 f8 03 cmp $0x3,%eax
7ba: ba 02 00 00 00 mov $0x2,%edx
7bf: 0f 4d c2 cmovge %edx,%eax
7c2: 83 f8 01 cmp $0x1,%eax
7c5: 89 c3 mov %eax,%ebx
7c7: 76 b7 jbe 780 <dbs_check_cpu+0x2d0>
7c9: 8b 35 50 00 00 00 mov 0x50,%esi
7cf: 8b 15 54 00 00 00 mov 0x54,%edx
7d5: 8d 04 bf lea (%edi,%edi,4),%eax
7d8: 8b 4c 24 24 mov 0x24(%esp),%ecx
7dc: 8d 04 80 lea (%eax,%eax,4),%eax
7df: 8d 1c 85 00 00 00 00 lea 0x0(,%eax,4),%ebx
7e6: 89 f0 mov %esi,%eax
7e8: 0f af c2 imul %edx,%eax
7eb: 8b 35 5c 00 00 00 mov 0x5c,%esi
7f1: 31 ff xor %edi,%edi
7f3: ba fe ff ff 7f mov $0x7ffffffe,%edx
7f8: 89 3c 8d 00 00 00 00 mov %edi,0x0(,%ecx,4)
7ff: b9 64 00 00 00 mov $0x64,%ecx
804: 29 f1 sub %esi,%ecx
806: 3d c0 e0 ff ff cmp $0xffffe0c0,%eax
80b: 77 10 ja 81d <dbs_check_cpu+0x36d>
80d: 8d 90 9f 0f 00 00 lea 0xf9f(%eax),%edx
813: b8 d3 4d 62 10 mov $0x10624dd3,%eax
818: f7 e2 mul %edx
81a: c1 ea 08 shr $0x8,%edx
81d: 0f af ca imul %edx,%ecx
820: 39 cb cmp %ecx,%ebx
822: 0f 86 68 fe ff ff jbe 690 <dbs_check_cpu+0x1e0>
828: 8b 44 24 24 mov 0x24(%esp),%eax
82c: 8b 54 24 0c mov 0xc(%esp),%edx
830: 8b 0c 85 08 00 00 00 mov 0x8(,%eax,4),%ecx
837: 3b 4a 14 cmp 0x14(%edx),%ecx
83a: 0f 84 50 fe ff ff je 690 <dbs_check_cpu+0x1e0>
840: a1 64 00 00 00 mov 0x64,%eax
845: 85 c0 test %eax,%eax
847: 0f 84 43 fe ff ff je 690 <dbs_check_cpu+0x1e0>
84d: 8b 5a 18 mov 0x18(%edx),%ebx
850: ba 1f 85 eb 51 mov $0x51eb851f,%edx
855: 0f af c3 imul %ebx,%eax
858: f7 e2 mul %edx
85a: b8 05 00 00 00 mov $0x5,%eax
85f: c1 ea 05 shr $0x5,%edx
862: 0f 44 d0 cmove %eax,%edx
865: 89 c8 mov %ecx,%eax
867: 8b 4c 24 24 mov 0x24(%esp),%ecx
86b: 29 d0 sub %edx,%eax
86d: 89 04 8d 08 00 00 00 mov %eax,0x8(,%ecx,4)
874: 8b 4c 24 0c mov 0xc(%esp),%ecx
878: 8b 51 14 mov 0x14(%ecx),%edx
87b: 39 d0 cmp %edx,%eax
87d: 0f 83 ef fd ff ff jae 672 <dbs_check_cpu+0x1c2>
883: e9 dd fd ff ff jmp 665 <dbs_check_cpu+0x1b5>
888: b8 20 00 00 00 mov $0x20,%eax
88d: e9 16 fe ff ff jmp 6a8 <dbs_check_cpu+0x1f8>
892: b8 20 00 00 00 mov $0x20,%eax
897: e9 2f fd ff ff jmp 5cb <dbs_check_cpu+0x11b>
89c: 0f bc c0 bsf %eax,%eax
89f: e9 bd fe ff ff jmp 761 <dbs_check_cpu+0x2b1>
8a4: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
8aa: 8d bf 00 00 00 00 lea 0x0(%edi),%edi

2006-03-18 22:26:26

by Parag Warudkar

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Saturday 18 March 2006 16:40, Linus Torvalds wrote:
> Anyway, I _think_ it's this one:
>
> 		for_each_online_cpu(j) {
> 			dbs_info = &per_cpu(cpu_dbs_info, j);
> 			requested_freq[j] = dbs_info->cur_policy->cur;
> 		}
>
> where dbs_info->cur_policy seems to be NULL. Maybe.

That was right on target.
I just put a check in the code which confirms that for some reason cur_policy
for cpu0 is NULL.

Parag

2006-03-18 23:13:31

by Linus Torvalds

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sat, 18 Mar 2006, Parag Warudkar wrote:
>
> Here is the disassembly of cpufreq_conservative.o:dbs_check_cpu() from my
> setup, if it is of any help.

Yup. Just confirming that this seems to be what I was pointing to.

It's the first "for_each_online_cpu()" loop: the reason it's pretty far
down in the function disassembly is that the early "init_flag == 0" test
jumps off into the latter part of the thing, even though the code is
early. That's just gcc re-ordering stuff.

> 000004b0 <dbs_check_cpu>:
> 4b0: 55 push %ebp
> 4b1: bd 00 00 00 00 mov $0x0,%ebp
> 4b6: 89 e8 mov %ebp,%eax
> 4b8: 57 push %edi
> 4b9: 56 push %esi
> 4ba: 53 push %ebx
> 4bb: 83 ec 10 sub $0x10,%esp
> 4be: 8b 54 24 24 mov 0x24(%esp),%edx
> 4c2: 8b 0c 95 00 00 00 00 mov 0x0(,%edx,4),%ecx
> 4c9: 01 c8 add %ecx,%eax
> 4cb: 8b 50 0c mov 0xc(%eax),%edx
> 4ce: 85 d2 test %edx,%edx
> 4d0: 0f 84 ba 01 00 00 je 690 <dbs_check_cpu+0x1e0>

This is the "init_flag == 0" test:

> 4d6: 66 83 3d 10 00 00 00 cmpw $0x0,0x10
> 4dd: 00
> 4de: 8b 00 mov (%eax),%eax
> 4e0: 89 44 24 0c mov %eax,0xc(%esp)
> 4e4: 0f 84 ae 01 00 00 je 698 <dbs_check_cpu+0x1e8>
.......

This is

for_each_online_cpu(j) {
> 698: a1 00 00 00 00 mov 0x0,%eax
> 69d: 85 c0 test %eax,%eax
> 69f: 0f 84 e3 01 00 00 je 888 <dbs_check_cpu+0x3d8>
> 6a5: 0f bc c0 bsf %eax,%eax
> 6a8: 83 f8 03 cmp $0x3,%eax
> 6ab: bb 02 00 00 00 mov $0x2,%ebx
> 6b0: 0f 4c d8 cmovl %eax,%ebx
> 6b3: 83 fb 01 cmp $0x1,%ebx
> 6b6: 77 49 ja 701 <dbs_check_cpu+0x251>
> 6b8: 89 ee mov %ebp,%esi
> 6ba: 8d b6 00 00 00 00 lea 0x0(%esi),%esi


dbs_info = &per_cpu(cpu_dbs_info, j);


> 6c0: 8b 04 9d 00 00 00 00 mov 0x0(,%ebx,4),%eax
> 6c7: bf 02 00 00 00 mov $0x2,%edi
> 6cc: 01 f0 add %esi,%eax
> 6ce: 8b 00 mov (%eax),%eax

And this is the oopsing instruction:

> 6d0: 8b 40 1c mov 0x1c(%eax),%eax
> 6d3: 89 7c 24 04 mov %edi,0x4(%esp)
> 6d7: c7 04 24 00 00 00 00 movl $0x0,(%esp)

And this is the "requested_freq[j]" assignment:

> 6de: 89 04 9d 08 00 00 00 mov %eax,0x8(,%ebx,4)
> 6e5: 8d 43 01 lea 0x1(%ebx),%eax
> 6e8: bb 02 00 00 00 mov $0x2,%ebx
> 6ed: 89 44 24 08 mov %eax,0x8(%esp)

This is really a "call find_next_bit"

> 6f1: e8 fc ff ff ff call 6f2 <dbs_check_cpu+0x242>
> 6f6: 83 f8 03 cmp $0x3,%eax
> 6f9: 0f 4c d8 cmovl %eax,%ebx
> 6fc: 83 fb 01 cmp $0x1,%ebx
> 6ff: 76 bf jbe 6c0 <dbs_check_cpu+0x210>

And that was the end of that loop:

> 701: 8b 4c 24 0c mov 0xc(%esp),%ecx
> 705: bb 01 00 00 00 mov $0x1,%ebx
> 70a: bf ff ff ff ff mov $0xffffffff,%edi
> 70f: 66 89 1d 10 00 00 00 mov %bx,0x10

And that was setting "init_flag = 1"

Horrid, horrid assembly language from that simple "for_each_online_cpu(j)"
loop. Oh, well.

Linus

2006-03-19 00:56:43

by Andrew Morton

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

Linus Torvalds <[email protected]> wrote:
>
> Gaah. Looking at the assembly code to try to figure out where the oops
> happened, one thing is clear: "for_each_cpu_mask()" generates some truly
> horrible code. C code that looks simple ends up being tons of complex
> assembly language due to inline functions etc.
>
> We've made it way too easy for people to write code that just absolutely
> _sucks_.

Yes, uninlining merely first_cpu() shrinks my usual vmlinux by 2.5k. I'll
fix it up.

Meanwhile, I suppose we need to check that pointer for NULL as Parag
suggests. I might stick a once-off WARN_ON() in there so someone gets in
and works out why we keep on having to graft mysterious null-pointer
avoidances into cpufreq.

2006-03-19 02:40:01

by Linus Torvalds

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sat, 18 Mar 2006, Andrew Morton wrote:
>
> Yes, uninlining merely first_cpu() shrinks my usual vmlinux by 2.5k. I'll
> fix it up.

The thing is, we could do _so_ much better if we just fixed the "calling
convention" for that loop.

In particular, if we instead of

for_each_cpu_mask(cpu, mask)
.. do something with cpu ..

had

for_each_cpu_mask(cpu, mask) {
.. do something with cpu ..
} end_for_each_cpu_mask;

we could do

#if NR_CPUS <= BITS_IN_LONG
#define for_each_cpu_mask(cpu, mask) \
{ unsigned long __cpubits = (mask)->bits[0]; \
for (cpu = 0; __cpubits; cpu++, __cpubits >>= 1) { \
if (!(__cpubits & 1)) \
continue;

#define end_for_each_cpu_mask } }

then the code we'd generate would be
(a) tons smaller
(b) lots faster
than what we do now. No uninlining necessary, because we'd simply not have
any complex code to either inline or to call.

The reason the code sucks right now is that the interface is bad, and
causes us to do insane things to make it work at all with that syntax.

(We already have this kind of thing for the "do_each_thread /
while_each_thread" interface to make it work sanely. Also, anybody who has
ever played with "sparse" has seen the wonderful and flexible
FOR_EACH_PTR() .. END_FOR_EACH_PTR() interface that allows for efficient
pointer list traversal).

Linus

2006-03-19 05:09:15

by Paul Jackson

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

If find_first_bit and find_next_bit suck so bad, then why just fix
their use in the for_each_cpu_mask()? How about the other uses?

Such as the other 50 odd places in the kernel that call them directly,
such as:

$ grep ' = find_[a-z]*_bit' lib/bitmap.c
rbot = cur = find_first_bit(maskp, nmaskbits);
cur = find_next_bit(maskp, nmaskbits, cur+1);
i = find_first_bit(buf, bits);
i = find_next_bit(buf, bits, i + 1);
for (i = find_first_bit(buf, bits);
i = find_next_bit(buf, bits, i + 1))
for (oldbit = find_first_bit(src, bits);
oldbit = find_next_bit(src, bits, oldbit + 1)) {

Perhaps the common interface to these find_*_bit routines should be out
of line, with just a couple of key calls from the scheduler using the
inline form.

And if we fix the cpu loop to the API Linus suggests, we should do the
same with the node loops, such as used in:

$ grep for_each.*_node mm/slab.c
for_each_node(i) {
for_each_node(i)
for_each_online_node(i) {
for_each_online_node(node) {
for_each_online_node(node) {
for_each_online_node(node) {
for_each_online_node(node) {
for_each_online_node(i) {
for_each_online_node(i) {
for_each_online_node(node) {
for_each_online_node(node) {
for_each_online_node(node) {
for_each_online_node(node) {

And for those of us with too many CPUs, how about something like
(totally untested and probably totally bogus):

#if NR_CPUS <= BITS_IN_LONG
... as per Linus, shifting a mask right ...
#else
#define for_each_cpu_mask(cpu, mask)
{ for (cpu = 0; cpu < NR_CPUS; cpu++) {
if (!test_bit(cpu, mask))
continue;
#endif

#define end_for_each_cpu_mask } }

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-03-19 06:34:03

by Parag Warudkar

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Saturday 18 March 2006 19:53, Andrew Morton wrote:
> I might stick a once-off WARN_ON() in there so someone gets in
> and works out why we keep on having to graft mysterious null-pointer
> avoidances into cpufreq.
cpufreq_conservative should be marked broken on SMP - I have used it on UP
boxes without trouble but I can't even safely modprobe it on SMP - it nearly
ate my filesystem.

And there seem to be multiple different problems with it - I get different
oopses depending upon whether I loaded it before or after the
ondemand module. Weirdly enough, cpufreq_conservative shares much of its
code with cpufreq_ondemand, which works without any problem.

Let me know if anyone has any objection to marking cpufreq_conservative
"depends !SMP" - I am planning to submit a patch soon.

Parag

2006-03-19 12:00:56

by Alexander Clouter

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

Hi,

Parag Warudkar <[email protected]> [20060319 01:34:01 -0500]:
>
> On Saturday 18 March 2006 19:53, Andrew Morton wrote:
> > I might stick a once-off WARN_ON() in there so someone gets in
> > and works out why we keep on having to graft mysterious null-pointer
> > avoidances into cpufreq.
> cpufreq_conservative should be marked broken on SMP - I have used it on UP
> boxes without trouble but I can't even safely modprobe it on SMP - it nearly
> ate my filesystem.
>
> And there seem to be multiple different problems with it - I get different
> oopses depending upon whether or not I have loaded it before or after the
> ondemand module. Weird enough - cpufreq_conservative shares much of it's
> code with cpufreq_ondemand, which works without any problem.
>
Well, it's drifted a bit; however, I submitted a number of patches here about
two weeks ago to bring it back into line and hopefully make it HOTPLUG safe.

The new set of patches pretty much makes conservative's codebase identical to
ondemand's... as no one has posted back having used them or anything, what am
I to do?!

> Let me know if anyone has objection to marking cpufreq_conservative
> depends !SMP - I am planning to submit a patch soon.
>
Doesn't bother me; what does is that no one is trying the updates to
conservative before deciding to declare it borked.

Hey ho....

Alex

> Parag

--
_______________________________________
/ Misfortunes arrive on wings and leave \
\ on foot. /
---------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||



2006-03-19 14:06:24

by Parag Warudkar

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Sunday 19 March 2006 07:00, Alexander Clouter wrote:
> Well its drifted a bit, however I submitted a number of patches here about
> two weeks ago to bring it back into line and hopefully make it HOTPLUG
> safe.
>
> The new set of patches pretty much make conservative's codebase identical
> to ondemands....as no one has posted back having used these or anything
> what am I to do?!

The codebase already seems identical to ondemand's. Are your patches in
2.6.16-rc6 or -mm? If they are, let me know which. If you posted them but
they haven't yet made it into either -mm or mainline, can you please post
links to all of them? I can test them.

Why do we even have conservative and ondemand as two separate modules given
that they share a huge amount of code? Perhaps make conservative an optional
behaviour of ondemand, or at least make a common library which both use?

Thanks
Parag

2006-03-19 17:34:36

by Alexander Clouter

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

Hi,

Parag Warudkar <[email protected]> [20060319 09:06:25 -0500]:
>
> The codebase already seems identical to ondemand - Are your patches in
> 2.6.16-rc6 or -mm? If they are - let me know which. If you posted them but
> they haven't yet made it into either -mm or mainline can you please post
> links to all your patches please? I can test them.
>
Well I submitted them back on 2006-02-24:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114079151404567&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=114079151425558&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=114079151417220&w=2

It has been pointed out that I forgot to CC the cpufreq folk, so that's
completely my fault; I thought I had. :-/

> Why do we even have conservative and ondemand as two separate modules given
> they share huge amount of code - perhaps make conservative an optional
> behaviour of ondemand or alteast make a common lib which both use?
>
Originally the 'conservative' feature was just a sysfs flag that could be
set; however, it was rejected for a number of reasons. One of them, quite
rightly: we have a modular system that can take stacks of cpufreq governors,
so let's use that :)

Also, more importantly, bugs in my bits do not affect the original author.
It makes more sense to keep things this way, as the internal code then does
not need a bunch of if{}'s scattered around, and I also have a number of extra
sysfs files to tweak which would either do nothing in 'ondemand' mode or have
to be magically created and destroyed.

Either way, it's probably neater this way, and it's *my* duty to make sure
anything changing for ondemand is considered for conservative. If you look
at a lot of the userland tools that have come out, it would be a pain to have
them handle the special case of a combined ondemand/conservative.

Breaking the thing out into a library would probably be awkward, as all the
similar code is actually inline in functions; putting that in a separate file
would be pointless. Hopefully when you apply my patches and then do a diff
between the ondemand and conservative governors you will see what I mean.

Cheers

Alex

--
________________________________________
/ Fortune finishes the great quotations, \
| #3 |
| |
| Birds of a feather flock to a newly |
\ washed car. /
----------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||



2006-03-19 17:45:43

by Linus Torvalds

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sat, 18 Mar 2006, Paul Jackson wrote:
>
> If find_first_bit and find_next_bit suck so bad, then why just fix
> their use in the for_each_cpu_mask(). How about the other uses?

find_first_bit/find_next_bit don't actually suck. They're perfectly
efficient for large bitmaps, which is what they were designed for (ie they
were written for doing the free-block/inode bitmaps for minixfs
originally).

In other words, they are meant for bitmaps of _thousands_ of bits, and
for potentially sparse bitmaps.

Now, the CPU masks _can_ be large too, but commonly they are not. In this
example, the "bitmap" was exactly two bits: cpu 0 and cpu 1. Using a
complex "find_next_bit" for something like that is just overkill.

> And if we fix the cpu loop to the API Linus suggests, we should do the
> same with the node loops

Yes. Almost every "for_each_xyzzy" tends to be more easily written as a
macro if we also have an ending condition.

> And for those of us with too many CPUs, how about something like
> (totally untested and probably totally bogus):
>
> #if NR_CPUS <= BITS_IN_LONG
> ... as per Linus, shifting a mask right ...
> #else
> #define for_each_cpu_mask(cpu, mask)
> { for (cpu = 0; cpu < NR_CPUS; cpu++) {
> if (!test_bit(cpu, mask))
> continue;
> #endif
>
> #define end_for_each_cpu_mask } }

Yes, except you'd probably be better off with testing whole words at a
time, since a large NR_CPUS can often be because somebody wanted to
support 256 CPU machines, and then used the same binary on an 8-way setup.

That would most helpfully be done with a double loop (outer one iterating
over the bitmask words, inner one over the bits), but then it's hard to
make "break" do the right thing inside the loop.

Linus

2006-03-19 18:47:09

by Linus Torvalds

Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sat, 18 Mar 2006, Linus Torvalds wrote:
>
> The thing is, we could do _so_ much better if we just fixed the "calling
> convention" for that loop.

Actually, looking at it a bit more, I think we can do better even
_without_ fixing the "calling convention".

Namely like this.

This should make the "NR_CPUS <= 16" case have a _much_ simpler loop.

I picked "16" at random. The loop is much faster and much simpler (it
doesn't use complex instructions like "find first bit", but the downside
is that if the CPU mask is very sparse, it will take more of those (very
simple) iterations to figure out that it's empty.

I suspect the "16" could be raised to 32 (at which point it would cover
all the default vaules), but somebody should probably take a look at
how much this shrinks the kernel.

I have to admit that it's a bit evil to leave a dangling "else" in a
macro, but it beautifully handles the "must be followed by exactly one C
statement" requirement of the old macro ;)

So instead of calling it "evil", let's just say that it's "creative
macro-writing".

(Btw, we could make this even cleaner by making the non-SMP case define
"cpu_isset()" to 1 - at that point the UP and the "low CPU count" #defines
could be merged into one).

Linus

--
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 60e56c6..a659f42 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -313,11 +313,17 @@ static inline void __cpus_remap(cpumask_
bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}

-#if NR_CPUS > 1
+#if NR_CPUS > 16
#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = first_cpu(mask); \
(cpu) < NR_CPUS; \
(cpu) = next_cpu((cpu), (mask)))
+#elif NR_CPUS > 1
+#define for_each_cpu_mask(cpu, mask) \
+ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
+ if (!cpu_isset(cpu, mask)) \
+ continue; \
+ else
#else /* NR_CPUS == 1 */
#define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++)
#endif /* NR_CPUS */
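To see how the dangling "else" satisfies the "exactly one C statement" contract, here is a reduced, standalone imitation of the patched macro (illustrative names, a 4-bit mask instead of a cpumask; not the kernel header):

```c
#include <assert.h>

#define NR_BITS 4

/* Reduced imitation of the NR_CPUS <= 16 case: the trailing "else"
 * means the macro must be followed by exactly one C statement, just
 * like the old for-loop form of for_each_cpu_mask(). */
#define for_each_bit_set(i, mask)               \
    for ((i) = 0; (i) < NR_BITS; (i)++)         \
        if (!((mask) & (1UL << (i))))           \
            continue;                           \
        else

static int sum_set_positions(unsigned long mask)
{
    int i, sum = 0;

    for_each_bit_set(i, mask)
        sum += i;        /* the one statement the "else" swallows */
    return sum;
}
```

A braced block after the macro works equally well, since a compound statement is still a single statement.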

2006-03-19 19:02:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Linus Torvalds wrote:
>
> Actually, looking at it a bit more, I think we can do better even
> _without_ fixing the "calling convention".

Here's the "before and after" example of nr_context_switches() from
kernel/sched.c with this patch on x86-64 (selected because the whole
function is just basically one simple "loop over all possible CPU's and
return the sum of context switches").

I'll do the "after" first, because it's actually readable:

nr_context_switches:
pushq %rbp
xorl %esi, %esi
xorl %ecx, %ecx
movq %rsp, %rbp
.L210:
#APP
btl %ecx,cpu_possible_map(%rip)
sbbl %eax,%eax
#NO_APP
testl %eax, %eax
je .L211
movq _cpu_pda(,%rcx,8), %rdx
movq $per_cpu__runqueues, %rax
addq 8(%rdx), %rax
addq 40(%rax), %rsi
.L211:
incq %rcx
cmpq $8, %rcx
jne .L210
leave
movq %rsi, %rax
ret

ie the loop starts at .L210 and basically iterates %rcx from 0..7, and
you can read the assembly and it actually makes sense.

(Whether it _works_ is another matter: I haven't actually booted the
patched kernel ;)

Compare that to the "before" state:

nr_context_switches:
pushq %rbp
movl $8, %eax
movq cpu_possible_map(%rip), %rdi
#APP
bsfq %rdi,%rdx ; cmovz %rax,%rdx
#NO_APP
cmpl $9, %edx
movq %rdx, %rax
movl $8, %edx
cmovge %edx, %eax
movq %rsp, %rbp
pushq %rbx
movslq %eax,%rcx
xorl %r8d, %r8d
jmp .L261
.L262:
movq _cpu_pda(,%rcx,8), %rdx
movl $8, %esi
movq $per_cpu__runqueues, %rax
movq %rdi, %rbx
movq 8(%rdx), %rdx
addq 40(%rax,%rdx), %r8
movl %ecx, %edx
movl %esi, %eax
leal 1(%rdx), %ecx
subl %edx, %eax
decl %eax
shrq %cl, %rbx
cltq
movq %rbx, %rcx
#APP
bsfq %rcx,%rbx ; cmovz %rax,%rbx
#NO_APP
leal 1(%rdx,%rbx), %edx
cmpl $9, %edx
cmovge %esi, %edx
movslq %edx,%rcx
.L261:
cmpq $7, %rcx
jbe .L262
popq %rbx
leave
movq %r8, %rax
ret

which is not only obviously bigger (40 instructions vs just 18), but I
also dare anybody to actually read and understand the assembly.

Now, the simple version will iterate 8 times over the loop, while the more
complex version will iterate just as many times as there are CPU's in the
actual system. So there's a trade-off. The "load the CPU mask first and
then shift it down by one every time" thing (that required different
syntax) would have been able to exit early. This one isn't. So the syntax
change would still help a lot (and would avoid the "btl").

Of course, if people were to set CONFIG_NR_CPUS appropriately for their
system, the shorter version gets increasingly better (since it then won't
do any unnecessary work).

Linus

2006-03-19 19:33:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Linus Torvalds wrote:
>
> Now, the simple version will iterate 8 times over the loop, while the more
> complex version will iterate just as many times as there are CPU's in the
> actual system. So there's a trade-off. The "load the CPU mask first and
> then shift it down by one every time" thing (that required different
> syntax) would have been able to exit early. This one isn't. So the syntax
> change would still help a lot (and would avoid the "btl").

Hah. My "abuse every gcc feature and do really ugly macros" dark side has
been vetted, and it came up with the appended work of art.

Doing a "break" inside a conditional by using the gcc statement
expressions is sublime.

And it works. It's a couple of instructions longer than the shortest
version (but still about half the size of the horror we have now), but the
instructions are simpler (just a shift rather than a "btl"), and it now
knows to break out early if there are no CPU's left to check.

It has the added advantage that because it uses simpler core instructions,
gcc can actually optimize it better - in "nr_context_switches()" gcc
happily hoisted the "mask->bits[0]" load to outside the loop, for example.

With NR_CPUS==2, gcc even apparently unrolls the loop, to the point where
it isn't even a loop at all. Good boy (at least for that case - I sure
hope it won't do that for some of the larger loops ;).

Of course, I shouldn't say "works", since it is still totally untested. It
_looks_ good, and that statement expression usage is just _so_ ugly it's
cute.

Linus

---
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 60e56c6..17a965c 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -313,11 +313,20 @@ static inline void __cpus_remap(cpumask_
bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}

-#if NR_CPUS > 1
+#define check_for_each_cpu(cpu, mask) \
+ ({ unsigned long __bits = (mask).bits[0] >> (cpu); if (!__bits) break; __bits & 1; })
+
+#if NR_CPUS > 32
#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = first_cpu(mask); \
(cpu) < NR_CPUS; \
(cpu) = next_cpu((cpu), (mask)))
+#elif NR_CPUS > 1
+#define for_each_cpu_mask(cpu, mask) \
+ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
+ if (!check_for_each_cpu(cpu, mask)) \
+ continue; \
+ else
#else /* NR_CPUS == 1 */
#define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++)
#endif /* NR_CPUS */
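Reduced to a single-word mask outside the kernel, the trick looks like this sketch (GNU C only: it relies on gcc statement expressions and on jumping out of one; the names are illustrative, not the patch's):

```c
#include <assert.h>

#define NR_BITS 16

/* The statement expression is evaluated inside the if condition; when
 * no bits remain at or above position i, the "break" terminates the
 * *enclosing* for loop, giving the early exit the plain 0..NR_BITS
 * scan lacked. */
#define check_bit(i, mask)                              \
    ({ unsigned long __bits = (mask) >> (i);            \
       if (!__bits)                                     \
           break;                                       \
       __bits & 1; })

#define for_each_bit(i, mask)                   \
    for ((i) = 0; (i) < NR_BITS; (i)++)         \
        if (!check_bit(i, mask))                \
            continue;                           \
        else

static int count_bits(unsigned long mask)
{
    int i, count = 0;

    for_each_bit(i, mask)
        count++;
    return count;
}
```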

2006-03-19 19:40:07

by Al Viro

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Sun, Mar 19, 2006 at 11:33:17AM -0800, Linus Torvalds wrote:
> Doing a "break" inside a conditional by using the gcc statement
> expressions is sublime.
>
> And it works.

In the version of gcc you've tested. With options and phase of moon
being what they had been. IOW, you are awfully optimistic - it's not
just using a gcc extension, it's using undocumented (in the best case)
behaviour outside the intended use of that extension.

2006-03-19 20:02:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Al Viro wrote:
>
> In the version of gcc you've tested. With options and phase of moon
> being what they had been. IOW, you are awfully optimistic - it's not
> just using gcc extension, it's using undocumented (in the best case)
> behaviour outside the intended use of that extension.

I admit that it's ugly, but it's not undocumented. It flows directly from
"statements as expression". Once you do that, you have to do flow control
with them.

The end result may be _surprising_, the same way Duff's device is
surprising (and for the same reason). But a C compiler that doesn't
support Duff's device is not a C compiler. And this is really no
different: it may not be standard C, but it _is_ standard and documented
GNU C.
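For readers who haven't met it, Duff's device interleaves a switch with a do-while so the switch jumps into the middle of the unrolled loop body; surprising, but every conforming C compiler must accept it. A standalone rendition (assumes count > 0; the checking helper is just for illustration):

```c
#include <assert.h>
#include <string.h>

/* Classic Duff's device: the switch handles the remainder copies by
 * jumping into the unrolled do-while body.  Assumes count > 0. */
static void copy_duff(char *to, const char *from, int count)
{
    int n = (count + 7) / 8;

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* Check copy_duff against memcpy over a known byte pattern. */
static int duff_matches(int count)
{
    char src[32], a[32], b[32];

    for (int i = 0; i < 32; i++)
        src[i] = (char)i;
    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));
    copy_duff(a, src, count);
    memcpy(b, src, (size_t)count);
    return memcmp(a, b, sizeof(a)) == 0;
}
```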

And btw, this is _not_ new behaviour for the kernel. We have used
non-local control behaviour in statement expressions before, just do a

git grep '({.*return'

to see at least ten cases of that (in fact, check out NFA_PUT(), which
does a goto for the failure case in networking). That grep misses all the
multi-line cases, so I assume there are more of them.
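The pattern that grep finds can be reduced to a self-contained sketch (hypothetical macro and function, not the networking code): a statement expression whose failure path jumps non-locally out of the expression, into a label of the calling function. GNU C only:

```c
#include <assert.h>

/* NFA_PUT-style sketch: on overflow the statement expression does a
 * goto that unwinds straight to the caller's "failure" label. */
#define TRY_RESERVE(pos, len, n)                        \
    ({ if ((pos) + (n) > (len))                         \
           goto failure;                                \
       (pos) += (n); })

static int fill(int len)
{
    int pos = 0;

    TRY_RESERVE(pos, len, 4);      /* first record */
    TRY_RESERVE(pos, len, 4);      /* second record */
    return pos;
failure:
    return -1;
}
```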

So this definitely works, and is not new.

Linus

2006-03-19 20:32:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Linus Torvalds wrote:
>
> So this definitely works, and is not new.

Btw, don't get me wrong - I think it would be preferable to use the
"for_each/end_for_each" syntax, since that would make the macros simpler,
and would possibly allow for somewhat simpler generated code.

But that would require each of the 300+ "for_each.*_cpu()" uses in the
current kernel to be modified and checked. In the meantime, people who are
interested can test out how much difference the trivial patch makes for
them..

For me, it made a 4970 byte difference in code size. Noticeable? Maybe
not (it's about 0.15% of my kernel text-size), but it _is_ better code,
and next time somebody gets an oops near there, at least the crap code
won't be hiding the cause ;)

Linus

2006-03-19 20:51:36

by Andrew Morton

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

Linus Torvalds <[email protected]> wrote:
>
> For me, it made a 4970 byte difference in code size.
>

That's about the same saving as uninlining first_cpu() and next_cpu()
provides.

Anything which iterates across multiple CPUs is cachemiss heaven - I doubt
if this is performance-critical code. Or at least if it is, we have bigger
problems..

2006-03-19 20:57:37

by Parag Warudkar

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Sunday 19 March 2006 15:31, Linus Torvalds wrote:
> In the meantime, people who are
> interested can test out how much difference the trivial patch makes for
> them..

4464 bytes saving in text size for me on x86 - boots and seems to work fine.

Parag

2006-03-19 22:18:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Andrew Morton wrote:
>
> That's about the same saving as uninlining first_cpu() and next_cpu()
> provides.

Btw, we could do that better for UP too.

How does this patch look? It makes "for_each_cpu()" look the same
regardless of whether it is UP/SMP, by simply making first_cpu() and
next_cpu() work appropriately.

No ugly macros this time ;)

Linus

---
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 60e56c6..c8dbc24 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -212,17 +212,15 @@ static inline void __cpus_shift_left(cpu
bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
}

-#define first_cpu(src) __first_cpu(&(src), NR_CPUS)
-static inline int __first_cpu(const cpumask_t *srcp, int nbits)
-{
- return min_t(int, nbits, find_first_bit(srcp->bits, nbits));
-}
-
-#define next_cpu(n, src) __next_cpu((n), &(src), NR_CPUS)
-static inline int __next_cpu(int n, const cpumask_t *srcp, int nbits)
-{
- return min_t(int, nbits, find_next_bit(srcp->bits, nbits, n+1));
-}
+#if NR_CPUS > 1
+extern int fastcall __first_cpu(const cpumask_t *srcp);
+extern int fastcall __next_cpu(int n, const cpumask_t *srcp);
+#define first_cpu(srcp) __first_cpu(&(srcp))
+#define next_cpu(n,srcp) __next_cpu((n),&(srcp))
+#else
+#define first_cpu(srcp) (0)
+#define next_cpu(n, srcp) ((n)+1)
+#endif

#define cpumask_of_cpu(cpu) \
({ \
@@ -313,14 +311,10 @@ static inline void __cpus_remap(cpumask_
bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}

-#if NR_CPUS > 1
#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = first_cpu(mask); \
(cpu) < NR_CPUS; \
(cpu) = next_cpu((cpu), (mask)))
-#else /* NR_CPUS == 1 */
-#define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++)
-#endif /* NR_CPUS */

/*
* The following particular system cpumasks and operations manage
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e882c6b..0a9cc97 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -10,6 +10,7 @@
#include <linux/sched.h>
#include <linux/unistd.h>
#include <linux/cpu.h>
+#include <linux/cpumask.h>
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/stop_machine.h>
@@ -20,6 +21,18 @@ static DECLARE_MUTEX(cpucontrol);

static struct notifier_block *cpu_chain;

+int __first_cpu(const cpumask_t *srcp)
+{
+ return min_t(int, NR_CPUS, find_first_bit(srcp->bits, NR_CPUS));
+}
+EXPORT_SYMBOL(__first_cpu);
+
+int __next_cpu(int n, const cpumask_t *srcp)
+{
+ return min_t(int, NR_CPUS, find_next_bit(srcp->bits, NR_CPUS, n+1));
+}
+EXPORT_SYMBOL(__next_cpu);
+
#ifdef CONFIG_HOTPLUG_CPU
static struct task_struct *lock_cpu_hotplug_owner;
static int lock_cpu_hotplug_depth;

2006-03-19 22:38:33

by Andrew Morton

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

Linus Torvalds <[email protected]> wrote:
>
> How does this patch look?
>

Wonderful - coz it's just like mine ;)

(4 patches temp-joined below)

- I made SMP=n next_cpu() just a constant "1".

- It's odd that the code uses (NR_CPUS>1) rather than CONFIG_SMP. On x86
(at least), SMP=y,NR_CPUS==1 isn't allowed anyway.



From: Andrew Morton <[email protected]>

text data bss dec hex filename
before: 3490577 1322408 360000 5172985 4eeef9 vmlinux
after: 3488027 1322496 360128 5170651 4ee5db vmlinux

Cc: Paul Jackson <[email protected]>
DESC
cpumask: uninline next_cpu()
EDESC
From: Andrew Morton <[email protected]>

text data bss dec hex filename
before: 3488027 1322496 360128 5170651 4ee5db vmlinux
after: 3485112 1322480 359968 5167560 4ed9c8 vmlinux

2931 bytes saved

Cc: Paul Jackson <[email protected]>
DESC
cpumask: uninline highest_possible_processor_id()
EDESC
From: Andrew Morton <[email protected]>

Shrinks the only caller (net/bridge/netfilter/ebtables.c) by 174 bytes.

Also, optimise highest_possible_processor_id() out of existence on
CONFIG_SMP=n.

Cc: Paul Jackson <[email protected]>
DESC
cpumask: uninline any_online_cpu()
EDESC
From: Andrew Morton <[email protected]>

text data bss dec hex filename
before: 3605597 1363528 363328 5332453 515de5 vmlinux
after: 3605295 1363612 363200 5332107 515c8b vmlinux

218 bytes saved.

Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n.

This function seems inefficient. Can't we simply AND the two masks, then use
find_first_bit()?

Cc: Paul Jackson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

include/linux/cpumask.h | 46 ++++++++++++++------------------------
lib/Makefile | 2 +
lib/cpumask.c | 45 +++++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 29 deletions(-)

diff -puN include/linux/cpumask.h~cpumask-uninline-first_cpu include/linux/cpumask.h
--- devel/include/linux/cpumask.h~cpumask-uninline-first_cpu 2006-03-19 14:26:35.000000000 -0800
+++ devel-akpm/include/linux/cpumask.h 2006-03-19 14:28:45.000000000 -0800
@@ -212,17 +212,15 @@ static inline void __cpus_shift_left(cpu
bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
}

-#define first_cpu(src) __first_cpu(&(src), NR_CPUS)
-static inline int __first_cpu(const cpumask_t *srcp, int nbits)
-{
- return min_t(int, nbits, find_first_bit(srcp->bits, nbits));
-}
-
-#define next_cpu(n, src) __next_cpu((n), &(src), NR_CPUS)
-static inline int __next_cpu(int n, const cpumask_t *srcp, int nbits)
-{
- return min_t(int, nbits, find_next_bit(srcp->bits, nbits, n+1));
-}
+#ifdef CONFIG_SMP
+int __first_cpu(const cpumask_t *srcp);
+#define first_cpu(src) __first_cpu(&(src))
+int __next_cpu(int n, const cpumask_t *srcp);
+#define next_cpu(n, src) __next_cpu((n), &(src))
+#else
+#define first_cpu(src) 0
+#define next_cpu(n, src) 1
+#endif

#define cpumask_of_cpu(cpu) \
({ \
@@ -398,27 +396,17 @@ extern cpumask_t cpu_present_map;
#define cpu_present(cpu) ((cpu) == 0)
#endif

-#define any_online_cpu(mask) \
-({ \
- int cpu; \
- for_each_cpu_mask(cpu, (mask)) \
- if (cpu_online(cpu)) \
- break; \
- cpu; \
-})
+#ifdef CONFIG_SMP
+int highest_possible_processor_id(void);
+#define any_online_cpu(mask) __any_online_cpu(&(mask))
+int __any_online_cpu(const cpumask_t *mask);
+#else
+#define highest_possible_processor_id() 0
+#define any_online_cpu(mask) 0
+#endif

#define for_each_cpu(cpu) for_each_cpu_mask((cpu), cpu_possible_map)
#define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map)
#define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map)

-/* Find the highest possible smp_processor_id() */
-#define highest_possible_processor_id() \
-({ \
- unsigned int cpu, highest = 0; \
- for_each_cpu_mask(cpu, cpu_possible_map) \
- highest = cpu; \
- highest; \
-})
-
-
#endif /* __LINUX_CPUMASK_H */
diff -puN /dev/null lib/cpumask.c
--- /dev/null 2003-09-15 06:40:47.000000000 -0700
+++ devel-akpm/lib/cpumask.c 2006-03-19 14:28:45.000000000 -0800
@@ -0,0 +1,45 @@
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <linux/cpumask.h>
+#include <linux/module.h>
+
+int __first_cpu(const cpumask_t *srcp)
+{
+ return min_t(int, NR_CPUS, find_first_bit(srcp->bits, NR_CPUS));
+}
+EXPORT_SYMBOL(__first_cpu);
+
+int __next_cpu(int n, const cpumask_t *srcp)
+{
+ return min_t(int, NR_CPUS, find_next_bit(srcp->bits, NR_CPUS, n+1));
+}
+EXPORT_SYMBOL(__next_cpu);
+
+/*
+ * Find the highest possible smp_processor_id()
+ *
+ * Note: if we're prepared to assume that cpu_possible_map never changes
+ * (reasonable) then this function should cache its return value.
+ */
+int highest_possible_processor_id(void)
+{
+ unsigned int cpu;
+ unsigned highest = 0;
+
+ for_each_cpu_mask(cpu, cpu_possible_map)
+ highest = cpu;
+ return highest;
+}
+EXPORT_SYMBOL(highest_possible_processor_id);
+
+int __any_online_cpu(const cpumask_t *mask)
+{
+ int cpu;
+
+ for_each_cpu_mask(cpu, *mask) {
+ if (cpu_online(cpu))
+ break;
+ }
+ return cpu;
+}
+EXPORT_SYMBOL(__any_online_cpu);
diff -puN lib/Makefile~cpumask-uninline-first_cpu lib/Makefile
--- devel/lib/Makefile~cpumask-uninline-first_cpu 2006-03-19 14:26:35.000000000 -0800
+++ devel-akpm/lib/Makefile 2006-03-19 14:26:35.000000000 -0800
@@ -7,6 +7,8 @@ lib-y := errno.o ctype.o string.o vsprin
idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
sha1.o

+lib-$(CONFIG_SMP) += cpumask.o
+
lib-y += kobject.o kref.o kobject_uevent.o klist.o

obj-y += sort.o parser.o halfmd4.o iomap_copy.o
_

2006-03-19 22:47:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Linus Torvalds wrote:
>
> How does this patch look? It makes "for_each_cpu()" look the same
> regardless of whether it is UP/SMP, by simply making first_cpu() and
> next_cpu() work appropriately.

It's still about 200 bytes larger than the inlined "smarter code", but
yeah, the uninlining is certainly the smaller change and doesn't depend
on the compiler being as smart.

Here's another uninlining patch if you want it.

It doesn't make much of a difference on x86 (since the ffs/fls things are
inlined to single instructions rather than making use of the generic
routines, so the fact that we now link against the generic versions
actually increases code size that isn't ever used), but if anybody is
collecting patches that remove unnecessary inlining, it definitely looks
like the right thing for v850/sh/sh64/arm etc that use the generic
versions..

Linus

---
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 208650b..2d3ed35 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -1,74 +1,28 @@
#ifndef _LINUX_BITOPS_H
#define _LINUX_BITOPS_H
+
+#include <linux/linkage.h>
#include <asm/types.h>

/*
* ffs: find first bit set. This is defined the same way as
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
+ *
+ * fls: find last bit set.
*/

-static inline int generic_ffs(int x)
-{
- int r = 1;
-
- if (!x)
- return 0;
- if (!(x & 0xffff)) {
- x >>= 16;
- r += 16;
- }
- if (!(x & 0xff)) {
- x >>= 8;
- r += 8;
- }
- if (!(x & 0xf)) {
- x >>= 4;
- r += 4;
- }
- if (!(x & 3)) {
- x >>= 2;
- r += 2;
- }
- if (!(x & 1)) {
- x >>= 1;
- r += 1;
- }
- return r;
-}
+extern int fastcall generic_ffs(int x);
+extern int fastcall generic_fls(int x);

/*
- * fls: find last bit set.
+ * hweightN: returns the hamming weight (i.e. the number
+ * of bits set) of a N-bit word
*/
-
-static __inline__ int generic_fls(int x)
-{
- int r = 32;
-
- if (!x)
- return 0;
- if (!(x & 0xffff0000u)) {
- x <<= 16;
- r -= 16;
- }
- if (!(x & 0xff000000u)) {
- x <<= 8;
- r -= 8;
- }
- if (!(x & 0xf0000000u)) {
- x <<= 4;
- r -= 4;
- }
- if (!(x & 0xc0000000u)) {
- x <<= 2;
- r -= 2;
- }
- if (!(x & 0x80000000u)) {
- x <<= 1;
- r -= 1;
- }
- return r;
-}
+unsigned int fastcall generic_hweight64(__u64 w);
+unsigned int fastcall generic_hweight32(unsigned int w);
+unsigned int fastcall generic_hweight16(unsigned int w);
+unsigned int fastcall generic_hweight8(unsigned int w);

/*
* Include this here because some architectures need generic_ffs/fls in
@@ -76,7 +30,6 @@ static __inline__ int generic_fls(int x)
*/
#include <asm/bitops.h>

-
static inline int generic_fls64(__u64 x)
{
__u32 h = x >> 32;
@@ -103,51 +56,6 @@ static __inline__ int get_count_order(un
return order;
}

-/*
- * hweightN: returns the hamming weight (i.e. the number
- * of bits set) of a N-bit word
- */
-
-static inline unsigned int generic_hweight32(unsigned int w)
-{
- unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
- res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
- res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
- res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
- return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
-}
-
-static inline unsigned int generic_hweight16(unsigned int w)
-{
- unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555);
- res = (res & 0x3333) + ((res >> 2) & 0x3333);
- res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F);
- return (res & 0x00FF) + ((res >> 8) & 0x00FF);
-}
-
-static inline unsigned int generic_hweight8(unsigned int w)
-{
- unsigned int res = (w & 0x55) + ((w >> 1) & 0x55);
- res = (res & 0x33) + ((res >> 2) & 0x33);
- return (res & 0x0F) + ((res >> 4) & 0x0F);
-}
-
-static inline unsigned long generic_hweight64(__u64 w)
-{
-#if BITS_PER_LONG < 64
- return generic_hweight32((unsigned int)(w >> 32)) +
- generic_hweight32((unsigned int)w);
-#else
- u64 res;
- res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul);
- res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul);
- res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful);
- res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul);
- res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul);
- return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul);
-#endif
-}
-
static inline unsigned long hweight_long(unsigned long w)
{
return sizeof(w) == 4 ? generic_hweight32(w) : generic_hweight64(w);
diff --git a/lib/Makefile b/lib/Makefile
index 648b2c1..bba36f1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -5,7 +5,7 @@
lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
- sha1.o
+ sha1.o bitops.o

lib-y += kobject.o kref.o kobject_uevent.o klist.o

diff --git a/lib/bitops.c b/lib/bitops.c
new file mode 100644
index 0000000..e5e4cab
--- /dev/null
+++ b/lib/bitops.c
@@ -0,0 +1,106 @@
+#include <linux/bitops.h>
+#include <linux/module.h>
+
+int generic_ffs(int x)
+{
+ int r = 1;
+
+ if (!x)
+ return 0;
+ if (!(x & 0xffff)) {
+ x >>= 16;
+ r += 16;
+ }
+ if (!(x & 0xff)) {
+ x >>= 8;
+ r += 8;
+ }
+ if (!(x & 0xf)) {
+ x >>= 4;
+ r += 4;
+ }
+ if (!(x & 3)) {
+ x >>= 2;
+ r += 2;
+ }
+ if (!(x & 1)) {
+ x >>= 1;
+ r += 1;
+ }
+ return r;
+}
+EXPORT_SYMBOL(generic_ffs);
+
+int generic_fls(int x)
+{
+ int r = 32;
+
+ if (!x)
+ return 0;
+ if (!(x & 0xffff0000u)) {
+ x <<= 16;
+ r -= 16;
+ }
+ if (!(x & 0xff000000u)) {
+ x <<= 8;
+ r -= 8;
+ }
+ if (!(x & 0xf0000000u)) {
+ x <<= 4;
+ r -= 4;
+ }
+ if (!(x & 0xc0000000u)) {
+ x <<= 2;
+ r -= 2;
+ }
+ if (!(x & 0x80000000u)) {
+ x <<= 1;
+ r -= 1;
+ }
+ return r;
+}
+EXPORT_SYMBOL(generic_fls);
+
+unsigned int generic_hweight32(unsigned int w)
+{
+ unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555);
+ res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+ res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F);
+ res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF);
+ return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF);
+}
+EXPORT_SYMBOL(generic_hweight32);
+
+unsigned int generic_hweight16(unsigned int w)
+{
+ unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555);
+ res = (res & 0x3333) + ((res >> 2) & 0x3333);
+ res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F);
+ return (res & 0x00FF) + ((res >> 8) & 0x00FF);
+}
+EXPORT_SYMBOL(generic_hweight16);
+
+unsigned int generic_hweight8(unsigned int w)
+{
+ unsigned int res = (w & 0x55) + ((w >> 1) & 0x55);
+ res = (res & 0x33) + ((res >> 2) & 0x33);
+ return (res & 0x0F) + ((res >> 4) & 0x0F);
+}
+EXPORT_SYMBOL(generic_hweight8);
+
+unsigned int generic_hweight64(__u64 w)
+{
+#if BITS_PER_LONG < 64
+ return generic_hweight32((unsigned int)(w >> 32)) +
+ generic_hweight32((unsigned int)w);
+#else
+ u64 res;
+ res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul);
+ res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul);
+ res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful);
+ res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul);
+ res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul);
+ return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul);
+#endif
+}
+EXPORT_SYMBOL(generic_hweight64);

2006-03-19 22:55:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Sun, 19 Mar 2006, Andrew Morton wrote:
>
> Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n.
>
> This function seems inefficient. Can't we simply AND the two masks, then use
> find_first_bit()?

Then you'd need to generate a temporary cpumask thing. Not a big deal as
long as it fits in an "unsigned long", but since the online-cpu thing is
likely dense in any relevant cpu-mask, I actually think "any_online_cpu()"
as it stands now is likely to be simpler/more efficient than generating a
temporary mask.
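The two strategies, reduced to single-word masks (hypothetical helpers, not the kernel API; `__builtin_ctzl` is a gcc builtin standing in here for find_first_bit): the AND version materializes a temporary mask, the scan tests each candidate bit in turn.

```c
#include <assert.h>

/* Andrew's suggestion: AND the masks into a temporary, then find the
 * first set bit.  Returns -1 if the masks share no bit. */
static int first_common_bit_and(unsigned long a, unsigned long b)
{
    unsigned long both = a & b;          /* the temporary mask */

    return both ? __builtin_ctzl(both) : -1;
}

/* The existing shape: walk the bits of one mask, test membership in
 * the other, and stop at the first hit. */
static int first_common_bit_scan(unsigned long a, unsigned long b)
{
    for (unsigned i = 0; i < 8 * sizeof(unsigned long); i++)
        if (((a >> i) & 1) && ((b >> i) & 1))
            return (int)i;
    return -1;
}
```

For a dense online mask the scan typically hits on the first candidate bit or two, which is the argument for leaving it as is.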

Linus

2006-03-19 23:08:06

by Andrew Morton

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

Linus Torvalds <[email protected]> wrote:
>
> Here's another uninlining patch if you want it.
>

Yes, the bitops need that.

We have a 50-patch series against the bitops code queued up. It's from
Akinobu Mita. It consolidates all the C-coded operations which
architectures are presently duplicating. In toto:

80 files changed, 1271 insertions(+), 4999 deletions(-)

It does include uninlining of the hweight functions, although I note that
include/asm-generic/bitops/__ffs.h remains inlined. I don't know how many
architectures are using the generic code though.

2006-03-20 06:12:41

by Willy Tarreau

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Sun, Mar 19, 2006 at 11:33:17AM -0800, Linus Torvalds wrote:

> Of course, I shouldn't say "works", since it is still totally untested. It
> _looks_ good, and that statement expression usage is just _so_ ugly it's
> cute.
>
> Linus

At least, you could have moved the macro closer to where it's used.
It's very uncommon to break out of a loop from within an if condition,
and it's not expected that the macro you're calling does a break under
you. It took me several minutes to understand precisely how this
works. Now it seems trivial, but I guess that at 3am I would have
gone to bed instead.

A first enhancement would be to make it easier to understand
by putting it closer to its user:

#elif NR_CPUS > 1
#define check_for_each_cpu(cpu, mask) \
({ unsigned long __bits = (mask).bits[0] >> (cpu); if (!__bits) break; __bits & 1; })

#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
if (!check_for_each_cpu(cpu, mask)) \
continue; \
else

Now, does removing the macro completely change the output code?
I think that if something written like this produces the same
code, it would be easier to read:

#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \
unsigned long __bits = (mask).bits[0] >> (cpu); \
if (!__bits) \
break; \
if (!(__bits & 1)) \
continue; \
else

Regards,
Willy

2006-03-20 06:29:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative



On Mon, 20 Mar 2006, Willy Tarreau wrote:
>
> Now, does removing the macro completely change the output code ?
> I think that if something written like this produces the same
> code, it would be easier to read :
>
> #define for_each_cpu_mask(cpu, mask) \
> for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \
> unsigned long __bits = (mask).bits[0] >> (cpu); \
> if (!__bits) \
> break; \
> if (!__bits & 1) \
> continue; \
> else

Absolutely, but now it has a dangling "{" that didn't get closed. So the
above would definitely be more readable, it just doesn't actually work.

Unless you'd do the "end_for_each_cpu" define (to close the statement),
and update the 300+ places that use this. Which might well be worth it.

So the subtle "break from the middle of a statement expression" was just a
rather hacky way to avoid having to change all the users of this macro.

Linus

2006-03-20 07:20:42

by Willy Tarreau

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Sun, Mar 19, 2006 at 10:26:30PM -0800, Linus Torvalds wrote:
>
>
> On Mon, 20 Mar 2006, Willy Tarreau wrote:
> >
> > Now, does removing the macro completely change the output code ?
> > I think that if something written like this produces the same
> > code, it would be easier to read :
> >
> > #define for_each_cpu_mask(cpu, mask) \
> > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \
> > unsigned long __bits = (mask).bits[0] >> (cpu); \
> > if (!__bits) \
> > break; \
> > if (!__bits & 1) \
> > continue; \
> > else
>
> Absolutely, but now it has a dangling "{" that didn't get closed. So the
> above would definitely be more readable, it just doesn't actually work.
>
> Unless you'd do the "end_for_each_cpu" define (to close the statement),
> and update the 300+ places that use this. Which might well be worth it.
>
> So the subtle "break from the middle of a statement expression" was just a
> rather hacky way to avoid having to change all the users of this macro.
>
> Linus

Oh, you're right, now I understand your motivation in doing this.
Then perhaps using your trick but applying it to the whole for loop
would make it easier to read?

#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
({ unsigned long __bits = (mask).bits[0] >> (cpu); \
if (!__bits) \
break; \
if (!(__bits & 1)) \
continue; \
else \
... \
})

Please note that I've not read the rest of the code, so there
may be some problems left. However, if the above works, I find
it easier to read. And in this case, yes, it's interesting to
be able to break from within an expression.

Cheers,
Willy

2006-03-20 08:22:12

by Peter Breuer

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

In article <[email protected]> you wrote:
> On Sun, Mar 19, 2006 at 11:33:17AM -0800, Linus Torvalds wrote:
>> Of course, I shouldn't say "works", since it is still totally untested. It
>> _looks_ good, and that statement expression usage is just _so_ ugly it's
>> cute.

> #define for_each_cpu_mask(cpu, mask) \
> for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \
> unsigned long __bits = (mask).bits[0] >> (cpu); \
> if (!__bits) \
> break; \
> if (!__bits & 1) \
> continue; \
> else

While that's slightly wrong (it needs a closing } supplied by the user), it
does inspire me to point out that one can put the skips in the ordinary
statement part of the for, and use the if/else idea to make sure that the
for needs just one statement following (i.e. no dangling } supplied by
the user):

#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
if (!((mask).bits[0] >> (cpu))) { \
break; \
} else if (!(((mask).bits[0] >> (cpu)) & 1)) { \
continue; \
} else

I expect common subexpression optimization in the compiler to remove the
repetition here. If not, somebody else can think about it!

Peter

2006-03-21 06:36:31

by Willy Tarreau

[permalink] [raw]
Subject: Re: OOPS: 2.6.16-rc6 cpufreq_conservative

On Mon, Mar 20, 2006 at 08:18:46AM +0100, Willy Tarreau wrote:
> On Sun, Mar 19, 2006 at 10:26:30PM -0800, Linus Torvalds wrote:
> >
> >
> > On Mon, 20 Mar 2006, Willy Tarreau wrote:
> > >
> > > Now, does removing the macro completely change the output code ?
> > > I think that if something written like this produces the same
> > > code, it would be easier to read :
> > >
> > > #define for_each_cpu_mask(cpu, mask) \
> > > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \
> > > unsigned long __bits = (mask).bits[0] >> (cpu); \
> > > if (!__bits) \
> > > break; \
> > > if (!(__bits & 1)) \
> > > continue; \
> > > else
> >
> > Absolutely, but now it has a dangling "{" that didn't get closed. So the
> > above would definitely be more readable, it just doesn't actually work.
> >
> > Unless you'd do the "end_for_each_cpu" define (to close the statement),
> > and update the 300+ places that use this. Which might well be worth it.
> >
> > So the subtle "break from the middle of a statement expression" was just a
> > rather hacky way to avoid having to change all the users of this macro.
> >
> > Linus
>
> Oh, you're right, now I understand your motivation in doing this.
> Then perhaps using your trick but applying it to the whole for loop
> would make it easier to read ?
>
> #define for_each_cpu_mask(cpu, mask) \
> for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
> ({ unsigned long __bits = (mask).bits[0] >> (cpu); \
> if (!__bits) \
> break; \
> if (!(__bits & 1)) \
> continue; \
> else \
> ... \
> })

Well, forget it!
I did not realize that the 'else' was what carried the inner loop body.
So this construct is not possible either, for the same reason.

Sorry for not having read the code before posting.

Cheers,
Willy