2009-04-04 14:34:31

by Masami Hiramatsu

[permalink] [raw]
Subject: [BUG][-tip] kprobes on module functions hits kernel BUG in text_poke on x86-32

Hi,

I found text_poke() problem on x86-32 with the latest-tip tree.
When I put a kprobe on a module function, text_poke() hit a BUG.

This bug can be reproduced on x86-32, but not on x86-64.
And inserting kprobes on a kernel-core function is OK.

Thank you,

------------[ cut here ]------------
kernel BUG at /home/mhiramat/ksrc/linux-2.6-tip/arch/x86/kernel/alternative.c:543!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth0/broadcast
Modules linked in: probe_bench(+) netconsole configfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 dm_mirror
dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore dcdbas
i2c_nforce2 pcspkr tg3 snd_page_alloc rtc_cmos rtc_core i2c_core libphy rtc_lib ata_generic pata_acpi sata_nv [last
unloaded: scsi_wait_scan]

Pid: 5411, comm: insmod Not tainted (2.6.29-tip #8) OptiPlex 740
EIP: 0060:[<c06e97ef>] EFLAGS: 00210893 CPU: 0
EIP is at text_poke+0x168/0x1a4
EAX: 00040f55 EBX: 00020800 ECX: f45d8f03 EDX: 00000000
ESI: f45d8f04 EDI: ffc58001 EBP: f45d8ef8 ESP: f45d8ed8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process insmod (pid: 5411, ti=f45d8000 task=f4490000 task.ti=f45d8000)
Stack:
00000001 f45d8f03 f8464000 00200286 00000000 00000000 00000000 f846492c
f45d8f04 c06ea30d cc46492c f45d8f30 c06eb4bb 000000d8 f8464934 f84647e4
f84647e4 00936fc0 f45d8f30 f84647e4 fffffffc 00936fc0 f45d8f40 f8464110
Call Trace:
[<f8464000>] ? dummy_function+0x0/0xb [probe_bench]
[<c06ea30d>] ? arch_arm_kprobe+0x1a/0x1c
[<c06eb4bb>] ? register_kprobe+0x3b8/0x40a
[<f8464110>] ? install_probe+0x31/0x13d [probe_bench]
[<c040304f>] ? do_one_initcall+0x4a/0x11a
[<f84640df>] ? install_probe+0x0/0x13d [probe_bench]
[<c044b8fe>] ? up_read+0x16/0x2c
[<c044c14e>] ? __blocking_notifier_call_chain+0x40/0x4c
[<c045e5e4>] ? sys_init_module+0x89/0x18c
[<c0407b44>] ? sysenter_do_call+0x12/0x38
Code: 00 00 6a 00 ff 15 04 83 85 c0 58 5a e8 86 82 d3 ff 90 b8 01 00 00 00 0f a2 31 d2 eb 13 8b 4d e8 8a 04 11 8b 4d e4
3a 04 11 74 04 <0f> 0b eb fe 42 3b 55 e0 72 e8 f7 45 ec 00 02 00 00 75 10 8b 45
EIP: [<c06e97ef>] text_poke+0x168/0x1a4 SS:ESP 0068:f45d8ed8
---[ end trace 12f1ca8c7f7964a0 ]---
Kernel panic - not syncing: Fatal exception
Pid: 5411, comm: insmod Tainted: G D 2.6.29-tip #8
Call Trace:
[<c06e6081>] ? printk+0xf/0x16
[<c06e5fc8>] panic+0x39/0xe3
[<c06e9526>] oops_end+0x96/0xa5
[<c040a486>] die+0x54/0x5a
[<c06e8e43>] do_trap+0x89/0xa2
[<c0408bbe>] ? do_invalid_op+0x0/0x7b
[<c0408c2f>] do_invalid_op+0x71/0x7b
[<c06e97ef>] ? text_poke+0x168/0x1a4
[<c045567d>] ? mark_lock+0x1e/0x1e0
[<c042672c>] ? set_pte_vaddr+0xac/0xcf
[<c054d6fc>] ? trace_hardirqs_off_thunk+0xc/0x10
[<c06e8bfa>] error_code+0x72/0x78
[<c0408bbe>] ? do_invalid_op+0x0/0x7b
[<c06e97ef>] ? text_poke+0x168/0x1a4
[<f8464000>] ? dummy_function+0x0/0xb [probe_bench]
[<c06ea30d>] arch_arm_kprobe+0x1a/0x1c
[<c06eb4bb>] register_kprobe+0x3b8/0x40a
[<f8464110>] install_probe+0x31/0x13d [probe_bench]
[<c040304f>] do_one_initcall+0x4a/0x11a
[<f84640df>] ? install_probe+0x0/0x13d [probe_bench]
[<c044b8fe>] ? up_read+0x16/0x2c
[<c044c14e>] ? __blocking_notifier_call_chain+0x40/0x4c
[<c045e5e4>] sys_init_module+0x89/0x18c
[<c0407b44>] sysenter_do_call+0x12/0x38
Rebooting in 5 seconds..

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]


2009-04-04 15:42:46

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [BUG][-tip] kprobes on module functions hits kernel BUG in text_poke on x86-32

* Masami Hiramatsu ([email protected]) wrote:
> Hi,
>
> I found text_poke() problem on x86-32 with the latest-tip tree.
> When I put a kprobe on a module function, text_poke() hit a BUG.
>
> This bug can be reproduced on x86-32, but not on x86-64.
> And inserting kprobes on a kernel-core function is OK.
>
> Thank you,
>

Hi Masami,

OK, so text_poke safety net saves the day :)

Basically, what we have here is the BUG_ON I have put :

for (i = 0; i < len; i++)
BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);

Which checks that the modification is really preceivable in the kernel
code, triggers this bug. Only for modules you say.

It might not be this, but.. let's try something simple (this could be
completely unrelated, but won't take long to test): can you try to add a
vmalloc_sync_all() at the beginning of text_poke ? This would make sure
that lazily-populated TLB entries, which include module code and data on
x86, will be populated. I wonder if we hit this problem because
vmalloc_to_page would be returning a mapping to a yet unpopulated TLB
entry, if it is ever possible.

If that's not this, then I guess we have some problem with setting a
fixmap to a page returned by vmalloc on x86 32.

Mathieu


> ------------[ cut here ]------------
> kernel BUG at /home/mhiramat/ksrc/linux-2.6-tip/arch/x86/kernel/alternative.c:543!
> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth0/broadcast
> Modules linked in: probe_bench(+) netconsole configfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
> nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 dm_mirror
> dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore dcdbas
> i2c_nforce2 pcspkr tg3 snd_page_alloc rtc_cmos rtc_core i2c_core libphy rtc_lib ata_generic pata_acpi sata_nv [last
> unloaded: scsi_wait_scan]
>
> Pid: 5411, comm: insmod Not tainted (2.6.29-tip #8) OptiPlex 740
> EIP: 0060:[<c06e97ef>] EFLAGS: 00210893 CPU: 0
> EIP is at text_poke+0x168/0x1a4
> EAX: 00040f55 EBX: 00020800 ECX: f45d8f03 EDX: 00000000
> ESI: f45d8f04 EDI: ffc58001 EBP: f45d8ef8 ESP: f45d8ed8
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process insmod (pid: 5411, ti=f45d8000 task=f4490000 task.ti=f45d8000)
> Stack:
> 00000001 f45d8f03 f8464000 00200286 00000000 00000000 00000000 f846492c
> f45d8f04 c06ea30d cc46492c f45d8f30 c06eb4bb 000000d8 f8464934 f84647e4
> f84647e4 00936fc0 f45d8f30 f84647e4 fffffffc 00936fc0 f45d8f40 f8464110
> Call Trace:
> [<f8464000>] ? dummy_function+0x0/0xb [probe_bench]
> [<c06ea30d>] ? arch_arm_kprobe+0x1a/0x1c
> [<c06eb4bb>] ? register_kprobe+0x3b8/0x40a
> [<f8464110>] ? install_probe+0x31/0x13d [probe_bench]
> [<c040304f>] ? do_one_initcall+0x4a/0x11a
> [<f84640df>] ? install_probe+0x0/0x13d [probe_bench]
> [<c044b8fe>] ? up_read+0x16/0x2c
> [<c044c14e>] ? __blocking_notifier_call_chain+0x40/0x4c
> [<c045e5e4>] ? sys_init_module+0x89/0x18c
> [<c0407b44>] ? sysenter_do_call+0x12/0x38
> Code: 00 00 6a 00 ff 15 04 83 85 c0 58 5a e8 86 82 d3 ff 90 b8 01 00 00 00 0f a2 31 d2 eb 13 8b 4d e8 8a 04 11 8b 4d e4
> 3a 04 11 74 04 <0f> 0b eb fe 42 3b 55 e0 72 e8 f7 45 ec 00 02 00 00 75 10 8b 45
> EIP: [<c06e97ef>] text_poke+0x168/0x1a4 SS:ESP 0068:f45d8ed8
> ---[ end trace 12f1ca8c7f7964a0 ]---
> Kernel panic - not syncing: Fatal exception
> Pid: 5411, comm: insmod Tainted: G D 2.6.29-tip #8
> Call Trace:
> [<c06e6081>] ? printk+0xf/0x16
> [<c06e5fc8>] panic+0x39/0xe3
> [<c06e9526>] oops_end+0x96/0xa5
> [<c040a486>] die+0x54/0x5a
> [<c06e8e43>] do_trap+0x89/0xa2
> [<c0408bbe>] ? do_invalid_op+0x0/0x7b
> [<c0408c2f>] do_invalid_op+0x71/0x7b
> [<c06e97ef>] ? text_poke+0x168/0x1a4
> [<c045567d>] ? mark_lock+0x1e/0x1e0
> [<c042672c>] ? set_pte_vaddr+0xac/0xcf
> [<c054d6fc>] ? trace_hardirqs_off_thunk+0xc/0x10
> [<c06e8bfa>] error_code+0x72/0x78
> [<c0408bbe>] ? do_invalid_op+0x0/0x7b
> [<c06e97ef>] ? text_poke+0x168/0x1a4
> [<f8464000>] ? dummy_function+0x0/0xb [probe_bench]
> [<c06ea30d>] arch_arm_kprobe+0x1a/0x1c
> [<c06eb4bb>] register_kprobe+0x3b8/0x40a
> [<f8464110>] install_probe+0x31/0x13d [probe_bench]
> [<c040304f>] do_one_initcall+0x4a/0x11a
> [<f84640df>] ? install_probe+0x0/0x13d [probe_bench]
> [<c044b8fe>] ? up_read+0x16/0x2c
> [<c044c14e>] ? __blocking_notifier_call_chain+0x40/0x4c
> [<c045e5e4>] sys_init_module+0x89/0x18c
> [<c0407b44>] sysenter_do_call+0x12/0x38
> Rebooting in 5 seconds..
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-04-04 18:28:39

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUG][-tip] kprobes on module functions hits kernel BUG in text_poke on x86-32

Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Hi,
>>
>> I found text_poke() problem on x86-32 with the latest-tip tree.
>> When I put a kprobe on a module function, text_poke() hit a BUG.
>>
>> This bug can be reproduced on x86-32, but not on x86-64.
>> And inserting kprobes on a kernel-core function is OK.
>>
>> Thank you,
>>
>
> Hi Masami,
>
> OK, so text_poke safety net saves the day :)
>
> Basically, what we have here is the BUG_ON I have put :
>
> for (i = 0; i < len; i++)
> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
>
> Which checks that the modification is really preceivable in the kernel
> code, triggers this bug. Only for modules you say.
>
> It might not be this, but.. let's try something simple (this could be
> completely unrelated, but won't take long to test): can you try to add a
> vmalloc_sync_all() at the beginning of text_poke ? This would make sure
> that lazily-populated TLB entries, which include module code and data on
> x86, will be populated. I wonder if we hit this problem because
> vmalloc_to_page would be returning a mapping to a yet unpopulated TLB
> entry, if it is ever possible.

Hmm, from the result of my test, vmalloc_sync_all() didn't change anything...

> If that's not this, then I guess we have some problem with setting a
> fixmap to a page returned by vmalloc on x86 32.

I investigate it a bit deeper. I compared fixmap's page* and original
which vmalloc_to_page returns(because vmalloc_to_page just decode
current pagetable).

I added a printk right after set_fixmap, which shows below message.

fixmap:<fixmap's vaddr>:<page* by vmalloc_to_page>, \
orig:<original vaddr>:<page* passed to fixmap>

When I probe a module address, I got this;
fixmap:ffc58000:c1db1ae4, orig:f84a1000:c59b1ae4

on the other hand, when probing a kernel addrees, I got this;
fixmap:ffc58000:c129e01c, orig:c048946a:c129e01c

I guess this means that set_fixmap didn't set a correct page or
page_to_phys() returned incorrect phys address.

Thank you,

>
> Mathieu
>
>
>> ------------[ cut here ]------------
>> kernel BUG at /home/mhiramat/ksrc/linux-2.6-tip/arch/x86/kernel/alternative.c:543!
>> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>> last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth0/broadcast
>> Modules linked in: probe_bench(+) netconsole configfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
>> nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 dm_mirror
>> dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy
>> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore dcdbas
>> i2c_nforce2 pcspkr tg3 snd_page_alloc rtc_cmos rtc_core i2c_core libphy rtc_lib ata_generic pata_acpi sata_nv [last
>> unloaded: scsi_wait_scan]
>>
>> Pid: 5411, comm: insmod Not tainted (2.6.29-tip #8) OptiPlex 740
>> EIP: 0060:[<c06e97ef>] EFLAGS: 00210893 CPU: 0
>> EIP is at text_poke+0x168/0x1a4
>> EAX: 00040f55 EBX: 00020800 ECX: f45d8f03 EDX: 00000000
>> ESI: f45d8f04 EDI: ffc58001 EBP: f45d8ef8 ESP: f45d8ed8
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> Process insmod (pid: 5411, ti=f45d8000 task=f4490000 task.ti=f45d8000)
>> Stack:
>> 00000001 f45d8f03 f8464000 00200286 00000000 00000000 00000000 f846492c
>> f45d8f04 c06ea30d cc46492c f45d8f30 c06eb4bb 000000d8 f8464934 f84647e4
>> f84647e4 00936fc0 f45d8f30 f84647e4 fffffffc 00936fc0 f45d8f40 f8464110
>> Call Trace:
>> [<f8464000>] ? dummy_function+0x0/0xb [probe_bench]
>> [<c06ea30d>] ? arch_arm_kprobe+0x1a/0x1c
>> [<c06eb4bb>] ? register_kprobe+0x3b8/0x40a
>> [<f8464110>] ? install_probe+0x31/0x13d [probe_bench]
>> [<c040304f>] ? do_one_initcall+0x4a/0x11a
>> [<f84640df>] ? install_probe+0x0/0x13d [probe_bench]
>> [<c044b8fe>] ? up_read+0x16/0x2c
>> [<c044c14e>] ? __blocking_notifier_call_chain+0x40/0x4c
>> [<c045e5e4>] ? sys_init_module+0x89/0x18c
>> [<c0407b44>] ? sysenter_do_call+0x12/0x38
>> Code: 00 00 6a 00 ff 15 04 83 85 c0 58 5a e8 86 82 d3 ff 90 b8 01 00 00 00 0f a2 31 d2 eb 13 8b 4d e8 8a 04 11 8b 4d e4
>> 3a 04 11 74 04 <0f> 0b eb fe 42 3b 55 e0 72 e8 f7 45 ec 00 02 00 00 75 10 8b 45
>> EIP: [<c06e97ef>] text_poke+0x168/0x1a4 SS:ESP 0068:f45d8ed8
>> ---[ end trace 12f1ca8c7f7964a0 ]---
>> Kernel panic - not syncing: Fatal exception
>> Pid: 5411, comm: insmod Tainted: G D 2.6.29-tip #8
>> Call Trace:
>> [<c06e6081>] ? printk+0xf/0x16
>> [<c06e5fc8>] panic+0x39/0xe3
>> [<c06e9526>] oops_end+0x96/0xa5
>> [<c040a486>] die+0x54/0x5a
>> [<c06e8e43>] do_trap+0x89/0xa2
>> [<c0408bbe>] ? do_invalid_op+0x0/0x7b
>> [<c0408c2f>] do_invalid_op+0x71/0x7b
>> [<c06e97ef>] ? text_poke+0x168/0x1a4
>> [<c045567d>] ? mark_lock+0x1e/0x1e0
>> [<c042672c>] ? set_pte_vaddr+0xac/0xcf
>> [<c054d6fc>] ? trace_hardirqs_off_thunk+0xc/0x10
>> [<c06e8bfa>] error_code+0x72/0x78
>> [<c0408bbe>] ? do_invalid_op+0x0/0x7b
>> [<c06e97ef>] ? text_poke+0x168/0x1a4
>> [<f8464000>] ? dummy_function+0x0/0xb [probe_bench]
>> [<c06ea30d>] arch_arm_kprobe+0x1a/0x1c
>> [<c06eb4bb>] register_kprobe+0x3b8/0x40a
>> [<f8464110>] install_probe+0x31/0x13d [probe_bench]
>> [<c040304f>] do_one_initcall+0x4a/0x11a
>> [<f84640df>] ? install_probe+0x0/0x13d [probe_bench]
>> [<c044b8fe>] ? up_read+0x16/0x2c
>> [<c044c14e>] ? __blocking_notifier_call_chain+0x40/0x4c
>> [<c045e5e4>] sys_init_module+0x89/0x18c
>> [<c0407b44>] sysenter_do_call+0x12/0x38
>> Rebooting in 5 seconds..
>>
>> --
>> Masami Hiramatsu
>>
>> Software Engineer
>> Hitachi Computer Products (America) Inc.
>> Software Solutions Division
>>
>> e-mail: [email protected]
>>
>>
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-04 19:03:50

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUG][-tip] kprobes on module functions hits kernel BUG in text_poke on x86-32

Masami Hiramatsu wrote:
> Mathieu Desnoyers wrote:
>> * Masami Hiramatsu ([email protected]) wrote:
>>> Hi,
>>>
>>> I found text_poke() problem on x86-32 with the latest-tip tree.
>>> When I put a kprobe on a module function, text_poke() hit a BUG.
>>>
>>> This bug can be reproduced on x86-32, but not on x86-64.
>>> And inserting kprobes on a kernel-core function is OK.
>>>
>>> Thank you,
>>>
>> Hi Masami,
>>
>> OK, so text_poke safety net saves the day :)
>>
>> Basically, what we have here is the BUG_ON I have put :
>>
>> for (i = 0; i < len; i++)
>> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
>>
>> Which checks that the modification is really preceivable in the kernel
>> code, triggers this bug. Only for modules you say.
>>
>> It might not be this, but.. let's try something simple (this could be
>> completely unrelated, but won't take long to test): can you try to add a
>> vmalloc_sync_all() at the beginning of text_poke ? This would make sure
>> that lazily-populated TLB entries, which include module code and data on
>> x86, will be populated. I wonder if we hit this problem because
>> vmalloc_to_page would be returning a mapping to a yet unpopulated TLB
>> entry, if it is ever possible.
>
> Hmm, from the result of my test, vmalloc_sync_all() didn't change anything...
>
>> If that's not this, then I guess we have some problem with setting a
>> fixmap to a page returned by vmalloc on x86 32.
>
> I investigate it a bit deeper. I compared fixmap's page* and original
> which vmalloc_to_page returns(because vmalloc_to_page just decode
> current pagetable).
>
> I added a printk right after set_fixmap, which shows below message.
>
> fixmap:<fixmap's vaddr>:<page* by vmalloc_to_page>, \
> orig:<original vaddr>:<page* passed to fixmap>
>
> When I probe a module address, I got this;
> fixmap:ffc58000:c1db1ae4, orig:f84a1000:c59b1ae4
>
> on the other hand, when probing a kernel addrees, I got this;
> fixmap:ffc58000:c129e01c, orig:c048946a:c129e01c
>
> I guess this means that set_fixmap didn't set a correct page or
> page_to_phys() returned incorrect phys address.

Oops, I found a funny truth,

fixmap:ffc58000:c1db1670, orig:f83fb000:c59b1670
orig page c59b1670, phys 12f8a4000
fixmap page c1db1670, phys 2f8a4000

page means (struct page *) value, and phys means
its physical address.

You can see set_fixmap() cuts higher bits than 32 bit in fixmap.h

----
void native_set_fixmap(enum fixed_addresses idx,
unsigned long phys, pgprot_t flags);

#ifndef CONFIG_PARAVIRT
static inline void __set_fixmap(enum fixed_addresses idx,
unsigned long phys, pgprot_t flags)
{
native_set_fixmap(idx, phys, flags);
}
#endif
---

in x86-64, unsigned long is 64bit, so it can handle highmem. So,
this problem never happens on x86-64.

Ingo, would you think we can expand phys to unsigned long long or
somesush in fixmap.h?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-05 03:46:22

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUG][-tip] kprobes on module functions hits kernel BUG in text_poke on x86-32

Masami Hiramatsu wrote:
> Masami Hiramatsu wrote:
>> Mathieu Desnoyers wrote:
>>> * Masami Hiramatsu ([email protected]) wrote:
>>>> Hi,
>>>>
>>>> I found text_poke() problem on x86-32 with the latest-tip tree.
>>>> When I put a kprobe on a module function, text_poke() hit a BUG.
>>>>
>>>> This bug can be reproduced on x86-32, but not on x86-64.
>>>> And inserting kprobes on a kernel-core function is OK.
>>>>
>>>> Thank you,
>>>>
>>> Hi Masami,
>>>
>>> OK, so text_poke safety net saves the day :)
>>>
>>> Basically, what we have here is the BUG_ON I have put :
>>>
>>> for (i = 0; i < len; i++)
>>> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
>>>
>>> Which checks that the modification is really preceivable in the kernel
>>> code, triggers this bug. Only for modules you say.
>>>
>>> It might not be this, but.. let's try something simple (this could be
>>> completely unrelated, but won't take long to test): can you try to add a
>>> vmalloc_sync_all() at the beginning of text_poke ? This would make sure
>>> that lazily-populated TLB entries, which include module code and data on
>>> x86, will be populated. I wonder if we hit this problem because
>>> vmalloc_to_page would be returning a mapping to a yet unpopulated TLB
>>> entry, if it is ever possible.
>> Hmm, from the result of my test, vmalloc_sync_all() didn't change anything...
>>
>>> If that's not this, then I guess we have some problem with setting a
>>> fixmap to a page returned by vmalloc on x86 32.

Hmm, ok. AFAICS, fixmap is only for lowmem, and pkmap is only for highmem.

So, I think we have some options;

A) Separate text_poke into __text_poke and __text_poke_highmem. And
use pkmap_atomic in __text_poke_highmem. This way doesn't require
any additional change except adding KM_TEXT_POKE0/1 in km_type.

B) Add set_fixmap_page and use it in text_poke. This will require
changes in paravirt_ops and pgtable.c. We need to ensure there is
no side effects.

C) Change pkmap_atomic_prot to map lowmem only if the page's pgprot
is different from user specified pgprot. And use it instead of
fixmap. This also requires KM_TEXT_POKE0/1, however we can
remove FIX_TEXT_POKE0/1.

etc...

I think A) is for short-term solution. I guess it will be acceptable
for next release. But for long-term, C) might be better.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-05 03:49:42

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUG][-tip] kprobes on module functions hits kernel BUG in text_poke on x86-32

Masami Hiramatsu wrote:
> Hmm, ok. AFAICS, fixmap is only for lowmem, and pkmap is only for highmem.

Oops, I mean kmap...


> So, I think we have some options;
>
> A) Separate text_poke into __text_poke and __text_poke_highmem. And
> use pkmap_atomic in __text_poke_highmem. This way doesn't require
> any additional change except adding KM_TEXT_POKE0/1 in km_type.
>
> B) Add set_fixmap_page and use it in text_poke. This will require
> changes in paravirt_ops and pgtable.c. We need to ensure there is
> no side effects.
>
> C) Change pkmap_atomic_prot to map lowmem only if the page's pgprot
> is different from user specified pgprot. And use it instead of
> fixmap. This also requires KM_TEXT_POKE0/1, however we can
> remove FIX_TEXT_POKE0/1.

same, s/pkmap/kmap/g


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-06 17:10:49

by Masami Hiramatsu

[permalink] [raw]
Subject: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages

Fix a bug in text_poke to handle highmem pages, because module
text pages are possible to be highmem pages on x86-32.
In that case, since fixmap can't handle those pages, text_poke
uses kmap_atomic.

Moreover, module_text pages will be discontinuous and it is
possible that one page is lowmem and another page is highmem,
because it is allocated by vmalloc. To handle this situation,
text_poke splits poke area into two pieces and write the areas
page by page.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---

Hi Ingo and Mathieu,

Masami Hiramatsu wrote:
> A) Separate text_poke into __text_poke and __text_poke_highmem. And
> use pkmap_atomic in __text_poke_highmem. This way doesn't require
> any additional change except adding KM_TEXT_POKE0/1 in km_type.

Here is A) patch.

Thank you,

arch/x86/include/asm/fixmap.h | 3 +-
arch/x86/include/asm/kmap_types.h | 3 ++
arch/x86/kernel/alternative.c | 45 ++++++++++++++++++++++++-------------
3 files changed, 32 insertions(+), 19 deletions(-)


diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 81937a5..6c59d4a 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -111,8 +111,7 @@ enum fixed_addresses {
#ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
#endif
- FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
- FIX_TEXT_POKE1,
+ FIX_TEXT_POKE, /* reserve 1 page for text_poke() */
__end_of_permanent_fixed_addresses,
#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
FIX_OHCI1394_BASE,
diff --git a/arch/x86/include/asm/kmap_types.h b/arch/x86/include/asm/kmap_types.h
index 5759c16..e48f11f 100644
--- a/arch/x86/include/asm/kmap_types.h
+++ b/arch/x86/include/asm/kmap_types.h
@@ -21,7 +21,8 @@ D(9) KM_IRQ0,
D(10) KM_IRQ1,
D(11) KM_SOFTIRQ0,
D(12) KM_SOFTIRQ1,
-D(13) KM_TYPE_NR
+D(13) KM_TEXT_POKE,
+D(14) KM_TYPE_NR
};

#undef D
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f576587..f4487c9 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -6,6 +6,7 @@
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/memory.h>
+#include <linux/highmem.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -514,27 +515,39 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
unsigned long flags;
char *vaddr;
- struct page *pages[2];
+ struct page *page;
int i;
+ unsigned long endp = ((unsigned long)addr + len) & PAGE_MASK;

- if (!core_kernel_text((unsigned long)addr)) {
- pages[0] = vmalloc_to_page(addr);
- pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
- } else {
- pages[0] = virt_to_page(addr);
- WARN_ON(!PageReserved(pages[0]));
- pages[1] = virt_to_page(addr + PAGE_SIZE);
+ /*
+ * If the written range covers 2 pages, we'll split it, because
+ * vmalloc pages are not always continuous -- e.g. 1st page is
+ * lowmem and 2nd page is highmem.
+ */
+ if (((unsigned long)addr & PAGE_MASK) != endp) {
+ text_poke(addr, opcode, endp - (unsigned long)addr);
+ addr = (void *)endp;
+ opcode = (char *)opcode + (endp - (unsigned long)addr);
+ len -= endp - (unsigned long)addr;
}
- BUG_ON(!pages[0]);
+
+ if (!core_kernel_text((unsigned long)addr))
+ page = vmalloc_to_page(addr);
+ else
+ page = virt_to_page(addr);
+ BUG_ON(!page);
local_irq_save(flags);
- set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
- if (pages[1])
- set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
- vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
+ if (PageHighMem(page))
+ vaddr = kmap_atomic(page, KM_TEXT_POKE);
+ else {
+ set_fixmap(FIX_TEXT_POKE, page_to_phys(page));
+ vaddr = (char *)fix_to_virt(FIX_TEXT_POKE);
+ }
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- clear_fixmap(FIX_TEXT_POKE0);
- if (pages[1])
- clear_fixmap(FIX_TEXT_POKE1);
+ if (PageHighMem(page))
+ kunmap_atomic(vaddr, KM_TEXT_POKE);
+ else
+ clear_fixmap(FIX_TEXT_POKE);
local_flush_tlb();
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-06 17:32:53

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages

* Masami Hiramatsu ([email protected]) wrote:
> Fix a bug in text_poke to handle highmem pages, because module
> text pages are possible to be highmem pages on x86-32.
> In that case, since fixmap can't handle those pages, text_poke
> uses kmap_atomic.
>

Hrm, can you remind me what would be the downside of using kmap_atomic
in every scenarios (highmem and non-highmem) then ?

I would try to avoid "special cases" as much as possible, because they
just make problems harder to reproduce.

The idea would be to either add fixmap highmem support, or to simply use
kmap_atomic() for all cases until we add fixmap highmem support.

Mathieu

> Moreover, module_text pages will be discontinuous and it is
> possible that one page is lowmem and another page is highmem,
> because it is allocated by vmalloc. To handle this situation,
> text_poke splits poke area into two pieces and write the areas
> page by page.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
>
> Hi Ingo and Mathieu,
>
> Masami Hiramatsu wrote:
> > A) Separate text_poke into __text_poke and __text_poke_highmem. And
> > use pkmap_atomic in __text_poke_highmem. This way doesn't require
> > any additional change except adding KM_TEXT_POKE0/1 in km_type.
>
> Here is A) patch.
>
> Thank you,
>
> arch/x86/include/asm/fixmap.h | 3 +-
> arch/x86/include/asm/kmap_types.h | 3 ++
> arch/x86/kernel/alternative.c | 45 ++++++++++++++++++++++++-------------
> 3 files changed, 32 insertions(+), 19 deletions(-)
>
>
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index 81937a5..6c59d4a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -111,8 +111,7 @@ enum fixed_addresses {
> #ifdef CONFIG_PARAVIRT
> FIX_PARAVIRT_BOOTMAP,
> #endif
> - FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
> - FIX_TEXT_POKE1,
> + FIX_TEXT_POKE, /* reserve 1 page for text_poke() */
> __end_of_permanent_fixed_addresses,
> #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
> FIX_OHCI1394_BASE,
> diff --git a/arch/x86/include/asm/kmap_types.h b/arch/x86/include/asm/kmap_types.h
> index 5759c16..e48f11f 100644
> --- a/arch/x86/include/asm/kmap_types.h
> +++ b/arch/x86/include/asm/kmap_types.h
> @@ -21,7 +21,8 @@ D(9) KM_IRQ0,
> D(10) KM_IRQ1,
> D(11) KM_SOFTIRQ0,
> D(12) KM_SOFTIRQ1,
> -D(13) KM_TYPE_NR
> +D(13) KM_TEXT_POKE,
> +D(14) KM_TYPE_NR
> };
>
> #undef D
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index f576587..f4487c9 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -6,6 +6,7 @@
> #include <linux/mm.h>
> #include <linux/vmalloc.h>
> #include <linux/memory.h>
> +#include <linux/highmem.h>
> #include <asm/alternative.h>
> #include <asm/sections.h>
> #include <asm/pgtable.h>
> @@ -514,27 +515,39 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> {
> unsigned long flags;
> char *vaddr;
> - struct page *pages[2];
> + struct page *page;
> int i;
> + unsigned long endp = ((unsigned long)addr + len) & PAGE_MASK;
>
> - if (!core_kernel_text((unsigned long)addr)) {
> - pages[0] = vmalloc_to_page(addr);
> - pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> - } else {
> - pages[0] = virt_to_page(addr);
> - WARN_ON(!PageReserved(pages[0]));
> - pages[1] = virt_to_page(addr + PAGE_SIZE);
> + /*
> + * If the written range covers 2 pages, we'll split it, because
> + * vmalloc pages are not always continuous -- e.g. 1st page is
> + * lowmem and 2nd page is highmem.
> + */
> + if (((unsigned long)addr & PAGE_MASK) != endp) {
> + text_poke(addr, opcode, endp - (unsigned long)addr);
> + addr = (void *)endp;
> + opcode = (char *)opcode + (endp - (unsigned long)addr);
> + len -= endp - (unsigned long)addr;
> }
> - BUG_ON(!pages[0]);
> +
> + if (!core_kernel_text((unsigned long)addr))
> + page = vmalloc_to_page(addr);
> + else
> + page = virt_to_page(addr);
> + BUG_ON(!page);
> local_irq_save(flags);
> - set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> - if (pages[1])
> - set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> - vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> + if (PageHighMem(page))
> + vaddr = kmap_atomic(page, KM_TEXT_POKE);
> + else {
> + set_fixmap(FIX_TEXT_POKE, page_to_phys(page));
> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE);
> + }
> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> - clear_fixmap(FIX_TEXT_POKE0);
> - if (pages[1])
> - clear_fixmap(FIX_TEXT_POKE1);
> + if (PageHighMem(page))
> + kunmap_atomic(vaddr, KM_TEXT_POKE);
> + else
> + clear_fixmap(FIX_TEXT_POKE);
> local_flush_tlb();
> sync_core();
> /* Could also do a CLFLUSH here to speed up CPU recovery; but
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-04-06 17:43:41

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages

Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Fix a bug in text_poke to handle highmem pages, because module
>> text pages are possible to be highmem pages on x86-32.
>> In that case, since fixmap can't handle those pages, text_poke
>> uses kmap_atomic.
>>
>
> Hrm, can you remind me what would be the downside of using kmap_atomic
> in every scenarios (highmem and non-highmem) then ?

kmap_atomic can handle only highmem pages. If you passes lowmem pages,
it returns just original vaddr of it. (because kmap is only for
highmem support)

>
> I would try to avoid "special cases" as much as possible, because they
> just make problems harder to reproduce.

Actually, this bug is a special case because it happens only on PAE kernel.

>
> The idea would be to either add fixmap highmem support, or to simply use
> kmap_atomic() for all cases until we add fixmap highmem support.
>


Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-06 17:59:41

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages

* Masami Hiramatsu ([email protected]) wrote:
> Mathieu Desnoyers wrote:
> > * Masami Hiramatsu ([email protected]) wrote:
> >> Fix a bug in text_poke to handle highmem pages, because module
> >> text pages are possible to be highmem pages on x86-32.
> >> In that case, since fixmap can't handle those pages, text_poke
> >> uses kmap_atomic.
> >>
> >
> > Hrm, can you remind me what would be the downside of using kmap_atomic
> > in every scenarios (highmem and non-highmem) then ?
>
> kmap_atomic can handle only highmem pages. If you passes lowmem pages,
> it returns just original vaddr of it. (because kmap is only for
> highmem support)
>

OK, and if we are doing a second kmap_atomic() of a module text page
which is already mapped, does the second kmap_atomic return the vaddr of
the page originally mapped or is it creating a second mapping ?

Because if we ever decide to enforce read-only page mapping for module
text pages, touching highmem pages too, we will run into real trouble if
those happen to be the same page.

Mathieu

> >
> > I would try to avoid "special cases" as much as possible, because they
> > just make problems harder to reproduce.
>
> Actually, this bug is a special case because it happens only on PAE kernel.
>
> >
> > The idea would be to either add fixmap highmem support, or to simply use
> > kmap_atomic() for all cases until we add fixmap highmem support.
> >
>
>
> Thank you,
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-04-06 20:22:23

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages



Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Mathieu Desnoyers wrote:
>>> * Masami Hiramatsu ([email protected]) wrote:
>>>> Fix a bug in text_poke to handle highmem pages, because module
>>>> text pages are possible to be highmem pages on x86-32.
>>>> In that case, since fixmap can't handle those pages, text_poke
>>>> uses kmap_atomic.
>>>>
>>> Hrm, can you remind me what would be the downside of using kmap_atomic
>>> in every scenarios (highmem and non-highmem) then ?
>> kmap_atomic can handle only highmem pages. If you passes lowmem pages,
>> it returns just original vaddr of it. (because kmap is only for
>> highmem support)
>>
>
> OK, and if we are doing a second kmap_atomic() of a module text page
> which is already mapped, does the second kmap_atomic return the vaddr of
> the page originally mapped or is it creating a second mapping ?

Good point, kmap_atomic creates a second mapping if the page is highmem.

in arch/x86/mm/highmem_32.c:

void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
{
...
if (!PageHighMem(page))
return page_address(page);
...
}

> Because if we ever decide to enforce read-only page mapping for module
> text pages, touching highmem pages too, we will run into real trouble if
> those happen to be the same page.

So, as long as the page is in highmem, we can create a second mapping by
using kmap_atomic.

Thank you,

>
> Mathieu
>
>>> I would try to avoid "special cases" as much as possible, because they
>>> just make problems harder to reproduce.
>> Actually, this bug is a special case because it happens only on PAE kernel.
>>
>>> The idea would be to either add fixmap highmem support, or to simply use
>>> kmap_atomic() for all cases until we add fixmap highmem support.
>>>
>>
>> Thank you,
>>
>> --
>> Masami Hiramatsu
>>
>> Software Engineer
>> Hitachi Computer Products (America) Inc.
>> Software Solutions Division
>>
>> e-mail: [email protected]
>>
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-08 12:33:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages


* Masami Hiramatsu <[email protected]> wrote:

> @@ -514,27 +515,39 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> {
> unsigned long flags;
> char *vaddr;
> - struct page *pages[2];
> + struct page *page;
> int i;
> + unsigned long endp = ((unsigned long)addr + len) & PAGE_MASK;
>
> - if (!core_kernel_text((unsigned long)addr)) {
> - pages[0] = vmalloc_to_page(addr);
> - pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> - } else {
> - pages[0] = virt_to_page(addr);
> - WARN_ON(!PageReserved(pages[0]));
> - pages[1] = virt_to_page(addr + PAGE_SIZE);
> + /*
> + * If the written range covers 2 pages, we'll split it, because
> + * vmalloc pages are not always continuous -- e.g. 1st page is
> + * lowmem and 2nd page is highmem.
> + */
> + if (((unsigned long)addr & PAGE_MASK) != endp) {
> + text_poke(addr, opcode, endp - (unsigned long)addr);
> + addr = (void *)endp;
> + opcode = (char *)opcode + (endp - (unsigned long)addr);
> + len -= endp - (unsigned long)addr;
> }
> - BUG_ON(!pages[0]);
> +
> + if (!core_kernel_text((unsigned long)addr))
> + page = vmalloc_to_page(addr);
> + else
> + page = virt_to_page(addr);

hm, the bug is upstream now. And your fix turns a
supposed-to-be-simpler kmap based patching thing back into something
fragile looking again. We might be better off with a revert - or we
do a real clean patch.

Firstly, that core_kernel_text() distinction above looks
artificially open-coded - dont we have a proper, generic
"look-up-the-page" variant in the MM somewhere?

> + BUG_ON(!page);
> local_irq_save(flags);
> - set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> - if (pages[1])
> - set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> - vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> + if (PageHighMem(page))
> + vaddr = kmap_atomic(page, KM_TEXT_POKE);
> + else {
> + set_fixmap(FIX_TEXT_POKE, page_to_phys(page));
> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE);
> + }

that too looks artificially complex. Why cannot we kmap lowmem pages
too? If the API isnt available on !HIGHMEM kernels .. then the
solution is to make it available, not to branch our way around it.

> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> - clear_fixmap(FIX_TEXT_POKE0);
> - if (pages[1])
> - clear_fixmap(FIX_TEXT_POKE1);
> + if (PageHighMem(page))
> + kunmap_atomic(vaddr, KM_TEXT_POKE);
> + else
> + clear_fixmap(FIX_TEXT_POKE);

ditto.

Thanks,

Ingo

2009-04-08 14:58:59

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages

Ingo Molnar wrote:
> * Masami Hiramatsu <[email protected]> wrote:
>
>> @@ -514,27 +515,39 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
>> {
>> unsigned long flags;
>> char *vaddr;
>> - struct page *pages[2];
>> + struct page *page;
>> int i;
>> + unsigned long endp = ((unsigned long)addr + len) & PAGE_MASK;
>>
>> - if (!core_kernel_text((unsigned long)addr)) {
>> - pages[0] = vmalloc_to_page(addr);
>> - pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
>> - } else {
>> - pages[0] = virt_to_page(addr);
>> - WARN_ON(!PageReserved(pages[0]));
>> - pages[1] = virt_to_page(addr + PAGE_SIZE);
>> + /*
>> + * If the written range covers 2 pages, we'll split it, because
>> + * vmalloc pages are not always continuous -- e.g. 1st page is
>> + * lowmem and 2nd page is highmem.
>> + */
>> + if (((unsigned long)addr & PAGE_MASK) != endp) {
>> + text_poke(addr, opcode, endp - (unsigned long)addr);
>> + addr = (void *)endp;
>> + opcode = (char *)opcode + (endp - (unsigned long)addr);
>> + len -= endp - (unsigned long)addr;
>> }
>> - BUG_ON(!pages[0]);
>> +
>> + if (!core_kernel_text((unsigned long)addr))
>> + page = vmalloc_to_page(addr);
>> + else
>> + page = virt_to_page(addr);
>
> hm, the bug is upstream now. And your fix turns a
> supposed-to-be-simpler kmap based patching thing back into something
> fragile looking again. We might be better off with a revert - or we
> do a real clean patch.
>
> Firstly, that core_kernel_text() distinction above looks
> artificially open-coded - dont we have a proper, generic
> "look-up-the-page" variant in the MM somewhere?

Actually, vmalloc_to_page() is generic one. It decodes
the kernel page table directly to find struct page *.
virt_to_page() is just a short-cut api.

>
>> + BUG_ON(!page);
>> local_irq_save(flags);
>> - set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>> - if (pages[1])
>> - set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>> - vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>> + if (PageHighMem(page))
>> + vaddr = kmap_atomic(page, KM_TEXT_POKE);
>> + else {
>> + set_fixmap(FIX_TEXT_POKE, page_to_phys(page));
>> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE);
>> + }
>
> that too looks artificially complex. Why cannot we kmap lowmem pages
> too? If the API isnt available on !HIGHMEM kernels .. then the
> solution is to make it available, not to branch our way around it.

Hmm, why don't we enhance fixmap to handle highmem pages?
(e.g. adding set_fixmap_page())
Since kmap is only for highmem kernels, I think changing it will effects
more users...

Thank you,

>
>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>> - clear_fixmap(FIX_TEXT_POKE0);
>> - if (pages[1])
>> - clear_fixmap(FIX_TEXT_POKE1);
>> + if (PageHighMem(page))
>> + kunmap_atomic(vaddr, KM_TEXT_POKE);
>> + else
>> + clear_fixmap(FIX_TEXT_POKE);
>
> ditto.
>
> Thanks,
>
> Ingo

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-08 15:01:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -tip] x86: fix text_poke to handle highmem pages


* Masami Hiramatsu <[email protected]> wrote:

> Ingo Molnar wrote:
> > * Masami Hiramatsu <[email protected]> wrote:
> >
> >> @@ -514,27 +515,39 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> >> {
> >> unsigned long flags;
> >> char *vaddr;
> >> - struct page *pages[2];
> >> + struct page *page;
> >> int i;
> >> + unsigned long endp = ((unsigned long)addr + len) & PAGE_MASK;
> >>
> >> - if (!core_kernel_text((unsigned long)addr)) {
> >> - pages[0] = vmalloc_to_page(addr);
> >> - pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> >> - } else {
> >> - pages[0] = virt_to_page(addr);
> >> - WARN_ON(!PageReserved(pages[0]));
> >> - pages[1] = virt_to_page(addr + PAGE_SIZE);
> >> + /*
> >> + * If the written range covers 2 pages, we'll split it, because
> >> + * vmalloc pages are not always continuous -- e.g. 1st page is
> >> + * lowmem and 2nd page is highmem.
> >> + */
> >> + if (((unsigned long)addr & PAGE_MASK) != endp) {
> >> + text_poke(addr, opcode, endp - (unsigned long)addr);
> >> + addr = (void *)endp;
> >> + opcode = (char *)opcode + (endp - (unsigned long)addr);
> >> + len -= endp - (unsigned long)addr;
> >> }
> >> - BUG_ON(!pages[0]);
> >> +
> >> + if (!core_kernel_text((unsigned long)addr))
> >> + page = vmalloc_to_page(addr);
> >> + else
> >> + page = virt_to_page(addr);
> >
> > hm, the bug is upstream now. And your fix turns a
> > supposed-to-be-simpler kmap based patching thing back into something
> > fragile looking again. We might be better off with a revert - or we
> > do a real clean patch.
> >
> > Firstly, that core_kernel_text() distinction above looks
> > artificially open-coded - dont we have a proper, generic
> > "look-up-the-page" variant in the MM somewhere?
>
> Actually, vmalloc_to_page() is generic one. It decodes
> the kernel page table directly to find struct page *.
> virt_to_page() is just a short-cut api.
>
> >
> >> + BUG_ON(!page);
> >> local_irq_save(flags);
> >> - set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >> - if (pages[1])
> >> - set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >> - vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >> + if (PageHighMem(page))
> >> + vaddr = kmap_atomic(page, KM_TEXT_POKE);
> >> + else {
> >> + set_fixmap(FIX_TEXT_POKE, page_to_phys(page));
> >> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE);
> >> + }
> >
> > that too looks artificially complex. Why cannot we kmap lowmem pages
> > too? If the API isnt available on !HIGHMEM kernels .. then the
> > solution is to make it available, not to branch our way around it.
>
> Hmm, why don't we enhance fixmap to handle highmem pages?
> (e.g. adding set_fixmap_page())
> Since kmap is only for highmem kernels, I think changing it will effects
> more users...
>
> Thank you,

Sure, whichever way looks more logical to you - my only condition is
that it should result in a clean patch! :-)

Ingo

2009-04-09 17:57:20

by Masami Hiramatsu

[permalink] [raw]
Subject: [BUGFIX][PATCH] x86: fix set_fixmap to use phys_addr_t

Use phys_addr_t for receiving a physical address argument
instead of unsigned long. This allows fixmap to handle
pages higher than 4GB on x86-32.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
---

arch/x86/include/asm/fixmap.h | 4 ++--
arch/x86/include/asm/paravirt.h | 4 ++--
arch/x86/mm/pgtable.c | 3 ++-
arch/x86/xen/mmu.c | 2 +-
4 files changed, 7 insertions(+), 6 deletions(-)


diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 81937a5..2d81af3 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -151,11 +151,11 @@ extern pte_t *pkmap_page_table;

void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags);
+ phys_addr_t phys, pgprot_t flags);

#ifndef CONFIG_PARAVIRT
static inline void __set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
native_set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7727aa8..378e369 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -347,7 +347,7 @@ struct pv_mmu_ops {
/* Sometimes the physical address is a pfn, and sometimes its
an mfn. We can tell which is which from the index. */
void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
- unsigned long phys, pgprot_t flags);
+ phys_addr_t phys, pgprot_t flags);
};

struct raw_spinlock;
@@ -1432,7 +1432,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
void arch_flush_lazy_mmu_mode(void);

static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
pv_mmu_ops.set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 7a4d6ee..8e43bdd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -347,7 +347,8 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
fixmaps_set++;
}

-void native_set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
+void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
+ pgprot_t flags)
{
__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
}
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index db3802f..2a81838 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1750,7 +1750,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
}
#endif /* CONFIG_X86_64 */

-static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
+static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
{
pte_t pte;

2009-04-09 18:46:36

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] x86: fix set_fixmap to use phys_addr_t

* Masami Hiramatsu ([email protected]) wrote:
> Use phys_addr_t for receiving a physical address argument
> instead of unsigned long. This allows fixmap to handle
> pages higher than 4GB on x86-32.
>

You might also want to update arch/x86/mm/ioremap.c:early_set_fixmap()
and __early_set_fixmap().

Otherwise it looks good.

Thanks,

Mathieu

> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> ---
>
> arch/x86/include/asm/fixmap.h | 4 ++--
> arch/x86/include/asm/paravirt.h | 4 ++--
> arch/x86/mm/pgtable.c | 3 ++-
> arch/x86/xen/mmu.c | 2 +-
> 4 files changed, 7 insertions(+), 6 deletions(-)
>
>
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index 81937a5..2d81af3 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -151,11 +151,11 @@ extern pte_t *pkmap_page_table;
>
> void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
> void native_set_fixmap(enum fixed_addresses idx,
> - unsigned long phys, pgprot_t flags);
> + phys_addr_t phys, pgprot_t flags);
>
> #ifndef CONFIG_PARAVIRT
> static inline void __set_fixmap(enum fixed_addresses idx,
> - unsigned long phys, pgprot_t flags)
> + phys_addr_t phys, pgprot_t flags)
> {
> native_set_fixmap(idx, phys, flags);
> }
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 7727aa8..378e369 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -347,7 +347,7 @@ struct pv_mmu_ops {
> /* Sometimes the physical address is a pfn, and sometimes its
> an mfn. We can tell which is which from the index. */
> void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
> - unsigned long phys, pgprot_t flags);
> + phys_addr_t phys, pgprot_t flags);
> };
>
> struct raw_spinlock;
> @@ -1432,7 +1432,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
> void arch_flush_lazy_mmu_mode(void);
>
> static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
> - unsigned long phys, pgprot_t flags)
> + phys_addr_t phys, pgprot_t flags)
> {
> pv_mmu_ops.set_fixmap(idx, phys, flags);
> }
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 7a4d6ee..8e43bdd 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -347,7 +347,8 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
> fixmaps_set++;
> }
>
> -void native_set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
> +void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> + pgprot_t flags)
> {
> __native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
> }
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index db3802f..2a81838 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1750,7 +1750,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
> }
> #endif /* CONFIG_X86_64 */
>
> -static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
> +static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> {
> pte_t pte;
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-04-09 21:53:23

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] x86: fix set_fixmap to use phys_addr_t

Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Use phys_addr_t for receiving a physical address argument
>> instead of unsigned long. This allows fixmap to handle
>> pages higher than 4GB on x86-32.
>>
>
> You might also want to update arch/x86/mm/ioremap.c:early_set_fixmap()
> and __early_set_fixmap().

Hmm, it is easy to change unsigned long to phys_addr_t in
early_set_fixmap(), however, ioremap also uses unsigned long for
physical addresses. Do I need to update ioremap functions too?

I mean, is there any chance that over 4GB physical address will
be passed to ioremap? If not, I just change early_set_fixmap()
and __early_set_fixmap().

Thanks,

>
> Otherwise it looks good.
>
> Thanks,
>
> Mathieu
>
>> Signed-off-by: Masami Hiramatsu <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Mathieu Desnoyers <[email protected]>
>> ---
>>
>> arch/x86/include/asm/fixmap.h | 4 ++--
>> arch/x86/include/asm/paravirt.h | 4 ++--
>> arch/x86/mm/pgtable.c | 3 ++-
>> arch/x86/xen/mmu.c | 2 +-
>> 4 files changed, 7 insertions(+), 6 deletions(-)
>>
>>
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index 81937a5..2d81af3 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -151,11 +151,11 @@ extern pte_t *pkmap_page_table;
>>
>> void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
>> void native_set_fixmap(enum fixed_addresses idx,
>> - unsigned long phys, pgprot_t flags);
>> + phys_addr_t phys, pgprot_t flags);
>>
>> #ifndef CONFIG_PARAVIRT
>> static inline void __set_fixmap(enum fixed_addresses idx,
>> - unsigned long phys, pgprot_t flags)
>> + phys_addr_t phys, pgprot_t flags)
>> {
>> native_set_fixmap(idx, phys, flags);
>> }
>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>> index 7727aa8..378e369 100644
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -347,7 +347,7 @@ struct pv_mmu_ops {
>> /* Sometimes the physical address is a pfn, and sometimes its
>> an mfn. We can tell which is which from the index. */
>> void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
>> - unsigned long phys, pgprot_t flags);
>> + phys_addr_t phys, pgprot_t flags);
>> };
>>
>> struct raw_spinlock;
>> @@ -1432,7 +1432,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
>> void arch_flush_lazy_mmu_mode(void);
>>
>> static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
>> - unsigned long phys, pgprot_t flags)
>> + phys_addr_t phys, pgprot_t flags)
>> {
>> pv_mmu_ops.set_fixmap(idx, phys, flags);
>> }
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index 7a4d6ee..8e43bdd 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -347,7 +347,8 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
>> fixmaps_set++;
>> }
>>
>> -void native_set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
>> +void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
>> + pgprot_t flags)
>> {
>> __native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
>> }
>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>> index db3802f..2a81838 100644
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> @@ -1750,7 +1750,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
>> }
>> #endif /* CONFIG_X86_64 */
>>
>> -static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
>> +static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>> {
>> pte_t pte;
>>
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-10 14:10:10

by Masami Hiramatsu

[permalink] [raw]
Subject: [tip:x86/urgent] x86: fix set_fixmap to use phys_addr_t

Commit-ID: 189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
Gitweb: http://git.kernel.org/tip/189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Thu, 9 Apr 2009 10:55:33 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 10 Apr 2009 15:52:59 +0200

x86: fix set_fixmap to use phys_addr_t

Impact: fix kprobes crash on 32-bit with RAM above 4G

Use phys_addr_t for receiving a physical address argument
instead of unsigned long. This allows fixmap to handle
pages higher than 4GB on x86-32.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: systemtap-ml <[email protected]>
Cc: Gary Hade <[email protected]>
Cc: Linus Torvalds <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/fixmap.h | 4 ++--
arch/x86/include/asm/paravirt.h | 4 ++--
arch/x86/mm/pgtable.c | 3 ++-
arch/x86/xen/mmu.c | 2 +-
4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 81937a5..2d81af3 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -151,11 +151,11 @@ extern pte_t *pkmap_page_table;

void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags);
+ phys_addr_t phys, pgprot_t flags);

#ifndef CONFIG_PARAVIRT
static inline void __set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
native_set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7727aa8..378e369 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -347,7 +347,7 @@ struct pv_mmu_ops {
/* Sometimes the physical address is a pfn, and sometimes its
an mfn. We can tell which is which from the index. */
void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
- unsigned long phys, pgprot_t flags);
+ phys_addr_t phys, pgprot_t flags);
};

struct raw_spinlock;
@@ -1432,7 +1432,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
void arch_flush_lazy_mmu_mode(void);

static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
pv_mmu_ops.set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5b7c7c8..7aa03a5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -345,7 +345,8 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
fixmaps_set++;
}

-void native_set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
+void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
+ pgprot_t flags)
{
__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
}
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index db3802f..2a81838 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1750,7 +1750,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
}
#endif /* CONFIG_X86_64 */

-static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
+static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
{
pte_t pte;

2009-04-10 15:22:29

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86: fix set_fixmap to use phys_addr_t

Masami Hiramatsu wrote:
> Commit-ID: 189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
> Gitweb: http://git.kernel.org/tip/189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
> Author: Masami Hiramatsu <[email protected]>
> AuthorDate: Thu, 9 Apr 2009 10:55:33 -0700
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 10 Apr 2009 15:52:59 +0200
>
> x86: fix set_fixmap to use phys_addr_t
>
> Impact: fix kprobes crash on 32-bit with RAM above 4G
>
> Use phys_addr_t for receiving a physical address argument
> instead of unsigned long. This allows fixmap to handle
> pages higher than 4GB on x86-32.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Acked-by: Mathieu Desnoyers <[email protected]>

I thought I had to update ioremap.c too...
Anyway, here is the ioremap.c update. If you think it is useful,
please merge it.

Thank you,
----

x86: fix early_ioremap and early_set_fixmap to handle over 4G pages

From: Masami Hiramatsu <[email protected]>

Impact: Allow early_ioremap to handle over 4G pages

Use phys_addr_t and resource_size_t for receiving a physical
address argument instead of unsigned long in early_set_fixmap
and early_ioremap. This allows early_ioremap to handle
pages higher than 4GB on x86-32 with PAE.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---

arch/x86/include/asm/io.h | 6 ++++--
arch/x86/mm/ioremap.c | 23 +++++++++++++----------
2 files changed, 17 insertions(+), 12 deletions(-)


diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e5383e3..7373932 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -193,8 +193,10 @@ extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
*/
extern void early_ioremap_init(void);
extern void early_ioremap_reset(void);
-extern void __iomem *early_ioremap(unsigned long offset, unsigned long size);
-extern void __iomem *early_memremap(unsigned long offset, unsigned long size);
+extern void __iomem *early_ioremap(resource_size_t phys_addr,
+ unsigned long size);
+extern void __iomem *early_memremap(resource_size_t phys_addr,
+ unsigned long size);
extern void early_iounmap(void __iomem *addr, unsigned long size);

#define IO_SPACE_LIMIT 0xffff
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0dfa09d..09daebf 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -547,7 +547,7 @@ void __init early_ioremap_reset(void)
}

static void __init __early_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
unsigned long addr = __fix_to_virt(idx);
pte_t *pte;
@@ -566,7 +566,7 @@ static void __init __early_set_fixmap(enum fixed_addresses idx,
}

static inline void __init early_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t prot)
+ phys_addr_t phys, pgprot_t prot)
{
if (after_paging_init)
__set_fixmap(idx, phys, prot);
@@ -607,9 +607,10 @@ static int __init check_early_ioremap_leak(void)
late_initcall(check_early_ioremap_leak);

static void __init __iomem *
-__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
+__early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
{
- unsigned long offset, last_addr;
+ unsigned long offset;
+ resource_size_t last_addr;
unsigned int nrpages;
enum fixed_addresses idx0, idx;
int i, slot;
@@ -625,15 +626,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
}

if (slot < 0) {
- printk(KERN_INFO "early_iomap(%08lx, %08lx) not found slot\n",
- phys_addr, size);
+ printk(KERN_INFO "early_iomap(%08llx, %08lx) not found slot\n",
+ (u64)phys_addr, size);
WARN_ON(1);
return NULL;
}

if (early_ioremap_debug) {
- printk(KERN_INFO "early_ioremap(%08lx, %08lx) [%d] => ",
- phys_addr, size, slot);
+ printk(KERN_INFO "early_ioremap(%08llx, %08lx) [%d] => ",
+ (u64)phys_addr, size, slot);
dump_stack();
}

@@ -680,13 +681,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
}

/* Remap an IO device */
-void __init __iomem *early_ioremap(unsigned long phys_addr, unsigned long size)
+void __init __iomem *
+early_ioremap(resource_size_t phys_addr, unsigned long size)
{
return __early_ioremap(phys_addr, size, PAGE_KERNEL_IO);
}

/* Remap memory */
-void __init __iomem *early_memremap(unsigned long phys_addr, unsigned long size)
+void __init __iomem *
+early_memremap(resource_size_t phys_addr, unsigned long size)
{
return __early_ioremap(phys_addr, size, PAGE_KERNEL);
}
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-10 16:06:09

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86: fix set_fixmap to use phys_addr_t

* Masami Hiramatsu ([email protected]) wrote:
> Masami Hiramatsu wrote:
> > Commit-ID: 189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
> > Gitweb: http://git.kernel.org/tip/189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
> > Author: Masami Hiramatsu <[email protected]>
> > AuthorDate: Thu, 9 Apr 2009 10:55:33 -0700
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Fri, 10 Apr 2009 15:52:59 +0200
> >
> > x86: fix set_fixmap to use phys_addr_t
> >
> > Impact: fix kprobes crash on 32-bit with RAM above 4G
> >
> > Use phys_addr_t for receiving a physical address argument
> > instead of unsigned long. This allows fixmap to handle
> > pages higher than 4GB on x86-32.
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Acked-by: Mathieu Desnoyers <[email protected]>
>
> I thought I had to update ioremap.c too...
> Anyway, here is the ioremap.c update. If you think it is useful,
> please merge it.
>
> Thank you,
> ----
>
> x86: fix early_ioremap and early_set_fixmap to handle over 4G pages
>
> From: Masami Hiramatsu <[email protected]>
>
> Impact: Allow early_ioremap to handle over 4G pages
>
> Use phys_addr_t and resource_size_t for receiving a physical
> address argument instead of unsigned long in early_set_fixmap
> and early_ioremap. This allows early_ioremap to handle
> pages higher than 4GB on x86-32 with PAE.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
>
> arch/x86/include/asm/io.h | 6 ++++--
> arch/x86/mm/ioremap.c | 23 +++++++++++++----------
> 2 files changed, 17 insertions(+), 12 deletions(-)
>
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index e5383e3..7373932 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -193,8 +193,10 @@ extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
> */
> extern void early_ioremap_init(void);
> extern void early_ioremap_reset(void);
> -extern void __iomem *early_ioremap(unsigned long offset, unsigned long size);
> -extern void __iomem *early_memremap(unsigned long offset, unsigned long size);
> +extern void __iomem *early_ioremap(resource_size_t phys_addr,
> + unsigned long size);
> +extern void __iomem *early_memremap(resource_size_t phys_addr,
> + unsigned long size);


Nitpick : those second lines are not aligned with each other. The rest
has my

Acked-by: Mathieu Desnoyers <[email protected]>

Thanks !

Mathieu

> extern void early_iounmap(void __iomem *addr, unsigned long size);
>
> #define IO_SPACE_LIMIT 0xffff
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 0dfa09d..09daebf 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -547,7 +547,7 @@ void __init early_ioremap_reset(void)
> }
>
> static void __init __early_set_fixmap(enum fixed_addresses idx,
> - unsigned long phys, pgprot_t flags)
> + phys_addr_t phys, pgprot_t flags)
> {
> unsigned long addr = __fix_to_virt(idx);
> pte_t *pte;
> @@ -566,7 +566,7 @@ static void __init __early_set_fixmap(enum fixed_addresses idx,
> }
>
> static inline void __init early_set_fixmap(enum fixed_addresses idx,
> - unsigned long phys, pgprot_t prot)
> + phys_addr_t phys, pgprot_t prot)
> {
> if (after_paging_init)
> __set_fixmap(idx, phys, prot);
> @@ -607,9 +607,10 @@ static int __init check_early_ioremap_leak(void)
> late_initcall(check_early_ioremap_leak);
>
> static void __init __iomem *
> -__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
> +__early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
> {
> - unsigned long offset, last_addr;
> + unsigned long offset;
> + resource_size_t last_addr;
> unsigned int nrpages;
> enum fixed_addresses idx0, idx;
> int i, slot;
> @@ -625,15 +626,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
> }
>
> if (slot < 0) {
> - printk(KERN_INFO "early_iomap(%08lx, %08lx) not found slot\n",
> - phys_addr, size);
> + printk(KERN_INFO "early_iomap(%08llx, %08lx) not found slot\n",
> + (u64)phys_addr, size);
> WARN_ON(1);
> return NULL;
> }
>
> if (early_ioremap_debug) {
> - printk(KERN_INFO "early_ioremap(%08lx, %08lx) [%d] => ",
> - phys_addr, size, slot);
> + printk(KERN_INFO "early_ioremap(%08llx, %08lx) [%d] => ",
> + (u64)phys_addr, size, slot);
> dump_stack();
> }
>
> @@ -680,13 +681,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
> }
>
> /* Remap an IO device */
> -void __init __iomem *early_ioremap(unsigned long phys_addr, unsigned long size)
> +void __init __iomem *
> +early_ioremap(resource_size_t phys_addr, unsigned long size)
> {
> return __early_ioremap(phys_addr, size, PAGE_KERNEL_IO);
> }
>
> /* Remap memory */
> -void __init __iomem *early_memremap(unsigned long phys_addr, unsigned long size)
> +void __init __iomem *
> +early_memremap(resource_size_t phys_addr, unsigned long size)
> {
> return __early_ioremap(phys_addr, size, PAGE_KERNEL);
> }
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-04-10 17:50:37

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86: fix set_fixmap to use phys_addr_t

Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Masami Hiramatsu wrote:
>>> Commit-ID: 189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
>>> Gitweb: http://git.kernel.org/tip/189cdc2b41fce3a780a6f3aa963cc4e8114aec6b
>>> Author: Masami Hiramatsu <[email protected]>
>>> AuthorDate: Thu, 9 Apr 2009 10:55:33 -0700
>>> Committer: Ingo Molnar <[email protected]>
>>> CommitDate: Fri, 10 Apr 2009 15:52:59 +0200
>>>
>>> x86: fix set_fixmap to use phys_addr_t
>>>
>>> Impact: fix kprobes crash on 32-bit with RAM above 4G
>>>
>>> Use phys_addr_t for receiving a physical address argument
>>> instead of unsigned long. This allows fixmap to handle
>>> pages higher than 4GB on x86-32.
>>>
>>> Signed-off-by: Masami Hiramatsu <[email protected]>
>>> Acked-by: Mathieu Desnoyers <[email protected]>
>> I thought I had to update ioremap.c too...
>> Anyway, here is the ioremap.c update. If you think it is useful,
>> please merge it.
>>
>> Thank you,
>> ----
>>
>> x86: fix early_ioremap and early_set_fixmap to handle over 4G pages
>>
>> From: Masami Hiramatsu <[email protected]>
>>
>> Impact: Allow early_ioremap to handle over 4G pages
>>
>> Use phys_addr_t and resource_size_t for receiving a physical
>> address argument instead of unsigned long in early_set_fixmap
>> and early_ioremap. This allows early_ioremap to handle
>> pages higher than 4GB on x86-32 with PAE.
>>
>> Signed-off-by: Masami Hiramatsu <[email protected]>
>> Cc: Mathieu Desnoyers <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> ---
>>
>> arch/x86/include/asm/io.h | 6 ++++--
>> arch/x86/mm/ioremap.c | 23 +++++++++++++----------
>> 2 files changed, 17 insertions(+), 12 deletions(-)
>>
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index e5383e3..7373932 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -193,8 +193,10 @@ extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
>> */
>> extern void early_ioremap_init(void);
>> extern void early_ioremap_reset(void);
>> -extern void __iomem *early_ioremap(unsigned long offset, unsigned long size);
>> -extern void __iomem *early_memremap(unsigned long offset, unsigned long size);
>> +extern void __iomem *early_ioremap(resource_size_t phys_addr,
>> + unsigned long size);
>> +extern void __iomem *early_memremap(resource_size_t phys_addr,
>> + unsigned long size);
>
>
> Nitpick : those second lines are not aligned with each other. The rest
> has my

hmm, after being patched, those are aligned to the first line...

---
extern void __iomem *early_ioremap(resource_size_t phys_addr,
unsigned long size);
extern void __iomem *early_memremap(resource_size_t phys_addr,
unsigned long size);
---

Thank you,

>
> Acked-by: Mathieu Desnoyers <[email protected]>
>
> Thanks !
>
> Mathieu
>
>> extern void early_iounmap(void __iomem *addr, unsigned long size);
>>
>> #define IO_SPACE_LIMIT 0xffff
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 0dfa09d..09daebf 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -547,7 +547,7 @@ void __init early_ioremap_reset(void)
>> }
>>
>> static void __init __early_set_fixmap(enum fixed_addresses idx,
>> - unsigned long phys, pgprot_t flags)
>> + phys_addr_t phys, pgprot_t flags)
>> {
>> unsigned long addr = __fix_to_virt(idx);
>> pte_t *pte;
>> @@ -566,7 +566,7 @@ static void __init __early_set_fixmap(enum fixed_addresses idx,
>> }
>>
>> static inline void __init early_set_fixmap(enum fixed_addresses idx,
>> - unsigned long phys, pgprot_t prot)
>> + phys_addr_t phys, pgprot_t prot)
>> {
>> if (after_paging_init)
>> __set_fixmap(idx, phys, prot);
>> @@ -607,9 +607,10 @@ static int __init check_early_ioremap_leak(void)
>> late_initcall(check_early_ioremap_leak);
>>
>> static void __init __iomem *
>> -__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
>> +__early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
>> {
>> - unsigned long offset, last_addr;
>> + unsigned long offset;
>> + resource_size_t last_addr;
>> unsigned int nrpages;
>> enum fixed_addresses idx0, idx;
>> int i, slot;
>> @@ -625,15 +626,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
>> }
>>
>> if (slot < 0) {
>> - printk(KERN_INFO "early_iomap(%08lx, %08lx) not found slot\n",
>> - phys_addr, size);
>> + printk(KERN_INFO "early_iomap(%08llx, %08lx) not found slot\n",
>> + (u64)phys_addr, size);
>> WARN_ON(1);
>> return NULL;
>> }
>>
>> if (early_ioremap_debug) {
>> - printk(KERN_INFO "early_ioremap(%08lx, %08lx) [%d] => ",
>> - phys_addr, size, slot);
>> + printk(KERN_INFO "early_ioremap(%08llx, %08lx) [%d] => ",
>> + (u64)phys_addr, size, slot);
>> dump_stack();
>> }
>>
>> @@ -680,13 +681,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
>> }
>>
>> /* Remap an IO device */
>> -void __init __iomem *early_ioremap(unsigned long phys_addr, unsigned long size)
>> +void __init __iomem *
>> +early_ioremap(resource_size_t phys_addr, unsigned long size)
>> {
>> return __early_ioremap(phys_addr, size, PAGE_KERNEL_IO);
>> }
>>
>> /* Remap memory */
>> -void __init __iomem *early_memremap(unsigned long phys_addr, unsigned long size)
>> +void __init __iomem *
>> +early_memremap(resource_size_t phys_addr, unsigned long size)
>> {
>> return __early_ioremap(phys_addr, size, PAGE_KERNEL);
>> }
>> --
>> Masami Hiramatsu
>>
>> Software Engineer
>> Hitachi Computer Products (America) Inc.
>> Software Solutions Division
>>
>> e-mail: [email protected]
>>
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-04-10 18:33:14

by Masami Hiramatsu

[permalink] [raw]
Subject: [tip:x86/urgent] x86: fix set_fixmap to use phys_addr_t

Commit-ID: 9b987aeb4a7bc42a3eb8361030b820b0263c31f1
Gitweb: http://git.kernel.org/tip/9b987aeb4a7bc42a3eb8361030b820b0263c31f1
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Thu, 9 Apr 2009 10:55:33 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 10 Apr 2009 20:27:13 +0200

x86: fix set_fixmap to use phys_addr_t

Impact: fix kprobes crash on 32-bit with RAM above 4G

Use phys_addr_t for receiving a physical address argument
instead of unsigned long. This allows fixmap to handle
pages higher than 4GB on x86-32.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: systemtap-ml <[email protected]>
Cc: Gary Hade <[email protected]>
Cc: Linus Torvalds <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/fixmap.h | 4 ++--
arch/x86/include/asm/io.h | 6 ++++--
arch/x86/include/asm/paravirt.h | 4 ++--
arch/x86/mm/ioremap.c | 23 +++++++++++++----------
arch/x86/mm/pgtable.c | 3 ++-
arch/x86/xen/mmu.c | 2 +-
6 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 81937a5..2d81af3 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -151,11 +151,11 @@ extern pte_t *pkmap_page_table;

void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags);
+ phys_addr_t phys, pgprot_t flags);

#ifndef CONFIG_PARAVIRT
static inline void __set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
native_set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e5383e3..7373932 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -193,8 +193,10 @@ extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
*/
extern void early_ioremap_init(void);
extern void early_ioremap_reset(void);
-extern void __iomem *early_ioremap(unsigned long offset, unsigned long size);
-extern void __iomem *early_memremap(unsigned long offset, unsigned long size);
+extern void __iomem *early_ioremap(resource_size_t phys_addr,
+ unsigned long size);
+extern void __iomem *early_memremap(resource_size_t phys_addr,
+ unsigned long size);
extern void early_iounmap(void __iomem *addr, unsigned long size);

#define IO_SPACE_LIMIT 0xffff
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7727aa8..378e369 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -347,7 +347,7 @@ struct pv_mmu_ops {
/* Sometimes the physical address is a pfn, and sometimes its
an mfn. We can tell which is which from the index. */
void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
- unsigned long phys, pgprot_t flags);
+ phys_addr_t phys, pgprot_t flags);
};

struct raw_spinlock;
@@ -1432,7 +1432,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
void arch_flush_lazy_mmu_mode(void);

static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
pv_mmu_ops.set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0dfa09d..09daebf 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -547,7 +547,7 @@ void __init early_ioremap_reset(void)
}

static void __init __early_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t flags)
+ phys_addr_t phys, pgprot_t flags)
{
unsigned long addr = __fix_to_virt(idx);
pte_t *pte;
@@ -566,7 +566,7 @@ static void __init __early_set_fixmap(enum fixed_addresses idx,
}

static inline void __init early_set_fixmap(enum fixed_addresses idx,
- unsigned long phys, pgprot_t prot)
+ phys_addr_t phys, pgprot_t prot)
{
if (after_paging_init)
__set_fixmap(idx, phys, prot);
@@ -607,9 +607,10 @@ static int __init check_early_ioremap_leak(void)
late_initcall(check_early_ioremap_leak);

static void __init __iomem *
-__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
+__early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
{
- unsigned long offset, last_addr;
+ unsigned long offset;
+ resource_size_t last_addr;
unsigned int nrpages;
enum fixed_addresses idx0, idx;
int i, slot;
@@ -625,15 +626,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
}

if (slot < 0) {
- printk(KERN_INFO "early_iomap(%08lx, %08lx) not found slot\n",
- phys_addr, size);
+ printk(KERN_INFO "early_iomap(%08llx, %08lx) not found slot\n",
+ (u64)phys_addr, size);
WARN_ON(1);
return NULL;
}

if (early_ioremap_debug) {
- printk(KERN_INFO "early_ioremap(%08lx, %08lx) [%d] => ",
- phys_addr, size, slot);
+ printk(KERN_INFO "early_ioremap(%08llx, %08lx) [%d] => ",
+ (u64)phys_addr, size, slot);
dump_stack();
}

@@ -680,13 +681,15 @@ __early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot)
}

/* Remap an IO device */
-void __init __iomem *early_ioremap(unsigned long phys_addr, unsigned long size)
+void __init __iomem *
+early_ioremap(resource_size_t phys_addr, unsigned long size)
{
return __early_ioremap(phys_addr, size, PAGE_KERNEL_IO);
}

/* Remap memory */
-void __init __iomem *early_memremap(unsigned long phys_addr, unsigned long size)
+void __init __iomem *
+early_memremap(resource_size_t phys_addr, unsigned long size)
{
return __early_ioremap(phys_addr, size, PAGE_KERNEL);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5b7c7c8..7aa03a5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -345,7 +345,8 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
fixmaps_set++;
}

-void native_set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
+void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
+ pgprot_t flags)
{
__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
}
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index db3802f..2a81838 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1750,7 +1750,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
}
#endif /* CONFIG_X86_64 */

-static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
+static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
{
pte_t pte;