2011-03-11 13:17:54

by Stefano Stabellini

[permalink] [raw]
Subject: [GIT PULL tip/x86/mm] xen/x86 fixes

Hello,
recently we had a couple of long discussions with Yinghai about boot
crashes on xen, related to pagetable initialization.
As a result we came up with three patches, two of them fix the first [1]
boot crash and provide a nice cleanup on native:


Stefano Stabellini (1):
xen: set max_pfn_mapped to the last pfn mapped
Yinghai Lu (1):
x86: Cleanup highmap after brk is concluded


The third is a xen patch that fixes the other boot crash [2], indirectly
caused by the new initial kernel pagetable allocation strategy:


Stefano Stabellini (1):
xen: update mask_rw_pte after kernel page tables init changes


I have put together a branch with these three patches, based on
tip/x86/mm:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-fixes

Could you please pull from it?


[1] https://lkml.org/lkml/2011/1/31/232
[2] https://lkml.org/lkml/2011/3/1/281

The list of commits with a diffstat follows:

Stefano Stabellini (2):
xen: set max_pfn_mapped to the last pfn mapped
xen: update mask_rw_pte after kernel page tables init changes

Yinghai Lu (1):
x86: Cleanup highmap after brk is concluded

arch/x86/kernel/head64.c | 3 ---
arch/x86/kernel/setup.c | 25 +++----------------------
arch/x86/mm/init_64.c | 11 ++++++-----
arch/x86/xen/mmu.c | 21 ++++++++++++---------
4 files changed, 21 insertions(+), 39 deletions(-)

Thanks,

Stefano


2011-03-11 22:22:38

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> Hello,
> recently we had a couple of long discussions with Yinghai about boot
> crashes on xen, related to pagetable initialization.
> As a result we came up with three patches, two of them fix the first [1]
> boot crash and provide a nice cleanup on native:

I don't know why this is happening now, but it could be very well
related to the build config. Smaller builds don't seem to encounter this, while
this is a distro type build. If I use:

> Stefano Stabellini (1):
> xen: set max_pfn_mapped to the last pfn mapped

it hangs during bootup. The machine hangs during the box (no keyboard interaction)
and I can see this in the bootup.

Mar 11 16:30:08 phenom kernel: [ 9.060569] lp: driver loaded but no devices found
Mar 11 16:30:08 phenom kernel: [ 9.065769] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
Mar 11 16:30:08 phenom kernel: [ 9.075831] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
Mar 11 16:30:08 phenom kernel: [ 9.075984] ------------[ cut here ]------------
Mar 11 16:30:08 phenom kernel: [ 9.075993] WARNING: at /home/konrad/ssd/linux/arch/x86/mm/ioremap.c:109 __ioremap_caller+0x3a3/0x3b0()
Mar 11 16:30:08 phenom kernel: [ 9.075997] Hardware name: TA890FXE
Mar 11 16:30:08 phenom kernel: [ 9.075999] Modules linked in: sp5100_tco(+) i2c_piix4 i2c_algo_bit video lp parport usb_storage usbhid hid uas btrfs r8169 ahci libahci zlib_deflate libcrc32c
Mar 11 16:30:08 phenom kernel: [ 9.076024] Pid: 449, comm: modprobe Tainted: G W 2.6.38-rc8-master-00310-gecfaad3 #40
Mar 11 16:30:08 phenom kernel: [ 9.076027] Call Trace:
Mar 11 16:30:08 phenom kernel: [ 9.076034] [<ffffffff8106214f>] ? warn_slowpath_common+0x7f/0xc0
Mar 11 16:30:08 phenom kernel: [ 9.076039] [<ffffffff81007b4f>] ? xen_restore_fl_direct_end+0x0/0x1
Mar 11 16:30:08 phenom kernel: [ 9.076045] [<ffffffff810621aa>] ? warn_slowpath_null+0x1a/0x20
Mar 11 16:30:08 phenom kernel: [ 9.076049] [<ffffffff8103e9a3>] ? __ioremap_caller+0x3a3/0x3b0
Mar 11 16:30:08 phenom kernel: [ 9.076055] [<ffffffff815c157a>] ? error_exit+0x2a/0x60
Mar 11 16:30:08 phenom kernel: [ 9.076059] [<ffffffff815c10a1>] ? retint_restore_args+0x5/0x6
Mar 11 16:30:08 phenom kernel: [ 9.076064] [<ffffffffa012257d>] ? sp5100_tco_init+0xfc/0xb7f [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076068] [<ffffffff8103ea77>] ? ioremap_nocache+0x17/0x20
Mar 11 16:30:08 phenom kernel: [ 9.076072] [<ffffffffa012257d>] ? sp5100_tco_init+0xfc/0xb7f [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076077] [<ffffffff813b7417>] ? platform_drv_probe+0x17/0x20
Mar 11 16:30:08 phenom kernel: [ 9.076081] [<ffffffff813b6116>] ? driver_probe_device+0x96/0x1c0
Mar 11 16:30:08 phenom kernel: [ 9.076084] [<ffffffff813b62e0>] ? __device_attach+0x0/0x60
Mar 11 16:30:08 phenom kernel: [ 9.076087] [<ffffffff813b6333>] ? __device_attach+0x53/0x60
Mar 11 16:30:08 phenom kernel: [ 9.076091] [<ffffffff813b51a8>] ? bus_for_each_drv+0x68/0x90
Mar 11 16:30:08 phenom kernel: [ 9.076094] [<ffffffff813b63ff>] ? device_attach+0x8f/0xb0
Mar 11 16:30:08 phenom kernel: [ 9.076097] [<ffffffff813b4f7d>] ? bus_probe_device+0x2d/0x50
Mar 11 16:30:08 phenom kernel: [ 9.076101] [<ffffffff813b38e9>] ? device_add+0x639/0x710
Mar 11 16:30:08 phenom kernel: [ 9.076105] [<ffffffff813b2121>] ? dev_set_name+0x41/0x50
Mar 11 16:30:08 phenom kernel: [ 9.076109] [<ffffffff813b7e98>] ? platform_device_add+0x138/0x1f0
Mar 11 16:30:08 phenom kernel: [ 9.076112] [<ffffffff813b82ce>] ? platform_device_register_resndata+0xae/0xc0
Mar 11 16:30:08 phenom kernel: [ 9.076117] [<ffffffffa0006000>] ? sp5100_tco_init_module+0x0/0x1000 [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076121] [<ffffffffa0006051>] ? sp5100_tco_init_module+0x51/0x1000 [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076125] [<ffffffffa0006000>] ? sp5100_tco_init_module+0x0/0x1000 [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076129] [<ffffffff8100214c>] ? do_one_initcall+0x13c/0x190
Mar 11 16:30:08 phenom kernel: [ 9.076133] [<ffffffff8109fd8b>] ? sys_init_module+0xfb/0x250
Mar 11 16:30:08 phenom kernel: [ 9.076137] [<ffffffff8100bfc2>] ? system_call_fastpath+0x16/0x1b
Mar 11 16:30:08 phenom kernel: [ 9.076140] ---[ end trace a7919e7f17c0a727 ]---
Mar 11 16:30:08 phenom kernel: [ 9.076310] PGD 1f0827067 PUD 1f0828067 PMD 1dcdfd067 PTE 0
Mar 11 16:30:08 phenom kernel: [ 9.076329] CPU 0
Mar 11 16:30:08 phenom kernel: [ 9.076332] Modules linked in: sp5100_tco(+) i2c_piix4 i2c_algo_bit video lp parport usb_storage usbhid hid uas btrfs r8169 ahci libahci zlib_deflate libcrc32c
Mar 11 16:30:08 phenom kernel: [ 9.076359]
Mar 11 16:30:08 phenom kernel: [ 9.076364] Pid: 449, comm: modprobe Tainted: G W 2.6.38-rc8-master-00310-gecfaad3 #40 BIOSTAR Group TA890FXE/TA890FXE
Mar 11 16:30:08 phenom kernel: [ 9.076380] RIP: e030:[<ffffffffa0122616>] [<ffffffffa0122616>] sp5100_tco_init+0x195/0xb7f [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076392] RSP: e02b:ffff8801cfe1dce8 EFLAGS: 00010202
Mar 11 16:30:08 phenom kernel: [ 9.076400] RAX: ffffc90012658e00 RBX: 0000000000000cd7 RCX: 0000000000b8fe08
Mar 11 16:30:08 phenom kernel: [ 9.076407] RDX: 0000000000000cd7 RSI: 00000000000000a0 RDI: ffff8801dde1c000
Mar 11 16:30:08 phenom kernel: [ 9.076411] RBP: ffff8801cfe1dd08 R08: ffff8801c8a8c800 R09: ffff880000000000
Mar 11 16:30:08 phenom kernel: [ 9.076417] R10: 0000000000000010 R11: 0000000000000000 R12: 00000000ffffffed
Mar 11 16:30:08 phenom kernel: [ 9.076424] R13: ffffffffa0124088 R14: 0000000000000000 R15: 0000000000000000
Mar 11 16:30:08 phenom kernel: [ 9.076436] FS: 00007ff69583f700(0000) GS:ffff8800bfed1000(0000) knlGS:0000000000000000
Mar 11 16:30:08 phenom kernel: [ 9.076442] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 11 16:30:08 phenom kernel: [ 9.076450] CR2: ffffc90012658e00 CR3: 00000001cfe7c000 CR4: 0000000000000660
Mar 11 16:30:08 phenom kernel: [ 9.076458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 11 16:30:08 phenom kernel: [ 9.076465] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 11 16:30:08 phenom kernel: [ 9.076470] Process modprobe (pid: 449, threadinfo ffff8801cfe1c000, task ffff8801cca516c0)
Mar 11 16:30:08 phenom kernel: [ 9.076483] ffff8801cfe1dd18 0000000e813b5fba ffff8801dcef6c10 ffff8801dcef6c10
Mar 11 16:30:08 phenom kernel: [ 9.076495] ffff8801cfe1dd18 ffffffff813b7417 ffff8801cfe1dd48 ffffffff813b6116
Mar 11 16:30:08 phenom kernel: [ 9.076505] ffff8801cfe1dd68 ffffffffa0124088 ffff8801dcef6c10 ffffffff813b62e0
Mar 11 16:30:08 phenom kernel: [ 9.076525] [<ffffffff813b7417>] platform_drv_probe+0x17/0x20
Mar 11 16:30:08 phenom kernel: [ 9.076532] [<ffffffff813b6116>] driver_probe_device+0x96/0x1c0
Mar 11 16:30:08 phenom kernel: [ 9.076538] [<ffffffff813b62e0>] ? __device_attach+0x0/0x60
Mar 11 16:30:08 phenom kernel: [ 9.076545] [<ffffffff813b6333>] __device_attach+0x53/0x60
Mar 11 16:30:08 phenom kernel: [ 9.076550] [<ffffffff813b51a8>] bus_for_each_drv+0x68/0x90
Mar 11 16:30:08 phenom kernel: [ 9.076555] [<ffffffff813b63ff>] device_attach+0x8f/0xb0
Mar 11 16:30:08 phenom kernel: [ 9.076560] [<ffffffff813b4f7d>] bus_probe_device+0x2d/0x50
Mar 11 16:30:08 phenom kernel: [ 9.076566] [<ffffffff813b38e9>] device_add+0x639/0x710
Mar 11 16:30:08 phenom kernel: [ 9.076573] [<ffffffff813b2121>] ? dev_set_name+0x41/0x50
Mar 11 16:30:08 phenom kernel: [ 9.076578] [<ffffffff813b7e98>] platform_device_add+0x138/0x1f0
Mar 11 16:30:08 phenom kernel: [ 9.076584] [<ffffffff813b82ce>] platform_device_register_resndata+0xae/0xc0
Mar 11 16:30:08 phenom kernel: [ 9.076590] [<ffffffffa0006000>] ? sp5100_tco_init_module+0x0/0x1000 [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076597] [<ffffffffa0006051>] sp5100_tco_init_module+0x51/0x1000 [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076603] [<ffffffffa0006000>] ? sp5100_tco_init_module+0x0/0x1000 [sp5100_tco]
Mar 11 16:30:08 phenom kernel: [ 9.076609] [<ffffffff8100214c>] do_one_initcall+0x13c/0x190
Mar 11 16:30:08 phenom kernel: [ 9.076614] [<ffffffff8109fd8b>] sys_init_module+0xfb/0x250
Mar 11 16:30:08 phenom kernel: [ 9.076620] [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
Mar 11 16:30:08 phenom kernel: [ 9.076722] RSP <ffff8801cfe1dce8>
Mar 11 16:30:08 phenom kernel: [ 9.076730] ---[ end trace a7919e7f17c0a728 ]---
Mar 11 16:30:08 phenom kernel: [ 9.129655] [drm] Initialized drm 1.1.0 20060810
Mar 11 16:30:08 phenom kernel: [ 9.163789] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Mar 11 16:30:08 phenom kernel: [ 9.169000] MCE: In-kernel MCE decoding enabled.
Mar 11 16:30:08 phenom kernel: [ 9.180697] udev[419]: renamed network interface eth0 to eth2
Mar 11 16:30:08 phenom kernel: [ 9.190273] EDAC MC: Ver: 2.1.0 Mar 11 2011

A normal boot has this in /proc/ioports:

0b00-0b1f : pnp 00:09
0b00-0b07 : piix4_smbus

(there is no sp5100_tco, even thought it is loaded).

If I back out that patch, the machine boots fine.
> Yinghai Lu (1):
> x86: Cleanup highmap after brk is concluded

If I use this one above, the machine crashes right away. I tried
a build with just that patch and had the same failure.

Attached is the config I used.


Attachments:
(No filename) (9.30 kB)
.config (126.36 kB)
Download all attachments

2011-03-16 12:28:38

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> > Hello,
> > recently we had a couple of long discussions with Yinghai about boot
> > crashes on xen, related to pagetable initialization.
> > As a result we came up with three patches, two of them fix the first [1]
> > boot crash and provide a nice cleanup on native:
>
> I don't know why this is happening now, but it could be very well
> related to the build config. Smaller builds don't seem to encounter this, while
> this is a distro type build. If I use:
>
> > Stefano Stabellini (1):
> > xen: set max_pfn_mapped to the last pfn mapped
>
> it hangs during bootup. The machine hangs during the box (no keyboard interaction)
> and I can see this in the bootup.

Konrad sent me few other logs offline: log1 is the log of the hang and
log2 is a successful boot (reverting the problematic patch).
It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
an address (0xb8fe00) that belongs to the memory range used for the
pagetable (0x9fc000-0xf43fff).
In the succesful case max_pfn_mapped is higher so the pagetable is
located at an higher address (0x16dfb000-0x17342fff) so the problem
doesn't occur.

I still have few unaswered questions on this issue: if we assume that
the ioremap address is the same in the two cases (0xb8fe00), how is it
possible that in the first case it is ram (page_is_ram returns true)
while in the second case it is not (otherwise we would still get a
warning from ioremap): page_is_ram shouldn't be affected by the position
of the kernel pagetable, and the e820 is still the same.
In any case if 0xb8fe00 is really an MMIO address memblock_find_in_range
shouldn't have returned the range (0x9fc000-0xf43fff) in
find_early_table_space.
I think that lowering the value of max_pfn_mapped is likely to cause
bugs like this one, where a low memory range is not properly marked as
reserved and gets mistakenly used for the pagetable.

Considering that meanwhile Linux 2.6.38 was released with this bug, I
think is better if we change approach and fix the regression in a more
straightforward way, like for example:

- 2M align _end;
- do not clean initial mapping between _brk_end to _end;
- resurrect the patch "respect memblock reserved regions when
destroying mappings", trying to minimize the number of memblock reserved
checks.

Opinions?



Regarding the other commit "x86-64, mm: Put early page table high" that
causes a reliable crash on Xen: I noticed that Ingo sent a pull request
to Linus with this commit included.
At this point I can send the patch to fix the Xen issue to Linus
directly, no need to rebased the patch on tip?

2011-03-16 14:44:24

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

actually attach the logs :)

On Wed, 16 Mar 2011, Stefano Stabellini wrote:
> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> > > Hello,
> > > recently we had a couple of long discussions with Yinghai about boot
> > > crashes on xen, related to pagetable initialization.
> > > As a result we came up with three patches, two of them fix the first [1]
> > > boot crash and provide a nice cleanup on native:
> >
> > I don't know why this is happening now, but it could be very well
> > related to the build config. Smaller builds don't seem to encounter this, while
> > this is a distro type build. If I use:
> >
> > > Stefano Stabellini (1):
> > > xen: set max_pfn_mapped to the last pfn mapped
> >
> > it hangs during bootup. The machine hangs during the box (no keyboard interaction)
> > and I can see this in the bootup.
>
> Konrad sent me few other logs offline: log1 is the log of the hang and
> log2 is a successful boot (reverting the problematic patch).
> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
> an address (0xb8fe00) that belongs to the memory range used for the
> pagetable (0x9fc000-0xf43fff).
> In the succesful case max_pfn_mapped is higher so the pagetable is
> located at an higher address (0x16dfb000-0x17342fff) so the problem
> doesn't occur.
>
> I still have few unaswered questions on this issue: if we assume that
> the ioremap address is the same in the two cases (0xb8fe00), how is it
> possible that in the first case it is ram (page_is_ram returns true)
> while in the second case it is not (otherwise we would still get a
> warning from ioremap): page_is_ram shouldn't be affected by the position
> of the kernel pagetable, and the e820 is still the same.
> In any case if 0xb8fe00 is really an MMIO address memblock_find_in_range
> shouldn't have returned the range (0x9fc000-0xf43fff) in
> find_early_table_space.
> I think that lowering the value of max_pfn_mapped is likely to cause
> bugs like this one, where a low memory range is not properly marked as
> reserved and gets mistakenly used for the pagetable.
>
> Considering that meanwhile Linux 2.6.38 was released with this bug, I
> think is better if we change approach and fix the regression in a more
> straightforward way, like for example:
>
> - 2M align _end;
> - do not clean initial mapping between _brk_end to _end;
> - resurrect the patch "respect memblock reserved regions when
> destroying mappings", trying to minimize the number of memblock reserved
> checks.
>
> Opinions?
>
>
>
> Regarding the other commit "x86-64, mm: Put early page table high" that
> causes a reliable crash on Xen: I noticed that Ingo sent a pull request
> to Linus with this commit included.
> At this point I can send the patch to fix the Xen issue to Linus
> directly, no need to rebased the patch on tip?
>


Attachments:
log1 (89.02 kB)
log2 (81.18 kB)
Download all attachments

2011-03-16 17:56:00

by Yinghai Lu

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On 03/16/2011 07:43 AM, Stefano Stabellini wrote:
> actually attach the logs :)
>
> On Wed, 16 Mar 2011, Stefano Stabellini wrote:
>> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
>>>> Hello,
>>>> recently we had a couple of long discussions with Yinghai about boot
>>>> crashes on xen, related to pagetable initialization.
>>>> As a result we came up with three patches, two of them fix the first [1]
>>>> boot crash and provide a nice cleanup on native:
>>>
>>> I don't know why this is happening now, but it could be very well
>>> related to the build config. Smaller builds don't seem to encounter this, while
>>> this is a distro type build. If I use:
>>>
>>>> Stefano Stabellini (1):
>>>> xen: set max_pfn_mapped to the last pfn mapped
>>>
>>> it hangs during bootup. The machine hangs during the box (no keyboard interaction)
>>> and I can see this in the bootup.
>>
>> Konrad sent me few other logs offline: log1 is the log of the hang and
>> log2 is a successful boot (reverting the problematic patch).
>> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
>> an address (0xb8fe00) that belongs to the memory range used for the
>> pagetable (0x9fc000-0xf43fff).

Mar 15 16:09:04 phenom kernel: [ 0.000000] found SMP MP-table at [ffff8800000ff780] ff780

Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x000ff780-0x000ff78f] * MP-table mpf

Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x000fd240-0x000fd423] * MP-table mpc

Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x01cfd000-0x01d1c0e4] BRK

Mar 15 16:09:04 phenom kernel: [ 0.000000] MEMBLOCK configuration:

Mar 15 16:09:04 phenom kernel: [ 0.000000] memory size = 0x23fe39000

Mar 15 16:09:04 phenom kernel: [ 0.000000] memory.cnt = 0x3

Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x0] [0x00000000010000-0x0000000009afff], 0x8b000 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x1] [0x00000000100000-0x000000bffaffff], 0xbfeb0000 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x2] [0x00000100000000-0x0000027fefdfff], 0x17fefe000 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved.cnt = 0x5

Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x0] [0x000000000fd240-0x000000000fd423], 0x1e4 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x1] [0x000000000ff780-0x000000000ff78f], 0x10 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x2] [0x00000001000000-0x00000001d1c0e4], 0xd1c0e5 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x3] [0x00000001e33000-0x00000016a36fff], 0x14c04000 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x4] [0x000001f0f7e000-0x0000027fefdfff], 0x8ef80000 bytes

Mar 15 16:09:04 phenom kernel: [ 0.000000] Scanning 0 areas for low memory corruption

Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x00099000-0x0009afff] TRAMPOLINE

Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x00095000-0x00098fff] ACPI WAKEUP

Mar 15 16:09:04 phenom kernel: [ 0.000000] init_memory_mapping: 0000000000000000-00000000bffb0000

Mar 15 16:09:04 phenom kernel: [ 0.000000] DEBUG find_early_table_space: _text=1000000 _end=1e33000 pgtable_start=9fc000 pgtable_end=9fc000

Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x009fc000-0x00f43fff] PGTABLE

e820 said that range is ram and usable. so it is right for memblock to use it.

why TCO watchdog try to use ioremap with RAM? BIOS put wrong mmio in that BAR?

could do some sanitary check in that driver.

also another question is why memblock_find return so low value, it should return value just under 00000000bffb0000
We are putting page-table high to make usable more continuous, instead of put it just under 512M.

Thanks

Yinghai

2011-03-16 18:02:54

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On Wed, 16 Mar 2011, Yinghai Lu wrote:
> On 03/16/2011 07:43 AM, Stefano Stabellini wrote:
> > actually attach the logs :)
> >
> > On Wed, 16 Mar 2011, Stefano Stabellini wrote:
> >> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
> >>> On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> >>>> Hello,
> >>>> recently we had a couple of long discussions with Yinghai about boot
> >>>> crashes on xen, related to pagetable initialization.
> >>>> As a result we came up with three patches, two of them fix the first [1]
> >>>> boot crash and provide a nice cleanup on native:
> >>>
> >>> I don't know why this is happening now, but it could be very well
> >>> related to the build config. Smaller builds don't seem to encounter this, while
> >>> this is a distro type build. If I use:
> >>>
> >>>> Stefano Stabellini (1):
> >>>> xen: set max_pfn_mapped to the last pfn mapped
> >>>
> >>> it hangs during bootup. The machine hangs during the box (no keyboard interaction)
> >>> and I can see this in the bootup.
> >>
> >> Konrad sent me few other logs offline: log1 is the log of the hang and
> >> log2 is a successful boot (reverting the problematic patch).
> >> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
> >> an address (0xb8fe00) that belongs to the memory range used for the
> >> pagetable (0x9fc000-0xf43fff).
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] found SMP MP-table at [ffff8800000ff780] ff780
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x000ff780-0x000ff78f] * MP-table mpf
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x000fd240-0x000fd423] * MP-table mpc
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x01cfd000-0x01d1c0e4] BRK
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] MEMBLOCK configuration:
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory size = 0x23fe39000
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory.cnt = 0x3
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x0] [0x00000000010000-0x0000000009afff], 0x8b000 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x1] [0x00000000100000-0x000000bffaffff], 0xbfeb0000 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x2] [0x00000100000000-0x0000027fefdfff], 0x17fefe000 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved.cnt = 0x5
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x0] [0x000000000fd240-0x000000000fd423], 0x1e4 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x1] [0x000000000ff780-0x000000000ff78f], 0x10 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x2] [0x00000001000000-0x00000001d1c0e4], 0xd1c0e5 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x3] [0x00000001e33000-0x00000016a36fff], 0x14c04000 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x4] [0x000001f0f7e000-0x0000027fefdfff], 0x8ef80000 bytes
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] Scanning 0 areas for low memory corruption
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x00099000-0x0009afff] TRAMPOLINE
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x00095000-0x00098fff] ACPI WAKEUP
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] init_memory_mapping: 0000000000000000-00000000bffb0000
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] DEBUG find_early_table_space: _text=1000000 _end=1e33000 pgtable_start=9fc000 pgtable_end=9fc000
>
> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x009fc000-0x00f43fff] PGTABLE
>
> e820 said that range is ram and usable. so it is right for memblock to use it.
>
> why TCO watchdog try to use ioremap with RAM? BIOS put wrong mmio in that BAR?
>
> could do some sanitary check in that driver.
>

Yeah, I think the max_pfn_mapped patch might be exposing bugs in the
drivers.
Do you remember this patch:

https://lkml.org/lkml/2011/2/4/60

would you be happy with it as a safer alternative?



> also another question is why memblock_find return so low value, it should return value just under 00000000bffb0000
> We are putting page-table high to make usable more continuous, instead of put it just under 512M.

That is because Konrad is testing without your page table high patch.
I think that with the pagetable high patch most of these issues would go
away on x86_64 but they would remain on x86_32.


Thank you vert much for your quick reply!

2011-03-16 20:46:25

by Yinghai Lu

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes ===> fix sp5100_tco mmio checking.

On 03/16/2011 11:02 AM, Stefano Stabellini wrote:
> On Wed, 16 Mar 2011, Yinghai Lu wrote:
>> On 03/16/2011 07:43 AM, Stefano Stabellini wrote:
>>> actually attach the logs :)
>>>
>>> On Wed, 16 Mar 2011, Stefano Stabellini wrote:
>>>> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
>>>>> On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
>>>>>> Hello,
>>>>>> recently we had a couple of long discussions with Yinghai about boot
>>>>>> crashes on xen, related to pagetable initialization.
>>>>>> As a result we came up with three patches, two of them fix the first [1]
>>>>>> boot crash and provide a nice cleanup on native:
>>>>>
>>>>> I don't know why this is happening now, but it could be very well
>>>>> related to the build config. Smaller builds don't seem to encounter this, while
>>>>> this is a distro type build. If I use:
>>>>>
>>>>>> Stefano Stabellini (1):
>>>>>> xen: set max_pfn_mapped to the last pfn mapped
>>>>>
>>>>> it hangs during bootup. The machine hangs during the box (no keyboard interaction)
>>>>> and I can see this in the bootup.
>>>>
>>>> Konrad sent me few other logs offline: log1 is the log of the hang and
>>>> log2 is a successful boot (reverting the problematic patch).
>>>> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
>>>> an address (0xb8fe00) that belongs to the memory range used for the
>>>> pagetable (0x9fc000-0xf43fff).
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] found SMP MP-table at [ffff8800000ff780] ff780
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x000ff780-0x000ff78f] * MP-table mpf
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x000fd240-0x000fd423] * MP-table mpc
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x01cfd000-0x01d1c0e4] BRK
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] MEMBLOCK configuration:
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory size = 0x23fe39000
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory.cnt = 0x3
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x0] [0x00000000010000-0x0000000009afff], 0x8b000 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x1] [0x00000000100000-0x000000bffaffff], 0xbfeb0000 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memory[0x2] [0x00000100000000-0x0000027fefdfff], 0x17fefe000 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved.cnt = 0x5
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x0] [0x000000000fd240-0x000000000fd423], 0x1e4 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x1] [0x000000000ff780-0x000000000ff78f], 0x10 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x2] [0x00000001000000-0x00000001d1c0e4], 0xd1c0e5 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x3] [0x00000001e33000-0x00000016a36fff], 0x14c04000 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] reserved[0x4] [0x000001f0f7e000-0x0000027fefdfff], 0x8ef80000 bytes
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] Scanning 0 areas for low memory corruption
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x00099000-0x0009afff] TRAMPOLINE
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x00095000-0x00098fff] ACPI WAKEUP
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] init_memory_mapping: 0000000000000000-00000000bffb0000
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] DEBUG find_early_table_space: _text=1000000 _end=1e33000 pgtable_start=9fc000 pgtable_end=9fc000
>>
>> Mar 15 16:09:04 phenom kernel: [ 0.000000] memblock_x86_reserve_range: [0x009fc000-0x00f43fff] PGTABLE
>>
>> e820 said that range is ram and usable. so it is right for memblock to use it.
>>
>> why TCO watchdog try to use ioremap with RAM? BIOS put wrong mmio in that BAR?
>>
>> could do some sanitary check in that driver.
>>
>
> Yeah, I think the max_pfn_mapped patch might be exposing bugs in the
> drivers.
> Do you remember this patch:
>
> https://lkml.org/lkml/2011/2/4/60
>
> would you be happy with it as a safer alternative?

we should fix tco driver

Mar 15 16:09:04 phenom kernel: [ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01

Mar 15 16:09:04 phenom kernel: [ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1

so BIOS program wrong MMIO info.

need some checking in that driver like

diff --git a/drivers/watchdog/sp5100_tco.c b/drivers/watchdog/sp5100_tco.c
index 8083728..2fac643 100644
--- a/drivers/watchdog/sp5100_tco.c
+++ b/drivers/watchdog/sp5100_tco.c
@@ -42,6 +42,7 @@
#define PFX TCO_MODULE_NAME ": "

/* internal variables */
+static u32 tcobase_phys;
static void __iomem *tcobase;
static unsigned int pm_iobase;
static DEFINE_SPINLOCK(tco_lock); /* Guards the hardware */
@@ -305,6 +306,12 @@ static unsigned char __devinit sp5100_tco_setupdevice(void)
/* Low three bits of BASE0 are reserved. */
val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);

+ if (request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
+ printk(KERN_ERR PFX "mmio address 0x%04x already in use\n", val);
+ goto unreg_region;
+ }
+ tcobase_phys = val;
+
tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
if (tcobase == 0) {
printk(KERN_ERR PFX "failed to get tcobase address\n");
@@ -414,6 +421,7 @@ static void __devexit sp5100_tco_cleanup(void)
/* Deregister */
misc_deregister(&sp5100_tco_miscdev);
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
}

2011-03-16 21:02:04

by Mike Waychison

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes ===> fix sp5100_tco mmio checking.

On Wed, Mar 16, 2011 at 1:45 PM, Yinghai Lu <[email protected]> wrote:
> On 03/16/2011 11:02 AM, Stefano Stabellini wrote:
>>
>> On Wed, 16 Mar 2011, Yinghai Lu wrote:
>>>
>>> On 03/16/2011 07:43 AM, Stefano Stabellini wrote:
>>>>
>>>> actually attach the logs :)
>>>>
>>>> On Wed, 16 Mar 2011, Stefano Stabellini wrote:
>>>>>
>>>>> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
>>>>>>
>>>>>> On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>> recently we had a couple of long discussions with Yinghai about boot
>>>>>>> crashes on xen, related to pagetable initialization.
>>>>>>> As a result we came up with three patches, two of them fix the first
>>>>>>> [1]
>>>>>>> boot crash and provide a nice cleanup on native:
>>>>>>
>>>>>> I don't know why this is happening now, but it could be very well
>>>>>> related to the build config. Smaller builds don't seem to encounter
>>>>>> this, while
>>>>>> this is a distro type build. If I use:
>>>>>>
>>>>>>> Stefano Stabellini (1):
>>>>>>> ? ? ? ?xen: set max_pfn_mapped to the last pfn mapped
>>>>>>
>>>>>> it hangs during bootup. The machine hangs during the box (no keyboard
>>>>>> interaction)
>>>>>> and I can see this in the bootup.
>>>>>
>>>>> Konrad sent me few other logs offline: log1 is the log of the hang and
>>>>> log2 is a successful boot (reverting the problematic patch).
>>>>> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
>>>>> an address (0xb8fe00) that belongs to the memory range used for the
>>>>> pagetable (0x9fc000-0xf43fff).
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] found SMP MP-table at
>>> [ffff8800000ff780] ff780
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000]
>>> memblock_x86_reserve_range: [0x000ff780-0x000ff78f] ? * MP-table mpf
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000]
>>> memblock_x86_reserve_range: [0x000fd240-0x000fd423] ? * MP-table mpc
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000]
>>> memblock_x86_reserve_range: [0x01cfd000-0x01d1c0e4] ? ? ? ? ? ? ?BRK
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] MEMBLOCK configuration:
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?memory size = 0x23fe39000
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?memory.cnt ?= 0x3
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?memory[0x0]
>>> ?[0x00000000010000-0x0000000009afff], 0x8b000 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?memory[0x1]
>>> ?[0x00000000100000-0x000000bffaffff], 0xbfeb0000 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?memory[0x2]
>>> ?[0x00000100000000-0x0000027fefdfff], 0x17fefe000 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?reserved.cnt ?= 0x5
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?reserved[0x0]
>>> ?[0x000000000fd240-0x000000000fd423], 0x1e4 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?reserved[0x1]
>>> ?[0x000000000ff780-0x000000000ff78f], 0x10 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?reserved[0x2]
>>> ?[0x00000001000000-0x00000001d1c0e4], 0xd1c0e5 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?reserved[0x3]
>>> ?[0x00000001e33000-0x00000016a36fff], 0x14c04000 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] ?reserved[0x4]
>>> ?[0x000001f0f7e000-0x0000027fefdfff], 0x8ef80000 bytes
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] Scanning 0 areas for low
>>> memory corruption
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000]
>>> memblock_x86_reserve_range: [0x00099000-0x0009afff] ? ? ? TRAMPOLINE
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000]
>>> memblock_x86_reserve_range: [0x00095000-0x00098fff] ? ? ?ACPI WAKEUP
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] init_memory_mapping:
>>> 0000000000000000-00000000bffb0000
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000] DEBUG
>>> find_early_table_space: _text=1000000 _end=1e33000 pgtable_start=9fc000
>>> pgtable_end=9fc000
>>>
>>> Mar 15 16:09:04 phenom kernel: [ ? ?0.000000]
>>> memblock_x86_reserve_range: [0x009fc000-0x00f43fff] ? ? ? ? ?PGTABLE
>>>
>>> e820 said that range is ram and usable. so it is right for memblock to
>>> use it.
>>>
>>> why TCO watchdog try to use ioremap with RAM? ?BIOS put wrong mmio in
>>> that BAR?
>>>
>>> could do some sanitary check in that driver.
>>>
>>
>> Yeah, I think the max_pfn_mapped patch might be exposing bugs in the
>> drivers.
>> Do you remember this patch:
>>
>> https://lkml.org/lkml/2011/2/4/60
>>
>> would you be happy with it as a safer alternative?
>
> we should fix tco driver
>
> Mar 15 16:09:04 phenom kernel: [ ? ?9.148536] SP5100 TCO timer: SP5100 TCO
> WatchDog Timer Driver v0.01
>
> Mar 15 16:09:04 phenom kernel: [ ? ?9.148628] DEBUG __ioremap_caller WARNING
> address=b8fe00 size=8 valid=1 reserved=1
>
> so BIOS program wrong MMIO info.
>
> need some checking in that driver like
>
> diff --git a/drivers/watchdog/sp5100_tco.c b/drivers/watchdog/sp5100_tco.c
> index 8083728..2fac643 100644
> --- a/drivers/watchdog/sp5100_tco.c
> +++ b/drivers/watchdog/sp5100_tco.c
> @@ -42,6 +42,7 @@
> ?#define PFX TCO_MODULE_NAME ": "
> ?/* internal variables */
> +static u32 tcobase_phys;
> ?static void __iomem *tcobase;
> ?static unsigned int pm_iobase;
> ?static DEFINE_SPINLOCK(tco_lock); ? ? ?/* Guards the hardware */
> @@ -305,6 +306,12 @@ static unsigned char __devinit
> sp5100_tco_setupdevice(void)
> ? ? ? ?/* Low three bits of BASE0 are reserved. */
> ? ? ? ?val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);
> ?+ ? ? ?if (request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE,
> "SP5100 TCO")) {
> + ? ? ? ? ? ? ? printk(KERN_ERR PFX "mmio address 0x%04x already in use\n",
> val);
> + ? ? ? ? ? ? ? goto unreg_region;
> + ? ? ? }
> + ? ? ? tcobase_phys = val;
> +
> ? ? ? ?tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
> ? ? ? ?if (tcobase == 0) {

Needs a release_mem_region() in this path. Otherwise this looks fine.

> ? ? ? ? ? ? ? ?printk(KERN_ERR PFX "failed to get tcobase address\n");
> @@ -414,6 +421,7 @@ static void __devexit sp5100_tco_cleanup(void)
> ? ? ? ?/* Deregister */
> ? ? ? ?misc_deregister(&sp5100_tco_miscdev);
> ? ? ? ?iounmap(tcobase);
> + ? ? ? release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
> ? ? ? ?release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
> ?}
>

2011-03-16 21:18:58

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] watchdog, SP5100: Check if firmware has set correct value in tcobase.



Stefano found SP5100 TCO watchdog driver using wrong address.

[ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
[ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1

and e820 said that range is RAM.

We should check if we can use that reading out. BIOS could just program wrong address there.

-v2: Mike pointed out one path need one release.

Reported-by: Stefano Stabellini <[email protected]>
Signed-off-by:Yinghai Lu <[email protected]>
Acked-by: Mike Waychison <[email protected]>

---
drivers/watchdog/sp5100_tco.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/watchdog/sp5100_tco.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sp5100_tco.c
+++ linux-2.6/drivers/watchdog/sp5100_tco.c
@@ -42,6 +42,7 @@
#define PFX TCO_MODULE_NAME ": "

/* internal variables */
+static u32 tcobase_phys;
static void __iomem *tcobase;
static unsigned int pm_iobase;
static DEFINE_SPINLOCK(tco_lock); /* Guards the hardware */
@@ -305,10 +306,16 @@ static unsigned char __devinit sp5100_tc
/* Low three bits of BASE0 are reserved. */
val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);

+ if (request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
+ printk(KERN_ERR PFX "mmio address 0x%04x already in use\n", val);
+ goto unreg_region;
+ }
+ tcobase_phys = val;
+
tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
if (tcobase == 0) {
printk(KERN_ERR PFX "failed to get tcobase address\n");
- goto unreg_region;
+ goto unreg_mem_region;
}

/* Enable watchdog decode bit */
@@ -346,7 +353,8 @@ static unsigned char __devinit sp5100_tc
/* Done */
return 1;

- iounmap(tcobase);
+unreg_mem_region:
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
unreg_region:
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
exit:
@@ -401,6 +409,7 @@ static int __devinit sp5100_tco_init(str

exit:
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
return ret;
}
@@ -414,6 +423,7 @@ static void __devexit sp5100_tco_cleanup
/* Deregister */
misc_deregister(&sp5100_tco_miscdev);
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
}

2011-03-17 00:06:51

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH] watchdog, SP5100: Check if firmware has set correct value in tcobase.

On Wed, Mar 16, 2011 at 02:18:17PM -0700, Yinghai Lu wrote:
>
>
> Stefano found SP5100 TCO watchdog driver using wrong address.
>
> [ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
> [ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1
>
> and e820 said that range is RAM.
>
> We should check if we can use that reading out. BIOS could just program wrong address there.
>
> -v2: Mike pointed out one path need one release.
>
> Reported-by: Stefano Stabellini <[email protected]>
> Signed-off-by:Yinghai Lu <[email protected]>
> Acked-by: Mike Waychison <[email protected]>

Tested-by: Konrad Rzeszutek Wilk <[email protected]>

Stefano,

this fixes my bootup issues with your:
xen: set max_pfn_mapped to the last pfn mapped
patch. Will try the full patchset tomorrow.

2011-03-17 02:24:56

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH] watchdog, SP5100: Check if firmware has set correct value in tcobase.

On Wed, Mar 16, 2011 at 02:18:17PM -0700, Yinghai Lu wrote:
>
>
> Stefano found SP5100 TCO watchdog driver using wrong address.
>
> [ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
> [ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1
>
> and e820 said that range is RAM.
>
> We should check if we can use that reading out. BIOS could just program wrong address there.
>
> -v2: Mike pointed out one path need one release.
>
> Reported-by: Stefano Stabellini <[email protected]>
> Signed-off-by:Yinghai Lu <[email protected]>
> Acked-by: Mike Waychison <[email protected]>

I have no idea why it worked the first time b/c this:


> + if (request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {

is wrong. It should have been "if (!request...")..

With that, and with Stefano's patches (stefano/2.6.38-rc6-mm-fix) on top of 2.6.39-rc0 it boots up fine.

Excerpt from the log:

[ 0.000000] DEBUG find_early_table_space: _text=1000000 _end=1e33000 pgtable_start=9fc000 pgtable_end=9fc000
[ 0.000000] DEBUG find_early_table_space: _text=1000000 _end=1e33000 pgtable_start=beba5000 pgtable_end=beba5000
[ 9.064064] calling sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 507
[ 9.064067] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
[ 9.064180] SP5100 TCO timer: mmio address 0xb8fe00 already in use
[ 9.064201] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 126 usecs

Attached is the full log if folks are curious.


Attachments:
(No filename) (1.53 kB)
d (254.83 kB)
Download all attachments

2011-03-17 03:01:40

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH -v3] watchdog, SP5100: Check if firmware has set correct value in tcobase.


Stefano found SP5100 TCO watchdog driver using wrong address.

[ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
[ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1

and e820 said that range is RAM.

We should check if we can use that reading out. BIOS could just program wrong address there.

-v2: Mike pointed out one path need one release.
-v3: corrected logic checking with request_mem_region_exclusive()
Found by Konrad.


Reported-by: Stefano Stabellini <[email protected]>
Signed-off-by:Yinghai Lu <[email protected]>
Acked-by: Mike Waychison <[email protected]>
Tested-by: Konrad Rzeszutek Wilk <[email protected]>

---
drivers/watchdog/sp5100_tco.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/watchdog/sp5100_tco.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sp5100_tco.c
+++ linux-2.6/drivers/watchdog/sp5100_tco.c
@@ -42,6 +42,7 @@
#define PFX TCO_MODULE_NAME ": "

/* internal variables */
+static u32 tcobase_phys;
static void __iomem *tcobase;
static unsigned int pm_iobase;
static DEFINE_SPINLOCK(tco_lock); /* Guards the hardware */
@@ -305,10 +306,16 @@ static unsigned char __devinit sp5100_tc
/* Low three bits of BASE0 are reserved. */
val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);

+ if (!request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
+ printk(KERN_ERR PFX "mmio address 0x%04x already in use\n", val);
+ goto unreg_region;
+ }
+ tcobase_phys = val;
+
tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
if (tcobase == 0) {
printk(KERN_ERR PFX "failed to get tcobase address\n");
- goto unreg_region;
+ goto unreg_mem_region;
}

/* Enable watchdog decode bit */
@@ -346,7 +353,8 @@ static unsigned char __devinit sp5100_tc
/* Done */
return 1;

- iounmap(tcobase);
+unreg_mem_region:
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
unreg_region:
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
exit:
@@ -401,6 +409,7 @@ static int __devinit sp5100_tco_init(str

exit:
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
return ret;
}
@@ -414,6 +423,7 @@ static void __devexit sp5100_tco_cleanup
/* Deregister */
misc_deregister(&sp5100_tco_miscdev);
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
}

2011-03-17 12:36:06

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH] watchdog, SP5100: Check if firmware has set correct value in tcobase.

On Thu, 17 Mar 2011, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 16, 2011 at 02:18:17PM -0700, Yinghai Lu wrote:
> >
> >
> > Stefano found SP5100 TCO watchdog driver using wrong address.
> >
> > [ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
> > [ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1
> >
> > and e820 said that range is RAM.
> >
> > We should check if we can use that reading out. BIOS could just program wrong address there.
> >
> > -v2: Mike pointed out one path need one release.
> >
> > Reported-by: Stefano Stabellini <[email protected]>
> > Signed-off-by:Yinghai Lu <[email protected]>
> > Acked-by: Mike Waychison <[email protected]>
>
> I have no idea why it worked the first time b/c this:
>
>
> > + if (request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
>
> is wrong. It should have been "if (!request...")..
>
> With that, and with Stefano's patches (stefano/2.6.38-rc6-mm-fix) on top of 2.6.39-rc0 it boots up fine.

Yinghai, thanks for the patch!
I hope that we are not going to find any more of this kind of issues
with other drivers and other BIOSes.

2011-03-17 12:45:19

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On Wed, 16 Mar 2011, Stefano Stabellini wrote:
> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> > > Hello,
> > > recently we had a couple of long discussions with Yinghai about boot
> > > crashes on xen, related to pagetable initialization.
> > > As a result we came up with three patches, two of them fix the first [1]
> > > boot crash and provide a nice cleanup on native:
> >
> > I don't know why this is happening now, but it could be very well
> > related to the build config. Smaller builds don't seem to encounter this, while
> > this is a distro type build. If I use:
> >
> > > Stefano Stabellini (1):
> > > xen: set max_pfn_mapped to the last pfn mapped
> >
> > it hangs during bootup. The machine hangs during the box (no keyboard interaction)
> > and I can see this in the bootup.
>
> Konrad sent me few other logs offline: log1 is the log of the hang and
> log2 is a successful boot (reverting the problematic patch).
> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
> an address (0xb8fe00) that belongs to the memory range used for the
> pagetable (0x9fc000-0xf43fff).
> In the succesful case max_pfn_mapped is higher so the pagetable is
> located at an higher address (0x16dfb000-0x17342fff) so the problem
> doesn't occur.
>
> I still have few unaswered questions on this issue: if we assume that
> the ioremap address is the same in the two cases (0xb8fe00), how is it
> possible that in the first case it is ram (page_is_ram returns true)
> while in the second case it is not (otherwise we would still get a
> warning from ioremap): page_is_ram shouldn't be affected by the position
> of the kernel pagetable, and the e820 is still the same.
> In any case if 0xb8fe00 is really an MMIO address memblock_find_in_range
> shouldn't have returned the range (0x9fc000-0xf43fff) in
> find_early_table_space.
>

The issue is due to a bug in the TCO driver and has been fixed thanks
to Yinghai.


Peter, can I add your ack to "Cleanup highmap after brk is concluded" by
Yinghai?

Should I send another git pull request for tip even though the two
patches on linux-tip that this series was depending on have gone
upstream?

x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
x86-64, mm: Put early page table high

Should I send the git pull request to Linus instead?

2011-03-17 17:25:47

by Yinghai Lu

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On 03/17/2011 05:44 AM, Stefano Stabellini wrote:
> On Wed, 16 Mar 2011, Stefano Stabellini wrote:
>> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
>>>> Hello,
>>>> recently we had a couple of long discussions with Yinghai about boot
>>>> crashes on xen, related to pagetable initialization.
>>>> As a result we came up with three patches, two of them fix the first [1]
>>>> boot crash and provide a nice cleanup on native:
>>>
>>> I don't know why this is happening now, but it could be very well
>>> related to the build config. Smaller builds don't seem to encounter this, while
>>> this is a distro type build. If I use:
>>>
>>>> Stefano Stabellini (1):
>>>> xen: set max_pfn_mapped to the last pfn mapped
>>>
>>> it hangs during bootup. The machine hangs during the box (no keyboard interaction)
>>> and I can see this in the bootup.
>>
>> Konrad sent me few other logs offline: log1 is the log of the hang and
>> log2 is a successful boot (reverting the problematic patch).
>> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
>> an address (0xb8fe00) that belongs to the memory range used for the
>> pagetable (0x9fc000-0xf43fff).
>> In the succesful case max_pfn_mapped is higher so the pagetable is
>> located at an higher address (0x16dfb000-0x17342fff) so the problem
>> doesn't occur.
>>
>> I still have few unaswered questions on this issue: if we assume that
>> the ioremap address is the same in the two cases (0xb8fe00), how is it
>> possible that in the first case it is ram (page_is_ram returns true)
>> while in the second case it is not (otherwise we would still get a
>> warning from ioremap): page_is_ram shouldn't be affected by the position
>> of the kernel pagetable, and the e820 is still the same.
>> In any case if 0xb8fe00 is really an MMIO address memblock_find_in_range
>> shouldn't have returned the range (0x9fc000-0xf43fff) in
>> find_early_table_space.
>>
>
> The issue is due to a bug in the TCO driver and has been fixed thanks
> to Yinghai.
>
>
> Peter, can I add your ack to "Cleanup highmap after brk is concluded" by
> Yinghai?
>
> Should I send another git pull request for tip even though the two
> patches on linux-tip that this series was depending on have gone
> upstream?
>
> x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
> x86-64, mm: Put early page table high
>
> Should I send the git pull request to Linus instead?

can you please resend the pull request to Ingo and HPA?

better to make those patches go through tip and pass Ingo's test matrix.

Thanks

Yinghai

2011-03-17 17:39:01

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes

On Thu, 17 Mar 2011, Yinghai Lu wrote:
> > Should I send another git pull request for tip even though the two
> > patches on linux-tip that this series was depending on have gone
> > upstream?
> >
> > x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
> > x86-64, mm: Put early page table high
> >
> > Should I send the git pull request to Linus instead?
>
> can you please resend the pull request to Ingo and HPA?
>
> better to make those patches go through tip and pass Ingo's test matrix.

OK, I'll send another pull request.

2011-03-18 13:10:56

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH -v3] watchdog, SP5100: Check if firmware has set correct value in tcobase.

On Wed, Mar 16, 2011 at 08:01:07PM -0700, Yinghai Lu wrote:
>
> Stefano found SP5100 TCO watchdog driver using wrong address.
>
> [ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
> [ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1
>
> and e820 said that range is RAM.
>
> We should check if we can use that reading out. BIOS could just program wrong address there.
>
> -v2: Mike pointed out one path need one release.
> -v3: corrected logic checking with request_mem_region_exclusive()
> Found by Konrad.

Yinghai:
Not sure what you are using as a base, but I had to modify this patch
to go on top of 2.6.38. Here is the patch that applies cleanly:

diff --git a/drivers/watchdog/sp5100_tco.c b/drivers/watchdog/sp5100_tco.c
index 8083728..9cbca8b 100644
--- a/drivers/watchdog/sp5100_tco.c
+++ b/drivers/watchdog/sp5100_tco.c
@@ -42,6 +42,7 @@
#define PFX TCO_MODULE_NAME ": "

/* internal variables */
+static u32 tcobase_phys;
static void __iomem *tcobase;
static unsigned int pm_iobase;
static DEFINE_SPINLOCK(tco_lock); /* Guards the hardware */
@@ -305,10 +306,16 @@ static unsigned char __devinit sp5100_tco_setupdevice(void)
/* Low three bits of BASE0 are reserved. */
val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);

+ if (!request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
+ printk(KERN_ERR PFX "mmio address 0x%04x already in use\n", val);
+ goto unreg_region;
+ }
+ tcobase_phys = val;
+
tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
if (tcobase == 0) {
printk(KERN_ERR PFX "failed to get tcobase address\n");
- goto unreg_region;
+ goto unreg_mem_region;
}

/* Enable watchdog decode bit */
@@ -346,7 +353,8 @@ static unsigned char __devinit sp5100_tco_setupdevice(void)
/* Done */
return 1;

- iounmap(tcobase);
+unreg_mem_region:
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
unreg_region:
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
exit:
@@ -401,6 +409,7 @@ static int __devinit sp5100_tco_init(struct platform_device *dev)

exit:
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
return ret;
}
@@ -414,6 +423,7 @@ static void __devexit sp5100_tco_cleanup(void)
/* Deregister */
misc_deregister(&sp5100_tco_miscdev);
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
}

>
>
> Reported-by: Stefano Stabellini <[email protected]>
> Signed-off-by:Yinghai Lu <[email protected]>
> Acked-by: Mike Waychison <[email protected]>
> Tested-by: Konrad Rzeszutek Wilk <[email protected]>
>
> ---
> drivers/watchdog/sp5100_tco.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/watchdog/sp5100_tco.c
> ===================================================================
> --- linux-2.6.orig/drivers/watchdog/sp5100_tco.c
> +++ linux-2.6/drivers/watchdog/sp5100_tco.c
> @@ -42,6 +42,7 @@
> #define PFX TCO_MODULE_NAME ": "
> /* internal variables */
> +static u32 tcobase_phys;
> static void __iomem *tcobase;
> static unsigned int pm_iobase;
> static DEFINE_SPINLOCK(tco_lock); /* Guards the hardware */
> @@ -305,10 +306,16 @@ static unsigned char __devinit sp5100_tc
> /* Low three bits of BASE0 are reserved. */
> val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);
> + if (!request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
> + printk(KERN_ERR PFX "mmio address 0x%04x already in use\n", val);
> + goto unreg_region;
> + }
> + tcobase_phys = val;
> +
> tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
> if (tcobase == 0) {
> printk(KERN_ERR PFX "failed to get tcobase address\n");
> - goto unreg_region;
> + goto unreg_mem_region;
> }
> /* Enable watchdog decode bit */
> @@ -346,7 +353,8 @@ static unsigned char __devinit sp5100_tc
> /* Done */
> return 1;
> - iounmap(tcobase);
> +unreg_mem_region:
> + release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
> unreg_region:
> release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
> exit:
> @@ -401,6 +409,7 @@ static int __devinit sp5100_tco_init(str
> exit:
> iounmap(tcobase);
> + release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
> release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
> return ret;
> }
> @@ -414,6 +423,7 @@ static void __devexit sp5100_tco_cleanup
> /* Deregister */
> misc_deregister(&sp5100_tco_miscdev);
> iounmap(tcobase);
> + release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
> release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
> }
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> http://lists.xensource.com/xen-devel

2011-03-18 16:39:16

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH -v3] watchdog, SP5100: Check if firmware has set correct value in tcobase.

On Fri, Mar 18, 2011 at 6:10 AM, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> On Wed, Mar 16, 2011 at 08:01:07PM -0700, Yinghai Lu wrote:
>>
>> Stefano found SP5100 TCO watchdog driver using wrong address.
>>
>> [ ? ?9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
>> [ ? ?9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1
>>
>> and e820 said that range is RAM.
>>
>> We should check if we can use that reading out. BIOS could just program wrong address there.
>>
>> -v2: Mike pointed out one path need one release.
>> -v3: corrected logic checking with request_mem_region_exclusive()
>> ? ? ?Found by Konrad.
>
> Yinghai:
> Not sure what you are using as a base, but I had to modify this patch
> to go on top of 2.6.38. Here is the patch that applies cleanly:


after

commit 4562f53940432369df88e195ef8f9b642bdf7cd6
Author: Wim Van Sebroeck <[email protected]>
Date: Mon Feb 21 12:16:44 2011 +0000

watchdog: convert to DEFINE_PCI_DEVICE_TABLE

Convert static struct pci_device_id *[] to static
DEFINE_PCI_DEVICE_TABLE tables.

Signed-off-by: Wim Van Sebroeck <[email protected]>

diff --git a/drivers/watchdog/sp5100_tco.c b/drivers/watchdog/sp5100_tco.c
index 8083728..1bc4938 100644
--- a/drivers/watchdog/sp5100_tco.c
+++ b/drivers/watchdog/sp5100_tco.c
@@ -259,7 +259,7 @@ static struct miscdevice sp5100_tco_miscdev = {
* register a pci_driver, because someone else might
* want to register another driver on the same PCI id.
*/
-static struct pci_device_id sp5100_tco_pci_tbl[] = {
+static DEFINE_PCI_DEVICE_TABLE(sp5100_tco_pci_tbl) = {
{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, PCI_ANY_ID,
PCI_ANY_ID, },
{ 0, }, /* End of list */

commit 15e28bf130081a574192fb934b832ac7d07739f7
Author: Priyanka Gupta <[email protected]>
Date: Mon Oct 25 17:58:04 2010 -0700

watchdog: Add support for sp5100 chipset TCO

This driver adds /dev/watchdog support for the AMD sp5100 aka
SB7x0 chipsets.

It follows the same conventions found in other /dev/watchdog drivers.

Signed-off-by: Priyanka Gupta <[email protected]>
Signed-off-by: Mike Waychison <[email protected]>
Signed-off-by: Wim Van Sebroeck <[email protected]>

2011-03-24 05:33:45

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH -v3 -resend] watchdog, SP5100: Check if firmware has set correct value in tcobase.


Stefano found SP5100 TCO watchdog driver using wrong address.

[ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
[ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1

and e820 said that range is RAM.

We should check if we can use that reading out. BIOS could just program wrong address there.

-v2: Mike pointed out one path need one release.
-v3: corrected logic checking with request_mem_region_exclusive()
Found by Konrad.


Reported-by: Stefano Stabellini <[email protected]>
Signed-off-by:Yinghai Lu <[email protected]>
Acked-by: Mike Waychison <[email protected]>
Tested-by: Konrad Rzeszutek Wilk <[email protected]>

---
drivers/watchdog/sp5100_tco.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/watchdog/sp5100_tco.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sp5100_tco.c
+++ linux-2.6/drivers/watchdog/sp5100_tco.c
@@ -42,6 +42,7 @@
#define PFX TCO_MODULE_NAME ": "

/* internal variables */
+static u32 tcobase_phys;
static void __iomem *tcobase;
static unsigned int pm_iobase;
static DEFINE_SPINLOCK(tco_lock); /* Guards the hardware */
@@ -305,10 +306,16 @@ static unsigned char __devinit sp5100_tc
/* Low three bits of BASE0 are reserved. */
val = val << 8 | (inb(SP5100_IO_PM_DATA_REG) & 0xf8);

+ if (!request_mem_region_exclusive(val, SP5100_WDT_MEM_MAP_SIZE, "SP5100 TCO")) {
+ printk(KERN_ERR PFX "mmio address 0x%04x already in use\n", val);
+ goto unreg_region;
+ }
+ tcobase_phys = val;
+
tcobase = ioremap(val, SP5100_WDT_MEM_MAP_SIZE);
if (tcobase == 0) {
printk(KERN_ERR PFX "failed to get tcobase address\n");
- goto unreg_region;
+ goto unreg_mem_region;
}

/* Enable watchdog decode bit */
@@ -346,7 +353,8 @@ static unsigned char __devinit sp5100_tc
/* Done */
return 1;

- iounmap(tcobase);
+unreg_mem_region:
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
unreg_region:
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
exit:
@@ -401,6 +409,7 @@ static int __devinit sp5100_tco_init(str

exit:
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
return ret;
}
@@ -414,6 +423,7 @@ static void __devexit sp5100_tco_cleanup
/* Deregister */
misc_deregister(&sp5100_tco_miscdev);
iounmap(tcobase);
+ release_mem_region(tcobase_phys, SP5100_WDT_MEM_MAP_SIZE);
release_region(pm_iobase, SP5100_PM_IOPORTS_SIZE);
}

2011-03-24 08:31:29

by Wim Van Sebroeck

[permalink] [raw]
Subject: Re: [PATCH -v3 -resend] watchdog, SP5100: Check if firmware has set correct value in tcobase.

Hi Yinghai,

> Stefano found SP5100 TCO watchdog driver using wrong address.
>
> [ 9.148536] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
> [ 9.148628] DEBUG __ioremap_caller WARNING address=b8fe00 size=8 valid=1 reserved=1
>
> and e820 said that range is RAM.
>
> We should check if we can use that reading out. BIOS could just program wrong address there.
>
> -v2: Mike pointed out one path need one release.
> -v3: corrected logic checking with request_mem_region_exclusive()
> Found by Konrad.
>
>
> Reported-by: Stefano Stabellini <[email protected]>
> Signed-off-by:Yinghai Lu <[email protected]>
> Acked-by: Mike Waychison <[email protected]>
> Tested-by: Konrad Rzeszutek Wilk <[email protected]>

This patch is in linux-2.6-watchdog-next for a week now. Will go upstream to linus.

Kind regards,
Wim.

2011-03-24 15:41:23

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH -v3 -resend] watchdog, SP5100: Check if firmware has set correct value in tcobase.

On 03/24/2011 01:31 AM, Wim Van Sebroeck wrote:
>
> This patch is in linux-2.6-watchdog-next for a week now. Will go upstream to linus.

Thanks.