2012-05-28 13:36:50

by Liu, Jinsong

[permalink] [raw]
Subject: RE: [Xen-devel] [PATCH 1/3] xen/mce: Add mcelog support for Xen platform (v2)

Liu, Jinsong wrote:
> Konrad Rzeszutek Wilk wrote:
>> On Fri, May 25, 2012 at 05:56:56PM +0000, Liu, Jinsong wrote:
>>> Konrad Rzeszutek Wilk wrote:
>>>>>>>>> -static struct miscdevice mce_chrdev_device = {
>>>>>>>>> +struct miscdevice mce_chrdev_device = {
>>>>>>>>> MISC_MCELOG_MINOR,
>>>>>>>>> "mcelog",
>>>>>>>>> &mce_chrdev_ops,
>>>>>>>>
>>>>>>>> You're still reusing those - pls, define your own 'struct
>>>>>>>> miscdevice mce_chrdev_device' in drivers/xen/ or somewhere
>>>>>>>> convenient and your own mce_chrdev_ops. The only thing you
>>>>>>>> should be touching in arch/x86/.../mcheck/ is the export of
>>>>>>>> MISC_MCELOG_MINOR.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>
>>>>>>> I'm *not* reuse native code.
>>>>>>> I have defined 'struct miscdevice xen_mce_chrdev_device' in
>>>>>>> drivers/xen, and I also implement xen_mce_chrdev_ops, they are
>>>>>>> all xen-self-contained.
>>>>>>>
>>>>>>> The patch just redirect native mce_chrdev_device to
>>>>>>> xen_mce_chrdev_device when running under xen environment.
>>>>>>> It didn't change any native code (except just cancel
>>>>>>> mce_chrdev_device 'static'), and will not break native logic.
>>>>>>
>>>>>> Why are you doing that?
>>>>>>
>>>>>> Why don't you do
>>>>>>
>>>>>> misc_register(&xen_mce_chrdev_device);
>>>>>>
>>>>>> in xen_early_init_mcelog() ?
>>>>>>
>>>>>> This way there'll be no arch/x86/ dependencies at all.
>>>>>
>>>>> The reason is, if we do so, it would be covered by native
>>>>> misc_register(&mce_chrdev_device) later when native kernel init
>>>>> (xen init first and then start native kernel).
>>>>
>>>> Won't the second registration (so the original one) of the major
>>>> fail? So the mce_log would just error out since somebody already
>>>> registered?
>>>
>>> No, that would be device confliction, the 2nd register return as
>>> -EBUSY and un-predicetable result.
>>
>> And the existing code does not actually check the 'misc_register'
>> return value? Ah yes. Perhaps then a fix to
>> arch/x86/kernel/cpu/mcheck/mce.c to do the proper de-registration if
>> 'misc_register' fails?
>
> It's weird. From code point of view, it indeed not check return value
> so it should go silently. mce.c didn't do misc_deregister.

Have a debug it, the root cause is,
1). it does misc_register(&xen_mce_chrdev_device) at xen_early_init_mcelog(), it's very early stage (at xen_start_kernel), linux kernel and dd didn't start yet, so it fail to register xen_mce_chrdev_device.
2). After native start_kernel, native mce_chrdev_device registered successfully, then it point to *native* mcelog logic.

>
>>
>> Or just set 'mce_disabled=1' in the bootup of Xen, similar to
>> how lguest.c does it?

Have a look at lguest, it disable whole mce logic.
Xen dom0 cannot do so since we need keep dom0 mce logic to handle memory error which belong to dom0 (hypervisor will inject vMCE to dom0 in the case).


===================

Borislav and Konrad,

An approach which basically same as you suggested but w/ slightly update, is
1). at xen/mcelog.c, do misc_register(&xen_mce_chrdev_device) at xen_late_init_mcelog, define it as device_initcall(xen_late_init_mcelog) --> now linux dd ready, so xen mcelog divice would register successfully;
2). at native mce.c, change 1 line from device_initcall(mcheck_init_device) to device_initcall_sync(mcheck_init_device) --> so misc_register(&mce_chrdev_device) would be blocked by xen mcelog device;

I have draft test it and works fine.
Thought?

Thanks,
Jinsong


2012-05-29 18:46:54

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] xen/mce: Add mcelog support for Xen platform (v2)

> An approach which basically same as you suggested but w/ slightly update, is
> 1). at xen/mcelog.c, do misc_register(&xen_mce_chrdev_device) at xen_late_init_mcelog, define it as device_initcall(xen_late_init_mcelog) --> now linux dd ready, so xen mcelog divice would register successfully;

OK.
> 2). at native mce.c, change 1 line from device_initcall(mcheck_init_device) to device_initcall_sync(mcheck_init_device) --> so misc_register(&mce_chrdev_device) would be blocked by xen mcelog device;

You mean that the registration would be delayed to the next initcall level.
Ok, that sounds right.. but you also need to actually check the 'misc_register' return
value and if it fails (which it would in this case) unroll all the registration
that the generic code I think?

>
> I have draft test it and works fine.
> Thought?
>
> Thanks,
> Jinsong
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2012-05-30 15:15:01

by Liu, Jinsong

[permalink] [raw]
Subject: RE: [Xen-devel] [PATCH 1/3] xen/mce: Add mcelog support for Xen platform (v2)

Konrad Rzeszutek Wilk wrote:
>> An approach which basically same as you suggested but w/ slightly
>> update, is 1). at xen/mcelog.c, do
>> misc_register(&xen_mce_chrdev_device) at xen_late_init_mcelog,
>> define it as device_initcall(xen_late_init_mcelog) --> now linux dd
>> ready, so xen mcelog divice would register successfully;
>
> OK.
>> 2). at native mce.c, change 1 line from
>> device_initcall(mcheck_init_device) to
>> device_initcall_sync(mcheck_init_device) --> so
>> misc_register(&mce_chrdev_device) would be blocked by xen mcelog
>> device;
>
> You mean that the registration would be delayed to the next initcall
> level.
> Ok, that sounds right.. but you also need to actually check the
> 'misc_register' return value and if it fails (which it would in this
> case) unroll all the registration
> that the generic code I think?

Hmm, this patch blocked AMD mce logic, it's unacceptable.

Thanks,
Jinsong

>
>>
>> I have draft test it and works fine.
>> Thought?
>>
>> Thanks,
>> Jinsong
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/