[permalink] [raw]

Subject: Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers

On 13/12/2022 21:06, Steven Rostedt wrote:
> [...]
> Heh, yeah, I forgot about this one too.
>
>>
>> So, do you think it's possible to pick it for 6.2? No dependency here,
>> it's a standalone patch.
>
> I can pull it in.

Awesome, thanks a bunch!

>
>>
>> Thanks again and sorry for the annoyance!
>
> No, sorry for missing these.
>
> -- Steve

No biggies =))

2023-02-10 16:01:50

by Guilherme G. Piccoli

[permalink] [raw]

Subject: Re: [PATCH V3 08/11] EDAC/altera: Skip the panic notifier if kdump is loaded

Hi Boris and Petr, first of all thanks for your great analysis and
really sorry for the huge delay in my response.

Below I'm pasting the 2 relevant responses from both Petr and Boris.

On 22/11/2022 12:06, Borislav Petkov wrote:
> On Tue, Nov 22, 2022 at 10:33:12AM -0300, Guilherme G. Piccoli wrote:
>
> Leaving in the whole thing for newly added people.
>
>> On 18/09/2022 11:10, Guilherme G. Piccoli wrote:
>>> On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
>>>> The altera_edac panic notifier performs some data collection with
>>>> regards errors detected; such code relies in the regmap layer to
>>>> perform reads/writes, so the code is abstracted and there is some
>>>> risk level to execute that, since the panic path runs in atomic
>>>> context, with interrupts/preemption and secondary CPUs disabled.
>>>>
>>>> Users want the information collected in this panic notifier though,
>>>> so in order to balance the risk/benefit, let's skip the altera panic
>>>> notifier if kdump is loaded. While at it, remove a useless header
>>>> and encompass a macro inside the sole ifdef block it is used.
>>>>
>>>> Cc: Borislav Petkov <[email protected]>
>>>> Cc: Petr Mladek <[email protected]>
>>>> Cc: Tony Luck <[email protected]>
>>>> Acked-by: Dinh Nguyen <[email protected]>
>>>> Signed-off-by: Guilherme G. Piccoli <[email protected]>
>>>>
>>>> ---
>>>>
>>>> V3:
>>>> - added the ack tag from Dinh - thanks!
>>>> - had a good discussion with Boris about that in V2 [0],
>>>> hopefully we can continue and reach a consensus in this V3.
>>>> [0] https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> V2:
>>>> - new patch, based on the discussion in [1].
>>>> [1] https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> [...]
>>>
>>> Hi Dinh, Tony, Boris - sorry for the ping.
>>>
>>> Appreciate reviews on this one - Dinh already ACKed the patch but Boris
>>> raised some points in the past version [0], so any opinions or
>>> discussions are welcome!
>>
>>
>> Hi folks, monthly ping heheh
>> Apologies for the re-pings, please let me know if there is anything
>> required to move on this patch.
>
> Looking at this again, I really don't like the sprinkling of
>
> if (kexec_crash_loaded())
>
> in unrelated code. And I still think that the real fix here is to kill
> this
>
> edac->panic_notifier
>
> thing. And replace it with simply logging the error from the double bit
> error interrupt handle. That DBERR IRQ thing altr_edac_a10_irq_handler().
> Because this is what this panic notifier does - dump double-bit errors.
>
> Now, if Dinh doesn't move, I guess we can ask Tony and/or Rabara (he has
> sent a patch for this driver recently and Altera belongs to Intel now)
> to find someone who can test such a change and we (you could give it a
> try first :)) can do that change.
>
> Thx.
>

On 09/12/2022 13:03, Petr Mladek wrote:> [...]>
> I have read the discussion about v2 [1] and this looks like a bad
> approach from my POV.
>
> My understanding is that the information provided by this notifier
> could not be found in the crashdump. It means that people really
> want to run this before crashdump in principle.
>
> Of course, there is the question how much safe this code is. I mean
> if the panic() code path might get blocked here.
>
> I see two possibilities.
>
> The best solution would be if we know that this is "always" safe or if
> it can be done a safe way. Then we could keep it as it is or implement
> the safe way.
>
> Alternative solution would be to create a kernel parameter that
> would enable/disable this particular report when kdump is enabled.
> The question would be the default. It would depend on how risky
> the code is and how useful the information is.
>
> [1] https://lore.kernel.org/r/[email protected]
>

So, for me Petr approach is the more straightforward, though we could
rethink the idea of this notifier being...a notifier, as suggest Boris heh

Anyway, what I plan to do is: I'll re-submit a simple clean-up for this
code (header / ifdef stuff), not functional-changing the code path.

After that, when re-submitting the V2 or the notifiers refactor (which
I'm pending for some good months =O ), I'll deal with this code
properly, factoring the ideas and proposing a meaningful change.

So, let's discard this patch for now.
Thanks again,

Guilherme