On Fri, Sep 30, 2022 at 02:27:09AM +0300, Serge Semin wrote:
> It was a bad idea in the first place to combine two absolutely different
> controllers support in a single driver [1]. It caused having an additional
> level of abstraction, which obviously have needlessly overcomplicated the
> driver and as such caused many problems in the new main controller
> features support implementation. The solution looks even more unreasonable
> now seeing the justification of having both controllers support in a
> single driver hasn't been implemented by the original code author [2].
Yeah, no, you need to give more concrete details here.
Why exactly is this a problem?
Are you saying that if synopsys puts out 10 incompatible memory
controllers, we should have 10 separate EDAC drivers?
Hell no.
synopsys_edac.c is not a huge file and the probe logic which matches
which synps_platform_data to load is pretty straight-forward to me.
But maybe I'm missing something so please explain in detail what the
actual problems with this are?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Sep 30, 2022 at 04:42:31PM +0200, Borislav Petkov wrote:
> On Fri, Sep 30, 2022 at 02:27:09AM +0300, Serge Semin wrote:
> > It was a bad idea in the first place to combine two absolutely different
> > controllers support in a single driver [1]. It caused having an additional
> > level of abstraction, which obviously have needlessly overcomplicated the
> > driver and as such caused many problems in the new main controller
> > features support implementation. The solution looks even more unreasonable
> > now seeing the justification of having both controllers support in a
> > single driver hasn't been implemented by the original code author [2].
>
> Yeah, no, you need to give more concrete details here.
>
> Why exactly is this a problem?
In general because it needlessly overcomplicates the driver, worsen
it scalability, maintainability and readability, makes it much harder
to add new controller features. Moreover even if you still able to add
some more features support the driver will get to be more and more messy
(as Michal correctly said in the original thread [1]). It will get to
be messy since you'll need to add more if-flag-else conditionals or
define new device-specific callbacks to select the right code path for
different controllers; you'll need to define some private data
specific to one controller and unused in another. All of that you
already have in the driver and all of that would be unneeded should the
driver author have followed the standard kernel bus-device-driver model.
Regarding this particular case. This series and two more patchsets
mentioned in the cover letter are about extending the Synopsys uMCTL2
DDRC controller functionality support in the driver. Here is just the
most significant part of the being added features:
1. DW uCMTL2 IP-core parameters auto-detection (DDR bus-width, frequency
ratio, burst-length, ECC-mode, etc).
2. Common SDRAM<->phys address translation.
3. Multi-ranked memory support.
4. ECC-scrubber and Scrubber IRQ support.
5. Individual IRQ signals support.
6. DFI alert_n IRQ support.
7. Add reference clock sources support.
None of the above can be utilized for the Zynq A05 DDR controller, but
is of the Synopsys uMCTL2 DDRC specific. Seeing the controllers are
not even of the same vendor and are absolutely incompatible,
considering all the new features, moving the Zynq A05 DDRC support
away to another driver was the best choice in order to create a
coherent and easy-to-read code, provide simple-enough patches for
review. The standard kernel bus-device-driver model is perfectly
suitable to differentiate these controllers. There is no point in
creating an additional abstraction.
>
> Are you saying that if synopsys puts out 10 incompatible memory
> controllers, we should have 10 separate EDAC drivers?
I said "absolutely different controllers" in the patchlog. The level
of incompatibility can be different and can vary from case to case. In
this particular case there is nothing in common in Synopsys uMCTL2
DDRC and Zynq A05 DDRC. So if there are two absolutely different
devices at consideration then in general they should be handled in
different drivers as per the kernel bus-device-driver model.
Note having absolutely different devices detectable on the same
platform doesn't mean they should be handled by the same driver
otherwise the kernel would have had tons of common platform-drivers
instead of the device-specific ones. Even in current synopsys EDAC
driver state (without my patches applied) should you detach the Zynq
A05 DDRC support to another driver, the code would have got to be much
easier to read and maintain.
I one more time draw your attention to the fact that the original
reason of having both Synopsys uMCTL2 and Zynq A05 DDRC support in the
same driver has never been implemented [2]. See edac_mc_alloc()
is always called with index zero in the driver. Even despite of that I
still provide a way to solve the problem denoted in [2] in the patch
"EDAC/mc: Add MC unique index allocation procedure" in this patchset.
>
> Hell no.
>
> synopsys_edac.c is not a huge file and the probe logic which matches
> which synps_platform_data to load is pretty straight-forward to me.
The synps_platform_data interface is so abstractive so you could have
added any basic EDAC device support to the driver. It doesn't mean you
should even if the devices can be detected on the same platform,
especially if you expect the code to evolve further being extended with
new more complex features.
>
> But maybe I'm missing something so please explain in detail what the
> actual problems with this are?
In this case I'd say the KISS principle applies. Simplification will
be reached by splitting the driver up only. I understand that it
couldn't have been done back then due to the original driver author
claiming the EDAC core to require the MC auto-index generation [2].
That's why I have added such functionality in the framework of this
series.
[1] https://lore.kernel.org/all/808655a9-77eb-4e3a-9781-2b059ad9517b@BN1AFFO11FD020.protection.gbl/
[2] https://lore.kernel.org/all/9dc2a947-d2ab-4f00-8ed3-d2499cb6fdfd@BN1BFFO11FD002.protection.gbl/
-Sergey
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 06, 2022 at 03:17:40PM +0300, Serge Semin wrote:
> In general because it needlessly overcomplicates the driver, worsen
> it scalability, maintainability and readability, makes it much harder
> to add new controller features. Moreover even if you still able to add
> some more features support the driver will get to be more and more messy
> (as Michal correctly said in the original thread [1]).
Did you read that thread until the end?
> It will get to be messy since you'll need to add more if-flag-else
> conditionals or define new device-specific callbacks to select the
> right code path for different controllers; you'll need to define
> some private data specific to one controller and unused in another.
> All of that you already have in the driver and all of that would be
> unneeded should the driver author have followed the standard kernel
> bus-device-driver model.
So lemme ask this again because the last time it all sounded weird and I
don't think it got clarified fully: you cannot have more than one memory
controller type at the same time on those systems, can you?
Because if you can and you want to support that, the current EDAC
"design" allows to have only a single EDAC driver per system. So you
can't do two drivers now.
If your answer to that is
Subject: [PATCH RESEND v3 13/17] EDAC/mc: Add MC unique index allocation procedure
then I'm sceptical but I'd need to look at that first.
And reading your commit messages, you're talking a lot about what you're
doing. But that should be visible from the patch. What I wanna read is
*why* you're doing it.
Like in this patch above, what's that "unique index allocation
procedure" for?
edac_mc_alloc() already gets a mc_num as the MC number.
And yes, if you want to do multiple driver instances like x86 does per
NUMA node instances, then that is done with edac_mc_alloc() which gives
you a memory controller descriptor and then you can deal with those.
From all the text it sounds to me you want to add a separate driver for
that Zynq A05 thing but I might still be missing something...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 06, 2022 at 03:25:42PM +0200, Borislav Petkov wrote:
> On Thu, Oct 06, 2022 at 03:17:40PM +0300, Serge Semin wrote:
> > In general because it needlessly overcomplicates the driver, worsen
> > it scalability, maintainability and readability, makes it much harder
> > to add new controller features. Moreover even if you still able to add
> > some more features support the driver will get to be more and more messy
> > (as Michal correctly said in the original thread [1]).
>
> Did you read that thread until the end?
Of course I did. Here is a short summary:
1. @Punnaiah described that he got a Zynq system with two different
DDR controllers. One of them was Synopsys DW uMCTL2 DDRC (ARM cortex A9
PS) and another one - Zynq A05 DDRC (Xilinx PL). All of these DDR
controllers could be probed in Linux. It was required to detect the ECC
errors for both of them.
2. @Punnaiah suggested that both of these controllers should have been
handled by two different drivers since the controllers were different.
But he was reasonably afraid that there could be a race condition for
the mc_num assignment since it was the drivers responsibility to
allocate one.
3. @Punnaiah suggested two solutions: 1) Keep two drivers in single file
and use static global variable for tracking the mc_num. 2) Let the
framework assign the mc_num to the edac driver.
4. @Punnaiah preferred the solution 2, but you said that the solution
1 would do it.
5. @Michal was against mixing up the support of two different devices
in a single driver. He reasonably assumed that the driver would get
to look messy.
6. @Boris disagreed saying that it won't be messy and any
communication between the two memory controllers could be done
internally in that driver.
7. So you all decided to keep both controllers support in a single file,
but would get back to a discussion of having separate drivers for them
later.
But after all these years of the Synopsys EDAC driver living in kernel
we can conclude that combining the two different devices support in a
single driver was pointless. First by the driver design nobody ever
used the driver to handle both of them at the same time (MC index is
fixed to zero). Second there is no even a glimpse of communications
between two devices in the system. Third there are two different set
of macros and multiple platform-specific methods and if-flag-else
statements fully abstracting out the devices implementation from the
EDAC core. All of that has made the driver overcomplicated in-vain
seeing none of the reasons of these complications have been realised.
Meanwhile extending further the Synopsys uMCTL2 DDR controller
functionality support would mean adding new macros, functions,
platform-specific flags, parameters and device-specific data. That
would have made the driver even more complicated. Dropping the
abstraction layer out and splitting up the drivers was the best choice
in the current circumstances especially seeing the driver author and
@Michal preferred that solution in the first place.
Moreover in order to cover a still possible case of having both
Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in
future I have implemented the solution 2.
>
> > It will get to be messy since you'll need to add more if-flag-else
> > conditionals or define new device-specific callbacks to select the
> > right code path for different controllers; you'll need to define
> > some private data specific to one controller and unused in another.
> > All of that you already have in the driver and all of that would be
> > unneeded should the driver author have followed the standard kernel
> > bus-device-driver model.
>
> So lemme ask this again because the last time it all sounded weird and I
> don't think it got clarified fully: you cannot have more than one memory
> controller type at the same time on those systems, can you?
To be clear I don't work with the Xilinx systems. So I can only say based
on your discussion with @Punnaiah on the initial driver review.
@Punnaiah said they could have more than one memory controller type on
their systems. That's why he described the problem with the MC indexes
allocation and suggested to combine the devices support in a single
driver.
>
> Because if you can and you want to support that, the current EDAC
> "design" allows to have only a single EDAC driver per system. So you
> can't do two drivers now.
I do understand that.
>
> If your answer to that is
>
> Subject: [PATCH RESEND v3 13/17] EDAC/mc: Add MC unique index allocation procedure
>
> then I'm sceptical but I'd need to look at that first.
Em, if there is something else which makes the EDAC drivers to be
impossible to co-exist on the same system, then it greatly violates
the bus-device-driver model.
>
> And reading your commit messages, you're talking a lot about what you're
> doing. But that should be visible from the patch. What I wanna read is
> *why* you're doing it.
Each patchlog has a thorough enough description of "why".
>
> Like in this patch above, what's that "unique index allocation
> procedure" for?
Have you read the patchlog?
>
> edac_mc_alloc() already gets a mc_num as the MC number.
>
> And yes, if you want to do multiple driver instances like x86 does per
> NUMA node instances, then that is done with edac_mc_alloc() which gives
> you a memory controller descriptor and then you can deal with those.
>
See the "EDAC/mc: Add MC unique index allocation procedure" patch. It
also provides a way to assign the indexes based on the OF-aliases.
> From all the text it sounds to me you want to add a separate driver for
> that Zynq A05 thing but I might still be missing something...
I do want to detach the Zynq A05 device support from out of the
Synopsys uMCTL2 driver.
BTW have you read the cover letter? It contains a short summary of the
changes and their justification. It seems as if you started your
review straight from this patch.
-Sergey
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Sat, Oct 08, 2022 at 03:42:24AM +0300, Serge Semin wrote:
> Of course I did. Here is a short summary:
You didn't have to do a short summary but sure, if you prefer...
> 7. So you all decided to keep both controllers support in a single file,
> but would get back to a discussion of having separate drivers for them
> later.
Yes, pretty much.
> But after all these years of the Synopsys EDAC driver living in kernel
> we can conclude that combining the two different devices support in a
> single driver was pointless. First by the driver design nobody ever
> used the driver to handle both of them at the same time (MC index is
> fixed to zero).
So how was this supposed to work on his system?
If you have a system with two different memory controllers and you want
to have two different EDAC drivers for each, then the whole EDAC core
code needs to be audited wrt concurrency and synchronizing access to
its facilities because I don't think it has ever supported more than a
single EDAC driver per system.
And it has never needed to, at least not on x86 land. Which is
currently changing because of CXL, because of accelerators needing
RAS, GPUs needing RAS and so on and so on. So eventually we'd have to
either put the new RAS functionality in the existing chipset-specific
driver or have to go the multiple EDAC drivers route. But that's only
tangential...
So first I'd like to hear what your use case is: single EDAC driver for
your particular Baical-T1 device or you need to support multiple EDAC
drivers.
If so, why?
> Moreover in order to cover a still possible case of having both
> Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in
> future I have implemented the solution 2.
See above.
> Em, if there is something else which makes the EDAC drivers to be
> impossible to co-exist on the same system, then it greatly violates
> the bus-device-driver model.
If by that you mean the aspect of a driver associating with a device and
performing operations with it then why do you assume that EDAC drivers
have to adhere to that model?
> Have you read the patchlog?
Lemme reply to it directly.
> BTW have you read the cover letter? It contains a short summary of the
> changes and their justification.
Yes, I have read it and it contains a lot of unnecessary detail which
should be in the respective patches themselves. And I still don't know
exactly what *you* are trying to do, as I said above.
A cover letter should contain a short executive summary explaining only
the goal of the patchset and then you can go into details if you prefer.
A reviewer should not have to dig into patch management details to know
what this patchset is trying to do.
A possible structure could be:
Problem is A.
It happens because of B.
Fix it by doing C.
(Potentially do D).
Btw, when you're writing your commit messages, please use passive voice
in your commit message: no "we" or "I", etc, and describe your changes
in imperative mood.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 12, 2022 at 07:28:48PM +0200, Borislav Petkov wrote:
> On Sat, Oct 08, 2022 at 03:42:24AM +0300, Serge Semin wrote:
> > Of course I did. Here is a short summary:
>
> You didn't have to do a short summary but sure, if you prefer...
It seemed to me as I should have so to make sure we are on the page of
the original discussion understanding.
>
> > 7. So you all decided to keep both controllers support in a single file,
> > but would get back to a discussion of having separate drivers for them
> > later.
>
> Yes, pretty much.
>
> > But after all these years of the Synopsys EDAC driver living in kernel
> > we can conclude that combining the two different devices support in a
> > single driver was pointless. First by the driver design nobody ever
> > used the driver to handle both of them at the same time (MC index is
> > fixed to zero).
>
> So how was this supposed to work on his system?
How am I supposed to know? You should have asked him at the time of
his patches review before accepting them. He (Punnaiah Choudary
Kalluri) said they had the system with two different DDR controllers.
Since the MC idx was supposed to be provided by the controller driver
he suggested to have both devices support implemented in the same
driver. You agreed. If no interaction intended between the two devices
why did he need to combine their support in the same driver then? He
could have as well just split the drivers up and didn't bother with
all the discussions.
>
> If you have a system with two different memory controllers
I don't. The Synopsys EDAC driver author (Punnaiah) did judging by
what he said in your discussion.
> and you want
> to have two different EDAC drivers for each, then the whole EDAC core
> code needs to be audited wrt concurrency and synchronizing access to
> its facilities because I don't think it has ever supported more than a
> single EDAC driver per system.
Once again. What is driver-depended do you have in the EDAC core?
Does it follow the bus-device-drivers model? I failed to find any
inter-driver dependency (what could be seen for instance in the
tty/serial subsystem). But I am not the subsystem maintainer after all.
I found the possible races in the MC index registration and fixed it
in the corresponding patch. But that was the just
registration-specifics behavior which could be easily fixed in the
same way as the most of the kernel subsystem do.
You are worried about the concurrencies. Does the EDAC core have
problems if there are several DDR devices of the same type probed?
AFAICS it doesn't as long as the indexes are properly allocated by the
driver. What is the problem with registering devices of different
types? The error counters are the MC-data-specific, so are the
sysfs-nodes. The EDAC MC error handler function doesn't touch any
inter-device parts of the core. The registration procedure is
protected by the mutex and RCU. So it seems as the EDAC core developer
thought about having the devices being registered concurrently.
>
> And it has never needed to, at least not on x86 land. Which is
> currently changing because of CXL, because of accelerators needing
> RAS, GPUs needing RAS and so on and so on. So eventually we'd have to
> either put the new RAS functionality in the existing chipset-specific
> driver or have to go the multiple EDAC drivers route. But that's only
> tangential...
If it has never needed to, then please explain why did you let the
Synopsys EDAC driver being accepted like that then?
>
> So first I'd like to hear what your use case is: single EDAC driver for
> your particular Baical-T1 device or you need to support multiple EDAC
> drivers.
In my case it's a single EDAC driver per-chip. There can be several
DDR-controllers installed on the same SoC, but all of them of the same
type (Synopsys DW uMCTL2 v2.61a).
>
> If so, why?
I don't have a system with different DDR controllers detected on the
same SoC but Punnaiah Choudary Kalluri, original Synopsys EDAC driver
developer, did.
>
> > Moreover in order to cover a still possible case of having both
> > Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in
> > future I have implemented the solution 2.
>
> See above.
>
> > Em, if there is something else which makes the EDAC drivers to be
> > impossible to co-exist on the same system, then it greatly violates
> > the bus-device-driver model.
>
> If by that you mean the aspect of a driver associating with a device and
> performing operations with it then why do you assume that EDAC drivers
> have to adhere to that model?
>
> > Have you read the patchlog?
>
> Lemme reply to it directly.
>
> > BTW have you read the cover letter? It contains a short summary of the
> > changes and their justification.
>
> Yes, I have read it and it contains a lot of unnecessary detail which
> should be in the respective patches themselves. And I still don't know
> exactly what *you* are trying to do, as I said above.
>
> A cover letter should contain a short executive summary explaining only
> the goal of the patchset and then you can go into details if you prefer.
> A reviewer should not have to dig into patch management details to know
> what this patchset is trying to do.
The main part of the letter starts exactly with the goals and then has
text with more details of what is done and why. It is enough to get a
notion regarding the patchset aim and content. Each patchlog starts
with the problem description, the suggested solution and some details
of the implementation I thought was required to add. Everything in
accordance with the kernel patches standards.
-Sergey
>
> A possible structure could be:
>
> Problem is A.
>
> It happens because of B.
>
> Fix it by doing C.
>
> (Potentially do D).
>
> Btw, when you're writing your commit messages, please use passive voice
> in your commit message: no "we" or "I", etc, and describe your changes
> in imperative mood.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 12, 2022 at 10:27:43PM +0300, Serge Semin wrote:
> ... inter-device parts of the core. The registration procedure is
> protected by the mutex and RCU. So it seems as the EDAC core developer
Seems, schmeems. As I said already, EDAC has always had a single
chipset-specific driver. Period. So if one needs to run more than one
chipset-specific driver concurrently, then the whole code needs to be
audited because this hasn't been done before.
> If it has never needed to, then please explain why did you let the
> Synopsys EDAC driver being accepted like that then?
I think I already did.
> In my case it's a single EDAC driver per-chip. There can be several
> DDR-controllers installed on the same SoC, but all of them of the same
> type (Synopsys DW uMCTL2 v2.61a).
Good.
I'll look at your patches as time allows.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 12, 2022 at 09:44:00PM +0200, Borislav Petkov wrote:
> On Wed, Oct 12, 2022 at 10:27:43PM +0300, Serge Semin wrote:
> > ... inter-device parts of the core. The registration procedure is
> > protected by the mutex and RCU. So it seems as the EDAC core developer
>
> Seems, schmeems. As I said already, EDAC has always had a single
> chipset-specific driver. Period. So if one needs to run more than one
> chipset-specific driver concurrently, then the whole code needs to be
> audited because this hasn't been done before.
>
> > If it has never needed to, then please explain why did you let the
> > Synopsys EDAC driver being accepted like that then?
>
> I think I already did.
Kind of. What you didn't explain was the driver-specific problem in the
edac_mc core. What is the difference in the EDAC core handling
two devices (including of difference types) on the same platform and
handling the same devices each probed by two different drivers? (Consider
the drivers are designed thread-safe and we are talking about the EDAC
MC core.)
>
> > In my case it's a single EDAC driver per-chip. There can be several
> > DDR-controllers installed on the same SoC, but all of them of the same
> > type (Synopsys DW uMCTL2 v2.61a).
>
> Good.
>
> I'll look at your patches as time allows.
Ok. Thanks in advance.
-Sergey
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette