2024-04-26 22:41:55

by Andrew Lunn

Subject: Re: PoE complex usage of regulator API

> Let's begin simple, in PSE world we are more talking about power.
> Would it be ok to add a regulator_get/set_power_limit() and
> regulator_get_power() callback to regulator API. Would regulator API have
> interest to such callbacks?

Could you define this API in more detail?

I'm assuming this is mostly about book keeping? When a regulator is
created, we want to say it can deliver up to X kilowatts. We then want
to allocate power to ports. So there needs to be a call asking it to
allocate part of X to a consumer, which could fail if there is not
sufficient power budget left. And there needs to be a call to release
such an allocation.
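
A minimal sketch of the bookkeeping described above, in plain C rather
than kernel code. The struct and function names (power_budget,
power_budget_alloc(), power_budget_release()) and the milliwatt unit are
assumptions made for illustration; nothing like this exists in the
regulator API today.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bookkeeping state; not part of the regulator API. */
struct power_budget {
	uint32_t total_mW;     /* what the supply can deliver */
	uint32_t allocated_mW; /* sum of outstanding allocations */
};

static void power_budget_init(struct power_budget *b, uint32_t total_mW)
{
	b->total_mW = total_mW;
	b->allocated_mW = 0;
}

/* Reserve mW for a consumer; fails with -ENOSPC if too little is left. */
static int power_budget_alloc(struct power_budget *b, uint32_t mW)
{
	/* allocated_mW never exceeds total_mW, so this cannot underflow. */
	if (mW > b->total_mW - b->allocated_mW)
		return -ENOSPC;
	b->allocated_mW += mW;
	return 0;
}

/* Return a previous allocation to the pool. */
static int power_budget_release(struct power_budget *b, uint32_t mW)
{
	if (mW > b->allocated_mW)
		return -EINVAL;
	b->allocated_mW -= mW;
	return 0;
}
```

With a strict check like this, over-provisioning cannot happen: the
allocation call is the only point where a consumer can be refused.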

We are probably not so much interested in what the actual current
power draw is, assuming there is no wish to over provision?

There is in theory a potential second user of this. Intel have been
looking at power control for SFPs. Typically they are guaranteed a
minimum of 1.5W. However, they can operate at higher power
classes. You can have boards with multiple SFPs, with a theoretical
maximum power draw more than what the supply can supply. So you need
similar sort of power budget book keeping to allocate power to an SFP
cage before telling the SFP module it can swap to a higher power
class. I say this is theoretical, because the device Intel is working
on has this hidden away in firmware. But maybe sometime in the future
somebody will want Linux doing this.

Andrew


2024-04-29 12:54:53

by Kory Maincent

Subject: Re: PoE complex usage of regulator API

On Sat, 27 Apr 2024 00:41:19 +0200
Andrew Lunn <[email protected]> wrote:

> > Let's begin simple, in PSE world we are more talking about power.
> > Would it be ok to add a regulator_get/set_power_limit() and
> > regulator_get_power() callback to regulator API. Would regulator API have
> > interest to such callbacks?
>
> Could you define this API in more detail?

The first new PoE features targeted by this API are reading the consumed power
and getting/setting the power limit for each port. Yes, mainly book keeping.
A few driver callbacks would be called by ethtool, and perhaps the power limit
and consumed power could also be exposed as read-only regulator sysfs attributes.

> I'm assuming this is mostly about book keeping? When a regulator is
> created, we want to say it can deliver up to X kilowatts. We then want
> to allocate power to ports. So there needs to be a call asking it to
> allocate part of X to a consumer, which could fail if there is not
> sufficient power budget left. And there needs to be a call to release
> such an allocation.

This is more the aim of the second point I raised: power priority and the
parent power budget, and how the core can manage them.

> We are probably not so much interested in what the actual current
> power draw is, assuming there is no wish to over provision?
>
> There is in theory a potential second user of this. Intel have been
> looking at power control for SFPs. Typically they are guaranteed a
> minimum of 1.5W. However, they can operate at higher power
> classes. You can have boards with multiple SFPs, with a theoretical
> maximum power draw more than what the supply can supply. So you need
> similar sort of power budget book keeping to allocate power to an SFP
> cage before telling the SFP module it can swap to a higher power
> class. I say this is theoretical, because the device Intel is working
> on has this hidden away in firmware. But maybe sometime in the future
> somebody will want Linux doing this.

So there is a potential second user, that's great to hear! Could the
priority stuff also be interesting? For example, to allow only high-priority
SFPs to use a higher power class when the power budget is limited.

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

2024-04-29 14:32:50

by Oleksij Rempel

Subject: Re: PoE complex usage of regulator API

Hi Kory,

On Mon, Apr 29, 2024 at 02:52:03PM +0200, Kory Maincent wrote:
> On Sat, 27 Apr 2024 00:41:19 +0200
> Andrew Lunn <[email protected]> wrote:
>
> > > Let's begin simple, in PSE world we are more talking about power.
> > > Would it be ok to add a regulator_get/set_power_limit() and
> > > regulator_get_power() callback to regulator API. Would regulator API have
> > > interest to such callbacks?
> >
> > Could you define this API in more detail?
>
> The first new PoE features targeted by this API are reading the consumed power
> and getting/setting the power limit for each port. Yes, mainly book keeping.
> A few driver callbacks would be called by ethtool, and perhaps the power limit
> and consumed power could also be exposed as read-only regulator sysfs attributes.

The regulator framework already supports operations on current (I):
regulator_set_current_limit()
regulator_get_current_limit()

The power is P = I * V. On one hand, you can calculate the needed current
value: I = P/V. On the other hand, maybe the regulator framework can be
extended to do it too. In the case of PoE/PoDL we have an adjustable voltage
that depends on the class of the device, so we will probably interact with
the PSE controller using power instead of current.
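
As a worked example of I = P/V in the micro-units the regulator
framework uses for current (regulator_set_current_limit() takes min_uA
and max_uA), here is a small sketch in plain C. The helper name
power_to_current_uA() and the microwatt/microvolt inputs are assumptions
for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical helper: current in uA drawn by a load of power_uW at
 * voltage_uV. I = P / V; with P in uW and V in uV the quotient is in
 * amperes, so scale by 1e6 to get uA. Rounds down. Returns 0 for a zero
 * voltage instead of dividing by zero. The multiplication fits in 64
 * bits for any realistic PoE power (well below 10^13 uW).
 */
static uint64_t power_to_current_uA(uint64_t power_uW, uint64_t voltage_uV)
{
	if (voltage_uV == 0)
		return 0;
	return power_uW * 1000000ULL / voltage_uV;
}
```

For instance, 15.4 W at 44 V gives power_to_current_uA(15400000,
44000000) = 350000 uA, i.e. 350 mA. Because the voltage is adjustable,
the same power limit maps to a different current limit for each voltage,
which is one reason to express the limit as power.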

> > I'm assuming this is mostly about book keeping? When a regulator is
> > created, we want to say it can deliver up to X kilowatts. We then want
> > to allocate power to ports. So there needs to be a call asking it to
> > allocate part of X to a consumer, which could fail if there is not
> > sufficient power budget left. And there needs to be a call to release
> > such an allocation.
>
> This is more the aim of the second point I raised: power priority and the
> parent power budget, and how the core can manage them.

Since there is already support for working with current (I) values, there
is also overcurrent protection. If a device goes beyond the power
budget limit, it is practically an overcurrent event. The regulator
framework is already capable of handling some of these events; what we need
for PoE is prioritization. If we detect overcurrent on the supply root/node,
we need to shut down enough low-prio consumers to provide enough power
for the high-prio consumers.
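
The shedding policy described above could look roughly like the
following plain C sketch. Everything here is hypothetical: the consumer
struct, the convention that a larger prio value means less important,
and shed_low_prio() itself are illustration only, not existing regulator
or PSE code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer record; a larger prio value is less important. */
struct consumer {
	uint32_t draw_mW;
	unsigned int prio;
	int enabled;
};

/*
 * Disable the least important enabled consumers, one at a time, until
 * the total draw fits into budget_mW. Returns how many were shut down.
 */
static size_t shed_low_prio(struct consumer *c, size_t n, uint32_t budget_mW)
{
	uint64_t total = 0;
	size_t shed = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].enabled)
			total += c[i].draw_mW;

	while (total > budget_mW) {
		size_t victim = n; /* n means "none found yet" */

		for (i = 0; i < n; i++) {
			if (!c[i].enabled)
				continue;
			if (victim == n || c[i].prio > c[victim].prio)
				victim = i;
		}
		if (victim == n)
			break; /* nothing left to disable */

		c[victim].enabled = 0;
		total -= c[victim].draw_mW;
		shed++;
	}
	return shed;
}
```

A real implementation would also have to decide whether to act on
measured draw or on allocated power, and how to re-enable consumers once
the budget recovers; the sketch leaves both open.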

In reality, this will be done by the PoE controller in HW. Usually we
will get

> > We are probably not so much interested in what the actual current
> > power draw is, assuming there is no wish to over provision?
> >
> > There is in theory a potential second user of this. Intel have been
> > looking at power control for SFPs. Typically they are guaranteed a
> > minimum of 1.5W. However, they can operate at higher power
> > classes. You can have boards with multiple SFPs, with a theoretical
> > maximum power draw more than what the supply can supply. So you need
> > similar sort of power budget book keeping to allocate power to an SFP
> > cage before telling the SFP module it can swap to a higher power
> > class. I say this is theoretical, because the device Intel is working
> > on has this hidden away in firmware. But maybe sometime in the future
> > somebody will want Linux doing this.
>
> So there is a potential second user, that's great to hear! Could the
> priority stuff also be interesting? For example, to allow only high-priority
> SFPs to use a higher power class when the power budget is limited.

There are even more use cases. For example, on power loss with some
limited backup power source, you want to shut down all low-prio consumers
and provide the needed power and time to devices which may otherwise fail,
for example storage devices.

Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2024-04-29 14:57:53

by Andrew Lunn

Subject: Re: PoE complex usage of regulator API

> Since there is already support for working with current (I) values, there
> is also overcurrent protection. If a device goes beyond the power
> budget limit, it is practically an overcurrent event. The regulator
> framework is already capable of handling some of these events; what we need
> for PoE is prioritization. If we detect overcurrent on the supply root/node,
> we need to shut down enough low-prio consumers to provide enough power
> for the high-prio consumers.

So the assumption is we allow over provisioning?

> > So there is a potential second user, that's great to hear! Could the
> > priority stuff also be interesting? For example, to allow only high-priority
> > SFPs to use a higher power class when the power budget is limited.

I was not expecting over-provisioning to happen. So prioritisation
does not make much sense. You either have the power budget, or you
don't. The SFP gets to use a higher power class if there is budget, or
it is kept at a lower power class if there is no budget. I _guess_ you
could give it a high power class, let it establish link, monitor its
actual power consumption, and then decide to drop it to a lower class
if the actual consumption indicates it could work at a lower
class. But the danger is, you are going to lose link.

I've no real experience with this, and all systems today hide this
away in firmware, rather than have Linux control it.

Andrew


2024-04-29 15:48:04

by Mark Brown

Subject: Re: PoE complex usage of regulator API

On Sat, Apr 27, 2024 at 12:41:19AM +0200, Andrew Lunn wrote:

> I'm assuming this is mostly about book keeping? When a regulator is
> created, we want to say it can deliver up to X kilowatts. We then want
> to allocate power to ports. So there needs to be a call asking it to
> allocate part of X to a consumer, which could fail if there is not
> sufficient power budget left. And there needs to be a call to release
> such an allocation.

The current limits for regulators are generally imposed in hardware as a
safety measure; this also happens, for example, with USB, where there are
regulators in the PHYs. Whatever is providing the power is very likely
to have reasonable headroom for robustness.

> We are probably not so much interested in what the actual current
> power draw is, assuming there is no wish to over provision?

One of the goals is to protect the system in the case that something
malfunctions and tries to draw more current than can be sustained. A
system that is overprovisioned might choose to allow excessive draw,
especially transiently to cover bootstrapping issues, though there are
tradeoffs with system protection vs interoperability with poor quality
implementations there.


2024-04-29 15:59:47

by Mark Brown

Subject: Re: PoE complex usage of regulator API

On Mon, Apr 29, 2024 at 04:57:35PM +0200, Andrew Lunn wrote:

> I was not expecting over-provisioning to happen. So prioritisation
> does not make much sense. You either have the power budget, or you
> don't. The SFP gets to use a higher power class if there is budget, or
> it is kept at a lower power class if there is no budget. I _guess_ you
> could give it a high power class, let it establish link, monitor its
> actual power consumption, and then decide to drop it to a lower class
> if the actual consumption indicates it could work at a lower
> class. But the danger is, you are going to lose link.

I suspect these devices will be like most other modern systems and
typically not consume anything like their peak current most of the time;
for networking hardware I'd imagine this will only be when the link is
saturated and could depend on factors like how long the physical links
are. If it's anything like other similar hardware you may also be
making power requests with a very low resolution specification of the
consumption, so conservative allocation could end up rejecting systems
that should work.


2024-04-29 16:01:26

by Oleksij Rempel

Subject: Re: PoE complex usage of regulator API

On Mon, Apr 29, 2024 at 04:57:35PM +0200, Andrew Lunn wrote:
> > Since there is already support for working with current (I) values, there
> > is also overcurrent protection. If a device goes beyond the power
> > budget limit, it is practically an overcurrent event. The regulator
> > framework is already capable of handling some of these events; what we need
> > for PoE is prioritization. If we detect overcurrent on the supply root/node,
> > we need to shut down enough low-prio consumers to provide enough power
> > for the high-prio consumers.
>
> So the assumption is we allow over provisioning?

I assume yes. But I haven't spent enough time understanding and analyzing
this part. Maybe I just misunderstand over-provisioning.

> > > So there is a potential second user, that's great to hear! Could the
> > > priority stuff also be interesting? For example, to allow only high-priority
> > > SFPs to use a higher power class when the power budget is limited.
>
> I was not expecting over-provisioning to happen. So prioritisation
> does not make much sense. You either have the power budget, or you
> don't.
> The SFP gets to use a higher power class if there is budget, or
> it is kept at a lower power class if there is no budget. I _guess_ you
> could give it a high power class, let it establish link, monitor its
> actual power consumption, and then decide to drop it to a lower class
> if the actual consumption indicates it could work at a lower
> class. But the danger is, you are going to lose link.
>
> I've no real experience with this, and all systems today hide this
> away in firmware, rather than have Linux control it.
>
> Andrew

It may not be over-provisioning by design. I can imagine some scenarios where
the available power budget may change dynamically:

- Changes in Available Power Budget: If a PoE switch is modular or supports
hot-swappable power supplies, inserting a power supply with a lower power
budget while the system is under load can lead to insufficient power
availability. This might cause the system to redistribute power, potentially
leading to instability or overcurrent situations if the power management isn't
handled smoothly.

- Power Loss and Switching to Backup Sources: In cases where a switch relies on
a backup power source (like a UPS or a secondary power supply), the transition
from the primary power source to the backup can create fluctuations. These
fluctuations may temporarily affect how power is supplied to the PoE ports,
potentially causing overcurrent if the backup power does not match the original
specifications.

- System Internal Consumers: Components within the switch itself, such as
processing units or internal lighting/cooling systems, might draw power
differently under various operating conditions. Changes in internal consumption
due to increased processing needs or thermal dynamics could affect the overall
power budget.

- Environmental Conditions: High ambient temperatures can reduce the efficiency
of power delivery and increase the electrical resistance in circuits,
potentially leading to higher current draws. Additionally, cooling failures
within the switch can exacerbate this issue.

- Faulty Power Management Logic: Firmware bugs or errors in the power
management algorithm might incorrectly allocate power or fail to properly
respond to changes in power demands, leading to potential overcurrent
scenarios.

Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |