2021-12-03 15:43:33

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

The LPAR name may be changed after the LPAR has been started in the HMC.
In that case lparstat command is not reporting the updated value because it
reads it from the device tree which is read at boot time.

However this value could be read from RTAS.

Adding this value in the /proc/powerpc/lparcfg output allows to read the
updated value.

Cc: Nathan Lynch <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
v3:
address Michael's comments.
v2:
address Nathan's comments.
change title to partition_name aligning with existing partition_id
---
arch/powerpc/platforms/pseries/lparcfg.c | 54 ++++++++++++++++++++++++
1 file changed, 54 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index f71eac74ea92..5f6dbc23d7d2 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -311,6 +311,59 @@ static void parse_mpp_x_data(struct seq_file *m)
seq_printf(m, "coalesce_pool_spurr=%ld\n", mpp_x_data.pool_spurr_cycles);
}

+/*
+ * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to
+ * read the LPAR name.
+ */
+#define SPLPAR_LPAR_NAME_TOKEN 55
+static void read_lpar_name(struct seq_file *m)
+{
+ int rc, len, token;
+ union {
+ char raw_buffer[RTAS_DATA_BUF_SIZE];
+ struct {
+ __be16 len;
+ char name[RTAS_DATA_BUF_SIZE-2];
+ };
+ } *local_buffer;
+
+ token = rtas_token("ibm,get-system-parameter");
+ if (token == RTAS_UNKNOWN_SERVICE)
+ return;
+
+ local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
+ if (!local_buffer)
+ return;
+
+ do {
+ spin_lock(&rtas_data_buf_lock);
+ memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
+ rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
+ __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
+ if (!rc)
+ memcpy(local_buffer->raw_buffer, rtas_data_buf,
+ RTAS_DATA_BUF_SIZE);
+ spin_unlock(&rtas_data_buf_lock);
+ } while (rtas_busy_delay(rc));
+
+ if (rc != 0) {
+ pr_err_once(
+ "%s %s Error calling get-system-parameter (0x%x)\n",
+ __FILE__, __func__, rc);
+ } else {
+ /* Force end of string */
+ len = be16_to_cpu(local_buffer->len);
+ if (len >= (RTAS_DATA_BUF_SIZE-2))
+ len = RTAS_DATA_BUF_SIZE-2;
+ local_buffer->name[len] = '\0';
+
+ seq_printf(m, "partition_name=%s\n", local_buffer->name);
+ }
+
+ kfree(local_buffer);
+}
+
+
#define SPLPAR_CHARACTERISTICS_TOKEN 20
#define SPLPAR_MAXLENGTH 1026*(sizeof(char))

@@ -496,6 +549,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)

if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
/* this call handles the ibm,get-system-parameter contents */
+ read_lpar_name(m);
parse_system_parameter_string(m);
parse_ppp_data(m);
parse_mpp_data(m);
--
2.34.1



2021-12-07 14:32:51

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

Hi Laurent,

Laurent Dufour <[email protected]> writes:
> +/*
> + * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to
> + * read the LPAR name.
> + */
> +#define SPLPAR_LPAR_NAME_TOKEN 55
> +static void read_lpar_name(struct seq_file *m)
> +{
> + int rc, len, token;
> + union {
> + char raw_buffer[RTAS_DATA_BUF_SIZE];
> + struct {
> + __be16 len;

This:

> + char name[RTAS_DATA_BUF_SIZE-2];
^^^^^^

should be 4000, not (4K - 2), according to PAPR (it's weird and I don't
know the reason).


> + };
> + } *local_buffer;
> +
> + token = rtas_token("ibm,get-system-parameter");
> + if (token == RTAS_UNKNOWN_SERVICE)
> + return;
> +
> + local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
> + if (!local_buffer)
> + return;
> +
> + do {
> + spin_lock(&rtas_data_buf_lock);
> + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
> + rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
> + __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
> + if (!rc)
> + memcpy(local_buffer->raw_buffer, rtas_data_buf,
> + RTAS_DATA_BUF_SIZE);
> + spin_unlock(&rtas_data_buf_lock);
> + } while (rtas_busy_delay(rc));
> +
> + if (rc != 0) {
> + pr_err_once(
> + "%s %s Error calling get-system-parameter (0x%x)\n",
> + __FILE__, __func__, rc);

The __FILE__ and __func__ in the message seem unnecessary, and rc should
be reported in decimal so the error meaning is apparent.

Is there a reasonable fallback for VMs where this parameter doesn't
exist? PowerVM partitions should always have it, but what do we want the
behavior to be on other hypervisors?


> + } else {
> + /* Force end of string */
> + len = be16_to_cpu(local_buffer->len);
> + if (len >= (RTAS_DATA_BUF_SIZE-2))
> + len = RTAS_DATA_BUF_SIZE-2;

Could use min() or clamp(), and it would be better to build the
expression using the value of sizeof(local_buffer->name).

> + local_buffer->name[len] = '\0';

If 'len' can be (RTAS_DATA_BUF_SIZE - 2), then this writes past the end
of the buffer, no?


2021-12-07 16:07:45

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

On 07/12/2021, 15:32:39, Nathan Lynch wrote:
> Hi Laurent,
>
> Laurent Dufour <[email protected]> writes:
>> +/*
>> + * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to
>> + * read the LPAR name.
>> + */
>> +#define SPLPAR_LPAR_NAME_TOKEN 55
>> +static void read_lpar_name(struct seq_file *m)
>> +{
>> + int rc, len, token;
>> + union {
>> + char raw_buffer[RTAS_DATA_BUF_SIZE];
>> + struct {
>> + __be16 len;
>
> This:
>
>> + char name[RTAS_DATA_BUF_SIZE-2];
> ^^^^^^
>
> should be 4000, not (4K - 2), according to PAPR (it's weird and I don't
> know the reason).

That's true, PAPR defines the largest output buffer for
ibm,get-system-parameter to 4002, so we could limit this to 4002, not sure
whether this would make a big difference here. Anyway I will limit that
buffer size this way.
>
>
>> + };
>> + } *local_buffer;
>> +
>> + token = rtas_token("ibm,get-system-parameter");
>> + if (token == RTAS_UNKNOWN_SERVICE)
>> + return;
>> +
>> + local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
>> + if (!local_buffer)
>> + return;
>> +
>> + do {
>> + spin_lock(&rtas_data_buf_lock);
>> + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
>> + rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
>> + __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
>> + if (!rc)
>> + memcpy(local_buffer->raw_buffer, rtas_data_buf,
>> + RTAS_DATA_BUF_SIZE);
>> + spin_unlock(&rtas_data_buf_lock);
>> + } while (rtas_busy_delay(rc));
>> +
>> + if (rc != 0) {
>> + pr_err_once(
>> + "%s %s Error calling get-system-parameter (0x%x)\n",
>> + __FILE__, __func__, rc);
>
> The __FILE__ and __func__ in the message seem unnecessary, and rc should
> be reported in decimal so the error meaning is apparent.

Fair enough.

> Is there a reasonable fallback for VMs where this parameter doesn't
> exist? PowerVM partitions should always have it, but what do we want the
> behavior to be on other hypervisors?

In that case, there is no value displayed in the /proc/powerpc/lparcfg and
the lparstat -i command will fall back to the device tree value. I can't
see any valid reason to report the value defined in the device tree here.

>
>
>> + } else {
>> + /* Force end of string */
>> + len = be16_to_cpu(local_buffer->len);
>> + if (len >= (RTAS_DATA_BUF_SIZE-2))
>> + len = RTAS_DATA_BUF_SIZE-2;
>
> Could use min() or clamp(), and it would be better to build the
> expression using the value of sizeof(local_buffer->name).

Fair enough.

>
>> + local_buffer->name[len] = '\0';
>
> If 'len' can be (RTAS_DATA_BUF_SIZE - 2), then this writes past the end
> of the buffer, no?

Oh yes, the previous test should be
if (len >= (RTAS_DATA_BUF_SIZE-2))
len = RTAS_DATA_BUF_SIZE-3;

Thanks,
Laurent.

2021-12-07 17:08:02

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

Laurent Dufour <[email protected]> writes:
> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>> Is there a reasonable fallback for VMs where this parameter doesn't
>> exist? PowerVM partitions should always have it, but what do we want the
>> behavior to be on other hypervisors?
>
> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
> the lparstat -i command will fall back to the device tree value. I can't
> see any valid reason to report the value defined in the device tree
> here.

Here's a valid reason :-)

lparstat isn't the only possible consumer of the interface, and the
'ibm,partition-name' property and the dynamic system parameter clearly
serve a common purpose. 'ibm,partition-name' is provided by qemu.

In any case, the function should not print an error when the return
value is -3 (parameter not supported).

2021-12-07 17:18:57

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

On 07/12/2021, 18:07:50, Nathan Lynch wrote:
> Laurent Dufour <[email protected]> writes:
>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>> exist? PowerVM partitions should always have it, but what do we want the
>>> behavior to be on other hypervisors?
>>
>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
>> the lparstat -i command will fall back to the device tree value. I can't
>> see any valid reason to report the value defined in the device tree
>> here.
>
> Here's a valid reason :-)
>
> lparstat isn't the only possible consumer of the interface, and the
> 'ibm,partition-name' property and the dynamic system parameter clearly
> serve a common purpose. 'ibm,partition-name' is provided by qemu.

If the hypervisor is not providing this value, this is not the goal of this
interface to fetch it from the device tree.

Any consumer should be able to fall back on the device tree value, and
there is no added value to do such a trick in the kernel when it can be
done in the user space.

> In any case, the function should not print an error when the return
> value is -3 (parameter not supported).

That's a valid requirement.


2021-12-07 17:24:50

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

On 07/12/2021, 18:18:40, Laurent Dufour wrote:
> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
>> Laurent Dufour <[email protected]> writes:
>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>>> exist? PowerVM partitions should always have it, but what do we want the
>>>> behavior to be on other hypervisors?
>>>
>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
>>> the lparstat -i command will fall back to the device tree value. I can't
>>> see any valid reason to report the value defined in the device tree
>>> here.
>>
>> Here's a valid reason :-)
>>
>> lparstat isn't the only possible consumer of the interface, and the
>> 'ibm,partition-name' property and the dynamic system parameter clearly
>> serve a common purpose. 'ibm,partition-name' is provided by qemu.
>
> If the hypervisor is not providing this value, this is not the goal of this
> interface to fetch it from the device tree.
>
> Any consumer should be able to fall back on the device tree value, and
> there is no added value to do such a trick in the kernel when it can be
> done in the user space.
>
>> In any case, the function should not print an error when the return
>> value is -3 (parameter not supported).
>
> That's a valid requirement.

I sent a v4 which is printing an error message even if the parameter is not
supported by the hypervisor.

This is unlikely and since this a call to pr_err_once(), it would be
printed only once, so not really annoying. I don't think a v5 is required
for such a minor message.

2021-12-08 15:21:40

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

Laurent Dufour <[email protected]> writes:
> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
>> Laurent Dufour <[email protected]> writes:
>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>>> exist? PowerVM partitions should always have it, but what do we want the
>>>> behavior to be on other hypervisors?
>>>
>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
>>> the lparstat -i command will fall back to the device tree value. I can't
>>> see any valid reason to report the value defined in the device tree
>>> here.
>>
>> Here's a valid reason :-)
>>
>> lparstat isn't the only possible consumer of the interface, and the
>> 'ibm,partition-name' property and the dynamic system parameter clearly
>> serve a common purpose. 'ibm,partition-name' is provided by qemu.
>
> If the hypervisor is not providing this value, this is not the goal of this
> interface to fetch it from the device tree.
>
> Any consumer should be able to fall back on the device tree value, and
> there is no added value to do such a trick in the kernel when it can be
> done in the user space.

There is value in imposing a level of abstraction so that the semantics
are:

* Report the name assigned to the guest by the hosting environment, if
available

as opposed to

* Return the string returned by a RTAS call to ibm,get-system-parameter
with token 55, if implemented

The benefit is that consumers of lparcfg do not have to be coded with
the knowledge that "if a partition_name= line is absent, the
ibm,get-system-parameter RTAS call must have failed, so now I should
read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
of esoterica that is appropriate for the kernel to encapsulate.

And I'd say the effort involved (falling back to a root node property
lookup) is proportional to the benefit.

2021-12-09 08:54:17

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

On 08/12/2021, 16:21:29, Nathan Lynch wrote:
> Laurent Dufour <[email protected]> writes:
>> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
>>> Laurent Dufour <[email protected]> writes:
>>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>>>> exist? PowerVM partitions should always have it, but what do we want the
>>>>> behavior to be on other hypervisors?
>>>>
>>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
>>>> the lparstat -i command will fall back to the device tree value. I can't
>>>> see any valid reason to report the value defined in the device tree
>>>> here.
>>>
>>> Here's a valid reason :-)
>>>
>>> lparstat isn't the only possible consumer of the interface, and the
>>> 'ibm,partition-name' property and the dynamic system parameter clearly
>>> serve a common purpose. 'ibm,partition-name' is provided by qemu.
>>
>> If the hypervisor is not providing this value, this is not the goal of this
>> interface to fetch it from the device tree.
>>
>> Any consumer should be able to fall back on the device tree value, and
>> there is no added value to do such a trick in the kernel when it can be
>> done in the user space.
>
> There is value in imposing a level of abstraction so that the semantics
> are:
>
> * Report the name assigned to the guest by the hosting environment, if
> available
>
> as opposed to
>
> * Return the string returned by a RTAS call to ibm,get-system-parameter
> with token 55, if implemented
>
> The benefit is that consumers of lparcfg do not have to be coded with
> the knowledge that "if a partition_name= line is absent, the
> ibm,get-system-parameter RTAS call must have failed, so now I should
> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
> of esoterica that is appropriate for the kernel to encapsulate.
>
> And I'd say the effort involved (falling back to a root node property
> lookup) is proportional to the benefit.
>

I don't agree.
From the kernel point of view, I can't see any benefit, this is adding more
complexity to do in the kernel what can be done easily in user space.

This is typically what should be implemented in a user space shared library.

2021-12-09 13:53:45

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

Laurent Dufour <[email protected]> writes:
> On 08/12/2021, 16:21:29, Nathan Lynch wrote:
>> Laurent Dufour <[email protected]> writes:
>>> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
>>>> Laurent Dufour <[email protected]> writes:
>>>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>>>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>>>>> exist? PowerVM partitions should always have it, but what do we want the
>>>>>> behavior to be on other hypervisors?
>>>>>
>>>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
>>>>> the lparstat -i command will fall back to the device tree value. I can't
>>>>> see any valid reason to report the value defined in the device tree
>>>>> here.
>>>>
>>>> Here's a valid reason :-)
>>>>
>>>> lparstat isn't the only possible consumer of the interface, and the
>>>> 'ibm,partition-name' property and the dynamic system parameter clearly
>>>> serve a common purpose. 'ibm,partition-name' is provided by qemu.
>>>
>>> If the hypervisor is not providing this value, this is not the goal of this
>>> interface to fetch it from the device tree.
>>>
>>> Any consumer should be able to fall back on the device tree value, and
>>> there is no added value to do such a trick in the kernel when it can be
>>> done in the user space.
>>
>> There is value in imposing a level of abstraction so that the semantics
>> are:
>>
>> * Report the name assigned to the guest by the hosting environment, if
>> available
>>
>> as opposed to
>>
>> * Return the string returned by a RTAS call to ibm,get-system-parameter
>> with token 55, if implemented
>>
>> The benefit is that consumers of lparcfg do not have to be coded with
>> the knowledge that "if a partition_name= line is absent, the
>> ibm,get-system-parameter RTAS call must have failed, so now I should
>> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
>> of esoterica that is appropriate for the kernel to encapsulate.
>>
>> And I'd say the effort involved (falling back to a root node property
>> lookup) is proportional to the benefit.
>>
>
> I don't agree.
> From the kernel point of view, I can't see any benefit, this is adding more
> complexity to do in the kernel what can be done easily in user space.

Applying this logic, I don't see how adding this to lparcfg would be
justified at all, because user space can already get at the parameter
using the privileged rtas syscall. Publish it to unprivileged programs
over D-Bus or something. That would minimize complexity for the kernel.


2021-12-09 15:59:40

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

On 09/12/2021, 14:53:31, Nathan Lynch wrote:
> Laurent Dufour <[email protected]> writes:
>> On 08/12/2021, 16:21:29, Nathan Lynch wrote:
>>> Laurent Dufour <[email protected]> writes:
>>>> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
>>>>> Laurent Dufour <[email protected]> writes:
>>>>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>>>>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>>>>>> exist? PowerVM partitions should always have it, but what do we want the
>>>>>>> behavior to be on other hypervisors?
>>>>>>
>>>>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
>>>>>> the lparstat -i command will fall back to the device tree value. I can't
>>>>>> see any valid reason to report the value defined in the device tree
>>>>>> here.
>>>>>
>>>>> Here's a valid reason :-)
>>>>>
>>>>> lparstat isn't the only possible consumer of the interface, and the
>>>>> 'ibm,partition-name' property and the dynamic system parameter clearly
>>>>> serve a common purpose. 'ibm,partition-name' is provided by qemu.
>>>>
>>>> If the hypervisor is not providing this value, this is not the goal of this
>>>> interface to fetch it from the device tree.
>>>>
>>>> Any consumer should be able to fall back on the device tree value, and
>>>> there is no added value to do such a trick in the kernel when it can be
>>>> done in the user space.
>>>
>>> There is value in imposing a level of abstraction so that the semantics
>>> are:
>>>
>>> * Report the name assigned to the guest by the hosting environment, if
>>> available
>>>
>>> as opposed to
>>>
>>> * Return the string returned by a RTAS call to ibm,get-system-parameter
>>> with token 55, if implemented
>>>
>>> The benefit is that consumers of lparcfg do not have to be coded with
>>> the knowledge that "if a partition_name= line is absent, the
>>> ibm,get-system-parameter RTAS call must have failed, so now I should
>>> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
>>> of esoterica that is appropriate for the kernel to encapsulate.
>>>
>>> And I'd say the effort involved (falling back to a root node property
>>> lookup) is proportional to the benefit.
>>>
>>
>> I don't agree.
>> From the kernel point of view, I can't see any benefit, this is adding more
>> complexity to do in the kernel what can be done easily in user space.
>
> Applying this logic, I don't see how adding this to lparcfg would be
> justified at all, because user space can already get at the parameter
> using the privileged rtas syscall. Publish it to unprivileged programs
> over D-Bus or something. That would minimize complexity for the kernel.

As you know, there is a need for unprivileged user to read that value.
Adding this to lparcfg solves this issue. Also, this makes it easy to read
from user space, without the need to get plumbing like D-Bus in the
picture. This is simple, working fine, and user space can easily make the
choice to read the DT value if needed. Please keep in mind that most of the
time the hypervisor is providing that value.