On Tue, Jan 18, 2022 at 09:16:01AM -0800, Vikash Bansal wrote:
> In the current implementation, the PCI capability list is parsed from
> the beginning to find each capability, which results in a large number
> of redundant PCI reads.
>
> Instead, we can parse the complete list just once, store it in the
> pci_dev structure, and get the offset of each capability directly from
> the pci_dev structure.
>
> This implementation improves pci devices initialization time by ~2-3% in
> case of bare metal and 7-8% in case of VM running on ESXi.
What is that in terms of "wall clock" time? % is hard to know here, and
of course it will depend on the PCI bus speed, right?
> It also adds a memory overhead of 20bytes (value of PCI_CAP_ID_MAX) per
> PCI device.
>
> Signed-off-by: Vikash Bansal <[email protected]>
> ---
> drivers/pci/pci.c | 43 ++++++++++++++++++++++++++++++++++++-------
> drivers/pci/probe.c | 5 +++++
> include/linux/pci.h | 2 ++
> 3 files changed, 43 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 3d2fb394986a..8e024db30262 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -468,6 +468,41 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
> return 0;
> }
>
> +
> +/**
> + * pci_find_all_capabilities - Read all capabilities
> + * @dev: the PCI device
> + *
> + * Read all capabilities and store offsets in cap_off
> + * array in pci_dev structure.
> + */
> +void pci_find_all_capabilities(struct pci_dev *dev)
> +{
> + int ttl = PCI_FIND_CAP_TTL;
> + u16 ent;
> + u8 pos;
> + u8 id;
> +
> + pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
> + if (!pos)
> + return;
> + pci_bus_read_config_byte(dev->bus, dev->devfn, pos, &pos);
> + while (ttl--) {
> + if (pos < 0x40)
What is this magic value of 0x40?
> + break;
> + pos &= ~3;
Why ~3?
> + pci_bus_read_config_word(dev->bus, dev->devfn, pos, &ent);
> + id = ent & 0xff;
Do you really need the & if you are truncating it?
> + if (id == 0xff)
> + break;
> +
> + /* Read first instance of capability */
> + if (!(dev->cap_off[id]))
> + dev->cap_off[id] = pos;
Shouldn't you have checked this before you read the value?
> + pos = (ent >> 8);
What about walking the list using __pci_find_next_cap() like before?
Why is this somehow the same as the old function?
> + }
> +}
> +
> /**
> * pci_find_capability - query for devices' capabilities
> * @dev: PCI device to query
> @@ -489,13 +524,7 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
> */
> u8 pci_find_capability(struct pci_dev *dev, int cap)
> {
> - u8 pos;
> -
> - pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
> - if (pos)
> - pos = __pci_find_next_cap(dev->bus, dev->devfn, pos, cap);
> -
> - return pos;
> + return dev->cap_off[cap];
> }
> EXPORT_SYMBOL(pci_find_capability);
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 087d3658f75c..bacab12cedbb 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1839,6 +1839,11 @@ int pci_setup_device(struct pci_dev *dev)
> dev->hdr_type = hdr_type & 0x7f;
> dev->multifunction = !!(hdr_type & 0x80);
> dev->error_state = pci_channel_io_normal;
> + /*
> + * Read all capabilities and store offsets in cap_off
> + * array in pci_dev structure.
> + */
Comment is not needed if the function name is descriptive.
> + pci_find_all_capabilities(dev);
And it is, so no need for the comment.
> set_pcie_port_type(dev);
>
> pci_set_of_node(dev);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 18a75c8e615c..d221c73e67f8 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -326,6 +326,7 @@ struct pci_dev {
> unsigned int class; /* 3 bytes: (base,sub,prog-if) */
> u8 revision; /* PCI revision, low byte of class word */
> u8 hdr_type; /* PCI header type (`multi' flag masked out) */
> + u8 cap_off[PCI_CAP_ID_MAX]; /* Offsets of all pci capabilities */
Did you run 'pahole' to ensure you are not adding extra padding bytes
here?
> #ifdef CONFIG_PCIEAER
> u16 aer_cap; /* AER capability offset */
> struct aer_stats *aer_stats; /* AER stats for this device */
> @@ -1128,6 +1129,7 @@ void pci_sort_breadthfirst(void);
>
> u8 pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
> u8 pci_find_capability(struct pci_dev *dev, int cap);
> +void pci_find_all_capabilities(struct pci_dev *dev);
Why is this now a global function and not one just local to the pci
core? Who else would ever need to call it?
thanks,
greg k-h
On 20/01/22, 11:56 AM, "Greg KH" <[email protected]> wrote:
Hi Greg,
Thanks for the comments
>On Tue, Jan 18, 2022 at 09:16:01AM -0800, Vikash Bansal wrote:
>> In the current implementation, the PCI capability list is parsed from
>> the beginning to find each capability, which results in a large number
>> of redundant PCI reads.
>>
>> Instead, we can parse the complete list just once, store it in the
>> pci_dev structure, and get the offset of each capability directly from
>> the pci_dev structure.
>>
>> This implementation improves pci devices initialization time by ~2-3% in
>> case of bare metal and 7-8% in case of VM running on ESXi.
>
>What is that in terms of "wall clock" time? % is hard to know here, and
>of course it will depend on the PCI bus speed, right?
>
In terms of "wall clock" time:
For bare-metal it reduced from 270ms to 261ms
And for VM it reduced from 201ms to 184ms.
>> It also adds a memory overhead of 20bytes (value of PCI_CAP_ID_MAX) per
>> PCI device.
>>
>> Signed-off-by: Vikash Bansal <[email protected]>
>> ---
>> drivers/pci/pci.c | 43 ++++++++++++++++++++++++++++++++++++-------
>> drivers/pci/probe.c | 5 +++++
>> include/linux/pci.h | 2 ++
>> 3 files changed, 43 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 3d2fb394986a..8e024db30262 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -468,6 +468,41 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
>> return 0;
>> }
>>
>> +
>> +/**
>> + * pci_find_all_capabilities - Read all capabilities
>> + * @dev: the PCI device
>> + *
>> + * Read all capabilities and store offsets in cap_off
>> + * array in pci_dev structure.
>> + */
>> +void pci_find_all_capabilities(struct pci_dev *dev)
>> +{
>> + int ttl = PCI_FIND_CAP_TTL;
>> + u16 ent;
>> + u8 pos;
>> + u8 id;
>> +
>> + pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
>> + if (!pos)
>> + return;
>> + pci_bus_read_config_byte(dev->bus, dev->devfn, pos, &pos);
>> + while (ttl--) {
>> + if (pos < 0x40)
>
>What is this magic value of 0x40?
>
0x40 is the start address of capability list. This code is copied from function __pci_find_next_cap_ttl
>> + break;
>> + pos &= ~3;
>
>Why ~3?
>
Capability start address is 4 byte aligned. This code is also copied from __pci_find_next_cap_ttl.
>> + pci_bus_read_config_word(dev->bus, dev->devfn, pos, &ent);
>> + id = ent & 0xff;
>
>Do you really need the & if you are truncating it?
>
Yes, this is not really required. But again, this code is copied from __pci_find_next_cap_ttl.
>> + if (id == 0xff)
>> + break;
>> +
>> + /* Read first instance of capability */
>> + if (!(dev->cap_off[id]))
>> + dev->cap_off[id] = pos;
>
>Shouldn't you have checked this before you read the value?
>
Yes, will move this code
>> + pos = (ent >> 8);
>
>What about walking the list using __pci_find_next_cap() like before?
>Why is this somehow the same as the old function?
>
__pci_find_next_cap() is used to find a given capability,
It can't be used to walk through the list in this case.
>> + }
>> +}
>> +
>> /**
>> * pci_find_capability - query for devices' capabilities
>> * @dev: PCI device to query
>> @@ -489,13 +524,7 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
>> */
>> u8 pci_find_capability(struct pci_dev *dev, int cap)
>> {
>> - u8 pos;
>> -
> - pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
>> - if (pos)
>> - pos = __pci_find_next_cap(dev->bus, dev->devfn, pos, cap);
>> -
>> - return pos;
>> + return dev->cap_off[cap];
>> }
>> EXPORT_SYMBOL(pci_find_capability);
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 087d3658f75c..bacab12cedbb 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1839,6 +1839,11 @@ int pci_setup_device(struct pci_dev *dev)
>> dev->hdr_type = hdr_type & 0x7f;
>> dev->multifunction = !!(hdr_type & 0x80);
>> dev->error_state = pci_channel_io_normal;
>> + /*
>> + * Read all capabilities and store offsets in cap_off
>> + * array in pci_dev structure.
>> + */
>
>Comment is not needed if the function name is descriptive.
>
ok
>> + pci_find_all_capabilities(dev);
>
>And it is, so no need for the comment.
>
>> set_pcie_port_type(dev);
>>
>> pci_set_of_node(dev);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 18a75c8e615c..d221c73e67f8 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -326,6 +326,7 @@ struct pci_dev {
>> unsigned int class; /* 3 bytes: (base,sub,prog-if) */
>> u8 revision; /* PCI revision, low byte of class word */
>> u8 hdr_type; /* PCI header type (`multi' flag masked out) */
>> + u8 cap_off[PCI_CAP_ID_MAX]; /* Offsets of all pci capabilities */
>
>Did you run 'pahole' to ensure you are not adding extra padding bytes
>here?
>
>> #ifdef CONFIG_PCIEAER
>> u16 aer_cap; /* AER capability offset */
>> struct aer_stats *aer_stats; /* AER stats for this device */
>> @@ -1128,6 +1129,7 @@ void pci_sort_breadthfirst(void);
>>
>> u8 pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
>> u8 pci_find_capability(struct pci_dev *dev, int cap);
>> +void pci_find_all_capabilities(struct pci_dev *dev);
>
>Why is this now a global function and not one just local to the pci
>core? Who else would ever need to call it?
Will make pci_find_all_capabilitie local and move it to probe.c
>
>thanks,
>
>greg k-h
On 20/01/22, 11:56 AM, "Greg KH" <[email protected]> wrote:
Run pahole for pci_dev structure, it is not adding any padding bytes.
Please refer to my previous email for replies to Greg's other comments.
>On Tue, Jan 18, 2022 at 09:16:01AM -0800, Vikash Bansal wrote:
>> In the current implementation, the PCI capability list is parsed from
>> the beginning to find each capability, which results in a large number
>> of redundant PCI reads.
>>
>> Instead, we can parse the complete list just once, store it in the
>> pci_dev structure, and get the offset of each capability directly from
>> the pci_dev structure.
>>
>> This implementation improves pci devices initialization time by ~2-3% in
>> case of bare metal and 7-8% in case of VM running on ESXi.
>
>What is that in terms of "wall clock" time? % is hard to know here, and
>of course it will depend on the PCI bus speed, right?
>
>> It also adds a memory overhead of 20bytes (value of PCI_CAP_ID_MAX) per
>> PCI device.
>>
>> Signed-off-by: Vikash Bansal <[email protected]>
>> ---
>> drivers/pci/pci.c | 43 ++++++++++++++++++++++++++++++++++++-------
>> drivers/pci/probe.c | 5 +++++
>> include/linux/pci.h | 2 ++
>> 3 files changed, 43 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 3d2fb394986a..8e024db30262 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -468,6 +468,41 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
>> return 0;
>> }
>>
>> +
>> +/**
>> + * pci_find_all_capabilities - Read all capabilities
>> + * @dev: the PCI device
>> + *
>> + * Read all capabilities and store offsets in cap_off
>> + * array in pci_dev structure.
>> + */
>> +void pci_find_all_capabilities(struct pci_dev *dev)
>> +{
>> + int ttl = PCI_FIND_CAP_TTL;
>> + u16 ent;
>> + u8 pos;
>> + u8 id;
>> +
>> + pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
>> + if (!pos)
>> + return;
>> + pci_bus_read_config_byte(dev->bus, dev->devfn, pos, &pos);
>> + while (ttl--) {
>> + if (pos < 0x40)
>
>What is this magic value of 0x40?
>
>> + break;
>> + pos &= ~3;
>
>Why ~3?
>
>> + pci_bus_read_config_word(dev->bus, dev->devfn, pos, &ent);
>> + id = ent & 0xff;
>
>Do you really need the & if you are truncating it?
>
>> + if (id == 0xff)
>> + break;
>> +
>> + /* Read first instance of capability */
>> + if (!(dev->cap_off[id]))
>> + dev->cap_off[id] = pos;
>
>Shouldn't you have checked this before you read the value?
>
>> + pos = (ent >> 8);
>
>What about walking the list using __pci_find_next_cap() like before?
>Why is this somehow the same as the old function?
>
>> + }
>> +}
>> +
>> /**
>> * pci_find_capability - query for devices' capabilities
>> * @dev: PCI device to query
>> @@ -489,13 +524,7 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
>> */
>> u8 pci_find_capability(struct pci_dev *dev, int cap)
>> {
>> - u8 pos;
>> -
> - pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
>> - if (pos)
>> - pos = __pci_find_next_cap(dev->bus, dev->devfn, pos, cap);
>> -
>> - return pos;
>> + return dev->cap_off[cap];
>> }
>> EXPORT_SYMBOL(pci_find_capability);
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 087d3658f75c..bacab12cedbb 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1839,6 +1839,11 @@ int pci_setup_device(struct pci_dev *dev)
>> dev->hdr_type = hdr_type & 0x7f;
>> dev->multifunction = !!(hdr_type & 0x80);
>> dev->error_state = pci_channel_io_normal;
>> + /*
>> + * Read all capabilities and store offsets in cap_off
>> + * array in pci_dev structure.
>> + */
>
>Comment is not needed if the function name is descriptive.
>
>> + pci_find_all_capabilities(dev);
>
>And it is, so no need for the comment.
>
>> set_pcie_port_type(dev);
>>
>> pci_set_of_node(dev);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 18a75c8e615c..d221c73e67f8 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -326,6 +326,7 @@ struct pci_dev {
>> unsigned int class; /* 3 bytes: (base,sub,prog-if) */
>> u8 revision; /* PCI revision, low byte of class word */
>> u8 hdr_type; /* PCI header type (`multi' flag masked out) */
>> + u8 cap_off[PCI_CAP_ID_MAX]; /* Offsets of all pci capabilities */
>
>Did you run 'pahole' to ensure you are not adding extra padding bytes
>here?
>
>> #ifdef CONFIG_PCIEAER
>> u16 aer_cap; /* AER capability offset */
>> struct aer_stats *aer_stats; /* AER stats for this device */
>> @@ -1128,6 +1129,7 @@ void pci_sort_breadthfirst(void);
>>
>> u8 pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
>> u8 pci_find_capability(struct pci_dev *dev, int cap);
>> +void pci_find_all_capabilities(struct pci_dev *dev);
>
>Why is this now a global function and not one just local to the pci
>core? Who else would ever need to call it?
>
>thanks,
>
>greg k-h
On Fri, Jan 21, 2022 at 05:26:35PM +0000, Vikash Bansal wrote:
> On 20/01/22, 11:56 AM, "Greg KH" <[email protected]> wrote:
>
> Run pahole for pci_dev structure, it is not adding any padding bytes.
> Please refer to my previous email for replies to Greg's other comments.
Please don't indent your entire response. The original posting
apparently didn't go to [email protected] or was rejected,
maybe because it wasn't plain text (see
http://vger.kernel.org/majordomo-info.html)?
It doesn't appear in the thread at
https://lore.kernel.org/all/[email protected]/
> >On Tue, Jan 18, 2022 at 09:16:01AM -0800, Vikash Bansal wrote:
> >> In the current implementation, the PCI capability list is parsed from
> >> the beginning to find each capability, which results in a large number
> >> of redundant PCI reads.
> >>
> >> Instead, we can parse the complete list just once, store it in the
> >> pci_dev structure, and get the offset of each capability directly from
> >> the pci_dev structure.
> ...
Hi Bjorn,
On 1/21/22 11:42 AM, Bjorn Helgaas wrote:
> On Fri, Jan 21, 2022 at 05:26:35PM +0000, Vikash Bansal wrote:
>> On 20/01/22, 11:56 AM, "Greg KH" <[email protected]> wrote:
>>
>> Run pahole for pci_dev structure, it is not adding any padding bytes.
>> Please refer to my previous email for replies to Greg's other comments.
>
> Please don't indent your entire response. The original posting
> apparently didn't go to [email protected] or was rejected,
> maybe because it wasn't plain text (see
> http://vger.kernel.org/majordomo-info.html)?
>
> It doesn't appear in the thread at
> https://lore.kernel.org/all/[email protected]/
>
Looking at the source for Vikash's first email in this thread, I see:
"Content-Type: text/plain", so I don't think that was the issue. Also,
the patch was sent using git-send-email: "X-Mailer: git-send-email
2.6.2".
Is there a way to find out exactly why that email might have prompted
the mailing list to drop it?
Thank you!
Regards,
Srivatsa
VMware Photon OS
Thanks Srivatsa for clarification, even I am surprised to see this.
I will send V2 of the patch, hopefully that should appear in the thread.
On 22/01/22, 7:42 AM, "Srivatsa S. Bhat" <[email protected]> wrote:
>
>Hi Bjorn,
>
>On 1/21/22 11:42 AM, Bjorn Helgaas wrote:
>> On Fri, Jan 21, 2022 at 05:26:35PM +0000, Vikash Bansal wrote:
>>> On 20/01/22, 11:56 AM, "Greg KH" <[email protected]> wrote:
>>>
>>> Run pahole for pci_dev structure, it is not adding any padding bytes.
>>> Please refer to my previous email for replies to Greg's other comments.
>>
>> Please don't indent your entire response. The original posting
>> apparently didn't go to [email protected] or was rejected,
>> maybe because it wasn't plain text (see
>> https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.kernel.org%2Fmajordomo-info.html&data=04%7C01%7Cbvikas%40vmware.com%7Caeeecd0515704520558208d9dd4cb350%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637784143774862668%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=DeEhueGnvJRTeavdK86HO2cSDEwWh7DLTte%2F4P00alA%3D&reserved=0)?
>>
>> It doesn't appear in the thread at
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F7E2C2648-76CE-4987-AB4F-7B4576F10D7B%40vmware.com%2F&data=04%7C01%7Cbvikas%40vmware.com%7Caeeecd0515704520558208d9dd4cb350%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637784143774862668%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=iTMr0mSNFph6AWMohfyCGxM6rVwXTjUQ5QW29bhecMQ%3D&reserved=0
>>
>
>Looking at the source for Vikash's first email in this thread, I see:
>"Content-Type: text/plain", so I don't think that was the issue. Also,
>the patch was sent using git-send-email: "X-Mailer: git-send-email
>2.6.2".
>
>Is there a way to find out exactly why that email might have prompted
>the mailing list to drop it?
>
>Thank you!
>
>Regards,
>Srivatsa
>VMware Photon OS
In my earlier response, I agreed to few changes suggested by Greg.
I observed some issue while implementing 2 of those changes.
On 20/01/22, 11:01 PM, "Vikash Bansal" <[email protected]> wrote:
>>> + pci_bus_read_config_word(dev->bus, dev->devfn, pos, &ent);
>>> + id = ent & 0xff;
>>> + if (id == 0xff)
>>> + break;
>>> +
>>> + /* Read first instance of capability */
>>> + if (!(dev->cap_off[id]))
>>> + dev->cap_off[id] = pos;
>>
>>Shouldn't you have checked this before you read the value?
>>
>
>Yes, will move this code
>
Cannot be moved before read, because "id" used in this "if" conditions is
returned by last read.
>>> @@ -1128,6 +1129,7 @@ void pci_sort_breadthfirst(void);
>>>
>>> u8 pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
>>> u8 pci_find_capability(struct pci_dev *dev, int cap);
>>> +void pci_find_all_capabilities(struct pci_dev *dev);
>>
>>Why is this now a global function and not one just local to the pci
>>core? Who else would ever need to call it?
>
>Will make pci_find_all_capabilitie local and move it to probe.c
>
pci_find_all_capabilities function is called only once in probe.c file,
but this function is calling __pci_bus_find_cap_start which is defined in pci.c,
so need to implement this function in pci.c and make it global.
Thanks
Vikash