[permalink] [raw]

Subject: Re: [PATCH 2/7] firmware: add offset to request_firmware_into_buf

On Mon, 26 Aug 2019 19:24:22 +0200,
Scott Branden wrote:
>
> Hi Takashi,
>
> On 2019-08-26 10:12 a.m., Takashi Iwai wrote:
> > On Mon, 26 Aug 2019 17:41:40 +0200,
> > Scott Branden wrote:
> >> HI Takashi,
> >>
> >> On 2019-08-26 8:20 a.m., Takashi Iwai wrote:
> >>> On Fri, 23 Aug 2019 21:44:42 +0200,
> >>> Scott Branden wrote:
> >>>> Hi Takashi,
> >>>>
> >>>> Thanks for review. comments below.
> >>>>
> >>>> On 2019-08-23 3:05 a.m., Takashi Iwai wrote:
> >>>>> On Thu, 22 Aug 2019 21:24:46 +0200,
> >>>>> Scott Branden wrote:
> >>>>>> Add offset to request_firmware_into_buf to allow for portions
> >>>>>> of firmware file to be read into a buffer. Necessary where firmware
> >>>>>> needs to be loaded in portions from file in memory constrained systems.
> >>>>> AFAIU, this won't work with the fallback user helper, right?
> >>>> Seems to work fine in the fw_run_tests.sh with fallbacks.
> >>> But how? You patch doesn't change anything about the fallback loading
> >>> mechanism.
> >> Correct - I didn't change any of the underlying mechanisms,
> >> so however request_firmware_into_buf worked before it still does.
> >>> Or, if the expected behavior is to load the whole content
> >>> and then copy a part, what's the merit of this API?
> >> The merit of the API is that the entire file is not copied into a buffer.
> >> In my use case, the buffer is a memory region in PCIe space that isn't
> >> even large enough for the whole file. So the only way to get the file
> >> is to read it
> >> in portions.
> > BTW: does the use case above mean that the firmware API directly
> > writes onto the given PCI iomem region? If so, I'm not sure whether
> > it would work as expected on all architectures. There must be a
> > reason of the presence of iomem-related API like memcpy_toio()...
> Yes, we access the PCI region directly in the driver and thus also
> through request_firmware_into_buf.

Then you really need to access via the standard APIs for iomem.
The normal memory copy would work only on some architectures like
x86.

> I will admit I am not familiar with every subtlety of PCI
> accesses. Any comments to the Valkyrie driver in this patch series are
> appreciated.
> But not all drivers need to work on all architectures. I can add a
> depends on x86 64bit architectures to the driver to limit it to such.

But it's an individual board on PCIe, and should work no matter which
architecture is? Or is this really exclusive to x86?

thanks,

Takashi

2019-08-27 13:56:01

by Arnd Bergmann

[permalink] [raw]

Subject: Re: [PATCH 5/7] bcm-vk: add bcm_vk UAPI

On Thu, Aug 22, 2019 at 9:25 PM Scott Branden
<[email protected]> wrote:
>
> Add user space api for bcm-vk driver.

> +
> +struct vk_metadata {
> + /* struct version, always backwards compatible */
> + __u32 version;
> +
> + /* Version 0 fields */
> + __u32 card_status;
> +#define VK_CARD_STATUS_FASTBOOT_READY BIT(0)
> +#define VK_CARD_STATUS_FWLOADER_READY BIT(1)
> +
> + __u32 firmware_version;
> + __u32 fw_status;
> + /* End version 0 fields */
> +
> + __u64 reserved[14];
> + /* Total of 16*u64 for all versions */
> +};

I'd suggest getting rid of the API version fields, just leave the version 0
fields here and add a new structure + ioctl if you need other
fields

Versioning usually just adds complexity and is hard to get right.

> +struct vk_access {
> + __u8 barno; /* BAR number to use */
> + __u8 type; /* Type of access */
> +#define VK_ACCESS_READ 0
> +#define VK_ACCESS_WRITE 1
> + __u32 len; /* length of data */
> + __u64 offset; /* offset in BAR */
> + __u32 *data; /* where to read/write data to */
> +};

The pointer in the last member makes the structure incompatible between
32-bit and 64-bit user space. You could work around that using a __u64
member and turning that into a pointer using the u64_to_user_ptr()
macro in the driver in a portable way.

However, since this seems to be a read/write type interface, maybe
it's better to just use read/write file operations.

I also wonder if the interface should be on a higher abstraction level
here.

Arnd

2019-08-27 14:16:36

by Arnd Bergmann

[permalink] [raw]

Subject: Re: [PATCH 6/7] misc: bcm-vk: add Broadcom Valkyrie driver

On Thu, Aug 22, 2019 at 9:25 PM Scott Branden
<[email protected]> wrote:
>
> Add Broadcom Valkyrie driver offload engine.
> This driver interfaces to the Valkyrie PCIe offload engine to perform
> should offload functions as video transcoding on multiple streams
> in parallel. Valkyrie device is booted from files loaded using
> request_firmware_into_buf mechanism. After booted card status is updated
> and messages can then be sent to the card.
> Such messages contain scatter gather list of addresses
> to pull data from the host to perform operations on.
>
> Signed-off-by: Scott Branden <[email protected]>
> Signed-off-by: Desmond Yan <[email protected]>
> Signed-off-by: James Hu <[email protected]>

Can you explain the decision to make this is a standalone misc driver
rather than hooking into the existing framework in drivers/media?

There is an existing interface that looks like it could fit the hardware
in include/media/v4l2-mem2mem.h. Have you considered using that?

There is also support for video transcoding using GPUs in
driver/gpu/drm/, that could also be used in theory, though it sounds
like a less optimal fit.

Arnd

2019-08-27 14:50:57

by Kieran Bingham

[permalink] [raw]

Subject: Re: [PATCH 5/7] bcm-vk: add bcm_vk UAPI

Hi Scott,

On 22/08/2019 20:24, Scott Branden wrote:
> Add user space api for bcm-vk driver.
>
> Signed-off-by: Scott Branden <[email protected]>
> ---
> include/uapi/linux/misc/bcm_vk.h | 88 ++++++++++++++++++++++++++++++++
> 1 file changed, 88 insertions(+)
> create mode 100644 include/uapi/linux/misc/bcm_vk.h
>
> diff --git a/include/uapi/linux/misc/bcm_vk.h b/include/uapi/linux/misc/bcm_vk.h
> new file mode 100644
> index 000000000000..df7dfd7f0702
> --- /dev/null
> +++ b/include/uapi/linux/misc/bcm_vk.h
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
> +/*
> + * Copyright(c) 2018 Broadcom
> + */
> +
> +#ifndef __UAPI_LINUX_MISC_BCM_VK_H
> +#define __UAPI_LINUX_MISC_BCM_VK_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +struct vk_metadata {
> + /* struct version, always backwards compatible */
> + __u32 version;
> +
> + /* Version 0 fields */
> + __u32 card_status;
> +#define VK_CARD_STATUS_FASTBOOT_READY BIT(0)
> +#define VK_CARD_STATUS_FWLOADER_READY BIT(1)
> +
> + __u32 firmware_version;
> + __u32 fw_status;
> + /* End version 0 fields */
> +
> + __u64 reserved[14];
> + /* Total of 16*u64 for all versions */
> +};
> +
> +struct vk_image {
> + __u32 type; /* Type of image */
> +#define VK_IMAGE_TYPE_BOOT1 1 /* 1st stage (load to SRAM) */
> +#define VK_IMAGE_TYPE_BOOT2 2 /* 2nd stage (load to DDR) */
> + char filename[64]; /* Filename of image */
> +};
> +
> +/* default firmware images names */
> +#define VK_BOOT1_DEF_FILENAME "vk-boot1.bin"
> +#define VK_BOOT2_DEF_FILENAME "vk-boot2.bin"
> +
> +struct vk_access {
> + __u8 barno; /* BAR number to use */
> + __u8 type; /* Type of access */
> +#define VK_ACCESS_READ 0
> +#define VK_ACCESS_WRITE 1
> + __u32 len; /* length of data */
> + __u64 offset; /* offset in BAR */
> + __u32 *data; /* where to read/write data to */
> +};
> +
> +struct vk_reset {
> + __u32 arg1;
> + __u32 arg2;
> +};
> +
> +#define VK_MAGIC 0x5E
> +
> +/* Get metadata from Valkyrie (firmware version, card status, etc) */
> +#define VK_IOCTL_GET_METADATA _IOR(VK_MAGIC, 0x1, struct vk_metadata)
> +
> +/* Load image to Valkyrie */
> +#define VK_IOCTL_LOAD_IMAGE _IOW(VK_MAGIC, 0x2, struct vk_image)
> +
> +/* Read data from Valkyrie */
> +#define VK_IOCTL_ACCESS_BAR _IOWR(VK_MAGIC, 0x3, struct vk_access)
> +
> +/* Send Reset to Valkyrie */
> +#define VK_IOCTL_RESET _IOW(VK_MAGIC, 0x4, struct vk_reset)

It sounds a bit like the valkyrie is a generic asynchronous coprocessor,
does it merit using the remoteproc interfaces to control it ?

Or is it really just a single purpose cell doing video operations ?

--
Kieran

> +
> +/*
> + * message block - basic unit in the message where a message's size is always
> + * N x sizeof(basic_block)
> + */
> +struct vk_msg_blk {
> + __u8 function_id;
> +#define VK_FID_TRANS_BUF 5
> +#define VK_FID_SHUTDOWN 8
> + __u8 size;
> + __u16 queue_id:4;
> + __u16 msg_id:12;
> + __u32 context_id;
> + __u32 args[2];
> +#define VK_CMD_PLANES_MASK 0x000F /* number of planes to up/download */
> +#define VK_CMD_UPLOAD 0x0400 /* memory transfer to vk */
> +#define VK_CMD_DOWNLOAD 0x0500 /* memory transfer from vk */
> +#define VK_CMD_MASK 0x0F00 /* command mask */
> +};
> +
> +#endif /* __UAPI_LINUX_MISC_BCM_VK_H */
>

--
Regards
--
Kieran

2019-08-27 15:27:24

On Sat, Feb 22, 2020 at 12:37 AM Scott Branden
<[email protected]> wrote:
> On 2020-02-21 12:44 a.m., Arnd Bergmann wrote:
> > On Fri, Feb 21, 2020 at 1:11 AM Scott Branden
> > <[email protected]> wrote:
> >> On 2019-10-11 6:31 a.m., Luis Chamberlain wrote:
> >>> On Tue, Aug 27, 2019 at 12:40:02PM +0200, Takashi Iwai wrote:
> >>>> On Mon, 26 Aug 2019 19:24:22 +0200,
> >>>> Scott Branden wrote:
> >>>>> I will admit I am not familiar with every subtlety of PCI
> >>>>> accesses. Any comments to the Valkyrie driver in this patch series are
> >>>>> appreciated.
> >>>>> But not all drivers need to work on all architectures. I can add a
> >>>>> depends on x86 64bit architectures to the driver to limit it to such.
> >>>> But it's an individual board on PCIe, and should work no matter which
> >>>> architecture is? Or is this really exclusive to x86?
> >>> Poke Scott.
> >> Yes, this is exclusive to x86.
> >> In particular, 64-bit x86 server class machines with PCIe gen3 support.
> >> There is no reason for these PCIe boards to run in other lower end
> >> machines or architectures.
> > It doesn't really matter that much what you expect your customers to
> > do with your product, or what works a particular machine today, drivers
> > should generally be written in a portable manner anyway and use
> > the documented APIs. memcpy() into an __iomem pointer is not
> > portable and while it probably works on any x86 machine today, please
> > just don't do it. If you use 'sparse' to check your code, that would normally
> > result in an address space warning, unless you add __force and a
> > long comment explaining why you cannot just use memcpy_to_io()
> > instead. At that point, you are already better off usingn memcpy_to_io() ;-)
> >
> > Arnd
> I am a not performing a memcpy at all right now.
> I am calling a request_firmware_into_buf call and do not need to make a
> copy.
> This function eventually calls kernel_read_file, which then makes at
> indirect call in __vfs_read to perform the read to memory.

Well, that comes down to a memcpy() in the end, even if you don't
spell it like that in your driver. It may be a copy_from_user(), but
clearly not a memcpy_to_io().

> From there I am lost as to what operation happens to achieve this.
> The read function would need to detect the buf is in io space and
> perform the necessary operation.
> Anyone with any knowledge on how to make this read to io space would be
> appreciated?

I don't think modifying the common code is helpful in this case:
any access to PCI MMIO space is inevitably going to be slow, so
an extra memcpy() in your driver is not going to cause any noticeable
overhead, but the generic functions are meant to be fast for the
normal use case and not gain any other features.

Arnd