Dear Kernel Maintainers,
I am a driver developer and would like to upstream the ArmChina Zhouyi NPU driver ("Zhouyi" is the brand) to the accel subsystem.
The driver is already open source (both the user-mode driver, UMD, and the kernel-mode driver, KMD), and the code is available at https://github.com/Arm-China/Compass_NPU_Driver.git.
This driver is responsible for scheduling AI inference tasks to the NPU cores (V1/V2/V3). Specifically, a simplified end-to-end flow is:
1. A TFLite/ONNX model is compiled into an executable binary file in ELF format by the NN graph compiler (designed by ArmChina).
2. An application loads the executable binary file into the UMD and provides the input data.
3. The UMD parses the binary and sends ioctls to the KMD (open device, allocate/mmap/free memory, submit the job descriptor).
4. The KMD dispatches the job to the NPU hardware, handles interrupts, and updates the execution status.
5. The UMD polls the status of the previously scheduled job.
6. The application gets the output results.
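
For readers following the thread, steps 3 to 5 above can be sketched as a small state machine in userspace C. This is only a toy model under stated assumptions: the names (`toy_kmd`, `kmd_submit`, `kmd_dispatch`, `kmd_irq_done`, `kmd_poll`) and the four job states are hypothetical and are not taken from the Zhouyi driver, and the "interrupt" is just a function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job states; the real driver's names and states may differ. */
enum job_state { JOB_FREE, JOB_PENDING, JOB_RUNNING, JOB_DONE };

#define MAX_JOBS 4

/* Toy stand-in for the KMD's job table (all slots start as JOB_FREE). */
struct toy_kmd {
    enum job_state jobs[MAX_JOBS];
};

/* Step 3: the UMD submits a job descriptor; returns a job id, or -1 if full. */
int kmd_submit(struct toy_kmd *k)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        if (k->jobs[i] == JOB_FREE) {
            k->jobs[i] = JOB_PENDING;
            return i;
        }
    }
    return -1;
}

/* Step 4a: the KMD dispatches the first pending job to the simulated NPU. */
int kmd_dispatch(struct toy_kmd *k)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        if (k->jobs[i] == JOB_PENDING) {
            k->jobs[i] = JOB_RUNNING;
            return i;
        }
    }
    return -1;
}

/* Step 4b: the interrupt handler marks a running job as done. */
void kmd_irq_done(struct toy_kmd *k, int id)
{
    assert(id >= 0 && id < MAX_JOBS);
    if (k->jobs[id] == JOB_RUNNING)
        k->jobs[id] = JOB_DONE;
}

/* Step 5: the UMD polls the status of a job it submitted earlier. */
enum job_state kmd_poll(const struct toy_kmd *k, int id)
{
    assert(id >= 0 && id < MAX_JOBS);
    return k->jobs[id];
}
```

A caller would submit, let the KMD dispatch, and then poll until the state reads `JOB_DONE`. In a real driver the status update happens asynchronously in the interrupt path rather than through a direct call.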
So, regarding upstreaming:
Q1: do you think our NPU driver is suitable for accel? If the answer is yes, which tree & branch should the patches be based on?
Q2: in thread https://lore.kernel.org/lkml/[email protected]/ showing a similar case, Oded mentioned that:
"If we would have upstreamed a new driver, the expectation would have been that we would use some drm mechanisms.", and
"the minimal requirement is to use GEM/BOs for memory management operations".
I guess those requirements also apply to the Zhouyi NPU KMD? Currently, memory management (MM) in the KMD is based on the dma-mapping APIs; it handles both reserved CMA region(s) and SMMU-mapped buffers, and supports the dma-buf framework. Maybe I should replace the implementation with DRM APIs.
Q3: if you have looked at the KMD code, do you think I should make any other major change before submitting the first patch series? Thank you!
Thanks for your time. I look forward to your reply.
Best Regards,
Dejia
IMPORTANT NOTICE: The contents of this email and any attachments may be privileged and confidential. If you are not the intended recipient, please delete the email immediately. It is strictly prohibited to disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ©Arm Technology (China) Co., Ltd copyright and reserve all rights. 重要提示:本邮件(包括任何附件)可能含有专供明确的个人或目的使用的机密信息,并受法律保护。如果您并非该收件人,请立即删除此邮件。严禁通过任何渠道,以任何目的,向任何人披露、储存或复制邮件信息或者据此采取任何行动。感谢您的配合。 ©安谋科技(中国)有限公司 版权所有并保留一切权利。
On Thu, Mar 28, 2024 at 07:46:01AM +0000, Dejia Shang wrote:
> IMPORTANT NOTICE: The contents of this email and any attachments may be privileged and confidential. [...]
You need to get this fixed, otherwise people will delete this email
as you have suggested and/or refrain from responding to this email.
Please talk to your local IT and get a setup without this disclaimer for
all mailing list activities.
--
Regards,
Sudeep
On Thu, Mar 28, 2024 at 10:01 AM Dejia Shang <[email protected]> wrote:
>
> Dear Kernel Maintainers,
>
> I am a driver developer and would like to upstream the ArmChina Zhouyi NPU driver ("Zhouyi" is the brand) to the accel subsystem.
>
> The driver is already open source (both the user-mode driver, UMD, and the kernel-mode driver, KMD), and the code is available at https://github.com/Arm-China/Compass_NPU_Driver.git.
>
> This driver is responsible for scheduling AI inference tasks to the NPU cores (V1/V2/V3). Specifically, a simplified end-to-end flow is:
>
> 1. A TFLite/ONNX model is compiled into an executable binary file in ELF format by the NN graph compiler (designed by ArmChina).
> 2. An application loads the executable binary file into the UMD and provides the input data.
> 3. The UMD parses the binary and sends ioctls to the KMD (open device, allocate/mmap/free memory, submit the job descriptor).
> 4. The KMD dispatches the job to the NPU hardware, handles interrupts, and updates the execution status.
> 5. The UMD polls the status of the previously scheduled job.
> 6. The application gets the output results.
>
> So, regarding upstreaming:
>
> Q1: do you think our NPU driver is suitable for accel? If the answer is yes, which tree & branch should the patches be based on?
Hi Dejia,
Yes, it definitely sounds like a good fit for the accel subsystem.
Please base your patches on the "drm-misc-next" branch in the drm-misc repo:
https://anongit.freedesktop.org/git/drm/drm-misc.git
>
> Q2: in thread https://lore.kernel.org/lkml/[email protected]/ showing a similar case, Oded mentioned that:
>
> "If we would have upstreamed a new driver, the expectation would have been that we would use some drm mechanisms.", and
> "the minimal requirement is to use GEM/BOs for memory management operations".
>
> I guess those requirements also apply to the Zhouyi NPU KMD? Currently, memory management (MM) in the KMD is based on the dma-mapping APIs; it handles both reserved CMA region(s) and SMMU-mapped buffers, and supports the dma-buf framework. Maybe I should replace the implementation with DRM APIs.
Yes, those requirements definitely apply here.
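
As an illustration of what "GEM/BOs" means in practice: userspace refers to each buffer object (BO) through a small integer handle that is private to its open file, while the kernel object itself is reference counted. The sketch below models that pattern in plain userspace C. It is not DRM code; `toy_file`, `toy_bo` and the `toy_bo_*` helpers are hypothetical names that only loosely mirror the roles of `drm_gem_handle_create()`, `drm_gem_object_lookup()` and `drm_gem_object_put()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy buffer object: a size plus a reference count. */
struct toy_bo {
    size_t size;
    int refcount;
};

#define MAX_HANDLES 8

/* Per-open-file handle table; handle = slot index + 1, so 0 is never valid. */
struct toy_file {
    struct toy_bo *handles[MAX_HANDLES];
};

/* Allocate a BO and return a handle to it; the handle owns one reference. */
unsigned toy_bo_create(struct toy_file *f, size_t size)
{
    for (unsigned i = 0; i < MAX_HANDLES; i++) {
        if (!f->handles[i]) {
            struct toy_bo *bo = malloc(sizeof(*bo));
            if (!bo)
                return 0;
            bo->size = size;
            bo->refcount = 1;
            f->handles[i] = bo;
            return i + 1;
        }
    }
    return 0; /* table full */
}

/* Drop one reference; free the BO when the last reference goes away. */
void toy_bo_put(struct toy_bo *bo)
{
    assert(bo->refcount > 0);
    if (--bo->refcount == 0)
        free(bo);
}

/* Translate a handle into a BO and take an extra reference for the caller. */
struct toy_bo *toy_bo_lookup(struct toy_file *f, unsigned handle)
{
    if (handle == 0 || handle > MAX_HANDLES)
        return NULL;
    struct toy_bo *bo = f->handles[handle - 1];
    if (bo)
        bo->refcount++;
    return bo;
}

/* Close a handle: the table's reference is dropped, other users keep theirs. */
int toy_bo_close(struct toy_file *f, unsigned handle)
{
    if (handle == 0 || handle > MAX_HANDLES || !f->handles[handle - 1])
        return -1;
    struct toy_bo *bo = f->handles[handle - 1];
    f->handles[handle - 1] = NULL;
    toy_bo_put(bo);
    return 0;
}
```

The point of the split is that an in-flight job can hold its own reference to a BO, so closing the handle from userspace does not free memory the hardware is still using.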
>
> Q3: if you have looked at the KMD code, do you think I should make any other major change before submitting the first patch series? Thank you!
I took a quick glance. In general, it seems to be ok, but I noticed
two things related to the integration with drm/accel:
1. You use a scheduler for the job submission, which provides the
ability to defer jobs. In that case, I suggest checking whether you can
use drm_sched instead of your own implementation. No point in
re-inventing the wheel.
2. You provide several memory zones for allocation of memory. Here I
would suggest looking at TTM as the memory manager instead of
implementing your own.
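
To make the first point concrete, here is a toy userspace model of a scheduler that defers jobs until their dependency has completed and hands ready jobs to a driver-supplied callback. It is a sketch of the general pattern only, not the drm_sched API: the `toy_*` names are hypothetical, the single `run_job` callback is merely modelled on the `run_job` hook in `struct drm_sched_backend_ops`, and completion is synchronous here rather than signalled by a fence from the interrupt handler.

```c
#include <assert.h>
#include <stddef.h>

/* A job with at most one dependency that must finish first (may be NULL). */
struct toy_job {
    int id;
    const struct toy_job *dep;
    int done;
};

/* Driver-supplied backend hook, loosely modelled on a run_job callback. */
struct toy_sched_ops {
    void (*run_job)(struct toy_job *job);
};

#define QUEUE_LEN 8

struct toy_sched {
    const struct toy_sched_ops *ops;
    struct toy_job *queue[QUEUE_LEN];
    size_t count;
};

/* Example backend: "runs" the job by marking it done immediately. */
static void toy_hw_run(struct toy_job *job)
{
    job->done = 1;
}

const struct toy_sched_ops toy_hw_ops = { .run_job = toy_hw_run };

/* Queue a job; returns 0 on success, -1 if the queue is full. */
int toy_sched_push(struct toy_sched *s, struct toy_job *job)
{
    assert(job != NULL);
    if (s->count == QUEUE_LEN)
        return -1;
    s->queue[s->count++] = job;
    return 0;
}

/*
 * One scheduling pass: run every queued job whose dependency is done and
 * keep the others queued in their original order. Returns how many ran.
 */
size_t toy_sched_run_ready(struct toy_sched *s)
{
    size_t ran = 0, kept = 0;

    for (size_t i = 0; i < s->count; i++) {
        struct toy_job *job = s->queue[i];

        if (job->dep && !job->dep->done) {
            s->queue[kept++] = job; /* deferred to a later pass */
            continue;
        }
        s->ops->run_job(job);
        ran++;
    }
    s->count = kept;
    return ran;
}
```

If job B depends on job A but is pushed first, the first pass runs only A and defers B; the second pass then runs B. A shared scheduler gives every driver this dependency tracking (plus timeout handling and fairness) without each one carrying its own queue logic.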
And please remove the IMPORTANT NOTICE at the end of your emails. I
would have to refrain from answering further emails if that notice
remains.
Thanks,
Oded
>
> Thanks for your time. I look forward to your reply.
>
> Best Regards,
> Dejia
> -----Original Message-----
> From: Sudeep Holla <[email protected]>
> Sent: March 28, 2024 18:32
> To: Dejia Shang <[email protected]>
> Cc: [email protected]; Sudeep Holla <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: About upstreaming ArmChina NPU driver
>
> On Thu, Mar 28, 2024 at 07:46:01AM +0000, Dejia Shang wrote:
> > IMPORTANT NOTICE: The contents of this email and any attachments may be privileged and confidential. [...]
>
> You need to get this fixed, otherwise people will delete this email as you have
> suggested and/or refrain from responding to this email.
>
> Please talk to your local IT and get a setup without this disclaimer for all
> mailing list activities.
Now fixed. I had not noticed it because the mail server appends that disclaimer automatically. Thanks for the reminder!
Best Regards,
Dejia
>
> --
> Regards,
> Sudeep
> -----Original Message-----
> From: Oded Gabbay <[email protected]>
> Sent: April 3, 2024 14:26
> To: Dejia Shang <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: About upstreaming ArmChina NPU driver
>
> On Thu, Mar 28, 2024 at 10:01 AM Dejia Shang <[email protected]>
> wrote:
> >
> > Dear Kernel Maintainers,
> >
> > I am a driver developer and would like to upstream the ArmChina Zhouyi NPU driver ("Zhouyi" is the brand) to the accel subsystem.
> >
> > The driver is already open source (both the user-mode driver, UMD, and the kernel-mode driver, KMD), and the code is available at https://github.com/Arm-China/Compass_NPU_Driver.git.
> >
> > This driver is responsible for scheduling AI inference tasks to the NPU cores (V1/V2/V3). Specifically, a simplified end-to-end flow is:
> >
> > 1. A TFLite/ONNX model is compiled into an executable binary file in ELF format by the NN graph compiler (designed by ArmChina).
> > 2. An application loads the executable binary file into the UMD and provides the input data.
> > 3. The UMD parses the binary and sends ioctls to the KMD (open device, allocate/mmap/free memory, submit the job descriptor).
> > 4. The KMD dispatches the job to the NPU hardware, handles interrupts, and updates the execution status.
> > 5. The UMD polls the status of the previously scheduled job.
> > 6. The application gets the output results.
> >
> > So, regarding upstreaming:
> >
> > Q1: do you think our NPU driver is suitable for accel? If the answer is yes, which tree & branch should the patches be based on?
> Hi Dejia,
> Yes, it definitely sounds like a good fit for the accel subsystem.
> Please base your patches on the "drm-misc-next" branch in the drm-misc repo:
> https://anongit.freedesktop.org/git/drm/drm-misc.git
>
Hi Oded,
Got it.
> >
> > Q2: in thread https://lore.kernel.org/lkml/ec547d33-214f-4952-aa33-c271e9edad63@kernel.org/ showing a similar case, Oded mentioned that:
> >
> > "If we would have upstreamed a new driver, the expectation would have been that we would use some drm mechanisms.", and
> > "the minimal requirement is to use GEM/BOs for memory management operations".
> >
> > I guess those requirements also apply to the Zhouyi NPU KMD? Currently, memory management (MM) in the KMD is based on the dma-mapping APIs; it handles both reserved CMA region(s) and SMMU-mapped buffers, and supports the dma-buf framework. Maybe I should replace the implementation with DRM APIs.
> Yes, those requirements definitely apply here.
> >
> > Q3: if you have looked at the KMD code, do you think I should make any other major change before submitting the first patch series? Thank you!
> I took a quick glance. In general, it seems to be ok, but I noticed two things
> related to the integration with drm/accel:
>
> 1. You use a scheduler for the job submission, which provides the ability to
> defer jobs. In that case, I suggest checking whether you can use drm_sched
> instead of your own implementation. No point in re-inventing the wheel.
> 2. You provide several memory zones for allocation of memory. Here I would
> suggest looking at TTM as the memory manager instead of implementing your
> own.
Thanks for your time! I will try to refactor the code as suggested and then send the first patch series.
>
> And please remove the IMPORTANT NOTICE at the end of your emails. I
> would have to refrain from answering further emails if that notice remains.
Now fixed. I had not noticed it because the mail server appends the notice automatically. Sorry for the inconvenience.
Best Regards,
Dejia
>
> Thanks,
> Oded
>
> >
> > Thanks for your time. I look forward to your reply.
> >
> > Best Regards,
> > Dejia