I have heeded Andrew Jones' advice and written a script to generate the
instruction handling code. It is still in development, but currently
lives on a fork of riscv-opcodes [1]. I would like to know whether what I
have produced so far is in line with what people would want to see.
An insn.h file can be generated by running the following in the repo:
    make
    python3 parse_linux.py instr_dict.yaml insn.h opcodes_config variable_field_data.yaml
I have pushed the generated files to the repo so people do not need to
run the script.
Each instruction has "variable fields" such as registers and immediates.
For each variable field that appears in any provided instruction, three
functions are generated: one to extract the field from an instruction,
one to insert a value into the field, and one to update the field.
Update differs from insert in that it first clears the previous value of
the field. Then, for each instruction, the script generates a function
to check whether an arbitrary 32-bit value matches the given instruction,
and a function to generate the binary for the instruction given the
required variable fields.
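As a rough illustration of the helpers described above, here is a minimal
sketch. It is written in Python for brevity and is not taken from the
generated insn.h (which is C); every function and constant name below is
hypothetical. The bit positions and the addi match/mask values follow the
standard RISC-V I-type encoding.

```python
# Illustrative sketch only; names are hypothetical, not from insn.h.
RD_SHIFT, RD_WIDTH = 7, 5            # rd occupies bits 11:7
RS1_SHIFT, RS1_WIDTH = 15, 5         # rs1 occupies bits 19:15
IMM12_SHIFT, IMM12_WIDTH = 20, 12    # I-type immediate occupies bits 31:20

MATCH_ADDI = 0x00000013  # opcode 0010011, funct3 000
MASK_ADDI = 0x0000707F   # selects the opcode and funct3 bits


def field_mask(shift, width):
    return ((1 << width) - 1) << shift


def extract(insn, shift, width):
    """Read a variable field out of an instruction word."""
    return (insn >> shift) & ((1 << width) - 1)


def insert(insn, value, shift, width):
    """OR a value into a field; assumes the field is currently zero."""
    return (insn | ((value << shift) & field_mask(shift, width))) & 0xFFFFFFFF


def update(insn, value, shift, width):
    """Clear the previous field value, then insert the new one."""
    return insert(insn & ~field_mask(shift, width), value, shift, width)


def is_addi(insn):
    """Check whether an arbitrary 32-bit value is an addi."""
    return (insn & MASK_ADDI) == MATCH_ADDI


def make_addi(rd, rs1, imm):
    """Build the binary for addi from its variable fields."""
    insn = MATCH_ADDI
    insn = insert(insn, rd, RD_SHIFT, RD_WIDTH)
    insn = insert(insn, rs1, RS1_SHIFT, RS1_WIDTH)
    insn = insert(insn, imm & 0xFFF, IMM12_SHIFT, IMM12_WIDTH)
    return insn


insn = make_addi(rd=1, rs1=2, imm=5)                 # addi x1, x2, 5
patched = update(insn, 3, RD_SHIFT, RD_WIDTH)        # retarget to x3
```

The difference between insert and update only matters when the field
already holds a value: insert would OR the old and new bits together,
while update replaces them.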
I was able to use riscv-opcodes to parse the instruction files, but
needed to create a new data structure in variable_field_data.py [2] which
holds the positioning of immediates inside of an instruction.
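To show what "positioning of immediates" involves, here is a minimal
sketch of one possible representation. It is an assumption for
illustration, not the actual format used in variable_field_data.py: each
immediate is a list of (insn_lsb, imm_lsb, width) segments. The example
table is the B-type branch offset from the RISC-V base ISA, which is
split across four pieces of the instruction word.

```python
# Hypothetical layout table: (insn_lsb, imm_lsb, width) per segment.
# B-type branch offset:
#   insn[31]    = imm[12]
#   insn[30:25] = imm[10:5]
#   insn[11:8]  = imm[4:1]
#   insn[7]     = imm[11]
BRANCH_IMM = [(31, 12, 1), (25, 5, 6), (8, 1, 4), (7, 11, 1)]
BRANCH_IMM_BITS = 13  # imm[12:0]; imm[0] is implicitly zero


def extract_imm(insn, segments, bits):
    """Reassemble a split immediate and sign-extend it."""
    value = 0
    for insn_lsb, imm_lsb, width in segments:
        value |= ((insn >> insn_lsb) & ((1 << width) - 1)) << imm_lsb
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def update_imm(insn, imm, segments):
    """Clear each segment, then scatter the immediate's bits into it."""
    for insn_lsb, imm_lsb, width in segments:
        mask = (1 << width) - 1
        insn &= ~(mask << insn_lsb)
        insn |= ((imm >> imm_lsb) & mask) << insn_lsb
    return insn & 0xFFFFFFFF


BEQ_X0_X0 = 0x00000063  # beq x0, x0 with a zero offset
forward = update_imm(BEQ_X0_X0, 8, BRANCH_IMM)    # beq x0, x0, +8
backward = update_imm(BEQ_X0_X0, -4, BRANCH_IMM)  # beq x0, x0, -4
```

A table-driven form like this keeps the per-immediate knowledge in data,
so the same two functions can serve every immediate style.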
I envision that opcodes_config [3] would live inside of the kernel alongside
a simple script to call riscv-opcodes (that resides somewhere in the
user's file system) with appropriate parameters. When somebody wants to
add a new instruction, they can add an instruction to opcodes_config,
run the script, and commit the resulting generated file.
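A wrapper of the kind described above might look roughly like the sketch
below. The function names and the directory layout are assumptions; only
the two commands themselves (make, then parse_linux.py with its four
arguments in that order) come from the steps listed earlier in this
message.

```python
import subprocess
from pathlib import Path


def build_command(opcodes_dir, config, output):
    """Build the parse_linux.py invocation for a riscv-opcodes checkout.

    Assumes instr_dict.yaml and variable_field_data.yaml live in the
    checkout, while config and output live in the kernel tree.
    """
    opcodes_dir = Path(opcodes_dir)
    return [
        "python3",
        str(opcodes_dir / "parse_linux.py"),
        str(opcodes_dir / "instr_dict.yaml"),
        str(output),
        str(config),
        str(opcodes_dir / "variable_field_data.yaml"),
    ]


def regenerate(opcodes_dir, config, output):
    """Run make in the checkout, then regenerate the header."""
    subprocess.run(["make"], cwd=opcodes_dir, check=True)
    subprocess.run(build_command(opcodes_dir, config, output), check=True)
```

Someone adding an instruction would edit opcodes_config, call
regenerate() with the path to their riscv-opcodes checkout, and commit
the resulting header.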
If this script is in a direction that people like, I will continue to
fix up the issues in it and try to get it upstreamed to riscv-opcodes
before I send a kernel patch.
- Charlie
[1] https://github.com/charlie-rivos/riscv-opcodes/tree/linux_parsing
[2] https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/variable_field_data.py
[3] https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/opcodes_config
On Sun, Sep 24, 2023 at 08:19:35PM -0700, Charlie Jenkins wrote:
> I have heeded Andrew Jones' advice and written a script to generate the
> instruction handling code. It is still in development, but currently
> lives on a fork of riscv-opcodes [1]. I would like to know whether what I
> have produced so far is in line with what people would want to see.
Hi Charlie,
Sorry for my slow response. I'm glad to see that we're going in a
direction where we generate these functions and reuse an existing
generator to do it.
>
> An insn.h file can be generated by running the following in the repo:
>
> make
> python3 parse_linux.py instr_dict.yaml insn.h opcodes_config variable_field_data.yaml
>
> I have pushed the generated files to the repo so people do not need to
> run the script.
I couldn't find the generated files; not even [3] from your references
seems to be present.
>
> Each instruction has "variable fields" such as registers and immediates.
> For each variable field that appears in any provided instruction, three
> functions are generated: one to extract the field from an instruction,
> one to insert a value into the field, and one to update the field.
> Update differs from insert in that it first clears the previous value of
> the field. Then, for each instruction, the script generates a function
> to check whether an arbitrary 32-bit value matches the given instruction,
> and a function to generate the binary for the instruction given the
> required variable fields.
>
> I was able to use riscv-opcodes to parse the instruction files, but
> needed to create a new data structure in variable_field_data.py [2] which
> holds the positioning of immediates inside of an instruction.
>
> I envision that opcodes_config [3] would live inside of the kernel alongside
> a simple script to call riscv-opcodes (that resides somewhere in the
> user's file system) with appropriate parameters. When somebody wants to
> add a new instruction, they can add an instruction to opcodes_config,
> run the script, and commit the resulting generated file.
That sounds good to me. (They may hand-craft the functions for a single
instruction too, by just using the other functions as templates, but even
if the script isn't used all the time in the future, the initial
conversion of many instructions makes it worthwhile, IMO.)
>
> If this script is in a direction that people like, I will continue to
> fix up the issues in it and try to get it upstreamed to riscv-opcodes
> before I send a kernel patch.
Please send me a pointer to opcodes_config and insn.h. Also, since you're
extending riscv-opcodes with variable_field_data.py, have you found a way
to verify that all the immediate offsets are correct? Or were the offsets
extracted from the spec/tool directly somehow? I.e. was
variable_field_data.py mostly generated itself?
Thanks,
drew
>
> - Charlie
>
> [1] https://github.com/charlie-rivos/riscv-opcodes/tree/linux_parsing
> [2] https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/variable_field_data.py
> [3] https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/opcodes_config
On Mon, Oct 23, 2023 at 11:19:23AM +0200, Andrew Jones wrote:
> On Sun, Sep 24, 2023 at 08:19:35PM -0700, Charlie Jenkins wrote:
> > I have heeded Andrew Jones' advice and written a script to generate the
> > instruction handling code. It is still in development, but currently
> > lives on a fork of riscv-opcodes [1]. I would like to know whether what I
> > have produced so far is in line with what people would want to see.
>
> Hi Charlie,
>
> Sorry for my slow response. I'm glad to see that we're going in a
> direction where we generate these functions and reuse an existing
> generator to do it.
>
> >
> > An insn.h file can be generated by running the following in the repo:
> >
> > make
> > python3 parse_linux.py instr_dict.yaml insn.h opcodes_config variable_field_data.yaml
> >
> > I have pushed the generated files to the repo so people do not need to
> > run the script.
>
> I couldn't find the generated files; not even [3] from your references
> seems to be present.
>
I somehow deleted the files... I have added them back:
https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/insn.h
https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/opcodes_config
> >
> > Each instruction has "variable fields" such as registers and immediates.
> > For each variable field that appears in any provided instruction, three
> > functions are generated: one to extract the field from an instruction,
> > one to insert a value into the field, and one to update the field.
> > Update differs from insert in that it first clears the previous value of
> > the field. Then, for each instruction, the script generates a function
> > to check whether an arbitrary 32-bit value matches the given instruction,
> > and a function to generate the binary for the instruction given the
> > required variable fields.
> >
> > I was able to use riscv-opcodes to parse the instruction files, but
> > needed to create a new data structure in variable_field_data.py [2] which
> > holds the positioning of immediates inside of an instruction.
> >
> > I envision that opcodes_config [3] would live inside of the kernel alongside
> > a simple script to call riscv-opcodes (that resides somewhere in the
> > user's file system) with appropriate parameters. When somebody wants to
> > add a new instruction, they can add an instruction to opcodes_config,
> > run the script, and commit the resulting generated file.
>
> That sounds good to me. (They may hand-craft the functions for a single
> instruction too, by just using the other functions as templates, but even
> if the script isn't used all the time in the future, the initial
> conversion of many instructions makes it worthwhile, IMO.)
>
> >
> > If this script is in a direction that people like, I will continue to
> > fix up the issues in it and try to get it upstreamed to riscv-opcodes
> > before I send a kernel patch.
>
> Please send me a pointer to opcodes_config and insn.h. Also, since you're
> extending riscv-opcodes with variable_field_data.py, have you found a way
> to verify that all the immediate offsets are correct? Or were the offsets
> extracted from the spec/tool directly somehow? I.e. was
> variable_field_data.py mostly generated itself?
>
> Thanks,
> drew
No, they were hand-coded unfortunately. riscv-opcodes invented a whole
bunch of names for different styles of immediates. To do it manually, I
found an instruction that used each immediate type, then went to the
spec and figured out the bounds of the immediate. There are some further
complications: some immediates can't take a specific value (normally 0)
and some immediates are split. I don't think it's worth the effort to
auto-generate that.
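Short of auto-generating the table, a hand-coded one can at least be
checked for internal consistency. The sketch below is one possible
approach, not something present in the fork; the (insn_lsb, imm_lsb,
width) table format and all names are hypothetical. It catches overlaps
and gaps, and models the "can't be a specific value" restriction, but it
cannot confirm that a layout matches the spec.

```python
# Hypothetical table format: (insn_lsb, imm_lsb, width) per segment,
# with width >= 1.
def check_layout(segments, imm_bits):
    """Return a list of internal inconsistencies in a hand-written layout.

    imm_bits is the collection of immediate bit positions that the
    encoding is expected to store.
    """
    problems = []
    insn_seen, imm_seen = set(), set()
    for insn_lsb, imm_lsb, width in segments:
        insn_range = set(range(insn_lsb, insn_lsb + width))
        imm_range = set(range(imm_lsb, imm_lsb + width))
        if max(insn_range) > 31:
            problems.append(f"segment at insn bit {insn_lsb} runs past bit 31")
        if insn_range & insn_seen:
            problems.append(f"insn bits used twice: {sorted(insn_range & insn_seen)}")
        if imm_range & imm_seen:
            problems.append(f"imm bits stored twice: {sorted(imm_range & imm_seen)}")
        insn_seen |= insn_range
        imm_seen |= imm_range
    missing = set(imm_bits) - imm_seen
    extra = imm_seen - set(imm_bits)
    if missing:
        problems.append(f"imm bits never stored: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected imm bits: {sorted(extra)}")
    return problems


def value_allowed(value, forbidden=()):
    """Model immediates that must not take certain values (normally 0)."""
    return value not in forbidden


# B-type branch offset, which stores imm[12:1] in four pieces.
GOOD = [(31, 12, 1), (25, 5, 6), (8, 1, 4), (7, 11, 1)]
# Same table with a deliberate typo: insn bit 8 instead of 7.
BAD = [(31, 12, 1), (25, 5, 6), (8, 1, 4), (8, 11, 1)]

good_report = check_layout(GOOD, range(1, 13))
bad_report = check_layout(BAD, range(1, 13))
```

Here good_report is empty, while bad_report flags the reused instruction
bit. A check like this would complement, not replace, comparing each
entry against the spec.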
I have been distracted from this recently, but I have since re-evaluated
the approach. I believe it might be better to not store the Linux parsing
scripts directly in riscv-opcodes, but rather generalize the scripts in
riscv-opcodes and provide them as a Python package. I have a prototype of
this working, but it is still a work in progress. I would like to avoid
using the parse script already in riscv-opcodes so the Python package can
be fully contained inside of the repo, and the parse script can remain
separate. However, there are some features I would like to add to the
parsing, so I would need to add those features to parse.py first.
- Charlie
>
> >
> > - Charlie
> >
> > [1] https://github.com/charlie-rivos/riscv-opcodes/tree/linux_parsing
> > [2] https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/variable_field_data.py
> > [3] https://github.com/charlie-rivos/riscv-opcodes/blob/linux_parsing/opcodes_config