2001-10-25 17:02:24

by Vladimir Dergachev

[permalink] [raw]
Subject: [RFC] alternative kernel multimedia API


After looking at and working with Xv, v4l and v4l2 I became somewhat
dissatisfied with the current state of affairs. I have attached a
description of the API that would make (at least) me much happier.

I would very much appreciate comments from interested people..

thanks !

Vladimir Dergachev




Attachments:
km.rfc.txt (8.68 kB)
RFC: Kernel multimedia API (km.rfc.txt)

2001-10-25 20:50:56

by Nate Dannenberg

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API

On Thu, 25 Oct 2001 [email protected] wrote:

>
> After looking at and working with Xv, v4l and v4l2 I became somewhat
> dissatisfied with the current state of affairs. I have attached a
> description of the API that would make (at least) me much happier.
>
> I would very much appreciate comments from interested people..

I'm not a coder for Linux or X, but I figured this would be a good read
anyway, as it might provide some ideas for a similar (albeit less complex)
issue on another platform I use >:-)

Anyways since this is an RFC, here's the only a couple of comments I can
think of (the RFC otherwise looks clear to me):


Commands/queries should also include some kind of (arbitrary, decided by the
calling program) command serial number, so that an out-of-sequence reply
(those prepended with a colon as you describe) can be matched to a previous
command/query. This could allow several commands to be sent and handled by
multiple processes/threads/whatever.

05,HUE=7\n
07,some unrelated command
+05\n # The HUE command was successful
:07,reply to unrelated command
:05,HUE=6\n # Driver reported the HUE parameter as
# different from that most recently set.

The program wanted to set HUE to 7, command successful, value later returned
was 6 (maybe the device only allows even values), while at the same time some
other command was sent and processed.

Also, it might be a nice idea to return the range of one or more parameters if
a command given is out of range. Suppose HUE is only valid for values 0-255:

06,HUE=300\n
-06,HUE=INVALID,0,255\n

..Would tell the calling program that a value given for HUE is only valid for
a range of 0 to 255. This would be useful for a program that wants to attempt
to guess the range a device accepts for a given parameter.

--
_________________________ ___ ___
| [email protected] //ZZ]__ |
| C64/C128/SCPU |'/ |Z/ |
| What's *YOUR* Hobby!? | \__|_\ |
|_________________________\___]___|


2001-10-25 23:10:58

by Vladimir Dergachev

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API



On Thu, 25 Oct 2001, Nate Dannenberg wrote:

> On Thu, 25 Oct 2001 [email protected] wrote:
>
> >
> > After looking at and working with Xv, v4l and v4l2 I became somewhat
> > dissatisfied with the current state of affairs. I have attached a
> > description of the API that would make (at least) me much happier.
> >
> > I would very much appreciate comments from interested people..
>
> I'm not a coder for Linux or X, but I figured this would be a good read
> anyway, as it might provide some ideas for a similar (albeit less complex)
> issue on another platform I use >:-)
>
> Anyways since this is an RFC, here's the only a couple of comments I can
> think of (the RFC otherwise looks clear to me):
>
>
> Commands/queries should also include some kind of (arbitrary, decided by the
> calling program) command serial number, so that an out-of-sequence reply
> (those prepended with a colon as you describe) can be matched to a previous
> command/query. This could allow several commands to be sent and handled by
> multiple processes/threads/whatever.

out of sequence replies are meant to be volunteered by the driver, not
requested by the application (thought the application may want to turn
on/off their generation) I thought about putting a restriction that
prohibits the user application from doing this: write one query, write
another query, read one reply, read another reply and instead require it
to always: write query, read all replies, repeat if necessary. This will
conserve the space used by buffers for the control device.

Vladimir Dergachev

>
> 05,HUE=7\n
> 07,some unrelated command
> +05\n # The HUE command was successful
> :07,reply to unrelated command
> :05,HUE=6\n # Driver reported the HUE parameter as
> # different from that most recently set.
>
> The program wanted to set HUE to 7, command successful, value later returned
> was 6 (maybe the device only allows even values), while at the same time some
> other command was sent and processed.
>
> Also, it might be a nice idea to return the range of one or more parameters if
> a command given is out of range. Suppose HUE is only valid for values 0-255:
>
> 06,HUE=300\n
> -06,HUE=INVALID,0,255\n
>
> ..Would tell the calling program that a value given for HUE is only valid for
> a range of 0 to 255. This would be useful for a program that wants to attempt
> to guess the range a device accepts for a given parameter.
>
> --
> _________________________ ___ ___
> | [email protected] //ZZ]__ |
> | C64/C128/SCPU |'/ |Z/ |
> | What's *YOUR* Hobby!? | \__|_\ |
> |_________________________\___]___|
>
>

2001-10-26 10:55:30

by Xavier Bestel

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API

le ven 26-10-2001 ? 01:14, [email protected] a ?crit :

> > 05,HUE=7\n
> > 07,some unrelated command
> > +05\n # The HUE command was successful
> > :07,reply to unrelated command
> > :05,HUE=6\n # Driver reported the HUE parameter as

I would prefer a proc-like interface to devices, e.g.:

/dev/video0/hue
/dev/video0/saturation
...

more unix-like, no parsing involved.

Xav

2001-10-26 11:55:39

by Vladimir Dergachev

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API



On 26 Oct 2001, Xavier Bestel wrote:

> le ven 26-10-2001 ? 01:14, [email protected] a ?crit :
>
> > > 05,HUE=7\n
> > > 07,some unrelated command
> > > +05\n # The HUE command was successful
> > > :07,reply to unrelated command
> > > :05,HUE=6\n # Driver reported the HUE parameter as
>
> I would prefer a proc-like interface to devices, e.g.:
>
> /dev/video0/hue
> /dev/video0/saturation
> ...
>
> more unix-like, no parsing involved.

This would be a problem as an application would need to open many files in
order to operate such interface. Additionally, you underestimate the
number of files needed. We'll need hue_range, hue_label, hue_comment and
something else..

Parsing is not that hard I believe.

Vladimir Dergachev

>
> Xav
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-10-26 16:39:11

by Morgan Collins

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API

>
>
> On 26 Oct 2001, Xavier Bestel wrote:
>
>> le ven 26-10-2001 ? 01:14, [email protected] a ?crit :
>>
>> > > 05,HUE=7\n
>> > > 07,some unrelated command
>> > > +05\n # The HUE command was successful
>> > > :07,reply to unrelated command
>> > > :05,HUE=6\n # Driver reported the HUE parameter as
>>
>> I would prefer a proc-like interface to devices, e.g.:
>>
>> /dev/video0/hue
>> /dev/video0/saturation
>> ...
>>
>> more unix-like, no parsing involved.
>
> This would be a problem as an application would need to open many files in order to
> operate such interface. Additionally, you underestimate the number of files needed.
> We'll need hue_range, hue_label, hue_comment and something else..
>
> Parsing is not that hard I believe.
>
> Vladimir Dergachev
>
Might I suggest a single file interface then?
Say /dev/video0/config, which would hold

hue: 0
saturation: 0
brightness: 0
etc..

which one could open and read or write param: value to.
This, or with the linux wireless config tools, is how the airo module allows you to modify
card values and personally I like the simplicity of the approach.

>>
>> Xav
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body
>> of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the
>> FAQ at http://www.tux.org/lkml/
>>


--
Morgan Collins [Ax0n] http://sirmorcant.morcant.org
Software is something like a machine, and something like mathematics, and something like
language, and something like thought, and art, and information.... but software is not in
fact any of those other things.

2001-10-27 01:05:42

by Mark Whitis

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API

On Thu, 25 Oct 2001 [email protected] wrote:
> After looking at and working with Xv, v4l and v4l2 I became somewhat
> dissatisfied with the current state of affairs. I have attached a
> description of the API that would make (at least) me much happier.

I have some experience relevent to designing such protocols including:
- Development of a couple loadable kernel modules
- Development of a symbol table based parsing system
- Development of communications protocols
- parsing of ascii symbolic parameters in a kernel module.
- Development of a real time full frame rate video capture
application, though not on linux (it was a telescope
autoguider (star tracker) which used a $1500 scientific
frame grabber memory mapped into extended memory on
a 286 cpu (access to memory was a real bitch).
I do not, however, know a lot about the specifics of the V4l/v4l2/xv
implementations. So, some of my comments will betray my ignorance
of v4l2 specifics.

First, I definitely agree that passing control information symbolically
rathern than in fixed structs is a good idea in the long run if it is done
as a standard and not just ATI specific. To qualify as a suitable
successor for existing standards, the new standard must be well designed
and offer as much advantage as possible and as much compatibility with old
standards. Ascii symbolic "name=value" representations are much more
flexible in supporting a variety of devices and in allowing standards to
evolve without breaking older software and the parsing overhead is quite
reasonable (it is even possible to add negotiation for a tokenized binary
transfer format if you are really paranoid about speed).

I also agree with the one control file per device plus data
transfer file.

Compatibility with older standards could probably be implemented
as a module which accepts old calls and translates them to
new calls. This would allow old apps to work with new
protocol drivers.

I agree with the person who commented that each command should have a
request id. This field (say up to 8 characters [A-Za-z0-9_]) will
be repeated on any responses. This value of this field may not
necessarily be unique; the application could, for example, repeat
"foo" for each command (this is handy for simple apps, manual
testing, and scripts).

More than one device should be able to open the control interface
simultaneously. For example, a volume/brightness control app
might be separate from a capture app.

Commands should use FTP style response codes, including the
3 digit response (or 4 digit as used in some varients) with
a hierarchical interpretation of the digits and the trailing
"-" for continuation messages.

1xxx Positive Preliminary reply
2xxx Positive Completion reply
3xxx Positive Intermediate reply
4xxx Transiet Negative Completion Reply
5xxx Permanent Negative Completion Reply

x0xx Syntax error
x1xx Information
x2xx Connections
x3xx Authentiaction and accounting
x4xx unspecified
x5xx file system - perhaps results pertaining to the hardware


Personally, I would limit to one query or assignment per line.
This allows for simpler parsing and errors can be properly associated
with the actual value being set.

There must be a standardized parameter set.
video.source
video.norm
video.tuner.channel
video.tuner.freqtable
video.brightness
video.contrast
video.gamma
video.width
video.height
video.nframes
video.deinterlacemode
video.format.planes (1 for interleaved color, 3 for
interleaved color)
video.format.colorencoding (Greyscale, YUV, RGB, YUV42)
video.format.bytesperpix
audio.volume
audio.balance
...
These need to be the same for all devices. As new commands are
added for each driver, they can be defined in a central place
so new drivers will use newer commands.

You asked if it was appropriate to bloat the driver with parsing
functions. It would be best if the parsing functions were
in a separate kernal module rather than duplicated in each
video driver as well as other types of drivers entirely.
Each driver would have one or more symbol tables which
defined the symbols availble. I have some general purpose parsing
code that has not been adapted to kernel space yet:
http://www.freelabs.com/~whitis/software/symbol/
These routines can be used for:
- TCP/IP communications protocols
- configuration files
- data save/load
- handling user commands affecting internal data structures
- communication with kernel modules (once a kernel version
is implemented).
Kernel porting would not be too hard:
- no malloc problems - caller allocates memory.
- redefining printf()
- supplying a few string handling functions not present in kernel space
- adding module_init() and module_cleanup() functions
- substituting the appropriate #includes for kernel space.
- adding protocol encapsulation for the protocol described here
- adding bit field and enumerated field support
- simple array support for gamma tables and the like.
I could probably be persuaded to port this library to kernel space for
this purpose.

Functionality can be divided over various modules:
insmod symbol - symbol table based parsing
insmod vvl3 - new protocol common functions
insmod v4lcompat - comnvert v4l calls into v4l3 calls
may not be needed if translation with
the existing translator works with v4l2compat.
insmod v4l2compat - convert v4l2 calls into v4l3 calls and vice versa
insmod v4lcompress - video compression (optional).
insmod atiaiw - functions common to all/most ati devices
insmod atiaiw32 - functions specific to a particular card
insmod vbidecode - vertical blanking decoder
insmod netaccess - communicates with user space daemon to access
video streams over TCP/IP using teleconferncing
and streaming protocols and access to other
devices with userspace drivers.

There should be four or five types of instructions, not just two:
- set value
- query value
- execute command
- query symbol tables
This would allow an application to add dialog box entries
for each symbol defined by a driver, similar to SANE.
An application may want to devise
- kill background process (or current command).

Here is an example. Commands sent are marked with ">>>" and responses
with "<<<"
>>> foo101 call reset()
<<< foo101 2000 ok
>>> foo102 query video.norm
<<< foo102 2100 video.norm="NTSC"
>>> foo103 query video.source
<<< foo103 2101 video.source="TUNER"
>>> foo104 query video.tuner.channel
<<< foo105 2101 video.tuner.channel
>>> foo106 set video.width=640
<<< foo106 2000 ok
>>> foo107 set video.height=480
<<< foo107 2000 ok
>>> foo108 async call capture(
note: async keyword means keep processing commands without
completing this one.
>>> nframes=16
>>> vsync=1
>>> report_frames=1
>>> start_timecode="00:00:00:00"
>>> )
<<< foo108 1100- Capture Started
>>> foo108 1101- set capturestatus.framenumber=0
>>> foo108 1101- set capturestatus.framenumber=1
>>> foo108 1101- set capturestatus.framenumber=2
>>> foo108 1101- set capturestatus.framenumber=3
>>> foo108 1101- set capturestatus.framenumber=4
>>> foo108 1101- set capturestatus.framenumber=5
>>> foo109 call abort() // interrupt capture
note: we now have two outstanding commands
>>> foo109 2000 ok
note: we are receiving responses now for two outstanding commands
>>> foo108 5500 capture aborted
note: foo108 must send a final response with no trailing "-"
<<< foo110 query capturestatus
>>> foo110 1100- capturestatus={
<<< foo110 1100- status="STOPPED"
<<< foo110 1100- framenumber=5
<<< foo110 1100- bytestransfered=1000000
<<< foo110 2100 }


If there are too many outstanding commands for internal buffering,
there driver should block the application, unless it uses non-blocking
I/O. Preferably, no single application should be allowed to use all of
the buffers.


/devfs/multimedia/devices should also be symbolic, not fixed field. It
makes no sense to fix the problem in one place and reintroduce it
somewhere else.
{
device_id="101"
location_id="PCI:01:00.0"
...
},{
device_id="102"
location_id="USB:004/004"
...
}
or more crudely, you can have only one line per device but that will get
very ugly as the number of fields increases.

I don't care much for the idea of Xfont style matching taken that
literally. Among other things, the "-" character is a very poor choice of
separator as it often occurs in negative numbers, device names, company
names, etc.
Also, consistency with the format already used within a device family is
more important than consistency between device family. The begining
of the name might be treated in a URLish manner.
PCI:01:00.0
USB:001/001
FIREWIRE:... /* DV camcorder or other FW dev */
SANE://host.example.com:port/ /* scanner */
DIGICAM:... /* pocket Digital camera */
H323://host.example.com:port/ /* teleconferencing */
CUCME://host.example.com:port/ /* teleconferencing */
V4LREMOTE://host.example.com:port/ /* some v4l specific protocol */
SHOUTCAST://host.example.com:port/ /* shoutcast stream */
Note that some of these need to communicate with a userspace daemon
which does TCP/IP and format translation. A small kernel module
whould essentially connect a v4l input and a v4l output device.

Note that the v4l device specifier should be either a URL or a filename
in /devfs space. There will not be a entry in the filesystem for
every possible video source.

Instead, For scripts, a small application could be written that does
pattern matching and data extraction. The functions used by that app could
also be made availible as a library. These functions and helper
app could do much more inteligent device selection including
search path and multiple field selection, symbolic links, consulting
system wide and user specific .rc files, environment variables, and
user specified options.

Applicatins should be able to lock/unlock the device for exclusive use
which may not preclude minor control access from control only applications
like xmixer or a remote control receiver. So, there might be
exclusive capture access and exclusive control access. Applications
should not use exclusive mode unless configured to do so. It is
entirely reasonable to interupt a web capture or teleconference
in progress with a snapshot taken for some application. For
example, I may want to photograph something to sell on ebay
while I am video chatting with my girlfriend. There are some
applications that may want to capture continously.

The data at the begining of each frame should not be a fixed
struct. We are trying to get away from fixed structs, remember.
Instead, it should be a sequence of declarations in the form
name=value. For a crude example of an image format that
does this in real life, look at FITS.

format=V4L3
videooffset=1024
audiooffset=123456


The timestamp at the beginning of each frame of data should
contain a sequence number in two formats: sequential frame
number and relative and absolute SMPTE time code. Absolute SMPTE
timecodes should be generated sequentially if not timecode source
is detected but if there is a timecode, the absolute code should track
the timecode souce.

The device specific loadable kernel module should make heavy use
of jump tables (structures of function pointers) to call individual
routines which start initialized to general purpose functions
and can be replaced with more device specific functions.


Complicated but general things like video data format conversion may be
handled by a general purpose routine which may "compile" down to
a state machine that does the copying or select from one of
many function pointers to a function which does the approriate
translations. You could have a small array of tokens which mean things
like:
loop 640 times
byteswap
yuv42toyuv24 (two bytes/pix to 3 bytes/pix)
rgbtoyuv
endloop
These tokens could be interpretted inside a large loop with jumps
rather than function calls (for less overhead). Or the functions
can be done a scanline at a time for less overhead:
byteswap 640 pixels
yuv42toyuv24 640 pixels
rgbtoyuv 640 pixels

By doing a scan line at a time, you assume that the application will
access the data in row-major vs. column-major order. This assumption
is likely to be true for most bulk transfers. Similarly,
you can convert one sectors worth of data at a time which is likely
to include portions of one or more scan lines which are converted
as many pixels as are needed. page tables for the mmaped region
are adjusted to point to the buffer containing the translated data
with all other pages set to generate a page fault which will invoke
the translation routine.

Each driver should allow a few different optimized formats with
some other formats also being optionally availible:
- raw pixel data
(this allows mmap directly into frame buffer)
driver tells application how the data is organized so it
can decode.
- 32 bit/2 pixel YUYV
Y byte(pixel n), U byte , Y byte (pixel n+1), V bytes
- 24 bit RGB red byte, green byte, blue byte
- 24 bit YUV Y byte, U byte, V byte
- some optional formats:
- 48 bit RGB
- 24 bit CMYK
- 48 bit CMYK
- mpeg encoded
- partially mpeg encoded
- jpeg encoded
- MJPEG encoded
- 16 bit YUV Y byte, UV byte (with specific order of bits)
loss of color information vs. YUYV
- Mixed video and audio
For those formats above, data is in row major order with each
color byte for the first pixel of the first line followed by the
second pixel for the first line, etc. with no padding between
lines.

The mpeg/jpeg/mjpeg encoded formats are intended for hardware
which support full or partial encoding in hardware and there
would probably need to be an assortment of them, perhaps

The raw format is primarly there for fastest transfers to userspace
with the data possibly being encoded later. Basically a capture
to userspace RAM. (25KMB or ram could store about 14 seconds
of 30 fps 640x480 16 bit YUV data). The other three formats
give the application a choice of a few standard formats
which must be supported by all drivers with a reasonable
degree of efficiency.

Video data formats could be described using a sequence of variables:
videoformat.bytesperline=1280
videoformat.planes=1 (color data interleved)
videoformat.pixelspergroup=2 (either 1 or 2)
note: this has to do with the peculiarities of typical cameras
which have twice as many pixels for luminence as for each chrominance
component.
videoformat.nbytes=4
videoformat.byte00="Y0"
videoformat.byte01="Y1"
videoformat.byte02="U"
videoformat.byte03="V"
videoformat.width=640
videoformat.height=480
videoformat.order="ROWMAJOR"
videoformat.pixelpadding=0 (might be 1 for 24/32 bit RGB)
videoformat.rowpadding=0
videoformat.interlace=0
note: if 1, this indicates that all odd rows are stored before all
even rows.
videoformat.audio.bytesperscanline=0
videoformat.audio.bytesperframe=0
videoformat.prefixbytes=0
videoformat.suffixbytes=0
videoformat.timecode.present=0
videoformat.timecode.offset=0
videoformat.timecode.format="SMPTE_ASCII"

Note that the existing v4l2 standards already define video
formats and names for things which should generally be used instead of my
specifications.

VBI (Closed captioning) interface could use a separate
device as is currently done although an option might be to
do it via the control interface (either on the same
connection as the main control connection or from
a separate non-video-capturing application).
foo110 set vbi.
foo111 async call VBI()
foo111 1102- VBI: Closed captioning provided by a grand from
foo111 1102- VBI: linux international. Dateline 29 news.
foo111 1102- VBI: The news you need when you need it.
foo111 1102- VBI: Tonight on Dateline 29 news...
foo112 call abort_vbi();
foo112 2000 ok
foo111 5502 VBI terminated


The abort functions might be replaced with a new protocol command
"kill" which when used with the command sequence ID of the command
calls the corresponding kill function automatically.

Note that the new standard should define some way to synchronize
the audio and video streams for proper mpeg encoding in userspace.
One way to do this is to append the audio and a sample count
which may vary to the end of each frame of video. This clearly
identifies which audio samples were taken at the same time
the frame was (give or take a little skew between audio codecs
and video circuitry). Another is to include time codes in both
the audio and video streams.

Note that to make the protocol easier to implement, we can be
somewhat pedantic out formats in the first version and relax
later if desired:
- structures and function/proceedure calls will be listed one
element per line.
- separators between lines
- case insensitive matching
- Strings in double quotes.
- whitespace allowed at the begining of the line for indentation
but extranious white space not allowed around "=", "(", "{", etc.

/devfs organization:
better to organize by device:
/devfs/multimedia/dev1/control
/devfs/multimedia/dev1/data
/devfs/multimedia/dev2/control
/devfs/multimedia/dev2/data
This works much better with symbolic links.
/dev/videocapture -> /devfs/multimedia/dev1
~/.v4l/searchpath/
01 -> /devfs/multimedia/dev10
02 -> /devfs/multimedia/dev23
03 -> /devfs/multimedia/dev10


--
--
Mark Whitis http://www.freelabs.com/~whitis/ NO SPAM
Author of many open source software packages.
Coauthor: Linux Programming Unleashed (1st Edition)


2001-10-27 04:02:55

by Vladimir Dergachev

[permalink] [raw]
Subject: Re: [livid-gatos] [RFC] alternative kernel multimedia API



On Fri, 26 Oct 2001, Mark Whitis wrote:

> On Thu, 25 Oct 2001 [email protected] wrote:
> > After looking at and working with Xv, v4l and v4l2 I became somewhat
> > dissatisfied with the current state of affairs. I have attached a
> > description of the API that would make (at least) me much happier.
>
> I have some experience relevent to designing such protocols including:
> - Development of a couple loadable kernel modules
> - Development of a symbol table based parsing system
> - Development of communications protocols
> - parsing of ascii symbolic parameters in a kernel module.
> - Development of a real time full frame rate video capture
> application, though not on linux (it was a telescope
> autoguider (star tracker) which used a $1500 scientific
> frame grabber memory mapped into extended memory on
> a 286 cpu (access to memory was a real bitch).
> I do not, however, know a lot about the specifics of the V4l/v4l2/xv
> implementations. So, some of my comments will betray my ignorance
> of v4l2 specifics.
>
> First, I definitely agree that passing control information symbolically
> rathern than in fixed structs is a good idea in the long run if it is done
> as a standard and not just ATI specific. To qualify as a suitable
> successor for existing standards, the new standard must be well designed
> and offer as much advantage as possible and as much compatibility with old
> standards. Ascii symbolic "name=value" representations are much more
> flexible in supporting a variety of devices and in allowing standards to
> evolve without breaking older software and the parsing overhead is quite
> reasonable (it is even possible to add negotiation for a tokenized binary
> transfer format if you are really paranoid about speed).
>
> I also agree with the one control file per device plus data
> transfer file.
>
> Compatibility with older standards could probably be implemented
> as a module which accepts old calls and translates them to
> new calls. This would allow old apps to work with new
> protocol drivers.

Yep..

>
> I agree with the person who commented that each command should have a
> request id. This field (say up to 8 characters [A-Za-z0-9_]) will
> be repeated on any responses. This value of this field may not
> necessarily be unique; the application could, for example, repeat
> "foo" for each command (this is handy for simple apps, manual
> testing, and scripts).

I know how to make you both happy. We just introduce a special field
PRIVATE, so that if you make a request like
?HUE,PRIVATE=foo1282

the reply would be something like
+HUE=3872,PRIVATE=foo1282
(i.e. it will be repeated back verbatim). I don't want to use ECHO for the
name because some device might care about. On the other hand I think
everyone will agree that choosing PRIVATE as a name for something driver
function exposed to user-space is a bad idea.

>
> More than one device should be able to open the control interface
> simultaneously. For example, a volume/brightness control app
> might be separate from a capture app.

Perhaps..

>
> Commands should use FTP style response codes, including the
> 3 digit response (or 4 digit as used in some varients) with
> a hierarchical interpretation of the digits and the trailing
> "-" for continuation messages.
>
> 1xxx Positive Preliminary reply
> 2xxx Positive Completion reply
> 3xxx Positive Intermediate reply
> 4xxx Transiet Negative Completion Reply
> 5xxx Permanent Negative Completion Reply

I think this is too much. The command should just indicate whether it is
possitive or negative or out of sequence. Otherwise any other information
should be passed as anything else. If you really want to return more
diagnostics you can always do:

?HUE
-HUE=INVALID,ERROR_MESSAGE=Not supported in this mode

I am _very_ against codifying stuff as numbers. They are not mnemonic and
prone to create conflicts.

>
> x0xx Syntax error
> x1xx Information
> x2xx Connections
> x3xx Authentiaction and accounting
> x4xx unspecified
> x5xx file system - perhaps results pertaining to the hardware
>
>
> Personally, I would limit to one query or assignment per line.

Yes, especially if you remember that the interface is for software not
humans. If you need to pass multiple lines - just use \\n for newlines.

> This allows for simpler parsing and errors can be properly associated
> with the actual value being set.
>
> There must be a standardized parameter set.
> video.source
> video.norm
> video.tuner.channel
> video.tuner.freqtable
> video.brightness
> video.contrast
> video.gamma
> video.width
> video.height
> video.nframes
> video.deinterlacemode
> video.format.planes (1 for interleaved color, 3 for
> interleaved color)
> video.format.colorencoding (Greyscale, YUV, RGB, YUV42)
> video.format.bytesperpix
> audio.volume
> audio.balance
> ...
> These need to be the same for all devices. As new commands are
> added for each driver, they can be defined in a central place
> so new drivers will use newer commands.

Something like that, except I also wanted to define INTERFACE_CLASS which
would be responsible for the semantic value. So when a device reports that
it supports a given INTERFACE_CLASS it would mean that a) it supports a
certain set of parameters and b) their function is as specified in the
INTERFACE_CLASS description.

>
> You asked if it was appropriate to bloat the driver with parsing
> functions. It would be best if the parsing functions were
> in a separate kernal module rather than duplicated in each
> video driver as well as other types of drivers entirely.

Definetly outside drivers..

> Each driver would have one or more symbol tables which
> defined the symbols availble. I have some general purpose parsing

I actually was thinking towards some kind of idl compiler. So we can
specify which fields are supported and it would generate all the necessary
code. Ultimately I want the whole drivers generated automatically, so that
we punch in register offsets and some kind of data flow description and
the compiler will generate the appropriate code. But I don't want to start
on that before I did the same job by hand and reduced the interface to
something more managable.

> code that has not been adapted to kernel space yet:
> http://www.freelabs.com/~whitis/software/symbol/

thanks !

> These routines can be used for:
> - TCP/IP communications protocols
> - configuration files
> - data save/load
> - handling user commands affecting internal data structures
> - communication with kernel modules (once a kernel version
> is implemented).
> Kernel porting would not be too hard:
> - no malloc problems - caller allocates memory.
> - redefining printf()
> - supplying a few string handling functions not present in kernel space
> - adding module_init() and module_cleanup() functions
> - substituting the appropriate #includes for kernel space.
> - adding protocol encapsulation for the protocol described here
> - adding bit field and enumerated field support
> - simple array support for gamma tables and the like.
> I could probably be persuaded to port this library to kernel space for
> this purpose.
>
> Functionality can be divided over various modules:
> insmod symbol - symbol table based parsing
> insmod vvl3 - new protocol common functions
> insmod v4lcompat - comnvert v4l calls into v4l3 calls
> may not be needed if translation with
> the existing translator works with v4l2compat.
> insmod v4l2compat - convert v4l2 calls into v4l3 calls and vice versa
> insmod v4lcompress - video compression (optional).
> insmod atiaiw - functions common to all/most ati devices
> insmod atiaiw32 - functions specific to a particular card
> insmod vbidecode - vertical blanking decoder
> insmod netaccess - communicates with user space daemon to access
> video streams over TCP/IP using teleconferncing
> and streaming protocols and access to other
> devices with userspace drivers.

Well, these thing can be decided later. For now, I want to hear all
comments and made sure I did not miss something obvious and stupid.

>
> There should be four or five types of instructions, not just two:
> - set value
> - query value
> - execute command

This can be part of set value. Execute command is a kind of "one-shot
button" versus usual set value being a kind of "entry field".

> - query symbol tables
> This would allow an application to add dialog box entries
> for each symbol defined by a driver, similar to SANE.
> An application may want to devise

This can be part of query value - the value would be a symbol table.

> - kill background process (or current command).

What do we need this for ? Ahh, for capturing.. just STOP_CAPTURE command
would be sufficient. Really, I don't want this to be too complex though it
could be. A simple interface, to be extended only when there are features
not available thru it.

As for - versus :, we'll it just looks better: PCI-0-0-1
versus: PCI:0:0:1. If you want to put a - someplace just escape it.
These are symbols - their choice influences look and feel only, not
functionality.


As for data in the beginning of frames being as struct - that's what data
connection is for. control connection passes symbolic information over it.
data connection is for binary data.

Vladimir Dergachev

>
> Here is an example. Commands sent are marked with ">>>" and responses
> with "<<<"
> >>> foo101 call reset()
> <<< foo101 2000 ok
> >>> foo102 query video.norm
> <<< foo102 2100 video.norm="NTSC"
> >>> foo103 query video.source
> <<< foo103 2101 video.source="TUNER"
> >>> foo104 query video.tuner.channel
> <<< foo105 2101 video.tuner.channel
> >>> foo106 set video.width=640
> <<< foo106 2000 ok
> >>> foo107 set video.height=480
> <<< foo107 2000 ok
> >>> foo108 async call capture(
> note: async keyword means keep processing commands without
> completing this one.
> >>> nframes=16
> >>> vsync=1
> >>> report_frames=1
> >>> start_timecode="00:00:00:00"
> >>> )
> <<< foo108 1100- Capture Started
> >>> foo108 1101- set capturestatus.framenumber=0
> >>> foo108 1101- set capturestatus.framenumber=1
> >>> foo108 1101- set capturestatus.framenumber=2
> >>> foo108 1101- set capturestatus.framenumber=3
> >>> foo108 1101- set capturestatus.framenumber=4
> >>> foo108 1101- set capturestatus.framenumber=5
> >>> foo109 call abort() // interrupt capture
> note: we now have two outstanding commands
> >>> foo109 2000 ok
> note: we are receiving responses now for two outstanding commands
> >>> foo108 5500 capture aborted
> note: foo108 must send a final response with no trailing "-"
> <<< foo110 query capturestatus
> >>> foo110 1100- capturestatus={
> <<< foo110 1100- status="STOPPED"
> <<< foo110 1100- framenumber=5
> <<< foo110 1100- bytestransfered=1000000
> <<< foo110 2100 }
>
>
> If there are too many outstanding commands for internal buffering,
> there driver should block the application, unless it uses non-blocking
> I/O. Preferably, no single application should be allowed to use all of
> the buffers.
>
>
> /devfs/multimedia/devices should also be symbolic, not fixed field. It
> makes no sense to fix the problem in one place and reintroduce it
> somewhere else.
> {
> device_id="101"
> location_id="PCI:01:00.0"
> ...
> },{
> device_id="102"
> location_id="USB:004/004"
> ...
> }
> or more crudely, you can have only one line per device but that will get
> very ugly as the number of fields increases.
>
> I don't care much for the idea of Xfont style matching taken that
> literally. Among other things, the "-" character is a very poor choice of
> separator as it often occurs in negative numbers, device names, company
> names, etc.
> Also, consistency with the format already used within a device family is
> more important than consistency between device family. The begining
> of the name might be treated in a URLish manner.
> PCI:01:00.0
> USB:001/001
> FIREWIRE:... /* DV camcorder or other FW dev */
> SANE://host.example.com:port/ /* scanner */
> DIGICAM:... /* pocket Digital camera */
> H323://host.example.com:port/ /* teleconferencing */
> CUCME://host.example.com:port/ /* teleconferencing */
> V4LREMOTE://host.example.com:port/ /* some v4l specific protocol */
> SHOUTCAST://host.example.com:port/ /* shoutcast stream */
> Note that some of these need to communicate with a userspace daemon
> which does TCP/IP and format translation. A small kernel module
> whould essentially connect a v4l input and a v4l output device.
>
> Note that the v4l device specifier should be either a URL or a filename
> in /devfs space. There will not be a entry in the filesystem for
> every possible video source.
>
> Instead, For scripts, a small application could be written that does
> pattern matching and data extraction. The functions used by that app could
> also be made availible as a library. These functions and helper
> app could do much more inteligent device selection including
> search path and multiple field selection, symbolic links, consulting
> system wide and user specific .rc files, environment variables, and
> user specified options.
>
> Applicatins should be able to lock/unlock the device for exclusive use
> which may not preclude minor control access from control only applications
> like xmixer or a remote control receiver. So, there might be
> exclusive capture access and exclusive control access. Applications
> should not use exclusive mode unless configured to do so. It is
> entirely reasonable to interupt a web capture or teleconference
> in progress with a snapshot taken for some application. For
> example, I may want to photograph something to sell on ebay
> while I am video chatting with my girlfriend. There are some
> applications that may want to capture continously.
>
> The data at the begining of each frame should not be a fixed
> struct. We are trying to get away from fixed structs, remember.
> Instead, it should be a sequence of declarations in the form
> name=value. For a crude example of an image format that
> does this in real life, look at FITS.
>
> format=V4L3
> videooffset=1024
> audiooffset=123456
>
>
> The timestamp at the beginning of each frame of data should
> contain a sequence number in two formats: sequential frame
> number and relative and absolute SMPTE time code. Absolute SMPTE
> timecodes should be generated sequentially if not timecode source
> is detected but if there is a timecode, the absolute code should track
> the timecode souce.
>
> The device specific loadable kernel module should make heavy use
> of jump tables (structures of function pointers) to call individual
> routines which start initialized to general purpose functions
> and can be replaced with more device specific functions.
>
>
> Complicated but general things like video data format conversion may be
> handled by a general purpose routine which may "compile" down to
> a state machine that does the copying or select from one of
> many function pointers to a function which does the approriate
> translations. You could have a small array of tokens which mean things
> like:
> loop 640 times
> byteswap
> yuv42toyuv24 (two bytes/pix to 3 bytes/pix)
> rgbtoyuv
> endloop
> These tokens could be interpretted inside a large loop with jumps
> rather than function calls (for less overhead). Or the functions
> can be done a scanline at a time for less overhead:
> byteswap 640 pixels
> yuv42toyuv24 640 pixels
> rgbtoyuv 640 pixels
>
> By doing a scan line at a time, you assume that the application will
> access the data in row-major vs. column-major order. This assumption
> is likely to be true for most bulk transfers. Similarly,
> you can convert one sectors worth of data at a time which is likely
> to include portions of one or more scan lines which are converted
> as many pixels as are needed. page tables for the mmaped region
> are adjusted to point to the buffer containing the translated data
> with all other pages set to generate a page fault which will invoke
> the translation routine.
>
> Each driver should allow a few different optimized formats with
> some other formats also being optionally availible:
> - raw pixel data
> (this allows mmap directly into frame buffer)
> driver tells application how the data is organized so it
> can decode.
> - 32 bit/2 pixel YUYV
> Y byte(pixel n), U byte , Y byte (pixel n+1), V bytes
> - 24 bit RGB red byte, green byte, blue byte
> - 24 bit YUV Y byte, U byte, V byte
> - some optional formats:
> - 48 bit RGB
> - 24 bit CMYK
> - 48 bit CMYK
> - mpeg encoded
> - partially mpeg encoded
> - jpeg encoded
> - MJPEG encoded
> - 16 bit YUV Y byte, UV byte (with specific order of bits)
> loss of color information vs. YUYV
> - Mixed video and audio
> For those formats above, data is in row major order with each
> color byte for the first pixel of the first line followed by the
> second pixel for the first line, etc. with no padding between
> lines.
>
> The mpeg/jpeg/mjpeg encoded formats are intended for hardware
> which support full or partial encoding in hardware and there
> would probably need to be an assortment of them, perhaps
>
> The raw format is primarly there for fastest transfers to userspace
> with the data possibly being encoded later. Basically a capture
> to userspace RAM. (25KMB or ram could store about 14 seconds
> of 30 fps 640x480 16 bit YUV data). The other three formats
> give the application a choice of a few standard formats
> which must be supported by all drivers with a reasonable
> degree of efficiency.
>
> Video data formats could be described using a sequence of variables:
> videoformat.bytesperline=1280
> videoformat.planes=1 (color data interleved)
> videoformat.pixelspergroup=2 (either 1 or 2)
> note: this has to do with the peculiarities of typical cameras
> which have twice as many pixels for luminence as for each chrominance
> component.
> videoformat.nbytes=4
> videoformat.byte00="Y0"
> videoformat.byte01="Y1"
> videoformat.byte02="U"
> videoformat.byte03="V"
> videoformat.width=640
> videoformat.height=480
> videoformat.order="ROWMAJOR"
> videoformat.pixelpadding=0 (might be 1 for 24/32 bit RGB)
> videoformat.rowpadding=0
> videoformat.interlace=0
> note: if 1, this indicates that all odd rows are stored before all
> even rows.
> videoformat.audio.bytesperscanline=0
> videoformat.audio.bytesperframe=0
> videoformat.prefixbytes=0
> videoformat.suffixbytes=0
> videoformat.timecode.present=0
> videoformat.timecode.offset=0
> videoformat.timecode.format="SMPTE_ASCII"
>
> Note that the existing v4l2 standards already define video
> formats and names for things which should generally be used instead of my
> specifications.
>
> VBI (Closed captioning) interface could use a separate
> device as is currently done although an option might be to
> do it via the control interface (either on the same
> connection as the main control connection or from
> a separate non-video-capturing application).
> foo110 set vbi.
> foo111 async call VBI()
> foo111 1102- VBI: Closed captioning provided by a grand from
> foo111 1102- VBI: linux international. Dateline 29 news.
> foo111 1102- VBI: The news you need when you need it.
> foo111 1102- VBI: Tonight on Dateline 29 news...
> foo112 call abort_vbi();
> foo112 2000 ok
> foo111 5502 VBI terminated
>
>
> The abort functions might be replaced with a new protocol command
> "kill" which when used with the command sequence ID of the command
> calls the corresponding kill function automatically.
>
> Note that the new standard should define some way to synchronize
> the audio and video streams for proper mpeg encoding in userspace.
> One way to do this is to append the audio and a sample count
> which may vary to the end of each frame of video. This clearly
> identifies which audio samples were taken at the same time
> the frame was (give or take a little skew between audio codecs
> and video circuitry). Another is to include time codes in both
> the audio and video streams.
>
> Note that to make the protocol easier to implement, we can be
> somewhat pedantic out formats in the first version and relax
> later if desired:
> - structures and function/proceedure calls will be listed one
> element per line.
> - separators between lines
> - case insensitive matching
> - Strings in double quotes.
> - whitespace allowed at the begining of the line for indentation
> but extranious white space not allowed around "=", "(", "{", etc.
>
> /devfs organization:
> better to organize by device:
> /devfs/multimedia/dev1/control
> /devfs/multimedia/dev1/data
> /devfs/multimedia/dev2/control
> /devfs/multimedia/dev2/data
> This works much better with symbolic links.
> /dev/videocapture -> /devfs/multimedia/dev1
> ~/.v4l/searchpath/
> 01 -> /devfs/multimedia/dev10
> 02 -> /devfs/multimedia/dev23
> 03 -> /devfs/multimedia/dev10
>
>
> --
> --
> Mark Whitis http://www.freelabs.com/~whitis/ NO SPAM
> Author of many open source software packages.
> Coauthor: Linux Programming Unleashed (1st Edition)
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-10-30 22:47:44

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC] alternative kernel multimedia API

Hi!

One thing current interface can not do is to put driver into userspace.

And yes, we *need* userspace devices for many cams, as you will be shot
if you try to do format conversion in kernel.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.