2006-01-08 09:24:50

by Jaroslav Kysela

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Sun, 8 Jan 2006, Hannu Savolainen wrote:

> On Sat, 7 Jan 2006, Takashi Iwai wrote:
>
> > > > Because OSS API doesn't cover many things. For example,
> > > >
> > > > - PCM with non-interleaved formats
> > > There is no need to handle non-interleaved data in kernel level drivers
> > > because all the devices use interleaved formats.
> >
> > Many RME boards support only non-interleaved data.
> In such cases it's better to do interleaving/deinterleaving in the kernel
> rather than forcing the apps to check which method they should use.

I don't think so. The library can do such conversions (and alsa-lib does)
quite easily. If we have a possibility to remove the code from the kernel
space without any drawbacks, then it should be removed. I don't see any
advantage to have such conversions in the kernel.
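
As an illustration of the kind of conversion being discussed, here is a minimal user-space sketch in C, assuming 16-bit samples. The function names are hypothetical and this is not alsa-lib's actual code; it only shows that the conversion is a plain copy loop that a library can perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved frames (ch0 ch1 ch0 ch1 ...) into one buffer per channel.
 * Hypothetical helper for illustration; not taken from alsa-lib. */
void deinterleave_s16(const int16_t *in, int16_t *const *out,
                      size_t frames, size_t channels)
{
    for (size_t f = 0; f < frames; f++)
        for (size_t c = 0; c < channels; c++)
            out[c][f] = in[f * channels + c];
}

/* Merge per-channel buffers back into interleaved frames. */
void interleave_s16(const int16_t *const *in, int16_t *out,
                    size_t frames, size_t channels)
{
    for (size_t f = 0; f < frames; f++)
        for (size_t c = 0; c < channels; c++)
            out[f * channels + c] = in[c][f];
}
```

Either direction touches every sample once, so the cost grows with frames times channels.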

> > Indeed. But you know that almost all "OSS" applications access
> > directly the device files. There is no room to put a library to solve
> > these things in user-space.
> Why should there be any need to put library code between the kernel API
> and an application that is perfectly happy with it? It is only necessary
> if somebody wants to emulate the OSS kernel API in library level.
>
> A wrapper library with routines like oss_open, oss_write, etc was once
> considered. However we didn't find any good reason to do that (in
> particular because that conflicted with routine names already used
> internally in some important OSS applications).

Bad decision. Again, I feel you're hiding the flexibility because of
your feeling that the kernel API is good enough for applications.
Consider that API redirection is, or can be, flexible for your own
future development as well.

> What if there is some better way to handle OSS-ALSA interaction than
> library level hooks/emulation. In the short term this may be difficult
> because OSS is binary only and outside the kernel. But in long run OSS can
> hopefully be open sourced which could make it possible to use solutions
> like merging the kernel space drivers together.
>
> Actually I have forgotten what was the reason why you wanted to get the
> OSS API emulated in userland rather than using the previous snd-oss
> module (which worked well other than the API version you emulated was too
> old)?

Stream mixing. We have a user space solution while you insist on putting
this code in the kernel. Simply put, we need to go through our library.

From the end user perspective, don't you think that having an opportunity
to change the API entry point from one to multiple (user space library -
preferred, direct kernel space - last resort) is more flexible for
developers and users? Please consider this question without any flames
like which API is better, what's better for audio subsystem architects,
and what's better for your commercial work.

Jaroslav

-----
Jaroslav Kysela <[email protected]>
Linux Kernel Sound Maintainer
ALSA Project, SUSE Labs


2006-01-09 13:06:05

by René Rebe

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

Hi,

On Sunday 08 January 2006 10:24, Jaroslav Kysela wrote:

> > > > > - PCM with non-interleaved formats
> > > > There is no need to handle non-interleaved data in kernel level drivers
> > > > because all the devices use interleaved formats.
> > >
> > > > Many RME boards support only non-interleaved data.
> > > In such cases it's better to do interleaving/deinterleaving in the kernel
> > rather than forcing the apps to check which method they should use.
>
> I don't think so. The library can do such conversions (and alsa-lib does)
> quite easy. If we have a possibility to remove the code from the kernel
> space without any drawbacks, then it should be removed. I don't see any
> advantage to have such conversions in the kernel.

Also, when the data is already available as single streams in a user-space
multi track application, why should it be forced interleaved, when the hardware
could handle the format just fine?

Yours,
Rene

--
ExactCODE, Berlin

2006-01-09 15:13:37

by Hannu Savolainen

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, 9 Jan 2006, René Rebe wrote:

> > I don't think so. The library can do such conversions (and alsa-lib does)
> > quite easy. If we have a possibility to remove the code from the kernel
> > space without any drawbacks, then it should be removed. I don't see any
> > advantage to have such conversions in the kernel.
>
> Also, when the data is already available as single streams in a user-space
> multi track application, why should it be forced interleaved, when the hardware
> could handle the format just fine?
Because the conversion doesn't cost anything. Trying to avoid it by
making the API more complicated (I would even say confusing) is extreme
overkill.

Each feature of this kind requires two additional API
calls (one for checking in which way the hardware works and another to
set the device to use the feature). It's also possible to implement the
feature in a way that requires more new calls. By adding support for
dozens of features like this it's easy to create an API that has 1500+
calls.

Even worse, features of this kind weaken the device abstraction provided by
the API. The applications will have to check for this and
that and provide support for 100s of special cases that may be required by
certain devices.

IMHO this has already happened with ALSA. Normal
programmers (other than a few of the world class gurus) have no way to
understand the API. I would consider myself an at least moderately
experienced sound programmer (25+ years of programming experience and more
than half of it on sound). However even after two years of more or less
intense learning I don't know what is the preferred way to use ALSA. I
think this is a general problem because practically all ALSA applications use
different ALSA API calls.

Best regards,

Hannu
-----
Hannu Savolainen ([email protected])
http://www.opensound.com (Open Sound System (OSS))
http://www.compusonic.fi (Finnish OSS pages)
OH2GLH QTH: Karkkila, Finland LOC: KP20CM

2006-01-09 17:16:26

by René Rebe

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

Hi,

On Monday 09 January 2006 16:10, Hannu Savolainen wrote:

> > > I don't think so. The library can do such conversions (and alsa-lib does)
> > > quite easy. If we have a possibility to remove the code from the kernel
> > > space without any drawbacks, then it should be removed. I don't see any
> > > advantage to have such conversions in the kernel.
> >
> > Also, when the data is already available as single streams in a user-space
> > multi track application, why should it be forced interleaved, when the hardware
> > could handle the format just fine?
> Because the conversion doesn't cost anything. Trying to avoid it by
> making the API more complicated (I would even say confusing) is extreme
> overkill.

Since when doesn't conversion cost anything? I'm able to count a lot of wasted
CPU cycles in there ...

> Even worse this kind of features weaken the device abstraction provided by
> the API. The applications will have to check for this and
> that and provide support for 100s of special cases that may be required by
> certain devices.

A lame write()-only player can still open the default device and get the auto-convert
chain it deserves ...

Yours,
Rene Rebe

--
ExactCODE Berlin

2006-01-09 22:02:31

by David Lang

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, 9 Jan 2006, René Rebe wrote:

> On Monday 09 January 2006 16:10, Hannu Savolainen wrote:
>
>>>> I don't think so. The library can do such conversions (and alsa-lib does)
>>>> quite easy. If we have a possibility to remove the code from the kernel
>>>> space without any drawbacks, then it should be removed. I don't see any
>>>> advantage to have such conversions in the kernel.
>>>
>>> Also, when the data is already available as single streams in a user-space
>>> multi track application, why should it be forced interleaved, when the hardware
>>> could handle the format just fine?
>> Because the conversion doesn't cost anything. Trying to avoid it by
>> making the API more complicated (I would even say confusing) is extreme
>> overkill.
>
> Since when doesn't conversion cost anything? I'm able to count a lot of wasted
> CPU cycles in there ...

if the data needed to be accessed by the CPU anyway it's free because
otherwise the CPU would stall waiting for the next chunk of memory. you
can do quite a bit of work on data in cache while you are waiting for the
next cache line to load.

in this same way, checksumming a network packet is free if the CPU needs
to copy the data anyway, it only costs something if the data could bypass
the CPU.
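
To make the "checksum while copying" argument concrete, here is a minimal sketch in C. The helper name is hypothetical, and the plain additive sum stands in for a real network checksum. Whether the extra work is actually free depends on the CPU and memory system; the sketch only shows the structure of doing both jobs in one pass over the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return a simple additive checksum,
 * computed in the same pass so each byte is loaded only once.
 * Hypothetical helper; the sum is illustrative, not a real protocol checksum. */
uint32_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = src[i];  /* single load from memory */
        dst[i] = b;          /* the copy that had to happen anyway */
        sum += b;            /* extra work on data already in a register */
    }
    return sum;
}
```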

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-09 23:10:39

by John Rigg

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, Jan 09, 2006 at 01:58:00PM -0800, David Lang wrote:
> On Mon, 9 Jan 2006, René Rebe wrote:
>
> >On Monday 09 January 2006 16:10, Hannu Savolainen wrote:
> >
> >>>>I don't think so. The library can do such conversions (and alsa-lib
> >>>>does)
> >>>>quite easy. If we have a possibility to remove the code from the kernel
> >>>>space without any drawbacks, then it should be removed. I don't see any
> >>>>advantage to have such conversions in the kernel.
> >>>
> >>>Also, when the data is already available as single streams in a
> >>>user-space
> >>>multi track application, why should it be forced interleaved, when the
> >>>hardware
> >>>could handle the format just fine?
> >>Because the conversion doesn't cost anything. Trying to avoid it by
> >>making the API more complicated (I would even say confusing) is extreme
> >>overkill.
> >
> >Since when doesn't conversion cost anything? I'm able to count a lot of
> >wasted
> >CPU cycles in there ...
>
> if the data needed to be accessed by the CPU anyway it's free because
> otherwise the CPU would stall waiting for the next chunk of memory. you
> can do quite a bit of work on data in cache while you are waiting for the
> next cache line to load.
>
> in this same way, checksumming a network packet is free if the CPU needs
> to copy the data anyway, it only costs something if the data could bypass
> the CPU.

Yes, but the CPU has plenty of other work to do. The sound cards that
would be worst affected by this are the big RME cards (non-interleaved) and
multiple ice1712 cards (non-interleaved blocks of interleaved data),
which AFAIK are the only cards capable of handling serious professional audio.
This could represent 48 or more channels of 96kHz audio, which
doesn't leave a lot of spare CPU capacity for running X, for example.
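
For a sense of scale, the raw data rate for that case can be worked out as below. The 4 bytes per sample figure is an assumption (32-bit sample containers) and the function name is hypothetical.

```c
#include <stdint.h>

/* Raw PCM throughput in bytes per second for one direction.
 * bytes_per_sample is an assumption supplied by the caller. */
uint64_t pcm_bytes_per_second(uint32_t channels, uint32_t rate_hz,
                              uint32_t bytes_per_sample)
{
    return (uint64_t)channels * rate_hz * bytes_per_sample;
}
```

With 48 channels at 96000 Hz and 4 bytes per sample this gives 18,432,000 bytes per second in each direction, before any conversion passes are added on top.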

John

2006-01-09 23:25:55

by David Lang

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, 9 Jan 2006, John Rigg wrote:

> On Mon, Jan 09, 2006 at 01:58:00PM -0800, David Lang wrote:
>> On Mon, 9 Jan 2006, René Rebe wrote:
>>>>>
>>>>> Also, when the data is already available as single streams in a
>>>>> user-space
>>>>> multi track application, why should it be forced interleaved, when the
>>>>> hardware
>>>>> could handle the format just fine?
>>>> Because the conversion doesn't cost anything. Trying to avoid it by
>>>> making the API more complicated (I would even say confusing) is extreme
>>>> overkill.
>>>
>>> Since when doesn't conversion cost anything? I'm able to count a lot of
>>> wasted
>>> CPU cycles in there ...
>>
>> if the data needed to be accessed by the CPU anyway it's free because
>> otherwise the CPU would stall waiting for the next chunk of memory. you
>> can do quite a bit of work on data in cache while you are waiting for the
>> next cache line to load.
>>
>> in this same way, checksumming a network packet is free if the CPU needs
>> to copy the data anyway, it only costs something if the data could bypass
>> the CPU.
>
> Yes, but the CPU has plenty of other work to do. The sound cards that
> would be worst affected by this are the big RME cards (non-interleaved) and
> multiple ice1712 cards (non-interleaved blocks of interleaved data),
> which AFAIK are the only cards capable of handling serious professional audio.
> This could represent 48 or more channels of 96kHz audio, which
> doesn't leave a lot of spare CPU capacity for running X, for example.

does the CPU touch the data for these, or do you DMA directly from
userspace (i.e. "zero-copy")?

if the cpu touches this data on its way in and out of the system then you
are going to have a time period where you are maxing out the memory bus of
your CPU (this may be a short time, but since the bus is either active or
not there will be a time when it's active transferring audio data :-).
while the memory bus is busy transferring the audio data your cpu can only
work on data that's in the cache.

remember that as you keep reading the data from memory it will push other
stuff out of your cache.

what magic do you pull to have the CPU busy on other things while the
cache (and memory bus) is being occupied by the audio data transfers?

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-10 00:06:05

by John Rigg

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, Jan 09, 2006 at 03:21:21PM -0800, David Lang wrote:
> On Mon, 9 Jan 2006, John Rigg wrote:
>
> >On Mon, Jan 09, 2006 at 01:58:00PM -0800, David Lang wrote:
> >>On Mon, 9 Jan 2006, René Rebe wrote:
> >>>>>
> >>>>>Also, when the data is already available as single streams in a
> >>>>>user-space
> >>>>>multi track application, why should it be forced interleaved, when the
> >>>>>hardware
> >>>>>could handle the format just fine?
> >>>>Because the conversion doesn't cost anything. Trying to avoid it by
> >>>>making the API more complicated (I would even say confusing) is extreme
> >>>>overkill.
> >>>
> >>>Since when doesn't conversion cost anything? I'm able to count a lot of
> >>>wasted
> >>>CPU cycles in there ...
> >>
> >>if the data needed to be accessed by the CPU anyway it's free because
> >>otherwise the CPU would stall waiting for the next chunk of memory. you
> >>can do quite a bit of work on data in cache while you are waiting for the
> >>next cache line to load.
> >>
> >>in this same way, checksumming a network packet is free if the CPU needs
> >>to copy the data anyway, it only costs something if the data could bypass
> >>the CPU.
> >
> >Yes, but the CPU has plenty of other work to do. The sound cards that
> >would be worst affected by this are the big RME cards (non-interleaved) and
> >multiple ice1712 cards (non-interleaved blocks of interleaved data),
> >which AFAIK are the only cards capable of handling serious professional
> >audio.
> >This could represent 48 or more channels of 96kHz audio, which
> >doesn't leave a lot of spare CPU capacity for running X, for example.
>
> does the CPU touch the data for these, or do you DMA directly from
> userspace (i.e. "zero-copy")?

The cards I mentioned use DMA. RME actually advertises that some of their
cards can handle 52 channels with zero CPU load. Their onboard DSPs can
also do routing and mixing, again without touching the CPU.

John

2006-01-10 00:34:07

by David Lang

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Tue, 10 Jan 2006, John Rigg wrote:

>> does the CPU touch the data for these, or do you DMA directly from
>> userspace (i.e. "zero-copy")?
>
> The cards I mentioned use DMA. RME actually advertises that some of their
> cards can handle 52 channels with zero CPU load. Their onboard DSPs can
> also do routing and mixing, again without touching the CPU.

I was under the (apparently mistaken) impression that you couldn't DMA
from userspace (something to do with the possibility that the userspace
memory pages could be swapped out in the middle of the DMA)

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-10 00:51:13

by Hannu Savolainen

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, 9 Jan 2006, John Rigg wrote:

> Yes, but the CPU has plenty of other work to do. The sound cards that
> would be worst affected by this are the big RME cards (non-interleaved) and
> multiple ice1712 cards (non-interleaved blocks of interleaved data),
ice1712 uses normal interleaving. There are no "non-interleaved blocks".

> which AFAIK are the only cards capable of handling serious professional audio.
> This could represent 48 or more channels of 96kHz audio, which
> doesn't leave a lot of spare CPU capacity for running X, for example.

This is true only if you run the system at full 100% load before the
conversions. But in real life you cannot do this anyway. You have to use
a CPU that has a lot of spare power. Otherwise anything unpredictable will
make things fail.

Best regards,

Hannu
-----
Hannu Savolainen ([email protected])
http://www.opensound.com (Open Sound System (OSS))
http://www.compusonic.fi (Finnish OSS pages)
OH2GLH QTH: Karkkila, Finland LOC: KP20CM

2006-01-10 00:52:21

by John Rigg

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Tue, Jan 10, 2006 at 12:16:17AM +0000, John Rigg wrote:
> On Mon, Jan 09, 2006 at 03:21:21PM -0800, David Lang wrote:
> > does the CPU touch the data for these, or do you DMA directly from
> > userspace (i.e. "zero-copy")?
>
> The cards I mentioned use DMA. RME actually advertises that some of their
> cards can handle 52 channels with zero CPU load. Their onboard DSPs can
> also do routing and mixing, again without touching the CPU.

Of course I should also mention that the sound cards deal with PCM audio
samples in integer format, but audio apps like jackd and its clients use
floating point, so in practice the CPU is still processing audio data much
of the time.
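
A minimal sketch of that integer-to-float step in C, assuming signed 32-bit input samples and the common convention of scaling to the range [-1.0, 1.0). The function name is hypothetical and this is not jackd's actual conversion code.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert signed 32-bit PCM samples to floats in [-1.0, 1.0).
 * Hypothetical helper; the scaling convention is an assumption. */
void s32_to_float(const int32_t *in, float *out, size_t n)
{
    const float scale = 1.0f / 2147483648.0f;  /* 1 / 2^31 */
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i] * scale;
}
```

Every sample passes through the CPU in this loop, regardless of how the card itself moves data.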

John

2006-01-10 01:17:04

by John Rigg

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, Jan 09, 2006 at 04:29:45PM -0800, David Lang wrote:
> On Tue, 10 Jan 2006, John Rigg wrote:
>
> >>does the CPU touch the data for these, or do you DMA directly from
> >>userspace (i.e. "zero-copy")?
> >
> >The cards I mentioned use DMA. RME actually advertises that some of their
> >cards can handle 52 channels with zero CPU load. Their onboard DSPs can
> >also do routing and mixing, again without touching the CPU.
>
> I was under the (apparently mistaken) impression that you couldn't DMA
> from userspace (something to do with the possibility that the userspace
> memory pages could be swapped out in the middle of the DMA)

Hmm. Maybe I've been paying too much attention to card vendors'
sales talk :)

John

2006-01-10 01:42:34

by Alan

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Mon, 2006-01-09 at 16:29 -0800, David Lang wrote:
> I was under the (apparently mistaken) impression that you couldn't DMA
> from userspace (something to do with the possibility that the userspace
> memory pages could be swapped out in the middle of the DMA)

Drivers can choose to support this in two different ways. One is to have a
buffer of kernel memory mapped into user space and shared with the
hardware (this is how OSS did it), the other is to use the 2.6
get_user_pages API to get the physical address of a set of pages and
lock them down so they don't wander off during DMA.

Both have advantages for different uses.

Alan

2006-01-10 01:50:05

by John Rigg

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Tue, Jan 10, 2006 at 02:48:35AM +0200, Hannu Savolainen wrote:
> On Mon, 9 Jan 2006, John Rigg wrote:
>
> > Yes, but the CPU has plenty of other work to do. The sound cards that
> > would be worst affected by this are the big RME cards (non-interleaved) and
> > multiple ice1712 cards (non-interleaved blocks of interleaved data),
> ice1712 uses normal interleaving. There are no "non-interleaved blocks".

With two ice1712 cards I had to patch jackd for MMAP_COMPLEX
support to make them work together. My understanding was that the
individual cards use interleaved data, but when several are combined
the resulting blocks of data are not interleaved together. I realise the
usual way of dealing with this is to use the alsa route plugin with
ttable to interleave them, but I couldn't get it to work with these
cards.
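
One way to picture this layout is a per-channel description of where each channel's samples live. The sketch below is in C and assumes 16-bit samples and two stereo cards; the struct and function names are hypothetical and are not alsa-lib or jackd types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-channel description of where samples live in memory. */
struct chan_area {
    const uint8_t *base;  /* buffer that holds this channel (one per card) */
    size_t first;         /* byte offset of the channel's first sample */
    size_t step;          /* byte distance between consecutive samples */
};

/* Read one 16-bit sample of one channel at the given frame index. */
int16_t read_sample(const struct chan_area *a, size_t frame)
{
    int16_t v;
    memcpy(&v, a->base + a->first + frame * a->step, sizeof v);
    return v;
}
```

With two stereo cards, channels 0 and 1 would point into the first card's buffer and channels 2 and 3 into the second, each with a step of one stereo frame. The combined set is then neither fully interleaved nor fully non-interleaved.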

John

2006-01-10 01:56:27

by Lee Revell

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Tue, 2006-01-10 at 00:44 +0000, Alan Cox wrote:
> On Mon, 2006-01-09 at 16:29 -0800, David Lang wrote:
> > I was under the (apparently mistaken) impression that you couldn't DMA
> > from userspace (something to do with the possibility that the userspace
> > memory pages could be swapped out in the middle of the DMA)
>
> Drivers can choose to support this two different ways. One is to have a
> buffer of kernel memory mapped into user space and shared with the
> hardware (this is how OSS did it), the other is to use the 2.6
> get_user_pages API to get the physical address of a set of pages and
> lock them down so they don't wander off during DMA.
>
> Both have advantages for different uses.

ALSA would appear to use the first method (get_user_pages does not
appear in the source), presumably because new ALSA versions still
support 2.4 (and 2.2, maybe even 2.0).

Lee

2006-01-10 02:19:41

by Hannu Savolainen

Subject: Re: [Alsa-devel] Re: [OT] ALSA userspace API complexity

On Tue, 10 Jan 2006, John Rigg wrote:

> On Tue, Jan 10, 2006 at 02:48:35AM +0200, Hannu Savolainen wrote:
> > On Mon, 9 Jan 2006, John Rigg wrote:
> >
> > > Yes, but the CPU has plenty of other work to do. The sound cards that
> > > would be worst affected by this are the big RME cards (non-interleaved) and
> > > multiple ice1712 cards (non-interleaved blocks of interleaved data),
> > ice1712 uses normal interleaving. There are no "non-interleaved blocks".
>
> With two ice1712 cards I had to patch jackd for MMAP_COMPLEX
> support to make them work together. My understanding was that the
> individual cards use interleaved data, but when several are combined
> the resulting blocks of data are not interleaved together. I realise the
> usual way of dealing with this is to use the alsa route plugin with
> ttable to interleave them, but I couldn't get it to work with these
> cards.
Right. If you use two cards then both of them have independently
interleaved blocks. However, whether this kind of mapping belongs in the
lowest level audio API is yet another API feature to argue about. Higher
level libraries like Jack could do this themselves.

Best regards,

Hannu
-----
Hannu Savolainen ([email protected])
http://www.opensound.com (Open Sound System (OSS))
http://www.compusonic.fi (Finnish OSS pages)
OH2GLH QTH: Karkkila, Finland LOC: KP20CM