2005-04-07 07:47:27

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 09:36 +0200, Guillaume Thouvenin wrote:
> Hi Evgeniy,
>
> I forgot to put you in the CC of the email so I'm forwarding a post
> about the connector sent on lkml.

Ok, I'm in.

> Best regards,
> Guillaume

> email message attachment, "Forwarded message - Re: connector is
> missing in 2.6.12-rc2-mm1"
> On Thu, 2005-04-07 at 09:36 +0200, Guillaume Thouvenin wrote:
> > Guillaume Thouvenin <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > I don't see the connector directory in the 2.6.12-rc2-mm1 tree. So it
> > > seems that you removed the connector?
> >
> > Greg dropped it for some reason. I think that's best because it needed a
> > significant amount of rework. I'd like to see it resubitted in totality so
> > we can take another look at it.

Hmm, what exactly do you think _must_ be changed?
Most of your comments are addressed in 4 patches I sent to you and Greg.
Others [mostly atomic allocation] are API extensions and will be added.
There also not included flush on callback removal.

> > It's a new piece of core kernel infrastructure and the barriers for that
> > are necessarily high.
> >
> > > Will you include it again in futur
> > > release? At the same time, will you include the fork connector?
> >
> > I could put the fork connector into -mm, but would like to be convinced
> > that it's acceptable to and useful for all system accounting requirements,
> > not just the one project. That means code, please.

SuperIO and kobject_uevent are also dropped as far as I can see.

Acrypto is being reviewed but it also depends on it, although
it takes to much time, probably will be dropped too.

Proper w1 notification also requires connector.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 08:00:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

Evgeniy Polyakov <[email protected]> wrote:
>
> > > > I don't see the connector directory in the 2.6.12-rc2-mm1 tree. So it
> > > > seems that you removed the connector?
> > >
> > > Greg dropped it for some reason. I think that's best because it needed a
> > > significant amount of rework. I'd like to see it resubitted in totality so
> > > we can take another look at it.
>
> Hmm, what exactly do you think _must_ be changed?

The stuff we discussed.

Plus, I'm still quite unsettled about the whole object lifecycle
management, refcounting and locking in there. The fact that the code is
littered with peculiar barriers says "something weird is happening here",
and it remains unobvious to me why such a very common kernel pattern was
implemented in such an unusual manner.

So. I'd like to see the whole thing reexplained and resubmitted so we can
think about it all again.

> Most of your comments are addressed in 4 patches I sent to you and Greg.

Which comments were not addressed?

> Others [mostly atomic allocation] are API extensions and will be added.

I would like to see that code before committing to merging anything.

> There also not included flush on callback removal.
>
> > > It's a new piece of core kernel infrastructure and the barriers for that
> > > are necessarily high.
> > >
> > > > Will you include it again in futur
> > > > release? At the same time, will you include the fork connector?
> > >
> > > I could put the fork connector into -mm, but would like to be convinced
> > > that it's acceptable to and useful for all system accounting requirements,
> > > not just the one project. That means code, please.
>
> SuperIO and kobject_uevent are also dropped as far as I can see.
>
> Acrypto is being reviewed but it also depends on it, although
> it takes to much time, probably will be dropped too.
>
> Proper w1 notification also requires connector.

Guillaume was referring to "fork connector", not to "connector".

2005-04-07 08:09:25

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 11:53 +0400, Evgeniy Polyakov wrote:
> > > Guillaume Thouvenin <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I don't see the connector directory in the 2.6.12-rc2-mm1 tree. So it
> > > > seems that you removed the connector?
> > >
> > > Greg dropped it for some reason. I think that's best because it needed a
> > > significant amount of rework. I'd like to see it resubitted in totality so
> > > we can take another look at it.
>
> Hmm, what exactly do you think _must_ be changed?
> Most of your comments are addressed in 4 patches I sent to you and Greg.
> Others [mostly atomic allocation] are API extensions and will be added.
> There also not included flush on callback removal.
>
> > > It's a new piece of core kernel infrastructure and the barriers for that
> > > are necessarily high.
> > >
> > > > Will you include it again in futur
> > > > release? At the same time, will you include the fork connector?
> > >
> > > I could put the fork connector into -mm, but would like to be convinced
> > > that it's acceptable to and useful for all system accounting requirements,
> > > not just the one project. That means code, please.
>
> SuperIO and kobject_uevent are also dropped as far as I can see.
>
> Acrypto is being reviewed but it also depends on it, although
> it takes to much time, probably will be dropped too.

I mean review process - It is low priority task for maintainers,
so can be preempted, no problem.

> Proper w1 notification also requires connector.

From the other point of view - how can someone use the interface, if it
is
not in the kernel tree?

Main connector idea was not system accounting, but it was designed
so that it can be usefull in other places, like accounting.
The main idea was to simplify userspace control and notification
system - so people did not waste it's time learning how skb's are
allocated
and processed, how socket layer is designed and what all those
netlink_* and NLMSG* mean if they do not need it.

As you can see in kobject_uevent.c changes - it is significant amount of
code
that can be removed if connector is used without _any_ advantages for
not using connector.

But, of course the main word is yours, so you still may allow growing
netlink uids and socket allocation all over the place.
P.S. udev may also use it directly - for example with some notification
from ->open() syscall for /dev/something_major_minor and
that "something" does not exist :)
[Ok, I know you do not like it, just an idea]

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 08:19:20

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 00:58 -0700, Andrew Morton wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > > > > I don't see the connector directory in the 2.6.12-rc2-mm1 tree. So it
> > > > > seems that you removed the connector?
> > > >
> > > > Greg dropped it for some reason. I think that's best because it needed a
> > > > significant amount of rework. I'd like to see it resubitted in totality so
> > > > we can take another look at it.
> >
> > Hmm, what exactly do you think _must_ be changed?
>
> The stuff we discussed.
>
> Plus, I'm still quite unsettled about the whole object lifecycle
> management, refcounting and locking in there. The fact that the code is
> littered with peculiar barriers says "something weird is happening here",
> and it remains unobvious to me why such a very common kernel pattern was
> implemented in such an unusual manner.
>
> So. I'd like to see the whole thing reexplained and resubmitted so we can
> think about it all again.

All those barriers can be replaced with atomic_dec_and_test(),
i.e. with something that returns the value.
Methods that return value requires explicit barriers.

Actually there are quite many places where we have:

cpu0 cpu1
use object
atomic_dec()
if atomic_read/atomic_dec_and_test == 0
free object.

With explicit barriers about use object we can
not catch atomic vs. non atomic reordering.
There still _may_ exist other types of races,
but as we discussed, in connector case they
were my faults [flush on connector removal].

> > Most of your comments are addressed in 4 patches I sent to you and Greg.
>
> Which comments were not addressed?

CBUS code comments [I did not get ack on CBUS itself], and two below
issues.

> > Others [mostly atomic allocation] are API extensions and will be added.
>
> I would like to see that code before committing to merging anything.

Ok.

> > There also not included flush on callback removal.
> >
> > > > It's a new piece of core kernel infrastructure and the barriers for that
> > > > are necessarily high.
> > > >
> > > > > Will you include it again in futur
> > > > > release? At the same time, will you include the fork connector?
> > > >
> > > > I could put the fork connector into -mm, but would like to be convinced
> > > > that it's acceptable to and useful for all system accounting requirements,
> > > > not just the one project. That means code, please.
> >
> > SuperIO and kobject_uevent are also dropped as far as I can see.
> >
> > Acrypto is being reviewed but it also depends on it, although
> > it takes to much time, probably will be dropped too.
> >
> > Proper w1 notification also requires connector.
>
> Guillaume was referring to "fork connector", not to "connector".

fork connector depends on the dropped connector/CBUS.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 08:34:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

Evgeniy Polyakov <[email protected]> wrote:
>
> >
> > Plus, I'm still quite unsettled about the whole object lifecycle
> > management, refcounting and locking in there. The fact that the code is
> > littered with peculiar barriers says "something weird is happening here",
> > and it remains unobvious to me why such a very common kernel pattern was
> > implemented in such an unusual manner.
> >
> > So. I'd like to see the whole thing reexplained and resubmitted so we can
> > think about it all again.
>
> All those barriers can be replaced with atomic_dec_and_test(),

What a shame you didn't say atomic_dec_and_lock()...

> i.e. with something that returns the value.
> Methods that return value requires explicit barriers.
>
> Actually there are quite many places where we have:
>
> cpu0 cpu1
> use object
> atomic_dec()
> if atomic_read/atomic_dec_and_test == 0
> free object.

Yes, but those places normally also use locking to prevent the obvious race.

Yes, atomic_dec_and_test() and barrier removal would be better. Especially
if it's associated with code commentary which explains why the whole thing
isn't racy (ie: explains why no other CPU can look this object up).

> > Which comments were not addressed?
>
> CBUS code comments [I did not get ack on CBUS itself], and two below
> issues.

I continue to not see any point in cbus. It moves work from one place to
another while increasing the amount of code and quite probably increasing
the net amount of work too.

IOW it looks like a net loss which happens to provide gains in one rather
uninteresting microbenchmark.

I'll gleefully admit that I'm wrong, but I don't think that has been
demonstrated yet.


2005-04-07 09:12:53

by Ian Campbell

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 12:13 +0400, Evgeniy Polyakov wrote:
> The main idea was to simplify userspace control and notification
> system - so people did not waste it's time learning how skb's are
> allocated
> and processed, how socket layer is designed and what all those
> netlink_* and NLMSG* mean if they do not need it.

Isn't connector built on top of netlink? If so, is there any reason for
it to be a new subsystem rather than an extension the the netlink API?

Ian.
--
Ian Campbell

Employees and their families are not eligible.

2005-04-07 09:49:42

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 10:12 +0100, Ian Campbell wrote:
> On Thu, 2005-04-07 at 12:13 +0400, Evgeniy Polyakov wrote:
> > The main idea was to simplify userspace control and notification
> > system - so people did not waste it's time learning how skb's are
> > allocated
> > and processed, how socket layer is designed and what all those
> > netlink_* and NLMSG* mean if they do not need it.
>
> Isn't connector built on top of netlink? If so, is there any reason for
> it to be a new subsystem rather than an extension the the netlink API?

Connector is not netlink API extension in any way.
It uses netlink as transport layer, one can change
cn_netlink_send()/cn_input()
into something like bidirectional ioctl and use it.

Only one cn_netlink_send() function can be "described" as API
extension,
although even it is not entirely true.

Better design explanation can be found in lkml/netdev archives.

> Ian.
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 10:07:00

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 01:32 -0700, Andrew Morton wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > >
> > > Plus, I'm still quite unsettled about the whole object lifecycle
> > > management, refcounting and locking in there. The fact that the code is
> > > littered with peculiar barriers says "something weird is happening here",
> > > and it remains unobvious to me why such a very common kernel pattern was
> > > implemented in such an unusual manner.
> > >
> > > So. I'd like to see the whole thing reexplained and resubmitted so we can
> > > think about it all again.
> >
> > All those barriers can be replaced with atomic_dec_and_test(),
>
> What a shame you didn't say atomic_dec_and_lock()...

Oops...

> > i.e. with something that returns the value.
> > Methods that return value requires explicit barriers.
> >
> > Actually there are quite many places where we have:
> >
> > cpu0 cpu1
> > use object
> > atomic_dec()
> > if atomic_read/atomic_dec_and_test == 0
> > free object.
>
> Yes, but those places normally also use locking to prevent the obvious race.

In case of connector all free/remove/destroy places
can be accessed only after appropriate object is unlinked
and no new work [only that work remains that already incremented the
counter]
can be assigned to the object.
It was design goal [the same is true for w1, superio, acrypto]
to create such a lifecycle:
object is created
work is assigned with atomical refcnt incrementing
object is "scheduled"[1] to be removed
object is destroyed if it is not used [i.e. it's refcnt is zero].

[1]: "scheduled" means that work [removal in our case] can be done not
now, but after some condition, like refcnt zeroing.

> Yes, atomic_dec_and_test() and barrier removal would be better. Especially
> if it's associated with code commentary which explains why the whole thing
> isn't racy (ie: explains why no other CPU can look this object up).

atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
but in case of connector I think the price is not so high.

Should questionable design notes be described in in-source
documentation
or patch description is enough?

> > > Which comments were not addressed?
> >
> > CBUS code comments [I did not get ack on CBUS itself], and two below
> > issues.
>
> I continue to not see any point in cbus. It moves work from one place to
> another while increasing the amount of code and quite probably increasing
> the net amount of work too.

Ok.

Cache utilisation and ability to send events from any context
and under the locks are significant issues which are solved in CBUS,
but they are not the most important reasons.

Ability to not slow down fast pathes - that is the main reason.

Concider situation when one may want to have notification
of each new write system call - let's say without such
notification it took about 1 second to write one page from userspace,
now with notification sending, which is not so fast, it will take
1.5 seconds, but with CBUS write() still costs 1 second plus
later, when we do not care about writing performance and scheduler
decides to run CBUS thread, those notifications will take additional
0.5 seconds to be delivered.

But if one requires not delayed fact of the notification, but
almost immediate event - one can still use direct connector's methods.

So the main usage case for CBUS is sutuations when we do not want
to slow process down just to send notification about it,
instead we can create such a notification, and deliver it later.

It is design issue - queue events in fast pathes, lock contexts,
irq [soft irq] contexts and safely send them later.

It also provides better controlling mechanism over messages to be sent -
using limited number of messages to be sent in a time,
CBUS smoothes shape peaks which are created by huge amount of
notification
to be send in a time.

> IOW it looks like a net loss which happens to provide gains in one rather
> uninteresting microbenchmark.

No messages were lost during tests,
since there were no OOM condition [atomic allocation never fails]
and userspace application read it's socket queue properly,
so socket queue and thus connector was not starved by userspace.

That benchmark showed that CBUS/connector does not slow down fast
fork() path while delivering messages, and all notifications about
forks were delivered.

From your point of view it is better to complete notification work
before event itself is finished, but you definitely want to
queue block requests before they reach low-level device driver,
and queue skbs but not process them in receive IRQ handler.

Last two cases are created exactly for not slowing down fast path,
so other fast events [new interrupts] may occure and all them
[and the real work can be done] can be processed later.

> I'll gleefully admit that I'm wrong, but I don't think that has been
> demonstrated yet.
>
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 10:42:05

by Kay Sievers

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 13:52 +0400, Evgeniy Polyakov wrote:
> On Thu, 2005-04-07 at 10:12 +0100, Ian Campbell wrote:
> > On Thu, 2005-04-07 at 12:13 +0400, Evgeniy Polyakov wrote:
> > > The main idea was to simplify userspace control and notification
> > > system - so people did not waste it's time learning how skb's are
> > > allocated
> > > and processed, how socket layer is designed and what all those
> > > netlink_* and NLMSG* mean if they do not need it.
> >
> > Isn't connector built on top of netlink? If so, is there any reason for
> > it to be a new subsystem rather than an extension the the netlink API?
>
> Connector is not netlink API extension in any way.
> It uses netlink as transport layer, one can change
> cn_netlink_send()/cn_input()
> into something like bidirectional ioctl and use it.
>
> Only one cn_netlink_send() function can be "described" as API
> extension,
> although even it is not entirely true.

I see much overlap here too. Wouldn't it be nice to see the transport
part of the connector code to be implemented as a generic netlink
multicast? We already have uni- and broadcast for netlink.

Isn't the whole purpose of the connector to hook in notifications that
act only if someone is listening? That is a perfect multicast case. :)

At the time we added kobject_uevent I was missing something like this.
The broadcast groups did not really fit, and we decided not to use them,
and unicast wasn't a option here.

Thanks,
Kay

2005-04-07 11:20:01

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 12:41 +0200, Kay Sievers wrote:
> On Thu, 2005-04-07 at 13:52 +0400, Evgeniy Polyakov wrote:
> > On Thu, 2005-04-07 at 10:12 +0100, Ian Campbell wrote:
> > > On Thu, 2005-04-07 at 12:13 +0400, Evgeniy Polyakov wrote:
> > > > The main idea was to simplify userspace control and notification
> > > > system - so people did not waste it's time learning how skb's are
> > > > allocated
> > > > and processed, how socket layer is designed and what all those
> > > > netlink_* and NLMSG* mean if they do not need it.
> > >
> > > Isn't connector built on top of netlink? If so, is there any reason for
> > > it to be a new subsystem rather than an extension the the netlink API?
> >
> > Connector is not netlink API extension in any way.
> > It uses netlink as transport layer, one can change
> > cn_netlink_send()/cn_input()
> > into something like bidirectional ioctl and use it.
> >
> > Only one cn_netlink_send() function can be "described" as API
> > extension,
> > although even it is not entirely true.
>
> I see much overlap here too. Wouldn't it be nice to see the transport
> part of the connector code to be implemented as a generic netlink
> multicast? We already have uni- and broadcast for netlink.

Netlink broadcast is multicast actually,
if listener exists, then message will be sent to him,
if no - skb will be just freed.

> Isn't the whole purpose of the connector to hook in notifications that
> act only if someone is listening? That is a perfect multicast case. :)

Connector can be used to send data from userspace to kernelspace,
so it allows sending controlling messages without ioctl() compatibility
mess and so on.

One may use cn_netlink_send() to send notification without being
registered
in connector, if it's second parameter is 0, then appropriate
connector listener will be searched for.

It is different from netlink messages,
netlink is a transport layer for connector.

> At the time we added kobject_uevent I was missing something like this.
> The broadcast groups did not really fit, and we decided not to use them,
> and unicast wasn't a option here.

There is a check for listener in netlink_broadcast() - sk_for_each_bound
().

> Thanks,
> Kay
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 14:23:55

by Kay Sievers

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, Apr 07, 2005 at 03:24:34PM +0400, Evgeniy Polyakov wrote:
> On Thu, 2005-04-07 at 12:41 +0200, Kay Sievers wrote:
> > On Thu, 2005-04-07 at 13:52 +0400, Evgeniy Polyakov wrote:
> > > On Thu, 2005-04-07 at 10:12 +0100, Ian Campbell wrote:
> > > > On Thu, 2005-04-07 at 12:13 +0400, Evgeniy Polyakov wrote:
> > > > > The main idea was to simplify userspace control and notification
> > > > > system - so people did not waste it's time learning how skb's are
> > > > > allocated
> > > > > and processed, how socket layer is designed and what all those
> > > > > netlink_* and NLMSG* mean if they do not need it.
> > > >
> > > > Isn't connector built on top of netlink? If so, is there any reason for
> > > > it to be a new subsystem rather than an extension the the netlink API?
> > >
> > > Connector is not netlink API extension in any way.
> > > It uses netlink as transport layer, one can change
> > > cn_netlink_send()/cn_input()
> > > into something like bidirectional ioctl and use it.
> > >
> > > Only one cn_netlink_send() function can be "described" as API
> > > extension,
> > > although even it is not entirely true.
> >
> > I see much overlap here too. Wouldn't it be nice to see the transport
> > part of the connector code to be implemented as a generic netlink
> > multicast? We already have uni- and broadcast for netlink.
>
> Netlink broadcast is multicast actually,
> if listener exists, then message will be sent to him,
> if no - skb will be just freed.
>
> > Isn't the whole purpose of the connector to hook in notifications that
> > act only if someone is listening? That is a perfect multicast case. :)
>
> Connector can be used to send data from userspace to kernelspace,
> so it allows sending controlling messages without ioctl() compatibility
> mess and so on.
>
> One may use cn_netlink_send() to send notification without being
> registered
> in connector, if it's second parameter is 0, then appropriate
> connector listener will be searched for.

Sure, but seems I need to ask again: What is the exact reason not to implement
the muticast message multiplexing/subscription part of the connector as a
generic part of netlink? That would be nice to have and useful for other
subsystems too as an option to the current broadcast.

> It is different from netlink messages,
> netlink is a transport layer for connector.

That's still possible and the kernel usually doesn't care about unimplemented
alternatives. :)

Thanks,
Kay

2005-04-07 14:42:38

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 16:23 +0200, Kay Sievers wrote:

> Sure, but seems I need to ask again: What is the exact reason not to implement
> the muticast message multiplexing/subscription part of the connector as a
> generic part of netlink? That would be nice to have and useful for other
> subsystems too as an option to the current broadcast.

I do not understamd, what is netlink multicast and how it differs from
the
existing behaviour?
Netlink message can be delivered only if someone listens for it.

Subscription part of connector is similar to existing group mechanims,
it is moved a bit higher from do_one_broadcast(),
since it is required for proper high-layer protocol and
it's users do not care about netlink groups, but only it's
own ID's, which can be different.

> Thanks,
> Kay
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-07 15:50:41

by James Morris

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 7 Apr 2005, Kay Sievers wrote:

> Sure, but seems I need to ask again: What is the exact reason not to implement
> the muticast message multiplexing/subscription part of the connector as a
> generic part of netlink? That would be nice to have and useful for other
> subsystems too as an option to the current broadcast.

This is a good point, in general, consider generically extending Netlink
itself instead of creating these separate things.


- James
--
James Morris
<[email protected]>


2005-04-08 03:01:23

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

Evgeniy Polyakov <[email protected]> wrote:
>
> atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
> but in case of connector I think the price is not so high.

Can you list the platforms on which this is true?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 03:27:52

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 12:59 +1000, Herbert Xu wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
> > but in case of connector I think the price is not so high.
>
> Can you list the platforms on which this is true?

sparc64, some mips [at least in UP].
I do not know about others, but I think that any
arch, where atomic operations are serialized,
has sync/isync in atomic_dec_and_test()
and does not for atomic_dec().

> Thanks,
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 03:33:36

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, Apr 08, 2005 at 07:33:58AM +0400, Evgeniy Polyakov wrote:
> On Fri, 2005-04-08 at 12:59 +1000, Herbert Xu wrote:
> > Evgeniy Polyakov <[email protected]> wrote:
> > >
> > > atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
> > > but in case of connector I think the price is not so high.
> >
> > Can you list the platforms on which this is true?
>
> sparc64, some mips [at least in UP].

Are you sure? The implementations of atomic_sub and atomic_sub_return
(which correspond to atomic_dec and atomic_dec_and_test) seem to be
comparable in cost on those two architectures.

Perhaps Dave can clarify for us about sparc64?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 03:35:30

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, 2005-04-07 at 11:47 -0400, James Morris wrote:
> On Thu, 7 Apr 2005, Kay Sievers wrote:
>
> > Sure, but seems I need to ask again: What is the exact reason not to implement
> > the muticast message multiplexing/subscription part of the connector as a
> > generic part of netlink? That would be nice to have and useful for other
> > subsystems too as an option to the current broadcast.
>
> This is a good point, in general, consider generically extending Netlink
> itself instead of creating these separate things.

I just do not understand, what does netlink multicasting means
and how it is different from what we have now.
Currently we have group registratin in bind(),
and then send data to the bound socket if it
has appropriate group.

Or should some error be propagated to the caller,
if there is no appropriate listener?

Connector requires it's own registration technique for
1. hide all transport [netlink] layer from higher protocols which use
connector
2. create different group appointment for the given connector's ID
[it was different, now new group which is eqal to idx field is appointed
to
the new callback]
3. provide more generic set of ids

>
> - James
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 03:46:42

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 13:32 +1000, Herbert Xu wrote:
> On Fri, Apr 08, 2005 at 07:33:58AM +0400, Evgeniy Polyakov wrote:
> > On Fri, 2005-04-08 at 12:59 +1000, Herbert Xu wrote:
> > > Evgeniy Polyakov <[email protected]> wrote:
> > > >
> > > > atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
> > > > but in case of connector I think the price is not so high.
> > >
> > > Can you list the platforms on which this is true?
> >
> > sparc64, some mips [at least in UP].
>
> Are you sure? The implementations of atomic_sub and atomic_sub_return
> (which correspond to atomic_dec and atomic_dec_and_test) seem to be
> comparable in cost on those two architectures.

mips has additional sync.
sparc64 has 32->64 conversation on exit.

> Perhaps Dave can clarify for us about sparc64?

Dave?

> Cheers,
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 03:51:49

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, Apr 08, 2005 at 07:52:34AM +0400, Evgeniy Polyakov wrote:
> On Fri, 2005-04-08 at 13:32 +1000, Herbert Xu wrote:
> > On Fri, Apr 08, 2005 at 07:33:58AM +0400, Evgeniy Polyakov wrote:
> > > On Fri, 2005-04-08 at 12:59 +1000, Herbert Xu wrote:
> > > > Evgeniy Polyakov <[email protected]> wrote:
> > > > >
> > > > > atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
> > > > > but in case of connector I think the price is not so high.
> > > >
> > > > Can you list the platforms on which this is true?
> > >
> > > sparc64, some mips [at least in UP].
> >
> > Are you sure? The implementations of atomic_sub and atomic_sub_return
> > (which correspond to atomic_dec and atomic_dec_and_test) seem to be
> > comparable in cost on those two architectures.
>
> mips has additional sync.

But atomic_dec + 2 barries is going to do the sync as well, no?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 04:00:11

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 13:50 +1000, Herbert Xu wrote:
> On Fri, Apr 08, 2005 at 07:52:34AM +0400, Evgeniy Polyakov wrote:
> > On Fri, 2005-04-08 at 13:32 +1000, Herbert Xu wrote:
> > > On Fri, Apr 08, 2005 at 07:33:58AM +0400, Evgeniy Polyakov wrote:
> > > > On Fri, 2005-04-08 at 12:59 +1000, Herbert Xu wrote:
> > > > > Evgeniy Polyakov <[email protected]> wrote:
> > > > > >
> > > > > > atomic_dec_and_test() is more expensive than 2 barriers + atomic_dec(),
> > > > > > but in case of connector I think the price is not so high.
> > > > >
> > > > > Can you list the platforms on which this is true?
> > > >
> > > > sparc64, some mips [at least in UP].
> > >
> > > Are you sure? The implementations of atomic_sub and atomic_sub_return
> > > (which correspond to atomic_dec and atomic_dec_and_test) seem to be
> > > comparable in cost on those two architectures.
> >
> > mips has additional sync.
>
> But atomic_dec + 2 barries is going to do the sync as well, no?

On UP do not.

> Cheers,
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 04:03:35

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, Apr 08, 2005 at 08:02:49AM +0400, Evgeniy Polyakov wrote:
>
> > > mips has additional sync.
> >
> > But atomic_dec + 2 barries is going to do the sync as well, no?
>
> On UP do not.

Shouldn't we should be fixing the MIPS implementation of
atomic_sub_return to not do the sync on UP then?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 04:15:39

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 14:02 +1000, Herbert Xu wrote:
> On Fri, Apr 08, 2005 at 08:02:49AM +0400, Evgeniy Polyakov wrote:
> >
> > > > mips has additional sync.
> > >
> > > But atomic_dec + 2 barries is going to do the sync as well, no?
> >
> > On UP do not.
>
> Shouldn't we should be fixing the MIPS implementation of
> atomic_sub_return to not do the sync on UP then?

Unfortunately not, that sync is required exactly for return value store.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 04:19:07

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, Apr 08, 2005 at 08:21:28AM +0400, Evgeniy Polyakov wrote:
>
> > > On UP do not.
> >
> > Shouldn't we should be fixing the MIPS implementation of
> > atomic_sub_return to not do the sync on UP then?
>
> Unfortunately not, that sync is required exactly for return value store.

On UP?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 04:25:26

by David Miller

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 8 Apr 2005 14:17:24 +1000
Herbert Xu <[email protected]> wrote:

> On UP?

I think the barrier can be eliminated on MIPS on UP.

2005-04-08 04:25:08

by David Miller

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 08 Apr 2005 07:52:34 +0400
Evgeniy Polyakov <[email protected]> wrote:

> sparc64 has 32->64 conversation on exit.

It's extremely cheap, the conversion instruction
pairs with the retl instruction so it's essentially
free.

Talking about an arithmetic instruction over is complete
nonsense when the atomic CAS instruction itself takes a minimum
of 32 processor cycles :-)

2005-04-08 04:49:51

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 14:17 +1000, Herbert Xu wrote:
> On Fri, Apr 08, 2005 at 08:21:28AM +0400, Evgeniy Polyakov wrote:
> >
> > > > On UP do not.
> > >
> > > Shouldn't we should be fixing the MIPS implementation of
> > > atomic_sub_return to not do the sync on UP then?
> >
> > Unfortunately not, that sync is required exactly for return value store.
>
> On UP?

Yes, some quotes:

The memory access type of a location affects the behavior of I-fetch,
load, store,
and prefetch operations to the location. In addition, memory access
types affect
some instruction descriptions. Load linked (LL, LLD) and store
conditional (SC,
SCD) have defined operation only for locations with cached memory access
type.
SYNC affects only load and stores made to locations with uncached or
cached
coherent memory access types.

3. The SC [comment: store conditional]
stores a new value into the memory word, unless the new value has
been modified. If the word has not been modified, the store succeeds and
a 1
is stored in the destination register. Otherwise the Store Conditional
fails,
memory is not modified, and a 0 is loaded into the destination register.
Since
the instruction format has only a single field to select a data register
(rt), this
destination register is the same as the register which was stored.
Load Linked and Store Conditional instructions (LL, LLD, SC, and SCD) do
not
implicitly perform SYNC operations in the R10000 processor.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 04:54:04

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, Apr 08, 2005 at 08:55:27AM +0400, Evgeniy Polyakov wrote:
>
> > > Unfortunately not, that sync is required exactly for return value store.
> >
> > On UP?
>
> Yes, some quotes:

Yes but what will go wrong on uni-processor MIPS when you don't do the
sync in atomic_sub_return?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 04:57:45

by David Miller

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 8 Apr 2005 14:53:02 +1000
Herbert Xu <[email protected]> wrote:

> Yes but what will go wrong on uni-processor MIPS when you don't do the
> sync in atomic_sub_return?

Indeed. I see nothing in those quotes which indicate that the
SYNC is needed on uniprocessor. It's only saying things such
as "SYNC only affects load and store instructions" and
"LL/SC can only be performed on cacheable memory". Stuff
like that.

2005-04-08 05:05:54

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 14:53 +1000, Herbert Xu wrote:
> On Fri, Apr 08, 2005 at 08:55:27AM +0400, Evgeniy Polyakov wrote:
> >
> > > > Unfortunately not, that sync is required exactly for return value store.
> > >
> > > On UP?
> >
> > Yes, some quotes:
>
> Yes but what will go wrong on uni-processor MIPS when you don't do the
> sync in atomic_sub_return?

Sync synchornizes cached mamory access,
without it new value may be stored only into cache,
but not into memory.

> Cheers,
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 05:09:17

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, Apr 08, 2005 at 09:11:56AM +0400, Evgeniy Polyakov wrote:
>
> > Yes but what will go wrong on uni-processor MIPS when you don't do the
> > sync in atomic_sub_return?
>
> Sync synchornizes cached mamory access,
> without it new value may be stored only into cache,
> but not into memory.

I know, the same thing holds for most architectures, including i386.
However, this is not an issue for uni-processor kernels anywhere else,
so what's so special about MIPS?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-08 05:13:29

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 15:08 +1000, Herbert Xu wrote:
> On Fri, Apr 08, 2005 at 09:11:56AM +0400, Evgeniy Polyakov wrote:
> >
> > > Yes but what will go wrong on uni-processor MIPS when you don't do the
> > > sync in atomic_sub_return?
> >
> > Sync synchornizes cached mamory access,
> > without it new value may be stored only into cache,
> > but not into memory.
>
> I know, the same thing holds for most architectures, including i386.
> However, this is not an issue for uni-processor kernels anywhere else,
> so what's so special about MIPS?

Does i386 or ppc has cached and uncached memory?
No, i386, ppc and others do not require sync on uncached memory access,
and only instruction not data cache sync on SMP.

> Cheers,
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 05:56:04

by James Morris

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 8 Apr 2005, Evgeniy Polyakov wrote:

> > > Sure, but seems I need to ask again: What is the exact reason not to implement
> > > the muticast message multiplexing/subscription part of the connector as a
> > > generic part of netlink? That would be nice to have and useful for other
> > > subsystems too as an option to the current broadcast.
> >
> > This is a good point, in general, consider generically extending Netlink
> > itself instead of creating these separate things.
>

> Connector requires it's own registration technique for
> 1. hide all transport [netlink] layer from higher protocols which use
> connector

Why?

> 2. create different group appointment for the given connector's ID
> [it was different, now new group which is eqal to idx field is appointed
> to
> the new callback]

I don't understand.

> 3. provide more generic set of ids

What do you mean by "ids"?


- James
--
James Morris
<[email protected]>


2005-04-08 06:05:55

by David Miller

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 08 Apr 2005 09:19:39 +0400
Evgeniy Polyakov <[email protected]> wrote:

> > I know, the same thing holds for most architectures, including i386.
> > However, this is not an issue for uni-processor kernels anywhere else,
> > so what's so special about MIPS?
>
> Does i386 or ppc has cached and uncached memory?

Yes, they do.

> No, i386, ppc and others do not require sync on uncached memory access,
> and only instruction not data cache sync on SMP.

On MIPS, all the MIPS atomic operations will operate on cached memory.
And as far as a uniprocessor cpu is concerned, updating the cache is
all that matters.

In fact, this SYNC instruction seems unnecessary even on SMP. If the
cache is updated, it is part of the coherent memory space and thus
MOESI main bus SMP cache coherency transactions will see the update
value. When another processor does a "read-to-share" or "read-to-own"
request on the main bus, the processor which did the atomic OP will
provide the correct data from it's cache in response to that transaction.

So what you have to do is show me an example where the MIPS kernel can
do an atomic.h operation on uncached memory. I even think that is
invalid, come to think of it.

2005-04-08 06:06:43

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 09:19 +0400, Evgeniy Polyakov wrote:
> On Fri, 2005-04-08 at 15:08 +1000, Herbert Xu wrote:
> > On Fri, Apr 08, 2005 at 09:11:56AM +0400, Evgeniy Polyakov wrote:
> > >
> > > > Yes but what will go wrong on uni-processor MIPS when you don't do the
> > > > sync in atomic_sub_return?
> > >
> > > Sync synchornizes cached mamory access,
> > > without it new value may be stored only into cache,
> > > but not into memory.
> >
> > I know, the same thing holds for most architectures, including i386.
> > However, this is not an issue for uni-processor kernels anywhere else,
> > so what's so special about MIPS?
>
> Does i386 or ppc has cached and uncached memory?
> No, i386, ppc and others do not require sync on uncached memory access,
> and only instruction not data cache sync on SMP.

Ugh, now I see your point :)
For UP we may have some nitpics with DMA,
but I doubt anyone will use atomic pointer for DMA.
sync will not be an issue in atomic ops.

> > Cheers,
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 06:43:59

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Fri, 2005-04-08 at 01:55 -0400, James Morris wrote:
> On Fri, 8 Apr 2005, Evgeniy Polyakov wrote:
>
> > > > Sure, but seems I need to ask again: What is the exact reason not to implement
> > > > the muticast message multiplexing/subscription part of the connector as a
> > > > generic part of netlink? That would be nice to have and useful for other
> > > > subsystems too as an option to the current broadcast.
> > >
> > > This is a good point, in general, consider generically extending Netlink
> > > itself instead of creating these separate things.
> >
>
> > Connector requires it's own registration technique for
> > 1. hide all transport [netlink] layer from higher protocols which use
> > connector
>
> Why?

User should not know about low-level transport -
it is like socket layer - write only data and do not care about
how it will be delivered.

> > 2. create different group appointment for the given connector's ID
> > [it was different, now new group which is eqal to idx field is appointed
> > to
> > the new callback]
>
> I don't understand.

In the previous versions netlink group was assigned as incremented
counter,
that was not convenient, but now we have 2-way ID, which is better
from users point of view - idx is supposed to be major id, val -
some subsystem of that set.

> > 3. provide more generic set of ids
>
> What do you mean by "ids"?

Each connector message requires pair of u32 ids - idx and val.
Idx is generic system id [which is equal to the appropriate netlink
group
in current implementation], while val is id of some subsytem
inside idx.

Using only group without it's own connector's id will heavily
complex callback register/unregister notification [Jamal's suggested
feature]
for example.

>
> - James
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2005-04-08 13:15:38

by Daniel Jacobowitz

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Thu, Apr 07, 2005 at 11:02:22PM -0700, David S. Miller wrote:
> On Fri, 08 Apr 2005 09:19:39 +0400
> Evgeniy Polyakov <[email protected]> wrote:
>
> > > I know, the same thing holds for most architectures, including i386.
> > > However, this is not an issue for uni-processor kernels anywhere else,
> > > so what's so special about MIPS?
> >
> > Does i386 or ppc has cached and uncached memory?
>
> Yes, they do.
>
> > No, i386, ppc and others do not require sync on uncached memory access,
> > and only instruction not data cache sync on SMP.
>
> On MIPS, all the MIPS atomic operations will operate on cached memory.
> And as far as a uniprocessor cpu is concerned, updating the cache is
> all that matters.
>
> In fact, this SYNC instruction seems unnecessary even on SMP. If the
> cache is updated, it is part of the coherent memory space and thus
> MOESI main bus SMP cache coherency transactions will see the update
> value. When another processor does a "read-to-share" or "read-to-own"
> request on the main bus, the processor which did the atomic OP will
> provide the correct data from it's cache in response to that transaction.
>
> So what you have to do is show me an example where the MIPS kernel can
> do an atomic.h operation on uncached memory. I even think that is
> invalid, come to think of it.

It better be...

My impression is that the MIPS story isn't so simple, because the
architecture only offers very weak coherency guarantees. Most of the
SMP implementations offer strong coherency in practice, but at least
one (RM9000) doesn't.

--
Daniel Jacobowitz
CodeSourcery, LLC

2005-04-10 09:56:04

by Herbert Xu

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

Please add netdev to the CC list since this discussion pertains to
the networking subsystem.

Evgeniy Polyakov <[email protected]> wrote:
>
> User should not know about low-level transport -
> it is like socket layer - write only data and do not care about
> how it will be delivered.

The delineation between transport and upper layer is fuzzy.
In one situation the protocol might be transport and in another
it could be above the transport.

So I don't buy this argument.

> In the previous versions netlink group was assigned as incremented
> counter,
> that was not convenient, but now we have 2-way ID, which is better
> from users point of view - idx is supposed to be major id, val -
> some subsystem of that set.

Actually netlink does let you bind to a specific ID.

Of course you may argue that a single u32 is not enough. However,
nothing is stopping you from introducing netlink v2 that extends
this.

In fact this is my main gripe with your patch: Why aren't you
extending netlink instead of hacking something on top of the existing
netlink? If the extensions require breaking compatibility: Fine,
you just need to call it netlink v2.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-04-10 10:33:46

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, 10 Apr 2005 19:52:54 +1000
Herbert Xu <[email protected]> wrote:

> Please add netdev to the CC list since this discussion pertains to
> the networking subsystem.
>
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > User should not know about low-level transport -
> > it is like socket layer - write only data and do not care about
> > how it will be delivered.
>
> The delineation between transport and upper layer is fuzzy.
> In one situation the protocol might be transport and in another
> it could be above the transport.
>
> So I don't buy this argument.

Connector allows to hide transport layer completely -
concider acrypto or superio - that subsystems do not know
about network, author of the temperature driver for superio
should not know how skb is allocated and processed,
and acrypto is not network related system, so why
they should know about network API?
Why should they know what is skb, how it is allocated,
why shared skb can not be used in netlink and so on?
Users of the connector are only cared about
destination ID and how to send/receive message.
And what to do if we do not want something like connector
to be used for userspace control - we need to create
each time new netlink socket unit, like kobject's one,
or netlink_ulog, we need to register it in netlink.h,
where already there too many units.

> > In the previous versions netlink group was assigned as incremented
> > counter,
> > that was not convenient, but now we have 2-way ID, which is better
> > from users point of view - idx is supposed to be major id, val -
> > some subsystem of that set.
>
> Actually netlink does let you bind to a specific ID.
>
> Of course you may argue that a single u32 is not enough. However,
> nothing is stopping you from introducing netlink v2 that extends
> this.
>
> In fact this is my main gripe with your patch: Why aren't you
> extending netlink instead of hacking something on top of the existing
> netlink? If the extensions require breaking compatibility: Fine,
> you just need to call it netlink v2.

When connector was created in the middle of 2004, it's main aim
was allowing easy userspace control over netlink.
Creating it's own skb receive parser was acceptible for
one project, for two, but it is definitely not the solution
for general use. And also I think it is not so easy to include new
netlink family unit for some completely unrelated to network subsystem.

So I decided to create only one skb parser, and put all skb processing
in one place, so any user has to do only registration with it's
own receive callback, and sending function.

Thus, transport layer was completely hidden from connector users,
there are no complex netlink macros there, no network API
and all complex related issues, no huge code duplication in each device,
which does not want to mess with 32/64 ioctl issues, but
want easy userspace control.

Later connector was changed a bit to allow easy notification
ability and new bus was added.
Connector is the solution for d-bus related projects, since
all control is in one place, there are destination ID,
there is no complex API.

Concider the latest xfrm event changes - good that we already
have netlink socket there, but in each sending function
[there are at least three new]
we have all those skb_alloc, skb_put, NLMSG_* and so on which are
absolutely identical.
According to names and structures itself, they are too close
to what connector is for, and how it is implemented, so we already
have several connectors in the tree, and do we really want
to proceed with it all over the place?


> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

2005-04-10 11:08:53

by Kay Sievers

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, 2005-04-10 at 14:32 +0400, Evgeniy Polyakov wrote:
> On Sun, 10 Apr 2005 19:52:54 +1000
> Herbert Xu <[email protected]> wrote:
> > Evgeniy Polyakov <[email protected]> wrote:
> > >
> > > User should not know about low-level transport -
> > > it is like socket layer - write only data and do not care about
> > > how it will be delivered.
> >
> > The delineation between transport and upper layer is fuzzy.
> > In one situation the protocol might be transport and in another
> > it could be above the transport.
> >
> > So I don't buy this argument.
>
> Connector allows to hide transport layer completely -
> concider acrypto or superio - that subsystems do not know
> about network, author of the temperature driver for superio
> should not know how skb is allocated and processed,
> and acrypto is not network related system, so why
> they should know about network API?
> Why should they know what is skb, how it is allocated,
> why shared skb can not be used in netlink and so on?
> Users of the connector are only cared about
> destination ID and how to send/receive message.
> And what to do if we do not want something like connector
> to be used for userspace control - we need to create
> each time new netlink socket unit, like kobject's one,
> or netlink_ulog, we need to register it in netlink.h,
> where already there too many units.
>
> > > In the previous versions netlink group was assigned as incremented
> > > counter,
> > > that was not convenient, but now we have 2-way ID, which is better
> > > from users point of view - idx is supposed to be major id, val -
> > > some subsystem of that set.
> >
> > Actually netlink does let you bind to a specific ID.
> >
> > Of course you may argue that a single u32 is not enough. However,
> > nothing is stopping you from introducing netlink v2 that extends
> > this.
> >
> > In fact this is my main gripe with your patch: Why aren't you
> > extending netlink instead of hacking something on top of the existing
> > netlink? If the extensions require breaking compatibility: Fine,
> > you just need to call it netlink v2.
>
> When connector was created in the middle of 2004, it's main aim
> was allowing easy userspace control over netlink.
> Creating it's own skb receive parser was acceptible for
> one project, for two, but it is definitely not the solution
> for general use. And also I think it is not so easy to include new
> netlink family unit for some completely unrelated to network subsystem.
>
> So I decided to create only one skb parser, and put all skb processing
> in one place, so any user has to do only registration with it's
> own receive callback, and sending function.

Sure, but that would apply the a generic netlink extension too, right?

> Thus, transport layer was completely hidden from connector users,
> there are no complex netlink macros there, no network API
> and all complex related issues, no huge code duplication in each device,
> which does not want to mess with 32/64 ioctl issues, but
> want easy userspace control.

I don't see the need for unimplemented abstractions here. You can
replace a generic netlink-use just the same way as you can replace the
connector's own netlink code below the connector API. There is not much
difference.

> Later connector was changed a bit to allow easy notification
> ability and new bus was added.
> Connector is the solution for d-bus related projects, since
> all control is in one place, there are destination ID,
> there is no complex API.

Sorry, what does this have to do with DBUS?

> Concider the latest xfrm event changes - good that we already
> have netlink socket there, but in each sending function
> [there are at least three new]
> we have all those skb_alloc, skb_put, NLMSG_* and so on which are
> absolutely identical.
> According to names and structures itself, they are too close
> to what connector is for, and how it is implemented, so we already
> have several connectors in the tree, and do we really want
> to proceed with it all over the place?

That's not the point. Nobody asks to replace the whole connector by raw
netlink. But the message subscription and multicasting could be part of
the generic netlink framework. The connector API would still be on-top
if it and nothing changes besides that we would have a nice low-level
api to use for other systems too.
The basic idea behind the connector as a nice generic framework for
netlink, as it fills the missing multicast case, while we already can do
singlecast and broadcast with netlink.
It is just the same way the kernel plays with other core functionality
too. If something is not represented in the vfs-layer your are asked to
add it to the generic part and not implement it privately for your
filesystem only.

Thanks,
Kay

2005-04-10 11:39:20

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, 10 Apr 2005 13:08:44 +0200
Kay Sievers <[email protected]> wrote:

> On Sun, 2005-04-10 at 14:32 +0400, Evgeniy Polyakov wrote:
> > On Sun, 10 Apr 2005 19:52:54 +1000
> > Herbert Xu <[email protected]> wrote:
> > > Evgeniy Polyakov <[email protected]> wrote:
> > > >
> > > > User should not know about low-level transport -
> > > > it is like socket layer - write only data and do not care about
> > > > how it will be delivered.
> > >
> > > The delineation between transport and upper layer is fuzzy.
> > > In one situation the protocol might be transport and in another
> > > it could be above the transport.
> > >
> > > So I don't buy this argument.
> >
> > Connector allows to hide transport layer completely -
> > concider acrypto or superio - that subsystems do not know
> > about network, author of the temperature driver for superio
> > should not know how skb is allocated and processed,
> > and acrypto is not network related system, so why
> > they should know about network API?
> > Why should they know what is skb, how it is allocated,
> > why shared skb can not be used in netlink and so on?
> > Users of the connector are only cared about
> > destination ID and how to send/receive message.
> > And what to do if we do not want something like connector
> > to be used for userspace control - we need to create
> > each time new netlink socket unit, like kobject's one,
> > or netlink_ulog, we need to register it in netlink.h,
> > where already there too many units.
> >
> > > > In the previous versions netlink group was assigned as incremented
> > > > counter,
> > > > that was not convenient, but now we have 2-way ID, which is better
> > > > from users point of view - idx is supposed to be major id, val -
> > > > some subsystem of that set.
> > >
> > > Actually netlink does let you bind to a specific ID.
> > >
> > > Of course you may argue that a single u32 is not enough. However,
> > > nothing is stopping you from introducing netlink v2 that extends
> > > this.
> > >
> > > In fact this is my main gripe with your patch: Why aren't you
> > > extending netlink instead of hacking something on top of the existing
> > > netlink? If the extensions require breaking compatibility: Fine,
> > > you just need to call it netlink v2.
> >
> > When connector was created in the middle of 2004, it's main aim
> > was allowing easy userspace control over netlink.
> > Creating it's own skb receive parser was acceptible for
> > one project, for two, but it is definitely not the solution
> > for general use. And also I think it is not so easy to include new
> > netlink family unit for some completely unrelated to network subsystem.
> >
> > So I decided to create only one skb parser, and put all skb processing
> > in one place, so any user has to do only registration with it's
> > own receive callback, and sending function.
>
> Sure, but that would apply the a generic netlink extension too, right?

Sending is the only and not the most significant part of the connector.
Of course cn_netlink_send() can be split into prepare_send() and commit_send(),
where commit_send() will live in af_netlink.c and do
allocation and so on.
But it will not change the fact, that kernel users still need
to allocate a netlink socket, register it's own family unit, and so on...

Direct netlink usage is like using raw socket or even tun/tap device for TCP/IP,
noone use it in that way, although we can, but connector is like
using socket interface.

> > Thus, transport layer was completely hidden from connector users,
> > there are no complex netlink macros there, no network API
> > and all complex related issues, no huge code duplication in each device,
> > which does not want to mess with 32/64 ioctl issues, but
> > want easy userspace control.
>
> I don't see the need for unimplemented abstractions here. You can
> replace a generic netlink-use just the same way as you can replace the
> connector's own netlink code below the connector API. There is not much
> difference.

Kernel users do not want to implement it's own transport over netlink.
Socket allocation was changed in 2.5 - we could need to change
each driver that use it instead of changing only one place in connector.c
Abstration over netlink already implemented in connector and is used
in acrypto, superio and even kobject_uevent was changed to do it.

> > Later connector was changed a bit to allow easy notification
> > ability and new bus was added.
> > Connector is the solution for d-bus related projects, since
> > all control is in one place, there are destination ID,
> > there is no complex API.
>
> Sorry, what does this have to do with DBUS?

Kernel now has only two ways to inform to userspace about it's things -
kobject [uevent] and hotplug. There is inotify/dnotify too, but it is
diferent in some way.

The second one is a huge monster that can not be used in embedded
systems, calling userspace process from inside the kernel is
now very flexible way.
The first one has it's own socket, it's own protocol and infrastructure
based on strings.

What if we want to inform about some new event?
Should we use kobject_uevent mechanism? Not anything can be
described as kobject strings.
Should we create new socket/proto/infratructure? We do not
have another way.

That is why connector is usefull here - only one set of
methods, known proto, just change "val" in ID and
drop it in the userspace if it is not updated to use
new events.


> > Concider the latest xfrm event changes - good that we already
> > have netlink socket there, but in each sending function
> > [there are at least three new]
> > we have all those skb_alloc, skb_put, NLMSG_* and so on which are
> > absolutely identical.
> > According to names and structures itself, they are too close
> > to what connector is for, and how it is implemented, so we already
> > have several connectors in the tree, and do we really want
> > to proceed with it all over the place?
>
> That's not the point. Nobody asks to replace the whole connector by raw
> netlink. But the message subscription and multicasting could be part of
> the generic netlink framework. The connector API would still be on-top
> if it and nothing changes besides that we would have a nice low-level
> api to use for other systems too.

Netlink already has it's groups - why doesn't it meet your needs?
If someone "subscribed" [bound to that group], so message
sent to it will be delivered, otherwise - not.

One thing may be added to raw netlink - ability to send
to the group but not in a "that" multicast way,
i.e. send message to all subscribed _exactly_ to that group,
so one bound to -1 group will not receive message to
0x123 group when that group was specified in netlink header.

I can prepare a patch it is something like following:
--- ./net/netlink/af_netlink.c.orig 2005-04-10 15:46:48.000000000 +0400
+++ ./net/netlink/af_netlink.c 2005-04-10 15:47:04.000000000 +0400
@@ -747,7 +747,7 @@
if (p->exclude_sk == sk)
goto out;

- if (nlk->pid == p->pid || !(nlk->groups & p->group))
+ if (nlk->pid == p->pid || (nlk->groups != p->group))
goto out;

if (p->failure) {

and requires new name...

> The basic idea behind the connector as a nice generic framework for
> netlink, as it fills the missing multicast case, while we already can do
> singlecast and broadcast with netlink.
> It is just the same way the kernel plays with other core functionality
> too. If something is not represented in the vfs-layer your are asked to
> add it to the generic part and not implement it privately for your
> filesystem only.

Exactly for that reason connector was created - so noone
needs to create it's private sockets/netlink API wrappers/skb handlers
and so on, just register callback and have a ->send() call.

> Thanks,
> Kay


Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

2005-04-10 11:55:46

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, 10 Apr 2005 15:37:57 +0400
Evgeniy Polyakov <[email protected]> wrote:

> The second one is a huge monster that can not be used in embedded
> systems, calling userspace process from inside the kernel is
> now very flexible way.

is NOT very flexible way...


Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

2005-04-10 12:09:55

by Thomas Graf

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

* Evgeniy Polyakov <[email protected]> 2005-04-10 15:37
> --- ./net/netlink/af_netlink.c.orig 2005-04-10 15:46:48.000000000 +0400
> +++ ./net/netlink/af_netlink.c 2005-04-10 15:47:04.000000000 +0400
> @@ -747,7 +747,7 @@
> if (p->exclude_sk == sk)
> goto out;
>
> - if (nlk->pid == p->pid || !(nlk->groups & p->group))
> + if (nlk->pid == p->pid || (nlk->groups != p->group))
> goto out;
>
> if (p->failure) {

Not valid, would break RTMGRP_*.

2005-04-10 12:16:41

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, 10 Apr 2005 14:10:05 +0200
Thomas Graf <[email protected]> wrote:

> * Evgeniy Polyakov <[email protected]> 2005-04-10 15:37
> > --- ./net/netlink/af_netlink.c.orig 2005-04-10 15:46:48.000000000 +0400
> > +++ ./net/netlink/af_netlink.c 2005-04-10 15:47:04.000000000 +0400
> > @@ -747,7 +747,7 @@
> > if (p->exclude_sk == sk)
> > goto out;
> >
> > - if (nlk->pid == p->pid || !(nlk->groups & p->group))
> > + if (nlk->pid == p->pid || (nlk->groups != p->group))
> > goto out;
> >
> > if (p->failure) {
>
> Not valid, would break RTMGRP_*.

Yes, it will break too many existing application,
it would be new API, like do_one_broadcast_direct().
If I understand Kay right, it is what he needs
for the new multicast.

Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

2005-04-10 14:39:35

by jamal

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

Evgeniy,

Please crosspost on netdev - you should know that by now;->

I actually disagreee with Herbert on this. Theres definetely good
need to have a more usable messaging system that rides on top of
netlink. It is not that netlink cant be extended (I actually think thats
a separate topic) - its just that its usability curve is too high.
Thats what the original motivation for konnector was. To make it easy
for joe dumbass. And i think if konnector sticks to just doing that it
will end up being valuable.

I do think (and ive said this before) that Evgeniy is pushing it
by going beyond this simple agenda/focus. Unfortunately, I actually dont
think he is listening _at all_ on this specific issue.

Evgeniy, just stick to the original focus and if it is accepted and
understood then lets move on to adding new features. Otherwise the
result of you adding yet one more feature for CBUS or whatever is
clearly to question what the original motivation was. And i dont think
you are able to add any other points to justify the existence of any new
konnector feature other than describe the original goals. At least thats
what i saw reading this thread.
Otherwise if you really dont know what you want yet lets just pull this
whole thing out IMO.

cheers,
jamal

2005-04-10 14:57:15

by James Morris

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On 10 Apr 2005, jamal wrote:

> Thats what the original motivation for konnector was. To make it easy
> for joe dumbass.

Who you really want writing kernel code :-)


- James
--
James Morris
<[email protected]>


2005-04-10 15:09:18

by jamal

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, 2005-04-10 at 10:56, James Morris wrote:
> On 10 Apr 2005, jamal wrote:
>
> > Thats what the original motivation for konnector was. To make it easy
> > for joe dumbass.
>
> Who you really want writing kernel code :-)

Ok, let me take that back then ;->
The value is in allowing people who are kernel hackers that are trying
to focus on an entirely different problem to not really know much about
the internals of the messaging subsystem (if you wanna call netlink
that). A good example will be the fork patches which were posted
recently.

cheers,
jamal

2005-04-10 19:27:42

by Thomas Graf

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

* jamal <[email protected]> 2005-04-10 10:39
> Please crosspost on netdev - you should know that by now;->
>
> I actually disagreee with Herbert on this. Theres definetely good
> need to have a more usable messaging system that rides on top of
> netlink. It is not that netlink cant be extended (I actually think thats
> a separate topic)

I find it quite easy already but I guess a few macros would improve
it even more. The routing attribute macros could be made generic to
so can benefit from the advanages of TLVs.

Evgeniy, Sorry for not having time earlier to give your patch a
review. I'm not yet through completely and won't comment on the
overall architecture until I have understood it all.

diff -Nru /tmp/empty/cn_queue.c linux-2.6/drivers/connector/cn_queue.c
--- /tmp/empty/cn_queue.c 1970-01-01 03:00:00.000000000 +0300
+++ linux-2.6/drivers/connector/cn_queue.c 2004-09-24 00:01:00.000000000
+int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
+{
+ struct cn_callback_entry *cbq, *n, *__cbq;
+ int found = 0;
+
+ cbq = cn_queue_alloc_callback_entry(cb);
+ if (!cbq)
+ return -ENOMEM;
+
+ atomic_inc(&dev->refcnt);
+ cbq->pdev = dev;
+
+ spin_lock(&dev->queue_lock);
+ list_for_each_entry_safe(__cbq, n, &dev->queue_list, callback_entry) {

Why _safe? There is no way a entry can be removed here.

+ if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
+ found = 1;
+ break;
+ }
+ }
diff -Nru /tmp/empty/connector.c linux-2.6/drivers/connector/connector.c
--- /tmp/empty/connector.c 1970-01-01 03:00:00.000000000 +0300
+++ linux-2.6/drivers/connector/connector.c 2004-09-24 00:01:00.000000000
+void cn_netlink_send(struct cn_msg *msg, u32 __groups)
+{
+ struct cn_callback_entry *n, *__cbq;
+ unsigned int size;
+ struct sk_buff *skb;
+ struct nlmsghdr *nlh;
+ struct cn_msg *data;
+ struct cn_dev *dev = &cdev;
+ u32 groups = 0;
+ int found = 0;
+
+ if (!__groups)
+ {
+ spin_lock(&dev->cbdev->queue_lock);
+ list_for_each_entry_safe(__cbq, n, &dev->cbdev->queue_list, callback_entry) {

Same here

+ if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
+ found = 1;
+ groups = __cbq->group;
+ }
+ }
+ spin_unlock(&dev->cbdev->queue_lock);
+
+ if (!found) {
+ printk(KERN_ERR "Failed to find multicast netlink group for callback[0x%x.0x%x]. seq=%u\n",
+ msg->id.idx, msg->id.val, msg->seq);
+ return;
+ }
+ }
+ else
+ groups = __groups;
+
+ size = NLMSG_SPACE(sizeof(*msg) + msg->len);
+
+ skb = alloc_skb(size, GFP_ATOMIC);
+ if (!skb) {
+ printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
+ return;
+ }
+
+ nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));

This is not correct, what happens is:
size = NLMSG_SPACE(sizeof(*msg) + msg->len);
--> align(hdr)+align(data)
size - sizeof(*nlh)
--> (align(hdr)-hdr)+align(data)
NLMSG_PUT pads again to get to the end of the data block (NLMSG_LENGTH)
--> align(hdr)+(align(hdr)-hdr)+align(data)

At the moment align(hdr) == hdr since nlmsghdr is already aligned
but this might change and your code will break.

2005-04-11 05:23:36

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Sun, Apr 10, 2005 at 09:27:27PM +0200, Thomas Graf ([email protected]) wrote:
> * jamal <[email protected]> 2005-04-10 10:39
> > Please crosspost on netdev - you should know that by now;->
> >
> > I actually disagreee with Herbert on this. Theres definetely good
> > need to have a more usable messaging system that rides on top of
> > netlink. It is not that netlink cant be extended (I actually think thats
> > a separate topic)
>
> I find it quite easy already but I guess a few macros would improve
> it even more. The routing attribute macros could be made generic to
> so can benefit from the advanages of TLVs.
>
> Evgeniy, Sorry for not having time earlier to give your patch a
> review. I'm not yet through completely and won't comment on the
> overall architecture until I have understood it all.
>
> diff -Nru /tmp/empty/cn_queue.c linux-2.6/drivers/connector/cn_queue.c
> --- /tmp/empty/cn_queue.c 1970-01-01 03:00:00.000000000 +0300
> +++ linux-2.6/drivers/connector/cn_queue.c 2004-09-24 00:01:00.000000000
> +int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
> +{
> + struct cn_callback_entry *cbq, *n, *__cbq;
> + int found = 0;
> +
> + cbq = cn_queue_alloc_callback_entry(cb);
> + if (!cbq)
> + return -ENOMEM;
> +
> + atomic_inc(&dev->refcnt);
> + cbq->pdev = dev;
> +
> + spin_lock(&dev->queue_lock);
> + list_for_each_entry_safe(__cbq, n, &dev->queue_list, callback_entry) {
>
> Why _safe? There is no way a entry can be removed here.

No particular reason - it easier to copy-paste it from other line :)

> + if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
> + found = 1;
> + break;
> + }
> + }
> diff -Nru /tmp/empty/connector.c linux-2.6/drivers/connector/connector.c
> --- /tmp/empty/connector.c 1970-01-01 03:00:00.000000000 +0300
> +++ linux-2.6/drivers/connector/connector.c 2004-09-24 00:01:00.000000000
> +void cn_netlink_send(struct cn_msg *msg, u32 __groups)
> +{
> + struct cn_callback_entry *n, *__cbq;
> + unsigned int size;
> + struct sk_buff *skb;
> + struct nlmsghdr *nlh;
> + struct cn_msg *data;
> + struct cn_dev *dev = &cdev;
> + u32 groups = 0;
> + int found = 0;
> +
> + if (!__groups)
> + {
> + spin_lock(&dev->cbdev->queue_lock);
> + list_for_each_entry_safe(__cbq, n, &dev->cbdev->queue_list, callback_entry) {
>
> Same here
>
> + if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> + found = 1;
> + groups = __cbq->group;
> + }
> + }
> + spin_unlock(&dev->cbdev->queue_lock);
> +
> + if (!found) {
> + printk(KERN_ERR "Failed to find multicast netlink group for callback[0x%x.0x%x]. seq=%u\n",
> + msg->id.idx, msg->id.val, msg->seq);
> + return;
> + }
> + }
> + else
> + groups = __groups;
> +
> + size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> +
> + skb = alloc_skb(size, GFP_ATOMIC);
> + if (!skb) {
> + printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
> + return;
> + }
> +
> + nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));
>
> This is not correct, what happens is:
> size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> --> align(hdr)+align(data)
> size - sizeof(*nlh)
> --> (align(hdr)-hdr)+align(data)
> NLMSG_PUT pads again to get to the end of the data block (NLMSG_LENGTH)
> --> align(hdr)+(align(hdr)-hdr)+align(data)
>
> At the moment align(hdr) == hdr since nlmsghdr is already aligned
> but this might change and your code will break.

As far as I remember, header is always supposed to be aligned properly
"by design", so it even could be nonaligned here.

--
Evgeniy Polyakov ( s0mbre )

2005-04-11 10:45:37

by Thomas Graf

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

* Evgeniy Polyakov <[email protected]> 2005-04-11 09:22
> On Sun, Apr 10, 2005 at 09:27:27PM +0200, Thomas Graf ([email protected]) wrote:
> > + size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> > +
> > + skb = alloc_skb(size, GFP_ATOMIC);
> > + if (!skb) {
> > + printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
> > + return;
> > + }
> > +
> > + nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));
> >
> > This is not correct, what happens is:
> > size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> > --> align(hdr)+align(data)
> > size - sizeof(*nlh)
> > --> (align(hdr)-hdr)+align(data)
> > NLMSG_PUT pads again to get to the end of the data block (NLMSG_LENGTH)
> > --> align(hdr)+(align(hdr)-hdr)+align(data)
> >
> > At the moment align(hdr) == hdr since nlmsghdr is already aligned
> > but this might change and your code will break.
>
> As far as I remember, header is always supposed to be aligned properly
> "by design", so it even could be nonaligned here.

No, have a look at the macros:

#define NLMSG_LENGTH(len) ((len)+NLMSG_ALIGN(sizeof(struct nlmsghdr)))
#define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len))

NLMSG_LENGTH points to the end of the payload in the message, NLMSG_SPACE
represents the total size aligned properly for a possible next multipart
message.

It is unlikely that nlmsghdr will ever be unaligned but there can be no
reason to introduce code that can break with perfectly legal changes just
because of that.

2005-04-11 11:17:17

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

On Mon, 2005-04-11 at 12:45 +0200, Thomas Graf wrote:
> * Evgeniy Polyakov <[email protected]> 2005-04-11 09:22
> > On Sun, Apr 10, 2005 at 09:27:27PM +0200, Thomas Graf ([email protected]) wrote:
> > > + size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> > > +
> > > + skb = alloc_skb(size, GFP_ATOMIC);
> > > + if (!skb) {
> > > + printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
> > > + return;
> > > + }
> > > +
> > > + nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));
> > >
> > > This is not correct, what happens is:
> > > size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> > > --> align(hdr)+align(data)
> > > size - sizeof(*nlh)
> > > --> (align(hdr)-hdr)+align(data)
> > > NLMSG_PUT pads again to get to the end of the data block (NLMSG_LENGTH)
> > > --> align(hdr)+(align(hdr)-hdr)+align(data)
> > >
> > > At the moment align(hdr) == hdr since nlmsghdr is already aligned
> > > but this might change and your code will break.
> >
> > As far as I remember, header is always supposed to be aligned properly
> > "by design", so it even could be nonaligned here.
>
> No, have a look at the macros:
>
> #define NLMSG_LENGTH(len) ((len)+NLMSG_ALIGN(sizeof(struct nlmsghdr)))
> #define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len))
>
> NLMSG_LENGTH points to the end of the payload in the message, NLMSG_SPACE
> represents the total size aligned properly for a possible next multipart
> message.
>
> It is unlikely that nlmsghdr will ever be unaligned but there can be no
> reason to introduce code that can break with perfectly legal changes just
> because of that.

But NLMSG_ALIGN(sizeof(hdr)) is always equal to sizeof(hdr), that size
was select not accidentally.

I will change it, no problem, it is good from aesthetic point of view.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part