[dsc] Patch for build on Solaris9 and multiple interfaces fix

Alexander Gall gall at switch.ch
Mon Dec 17 10:17:18 UTC 2007


Hello Duane,

On Fri, 14 Dec 2007 14:41:02 -0700 (MST), Duane Wessels <wessels at measurement-factory.com> said:

> On Wed, 5 Dec 2007, Alexander Gall wrote:

>> 
>> I have used the attached patch to build the DSC collector (version
>> 200706121022) on Solaris 9.  The header file fixes are trivial, but I
>> discovered a bug that affects capturing on multiple interfaces.  I
>> think this also solves the issue "Multiple interfaces" reported on
>> this list in November (I just joined the list and had a quick look in
>> the archive).
>> 
>> The problem is simply that FD_ISSET checks the original FD set rather
>> than the one returned by select().  The fix is obvious (but one also
>> needs to cover the case when select() returns after a timeout).

> The current select() behavior is intentional.  There is an oldish
> known problem on some operating systems where they don't always set
> the FD for reading.  The suggested workaround is to always try
> reading from the pcap whenever select returns.  See
> http://www.tcpdump.org/lists/workers/2002/09/msg00033.html

I see.

> That message *is* five years old, so maybe this is no longer a
> problem with current pcap implementations.  If anyone knows for
> sure then we could try doing it the "right" way.  But I also believe
> that the workaround doesn't create any significant performance
> penalties.

Performance was not the problem, I think, but I've had problems with
dual-homed name servers on Solaris and Linux dropping most of the
captured packets anyway.  Two of them are getting around 700 queries
per second peak load and one about 1200 qps.  On all of them, DSC with
the original code only registered on the order of a few tens of queries
per second.  I could increase that to maybe one or two hundred qps by
playing with the timeout value in the pcap_dispatch() call, but most
of the packets were still dropped somewhere.

For Solaris, I suspect that the drops occurred somewhere in the STREAMS
module chain, e.g. when libpcap waits too long on a file descriptor
that hasn't got any packets, the queue of the other fd overflows
(STREAMS buffers appear to be quite small, like 64KiB or so).  I'm not
really sure, though.

I'm currently using my patch on Solaris 9 and Linux 2.6.12.  It works
perfectly on those systems, so they appear to handle select() on
libpcap file descriptors correctly.  I think it would make sense to do
it "the right way" on these OSs.
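
To illustrate what I mean by "the right way", here is a minimal sketch
(not the actual patch, and not DSC code).  The function name
wait_readable() and the ready[] array are hypothetical; any readable
descriptors can stand in for the ones obtained from libpcap.  The FD
set is rebuilt before every select() call, only the set modified by
select() is tested with FD_ISSET, and a timeout or error reports no
descriptor as ready.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for any of the nfds descriptors
 * in fds[] to become readable.  On return, ready[i] is 1 if fds[i] is
 * readable and 0 otherwise.
 * Returns the number of readable descriptors, 0 on timeout, -1 on error.
 */
int wait_readable(const int *fds, size_t nfds, int timeout_ms, int *ready)
{
    fd_set rset;            /* working set, overwritten by select() */
    struct timeval tv;
    int maxfd = -1;
    int n;
    size_t i;

    /* Rebuild the set on every call; select() modifies it in place. */
    FD_ZERO(&rset);
    for (i = 0; i < nfds; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
        ready[i] = 0;
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (n <= 0)
        return n;           /* timeout or error: nothing is ready */

    /* Test the set as returned by select(), not the original one. */
    for (i = 0; i < nfds; i++) {
        if (FD_ISSET(fds[i], &rset))
            ready[i] = 1;
    }
    return n;
}
```

A caller would then invoke pcap_dispatch() only for those handles whose
ready[] entry is set, and simply loop again when the return value is 0.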

> I also recently discovered and fixed a bug with multiple interfaces.
> I found that if you have interfaces with different pcap datalink
> types, then only the last one would be read.  I found it by trying
> to read from both loopback and a physical interface.  Now DSC stores
> the datalink handler function with each pcap.

OK.  In my case, all interfaces are of the same datalink type.

-- 
Alex
