Archive for September, 2007

The long road to Conary 1.2

Thursday, September 27th, 2007

Conary 1.2 is finally out!

We are very excited; I think this is a great achievement for Conary. It packs three months of behind-the-scenes work on several new features and a ton of bug fixes, done while we were also maintaining the former stable Conary 1.1 branch. As a result, the release announcement is pretty long.

Many thanks to the Foresight community, who agreed to try some early releases and provided valuable feedback (not to mention uncovering those minor things that we like to call undiscovered features, for lack of a better term :-) ).

A head-scratching TCP/IP problem

Wednesday, September 12th, 2007

Here’s an interesting scenario I ran into.

You have a program that:

  • looks for an empty port in the high numbers
  • forks
  • in the child
    • it does some computation first (this could take some time)
    • it binds and listens to the port determined in the first step
    • starts processing requests of some sort (like HTTP requests, but any TCP would do)
  • in the parent
    • it has to wait for the child to start serving requests, so it tries to connect to the child in a loop. If it gets an ECONNREFUSED, it waits a bit (a tenth or a hundredth of a second) and tries again
    • do a bunch of work here
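The structure described above can be sketched roughly as follows. This is a minimal Python sketch and not the original program: the function names (find_free_port, child, wait_for_child, main), the 0.2 second startup delay, and the b"ready" reply are all made up for illustration.

```python
import os
import socket
import time


def find_free_port():
    """Ask the kernel for a currently unused port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def child(port, startup_delay):
    """Do some slow work first, then bind, listen and serve one request."""
    time.sleep(startup_delay)  # stands in for the slow computation
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    conn.sendall(b"ready")
    conn.close()
    srv.close()


def wait_for_child(port, retry_delay=0.01):
    """Retry connect() until it stops failing with ECONNREFUSED."""
    while True:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # No bind() here: each attempt gets a fresh local port.
            s.connect(("127.0.0.1", port))
            return s
        except ConnectionRefusedError:
            s.close()
            time.sleep(retry_delay)


def main():
    port = find_free_port()
    pid = os.fork()
    if pid == 0:
        try:
            child(port, startup_delay=0.2)
        finally:
            os._exit(0)
    s = wait_for_child(port)
    data = s.recv(16)  # this is where a stuck parent would block forever
    s.close()
    os.waitpid(pid, 0)
    return data
```

Calling main() normally returns the child's reply. The sketch deliberately keeps the unbound connect() in wait_for_child, since that is the part that occasionally goes wrong.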

This all looks reasonable so far. Except that every now and then (sometimes more frequently, sometimes very rarely) I would see the parent process stuck. A quick /usr/sbin/lsof showed the most disturbing thing: a socket connected to itself!

27945 misa 10r IPv4 73626638 TCP localhost.localdomain:51308->localhost.localdomain:51308 (ESTABLISHED)

All you can do when you see this is scratch your head and wonder what in the world is going on.

After much running around and lots of tracing through the code, I finally figured out what was happening.

For the explanation that follows, I will use the term local port for the port number a socket uses on its own end of the connection (as returned by getsockname(2)).

Every time you connect(2) a socket (to a known remote port) without binding it first, the operating system picks an unused high-numbered port for its end of the connection (the local port). If the attempt fails and you try again, a different local port gets picked (they're usually consecutive, until they wrap around and start over).

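If you keep trying long enough without succeeding, and (this is key) the server and client are both on the same machine (I used localhost for both), connect(2) will eventually pick as the local port the very port you reserved for the server. Since nothing is listening there yet, the socket's own SYN is delivered back to it, TCP handles this as a simultaneous open, and the socket ends up connected to itself!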
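To see the effect without waiting for the port numbers to line up by chance, you can force it. The sketch below is Python, assumes Linux behaviour (other systems may refuse the connection instead), and uses two hypothetical helper names, make_self_connection and is_self_connected. It binds a socket to a free port and then connects it to that same port, and it shows a simple check for the condition: the local and peer addresses are identical.

```python
import socket


def make_self_connection():
    """Connect a socket to its own local port (assumes Linux behaviour)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))    # the kernel assigns a free local port
    s.connect(s.getsockname())  # nobody listens there: the SYN loops back to s
    return s


def is_self_connected(sock):
    """True when both ends of the connection share the same address and port."""
    return sock.getsockname() == sock.getpeername()
```

A parent's retry loop could call is_self_connected right after a successful connect(2) and, if it returns True, close the socket and try again. That is a workaround layered on top of the retry loop, not something the original program did.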

One possible solution is to pick an unused port and bind to it before connecting. Then connect(2) will no longer bind to random other ports.
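Here is a minimal Python sketch of that fix, under a few assumptions of mine: the helper names pick_local_port and connect_from_fixed_port are hypothetical, a fresh socket is created for every attempt (retrying connect on a socket that already failed is not portable), and SO_REUSEADDR is set so the same local port can be bound again on the next attempt. The extra check that the chosen local port differs from the server port is my addition; without it the explicit bind could reproduce the very same problem.

```python
import socket
import time


def pick_local_port(avoid):
    """Return a currently unused port that is not the server's port."""
    while True:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", 0))
            port = s.getsockname()[1]
        if port != avoid:
            return port


def connect_from_fixed_port(server_port, retry_delay=0.01, max_attempts=1000):
    """Retry connecting, always from the same explicitly bound local port."""
    local_port = pick_local_port(avoid=server_port)
    for _ in range(max_attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind(("127.0.0.1", local_port))      # fixed local end
            s.connect(("127.0.0.1", server_port))  # no implicit port choice
            return s
        except ConnectionRefusedError:
            s.close()
            time.sleep(retry_delay)
    raise TimeoutError("server did not start listening in time")
```

There is still a small window in which another process could grab the chosen local port between attempts; in that case bind raises an error instead of silently connecting to the wrong place, which seems the better failure mode.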