Category Archives: Linux

A head-scratching TCP/IP problem

Here’s an interesting scenario I ran into.

You have a program that:

  • looks for an empty port in the high numbers
  • forks
  • in the child
    • it does some computation first (this could take some time)
    • it binds and listens to the port determined in the first step
    • starts processing requests of some sort (like HTTP requests, but any TCP would do)
  • in the parent
    • it has to wait for the child to start serving requests, so it tries to connect to the child in a loop. If it gets an ECONNREFUSED, it waits a bit (a tenth or a hundredth of a second) and tries again
    • do a bunch of work here

This all looks reasonable so far. Except that every now and then (sometimes more frequently, sometimes very rarely) I was seeing the parent process stuck. A quick /usr/sbin/lsof showed the most disturbing thing: a socket connected to itself!

27945 misa 10r IPv4 73626638 TCP localhost.localdomain:51308->localhost.localdomain:51308 (ESTABLISHED)

All you can do when you see this is scratch your head and wonder what in the world is going on.

After much running around and lots of tracing through the code, I finally figured out what was happening.

For the explanation that follows, I will use "local port" to mean the port number a socket uses on its own end of the connection (as returned by getsockname(2)).

Every time you have a socket and you try to connect(2) it (to a remote, known port), the kernel will pick an unused high-numbered port for its end of the connection (as the local port). If the attempt to connect(2) the socket fails and you try again, it will temporarily bind to another port (they’re usually consecutive, until they wrap around and start over).

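If you try long enough without succeeding (and it is key that the server and client are both on the same machine – I used localhost for both), connect(2) will eventually pick the port you initially assigned to the server as the local port, and will connect to itself! Nothing is listening on that port yet, so the outgoing SYN arrives at the very socket that sent it, and TCP's simultaneous-open handling completes the connection.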

One possible solution is to pick an unused port and bind to it before connecting. Then connect(2) will no longer bind to random other ports.
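Here is a minimal sketch of the parent's retry loop in Python. Note that it uses a different workaround from the one suggested above: instead of binding first, it checks after each successful connect(2) whether the socket ended up talking to itself, and retries if so. The names connect_with_retry and is_self_connected, as well as the default attempt count and delay, are made up for illustration; this is not the code from the program described above.

```python
import socket
import time


def is_self_connected(sock):
    """Return True when both ends of the connection are the same address and port."""
    return sock.getsockname() == sock.getpeername()


def connect_with_retry(host, port, attempts=50, delay=0.01):
    """Try to connect to (host, port), retrying while the server is not up yet.

    Returns a connected socket, or None if every attempt failed.
    The defaults for attempts and delay are arbitrary illustration values.
    """
    for _ in range(attempts):
        # A fresh socket per attempt: each one gets its own local port.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((host, port))
        except ConnectionRefusedError:
            # Nobody is listening yet: wait a little and try again.
            sock.close()
            time.sleep(delay)
            continue
        if is_self_connected(sock):
            # The local port happened to be the server's port, so the
            # socket connected to itself. Throw it away and retry.
            sock.close()
            time.sleep(delay)
            continue
        return sock
    return None
```

The check is cheap, and it also frees the port again, so the child still has a chance to bind to it once it finishes its computation.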

ext2online is gone

I used to use ext2online in conjunction with LVM whenever I had to resize a partition that was already mounted. I haven’t had to do that in a while, so I was surprised that I couldn’t find ext2online anymore.

Turns out more modern versions of resize2fs already know how to do that. Not the ones from, but I was able to install into a temporary root and run the new resize2fs from there. Yay.

Adventures with SCons

I’ve been playing with SCons for the past couple of days. It’s intended to be a replacement for Make, and probably sounds similar to Ant or Maven, for those familiar with these tools from the Java world.

It’s pretty powerful in that you can use the stock builders or write your own builders (and nodes!). It also allows you to write custom “freshness checks”. make usually decides whether a node is out of date by comparing the timestamps of the source and target nodes. This can get you in trouble when using CVS, for instance, because the clocks on different machines are not synchronized. It’s also not very useful when what you build doesn’t live on the filesystem.
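To make the difference between the two kinds of freshness check concrete, here is a small plain-Python sketch. It is not SCons's API; the function names (stale_by_timestamp, content_signature, stale_by_content, demo) and the file names are made up for illustration. It contrasts a make-style timestamp comparison with a check based on a content signature recorded at the last build.

```python
import hashlib
import os
import tempfile


def stale_by_timestamp(source, target):
    """make-style check: rebuild if the target is missing or older than the source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def content_signature(path):
    """Hash of the file's bytes, independent of any clock."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def stale_by_content(source, recorded_signature):
    """Rebuild only if the content differs from what was recorded at the last build."""
    return content_signature(source) != recorded_signature


def demo():
    """Touch a source file without changing it and compare the two checks."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "hello.c")
        target = os.path.join(tmp, "hello.o")
        for path in (source, target):
            with open(path, "w") as f:
                f.write("int main(void) { return 0; }\n")
        recorded = content_signature(source)
        # Pretend a checkout gave the source a newer timestamp, same content.
        os.utime(target, (1000, 1000))
        os.utime(source, (2000, 2000))
        return stale_by_timestamp(source, target), stale_by_content(source, recorded)
```

In the demo the timestamp check asks for a rebuild while the content check does not, which is exactly the kind of spurious rebuild that skewed clocks cause.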

I will post some examples shortly. I am currently creating nodes for Mercurial checkouts and they work pretty well. CVC (Conary) nodes will follow shortly.

And yes, it’s written in Python…

What I’ve been up to

It’s been a while since my last post, so here’s what’s been going on.

  • Very busy working on Conary (though still not making progress fast enough through my issues list).
  • Speaking of that, I’ve reached the respectable 6 months with rPath.
  • Moved over the weekend.
  • Packaged gnucash 2.1.0 – and hit some guile issues in the process. Hopefully will finish tomorrow.
  • Does “April 17th” tell you something?
  • Before moving, made a bunch of repairs – that was where all my spare time went.
  • Ran an orienteering event at the Schenck forest (and I missed a bunch of controls). Got to cross a 20-ish-ft river over a rotten log 3 times before the heavy rain started. I was completely soaked when I finished. But it was fun.
  • Did not run as much lately because of my evenings being “a different workout”.

The Mercurial Plugin for Jira (or Read the Code, Luke)

As Matt (the author of the Mercurial plugin for Jira) pointed out in his comment, there was an issue with the permissions for the plugin. Seemingly random people were able to see the Mercurial Commits tab, and all along I thought I had messed something up when I ported the plugin from Jira 3.6.2 to Jira 3.6.5 and then to Jira 3.7. (Yes, I know Jira 3.8 is out, we haven’t scheduled the migration yet.)

Lately I’ve been busy closing bugs in Conary land, and haven’t had the time to go back and investigate what’s going on. Last week I finally decided I should look at the code – and it became very obvious. There is a View Version Control permission that controls who can see what, and it turned out only a few groups had been granted that permission. We’ve only allowed access to commits to internal users for now, but that may change in the future.

Also, yesterday I noticed that Jira was not indexing the Mercurial repositories anymore. As usual, catalina.out was full of useless messages, so I read the code again, which showed that I had the configuration wrong. Funny that it worked at all. Turns out hg.clonedir.idx is indeed supposed to be the top directory where your Mercurial clones are, and not the directory where you cloned the repository; that one is derived from the URL. Doh!

NetworkManager and virtual interfaces, again.

Zeus has a point in his comment: I should blog more often.

Here’s the first: NetworkManager again. I noticed that it doesn’t bring my interface alias up anymore. Must have happened after some upgrade. I am not sure.

As I was trying to figure out what’s going on, I opened the network management interface and looked at various things – and it started to work after that. Kind of frightening.

Even more frightening is looking at /etc/sysconfig/network-scripts/ifup-aliases. Looks like there are people who need such a large number of network interfaces that they have to assign them in ranges, hence the ifcfg-$DEV-range* files.

OK, this is not that exciting of a post. I’ll do better next time.