weird linux networking issue

shawn

First off, I can replicate this in CentOS 5.3, 5.4, Ubuntu 9.04, 9.10 desktop, server, and netbook. It happens running natively and under VirtualBox.

The way I have it set up at the moment, the DHCP server (a separate machine) hands out static IP leases to MAC addresses it recognizes. Everybody else gets a random IP. The same box is DNS for the network. It's a little D-Link wireless router running dd-wrt.

My laptop connects wirelessly, gets a random IP, works all the time, every time. That's running Ubuntu 9.04.

The other machines connect, get their static lease, get their dns server, but basically have no dns resolution. In fact, it looks like they can't get out of the subnet even by IP. Ping by hostname gets unknown host, ping by remote IP times out. Ping to a local subnet IP responds normally.
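
The three ping checks above (local IP, remote IP, hostname) can be turned into a rough triage. This is only a sketch: the `ping` and `classify` helpers are made-up names, the `-c` and `-W` flags assume the Linux iputils `ping`, and the suggested causes are guesses, not a diagnosis.

```python
import subprocess

def ping(host, count=2, timeout=3):
    """Return True if ping gets at least one reply (assumes Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def classify(local_ok, remote_ip_ok, name_ok):
    """Map three ping results onto a rough guess at where things break."""
    if not local_ok:
        return "no local connectivity: check link, cable, switch, or lease"
    if not remote_ip_ok:
        return "local subnet only: check default route, gateway, or hardware in the path"
    if not name_ok:
        return "IP routing works but names fail: check resolv.conf and the DNS server"
    return "all three checks pass"
```

Something like `classify(ping("192.168.1.1"), ping(remote_ip), ping(remote_name))` would cover the case described here, where `remote_ip` and `remote_name` stand for whatever outside address and hostname you normally test with.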

Sometimes, if I start dhclient or restart networking, things will work normally for a few minutes or an hour. Sooner or later, it craps out again.

Windows machines on the same network function normally.

At first I thought it was a VirtualBox issue, but tried a native install and got the same thing. Then I thought it was the router, but can't understand why it works sometimes (or under windows XP and 7) but not others.

I'd have to double check, but I'm 90% sure that if I give the same machine a random DHCP IP, it works normally. But I need it to have a static IP.

Anybody have any insight? If I could just figure out the source of the problem, I'd at least have something I could look up.
 
Oh, and manually adding dns servers to resolv.conf doesn't change anything.

Edit: if it were to have anything to do with the router... my guess would be domain settings. Here's the dnsmasq config:

Code:
addn-hosts=/tmp/dlhosts
local=/local/
expand-hosts

Running hostname on the client machine only returns "mutt", not "mutt.local"
 
what does your route table look like?


also test connectivity to the dns servers...


telnet ip.of.dns.server 53

you should at least get a connection.
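
if telnet isn't installed, the same TCP check can be sketched in Python. `tcp_port_open` is a made-up helper name, and this only tests TCP on port 53; ordinary lookups mostly go over UDP, so a pass here is a hint, not proof that resolution works.

```python
import socket

def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False
```

call it as `tcp_port_open("ip.of.dns.server")` with the real address filled in.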
 
Route table on a 'bad' machine looks like this:

Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth1
169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         Linus.local     0.0.0.0         UG    0      0        0 eth1

On the (working) laptop:
Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     2      0        0 eth1
link-local      *               255.255.0.0     U     1000   0        0 eth1
default         Linus.local     0.0.0.0         UG    0      0        0 eth1

The bad machine times out on remote IPs, responds "protocol not available" when telnetting to 192.168.1.1 on 53. Good machine works as expected.
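
For comparing tables between machines, a small parser can pull out just the default route. This is a sketch that assumes the eight-column `route` layout shown above; `default_routes` is a made-up name.

```python
def default_routes(route_output):
    """Return (gateway, iface) pairs for default routes in `route` output."""
    routes = []
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns:
        # Destination Gateway Genmask Flags Metric Ref Use Iface
        if len(fields) == 8 and fields[0] in ("default", "0.0.0.0"):
            routes.append((fields[1], fields[7]))
    return routes
```

Fed the bad machine's table, it returns `[("Linus.local", "eth1")]`. Running `route -n` instead prints numeric addresses, which keeps name resolution out of the picture when DNS is the thing that's broken.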
 
Looking at the route table on the bad machine it's almost as if it couldn't renew the lease with the dhcp server. When it's not working, what is the IP address? Also what is your lease life cycle for the static IPs?
 
something is up with the tcp stack...

your routes look fine. anything noteworthy in the logs?

for shits-n-giggles run a tcpdump while trying to connect to the dns server or any other device by ip.
 
The IP address is always correct. It looks like it connects, gets its static lease, gets resolv.conf, etc.

The static leases are set in dnsmasq.leases like this:
Code:
dhcp-host=00:11:22:aa:bb:cc,hostname,192.168.1.101,infinite
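
To rule out a typo in one of those entries, a quick sanity check could look like the sketch below. It only handles the four-field form shown above (dnsmasq accepts other layouts too), and `parse_dhcp_host` is a made-up helper, not part of dnsmasq.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")

def parse_dhcp_host(line):
    """Parse 'dhcp-host=MAC,hostname,IP,lease' (the four-field form only)."""
    key, sep, value = line.strip().partition("=")
    if key != "dhcp-host" or not sep:
        raise ValueError("not a dhcp-host line")
    parts = value.split(",")
    if len(parts) != 4:
        raise ValueError("expected MAC,hostname,IP,lease")
    mac, hostname, ip, lease = parts
    if not MAC_RE.match(mac):
        raise ValueError("bad MAC address: " + mac)
    ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return {"mac": mac.lower(), "hostname": hostname, "ip": ip, "lease": lease}
```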

I don't know if I have fixed it or not... but I noticed in the route table that the gateway was "Linus.local", and that I couldn't ping "Linus.local" from anywhere. Went through the router settings and changed the hostname, router name, and everything to lowercase, rebooted everybody, and it's working (for now).

Best I can do is wait a couple of hours and see if the linux static leases can still get out.

Edit: never mind, that didn't stick.
 
My initial feeling is either the router hardware failing or something pooching within dd-wrt... anything being logged there?

Any iptables/ipchains/firewall stuff happening on the Linux clients?

Also, while TCP/IP are standards, I've seen crazy wild differences in interpretation between various OS's/FW...
 
It was a hardware failure all right, but not the router. Turned out to be an old Linksys router (the old-ass blue stackable kind) that was doing switch duty.

I tried everything, including swapping the dd-wrt router. When that didn't change anything, I started checking other hardware in the system. My guess is that the bad switch was dropping out intermittently. The windows machines were fault-tolerant enough to continue on... the linux machines balked at the faults sooner or later and gave up on the network.
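
In hindsight, one way to catch an intermittent dropout like this would be to ping the gateway on a timer and log when it stops answering. A rough sketch of the bookkeeping, assuming you already have (timestamp, ok) samples from some ping loop; `find_outages` is a made-up name.

```python
def find_outages(samples):
    """Given (timestamp, ok) pairs in time order, return (start, end) failure spans.

    `end` is the timestamp of the first success after a run of failures,
    or None if the samples end while still failing.
    """
    outages = []
    start = None
    for timestamp, ok in samples:
        if not ok and start is None:
            start = timestamp  # first failure of a new outage
        elif ok and start is not None:
            outages.append((start, timestamp))  # outage ended at this success
            start = None
    if start is not None:
        outages.append((start, None))
    return outages
```

Running the same loop from a Windows box and a Linux box at once would show whether both see the drops and only one recovers.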

Anyway, I pulled that switch out of the network and suddenly everything was a bit snappier. I've got a virtualbox session that's been running for three or four hours that's still working... figure if it's fine in the morning, I've fixed it.

Thanks for all the suggestions. Good to have some level-headed advice when weird shit happens.
 