Buggy xDSL is an ongoing problem here. We solved it by reducing the netmasks significantly. During the last months, rigid first-hop security filters were introduced into the DSLAMs and different CPEs, unable to deal with short netmasks, occured. We had to really solve the problem.
Large networks come with large numbers of visible MAC addresses. Several devices (i.e. DSLAMs) behave strangely in such situations, therefore network operators try hard to prevent leakage of frames into other areas. Blocking traffic from a satellite location to a different satellite location is typically called split-horizon. Clients in different parts of the network are unable to communicate with each other.
Large layer 2 networks are hard to maintain. Therefore, several security measurements should be implemented at the satellite locations with few clients. Especially DSLAMs are prone to broken first-hop security implementation, which effectively requires to disable those filters or to drop support of non-standard clients,
Often first hop security filters do sniff the communication for DHCP and adjust the filters accordingly. Clients with static IPs do not need to use DHCP (and often have and static configuration instead), which causes the filters to fail to learn the used IP address. Other clients do negotiate more than one IP address, or have full ranges of statically assigned IP addresses. In all such cases, the DLSAM filters are too simple to cover with the required setup.
A common first hop security filter simply drops all broadcast traffic not intended to the learned IP address. The rationale behind this type of filter is, that clients, which already know the MAC of the destination are allowed to reach this device. The canonical way to learn about the destination MAC is to broadcast an ARP packet for desired IP address. So if the filter blocks the distribution of those ARP broadcasts to different clients, only valid communication can be established.
Combining DHCP sniffing, broadcast filters, and split-horizon creates a typical xDSL network, where each client can only communicate with the central router(s). Advanced clients cannot communicate at all.
If only a single router is attached to such a network, enabling local-proxy-arp on this device solves large parts of problem: For every ARP request from a client, the router responds with its own MAC address. So client-client communication is hair pinned at the router interface.
Because a router learns the client MAC from the ARP request for the default gateway, it may not need to ask for the client itself. Several routers refresh expiring ARP entries by sending unicast requests, hence the broadcast filter may not cause notifiable trouble.
If there are more than one router or server in the network, the situation becomes complicated. Local-proxy-arp can't be used anymore, because each of the routers will quickly learn each other's MAC address for the client IPs, causing loops.
On the other hand some services, like DHCP, try to verify, that free IP addresses are unused, by arping for that IP. Any generic local-proxy-arp response would cause trouble to such applications.
In order to keep the network running, a new daemon (parpd) provides the necessary ARP replies. Depending on configuration rules, the utility respond to ARP requests with the real MAC address of the device or the MAC address of a router (redirect).
It does learn the real MAC-IP pairs by listening to broadcast ARP queries and gratuitous ARP requests. Of course, it does not learn from ARP probes or replies not originated by the device owning the IP. parpd does refresh it's ARP cache using unicast ARP requests. In order to obtain MAC addresses for a redirect response, the request may be broadcasted.
Responses can be delayed, which effectively ignores the first set of requests over a (short) period. This way, special services can probe for the non-existence of an IP, while obtaining necessary answers for real communication.
The customizable set of rules allows adapting the behaviour to complex scenarios.
How to configure the daemon? This way:
cache timeout 302 # seconds tablesize 3499 # expecting about 10000 entries refresh 3*5 # 3 retries a 5 seconds each delay 4*3 # respond at 4th retry in 3 seconds end interface em0 timeout 1.011 # do not respond for queries to our own infrastructure rule 0.0.0.0/0 198.51.100.0/29 ignore # delay queries from the DHCP server rule 198.51.100.4/32 198.51.100.0/24 delay tell # help the routers/servers to reach the clients rule 198.51.100.0/29 198.51.100.0/24 tell # interclient communication through hairpinning at the default gateway rule 198.51.100.0/24 198.51.100.0/24 198.51.100.1 # help erroneous clients arping for everything rule 198.51.100.0/24 0.0.0.0/0 verbose 198.51.100.1 # multihomed server with weak host model rule 192.0.2.0/24 198.51.100.0/24 tell # show missing entries rule 0.0.0.0/0 0.0.0.0/0 verbose ignore end
Und ja, man kann beliebig viele davon in Betrieb nehmen. Aktuell rennen hier sechs Instanzen pro VLAN.
sehr interessante/r Artikel.
Wir betreiben ein ähnliches Netz wie du es beschreibst und stehen vor ähnlichen Problemen.
Stellt der parpd dann nicht auch wieder einen single-point-of failure dar (lässt sich der proxy redundant aufbauen)?
Wie sind deine (langzeit-)Erfahrung mit der Software?
"The configuration is contains outmost options as well as a cache and an interface section."
Da fehlt etwas, oder das "is" ist zuviel.
Total 6 comments