Proxy-ARP daemon

27/10/2017 3:11 pm Lutz Donnerhacke

Buggy xDSL equipment is an ongoing problem here. So far we worked around it by reducing the netmasks significantly. Over the last months, rigid first-hop security filters were introduced into the DSLAMs, and various CPEs turned up that are unable to deal with short netmasks. We had to solve the problem for real.

Problem

Large networks come with large numbers of visible MAC addresses. Several devices (e.g. DSLAMs) behave strangely in such situations, so network operators try hard to prevent frames from leaking into other areas. Blocking traffic from one satellite location to another is typically called split-horizon. As a result, clients in different parts of the network are unable to communicate with each other.

Large layer 2 networks are hard to maintain. Therefore, several security measures should be implemented at the satellite locations with few clients. DSLAMs in particular are prone to broken first-hop security implementations, which effectively forces the operator to disable those filters or to drop support for non-standard clients.

First-hop security filters often sniff the DHCP communication and adjust the filters accordingly. Clients with static IPs do not need to use DHCP (and often have a static configuration instead), so the filters fail to learn the IP address in use. Other clients negotiate more than one IP address, or have whole ranges of statically assigned IP addresses. In all such cases, the DSLAM filters are too simple to cope with the required setup.

A common first-hop security filter simply drops all broadcast traffic not intended for the learned IP address. The rationale behind this type of filter is that clients which already know the MAC of the destination are allowed to reach that device. The canonical way to learn the destination MAC is to broadcast an ARP request for the desired IP address. So if the filter blocks the distribution of those ARP broadcasts to other clients, only valid communication can be established.
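To make the filter decision concrete, here is a minimal sketch in Python. It is a simplified model, not any vendor's implementation: the names `ArpRequest` and `passes_filter` are made up for illustration, and the filter is assumed to have learned at most one IP per client port.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class ArpRequest:
    dst_mac: str            # Ethernet destination of the frame
    target_ip: IPv4Address  # IP address being asked for


def passes_filter(frame: ArpRequest, learned_ip: Optional[IPv4Address]) -> bool:
    """Decide whether a frame is forwarded to the client port (hypothetical model)."""
    if frame.dst_mac.lower() != BROADCAST_MAC:
        # Unicast: the sender already knows the MAC, so the frame passes.
        return True
    # Broadcast: only forwarded when it asks for the IP the filter learned.
    return learned_ip is not None and frame.target_ip == learned_ip


learned = IPv4Address("198.51.100.20")  # learned by sniffing DHCP
static = IPv4Address("198.51.100.77")   # configured statically, never seen in DHCP
print(passes_filter(ArpRequest(BROADCAST_MAC, static), learned))  # False
```

The last line shows the failure mode described above: an ARP broadcast for a statically configured address never reaches the client, so nobody can resolve its MAC.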

Combining DHCP sniffing, broadcast filters, and split-horizon creates a typical xDSL network, where each client can only communicate with the central router(s). Advanced clients cannot communicate at all.

Simple solution

If only a single router is attached to such a network, enabling local-proxy-arp on this device solves large parts of the problem: for every ARP request from a client, the router responds with its own MAC address, so client-to-client communication is hairpinned at the router interface.

Because a router learns the client MAC from the client's ARP request for the default gateway, it may not need to ask for the client itself. Several routers refresh expiring ARP entries by sending unicast requests, hence the broadcast filter may not cause noticeable trouble.

If there is more than one router or server in the network, the situation becomes complicated. Local-proxy-arp can't be used anymore, because each of the routers will quickly learn the other's MAC address for the client IPs, causing loops.

On the other hand, some services, like DHCP, try to verify that free IP addresses are unused by arping for that IP. Any generic local-proxy-arp response would cause trouble for such applications.

Different approach

In order to keep the network running, a new daemon (parpd) provides the necessary ARP replies. Depending on configuration rules, the utility responds to ARP requests with the real MAC address of the device or the MAC address of a router (redirect).

It learns the real MAC-IP pairs by listening to broadcast ARP queries and gratuitous ARP requests. Of course, it does not learn from ARP probes or from replies not originated by the device owning the IP. parpd refreshes its ARP cache using unicast ARP requests. In order to obtain MAC addresses for a redirect response, the request may be broadcast.
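The learning rule can be sketched roughly as follows. This is an illustration in Python, not parpd's actual code: `ArpPacket` and `learn` are hypothetical names, only requests are modelled, and an ARP probe is recognised by its all-zero sender IP (as defined in RFC 5227).

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Dict

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
ARP_REQUEST = 1                         # ARP opcode of a request
UNSPECIFIED = IPv4Address("0.0.0.0")    # sender IP used by ARP probes


@dataclass(frozen=True)
class ArpPacket:
    opcode: int
    dst_mac: str
    sender_mac: str
    sender_ip: IPv4Address
    target_ip: IPv4Address


def learn(cache: Dict[IPv4Address, str], pkt: ArpPacket) -> bool:
    """Record the sender's MAC-IP pair if the packet is trustworthy (sketch)."""
    if pkt.opcode != ARP_REQUEST or pkt.dst_mac.lower() != BROADCAST_MAC:
        return False  # replies and unicast requests are out of scope here
    if pkt.sender_ip == UNSPECIFIED:
        return False  # ARP probe: the sender does not own an address yet
    # Covers ordinary queries and gratuitous requests (sender_ip == target_ip).
    cache[pkt.sender_ip] = pkt.sender_mac
    return True
```

A gratuitous request needs no special case in this model, because the sender announces its own pair in the sender fields either way.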

Responses can be delayed, which effectively ignores the first set of requests over a (short) period. This way, special services can probe for the non-existence of an IP, while obtaining necessary answers for real communication.
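One way to picture the delay is a counter per requester/target pair that only lets an answer through once enough requests have arrived within a short window. The sketch below is an assumption about how such a gate could work, not a description of parpd's internals; `DelayGate`, `retries`, and `window` are hypothetical names.

```python
from typing import Dict, Hashable, Tuple


class DelayGate:
    """Answer only from the n-th request seen within a time window (sketch)."""

    def __init__(self, retries: int, window: float) -> None:
        self.retries = retries  # answer at this request number
        self.window = window    # seconds in which the requests must arrive
        self._seen: Dict[Hashable, Tuple[float, int]] = {}

    def should_answer(self, key: Hashable, now: float) -> bool:
        first, count = self._seen.get(key, (now, 0))
        if now - first > self.window:
            # Window expired: treat this request as the first of a new series.
            first, count = now, 0
        count += 1
        self._seen[key] = (first, count)
        return count >= self.retries


gate = DelayGate(retries=4, window=3.0)
key = ("198.51.100.4", "198.51.100.50")  # (requester, target)
print([gate.should_answer(key, t) for t in (0.0, 0.5, 1.0, 1.5)])
# [False, False, False, True]
```

A prober that gives up after two or three unanswered requests concludes the address is free, while a host that really wants to talk keeps retrying and gets its answer.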

The customizable set of rules allows adapting the behaviour to complex scenarios.

Example

How to configure the daemon? This way:

cache
 timeout       302     # seconds
 tablesize     3499     # expecting about 10000 entries
 refresh       3*5     # 3 retries, 5 seconds each
 delay         4*3     # respond at the 4th retry within 3 seconds
end

interface em0
 timeout       1.011
 # do not respond to queries for our own infrastructure
 rule          0.0.0.0/0        198.51.100.0/29    ignore
 # delay queries from the DHCP server
 rule          198.51.100.4/32  198.51.100.0/24    delay tell
 # help the routers/servers to reach the clients
 rule          198.51.100.0/29  198.51.100.0/24    tell
 # interclient communication through hairpinning at the default gateway
 rule          198.51.100.0/24  198.51.100.0/24    198.51.100.1
 # help erroneous clients arping for everything
 rule          198.51.100.0/24  0.0.0.0/0          verbose 198.51.100.1
 # multihomed server with weak host model
 rule          192.0.2.0/24     198.51.100.0/24    tell
 # show missing entries
 rule          0.0.0.0/0        0.0.0.0/0          verbose ignore
end
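To see which rule applies to a given request, the rule table can be replayed in a few lines of Python. This sketch assumes the rules are evaluated top to bottom and the first match wins, which the catch-all at the end suggests but the text does not state; `RULES` and `match` are illustrative names, and the action is kept as an opaque string.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# (requester network, target network, action), copied from the example above
RULES = [
    ("0.0.0.0/0",       "198.51.100.0/29", "ignore"),
    ("198.51.100.4/32", "198.51.100.0/24", "delay tell"),
    ("198.51.100.0/29", "198.51.100.0/24", "tell"),
    ("198.51.100.0/24", "198.51.100.0/24", "198.51.100.1"),
    ("198.51.100.0/24", "0.0.0.0/0",       "verbose 198.51.100.1"),
    ("192.0.2.0/24",    "198.51.100.0/24", "tell"),
    ("0.0.0.0/0",       "0.0.0.0/0",       "verbose ignore"),
]


def match(source: str, target: str, rules=RULES) -> Optional[str]:
    """Return the action of the first rule covering both addresses (assumed order)."""
    src, tgt = ip_address(source), ip_address(target)
    for src_net, tgt_net, action in rules:
        if src in ip_network(src_net) and tgt in ip_network(tgt_net):
            return action
    return None  # unreachable with the catch-all rule in place


print(match("198.51.100.20", "198.51.100.30"))  # 198.51.100.1
print(match("198.51.100.4", "198.51.100.50"))   # delay tell
```

Under that assumption, a client asking for another client is redirected to the gateway at 198.51.100.1, while the DHCP server at 198.51.100.4 gets a delayed answer with the real MAC.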


The software has been running in production ever since I first reported on it. No additional RAM usage. No CPU anomalies.

And yes, you can run as many of them as you like. Currently six instances are running here per VLAN.

Hello Lutz,

very interesting article(s).
We operate a network similar to the one you describe and face similar problems.

Doesn't parpd then become a single point of failure again (can the proxy be set up redundantly)?
What is your (long-term) experience with the software?

This network is a layer 2 network. The CPE does plain vanilla DHCP on its WAN leg. No PPP(oE).

This whole thing rests on a flawed assumption, namely that the first provider router from the customer's point of view (BRAS - Broadband Remote Access Server) behaves like a default router in an Ethernet segment. That is not the case! Rather, each customer's DSL router establishes a PPPoE connection, i.e. a tunnel, to the BRAS router and is assigned a subnet mask of 255.255.255.255, which means there are no neighbours in the same subnet at all, only a single path, namely to the BRAS router! Traffic between customers is therefore possible without problems in any case and does not require local-proxy-arp.

parpd.8.txt:
"The configuration is contains outmost options as well as a cache and an interface section."
Something is missing there, or the "is" is superfluous.

Total 6 comments
