Logging the client IP behind Amazon ELB with Apache

How can Apache HTTP Server log the actual remote client IP address when an Amazon Elastic Load Balancer (ELB) is proxying the client HTTP requests? The solution below uses the SetEnvIf directive from mod_setenvif.

An ELB sets REMOTE_ADDR to the load balancer's IP address and sets the X-Forwarded-For HTTP header to a comma-separated list of IP addresses of the form "client, proxy1, proxy2".
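To make the header format concrete, here is a minimal Python sketch (not part of the Apache setup) that splits an X-Forwarded-For value into its hops. The function name `split_xff` and the sample addresses are illustrative assumptions.

```python
def split_xff(header: str) -> list[str]:
    """Split an X-Forwarded-For value such as "client, proxy1, proxy2"
    into its individual addresses, left to right (hypothetical helper)."""
    # Entries are comma-separated; whitespace around each entry is optional.
    return [part.strip() for part in header.split(",") if part.strip()]


hops = split_xff("203.0.113.7, 10.0.0.1, 10.0.0.2")
client, proxies = hops[0], hops[1:]
print(client)   # 203.0.113.7
print(proxies)  # ['10.0.0.1', '10.0.0.2']
```

The leftmost entry is the address the first proxy saw the request come from; each later entry is a proxy further along the chain.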

The various solutions I've seen for logging the client IP suggest replacing %h (for REMOTE_ADDR) in the NCSA common log format (%h %l %u %t \"%r\" %>s %O) with the X-Forwarded-For header:

LogFormat "\"%{X-Forwarded-For}i\" %l %u %t \"%r\" %>s %O" xfwd_common

This approach has two problems:

Broken log formatting
Comma-separated IP addresses violate the NCSA common and combined log formats and generally break tools that try to extract the log fields.

Above, I added quotes around X-Forwarded-For to make it easier to extract with a regex. Supporting this modified format in Splunk involves adapting the access-extractions transform to use [[qstring:clientip]] (quoted string) instead of [[nspaces:clientip]] (no-spaces string).
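The difference between a quoted-string and a no-spaces extraction can be illustrated with plain Python regular expressions. This is a sketch assuming the xfwd_common format above with made-up sample values; the two patterns are only analogues of Splunk's qstring and nspaces, not Splunk's actual definitions.

```python
import re

# A sample line in the xfwd_common format (illustrative values).
LINE = ('"203.0.113.7, 10.0.0.5" - - [10/Oct/2012:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326')

# Quoted first field, analogous to [[qstring:clientip]].
QUOTED = re.compile(
    r'^"(?P<clientip>[^"]*)" (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)$'
)

# Non-whitespace first field, analogous to [[nspaces:clientip]].
NSPACES = re.compile(r'^(?P<clientip>\S+) ')

m = QUOTED.match(LINE)
print(m.group("clientip"))  # 203.0.113.7, 10.0.0.5
print(m.group("status"))    # 200

# The no-spaces pattern stops at the first space inside the address list,
# leaving a stray quote and comma in the extracted field.
print(NSPACES.match(LINE).group("clientip"))  # "203.0.113.7,
```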

Missing IP for unproxied requests
Direct or unproxied HTTP requests lack the X-Forwarded-For header, so no client IP is logged at all: the field contains "-", Apache's placeholder for a missing value. If all clients connect via the load balancer this won't happen, but in practice developers and monitoring agents may bypass the load balancer.

Solution for logging the true client IP

I've worked out how to fix the log formatting and still log unproxied IPs, by using SetEnvIf to capture the remote client IP whether the request is direct or proxied:

SetEnvIf Remote_Addr "(.*)" CLIENTIP=$1
SetEnvIf X-Forwarded-For "^([0-9.]+)" CLIENTIP=$1
LogFormat "%{CLIENTIP}e %D %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" trueip_combined

The first line sets the environment variable CLIENTIP to the value of REMOTE_ADDR.
The second line then overwrites CLIENTIP with the first component of X-Forwarded-For, if the header is present and begins with digits and dots (an IPv4 address).
The third line defines the custom trueip_combined log format, which uses CLIENTIP in place of %h.
It also uses %D in place of the rarely used ident field (%l) to log the time taken to serve the request, in microseconds.
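mod_setenvif considers its directives in the order they appear in the configuration, so the X-Forwarded-For rule replaces the REMOTE_ADDR value only when its regex matches. The Python sketch below mimics that decision for illustration; `true_client_ip` is a hypothetical helper that models only the two regexes, not mod_setenvif itself.

```python
import re
from typing import Optional

# Mirrors: SetEnvIf Remote_Addr "(.*)" CLIENTIP=$1
REMOTE_ADDR_RE = re.compile(r"(.*)")
# Mirrors: SetEnvIf X-Forwarded-For "^([0-9.]+)" CLIENTIP=$1
XFF_RE = re.compile(r"^([0-9.]+)")


def true_client_ip(remote_addr: str, xff: Optional[str]) -> str:
    """Return the value CLIENTIP would end up with (hypothetical helper)."""
    # Rule 1: (.*) always matches, so CLIENTIP starts as REMOTE_ADDR.
    clientip = REMOTE_ADDR_RE.search(remote_addr).group(1)
    # Rule 2: a missing header (None) is treated as an empty string here,
    # which cannot match ^([0-9.]+), so CLIENTIP keeps the REMOTE_ADDR value.
    m = XFF_RE.search(xff or "")
    if m:
        # Only the leading run of digits and dots is captured,
        # i.e. the first address before the comma.
        clientip = m.group(1)
    return clientip


print(true_client_ip("10.0.0.1", "203.0.113.7, 10.0.0.1"))  # 203.0.113.7
print(true_client_ip("198.51.100.4", None))                 # 198.51.100.4
```

Because [0-9.]+ matches only digits and dots, an X-Forwarded-For value that starts with anything else (an IPv6 address, say) also leaves CLIENTIP set to REMOTE_ADDR.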

The one downside is that, depending on how ELB treats an X-Forwarded-For header already present in the client's request, this may allow clients to spoof their source IP.
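To see how spoofing could happen, assume the load balancer appends the address it sees to any X-Forwarded-For header the client already sent (an assumption about ELB's behaviour, which the text above leaves open). The first component is then whatever the client chose, and only the last component was written by the load balancer. `forward_through_lb` is a hypothetical model for illustration, not an AWS API.

```python
from typing import Optional


def forward_through_lb(client_addr: str, client_xff: Optional[str]) -> str:
    """Hypothetical model of a proxy that appends the connecting
    address to any X-Forwarded-For header it receives."""
    if client_xff:
        return f"{client_xff}, {client_addr}"
    return client_addr


# An honest client sends no X-Forwarded-For header.
honest = forward_through_lb("203.0.113.7", None)
# A malicious client forges the header before connecting.
forged = forward_through_lb("203.0.113.7", "192.0.2.99")

print(honest.split(",")[0].strip())   # 203.0.113.7
print(forged.split(",")[0].strip())   # 192.0.2.99  (the spoofed value)
print(forged.split(",")[-1].strip())  # 203.0.113.7 (added by the proxy)
```

Under this model, a rule that takes the first component, like the SetEnvIf line above, logs the forged 192.0.2.99.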

Hope people find this useful.