This is my solution to make Apache HTTP Server log the actual remote client IP address from behind the Amazon Elastic Load Balancer (ELB), with a little help from SetEnvIf. [2 minutes]
Some suggest replacing %h (REMOTE_ADDR) in the NCSA common log format (%h %l %u %t \"%r\" %>s %O) with the X-Forwarded-For header:
This approach has two problems:
Above I added quotes around X-Forwarded-For to make it easier to extract by regex. Supporting this modified format in Splunk involves adapting the access-extractions transform to use [[qstring:clientip]] (quoted string) instead of [[nspaces:clientip]] (no-spaces string).
The third line defines the custom trueip_combined log format that uses CLIENTIP in place of %h and uses %D in the place of the never-used ident field (%l) to log request latency in microseconds.
P.S. if you're running Wordpress on the EC2 instance, check out my how-to for optimizing Wordpress on EC2 micro instances.
The Problem
The problem is that the Elastic Load Balancer (ELB) sets REMOTE_ADDR to the load balancer IP address. However it also sets the X-Forwarded-For HTTP header to a comma-delimited string of ip-addresses like client, proxy1, proxy2.Some suggest replacing %h (REMOTE_ADDR) in the NCSA common log format (%h %l %u %t \"%r\" %>s %O) with the X-Forwarded-For header:
LogFormat "\"%{X-Forwarded-For}i\" %l %u %t \"%r\" %>s %O xfwd_common
This approach has two problems:
1. Broken log formatting
Comma-separated IP addresses violate the NCSA common and combined log formats and generally breaks applications that attempt to extract the log fields.Above I added quotes around X-Forwarded-For to make it easier to extract by regex. Supporting this modified format in Splunk involves adapting the access-extractions transform to use [[qstring:clientip]] (quoted string) instead of [[nspaces:clientip]] (no-spaces string).
2. Missing IP for unproxied requests
Direct or unproxied HTTP requests lack the X-Forwarded-For header, so the clientip is logged as "". If all clients connect via the load balancer this won't happen, but in practice developers and monitoring agents may want to skip the load balancer.The Solution
My solution for standard log formatting and logging of unproxied IPs uses SetEnvIf to log the remote client IP from REMOTE_ADDR initially, and overwrites it with the first component of the X-Forwarded-For header only if available, meaning the request is proxied:SetEnvIf REMOTE_ADDR "(.+)" CLIENTIP=$1 SetEnvIf X-Forwarded-For "^([0-9.]+)" CLIENTIP=$1 LogFormat "%{CLIENTIP}e %D %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" trueip_combined
The third line defines the custom trueip_combined log format that uses CLIENTIP in place of %h and uses %D in the place of the never-used ident field (%l) to log request latency in microseconds.
P.S. if you're running Wordpress on the EC2 instance, check out my how-to for optimizing Wordpress on EC2 micro instances.
Awesome article. Incorporated this into my web server AMI this evening.
ReplyDeleteHUGE help. Many thanks!
ReplyDeleteThankyou so much! :)
ReplyDeleteThanks for the info. However, I added it to my /etc/apache2/sites-available for this domain as you suggested, and I'm still getting the ELB internal ip.
ReplyDeleteI tested printing the $_SERVER["HTTP_X_FORWARDED_FOR"] value on a php page and i do get the forwarded ip. Do you have any suggestions? I'm really scratching my head on this one...
Terry,
ReplyDeleteTry adding the values to httpd.conf.