A roundup of seemingly inconspicuous things that we had to deal with as we went from zero to 200K POST requests/second at CleverTap. Things like engineering for reliable autoscaling (you don’t want to be caught off guard as traffic increases), optimising SSL handshakes to drive down the cost of data transfer, and managing infrastructure on AWS.
Always connected reverse SSH port forwarding with systemd
Replacing autossh, the de facto tool for managing and monitoring SSH connections, with a systemd service.
Terms used
- remote host: refers to a device running in a third-party managed network, i.e. you have no control over any networking equipment. Its public IP may or may not change.
- managed host: refers to a server/device whose SSH port is reachable.
Assumptions
- The remote host runs a distribution that uses systemd as an init system.
- A user named callhome on the remote host is able to SSH using public key authentication as incoming@managed host (see the sketch below for a quick way to set this up).
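If this user and key pair don’t exist yet, here is a minimal sketch of setting them up (assuming Debian-style user tooling on the remote host; the key type and paths are illustrative):
# on the remote host: create the callhome user and an SSH key pair
sudo adduser --disabled-password --gecos "" callhome
sudo -u callhome mkdir -p /home/callhome/.ssh
sudo -u callhome ssh-keygen -t ed25519 -f /home/callhome/.ssh/id_ed25519 -N ""
# append /home/callhome/.ssh/id_ed25519.pub to the incoming user's authorized_keys on the managed host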
systemd service configuration
Create a systemd service unit by adding the configuration below to a file called /etc/systemd/system/call-home.service.
[Unit]
Description=SSH reverse tunnelling
After=network-online.target
Before=multi-user.target
DefaultDependencies=no
[Service]
# SSH connection uses the private key stored in this
# user's home dir (~/.ssh/)
User=callhome
# SSH connection with port forwarding
# Forwards port 5000 on the managed host to local port 22
ExecStart=/usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=20 -o ServerAliveCountMax=1 -o ExitOnForwardFailure=yes -N -T -R5000:localhost:22 incoming@managedhost.example.com
# wait 60 seconds before trying to restart the connection
# if it disconnects
RestartSec=60
# keep retrying no matter what
Restart=always
[Install]
WantedBy=multi-user.target
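If the unit file was created or edited after boot, reload systemd so that it picks up the change:
root@raspberry-pi:~# systemctl daemon-reload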
Ensure that this service starts at boot
root@raspberry-pi:~# systemctl enable call-home
Created symlink from /etc/systemd/system/multi-user.target.wants/call-home.service to /etc/systemd/system/call-home.service.
Start the service and test if port forwarding works
root@raspberry-pi:~# systemctl start call-home
# check to see if the connection was established
root@raspberry-pi:~# sudo journalctl -u call-home
Jun 25 18:03:00 raspberry-pi systemd[1]: Starting SSH reverse tunnelling...
Jun 25 18:03:00 raspberry-pi systemd[1]: Started SSH reverse tunnelling.
Jun 25 18:03:01 raspberry-pi ssh[23582]: Warning: Permanently added '1.2.3.4' (ECDSA) to the list of known hosts.
If everything worked, you should be able to connect to port 5000 on the managed host, authenticate, and reach the remote host, like so:
root@ip-172-31-20-1:~# ssh -p 5000 pi@127.0.0.1
pi@127.0.0.1's password:
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Jun 25 20:17:28 2016 from localhost
pi@raspberry-pi:~ $
Troubleshooting
When attempting to connect from the managed host to the remote host you get the error ssh: connect to host 127.0.0.1 port 5000: Connection refused
- On the remote host, check if forwarding worked, like so:
pi@raspberry-pi:~ $ sudo journalctl -u call-home
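On the managed host, you can also confirm that sshd has opened the forwarded port (a quick check, assuming the ss utility from iproute2 is installed):
root@ip-172-31-20-1:~# ss -tlnp | grep 5000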
Reverse proxying Kibana with Nginx at a subpath (/kibana)
Reverse proxying to Kibana when it’s hosted at the root path (/), i.e. https://kibana.tools.example.org/, works out of the box. The problem is, most folks don’t have a dedicated certificate for each internal application. Instead, the common practice is to host apps via a subpath such as https://tools.example.org/kibana/. This is where things get tricky. Kibana supports a config variable called server.basePath that is supposed to set its base path so that all emitted links are prefixed accordingly. This is supposed to make Kibana play nice when it is behind a reverse proxy. As of today, the latest version still has issues (#5171, #1555 and #6339).
Assuming you have Kibana running on the default port 5601, server.basePath set to “” and Nginx configured to respond to the hostname tools.example.org, the following Nginx configuration makes Kibana play nice at the subpaths /kibana and /kibana/:
-- snip --
# kibana is actually hosted at /app/kibana
# This redirect points it to the right direction
location = /kibana {
return 301 https://tools.example.org/app/kibana;
}
location = /kibana/ {
return 301 https://tools.example.org/app/kibana;
}
# by default Kibana redirects /app/kibana/ to /app/kibana
location = /app/kibana/ {
return 301 https://tools.example.org/app/kibana;
}
# this is where the app is served
location = /app/kibana {
proxy_pass http://kibana-host:5601;
}
# internal application links
location /app/kibana/ {
proxy_pass http://kibana-host:5601;
}
# static content is not relative to /app/kibana.
# instead its served at /bundles/*
# see https://github.com/elastic/kibana/issues/6339
location /bundles/ {
proxy_pass http://kibana-host:5601;
}
#
-- snip --
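To verify that the redirects land where you expect, curl can follow the chain of Location headers (a quick sanity check, assuming the vhost above is live at tools.example.org):
$ curl -sIL https://tools.example.org/kibana | egrep -i '^(HTTP|Location)'
The chain should end with a response served from /app/kibana.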
Logstash mysteriously returns connection reset when connecting to a HTTPS elasticsearch endpoint
I was on a wild goose chase today because connections from Logstash to a newly set up Elasticsearch fronted by Nginx were failing with the error ‘Connection reset’. I was absolutely certain that the host trying to make a connection could connect: curl -v https://example.com/elasticsearch worked. For some reason Logstash could not connect. I assumed I wasn’t setting some elasticsearch plugin parameters correctly.
[root@ip-172-10-1-246 conf.d]# /etc/init.d/logstash configtest
Mar 21, 2016 4:42:31 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {s}->https://es.example.com:443: Connection reset
Mar 21, 2016 4:42:31 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {s}->https://es.example.com:443
Mar 21, 2016 4:42:31 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {s}->https://es.example.com:443: Connection reset
Mar 21, 2016 4:42:31 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {s}->https://es.example.com:443
Mar 21, 2016 4:42:31 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {s}->https://es.example.com:443: Connection reset
Mar 21, 2016 4:42:31 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {s}->https://es.example.com:443
Connection reset {:class=>"Manticore::SocketException", :level=>:error}
Configuration OK
After hours of trial and elimination, I had it narrowed down to the JVM that was running Logstash. It turns out that support for TLSv1.2 in OpenJDK 1.7 is not enabled by default. Adding -Dhttps.protocols=TLSv1.2 to the Java startup parameters does not help either. Upgrading to OpenJDK 1.8 worked for me. I hope this quick note saves someone some time.
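If the Nginx in front of Elasticsearch was configured to accept only newer TLS versions, openssl can confirm which protocol versions the endpoint negotiates (requires OpenSSL 1.0.1+ for the -tls1_2 option; the hostname is the example endpoint from above):
# handshake should be rejected if the server insists on TLSv1.2
$ openssl s_client -connect es.example.com:443 -tls1 < /dev/null
# handshake should complete
$ openssl s_client -connect es.example.com:443 -tls1_2 < /dev/null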
Debian on a Linksys WRT1200ac / WRT1900ac WiFi router
In line with their popular, hacker-friendly WRT54 series routers, Linksys released the WRT1200ac and WRT1900ac in 2014. These new devices are beefed-up versions of their predecessors. The WRT1900ac, for instance, ships with 128MB of flash storage and 256MB of DDR3 RAM, powered by an ARM-compliant Marvell Armada 370/XP SoC. Marvell has over time worked with the community to provide open-source WiFi drivers. While work on the driver continues, the device is stable enough to run production workloads. I have ~40 devices concurrently connected during business hours on 2.4GHz and 5GHz. This top-of-the-line embedded hardware opens up interesting new possibilities – say hello to McDebian.
McDebian is a complete Debian operating system for the new Linksys WRT routers. The kernel along with hardware specific DTB blob is written to MTD flash which enables the device to boot. Rootfs is stored on a USB key connected to the device. Suddenly, storage space is no longer a limitation.
Why McDebian on a router?
- Debian maintains a wide range of packages.
- systemd for init
- A familiar networking stack running on the router
- A familiar root filesystem. No different from what runs on your servers every day.
- Upgrading drivers and packages is simple and straightforward
- Easy to create consistent backups that will save the day if things do go bad
- Chef or Puppet for configuration management.
- apt-get update; apt-get upgrade and – Poof! All your security updates applied.
It boots fast too
Don’t take my word for it. Here’s the evidence:
root@MCDEBIAN:~# systemd-analyze time
Startup finished in 6.891s (kernel) + 13.690s (userspace) = 20.581s
Deploying WordPress with SELinux enabled
SELinux can be a pain to work with at times, but that does not justify setting it to permissive mode or disabling it. WordPress’ popularity makes it a script kiddie’s favorite target. Other than always keeping your WordPress instance updated, you should be running httpd/Apache in a confined SELinux domain. It reduces the damage someone can do if (at all) they manage to upload and execute files on your webserver (ever noticed those randomly named, hidden binary files in /tmp owned by the user running your webserver?).
Assumptions
- SELinux is installed and enabled
- WordPress is unzipped into /var/www/html/
- This has been tested on an Amazon AMI, but should work for all distributions that support SELinux
Getting started
Let’s make sure that the stage is set up correctly.
Ensuring SELinux is up and running in enforcing mode
sestatus must report the output below. If it doesn’t, you either have SELinux disabled or running in permissive mode. Don’t proceed until you have output that looks exactly like this:
[ec2-user@ip-172-30-10-10 ~]$ sestatus | grep 'SELinux status\|Current mode'
SELinux status: enabled
Current mode: enforcing
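If sestatus reports permissive or disabled, a minimal sketch for switching to enforcing (persist it in /etc/selinux/config and apply it to the running system; note that going from disabled to enforcing also requires a filesystem relabel and a reboot):
[ec2-user@ip-172-30-10-10 ~]$ sudo sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
[ec2-user@ip-172-30-10-10 ~]$ sudo setenforce 1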
Ensuring httpd is running in confined domain httpd_t
httpd must be running in the domain httpd_t. If it’s in unconfined_t, then it’s running in the wrong domain and the rules meant to secure it won’t apply.
[ec2-user@ip-172-30-10-10 ~]$ ps uax -Z | grep httpd
unconfined_u:system_r:httpd_t:s0 root 22884 0.0 1.0 324744 11188 ? Ss Jan15 0:01 /usr/sbin/httpd
unconfined_u:system_r:httpd_t:s0 apache 22887 0.0 2.7 342656 28556 ? S Jan15 0:00 /usr/sbin/httpd
unconfined_u:system_r:httpd_t:s0 apache 23138 0.0 0.6 324876 6812 ? S Jan15 0:00 /usr/sbin/httpd
unconfined_u:system_r:httpd_t:s0 apache 23206 0.0 0.5 324744 6064 ? S Jan15 0:00 /usr/sbin/httpd
unconfined_u:system_r:httpd_t:s0 apache 23483 0.0 0.5 324744 6064 ? S Jan15 0:00 /usr/sbin/httpd
unconfined_u:system_r:httpd_t:s0 apache 23486 0.0 0.5 324744 6064 ? S Jan15 0:00 /usr/sbin/httpd
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 ec2-user 24944 0.0 0.0 110256 644 pts/0 S+ 03:58 0:00 grep httpd
If your /usr/sbin/httpd is running in the domain unconfined_t instead of httpd_t, then the context of /etc/init.d/httpd or /usr/sbin/httpd has somehow changed. Restoring the context should fix it.
To restore the context, run restorecon:
[ec2-user@ip-172-30-10-10 ~]$ sudo restorecon -v /etc/init.d/httpd /usr/sbin/httpd
Allowing httpd to connect to a remote database
If you are hosting your database on a remote server, httpd must be allowed to connect to it.
[ec2-user@ip-172-30-10-10 ~]$ sudo setsebool -P httpd_can_network_connect_db 1
Note: Connections to a local database, i.e. localhost:3306, do not need this boolean to be set to true.
Whitelisting /var/www/html/wp-content/uploads/ for write access
When uploading images and other media from wp-admin, httpd needs write access to WordPress’ upload directory. By default, all files and directories in /var/www/html are labeled with the type httpd_sys_content_t. With this context, files are not writable by the httpd process running in the domain httpd_t. Changing this to the type httpd_sys_rw_content_t will allow write/create/delete access.
[ec2-user@ip-172-30-10-10 ~]$ sudo semanage fcontext -f "" -a -t httpd_sys_rw_content_t '/var/www/html/wp-content/uploads(/.*)?'
To apply the change in context to the directories and files in /var/www/html/wp-content/uploads/*, use restorecon:
[ec2-user@ip-172-30-10-10 ~]# restorecon -Rv /var/www/html
[ec2-user@ip-172-30-10-10 ~]# ls -all -Z /var/www/html/wp-content/uploads/
drwxr-xr-x. apache apache unconfined_u:object_r:httpd_sys_rw_content_t:s0 .
drwxr-xr-x. nobody 65534 unconfined_u:object_r:httpd_sys_content_t:s0 ..
drwxr-xr-x. apache apache unconfined_u:object_r:httpd_sys_rw_content_t:s0 2015
Special note about the boolean httpd_unified on CentOS/RHEL distributions
By default, httpd_unified is enabled on CentOS/RHEL systems older than version 7. This allows Apache write access to files and directories labeled with the context httpd_sys_content_t. We want to make sure that Apache can only write to the directories/files we whitelisted (/var/www/html/wp-content/uploads/). Turning this boolean off is highly recommended to control writes to the filesystem and within the DocRoot.
[ec2-user@ip-172-30-10-10 ~]# sudo setsebool -P httpd_unified 0
Debugging
- When working with SELinux, tailing /var/log/audit/audit.log is always helpful (see the sketch after this list).
- Switch to permissive mode and check whether something that breaks in enforcing mode starts working. This should tell you if SELinux is what’s breaking things.
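For example, recent denials can be pulled out of the audit log and explained (a quick sketch, assuming the audit and policycoreutils-python packages, which provide ausearch and audit2why, are installed):
[ec2-user@ip-172-30-10-10 ~]$ sudo ausearch -m AVC -ts recent | audit2why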
Exploring other interfaces exposed by the httpd SELinux policy
A list of all boolean interfaces exposed by the httpd SELinux policy can be obtained using getsebool. An explanation of each boolean is documented in the httpd_selinux man page.
[ec2-user@ip-172-30-10-10 ~]$ sudo getsebool -a | grep httpd
allow_httpd_anon_write --> off
allow_httpd_mod_auth_ntlm_winbind --> off
allow_httpd_mod_auth_pam --> off
allow_httpd_sys_script_anon_write --> off
httpd_builtin_scripting --> on
httpd_can_check_spam --> off
httpd_can_connect_ftp --> off
httpd_can_connect_ldap --> off
httpd_can_connect_zabbix --> off
httpd_can_network_connect --> on
httpd_can_network_connect_cobbler --> off
httpd_can_network_connect_db --> off
httpd_can_network_memcache --> off
httpd_can_network_relay --> off
httpd_can_sendmail --> off
httpd_dbus_avahi --> off
httpd_enable_cgi --> on
httpd_enable_ftp_server --> off
httpd_enable_homedirs --> off
httpd_execmem --> off
httpd_graceful_shutdown --> off
httpd_manage_ipa --> off
httpd_read_user_content --> off
httpd_run_stickshift --> off
httpd_setrlimit --> off
httpd_ssi_exec --> off
httpd_tmp_exec --> off
httpd_tty_comm --> off
httpd_unified --> off
httpd_use_cifs --> off
httpd_use_fusefs --> off
httpd_use_gpg --> off
httpd_use_nfs --> off
httpd_verify_dns --> off
Putting it all together
At this point, you should have a working instance of WordPress served by httpd running in the confined domain httpd_t. This should minimise the damage that someone can do if they ever manage to upload files to your server and attempt to execute them.
These simple steps should keep your WordPress instance fairly secure and the random binary files in /tmp at bay.
Further reading
- SELinux in Practice: DVWA Test by Positive Research Center
- And in response – Got SELinux? by Dan Walsh
Setting up an IPSec VPN connection to Microsoft Azure using Strongswan
It took me a while to get the IPSec tunnel between Azure and Strongswan up and running. This post documents the Strongswan configuration required to get traffic flowing through the tunnel.
Assumptions
- Private network segment on Azure’s side is 10.0.0.0/16
- Public IP address of the VPN gateway on Azure’s side is 1.2.3.4
- Private network segment of instance running Strongswan is 172.30.0.0/16
- IP address of instance running Strongswan is 172.30.2.11
- Your pre-shared key is in /etc/strongswan/ipsec.secrets (see the sketch below)
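For reference, the corresponding ipsec.secrets entry would look roughly like this (a sketch; the key value is a placeholder):
[francis@ip-172-30-2-11 ~]# cat /etc/strongswan/ipsec.secrets
172.30.2.11 1.2.3.4 : PSK "replace-with-your-pre-shared-key"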
Connection configuration
[francis@ip-172-30-2-11 ~]# cat /etc/strongswan/ipsec.conf
conn office-network-to-azure-southeast-asia
closeaction=restart
dpdaction=restart
ike=aes256-sha1-modp1024
esp=aes256-sha1
reauth=no
keyexchange=ikev2
mobike=no
ikelifetime=28800s
keylife=3600s
keyingtries=%forever
authby=secret
left=172.30.2.11 # local instance ip (strongswan)
leftsubnet=0.0.0.0/0
leftid=172.30.2.11 # local instance ip (strongswan)
right=1.2.3.4 # vpn gateway ip (azure)
rightid=1.2.3.4 # vpn gateway ip (azure)
rightsubnet=10.0.0.0/16 # private ip segment (azure)
auto=start
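With the configuration in place, restart Strongswan and check that the tunnel is established; a ping to a host on Azure’s subnet confirms traffic is flowing (the strongswan wrapper command ships with the CentOS/EPEL package, and 10.0.0.4 is just an example host):
[francis@ip-172-30-2-11 ~]# strongswan restart
[francis@ip-172-30-2-11 ~]# strongswan statusall
[francis@ip-172-30-2-11 ~]# ping -c 3 10.0.0.4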
Installing MySQL 5.5 on CentOS 6.x
This article describes how to install MySQL 5.5 on CentOS 6.x, since it is not available in the default CentOS package repository. It installs the x86_64 version of MySQL 5.5.33-1 on an x86_64 machine. For i386, replace x86_64 with i386.
# Install libaio – it's required by MySQL server 5.5
$ yum install libaio
# Download MySQL 5.5 installation RPMs
$ wget http://dev.mysql.com/get/Downloads/MySQL-5.5/MySQL-5.5.33-1.linux2.6.x86_64.rpm-bundle.tar/from/http://cdn.mysql.com/
# Untar the installation bundle
$ tar -xvf MySQL-5.5.33-1.linux2.6.x86_64.rpm-bundle.tar
# Install MySQL shared compact
$ rpm -Uvh MySQL-shared-compat-5.5.33-1.linux2.6.x86_64.rpm
# Install MySQL shared
$ rpm -Uvh MySQL-shared-5.5.33-1.linux2.6.x86_64.rpm
# Install MySQL client
$ rpm -Uvh MySQL-client-5.5.33-1.linux2.6.x86_64.rpm
# Install MySQL server
$ rpm -Uvh MySQL-server-5.5.33-1.linux2.6.x86_64.rpm
# Finally, start MySQL server 5.5
$ /etc/init.d/mysql start
Don’t forget to run mysql_secure_installation to secure the newly installed MySQL instance.
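To have MySQL come back up after a reboot as well, enable the SysV init script that the server RPM installs (it is named mysql):
$ chkconfig mysql on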
Kannel – setting up active-passive failover SMPP gateways
This post documents setting up two SMPP gateways in active-passive failover mode, i.e. when the active SMPP gateway goes down, traffic is automatically sent through the secondary (passive) SMPP gateway.
Prerequisites
This post assumes that you have working knowledge of setting up and running Kannel. Only the relevant config is documented; the rest is snipped for readability.
Setting up active-passive gateways
Assuming the smsc-id of the active SMPP gateway is reliable_smpp_gw and that of the passive gateway is unreliable_smpp_gw, the following config sets up an active-passive gateway.
Configuring the active SMPP gateway
group = smsc
smsc = smpp
smsc-id = reliable_smpp_gw
.
.
allowed-smsc-id = reliable_smpp_gw
preferred-smsc-id = reliable_smpp_gw
Configuring the passive SMPP gateway
group = smsc
smsc = smpp
smsc-id = unreliable_smpp_gw
.
.
allowed-smsc-id = unreliable_smpp_gw;reliable_smpp_gw
preferred-smsc-id = unreliable_smpp_gw
Hooking the gateways to an account
group = sendsms-user
username = company_1
password = company_1_admin
name = company_1
.
.
default-smsc = reliable_smpp_gw
forced-smsc = reliable_smpp_gw
The default-smsc and forced-smsc settings enforce that messages submitted by the user account company_1 are sent through the smsc reliable_smpp_gw. The passive gateway is configured to send messages for the smsc-ids reliable_smpp_gw and unreliable_smpp_gw; this makes Kannel use the passive gateway (unreliable_smpp_gw) when the active one (reliable_smpp_gw) is unavailable. Once the active gateway (reliable_smpp_gw) is back online, traffic is automatically sent through it again.
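To watch the failover happen, Kannel's admin HTTP interface reports the state of each SMSC link (a quick check, assuming bearerbox's admin-port is the common 13000 and admin-password is bar in the core group):
$ curl "http://localhost:13000/status?password=bar"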
Kannel – setting up active-active load-balanced SMPP gateways
This post documents setting up Kannel to balance across two SMPP gateways in active-active mode, such that messages are sent using both gateways.
Note: Kannel does not monitor the quality of service of each link. If a gateway is connected but not delivering messages, Kannel will continue using it until the connection to the gateway goes offline.
Prerequisites
This post assumes that you have working knowledge of setting up and running Kannel. Only the relevant config is documented; the rest is snipped for readability.
Setting up active-active gateways
The trick to setting up active-active gateways in Kannel is to set the same smsc-id for both SMPP gateways.
[SMPP connection 1 config]
group = smsc
smsc = smpp
smsc-id = smpp_carrier_gw
host = carrier1.example.com
smsc-username = carrier1
smsc-password = carrier1
[SMPP connection 2 config]
group = smsc
smsc = smpp
smsc-id = smpp_carrier_gw
host = carrier2.example.com
smsc-username = carrier2
smsc-password = carrier2
Hooking the gateways to an account
group = sendsms-user
username = company_1
password = company_1_admin
name = company_1
.
.
default-smsc = smpp_carrier_gw
forced-smsc = smpp_carrier_gw
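With both links online, messages submitted by this account are spread across the two carrier connections. A test message can be pushed through the sendsms HTTP interface (a sketch, assuming smsbox listens on the common sendsms-port 13013; the destination number is a placeholder):
$ curl "http://localhost:13013/cgi-bin/sendsms?username=company_1&password=company_1_admin&to=%2B15550100&text=hello"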