MTA - Modernize Traditional Apps with Docker, case study 1: Squid cluster

At our university we are moving much of our traditional IT infrastructure to Docker, and this first article shows a simple use case of that process.

The Docker Success Center has a very good guide named Docker Reference Architecture: Design Considerations and Best Practices to Modernize Traditional Apps (MTA) with Docker EE. That guide covers common use cases. This article shows how an application that was not designed for elasticity from the beginning can be adapted to run on a Docker Swarm cluster and exploit its features, such as horizontal elasticity, fault tolerance and CI/CD.

Squid is one of the best caching proxies available for Linux and can speed up Internet access considerably within an enterprise network. However, it predates the Docker/Swarm clustering model and lacks the kind of horizontal scaling that, for example, Elasticsearch offers. With Docker we can hack this app to get similar behavior. Here is a sample picture of the deployment:
[Figure: sample deployment of the Squid cluster]

Assume a Docker Swarm cluster with two manager nodes and four or more worker nodes, in the state shown here:
[Figure: Swarm cluster status]

The deployment above can be defined using two Docker stack files, one for the Squid cluster and another for the DNS/syslog services:

version: "3.2"


version: "3.2"

Our Docker image slab/syslog-ng is based on balabit/syslog-ng:latest and uses a custom syslog-ng.conf that provides a centralized store for the Squid logs of the four running instances. Here is part of that file:

# Anything that's from the program 'squid'
# and the 'user' log facility
filter f_squid { program("squid") and facility(user); };

# This is our squid destination log file
destination d_squid {
    # The squid log file with dates (file mode must be octal, with a leading 0)
    file("/var/log/$HOST/$YEAR/$MONTH/squid.$YEAR-$MONTH-$DAY"
        owner(root) group(adm) perm(0664)
        create_dirs(yes) dir_perm(0775));
};

# This is the actual Squid logging
log { source(src); filter(f_squid); destination(d_squid); };
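To make the destination template concrete, here is a small hypothetical shell helper (not part of the slab/syslog-ng image) that prints the file d_squid would write for a given sending host and day. It only mirrors the template /var/log/$HOST/$YEAR/$MONTH/squid.$YEAR-$MONTH-$DAY; the function name and the YYYY-MM-DD input format are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file the d_squid destination writes to for a
# given sending host and day, mirroring the template
#   /var/log/$HOST/$YEAR/$MONTH/squid.$YEAR-$MONTH-$DAY
# Usage: squid_log_path HOST YYYY-MM-DD
squid_log_path() {
  local host="$1" day="$2" year month
  case "$day" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;        # reject anything that is not YYYY-MM-DD
  esac
  year=${day%%-*}         # "2017-05-03" -> "2017"
  month=${day#*-}         # -> "05-03"
  month=${month%-*}       # -> "05"
  printf '/var/log/%s/%s/%s/squid.%s\n' "$host" "$year" "$month" "$day"
}
```

For example, squid_log_path squid1 2017-05-03 prints /var/log/squid1/2017/05/squid.2017-05-03: each sending host gets its own directory tree, with one file per day.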

The slab/named-fibra image is based on superbfg7/named:latest and adds our internal DNS entries plus a forwarding configuration to OpenDNS, to use its filtering functionality. The Dockerfile is basically:

FROM superbfg7/named:latest
MAINTAINER Marcelo Ochoa ""
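The named configuration itself is not shown above. As a sketch of what forwarding to OpenDNS usually looks like in BIND, the hypothetical function below prints an options block that sends recursive queries to the public OpenDNS resolvers 208.67.222.222 and 208.67.220.220. Where such a file lives in the superbfg7/named image, and whether the real setup uses forward only, are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a BIND options block that forwards every
# recursive query to the public OpenDNS resolvers. Internal zones served by
# this named instance are answered locally and are not forwarded.
opendns_options() {
  cat <<'EOF'
options {
    recursion yes;
    // OpenDNS public resolvers
    forwarders { 208.67.222.222; 208.67.220.220; };
    // never resolve directly, so the OpenDNS filtering cannot be bypassed
    forward only;
};
EOF
}
```

The choice of forward only means named returns a failure instead of resolving on its own when OpenDNS is unreachable, so clients cannot bypass the filtering; forward first would trade that guarantee for availability.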

Finally, the tricky part, the Squid cluster: we want at least four instances running in sibling mode, sharing cache information via ICP over the sibling network. The Dockerfile for the slab/squid-direct:3.3.8 image is:

FROM slab/squid:3.3.8

COPY /etc/squid3/squid.conf
COPY dominios.conf /etc/squid3/dominios.conf

and Dockerfile for slab/squid:3.3.8 is:

FROM ubuntu:14.04

ENV SQUID_CACHE_DIR=/var/spool/squid3 \
    SQUID_LOG_DIR=/var/log/squid3

RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y squid iputils-ping curl telnet \
&& ln -s /usr/share/squid3/errors/es /usr/share/squid3/errors/es-us \
&& ln -s /usr/share/squid3/errors/es /usr/share/squid3/errors/es-la \
&& rm -rf /var/lib/apt/lists/*

COPY /sbin/
RUN chmod 755 /sbin/

EXPOSE 3128/tcp 3130/udp 8080/tcp
ENTRYPOINT ["/sbin/"]

Nothing special here, except that the entrypoint script in /sbin/ includes this functionality:

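build_sibling_list() {
  # own IP, taken from the last line of /etc/hosts
  my_ip=$(tail -n 1 /etc/hosts | cut -f1)
  echo "# Sibling list" >/etc/squid3/sibling.conf
  # tasks.<service> resolves to the IP of every running task of the service
  for i in $(dig "tasks.$SERVICE_NAME" | grep "^tasks\.$SERVICE_NAME\." | cut -f5 | sort); do
    echo "cache_peer $i sibling 8080 3130" >>/etc/squid3/sibling.conf
  done
  # finally drop this instance's own entry (whole address, so .2 does not remove .23)
  sed -i "/^cache_peer $my_ip sibling /d" /etc/squid3/sibling.conf
}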

It basically adds two extra Bash functions, build_sibling_list and check_reload_sibling. The first one queries a special DNS entry, tasks.squid_proxy, which Swarm keeps updated with the internal IP of each running task of the service. It writes the parsed DNS response to /etc/squid3/sibling.conf and finally removes the instance's own IP from the list (/etc/squid3/sibling.conf is included from the squid.conf file). The sibling list looks like:

# Sibling list
cache_peer sibling 8080 3130
cache_peer sibling 8080 3130
cache_peer sibling 8080 3130

meaning that the node with the IP has three siblings, at IPs ending in .23, .25 and .26. This tells Squid that it can share cache content with them using ICP.
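The same idea can be split so that the parsing is testable without Docker. The sketch below is a hypothetical variant of build_sibling_list, not the author's code: it reads dig output on stdin, keeps the A records of tasks.<service>, skips this task's own IP with an exact comparison, and emits one cache_peer line per remaining task, using the ports 8080 and 3130 from the sibling list above.

```shell
#!/usr/bin/env bash
# Hypothetical, testable variant of build_sibling_list.
# Usage: dig "tasks.$SERVICE_NAME" | sibling_conf SERVICE MY_IP
sibling_conf() {
  local service="$1" my_ip="$2"
  echo "# Sibling list"
  # Answer lines look like "tasks.<service>.  600  IN  A  <ip>"; the question
  # line starts with ';' so it never matches. awk splits on tabs or spaces.
  # Addresses are then sorted numerically by octet (.9 before .10).
  awk -v name="tasks.$service." -v self="$my_ip" \
      '$1 == name && $4 == "A" && $5 != self { print $5 }' |
    sort -t . -k1,1n -k2,2n -k3,3n -k4,4n |
    while read -r ip; do
      # 8080 = HTTP port and 3130 = ICP port of each sibling
      echo "cache_peer $ip sibling 8080 3130"
    done
}
```

Inside the container it would be used as dig "tasks.$SERVICE_NAME" | sibling_conf "$SERVICE_NAME" "$my_ip" > /etc/squid3/sibling.conf. Comparing whole fields instead of matching a sed pattern avoids removing a longer address that merely starts with the same digits.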

The second Bash function, check_reload_sibling, runs every minute in the background (thank God we have Bash!) and compares a newly generated file with the previous one, looking for changes. Changes happen when we scale the service up or down, or when Swarm relocates a task. If the sibling list has changed, the function tells Squid to reload its configuration.
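The code of check_reload_sibling is not shown, so the following is only a hypothetical sketch of the compare-and-reload step. replace_if_changed installs the new list and runs the reload command only when the file actually differs; squid3 -k reconfigure is Squid's standard "re-read configuration" signal, and build_sibling_list_to is a hypothetical variant of build_sibling_list that writes to the file given as its argument.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the check_reload_sibling logic.
# Usage: replace_if_changed NEW CURRENT CMD...
#   NEW     = freshly generated sibling list
#   CURRENT = the file included by squid.conf
#   CMD...  = command that makes Squid reload (e.g. squid3 -k reconfigure)
# Returns 1 (and discards NEW) when nothing changed.
replace_if_changed() {
  local new="$1" current="$2"
  shift 2
  # cmp -s is silent; it also fails when CURRENT does not exist yet,
  # which correctly counts as a change on the first run
  if cmp -s "$new" "$current"; then
    rm -f "$new"
    return 1
  fi
  mv "$new" "$current"
  "$@"
}

# Background loop: regenerate and compare once a minute.
# build_sibling_list_to is hypothetical (see the lead-in).
watch_siblings() {
  while sleep 60; do
    build_sibling_list_to /etc/squid3/sibling.conf.new
    replace_if_changed /etc/squid3/sibling.conf.new /etc/squid3/sibling.conf \
      squid3 -k reconfigure
  done
}
```

An entrypoint would start the loop with watch_siblings & before launching Squid in the foreground, so Squid is reconfigured only when the set of tasks really changes.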

Final remarks:

  • The DNS service is fault tolerant (two replicas running in parallel)
  • The Squid service is elastic: a simple docker service scale squid_proxy=6, for example, adds two more instances, and each additional copy brings more network and disk bandwidth
  • Logging is centralized, and we can get real-time insight into it by following the post Analyze web traffic with Squid proxy & Elasticsearch/Logstash/Kibana stack
  • The Squid disk cache is transient: it lives only as long as the service instance, and its size is defined in the squid.conf file
  • The Squid memory cache is also controlled by a squid.conf parameter (cache_mem); for example, 256 MB per instance gives 1 GB of RAM in total across four instances
  • A healthcheck is implemented using curl and an internal URL (a switch status page): if for any reason a Squid instance is unable to reach this page, it is automatically destroyed and started again, possibly on another Swarm host, which gives the cluster sufficient stability
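The healthcheck script is not shown either; here is a minimal hypothetical sketch of the idea, assuming Squid listens on its standard port 3128 inside the container and that the internal status page URL is passed as the first argument. It fetches the page through the local proxy and normalizes any failure to exit status 1, which is what Docker's HEALTHCHECK treats as unhealthy.

```shell
#!/usr/bin/env bash
# Hypothetical healthcheck sketch: fetch an internal status page through this
# task's own Squid and fail if that does not work.
# $1 = status page URL (placeholder for the internal switch status page)
# $2 = proxy to test (assumed default: Squid's standard port 3128 on localhost)
proxy_healthcheck() {
  local url="$1" proxy="${2:-http://127.0.0.1:3128}"
  [ -n "$url" ] || return 1
  # -f: fail on HTTP errors, -sS: silent but still print errors,
  # --max-time: never hang longer than 10 s, -x: send the request via the proxy
  curl -fsS --max-time 10 -x "$proxy" -o /dev/null "$url" || return 1
}
```

In the image this would be wired up with a Dockerfile HEALTHCHECK instruction whose command runs the check and exits 1 on failure; after the configured number of failed retries Swarm marks the task unhealthy and replaces it.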
