Estimating Microservices Max DB Throughput

Recently I came across the scenario of estimating the maximum throughput of microservice-based applications on K8s clusters.
These microservices implement several modules that use an Oracle DB as their backend. Before starting a real test I like to know what maximum bandwidth our Oracle DB can accept, what the Docker network overhead is, and what the cluster (node-to-node) overhead is.
To get a comparable workload across different deployments I am working with a great piece of software by Dominic Giles called SwingBench. It includes a graphical interface:

SwingBench visual console

as well as a command-line interface named CharBench.

Building SwingBench Docker image

Before starting with a cluster workload test, let's begin with a simple Swarm deployment. To do that we will first create a Docker image that includes the SwingBench distribution, ready to run as a Docker Swarm service. Here is the Dockerfile:

Dockerfile

Although the image uses JDK 8 (recent SwingBench releases target JDK 9+), these local Swarm tests will serve as a baseline for later comparison with a cloud deployment.

Simple Swarm stack for local testing

Here is a simple docker-compose.yml Swarm stack ready for local testing:

The above docker-compose.yml file requires a Docker Oracle image (19c) built from the official Oracle images.

Local test using 19c EE RDBMS

Deploy the above stack using:

$ export ADMIN_PWD=Oracle_2020
$ export SOE_PWD=soe
$ export TPC_PWD=tpcdslike
$ docker stack deploy -c docker-compose.yml sb

Once your RDBMS is up and running, some of the services declared in the above stack are needed to create the SwingBench test data.
Let's start by creating the SOE and TPCDSLIKE schemas:

$ docker service scale sb_oesetup=1
$ docker service scale sb_tpcdsetup=1
$ docker service scale sb_oesetup=0 sb_tpcdsetup=0

Both sb_oesetup and sb_tpcdsetup create Oracle schemas with around 0.8 GB of sample data; if you want a bigger test case, just update the above docker-compose.yml and re-run the setup process.

Client side

One of the test suites provided by SwingBench is a typical client/server stress test, mostly SQL SELECT and DML operations with no PL/SQL usage. Here is a local test run:

$ docker service scale sb_soecs=1
$ docker service logs -f sb_soecs
sb_soecs.1 | Author : Dominic Giles
sb_soecs.1 | Version : 2.6.0.1137
sb_soecs.1 |
sb_soecs.1 | Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 87636 |
|Average Transactions/sec | 1409.73|
.....
sb_soecs.1 | Completed Run.

The above result is considered our maximum throughput because the only overhead is that of the overlay network connecting the SwingBench service to the Oracle DB service. If we check online using the SQL Developer Instance Viewer we see something like:

SQL Developer Instance Viewer

We get around 11 Mbps of client/server SQL*Net traffic and more than 1,400 TPS (commit rate) with 16 concurrent connections. A diagram of this deployment:

Docker stack deployment diagram
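As a quick sanity check on those figures, dividing the observed SQL*Net bandwidth by the commit rate gives a rough per-transaction network cost (both numbers read off the Instance Viewer, so this is only an approximation):

```python
# Rough per-transaction SQL*Net cost from the observed figures.
bandwidth_mbps = 11        # client/server SQL*Net traffic, megabits per second
tps = 1400                 # commit rate reported by the Instance Viewer

bytes_per_second = bandwidth_mbps * 1_000_000 / 8   # megabits -> bytes per second
bytes_per_txn = bytes_per_second / tps
print(f"~{bytes_per_txn:.0f} bytes of network traffic per transaction")
# -> ~982 bytes of network traffic per transaction
```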

Server Side

A test suite similar to the previous one is called Server Side V2; it does the same thing in terms of functionality but uses stored procedures. Here is a local result:

$ docker service scale sb_soess=1
$ docker service logs -f sb_soess
sb_soess.1 | Author : Dominic Giles
sb_soess.1 | Version : 2.6.0.1137
sb_soess.1 |
sb_soess.1 | Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 94659 |
|Average Transactions/sec | 1542.66|
.....
sb_soess | Completed Run.

Around 10% more transactions per second with much less network traffic!

SQL Developer Instance Viewer

TPC-DS like test

TPCDS is a different test suite, modeled on the TPC-DS (Decision Support) benchmark:

$ docker service scale sb_tpcds=1
$ docker service logs -f sb_tpcds
sb_tpcds.1 | Author : Dominic Giles
sb_tpcds.1 | Version : 2.6.0.1137
sb_tpcds.1 |
sb_tpcds.1 | Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 18251 |
|Average Transactions/sec | 301.23|
.....
sb_tpcds.1 | Completed Run.

Local test outside Docker

This test measures the impact of the Docker networking layer, basically a Linux firewall DNAT, with userland-proxy=disabled as explained in Franck Pachot's post. It mostly affects the client-side test. Here is the result:

$ /opt/swingbench/bin/charbench -cs //172.17.0.1:1521/orclpdb1 -c /opt/swingbench/configs/SOE_Client_Side.xml -u soe -p ${SOE_PWD} -uc 16 -a -v users,tpm,tps,cpu,disk -mt 100000 -mr
Author : Dominic Giles
Version : 2.6.0.1135
Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 85572 |
|Average Transactions/sec | 1391.0|
.....
Completed Run.
SQL Developer Instance Viewer

We see a little more network traffic, more CPU usage (firewall activity), and fewer transactions, around 1.3% fewer. Deployment diagram:

Docker stack deployment diagram
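The overhead of the DNAT path can be quantified from the two client-side averages (1409.73 TPS through the overlay network vs. 1391.0 TPS from outside Docker):

```python
# Relative throughput loss of the host-published (DNAT) path.
overlay_tps = 1409.73   # client-side test inside the Swarm overlay network
dnat_tps = 1391.0       # same test run outside Docker via 172.17.0.1

loss_pct = (overlay_tps - dnat_tps) / overlay_tps * 100
print(f"throughput loss: {loss_pct:.1f}%")   # -> throughput loss: 1.3%
```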

Now it's time for the Oracle Cloud!

Assuming that you already have a two-node Docker Swarm deployment and two RDBMS instances on the Oracle Cloud Free tier, as described in my previous post My own dev/test cloud environment using Oracle Always Free instances, we can run the above tests to see the impact of the remote network layering on DB-oriented microservices.

Note: At Oracle Cloud, the docker-compose.yml variables ADMIN_PWD/SOE_PWD/TPC_PWD must comply with Oracle Cloud security rules: the password must be 12 to 30 characters long and contain at least one uppercase letter, one lowercase letter, and one number; it cannot contain the double quote (") character or the username "admin"; it must be different from the last four passwords; and you cannot reuse a password within 24 hours.

Client side

Having two nodes in a cluster to simulate a microservices workload, we can test not only our maximum throughput but also which layout is better: one big micro-service with a connection pool of 16 connections, two with 8 each, or four with 4 each. Deployment layout:

Oracle Cloud Always Free infrastructure

Note: Because the two nodes are constrained to 1 GB of RAM, we override the SwingBench layer with a smaller JVM heap size (see the docker-compose.yml and launcher.xml files). Also, the SQL*Net connection string requires the Wallet_SID.zip files downloaded from the Oracle Cloud Console.

To download each wallet, log into your cloud account and go to Autonomous Database -> DB Connection -> Download Wallet (Instance Wallet), as depicted in the following screenshots.

Autonomous TP instance details, click on DB Connection button
Autonomous DB Wallet Credentials Download

Put both files in the global directory of your Swarm services and that's all.

Fat micro-service

Starting one instance of SwingBench with 16 concurrent connections to the Autonomous Database / Transaction Processing instance (test1_tpurgent connect string, docker service scale sb_soecs=1), we get this result:

soecs:
image: swingbench:2.6.0
command: /opt/swingbench/bin/charbench -cf /opt/Wallet_test1.zip -cs test1_tpurgent -c /opt/swingbench/configs/SOE_Client_Side.xml -u soe -p ${SOE_PWD} -uc 16 -a -v users,tpm,tps,cpu,disk -mt 40000 -mr
Author : Dominic Giles
Version : 2.6.0.1135
Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 9157 |
|Average Transactions/sec | 131.61 |
.....
Completed Run.
SQL Developer Instance Viewer

Maximum SQL*Net throughput is 1.6 Mbps with around 225 TPS (commit rate).

Note: the Autonomous DB is constrained to a maximum of 18 connections: one is consumed by the Oracle Cloud monitoring services, another by SQL Developer, and 16 by our fat micro-service.

Thin micro-service

Two micro-services with 8 concurrent connections each (docker service scale sb_soecs=2):

soecs:
image: swingbench:2.6.0
command: /opt/swingbench/bin/charbench -cf /opt/Wallet_test1.zip -cs test1_tpurgent -c /opt/swingbench/configs/SOE_Client_Side.xml -u soe -p ${SOE_PWD} -uc 8 -a -v users,tpm,tps,cpu,disk -mt 20000 -mr
Author : Dominic Giles
Version : 2.6.0.1135
Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 6253 |
|Average Transactions/sec | 87.75 |
|Maximum Transactions/min | 6241 |
|Average Transactions/sec | 90.12 |
.....
Completed Run.
SQL Developer Instance Viewer

Maximum SQL*Net throughput is 2.4 Mbps with around 300 TPS (commit rate). The aggregate is 177.87 TPS, around 35% faster than the fat micro-service.
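The aggregate figure is simply the sum of the per-replica averages, compared against the single fat instance:

```python
fat_tps = 131.61            # 1 replica x 16 connections
thin_tps = [87.75, 90.12]   # 2 replicas x 8 connections each

aggregate = sum(thin_tps)
gain_pct = (aggregate - fat_tps) / fat_tps * 100
print(f"aggregate {aggregate:.2f} TPS, {gain_pct:.0f}% faster than the fat service")
# -> aggregate 177.87 TPS, 35% faster than the fat service
```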

Four micro-services with 4 concurrent connections each (docker service scale sb_soecs=4):

soecs:
image: swingbench:2.6.0
command: /opt/swingbench/bin/charbench -cf /opt/Wallet_test1.zip -cs test1_tpurgent -c /opt/swingbench/configs/SOE_Client_Side.xml -u soe -p ${SOE_PWD} -uc 4 -a -v users,tpm,tps,cpu,disk -mt 10000 -mr
Author : Dominic Giles
Version : 2.6.0.1135
Results will be written to results.xml.
..... lots of output here ....
|Maximum Transactions/min | 4247 |
|Average Transactions/sec | 40.83 |
|Maximum Transactions/min | 3201 |
|Average Transactions/sec | 46.62 |
|Maximum Transactions/min | 3406 |
|Average Transactions/sec | 43.21 |
|Maximum Transactions/min | 3195 |
|Average Transactions/sec | 45.98 |
.....
Completed Run.
SQL Developer Instance Viewer

Maximum SQL*Net throughput is again 2.4 Mbps with around 310 TPS (commit rate). The aggregate is 176.64 TPS, around 35% faster than the fat micro-service.
We have reached a bandwidth limit between the two nodes: even though we started four micro-services, they have to be placed on two nodes, so they are limited to that aggregate bandwidth. With four physical nodes, the 4 x 4 combination would be faster than 2 x 8.

Server Side

In this case we summarize our tests with these results (Average Transactions/sec):

  • 16 x 1 -> 266.67
  • 8 x 2 -> 290.30
  • 4 x 4 -> 286.24

We get only about a 9% gain in the 8 x 2 layout; most of this workload runs on the server side and the network bandwidth involved is small.
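The same arithmetic applied to the server-side layouts confirms the much smaller spread:

```python
# Server-side averages per layout (replicas x connections).
layouts = {"16 x 1": 266.67, "8 x 2": 290.30, "4 x 4": 286.24}

baseline = layouts["16 x 1"]                 # single fat service
best = max(layouts, key=layouts.get)         # best-performing layout
gain_pct = (layouts[best] - baseline) / baseline * 100
print(f"best layout {best}: {gain_pct:.1f}% over a single fat service")
# -> best layout 8 x 2: 8.9% over a single fat service
```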

TP or DS Autonomous DB

We have already deployed the test data into both test1 (Transaction Processing) and test2 (Data Warehouse), so which performs better for a client-side test?

  • TP 8 x 2 -> 290.30
  • DS 8 x 2 -> 127.50

Right: Autonomous TP using the test1_tpurgent connect string outperforms the test2_high connect string of the Autonomous DS database, around 130% faster, which is logical because it is tuned for transaction processing.
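Expressed as a ratio of the two 8 x 2 averages:

```python
tp_tps = 290.30   # test1_tpurgent (Transaction Processing)
ds_tps = 127.50   # test2_high (Data Warehouse)

speedup_pct = (tp_tps / ds_tps - 1) * 100
print(f"TP is {speedup_pct:.0f}% faster than DS for the client-side test")
# -> TP is 128% faster than DS for the client-side test
```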

TPC-DS

As I mentioned earlier, SwingBench includes a TPC-DS-like test suite, which seems to simulate a workload mixed between client-side and server-side queries. Here is a comparison across the TP instance, the DS instance, and a local deployment:

  • 8 x 2 test1_tpurgent -> 269.41 tps / 17815 tpm
  • 8 x 2 test2_high -> 265.90 tps / 17833 tpm
  • 8 x 2 local deploy -> 298.38 tps / 18212 tpm

Basically there is no difference between the cloud database options, and my notebook, running 19c on an SSD, is slightly faster. But remember we are using the cloud free tier for testing, which is very limited in network bandwidth, disk, and CPU.

Conclusions

If you are planning to deploy clustered (Swarm or K8s) micro-services using Oracle as a backend database, you need to know the maximum available throughput on the RDBMS side.
To measure that maximum throughput it is important to start with a well-known test suite such as SwingBench, because your own application may need some tweaking or tuning first.
As we saw above, multiple replicas of your micro-services, each with a DB connection pool limited to a few connections, work better because they run on different nodes and take advantage of network/CPU aggregation.
Remember to disable userland-proxy in Docker; it imposes a lot of overhead on the RDBMS inbound network traffic, leaving less bandwidth available for data transfer.
A database with NVMe disks will greatly increase your available TPS. For example, an Azure Standard_L16s_v2 shape with two NVMe disks delivered roughly twice the performance of my notebook: for the client-side test, 22.9 MBps of local network throughput and 2570.90 TPS; for a TPC-DS test, around 469.67 TPS with 29342 TPM.
If you compare the Instance Viewer captures from my local tests with the cloud captures, at Oracle Cloud there are NO red lines (blocked connections). This is because there is no I/O contention at the RDBMS; the Autonomous DB surely has multiple disks, whereas my notebook has only one SSD.
