SonarG Architecture

SonarG is a system for storing, managing, and providing access to data produced by the IBM InfoSphere Guardium Database Activity Monitoring (DAM) system (referred to as Guardium in the documents that follow).

SonarG is a Big Data system that uses the SonarW NoSQL Data Warehouse to store data extracted from Guardium collectors. SonarG allows you to store large amounts of Guardium data in one place, eliminating the need for complex aggregation processes and letting you centralize data from hundreds of collectors, retained for long periods, in a single location. Because the data is stored in a best-of-breed data warehouse, reports and analytics run fast and the data can be used for multiple purposes.

SonarG includes the following components:

  • The SonarW NoSQL Data Warehouse.
  • The SonarCollector ETL layer and specific Guardium ETL algorithms.
  • The SonarG GUI.
  • The SonarK discovery GUI (based on Kibana).
  • SonarSQL, providing SQL access to Guardium data stored within SonarW.
  • JSON Studio, providing a GUI for advanced analytic query building and visualization.

SonarG is a software package installed on a Red Hat Enterprise Linux (RHEL) server. SonarG can run on a physical server or as a virtual machine, and it can be installed as the only application on the server or co-located with other applications. However, due to the nature of its Big Data workloads, SonarG is a resource-intensive application that consumes all resources available to it - compute, memory, and I/O. It is therefore recommended to run SonarG on its own server.

SonarG receives data from Guardium collectors as compressed extraction files transferred via SCP. These files are produced by the collectors, and the mechanism is supported for Guardium versions 9.x and 10.x. If you are running version 9.5 collectors you need to install the IBM data extraction patch 609 (or a later cumulative patch). Consult your SonarG account manager for the precise IBM patch required. Guardium 10 has built-in support for producing these extract files.

Data coming from Guardium collectors is copied to the SonarG server, where it is processed by a Guardium-specific ETL process before being inserted into SonarW. When you configure data extraction from Guardium collectors, you specify the hostname to which the extract files should be copied. This host can be the SonarG host or a separate host that serves as a staging area for the extract files (from which the SonarG ETL copies the files). It is recommended that the collectors copy the files directly to the SonarG server to avoid an additional, unnecessary copy.
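The transfer itself is a standard SCP copy of a compressed extract file. As an illustration only (the hostname, user, directory, and file name below are hypothetical placeholders, not SonarG defaults), each hourly push is equivalent to:

```shell
# Hypothetical example: push one hourly compressed extract file
# from a Guardium collector to the SonarG (or staging) host over SCP.
# User, host, and paths are placeholders - use the values configured
# in your Guardium data extraction setup.
scp /var/dump/EXP_SESSION_EXTRACT.tar.gz \
    sonargd@sonarg.example.com:/sonarg/incoming/
```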

Collectors produce and copy files on an hourly basis. The SonarG ETL process runs continuously and ingests these extract files as they arrive. Data is therefore available in SonarG with a lag of roughly 60-75 minutes at most.

Once the data is in SonarW, various tools provide access to the Guardium data. These include a SonarG custom-built reporting layer; JSON Studio for building queries, reports, and visualizations directly over the Guardium data; a Web Services layer; and a SQL layer. All of these are installed on the SonarG server by the SonarG installer.

System Sizing

A single SonarG node is typically used for up to 30TB of compressed Guardium data. You can store more than 30TB on a single node and reporting times may still be reasonable, but you can also cluster multiple SonarG nodes to provide faster response times. Consult your SonarG account manager for additional sizing guidelines.

Each SonarG node should have the following specs:

  • Two Intel Xeon processors, each with at least 6 cores per socket, running at 2.4GHz or faster.
  • At least 64GB of memory.
  • Either HDD or SSD drives. In both cases, and especially when using HDDs, the drives should be striped using RAID0 or RAID10. For example, if you choose SATA drives, create a single RAID array of at least four disks to achieve read rates nearing 500MB/s. The system has been optimized to leverage low-cost SATA drives, making it possible to build a cost-effective large data store from inexpensive hardware.
  • At least one SSD of approximately 400GB used as temp storage for SonarW.
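The striping described above can be set up with the standard Linux mdadm tool. This is a sketch only - the device names and mount point below are placeholders (confirm your devices with lsblk first), and RAID0 provides no redundancy:

```shell
# Hypothetical example: stripe four SATA drives into a single RAID0
# array for the SonarW data volume. Device names are placeholders;
# verify them with lsblk before running. RAID0 has no redundancy -
# use RAID10 if you need fault tolerance.
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.xfs /dev/md0
mount /dev/md0 /data/sonarw   # mount point is an assumption, not a SonarG default
```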

If you are deploying SonarG on an Amazon AWS EC2 instance, an m4.4xlarge instance is recommended, or an m4.10xlarge instance if workloads are expected to be very large. An io2 EBS volume with at least 10K-12K provisioned IOPS (PIOPS) is recommended, since it allows you to grow the volume as your data size grows with no changes to the SonarG application or to RHEL. If you choose general-purpose EBS, use a RAID0 configuration.
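As an illustration of the EBS approach, the AWS CLI can create an io2 volume with provisioned IOPS and later grow it in place. The size, IOPS, availability zone, and volume ID below are illustrative values, not SonarG requirements:

```shell
# Hypothetical example: provision an io2 EBS volume with 10,000 PIOPS.
# Size, IOPS, and availability zone are illustrative values.
aws ec2 create-volume \
    --volume-type io2 \
    --iops 10000 \
    --size 1000 \
    --availability-zone us-east-1a

# Grow the volume later as data accumulates - no change needed to
# SonarG or RHEL beyond extending the filesystem. Volume ID is a placeholder.
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 2000
```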

When using virtual machines (VMs), the recommended minimum production configuration is 8 vCPUs and 64GB RAM. Work with your VM administrator to provision enough IOPS, dependent on your loads and data volumes. A VM with 32GB RAM and 6 vCPUs can be used as a POC host, with the understanding that performance will not be optimal.

If you are deploying SonarG on a machine that has between 96GB and 128GB of RAM, set the parameter block_allocation_size_percentage to 33 to take advantage of the available memory. If you are deploying on a machine that has 128GB of RAM or more, set block_allocation_size_percentage to 50.
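The sizing rule above can be summarized as a small shell function. The function name is ours for illustration, not a SonarG tool; it simply maps total RAM in GB to the documented block_allocation_size_percentage value:

```shell
# Sketch of the memory sizing rule: given total RAM in GB, return the
# recommended block_allocation_size_percentage. "default" means leave
# the parameter unset (machines below 96GB). Function name is hypothetical.
block_alloc_pct() {
  ram_gb=$1
  if [ "$ram_gb" -ge 128 ]; then
    echo 50            # 128GB or more of RAM
  elif [ "$ram_gb" -ge 96 ]; then
    echo 33            # between 96GB and 128GB of RAM
  else
    echo default       # below 96GB: keep the default setting
  fi
}
```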


A single SonarG system maintains up to 10 trillion distinct sessions per collector. If a collector feeds more than 10 trillion sessions, the oldest sessions are deleted and the newer sessions are retained.