Ingesting Guardium Data in SonarG

Sonargd is the daemon responsible for ingesting incoming data extract files from Guardium collectors, performing the required transformations (ETL) and inserting the data into the SonarW warehouse.

Sonargd configuration

The following items are configured during the initial sonargd installation and can be reconfigured at any time by editing the file sonargd.conf.

After making changes, you need to restart (or stop and start) the sonargd service.

SonarW URI

This is the URI that sonargd uses to access SonarW. It needs to be able to ingest data, create collections and inspect existing collections, including system collections (system.ingest). This URI must include the database to be used by sonargd and sonargdm for data ingestion.

Defaults to mongodb://localhost:27117/sonargd.

Base Directory

The base directory of operation for sonargd. All files read and written by sonargd will reside under this directory (except possibly log files). sonargd needs write permissions and enough free space in the base directory. See Directory Structure for details.

Defaults to /var/lib/sonargd

Log Directory

The directory under which log files are to be created. This is usually different from the base directory and will often reside on a different partition.

Log files will be named sonargd.log, and will be rotated every 10 MB, with 10 backup copies.

In addition, sonargdm output files for each file set will be placed here, and named sonargdm.<timestamp>:<collector>.<n>.log.

  • timestamp - a 14 digit timestamp from the original file, format YYYYMMDDhhmmss.
  • collector - the name of the collector extracted from the original file
  • n - number used by sonargdm to ensure files are not being overwritten
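
For example, a file set from a collector named gibm38 with timestamp 20151229162300 (the values that appear in the sample grdmrec record later in this chapter) would produce a sonargdm log file such as:

  sonargdm.20151229162300:gibm38.0.log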

Defaults to /var/log/sonargd

Log Level

You can fine-tune the level of messages reported to the log file. Choose one of: debug, info, warning, error, critical. Each level emits fewer messages than the previous one.

Defaults to info.

Incoming Server URI

When sonargd needs to pull the data from a different server, provide the URI for that server. The URI is defined by RFC-3986 and is of the general form ssh://user:password@server/path/to/files.

If you do not specify a URI here, sonargd assumes that files will appear in the incoming folder through other means. See Pulling data manually for more information.

Incoming Server SSH Key

Instead of providing a password in the Incoming Server URI, you can provide an ssh private key file to be used when connecting to the server. It takes more effort to manage keys, but it is more secure and gives you fine-grained control over who can connect to each server. Make sure the public portion of this key is added as an authorized key on the server.

If you do not provide a key here, sonargd will connect to the server using the URI supplied above.
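
For example, you could generate a dedicated key pair on the SonarG host and install its public portion on the incoming server. This is only a sketch; the file locations are illustrative and the standard OpenSSH tools are assumed:

  # generate a key pair with no passphrase so sonargd can connect unattended
  ssh-keygen -t rsa -b 4096 -f /etc/sonar/sonargd_pull_key -N ''
  # register the public key as an authorized key for the account named in the Incoming Server URI
  ssh-copy-id -i /etc/sonar/sonargd_pull_key.pub user@server

The private key file (/etc/sonar/sonargd_pull_key in this sketch) is then the value to supply as the Incoming Server SSH Key.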

sonargd Directory Structure

The following directory structure will be created under the base directory (see Base Directory) upon first run of sonargd (if it does not exist already):

incoming

This directory is the landing area for new data files and is continuously monitored for incoming files. Newly discovered files are processed by sonargd, which can determine when a copy operation has completed and will only process completed files or file sets in this directory.

At the end of processing, files will be moved to the archive directory. Files that generated any errors during processing will be moved to the audit directory.

inprogress

This is where files currently being processed are stored. The compressed files from the incoming directory are extracted and processed into this folder to become the input files for running sonargdm.

This directory is cleaned up at the end of each file processing cycle.

audit

The file that generated the error is copied here, whether it was a corrupted compressed file or a file that generated errors while being processed by sonargdm.

archive

Files that were successfully processed are moved into this directory. Only files from the incoming folder are moved here, not any of the intermediary files used during processing. By default, files are kept in the archive for only 24 hours. This can be changed by passing a different value to the --keep command-line argument. Zero disables removal of files from the archive.

Additional files may also be stored in the base directory. One such important file is session_cache.sqlite, which allows sonargdm to synchronize session data between files.

SonarW Collections

sonargd creates and accesses several collections within SonarW.

system.ingest
sonargd will verify that the ingested collections are indeed defined properly in this collection. If they are not, they will be added here. If they already exist but are defined differently from what sonargd expects, sonargd will report a critical error.
gid
sonargd creates a unique gid (gmachine id) for each gmachine. The gmachine name is taken from the first part of the file name in the set. For consistency, the gmachine names are stored in a special collection in SonarW. By default this collection is named gid, but this can be configured by editing /etc/default/sonargd. If you rename this collection after files have been processed, make sure to rename it both in SonarW and in the configuration file at the same time, or gmachine id conflicts will occur.
grdm

For each file set that has been processed, a document will be inserted into this collection. The document has the following keys:

  • n - the hostname of the machine that the file came from (if given)
  • t - the current time and date
  • v - the total amount of MB that were processed by sonargdm (uncompressed)
  • files - the list of files processed
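
An illustrative grdm document might therefore look like this (the values are made up):

  > db.grdm.findOne()
  {
          "n" : "gibm38",
          "t" : ISODate("2015-12-30T01:00:00Z"),
          "v" : 512,
          "files" : [ "1762144298_gibm38_EXP_FULL_SQL_20151229162300.gz" ]
  }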

grdmrec
This is a more detailed data extract log that records precise record counts during ingestion.
sonargd_log

When sonargdm is run by sonargd, it may produce a large number of log messages. These are stored in a log file per file set, and this collection provides information about that file. Each document in this collection has file_timestamp, which is the timestamp given in the processed file; logfile, which has the full path of the log file from sonargdm; and timestamp, which is the current time.

Log files are named sonargdm.<timestamp>:<collector>.<n>.log and are located in the Log Directory.
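
An illustrative sonargd_log document (the values are made up):

  > db.sonargd_log.findOne()
  {
          "file_timestamp" : "20151229162300",
          "logfile" : "/var/log/sonargd/sonargdm.20151229162300:gibm38.0.log",
          "timestamp" : ISODate("2015-12-30T01:00:00Z")
  }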

distinct_*

Several collections named with the prefix distinct_ will be created that summarize the distinct values of Database name, DB User name, OS User, Source Program, Server IP, Analyzed client IP, Client host name, Server host name, Server type and Service name.

These collections are only created/updated after ingesting the session collection.

top_*

For each of the distinct_* collections there is a top_* collection that contains only the top 100 values, based on the number of occurrences in the respective session collection.

These collections will only be created/updated after ingesting the session collection.
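
To see which of these summary collections have been created, you can list collection names from a mongo shell connected to the sonargd database. This is a sketch that assumes SonarW supports the standard getCollectionNames() helper:

  > db.getCollectionNames().filter(function (name) {
        return name.indexOf("distinct_") === 0 || name.indexOf("top_") === 0;
    })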

Valid file sets

Only files that are part of a valid file set will be processed by sonargd. A valid file set is a group of files with the same timestamp in their names that has been fully copied into the system.

Some files require session ids to be processed, and thus must be processed together, or not at all. These are the files that have SESSION, SESSION_END and FULL_SQL in their names.

Files that have fully arrived are processed immediately, with the exception of these three file types (the files that require session ids). These files need to be processed together, so the system waits for all of them to arrive before processing.
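
For illustration, a file set that must be processed together could consist of the following three files sharing the timestamp 20151229162300 (the names are illustrative; the exact domain names in your extracts may differ):

  1762144298_gibm38_EXP_SESSION_20151229162300.gz
  1762144298_gibm38_EXP_SESSION_END_20151229162300.gz
  1762144298_gibm38_EXP_FULL_SQL_20151229162300.gz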

Pulling data manually

If you do not specify the Incoming Server URI above, sonargd will wait for files to appear in the incoming folder.

The option exists to copy the files there from any other source, for example, using a shell script that uses scp or rsync.

Here is an example of a bash script that copies files from the remote server to the incoming directory:

#!/bin/bash
REMOTE_HOST=myhost
REMOTE_DIR=mydir
# destination is the sonargd incoming directory (default base directory shown here)
INCOMING=/var/lib/sonargd/incoming

# copy new extract files and capture the names of the files that were transferred
files=$(rsync -rt --out-format='%n' "${REMOTE_HOST}:${REMOTE_DIR}/" "${INCOMING}/")

Large Files Safety Valve

SonarG ingestion can provide a “safety valve” against ingestion of very large files. On some occasions mistakes are made - e.g. a policy is changed or a change in the database causes a flood of unwanted records. In regular Guardium environments these sometimes cause collectors or aggregators to fill up and shut down. You can configure SonarG ingestion to prevent ingestion of such large files until you determine whether you want these records in SonarG or not.

Configure this threshold using the max-size config parameter. The default of -1 means that data is always ingested. Setting this to 5000 means that if a single gz extract file is over 5 GB, that file will not be ingested and will be moved to the audit directory for determination by the operator:

#
# maximum file size in MB we are to handle. The compressed incoming files will
# be checked and files larger than this size will be moved to audit
# directory. The files will be processed as if they were empty.
#
# Set to -1 to disable file size checks.
#
# Example: 5000

max-size: -1

When you configure sonargd using such a threshold, a file that is larger than this size is copied to the audit directory and a record is added to grdmrec with the following structure:

  > db.grdmrec.find({'Error': {$exists: true}}).pretty()
  {
          "_id" : ObjectId("5683221b792aa02f638e1b54"),
          "T2" : ISODate("2015-12-30T00:15:23.917Z"),
          "Type" : "full_sql",
          "F" :
"/home/[user]/work/js/sonarG/data/incoming/1762144298_gibm38_EXP_FULL_SQL_[date].gz",
          "T1" : ISODate("2015-12-30T00:15:23.917Z"),
          "Size" : 154624000,
          "Num" : 0,
          "Error" : "File incoming/1762144298_gibm38_EXP_FULL_SQL_20151229162300.gz is
  larger than the configured maximum file size (150000000)",
          "Errs" : 0,
          "N" : "gibm38"
  }

Remote File Copies

In most cases collectors will copy extract files directly to the SonarG incoming directory for processing. SonarG also supports a deployment model by which collectors push their files to a third host and SonarG pulls files from this host. This is especially helpful when SonarG is installed in the cloud while the Guardium appliances are on-prem; rather than having all collectors send the data directly to a machine in the cloud, the collectors send it to a local “jump server” and SonarG only needs to be able to copy files from this server. To configure SonarG for such operation, add the URIs to the remotes configuration parameter in sonargd.conf:

#
# List of remote file servers from which to read incoming files. The files
# will be read from the server when they are ready (have an appropriate
# '_COMPLETE' file), and moved to the 'archive' directory there right after
# reading. From then on local processing of the file proceeds as usual.
#
# List all the remotes you wish to read from. Use a full RFC-3986 URI
# syntax. Include scheme (ssh, scp or sftp), username, password, host, port
# (if not 22), and a path to the incoming folder on the server.
#
# Example:
# remotes:
#   - sftp://user1:passw0rd@fileserver.domain.tld/full/path/to/incoming
#   - sftp://user2:mypasswd@fileserver2.domain.tld:1220/some/other/path/incoming
#
# For an empty list of remotes, use:
# remotes: []
#
remotes: []

You can use standard ssh config keys to pull data without a username/password. For example, in sonargd.conf specify a remotes definition without a username/password as well as a location for the ssh config file:

remotes:
  - sftp://g2/var/lib/sonargd/incoming

ssh-config: /etc/sonar/ssh_config

and then a standard ssh config entry such as:

host g2
  hostname g2.jsonar.com
  port 22
  user sonargd
  identityfile /etc/sonar/id_dsa
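
Before relying on sonargd, you can verify that the entry and key work by opening an sftp session from the SonarG host with the same config file (a quick manual check; assumes the OpenSSH sftp client is installed):

  sftp -F /etc/sonar/ssh_config g2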

SonarG pulls data from the remote every five minutes. If you are using the remote option to pull data from an on-prem server and need to control the periods and frequency of these pulls to have better control when traversing your public-facing network, use the remotes-schedule section:

#
# Set the schedule for reading the files from the remote hosts.
#
# The format is given by 5 fields: min(ute), hour, day, month and day-of-week.
#
# minute, hour, day and month are numbers.
#
# day-of-week can be given as a 3 letter day abbreviation (sun, mon, ...) or
# as a number (mon=1, sun=7).
#
# each entry can be a single value (e.g. 2 or sun), a list (e.g. 2,5,19 or
# sun,thu), a step value (e.g. */5, meaning every 5), or a combination of all
# of the above e.g. (*/17,12,41-59).
#
# Day of week can also specify count, so that fri#2 is the second friday of
# the month.
#
# If both day of week and day of month are restricted, the job will run when
# either matches. All the other fields must match together.
#
# If not all values are given, the missing values are taken from the default, or *.
#
# Note: * is a special character in YAML, so it needs quoting in this context.
#
# Default is every 5 minutes:
#   min: '*/5'
#
#remotes-schedule:
#  min: '*/5'
#  hour: '*'
#  day: '*'
#  month: '*'
#  day-of-week: '*'
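
For example, to restrict pulls to twice an hour during a nightly window, you could set the following (a sketch; adjust the values to your own maintenance window, and remember that unspecified fields default to '*'):

remotes-schedule:
  min: '0,30'
  hour: '1-5'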

Ingesting Additional Guardium Data Domains

SonarG receives data extract files from Guardium collectors (or CMs) as part of the built-in SonarG ETL. This data includes session data, query data, Full SQL data, extraction logs, buffer usage monitor data, VA data, exceptions and more. These data extract files are part of the standard Guardium-to-SonarG interface.

Additional data from Guardium can be brought over to the SonarG system with a simple configuration on the Guardium appliances. This is based on the Data Mart (DM) mechanism available in Guardium V9 and up. No new code is required – just a configuration. The SonarG ETL can already ingest any data brought over from the Guardium infrastructure with no additional changes. This means that any data for which a report can be defined on the Guardium system can be brought over and ingested into the SonarG warehouse.

There are three stages to the configuration:

  1. Defining a data mart: Define the data mart based on a query or a report using the query builder or any report. Refer to Guardium documentation for more information about Data Mart definitions. When you define the DM specification, typically set it to run on an hourly basis. Set the extraction result to a file and set a schedule. IMPORTANT: The file name must start with EXP_; this tells the SonarG ETL to process the file as a Guardium DM extract.

  2. Defining the export: For every DM that you want sent over to SonarG you need to specify two activation parameters using grdapi: one to include headers and one to tell the collector to copy files to the SonarG host as they are generated. The two commands to issue (per DM) are:

    grdapi datamart_include_file_header Name="<your DM name>" includeFileHeader="Yes"
    grdapi datamart_update_copy_file_info destination=…
    

Refer to Guardium documentation for more information on grdapi.

  3. Configuring ingestion: As long as the DM is defined using the EXP_ prefix, no changes are needed and all data will be ingested into appropriate collections within the warehouse. For data type casting, change sonargd.conf to specify numeric and date fields.

From a mechanism perspective, all DMs defined as above use the from-to dates to provide incremental extracts to be passed to SonarG. Every run of the DM starts at a time greater than or equal to that of the last DM run. The timestamp used is whatever is defined in the Guardium domain, so understand what this field is per domain to avoid confusion about what data will be sent. As an example, the Guardium user data has a timestamp for last login time. Therefore each extract will resend all user definitions for users that have logged in since the last time the DM was run. This means that all user data will be recorded in SonarG, but the collection will include duplicates, and thus SonarG reports on such data should use a grouping clause.
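
As a sketch of such a grouping clause (the collection and field names here are illustrative assumptions, not confirmed names from the Guardium extract), a report over duplicated user data could group on the user name and keep only the most recent record:

  > db.user_credentials.aggregate([
        { $group: { _id: "$User_Name", last_seen: { $max: "$Timestamp" } } }
    ])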

Another note on mechanism: when you are running a CM environment you only need to define the DM once (on the CM or any collector), but you need to run the scheduling grdapi scripts on each collector.

Examples of Adding Custom DM Extracts

This subsection provides a few examples of DMs built on existing Guardium reports or on reports built on existing domains. All of the examples follow the same pattern, and additional extracts are easy to define in the same way.

Entitlement Data

Entitlement data in Guardium is maintained in a set of custom tables. Make sure you have data sources defined and that data has been loaded into the custom tables. Then open the predefined report that you wish to extract as a DM, for example, ORA Object Privileges. Click on the Data Mart button, for example, in V10:

[Image: _images/dm1.png]

Enter the data mart configuration definition as shown below. The Data Mart Name is used within the grdapi call. Select File and enter the file name (make sure the file name starts with EXP_, and note that the rest of the file name determines the default collection name that will be created in SonarG). Set the time granularity to 1 hour and set the initial start to the current time or close to it. Apply your changes.

[Image: _images/dm2.png]

Run the following grdapi commands to set up the DM and the schedule:

grdapi datamart_update_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<path>" destinationUser="sonargd" Name="Export: Ora object priv" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 40 0/1 ? * 1,2,3,4,5,6,7" objectName="Export: Ora object priv"

Files will typically start arriving about an hour after the schedule is set.

By default all fields will be imported as strings. If you want the ETL to convert fields to dates or numbers you need to edit the sonargd.conf file on SonarG. For example, in the entitlements report there is a field called SqlGuard Timestamp. To ensure that it is brought in as a timestamp:

  1. Edit the conf file using sudo vi /etc/sonar/sonargd.conf
  2. You can define the field type in the general section (under int or under date) or you can define a section specific to this CSV file name. In our case, since many DM extracts will include this field, it makes sense to add it under the general date section (see the sketch after this list).
  3. Save the file and restart the sonargd service.
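
The sketch below shows the idea; the exact layout of the int and date sections in your sonargd.conf may differ per release, so adapt it to the file you have:

int:
  # hypothetical numeric column, listed here only to show the shape of the section
  - Records Affected
date:
  # ingest this column as a timestamp in all extracts that contain it
  - SqlGuard Timestamp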

User Definitions

Use the built-in report called Guardium Users - credentials to set up the DM and then use the following grdapi commands:

grdapi datamart_update_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<path>" destinationUser="sonargd" Name="Export: User Credentials" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 45 0/1 ? * 1,2,3,4,5,6,7" objectName="Export: User Credentials"

User Activity

There is a predefined report called Detailed Guardium User Activity, but you cannot use it as is since it has two additional filters beyond the from-to times. Therefore, rather than using it directly, clone it, remove the two additional filters, and then define the DM on the new report. Then use the following grdapi commands:

grdapi datamart_update_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<path>" destinationUser="sonargd" Name="Export: Guard User Activity" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export: Guard User Activity"

Alerts Sent

By default the three prebuilt alert reports on Guardium (Alert Notification, Logged R/T Alerts and Logged Threshold Alerts) do not enable a DM. You therefore need to build a report based on the domain. Open the Query Builder for the Alert domain and build the desired report (e.g. make the main entity Messages Sent). Save it and then define the DM on it. Then run the grdapi commands using the DM name you gave when defining the DM (in this case Export: Alert):

grdapi datamart_update_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<path>" destinationUser="sonargd" Name="Export: Alert" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export: Alert"

Policy Changes

Use the prebuilt report “Policy Changes” and define a DM on it. Note that this is actually a report based on user activity and is therefore a subset of the User Activity DM already defined. Use the following grdapi commands:

grdapi datamart_update_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<path>" destinationUser="sonargd" Name="Export: Policy Changes" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export: Policy Changes"

Audit Process Sign Offs

Build a report with Audit Process Result Comments as your main entity and define the DM from the new report.

Archive Aggregation Logs

Data archive logs are part of the archive and aggregation log report and can be used to track archiving of data even when using SonarG and no longer using aggregation. Use the built-in Aggregation/Archive Log report and build a DM from it.

Configuring Guardium Appliances

This section provides scripts to configure Guardium appliances. This section does NOT replace the instructions available in the Guardium release notes - follow the release notes instructions, which are more complete. The instructions here are a summary and provide scripts that you can run as grdapi scripts within an ssh session and the cli. You can also use this procedure to move files to a third Linux server from which SonarG will pull data.

To enable data transfers between Guardium appliances and SonarG, two scripts need to be executed once in order to configure the appropriate data marts and to schedule the ongoing export of these data marts to SonarG. config_script.forManager is executed on the CM and is responsible for configuring the data marts. config_script.forManaged is run on each collector to set the export schedules for the collectors to push data to the target location.

To push directly to SonarG:

  1. Login to the Guardium appliance using the cli account and validate the copy to that directory using the following grdapi command (replacing the host, username, pwd and directory location):

    grdapi datamart_validate_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<full path to directory>" destinationUser="<username>" transferMethod="SCP"

    Successful execution will result in an OK message.

  2. Change the attached scripts to include this host, username, path and password

  3. Run the two modified scripts in the cli using:

    ssh cli@<guardium CM appliance> < config_script.forManager
    

    Enter the cli password when requested; all should return OK

    and then execute:

    ssh cli@<guardium collector appliance> < config_script.forManaged
    

    Enter the cli password when requested; all should return OK

  4. Files will begin to be written to the directory after about two hours; after this initial run they will be delivered hourly.

To push directly to a third Linux server (and later configure SonarG using remote mode):

  1. Identify a Linux server with an ssh server installed to be used as the staging target for all the extract files to be created.

  2. Secure a username/password to that server and identify a directory that this user has write permissions to. Execute an SCP file copy to this directory to validate access.

  3. Login to the Guardium appliance using the cli account and validate the copy to that directory using the following grdapi command (replacing the host, username, pwd and directory location):

    grdapi datamart_validate_copy_file_info destinationHost="<host>" destinationPassword="<pwd>" destinationPath="<full path to directory>" destinationUser="<username>" transferMethod="SCP"

    Successful execution will result in an OK message.

  4. Change the attached scripts to include this host, username, path and password

  5. Run the two modified scripts in the cli using:

    ssh cli@<guardium CM appliance> < config_script.forManager
    

    Enter the cli password when requested; all should return OK

    and then execute:

    ssh cli@<guardium collector appliance> < config_script.forManaged
    

    Enter the cli password when requested; all should return OK

  6. Files will begin to be written to the directory after about two hours; after this initial run they will be delivered hourly.

Sample scripts for SonarG V1.x-V2.1 or when you only use SonarG for audit data (change the path to your specific path; for a standalone system run both scripts) - config_script.forManager:

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Exception Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Session Log Ended" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Session Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Access Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Full SQL" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Outliers List" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Outliers Summary by hour" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Export Extraction Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Group Members" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Policy Violations" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Buff Usage Monitor" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 20 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Group Members"

config_script.forManaged:

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Outliers List"

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Outliers Summary by hour"

grdapi schedule_job jobType=dataMartExtraction cronString="0 01 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Exception Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 20 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Export Extraction Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 05 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Policy Violations"

grdapi schedule_job jobType=dataMartExtraction cronString="0 33 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Session Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 34 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Session Log Ended"

grdapi schedule_job jobType=dataMartExtraction cronString="0 32 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Access Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 30 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Full SQL"

grdapi schedule_job jobType=dataMartExtraction cronString="0 05 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Policy Violations"

grdapi schedule_job jobType=dataMartExtraction cronString="0 0 8 ? * 1,2,3,4,5,6,7" objectName="Export:VA Results"

Sample scripts for SonarG V2.2 and up (change the path to your specific path; for a standalone system run both scripts) - config_script.forManager:

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Exception Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Session Log Ended" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Session Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Access Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Full SQL" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Outliers List" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Outliers Summary by hour" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Export Extraction Log" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Group Members" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Policy Violations" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Buff Usage Monitor" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:VA Results" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:STAP Status" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Classifier Results" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Discovered Instances" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Databases Discovered" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Datasources" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:Installed Patches" transferMethod="SCP"

grdapi datamart_update_copy_file_info destinationHost="yourhosthere" destinationPassword="yourpwdhere" destinationPath="/local/raid0/sonargd/incoming" destinationUser="sonargd" Name="Export:System Info" transferMethod="SCP"

grdapi schedule_job jobType=dataMartExtraction cronString="0 20 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Group Members"

grdapi schedule_job jobType=dataMartExtraction cronString="0 21 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Datasources"

config_script.forManaged:

grdapi schedule_job jobType=dataMartExtraction cronString="0 40 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Access Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 45 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Session Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 46 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Session Log Ended"

grdapi schedule_job jobType=dataMartExtraction cronString="0 25 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Exception Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 30 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Full SQL"

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Outliers List"

grdapi schedule_job jobType=dataMartExtraction cronString="0 10 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Outliers Summary by hour"

grdapi schedule_job jobType=dataMartExtraction cronString="0 50 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Export Extraction Log"

grdapi schedule_job jobType=dataMartExtraction cronString="0 15 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Group Members"

grdapi schedule_job jobType=dataMartExtraction cronString="0 5 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Policy Violations"

grdapi schedule_job jobType=dataMartExtraction cronString="0 12 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Buff Usage Monitor"

grdapi schedule_job jobType=dataMartExtraction cronString="0 20 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:VA Results"

grdapi schedule_job jobType=dataMartExtraction cronString="0 0/5 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:STAP Status"

grdapi schedule_job jobType=dataMartExtraction cronString="0 22 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Discovered Instances"

grdapi schedule_job jobType=dataMartExtraction cronString="0 23 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Databases Discovered"

grdapi schedule_job jobType=dataMartExtraction cronString="0 24 0/1 ? * 1,2,3,4,5,6,7" objectName="Export:Classifier Results"

grdapi schedule_job jobType=dataMartExtraction cronString="0 0 5 ? * 1,2,3,4,5,6,7" objectName="Export:Installed Patches"

grdapi schedule_job jobType=dataMartExtraction cronString="0 0 5 ? * 1,2,3,4,5,6,7" objectName="Export:System Info"

Using Bundles and DMv2 (SonarG v2.7 and up)

Guardium DMv2 (Data Mart Protocol v2) introduces two changes:

  1. COMPLETE files are replaced with checksums, and therefore only data files are copied over.
  2. Bundles are supported. This is important especially for the SFE bundle - sessions, full sql and exceptions.

When using DMv2, rather than sending many individual DMs you can package multiple DMs to be sent together as one file. You determine the set of DMs to be packaged together as a bundle. There is one important thing to remember: each bundle has a “main DM”. That main DM determines where the files will be copied to (i.e. its copy file definition determines the location) as well as the schedule. In addition, when there are many DMs in the bundle, the bundle is created when the main DM fires; it is therefore important that the main DM be scheduled last.

The SFE bundle is important and it is suggested to always use it (when you have the right patch level). It packages SESSION, SESSION_END, FULL_SQL and EXCEPTION together and copies them as a single bundle.

To set up the bundle:

  grdapi datamart_copy_file_bundle action="create" bundle_name="SFE_BUNDLE" main_datamart_name="Export:Session Log Ended"
  grdapi datamart_copy_file_bundle action="include" bundle_name="SFE_BUNDLE" datamart_name="Export:Session Log"
  grdapi datamart_copy_file_bundle action="include" bundle_name="SFE_BUNDLE" datamart_name="Export:Exception Log"
  grdapi datamart_copy_file_bundle action="include" bundle_name="SFE_BUNDLE" datamart_name="Export:Full SQL"

To delete the bundle:

  grdapi datamart_copy_file_bundle action="delete" bundle_name="SFE_BUNDLE"

To get bundle info:

  grdapi datamart_copy_file_bundle action="info" bundle_name="SFE_BUNDLE"

To exclude a DM from the bundle:

  grdapi datamart_copy_file_bundle action="exclude" bundle_name="SFE_BUNDLE" datamart_name="Export:Session Log Ended"

The main DM has been chosen to be Session Log Ended since under normal schedules it runs last.

Ingesting Non-Guardium Data

The SonarG ETL processes can ingest CSV data from the incoming data folder regardless of whether it comes from Guardium collectors or from other sources. This makes it easy to use SonarG as a Security Data Lake for miscellaneous data, and makes it easy to bring in data from ancillary systems required for compliance reporting or security analytics in a Guardium deployment.

There are two ways to ingest such “miscellaneous” CSVs: make them “look like” Guardium extracts, or use the misc-files feature of the SonarG ETL.

Misc-Files

You can process a miscellaneous CSV, inserting its data into a collection within the sonargd database, by creating an entry under misc-files within /etc/sonar/sonargd.conf. The CSV will be processed from your regular incoming directory.

Example:

misc-files:
   - match: StreetTree([a-zA-Z0-9]*)
     collection: trees
   - match: Regis([a-zA-Z0-9]*)
     collection: animals

The first match states that if a filename within the incoming folder is a regex match for StreetTree([a-zA-Z0-9]*), sonargd will attempt to ingest the data. For example, StreetTrees_Kerrisdale.csv would match StreetTree([a-zA-Z0-9]*). The data will be inserted into the trees collection under the sonargd database.

The second match states that if a filename is a regex match for Regis([a-zA-Z0-9]*), it will be processed into a collection called animals.

The SonarG Source field is taken from the part of the filename matched by the parentheses ( ). If the file name looks like StreetTree_gmac03, the SonarG Source field will have _gmac03 as its value.

By default a match attempts to use the sonargdm plugin during ingestion. To use a different plugin, create a matching entry under the plugins section of sonargd.conf. Below, the trees collection does have a match under plugins and will be ingested using the basic plugin.

Plugins

To ingest a miscellaneous CSV using a different plugin than sonargdm, an entry in the above misc-files section must have a corresponding entry under the plugins section of /etc/sonar/sonargd.conf:

Builtin plugins:

  sonargdm: The internal processing (meant mostly for Guardium-originated data).
            Specify the binary if different from /usr/bin/sonargdm.
  basic:    Write each line as-is to the collection.
            By default assumes utf-8; specify 'encoding' if different (standard Python encoding).
  upsert:   Handle upserts. The parameters specify the key. Use encoding as above.
  i_dam:    Write both to full_sql and instance. Use encoding as above.
  oracle:   For Oracle XML.

The following example includes both the default sonargdm, as well as a match for trees, which corresponds to the above example in the misc-files section. The plugin used for trees will be basic, forcing the data to be inserted ‘as-is’ into the collection trees:

plugins:
  default:
    sonargdm
  trees:
    basic

Note that there is no entry called animals; therefore, the second entry from the above misc-files example will be ingested using the default sonargdm plugin.

Min-age

Rather than waiting for a COMPLETE file, a misc file is processed only when it has not been updated for a certain (configurable) period of time. This prevents a file from being processed while it is still being copied. To configure the wait time before ingestion starts, edit:

min-age: 7

in sonargd.conf.

Following Guardium DMv1 Conventions

CSVs must conform to certain rules, both in terms of format and in terms of file naming conventions, in order to be properly ingested:

  1. CSVs must include a header line, e.g.:

    $ cat EXP_TRUSTED_CONNECTIONS_20160429100000.csv
    "DB_User_Name","Server_IP","Client_IP","Client_Host_Name","OS_User","Type"
    
  2. CSVs must be comma separated. Strings must be enclosed in double quotes. Typing can be controlled through configuration in the sonargd.conf file; otherwise all data is ingested as strings.

  3. CSV file names must start with EXP_ as shown above.

  4. One or more CSVs of the same type and timeframe may be packaged together. CSVs must be packaged and compressed using tar czf (i.e. a compressed tar) and the file must have a .gz extension, for example:

    $ ls -lrt TRUST*
    -rw-rw-r-- 1 ubuntu ubuntu 1185 Apr 29 20:53 1762144738_host1_EXP_TRUSTED_CONNECTIONS_20160429100000.gz
    -rw-r--r-- 1 ubuntu ubuntu    0 Apr 29 21:09 1762144738_host1_EXP_TRUSTED_CONNECTIONS_20160429100000_COMPLETE.gz

The first number represents a unique source machine identifier, the name after the underscore is the hostname of the source machine, followed by the name of the data domain and then the timeframe (see the breakdown at the end of this section). Note that the name of the file determines which collection the data will be inserted into. This can be changed and managed by editing sonargd.conf, but by default the filename determines where the data will go.

  5. A COMPLETE file must always accompany a data file of the same name. The content of the COMPLETE file is not used and it can be zero length. A data file will not be processed until the accompanying COMPLETE file arrives; this is how the ETL process knows the copy of the data file has completed.
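
For reference, breaking down the example file name above:

  1762144738               - unique source machine identifier
  host1                    - hostname of the source machine
  EXP_TRUSTED_CONNECTIONS  - data domain; determines the default target collection
  20160429100000           - timeframe (format YYYYMMDDhhmmss)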