Quasar Scan user guide
Risk Management and Mitigation
This page describes the risks associated with data scanning and how Quasar mitigates or eliminates them.
Read-Only Data Access
First and most importantly, Quasar is a data scanner and has no automated remediation capability. Quasar therefore neither needs nor uses write access to the assets it scans; scanning is a read-only operation.
As a result, access permissions can be set to read-only on all resources to be scanned without compromising any Quasar functionality or capabilities. The business can thus easily be fully insulated from any possibility of data being changed.
Database Locking
For every scan database connection, Quasar issues the appropriate SQL “ISOLATION LEVEL” command so that its reads do not take locks. The isolation level controls the trade-off between locking and data consistency (e.g. ‘dirty reads’ of half-committed data in transactions).
Because Quasar is a scanner, we want to see all data, including data exposed by dirty or phantom reads and half-committed transactions, which is exactly what a connection configured for no locking returns.
“Read Committed” is the default isolation level on SQL Server, and under it even SELECT queries lock the rows they read. Quasar sets its SQL Server connections to “Read Uncommitted”, which performs no such locking but introduces the possibility of ‘dirty’ or ‘phantom’ reads. That is not a problem for Quasar’s use case.
More info about MS SQL isolation levels can be found here: https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
On Oracle, no change is needed: “Oracle Database provides for nonblocking reads by default”. The default isolation level employs no read locking, and that default is what Quasar uses.
More info here: http://www.oracle.com/technetwork/testcontent/o65asktom-082389.html
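As a rough illustration of the per-database behaviour described in this section, the sketch below maps a database type to the statement a scan connection might send before querying. The function name `isolation_setup_statement` and the dialect keys are hypothetical and are not Quasar's actual code. The T-SQL statement itself is the standard one documented at the Microsoft link above.

```python
from typing import Optional

# Hypothetical mapping, for illustration only (not Quasar's actual code).
# SQL Server: switch the session to READ UNCOMMITTED so reads take no shared locks.
# Oracle: reads are non-blocking by default, so nothing needs to be sent.
_ISOLATION_SETUP = {
    "sqlserver": "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED",
    "oracle": None,
}


def isolation_setup_statement(dialect: str) -> Optional[str]:
    """Return the statement to run on a new scan connection, or None if none is needed."""
    key = dialect.lower()
    if key not in _ISOLATION_SETUP:
        raise ValueError(f"unknown dialect: {dialect!r}")
    return _ISOLATION_SETUP[key]
```

A caller would execute the returned statement once, immediately after opening the connection, and skip the step when the result is `None`.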
Resource Usage
The other risks of data scanning operations are primarily related to resource usage – both on the machine performing the scanning and on the machine being scanned (sometimes these are the same machine). The following describes mitigation features for each resource type.
RAM
The Quasar scanner is a single-threaded sequential design. This means that a scan job processes a maximum of one file or database rowset at a time. (Speed gains are accomplished by running many jobs in parallel, where necessary.)
A single job therefore only ever uses as much RAM as is needed to read a file block by block, and the blocks are kept small enough to ensure minimal RAM usage. In practice the agent consumes around 50 megabytes of RAM while conducting a scan job.
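The block-by-block approach can be sketched as follows. This is a minimal illustration, not the agent's implementation: the 64 KiB block size and the function names are assumptions chosen for the example.

```python
from typing import Iterator

BLOCK_SIZE = 64 * 1024  # hypothetical block size; the agent's real value is not stated here


def iter_blocks(path: str, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield the file's contents one fixed-size block at a time."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:  # empty bytes means end of file
                return
            yield block


def count_newlines(path: str, block_size: int = BLOCK_SIZE) -> int:
    """Example consumer: only one block is held in memory at any moment."""
    return sum(block.count(b"\n") for block in iter_blocks(path, block_size))
```

Because each block is discarded before the next one is read, peak memory depends on the block size rather than on the size of the file being scanned.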
CPU
When Windows/Linux needs to share a finite CPU resource across multiple processes, it gives each a chunk of CPU time. How much CPU time a process gets relative to other processes is determined by its ‘priority’ setting. A high priority will get a process the lion’s share of the CPU, whereas a low priority will cause a process to be run only if there is spare time left over after all the normal/high priority processes have had their fill.
Quasar’s as-shipped default priority for scanning processes is “IDLE”, which is the lowest priority setting. Everything else on the system therefore gets CPU before the scanner, which removes ‘CPU load risk’ completely.
This comes at the expense of scans running possibly much more slowly on a loaded system. The setting can be adjusted to NORMAL or HIGH if for example you have a dedicated scanning machine and you want scans to run fast.
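For readers unfamiliar with process priorities, the sketch below shows how a process can lower its own priority on a POSIX system using Python's standard `os.nice`. It is only an analogy: a niceness of 19 is the closest conventional Linux counterpart to an idle priority, and this is not how Quasar itself is configured. The function name is hypothetical.

```python
import os
from typing import Optional


def lower_scan_priority() -> Optional[int]:
    """Drop the current process to the lowest conventional POSIX priority.

    Returns the new niceness, or None on platforms without os.nice
    (for example Windows, which uses priority classes instead).
    """
    if not hasattr(os, "nice"):
        return None
    # os.nice adds the increment to the current niceness and the OS clamps
    # the result at its maximum (19 on Linux). Raising niceness needs no privileges.
    return os.nice(19)
```

A process that has lowered its priority this way cannot raise it again without elevated privileges, so the call is best made once at startup.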
Additionally, because the scanner is single-threaded, one running scan job will only ever consume a single core. If you are running the scanner directly on a busy multicore machine, running only one job at a time keeps the other cores completely free for that machine’s normal business functions. Conversely, on a dedicated multicore machine you can run as many jobs as there are cores for additional speed.
Network
There are two considerations for network usage: the first is bandwidth itself, and the second is the number of TCP connections.
The largest consumer of bandwidth is scanning fileshares over a network connection. This can be mitigated by locating an agent as close as possible to the network resource. Sometimes (e.g. with Windows or Linux fileservers) it is possible to locate the agent ON the fileserver.
The amount of bandwidth used by the agent sending its results to the Quasar app server is significantly smaller than the amount of bandwidth used by opening and reading all files remotely.
For TCP connections, there is generally one background connection per agent to the app server for management, and additional connections are made for data while jobs are running. TCP connection resource risk is mitigated by the management connections having an extremely conservative hard-wired reconnect interval of 15 minutes, i.e. only one attempt is made every 15 minutes. This can sometimes make testing slightly inconvenient, but it ensures that TCP resources are preserved when Quasar is operating at scale.
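A fixed reconnect interval of this kind can be sketched as below. The function and its parameters are hypothetical and only illustrate the "one attempt per interval" behaviour. The `connect` callable stands in for whatever opens the management connection, and `sleep` is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable

RECONNECT_INTERVAL_SECONDS = 15 * 60  # the 15-minute interval described above


def connect_with_fixed_interval(
    connect: Callable[[], bool],
    max_attempts: int,
    interval: float = RECONNECT_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `connect` until it returns True, waiting a full interval between tries.

    Returns the number of attempts used, or raises ConnectionError once
    `max_attempts` attempts have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        if attempt < max_attempts:
            sleep(interval)  # never retry sooner than one full interval
    raise ConnectionError(f"no connection after {max_attempts} attempts")
```

With a fixed interval, the number of connection attempts arriving at the app server stays bounded by the number of agents, however long an outage lasts.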
Disk
Disk usage is a consideration for the machines running the agent that performs scanning. A small amount of temp-file space is used for functions such as decompressing large zip files.
The agent has a temp-file management feature that aggressively limits the amount of temp space available to the agent. This limit is configurable and the as-shipped default is 200 megabytes. Temp space is cleaned by the agent after jobs complete or are aborted, and if the temp space size limit is exceeded, the agent shuts down the scan job.
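The behaviour described above (a size cap, cleanup on completion or abort, and shutting the job down when the cap is exceeded) could look roughly like the following sketch. All names are hypothetical, and the 200 MB default is assumed here to mean 200 × 1024 × 1024 bytes. It is not the agent's actual implementation.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

TEMP_LIMIT_BYTES = 200 * 1024 * 1024  # assumed byte value of the 200 MB default


class TempSpaceExceeded(RuntimeError):
    """Raised when a job's temp directory grows past the configured limit."""


def directory_size(path: str) -> int:
    """Total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def check_temp_limit(path: str, limit: int = TEMP_LIMIT_BYTES) -> None:
    """Stop the job (by raising) if temp usage is over the limit."""
    used = directory_size(path)
    if used > limit:
        raise TempSpaceExceeded(f"{used} bytes used, limit is {limit}")


@contextmanager
def job_temp_dir():
    """Give a job a private temp directory, removed on completion or abort."""
    path = tempfile.mkdtemp(prefix="scan-job-")
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

In this sketch a job would call `check_temp_limit` after each write to its temp directory. The `finally` clause removes the directory whether the job finishes normally or is stopped by the exception.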
As scan targets are only ever read from, no disk usage considerations apply to them.