Release Notes
2.22.1
December 8, 2021
new
Introduce IPv6 support (e.g. fe90::60:ff:fe00:1,::1) for network attacks.
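As a rough illustration of what accepting both address families involves, the sketch below splits a comma-separated value like the one above and classifies each entry with Python's standard ipaddress module. The function name classify_addresses is hypothetical and is not part of the Gremlin agent or CLI.

```python
import ipaddress

def classify_addresses(value):
    """Split a comma-separated address list into (address, IP version) pairs."""
    result = []
    for part in value.split(","):
        # ip_address() raises ValueError if the entry is not a valid IPv4/IPv6 address.
        addr = ipaddress.ip_address(part.strip())
        result.append((str(addr), addr.version))
    return result

# Mixed input: two IPv6 addresses from the note above and one IPv4 address.
print(classify_addresses("fe90::60:ff:fe00:1,::1,10.0.0.1"))
```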
2.22.0
December 3, 2021
new
On startup, the Gremlin agent now performs some validation on its ability to run a CPU and Latency attack. Validation results are accessible through the Clients API.
info
The Gremlin agent now outputs more information when Gremlin is killed during container attacks.
info
Updated dependencies
2.21.2
November 16, 2021
fix
Fixed a bug present in RPM and DEB packaging where the gremlind service startup script changed ownership of /var/log to gremlin:gremlin on SysvInit enabled systems.
info
Updated dependencies
2.21.0
October 25, 2021
new
The Disk attack has been significantly improved. In most cases it is much faster, more accurate, and safer. It also uses significantly less CPU and RAM when filling disk volumes. The improved version is used when the environment variable GREMLIN_DEUCHAINN_EN1023 is set to true; all other values are treated as false. This environment variable may be ignored or removed in a future version without notice.
2.20.1
October 20, 2021
fix
Fixed AWS tag ingestion when running Gremlin in a container.
fix
Fixed bug with Gremlin's IO attack cleanup when --mode r or --mode w was used. Previously, Gremlin would try to tear down files that did not exist, leading to attack failures.
info
Improve messages reported by the Gremlin IO attack when file-creation errors occur.
info
Updated dependencies
2.20.0
October 8, 2021
info
Changed the way the PUSH_METRICS and GREMLIN_COLLECT_PROCESSES boolean configuration variables are evaluated. Previously, any non-empty value other than "0" would evaluate to true (e.g. GREMLIN_COLLECT_PROCESSES=false would evaluate to true). This has been changed to provide expected outcomes: the only values that evaluate to true are now "1", "true", and "TRUE"; all other values evaluate to false.
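The evaluation rule described in this entry can be sketched as follows. This is a minimal illustration in Python, not the agent's code: the helper name env_flag and the handling of an unset variable (falling back to a caller-supplied default) are assumptions, since the note only covers values that are set.

```python
import os

# The only values the 2.20.0 note lists as evaluating to true.
TRUE_VALUES = {"1", "true", "TRUE"}

def env_flag(name, default, environ=None):
    """Evaluate a boolean configuration variable using the rule from the note.

    `default` is returned when the variable is unset (an assumption; the note
    does not describe unset variables).
    """
    environ = os.environ if environ is None else environ
    if name not in environ:
        return default
    return environ[name] in TRUE_VALUES

# Under the old behavior "false" was truthy; under this rule it is not.
print(env_flag("GREMLIN_COLLECT_PROCESSES", False, {"GREMLIN_COLLECT_PROCESSES": "false"}))
```

Read literally, the rule also treats mixed-case spellings such as "True" as false, which the sketch reproduces.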
info
Updated dependencies
2.19.5
September 27, 2021
fix
Fixed a bug where the Gremlin agent does not properly roll back time travel attacks with an offset of 5 seconds or less.
2.19.4
September 16, 2021
fix
Fixed a bug where the Gremlin agent does not initialize the containerd-runc container driver when running on a system using the systemd cgroup driver.
2.19.3
September 10, 2021
new
The percent argument for Disk attacks now accepts real numbers. For example, --percent 27.5 was previously unsupported.
new
Gremlin no longer relies on the hostname executable to derive the host's hostname. This is replaced by the gethostname(2) system call.
fix
API interactions made by the Gremlin agent now always send the appropriate Content-Type header value.
info
Updated dependencies
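To illustrate the hostname change in 2.19.3: reading the host name through the system call avoids spawning an external program, so it still works on minimal images that do not ship a hostname binary. Python's socket.gethostname() wraps the same C library call, so the idea can be sketched as below. The function name current_hostname is hypothetical, and this is not the agent's implementation.

```python
import socket

def current_hostname():
    """Return the host name via gethostname(), without running the hostname executable."""
    return socket.gethostname()

print(current_hostname())
```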
2.19.1
August 2, 2021
fix
Fixed a bug where child processes beyond immediate children of a container's root process were ignored from the process collection mechanisms that inform service discovery.
info
Updated dependencies
2.19.0
July 15, 2021
fix
This update fixes Memory attack bugs. Previously, the amount of memory consumed could deviate significantly from what was requested, especially when an attack was run just after I/O operations.
new
The Memory attack is more "aggressive" in the sense that the memory allocated by Gremlin during the attack is more difficult to swap to disk.
2.18.5
July 6, 2021
fix
Fixed a bug among all container drivers that use runc, introduced in 2.18.4, where attacks against container and Kubernetes object targets would fail if the targets had memory limit values that exceeded 4 GiB.
2.18.4
July 2, 2021
fix
Fixed a bug among all container drivers that use runc, where the Memory Gremlin's `Percent` argument would calculate an incorrect target memory allocation. Instead of using the container's available/total memory statistics, Gremlin would use the host's, causing Gremlin to try (and fail) to allocate more than is allowed by the target's memory limits.
2.18.3
June 29, 2021
fix
Fixed a bug that prevented the use of comma-separated values as the input for port arguments to the packet_loss Gremlin.
info
Updated dependencies.
2.18.2
June 15, 2021
fix
Gremlin container drivers crio-runc, containerd-runc, and docker-runc requested more Linux capabilities than were actually needed by Gremlin: SETFCAP, AUDIT_WRITE, MKNOD, NET_RAW. Gremlin no longer requests these capabilities when running container attacks.
fix
Gremlin no longer fails to make outgoing HTTP calls to api.gremlin.com if there happen to be directives in /etc/resolv.conf that Gremlin does not understand. Gremlin will now log a message when encountering unknown directives and ignore them.
new
The Gremlin CLI now has a gremlin check daemon subcommand which reports on the status of any running Gremlin agent, as well as whether process collection is enabled.
info
Gremlin now logs errors encountered when collecting process information. Gremlin logs these errors only the first time they are encountered to reduce log noise.
info
Updated dependencies.
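The resolv.conf fix in 2.18.2 describes a tolerant parsing strategy: log unknown directives and carry on. A minimal sketch of that strategy is shown below, assuming the directive set documented in resolv.conf(5) on Linux. The names KNOWN_DIRECTIVES and parse_resolv_conf are hypothetical, and Gremlin's actual parser and log wording are not shown in these notes.

```python
import logging

# Directives documented in resolv.conf(5); anything else is treated as unknown here.
KNOWN_DIRECTIVES = {"nameserver", "domain", "search", "sortlist", "options"}

def parse_resolv_conf(text):
    """Return (directive, arguments) pairs, logging and skipping unknown directives."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (lines starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        directive, *args = line.split()
        if directive not in KNOWN_DIRECTIVES:
            logging.warning("ignoring unknown resolv.conf directive: %s", directive)
            continue
        entries.append((directive, args))
    return entries

sample = "# local resolver\nnameserver 10.0.0.2\nlookup file bind\nsearch example.com\n"
print(parse_resolv_conf(sample))
```

The point of the design is that an unrecognized line degrades to a log message instead of an error that blocks outgoing API calls.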
2.18.1
April 29, 2021
fix
RPM and DEB package installers for gremlind did not properly honor values for GREMLIN_COLLECT_PROCESSES in /etc/default/gremlind.
info
Updated dependencies.
2.18.0
April 26, 2021
new
Gremlin announces Services Discovery for tracking and improving the reliability of distributed services! This update includes support for Services Discovery for Linux.
info
RPM and DEB packages have been updated to set the following capabilities on /usr/sbin/gremlind: CAP_SYS_PTRACE and CAP_DAC_READ_SEARCH. The two capabilities are necessary for the daemon to collect process information for Services Discovery. While the capabilities are set at installation time, Gremlin process collection features are disabled by default and can be enabled by changing the agent configuration. Visit Enabling Services Discovery for more information.
info
Updated dependencies.
2.17.10
April 14, 2021
fix
While very rare, getting the current username can fail. When that happens the Gremlin Client would fail to run an attack. Instead, this version resorts to using "unknown" when the username cannot be determined.
2.17.9
April 5, 2021
fix
Upgrade our Docker image to mitigate a security vulnerability in OpenSSL.
new
Daemon log file management improvements. Previously, the log file was truncated at midnight. That made troubleshooting difficult. The log file is now rolled when it reaches approximately 1 MiB. Ten compressed log files are kept. With this update the current log file typically captures several days and the compressed log files typically capture a few weeks at a modest cost of approximately 2 MiB of disk space.
2.17.8
March 17, 2021
fix
Fix a bug in Gremlin's argument parsing for the hostnames and ipaddresses arguments for network attacks.
2.17.7
March 12, 2021
fix
Improve command-line argument parsing by providing better error messages and catching more edge cases related to illegal inputs.
new
When enabled, Gremlin process collection now correctly labels child processes of containers as container processes, where they previously were labeled as host processes.
new
When enabled, Gremlin process collection now records a process's active IPv6 sockets if they can be translated to IPv4. This is most commonly seen among container processes that are running in their host's network namespace.
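The note does not say which IPv6 socket addresses count as translatable. One common case, assumed here purely for illustration, is the IPv4-mapped form ::ffff:a.b.c.d that appears when an IPv4 peer connects to a dual-stack socket. The sketch below uses Python's standard ipaddress module; the function name to_ipv4_if_mapped is hypothetical.

```python
import ipaddress

def to_ipv4_if_mapped(address):
    """Return the embedded IPv4 address of an IPv4-mapped IPv6 address, else None."""
    parsed = ipaddress.ip_address(address)
    # ipv4_mapped is only defined on IPv6Address and is None for non-mapped addresses.
    if parsed.version == 6 and parsed.ipv4_mapped is not None:
        return str(parsed.ipv4_mapped)
    return None

print(to_ipv4_if_mapped("::ffff:192.0.2.10"))
print(to_ipv4_if_mapped("2001:db8::1"))
```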
2.17.6
March 4, 2021
fix
Patch a vulnerability in a 3rd party library that posed a variety of memory corruption scenarios, most likely use-after-free.
info
Drop invalid targeting tags with a warning.
2.17.5
February 18, 2021
new
The daemon version is included in the gremlin check report.
fix
Occasionally the Docker version was incorrectly parsed which would result in the classic driver being used for container attacks.
2.17.3
January 27, 2021
new
Some agent API traffic is now gzip-compressed, reducing network overhead on machines where Gremlin is installed.
2.17.2
January 12, 2021
fix
Patch a vulnerability in a 3rd party library that posed a potential buffer overflow scenario.
fix
Patch a vulnerability in a 3rd party library that could lead to operating on dangling memory references.
2.17.1
December 11, 2020
new
You can now specify the SSL_CERT_FILE variable via the config.yml file. See the advanced configuration page for details on how to use it.
2.17.0
December 7, 2020
fix
Gremlin now properly interprets escaped newline characters (\n) for values of the GREMLIN_SSL_CERT environment variable.
info
Gremlin now reports container and process data at a slower rate, down from every 5 seconds during active attacks (and every 10 seconds otherwise) to every 30 seconds. We've found that this data changes much less frequently than is justified for a 5-10 second interval. This should result in significantly reduced network overhead required to run Gremlin.
info
Updated dependencies
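For context on the GREMLIN_SSL_CERT fix in 2.17.0: a PEM certificate spans several lines, so when it is passed through a single-line environment variable the line breaks are often written as the two characters backslash and n. Interpreting them means turning those sequences back into real newlines, roughly as sketched below. The helper name unescape_newlines is hypothetical, and the certificate body is a truncated placeholder, not real data.

```python
def unescape_newlines(value):
    """Replace literal backslash-n sequences with real newline characters."""
    return value.replace("\\n", "\n")

# Placeholder value as it might appear in an environment file (not a real certificate).
raw = "-----BEGIN CERTIFICATE-----\\nMIIB...\\n-----END CERTIFICATE-----"
pem = unescape_newlines(raw)
print(pem)
```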
2.16.3
November 20, 2020
fix
The Gremlin agent now writes a message to daemon.log when attacks finish. This provides observers of this log with an approximation of when attacks have ended.
info
Updated dependencies
2.16.1
November 6, 2020
new
Gremlin will now log more information when it receives signals such as TERM. Details include the user and process that sent the signal.
info
Updated dependencies
2.16.0
October 14, 2020
new
Introduced 3 new container drivers: docker-runc, crio-runc, and containerd-runc. With this comes support for new container runtimes: CRI-O and containerd.
new
Gremlin's container image now runs solely on Alpine Linux, reducing image size and complexity.
fix
Gremlin now provides full support for the systemd cgroup driver when running any of the new container drivers.
2.15.11
October 13, 2020
fix
Provide operating system tags for Alpine Linux
info
Update the expiration date for code signing keys
info
Updated dependencies
2.15.10
October 8, 2020
fix
Fixed a bug that omitted previous Gremlin versions from showing up at rpm.gremlin.com
fix
Improved Gremlin's ability to discover Linux distributions that would otherwise yield a tag of os-type: Unknown. Among the previously unknown distributions are Alpine, Amazon, Fedora, and Red Hat Enterprise. These distributions will now properly yield an os-type: Linux tag as well as an os-name tag that appropriately describes the Linux distribution.
info
Updated dependencies
2.15.9
September 28, 2020
new
AWS Availability Zone ID (azid) is available for targeting.
new
AWS tags are now available for targeting.
2.15.8
September 21, 2020
fix
Error messages from attack executions resulting in InitializationFailed were missing their error output in the UI. Gremlin now properly reports the error that occurs during initialization.
fix
Fix a regression introduced in 2.15.0 which removed Gremlin's Systemd service configuration during re-installs and upgrades. Now, Gremlin properly configures Systemd (or SysvInit) on every installation, re-installation, or upgrade.
2.15.7
September 17, 2020
new
Output detailed messages when an attack results in a terminated process.
fix
Filter out clearly invalid data when collecting cloud metadata.
2.15.5
August 27, 2020
fix
Fixed a bug introduced in 2.12.25 where Gremlin did not accurately determine when SELinux was enabled for Docker users. This produced incorrect behavior for Gremlin's container attacks, as Gremlin failed to mount /var/lib/gremlin with the Docker volume option :z, resulting in permissions errors.
new
Gremlin now reports Available Memory for gremlin measure memory
new
When Gremlin runs in a container, it can now be run under custom SELinux process labels. This allows the privileges that Gremlin requires to run correctly to be granted only to Gremlin and not to the rest of a host's containers running under the default process label: container_t. Learn more about this on our documentation page, or our GitHub repo.
2.15.3
July 15, 2020
fix
Improve error messaging when Gremlin fails to find an IP address for a hostname supplied with the --hostname argument, which can be passed to any network attack. Error message now mentions failures due to specifying a hostname that maps to an invalid DNS record type, such as NS.
fix
Gremlin was not correctly using the SSL_CERT_FILE environment variable when running attacks against containers. As a result, Gremlin would only properly trust intermediate SSL proxies if the file referenced in SSL_CERT_FILE had a path within /var/lib/gremlin. Now, this file can live anywhere on the file system, so long as Gremlin has access to it.
2.15.2
July 1, 2020
fix
Patch a vulnerability in a 3rd party library that posed a potential denial of service to Gremlin's outbound HTTPS connections. In practice this is 100% mitigated unless connecting Gremlin through a malicious SSL proxy.
info
Updated dependencies
2.15.1
June 30, 2020
fix
Gremlin was not using the custom TLS trust store (specified by the SSL_CERT_FILE environment variable) when carrying out attacks against containers. This resulted in a failure to launch container attacks for users that rely on this configuration.
fix
Improve accuracy of latency measurement when checking Gremlin's connectivity to the control plane using gremlin check api. This measurement now omits the time it takes to initialize the HTTP client used to test connectivity.
2.15.0
June 10, 2020
new
Gremlin can be installed with a custom group, user, and/or binary mode. The three optional environment variables GREMLIN_INSTALL_GROUP, GREMLIN_INSTALL_USER, and GREMLIN_INSTALL_BIN_MODE are set before running the install to establish the security context. The defaults are unchanged: gremlin, gremlin, 6111.
2.14.16
June 2, 2020
fix
Added more detail to error messages that occur when Gremlin fails to do a DNS lookup of a hostname. Previously the error message did not include the reason for the lookup failure. An example of the new detail we've added is:
failed to lookup address information: Name does not resolve.
2.14.15
May 27, 2020
fix
Fixed a bug where Time travel attacks were not blocking the NTP port of the target, even when told to do so. Now, specifying --ntp, or checking the Block NTP box in the UI, correctly blocks all traffic to outbound NTP servers. Omitting this option still correctly allows NTP traffic on the target.
2.14.14
May 20, 2020
fix
Fixed a bug where container attacks (including Kubernetes) were not properly setting attacks to ClientAborted when Gremlin's target is killed. This fix includes displaying more information about Gremlin's status after the target is killed.
2.14.13
May 19, 2020
fix
Fixed a bug in how the Gremlin Agent reports attack status when Gremlin attacks exit abnormally. In many instances, attacks were incorrectly labeled as LostCommunication when they instead failed to start (Failed), or were killed mid-attack (ClientAborted).
fix
Fixed a bug where the Gremlin Disk attack would not clean up the impact files it created if it was halted from the UI.
fix
Changed the way Debian and RPM installation scripts handle failures when adding Gremlin to the Docker Linux group. Previously, the installation would fail and terminate if a docker Linux group was found but Gremlin could not be added to it. Now, a warning is printed instead.
2.14.12
May 11, 2020
fix
Improved the safety guarantees of the Gremlin Agent when loading attacks from the filesystem. Now, if the Gremlin Agent fails to load any attack state due to IO errors, all attacks will be halted immediately to prevent any unexpected behavior.
2.14.11
May 6, 2020
info
We now collect an approximate host boot time, which helps Gremlin better recognize unique hosts on your team.
fix
Select a default network interface in more cases (also used when Gremlin identifier isn't specified).
2.14.10
April 30, 2020
fix
Fix bug that prevented the Gremlin agent from reading attack state for attacks created via the CLI. This was preventing users from halting such attacks from the UI, as well as reading logs from the attack details page.
fix
Remove attack.log files associated with attacks that get rolled back from the CLI through gremlin rollback, as well as by signals such as Ctrl-C.
2.14.9
April 29, 2020
fix
Immediately halt and mark the attack as "Initialization Failed" if a Disk Gremlin encounters an IO error while writing the desired amount of bytes.
2.14.7
April 14, 2020
fix
Integrate more thoroughly with the cgroups managed by Kubernetes and Docker. Gremlin container attacks now properly report usage metrics to cAdvisor which is used in Kubernetes monitoring and autoscaling triggers. NOTE: Gremlin currently only supports the cgroupfs cgroup driver. View more information
2.14.6
April 8, 2020
fix
Cap the --workers (-w) argument for Disk and IO attacks to a maximum value equal to the number of CPUs available to Gremlin. This ensures Gremlin is always busy, and not generating more threads than can be fully utilized by the machine on which Gremlin runs. This also eliminates the possibility that Gremlin will exhaust all threads available to Gremlin, which was observed with very large values supplied to --workers (1024 or higher).
info
Updated dependencies
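The worker cap introduced in 2.14.6 amounts to clamping the requested value to the CPU count. A minimal sketch of that clamp follows, with two assumptions labeled in the code: os.cpu_count() stands in for "CPUs available to Gremlin" (which may differ under cgroup or affinity limits), and a lower bound of one worker is added for safety. The function name effective_workers is hypothetical.

```python
import os

def effective_workers(requested, available_cpus=None):
    """Clamp a requested worker count to the number of available CPUs."""
    if available_cpus is None:
        # Assumption: the host CPU count approximates the CPUs available to the process.
        available_cpus = os.cpu_count() or 1
    # Assumption: never go below one worker, even for zero or negative requests.
    return max(1, min(requested, available_cpus))

# A very large request, like the 1024 mentioned in the note, is reduced to the CPU count.
print(effective_workers(1024, available_cpus=8))
```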
2.14.5
March 27, 2020
fix
Improved handling of invalid auth when running gremlin attack-container
new
Better organization of output of gremlin check auth, including more information in both the success/error cases
2.14.4
March 26, 2020
fix
The daemon was not properly halting attacks when it did not have access to its library directory, /var/lib/gremlin, even though it would allow attacks to run. Attacks are now properly halted.
fix
Users can now supply push_metrics inside config.yaml. This attribute is a boolean value that defaults to true, and is equivalent to the environment variable PUSH_METRICS=1
2.14.3
March 20, 2020
fix
The daemon was not correctly handling the case when it started up in an un-authed state and relied on gremlin init being run to provide the .credentials file. In particular, it was missing some critical metadata which charting relied upon.
fix
Added subheaders to gremlin check auth to better categorize the output
fix
Read tags supplied in config.yaml
new
Ship example config.yaml to RPM/DEB packages
new
Auto-initialize daemon if secret is present and credentials are not present
2.14.1
March 11, 2020
fix
gremlin measure $TYPE now accepts TYPE in uppercase or lowercase (e.g. gremlin measure cpu). It previously only accepted uppercase.
2.14.0
March 6, 2020
fix
Kubernetes Pod eviction events triggered by Gremlin resource attacks no longer produce Failed attack states. There is now additional information when Gremlin is killed, and the steps it took to clean up.
fix
Gremlin agents installed into Azure now properly set the publicIpAddress metadata tag (erroneously named publicpAddress in prior versions).
new
Gremlin now pushes CPU metrics for active attacks. These metrics will be used in charting features that allow you to see Gremlin's effect on your machines in real time. To disable this functionality, add PUSH_METRICS=0 to the configuration for gremlind. No data is collected when attacks aren't running, and only data relevant to the attack is collected.
2.13.0
February 28, 2020
fix
Gremlin can now compete with the resources dedicated to a container, instead of taking free resources from the host. See more about Gremlin and Cgroups
fix
Running attacks are now halted when the gremlind service is told to shut down from process managers
fix
Memory Gremlin more aggressively touches memory it consumes to better ensure that operating systems don't try to reuse some of it
new
os-name tag added to clients by default; this value, in combination with os-type, now makes up the full OS description of the machine (e.g. os-type=Linux + os-name=Ubuntu)
2.12.27
February 26, 2020
fix
Memory leak collecting measurement data
fix
Ensure capabilities are correctly applied during a rollback
new
Improvements to I/O and Disk attack targeting and capabilities handling
new
Better local IP address determination when automatically setting GREMLIN_IDENTIFIER
new
Improved shutdown handling (SIGINT, SIGTERM, and attack halt)
info
Updated dependencies
2.12.26
February 17, 2020
fix
There was a regression in 2.12.25 where host attacks that required capabilities did not properly roll back. This release fixes that.
fix
There was a regression in 2.12.23 where the value of SSL_CERT_FILE was not added to the trust store. That is properly wired into the trust store again
fix
/var/log/gremlin/executions/{guid} was not being cleared on halts - now it is
fix
Shutdown container attack showed an error in the logs, now this case is handled more gracefully
fix
gremlin status was displaying UnknownVariantError in some cases
info
Updated dependencies
2.12.25
February 4, 2020
info
Updated dependencies
info
Gremlin now interfaces with version 1.24 of Docker's REST API over Unix socket /var/run/docker.sock, instead of indirectly through docker shell commands.
2.12.23
January 9, 2020
fix
Address startup errors referencing number too large to fit in target type, which happens under certain configurations of the target machine's CPU.
fix
Signal handling and process killer improvements
2.12.24
January 9, 2020
fix
Better handling for the case when a stateful attack doesn't get a chance to clean up properly within a container
2.12.22
January 2, 2020
fix
Make file management for Gremlin logs more operating system agnostic
fix
Improve capabilities checking
info
Updated dependencies
2.12.21
December 2, 2019
fix
Prevent non-privileged users from acquiring Gremlin secrets if they have command-line access to linux hosts while a container attack is running
fix
Supply the correct DOCKER_API_VERSION to container attacks
info
Updated dependencies
new
New os_type tag added to all new Gremlin clients (e.g. os_type:Linux)
2.12.20
November 21, 2019
fix
Fixed bug that caused Network Gremlins to fail when attacking two or more processes (including containers) when they shared a network interface.
2.12.19
November 12, 2019
new
Improved memory attack performance by as much as four times while limiting the CPU impact.
fix
Recover gracefully from operating system out-of-memory errors.
fix
Minor status message improvements for the memory attack.
2.12.17
October 29, 2019
fix
Fixed a bug where launching a container attack was not respecting the GREMLIN_BYPASS_USERNS_REMAP environment variable. This should get set only when the Docker namespace remapping feature is being used.
2.12.16
October 23, 2019
fix
Fixed a bug where Memory Gremlin puts unnecessary strain on getrandom and therefore system entropy.
2.12.14
October 17, 2019
fix
Fixed bug where Gremlin (in Docker only) would log errors about missing directories until it received an attack to run
2.12.13
October 14, 2019
fix
Fixed a bug where the Gremlin CPU attack would leave too much CPU in the idle and sy states. The CPU attack will now consume the requested amount, using us instead.
2.12.11
October 1, 2019
fix
Fixed bug where Gremlin would fail attacks due to a closed HTTP stream
fix
Fixed bug where Gremlin would fail to load attacks under certain circumstances
2.12.10
September 27, 2019
fix
Improved error messaging around loading authentication configuration
new
New command gremlin check for diagnostics, check out the docs
2.12.8
September 9, 2019
fix
Improve help text for Blackhole Gremlin arguments about ports
info
Updated dependencies
2.12.7
September 5, 2019
fix
Fix bug where Gremlin would create /var/lib/gremlin/.credentials with permissions from the OS umask. Gremlin would then change the mode of the created file before writing to it. Now, Gremlin creates the file with proper permissions, without having to change mode later.
fix
Remove world-readable bit from the /var/log/gremlin directory
2.12.5
August 28, 2019
fix
Fix to Memory Gremlin running in containers - we were allowing the Gremlin to allocate more memory than was given to the target container
2.12.4
August 23, 2019
fix
Bugfix to Memory Gremlin - we were letting the --percentage option consume more memory than was available
2.12.3
August 21, 2019
fix
Fewer writes by the client to the filesystem, reducing the chance that a Disk Gremlin fails
2.12.1
August 5, 2019
fix
Explicitly track tearing down successful attacks, so we don't halt attacks too early in the case teardown takes a material amount of time.
2.11.17
July 31, 2019
fix
Ensure Gremlin sidecars launched in a container have the same GREMLIN_IDENTIFIER as the daemon.
info
Updated dependencies
2.11.16
July 26, 2019
fix
Make the Memory attack track its allocation time in the Initializing state.
2.11.9
July 2, 2019
fix
Fix handling of GREMLIN_CLIENT_TAGS, which were ignored starting in 2.11.6.
new
Added more trust-store file locations
2.11.6
June 25, 2019
new
Automatically populate client tags when running in Microsoft Azure or Google Cloud
2.11.4
June 21, 2019
fix
Bugfix for halted attacks which ended in a Lost Communication state (introduced in 2.11.2)
2.11.1
June 10, 2019
new
Automatically populate client tags with instance-id when running on AWS EC2.
info
Updated dependencies
2.11.0
May 29, 2019
new
Resource CPU Attacks can now impact All cores and can consume a percentage of CPU capacity
new
Network DNS attacks now cache the IP address of the Gremlin Control Plane to prevent the attack from halting prematurely
fix
Proxy details are now hidden in the attack logs on successful calls