This prevents the use of bookmarks when trying to find the latest common
snapshot. This forces a rollback when the latest snapshot on the source
was deleted but a common and older snapshot was found.
Signed-off-by: Felix Matouschek <felix@matouschek.org>
This commit changes syncoid's behavior so that it always looks for both
a matching snapshot and a matching bookmark. If the bookmark was created
after the snapshot, the bookmark is used instead. This allows replication
when the latest replicated snapshot was deleted on the source and an
older common snapshot was found, but rollback on the target is not
allowed. In that case the matching bookmark is used for replication.
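
The selection rule described above can be sketched roughly as follows.
This is an illustrative Python sketch, not syncoid's Perl code: the
function name `pick_replication_base` and the dict layout (`name`,
`creation`) are assumptions made for the example.

```python
from typing import Optional


def pick_replication_base(snapshot: Optional[dict],
                          bookmark: Optional[dict]) -> Optional[dict]:
    """Return the newer of a matching snapshot and a matching bookmark.

    Both arguments are hypothetical dicts with a numeric "creation" key.
    """
    if snapshot is None:
        return bookmark
    if bookmark is None:
        return snapshot
    # The bookmark wins only when it was created after the snapshot.
    if bookmark["creation"] > snapshot["creation"]:
        return bookmark
    return snapshot
```

With an older common snapshot and a newer bookmark, the bookmark is
chosen, so no rollback on the target is needed.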
This fixes https://github.com/jimsalterjrs/sanoid/issues/602
Signed-off-by: Felix Matouschek <felix@matouschek.org>
Zpool status reports error counts with unit suffixes after the first 999 errors in any given column; as originally written, the monitor-health plugin used strictly numeric comparison checks which would error out when encountering a value like "1.09K" for checksum errors.
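
One way to handle suffixed counts is to normalize them to integers before comparing. The following Python sketch is illustrative only and is not the plugin's actual code: the function name `parse_error_count`, the set of suffixes, and the 1024 multiplier are assumptions made for the example.

```python
import re

# Assumed suffix set and ordering; each step multiplies by `base`.
_SUFFIX_EXPONENT = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6}
_COUNT_RE = re.compile(r"^(\d+(?:\.\d+)?)([KMGTPE]?)$")


def parse_error_count(text: str, base: int = 1024) -> int:
    """Convert a count such as "12" or "1.09K" to an approximate integer."""
    match = _COUNT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized error count: {text!r}")
    number, suffix = match.groups()
    # The suffixed form is already rounded, so the result is approximate.
    return round(float(number) * base ** _SUFFIX_EXPONENT[suffix])
```

A health check can then compare `parse_error_count(value) > 0` numerically, regardless of whether the column shows a plain or suffixed value.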
Updated the script to accept warning and critical levels as decimal arguments. Added support for excluding 'squashfs' filesystems from the df command output.
When creating bookmarks, compare the GUID of existing bookmarks before
failing the creation of a duplicate bookmark.
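
As a rough illustration of the check, in Python rather than syncoid's
Perl: the function name `bookmark_action` and its return values are
hypothetical, and how a GUID mismatch is handled is an assumption.

```python
from typing import Optional


def bookmark_action(existing_guid: Optional[str], snapshot_guid: str) -> str:
    """Decide what to do when a bookmark name may already be taken."""
    if existing_guid is None:
        return "create"    # no bookmark with that name yet
    if existing_guid == snapshot_guid:
        return "reuse"     # same GUID: the bookmark already marks this snapshot
    return "conflict"      # same name, different GUID: a genuine duplicate
```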
Signed-off-by: Felix Matouschek <felix@matouschek.org>
This commit adds a regression test for the out-of-order snapshot
replication issue.
The new test case manipulates the system clock with `setdate` to create
snapshots with non-monotonic `creation` timestamps but a correct,
sequential `createtxg`. It then runs `syncoid` and verifies that all
snapshots were replicated, which is only possible if they are ordered
correctly by `createtxg`.
See #815 for the original test.
* Replaced direct sorting on the `creation` property with calls to the
`sortsnapshots()` helper subroutine. As with other usages, this
ensures that when `syncoid` searches for the next snapshot to
replicate from a bookmark, it preferentially uses the monotonic
`createtxg` for sorting.
* Refactored the variables holding bookmark details from separate
scalars (`$bookmark`, `$bookmarkcreation`) into a single hash
(`%bookmark`). This allows for cleaner handling of all
relevant bookmark properties (`name`, `creation`, `createtxg`).
* Fixed a code comment that incorrectly described the snapshot search
order. The comment now correctly states that the search for a matching
target snapshot proceeds from newest to oldest to find the most recent
common ancestor.
System clock adjustments from manual changes or NTP synchronization can
cause ZFS snapshot creation timestamps to be non-monotonic. This can
cause `syncoid` to select the wrong "oldest" snapshot for initial
replication or the wrong "newest" snapshot with `--no-sync-snap`,
potentially failing to replicate data from the source to the target.
This change adds the `sortsnapshots()` helper that prefers to compare
the `createtxg` (creation transaction group) property over the
`creation` property when available. The `createtxg` property is
guaranteed to be strictly sequential within a ZFS pool, unlike the
`creation` property, which depends on the system clock's accuracy.
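
The ordering preference can be sketched as follows. This is an
illustrative Python sketch, not the Perl `sortsnapshots()` subroutine
itself: the name `sort_snapshots` and the input layout (snapshot name
mapped to a property dict) are assumptions for the example.

```python
def sort_snapshots(snaps: dict) -> list:
    """Return snapshot names ordered oldest to newest.

    `snaps` maps a snapshot name to a dict of properties. `createtxg`
    is used when every snapshot has it; otherwise fall back to
    `creation`, which depends on the system clock.
    """
    have_txg = all("createtxg" in props for props in snaps.values())
    prop = "createtxg" if have_txg else "creation"
    return sorted(snaps, key=lambda name: int(snaps[name][prop]))


# Clock moved backwards between the first and second snapshot.
snaps = {
    "first": {"creation": 300, "createtxg": 10},
    "second": {"creation": 100, "createtxg": 20},
    "third": {"creation": 200, "createtxg": 30},
}
ordered = sort_snapshots(snaps)  # ["first", "second", "third"]
```

Sorting the same data on `creation` alone would put `second` first,
which is the misordering this change avoids.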
Unlike the first iteration of `sortsnapshots()` in #818, the subroutine
takes a specific snapshot sub-hash ('source' or 'target') as input
rather than both, ensuring createtxg comparisons occur within the same
zpool context. The first iteration that took the entire snapshots hash
had no mechanism to sort target snapshots, which could have caused
issues in usages that expected target snapshots to be sorted.
Most snapshot sorting call sites now use the new `sortsnapshots()`
subroutine. Two more usages involving bookmarks are updated in a
different commit for independent testing of a bookmarks-related
refactoring.
Fixes: #815
The original snapshot fetching relied on a complex state-dependent
`getsnaps()` subroutine with a separate `getsnapsfallback()` for older
ZFS versions. The first refactor attempt in #818 simplified this but
introduced performance regressions by using `zfs get all`, which was
inefficient for large datasets.
This commit avoids that overhead by integrating proactive `zfs get`
feature detection through a new `check_zfs_get_features()` subroutine
that determines the command's capabilities by testing for `-t` (type
filter) support and the availability of the `createtxg` property.
Results are cached per host to avoid redundant checks.
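
The per-host caching can be sketched as follows. This is an illustrative
Python sketch, not the Perl `check_zfs_get_features()` subroutine: the
name `cached_zfs_get_features`, the `probe` callable, and the feature
keys are hypothetical stand-ins for the real command tests.

```python
from typing import Callable, Dict

# Cache of detected capabilities, keyed by host name.
_feature_cache: Dict[str, Dict[str, bool]] = {}


def cached_zfs_get_features(host: str,
                            probe: Callable[[str, str], bool]) -> Dict[str, bool]:
    """Detect `zfs get` capabilities once per host and reuse the result.

    `probe(host, feature)` is a hypothetical callable that runs the
    actual test command and reports whether the feature works.
    """
    if host not in _feature_cache:
        _feature_cache[host] = {
            "type_filter": probe(host, "type_filter"),  # `-t` support
            "createtxg": probe(host, "createtxg"),      # property available
        }
    return _feature_cache[host]
```

Repeated calls for the same host return the cached dict without probing
again, so each host is tested at most once per run.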
`check_zfs_get_features()` came from #931, which this change
supersedes.
The `getsnaps()` and `getbookmarks()` subroutines now use this
information to build optimized `zfs get` commands that query only
necessary properties. As before in #818, the parsing logic is refactored
to populate property hashes for each item, eliminating the old
multi-loop state-dependent approach and the need for mostly duplicated
fallback logic.
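
The single-pass parsing idea can be sketched as follows. This is an
illustrative Python sketch, not syncoid's Perl code: the function name
`parse_zfs_get` is hypothetical, and the input is assumed to be
tab-separated `name`, `property`, `value` columns as produced by
`zfs get -H`.

```python
def parse_zfs_get(output: str) -> dict:
    """Group tab-separated `zfs get -H` lines into per-item property dicts."""
    items: dict = {}
    for line in output.splitlines():
        if not line:
            continue
        # Any trailing column (such as the source) is ignored.
        name, prop, value = line.split("\t")[:3]
        items.setdefault(name, {})[prop] = value
    return items


sample = (
    "pool/data@a\tguid\t111\n"
    "pool/data@a\tcreatetxg\t10\n"
    "pool/data@b\tguid\t222\n"
    "pool/data@b\tcreatetxg\t20\n"
)
parsed = parse_zfs_get(sample)
```

Each item ends up with one dict holding all of its queried properties,
so no state has to be carried between lines.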
This resolves both the original complexity and the performance issues
from the first attempted fix. Now there is a foundation for fixing the
snapshot ordering bug reported in #815.