Log Compression and Archiving: Storing Properly and for the Long Term
Collected logs take up disk space, and over time that space can run out. In this article, we'll look at how to compress and archive logs effectively, preserving important data without filling up the disk.
Why Compress and Archive Logs?
Logs aren't just temporary files that can be deleted immediately. They're important for:
- Problem diagnostics
- Security auditing
- Performance analysis
- Legal requirements (e.g., GDPR, SOX)
However, storing all logs uncompressed is inefficient. That's why archiving is needed:
- Reduced disk space consumption
- Simplified file management
- Faster data transfer
- Compliance with storage requirements
Compression Formats for Logs
gzip
One of the most popular compression formats:
```bash
# Compress a file
gzip logfile.txt
# Creates logfile.txt.gz (the original file is replaced)

# Unpack
gunzip logfile.txt.gz
```
Advantages:
- Reasonable compression ratio
- Wide support
- Fast operation

Disadvantages:
- Single-threaded compression
- Doesn't support multi-volume archives
bzip2
Provides better compression than gzip but is slower:
```bash
bzip2 logfile.txt
# Creates logfile.txt.bz2
```
xz
Newer format with excellent compression ratio:
```bash
xz logfile.txt
# Creates logfile.txt.xz
```
zstd (Zstandard)
Developed at Facebook, it offers a good balance of speed and compression ratio:
```bash
zstd logfile.txt
# Creates logfile.txt.zst (unlike gzip, zstd keeps the original file by default)
```
Comparison of Compression Formats
| Format | Compression Ratio | Compression Speed | Decompression Speed | Support |
|--------|-------------------|-------------------|---------------------|---------|
| gzip | Medium | High | High | Everywhere |
| bzip2 | High | Low | Low | Wide |
| xz | Very high | Low | Medium | Modern |
| zstd | High | Very high | Very high | Modern |
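
The ratings in the table are relative, and real numbers depend heavily on the data. A minimal sketch for measuring the formats on your own logs follows; the sample content is made up for illustration, and any tool that is not installed is skipped.

```bash
#!/usr/bin/env bash
# Compare compressed sizes of one sample file across the installed tools.

workdir=$(mktemp -d)
sample="$workdir/sample.log"

# Generate a repetitive, log-like sample (hypothetical content).
for i in $(seq 1 20000); do
    echo "2026-02-01T12:00:00Z INFO request id=$i status=200 path=/api/items"
done > "$sample"

original_size=$(wc -c < "$sample")
echo "original: $original_size bytes"

for tool in gzip bzip2 xz zstd; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes to stdout, so the sample itself is left untouched.
        size=$("$tool" -c "$sample" 2>/dev/null | wc -c)
        echo "$tool: $size bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

Replace the generated sample with a copy of a real log file to get figures that reflect your workload.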
Archiving Logs in Linux
Using tar for Archiving
Combining multiple files into one archive with simultaneous compression:
```bash
# Create an archive with gzip
tar -czf logs_archive_$(date +%Y%m%d).tar.gz /var/log/*.log

# Create an archive with xz
tar -cJf logs_archive_$(date +%Y%m%d).tar.xz /var/log/*.log

# View archive contents without unpacking
tar -tzf logs_archive_$(date +%Y%m%d).tar.gz
```
Advanced Examples
Creating archive with filtering:
```bash
# Archive only files modified within the last 7 days
find /var/log -type f -name "*.log" -mtime -7 -print0 | tar --null -czf weekly_logs_$(date +%Y%m%d).tar.gz -T -

# Archive everything except files that are already compressed
tar --exclude='*.gz' --exclude='*.bz2' -czf logs_$(date +%Y%m%d).tar.gz /var/log/
```
Archiving Logs in Windows
PowerShell
```powershell
# Archiving with Compress-Archive
Compress-Archive -Path "C:\Logs\*.log" -DestinationPath "C:\Archive\logs_$(Get-Date -Format 'yyyyMMdd').zip"

# Archiving with subdirectories (piped files are stored at the archive root)
Get-ChildItem -Path "C:\Logs" -Recurse -Include "*.log" | Compress-Archive -DestinationPath "C:\Archive\all_logs_$(Get-Date -Format 'yyyyMMdd').zip"
```
External Tools
- 7-Zip — powerful archiver with excellent compression capabilities
- WinRAR — popular archiver with support for various formats
Archiving Strategies
By Time
The most common approach is archiving by time intervals:
```bash
# Daily archiving at 02:00
0 2 * * * /usr/local/bin/archive_daily_logs.sh

# Weekly archiving on Sundays at 03:00
0 3 * * 0 /usr/local/bin/archive_weekly_logs.sh

# Monthly archiving on the 1st at 04:00
0 4 1 * * /usr/local/bin/archive_monthly_logs.sh
```
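
The crontab entries call scripts that the article does not show. As a rough sketch of what a script like `archive_daily_logs.sh` might contain, the hypothetical function below bundles the `*.log` files of one directory into a date-stamped `tar.gz`. The function name, its arguments, and the example paths are assumptions, and it relies on GNU `find` and GNU `tar`.

```bash
#!/usr/bin/env bash
# Hypothetical body for archive_daily_logs.sh: bundle the *.log files of one
# directory into a date-stamped tar.gz and print the archive path.

archive_daily_logs() {
    local src_dir=$1 dest_dir=$2
    local archive="$dest_dir/logs_$(date +%Y%m%d).tar.gz"

    mkdir -p "$dest_dir" || return 1

    # find prints bare file names separated by NUL bytes, so names with
    # spaces survive; tar -C makes the stored paths relative to src_dir.
    find "$src_dir" -maxdepth 1 -type f -name '*.log' -printf '%f\0' \
        | tar -czf "$archive" -C "$src_dir" --null -T - || return 1

    echo "$archive"
}

# Example call; both paths are assumptions, adjust them to your layout:
# archive_daily_logs /var/log/myapp /var/log/archive
```

Storing paths relative to the source directory keeps the archive from recreating the full `/var/log/...` hierarchy when it is unpacked elsewhere.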
By Size
Archiving when a certain size is reached:
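```bash
#!/bin/bash
# Rotate and compress the log once it grows past THRESHOLD megabytes.

LOG_FILE="/var/log/application.log"
ARCHIVE_DIR="/var/log/archive"
THRESHOLD=100  # in megabytes

SIZE=$(stat -c%s "$LOG_FILE")  # size in bytes (GNU stat)
SIZE_MB=$((SIZE / 1024 / 1024))

if [ "$SIZE_MB" -gt "$THRESHOLD" ]; then
    DATE=$(date +%Y%m%d_%H%M%S)
    mkdir -p "$ARCHIVE_DIR"
    mv "$LOG_FILE" "$ARCHIVE_DIR/application_${DATE}.log"
    touch "$LOG_FILE"  # create new empty file
    # An application that keeps the old file open must reopen its log here.
    gzip "$ARCHIVE_DIR/application_${DATE}.log"
fi
```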
By Importance
Different categories of logs may require different archiving strategies:
- Critical logs (errors, security) — store longer, better compression
- Informational logs — short-term storage, fast compression
- Debug logs — minimal storage time
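
One way to express such categories in a script is a small lookup from category to compression level and retention period. The sketch below is only an illustration: the category names, gzip levels, and day counts are assumptions, not recommendations from any standard.

```bash
#!/usr/bin/env bash
# Sketch: pick a gzip level and a retention period from a log category.
# Category names and numbers are illustrative assumptions.

policy_for_category() {
    case $1 in
        critical) echo "9 365" ;;  # best compression, keep one year
        info)     echo "1 30" ;;   # fastest compression, keep one month
        debug)    echo "1 7" ;;    # fastest compression, keep one week
        *)        return 1 ;;      # unknown category
    esac
}

compress_by_category() {
    local category=$1 file=$2 policy level days
    policy=$(policy_for_category "$category") || return 1
    level=${policy% *}   # text before the space
    days=${policy#* }    # text after the space
    gzip "-$level" "$file" && echo "$file.gz (keep $days days)"
}

# Example call (the path is an assumption):
# compress_by_category critical /var/log/archive/auth_20260201.log
```

The retention value is only reported here; a cleanup job would still have to act on it.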
Log Archiving Systems
logrotate with Compression
As we've already seen, logrotate can automatically compress files:
/var/log/application/*.log {
daily
rotate 52
compress
delaycompress
missingok
notifempty
create 640 root adm
}
journalctl for systemd
For systemd logs:
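```bash
# Archive one month of the journal
# (a bare date means 00:00:00, so stop at March 1 to include all of February 28)
journalctl --since "2026-02-01" --until "2026-03-01" | gzip > /var/log/journal/feb_2026.gz

# One-off cleanup: shrink archived journal files to 1G in total
# (for a persistent limit, set SystemMaxUse= in /etc/systemd/journald.conf)
sudo journalctl --vacuum-size=1G
```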
Cloud Storage for Archives
For long-term log storage, cloud solutions are often used:
AWS S3
```bash
# Upload archive to S3
aws s3 cp logs_$(date +%Y%m%d).tar.gz s3://my-logs-bucket/daily/

# Set up a lifecycle rule for automatic transition to Glacier
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-logs-bucket \
    --lifecycle-configuration file://lifecycle_policy.json
```
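
The command above expects a `lifecycle_policy.json` file that the article does not show. The sketch below writes one possible version. The rule ID, the `daily/` prefix filter, the 30-day transition, and the 365-day expiration are all assumptions chosen for illustration; adjust them to your own retention requirements.

```bash
# Write a sample lifecycle_policy.json (all values are illustrative assumptions).
cat > lifecycle_policy.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-daily-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "daily/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
```

With this policy, objects under `daily/` would move to Glacier after 30 days and be deleted after a year.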
Azure Blob Storage
Using AzCopy to upload archives:
```bash
azcopy copy "logs_$(date +%Y%m%d).tar.gz" "https://mystorageaccount.blob.core.windows.net/logs/"
```
Archive Integrity Verification
It's important to periodically check archive integrity:
```bash
# For gzip archives
gzip -t logfile.log.gz

# For tar.gz archives (reads the whole archive; the listing is discarded)
tar -tzf logs_$(date +%Y%m%d).tar.gz > /dev/null

# Creating checksums
md5sum logs_$(date +%Y%m%d).tar.gz > logs_$(date +%Y%m%d).md5
```
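
A checksum is only useful if something verifies it later. The sketch below pairs the two steps: one hypothetical function records a checksum next to the archive, another checks it and then tests the gzip stream. Note that it swaps in `sha256sum` for the `md5sum` used above; both function names are made up for this example, and `--quiet` assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch: record a checksum when an archive is created, verify it later.

write_checksum() {
    local archive=$1
    # Store the bare file name in the checksum file so that the archive and
    # its .sha256 file can be moved together.
    ( cd "$(dirname "$archive")" && \
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

verify_archive() {
    local archive=$1
    # Step 1: the bytes on disk still match the recorded checksum.
    ( cd "$(dirname "$archive")" && \
      sha256sum --quiet -c "$(basename "$archive").sha256" ) || return 1
    # Step 2: the gzip stream itself is readable end to end.
    gzip -t "$archive"
}

# Example calls (the path is an assumption):
# write_checksum /var/log/archive/logs_20260201.tar.gz
# verify_archive /var/log/archive/logs_20260201.tar.gz
```

Running the verification from cron and alerting on a non-zero exit status turns the periodic check into something that actually happens.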
Archive Rotation
To prevent endless accumulation of archives:
```bash
# Delete archives older than 90 days
find /var/log/archive -type f -name "*.gz" -mtime +90 -delete

# Using du to monitor space
ARCHIVE_SIZE=$(du -sh /var/log/archive | cut -f1)
echo "Archive size: $ARCHIVE_SIZE"
```
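
Because `-delete` is irreversible, it can help to list the candidates first. The following sketch wraps the same `find` expression in a hypothetical `prune_archives` function that only prints by default and deletes when asked explicitly; the function name and its `delete` argument are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch: show what a retention rule would delete before deleting anything.

prune_archives() {
    local dir=$1 days=$2 mode=${3:-dry-run}
    if [ "$mode" = "delete" ]; then
        find "$dir" -type f -name '*.gz' -mtime +"$days" -delete
    else
        # Dry run: print the candidates only.
        find "$dir" -type f -name '*.gz' -mtime +"$days" -print
    fi
}

# Example calls (the path is an assumption):
# prune_archives /var/log/archive 90          # list only
# prune_archives /var/log/archive 90 delete   # actually remove
```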
Practical Tips
Choosing Compression Strategy
- For active use — gzip (fast decompression)
- For long-term storage — xz or bzip2 (better compression)
- For large volumes — zstd (balance of speed and compression)
Organizing Archive Structure
Create a clear storage structure:
/var/log/archive/
├── 2026/
│ ├── 01/ # January
│ │ ├── logs_20260101.tar.gz
│ │ └── logs_20260102.tar.gz
│ └── 02/ # February
│ ├── logs_20260201.tar.gz
│ └── logs_20260202.tar.gz
└── current/ # current archives
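
A layout like this is easiest to keep consistent when the target path is computed rather than typed. The sketch below derives the year and month directories from a `YYYYMMDD` stamp; the function names are hypothetical and the file naming follows the tree above.

```bash
#!/usr/bin/env bash
# Sketch: derive the year/month directory from a YYYYMMDD stamp.

archive_path() {
    local root=$1 stamp=$2 year month   # stamp is expected as YYYYMMDD
    year=$(printf '%s' "$stamp" | cut -c1-4)
    month=$(printf '%s' "$stamp" | cut -c5-6)
    echo "$root/$year/$month/logs_$stamp.tar.gz"
}

store_archive() {
    local archive=$1 root=$2 stamp=$3 target
    target=$(archive_path "$root" "$stamp")
    # Create the year/month directories on demand, then move the archive in.
    mkdir -p "$(dirname "$target")" && mv "$archive" "$target" && echo "$target"
}

# Example call (paths are assumptions):
# store_archive /tmp/logs_20260201.tar.gz /var/log/archive 20260201
```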
Monitoring
Monitor the following metrics:
- Disk space usage
- Compression/archiving time
- Compression ratio
- Number of archiving errors
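
Two of these metrics, compression ratio and compression time, can be captured at the moment a file is compressed. The sketch below is one possible way to do that with gzip; the function names and the output format are assumptions, and the timing has one-second resolution only.

```bash
#!/usr/bin/env bash
# Sketch: compress one file and report ratio and elapsed time.

compression_ratio() {
    # Prints original size divided by compressed size with two decimals.
    awk -v o="$1" -v c="$2" 'BEGIN { if (c > 0) printf "%.2f\n", o / c; else exit 1 }'
}

report_compression() {
    local file=$1 start end orig comp
    orig=$(wc -c < "$file")
    start=$(date +%s)
    gzip -c "$file" > "$file.gz" || return 1   # -c keeps the original file
    end=$(date +%s)
    comp=$(wc -c < "$file.gz")
    echo "file=$file seconds=$((end - start)) ratio=$(compression_ratio "$orig" "$comp")"
}

# Disk usage of the archive directory in kilobytes (path is an assumption):
# du -sk /var/log/archive | cut -f1
```

Emitting one `key=value` line per run keeps the output easy to feed into whatever monitoring system is already in place.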
Conclusion
Proper log archiving and compression is a balance between space savings and information accessibility. Choose formats and strategies based on your needs: if speed matters most, use gzip or zstd; if compression ratio matters most, use xz or bzip2.
Remember that archiving is not the final stage of working with logs, but an important part of data management strategy. Regularly review your approaches and adapt them to changing business requirements and technologies.