Log Compression and Archiving: Storing Properly and for the Long Term

Collected logs take up disk space, and over time that space runs out. In this article, we'll look at how to compress and archive logs effectively so that important data is preserved without filling up the disk.

Why Compress and Archive Logs?

Logs aren't just temporary files that can be deleted immediately. They're important for:

  • Problem diagnostics
  • Security auditing
  • Performance analysis
  • Legal requirements (e.g., GDPR, SOX)

However, storing all logs uncompressed is inefficient. That's why archiving is needed:

  • Reduced disk space consumption
  • Simplified file management
  • Faster data transfer
  • Compliance with storage requirements

Compression Formats for Logs

gzip

One of the most popular compression formats:

```bash
# Compressing a file
gzip logfile.txt
# Creates logfile.txt.gz

# Unpacking
gunzip logfile.txt.gz
```

Advantages:

  • Reasonable compression ratio
  • Wide support
  • Fast operation

Disadvantages:

  • Single-threaded compression
  • No archive features of its own: it compresses a single file and doesn't support multi-volume archives
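Compressed logs rarely need to be unpacked just to be read. The following is a minimal sketch, assuming gzip 1.6 or newer (for the `-k` flag); the helper name `count_matches_gz` is hypothetical and only illustrates searching a compressed file as a stream:

```bash
#!/bin/bash
# Hypothetical helper: count lines matching a pattern inside a .gz file
# without writing a decompressed copy to disk.
count_matches_gz() {
    local pattern="$1" file="$2"
    # gzip -dc streams the decompressed data to stdout; grep -c counts matching lines.
    gzip -dc -- "$file" | grep -c -- "$pattern"
}

# Usage (assumes logfile.txt exists):
#   gzip -k -9 logfile.txt      # -k keeps logfile.txt, -9 selects the strongest compression
#   count_matches_gz "ERROR" logfile.txt.gz
```

Tools such as `zcat` and `zgrep`, which ship with gzip on most distributions, cover the same need interactively.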

bzip2

Provides better compression than gzip but is slower:

```bash
bzip2 logfile.txt
# Creates logfile.txt.bz2
```

xz

Newer format with excellent compression ratio:

```bash
xz logfile.txt
# Creates logfile.txt.xz
```

zstd (Zstandard)

Developed at Facebook, it offers a good balance of speed and compression ratio:

```bash
zstd logfile.txt
# Creates logfile.txt.zst
```

Comparison of Compression Formats

| Format | Compression Ratio | Compression Speed | Decompression Speed | Support |
|--------|-------------------|-------------------|---------------------|---------|
| gzip | Medium | High | High | Everywhere |
| bzip2 | High | Low | Low | Wide |
| xz | Very high | Low | Medium | Modern |
| zstd | High | Very high | Very high | Modern |
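Ratings like these depend heavily on the data, so it is worth measuring on a sample of your own logs. The sketch below is one way to do that; the function name `compare_compressors` is an assumption, and each tool is run with its default level only:

```bash
#!/bin/bash
# Print the compressed size each installed tool achieves on one sample file.
# Tools that are not installed are skipped rather than treated as errors.
compare_compressors() {
    local file="$1" tool size
    printf '%-8s %s\n' "original" "$(wc -c < "$file" | tr -d ' ')"
    for tool in gzip bzip2 xz zstd; do
        command -v "$tool" > /dev/null 2>&1 || continue
        # -c writes the compressed stream to stdout, so the sample file is left untouched.
        size=$("$tool" -c "$file" | wc -c | tr -d ' ')
        printf '%-8s %s\n' "$tool" "$size"
    done
}

# Usage: compare_compressors /var/log/application.log
```

Wrapping each call in `time` would add the speed dimension; it is left out here to keep the output stable.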

Archiving Logs in Linux

Using tar for Archiving

Combining multiple files into one archive with simultaneous compression:

```bash
# Creating archive with gzip
tar -czf logs_archive_$(date +%Y%m%d).tar.gz /var/log/*.log

# Creating archive with xz
tar -cJf logs_archive_$(date +%Y%m%d).tar.xz /var/log/*.log

# View archive contents without unpacking
tar -tzf logs_archive_$(date +%Y%m%d).tar.gz
```

Advanced Examples

Creating archive with filtering:

```bash
# Archive only files from last 7 days
find /var/log -name "*.log" -mtime -7 -print0 | tar --null -czf weekly_logs_$(date +%Y%m%d).tar.gz -T -

# Archive with exclusion of certain files (already-compressed logs)
tar --exclude='*.gz' --exclude='*.bz2' -czf logs_$(date +%Y%m%d).tar.gz /var/log/
```

Archiving Logs in Windows

PowerShell

```powershell
# Archiving with Compress-Archive
Compress-Archive -Path "C:\Logs\*.log" -DestinationPath "C:\Archive\logs_$(Get-Date -Format 'yyyyMMdd').zip"

# Archiving with subdirectories
Get-ChildItem -Path "C:\Logs" -Recurse -Include "*.log" | Compress-Archive -DestinationPath "C:\Archive\all_logs_$(Get-Date -Format 'yyyyMMdd').zip"
```

External Tools

  • 7-Zip — powerful archiver with excellent compression capabilities
  • WinRAR — popular archiver with support for various formats

Archiving Strategies

By Time

The most common approach is archiving by time intervals:

```bash
# Daily archiving (every day at 02:00)
0 2 * * * /usr/local/bin/archive_daily_logs.sh

# Weekly archiving (Sundays at 03:00)
0 3 * * 0 /usr/local/bin/archive_weekly_logs.sh

# Monthly archiving (1st of the month at 04:00)
0 4 1 * * /usr/local/bin/archive_monthly_logs.sh
```
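The scripts named in the crontab are not shown in this article. As a rough sketch of what a daily one might contain, assuming GNU find and GNU tar, the hypothetical function `archive_logs` below bundles the `*.log` files of one directory into a dated archive and checks the result before reporting it:

```bash
#!/bin/bash
# Archive every *.log file directly inside SRC into DEST/logs_YYYYMMDD.tar.gz
# and print the archive path on success.
archive_logs() {
    local src="$1" dest="$2" archive
    archive="$dest/logs_$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest" || return 1
    # %f prints bare file names; -C makes tar resolve them relative to SRC.
    find "$src" -maxdepth 1 -type f -name '*.log' -printf '%f\0' |
        tar --null -czf "$archive" -C "$src" -T - || return 1
    # Only report success if the gzip stream passes an integrity test.
    gzip -t "$archive" || return 1
    echo "$archive"
}

# Usage: archive_logs /var/log/application /var/log/archive
```

The source files are left in place here; removing or truncating them after a successful run is a separate decision that depends on how the application writes its logs.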

By Size

Archiving when a certain size is reached:

```bash
#!/bin/bash

LOG_FILE="/var/log/application.log"
ARCHIVE_DIR="/var/log/archive"
THRESHOLD=100  # in megabytes

SIZE=$(stat -c%s "$LOG_FILE")
SIZE_MB=$((SIZE / 1024 / 1024))

if [ "$SIZE_MB" -gt "$THRESHOLD" ]; then
    DATE=$(date +%Y%m%d_%H%M%S)
    mv "$LOG_FILE" "$ARCHIVE_DIR/application_${DATE}.log"
    touch "$LOG_FILE"  # create new empty file
    gzip "$ARCHIVE_DIR/application_${DATE}.log"
fi
```

By Importance

Different categories of logs may require different archiving strategies:

  • Critical logs (errors, security) — store longer, better compression
  • Informational logs — short-term storage, fast compression
  • Debug logs — minimal storage time
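One way to make such a policy explicit is a small lookup that other scripts can call. This is only a sketch: the category names, retention periods, and tool choices below are illustrative assumptions, not recommendations from any standard.

```bash
#!/bin/bash
# Map a log category to "retention_days compressor".
# All values are placeholders; adjust them to your own requirements.
archive_policy() {
    case "$1" in
        critical) echo "365 xz"  ;;  # keep long, compress hard
        info)     echo "30 gzip" ;;  # keep briefly, compress fast
        debug)    echo "7 gzip"  ;;  # keep as short as possible
        *)        echo "unknown category: $1" >&2; return 1 ;;
    esac
}

# Usage: split the result into two variables.
#   read -r days tool <<< "$(archive_policy critical)"
```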

Log Archiving Systems

logrotate with Compression

As we've already seen, logrotate can automatically compress files:

    /var/log/application/*.log {
        daily
        rotate 52
        compress
        delaycompress
        missingok
        notifempty
        create 640 root adm
    }

journalctl for systemd

For systemd logs:

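```bash
# Archiving month's journal
# (a date without a time means 00:00:00, so --until must be the first day of the next month)
journalctl --since "2026-02-01" --until "2026-03-01" | gzip > /var/log/journal/feb_2026.gz

# One-off cleanup: remove archived journal files until they take up less than 1G
# (for a persistent limit, set SystemMaxUse= in /etc/systemd/journald.conf)
sudo journalctl --vacuum-size=1G
```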

Cloud Storage for Archives

For long-term log storage, cloud solutions are often used:

AWS S3

```bash
# Upload archive to S3
aws s3 cp logs_$(date +%Y%m%d).tar.gz s3://my-logs-bucket/daily/

# Set up lifecycle for automatic transition to Glacier
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-logs-bucket \
    --lifecycle-configuration file://lifecycle_policy.json
```
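The contents of `lifecycle_policy.json` are not given above. A minimal sketch of such a file might look like the following, written out by a hypothetical helper `write_lifecycle_policy`; the rule ID, the 30-day transition, and the 365-day expiration are assumptions, and the `daily/` prefix simply matches the upload example:

```bash
#!/bin/bash
# Write an example S3 lifecycle policy to the given path
# (defaults to lifecycle_policy.json in the current directory).
write_lifecycle_policy() {
    cat > "${1:-lifecycle_policy.json}" <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-daily-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "daily/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
}

# Usage: write_lifecycle_policy lifecycle_policy.json
```

With this rule, objects under `daily/` would move to Glacier after 30 days and be deleted after a year; check the retention figures against your own compliance requirements before applying anything like it.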

Azure Blob Storage

Using AzCopy to upload archives:

    azcopy copy "logs_$(date +%Y%m%d).tar.gz" "https://mystorageaccount.blob.core.windows.net/logs/"

Archive Integrity Verification

It's important to periodically check archive integrity:

```bash
# For gzip archives
gzip -t logfile.log.gz

# For tar.gz archives
tar -tzf logs_$(date +%Y%m%d).tar.gz

# Creating checksums
md5sum logs_$(date +%Y%m%d).tar.gz > logs_$(date +%Y%m%d).md5
```
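A checksum is only useful if it is checked again later. The sketch below pairs creation with verification; it assumes GNU coreutils and uses `sha256sum` rather than the `md5sum` shown above, and the function names `seal_archive` and `verify_archive` are hypothetical:

```bash
#!/bin/bash
# Record a SHA-256 checksum next to an archive (creates ARCHIVE.sha256).
seal_archive() {
    local archive="$1" dir name
    dir=$(dirname "$archive")
    name=$(basename "$archive")
    # Run inside the archive's directory so the checksum file stores a bare file name.
    ( cd "$dir" && sha256sum "$name" > "$name.sha256" )
}

# Return 0 only if the gzip stream is intact and the recorded checksum still matches.
verify_archive() {
    local archive="$1" dir name
    dir=$(dirname "$archive")
    name=$(basename "$archive")
    gzip -t "$archive" 2> /dev/null || return 1
    ( cd "$dir" && sha256sum --check --quiet "$name.sha256" ) > /dev/null 2>&1
}

# Usage:
#   seal_archive /var/log/archive/logs_20260101.tar.gz
#   verify_archive /var/log/archive/logs_20260101.tar.gz || echo "archive damaged"
```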

Archive Rotation

To prevent endless accumulation of archives:

```bash
# Delete archives older than 90 days
find /var/log/archive -name "*.gz" -mtime +90 -delete

# Using du to monitor space
ARCHIVE_SIZE=$(du -sh /var/log/archive | cut -f1)
echo "Archive size: $ARCHIVE_SIZE"
```
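Because `-delete` is irreversible, it can help to list what would be removed first. A small sketch, assuming GNU find; the function name `prune_archives` and its `delete` switch are hypothetical:

```bash
#!/bin/bash
# List *.gz archives older than DAYS under DIR.
# Pass "delete" as the third argument to remove them as well.
prune_archives() {
    local dir="$1" days="$2" mode="${3:-list}"
    if [ "$mode" = "delete" ]; then
        # -print before -delete leaves a record of every removed file.
        find "$dir" -type f -name '*.gz' -mtime +"$days" -print -delete
    else
        find "$dir" -type f -name '*.gz' -mtime +"$days" -print
    fi
}

# Usage:
#   prune_archives /var/log/archive 90          # dry run
#   prune_archives /var/log/archive 90 delete   # actually delete
```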

Practical Tips

Choosing Compression Strategy

  1. For active use — gzip (fast decompression)
  2. For long-term storage — xz or bzip2 (better compression)
  3. For large volumes — zstd (balance of speed and compression)

Organizing Archive Structure

Create a clear storage structure:

    /var/log/archive/
    ├── 2026/
    │   ├── 01/                  # January
    │   │   ├── logs_20260101.tar.gz
    │   │   └── logs_20260102.tar.gz
    │   └── 02/                  # February
    │       ├── logs_20260201.tar.gz
    │       └── logs_20260202.tar.gz
    └── current/                 # current archives
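Scripts that write archives can derive the target path from the date instead of hard-coding it. A minimal sketch, where the function name `archive_path` is an assumption and the layout follows the tree above:

```bash
#!/bin/bash
# Build the archive path for a date given as YYYYMMDD.
# The optional second argument overrides the archive root.
archive_path() {
    local day="$1" root="${2:-/var/log/archive}"
    # ${day:0:4} is the year, ${day:4:2} the month.
    echo "$root/${day:0:4}/${day:4:2}/logs_${day}.tar.gz"
}

# Usage:
#   target=$(archive_path "$(date +%Y%m%d)")
#   mkdir -p "$(dirname "$target")"
```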

Monitoring

Monitor the following metrics:

  • Disk space usage
  • Compression/archiving time
  • Compression ratio
  • Number of archiving errors
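Of these metrics, the compression ratio is the easiest to compute on the spot. The sketch below assumes GNU `stat` (as used earlier in this article) and defines the ratio as original size divided by compressed size; the function name `compression_ratio` is hypothetical:

```bash
#!/bin/bash
# Print original size / compressed size with two decimals.
compression_ratio() {
    local original="$1" compressed="$2" o c
    o=$(stat -c%s "$original") || return 1
    c=$(stat -c%s "$compressed") || return 1
    # Guard against division by zero for an empty compressed file.
    [ "$c" -gt 0 ] || return 1
    awk -v o="$o" -v c="$c" 'BEGIN { printf "%.2f\n", o / c }'
}

# Usage (keep the original with gzip -k so both sizes are available):
#   gzip -k application.log
#   compression_ratio application.log application.log.gz
```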

Conclusion

Proper log archiving and compression is a balance between space savings and information accessibility. Choose formats and strategies based on your needs: if speed matters most, use gzip or zstd; if compression ratio matters most, use xz or bzip2.

Remember that archiving is not the final stage of working with logs, but an important part of data management strategy. Regularly review your approaches and adapt them to changing business requirements and technologies.