Time passes, growth happens, and suddenly your server is croaking because 10 simultaneous rsyncs are happening. The script runtime is now longer than your interval. Being the smart person you are, you add some kind of synchronization to prevent multiple instances from running at once, and it might look like this:
#!/bin/sh $lock = "/tmp/cron_rsync.lock" if [ -f "$lock" ] ; then echo "Lockfile exists, aborting." exit 1 fi touch $lock rsync ... rm $lockYou have your cron job put the output of this script into a logfile so cron doesn't email you when the lockfile's stuck.
Looks good for now. A while later, you log in and need to do work that requires this script temporarily not run, so you disable the cron job and kill the running script. After you finish you work, you enable the cron job again.
Due to your luck, you killed the script while it was in the rsync process, which meant the 'rm $lock' never ran, which means your cron job isn't running now and is periodically updating your logfile with "Lockfile exists, aborting." It's easy to not watch logfiles, so you only notice this when something breaks that depends on your script. Realizing the edge case you forgot, you add handling for signals, just above your 'touch' statement:
trap "rm -f $lock; exit" INT TERM EXITNow normal termination and signal (safely rebooting, for example) will remove your lockfile. And there was once again peace among the land ...
... until a power outage causes your server to reboot, interrupting the rsync and leaving your lockfile around. If you're lucky, your lockfile is in /tmp and your platform happens to wipe /tmp on boot, clearing your lockfile. If you aren't lucky, you'll need to fix the bug (you should fix the bug anyway), but how?
The real fix means we'll have to reliably know whether or not a process is running. Recording the pid isn't totally reliable unless you check the pid's command arguments, and it doesn't survive some kinds of updates (name change, etc). A reliable way to do it with the least amount of change is to use flock(1) for lockfile tracking. The flock(1) tool uses the flock(2) interface to lock your file. Locks are released when the program holding the lock dies or unlocks it. A small update to our script will let us use flock instead:
#!/bin/sh lockfile="/tmp/cron_rsync.lock" if [ -z "$flock" ] ; then lockopts="-w 0 $lockfile" exec env flock=1 flock $lockopts $0 "$@" fi rsync ...This change allows us to keep all of the locking logic in one small part of the script, which is a benefit alone. The trick here is that if '$flock' is not set, we will exec flock with this script and its arguments. The '-w 0' argument to flock tells it to exit immediately if the lock is already held. This solution provide locking that expires when the shell script exits under any conditions (normal, signal, sigkill, power outage).
You could also use something like daemontools for this. If you use daemontools, you'd be better off making a service specific to this script. To have cron start your process only once and let it die, you can use 'svc -o /service/yourservice'
Whatever solution you decide, it's important that all of your periodic scripts will continue running normally if they are interrupted.
- flock(2) syscall is available on solaris, freebsd, linux, and probably other platforms
- FreeBSD port of a different flock implementation: sysutils/flock
- daemontools homepage