<![CDATA[Category: scripting | /usr/sbin/blog]]> 2012-09-15T22:52:07-06:00 http://usrsb.in/blog/ Octopress <![CDATA[A Database Engine in 16 Lines]]> 2011-08-15T13:36:00-06:00 http://usrsb.in/blog/blog/2011/08/15/a-database-engine-in-16-lines Recently I've been playing around with a neat command line utility called netcat. It makes it incredibly easy to add networking functionality to your scripts. What it does is allow you to open a port on your machine and listen for incoming connections. Should a connection occur, anything the remote machine sends is dumped to stdout, and anything written to stdin is sent out to the remote machine. As a demo, I decided to write a database engine with it, and I managed to do it in 16 lines.

#!/bin/bash

PORT=1234
DATA_FILE="/full/path/to/data"

listen () {
    TIMEOUT=10
    echo "$2" | nc -l -p $1 -w $TIMEOUT -q 0 
}

while :; do
    ANSWER_PORT=$((RANDOM % 1000 + 1028))
    QUERY="$(echo $ANSWER_PORT | nc -l -p $PORT -q 2)"
    ANSWER="$(grep "^$QUERY," "$DATA_FILE" | sed "s/^$QUERY, //")"
    listen "$ANSWER_PORT" "$ANSWER" &
done

The database itself is simply a text file full of key-value pairs. So, a remote machine can send over a key, and the database responds with that key's corresponding value. If, for example, the database was full of names (keys) and phone numbers (values), a remote machine could send over someone's name, and the database would respond with that person's phone number. Not bad for 16 lines.

Before I step through the code, I'll explain its overall functionality. When first executed, the script opens up port 1234 and listens. When a remote machine wants to query it, the remote machine connects to port 1234 and sends over its query (a key). The script receives this and, over the same connection, sends back a random number between 1028 and 2027. Suppose the random number is 1050. The database will then open up port 1050, and when the remote machine connects to that port, the database will send its answer over that connection (the value corresponding to the key), and the transaction is finished. Let's look at the code now.

Analysis

The script starts off with two variables. The first is the port that the script listens on for incoming queries. The second points to the text file containing the key-value pairs. The file would look something like this:

alex beal, 555-555-5555
bob, 111-111-1111
joe, 222-222-2222

Here "alex beal", "bob", and "joe" are the keys, and each key's value is listed after the comma. Let's skip over the listen function for now and examine the while loop below it.

while :; do
    ANSWER_PORT=$((RANDOM % 1000 + 1028))
    QUERY="$(echo $ANSWER_PORT | nc -l -p $PORT -q 2)"
    ANSWER="$(grep "^$QUERY," "$DATA_FILE" | sed "s/^$QUERY, //")" 
    listen "$ANSWER_PORT" "$ANSWER" &
done

The first line is a while loop that runs forever until the script receives an interrupt. The second line picks a random number between 1028 and 2027 and stores it in $ANSWER_PORT. That value is piped into the netcat instance on the third line, which listens for connections (-l) on port 1234 (-p $PORT). Once a connection is established and the random port number is sent, the remote host has 2 seconds to send over a query (-q 2), which gets stored in $QUERY. This means that after these lines have executed, the query has been received, and the random answer port has been sent.

The next line searches the database file for the key-value pair. Searching the database is as simple as grep-ing the data file for the line beginning with the key, and stripping that key from the output with sed.
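To see the lookup in isolation, here's a quick sketch against a throwaway data file (the path and entries are made up for illustration):

```shell
# Build a throwaway data file of key-value pairs (illustrative data).
printf 'alex beal, 555-555-5555\nbob, 111-111-1111\n' > /tmp/demo_data

# grep finds the line starting with the key; sed strips the key prefix.
QUERY="bob"
grep "^$QUERY," /tmp/demo_data | sed "s/^$QUERY, //"
# prints: 111-111-1111
```

Note that the ^ anchor and the trailing comma in the pattern keep a query like "bob" from matching anywhere but at the start of its own line.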

Finally, we call the listen function defined above. It's passed the port the client must connect to in order to receive the answer ($ANSWER_PORT), and the answer itself ($ANSWER).

listen () {
    TIMEOUT=10
    echo "$2" | nc -l -p $1 -w $TIMEOUT -q 0 
}

The answer, which is the second argument ($2), is piped into netcat (nc). The flags tell it to listen (-l) on the port passed to it as the first argument (-p $1). The netcat instance will stop listening after $TIMEOUT seconds, and after the answer is sent, it will close immediately (-q 0). As you can see, the listen function is called as a background process, so the loop can continue back to the top, allowing other requests to be handled while the answer is sent to the client in the background.

The Client's Script

You can query it remotely with this script:

#!/bin/bash
ANSWERPORT=$(echo "$2" | nc "$1" 1234)
sleep 1s
nc "$1" "$ANSWERPORT"

The first line sends over the query and receives the answer port in a single connection. The script then sleeps for 1 second while the database handles the request. Finally, it connects a second time to retrieve the answer. The script is used as follows:

query.sh REMOTE_IP QUERY_STRING

Conclusion

Although this was a fun exercise, the script itself isn't very practical. It's slow (at least compared to MySQL), and probably insecure, but there might be a use for it on a private network with trusted users. The takeaway points here are the different netcat constructs that you could apply to your own programs, such as sending and receiving at the same time, and setting up a client-server model. Definitely let me know in the comments if you end up making something cool!

Download the scripts.


]]>
<![CDATA[Secure Versioned Remote Backups with Rdiff-Backup]]> 2011-08-13T14:31:00-06:00 http://usrsb.in/blog/blog/2011/08/13/secure-versioned-remote-backups-with-rdiff-backup When putting together a backup solution for this web server, I was looking for a few things:

  1. Simplicity: The less new software, the better. Rsync is both powerful and complex, and I wanted to avoid it. Other things being equal, simple solutions are also more reliable.
  2. Security: I needed to prevent an attacker who gained access to my web server from also gaining access to my backup server. Backing up over an encrypted connection was also a necessity.
  3. Versioning: I needed my solution to keep several versions of all my files. A solution that only keeps the most recent version of a file is useless if that version turns out to be corrupted.
  4. Incrementing: An efficient backup solution should only update the files that have changed. This ruled out simply tar-ing and ssh-ing my entire webroot over to my backup server.
  5. Automation: This solution should automatically run as a cron job.

As you can see, some of these goals conflict. Wanting versioned incremental backups, while also ruling out rsync just made my life harder. Automation and security also butt heads. If my web server can access my backup server without a password, how can I prevent an attacker who's taken over my web server from trashing my backups? As I'll show, I managed to piece this together with only one new software package: rdiff-backup. This took care of versioning and incrementing in a simpler way than rsync would have. The rest was achieved with tools installed on every web server: ssh and cron.

Backing up with rdiff-backup

Let's start with rdiff-backup. This is dead simple. On Ubuntu, installing is only an apt-get away. This needs to be done on the web server and backup server:

sudo apt-get install rdiff-backup

Backing up to a remote directory over ssh isn't much harder:

rdiff-backup /var/www backups@example.com::/var/backups

That will back up the local directory /var/www to /var/backups on example.com using the ssh user backups. It's that simple. rdiff-backup handles the versioning and incrementing automatically. Also, since it's done over ssh, everything is encrypted. Restoring is just as easy:

rdiff-backup -r 3D backups@example.com::/var/backups/www /var/www.old

That will copy whatever you backed up to /var/backups/www three days prior to /var/www.old on the local machine. Hopefully by now you see why I love this utility. It's basically rsync, but stripped down and rebuilt specifically for backups. Best of all, there are no config files or servers to maintain.

Automation with cron

So, those are the basics of rdiff-backup. How do we automate this? Here's the backup script that I use on this web server:

#!/bin/bash

RDIFF="/usr/bin/rdiff-backup"
REMOTE="backups@example.com::/home/backups/newest/"
CONF="/etc/backups"

while read -r DIR; do
        "$RDIFF" "$DIR" "$REMOTE$(basename "$DIR")"
        if [ $? != 0 ]; then
                echo "Exited with errors. Backup process stopped at $DIR"
                exit 1
        fi
done < "$CONF"

This reads the paths I've listed in the text file /etc/backups and backs them up to /home/backups/newest on the machine example.com as the user backups. So if /etc/backups looks like this:

/etc/
/home/alex/
/var/www/

Then all those directories get backed up to the remote machine. The first time it's run, it simply mirrors the directories to the remote machine. Each subsequent execution copies over only the changed files. Versioning is, of course, handled automatically by rdiff-backup.
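Each backed-up path lands under its own basename on the remote machine, so the /etc/backups listing above maps to remote directories like this (a quick sketch of the naming):

```shell
# Show where each configured path would end up on the backup host.
for DIR in /etc/ /home/alex/ /var/www/; do
    echo "/home/backups/newest/$(basename "$DIR")"
done
# prints:
# /home/backups/newest/etc
# /home/backups/newest/alex
# /home/backups/newest/www
```

One thing to watch: two configured paths with the same basename (say, /var/www and /srv/www) would collide in the same remote directory.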

I wrote the script to be easily customizable. Change $REMOTE to modify which machine, user, and directory the files are backed up to. Change $CONF to modify which file contains all the paths to be backed up. $RDIFF simply points to wherever the rdiff-backup binary is (well, python script, actually). As it's currently set, $RDIFF should work on Ubuntu 10.04 machines.

To automate this, simply set it as a cron job on the web server, and run it hourly or daily. Make sure the user running the job has high enough privileges to access the files you want to back up (unless you want to set up some complex permissions, simply running this as root is an unfortunate necessity).

Finally, on the remote machine, you may want some sort of script that manages the backup directory. One idea would be to have the backup directory tar-ed and compressed, then saved somewhere safe. Once that's done, the directory could be cleared out, so the next time the backup script runs, a fresh copy of all the files would be copied over. Set this to run once a week and you'll have weekly full backups, with nightly (or hourly) incremental backups. The following script will do that for you:

#!/bin/bash

NEW="/home/backups/newest"
OLD="/home/backups/archive"

# Tar and compress the old files
(
    cd "$(dirname "$NEW")" || exit 1
    tar -zcf "$OLD/$(date "+%F-%H%M").tar.gz" "$(basename "$NEW")"
)

# Clear out the backup directory
if [ $? == 0 ]; then
    rm -rf "$NEW"
    mkdir "$NEW"
else
    echo "Exited with errors. Nothing was deleted, but the files may not have been rotated."
    exit 1
fi

$NEW is the location of your newest backups. $OLD is where you want the tar-ed and compressed copies stored.

Security with ssh

We're almost there. Rdiff-backup is installed, and the scripts are automatically backing up and rotating our backup files. The only problem is that every time the script connects to the backup server, it's asked for a password (the remote user's ssh password). We can't be there to type it in, so how do we deal with this? The solution is to create a public/private key pair that the script can log in with. There are lots of places on the web that have detailed instructions on how to do this, but I'll run through it quickly.

First we decide which user is running the backup script on our web server. If it's root, then we log in as root and run ssh-keygen. When prompted for a passphrase, leave it blank. After this is done, we need to copy the public key located under /root/.ssh/id_rsa.pub to the remote machine. If we're logging in as backups on the remote machine, then we copy the public key into the /home/backups/.ssh directory (create the .ssh directory if it doesn't already exist) and rename the file authorized_keys (or, if authorized_keys already exists, append the key to it). Now when root connects over ssh to the backup server, he won't be prompted for a password, and neither will the backup script being run as a root cron job. Ahhh the magic of public key cryptography.

The obvious problem here is that if an attacker gets root on the web server, he has access to the backup server. Lucky for us, ssh has a built in way of restricting a remote user to only one command. We do this by prefixing the public key with command="RESTRICTED_COMMAND". So, for example, we can restrict a remote user to rdiff-backup by modifying the authorized_keys file to look something like this:

command="rdiff-backup --server --restrict-update-only /home/backups/newest/" ssh-rsa AAB3NzaC1 [...]

That allows a remote user to only execute rdiff-backup in server mode. But notice the second flag --restrict-update-only. That restricts the user to the backup directory, and only allows her to update the backups, and not delete or otherwise trash them. Pretty cool. The worst an attacker could do is fill the backup server's hard drive by pushing a huge amount of data to it, but since rdiff-backup is versioned, no old versions of the files will be lost.

Also, there are additional options you can prefix to the public key to lock down the server even more (no-pty and no-port-forwarding, for example). Check out the sshd man page and look under the AUTHORIZED_KEYS FILE FORMAT section for the full list.

Conclusion

So, that's my custom-built backup solution. I realize I glossed over a lot of small details, and this isn't really a step-by-step how-to (e.g., I never explained how to set up a cron job). I leave it as an exercise to the reader to put it all together. To help you, I've included the scripts at the bottom of this post. I also check the comments obsessively, so please don't hesitate to ask if you have any questions.

Download the scripts.


]]>
<![CDATA[Using 'trap' To Catch 'Ctrl + C's and Control How Your Script Exits]]> 2011-04-02T16:00:00-06:00 http://usrsb.in/blog/blog/2011/04/02/using-trap-to-catch-ctrl-plus-cs-and-control-how-your-script-exits So you've written a script that creates all sorts of temporary files, and launches dozens of processes in the background, and, as is, the script runs indefinitely. The only way to quit is Ctrl + C, but once you fire off this hotkey combo, you're left with a mess of background processes and temp files. There are lots of reasons why your script might do this. For example, I'm currently writing a download manager that sends its wget processes to the background, and creates various temporary status files. It runs indefinitely because it polls a queue file, which contains the URLs of the files it's instructed to download. So, given all that, how is it that I get my script to clean up after itself once I deal it the fatal Ctrl + C blow (or more technically, a "signal interrupt" or "SIGINT" for short)? The trick here is to use the trap command along with some basic process manipulation. Here's the syntax:

trap 'command' signal_to_trap

The above bit of code will cause the command (or string of commands between the quotes) to execute if the script is issued a signal_to_trap. As is implied by the syntax, you can trap more than one type of signal. To catch the Ctrl + C combo, you usually want to trap a SIGINT, which is done as follows:

trap 'echo "Exiting"; exit 0' INT

That bit of code will cause your script to print the message "Exiting" and exit with status 0 once Ctrl + C is pressed (or it gets issued a SIGINT some other way). One important caveat: always remember the exit command. For example, the following code will cause your program to print "Exiting" and then continue, effectively ignoring the SIGINT. This is what NOT to do if you want your program to actually quit:

trap 'echo "Exiting"' INT

Of course, that construct isn't without its uses. Perhaps you want your script to immediately run some routine if Ctrl + C is entered, but you don't want it to quit. The above bit of code would be the way to do it, but that strikes me as a confusing break from convention. Most people expect Ctrl + C to stop the currently running process. A better way to do this would be to use a USR signal. Just substitute INT for USR1 or USR2, and send the USR signal to the script using the kill command: kill -USR1 pid. In any case, back to the topic at hand: exiting a script cleanly.
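Here's a minimal sketch of the USR1 approach (the messages are made up). The trap runs its routine and the script carries on:

```shell
# Run a routine on SIGUSR1 without exiting the script.
trap 'echo "got USR1; still running"' USR1

kill -USR1 $$    # send ourselves the signal to demonstrate
echo "script continues after the trap"
```

In practice you'd trigger it from another terminal with kill -USR1 pid rather than the self-signal above.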

'Trapping' and Cleaning Up After Yourself

The way I use 'trap' is something along these lines:

exit_routine () {
    # TODO: Clean up stuff.
}
trap 'exit_routine' INT # Intercept SIGINT and call exit_routine

If it's short enough, you can, of course, cram your entire exit routine between the single quotes, but if it's nontrivial it's best to pull it out into its own function. Also remember that BASH executes a script's commands in the order it sees them, so this must be placed somewhere near the beginning of the script to set the trap early on. (Perhaps that's obvious, but I'd be the first to admit that that's just the sort of pitfall I'd waste an hour puzzling over.) Also remember that the function must be declared before the trap statement (or at least before the script receives a SIGINT and tries to call exit_routine).

Cleaning Up Stuff (Processes)

Now that we've declared the exit routine and set the trap, we need to do some actual housekeeping. I do this by keeping track of all the background processes' PIDs and temporary files' filenames in an array. Consider the following code:

for i in $(seq 3); do
    wget "${URL[$i]}" &
    PIDS[$i]=$!
done

Three wget instances are launched and sent to the background with the & operator. Their PIDs are accessed with the $! variable and stored to $PIDS. We can now kill off those processes with the following bit of code:

exit_routine () {
    kill ${PIDS[*]}
    echo "Script exiting."
    exit 0
}
trap 'exit_routine' INT

Whenever Ctrl + C is pressed, the SIGINT is trapped and exit_routine is called. kill is then given all the PIDs, which are accessed using ${PIDS[*]}. This kills the wget processes, an exit message is printed, and the script exits. The neat thing about the ${ARRAY[*]} way of accessing all an array's elements is that unset elements are skipped. So if I do:

PIDS[5]="5"
PIDS[10]="10"
echo ${PIDS[*]}

Then only "5 10" is printed.

Dealing with Temp Files

Erasing temp files is a bit trickier. The possibility of whitespace in a file's name makes an unquoted rm ${FILES[*]} troublesome, and quoting it as "${FILES[*]}" doesn't help either, since that expansion joins every element into a single word. (Quoting "${FILES[@]}" does expand each name as its own word, but stepping through the array also lets us skip empty elements.) So instead, we step through the array as follows:

for i in $(seq 3); do
    rm "${FILES[$i]}"
done

If we want to ignore the empty elements of $FILES like ${PIDS[*]} automatically did for the PIDs, then we can test if an element is empty using an if statement. Alternatively, you can impress your friends with some BASH-fu and take advantage of BASH's short circuit functionality:

for i in $(seq 3); do
    [ "${FILES[$i]}" != "" ] && rm "${FILES[$i]}"
done

If ${FILES[$i]} is empty, the statement short circuits, and does nothing. If it contains something, the rm command is executed.
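An equivalent test uses -n, which succeeds on a non-empty string. Here's a sketch with echo standing in for rm, run over a deliberately sparse array:

```shell
# Index 2 is never set, so the -n test skips it.
FILES[1]="/tmp/a file"
FILES[3]="/tmp/b"
for i in $(seq 3); do
    [ -n "${FILES[$i]}" ] && echo "removing ${FILES[$i]}"
done
# prints:
# removing /tmp/a file
# removing /tmp/b
```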

So, that's how it's done. Now you have no excuse for leaving garbage behind (if only BASH had some way of automagically collecting this so-called garbage!), and, as always, if you have any tips regarding this, please share your BASH-fu with us in the comments!


]]>
<![CDATA[Sending Emails and Texts from the Command Line]]> 2011-03-24T16:39:00-06:00 http://usrsb.in/blog/blog/2011/03/24/sending-emails-and-texts-from-the-command-line Here's a handy way to send emails from the command line:

echo 'message' | mail -s 'subject' 'email_address'

If you know the recipient's carrier, you can also use it to send text messages from the command line. For example, if the recipient's number is 111-111-1111 and she's on Verizon, you can send her a text message using the following command:

echo 'Sent from my terminal!' | 
    mail -s 'Linux is fun' '1111111111@vtext.com'

Once again, the trick here is knowing the domain name of the carrier's Email-to-SMS Gateway. In the case of Verizon, it's @vtext.com. Wikipedia has a handy list of other gateways here. Another trick is to send yourself an email from the phone, and the phone's SMS gateway will be revealed in the 'from' field.

You can use this to create all sorts of fun shell scripts. For example, here's one I wrote recently that tells me whether or not the VPS provider BuyVM has any VPSs in stock:

#!/bin/bash -

echo 'Is BuyVM in stock?'
while :; do
        wget -O - -q http://doesbuyvmhavestock.com/ | grep -i yes
        if [ $? == 0 ]; then
                echo 'Well, what are you waiting for?' | 
                    mail -s "BuyVM is in stock" "you@example.com"
                echo 'Yes!'
                exit 0
        else
            echo 'No'
            sleep 15m
        fi
done

It's pretty simple. Every 15 minutes it scrapes the webpage doesbuyvmhavestock.com and sends me an email if it contains the word 'yes'. Obviously, the success of your own script depends on how well it can parse the webpage. As you can see, my parsing routine (the grep on line 5) is pretty primitive.
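The whole check hinges on grep's exit status, which is 0 when a match is found and 1 otherwise. Here's the same branch with canned input (the page text is made up):

```shell
page="Yes, we have stock!"
# -q suppresses grep's output, leaving only the exit status;
# -i makes the match case-insensitive.
if echo "$page" | grep -qi yes; then
    echo "match"
else
    echo "no match"
fi
# prints: match
```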


]]>
<![CDATA[Backing Up a Directory's Permissions]]> 2011-03-17T10:39:00-06:00 http://usrsb.in/blog/blog/2011/03/17/backing-up-a-directorys-permissions We've all done it before, and if it hasn't happened to you yet, it's only a matter of time. You're attempting to set the permissions on a directory and you're doing something along the lines of sudo chmod -R 660 ./foobar. You execute the command only to realize that you're in the wrong directory or you've mistyped the directory's name, and now your permissions are totally clobbered. If you've kept good backups, this is annoying, but not devastating. Either you could completely restore the files, or perhaps you could write a script which would copy the old file permissions back over. An easier alternative would be to plan ahead and back up the file permissions before disaster strikes. Here's a script that will do that for you:

#!/bin/bash

#Check that we've been passed a directory.
if [ ! -d "$1" ]; then
   echo "$1 is not a directory. Exiting."
   exit 1
fi

#Check if we've been passed a relative or absolute
#path. If it's relative, store $PWD in $DIR. We'll need
#this later to build the absolute path.
echo "$1" | grep -q '^/'
if [ $? == 0 ]; then
   DIR='' 
else
   DIR="$PWD/"
fi

#Generate the header comments for the script.
echo "#!/bin/bash"
echo "#"
echo "#This script will restore permissions under $DIR$1"
echo "#to how they were on `date`"
echo "#"

#Loop through the given directory's tree and 
#echo the commands to restore the permissions and owner.
find "$1" -print0 | while read -r -d $'\0' i
do
   echo chmod $(stat -c %a "$i") \"$DIR$i\"
   echo chown $(stat -c "%U:%G" "$i") \"$DIR$i\"
done

What this script does is generate another script, which, when run, will restore a directory's permissions, owner, and group (and also the permissions, owner, and group of any files and directories under it). Suppose you drop this script into a file called backup.sh and you want to backup the permissions of a directory called /home/you/mystuff. Here's how you'd do it:

./backup.sh /home/you/mystuff > mystuff-backup.sh

Now when you execute mystuff-backup.sh, all of /home/you/mystuff's permissions will be restored. Easy.

There is one security caveat you should keep in mind. The resultant script will contain the names of all the files and subdirectories in the directory tree. The upshot is that anyone who has read access to the script will be able to see this information. If this is a problem, you should adjust the script's permissions accordingly.

The script itself is pretty straightforward. On line 12, we tell a relative path from an absolute path by seeing if it's prefixed with a forward slash. If it is, then it's absolute. If it isn't, then it's relative. We'll need this information later to build the absolute path to all the files. The next step is to loop through the files, which is a bit tricky. On line 28, we need to pipe the output of find to a while read because the more intuitive for loop construct doesn't handle spaces and newlines in filenames properly (of course, anyone who puts newlines in a filename needs to be thrown into the sarlacc). In fact, you may just want to memorize that line, as it's a common problem with a far from obvious solution. Finally, on lines 30 and 31, we read the file's permissions and owner with stat, a handy command that, among other things, outputs file information in a user-definable format.
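Here's that find | while read construct in isolation, run against a throwaway directory containing a filename with spaces (the paths are illustrative):

```shell
# Create a test directory with an awkward filename.
mkdir -p "/tmp/perm demo"
touch "/tmp/perm demo/a file with spaces"

# -print0 delimits filenames with NUL bytes, and read -d $'\0'
# splits on them, so whitespace passes through intact.
find "/tmp/perm demo" -print0 | while read -r -d $'\0' f; do
    echo "found: $f"
done
```

A naive for f in $(find ...) would have split that filename into five separate words.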

So there you have it. Download the script here. Now go forth and make this a cron job.


]]>