#!/bin/bash
PORT=1234
DATA_FILE="/full/path/to/data"

listen () {
    TIMEOUT=10
    echo "$2" | nc -l -p $1 -w $TIMEOUT -q 0
}

while :; do
    ANSWER_PORT=$((RANDOM % 1000 + 1028))
    QUERY="$(echo $ANSWER_PORT | nc -l -p $PORT -q 2)"
    ANSWER="$(grep "^$QUERY," "$DATA_FILE" | sed "s/^$QUERY, //")"
    listen "$ANSWER_PORT" "$ANSWER" &
done
The database itself is simply a text file full of key-value pairs. A remote machine sends over a key, and the database responds with that key's corresponding value. If, for example, the database were full of names (keys) and phone numbers (values), a remote machine could send over someone's name, and the database would respond with that person's phone number. Not bad for a dozen or so lines of BASH.
Before I step through the code, I'll explain its overall functionality. When first executed, the script opens up port 1234 and listens. When a remote machine wants to query it, the remote machine connects to port 1234 and sends over its query (a key). The script receives this and, over the same connection, sends back a random number between 1028 and 2027. Suppose the random number is 1050. The database will then open up port 1050, and when the remote machine connects to that port, the database will send its answer over that connection (the value corresponding to the key), and the transaction is finished. Let's look at the code now.
The script starts off with two variables. The first is the port that the script listens on for incoming queries. The second points to the text file containing the key-value pairs. The file would look something like this:
alex beal, 555-555-5555
bob, 111-111-1111
joe, 222-222-2222
Here, "alex beal", "bob", and "joe" are the keys, and the value corresponding to each key is listed after the comma. Let's skip over the listen function for now and examine the while loop below it.
while :; do
    ANSWER_PORT=$((RANDOM % 1000 + 1028))
    QUERY="$(echo $ANSWER_PORT | nc -l -p $PORT -q 2)"
    ANSWER="$(grep "^$QUERY," "$DATA_FILE" | sed "s/^$QUERY, //")"
    listen "$ANSWER_PORT" "$ANSWER" &
done
The first line is a while loop that continues forever, until the script receives an interrupt. The second line gets a random number between 1028 and 2027 and stores it in $ANSWER_PORT. That value is piped into the netcat instance on the third line, which listens for connections (-l) on port 1234 (-p $PORT). Once a connection is established and the random port number has been sent, the remote host has 2 seconds to send over a query (-q 2), which gets stored in $QUERY. So by the time the third line has finished, the query has been received and the random answer port has been sent.
The next lines search through the database file for the key-value pair. Searching the database is as simple as grep-ing the data file for the line beginning with the key, and stripping that key from the output with sed.
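You can try the lookup in isolation, without any of the netcat plumbing. Here's a minimal sketch, using a throwaway data file built with mktemp (the sample entries are just placeholders):

```shell
#!/bin/bash
# Build a throwaway data file in the same "key, value" format.
DATA_FILE="$(mktemp)"
cat > "$DATA_FILE" <<'EOF'
alex beal, 555-555-5555
bob, 111-111-1111
EOF

QUERY="bob"
# grep finds the line starting with the key; sed strips the key off.
ANSWER="$(grep "^$QUERY," "$DATA_FILE" | sed "s/^$QUERY, //")"
echo "$ANSWER"    # prints 111-111-1111

rm "$DATA_FILE"
```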
Finally we call the listen function defined above. This is passed the port that the client must connect to to receive the answer ($ANSWER_PORT), and the answer itself ($ANSWER).
listen () {
    TIMEOUT=10
    echo "$2" | nc -l -p $1 -w $TIMEOUT -q 0
}
The answer, which is the second argument ($2), is piped into netcat (nc). The flags tell it to listen (-l) on the port passed to it as the first argument (-p $1). The netcat instance will stop listening after $TIMEOUT seconds (-w $TIMEOUT), and after the answer is sent, it will close immediately (-q 0). As you can see, the listen function is called as a background process, so the loop can continue back to the top, allowing other requests to be handled while the answer is sent to the client in the background.
You can query it remotely with this script:
#!/bin/bash
ANSWERPORT=$(echo "$2" | nc $1 1234)
sleep 1s
nc $1 $ANSWERPORT
The first command both sends over the query and receives the answer port back. The script then sleeps for 1 second while the database handles the request. Finally, it connects a second time to retrieve the answer. The script is used as follows:
query.sh REMOTE_IP QUERY_STRING
Although this was a fun exercise, the script itself isn't very practical. It's slow (at least compared to MySQL) and probably insecure, but there might be a use for it on a private network with trusted users. The takeaways here are the different netcat constructs that you could apply to your own programs, such as sending and receiving over the same connection, and setting up a client-server model. Definitely let me know in the comments if you end up making something cool!
Exiting a script cleanly (killing off its background processes and erasing its temporary files) can be done with the trap command along with some basic process manipulation. Here's the syntax:
trap 'command' signal_to_trap
The above bit of code will cause the command (or string of commands between the quotes) to execute if the script is issued a signal_to_trap. As is implied by the syntax, you can trap more than one type of signal. To catch the Ctrl + C combo, you usually want to trap a SIGINT, which is done as follows:
trap 'echo "Exiting"; exit 0' INT
That bit of code will cause your script to print the message "Exiting" and exit with status 0 once Ctrl + C is pressed (or the script gets issued a SIGINT some other way). One important caveat: always remember the exit command. The following code, for example, will cause your program to print "Exiting" and then continue, effectively ignoring the SIGINT. This is what NOT to do if you want your program to actually quit:
trap 'echo "Exiting"' INT
Of course, that construct isn't without its uses. Perhaps you want your script to immediately run some routine if Ctrl + C is entered, but you don't want it to quit. The above bit of code would be the way to do it, but that strikes me as a confusing break from convention: most people expect Ctrl + C to stop the currently running process. A better way to do this would be to use a USR signal. Just substitute USR1 or USR2 for INT, and send the USR signal to the script using the kill command: kill -USR1 pid. In any case, back to the topic at hand: exiting a script cleanly.
The way I use 'trap' is something along these lines:
exit_routine () {
    : # TODO: Clean up stuff.
}

trap 'exit_routine' INT # Intercept SIGINT and call exit_routine
If it's short enough, you can, of course, cram your entire exit routine between the single quotes, but if it's nontrivial it's best to pull it out into its own function. Remember that BASH executes a script's commands in the order it sees them, so the trap statement must be placed somewhere near the beginning of the script to set the trap early on. (Perhaps that's obvious, but I'd be the first to admit that that's just the sort of pitfall I'd waste an hour puzzling over.) Remember, too, that the function must be declared before the trap statement (or at least before the script receives a SIGINT and tries to call exit_routine).
Now that we've declared the exit routine and set the trap, we need to do some actual housekeeping. I do this by keeping track of all the background processes' PIDs and temporary files' filenames in an array. Consider the following code:
for i in $(seq 3); do
    wget "${URL[$i]}" &
    PIDS[$i]=$!
done
Three wget instances are launched and sent to the background with the & operator. Their PIDs are accessed with the $! variable and stored to $PIDS. We can now kill off those processes with the following bit of code:
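If you want to play with this without hammering a web server, sleep works as a stand-in for wget (a sketch; the PIDs array works exactly the same way):

```shell
#!/bin/bash
# Launch three background jobs and record each PID from $!.
for i in $(seq 3); do
    sleep 60 &
    PIDS[$i]=$!
done

echo "Started PIDs: ${PIDS[*]}"
kill ${PIDS[*]}      # all three sleeps die long before their 60s are up
wait 2> /dev/null    # reap the killed jobs
```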
exit_routine () {
    kill ${PIDS[*]}
    echo "Script exiting."
    exit 0
}

trap 'exit_routine' INT
Whenever Ctrl + C is pressed, the SIGINT is trapped and exit_routine is called. kill is then given all the PIDs, which are accessed using ${PIDS[*]}. This kills the wget processes, an exit message is printed, and the script exits. The neat thing about the ${ARRAY[*]} way of accessing all an array's elements is that unset elements are simply skipped. So if I do:
PIDS[5]="5"
PIDS[10]="10"
echo ${PIDS[*]}
Then only "5 10" is printed.
Erasing temp files is a bit trickier. The possibility of whitespace in a file's name makes rm ${FILES[*]} troublesome, since each name gets split apart at the spaces. Enclosing the array in double quotes doesn't help either: "${FILES[*]}" joins every name into one big argument. Instead, we need to step through the array as follows:
for i in $(seq 3); do
    rm "${FILES[$i]}"
done
If we want to ignore the empty elements of $FILES, like ${PIDS[*]} automatically did for the PIDs, we can test whether an element is empty using an if statement. Alternatively, you can impress your friends with some BASH-fu and take advantage of BASH's short-circuit evaluation:
for i in $(seq 3); do
    [ "${FILES[$i]}" != "" ] && rm "${FILES[$i]}"
done
If ${FILES[$i]} is empty, the statement short-circuits and does nothing. If it contains something, the rm command is executed.
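Here's that cleanup loop in a runnable, self-contained form, with throwaway temp files (one name contains a space, and one array slot is deliberately left unset to exercise the empty check):

```shell
#!/bin/bash
SCRATCH="$(mktemp -d)"
FILES[1]="$SCRATCH/plain.tmp"
FILES[2]="$SCRATCH/with space.tmp"
# FILES[3] is intentionally never set.
touch "${FILES[1]}" "${FILES[2]}"

for i in $(seq 3); do
    [ "${FILES[$i]}" != "" ] && rm "${FILES[$i]}"
done

ls -A "$SCRATCH"     # prints nothing; both files were removed
rmdir "$SCRATCH"
```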
So, that's how it's done. Now you have no excuse for leaving garbage behind (if only BASH had some way of automagically collecting this so called garbage!), and, as always, if you have any tips regarding this, please share your BASH-fu with us in the comments!
The mail command makes sending email from the command line simple:
echo 'message' | mail -s 'subject' 'email_address'
If you know the recipient's carrier, you can also use it to send text messages from the command line. For example, if the recipient's number is 111-111-1111 and she's on Verizon, you can send her a text message using the following command:
echo 'Sent from my terminal!' |
mail -s 'Linux is fun' '1111111111@vtext.com'
Once again, the trick here is knowing the domain name of the carrier's Email-to-SMS Gateway. In the case of Verizon, it's @vtext.com. Wikipedia has a handy list of other gateways here. Another trick is to send yourself an email from the phone, and the phone's SMS gateway will be revealed in the 'from' field.
You can use this to create all sorts of fun shell scripts. For example, here's one I wrote recently that tells me whether or not the VPS provider BuyVM has any VPSs in stock:
#!/bin/bash -
echo 'Is BuyVM in stock?'
while :; do
    wget -O - -q http://doesbuyvmhavestock.com/ | grep -i yes
    if [ $? == 0 ]; then
        echo 'Well, what are you waiting for?' |
            mail -s "BuyVM is in stock" "you@example.com"
        echo 'Yes!'
        exit 0
    else
        echo 'No'
        sleep 15m
    fi
done
It's pretty simple. Every 15 minutes it scrapes the webpage doesbuyvmhavestock.com and sends me an email if it contains the word 'yes'. Obviously, the success of your own script depends on how well it can parse the webpage. As you can see, my parsing routine (the grep) is pretty primitive.
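One small refinement: rather than testing $? after the fact, you can put the pipeline straight into the if and silence grep with -q. Below is a hedged sketch; fetch_page is a made-up stand-in for the wget call so the example runs offline:

```shell
#!/bin/bash
# Stand-in for: wget -O - -q http://doesbuyvmhavestock.com/
fetch_page () {
    echo 'Yes, there is stock!'
}

# if branches on the pipeline's exit status directly;
# grep -qi matches case-insensitively and prints nothing.
if fetch_page | grep -qi yes; then
    echo 'Yes!'
else
    echo 'No'
fi
```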
Imagine you've just run sudo chmod -R 660 ./foobar, only to realize that you're in the wrong directory or you've mistyped the directory's name, and now your permissions are totally clobbered. If you've kept good backups, this is annoying, but not devastating. Either you could completely restore the files, or perhaps you could write a script to copy the old file permissions back over. An easier alternative is to plan ahead and back up the file permissions before disaster strikes. Here's a script that will do that for you:
#!/bin/bash

# Check that we've been passed a directory.
if [ ! -d "$1" ]; then
    echo "$1 is not a directory. Exiting."
    exit 1
fi

# Check if we've been passed a relative or absolute
# path. If it's relative, store $PWD in $DIR. We'll need
# this later to build the absolute path.
echo "$1" | grep -q '^/'
if [ $? == 0 ]; then
    DIR=''
else
    DIR="$PWD/"
fi

# Generate the header comments for the script.
echo "#!/bin/bash"
echo "#"
echo "# This script will restore permissions under $DIR$1"
echo "# to how they were on `date`"
echo "#"

# Loop through the given directory's tree and
# echo the commands to restore the permissions and owner.
find "$1" -print0 | while read -r -d $'\0' i
do
    echo chmod `stat -c %a "$i"` \"$DIR$i\"
    echo chown `stat -c "%U:%G" "$i"` \"$DIR$i\"
done
What this script does is generate another script, which, when run, will restore a directory's permissions, owner, and group (and also the permissions, owner, and group of any files and directories under it). Suppose you drop this script into a file called backup.sh and you want to backup the permissions of a directory called /home/you/mystuff. Here's how you'd do it:
./backup.sh /home/you/mystuff > mystuff-backup.sh
Now when you execute mystuff-backup.sh, all of /home/you/mystuff's permissions will be restored. Easy.
There is one security caveat you should keep in mind. The resultant script will contain the names of all the files and subdirectories in the directory tree. The upshot is that anyone who has read access to the script will be able to see this information. If this is a problem, you should adjust the script's permissions accordingly.
The script itself is pretty straightforward. First, we tell a relative path from an absolute path by checking whether it's prefixed with a forward slash: if it is, it's absolute; if it isn't, it's relative. We'll need this information later to build the absolute path to each file. The next step is to loop through the files, which is a bit tricky: we pipe the output of find to a while read loop, because the more intuitive for loop construct doesn't handle spaces and newlines in filenames properly (of course, anyone who puts newlines in a filename needs to be thrown into the sarlacc). In fact, you may just want to memorize that construct, as it's a common problem with a far from obvious solution. Finally, inside the loop, we read each file's permissions and owner with stat, a handy command that, among other things, outputs file information in a user-definable format.
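The find | while read construct is easy to test on its own. Here's a hedged sketch that creates a few files with spaces in their names and confirms each one comes through the loop intact (the directory and filenames are throwaways):

```shell
#!/bin/bash
SCRATCH="$(mktemp -d)"
touch "$SCRATCH/plain" "$SCRATCH/with space" "$SCRATCH/two  spaces"

# -print0 separates filenames with NUL bytes, and read -d $'\0'
# splits on NULs, so embedded spaces survive untouched.
find "$SCRATCH" -type f -print0 | while read -r -d $'\0' i
do
    echo "saw: $i"
done

rm -r "$SCRATCH"
```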
So there you have it. Download the script here. Now go forth and make this a cron job.
Check out this bit of shell code:
cat filename |
tr A-Z a-z |
tr -c a-z '\n' |
sort |
uniq |
comm -13 dictionary_file -
It's a spellchecker, which, at 115 characters, fits within a tweet! Imagine that! To prove to you that it actually works, and to truly appreciate the cleverness of this code, let's step through it line by line. The first command, 'cat', simply reads the file and sends it down the pipe. This is then processed by 'tr', which converts all the uppercase characters to lower case. This is again piped to 'tr', but this time everything that's not a lower case character is converted to a newline character. At this point, the original input has been processed down to a list of words separated by newline characters. This list is then sorted alphabetically with the 'sort' command, and all the duplicate words are removed with the 'uniq' command. Finally, this list of words is compared with a dictionary file. Any words that appear in the list, but not in the dictionary, are sent to standard output. The final result is a list of words that appear in your document, but not in the dictionary, and thus are probably misspelled.

At this point, you may have noticed a rather serious problem with this script: it doesn't do a very good job of handling apostrophes. The third line will remove all single quote characters, even if they're functioning as apostrophes, and replace them with newline characters. One way to fix this is to replace line 3 with the following:
sed -r "s/('$|^'|\W'\b|\b'\W|\W'\W)/ /g" |
tr -c a-z\' '\n' |
That rather ugly 'sed' command will remove all the single quote characters which aren't in the middle of words, thereby preserving apostrophes, and the new 'tr' command will ignore all single quote characters, since 'sed' has already dealt with them. The end result is that the apostrophe in, for example, foobar's will be preserved, but the single quotes at the beginning and end of 'foobar's' will be removed. Another improvement would be to keep it from removing hyphens in compound words, but I'll leave that as an exercise for the reader. If you need a dictionary file, try:
sudo apt-get install wamerican-small
And look under /usr/share/dict/words.
beala@superfly:/usr/share/dict$ wc -l words
50253 words
That's a lot of words! Try 'wamerican-insane' if you want a, well, insane number of words.
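To convince yourself the pipeline really works end to end, here's a sketch run against a tiny handmade dictionary (both files are throwaways; a real run would use /usr/share/dict/words):

```shell
#!/bin/bash
# A sorted two-word "dictionary" and a document with one typo.
DICT="$(mktemp)"
printf 'hello\nworld\n' | sort > "$DICT"
DOC="$(mktemp)"
echo 'Helo world world' > "$DOC"

cat "$DOC" |
tr A-Z a-z |
tr -c a-z '\n' |
sort |
uniq |
comm -13 "$DICT" -    # prints: helo

rm "$DICT" "$DOC"
```

Only "helo" comes out the other end: "world" appears in the dictionary, the duplicate is collapsed by uniq, and comm -13 keeps just the lines unique to our word list.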