Korn Shell Wildcards
by A Programmer on Dec.17, 2011, under Uncategorized
Most of us who work on Unix know the shell wildcards (* and ?). These can be very useful while working with files.
For example, *.db would match any file ending with .db in the given directory:
# ls -lrt *.db
-rw-r--r-- 1 staff staff 14336 Dec 9 18:58 messages.54672.db
-rw-r--r-- 1 staff staff 23552 Dec 11 14:07 messages.76698.db
-rw-r--r-- 1 staff staff 4536320 Dec 11 22:30 messages.57286.db
-rw-r--r-- 1 staff staff 283648 Dec 11 22:52 messages.80986.db
One of the lesser-known features of the wildcards is the ability to OR them:
# ls -lrt *.@(57286|80986).db
-rw-r--r-- 1 staff staff 4536320 Dec 11 22:30 messages.57286.db
-rw-r--r-- 1 staff staff 283648 Dec 11 22:52 messages.80986.db
Further, you can negate them as well:
# ls -lrt !(*.@(57286|80986)).db
-rw-r--r-- 1 staff staff 14336 Dec 9 18:58 messages.54672.db
-rw-r--r-- 1 staff staff 23552 Dec 11 14:07 messages.76698.db
Here is a quick reference of all KSH wildcards.
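For readers without the reference handy, the composite pattern operators are ?(...), *(...), +(...), @(...) and !(...). The sketch below tries each one in a case statement. The classify function and the file names are made up for illustration, and the shopt line is only there so that the same snippet also runs under bash, which needs extglob switched on.

```shell
# bash needs extglob enabled for these patterns; ksh has them built in,
# so the failed 'shopt' lookup there is silently ignored.
shopt -s extglob 2>/dev/null || true

# classify NAME: report which composite pattern the (hypothetical) name matches.
classify() {
    case $1 in
        messages.@(57286|80986).db) echo "one of the two listed ids" ;;
        messages.+([0-9]).db)       echo "one or more digits" ;;
        messages?(.old).db)         echo "optional .old part" ;;
        *(x).db)                    echo "zero or more x" ;;
        !(*.db))                    echo "anything but a .db file" ;;
        *)                          echo "no pattern matched" ;;
    esac
}

classify messages.57286.db
classify messages.54672.db
classify messages.old.db
classify xxx.db
classify notes.txt
```

The first matching branch wins, so the order of the patterns matters when more than one could match.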
I find this feature really useful when I want to process many files but leave out a few.
For example, you can remove all the log files in a directory except the one for the active process (assuming the PID is part of the file name).
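A minimal sketch of that clean-up, assuming the log files are named app.<pid>.log and live in the current directory. Both the naming scheme and the remove_stale_logs function are my own illustration, not something from a real system.

```shell
shopt -s extglob 2>/dev/null || true   # needed by bash only; ksh ignores the failure

# Hypothetical layout: log files named app.<pid>.log in the current directory.
# remove_stale_logs PID: delete every such log except the one for PID.
remove_stale_logs() {
    keep_pid=$1
    # -f keeps rm quiet if nothing matches; -- guards against odd file names.
    rm -f -- app.!($keep_pid).log
}
```

Before running anything like this on real files, it is worth swapping rm -f for ls to preview what the pattern matches.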
Hope you find use of this in your day-to-day work!
~A Programmer
Golden Rule for Super Fast Shell Scripts!
by A Programmer on Jun.15, 2010, under Uncategorized
Shell scripts are very powerful in what you can make them do, and they can be written in a very short amount of time, especially when it comes to processing text files. Most often these scripts are written as a stop-gap solution until a more permanent, efficient C program can replace them. However, once the script is installed in production, people realize it’s working just fine and there is no need to spend more effort writing a C program from scratch. Over time the script gets used more and more (a phenomenon one of my good friends labels “if you build it, they will come”, a reference to the movie “Field of Dreams”; more on this in some other post) and it soon becomes a performance bottleneck in the system.
This is partly because there are so many ways of accomplishing a single task in shell scripts that it’s difficult for most developers to figure out which one is the most efficient. In this post I’ll cover a single “Golden Rule” that I have discovered, which helps me write really efficient Korn shell scripts. Here it is:
Never launch a child process in a processing loop!
A processing loop, as referred to here, is a loop that iterates over every record in the input file.
The Golden Rule can be expressed in mathematical notation as:
performance = A/(# of child processes launched per input record)
where ‘A’ is a system-dependent constant.
Note: Even though I have used the Korn shell for all the examples here, the basic principle should hold true for any shell.
What is a Child Process?
The crux of the Golden Rule is to avoid launching child processes over and over within a single run of a script, because creating a child process involves a lot of overhead. Here are some tips on what causes a child process to be created:
- Any utility that’s not a shell built-in, like cut, sed, grep etc.
- Every time you use a pipeline, it causes child processes to be created.
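If you are unsure whether a command is a built-in, the shell can tell you. The sketch below uses command -v, which prints a path for external utilities and just the name for built-ins. The kind_of helper is my own naming, and which commands count as built-ins varies between shells and versions.

```shell
# kind_of NAME: print "built-in" when the shell handles NAME itself,
# "external" when running it means launching a child process.
kind_of() {
    case $(command -v "$1") in
        /*) echo "external" ;;   # resolved to a path on disk
        "") echo "not found" ;;
        *)  echo "built-in" ;;   # built-in, keyword or function
    esac
}

for cmd in read echo cut sed grep; do
    echo "$cmd: $(kind_of "$cmd")"
done
```

Anything reported as external is a candidate for removal from a processing loop.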
It will be easier to demo the Golden Rule than to talk about it. So, let’s dive into some samples.
Examples
Let’s take a very simple example, where you have a CSV input file with a first name, a last name and an email address. The job of the process is to parse the input file, verify that each email address contains an ‘@’ and a ‘.’, and split out any invalid records to an error file.
I’ll show the same script written in 3 different styles, in decreasing order of child processes per record and increasing order of efficiency.
Sample 1
#!/bin/ksh
> valid
> invalid
cat "$1" | while read line
do
    ## Three pipelines used here to parse each record
    fname=$(echo "$line" | cut -d, -f1)
    lname=$(echo "$line" | cut -d, -f2)
    email=$(echo "$line" | cut -d, -f3)
    ## Another pipeline to check validity of the email
    if echo "$email" | egrep ".+@.+\..+" >/dev/null; then
        echo "$fname,$lname,$email" >> valid
    else
        echo "$fname,$lname,$email" >> invalid
    fi
done
Sample 2
#!/bin/ksh
> valid
> invalid
cat "$1" | while IFS=, read fname lname email
do
    ## Eliminated the need for three pipelines by using IFS with read
    if echo "$email" | egrep ".+@.+\..+" >/dev/null; then
        echo "$fname,$lname,$email" >> valid
    else
        echo "$fname,$lname,$email" >> invalid
    fi
done
Sample 3
#!/bin/ksh
cat "$1" | while IFS=, read fname lname email
do
    ## Eliminated the need for egrep by using KSH's built-in pattern matching
    ## (?*@?*.?* is the shell-pattern equivalent of the regex .+@.+\..+)
    if [[ $email = ?*@?*.?* ]]; then
        echo "$fname,$lname,$email"
    else
        echo "$fname,$lname,$email" >&2
    fi
done > valid 2> invalid
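The same loop can also be written with a plain case pattern, which should behave the same way in ksh, bash and other POSIX shells. This is only a sketch: the split_by_email function and its three arguments are my own packaging, not part of the timed scripts.

```shell
# split_by_email INPUT VALID INVALID
# Reads "first,last,email" records from INPUT and writes each record to
# VALID or INVALID. Only shell built-ins run inside the loop.
split_by_email() {
    while IFS=, read -r fname lname email
    do
        case $email in
            # something, '@', something, '.', something
            ?*@?*.?*) echo "$fname,$lname,$email" ;;
            *)        echo "$fname,$lname,$email" >&2 ;;
        esac
    done < "$1" > "$2" 2> "$3"
}
```

Redirecting the input file into the loop also removes the one-off cat process, although that has no per-record cost either way.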
Performance Matrix
| Input Records | Time | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|---|
| 1,000 | Real | 0m9.380s | 0m2.533s | 0m0.085s |
| | User | 0m2.744s | 0m0.781s | 0m0.045s |
| | Sys | 0m7.960s | 0m2.037s | 0m0.039s |
| 10,000 | Real | 1m25.238s | 0m22.515s | 0m0.970s |
| | User | 0m24.663s | 0m7.001s | 0m0.379s |
| | Sys | 1m11.544s | 0m17.786s | 0m0.299s |
| 100,000 | Real | 14m42.842s | 4m6.492s | 0m6.527s |
| | User | 4m8.237s | 1m15.282s | 0m3.667s |
| | Sys | 12m12.653s | 3m13.174s | 0m2.862s |
| 1,000,000 | Real | 145m58.457s | 41m17.773s | 1m11.483s |
| | User | 41m15.294s | 12m28.498s | 0m39.565s |
| | Sys | 121m8.872s | 32m21.106s | 0m30.701s |
The following is the raw output of the ‘time’ command, just in case I made a mistake entering the values into the table.
1,000 Input Records
Sample1
real 0m9.380s user 0m2.744s sys 0m7.960s
Sample2
real 0m2.533s user 0m0.781s sys 0m2.037s
Sample3
real 0m0.085s user 0m0.045s sys 0m0.039s
10,000 Input Records
Sample1
real 1m25.238s user 0m24.663s sys 1m11.544s
Sample2
real 0m22.515s user 0m7.001s sys 0m17.786s
Sample3
real 0m0.970s user 0m0.379s sys 0m0.299s
100,000 Input Records
Sample1
real 14m42.842s user 4m8.237s sys 12m12.653s
Sample2
real 4m6.492s user 1m15.282s sys 3m13.174s
Sample3
real 0m6.527s user 0m3.667s sys 0m2.862s
1,000,000 Input Records
Sample1
real 145m58.457s user 41m15.294s sys 121m8.872s
Sample2
real 41m17.773s user 12m28.498s sys 32m21.106s
Sample3
real 1m11.483s user 0m39.565s sys 0m30.701s
Conclusion
As you can see from the performance matrix, as the number of child processes in a shell script goes down, the performance improves drastically.
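To put a number on "drastically", the real times for the 1,000,000-record run can be turned into speed-up factors. The helpers below are a quick sketch with names of my own choosing. They call awk, which is fine here because it runs only a handful of times, not once per record.

```shell
# to_seconds converts a 'time'-style value such as 145m58.457s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup SLOW FAST: how many times faster FAST is than SLOW.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

# Real times for 1,000,000 records, copied from the performance matrix.
echo "Sample 1 vs Sample 3: $(speedup 145m58.457s 1m11.483s)x"
echo "Sample 2 vs Sample 3: $(speedup 41m17.773s 1m11.483s)x"
```

On these figures Sample 3 comes out roughly 120 times faster than Sample 1 and roughly 35 times faster than Sample 2.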
There are situations, however, when it seems impossible to avoid running child-processes for every record (especially when third party utilities are involved, like having to update the database for every record). Rest assured, there is a way around that (read KSH co-processes)! I’ll cover that in another post sometime.
Hope this post helps you write faster running scripts!
~A Programmer
Server Refused Our Key!
by A Programmer on Jun.14, 2010, under Uncategorized
If you work in a corporate Unix environment, you probably have a bunch of different Unix servers that you have to log in to on a regular basis. And, if you use Putty to connect to your servers, you probably end up typing in your password at least a couple dozen times a day (double that number if you are working on a VPN connection that drops a couple of times a day like mine!).
So, what is the solution to all of this? Here are a couple of options:
- Use a different Terminal application that provides the ability to store your passwords (like Poderosa, Tectia etc.)
- Use Key-based authentication.
The main problem with #1 is security. When you store your password, it is basically stored in an unprotected fashion on your local hard disk. So, if someone gets to your hard disk in some way, they have access to all the servers that you have configured. Also, on a more personal note, I like Putty a lot and have gotten used to it, so I do not want to use a different terminal.
That leaves us with option #2. Now, there are a lot of great tutorials out there on key-based authentication, so I will not go into detail on that. However, I’ll quickly cover the steps needed to set up key-based authentication and the one error that kept me from using it for almost a month!
Prerequisites
We need the following tools installed on your local Windows machine:
- An SSH-compatible Terminal Application (Putty)
- A Key generator (PuttyGen)
- A Key Agent. Optional. (Pageant)
- An SSH-enabled Unix/Linux Server
Usually, all the terminals come loaded with all the above mentioned utilities. If not, check the documentation and see if they are compatible with keys generated/stored by some other generator/agent application.
Steps
All the steps shown here will be for Putty’s suite, but you should have similar steps with any other tool:
Generate a Key Pair
A key pair consists of a public and a private key that will be used for authentication with the server. A key pair is like a yin-yang pair: the two halves complete each other. Alone they are useless, but together they provide a very secure method of proving who you are. A signature produced with the private key can be verified only with the public key of the same pair. Key-based authentication makes use of this fact and splits up the pair between a server and a client machine. The client machine keeps the private key. To authenticate, the client signs a message tied to the current session with its private key and sends the signature to the server. The server checks the signature using the public key it has for that client. If the signature verifies, the server knows it had to be produced with the paired private key, and hence you are authenticated.
In a sense, the private key in the key-pair becomes your credentials for the server (and that makes it a really important thing!!!). This is why it is important to protect your private key!
Here are the steps required to generate a key-pair:
- Launch PuttyGen.
- By default, the Parameters section would have the ‘SSH-2 RSA’ and ‘1024’ bits options enabled. They should work with most modern systems, although 1024-bit RSA keys are now considered weak, so pick 2048 bits or more if your server accepts it. If your server is using an older version of SSH, you can tweak these settings (check with your Unix admin to know what settings you should use).
- Click on Generate. A message will come up asking you to move your cursor in the empty space in the window. Do it.
- A key will be generated and displayed.
- Change the ‘Key comment’ field to something which identifies the server(s) you will use this key-pair on, like “Credit-Cards-Accounting” (yes, you can generate multiple key-pairs for use with different servers). I am not sure if you can use spaces in the comment, so I would advise using some other separator character.
- Enter a passphrase in the ‘Key passphrase’ and ‘Confirm passphrase’ fields. Although a passphrase is optional, I strongly recommend using one, as it is the only thing that will protect your credentials in case somebody steals them.
- Click ‘Save private key’ button and save the key in a folder of your choice. I recommend creating a separate folder where you store all your keys and back up that folder whenever you create a new key.
- There is an option to ‘Save public key’, but I much prefer copying the public key as displayed on the screen. So select all and copy the public key (since the private key file contains both the public and the private key, you can just load the private key file into PuttyGen if you ever need to copy the public key again).
Setting up the Server
Now that we have the key-pair, the next step is to set up the server to accept the public key of your workstation. My servers use the open-source OpenSSH server, so the following steps are for them. I believe that the proprietary SSH server is not that common and that it should use the same setup (feel free to leave a comment if you use the proprietary SSH server and these steps do or do not work for you).
- Login to your Unix/Linux server using the regular password based authentication.
- Create a directory ‘.ssh’ under your home directory. Make sure the permissions are 700 for the .ssh directory.
- cd to the .ssh directory.
- Create a file called authorized_keys and paste the public key in that file. Note that the complete key should be on a single line.
- Change the file permissions on authorized_keys file to 600.
- Make sure that your home directory has the permissions of 755.
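The steps above can be sketched as a small shell function to run on the server. The install_public_key name is my own, and the key you pass in is whatever you copied from the PuttyGen window, as a single line.

```shell
# install_public_key KEY: append KEY (one line, as copied from the PuttyGen
# window) to ~/.ssh/authorized_keys and apply the permissions listed above.
install_public_key() {
    mkdir -p "$HOME/.ssh" &&
    chmod 700 "$HOME/.ssh" &&
    printf '%s\n' "$1" >> "$HOME/.ssh/authorized_keys" &&
    chmod 600 "$HOME/.ssh/authorized_keys" &&
    chmod 755 "$HOME"
}
```

Quote the key when calling the function so that it reaches the file as one line. Appending with >> keeps any keys that are already in the file.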
That’s all the setup required on the server end.
Setting up the Terminal Client App
Let’s set up a new connection with Putty:
- First we need to set up the connection to use a default user name. This is not required, but if you don’t want to enter your password every time, I am assuming you don’t want to type in your user name either. This option is configured under the Connection > Data section, in the ‘Auto-login username’ field.
- Next, we need to instruct the terminal to use the private key we just generated. Go to the Connection > SSH > Auth section.
- Since we are using an agent to store our private keys, we only need to check ‘Attempt authentication using Pageant’ under the ‘Authentication Methods’ sub-section.
Setting up the Agent
Since we have our keys encrypted with a passphrase, the only way our terminal app can access the keys is by asking us for the passphrase to decrypt them. This defeats the whole purpose, as we have merely replaced the server’s password with the key’s passphrase. This is where the agent comes in handy. The agent can store your decrypted keys in memory during your login session on your local machine. This way your keys are always stored encrypted on disk and you do not have to provide a passphrase every time you use the key. So, the agent strikes a balance between security and usability.
- Launch the agent program (in our case, Pageant).
- Most probably it will start in a minimized mode in the notification area. Double click its icon to show the main window.
- You would see there are no keys loaded in the agent as yet. Click on the ‘Add Key’ button.
- Browse to the private key file, select it and click ‘Open’.
- A dialog box will open up asking for the passphrase for the key. Enter the same passphrase that was used when you created the key-pair using PuttyGen.
- The key would now show-up in the list of loaded keys.
- Repeat last 3 steps for any additional keys you might have.
Note that you will have to set up the agent again every time you log off your machine. You can actually create a shortcut to Pageant in the Windows Startup folder with the list of keys to be loaded as command line arguments. This way it will automatically launch and prompt for the key passphrase(s) every time you log in.
Testing the Setup
It’s time to test the complete setup. Launch Putty and select the profile that you have configured to authenticate using the agent. You should see a message that Putty is using the key to authenticate, and you should be logged in automatically.
Troubleshooting
Server Refused Our Key
The only error that I kept getting was ‘Server refused our key’. If you have run through the setup as mentioned in this article, there could be a couple of reasons why you would get this error:
- You are using OpenSSH and used the wrong format of the public key in the ~/.ssh/authorized_keys file. Remember, copy-paste the public key from the PuttyGen window, not from the ‘public key’ file saved using PuttyGen.
- The permissions on your home directory, the ~/.ssh directory or the ~/.ssh/authorized_keys file are wrong. Double check the permissions as specified in the ‘Setting up the Server‘ section. Note that the home directory permissions are really important… it took me weeks to figure that out!
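Both causes can be checked from a shell on the server. The sketch below rests on two assumptions of mine: that the server runs OpenSSH with its default StrictModes behaviour, which as far as I know rejects a key when the home directory, ~/.ssh or authorized_keys is writable by group or others, and that a file saved with ‘Save public key’ carries a BEGIN SSH2 PUBLIC KEY header. The check_key_setup name is made up.

```shell
# check_key_setup: print one line per problem found; print nothing if all is well.
check_key_setup() {
    keys="$HOME/.ssh/authorized_keys"
    for path in "$HOME" "$HOME/.ssh" "$keys"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
        # -perm -020 / -002: writable by group / by others
        elif [ -n "$(find "$path" -prune \( -perm -020 -o -perm -002 \) -print)" ]; then
            echo "writable by group or others: $path"
        fi
    done
    # A file saved with PuttyGen's 'Save public key' starts with this marker
    # and spans several lines, which OpenSSH does not accept here.
    if [ -f "$keys" ] && grep -q 'BEGIN SSH2 PUBLIC KEY' "$keys"; then
        echo "wrong key format in: $keys"
    fi
}
```

Run check_key_setup as the user you are trying to log in as. Empty output means neither of the two problems above was detected.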
Conclusion
I have been using key based authentication for the last week or so, and I can tell you that I am mighty pleased with the setup. No more frustration of remembering/typing in the password twenty times a day. If you are working in a Unix/Linux environment, I would highly recommend using key-based authentication. There are a lot of other uses of key-based authentication other than interactive login sessions. Perhaps, I’ll cover them in some other post.
Let me know through the comments if this was of help and if I have missed something here!
~A Programmer
Hello world!
by A Programmer on Jun.13, 2010, under Uncategorized
So here is a quick introduction about me/this journal.
I got my first taste of programming with GWBasic on a DOS-based PC when I was 9 years old. Since then I have been fascinated with computers. Then in my first year of college I was introduced to the amazing world of C programming and have been programming since then. I have been working professionally for the last 7 years in the software/IT field with the main focus on Unix/C/KSH scripting.
Over the years, I have searched the internet for many different types of problems and have gotten a lot of help from people sharing tutorials/tips/recipes/source-code etc. However, till now I have only used that information, but never gave back. This journal is an effort to do exactly that… give back.
I plan to share with everyone different kinds of problems that I face in day-to-day programming and how I solved them. The posts would range from a theoretical approach to a mid-level design to a code-implementation.
Hopefully you will find some of them of use!
~A Programmer












