How does a shell terminal process a command?
ls -l is a basic command that is learned early and used often. On its own, ls lists the contents of the current directory; the -l flag tells it to use long format, which adds information such as permissions, ownership, and the date and time of the most recent modification, as shown in the image above. The question I’d like to address today is: how does the bash shell process this and other commands when they are entered?
On most Linux systems the default program used to interpret and execute commands is Bash, the Bourne-Again SHell. Other shells such as sh, ksh, and csh are also used and behave similarly. Whichever shell is used, they all have the same tools at their disposal for communicating with the kernel and performing actions, such as environment variables and system calls. An environment variable is simply a named value that can affect how programs run, such as $PATH, which stores the list of directories searched for executable files, or $PWD, which stores the absolute path of the user’s current working directory.
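To make the idea of an environment variable concrete, here is a minimal C sketch of how a program can read one using the standard getenv() function. The helper names env_or_default and print_shell_environment are my own, chosen for illustration; they are not part of any shell.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the value of an environment variable, or `fallback` if it is unset.
 * (Illustrative helper, not part of any real shell.) */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Print PATH and PWD as the current process sees them. */
void print_shell_environment(void)
{
    printf("PATH=%s\n", env_or_default("PATH", "(unset)"));
    printf("PWD=%s\n", env_or_default("PWD", "(unset)"));
}
```

Calling print_shell_environment() from a program started in a terminal should show the same values that echo $PATH and echo $PWD print in that terminal, because a child process inherits a copy of the shell's exported environment.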
First, as soon as the command is entered, it is read as one long string of characters and passed as input to the shell. The shell then splits the string into smaller strings through a process known as tokenization, discarding the whitespace and separating the command name from its options. In the example above, ls -l is split into two tokens, the first being ‘ls’ and the second being ‘-l’.
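The tokenization step can be sketched in a few lines of C. This is a simplified illustration that only splits on whitespace using strtok_r(); a real shell such as bash also handles quoting, escapes, and operators like | and >. The function name tokenize and the MAX_TOKENS limit are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

#define MAX_TOKENS 64 /* arbitrary limit chosen for this sketch */

/* Split `line` in place on spaces, tabs, and newlines.
 * Stores at most max_tokens - 1 pointers in `tokens`, terminates the
 * array with NULL, and returns the number of tokens stored. */
int tokenize(char *line, char *tokens[], int max_tokens)
{
    if (max_tokens < 1)
        return 0;

    int count = 0;
    char *saveptr = NULL;
    char *tok = strtok_r(line, " \t\n", &saveptr);

    while (tok != NULL && count < max_tokens - 1) {
        tokens[count++] = tok;
        tok = strtok_r(NULL, " \t\n", &saveptr);
    }
    tokens[count] = NULL; /* a NULL-terminated array is what execve expects later */
    return count;
}
```

Because strtok_r() writes terminators into the buffer it scans, the input must be a writable array rather than a string literal.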
Now that an array of tokens has been formed, the shell next checks whether the first token, the command name, is an alias. An alias can be thought of as another name or a shortcut for a command. If an alias is found, its replacement text is tokenized too, and the first of those new tokens is checked for aliases in turn.
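As a rough illustration of the alias check, the sketch below looks a word up in a small table. The table contents (ll and la) and the name lookup_alias are hypothetical; a real shell keeps its aliases in an internal structure that the alias built-in modifies at run time.

```c
#include <stddef.h>
#include <string.h>

struct alias {
    const char *name;  /* what the user types */
    const char *value; /* the text it stands for */
};

/* Hypothetical alias table, for illustration only. */
static const struct alias alias_table[] = {
    { "ll", "ls -l" },
    { "la", "ls -a" },
};

/* Return the replacement text for `word`, or NULL if it is not an alias. */
const char *lookup_alias(const char *word)
{
    size_t n = sizeof alias_table / sizeof alias_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(alias_table[i].name, word) == 0)
            return alias_table[i].value;
    }
    return NULL;
}
```

With this table, typing ll would be replaced by ls -l, which is then tokenized like any other command, while ls itself has no entry and passes through unchanged.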
The last step before searching for the program is to check for built-ins. A built-in differs from a normal program in that it is executed by the shell itself rather than being located on disk and run as a separate process; cd is a common example. In the example above, ls is not a built-in, so the shell moves past this stage.
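One way to picture the built-in check is as a lookup in a table of function pointers. The sketch below is an assumption about how such a table might look, not how bash actually implements it, and it contains a single built-in, cd, built on the chdir() system call. The names builtin_fn, builtin_cd, and find_builtin are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* A built-in receives the token array and returns an exit status. */
typedef int (*builtin_fn)(char *const argv[]);

/* cd must run inside the shell process: changing directory in a child
 * process would have no effect on the shell itself. */
static int builtin_cd(char *const argv[])
{
    const char *target = argv[1] != NULL ? argv[1] : getenv("HOME");
    if (target == NULL)
        return 1;
    return chdir(target) == 0 ? 0 : 1;
}

struct builtin {
    const char *name;
    builtin_fn run;
};

/* Illustrative table; a real shell has many more entries. */
static const struct builtin builtins[] = {
    { "cd", builtin_cd },
};

/* Return the handler for `name`, or NULL if it is not a built-in. */
builtin_fn find_builtin(const char *name)
{
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].run;
    }
    return NULL;
}
```

For ls, find_builtin returns NULL, which is the signal to carry on and look for an executable file instead.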
Once the previous steps have been completed, it is time to find the executable. The shell checks each of the directories listed in the $PATH environment variable, in order, for a file with the name of the command. If the program isn’t found, the shell displays an error on the terminal. As ls is typically stored in /bin (or /usr/bin), which is on the $PATH, this step finds ls.
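The $PATH search can be sketched as a loop over the colon-separated directories, testing each candidate with access(). This is a simplified version under a few stated assumptions: it skips empty $PATH entries (which real shells treat as the current directory), it does not special-case command names that contain a slash, and access() with X_OK also succeeds for directories. The name find_in_path is my own.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Look for an executable called `name` in each directory of `path_var`
 * (a colon-separated list such as the value of $PATH).
 * On success, writes the full path to `out` and returns 0.
 * Returns -1 if nothing is found; `out` is then unspecified. */
int find_in_path(const char *name, const char *path_var,
                 char *out, size_t out_size)
{
    if (path_var == NULL)
        return -1;

    char *copy = strdup(path_var); /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;

    int found = -1;
    char *saveptr = NULL;
    for (char *dir = strtok_r(copy, ":", &saveptr);
         dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr)) {
        int len = snprintf(out, out_size, "%s/%s", dir, name);
        if (len < 0 || (size_t)len >= out_size)
            continue; /* candidate path does not fit in the buffer */
        if (access(out, X_OK) == 0) {
            found = 0; /* first match wins, as in a real shell */
            break;
        }
    }

    free(copy);
    return found;
}
```

A caller would typically pass getenv("PATH") as path_var, so that looking up ls yields something like /bin/ls or /usr/bin/ls depending on the system.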
Before moving on to the next step, executing the ls executable, the shell first uses the system call fork(), which duplicates the current process, creating a parent and a child. Because the system call used to execute programs replaces the program running in the calling process, forking first prevents the shell itself from being replaced when it runs a command. The child process ends when the executed program ends, or exits on its own with an error status if the execution fails. While the child is running, the parent process calls wait(), which suspends the parent until one of its children changes state, typically by terminating.
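Here is a minimal sketch of the fork-and-wait pattern on its own, without running another program yet. The child simply exits with a chosen status and the parent collects it with waitpid(), a variant of wait() that targets one specific child. The function name fork_and_wait is an assumption for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `child_exit_code`.
 * The parent blocks until the child terminates and returns the
 * child's exit status, or -1 if something went wrong. */
int fork_and_wait(int child_exit_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1; /* fork failed, no child was created */

    if (pid == 0) {
        /* Child: fork() returned 0. A shell would call execve here. */
        _exit(child_exit_code);
    }

    /* Parent: fork() returned the child's process ID. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The two branches after fork() run in different processes: the same code continues in both, and the return value of fork() is what tells each copy which role it has.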
In the final step, the child process calls execve, the system call used to execute programs, using the absolute path to the program already found by searching the $PATH. In addition to the path, execve takes argv, the argument vector, which holds all of the tokens: by convention argv[0] is the program name, ‘ls’, followed by ‘-l’, and the array ends with a null pointer. It also takes envp, the program environment, which passes along all the environment variables in case the program being called needs them.
int execve(const char *pathname, char *const argv[],
           char *const envp[]);
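Putting the pieces together, the sketch below forks, calls execve in the child, and waits in the parent. It is an illustration rather than bash's actual code: the function name run_command is my own, the environment is passed through unchanged via the global environ, and the child exits with status 127 if execve fails, which is the status shells conventionally report for a command that could not be run.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ; /* the current process's environment */

/* Run the program at `path` with arguments `argv` (NULL-terminated,
 * argv[0] being the program name) and wait for it to finish.
 * Returns the program's exit status, or -1 on failure to fork or wait. */
int run_command(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, environ);
        /* execve only returns if it failed. */
        _exit(127);
    }

    /* Parent: the shell itself, still running, waits for the child. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the example in this post, the call would look like run_command("/bin/ls", args) with args holding "ls", "-l", and a terminating NULL, assuming ls was found at /bin/ls.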
Now that the ls program has been executed, it is time to reset and prepare for the next command. To do this, the shell simply prints the prompt again, the text of which is stored in the shell variable $PS1.
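The prompt-and-read cycle can be sketched as a small function that prints a prompt and reads one line. The name read_command is hypothetical, and the prompt is passed in as a plain string because $PS1 is usually a shell variable that is not exported, so a separate C program generally cannot read it with getenv().

```c
#include <stdio.h>
#include <string.h>

/* Print `prompt` to `out`, then read one line from `in` into `buf`,
 * removing the trailing newline if there is one.
 * Returns 1 if a line was read, 0 on end of input or error. */
int read_command(FILE *in, FILE *out, const char *prompt,
                 char *buf, size_t size)
{
    fputs(prompt, out);
    fflush(out); /* make sure the prompt appears before we block on input */

    if (fgets(buf, (int)size, in) == NULL)
        return 0;

    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

A shell's main loop is then roughly: while read_command(stdin, stdout, "$ ", line, sizeof line) returns 1, process the line through the steps described above, and print the prompt again on the next iteration.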