Many repetitive jobs can be performed more efficiently if you use more of your computer's resources (e.g. CPUs and RAM). Below is an example of running multiple jobs in parallel.
Suppose you have a list of files, say the output of ls. Also, suppose these files are bz2-compressed and the following tasks need to be performed on them, in this order:
1. bzcat to stdout
2. grep <some key word>
3. gzip
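As a concrete illustration, the three tasks can be chained for a single file. This is a minimal sketch assuming bash, bzip2 and gzip are installed; the file sample1.txt.bz2 and its contents are hypothetical test data that the script creates itself, and puppies stands in for the key word.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input: a small bz2-compressed text file.
printf 'two puppies\na kitten\nthree puppies\n' > sample1.txt
bzip2 -f sample1.txt        # writes sample1.txt.bz2 and removes sample1.txt

# Task 1: bzcat decompresses the file to stdout
# Task 2: grep keeps only lines containing the key word
# Task 3: gzip compresses the filtered lines
bzcat sample1.txt.bz2 | grep puppies | gzip > sample1.puppies.gz

# Prints the two lines containing "puppies"
gzip -dc sample1.puppies.gz
```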
Running these tasks with a while loop may look like this:
filenames="file_list.txt"
rm -f output.gz   ## >> below appends, so start from a fresh output file
while IFS= read -r name
do
    ## grab lines with puppies in them
    bzcat "$name" | grep puppies | gzip >> output.gz
done < "$filenames"
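A self-contained sketch of the loop version follows, assuming bash with bzip2 and gzip available. The part1.txt.bz2 to part3.txt.bz2 files are hypothetical test data created by the script. It makes one design change: the loop's combined output is piped to a single gzip, so output.gz is overwritten on each run and holds one gzip stream instead of one appended stream per file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical test data: three small bz2 files and a list of their names.
for i in 1 2 3; do
  printf 'puppies %s\nkittens %s\n' "$i" "$i" > "part$i.txt"
  bzip2 -f "part$i.txt"
done
ls part*.txt.bz2 > file_list.txt

filenames="file_list.txt"
while IFS= read -r name; do
  # grab lines with puppies in them; "|| true" keeps set -e from
  # aborting when a file has no matching lines (grep then exits 1)
  bzcat "$name" | grep puppies || true
done < "$filenames" | gzip > output.gz   # one gzip process for all files

gzip -dc output.gz
```

IFS= and -r make read keep each file name exactly as written in file_list.txt, and quoting "$name" prevents names containing spaces from being split.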
Using GNU Parallel, we can run 3 jobs at once by simply doing:
parallel -j 3 "bzcat {} | grep puppies" ::: $( cat file_list.txt ) | gzip > output.gz
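This command is simple and concise, and it is more efficient when there are many files or the files are large, because the bzcat and grep stages for different files run at the same time on separate CPU cores. The jobs are started by parallel: the option -j 3 runs up to 3 jobs at a time, the input to the jobs (here, the file names) is given after :::, and each {} in the quoted command is replaced by one file name. The combined output of all jobs is then piped to gzip > output.gz. Unlike the while loop, the output can appear in the order the jobs finish rather than the order of file_list.txt; adding the option -k keeps the input order.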