How to Execute Linux Bash Commands in a Scala Application

Bash scripts and commands can be very handy at times, and you may come across use cases where you need to run one from within a Scala application. I once hit such a scenario on a project where I had to count the number of rows in a file without actually loading the file into the Spark cluster. This was a challenging task. I came up with a number of approaches, such as using the metadata of a Hive table (by creating an external table on the location of the file with the proper delimiters, leveraging the fact that the file was structured). This approach works in some distributions like Cloudera but not in others like HDInsight. Another approach was to use the Linux "wc" command, but I couldn't find a way to step outside my Scala application to compute the value. After a lot of research and effort, I figured out how to execute bash commands in Scala. Here is a brief snippet of code that shows the usage at a high level:

 

import sys.process._ 

// First, import everything in the sys.process package. The methods and implicit conversions used below all live in this package.

val wc_path="hadoop fs -cat /example/my_file.txt"

// This variable stores the first part of the command. It uses the HDFS shell (hadoop fs) and its -cat command to print the contents of my_file.txt, which sits in the /example folder. cat has a number of uses, but here it simply writes the file to standard output.

// To calculate the length of the file, or more specifically the number of lines in it, I "pipe" the output of the hadoop fs -cat command into the "wc -l" command. Piping is a Linux concept that lets you feed the output of one command into another. So the complete bash command is: hadoop fs -cat /example/my_file.txt | wc -l

// To execute this command in a Scala application, the method to use is !!. Strange, isn't it? This is where Scala differs from Java: it allows symbols and other non-alphanumeric characters in method names. Importing sys.process._ brings in an implicit conversion from String to ProcessBuilder, and !! is a ProcessBuilder method that runs the command, waits for it to finish, and returns its standard output as a String. It throws an exception if the command exits with a non-zero code.


val actual_num_rows = (wc_path #| "wc -l").!!.trim.toInt

// trim strips the trailing newline from the output, and toInt converts the result to an Int, because a line count should be an integer, right?
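Since !! throws when the command fails, it can help to see it next to its sibling ! (single bang), which returns the exit code instead of the output. The following is only a minimal sketch: the object name ExitCodeSketch and the helper runForInt are hypothetical names invented for illustration, and the harmless Linux commands true, false and echo stand in for the hadoop command so that the example runs without a cluster.

```scala
import scala.sys.process._
import scala.util.Try

object ExitCodeSketch {
  // Hypothetical helper: runs a command and parses its trimmed output as an Int.
  // Returns None if the command cannot be started, exits with a non-zero code,
  // or prints something that is not a number.
  def runForInt(cmd: Seq[String]): Option[Int] =
    Try(cmd.!!.trim.toInt).toOption

  def main(args: Array[String]): Unit = {
    // `!` runs the command and returns its exit code rather than its output.
    val exitCode: Int = Seq("true").!
    println(s"exit code of true: $exitCode")

    // `!!` returns standard output; here the helper parses it as a number.
    println(runForInt(Seq("echo", "42")))

    // `false` exits with code 1, so `!!` throws and the helper yields None.
    println(runForInt(Seq("false")))
  }
}
```

Wrapping the call in Try is one possible design choice; in a real job you might prefer to let the exception propagate so that a failed hadoop call stops the pipeline.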

// Now there is one little issue. sys.process does not hand the command string to a shell, so a string containing | is not treated as a pipeline; the | would simply be passed to the first program as an argument. That is why, in the command above, I had to split the complete command into two parts and join them with #|, the sys.process operator that pipes the output of one process into the next. So be conscious of that when you are using piped bash commands.
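To make the piping caveat concrete, here is a small sketch of two ways to run a pipeline. It assumes a Linux machine with printf, wc and bash on the PATH. The object name PipeSketch and both method names are hypothetical, and printf is used as a stand-in for hadoop fs -cat so that the example runs anywhere.

```scala
import scala.sys.process._

object PipeSketch {
  // Stand-in for "hadoop fs -cat /example/my_file.txt": prints three lines.
  // The doubled backslashes pass a literal \n to printf, which expands it.
  private val producer = Seq("printf", "a\\nb\\nc\\n")

  // Option 1: let Scala connect the two processes with #|.
  def countViaScalaPipe(): Int =
    (producer #| Seq("wc", "-l")).!!.trim.toInt

  // Option 2: hand the whole pipeline to bash, which interprets the | itself.
  def countViaShell(): Int =
    Seq("bash", "-c", "printf 'a\\nb\\nc\\n' | wc -l").!!.trim.toInt

  def main(args: Array[String]): Unit = {
    println(s"lines via #|: ${countViaScalaPipe()}")
    println(s"lines via bash -c: ${countViaShell()}")
  }
}
```

Passing the command as a Seq avoids surprises from whitespace splitting, since each element reaches the program as exactly one argument. The bash -c variant is convenient for long pipelines, but it depends on bash being installed and requires care with quoting.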

 

I hope this quick tutorial has shown you how to execute bash commands from your Scala application. You now have the power of bash alongside the Scala and Java libraries to beef up the capabilities of your program. Enjoy!