Hello everyone. In this article, we will explore the Sqoop import command, which is used to transfer data from an RDBMS to a Hadoop cluster. The import tool transfers one table at a time, and we can choose the file format in which the data is stored in HDFS. There is also an import-all-tables variant of this tool, which imports every table in a database to HDFS.

Sqoop Import:

The import command is used to import a single table from a database into HDFS. Let's look at it with an example.

The following Sqoop command transfers the employees table from MySQL to HDFS. In the HDFS home path, a directory named employees will be created containing all of the transferred data.
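Here is a minimal sketch of that command; the JDBC URL, database name (employeesdb), and credentials are placeholders you would replace with your own:

sqoop import \
  --connect jdbc:mysql://localhost:3306/employeesdb \
  --username myuser \
  --password mypassword \
  --table employees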

 


 

Sqoop Import Arguments:

We can change the name of the directory where Sqoop imports data in HDFS by using the --target-dir argument. Alternatively, we can keep the output folders of all import jobs in a single location by specifying a warehouse directory path with --warehouse-dir.

Remember, you cannot specify both a target directory and a warehouse directory in the same import; the two arguments are incompatible.
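Here is a sketch of both options, reusing the placeholder connection settings from above:

# Import into an explicit directory of our choosing:
sqoop import \
  --connect jdbc:mysql://localhost:3306/employeesdb \
  --username myuser --password mypassword \
  --table employees \
  --target-dir /user/hadoop/emp_data

# Keep every import under one parent directory; this run
# lands in /user/hadoop/imports/employees:
sqoop import \
  --connect jdbc:mysql://localhost:3306/employeesdb \
  --username myuser --password mypassword \
  --table employees \
  --warehouse-dir /user/hadoop/imports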

If we look at the import directory, the output has been split into 4 parts, as the listing below shows. This is because, by default, a Sqoop MapReduce job launches 4 mappers. Sqoop divides the data equally among those 4 mappers so that every mapper carries the same load, and it uses the primary key of the table to decide how to split the data. Suppose emp_id is the primary key of the employees table and it has 1000 values; Sqoop will then split the table among the 4 mappers so that each mapper transfers 250 records.
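Listing the import directory shows one output file per mapper (the path here assumes the placeholder user and the default directory in the HDFS home path):

hdfs dfs -ls /user/hadoop/employees
# part-m-00000
# part-m-00001
# part-m-00002
# part-m-00003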

We can change the number of mappers in the Sqoop command by passing the -m (or --num-mappers) argument followed by the number of mappers we want, as in the sketch below. It is tempting to specify a large number of mappers, but this may consume all the resources on the RDBMS and HDFS, which in turn slows down all other operations.
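For instance, to run the same import with 2 mappers (placeholder connection settings as before):

sqoop import \
  --connect jdbc:mysql://localhost:3306/employeesdb \
  --username myuser --password mypassword \
  --table employees \
  -m 2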

We can also specify the file format in which we want to store the data in HDFS. The import tool supports four file formats: plain text (the default), SequenceFile, Avro, and Parquet.
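The corresponding arguments, any one of which can be added to the import command, are:

--as-textfile        # plain text, the default
--as-sequencefile    # Hadoop SequenceFile
--as-avrodatafile    # Avro data files
--as-parquetfile     # Parquet files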

We can also specify compression settings while importing data into HDFS.
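Compression is controlled by two flags appended to the import command: -z (or --compress) turns compression on, using gzip by default, and --compression-codec selects a specific Hadoop codec class, for example:

--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec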

Let us import the employees table again, into another directory, in Parquet format, using some of the parameters we have mentioned above.

We have used the Parquet file format with compression, and we are using 6 mappers for this job, as the command below shows.
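A sketch of the complete command, again with the placeholder connection settings (the exact codec naming accepted for Parquet output can vary by Sqoop version):

sqoop import \
  --connect jdbc:mysql://localhost:3306/employeesdb \
  --username myuser --password mypassword \
  --table employees \
  --target-dir /user/hadoop/employees_parquet \
  --as-parquetfile \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  -m 6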

 


 

 

This is how we can use the Sqoop import tool to transfer a single table to HDFS. There are still some important parameters remaining in the Sqoop import command; we will see those in the next article.

I am passionate about data analytics, machine learning, and artificial intelligence. Recently, I have started blogging about my experiences while learning these exciting technologies.
