What is HDFS – Overview of Hadoop’s distributed file system

Hadoop comes with its own distributed filesystem, HDFS. As internet use grows, we are producing data at astonishing rates. With this growth in data size, we face challenges in both storing and processing such huge datasets. We can increase the capacity of a single machine only to a finite extent (vertical scaling), and doing so becomes costly and unreliable…

Continue Reading

Sqoop Import : part 2

Hello everyone. In the last article on Sqoop Import, we saw how we can use this command to transfer a single table to HDFS. There are cases when we do not want all the data from a table, or not all of its columns. In such cases, we can use a filter and/or a free-form query. There is also a possibility…
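As a minimal sketch of these two options (the connection string, credentials, table names, and directories below are placeholders, not details from the article), a filtered import and a free-form query import might look like:

```shell
# Filtered import: only selected columns and rows from a hypothetical
# "orders" table, using --columns and --where
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username sqoop_user -P \
  --table orders \
  --columns "id,customer_id,total" \
  --where "total > 100" \
  --target-dir /user/hadoop/orders_filtered

# Free-form query import: Sqoop requires the literal $CONDITIONS token
# in the WHERE clause so it can split the query across parallel mappers,
# and --split-by to pick the splitting column
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username sqoop_user -P \
  --query 'SELECT o.id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id WHERE $CONDITIONS' \
  --split-by o.id \
  --target-dir /user/hadoop/orders_joined
```

Note that `--query` cannot be combined with `--table`, and a `--target-dir` must be given explicitly when using a free-form query.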

Continue Reading

Sqoop Import

Hello everyone. In this article, we will explore the Sqoop Import command. This command is used to transfer data from an RDBMS to a Hadoop cluster. We can use the Sqoop Import tool to transfer one table at a time, and we can choose the file format in which the data will be stored in HDFS. There is an import-all-tables variant of this tool which imports…
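A minimal sketch of both forms, assuming a hypothetical MySQL database and paths (none of these specifics come from the article):

```shell
# Import a single table into HDFS, choosing the on-disk file format;
# Sqoop also supports --as-textfile (the default), --as-sequencefile,
# and --as-parquetfile
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --as-avrodatafile

# The import-all-tables variant: every table in the database lands
# under one HDFS warehouse directory, one subdirectory per table
sqoop import-all-tables \
  --connect jdbc:mysql://dbhost/shop \
  --username sqoop_user -P \
  --warehouse-dir /user/hadoop/shop
```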

Continue Reading

Sqoop Introduction

One of the reasons the Hadoop ecosystem became popular is its ability to process different forms of data. But not all data is present in HDFS, i.e., the Hadoop Distributed File System. We have been using relational databases to store and process structured data for a long time. That is why a lot of data still resides in RDBMSs, and we need some tool…
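Before moving any data, a quick sanity check that Sqoop can reach the RDBMS looks like this (host and credentials are hypothetical placeholders):

```shell
# List the databases visible through the JDBC connection
sqoop list-databases \
  --connect jdbc:mysql://dbhost/ \
  --username sqoop_user -P

# List the tables in one database, confirming connectivity before an import
sqoop list-tables \
  --connect jdbc:mysql://dbhost/shop \
  --username sqoop_user -P
```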

Continue Reading