Sqoop Import

Hello everyone. In this article, we will explore the Sqoop Import command. This command transfers data from an RDBMS to a Hadoop cluster. The Sqoop Import tool transfers one table at a time, and we can choose the file format in which the data will be stored in HDFS. There is also an import-all-tables variant of this tool, which imports…

Sqoop Introduction

One of the reasons the Hadoop ecosystem became popular is its ability to process different forms of data. But not all data is present in HDFS, i.e. the Hadoop Distributed File System. We have been using relational databases to store and process structured data for a long time. That is why a lot of data still resides in RDBMSs, and we need some tool…

Grouping sets, Rollup and cube

Hello everyone. We have used the GROUP BY operation to perform aggregations in our queries. Consider a case where we have inventory data for a retail store. Every month, we ship products of different types, such as clothing and home appliances, to different stores. Now we want to calculate how many products we have shipped to each store according to…

Pivot rows to columns in Hive

Hello everyone. In this article, we will learn how to pivot rows to columns in Hive. Pivoting (or transposing) means converting rows into columns. We do this to show a different view of the data, or to show an aggregation performed at a different granularity than the one present in the existing table. Suppose you have the following data from some company. It shows how…

Collect_set and Collect_list in Hive

Hello all, welcome to another article on Apache Hive. In this article, we will see how to use COLLECT_SET and COLLECT_LIST to get a list of comma-separated values for a particular column while performing a grouping operation. In Hive queries, we often use the GROUP BY operation to perform all kinds of aggregations, such as sum, count and max. Consider…
