Hadoop – Hadoopsters


Landon Robinson
How to use Predicate Subqueries in Spark SQL (i.e. subqueries in where clause)
Spark SQL supports an incredibly powerful feature: predicate subqueries. When combined with broadcasting, you are nigh-unstoppable.
2 min read · Oct 14, 2020

Landon Robinson
How to Load Data from Cassandra into Hadoop using Spark
Cassandra is a great open-source solution for accessing data at web scale, thanks in no small part to its low-latency performance.
3 min read · Jun 27, 2019

Landon Robinson
How to Control File Count, Reducers and Partitions in Spark and Spark SQL
It’s time we wrangle this horse.
7 min read · Jun 22, 2019

Landon Robinson
Our Spark + AI Summit 2019 Talks are Now Available Online
"Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring" and "Headaches and Breakthroughs in Building Continuous Applications"
1 min read · May 27, 2019

Landon Robinson
How to Override a Spark Dependency in Client or Cluster Mode
In this post, we’ll cover a simple way to override a jar, library, or dependency in your Spark application that may already exist…
3 min read · May 8, 2019

Landon Robinson
How Random Sampling in Hive Works, And How to Use It
Random sampling is a technique in which each sample has an equal probability of being chosen. Let’s see how it’s done in Hive.
5 min read · Feb 4, 2018

Landon Robinson
How to Build Optimal Hive Tables Using ORC, Partitions and Metastore Statistics
Learn how to leverage the ORC file format to optimize your big data in Hive and Hadoop!
5 min read · Dec 19, 2017

Landon Robinson
How to Export Hive Table to CSV File
If your Hadoop cluster allows you to connect to Hive through the command line interface, you can very easily export a Hive table to a CSV.
2 min read · Sep 18, 2015

Editors

Landon Robinson

Craig C