Landon Robinson · How to use Predicate Subqueries in Spark SQL (i.e. subqueries in where clause) · Spark SQL supports an incredibly powerful feature: predicate subqueries. When combined with broadcasting, you are nigh-unstoppable. · 2 min read · Oct 14, 2020
Landon Robinson · How to Load Data from Cassandra into Hadoop using Spark · Cassandra is a great open-source solution for accessing data at web scale, thanks in no small part to its low-latency performance. · 3 min read · Jun 27, 2019
Landon Robinson · How to Control File Count, Reducers and Partitions in Spark and Spark SQL · It’s time we wrangle this horse. · 7 min read · Jun 22, 2019
Landon Robinson · Our Spark + AI Summit 2019 Talks are Now Available Online · "Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring" and "Headaches and Breakthroughs in Building Continuous Applications" · 1 min read · May 27, 2019
Landon Robinson · How to Override a Spark Dependency in Client or Cluster Mode · In this post, we’ll cover a simple way to override a jar, library, or dependency in your Spark application that may already exist… · 3 min read · May 8, 2019
Landon Robinson · How Random Sampling in Hive Works, And How to Use It · Random sampling is a technique in which each sample has an equal probability of being chosen. Let’s see how it’s done in Hive. · 5 min read · Feb 4, 2018
Landon Robinson · How to Build Optimal Hive Tables Using ORC, Partitions and Metastore Statistics · Learn how to leverage the ORC file format to optimize your big data in Hive and Hadoop! · 5 min read · Dec 19, 2017
Landon Robinson · How to Export Hive Table to CSV File · If your Hadoop cluster allows you to connect to Hive through the command line interface, you can very easily export a Hive table to a CSV. · 2 min read · Sep 18, 2015