DuckDB ASOF Join
Version 0.8.1
Version:

Problem: we have a time-based price table; Traditional joins against this table get NULL results if there is a time which does not exactly match.

Solution: “ASOF JOIN” picks a good value for “in the gap” values.

First, we create a price table and sales table.

CREATE TABLE prices AS (
    SELECT '2001-01-01 00:16:00'::TIMESTAMP + INTERVAL (v) MINUTE AS ticker_time,
        v AS unit_price
    FROM range(0,5) vals(v)
);

create table sales(item text, sale_time timestamp, quantity int);
insert into sales values('a', '2001-01-01 00:18:00', 10);
insert into sales values('b', '2001-01-01 00:18:30', 20);
insert into sales values('c', '2001-01-01 00:19:00', 30);

We can see that we have a unit_price defined for each hour, but not for half hours.

SELECT * FROM prices;
┌─────────────────────┬────────────┐
     ticker_time      unit_price 
      timestamp         int64    
├─────────────────────┼────────────┤
 2001-01-01 00:16:00           0 
 2001-01-01 00:17:00           1 
 2001-01-01 00:18:00           2  No unit_price for 18:30!
 2001-01-01 00:19:00           3 
 2001-01-01 00:20:00           4 
└─────────────────────┴────────────┘
SELECT * FROM sales;
┌─────────┬─────────────────────┬──────────┐
  item         sale_time       quantity 
 varchar       timestamp        int32   
├─────────┼─────────────────────┼──────────┤
 a        2001-01-01 00:18:00        10 
 b        2001-01-01 00:18:30        20  A sale time of 18:30!
 c        2001-01-01 00:19:00        30 
└─────────┴─────────────────────┴──────────┘

With a normal LEFT JOIN, there is a problem for the 18:30 sale. Since there is not a sale_time of 18:30, a join against that time will be NULL.

-- no price value for 18:30, so item b's unit_price and total are NULL!

SELECT s.*, p.unit_price, s.quantity * p.unit_price AS total
 FROM sales s LEFT JOIN prices p
   ON s.sale_time = p.ticker_time;
┌─────────┬─────────────────────┬──────────┬────────────┬───────┐
  item         sale_time       quantity  unit_price  total 
 varchar       timestamp        int32      int64     int64 
├─────────┼─────────────────────┼──────────┼────────────┼───────┤
 a        2001-01-01 00:18:00        10           2     20 
 c        2001-01-01 00:19:00        30           3     90 
 b        2001-01-01 00:18:30        20        NULL   NULL   NULL result!
└─────────┴─────────────────────┴──────────┴────────────┴───────┘

The ASOF JOIN picks a good price for the 18:30 sale. the ON s.sale_time >= pp.ticker_time will cause the nearest lower value (in this case, for 18:00) to be used.

-- using ASOF, 18:30 "rounds down" to use the 18:00 unit_price

SELECT s.*, p.unit_price, s.quantity * p.unit_price AS total_cost
  FROM sales s ASOF LEFT JOIN prices p
    ON s.sale_time >= p.ticker_time;
┌─────────┬─────────────────────┬──────────┬────────────┬────────────┐
  item         sale_time       quantity  unit_price  total_cost 
 varchar       timestamp        int32      int64       int64    
├─────────┼─────────────────────┼──────────┼────────────┼────────────┤
 a        2001-01-01 00:18:00        10           2          20 
 b        2001-01-01 00:18:30        20           2          40  Good result!
 c        2001-01-01 00:19:00        30           3          90 
└─────────┴─────────────────────┴──────────┴────────────┴────────────┘
Search Shortcut cmd + k | ctrl + k