How to improve MySQL "fill the gaps" query -
i have table currency exchange rates fill data published ecb. data contains gaps in date dimension e.g. holidays.
create table `imp_exchangerate` ( `id` int(11) not null auto_increment, `rate_date` date not null, `currency` char(3) not null, `rate` decimal(14,6) default null, primary key (`id`), key `rate_date` (`rate_date`,`currency`), key `imp_exchangerate_by_currency` (`currency`) ) engine=innodb default charset=latin1
i have date dimension youd expect in data warehouse:
create table `d_date` ( `date_id` int(11) not null, `full_date` date default null, ---- etc. primary key (`date_id`), key `full_date` (`full_date`) ) engine=innodb default charset=utf8
now try fill gaps in exchangerates this:
select d.full_date, currency, (select rate imp_exchangerate rate_date <= d.full_date , currency = c.currency order rate_date desc limit 1) rate d_date d, (select distinct currency imp_exchangerate) c d.full_date >= (select min(rate_date) imp_exchangerate currency = c.currency) , d.full_date <= curdate()
explain says:
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | | +------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+ | 1 | primary | <derived3> | | null | null | null | null | 201 | | | 1 | primary | d | range | full_date | full_date | 4 | null | 6047 | using where; using index; using join buffer (flat, bnl join) | | 4 | dependent subquery | imp_exchangerate | ref | imp_exchangerate_by_currency | imp_exchangerate_by_currency | 3 | c.currency | 664 | | | 3 | derived | imp_exchangerate | range | null | imp_exchangerate_by_currency | 3 | null | 201 | using index group-by | | 2 | dependent subquery | imp_exchangerate | index | rate_date,imp_exchangerate_by_currency | rate_date | 6 | null | 1 | using | +------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
mysql needs multiple hours execute query. there ideas how improve that? have tried index on rate without noticable impact.
i have solution while now: rid of dependent subqueries. had think different angles in mutliple places , here result:
select cd.date_id, x.currency, x.rate imp_exchangerate x inner join (select d.date_id, max(rate_date) rate_date, currency d_date d inner join imp_exchangerate on rate_date <= d.full_date d.full_date <= curdate() group d.date_id, currency) cd on x.rate_date = cd.rate_date , x.currency = cd.currency
this query finishes in less 10 minutes compared multiple hours original query.
lesson learned: avoid dependent subqueries in mysql plague!
Comments
Post a Comment