Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Suppose, for illustrative purposes, you are running a library using a simple MySQL "books" table with three columns:

(id, title, status)

  • id is the primary key
  • title is the title of the book
  • status could be an enum describing the book's current state (e.g. AVAILABLE, CHECKEDOUT, PROCESSING, MISSING)

A simple query to report how many books fall into each state is:

SELECT status, COUNT(*) FROM books GROUP BY status

or to specifically find how many books are available:

SELECT COUNT(*) FROM books WHERE status = "AVAILABLE"

However, once the table grows to millions of rows, these queries take several seconds to complete. Adding an index to the "status" column doesn't appear to make a difference in my experience.

Aside from periodically caching the results or explicitly updating summary info in a separate table each time a book changes state (via triggers or some other mechanism), are there any techniques for speeding up these kinds of queries? It seems that the COUNT queries end up looking at every row, and (without knowing more details) I'm a bit surprised that this information can't somehow be determined from the index.

UPDATE

Using the sample table (with an indexed "status" column) with 2 million rows, I benchmarked the GROUP BY query. Using the InnoDB storage engine, the query takes 3.0 - 3.2 seconds on my machine. Using MyISAM, the query takes 0.9 - 1.1 seconds. There was no significant difference between count(*), count(status), or count(1) in either case.

MyISAM is admittedly a bit faster, but I was curious to see if there was a way to make an equivalent query run much faster (e.g. 10-50 ms -- fast enough to be called on every webpage request for a low-traffic site) without the mental overhead of caching and triggers. It sounds like the answer is "there's no way to run the direct query quickly" which is what I expected - I just wanted to make sure I wasn't missing an easy alternative.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
697 views
Welcome To Ask or Share your Answers For Others

1 Answer

So the question is

are there any techniques for speeding up these kinds of queries?

Well, not really. A column-based storage engine would probably be faster with those SELECT COUNT(*) queries but it would be less performant for pretty much any other query.

Your best bet is to maintain a summary table via triggers. It doesn't have much overhead and the SELECT part will be instantaneous no matter how big the table. Here's some boilerplate code:

DELIMITER //

CREATE TRIGGER ai_books AFTER INSERT ON books
FOR EACH ROW UPDATE books_cnt SET total = total + 1 WHERE status = NEW.status
//
CREATE TRIGGER ad_books AFTER DELETE ON books
FOR EACH ROW UPDATE books_cnt SET total = total - 1 WHERE status = OLD.status;
//
CREATE TRIGGER au_books AFTER UPDATE ON books
FOR EACH ROW
BEGIN
    IF (OLD.status <> NEW.status)
    THEN
        UPDATE books_cnt SET total = total + IF(status = NEW.status, 1, -1) WHERE status IN (OLD.status, NEW.status);
    END IF;
END
//

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...