Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
447 views
in Technique[技术] by (71.8m points)

indexing - MySQL index usage query optimization

I have the following MySQL (MyISAM) table with about 3 Million rows.

CREATE TABLE `tasks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `node` smallint(6) NOT NULL,
  `pid` int(11) NOT NULL,
  `job` int(11) NOT NULL,
  `a_id` int(11) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `state` int(11) NOT NULL,
  `start_time` int(11) NOT NULL,
  `end_time` int(11) NOT NULL,
  `stop_time` int(11) NOT NULL,
  `end_stream` int(11) NOT NULL,
  `message` varchar(255) DEFAULT NULL,
  `rate` float NOT NULL,
  `exiting` int(11) NOT NULL DEFAULT '0',
  `bytes` int(11) NOT NULL,
  `motion` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `a_id` (`a_id`),
  KEY `job` (`job`),
  KEY `state` (`state`),
  KEY `end_time` (`end_time`),
  KEY `start_time` (`start_time`),
) ENGINE=MyISAM AUTO_INCREMENT=100 DEFAULT CHARSET=utf8;

Now when I run the following query, MySQL is only using the a_id index and needs to scan a few thousand rows.

SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' ) 
AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';

When I add an additional index KEY newkey (a_id,state,start_time), MySQL is still trying to use a_id only and not newkey. Only when using the hint / force index in the query, it's been used. Changing the fields in the query around does not help.

Any ideas? I don't necessarily want hints in my statements. The fact that MySQL is not doing this automatically indicates to me that there is an issue with my table, keys or query somewhere. Any help is highly appreciated.

Additional info:

mysql> show index in tasks;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasks |          0 | PRIMARY   |            1 | id          | A         |     3130554 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | a_id      |            1 | a_id        | A         |        2992 |     NULL | NULL   | YES  | BTREE      |         |               |
| tasks |          1 | job       |            1 | job         | A         |           5 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | state     |            1 | state       | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | end_time  |            1 | end_time    | A         |     1565277 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | newkey    |            1 | a_id        | A         |        2992 |     NULL | NULL   | YES  | BTREE      |         |               |
| tasks |          1 | newkey    |            2 | state       | A         |        8506 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | newkey    |            3 | start_time  | A         |     3130554 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

EXPLAIN with and without quotes:

mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' )  AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys              | key       | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | tasks | ref  | a_id,job,state,newkey      | a_id      | 5       | const |  740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.10 sec)

mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job=1 OR job=3 )  AND a_id = 614 AND state > 80 AND state < 100 AND start_time >= 1386538013;
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys              | key       | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | tasks | ref  | a_id,job,state,newkey      | a_id      | 5       | const |  740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.01 sec)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A few things... I would have a SINGLE compound index on ( a_id, job, state, start_time )

This to help optimize the query on all the criteria, in what I believe is the best tuned sequence. A single "A_ID", then two jobs, a small state range, then time based. Next, notice no quotes... It appears you were converting numeric to string comparisons, leave them as numeric for compare -- faster than strings.

Also, by having them all as part of the index, it is a COVERING index meaning it does NOT have to go to the raw page data to get the other values to test the qualifying records to include or not.

SELECT 
      count(*) AS tries 
   FROM 
      tasks
   WHERE 
          a_id = 614
      AND job IN ( 1, 3 ) 
      AND state > 80 AND state < 100 
      AND start_time >= 1386538013;

Now, the why the index... consider the following scenario. You have two rooms that have boxes... In the first room, each box is an "a_id", within that are the jobs in order, within each job are the state ranges, and finally by start time.

In another room, your boxes are sorted by start time, within that a_id are sorted, and finally state.

Which would be easier to find what you need. That is how you should think on the indexes. I would rather go to one box for "A_ID = 614", then jump to Job 1 and another for Job 3. Within each Job 1, Job 3, grab 80-100, then time. You however know better your data and volume in each criteria consideration and may adjust.

Finally, the count(ID) vs count(*). All I care about is a record qualified. I don't need to know the actual ID as the filtering criteria already qualified as include or not, why look (in this case) for the actual "ID".


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...