1、MySQL中如何提高字符串查询效率?
对字符串加索引?一般情况下,是不建议在字符串加索引,占空间
如果一定要加,建议可以指定长度,前提是字符串前面部分区分度好的话,此时这类索引就叫前缀索引
2、前缀索引的问题
区分度不好的话,很容易发生碰撞,进而引发一系列问题。
前缀索引和普通索引在只有where条件、order by和group by不同执行情况,可以看到Extra的说明,前缀索引只有where条件,无法使用覆盖索引,order by会使用filesort,group by会使用temporary和filesort
总的来说,前缀索引无法使用覆盖索引,进而导致order by和group by要使用文件排序,甚至临时表
3、字符串加不加索引的测试
1、不加索引
查询的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’;
Benchmark
Average number of seconds to run all queries: 8.328 seconds
Minimum number of seconds to run all queries: 8.328 seconds
Maximum number of seconds to run all queries: 8.328 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 18.078 seconds
Minimum number of seconds to run all queries: 18.078 seconds
Maximum number of seconds to run all queries: 18.078 seconds
Number of clients running queries: 200
Average number of queries per client: 0
2、加字符串索引
alter table string_index_test add index idx_name (name) USING BTREE;
查询的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’;
Benchmark
Average number of seconds to run all queries: 0.250 seconds
Minimum number of seconds to run all queries: 0.250 seconds
Maximum number of seconds to run all queries: 0.250 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 1.438 seconds
Minimum number of seconds to run all queries: 1.438 seconds
Maximum number of seconds to run all queries: 1.438 seconds
Number of clients running queries: 200
Average number of queries per client: 0
4、使用CRC32创建索引
CRC全称为Cyclic Redundancy Check,又叫循环冗余校验。
CRC32是CRC算法的一种,返回值的范围0~2^32-1,使用bigint存储
加一个name_crc32列,创建这个列的所有,索引空间小很多,利用整型加速查询
加索引:alter table string_index_test add index idx_nam_crc32 (name_crc32) USING BTREE;
查询的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name_crc32=CRC32(‘forlan’) and name=‘forlan’;
因为CRC32存在发生碰撞,所以加上name条件,才能筛选出正确的数据
Benchmark
Average number of seconds to run all queries: 0.266 seconds
Minimum number of seconds to run all queries: 0.266 seconds
Maximum number of seconds to run all queries: 0.266 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 0.390 seconds
Minimum number of seconds to run all queries: 0.390 seconds
Maximum number of seconds to run all queries: 0.390 seconds
Number of clients running queries: 200
Average number of queries per client: 0
总结:
- 通过对字符串加索引,可以提高查询效率,但需要注意指定长度,无法使用覆盖索引
- 通过使用CRC32,需要额外存一个字段,将字符串转为整数存储,节省空间,效率提升并不是很大,但存在碰撞问题,可以加多字符串筛选条件
- 对于CRC32存在碰撞问题,可以使用CRC64减少碰撞,但需要安装 common_schema database函数库
发表评论
取消回复