Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
732 views
in Technique[技术] by (71.8m points)

mysql - B-Tree vs Hash Table

In MySQL, an index type is a b-tree, and access an element in a b-tree is in logarithmic amortized time O(log(n)).

On the other hand, accessing an element in a hash table is in O(1).

Why is a hash table not used instead of a b-tree in order to access data inside a database?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can only access elements by their primary key in a hashtable. This is faster than with a tree algorithm (O(1) instead of log(n)), but you cannot select ranges (everything in between x and y). Tree algorithms support this in Log(n) whereas hash indexes can result in a full table scan O(n). Also the constant overhead of hash indexes is usually bigger (which is no factor in theta notation, but it still exists). Also tree algorithms are usually easier to maintain, grow with data, scale, etc.

Hash indexes work with pre-defined hash sizes, so you end up with some "buckets" where the objects are stored in. These objects are looped over again to really find the right one inside this partition.

So if you have small sizes you have a lot of overhead for small elements, big sizes result in further scanning.

Todays hash tables algorithms usually scale, but scaling can be inefficient.

There are indeed scalable hashing algorithms. Don't ask me how that works - its a mystery to me too. AFAIK they evolved from scalable replication where re-hashing is not easy.

Its called RUSH - Replication Under Scalable Hashing, and those algorithms are thus called RUSH algorithms.

However there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be re-built. Usually this is not a problem, but for huge-huge-huge databases, this can take days.

The trade off for tree algorithms is small and they are suitable for almost every use case and thus are default.

However if you have a very precise use case and you know exactly what and only what is going to be needed, you can take advantage of hashing indexes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...