• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

johnmyleswhite/BloomFilters.jl: Bloom filters in Julia

原作者: [db:作者] 来自: 网络 收藏 邀请

开源软件名称:

johnmyleswhite/BloomFilters.jl

开源软件地址:

https://github.com/johnmyleswhite/BloomFilters.jl

开源编程语言:

Julia 100.0%

开源软件介绍:

BloomFilters.jl

Build Status

Please note: The API for version 0.1.0 of this package has changed substantially from version 0.0.1. Please review the below README and accompanying library documentation.

Bloom filter implementation in Julia. Supports insertion (add!) and probabilistic membership queries (in) for sets using an in-memory or mmap'd bit array. When an element x is inserted into a Bloom filter, set membership queries will always correctly return true for x (i.e., there are no false negatives). Bloom filter membership queries do, however, occassionally return true for a y not inserted into the data structure (i.e., false positives are possible). With this cost comes remarkable space efficiency: Bloom filters can store set membership using only 10 bits per element at a 1.00% error rate or 20 bits per element at a 0.01% error rate. This space requirement is irrespective of the size or length of the inserted elements (e.g., one can store a set of URLs using only a handful of bits per URL).

Forthcoming features include:

  • Better persistence (only the bit array is automatically saved when using the mmap-backed version; parameters must be separately stored or hard-coded)
  • Support for arbitrary numbers of hash functions (k > 12)

Quickstart

Quick functionality demo:

using BloomFilters


bf = BloomFilter(1000, 0.001)  # Create an in-memory Bloom filter
							   # with a capacity of 1K elements and
							   # expected error rate of 0.1%

add!(bf, "My first element.")
in("My first element.", bf)   # Returns true
in("My second element.", bf)  # Returns false
"My first element." in bf           # Returns true
show(bf)

# Prints:
# Bloom filter with capacity 1000, error rate of 0.10, and k of 10.
# Total bits required: 15000 (15.0 / element).
# Bloom filter is in-memory.

By default, the Bloom filter will be constructed using an optimal number of k hash functions, which minimizes the expected false positive rate per required bit of storage. In many cases, however, it may be advantegous to specify a smaller value of k in order to save time hashing and performing subsequent memory accesses. Alternatively, one may want to explicitly set the number of bits to use per element and k, rather than constructing the filter by passing a target error rate.

The BloomFilters module supports all 3 of these patterns:

# Constructor pattern #1: Pass capacity and error rate
bf1 = BloomFilter(1000, 0.001)

# Constructor pattern #2: Pass capacity, error rate, and k
bf2 = BloomFilter(1000, 0.001, 6)  # vs. the optimal number of 10 if not specified as in bf1

# Constructor pattern #3: Pass capacity, bits per element, and k, but not the error rate
bf3 = BloomFilter(1000, 16, 6)  # Same as bf2, but will *not* compute the error rate and display NaN when show() is called

In addition to this, basic persistence support is provided via optionally using an mmap-backed bit array. This works with each of the above methods by either passing a string of the mmap filepath or an IOStream:

### With an IOStream
# Note that "w+" needs to be passed as the second argument to open() if creating
# a new mmap-backed Bloom filter, or "r+" if opening an already created one
bf4 = BloomFilter(open("/tmp/small_bloom_filter.array", "w+"), 1000, 0.001)

### With a string filepath
# Note this creates the bit array if it doesn't exist or opens it if previously created
bf5 = BloomFilter("/tmp/small_bloom_filter_two.array", 1000, 0.001)

show(bf5)

# Prints:
# Bloom filter with capacity 1000, error rate of 0.10, and k of 10.
# Total bits required: 15000 (15.0 / element).
# Bloom filter is backed by mmap at <file /tmp/small_bloom_filter_two.array>.



鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap