Open-source software name: pafka
Open-source software address: https://gitee.com/paradigm4/pafka
Open-source software introduction:

Pafka: Persistent Memory (PMem) Accelerated Kafka

1. Introduction

Pafka is an evolved version of Apache Kafka developed by MemArk. Kafka is an open-source distributed event streaming/message queue system for handling real-time data feeds efficiently and reliably. However, its performance (e.g., throughput and latency) is constrained by slow external storage. Pafka enhances Kafka with a tiered storage architecture, typically equipped with high-performance SSDs or Intel® Optane™ Persistent Memory (PMem). With carefully designed data migration algorithms, it improves overall persistence performance at low cost. For example, it handles well the scenario in which a high data production rate recurs at regular intervals (e.g., a shopping website releases a special discount every hour); it can also improve overall performance when high throughput is required over a long period.

Please refer to our latest blog for Pafka benchmarks and use cases: English (中文)

2. Architecture

The basic idea behind Pafka is to use a tiered storage architecture to enhance the overall performance of a system. Nowadays, a data center may have various kinds of storage devices, such as HDD, SSD, and state-of-the-art non-volatile persistent memory. However, Kafka is not aware of this storage hierarchy. In this project, we enhance Kafka by using a high-performance storage device, e.g., PMem, as the first layer of storage, together with carefully designed migration algorithms, to significantly improve overall performance for specific scenarios.

The key challenge in taking advantage of tiered storage is designing the data partitioning and migration mechanism between the fast and slow devices. The overall architecture and key workflow of Pafka are illustrated in the figure below. Basically, data is written onto PMem when there is available space, otherwise onto HDD.
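The placement rule described above (write to the fast tier while it has room, otherwise fall back to the slow tier) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pafka's actual implementation: the `Tier` class and `place_segment` function are hypothetical names, sizes are in arbitrary byte units, and the slow tier is assumed to always have enough space.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """Hypothetical model of one storage layer (not a Pafka class)."""
    name: str
    capacity: int  # total bytes available on this tier
    used: int = 0  # bytes already occupied

    def free(self) -> int:
        return self.capacity - self.used


def place_segment(size: int, fast: Tier, slow: Tier) -> Tier:
    """Choose the fast tier if it can hold `size` bytes, else the slow tier.

    Assumption: the slow tier is large enough, so it is never checked.
    """
    target = fast if fast.free() >= size else slow
    target.used += size
    return target


if __name__ == "__main__":
    pmem = Tier("PMem", capacity=100)
    hdd = Tier("HDD", capacity=1000)
    # Three 40-byte segments: the third no longer fits on the fast tier.
    print([place_segment(40, pmem, hdd).name for _ in range(3)])
    # prints ['PMem', 'PMem', 'HDD']
```

A real broker would also have to handle concurrent writers and migration running in the background; the sketch covers only the first-write decision.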
Besides, there is a background migration task to balance the data between PMem and HDD. Specifically, a new parameter

3. Get Started

For the complete documentation of Kafka, refer to the official Kafka documentation.

3.1. Docker Image

The easiest way to try Pafka is to use the docker image: https://hub.docker.com/r/4pdopensource/pafka-dev

    docker run -it 4pdopensource/pafka-dev bash

If you use the docker image, you can skip the following

3.2. Compile

3.2.1. Dependencies
We have already shipped pcj and llpl jars in

After cloning the source code:

    # compile pcj
    cd pcj
    make && make jar
    cp target/pcj.jar $PAFKA_HOME/libs
    # compile llpl
    cd llpl
    make && make jar
    cp target/llpl.jar $PAFKA_HOME/libs

3.2.2. Build Pafka Jar

    ./gradlew jar

3.3. Run

3.3.1. Environmental Setup

To verify the correctness, you can use any file systems with normal hard disks. To take advantage of tiered storage architecture, it requires the availability of PMem hardware mounted as a DAX file system.

3.3.2. Config

In order to support tiered storage, we add some more config fields to the Kafka server config.
Sample config in config/server.properties is as follows:

    ######## start of tiered storage config ########
    # log file channel type; Options: "file", "pmem", "tiered".
    # if "file": use normal file as vanilla Kafka does. Following configs are not applicable.
    log.channel.type=tiered
    # the storage types for each layers (separated by ,)
    storage.tiers.types=NVME,HDD
    # first-layer storage paths (separated by ,)
    storage.tiers.first.paths=/nvme
    # first-layer storage capacities in bytes (separated by ,); -1 means use all the space
    storage.tiers.first.sizes=-1
    # second-layer storage paths (separated by ,)
    storage.tiers.second.paths=/hdd
    # threshold to control when to start the migration; -1 means no migration.
    storage.migrate.threshold=0.5
    # migration threads
    storage.migrate.threads=1
    # pmem-specific config
    # pre-allocated pool ratio
    log.pmem.pool.ratio=0.8
    # log.preallocate have to set to true if pmem is used
    log.preallocate=true
    ######## end of tiered storage config ########

3.3.3. Start Pafka

Follow instructions in https://kafka.apache.org/quickstart. Basically:

    bin/zookeeper-server-start.sh config/zookeeper.properties > zk.log 2>&1 &
    bin/kafka-server-start.sh config/server.properties > pafka.log 2>&1 &

Benchmark Pafka

Producer

Single Client

    # bin/kafka-producer-perf-test.sh --topic $TOPIC --throughput $MAX_THROUGHPUT --num-records $NUM_RECORDS --record-size $RECORD_SIZE --producer.config config/producer.properties --producer-props bootstrap.servers=$BROKER_IP:$PORT
    bin/kafka-producer-perf-test.sh --topic test --throughput 1000000 --num-records 1000000 --record-size 1024 --producer.config config/producer.properties --producer-props bootstrap.servers=localhost:9092

Multiple Clients

We provide a script to let you run multiple clients on multiple hosts. For example, if you want to run 16 producers in each of the hosts,

    bin/bench.py --threads 16 --hosts "node-1 node-2" --num_records 100000000 --type producer

In total, there are 32 clients, which will generate 100000000 records.
Each client is responsible for populating one topic.
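The client count in the example above follows from simple arithmetic: 16 threads on each of 2 hosts gives 32 clients. The sketch below reproduces that calculation. It is a hypothetical helper, not part of `bin/bench.py`, and it assumes the total record count is split evenly across clients, which the text does not state explicitly.

```python
def bench_plan(threads_per_host: int, hosts: str, num_records: int) -> dict:
    """Hypothetical helper: derive client count and per-client record share.

    `hosts` is a space-separated host list, mirroring the --hosts argument.
    Assumption: records are divided evenly among clients.
    """
    host_list = hosts.split()
    clients = threads_per_host * len(host_list)
    per_client, remainder = divmod(num_records, clients)
    return {
        "clients": clients,
        "records_per_client": per_client,
        "remainder": remainder,  # records left over if the split is uneven
    }


if __name__ == "__main__":
    plan = bench_plan(16, "node-1 node-2", 100000000)
    print(plan)
    # prints {'clients': 32, 'records_per_client': 3125000, 'remainder': 0}
```

Since each client populates one topic, the same number (32 here) is also the number of topics the run touches.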
You can run

Consumer

Single Client

    # bin/kafka-consumer-perf-test.sh --topic $TOPIC --consumer.config config/consumer.properties --bootstrap-server $BROKER_IP:$PORT --messages $NUM_RECORDS --show-detailed-stats --reporting-interval $REPORT_INTERVAL --timeout $TIMEOUT_IN_MS
    bin/kafka-consumer-perf-test.sh --topic test --consumer.config config/consumer.properties --bootstrap-server localhost:9092 --messages 1000000 --show-detailed-stats --reporting-interval 1000 --timeout 100000

Multiple Clients

Similarly, you can use the same script as producer benchmark to launch multiple clients.

    bin/bench.py --threads 16 --hosts "node-1 node-2" --num_records 100000000 --type consumer

4. Limitations
5. Roadmap
6. Community

Pafka is developed by MemArk (https://memark.io/en), which is a tech community focusing on leveraging modern storage architecture for system enhancement. MemArk is led by 4Paradigm (https://www.4paradigm.com/) and other sponsors (such as Intel). Please join our community for:
You can also contact the MemArk community for any feedback: [email protected]