Open source software name: stream-dm
Open source software URL: https://gitee.com/mirrors/stream-dm
Open source software introduction:

streamDM for Spark Streaming

streamDM is new open source software for mining big data streams using Spark Streaming, started at Huawei Noah's Ark Lab. streamDM is licensed under Apache Software License v2.0.

Big Data Stream Learning

Big Data stream learning is more challenging than batch or offline learning, since the data may not keep the same distribution over the lifetime of the stream. Moreover, each example arriving in a stream can be processed only once, or it must be summarized with a small memory footprint, and the learning algorithms must be very efficient.

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is an extensible and programmable framework for massive distributed processing of datasets, called Resilient Distributed Datasets (RDD). Spark Streaming receives input data streams and divides the data into batches, which are then processed by the Spark engine to generate the results.

Spark Streaming data is organized into DStreams, each represented internally as a sequence of RDDs.

Included Methods

In the current release, StreamDM v0.2, we have implemented:

We have also implemented the following data generators:
We have also implemented SampleDataWriter, which can call data generators to create sample data for simulation or testing.

In the next release of streamDM, we are going to add:
For future work, we are considering:
Going Further

For a quick introduction to running StreamDM, refer to the Getting Started document. The StreamDM Programming Guide presents a detailed view of StreamDM. The full API documentation can be consulted here.

Environment
Mailing lists

User support and questions mailing list:
Development related discussions: