Project: thomasnield/kotlin-statistics
Repository: https://github.com/thomasnield/kotlin-statistics
Language: Kotlin 100.0%

Kotlin Statistics

NOTE: UNSUPPORTED. PLEASE FORK AND SUPPORT.

Idiomatic math and statistical extensions for Kotlin

This library contains helpful extension functions to perform exploratory and production statistics in a Kotlin-idiomatic way. Read the introductory blog post here.

Community

Join the #datascience community on Kotlin Slack for community discussion on this library as well as Kotlin for data science.

Build Instructions

You can use Gradle or Maven to pull the latest release from Maven Central. You can also use Maven or Gradle with JitPack to directly build a snapshot as a dependency.

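Basic Operators

There are a number of extension function operators for basic statistics on collections of numbers. Here is an example of using the median() operator:

val median = sequenceOf(1.0, 3.0, 5.0).median()
println(median) // prints "3.0"

Slicing Operators

There are also simple but powerful slicing operators, such as sumBy(), averageBy(), and standardDeviationBy(), which aggregate values by a key. Below, we slice a sequence of Item objects by the length of each name:

class Item(val name: String, val value: Double)
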
val sequence = sequenceOf(
    Item("Alpha", 4.0),
    Item("Beta", 6.0),
    Item("Gamma", 7.2),
    Item("Delta", 9.2),
    Item("Epsilon", 6.8),
    Item("Zeta", 2.4),
    Item("Iota", 8.8)
)

// find sums by name length, using pairs or functional arguments
val sumsByLengthsFromPairs = sequence
    .map { it.name.length to it.value }
    .sumBy()

val sumsByLengths = sequence
    .sumBy(keySelector = { it.name.length }, doubleSelector = { it.value })

println("Sums by lengths: $sumsByLengths")

// find averages by name length, using pairs or functional arguments
val averagesByLengthFromPairs = sequence
    .map { it.name.length to it.value }
    .averageBy()

val averagesByLength = sequence
    .averageBy(keySelector = { it.name.length }, doubleSelector = { it.value })

// find standard deviations by name length, using pairs or functional arguments
val standardDeviationsByLengthFromPairs = sequence
    .map { it.name.length to it.value }
    .standardDeviationBy()

val standardDeviationsByLength = sequence
    .standardDeviationBy(keySelector = { it.name.length }, valueSelector = { it.value })

println("Std Devs by lengths: $standardDeviationsByLength")

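The "slice by key, then aggregate" pattern used above can be sketched with the Kotlin standard library alone. This is only an illustration of the idea, not the library's code: `aggregateBy` is a hypothetical helper name, and the sample values are chosen for the sketch.

```kotlin
data class Item(val name: String, val value: Double)

// Hypothetical helper (not part of kotlin-statistics): group the selected
// values by key, then reduce each group to a single number.
fun <T, K> Iterable<T>.aggregateBy(
    keySelector: (T) -> K,
    valueSelector: (T) -> Double,
    aggregate: (List<Double>) -> Double
): Map<K, Double> =
    groupBy(keySelector, valueSelector).mapValues { (_, values) -> aggregate(values) }

fun main() {
    val items = listOf(
        Item("Alpha", 4.0),
        Item("Beta", 6.0),
        Item("Gamma", 8.0),
        Item("Zeta", 2.0)
    )

    // Sum and average of the values, keyed by name length.
    val sums = items.aggregateBy({ it.name.length }, { it.value }) { it.sum() }
    val averages = items.aggregateBy({ it.name.length }, { it.value }) { it.average() }

    println("Sums by lengths: $sums")
    println("Averages by lengths: $averages")
}
```

Any reduction over a list of doubles can be passed as the last argument, which is why a single grouping step can back several different operators.
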
These slicing operators share a common underlying implementation.

Slicing Using Data Classes

You can slice on multiple fields by using a data class as the key.

// declare Product class
class Product(
    val id: Int,
    val name: String,
    val category: String,
    val section: Int,
    val defectRate: Double
)

// Create list of Products
val products = listOf(
    Product(1, "Rayzeon", "ABR", 3, 1.1),
    Product(2, "ZenFire", "ABZ", 4, 0.7),
    Product(3, "HydroFlux", "ABR", 3, 1.9),
    Product(4, "IceFlyer", "ZBN", 1, 2.4),
    Product(5, "FireCoyote", "ABZ", 4, 3.2),
    Product(6, "LightFiber", "ABZ", 2, 5.1),
    Product(7, "PyroKit", "ABR", 3, 1.4),
    Product(8, "BladeKit", "ZBN", 1, 0.5),
    Product(9, "NightHawk", "ZBN", 1, 3.5),
    Product(10, "NoctoSquirrel", "ABR", 2, 1.1),
    Product(11, "WolverinePack", "ABR", 3, 1.2)
)

// Data Class for Grouping
data class Key(val category: String, val section: Int)

// Get Count by Category and Section
val countByCategoryAndSection =
    products.countBy { Key(it.category, it.section) }

println("Counts by Category and Section")
countByCategoryAndSection.entries.forEach { println(it) }

// Get Average Defect Rate by Category and Section
val averageDefectByCategoryAndSection =
    products.averageBy(keySelector = { Key(it.category, it.section) }, doubleSelector = { it.defectRate })

println("\nAverage Defect Rate by Category and Section")
averageDefectByCategoryAndSection.entries.forEach { println(it) }

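The key is a data class because data classes get structural `equals()` and `hashCode()`, so two keys with the same field values land in the same group. The standard-library sketch below illustrates the difference; `PlainKey` is a hypothetical class added only for contrast.

```kotlin
data class Key(val category: String, val section: Int)   // structural equality
class PlainKey(val category: String, val section: Int)   // identity equality only

fun main() {
    val rows = listOf("ABR" to 3, "ABZ" to 4, "ABR" to 3)

    val byDataClass = rows
        .groupingBy { (category, section) -> Key(category, section) }
        .eachCount()
    val byPlainClass = rows
        .groupingBy { (category, section) -> PlainKey(category, section) }
        .eachCount()

    println(byDataClass.size)  // 2: both ("ABR", 3) rows share one key
    println(byPlainClass.size) // 3: every PlainKey instance is its own group
}
```
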
Slicing by Ranges/Bins

You can also group by ranges (known in statistics as "bins", which together form a "histogram").

Slicing By Numbers

There are specialized bin operators that deal with numeric ranges, such as binByDouble(), used below.

import java.time.LocalDate

fun main(args: Array<String>) {

    data class Sale(val accountId: Int, val date: LocalDate, val value: Double)

    val sales = listOf(
        Sale(1, LocalDate.of(2016, 12, 3), 180.0),
        Sale(2, LocalDate.of(2016, 7, 4), 140.2),
        Sale(3, LocalDate.of(2016, 6, 3), 111.4),
        Sale(4, LocalDate.of(2016, 1, 5), 192.7),
        Sale(5, LocalDate.of(2016, 5, 4), 137.9),
        Sale(6, LocalDate.of(2016, 3, 6), 125.6),
        Sale(7, LocalDate.of(2016, 12, 4), 164.3),
        Sale(8, LocalDate.of(2016, 7, 11), 144.2)
    )

    // bin by double ranges
    val binned = sales.binByDouble(
        valueSelector = { it.value },
        binSize = 20.0,
        rangeStart = 100.0
    )

    binned.forEach(::println)
}

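As a rough picture of what fixed-width numeric binning involves, here is a standard-library sketch. It assumes half-open bins of the form [lower, lower + binSize); how the library treats bin boundaries is not stated above, and `binIndex` is a hypothetical helper.

```kotlin
import kotlin.math.floor

// Hypothetical helper: index of the fixed-width bin that contains `value`,
// counting from the bin that starts at `rangeStart`.
fun binIndex(value: Double, binSize: Double, rangeStart: Double): Int =
    floor((value - rangeStart) / binSize).toInt()

fun main() {
    val values = listOf(180.0, 140.2, 111.4, 192.7, 137.9, 125.6, 164.3, 144.2)
    val binSize = 20.0
    val rangeStart = 100.0

    // Group the values by bin index and order the bins from lowest to highest.
    val bins = values
        .groupBy { binIndex(it, binSize, rangeStart) }
        .toSortedMap()

    for ((index, members) in bins) {
        val lower = rangeStart + index * binSize
        println("[$lower, ${lower + binSize}) -> $members")
    }
}
```
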
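Slicing by Comparables

You can group items by any Comparable value with binByComparable(). The example below bins sales into quarters by month.

import java.time.LocalDate
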
fun main(args: Array<String>) {

    data class Sale(val accountId: Int, val date: LocalDate, val value: Double)

    val sales = listOf(
        Sale(1, LocalDate.of(2016, 12, 3), 180.0),
        Sale(2, LocalDate.of(2016, 7, 4), 140.2),
        Sale(3, LocalDate.of(2016, 6, 3), 111.4),
        Sale(4, LocalDate.of(2016, 1, 5), 192.7),
        Sale(5, LocalDate.of(2016, 5, 4), 137.9),
        Sale(6, LocalDate.of(2016, 3, 6), 125.6),
        Sale(7, LocalDate.of(2016, 12, 4), 164.3),
        Sale(8, LocalDate.of(2016, 7, 11), 144.2)
    )

    // bin by quarter
    val byQuarter = sales.binByComparable(
        valueSelector = { it.date.month },
        binIncrements = 3,
        incrementer = { it.plus(1L) }
    )

    byQuarter.forEach(::println)
}

Custom Binning Operations

If you want to perform a mathematical aggregation on a certain property of the items in each bin (rather than group up the items into a list), pass a groupOp argument that reduces each bin's items to a single value:

import java.time.LocalDate

fun main(args: Array<String>) {

    data class Sale(val accountId: Int, val date: LocalDate, val value: Double)

    val sales = listOf(
        Sale(1, LocalDate.of(2016, 12, 3), 180.0),
        Sale(2, LocalDate.of(2016, 7, 4), 140.2),
        Sale(3, LocalDate.of(2016, 6, 3), 111.4),
        Sale(4, LocalDate.of(2016, 1, 5), 192.7),
        Sale(5, LocalDate.of(2016, 5, 4), 137.9),
        Sale(6, LocalDate.of(2016, 3, 6), 125.6),
        Sale(7, LocalDate.of(2016, 12, 4), 164.3),
        Sale(8, LocalDate.of(2016, 7, 11), 144.2)
    )

    // bin sums by quarter
    val totalValueByQuarter = sales.binByComparable(
        valueSelector = { it.date.month },
        binIncrements = 3,
        incrementer = { it.plus(1L) },
        groupOp = { it.map(Sale::value).sum() }
    )

    totalValueByQuarter.forEach(::println)
}

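For comparison, the same "aggregate per quarter" idea can be sketched with the standard library's `groupingBy` and `fold`, which accumulate each bin's total without building intermediate lists. `quarterOf` is a hypothetical helper and the sale values are simplified for the sketch.

```kotlin
import java.time.LocalDate

data class Sale(val accountId: Int, val date: LocalDate, val value: Double)

// Hypothetical helper: quarter number (1..4) of a date.
// Months 1-3 map to 1, months 4-6 to 2, and so on.
fun quarterOf(date: LocalDate): Int = (date.monthValue - 1) / 3 + 1

fun main() {
    val sales = listOf(
        Sale(1, LocalDate.of(2016, 12, 3), 180.0),
        Sale(2, LocalDate.of(2016, 7, 4), 140.0),
        Sale(3, LocalDate.of(2016, 6, 3), 111.0),
        Sale(4, LocalDate.of(2016, 1, 5), 192.0)
    )

    // Accumulate a running total per quarter, then order by quarter.
    val totalValueByQuarter = sales
        .groupingBy { quarterOf(it.date) }
        .fold(0.0) { total, sale -> total + sale.value }
        .toSortedMap()

    println(totalValueByQuarter)
}
```
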
Random Selection

Kotlin-Statistics has a few helpful extensions to randomly sample elements from a collection.

Weighted Coin/Dice - Discrete PDF Sampling

Rather than do a pure random sampling, there may be times you want different values to be selected with different probabilities. A WeightedCoin is created with the probability of returning true:

val riggedCoin = WeightedCoin(trueProbability = .80)

// flip coin 100000 times and print outcome counts
(1..100000).asSequence().map { riggedCoin.flip() }
    .countBy()
    .also {
        println(it)
    }

You can use WeightedDice to assign a probability to each of several outcomes:

val threeSidedDice = WeightedDice(
    "A" to .11,
    "B" to .66,
    "C" to .22
)

// roll dice 1000 times and print outcome counts
(1..1000).asSequence().map { threeSidedDice.roll() }
    .countBy()
    .also {
        println(it)
    }

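One common way to implement this kind of weighted sampling is to draw a uniform random number and walk the cumulative weights until the draw is covered. The sketch below shows that technique with a hypothetical `SimpleWeightedDice` class; it is not the library's WeightedDice, and it assumes weights are normalized by their sum, which the text above does not confirm for the library.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for a weighted dice; not the library's WeightedDice.
class SimpleWeightedDice<T>(private val weights: List<Pair<T, Double>>) {

    init {
        require(weights.isNotEmpty()) { "at least one outcome is required" }
        require(weights.all { it.second >= 0.0 }) { "weights must be non-negative" }
        require(weights.sumOf { it.second } > 0.0) { "weights must not all be zero" }
    }

    // Assumption: weights are relative, so they do not have to sum to 1.
    private val total = weights.sumOf { it.second }

    // Draw a point in [0, total) and walk the cumulative weights until it is covered.
    fun roll(random: Random = Random.Default): T {
        var remaining = random.nextDouble() * total
        for ((outcome, weight) in weights) {
            remaining -= weight
            if (remaining < 0.0) return outcome
        }
        // Only reachable through floating-point round-off.
        return weights.last { it.second > 0.0 }.first
    }
}

fun main() {
    val dice = SimpleWeightedDice(listOf("A" to 0.11, "B" to 0.66, "C" to 0.22))
    val rng = Random(42) // fixed seed so repeated runs give the same counts

    val counts = (1..1000)
        .map { dice.roll(rng) }
        .groupingBy { it }
        .eachCount()

    println(counts)
}
```

A weighted coin is the two-outcome special case: `true` with weight p and `false` with weight 1 - p.
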
Typically, the outcomes of a WeightedDice are best represented with an enum:

enum class Move {
    ATTACK,
    DEFEND,
    HEAL,
    RETREAT
}

fun main(args: Array<String>) {

    val gameDice = WeightedDice(
        Move.ATTACK to .60,
        Move.DEFEND to .20,
        Move.HEAL to .10,
        Move.RETREAT to .10
    )

    val nextMove = gameDice.roll()

    println(nextMove)
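}

Naive Bayes Classifier

A Naive Bayes classifier is trained on items that each have a set of features and a category. You can then test a new set of features to predict its category.

For instance, say you want to identify email as spam/not spam based on the words in the messages. In this case, the words are the features and spam/not spam is the category. In idiomatic Kotlin fashion, we can take a simple list of Email objects and call toNaiveBayesClassifier() on it:

class Email(val message: String, val isSpam: Boolean)
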
val emails = listOf(
    Email("Hey there! I thought you might find this interesting. Click here.", isSpam = true),
    Email("Get viagra for a discount as much as 90%", isSpam = true),
    Email("Viagra prescription for less", isSpam = true),
    Email("Even better than Viagra, try this new prescription drug", isSpam = true),

    Email("Hey, I left my phone at home. Email me if you need anything. I'll be in a meeting for the afternoon.", isSpam = false),
    Email("Please see attachment for notes on today's meeting. Interesting findings on your market research.", isSpam = false),
    Email("An item on your Amazon wish list received a discount", isSpam = false),
    Email("Your prescription drug order is ready", isSpam = false),
    Email("Your Amazon account password has been reset", isSpam = false),
    Email("Your Amazon order", isSpam = false)
)

val nbc = emails.toNaiveBayesClassifier(
    featuresSelector = { it.message.splitWords().toSet() },
    categorySelector = { it.isSpam }
)

// split a message into lowercase words, stripping non-letter characters
fun String.splitWords() = split(Regex("\\s")).asSequence()
    .map { it.replace(Regex("[^A-Za-z]"), "").toLowerCase() }
    .filter { it.isNotEmpty() }

We can then use this classifier to predict the category of a new message:

// TEST 1
val input = "discount viagra wholesale, hurry while this offer lasts".splitWords().toSet()
val predictedCategory = nbc.predict(input)
Assert.assertTrue(predictedCategory == true)

// TEST 2
val input2 = "interesting meeting on amazon cloud services discount program".splitWords().toSet()
val predictedCategory2 = nbc.predict(input2)
Assert.assertTrue(predictedCategory2 == false)

Here is another example that categorizes bank transactions.

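Separately from the examples above, the following is a minimal sketch of how a naive Bayes prediction can be computed, to make the feature/category idea concrete. `TinyNaiveBayes` is a hypothetical class, not the library's classifier: it scores only the features present in the input, uses add-one smoothing, and works in log space. The library's actual formula may differ.

```kotlin
import kotlin.math.ln

// Hypothetical, simplified classifier; not the library's implementation.
class TinyNaiveBayes<F, C>(observations: List<Pair<Set<F>, C>>) {

    private val total = observations.size

    // Number of training observations per category.
    private val countByCategory: Map<C, Int> =
        observations.groupingBy { it.second }.eachCount()

    // For each category, how many of its observations contained each feature.
    private val featureCounts: Map<C, Map<F, Int>> =
        observations.groupBy({ it.second }, { it.first })
            .mapValues { (_, featureSets) -> featureSets.flatten().groupingBy { it }.eachCount() }

    // log P(category) + sum of log P(feature | category), with add-one smoothing.
    private fun logScore(features: Set<F>, category: C): Double {
        val n = countByCategory.getValue(category)
        val prior = ln(n.toDouble() / total)
        val likelihood = features.sumOf { feature ->
            val hits = featureCounts.getValue(category)[feature] ?: 0
            ln((hits + 1.0) / (n + 2.0))
        }
        return prior + likelihood
    }

    // Category with the highest score, or null if there was no training data.
    fun predict(features: Set<F>): C? =
        countByCategory.keys.maxByOrNull { logScore(features, it) }
}

fun main() {
    // Each observation is a set of words paired with an "is spam" flag.
    val classifier = TinyNaiveBayes(
        listOf(
            setOf("viagra", "discount") to true,
            setOf("viagra", "prescription") to true,
            setOf("meeting", "notes") to false,
            setOf("amazon", "order") to false,
            setOf("amazon", "discount") to false
        )
    )

    println(classifier.predict(setOf("viagra", "discount"))) // true
    println(classifier.predict(setOf("amazon", "meeting")))  // false
}
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow when an input has many features.
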