Open source name: Kotlin/dataframe
Open source URL: https://github.com/Kotlin/dataframe
Open source language: Kotlin 99.0%
Open source introduction:
Kotlin Dataframe: typesafe in-memory structured data processing for JVM
Kotlin Dataframe aims to reconcile Kotlin's static typing with the dynamic nature of data by using both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and the REPL.
Integrates with Kotlin kernel for Jupyter. Inspired by krangl, Kotlin Collections and pandas.
Explore documentation for details.
Setup
Gradle
repositories {
    mavenCentral()
}
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:0.8.0'
}
Jupyter Notebook
Install Kotlin kernel for Jupyter.
Import stable dataframe version into notebook:
%use dataframe
or specific version:
%use dataframe(<version>)
Data model
Usage example
Create:
// create columns
val fromTo by columnOf("LoNDon_paris", "MAdrid_miLAN", "londON_StockhOlm", "Budapest_PaRis", "Brussels_londOn")
val flightNumber by columnOf(10045.0, Double.NaN, 10065.0, Double.NaN, 10085.0)
val recentDelays by columnOf("23,47", null, "24, 43, 87", "13", "67, 32")
val airline by columnOf("KLM(!)", "{Air France} (12)", "(British Airways. )", "12. Air France", "'Swiss Air'")
// create dataframe
val df = dataFrameOf(fromTo, flightNumber, recentDelays, airline)
Clean:
// typed accessors for columns
// that will appear during
// dataframe transformation
val origin by column<String>()
val destination by column<String>()
val clean = df
// fill missing flight numbers
.fillNA { flightNumber }.with { prev()!!.flightNumber + 10 }
// convert flight numbers to int
.convert { flightNumber }.toInt()
// clean 'airline' column
.update { airline }.with { "([a-zA-Z\\s]+)".toRegex().find(it)?.value ?: "" }
// split 'fromTo' column into 'origin' and 'destination'
.split { fromTo }.by("_").into(origin, destination)
// clean 'origin' and 'destination' columns
.update { origin and destination }.with { it.lowercase().replaceFirstChar(Char::uppercase) }
// split lists of delays in 'recentDelays' into separate columns
// 'delay1', 'delay2'... and nest them inside original column `recentDelays`
.split { recentDelays }.inward { "delay$it" }
// convert string values in `delay1`, `delay2` into ints
.parse { recentDelays }
Aggregate:
clean
// group by the flight origin renamed into "from"
.groupBy { origin named "from" }.aggregate {
// we are in the context of single data group
// total number of flights from origin
count() into "count"
// list of flight numbers
flightNumber into "flight numbers"
// counts of flights per airline
airline.valueCounts() into "airlines"
// max delay across all delays in `delay1` and `delay2`
recentDelays.maxOrNull { delay1 and delay2 } into "major delay"
// separate lists of recent delays for `delay1`, `delay2` and `delay3`
recentDelays.implode(dropNulls = true) into "recent delays"
// total delay per destination
pivot { destination }.sum { recentDelays.intCols() } into "total delays to"
}
Try it in Datalore and explore more examples here.
Code of Conduct
This project and the corresponding community are governed by the JetBrains Open Source and Community Code of Conduct. Please make sure you read it.
License
Kotlin Dataframe is licensed under the Apache 2.0 License.
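The Clean step of the usage example applies ordinary Kotlin string logic to each value. The sketch below reproduces that per-value logic with the standard library only, so it can be tried without the dataframe dependency. The helper names (cleanAirline, capitalizeCity, splitRoute, fillMissing) are hypothetical and are not part of the dataframe API. fillMissing also assumes that a missing flight number is derived from the original previous value, which matches the sample data but may differ from the library's behavior when several values in a row are missing.

```kotlin
// Plain-Kotlin sketch of the per-value logic from the "Clean" step.
// All helper names here are hypothetical; none belong to the dataframe library.

// Keeps the first run of ASCII letters and whitespace, or "" when there is none.
fun cleanAirline(raw: String): String =
    "([a-zA-Z\\s]+)".toRegex().find(raw)?.value ?: ""

// Lowercases the whole name, then uppercases its first character.
fun capitalizeCity(raw: String): String =
    raw.lowercase().replaceFirstChar(Char::uppercase)

// Splits "origin_destination" on the underscore and normalizes both halves.
fun splitRoute(fromTo: String): Pair<String, String> {
    val (origin, destination) = fromTo.split("_")
    return capitalizeCity(origin) to capitalizeCity(destination)
}

// Replaces each NaN with the previous original value plus 10, then converts to Int.
// Assumption: a NaN in the first position is an error, since it has no previous value.
fun fillMissing(flightNumbers: List<Double>): List<Int> =
    flightNumbers.mapIndexed { i, value ->
        val filled = if (value.isNaN()) {
            val previous = flightNumbers.getOrNull(i - 1)
                ?: error("first flight number is missing and has no previous value")
            previous + 10
        } else {
            value
        }
        filled.toInt()
    }

fun main() {
    println(splitRoute("LoNDon_paris"))        // (London, Paris)
    println(cleanAirline("KLM(!)"))            // KLM
    println(cleanAirline("{Air France} (12)")) // Air France
    println(fillMissing(listOf(10045.0, Double.NaN, 10065.0, Double.NaN, 10085.0)))
    // [10045, 10055, 10065, 10075, 10085]
}
```

In the dataframe pipeline the same lambdas run once per cell of the selected columns, so checking them on a handful of raw values first is a cheap way to validate a regex or a casing rule before applying it to the whole frame.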