Add SQL-to-Kotlin DataFrame transition guide for backend developers #1377

Status: Open. 1 commit to be merged into base branch master.

zaleslaw (Collaborator):
No description provided.

Includes a comprehensive guide to help SQL and ORM users adapt to Kotlin DataFrame. Covers key concepts, equivalents for SQL/ORM operations, and practical examples. Updated TOC to include the new guide.

Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR adds a comprehensive guide for backend developers with SQL experience to transition to Kotlin DataFrame. The guide provides SQL-to-DataFrame mappings and explains key conceptual differences between SQL databases, ORMs, and Kotlin DataFrame.

  • Introduces Kotlin DataFrame concepts through familiar SQL terminology
  • Provides side-by-side comparisons of SQL commands and DataFrame operations
  • Explains the differences between DataFrame, SQL databases, and ORMs

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| Guide-for-backend-SQL-developers.md | New comprehensive guide document with SQL-to-DataFrame mappings and conceptual explanations |
| d.tree | Adds the new guide to the documentation navigation structure |

| `ALTER TABLE DROP COLUMN` | `.remove("colName")` |
| `ALTER TABLE RENAME COLUMN` | `.rename { oldName }.into("newName")` |
| `ALTER TABLE MODIFY COLUMN` | `.convert { colName }.to<NewType>()` |
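
As an illustration of the three mappings in the rows above, here is a minimal sketch using the String API. The `orders` data, its column names, and the `reshapeOrders` function are hypothetical and not part of the guide; the calls are meant to mirror the table rows, and exact behavior may differ between Kotlin DataFrame versions.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; the column names are illustrative only.
fun reshapeOrders(): DataFrame<*> {
    val orders = dataFrameOf("id", "amount", "legacyFlag")(
        1, "120", true,
        2, "80", false,
    )
    return orders
        .remove("legacyFlag")             // ALTER TABLE ... DROP COLUMN
        .rename("amount").into("total")   // ALTER TABLE ... RENAME COLUMN
        .convert("total").to<Int>()       // ALTER TABLE ... MODIFY COLUMN (String -> Int)
}

fun main() {
    // Print the resulting column names and types.
    println(reshapeOrders().schema())
}
```

Each call returns a new dataframe, so `orders` itself is left unchanged, unlike an `ALTER TABLE` statement that modifies the table in place.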

Copilot AI, Aug 13, 2025:
This TODO comment indicates incomplete documentation structure. Either merge the duplicate DDL sections or remove one of them to avoid confusion.

AndreiKingsley (Collaborator) left a comment:
Please add information about the supported SQL databases (and how to integrate unsupported ones!), with a reference to https://kotlin.github.io/dataframe/sql.html.


This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar SQL and ORM operations to DataFrame concepts.

We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md) — an IDE-integrated tool similar to Jupyter Notebook.

Collaborator:
Are you sure? I suppose "Backend Developers" would rather use this in a Gradle project, not in KTNB. I mean, they can combine approaches (try doing what they need first in the notebook and then using this code in the project), but it seems we should also include information about setting it up in Gradle projects.

Collaborator:
I agree: it's good to recommend notebooks, but we should also provide a Gradle-only option.

val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
rs.readDataFrame(connection)
```

Collaborator:
I'd add a visual example here: an image of the original DB schema and the resulting dataframe.


Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,

AndreiKingsley (Collaborator), Aug 13, 2025:
Instead of the first two points, it is better to refer to the quickstart guide, which shows the user the basics of working with DataFrame.
(Simple logic: the user has just finished reading a DF from a file -> "OK, what should I do next?" -> the quickstart guide answers exactly that question!)


<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->

## Quick Setup

Collaborator:
Add a tab here about setting up in Gradle, see comment above.


---

## 1. What is a DataFrame?

Collaborator:
Here (and everywhere in the guide), please follow our spelling conventions:
https://kotlin.github.io/dataframe/spellingconventions.html

DataFrame as an object/type/class should be in backticks.
"dataframe" as a concept of tabular data (whether abstract or concrete) should be written in lowercase.

koperagen (Collaborator) commented on Aug 14, 2025:
It may be worth mentioning that columns can store any values, not just primitives or a predefined set of types: List, File, Map, user objects. DataFrame offers fully generic storage.
(It's mentioned under "It can be created from any source", but I missed it; my attention went mostly to tables :) )
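
The point about generic storage could be illustrated along these lines. This is only a sketch: the `Owner` class, the `buildAttachments` function, and all column names are hypothetical, and it assumes `dataFrameOf` keeps the supplied objects as ordinary column values.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import java.io.File

// Hypothetical user-defined type stored directly in a column.
data class Owner(val name: String)

// Builds a small dataframe whose columns hold a List, a File, and a user object.
fun buildAttachments(): DataFrame<*> =
    dataFrameOf("id", "tags", "path", "owner")(
        1, listOf("invoice", "2025"), File("a.pdf"), Owner("Ann"),
        2, listOf("draft"), File("b.pdf"), Owner("Bob"),
    )

fun main() {
    val attachments = buildAttachments()
    // Read the "owner" column back as a list of its values.
    println(attachments["owner"].toList())
    // Print the column names and types.
    println(attachments.schema())
}
```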


We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md) — an IDE-integrated tool similar to Jupyter Notebook.

It lets you explore data interactively, render DataFrames, create plots, and use all your IDE features within the JVM ecosystem.

Collaborator:
*dataframes


## Quick Setup

To start working with Kotlin DataFrame in a Kotlin Notebook, run the cell with the next code:

Collaborator:
*a cell with the following code


## 1. What is a DataFrame?

If you’re used to SQL, a **DataFrame** is conceptually like a **table**:

Collaborator:
same here

- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types

Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) — columns can contain *nested DataFrames* or *column groups*, allowing you to represent and transform tree-like structures without flattening.

Collaborator:
*nested dataframes

Collaborator:
You can also link to DataColumn.md#framecolumn, etc.


Unlike a relational DB table:

- A DataFrame **lives in memory** — there’s no storage engine or transaction log

Collaborator:
*dataframe, or *`DataFrame` if you're talking about the instance of the class

.take(5)
```

## In conclusion

Collaborator:
*Conclusion

## In conclusion

- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin.
- The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md).

Collaborator:
What do you mean by "evolving API support"?


- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin.
- The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md).
- It is neither a database nor an ORM — a DataFrame does not store data or manage transactions but works as an in-memory layer for analytics and transformations.

Collaborator:
*dataframe

- It is neither a database nor an ORM — a DataFrame does not store data or manage transactions but works as an in-memory layer for analytics and transformations.
- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted.

Collaborator:
I'm missing something here about the disadvantages of DataFrame. We need to be honest too: if you have a large database with millions of rows, doing the analysis with DataFrame is likely not a good idea.

- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.

- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations learning

Collaborator:
learning

| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum(amount) }` |

Collaborator:
Use `sum { amount }`; I think `sum(amount)` is deprecated.

Collaborator:
Another reason to use korro as much as possible :D

(Though I don't think korro can help inside tables like this one, can it?)
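
For reference, the selector form suggested in this thread might look as follows in a self-contained snippet. This is a sketch under assumptions: the `orders` data and the `totalsByRegion` function are hypothetical, and because no generated column accessors are available here, the column is referenced as `"amount"<Int>()`; with the compiler plugin or in a notebook the lambda would read `sum { amount }`.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are illustrative only.
fun totalsByRegion(): DataFrame<*> {
    val orders = dataFrameOf("region", "amount")(
        "EU", 120,
        "EU", 80,
        "US", 200,
    )
    return orders
        .groupBy("region")
        // Selector form: the lambda picks the column to sum within each group.
        .aggregate { sum { "amount"<Int>() } into "total" }
}

fun main() {
    println(totalsByRegion())
}
```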

4 participants