Skip to content

Trackio #2978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Trackio #2978

wants to merge 7 commits into from

Conversation

abidlabs
Copy link
Member

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish, this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

@@ -6383,3 +6383,13 @@
- research
- evaluation
- ai

- local: trackio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love the url


## Limitations

Trackio is intentionally lightweight and not intended to be a fully-featured experiment tracking solution. It is currently in beta, and the database schema may change, which could require migrating or deleting existing database files (by default at `~/.cache/huggingface/trackio`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention Parquet + effficient Hub storage thanks to Xet? (sorry for again pushing for this but i think this could be one of the best killer use cases of Xet storage, cc @lhoestq @kszucs

Copy link
Member

@kszucs kszucs Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect duplicated data in/between tracio databases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @znation is currently working on the parquet conversion: gradio-app/trackio#88, and we'll definitely include that in the blog post as well

Comment on lines +44 to +76
`trackio` is designed to be a drop-in replacement for experiment tracking libraries like `wandb`. The API is compatible with `wandb.init`, `wandb.log`, and `wandb.finish`, so you can simply import `trackio` as `wandb` in your code. Here is an example:

```python
import trackio as wandb
import random
import time

runs = 3
epochs = 8

def simulate_multiple_runs():
for run in range(runs):
wandb.init(project="fake-training", config={
"epochs": epochs,
"learning_rate": 0.001,
"batch_size": 64
})
for epoch in range(epochs):
train_loss = random.uniform(0.2, 1.0)
train_acc = random.uniform(0.6, 0.95)
val_loss = train_loss - random.uniform(0.01, 0.1)
val_acc = train_acc + random.uniform(0.01, 0.05)
wandb.log({
"epoch": epoch,
"train_loss": train_loss,
"train_accuracy": train_acc,
"val_loss": val_loss,
"val_accuracy": val_acc
})
time.sleep(0.2)
wandb.finish()

simulate_multiple_runs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first code snippet in the blog post, so it’s likely the one people will copy and paste the most. While I love the idea of a drop-in replacement, I don’t think it should be a long-term design choice. With that in mind, I’m suggesting a slightly different version:

Suggested change
`trackio` is designed to be a drop-in replacement for experiment tracking libraries like `wandb`. The API is compatible with `wandb.init`, `wandb.log`, and `wandb.finish`, so you can simply import `trackio` as `wandb` in your code. Here is an example:
```python
import trackio as wandb
import random
import time
runs = 3
epochs = 8
def simulate_multiple_runs():
for run in range(runs):
wandb.init(project="fake-training", config={
"epochs": epochs,
"learning_rate": 0.001,
"batch_size": 64
})
for epoch in range(epochs):
train_loss = random.uniform(0.2, 1.0)
train_acc = random.uniform(0.6, 0.95)
val_loss = train_loss - random.uniform(0.01, 0.1)
val_acc = train_acc + random.uniform(0.01, 0.05)
wandb.log({
"epoch": epoch,
"train_loss": train_loss,
"train_accuracy": train_acc,
"val_loss": val_loss,
"val_accuracy": val_acc
})
time.sleep(0.2)
wandb.finish()
simulate_multiple_runs()
`trackio` is designed to be a drop-in replacement for experiment tracking libraries like `wandb`. The API is compatible with `wandb.init`, `wandb.log`, and `wandb.finish`, so you can simply import `trackio` as `wandb` in your code.
```diff
- import wandb
+ import trackio as wandb
```
Here is an example:
```python
import trackio
import random
import time
runs = 3
epochs = 8
def simulate_multiple_runs():
for run in range(runs):
trackio.init(project="fake-training", config={
"epochs": epochs,
"learning_rate": 0.001,
"batch_size": 64
})
for epoch in range(epochs):
train_loss = random.uniform(0.2, 1.0)
train_acc = random.uniform(0.6, 0.95)
val_loss = train_loss - random.uniform(0.01, 0.1)
val_acc = train_acc + random.uniform(0.01, 0.05)
trackio.log({
"epoch": epoch,
"train_loss": train_loss,
"train_accuracy": train_acc,
"val_loss": val_loss,
"val_accuracy": val_acc
})
time.sleep(0.2)
trackio.finish()
simulate_multiple_runs()

Copy link
Contributor

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this! will we have this in model repos or is it going to be Spaces attached to model repos? 👀 i.e. when you push a model to Hub it spawns a Space with logs attached to model repo @abidlabs @julien-c
I think it would be nice to mention if we have even more plans

added a point about energy/transparency ⚡
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants