Skip to content
This repository was archived by the owner on Apr 16, 2025. It is now read-only.

Core-corrosion: Adding gitoxide to git-protocol integration tests #780

Closed
wants to merge 3 commits into from
Closed

Core-corrosion: Adding gitoxide to git-protocol integration tests #780

wants to merge 3 commits into from

Conversation

Byron
Copy link
Contributor

@Byron Byron commented Sep 1, 2021

A draft PR to get feedback early. I's goal is to add additional Rust to the core of git which is currently accessed through a thin corrosive layer provided by the git2 crate.

By the end of this PR I'd expect all usages of git2 to be replaced with at least equivalently readable substitutes provided by gitoxide except for where git2 is explicitly under test.

Doing so is extremely valuable to gitoxide as it gets to develop its high-level-yet-performant API based on actual usage of a proven and mature library.

@Byron Byron requested a review from a team as a code owner September 1, 2021 14:05
@xla xla marked this pull request as draft September 2, 2021 02:50
@xla xla removed the request for review from a team September 2, 2021 02:50
@kim
Copy link
Contributor

kim commented Sep 2, 2021

all usages of git2

When you say "all", do you mean all?

There are some uses in the tests which function like fixtures (but have random keys, so are not easy to generate upfront). But there are also other uses which are tied to the fact that librad is, well, tied to git2. I would suggest to scope this exercise the former.

@Byron
Copy link
Contributor Author

Byron commented Sep 2, 2021

When you say "all", do you mean all?

😅 All in this file except for the git2 specific tests. The strategy here might be to look at git2 usages in tests from time to time and see if it's feasible to replace those with gitoxide, potentially implementing the missing bits and pieces. Maybe that way one will end up with an API surface big enough to be feasible for places outside of tests as well.

I would suggest to scope this exercise the former.

Absolutely, I want it done yesterday to finally finish that git-fetch workblock I am still in, technically. I will keep that as MVP'ish as possible too to get the upload-pack building block more quickly.

@Byron
Copy link
Contributor Author

Byron commented Sep 6, 2021

It feels I have been working on this for an eternity, so I did hope that it will 'just work' with some success. Will see what's causing the issue the gitoxide can't find an object that is definitely there but only if the repository was handled by thin_pack_x, which doesn't do much with these repositories.
It's very strange and I have the feeling an even stranger bug is to be found as cause of this.

On the bright side, one can already see how the git-repository API is shaping up, something like 2000 LOCs later.


Edit: The issue was caused by gitoxide filtering packs by their name and required them to start with pack-. Even though this now is common it's far from a requirement. Removing this constraint fixed the issue.

@Byron Byron marked this pull request as ready for review September 7, 2021 01:07
@Byron
Copy link
Contributor Author

Byron commented Sep 7, 2021

I think for the most part, using git-repository instead of git2 now is a pleasant experience API wise. gitoxide definitely was inspired by git2 in many places, while also trying to make the API more tree-like for chaining.

To make it nicer, I would love to remove some of the unwrap() calls in favor of returning results, which is done in some places but not everywhere. If that's agreeable, I can patch this.

Besides that, what do you think 🥺?

@Byron
Copy link
Contributor Author

Byron commented Sep 7, 2021

As a side-note, while preparing this PR with crates from crates.io I managed to break the build surprisingly as this workspace then used incompatible crates in the same build. The problem causing this is well-known and not easy to tackle, I started a discussion over at cargo smart-release to explore ways to make improve the situation.

There are ways to improve the situation already:

  • pin all gitoxide crate versions at the cost of not getting patches automatically and upgrading periodically.
  • consume all gitoxide code through git-repository without using the unstable flag. That's not feasible today as the API surface doesn't cover what's needed to interact with low-level git-protocol facilities. This would be my preferred choice as it would mean that git-repository indeed fulfils its promise of delivering highest performance with negligible overhead for the convenience layer it provides.
  • gitoxide pre-release plumbing crates bump their dependents (those workspace crates who use it) minor version once their minor version changes to prevent auto-updates on the consumers side. This is what's (much better) described in the smart-release issue I mentioned earlier.

)
.unwrap();
.unwrap()
.detach();
repo.reference(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always found it a bit odd that reference creates a reference, but well names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too! And it was aptly called create_reference before. But as I worked with this it dawned on me how libgit2 (or git2) is thinking and I started liking it and thought it was beneficial to not unnecessarily change names.

This also means that reference**s** now creates an intermediate object for iterators, and you guessed it, that was called iter_references() before I took the U-turn there as well.

I don't know if it's good to be so git2 inspired there, and already thought providing an alternative name for the same functionality potentially even behind feature

[2000 words later]… names are hard.

@@ -317,13 +335,17 @@ where
assert!(out.pack.is_some());
}

let remote_repo = git2::Repository::open(&remote).unwrap();
let mut remote_repo = git::open(remote.clone()).unwrap().into_easy_arc_exclusive(); // without GATs we need the arc version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GATs would be needed on gitoxide's side?

Copy link
Contributor Author

@Byron Byron Sep 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just there. Here is the code for that and the reason mutable RefMut<'argh_a_lifetime_that_cannot_be_expressed, Repository> can't be done yet.

@@ -341,6 +363,8 @@ where
update_tips(&local_repo, &out.wanted_refs).unwrap();
}

// Need to refresh it as it didn't notice the new pack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.. what does it exactly mean to open a repository, in terms of I/O, file handles, etc?

I'm asking because git2 does a lot actually (including resolving and parsing config files), so we ended up using git2::Repository from a (database) pool to avoid fd exhaustion and other issues. I suspect that we'd not use easy mode in a lot of places in order to be able to carefully manage resource consumption, but I think documentation about what the cost of an easy repo is would help.

Copy link
Contributor Author

@Byron Byron Sep 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, documentation is non-existing in places and I plan to rectify that with a quick-pass of doc-writing before I move on.

Right now open(…) doesn't do much except for mapping all packs in full and reading the local repository config file once. Everything else is lazy.

On the server you would do the same, but instead of holding a git::Repository you would hold a git::EasyArcExclusive. That way, access to the persistent pack memory maps is shared, but caches and on-demand maps (like packed-refs) are not.

Edit: An Easy repo is nothing more than a shared Repository and an cache, cloning the Easy shares the repo and creates a new cache.

Even though it's not implemented, I would like to have a refresh() to reload the object database. And thinking about it, it's easy to demo this API here but would then force the local repo to be an EasyArcExclusive as well.

Probably worth sketching this out while we are here.

Note that git-config will be fully loaded into memory so no handles will be left open.

@FintanH
Copy link
Contributor

FintanH commented Sep 7, 2021

As a side-note, while preparing this PR with crates from crates.io I managed to break the build surprisingly as this workspace then used incompatible crates in the same build.

One of my builds got bitten by this too 😬

@kim
Copy link
Contributor

kim commented Sep 7, 2021 via email

@Byron
Copy link
Contributor Author

Byron commented Sep 7, 2021

I would expect git-repository to be more
conservative than lower level crates, so we may not get functionality that is
considered unstable until there is a release of git-repository.

That's correct, it's in stability tier 1. It re-exports all crates in the same stability tier and hides everything else.

o I'm not sure: either git-repository re-exports everything else, and we can
control via an unstable feature […]

Such a feature toggle exists and is used to re-exports all lower tier crates. Along with a new version bumping scheme described in this cargo smart-release ticket, if any of the plumbing crates have breaking changes with a minor bump, git-repository would also have a minor bump. That way you would get patches automatically but not more, i.e. breaking changes. This would allow me to even post PRs for upgrades for you without your CI breaking surprisingly.

For a lack of a better idea, this would be my preference, as it seems more powerful than pinning all git-* crates to their version.

Thinking into the future of post-release crates, we would eventually stop using the unstable feature to get all the benefits of stability tier 1 while lower tier crates may still change, but you won't be using them directly anymore.

@Byron
Copy link
Contributor Author

Byron commented Sep 7, 2021

Pending Changes

  • Show how packs can be refreshed after pull/fetch
  • See if writing ObjectRefs is as easy as it sounds for consistency and unbound performance

repo.commit(
"refs/namespaces/foo/refs/heads/main",
&auth.to_ref(),
&auth.to_ref(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was doing breaking changes, I unified parameter order 😅

let base_id = {
repo.commit(
"refs/namespaces/foo/refs/heads/main",
&auth.to_ref(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the to_ref() is necessary as we can't have real refs. The *Ref is borrowed, too, for maximum flexibility, or else those who can't move these have to clone them. That's cheap, but I guess it's the question of who has to make the call.

I could imagine introducing a trait for ToRef which eliminates the to_ref() call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and before I forget: the refs are a lie, internally it still creates an owned Commit as it's easier to work with. Namely, object ids are in hex but nobody really has these, so they would need internal conversion and memory to hold that. A possible optimization, but one I skipped 😅.

let auth = git::actor::Signature::now_local_or_utc("apollo", "[email protected]");

let empty_tree_id = repo
.write_object(&git::objs::Tree::empty())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now one can write object refs as well as owned objects, and there is a trait for that so calls look as good as they can.

@@ -341,6 +360,8 @@ where
update_tips(&local_repo, &out.wanted_refs).unwrap();
}

// Need to refresh it as it didn't notice the new pack
local_repo.refresh_object_database().unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what would have to be done on the server as well once it causes new packs to be written. Internally it currently replaces itself, which is certainly something that could be improved one day.

@Byron
Copy link
Contributor Author

Byron commented Sep 10, 2021

I squashed everything into one commit and rebased against master. Is there anything else that would prevent merging this PR?

Copy link
Contributor

@FintanH FintanH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the tests, just have the one question :)

default-features = false # leave out max-performance as it doesn't build on msvc win due to sha1-asm. See https://github.com/RustCrypto/asm-hashes/issues/17 for details
features = [ "one-stop-shop", "async-network-client", "unstable" ]

[dependencies.git-features]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding these back rather than accessing them through git-repository?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds all of the default features back except for max-performance, which unfortunately breaks the build.

I was thinking that it might be better to remove max-performance from the default feature set for now, but felt bad about it.

However, now that I wrote this I revisited the issue and fixed it on git-repository side as the superior choice :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, too. It seems like we need git-repository and git-packetline, but not the rest. Maybe a rebase went wrong or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! And even though I don't know how that happened, removing the superfluous packages did indeed not cause any trouble.

That way no special feature configuration is necessary.
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Sep 10, 2021
Copy link
Contributor

@FintanH FintanH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this message 👍

@kim
Copy link
Contributor

kim commented Sep 13, 2021

Closed via 07013fc

Thanks!

@kim kim closed this Sep 13, 2021
@Byron Byron deleted the git-protocol-integration-oxidized branch September 13, 2021 23:48
@Byron
Copy link
Contributor Author

Byron commented Sep 13, 2021

Please note in order to provide actual protection from unexpected breakage coming up through the crates exposed by git-repository cargo-smart-release will have to learn a new trick. For now I have manually bumped git-repository's minor version to assure of that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants