Support encoding to file-like object #754
# This check is useless, but it's critical to keep it: it ensures that samples
# is still alive during the call to encode_audio_to_file_like.
assert samples.is_contiguous()
I hate that we have to do this, but I don't see any other obvious way to keep the input samples alive for the duration of the call.
Claude suggests that we could just pass samples as a py::object. We won't be able to turn it back into a tensor (as mentioned in the code comment above), but Claude claims that passing it as a parameter will ensure that pybind keeps it alive. I cannot verify this.
@scotts, any thoughts?
On the keep-alive part, I believe Claude is right. If we pass something as a py::object, it gets properly reference-counted, which keeps the object alive. When we launder a pointer as an int, there's no reference counting.
Of course, ideally we would just pass the tensor, but we run into problems passing tensors as tensors into the pybind11 code. The next simplest option, which we probably can't use for performance reasons, is to copy the tensor into either bytes or a list and then pass that as a py::object. But since samples will be large, I don't think we want to do that.
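The reference-counting point can be illustrated in plain Python, as an analogy only. This is a minimal sketch assuming CPython semantics (an object is freed as soon as its last reference goes away), with id() standing in for a pointer laundered as an int. Samples, hold_address, hold_object and survives are hypothetical names; none of this is pybind11 code or code from this PR.

```python
import gc
import weakref


class Samples:
    """Hypothetical stand-in for a large tensor."""


def hold_address(obj):
    # Keeping only the integer address adds no reference,
    # so nothing here keeps obj alive (like laundering a pointer as an int).
    return id(obj)


def hold_object(obj):
    # Keeping the object itself adds a reference (like passing a py::object).
    return obj


def survives(keeper):
    """Return True if the object outlives its caller's own reference."""
    samples = Samples()
    probe = weakref.ref(samples)  # observes the object without owning it
    kept = keeper(samples)
    del samples  # drop the caller's reference
    gc.collect()
    alive = probe() is not None
    del kept
    return alive


address_only = survives(hold_address)
with_object = survives(hold_object)
print(address_only, with_object)
```

Under CPython this should print False True: holding only the int lets the object be freed, while holding the object itself keeps it alive, which is the property a py::object parameter would provide.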
Most workarounds I can think of are worse. One that might be just as bad, but could potentially apply to both this situation and decoder creation:
- On the pybind11 side, we only create the AVIOFileLikeContext. We don't create the encoder or decoder. We still accept the file-like objects, and they are still stored in the AVIOFileLikeContext.
- We return an int from the C++ side to the Python side, where that int is a pointer to the AVIOFileLikeContext.
- On the PyTorch custom ops side, we have functions for create-from-file-like and encode-to-file-like that accept the int value and do a reinterpret_cast<AVIOFileLikeContext*> in the C++. The resulting context is then passed to the decoder or encoder.
As it is right now, we're doing a lot of ugly pointer casting with tensors. The above may actually be better, since the pybind11 code would then only be concerned with creating AVIOFileLikeContext objects. It wouldn't even need to know about encoders and decoders.
std::optional<int64_t> bit_rate = std::nullopt,
std::optional<int64_t> num_channels = std::nullopt) {
// We assume float32 *and* contiguity; this must be enforced by the caller.
auto tensor_options = torch::TensorOptions().dtype(torch::kFloat32);
If we keep this technique, we can probably allow all dtypes by passing the dtype from the Python side as an int. I assume the Python and C++ enums agree on values, but even if they don't, we can figure out the mapping. Ugly, but possible.
Oh, I don't think we need to support dtypes other than float32: the input samples that the user gives us must already be float32. This comment is just here to explicitly state the assumptions made within encode_audio_to_file_like.
This PR adds the to_file_like() method to the AudioEncoder. This allows users to encode samples into a custom file-like object that supports the seek and write methods.
Marking this as draft because there are unresolved points (see below), but it is still ready for a solid first review round.