RFC: Introducing Transactions extension to Modular Filesystems #245
Conversation
Hello, I am currently implementing the s3 filesystem plugin and gcs filesystem plugin for TensorFlow. Currently, we use the approach of "prefetching the data to the local system and then uploading after processing". I have just skimmed through your RFC, but I think this may cause an increase in the number of requests made to the cloud server? We chose prefetching to minimize the requests made to the cloud server.
Hi,
I agree that it has some limitations. However, as far as I know, cloud filesystems like s3 and gcs do not allow objects to be modified. Once we upload something to the cloud server, the object becomes immutable. So in order to implement
Of course there are workarounds for such issues, but not necessarily optimal solutions for all cases. Transactions allow users to express intent. For example, consider writing to a file and then conditionally appending to it. With the mechanism that you describe, the file will be uploaded to the cloud when it is closed after write(). Then, if the append branch is taken, it will be downloaded, appended to, and uploaded again. With transactions, however, the user can close the transaction once the append() decision and operation are finished, which hints to the file system to upload only when the transaction is done, saving time, storage, and bandwidth. If you keep the file on the local system until the process is finished, there is a chance that it fails down the road, losing all the work that went into producing these files. Also, in the inference case, TF might be producing files that are being uploaded for consumption by other processes/services, so keeping the file until the end of the process is a no-go under those circumstances. Finally, there are non-classical persistent storage systems other than gcs or s3 that may benefit from knowing the user's intent.
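To make the intent concrete, here is a toy sketch (all names here are hypothetical and only model the upload count, not real S3/GCS calls): writes inside a transaction stage locally, and `EndTransaction` uploads each file exactly once, regardless of whether the append branch fired.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a cloud filesystem: uploads happen only when the
// transaction that owns a file is finalized, not on every Close().
struct TransactionToken { int id; };

class CloudFS {
 public:
  TransactionToken* StartTransaction() { return new TransactionToken{next_id_++}; }
  // Writes/appends inside a transaction only touch the local staging copy.
  void Write(TransactionToken*, const std::string& name, const std::string& data) {
    staged_[name] += data;
  }
  // EndTransaction uploads each staged file exactly once.
  void EndTransaction(TransactionToken* t) {
    for (auto& kv : staged_) { cloud_[kv.first] = kv.second; ++uploads_; }
    staged_.clear();
    delete t;
  }
  int uploads() const { return uploads_; }
  const std::string& object(const std::string& name) { return cloud_[name]; }
 private:
  int next_id_ = 0;
  int uploads_ = 0;
  std::map<std::string, std::string> staged_, cloud_;
};
```

With the prefetch-and-upload-on-close scheme, the append branch would cost a second upload; here a write followed by a conditional append still results in a single upload at `EndTransaction`.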
You can take a look at what I am doing here: GCS plugin, S3 plugin. Basically, I keep a

```cpp
namespace tf_random_access_file {
typedef struct S3File {
  const char* bucket;
  const char* object;
  const std::shared_ptr<Aws::Transfer::TransferManager>& transfer_manager;
  // FstreamWithName is a child class of `std::fstream`
  // with a member variable `name` generated randomly,
  // something like tensorflow_tmp_abcdef_tmp_s3_filesystem,
  // to prevent it from being used by another process.
  // It gets deleted when it goes out of scope.
  std::shared_ptr<FstreamWithName> temp_file;
} S3File;
```
It is true that I keep a temporary file until the user explicitly calls
So you mean transactions act like a signal to tell the cloud filesystem that everything is done, and the server can close the connection, port, and the connection stream, or something like this? As far as I know, gcs and s3 use curl HTTP GET/POST internally, and they do not work like this. If the transaction acts like a signal for TensorFlow to upload, I have implemented it in a way that
Currently, TensorFlow only supports 3 non-classical persistent storage systems
rfcs/20200505-transactional-fs.md
```cpp
class Filesystem {
  // Transaction Token API extensions
  virtual Status GetTransactionTokenForFile(const string& file_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status StartTransaction(const string& transaction_name, std::unique_ptr<TransactionToken>* token) = 0;
```
std::unique_ptr&lt;TransactionToken&gt;* is a weird type (pointer to unique pointer to TransactionToken).
I think the signature should just be TransactionToken*, and we should clearly separate the methods which can create/destroy a TransactionToken from the ones that just need a reference to it.
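A sketch of the separation being suggested, with illustrative signatures (not the final RFC API): lifetime-owning methods create and destroy tokens, while participating methods merely borrow a `TransactionToken*`. The `NullFilesystem` stub below is hypothetical, showing a filesystem that hands out tokens without acting on them.

```cpp
#include <cassert>
#include <string>

struct Status { bool ok = true; };   // simplified stand-in for tensorflow::Status
struct TransactionToken;             // opaque to callers

class Filesystem {
 public:
  virtual ~Filesystem() = default;
  // Lifetime-owning methods: create and destroy tokens.
  virtual Status StartTransaction(const std::string& name, TransactionToken** token) = 0;
  virtual Status EndTransaction(TransactionToken* token) = 0;
  // Participating methods: only borrow an existing token.
  virtual Status GetTransactionTokenForFile(const std::string& file_name,
                                            TransactionToken** token) = 0;
  virtual Status NewWritableFile(const std::string& fname, TransactionToken* token) = 0;
};

struct TransactionToken { std::string name; };

// Minimal stub showing the split in use: tokens are created and freed,
// but no transactional behavior is attached to them.
class NullFilesystem : public Filesystem {
 public:
  Status StartTransaction(const std::string& name, TransactionToken** token) override {
    *token = new TransactionToken{name};
    return {};
  }
  Status EndTransaction(TransactionToken* token) override { delete token; return {}; }
  Status GetTransactionTokenForFile(const std::string&, TransactionToken** token) override {
    *token = nullptr;  // no per-file transactions in this stub
    return {};
  }
  Status NewWritableFile(const std::string&, TransactionToken*) override { return {}; }
};
```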
Hi @alextp, thanks for the comments. I made them unique_ptr* to be in line with existing implementations. I don't have any special preference for it; I will change it to TransactionToken*.
Transaction token will be owned by the Filesystem, and use of it after `EndTransaction` will be an invalid operation.
File classes can be modified to keep a TransactionToken, assigned by the filesystem on their construction using the given scope, or the default scope if not given. Filesystems may ignore it if a transaction at that level doesn't make sense.
I don't think saying that filesystems may ignore transaction tokens is very useful, as this makes it impossible for code to work correctly on FSs both with and without transaction tokens.
I am not completely sure what you meant here. I believe not all file systems will make use of transactions. For example, for a RAM based file system, most if not all transactions are meaningless. Such a file system can return dummy transaction tokens and ignore the transaction token pointers it is given. So all file systems will accept tokens, but not necessarily create or act on different transactions. I will try to add some examples to clarify this.
If a file system is guaranteed to have no other users then transactions are meaningless; otherwise transactions are meaningful.
Like, even on my local disk, if I write a model checkpoint with N files and another process goes and deletes file 1 while I'm writing file N, I have an invalid checkpoint. Similarly, if I start writing a checkpoint with N files and another process tries to read from my local FS and sees the half-written checkpoint, it'll fail. Creating a no-op transaction token for my local filesystem would then let these issues go through, as opposed to actually doing the atomic writes we will do in networked filesystems if we do have transactions.
I think my question was more about what the intent of the token is: is it "please try to do a transaction but don't bother if you can't" or is it "I want to be able to rely on these things being atomic and please tell me if something failed".
It would be really hard to implement the things you describe across processes, with or without transactions, since it would require some form of synchronization between any potential reader/writer combination. Since that is not possible, it is not the intent here. The purpose of transactions is to relay to the filesystem that a set of operations is complete together. In your example, the transaction will cover the N file writes. For instance, if any other reader tries to read a file that is generated in a transaction before the transaction is finalized, the FS can block that thread. So transactions are hints to the file system saying "these N operations are meaningful together".
If the underlying persistent storage doesn't have atomic operation support, the only solution is to implement a blocking mechanism. I would say it is closer to the first interpretation than the second.
For the posix file example you gave, if multiple processes are using the same posix fs plugin, it may be possible to use lock files to mimic IPC. But if the consumer of the checkpoints is a copy/move script, for example, not much can be done.
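The lock-file idea could look roughly like this for a posix plugin (the lock path and class name are hypothetical; POSIX `flock` is advisory, so it only coordinates processes that use the same plugin, which is exactly the limitation mentioned above):

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Acquire an advisory lock on a per-directory lock file before touching
// checkpoint files; cooperating processes using the same plugin block
// here, but an unrelated copy/move script is unaffected (advisory only).
class TransactionLock {
 public:
  explicit TransactionLock(const char* lock_path) {
    fd_ = open(lock_path, O_CREAT | O_RDWR, 0644);
    if (fd_ >= 0) flock(fd_, LOCK_EX);  // blocks until other holders release
  }
  ~TransactionLock() {
    if (fd_ >= 0) { flock(fd_, LOCK_UN); close(fd_); }
  }
  bool held() const { return fd_ >= 0; }
 private:
  int fd_ = -1;
};
```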
Existing filesystem C++ api can easily be expanded by the addition of three methods, an opaque structure, and possibly a helper class to support transactions.
struct TransactionToken {
One alternative, less intrusive API modification: instead of optionally passing transaction tokens to all FS methods, we can have a variant of GetFileSystemForFile, something like StartTransactionForFile, which returns a FileSystem that, when dereferenced, commits the transaction. Implicitly, then, all operations made on this file system are a part of the transaction until committed.
WDYT?
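If I read the suggestion right, a rough sketch might be (the names and the commit-on-destruction semantics are my guesses at the proposal, modeled with a toy operation log rather than real I/O):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: StartTransactionForFile would return a handle like this one;
// every operation on it implicitly joins the transaction, and the
// transaction commits when the handle goes out of scope.
class TransactionalFS {
 public:
  void CreateDir(const std::string& d) { ops_.push_back("mkdir " + d); }
  void RenameFile(const std::string& from, const std::string& to) {
    ops_.push_back("rename " + from + " " + to);
  }
  ~TransactionalFS() { committed_ops = ops_.size(); }  // commit on destruction
  static std::size_t committed_ops;  // observable commit effect, for the sketch
 private:
  std::vector<std::string> ops_;  // operations buffered until commit
};
std::size_t TransactionalFS::committed_ops = 0;
```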
@alextp +1 for non-intrusive APIs
If I understood correctly, that is quite a brilliant idea. You are talking about binding the token to the FS object, at which point all FS operations done on this object belong to the same transaction. But since most existing code uses the Env class for file operations, which forwards them to the respective FS, it would require more changes in the code base: all code using Env for file system interaction would need to be changed to use the FS directly. If this is acceptable, I like your idea better.
In the C api, things can get a bit messier though, since it would mean constructing all the TF_*Ops structures and a lot of gymnastics to get it right, unless the C api still requires tokens as proposed in this RFC but the C++ wrappers incorporate the logic you brought up. That would mean, however, no direct C api interaction with file systems in framework code.
That's a good point that the naive less intrusive approach I suggested won't work.
I don't have a rejoinder now, but I'll try to think of other non-intrusive or less intrusive approaches.
Just a note: Core TF would never directly call the filesystem C API outside of the C++ wrapper.
We can block the operation that attaches a file to a token if the file is already attached to another token. This ensures blocking behavior and delays overwrites.
Maybe this is actually the better implementation for concurrent transactions on the same file.
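A sketch of that blocking attach, assuming a hypothetical per-filesystem registry guarded by a mutex and condition variable: a second transaction trying to claim an already-owned file waits until the first detaches it.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>

// Sketch: a file may belong to at most one transaction at a time.
// Attaching an owned file from a different token blocks until the
// owner detaches, delaying overwrites instead of racing on them.
class TokenRegistry {
 public:
  void Attach(const std::string& file, int token_id) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] {
      auto it = owner_.find(file);
      return it == owner_.end() || it->second == token_id;  // free or already ours
    });
    owner_[file] = token_id;
  }
  void Detach(const std::string& file) {
    { std::lock_guard<std::mutex> lk(mu_); owner_.erase(file); }
    cv_.notify_all();  // wake transactions waiting on this file
  }
  bool IsAttached(const std::string& file) {
    std::lock_guard<std::mutex> lk(mu_);
    return owner_.count(file) != 0;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, int> owner_;  // file -> owning token id
};
```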
In that case, should the filesystem check whether the added object is a directory or a file? What happens in the case of directories? Also, we have to make sure that appending to a file uses the same token if it is done in another method. This may require a unique filesystem-based token key to be used when starting a transaction, which would return the existing token if there is still an ongoing transaction with the same key, or a new transaction if there isn't.
I am OK with this approach, but all of this will start adding quite complicated layers of bookkeeping into the filesystems, which may have an impact on certain operations or filesystems.
Do we prefer keeping API over complicating implementations?
Adding the new parameter is intrusive and requires more changes than adding a declarative API to create a token and then attach objects to it. The modular filesystem design does not allow subclassing and only creates one single operations table per URI scheme, so core TF won't be able to pick between a subclass that has tokens and one that ignores them.
But if we relax the subclassing restriction of the modular filesystem, or just make it something like a base subsystem and let users always interact through the wrapped filesystem, it won't be an issue, right? The op table can contain the API with the tokens, but the wrapper may keep them optional. There will still be one op table per URI and one filesystem. Users would access it through a filesystem pointer instead of Env::Default(), and that would probably be the only difference. This would be done purely on the FW side, and plugins would only implement the API.
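A sketch of that arrangement (names are hypothetical): the op table entry always carries a token parameter, while the framework-side wrapper defaults it to null, so plugin authors implement one API and callers may ignore transactions entirely.

```cpp
#include <cassert>
#include <string>

struct TransactionToken { int id; };

// Plugin-facing op table entry: the token is always in the signature.
typedef int (*CreateDirFn)(const char* dirname, TransactionToken* token);

// Framework-side wrapper: the token is optional, defaulting to
// "no transaction", and is forwarded to the op table as-is.
class ModularFilesystem {
 public:
  explicit ModularFilesystem(CreateDirFn fn) : create_dir_(fn) {}
  int CreateDir(const std::string& dirname, TransactionToken* token = nullptr) {
    return create_dir_(dirname.c_str(), token);  // forward, possibly null
  }
 private:
  CreateDirFn create_dir_;
};

// A plugin that ignores tokens still compiles against the same table.
inline int PluginCreateDir(const char*, TransactionToken*) { return 0; }
```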
For the newly proposed filesystem plugin mechanism, two possible approaches exist for the `TF_RandomAccessFile`, `TF_WritableFile`, and `TF_ReadOnlyMemoryRegion` structures:
- Opaque pointers stay as is, thus no changes are needed in the structures. Each filesystem then attaches tokens to its own internal structures pointed to by `void*`.
I prefer keeping the opaque pointers.
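With opaque pointers kept as-is, a plugin could stash the token in the structure behind the `void*`; a sketch (the `PluginWritableFile` layout is hypothetical, while `TF_WritableFile` mirrors the one-field struct from the plugin RFC):

```cpp
#include <cassert>

struct TransactionToken { int id; };

// As in the modular filesystem plugin RFC: core TF only sees an
// opaque pointer, so this struct never needs to change.
struct TF_WritableFile { void* plugin_file; };

// Plugin-internal state: the token rides along inside the structure
// the void* points to.
struct PluginWritableFile {
  TransactionToken* token;
  // ...plugin-specific handles (fd, buffers, etc.) would live here...
};

inline TransactionToken* TokenOf(const TF_WritableFile* f) {
  return static_cast<PluginWritableFile*>(f->plugin_file)->token;
}
```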
Transactions can also help with some filesystem access inconsistencies that can happen while reading and writing checkpoints. For example, while one thread is reading files from a directory, another may be modifying the underlying files. This could lead to the reader seeing an inconsistent or corrupt set of files. With transactions, each thread can have a different transaction token, and the underlying file system can choose to postpone modification of the files by redirecting writes to a temporary location and then moving them into place when the transaction ends.
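One toy sketch of that postponement (the `StagingFS` class is hypothetical; a real plugin would stage to temporary objects and rename): readers keep seeing the old contents until `EndTransaction` publishes the staged writes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Writers inside a transaction land in a per-token staging area; readers
// continue to see the old, consistent contents until EndTransaction
// moves every staged file into the visible namespace.
class StagingFS {
 public:
  void Write(int token, const std::string& name, const std::string& data) {
    staging_[token][name] = data;  // redirected to a temporary location
  }
  std::string Read(const std::string& name) { return visible_[name]; }
  void EndTransaction(int token) {
    for (auto& kv : staging_[token]) visible_[kv.first] = kv.second;
    staging_.erase(token);  // staged files moved into place together
  }
 private:
  std::map<int, std::map<std::string, std::string>> staging_;  // token -> files
  std::map<std::string, std::string> visible_;                 // what readers see
};
```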
## User Benefit
One question I have is how do we expect FSs with transactions to coexist with FSs without transactions.
Do we expect only user-written python code to be made aware of transactions, or do we also expect opkernel C++ code to be transaction-aware? I ask because we do a lot of IO inside opkernels today (summary writing, checkpointing, data reading, etc) and I'd like to understand whether we plan to modify all this to use transactions and how.
The C++/C code change will also impact the existing external file system plugins that are outside of the TensorFlow Core repo. The tensorflow/io repo has several: Azfs (Microsoft Azure), Apache Ignite, OSS, and http. File system plugins also exist in many other places outside of the tensorflow org.
Wondering if there is a non-intrusive method that would not break implementations in many places. cc @mihaimaruseac
Since this proposal is an extension to the Modular File Systems RFC, it is expected that all filesystems use the API. However, as I mentioned above, this may be as little as adding an extra pointer argument to their methods and ignoring the passed-in value. Filesystems should not be forced to act on transactions; they are allowed to ignore them or use a default transaction.
@yongtang Since all file systems are expected to migrate to the plugin api, I don't think an API change is avoidable, and transaction tokens can easily be included at that time.
The second point, about making C++ code transaction aware, is unfortunately more intrusive. I can think of 2 options there:
- Transactions are initiated before ops, so that ops can call GetTransactionTokenForFile (or the equivalent from your proposal). This probably needs to be achieved by adding transaction begin/end ops to the graph, possibly by the python layer that inserts C++ ops.
- Transaction tokens are made inputs and outputs of these C++ ops, created by upstream begin-transaction ops and finalized by downstream end-transaction ops.
@alextp a quick search didn't turn up many ops. Are you aware of any other ops doing file io other than save_restore_v2_ops?
@samikama ReadFile and WriteFile are some of the C++ kernel ops, mostly in core/ops/io_ops.cc. But tf.data also has quite a few ops that touch the file system API implicitly. As those C++ kernel op APIs are mostly frozen, the input args of the ops may not be easily changeable (version compatibility).
That sounds good to me. Can you update the proposal?
…On Wed, May 20, 2020 at 1:39 PM Sami Kama ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In rfcs/20200505-transactional-fs.md
<#245 (comment)>:
> +
+With this extension proposal, users will have a more stable access to cloud storage systems as well as checkpointing redundancy can improve.
+
+## Design Proposal
+
+This RFC proposes to extend the [filesystem plugin rfc][filesystem_plugin] api with transaction markers. There can be different levels of transactions. First level can be global transactions, user starts a transaction scope. Any operations done within this scope will be grouped in this transaction. This is easiest to implement but have drawbacks such as only one transaction can be present at a time and different type of transactions may not always be reordered.
+Alternative to this approach is having multiple transaction scopes. User can create a transaction token and pass it to filesystem operations for plugin to differentiate among independent transactions. This token can be per file or per directory level granularity. Even though per file would give most flexibility, per directory level of transaction detail is most likely sufficient.
+
+Filesystem plugins may choose to ignore the transaction scopes or can delay the operations until the termination of transaction scope.
+
+### Extension to existing filesystem implementation
+
+Existing filesystem C++ api can easily be expanded by addition of three methods, an opaque structure and possibly a helper class to support transactions.
+
+```cpp
+struct TransactionToken{
Something like

```cpp
// Replace the Env::Default() filesystem methods with calls on the filesystem:
// Env::Default()->CreateDir(dirname);
FileSystem* FS = Env::Default()->StartTransactionForURI(fname);
FS->CreateDir(dirname);
// ...
// Env::Default()->RenameFile(...)
FS->RenameFile(....);
// ...
// Env::Default()->DeleteDir(..)
FS->DeleteDir(dirname);
FS->EndTransaction();
```
Since, as @mihaimaruseac said, the C layer is always accessed through the wrapper, the same logic can be implemented with plugins.
But then the API should be extended a little bit so that users can continue an existing transaction.
This does not necessarily reduce the total lines of code that need to be changed, but it simplifies token tracing.
- Alex
Hi All, I updated the proposal, including some of the changes discussed above and the approaches from @mihaimaruseac and @alextp, with a trivial example and how it would be modified, for further discussion.
For the stateful tokens, the existing API methods should not have the token in the implementation
Notes from the design meeting:
Conclusions:
LGTM.
Let's add an example on how the entire checkpointing operation would get transformed (all relevant layers) and then we can merge this
Looks like the example was added. @samikama, are we ready for the final LGTM from @mihaimaruseac and @alextp?
LGTM. Thank you for the example
LGTM
Thanks all for your feedback and help on this. I will start opening PRs once @ematejska finalizes the procedure.
It is all finalized :)
PiperOrigin-RevId: 370587801 Change-Id: I04f1b2e30be7ef010a4445aaf657ed0de70e3af6
PiperOrigin-RevId: 370681107 Change-Id: I06d48c974357b494f875b6f1247dd4032119e9f9
cc @samikama @mihaimaruseac
Transactional File Systems Support
Objective
The aim of this RFC is to extend filesystem access support to persistent storage systems that provide transactional access and eventual consistency.