Commit 1b3fd83

DOCSP-37517 - GridFS (#62)
1 parent ef5ed72 commit 1b3fd83

5 files changed

+301
-84
lines changed

source/fundamentals/gridfs.txt

Lines changed: 0 additions & 84 deletions
This file was deleted.

source/includes/gridfs/gridfs.py

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
import shutil

import gridfs
from bson import ObjectId

# These snippets assume `client` is a connected MongoClient and `db_name`
# holds the name of your database.

# start create bucket
db = client[db_name]
bucket = gridfs.GridFSBucket(db)
# end create bucket

# start create custom bucket
bucket = gridfs.GridFSBucket(db, bucket_name="myCustomBucket")
# end create custom bucket

# start upload files
with bucket.open_upload_stream(
    "myFile",
    chunk_size_bytes=1048576,
    metadata={"field": "myField", "value": "myValue"},
) as grid_in, open("./myFile", "rb") as source:
    # GridIn.write() accepts bytes or a file-like object
    grid_in.write(source)
# end upload files

# start retrieve file info
for file_doc in bucket.find({}):
    print(file_doc)
# end retrieve file info

# start download files name
with open("./outputFile", "wb") as destination:
    grid_out = bucket.open_download_stream_by_name("myFile")
    shutil.copyfileobj(grid_out, destination)
# end download files name

# start download files id
with open("./outputFile", "wb") as destination:
    grid_out = bucket.open_download_stream(ObjectId("60edece5e06275bf0463aaf3"))
    shutil.copyfileobj(grid_out, destination)
# end download files id

# start rename files
bucket.rename(ObjectId("60edece5e06275bf0463aaf3"), "newFileName")
# end rename files

# start delete files
bucket.delete(ObjectId("60edece5e06275bf0463aaf3"))
# end delete files

source/write-operations.txt

Lines changed: 2 additions & 0 deletions
@@ -16,9 +16,11 @@ Write Data to MongoDB
    /write/replace
    /write/delete
    /write/bulk-write
+   /write/gridfs

 - :ref:`pymongo-write-insert`
 - :ref:`pymongo-write-update`
 - :ref:`pymongo-write-replace`
 - :ref:`pymongo-write-delete`
 - :ref:`pymongo-bulk-write`
+- :ref:`pymongo-gridfs`

source/write/gridfs.txt

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
.. _pymongo-gridfs:

=================
Store Large Files
=================

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

.. facet::
   :name: genre
   :values: reference

.. meta::
   :keywords: binary large object, blob, storage

Overview
--------

In this guide, you can learn how to store and retrieve large files in
MongoDB by using **GridFS**. GridFS is a specification implemented by
{+driver-short+} that describes how to split files into chunks when storing them
and reassemble them when retrieving them. The driver's implementation of
GridFS is an abstraction that manages the operations and organization of
the file storage.

You should use GridFS if the size of your files exceeds the BSON document
size limit of 16 MB. For more detailed information on whether GridFS is
suitable for your use case, see :manual:`GridFS </core/gridfs>` in the
MongoDB Server manual.

The following sections describe GridFS operations and how to
perform them.

How GridFS Works
----------------

GridFS organizes files in a **bucket**, a group of MongoDB collections
that contain the chunks of files and information describing them. The
bucket contains the following collections, named using the convention
defined in the GridFS specification:

- The ``chunks`` collection stores the binary file chunks.
- The ``files`` collection stores the file metadata.

The driver names these collections by prefixing them with the bucket name,
which is ``fs`` unless you specify a different one. With the default name, the
collections are ``fs.files`` and ``fs.chunks``. The driver creates the bucket's
collections, if they don't already exist, only when you perform the first write
operation. It also creates an index on each collection to ensure efficient
retrieval of the files and related metadata, but only if the index doesn't
already exist and the bucket is empty. For more information about
GridFS indexes, see :manual:`GridFS Indexes </core/gridfs/#gridfs-indexes>`
in the MongoDB Server manual.

When storing files with GridFS, the driver splits the files into smaller
chunks, each represented by a separate document in the ``chunks`` collection.
It also creates a document in the ``files`` collection that contains
a file ID, file name, and other file metadata. You can upload the file from
memory or from a stream. The following diagram shows how GridFS splits
the files when uploaded to a bucket.

.. figure:: /includes/figures/GridFS-upload.png
   :alt: A diagram that shows how GridFS uploads a file to a bucket

When retrieving files, GridFS fetches the metadata from the ``files``
collection in the specified bucket and uses the information to reconstruct
the file from documents in the ``chunks`` collection. You can read the file
into memory or output it to a stream.

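The splitting and reassembly described above can be illustrated without a database. The following is a minimal pure-Python sketch, not the driver's implementation: the helper names ``split_into_chunks`` and ``reassemble`` and the ``"example-id"`` value are hypothetical, while the ``files_id``, ``n``, and ``data`` field names follow the GridFS specification for chunk documents.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def split_into_chunks(file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return dicts shaped like documents in the ``chunks`` collection."""
    return [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by joining chunks in order of ``n``."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(600 * 1024)  # hypothetical 600 KiB file held in memory
chunks = split_into_chunks("example-id", payload)
print(len(chunks))                    # 3 (255 KiB + 255 KiB + 90 KiB)
print(reassemble(chunks) == payload)  # True
```

Only the last chunk can be shorter than the chunk size, and the sequence number ``n`` is what lets the chunks be put back in order regardless of how they are returned.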

.. _gridfs-create-bucket:

Create a GridFS Bucket
----------------------

To store files in or retrieve files from GridFS, create a GridFS bucket by
calling the ``GridFSBucket()`` constructor and passing in a ``Database``
instance. You can use the ``GridFSBucket`` instance to perform read and write
operations on the files in your bucket.

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start create bucket
   :end-before: end create bucket

.. _gridfs-create-custom-bucket:

To create or reference a bucket with a custom name other than the default name
``fs``, pass your bucket name as the second parameter to the ``GridFSBucket()``
constructor, as shown in the following example:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start create custom bucket
   :end-before: end create custom bucket

.. _gridfs-upload-files:

Upload Files
------------

Use the ``open_upload_stream()`` method from the ``GridFSBucket`` class to
create an upload stream for a given file name. The ``open_upload_stream()``
method allows you to specify configuration information such as the file chunk
size (``chunk_size_bytes``) and other field/value pairs to store as metadata
(``metadata``). Set these options as parameters of ``open_upload_stream()``,
as shown in the following code example:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start upload files
   :end-before: end upload files

.. _gridfs-retrieve-file-info:

Retrieve File Information
-------------------------

In this section, you can learn how to retrieve file metadata stored in the
``files`` collection of the GridFS bucket. The metadata contains information
about the file it refers to, including:

- The ``_id`` of the file
- The name of the file
- The length of the file in bytes
- The upload date and time
- A ``metadata`` document in which you can store any other information

To retrieve file information from a GridFS bucket, call the ``find()`` method
on the ``GridFSBucket`` instance. The method returns a ``Cursor`` instance
from which you can access the results. To learn more about ``Cursor`` objects in
{+driver-short+}, see :ref:`<pymongo-cursors>`.

The following code example shows you how to retrieve and print file metadata
from all your files in a GridFS bucket. It uses a ``for`` loop to traverse the
``Cursor`` iterable and display the results:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start retrieve file info
   :end-before: end retrieve file info

The ``find()`` method accepts various query specifications. You can use
its parameters to specify the sort order, maximum number of documents to return,
and the number of documents to skip before returning. To learn more about querying
MongoDB, see :ref:`<pymongo-retrieve>`.

.. _gridfs-download-files:

Download Files
--------------

You can download files from your MongoDB database by using the
``open_download_stream_by_name()`` method from ``GridFSBucket`` to create a
download stream.

The following example shows you how to download a file referenced
by the file name, stored in the ``filename`` field, into your working
directory:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start download files name
   :end-before: end download files name

.. note::

   If there are multiple documents with the same ``filename`` value,
   GridFS streams the most recent file with the given name (as
   determined by the ``uploadDate`` field).

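The selection rule in the preceding note can be pictured with plain Python data. This is only an illustrative sketch under stated assumptions: the ``files`` list stands in for documents in the ``files`` collection, its values are made up, and the ``latest_revision`` helper is hypothetical rather than part of the driver.

```python
from datetime import datetime, timezone

# Hypothetical ``files`` collection documents, two of which share a file name
files = [
    {"_id": 1, "filename": "myFile",
     "uploadDate": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"_id": 2, "filename": "myFile",
     "uploadDate": datetime(2024, 3, 9, tzinfo=timezone.utc)},
    {"_id": 3, "filename": "otherFile",
     "uploadDate": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]


def latest_revision(files, filename):
    """Return the most recently uploaded document with this name, or None."""
    matches = [doc for doc in files if doc["filename"] == filename]
    return max(matches, key=lambda doc: doc["uploadDate"], default=None)


print(latest_revision(files, "myFile")["_id"])  # 2
```

If you need a specific older file rather than the most recent one, download it by its ``_id`` value instead of its name.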

Alternatively, you can use the ``open_download_stream()``
method, which takes the ``_id`` field of a file as a parameter:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start download files id
   :end-before: end download files id

.. note::

   The GridFS streaming API cannot load partial chunks. When a download
   stream needs to pull a chunk from MongoDB, it pulls the entire chunk
   into memory. The 255-kilobyte default chunk size is usually
   sufficient, but you can reduce the chunk size to reduce memory
   overhead.

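To get a feel for the trade-off in the preceding note, the following sketch computes how many chunk documents a file occupies for a given chunk size. The ``chunk_count`` helper is hypothetical and the file size is an arbitrary example. A smaller chunk size lowers the memory needed per chunk but increases the number of documents to store and fetch.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261,120 bytes


def chunk_count(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return the number of chunks needed to hold file_size bytes."""
    # Integer ceiling division: a partial final chunk still counts as one
    return (file_size + chunk_size - 1) // chunk_size


sixteen_mib = 16 * 1024 * 1024
print(chunk_count(sixteen_mib))           # 65
print(chunk_count(sixteen_mib, 1048576))  # 16
```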

.. _gridfs-rename-files:

Rename Files
------------

Use the ``rename()`` method to update the name of a GridFS file in your
bucket. You must specify the file to rename by its ``_id`` field
rather than its file name.

The following example shows how to update the ``filename`` field to
``"newFileName"`` by referencing a document's ``_id`` field:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start rename files
   :end-before: end rename files

.. note::

   The ``rename()`` method supports updating the name of only one file at
   a time. To rename multiple files, retrieve a list of files matching the
   file name from the bucket, extract the ``_id`` field from the files you
   want to rename, and pass each value in separate calls to the ``rename()``
   method.

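The multi-file approach described in the preceding note might look like the following sketch. The ``rename_all`` helper is hypothetical and is not part of the driver. It assumes that ``bucket`` is a ``GridFSBucket`` (or an object that behaves like one), that ``find()`` yields objects with an ``_id`` attribute, and that ``rename()`` takes a file ID and a new name.

```python
def rename_all(bucket, old_name, new_name):
    """Rename every file called old_name and return the renamed file IDs."""
    # Collect the IDs first so the query finishes before any rename runs
    file_ids = [grid_out._id for grid_out in bucket.find({"filename": old_name})]
    for file_id in file_ids:
        bucket.rename(file_id, new_name)  # one call per file
    return file_ids
```

The same pattern applies to deleting several files: collect the ``_id`` values, then call ``delete()`` once for each.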

.. _gridfs-delete-files:

Delete Files
------------

Use the ``delete()`` method to remove a file's ``files`` collection document
and associated chunks from your bucket. This effectively deletes the file. You
must specify the file by its ``_id`` field rather than its file name.

The following example shows you how to delete a file by referencing its ``_id`` field:

.. literalinclude:: /includes/gridfs/gridfs.py
   :language: python
   :copyable: true
   :start-after: start delete files
   :end-before: end delete files

.. note::

   The ``delete()`` method supports deleting only one file at a time. To
   delete multiple files, retrieve the files from the bucket, extract
   the ``_id`` field from the files you want to delete, and pass each value
   in separate calls to the ``delete()`` method.

API Documentation
-----------------

To learn more about using {+driver-short+} to store and retrieve large files,
see the following API documentation:

- `GridFSBucket <{+api-root+}gridfs/index.html#gridfs.GridFSBucket>`__
- `open_upload_stream() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.open_upload_stream>`__
- `find() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.find>`__
- `open_download_stream_by_name() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.open_download_stream_by_name>`__
- `open_download_stream() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.open_download_stream>`__
- `rename() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.rename>`__
- `delete() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.delete>`__
