|
| 1 | +.. _pymongo-gridfs: |
| 2 | + |
| 3 | +================= |
| 4 | +Store Large Files |
| 5 | +================= |
| 6 | + |
| 7 | +.. contents:: On this page |
| 8 | + :local: |
| 9 | + :backlinks: none |
| 10 | + :depth: 1 |
| 11 | + :class: singlecol |
| 12 | + |
| 13 | +.. facet:: |
| 14 | + :name: genre |
| 15 | + :values: reference |
| 16 | + |
| 17 | +.. meta:: |
| 18 | + :keywords: binary large object, blob, storage |
| 19 | + |
| 20 | +Overview |
| 21 | +-------- |
| 22 | + |
| 23 | +In this guide, you can learn how to store and retrieve large files in |
| 24 | +MongoDB by using **GridFS**. GridFS is a specification implemented by |
| 25 | +{+driver-short+} that describes how to split files into chunks when storing them |
| 26 | +and reassemble them when retrieving them. The driver's implementation of |
| 27 | +GridFS is an abstraction that manages the operations and organization of |
| 28 | +the file storage. |
| 29 | + |
| 30 | +You should use GridFS if the size of your files exceeds the BSON document |
| 31 | +size limit of 16 MB. For more detailed information on whether GridFS is |
| 32 | +suitable for your use case, see :manual:`GridFS </core/gridfs>` in the |
| 33 | +MongoDB Server manual. |
| 34 | + |
| 35 | +The following sections describe GridFS operations and how to |
| 36 | +perform them. |
| 37 | + |
| 38 | +How GridFS Works |
| 39 | +---------------- |
| 40 | + |
| 41 | +GridFS organizes files in a **bucket**, a group of MongoDB collections |
| 42 | +that contain the chunks of files and information describing them. The |
| 43 | +bucket contains the following collections, named using the convention |
| 44 | +defined in the GridFS specification: |
| 45 | + |
| 46 | +- The ``chunks`` collection stores the binary file chunks. |
| 47 | +- The ``files`` collection stores the file metadata. |
| 48 | + |
| 49 | +When you create a new GridFS bucket, the driver creates the preceding |
| 50 | +collections, prefixed with the default bucket name ``fs``, unless |
| 51 | +you specify a different name. The driver also creates an index on each |
| 52 | +collection to ensure efficient retrieval of the files and related |
| 53 | +metadata. If the bucket doesn't exist, the driver creates it only when you |
| 54 | +perform the first write operation. The driver creates the indexes only if they |
| 55 | +don't already exist and the bucket is empty. For more information about |
| 56 | +GridFS indexes, see :manual:`GridFS Indexes </core/gridfs/#gridfs-indexes>` |
| 57 | +in the MongoDB Server manual. |
| 58 | + |
| 59 | +When storing files with GridFS, the driver splits the files into smaller |
| 60 | +chunks, each represented by a separate document in the ``chunks`` collection. |
| 61 | +It also creates a document in the ``files`` collection that contains |
| 62 | +a file ID, file name, and other file metadata. You can upload the file from |
| 63 | +memory or from a stream. The following diagram shows how GridFS splits |
| 64 | +the files when they are uploaded to a bucket: |
| 65 | + |
| 66 | +.. figure:: /includes/figures/GridFS-upload.png |
| 67 | + :alt: A diagram that shows how GridFS uploads a file to a bucket |
| 68 | + |
| 69 | +When retrieving files, GridFS fetches the metadata from the ``files`` |
| 70 | +collection in the specified bucket and uses the information to reconstruct |
| 71 | +the file from documents in the ``chunks`` collection. You can read the file |
| 72 | +into memory or output it to a stream. |
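The split-and-reassemble behavior described above can be sketched in plain Python, without a database. This is an illustration only, not the driver's implementation: the helper names ``split_into_documents`` and ``reassemble`` are hypothetical, and the file ID is a stand-in string (the driver generates an ``ObjectId`` by default). The field names follow the GridFS specification.

```python
import datetime
import uuid

DEFAULT_CHUNK_SIZE = 255 * 1024  # 261,120 bytes, the GridFS default


def split_into_documents(data: bytes, filename: str,
                         chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like GridFS documents."""
    file_id = uuid.uuid4().hex  # stand-in for the ObjectId the driver would use
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.datetime.now(datetime.timezone.utc),
    }
    # One document per chunk; "n" records the chunk's position in the file.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(files_doc, chunk_docs) -> bytes:
    """Rebuild the file by joining its chunks in order of "n"."""
    own_chunks = [c for c in chunk_docs if c["files_id"] == files_doc["_id"]]
    ordered = sorted(own_chunks, key=lambda c: c["n"])
    data = b"".join(c["data"] for c in ordered)
    if len(data) != files_doc["length"]:
        raise ValueError("chunks do not match the recorded file length")
    return data
```

The metadata document is what makes retrieval possible: it supplies the file ID used to find the chunks and the length used to check the result.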
| 73 | + |
| 74 | +.. _gridfs-create-bucket: |
| 75 | + |
| 76 | +Create a GridFS Bucket |
| 77 | +---------------------- |
| 78 | + |
| 79 | +To store or retrieve files from GridFS, create a GridFS bucket by calling the |
| 80 | +``GridFSBucket()`` constructor and passing in a ``Database`` instance. |
| 81 | +You can use the ``GridFSBucket`` instance to |
| 82 | +perform read and write operations on the files in your bucket. |
| 83 | + |
| 84 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 85 | + :language: python |
| 86 | + :copyable: true |
| 87 | + :start-after: start create bucket |
| 88 | + :end-before: end create bucket |
| 89 | + |
| 90 | +.. _gridfs-create-custom-bucket: |
| 91 | + |
| 92 | +To create or reference a bucket with a custom name other than the default name |
| 93 | +``fs``, pass your bucket name as the second parameter to the ``GridFSBucket()`` |
| 94 | +constructor, as shown below: |
| 95 | + |
| 96 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 97 | + :language: python |
| 98 | + :copyable: true |
| 99 | + :start-after: start create custom bucket |
| 100 | + :end-before: end create custom bucket |
| 101 | + |
| 102 | +.. _gridfs-upload-files: |
| 103 | + |
| 104 | +Upload Files |
| 105 | +------------ |
| 106 | + |
| 107 | +Use the ``open_upload_stream()`` method from the ``GridFSBucket`` class to create an upload |
| 108 | +stream for a given file name. The |
| 109 | +``open_upload_stream()`` method allows you to specify configuration information |
| 110 | +such as file chunk size and other field/value pairs to store as metadata. Set |
| 111 | +these options as parameters of ``open_upload_stream()``, as shown in the |
| 112 | +following code example: |
| 113 | + |
| 114 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 115 | + :language: python |
| 116 | + :copyable: true |
| 117 | + :start-after: start upload files |
| 118 | + :end-before: end upload files |
| 119 | + |
| 120 | +.. _gridfs-retrieve-file-info: |
| 121 | + |
| 122 | +Retrieve File Information |
| 123 | +------------------------- |
| 124 | + |
| 125 | +In this section, you can learn how to retrieve file metadata stored in the |
| 126 | +``files`` collection of the GridFS bucket. The metadata contains information |
| 127 | +about the file it refers to, including: |
| 128 | + |
| 129 | +- The ``_id`` of the file |
| 130 | +- The name of the file |
| 131 | +- The length/size of the file |
| 132 | +- The upload date and time |
| 133 | +- A ``metadata`` document in which you can store any other information |
| 134 | + |
| 135 | +To retrieve files from a GridFS bucket, call the ``find()`` method on the ``GridFSBucket`` |
| 136 | +instance. The method returns a ``Cursor`` instance |
| 137 | +from which you can access the results. To learn more about ``Cursor`` objects in |
| 138 | +{+driver-short+}, see :ref:`<pymongo-cursors>`. |
| 139 | + |
| 140 | +The following code example shows you how to retrieve and print file metadata |
| 141 | +from all your files in a GridFS bucket. It uses the ``for...in`` syntax to traverse the |
| 142 | +``Cursor`` iterable and display the results: |
| 143 | + |
| 144 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 145 | + :language: python |
| 146 | + :copyable: true |
| 147 | + :start-after: start retrieve file info |
| 148 | + :end-before: end retrieve file info |
| 149 | + |
| 150 | +The ``find()`` method accepts various query specifications. You can use |
| 151 | +its parameters to specify the sort order, maximum number of documents to return, |
| 152 | +and the number of documents to skip before returning. To learn more about querying |
| 153 | +MongoDB, see :ref:`<pymongo-retrieve>`. |
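As a rough sketch of these query options, the hypothetical helper below lists the most recently uploaded files. It assumes ``bucket`` is a ``GridFSBucket`` instance and relies on the ``sort``, ``skip``, and ``limit`` keyword arguments of ``find()`` and on the ``filename``, ``length``, and ``upload_date`` attributes of the returned file objects. The helper name and its defaults are illustrative.

```python
def newest_files(bucket, limit=5, skip=0):
    """Return (filename, length, upload_date) tuples, newest uploads first.

    `bucket` is assumed to be a gridfs.GridFSBucket instance.
    """
    cursor = bucket.find(
        {},                          # empty filter: match every file
        sort=[("uploadDate", -1)],   # -1 sorts in descending order
        skip=skip,
        limit=limit,
    )
    return [(f.filename, f.length, f.upload_date) for f in cursor]
```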
| 154 | + |
| 155 | +.. _gridfs-download-files: |
| 156 | + |
| 157 | +Download Files |
| 158 | +-------------- |
| 159 | + |
| 160 | +You can download files from your MongoDB database by using the |
| 161 | +``open_download_stream_by_name()`` method from ``GridFSBucket`` to create a |
| 162 | +download stream. |
| 163 | + |
| 164 | +The following example shows you how to download a file referenced |
| 165 | +by the file name, stored in the ``filename`` field, into your working |
| 166 | +directory: |
| 167 | + |
| 168 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 169 | + :language: python |
| 170 | + :copyable: true |
| 171 | + :start-after: start download files name |
| 172 | + :end-before: end download files name |
| 173 | + |
| 174 | +.. note:: |
| 175 | + |
| 176 | + If there are multiple documents with the same ``filename`` value, |
| 177 | + GridFS will stream the most recent file with the given name (as |
| 178 | + determined by the ``uploadDate`` field). |
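The selection rule in the preceding note can be illustrated with plain dictionaries standing in for documents in the ``files`` collection. The function name ``latest_revision`` and the sample data are hypothetical. The sketch shows only the rule "newest ``uploadDate`` wins", not how the driver queries the server.

```python
import datetime


def latest_revision(files_docs, filename):
    """Return the newest document with the given filename, or None."""
    matches = [doc for doc in files_docs if doc["filename"] == filename]
    if not matches:
        return None
    return max(matches, key=lambda doc: doc["uploadDate"])


# Hypothetical sample data: two revisions of the same file name.
sample_docs = [
    {"_id": 1, "filename": "report.pdf",
     "uploadDate": datetime.datetime(2024, 1, 1)},
    {"_id": 2, "filename": "report.pdf",
     "uploadDate": datetime.datetime(2024, 6, 1)},
]
newest = latest_revision(sample_docs, "report.pdf")  # the document with _id 2
```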
| 179 | + |
| 180 | +Alternatively, you can use the ``open_download_stream()`` |
| 181 | +method, which takes the ``_id`` field of a file as a parameter: |
| 182 | + |
| 183 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 184 | + :language: python |
| 185 | + :copyable: true |
| 186 | + :start-after: start download files id |
| 187 | + :end-before: end download files id |
| 188 | + |
| 189 | +.. note:: |
| 190 | + |
| 191 | + The GridFS streaming API cannot load partial chunks. When a download |
| 192 | + stream needs to pull a chunk from MongoDB, it pulls the entire chunk |
| 193 | + into memory. The 255-kilobyte default chunk size is usually |
| 194 | + sufficient, but you can reduce the chunk size to reduce memory |
| 195 | + overhead. |
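To get a feel for the trade-off in the preceding note, the following sketch computes how many chunk documents a file occupies for a given chunk size. The helper name ``chunk_count`` is hypothetical, and the default of 255 * 1024 bytes is an assumption about how the 255-kilobyte default is counted. A smaller chunk size lowers the amount of data a download stream holds per chunk, at the cost of more documents in the ``chunks`` collection.

```python
DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024  # assumed byte value of the default


def chunk_count(file_length: int,
                chunk_size: int = DEFAULT_CHUNK_SIZE_BYTES) -> int:
    """Number of chunk documents needed to store file_length bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division avoids floating-point rounding on large files.
    return -(-file_length // chunk_size)


twenty_mib = 20 * 1024 * 1024
default_chunks = chunk_count(twenty_mib)            # 81 chunks of up to 255 KiB
small_chunks = chunk_count(twenty_mib, 64 * 1024)   # 320 chunks of 64 KiB
```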
| 196 | + |
| 197 | +.. _gridfs-rename-files: |
| 198 | + |
| 199 | +Rename Files |
| 200 | +------------ |
| 201 | + |
| 202 | +Use the ``rename()`` method to update the name of a GridFS file in your |
| 203 | +bucket. You must specify the file to rename by its ``_id`` field |
| 204 | +rather than its file name. |
| 205 | + |
| 206 | +The following example shows how to update the ``filename`` field to |
| 207 | +``"newFileName"`` by referencing a document's ``_id`` field: |
| 208 | + |
| 209 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 210 | + :language: python |
| 211 | + :copyable: true |
| 212 | + :start-after: start rename files |
| 213 | + :end-before: end rename files |
| 214 | + |
| 215 | +.. note:: |
| 216 | + |
| 217 | + The ``rename()`` method supports updating the name of only one file at |
| 218 | + a time. To rename multiple files, retrieve a list of files matching the |
| 219 | + file name from the bucket, extract the ``_id`` field from the files you |
| 220 | + want to rename, and pass each value in separate calls to the ``rename()`` |
| 221 | + method. |
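The multi-file procedure from the preceding note might look like the following sketch. The helper ``rename_all`` is hypothetical and assumes ``bucket`` is a ``GridFSBucket`` instance. It uses only ``find()`` with a ``filename`` filter, the ``_id`` attribute of each returned file, and ``rename()``.

```python
def rename_all(bucket, old_name, new_name):
    """Rename every file called old_name; return how many were renamed.

    `bucket` is assumed to be a gridfs.GridFSBucket instance.
    """
    # Collect the IDs first so the query finishes before any file changes.
    file_ids = [grid_out._id for grid_out in bucket.find({"filename": old_name})]
    for file_id in file_ids:
        bucket.rename(file_id, new_name)  # one call per file
    return len(file_ids)
```

The same pattern applies to any operation that takes a single ``_id``.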
| 222 | + |
| 223 | +.. _gridfs-delete-files: |
| 224 | + |
| 225 | +Delete Files |
| 226 | +------------ |
| 227 | + |
| 228 | +Use the ``delete()`` method to remove a file's ``files`` collection document and its |
| 229 | +associated chunks from your bucket. This effectively deletes the file. You must |
| 230 | +specify the file by its ``_id`` field rather than its file name. |
| 231 | + |
| 232 | +The following example shows you how to delete a file by referencing its ``_id`` field: |
| 233 | + |
| 234 | +.. literalinclude:: /includes/gridfs/gridfs.py |
| 235 | + :language: python |
| 236 | + :copyable: true |
| 237 | + :start-after: start delete files |
| 238 | + :end-before: end delete files |
| 239 | + |
| 240 | +.. note:: |
| 241 | + |
| 242 | + The ``delete()`` method supports deleting only one file at a time. To |
| 243 | + delete multiple files, retrieve the files from the bucket, extract |
| 244 | + the ``_id`` field from the files you want to delete, and pass each value |
| 245 | + in separate calls to the ``delete()`` method. |
| | + |
| 246 | +API Documentation |
| 247 | +----------------- |
| 248 | + |
| 249 | +To learn more about using {+driver-short+} to store and retrieve large files, |
| 250 | +see the following API documentation: |
| 251 | + |
| 252 | +- `GridFSBucket <{+api-root+}gridfs/index.html#gridfs.GridFSBucket>`__ |
| 253 | +- `open_upload_stream() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.open_upload_stream>`__ |
| 254 | +- `find() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.find>`__ |
| 255 | +- `open_download_stream_by_name() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.open_download_stream_by_name>`__ |
| 256 | +- `open_download_stream() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.open_download_stream>`__ |
| 257 | +- `rename() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.rename>`__ |
| 258 | +- `delete() <{+api-root+}gridfs/index.html#gridfs.GridFSBucket.delete>`__ |