@@ -46,6 +46,10 @@ You can configure the following properties when reading data from MongoDB in str
   * - ``collection``
     - | **Required.**
       | The collection name configuration.
+      | You can specify multiple collections by separating the collection names
+        with a comma.
+      |
+      | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`.

   * - ``comment``
     - | The comment to append to the read operation. Comments appear in the
@@ -168,7 +172,7 @@ You can configure the following properties when reading a change stream from Mon
         omit the ``fullDocument`` field and publishes only the value of the
         field.
       - If you don't specify a schema, the connector infers the schema
-        from the change stream document rather than from the underlying collection.
+        from the change stream document.

         **Default**: ``false``

@@ -203,4 +207,91 @@ You can configure the following properties when reading a change stream from Mon
Specifying Properties in ``connection.uri``
-------------------------------------------

-.. include:: /includes/connection-read-config.rst
+.. include:: /includes/connection-read-config.rst
+
+.. _spark-specify-multiple-collections:
+
+Specifying Multiple Collections in the ``collection`` Property
+--------------------------------------------------------------
+
+You can specify multiple collections in the ``collection`` change stream
+configuration property by separating the collection names with a comma. Do not
+add a space between the collections unless the space is part of the collection
+name.
+
+Specify multiple collections as shown in the following example:
+
+.. code-block:: java
+
+   ...
+   .option("spark.mongodb.collection", "collectionOne,collectionTwo")
+
+If a collection name is "*", or if the name includes a comma or a backslash
+(\\), you must escape the character as follows:
+
+- If the name of a collection used in your ``collection`` configuration
+  option contains a comma, the {+connector-short+} treats it as two different
+  collections. To avoid this, you must escape the comma by preceding it with
+  a backslash (\\). Escape a collection named "my,collection" as follows:
+
+  .. code-block:: java
+
+     "my\,collection"
+
+- If the name of a collection used in your ``collection`` configuration
+  option is "*", the {+connector-short+} interprets it as a specification
+  to scan all collections. To avoid this, you must escape the asterisk by
+  preceding it with a backslash (\\). Escape a collection named "*" as follows:
+
+  .. code-block:: java
+
+     "\*"
+
+- If the name of a collection used in your ``collection`` configuration
+  option contains a backslash (\\), the {+connector-short+} treats the
+  backslash as an escape character, which might change how it interprets the
+  value. To avoid this, you must escape the backslash by preceding it with
+  another backslash. Escape a collection named "\\collection" as follows:
+
+  .. code-block:: java
+
+     "\\collection"
+
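The three escaping rules above can be collected into one small helper. The following sketch is illustrative only: ``CollectionNameEscaper`` is a hypothetical class, not part of the {+connector-short+} API, and it assumes the raw, unescaped name is available as a Java string.

```java
// Hypothetical helper (not part of the connector API): escapes a collection
// name so the connector reads it as a single literal name inside the
// comma-separated ``collection`` option value.
public class CollectionNameEscaper {

    static String escape(String name) {
        // Escape backslashes first so the backslashes added by the
        // later steps are not escaped a second time.
        String escaped = name.replace("\\", "\\\\");
        // A bare "*" means "all collections"; escape it so the connector
        // treats it as a literal collection name instead.
        if (escaped.equals("*")) {
            return "\\*";
        }
        // Commas separate collection names, so escape literal commas.
        return escaped.replace(",", "\\,");
    }

    public static void main(String[] args) {
        System.out.println(escape("my,collection")); // my\,collection
        System.out.println(escape("*"));             // \*
        System.out.println(escape("\\collection"));  // \\collection
    }
}
```

Escaping backslashes before commas matters: doing it in the other order would escape the backslashes that the comma step just inserted.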
+.. note::
+
+   When specifying the collection name as a string literal in Java, you must
+   further escape each backslash with another one. For example, escape a
+   collection named "\\collection" as follows:
+
+   .. code-block:: java
+
+      "\\\\collection"
+
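You can check the doubling described in this note with plain Java, independent of the connector: the source literal ``"\\\\collection"`` produces the twelve-character runtime value ``\\collection``, which is the form the connector needs to see.

```java
public class LiteralEscapeDemo {
    public static void main(String[] args) {
        // In Java source code, each backslash in a string literal must
        // itself be escaped, so four backslashes in the literal yield
        // two backslash characters at runtime.
        String value = "\\\\collection";
        System.out.println(value);          // prints: \\collection
        System.out.println(value.length()); // prints: 12
    }
}
```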
+You can stream from all collections in the database by passing an
+asterisk (*) as a string for the collection name.
+
+Specify all collections as shown in the following example:
+
+.. code-block:: java
+
+   ...
+   .option("spark.mongodb.collection", "*")
+
+If you create a collection while streaming from all collections, the new
+collection is automatically included in the stream.
+
+You can drop collections at any time while streaming from multiple collections.
+
+.. important:: Inferring the Schema with Multiple Collections
+
+   If you set the ``change.stream.publish.full.document.only``
+   option to ``true``, the {+connector-short+} infers the schema of a
+   ``DataFrame`` by using the schema of the scanned documents.
+
+   Schema inference happens at the beginning of streaming and does not take
+   into account collections that are created during streaming.
+
+   When streaming from multiple collections and inferring the schema, the
+   connector samples each collection sequentially. Streaming from a large
+   number of collections can noticeably slow schema inference. This
+   performance impact occurs only while inferring the schema.