
Commit 79babb8

Revert "DOCSP-36546 Scan Multiple Collections (#193)"
This reverts commit e206f09.
1 parent e206f09 commit 79babb8

3 files changed: +8 −136 lines changed


source/release-notes.txt

Lines changed: 0 additions & 32 deletions

@@ -2,38 +2,6 @@
 Release Notes
 =============
 
-MongoDB Connector for Spark 10.3
---------------------------------
-
-The 10.3 connector release includes the following new features:
-
-- Added support for reading multiple collections when using micro-batch or
-  continuous streaming modes.
-
-.. warning:: Breaking Change
-
-   Support for reading multiple collections introduces the following breaking
-   changes:
-
-   - If the name of a collection used in your ``collection`` configuration
-     option contains a comma, the
-     {+connector-short+} treats it as two different collections. To avoid
-     this, you must escape the comma by preceding it with a backslash (\\).
-
-   - If the name of a collection used in your ``collection`` configuration
-     option is "*", the {+connector-short+} interprets it as a specification
-     to scan all collections. To avoid this, you must escape the asterisk by preceding it
-     with a backslash (\\).
-
-   - If the name of a collection used in your ``collection`` configuration
-     option contains a backslash (\\), the
-     {+connector-short+} treats the backslash as an escape character, which
-     might change how it interprets the value. To avoid this, you must escape
-     the backslash by preceding it with another backslash.
-
-To learn more about scanning multiple collections, see the :ref:`collection
-configuration property <spark-streaming-input-conf>` description.
 MongoDB Connector for Spark 10.2
 --------------------------------
 
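The escaping rules in the reverted release note can be sketched with a small helper. This is a hypothetical illustration, not connector code: ``CollectionNameEscaper`` and its ``escape`` method are invented names, and the rule they apply (prefix each comma, asterisk, and backslash with a backslash) is taken from the breaking-change list above.

```java
// Hypothetical helper illustrating the documented escaping rules for
// collection names used in the ``collection`` configuration option.
// Not part of the MongoDB Connector for Spark.
public class CollectionNameEscaper {
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            // Commas, asterisks, and backslashes are special in the
            // ``collection`` option value, so precede each with a backslash.
            if (c == ',' || c == '*' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("my,collection")); // my\,collection
        System.out.println(escape("*"));             // \*
    }
}
```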

source/streaming-mode/streaming-read-config.txt

Lines changed: 2 additions & 93 deletions

@@ -46,10 +46,6 @@ You can configure the following properties when reading data from MongoDB in str
    * - ``collection``
      - | **Required.**
        | The collection name configuration.
-       | You can specify multiple collections by separating the collection names
-         with a comma.
-       |
-       | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`.
 
    * - ``comment``
      - | The comment to append to the read operation. Comments appear in the
@@ -172,7 +168,7 @@ You can configure the following properties when reading a change stream from Mon
       omit the ``fullDocument`` field and publishes only the value of the
       field.
       If you don't specify a schema, the connector infers the schema
-      from the change stream document.
+      from the change stream document rather than from the underlying collection.
 
       **Default**: ``false``
 
@@ -207,91 +203,4 @@ You can configure the following properties when reading a change stream from Mon
 Specifying Properties in ``connection.uri``
 -------------------------------------------
 
-.. include:: /includes/connection-read-config.rst
-
-.. _spark-specify-multiple-collections:
-
-Specifying Multiple Collections in the ``collection`` Property
---------------------------------------------------------------
-
-You can specify multiple collections in the ``collection`` change stream
-configuration property by separating the collection names
-with a comma. Do not add a space between the collections unless the space is a
-part of the collection name.
-
-Specify multiple collections as shown in the following example:
-
-.. code-block:: java
-
-   ...
-   .option("spark.mongodb.collection", "collectionOne,collectionTwo")
-
-If a collection name is "*", or if the name includes a comma or a backslash (\\),
-you must escape the character as follows:
-
-- If the name of a collection used in your ``collection`` configuration
-  option contains a comma, the {+connector-short+} treats it as two different
-  collections. To avoid this, you must escape the comma by preceding it with
-  a backslash (\\). Escape a collection named "my,collection" as follows:
-
-  .. code-block:: java
-
-     "my\,collection"
-
-- If the name of a collection used in your ``collection`` configuration
-  option is "*", the {+connector-short+} interprets it as a specification
-  to scan all collections. To avoid this, you must escape the asterisk by preceding it
-  with a backslash (\\). Escape a collection named "*" as follows:
-
-  .. code-block:: java
-
-     "\*"
-
-- If the name of a collection used in your ``collection`` configuration
-  option contains a backslash (\\), the
-  {+connector-short+} treats the backslash as an escape character, which
-  might change how it interprets the value. To avoid this, you must escape
-  the backslash by preceding it with another backslash. Escape a collection named "\\collection" as follows:
-
-  .. code-block:: java
-
-     "\\collection"
-
-.. note::
-
-   When specifying the collection name as a string literal in Java, you must
-   further escape each backslash with another one. For example, escape a collection
-   named "\\collection" as follows:
-
-   .. code-block:: java
-
-      "\\\\collection"
-
-You can stream from all collections in the database by passing an
-asterisk (*) as a string for the collection name.
-
-Specify all collections as shown in the following example:
-
-.. code-block:: java
-
-   ...
-   .option("spark.mongodb.collection", "*")
-
-If you create a collection while streaming from all collections, the new
-collection is automatically included in the stream.
-
-You can drop collections at any time while streaming from multiple collections.
-
-.. important:: Inferring the Schema with Multiple Collections
-
-   If you set the ``change.stream.publish.full.document.only``
-   option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame``
-   by using the schema of the scanned documents.
-
-   Schema inference happens at the beginning of streaming, and does not take
-   into account collections that are created during streaming.
-
-   When streaming from multiple collections and inferring the schema, the connector samples
-   each collection sequentially. Streaming from a large number of
-   collections can cause the schema inference to have noticeably slower
-   performance. This performance impact occurs only while inferring the schema.
+.. include:: /includes/connection-read-config.rst
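As a companion to the removed escaping rules above, the following hypothetical sketch shows how a comma-separated ``collection`` value could be split into individual names while honoring backslash escapes. ``CollectionSpecParser`` is an invented name for illustration; this is not the connector's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing a ``collection`` option value in which
// commas separate collection names and a backslash escapes the next
// character. Mirrors the behavior described in the deleted docs section.
public class CollectionSpecParser {
    static List<String> split(String spec) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : spec.toCharArray()) {
            if (escaped) {
                current.append(c);  // escaped character is taken literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;     // next character is literal
            } else if (c == ',') {
                names.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        names.add(current.toString());
        return names;
    }

    public static void main(String[] args) {
        System.out.println(split("collectionOne,collectionTwo")); // [collectionOne, collectionTwo]
        System.out.println(split("my\\,collection"));             // [my,collection]
    }
}
```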

source/streaming-mode/streaming-read.txt

Lines changed: 6 additions & 11 deletions

@@ -15,13 +15,6 @@ Read from MongoDB in Streaming Mode
    :depth: 1
    :class: singlecol
 
-.. facet::
-   :name: genre
-   :values: reference
-
-.. meta::
-   :keywords: change stream
-
 Overview
 --------
 
@@ -351,10 +344,12 @@ The following example shows how to stream data from MongoDB to your console.
 
 .. important:: Inferring the Schema of a Change Stream
 
-   If you set the ``change.stream.publish.full.document.only``
-   option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame``
-   by using the schema of the scanned documents. If you set the option to
-   ``false``, you must specify a schema.
+   When the {+connector-short+} infers the schema of a DataFrame
+   read from a change stream, by default,
+   it uses the schema of the underlying collection rather than that
+   of the change stream. If you set the ``change.stream.publish.full.document.only``
+   option to ``true``, the connector uses the schema of the
+   change stream instead.
 
 For more information about this setting, and to see a full list of change stream
 configuration options, see the
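The revised wording above can be illustrated with a configuration sketch. The option name ``change.stream.publish.full.document.only`` comes from the diff itself; the surrounding read-stream builder chain is an assumed usage pattern, and ``spark`` is an assumed existing ``SparkSession``, so treat this as a fragment rather than a complete program.

```java
// Hedged sketch, not taken from the docs: assumes an existing SparkSession
// named `spark` and placeholder connection values. With the option set to
// "true", the connector infers the DataFrame schema from the change stream's
// full documents instead of from the underlying collection.
Dataset<Row> changeStream = spark.readStream()
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "<connection-string>")
    .option("spark.mongodb.database", "<database>")
    .option("spark.mongodb.collection", "<collection>")
    .option("change.stream.publish.full.document.only", "true")
    .load();
```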
