
Commit 099521a

docsp-31169 - change stream schema inference (#164)
Co-authored-by: Caitlin Davey <[email protected]>
1 parent 5fa4781 commit 099521a

4 files changed: +21, -5 lines

snooty.toml

Lines changed: 2 additions & 2 deletions

@@ -6,8 +6,8 @@ intersphinx = ["https://www.mongodb.com/docs/manual/objects.inv"]
 toc_landing_pages = ["configuration"]
 
 [constants]
-driver-short = "Spark Connector"
-driver-long = "MongoDB {+driver-short+}"
+connector-short = "Spark Connector"
+connector-long = "MongoDB {+connector-short+}"
 current-version = "10.2.0"
 artifact-id-2-13 = "mongo-spark-connector_2.13"
 artifact-id-2-12 = "mongo-spark-connector_2.12"

source/configuration/read.txt

Lines changed: 6 additions & 2 deletions

@@ -133,7 +133,7 @@ You can configure the following properties to read from MongoDB:
 Partitioner Configurations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Partitioners change the read behavior for batch reads with the {+driver-short+}.
+Partitioners change the read behavior for batch reads with the {+connector-short+}.
 They do not affect Structured Streaming because the data stream processing
 engine produces a single stream with Structured Streaming.

@@ -330,9 +330,13 @@ Change Streams
 - | Specifies whether to publish the changed document or the full
     change stream document.
   |
-  | When set to ``true``, the connector filters out messages that
+  | When this setting is ``true``, the connector exhibits the following behavior:
+
+    - The connector filters out messages that
       omit the ``fullDocument`` field and only publishes the value of the
       field.
+    - If you don't specify a schema, the connector infers the schema
+      from the change stream document rather than from the underlying collection.
 
 .. note::
source/read-from-mongodb.txt

Lines changed: 12 additions & 0 deletions

@@ -42,6 +42,18 @@ Overview
 
 .. include:: /scala/filters.txt
 
+.. important:: Inferring the Schema of a Change Stream
+
+   When the {+connector-short+} infers the schema of a data frame
+   read from a change stream, it uses the schema of the underlying
+   collection by default rather than the schema of the change stream.
+   If you set the ``change.stream.publish.full.document.only``
+   option to ``true``, the connector uses the schema of the
+   change stream instead.
+
+   For more information on configuring a read operation, see the
+   :ref:`spark-change-stream-conf` section of the Read Configuration Options guide.
+
 SQL Queries
 -----------
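The behavior documented above can be exercised from PySpark. The following is a minimal sketch, not a verbatim excerpt from the docs; it assumes an existing ``spark`` session, uses placeholder connection details, and is not runnable without a MongoDB deployment:

   # Read a change stream with full-document-only publishing enabled.
   # With this option set to "true", the connector infers the schema
   # from the change stream document rather than the underlying collection.
   df = (
       spark.readStream.format("mongodb")
       .option("spark.mongodb.connection.uri", "<connection-uri>")
       .option("spark.mongodb.database", "<database>")
       .option("spark.mongodb.collection", "<collection>")
       .option("change.stream.publish.full.document.only", "true")
       .load()
   )

With the option unset (or ``false``), the same read infers its schema from the underlying collection instead.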

source/structured-streaming.txt

Lines changed: 1 addition & 1 deletion

@@ -191,7 +191,7 @@ Configuring a Write Stream to MongoDB
 
 Configuring a Read Stream from MongoDB
 --------------------------------------
-When reading a stream from a MongoDB database, the {+driver-long+} supports both
+When reading a stream from a MongoDB database, the {+connector-long+} supports both
 *micro-batch processing* and
 *continuous processing*. Micro-batch processing is the default processing engine, while
 continuous processing is an experimental feature introduced in
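As a point of reference for the micro-batch versus continuous distinction above: in Spark's Structured Streaming API, the processing engine is selected by the trigger on the write side of the streaming query. A hedged PySpark sketch (assumes an existing streaming data frame ``df`` and placeholder connection details; not runnable without Spark and MongoDB):

   # Micro-batch processing is the default when no continuous trigger is set.
   # Passing a continuous trigger opts in to the experimental continuous engine.
   (
       df.writeStream.format("mongodb")
       .option("spark.mongodb.connection.uri", "<connection-uri>")
       .option("checkpointLocation", "/tmp/checkpoint")
       .trigger(continuous="1 second")  # omit this line for micro-batch
       .start()
   )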
