Commit e8ccaa5

(DOCSP-21735) Add fullDocument config option (#119)

1 parent 195e5c7

File tree

2 files changed (+102, -0 lines)

source/configuration/read.txt

Lines changed: 47 additions & 0 deletions
@@ -293,6 +293,53 @@ This partitioner is not compatible with hashed shard keys.
**Default:** ``64``

.. _spark-change-stream-conf:

Change Streams
--------------

.. note::

   If you use ``SparkConf`` to set the connector's change stream
   configurations, prefix ``spark.mongodb.change.stream.`` to each
   property.

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Property name
     - Description

   * - ``lookup.full.document``

     - Determines what values your change stream returns on update
       operations.

       The default setting returns the differences between the original
       document and the updated document.

       The ``updateLookup`` setting returns the differences between the
       original document and updated document as well as a copy of the
       entire updated document.

       .. tip::

          For more information on how this change stream option works,
          see the MongoDB server manual guide
          :manual:`Lookup Full Document for Update Operations </changeStreams/#lookup-full-document-for-update-operations>`.

       **Default:** "default"

   * - ``publish.full.document.only``

     - If ``true``, this property returns only the changed document
       instead of the full change stream document. The connector
       automatically sets the ``lookup.full.document`` property to
       ``updateLookup`` to receive the updated documents.

       **Default:** ``false``
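The note above says that change stream properties take a ``spark.mongodb.change.stream.`` prefix when set through ``SparkConf``. The following is a minimal plain-Python sketch of that prefixing rule (no Spark required); the ``spark_conf_key`` helper is hypothetical and only illustrates how the table's property names map to fully qualified configuration keys:

```python
# Sketch: map the change stream property names from the table above to
# fully qualified SparkConf keys. The helper is hypothetical; only the
# prefix and the property names come from the documentation.
CHANGE_STREAM_PREFIX = "spark.mongodb.change.stream."

def spark_conf_key(property_name: str) -> str:
    """Prepend the connector's change stream prefix to a property name."""
    return CHANGE_STREAM_PREFIX + property_name

# Example: request full updated documents and publish only the document.
conf = {
    spark_conf_key("lookup.full.document"): "updateLookup",
    spark_conf_key("publish.full.document.only"): "true",
}
print(conf[spark_conf_key("lookup.full.document")])  # updateLookup
```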
.. _configure-input-uri:

``uri`` Configuration Setting

source/structured-streaming.txt

Lines changed: 55 additions & 0 deletions
@@ -92,6 +92,61 @@ Configuring a Write Stream to MongoDB
        For a complete list of methods, see the
        `pyspark Structured Streaming reference <https://spark.apache.org/docs/latest/api/python/reference/pyspark.ss.html>`__.

   - id: scala
     content: |

        Specify write stream configuration settings on your streaming
        Dataset or DataFrame using the ``writeStream`` property. You
        must specify the following configuration settings to write
        to MongoDB:

        .. list-table::
           :header-rows: 1
           :stub-columns: 1
           :widths: 10 40

           * - Setting
             - Description

           * - ``writeStream.format()``
             - The format to use for write stream data. Use
               ``mongodb``.

           * - ``writeStream.option()``
             - Use the ``option`` method to specify your MongoDB
               deployment connection string with the
               ``spark.mongodb.connection.uri`` option key.

               You must specify a database and collection, either as
               part of your connection string or with additional
               ``option`` methods using the following keys:

               - ``spark.mongodb.database``
               - ``spark.mongodb.collection``

           * - ``writeStream.outputMode()``
             - The output mode to use. To view a list of all supported
               output modes, see `the pyspark outputMode documentation <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamWriter.outputMode.html#pyspark.sql.streaming.DataStreamWriter.outputMode>`__.

        The following code snippet shows how to use the preceding
        configuration settings to stream data to MongoDB:

        .. code-block:: python
           :copyable: true
           :emphasize-lines: 3-4, 7

           <streaming Dataset/ DataFrame> \
              .writeStream \
              .format("mongodb") \
              .option("spark.mongodb.connection.uri", <mongodb-connection-string>) \
              .option("spark.mongodb.database", <database-name>) \
              .option("spark.mongodb.collection", <collection-name>) \
              .outputMode("append")

        For a complete list of methods, see the
        `pyspark Structured Streaming reference <https://spark.apache.org/docs/latest/api/python/reference/pyspark.ss.html>`__.
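As a plain-Python sketch of the required-settings rule in the table above, the snippet below assembles the ``writeStream`` option map and checks that the connection string, database, and collection keys are all present before they would be handed to Spark. The ``build_write_options`` helper is hypothetical and not part of the connector API; only the option key names come from the documentation:

```python
# Hypothetical helper: collect the write stream options described in the
# table above and fail fast if a required one is missing or empty.
REQUIRED_KEYS = (
    "spark.mongodb.connection.uri",
    "spark.mongodb.database",
    "spark.mongodb.collection",
)

def build_write_options(uri: str, database: str, collection: str) -> dict:
    """Return the option map for writeStream, validating required keys."""
    options = {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
    }
    missing = [key for key in REQUIRED_KEYS if not options.get(key)]
    if missing:
        raise ValueError(f"missing required write stream options: {missing}")
    return options

# Example usage with placeholder values:
opts = build_write_options("mongodb://localhost:27017", "events", "raw")
```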
.. _read-structured-stream:
.. _continuous-processing:
