
Commit 6161257

(DOCSP-20566) v10 configuration improvements (#86)
* (DOCSP-20566) v10 configuration improvements
1 parent d1b34fa commit 6161257

11 files changed, +124 -51 lines changed

source/configuration.txt

Lines changed: 106 additions & 33 deletions
@@ -11,53 +11,115 @@ Configuration Options
    :class: singlecol
 
 Various configuration options are available for the MongoDB Spark
-Connector.
+Connector. To learn more about the options you can set, see
+:ref:`spark-write-conf` and :ref:`spark-read-conf`.
 
 Specify Configuration
 ---------------------
 
-Via ``SparkConf``
-~~~~~~~~~~~~~~~~~
+.. _spark-conf:
 
-You can specify these options via ``SparkConf`` using the ``--conf``
-setting or the ``$SPARK_HOME/conf/spark-default.conf`` file, and
-MongoDB Spark Connector will use the settings in ``SparkConf`` as the
+Using ``SparkConf``
+~~~~~~~~~~~~~~~~~~~
+
+You can specify configuration options with ``SparkConf`` using any of
+the following approaches:
+
+.. tabs-selector:: drivers
+
+.. tabs-drivers::
+
+   tabs:
+     - id: java-sync
+       content: |
+
+         - The ``SparkConf`` constructor in your application. To learn more, see the `Java SparkConf documentation <https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/SparkConf.html>`__.
+
+     - id: python
+       content: |
+
+         - The ``SparkConf`` constructor in your application. To learn more, see the `Python SparkConf documentation <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkConf.html>`__.
+
+     - id: scala
+       content: |
+
+         - The ``SparkConf`` constructor in your application. To learn more, see the `Scala SparkConf documentation <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html>`__.
+
+- The ``--conf`` flag at runtime. To learn more, see
+  `Dynamically Loading Spark Properties <https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties>`__ in
+  the Spark documentation.
+
+- The ``$SPARK_HOME/conf/spark-default.conf`` file.
+
+The MongoDB Spark Connector will use the settings in ``SparkConf`` as
 defaults.
 
 .. important::
 
-   When setting configurations via ``SparkConf``, you must prefix the
-   configuration options. Refer to the configuration sections for the
-   specific prefix.
+   When setting configurations with ``SparkConf``, you must prefix the
+   configuration options. Refer to :ref:`spark-write-conf` and
+   :ref:`spark-read-conf` for the specific prefixes.
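As a minimal PySpark sketch of the ``SparkConf`` approach described above (the URI, database, and collection values are placeholders, and the option keys assume the prefixed v10 ``connection.uri`` settings covered in the read and write configuration pages):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Prefixed connector options set through SparkConf become the defaults
    # for every read and write in this session.
    conf = (
        SparkConf()
        .set("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/people.contacts")
        .set("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/people.contacts")
    )

    spark = SparkSession.builder.appName("conf-example").config(conf=conf).getOrCreate()

The same keys can also be passed with ``--conf`` at runtime or placed in ``$SPARK_HOME/conf/spark-default.conf``.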

-Via ``ReadConfig`` and ``WriteConfig``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. _options-map:
 
-Various methods in the MongoDB Connector API accept an optional
-:mongo-spark:`ReadConfig
-</blob/master/src/main/scala/com/mongodb/spark/config/ReadConfig.scala>`
-or a :mongo-spark:`WriteConfig
-</blob/master/src/main/scala/com/mongodb/spark/config/WriteConfig.scala>` object.
-``ReadConfig`` and ``WriteConfig`` settings override any
-corresponding settings in ``SparkConf``.
-For examples, see :ref:`gs-read-config` and :ref:`gs-write-config`. For
-more details, refer to the source for these methods.
+Using an Options Map
+~~~~~~~~~~~~~~~~~~~~
 
-Via Options Map
-~~~~~~~~~~~~~~~
+In the Spark API, the DataFrameReader and DataFrameWriter methods
+accept options in the form of a ``Map[String, String]``. Options
+specified this way override any corresponding settings in ``SparkConf``.
 
-In the Spark API, some methods (e.g. ``DataFrameReader`` and
-``DataFrameWriter``) accept options in the form of a ``Map[String,
-String]``.
+.. tabs-drivers::
 
-You can convert custom ``ReadConfig`` or ``WriteConfig`` settings into
-a ``Map`` via the ``asOptions()`` method.
+   tabs:
+     - id: java-sync
+       content: |
 
-Via System Property
-~~~~~~~~~~~~~~~~~~~
+         To learn more about specifying options with
+         `DataFrameReader <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#option-java.lang.String-boolean->`__ and
+         `DataFrameWriter <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html#option-java.lang.String-boolean->`__,
+         refer to the Java Spark documentation for the ``.option()``
+         method.
+
+     - id: python
+       content: |
+
+         To learn more about specifying options with
+         `DataFrameReader <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.option.html#pyspark.sql.DataFrameReader.option>`__ and
+         `DataFrameWriter <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameWriter.option.html#pyspark.sql.DataFrameWriter.option>`__,
+         refer to the Python Spark documentation for the ``.option()``
+         method.
+
+     - id: scala
+       content: |
+
+         To learn more about specifying options with
+         `DataFrameReader <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#option(key:String,value:Double):org.apache.spark.sql.DataFrameReader>`__ and
+         `DataFrameWriter <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriter.html#option(key:String,value:Double):org.apache.spark.sql.DataFrameWriter[T]>`__,
+         refer to the Scala Spark documentation for the ``.option()``
+         method.
+
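A brief PySpark sketch of the options-map approach (the ``spark`` session, database, and collection names below are placeholder assumptions); options passed through ``.option()`` apply only to that read or write and override the ``SparkConf`` defaults:

    # Per-operation options override the SparkConf defaults for this read only.
    df = (
        spark.read.format("mongodb")
        .option("database", "people")
        .option("collection", "contacts")
        .load()
    )

    # DataFrameWriter accepts the same style of per-operation options.
    df.write.format("mongodb").mode("append").option("collection", "contactsCopy").save()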
+Short-Form Syntax
+`````````````````
+
+Options maps support short-form syntax. You may omit the prefix when
+specifying an option key string.
+
+.. example::
+
+   The following syntaxes are equivalent to one another:
+
+   - ``dfw.option("spark.mongodb.write.collection", "myCollection").save()``
+
+   - ``dfw.option("spark.mongodb.collection", "myCollection").save()``
+
+   - ``dfw.option("collection", "myCollection").save()``
+
+Using a System Property
+~~~~~~~~~~~~~~~~~~~~~~~
 
 The connector provides a cache for ``MongoClients`` which can only be
-configured via the System Property. See :ref:`cache-configuration`.
+configured with a System Property. See :ref:`cache-configuration`.
 
 .. _cache-configuration:

@@ -70,7 +132,7 @@ share the MongoClient across threads.
 .. important::
 
    As the cache is setup before the Spark Configuration is available,
-   the cache can only be configured via a System Property.
+   the cache can only be configured with a System Property.
 
 .. list-table::
    :header-rows: 1
@@ -80,10 +142,21 @@ share the MongoClient across threads.
      - Description
 
    * - ``mongodb.keep_alive_ms``
-     - The length of time to keep a ``MongoClient`` available for sharing.
+     - The length of time to keep a ``MongoClient`` available for
+       sharing.
 
       **Default:** ``5000``
-
+
+``ConfigException``\s
+---------------------
+
+A configuration error throws a ``ConfigException``. Confirm that any of
+the following methods of configuration that you use are configured
+properly:
+
+- :ref:`SparkConf <spark-conf>`
+- :ref:`Options maps <options-map>`
+
 .. toctree::
    :titlesonly:
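Relatedly, a sketch of setting the cache's ``mongodb.keep_alive_ms`` System Property from PySpark; the ``10000`` value is a placeholder, and whether ``SparkContext.setSystemProperty`` reaches the driver JVM early enough depends on your deployment, so the ``spark-submit`` flag shown in the comment is the more conventional route:

    from pyspark import SparkContext
    from pyspark.sql import SparkSession

    # Equivalent at launch time:
    #   spark-submit --driver-java-options "-Dmongodb.keep_alive_ms=10000" app.py
    SparkContext.setSystemProperty("mongodb.keep_alive_ms", "10000")

    spark = SparkSession.builder.appName("cache-example").getOrCreate()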

source/configuration/read.txt

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _spark-read-conf:
+
 ==========================
 Read Configuration Options
 ==========================

source/configuration/write.txt

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _spark-write-conf:
+
 ===========================
 Write Configuration Options
 ===========================

source/includes/scala-java-read-readconfig.rst

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ specifies various :ref:`read configuration settings
 <replica-set-read-preference-modes>`.
 
 The following example reads from the ``spark`` collection with a
-``secondaryPreferred`` read preference:
+``secondaryPreferred`` read preference:

source/python/aggregation.txt

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ to use when creating a DataFrame.
 .. code-block:: none
 
    pipeline = "{'$match': {'type': 'apple'}}"
-   df = spark.read.format("mongo").option("pipeline", pipeline).load()
+   df = spark.read.format("mongodb").option("pipeline", pipeline).load()
    df.show()
 
 In the ``pyspark`` shell, the operation prints the following output:
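(The printed output itself lies outside this hunk.) For context, a self-contained sketch of the updated snippet; the connection URI, database, and collection are placeholders, and the session config key assumes the v10 read prefix:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("pipeline-example")
        .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/test.fruit")
        .getOrCreate()
    )

    # The $match stage filters documents on the MongoDB side before Spark loads them.
    pipeline = "{'$match': {'type': 'apple'}}"
    df = spark.read.format("mongodb").option("pipeline", pipeline).load()
    df.show()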

source/python/filters-and-sql.txt

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ source:
 
 .. code-block:: python
 
-   df = spark.read.format("mongo").load()
+   df = spark.read.format("mongodb").load()
 
 The following example includes only
 records in which the ``qty`` field is greater than or equal to ``10``.
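That example lies outside this hunk; a hedged sketch of the filter, assuming the ``df`` DataFrame from the read above and a numeric ``qty`` field in the sampled schema:

    # Keep only documents whose qty field is at least 10.
    df.filter(df["qty"] >= 10).show()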

source/python/read-from-mongodb.txt

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ from within the ``pyspark`` shell.
 
 .. code-block:: python
 
-   df = spark.read.format("mongo").load()
+   df = spark.read.format("mongodb").load()
 
 Spark samples the records to infer the schema of the collection.
 
@@ -35,5 +35,5 @@ To read from a collection called ``contacts`` in a database called
 
 .. code-block:: python
 
-   df = spark.read.format("mongo").option("uri",
+   df = spark.read.format("mongodb").option("uri",
       "mongodb://127.0.0.1/people.contacts").load()

source/python/write-to-mongodb.txt

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ by using the ``write`` method:
 
 .. code-block:: python
 
-   people.write.format("mongo").mode("append").save()
+   people.write.format("mongodb").mode("append").save()
 
 The above operation writes to the MongoDB database and collection
 specified in the :ref:`spark.mongodb.output.uri<pyspark-shell>` option
 
@@ -69,5 +69,5 @@ To write to a collection called ``contacts`` in a database called
 
 .. code-block:: python
 
-   people.write.format("mongo").mode("append").option("database",
+   people.write.format("mongodb").mode("append").option("database",
      "people").option("collection", "contacts").save()

source/read-from-mongodb.txt

Lines changed: 2 additions & 4 deletions
@@ -1,4 +1,6 @@
 .. _read-from-mongodb:
+.. _scala-read:
+.. _java-read:
 
 =================
 Read from MongoDB
@@ -12,10 +14,6 @@ Read from MongoDB
    :depth: 1
    :class: singlecol
 
-.. _scala-read:
-.. _java-read:
-.. _gs-read-config:
-
 Overview
 --------

source/scala/datasets-and-sql.txt

Lines changed: 3 additions & 3 deletions
@@ -97,15 +97,15 @@ Alternatively, you can use ``SparkSession`` methods to create DataFrames:
    ) // ReadConfig used for configuration
 
    val df4 = sparkSession.read.mongo() // SparkSession used for configuration
-   sqlContext.read.format("mongo").load()
+   sqlContext.read.format("mongodb").load()
 
    // Set custom options
    import com.mongodb.spark.config._
 
    val customReadConfig = ReadConfig(Map("readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
    val df5 = sparkSession.read.mongo(customReadConfig)
 
-   val df6 = sparkSession.read.format("mongo").options(customReadConfig.asOptions).load()
+   val df6 = sparkSession.read.format("mongodb").options(customReadConfig.asOptions).load()
 
 Filters
 -------
@@ -252,7 +252,7 @@ to MongoDB using the DataFrameWriter directly:
 .. code-block:: scala
 
    centenarians.write.option("collection", "hundredClub").mode("overwrite").mongo()
-   centenarians.write.option("collection", "hundredClub").mode("overwrite").format("mongo").save()
+   centenarians.write.option("collection", "hundredClub").mode("overwrite").format("mongodb").save()
 
 DataTypes
 ---------
