
Commit 6161257

(DOCSP-20566) v10 configuration improvements (#86)
* (DOCSP-20566) v10 configuration improvements
1 parent d1b34fa commit 6161257

11 files changed, +124 -51 lines changed

source/configuration.txt

Lines changed: 106 additions & 33 deletions
@@ -11,53 +11,115 @@ Configuration Options
    :class: singlecol
 
 Various configuration options are available for the MongoDB Spark
-Connector.
+Connector. To learn more about the options you can set, see
+:ref:`spark-write-conf` and :ref:`spark-read-conf`.
 
 Specify Configuration
 ---------------------
 
-Via ``SparkConf``
-~~~~~~~~~~~~~~~~~
+.. _spark-conf:
 
-You can specify these options via ``SparkConf`` using the ``--conf``
-setting or the ``$SPARK_HOME/conf/spark-default.conf`` file, and
-MongoDB Spark Connector will use the settings in ``SparkConf`` as the
+Using ``SparkConf``
+~~~~~~~~~~~~~~~~~~~
+
+You can specify configuration options with ``SparkConf`` using any of
+the following approaches:
+
+.. tabs-selector:: drivers
+
+.. tabs-drivers::
+
+   tabs:
+     - id: java-sync
+       content: |
+
+         - The ``SparkConf`` constructor in your application. To learn more, see the `Java SparkConf documentation <https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/SparkConf.html>`__.
+
+     - id: python
+       content: |
+
+         - The ``SparkConf`` constructor in your application. To learn more, see the `Python SparkConf documentation <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkConf.html>`__.
+
+     - id: scala
+       content: |
+
+         - The ``SparkConf`` constructor in your application. To learn more, see the `Scala SparkConf documentation <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html>`__.
+
+- The ``--conf`` flag at runtime. To learn more, see
+  `Dynamically Loading Spark Properties <https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties>`__ in
+  the Spark documentation.
+
+- The ``$SPARK_HOME/conf/spark-default.conf`` file.
+
+The MongoDB Spark Connector will use the settings in ``SparkConf`` as
 defaults.
 
 .. important::
 
-   When setting configurations via ``SparkConf``, you must prefix the
-   configuration options. Refer to the configuration sections for the
-   specific prefix.
+   When setting configurations with ``SparkConf``, you must prefix the
+   configuration options. Refer to :ref:`spark-write-conf` and
+   :ref:`spark-read-conf` for the specific prefixes.
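As a minimal PySpark sketch of the ``SparkConf`` approach described above (the URI, database, and collection values are placeholders, and the option keys assume the prefixed v10 ``connection.uri`` settings covered in the read and write configuration pages):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Prefixed connector options set through SparkConf become the defaults
    # for every read and write in this session.
    conf = (
        SparkConf()
        .set("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/people.contacts")
        .set("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/people.contacts")
    )

    spark = SparkSession.builder.appName("conf-example").config(conf=conf).getOrCreate()

The same keys can also be passed with ``--conf`` at runtime or placed in ``$SPARK_HOME/conf/spark-default.conf``.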

-Via ``ReadConfig`` and ``WriteConfig``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. _options-map:
 
-Various methods in the MongoDB Connector API accept an optional
-:mongo-spark:`ReadConfig
-</blob/master/src/main/scala/com/mongodb/spark/config/ReadConfig.scala>`
-or a :mongo-spark:`WriteConfig
-</blob/master/src/main/scala/com/mongodb/spark/config/WriteConfig.scala>` object.
-``ReadConfig`` and ``WriteConfig`` settings override any
-corresponding settings in ``SparkConf``.
-For examples, see :ref:`gs-read-config` and :ref:`gs-write-config`. For
-more details, refer to the source for these methods.
+Using an Options Map
+~~~~~~~~~~~~~~~~~~~~
 
-Via Options Map
-~~~~~~~~~~~~~~~
+In the Spark API, the DataFrameReader and DataFrameWriter methods
+accept options in the form of a ``Map[String, String]``. Options
+specified this way override any corresponding settings in ``SparkConf``.
 
-In the Spark API, some methods (e.g. ``DataFrameReader`` and
-``DataFrameWriter``) accept options in the form of a ``Map[String,
-String]``.
+.. tabs-drivers::
 
-You can convert custom ``ReadConfig`` or ``WriteConfig`` settings into
-a ``Map`` via the ``asOptions()`` method.
+   tabs:
+     - id: java-sync
+       content: |
 
-Via System Property
-~~~~~~~~~~~~~~~~~~~
+         To learn more about specifying options with
+         `DataFrameReader <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#option-java.lang.String-boolean->`__ and
+         `DataFrameWriter <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html#option-java.lang.String-boolean->`__,
+         refer to the Java Spark documentation for the ``.option()``
+         method.
+
+     - id: python
+       content: |
+
+         To learn more about specifying options with
+         `DataFrameReader <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.option.html#pyspark.sql.DataFrameReader.option>`__ and
+         `DataFrameWriter <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameWriter.option.html#pyspark.sql.DataFrameWriter.option>`__,
+         refer to the Python Spark documentation for the ``.option()``
+         method.
+
+     - id: scala
+       content: |
+
+         To learn more about specifying options with
+         `DataFrameReader <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#option(key:String,value:Double):org.apache.spark.sql.DataFrameReader>`__ and
+         `DataFrameWriter <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriter.html#option(key:String,value:Double):org.apache.spark.sql.DataFrameWriter[T]>`__,
+         refer to the Scala Spark documentation for the ``.option()``
+         method.
+
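A brief PySpark sketch of the options-map approach (the ``spark`` session, database, and collection names below are placeholder assumptions); options passed through ``.option()`` apply only to that read or write and override the ``SparkConf`` defaults:

    # Per-operation options override the SparkConf defaults for this read only.
    df = (
        spark.read.format("mongodb")
        .option("database", "people")
        .option("collection", "contacts")
        .load()
    )

    # DataFrameWriter accepts the same style of per-operation options.
    df.write.format("mongodb").mode("append").option("collection", "contactsCopy").save()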
+Short-Form Syntax
+`````````````````
+
+Options maps support short-form syntax. You may omit the prefix when
+specifying an option key string.
+
+.. example::
+
+   The following syntaxes are equivalent to one another:
+
+   - ``dfw.option("spark.mongodb.write.collection", "myCollection").save()``
+
+   - ``dfw.option("spark.mongodb.collection", "myCollection").save()``
+
+   - ``dfw.option("collection", "myCollection").save()``
+
+Using a System Property
+~~~~~~~~~~~~~~~~~~~~~~~
 
 The connector provides a cache for ``MongoClients`` which can only be
-configured via the System Property. See :ref:`cache-configuration`.
+configured with a System Property. See :ref:`cache-configuration`.
 
 .. _cache-configuration:

@@ -70,7 +132,7 @@ share the MongoClient across threads.
 .. important::
 
    As the cache is setup before the Spark Configuration is available,
-   the cache can only be configured via a System Property.
+   the cache can only be configured with a System Property.
 
 .. list-table::
    :header-rows: 1
@@ -80,10 +142,21 @@ share the MongoClient across threads.
      - Description
 
    * - ``mongodb.keep_alive_ms``
-     - The length of time to keep a ``MongoClient`` available for sharing.
+     - The length of time to keep a ``MongoClient`` available for
+       sharing.
 
       **Default:** ``5000``
-
+
+``ConfigException``\s
+---------------------
+
+A configuration error throws a ``ConfigException``. Confirm that any of
+the following methods of configuration that you use are configured
+properly:
+
+- :ref:`SparkConf <spark-conf>`
+- :ref:`Options maps <options-map>`
+
 .. toctree::
    :titlesonly:
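Relatedly, a sketch of setting the cache's ``mongodb.keep_alive_ms`` System Property from PySpark; the ``10000`` value is a placeholder, and whether ``SparkContext.setSystemProperty`` reaches the driver JVM early enough depends on your deployment, so the ``spark-submit`` flag shown in the comment is the more conventional route:

    from pyspark import SparkContext
    from pyspark.sql import SparkSession

    # Equivalent at launch time:
    #   spark-submit --driver-java-options "-Dmongodb.keep_alive_ms=10000" app.py
    SparkContext.setSystemProperty("mongodb.keep_alive_ms", "10000")

    spark = SparkSession.builder.appName("cache-example").getOrCreate()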

source/configuration/read.txt

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _spark-read-conf:
+
 ==========================
 Read Configuration Options
 ==========================

source/configuration/write.txt

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _spark-write-conf:
+
 ===========================
 Write Configuration Options
 ===========================

source/includes/scala-java-read-readconfig.rst

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ specifies various :ref:`read configuration settings
 <replica-set-read-preference-modes>`.
 
 The following example reads from the ``spark`` collection with a
-``secondaryPreferred`` read preference:
+``secondaryPreferred`` read preference:

source/python/aggregation.txt

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ to use when creating a DataFrame.
 .. code-block:: none
 
    pipeline = "{'$match': {'type': 'apple'}}"
-   df = spark.read.format("mongo").option("pipeline", pipeline).load()
+   df = spark.read.format("mongodb").option("pipeline", pipeline).load()
    df.show()
 
 In the ``pyspark`` shell, the operation prints the following output:
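(The printed output itself lies outside this hunk.) For context, a self-contained sketch of the updated snippet; the connection URI, database, and collection are placeholders, and the session config key assumes the v10 read prefix:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("pipeline-example")
        .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/test.fruit")
        .getOrCreate()
    )

    # The $match stage filters documents on the MongoDB side before Spark loads them.
    pipeline = "{'$match': {'type': 'apple'}}"
    df = spark.read.format("mongodb").option("pipeline", pipeline).load()
    df.show()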

source/python/filters-and-sql.txt

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ source:
 
 .. code-block:: python
 
-   df = spark.read.format("mongo").load()
+   df = spark.read.format("mongodb").load()
 
 The following example includes only
 records in which the ``qty`` field is greater than or equal to ``10``.
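That example lies outside this hunk; a hedged sketch of the filter, assuming the ``df`` DataFrame from the read above and a numeric ``qty`` field in the sampled schema:

    # Keep only documents whose qty field is at least 10.
    df.filter(df["qty"] >= 10).show()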

source/python/read-from-mongodb.txt

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ from within the ``pyspark`` shell.
 
 .. code-block:: python
 
-   df = spark.read.format("mongo").load()
+   df = spark.read.format("mongodb").load()
 
 Spark samples the records to infer the schema of the collection.
 
@@ -35,5 +35,5 @@ To read from a collection called ``contacts`` in a database called
 
 .. code-block:: python
 
-   df = spark.read.format("mongo").option("uri",
+   df = spark.read.format("mongodb").option("uri",
       "mongodb://127.0.0.1/people.contacts").load()

source/python/write-to-mongodb.txt

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ by using the ``write`` method:
 
 .. code-block:: python
 
-   people.write.format("mongo").mode("append").save()
+   people.write.format("mongodb").mode("append").save()
 
 The above operation writes to the MongoDB database and collection
 specified in the :ref:`spark.mongodb.output.uri<pyspark-shell>` option
 
@@ -69,5 +69,5 @@ To write to a collection called ``contacts`` in a database called
 
 .. code-block:: python
 
-   people.write.format("mongo").mode("append").option("database",
+   people.write.format("mongodb").mode("append").option("database",
      "people").option("collection", "contacts").save()

source/read-from-mongodb.txt

Lines changed: 2 additions & 4 deletions
@@ -1,4 +1,6 @@
 .. _read-from-mongodb:
+.. _scala-read:
+.. _java-read:
 
 =================
 Read from MongoDB
@@ -12,10 +14,6 @@ Read from MongoDB
    :depth: 1
    :class: singlecol
 
-.. _scala-read:
-.. _java-read:
-.. _gs-read-config:
-
 Overview
 --------

source/scala/datasets-and-sql.txt

Lines changed: 3 additions & 3 deletions
@@ -97,15 +97,15 @@ Alternatively, you can use ``SparkSession`` methods to create DataFrames:
    ) // ReadConfig used for configuration
 
    val df4 = sparkSession.read.mongo() // SparkSession used for configuration
-   sqlContext.read.format("mongo").load()
+   sqlContext.read.format("mongodb").load()
 
    // Set custom options
    import com.mongodb.spark.config._
 
    val customReadConfig = ReadConfig(Map("readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
    val df5 = sparkSession.read.mongo(customReadConfig)
 
-   val df6 = sparkSession.read.format("mongo").options(customReadConfig.asOptions).load()
+   val df6 = sparkSession.read.format("mongodb").options(customReadConfig.asOptions).load()
 
 Filters
 -------
@@ -252,7 +252,7 @@ to MongoDB using the DataFrameWriter directly:
 .. code-block:: scala
 
    centenarians.write.option("collection", "hundredClub").mode("overwrite").mongo()
-   centenarians.write.option("collection", "hundredClub").mode("overwrite").format("mongo").save()
+   centenarians.write.option("collection", "hundredClub").mode("overwrite").format("mongodb").save()
 
 DataTypes
 ---------
