
Commit 6b245f8

(DOCSP-14027) Expand Configuration section (#82)
* (DOCSP-14027) Expand Configuration section
1 parent 136c08f commit 6b245f8


1 file changed (+40, -8 lines)


source/configuration.txt

Lines changed: 40 additions & 8 deletions
@@ -124,7 +124,10 @@ You can configure the following properties to read from MongoDB:
 **Default:** ``10000``
 
 * - ``partitioner``
-- The name of the partitioner to use to partition the data.
+- The name of the partitioner to use to split collection data into
+partitions. Partitions are based on a range of values of a field
+(e.g. ``_id``\s 1 to 100).
+
 The connector provides the following partitioners:
 
 - ``MongoDefaultPartitioner``
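To illustrate the range-based partitioning described in the ``partitioner`` entry above, the following minimal Python sketch maps ``_id`` values to partitions using sorted split points. The boundary values and the ``assign_partition`` helper are hypothetical; they only model the idea of key ranges and are not part of the MongoDB Spark Connector API.

```python
from bisect import bisect_right

# Hypothetical helper: NOT part of the MongoDB Spark Connector.
def assign_partition(key, boundaries):
    """Return the index of the partition whose key range contains `key`.

    `boundaries` is a sorted list of split points. Partition 0 holds keys
    below boundaries[0], partition i holds keys in
    [boundaries[i-1], boundaries[i]), and the last partition holds
    everything at or above boundaries[-1].
    """
    return bisect_right(boundaries, key)

# Hypothetical split points on ``_id``: four boundaries define five ranges.
boundaries = [101, 201, 301, 401]
ids = [1, 100, 101, 250, 400, 401, 999]
print([assign_partition(i, boundaries) for i in ids])  # [0, 0, 1, 2, 3, 4, 4]
```

With these boundaries, partition 0 holds integer ``_id`` values 1 to 100, matching the example range in the description.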
@@ -135,8 +138,8 @@ You can configure the following properties to read from MongoDB:
 **Requires MongoDB 3.2**. A general purpose partitioner for
 all deployments. Uses the average document size and random
 sampling of the collection to determine suitable
-partitions for the collection. For configuration settings
-for the MongoSamplePartitioner, see
+partitions for the collection. For configuration
+settings for the MongoSamplePartitioner, see
 :ref:`conf-mongosamplepartitioner`.
 
 - ``MongoShardedPartitioner``
@@ -249,15 +252,41 @@ Partitioner Configuration
 **Default:** ``_id``
 
 * - ``partitionSizeMB``
-- The size (in MB) for each partition
+- The size (in MB) for each partition. Smaller partition sizes
+create more partitions containing fewer documents.
 
 **Default:** ``64``
 
 * - ``samplesPerPartition``
-- The number of sample documents to take for each partition.
+- The number of sample documents to take for each partition in
+order to establish a ``partitionKey`` range for each partition.
+
+A greater number of ``samplesPerPartition`` helps to find
+``partitionKey`` ranges that more closely match the
+``partitionSizeMB`` you specify.
+
+.. note::
+
+For sampling to improve performance, ``samplesPerPartition``
+must be fewer than the number of documents within each of
+your partitions.
+
+You can estimate the number of documents within each of your
+partitions by dividing your ``partitionSizeMB`` by the
+average document size (in MB) in your collection.
 
 **Default:** ``10``
 
+.. example::
+
+For a collection with 640 documents with an average document
+size of 0.5 MB, the default ``MongoSamplePartitioner`` configuration
+values create 5 partitions with 128 documents per partition.
+
+The MongoDB Spark Connector samples 50 documents (the default 10
+per intended partition) and defines 5 partitions by selecting
+``partitionKey`` ranges from the sampled documents.
+
 .. _conf-mongoshardedpartitioner:
 
 ``MongoShardedPartitioner`` Configuration
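The arithmetic in the ``samplesPerPartition`` note and example above can be checked with a short Python sketch. It assumes a simplified model: documents per partition is ``partitionSizeMB`` divided by the average document size, ``samplesPerPartition`` samples are taken for each intended partition, and every ``samplesPerPartition``-th sorted sample becomes a ``partitionKey`` boundary. The helper names are hypothetical, and this is not the connector's actual implementation.

```python
import math
import random

# Simplified, hypothetical model of the sampling arithmetic described above.
# It is NOT the MongoDB Spark Connector's implementation.

def estimate_docs_per_partition(partition_size_mb: float, avg_doc_size_mb: float) -> int:
    """Estimate documents per partition: partitionSizeMB / average doc size (MB)."""
    return math.floor(partition_size_mb / avg_doc_size_mb)

def sampling_is_useful(samples_per_partition: int, docs_per_partition: int) -> bool:
    """Per the note: sampling helps only if samples < documents per partition."""
    return samples_per_partition < docs_per_partition

def plan_sampling(doc_count, partition_size_mb=64, avg_doc_size_mb=0.5,
                  samples_per_partition=10):
    """Return (docs_per_partition, intended_partitions, total_samples)."""
    per_partition = estimate_docs_per_partition(partition_size_mb, avg_doc_size_mb)
    intended = math.ceil(doc_count / per_partition)
    return per_partition, intended, intended * samples_per_partition

def split_points(sampled_keys, samples_per_partition):
    """Use every samples_per_partition-th sorted sample (skipping the first) as a boundary."""
    ordered = sorted(sampled_keys)
    return ordered[samples_per_partition::samples_per_partition]

# The example from the docs: 640 documents averaging 0.5 MB, default settings.
per_partition, intended, total_samples = plan_sampling(640)
print(per_partition, intended, total_samples)  # 128 5 50
print(sampling_is_useful(10, per_partition))   # True

# Stand-in for random sampling: draw 50 distinct _id values from 1..640.
rng = random.Random(42)
samples = rng.sample(range(1, 641), total_samples)
bounds = split_points(samples, 10)
print(len(bounds) + 1)  # 5 partitions
```

Four boundaries taken from 50 sorted samples yield five ``partitionKey`` ranges, consistent with the 5 partitions in the example. Raising ``samplesPerPartition`` in this model places boundaries at finer sample positions, which is why more samples tend to track ``partitionSizeMB`` more closely.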
@@ -303,7 +332,8 @@ Partitioner Configuration
 **Default:** ``_id``
 
 * - ``partitionSizeMB``
-- The size (in MB) for each partition
+- The size (in MB) for each partition. Smaller partition sizes
+create more partitions containing fewer documents.
 
 **Default:** ``64``
 
@@ -328,7 +358,8 @@ Partitioner Configuration
 **Default:** ``_id``
 
 * - ``numberOfPartitions``
-- The number of partitions to create.
+- The number of partitions to create. A greater number of
+partitions means fewer documents per partition.
 
 **Default:** ``64``
 
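As a rough illustration of the ``numberOfPartitions`` description above, the sketch below assumes documents are spread evenly across partitions and computes the resulting upper bound on documents per partition for a hypothetical 640-document collection. Real partitions may be uneven; the helper is illustrative only and not part of the connector.

```python
import math

# Hypothetical helper: assumes an even split, which real partitions may not follow.
def docs_per_partition_by_count(doc_count: int, number_of_partitions: int) -> int:
    """Upper bound on documents per partition when doc_count is split evenly."""
    return math.ceil(doc_count / number_of_partitions)

# More partitions -> fewer documents in each one.
for n in (16, 64, 128):
    print(n, docs_per_partition_by_count(640, n))
# 16 40
# 64 10
# 128 5
```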
@@ -353,7 +384,8 @@ Partitioner Configuration
 **Default:** ``_id``
 
 * - ``partitionSizeMB``
-- The size (in MB) for each partition
+- The size (in MB) for each partition. Smaller partition sizes
+create more partitions containing fewer documents.
 
 **Default:** ``64``
 