@@ -124,7 +124,10 @@ You can configure the following properties to read from MongoDB:
 **Default:** ``10000``

 * - ``partitioner``
- - The name of the partitioner to use to partition the data.
+ - The name of the partitioner to use to split collection data into
+ partitions. Partitions are based on a range of values of a field
+ (e.g. ``_id``\s 1 to 100).
+
 The connector provides the following partitioners:

 - ``MongoDefaultPartitioner``
@@ -135,8 +138,8 @@ You can configure the following properties to read from MongoDB:
 **Requires MongoDB 3.2**. A general purpose partitioner for
 all deployments. Uses the average document size and random
 sampling of the collection to determine suitable
- partitions for the collection. For configuration settings
- for the MongoSamplePartitioner, see
+ partitions for the collection. For configuration
+ settings for the MongoSamplePartitioner, see
 :ref:`conf-mongosamplepartitioner`.

 - ``MongoShardedPartitioner``
@@ -249,15 +252,41 @@ Partitioner Configuration
 **Default:** ``_id``

 * - ``partitionSizeMB``
- - The size (in MB) for each partition
+ - The size (in MB) for each partition. Smaller partition sizes
+ create more partitions containing fewer documents.

 **Default:** ``64``

 * - ``samplesPerPartition``
- - The number of sample documents to take for each partition.
+ - The number of sample documents to take for each partition in
+ order to establish a ``partitionKey`` range for each partition.
+
+ A greater number of ``samplesPerPartition`` helps to find
+ ``partitionKey`` ranges that more closely match the
+ ``partitionSizeMB`` you specify.
+
+ .. note::
+
+ For sampling to improve performance, ``samplesPerPartition``
+ must be fewer than the number of documents within each of
+ your partitions.
+
+ You can estimate the number of documents within each of your
+ partitions by dividing your ``partitionSizeMB`` by the
+ average document size (in MB) in your collection.

 **Default:** ``10``

+ .. example::
+
+ For a collection with 640 documents with an average document
+ size of 0.5 MB, the default ``MongoSamplePartitioner`` configuration
+ values create 5 partitions with 128 documents per partition.
+
+ The MongoDB Spark Connector samples 50 documents (the default 10
+ per intended partition) and defines 5 partitions by selecting
+ ``partitionKey`` ranges from the sampled documents.
+
 .. _conf-mongoshardedpartitioner:

 ``MongoShardedPartitioner`` Configuration
@@ -303,7 +332,8 @@ Partitioner Configuration
 **Default:** ``_id``

 * - ``partitionSizeMB``
- - The size (in MB) for each partition
+ - The size (in MB) for each partition. Smaller partition sizes
+ create more partitions containing fewer documents.

 **Default:** ``64``

@@ -328,7 +358,8 @@ Partitioner Configuration
 **Default:** ``_id``

 * - ``numberOfPartitions``
- - The number of partitions to create.
+ - The number of partitions to create. A greater number of
+ partitions means fewer documents per partition.

 **Default:** ``64``

@@ -353,7 +384,8 @@ Partitioner Configuration
 **Default:** ``_id``

 * - ``partitionSizeMB``
- - The size (in MB) for each partition
+ - The size (in MB) for each partition. Smaller partition sizes
+ create more partitions containing fewer documents.

 **Default:** ``64``

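
The arithmetic behind the ``MongoSamplePartitioner`` example added in this change (640 documents, 0.5 MB average size, default settings) can be checked with a short Python sketch. This is only an illustration of the estimate described in the note, not the connector's actual implementation: the function name ``estimate_sample_partitioning`` and its return keys are hypothetical, and rounding the partition count up is an assumption.

```python
import math


def estimate_sample_partitioning(doc_count, avg_doc_size_mb,
                                 partition_size_mb=64,
                                 samples_per_partition=10):
    """Rough estimate only; hypothetical helper, not part of the connector."""
    # Total collection size, estimated from the average document size.
    total_size_mb = doc_count * avg_doc_size_mb
    # Partitions needed to hold the collection (rounding up is an assumption).
    partitions = math.ceil(total_size_mb / partition_size_mb)
    # Documents per partition: partitionSizeMB divided by average document size.
    docs_per_partition = int(partition_size_mb / avg_doc_size_mb)
    # Samples taken overall: samplesPerPartition for each intended partition.
    total_samples = samples_per_partition * partitions
    return {
        "partitions": partitions,
        "docs_per_partition": docs_per_partition,
        "total_samples": total_samples,
        # Per the note: sampling only helps when samplesPerPartition is
        # smaller than the number of documents in each partition.
        "sampling_worthwhile": samples_per_partition < docs_per_partition,
    }


if __name__ == "__main__":
    # Values from the example: 640 documents of 0.5 MB, default settings.
    print(estimate_sample_partitioning(640, 0.5))
```

With the example's inputs this yields 5 partitions, 128 documents per partition, and 50 sampled documents, matching the figures in the ``.. example::`` block. Lowering ``partition_size_mb`` in the sketch shows the effect described for ``partitionSizeMB``: more partitions, each holding fewer documents.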