
Commit 4f26694

(DOCSP-21715) Remove RDD support (#97)
* (DOCSP-21715) Remove RDD support
1 parent e58864b commit 4f26694

35 files changed: +273 -1357 lines changed

source/faq.txt

Lines changed: 1 addition & 80 deletions
@@ -8,7 +8,7 @@ How can I achieve data locality?
 --------------------------------
 
 For any MongoDB deployment, the Mongo Spark Connector sets the
-preferred location for an RDD to be where the data is:
+preferred location for a DataFrame or Dataset to be where the data is:
 
 - For a non sharded system, it sets the preferred location to be the
   hostname(s) of the standalone or the replica set.
@@ -30,89 +30,10 @@ To promote data locality,
 To partition the data by shard use the
 :ref:`conf-shardedpartitioner`.
 
-How do I interact with Spark Streams?
--------------------------------------
-
-Spark streams can be considered as a potentially infinite source of
-RDDs. Therefore, anything you can do with an RDD, you can do with the
-results of a Spark Stream.
-
-For an example, see :mongo-spark:`SparkStreams.scala
-</blob/master/examples/src/test/scala/tour/SparkStreams.scala>`
-
 How do I resolve ``Unrecognized pipeline stage name`` Error?
 ------------------------------------------------------------
 
 In MongoDB deployments with mixed versions of :binary:`~bin.mongod`, it is
 possible to get an ``Unrecognized pipeline stage name: '$sample'``
 error. To mitigate this situation, explicitly configure the partitioner
 to use and define the Schema when using DataFrames.
-
-How do I use MongoDB BSON types that are unsupported in Spark?
---------------------------------------------------------------
-
-Some custom MongoDB BSON types, such as ``ObjectId``, are unsupported
-in Spark.
-
-The MongoDB Spark Connector converts custom MongoDB data types to and
-from extended JSON-like representations of those data types that are
-compatible with Spark. See :ref:`<bson-spark-datatypes>` for a list of
-custom MongoDB types and their Spark counterparts.
-
-Spark Datasets
-~~~~~~~~~~~~~~
-
-To create a standard Dataset with custom MongoDB data types, use
-``fieldTypes`` helpers:
-
-.. code-block:: scala
-
-   import com.mongodb.spark.sql.fieldTypes
-
-   case class MyData(id: fieldTypes.ObjectId, a: Int)
-   val ds = spark.createDataset(Seq(MyData(fieldTypes.ObjectId(new ObjectId()), 99)))
-   ds.show()
-
-The preceding example creates a Dataset containing the following fields
-and data types:
-
-- The ``id`` field is a custom MongoDB BSON type, ``ObjectId``, defined
-  by ``fieldTypes.ObjectId``.
-
-- The ``a`` field is an ``Int``, a data type available in Spark.
-
-Spark DataFrames
-~~~~~~~~~~~~~~~~
-
-To create a DataFrame with custom MongoDB data types, you must supply
-those types when you create the RDD and schema:
-
-- Create RDDs using custom MongoDB BSON types
-  (e.g. ``ObjectId``). The Spark Connector handles converting
-  those custom types into Spark-compatible data types.
-
-- Declare schemas using the ``StructFields`` helpers for data types
-  that are not natively supported by Spark
-  (e.g. ``StructFields.objectId``). Refer to
-  :ref:`<bson-spark-datatypes>` for the mapping between BSON and custom
-  MongoDB Spark types.
-
-.. code-block:: scala
-
-   import org.apache.spark.sql.Row
-   import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
-   import com.mongodb.spark.sql.helpers.StructFields
-
-   val data = Seq(Row(Row(new ObjectId().toHexString()), 99))
-   val rdd = spark.sparkContext.parallelize(data)
-   val schema = StructType(List(StructFields.objectId("id", true), StructField("a", IntegerType, true)))
-   val df = spark.createDataFrame(rdd, schema)
-   df.show()
-
-The preceding example creates a DataFrame containing the following
-fields and data types:
-
-- The ``id`` field is a custom MongoDB BSON type, ``ObjectId``, defined
-  by ``StructFields.objectId``.
-
-- The ``a`` field is an ``Int``, a data type available in Spark.
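
The surviving FAQ entries advise choosing a partitioner explicitly (for example
the sharded partitioner, to keep data local to shards) and defining a schema
when using DataFrames. A minimal Scala sketch of that advice follows; the
``partitioner`` option key and the ``ShardedPartitioner`` class name are
assumptions about the 10.x connector configuration rather than text from this
commit, and the field names follow the sample documents added in this commit.

.. code-block:: scala

   // Illustrative sketch only: pick a partitioner explicitly and supply a
   // schema so the connector does not infer one by sampling (the $sample
   // stage is what fails on older mongod versions).
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

   val spark = SparkSession.builder()
     .appName("PartitionerAndSchemaExample") // placeholder application name
     .getOrCreate()

   val schema = StructType(List(
     StructField("name", StringType, nullable = true),
     StructField("age", IntegerType, nullable = true)
   ))

   val df = spark.read
     .format("mongodb")
     .schema(schema)
     // Assumed option key and partitioner class; confirm against the
     // connector's configuration reference.
     .option("partitioner", "com.mongodb.spark.sql.connector.read.partitioner.ShardedPartitioner")
     .load()

   df.show()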

source/getting-started.txt

Lines changed: 0 additions & 2 deletions
@@ -15,8 +15,6 @@ Prerequisites
 
 .. include:: /includes/list-prerequisites.rst
 
-- Java 8 or later.
-
 .. _pyspark-shell:
 .. _scala-getting-started:
 .. _python-basics:

source/includes/bson-type-consideration.rst

Lines changed: 0 additions & 4 deletions
This file was deleted.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+.. code-block:: javascript
+
+   { "_id" : ObjectId("585024d558bef808ed84fc3e"), "name" : "Bilbo Baggins", "age" : 50 }
+   { "_id" : ObjectId("585024d558bef808ed84fc3f"), "name" : "Gandalf", "age" : 1000 }
+   { "_id" : ObjectId("585024d558bef808ed84fc40"), "name" : "Thorin", "age" : 195 }
+   { "_id" : ObjectId("585024d558bef808ed84fc41"), "name" : "Balin", "age" : 178 }
+   { "_id" : ObjectId("585024d558bef808ed84fc42"), "name" : "Kíli", "age" : 77 }
+   { "_id" : ObjectId("585024d558bef808ed84fc43"), "name" : "Dwalin", "age" : 169 }
+   { "_id" : ObjectId("585024d558bef808ed84fc44"), "name" : "Óin", "age" : 167 }
+   { "_id" : ObjectId("585024d558bef808ed84fc45"), "name" : "Glóin", "age" : 158 }
+   { "_id" : ObjectId("585024d558bef808ed84fc46"), "name" : "Fíli", "age" : 82 }
+   { "_id" : ObjectId("585024d558bef808ed84fc47"), "name" : "Bombur" }

source/includes/extracts-command-line.yaml

Lines changed: 4 additions & 4 deletions
@@ -3,7 +3,7 @@ content: |
   - the ``--packages`` option to download the MongoDB Spark Connector
     package. The following package is available:
 
-    - ``mongo-spark-connector_{+scala-version+}`` for use with Scala 2.12.x
+    - ``mongo-spark-connector``
 
   - the ``--conf`` option to configure the MongoDB Spark Connnector.
     These settings configure the ``SparkConf`` object.
@@ -39,7 +39,7 @@ content: |
 
      ./bin/spark-shell --conf "spark.mongodb.read.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
                        --conf "spark.mongodb.write.uri=mongodb://127.0.0.1/test.myCollection" \
-                       --packages org.mongodb.spark:mongo-spark-connector_{+scala-version+}:{+current-version+}
+                       --packages org.mongodb.spark:mongo-spark-connector:{+current-version+}
 
   .. include:: /includes/extracts/list-configuration-explanation.rst
 
@@ -56,7 +56,7 @@ content: |
 
      ./bin/pyspark --conf "spark.mongodb.read.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
                    --conf "spark.mongodb.write.uri=mongodb://127.0.0.1/test.myCollection" \
-                   --packages org.mongodb.spark:mongo-spark-connector_{+scala-version+}:{+current-version+}
+                   --packages org.mongodb.spark:mongo-spark-connector:{+current-version+}
 
   .. include:: /includes/extracts/list-configuration-explanation.rst
 
@@ -73,7 +73,7 @@ content: |
 
      ./bin/sparkR --conf "spark.mongodb.read.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
                   --conf "spark.mongodb.write.uri=mongodb://127.0.0.1/test.myCollection" \
-                  --packages org.mongodb.spark:mongo-spark-connector_{+scala-version+}:{+current-version+}
+                  --packages org.mongodb.spark:mongo-spark-connector:{+current-version+}
 
   .. include:: /includes/extracts/list-configuration-explanation.rst
 ...
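
The shell examples above pass the connection settings with ``--conf``. The same
keys can also be set when the ``SparkSession`` is built in application code; a
hedged sketch follows, reusing the ``spark.mongodb.read.uri`` and
``spark.mongodb.write.uri`` keys and placeholder URIs shown in the extracts. The
application name is illustrative, and the connector package still has to be
supplied at submit time (for example with ``--packages``).

.. code-block:: scala

   // Sketch: the settings the shell examples pass with --conf, applied
   // programmatically when building the SparkSession.
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
     .appName("MongoSparkConnectorApp") // placeholder name
     .config("spark.mongodb.read.uri",
       "mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred")
     .config("spark.mongodb.write.uri", "mongodb://127.0.0.1/test.myCollection")
     .getOrCreate()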

source/includes/list-prerequisites.rst

Lines changed: 3 additions & 1 deletion
@@ -4,4 +4,6 @@
 
 - Running MongoDB instance (version 4.0 or later).
 
-- Spark version 3.1 or later
+- Spark version 3.1 or later.
+
+- Java 8 or later.

source/includes/new-format-name.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+.. important::
+
+   In version 10.0.0 and later of the Connector, use the format
+   ``mongodb`` to read from and write to MongoDB:
+
+   ``df = spark.read.format("mongodb").load()``
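
The new include shows the read side of the 10.x format name. For completeness,
a short Scala sketch of both reading and writing with ``format("mongodb")``
follows; it assumes the read and write URIs are already configured on the
``SparkSession`` (as in the shell examples earlier), and the save mode is
illustrative.

.. code-block:: scala

   // Sketch: read from and write to MongoDB with the 10.x format name,
   // assuming connection URIs are already set on the SparkSession.
   val df = spark.read.format("mongodb").load()

   df.write
     .format("mongodb")
     .mode("append") // illustrative save mode
     .save()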

source/includes/scala-java-aggregation.rst

Lines changed: 0 additions & 7 deletions
This file was deleted.
Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 Provide the Spark Core, Spark SQL, and MongoDB Spark Connector
-dependencies to your dependency management tool.
+dependencies to your dependency management tool.

source/includes/scala-java-read-readconfig.rst

Lines changed: 0 additions & 8 deletions
This file was deleted.
