Commit 136c08f

(DOCSP-18818) Improve documentation for unsupported types (#77)
1 parent ca5676d commit 136c08f

2 files changed: +77 -5 lines changed


source/faq.txt

Lines changed: 69 additions & 0 deletions
@@ -47,3 +47,72 @@ In MongoDB deployments with mixed versions of :binary:`~bin.mongod`, it is
 possible to get an ``Unrecognized pipeline stage name: '$sample'``
 error. To mitigate this situation, explicitly configure the partitioner
 to use and define the Schema when using DataFrames.
+
+How do I use MongoDB BSON types that are unsupported in Spark?
+--------------------------------------------------------------
+
+Some custom MongoDB BSON types, such as ``ObjectId``, are unsupported
+in Spark.
+
+The MongoDB Spark Connector converts custom MongoDB data types to and
+from extended JSON-like representations of those data types that are
+compatible with Spark. See :ref:`bson-spark-datatypes` for a list of
+custom MongoDB types and their Spark counterparts.
+
+Spark Datasets
+~~~~~~~~~~~~~~
+
+To create a standard Dataset with custom MongoDB data types, use the
+``fieldTypes`` helpers:
+
+.. code-block:: scala
+
+   import com.mongodb.spark.sql.fieldTypes
+
+   case class MyData(id: fieldTypes.ObjectId, a: Int)
+   val ds = spark.createDataset(Seq(MyData(fieldTypes.ObjectId(new ObjectId()), 99)))
+   ds.show()
+
+The preceding example creates a Dataset containing the following fields
+and data types:
+
+- The ``id`` field is a custom MongoDB BSON type, ``ObjectId``, defined
+  by ``fieldTypes.ObjectId``.
+
+- The ``a`` field is an ``Int``, a data type available in Spark.
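The ``fieldTypes.ObjectId`` wrapper in the example above requires the connector and a running Spark session. As a rough, Spark-free sketch of the underlying idea, an unsupported BSON type can be carried as a small case class wrapping its 24-character hex-string form. The wrapper name and validation below are illustrative only, not the connector's actual implementation:

```scala
// Illustrative stand-in for a fieldTypes-style wrapper (NOT the connector's
// actual fieldTypes.ObjectId): a case class carrying an ObjectId as its
// 24-character lowercase hex string, with a basic shape check.
final case class ObjectIdLike(oid: String) {
  require(
    oid.length == 24 && oid.forall(c => c.isDigit || ('a' to 'f').contains(c)),
    "expected a 24-character lowercase hex string"
  )
}

object ObjectIdLikeDemo {
  def main(args: Array[String]): Unit = {
    val id = ObjectIdLike("507f1f77bcf86cd799439011")
    println(id.oid.length) // prints 24
  }
}
```

In real code, use the connector's ``fieldTypes`` helpers as shown in the example; this sketch only mirrors the shape of the idea.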
+
+Spark DataFrames
+~~~~~~~~~~~~~~~~
+
+To create a DataFrame with custom MongoDB data types, you must supply
+those types when you create the RDD and schema:
+
+- Create RDDs using custom MongoDB BSON types
+  (e.g. ``ObjectId``). The Spark Connector handles converting
+  those custom types into Spark-compatible data types.
+
+- Declare schemas using the ``StructFields`` helpers for data types
+  that are not natively supported by Spark
+  (e.g. ``StructFields.objectId``). Refer to
+  :ref:`bson-spark-datatypes` for the mapping between BSON and custom
+  MongoDB Spark types.
+
+.. code-block:: scala
+
+   import org.apache.spark.sql.Row
+   import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
+   import com.mongodb.spark.sql.helpers.StructFields
+
+   val data = Seq(Row(Row(new ObjectId().toHexString()), 99))
+   val rdd = spark.sparkContext.parallelize(data)
+   val schema = StructType(List(StructFields.objectId("id", true), StructField("a", IntegerType, true)))
+   val df = spark.createDataFrame(rdd, schema)
+   df.show()
+
+The preceding example creates a DataFrame containing the following
+fields and data types:
+
+- The ``id`` field is a custom MongoDB BSON type, ``ObjectId``, defined
+  by ``StructFields.objectId``.
+
+- The ``a`` field is an ``Int``, a data type available in Spark.
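The DataFrame example calls ``new ObjectId().toHexString()``, which needs ``org.bson.types.ObjectId`` from the BSON library on the classpath. For readers unfamiliar with the format: a BSON ObjectId is 12 bytes, the first four being a big-endian epoch-seconds timestamp, rendered as a 24-character hex string. The following dependency-free sketch produces a string of the same shape; it is illustrative only, so use ``org.bson.types.ObjectId`` in real code:

```scala
import java.security.SecureRandom

object HexIdSketch {
  // Builds a 24-character hex string shaped like an ObjectId: 4 bytes of
  // epoch-seconds timestamp followed by 8 random bytes. Illustrative only;
  // it does not reproduce ObjectId's machine/process/counter fields.
  def newHexId(): String = {
    val bytes = new Array[Byte](12)
    new SecureRandom().nextBytes(bytes)
    val ts = (System.currentTimeMillis() / 1000L).toInt
    bytes(0) = (ts >>> 24).toByte
    bytes(1) = (ts >>> 16).toByte
    bytes(2) = (ts >>> 8).toByte
    bytes(3) = ts.toByte
    bytes.map(b => f"${b & 0xff}%02x").mkString
  }

  def main(args: Array[String]): Unit = {
    val id = newHexId()
    println(id.length) // prints 24
  }
}
```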

source/scala/datasets-and-sql.txt

Lines changed: 8 additions & 5 deletions
@@ -262,21 +262,24 @@ to MongoDB using the DataFrameWriter directly:
 centenarians.write.option("collection", "hundredClub").mode("overwrite").mongo()
 centenarians.write.option("collection", "hundredClub").mode("overwrite").format("mongo").save()

+.. _bson-spark-datatypes:
+
 DataTypes
 ---------

 Spark supports a limited number of data types to ensure that all BSON
-types can be round tripped in and out of Spark DataFrames/Datasets. For
-any unsupported Bson Types, custom StructTypes are created.
+types can be round tripped in and out of Spark DataFrames/Datasets. The
+Spark Connector creates custom StructTypes for any unsupported BSON
+types.

-The following table shows the mapping between the Bson Types and Spark
+The following table shows the mapping between the BSON Types and Spark
 Types:

 .. list-table::
    :header-rows: 1
    :widths: 25 75

-   * - Bson Type
+   * - BSON Type
      - Spark Type

    * - ``Document``
@@ -352,7 +355,7 @@ represent the unsupported BSON Types:
    :widths: 45 30 30

-   * - Bson Type
+   * - BSON Type
      - Scala case class
      - JavaBean
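The FAQ text added in this commit describes "extended JSON-like representations" of custom BSON types. In MongoDB Extended JSON, an ObjectId is written as ``{"$oid": "<hex>"}``. A minimal, dependency-free helper that renders this form is sketched below; it is illustrative only, and real code should rely on the ``org.bson`` library's JSON writers:

```scala
// Renders the MongoDB Extended JSON form of an ObjectId from its hex
// string, e.g. {"$oid": "507f1f77bcf86cd799439011"}. Illustrative only.
object ExtendedJsonSketch {
  def objectIdJson(hex: String): String = {
    require(hex.length == 24, "ObjectId hex string must be 24 characters")
    s"""{"$$oid": "$hex"}"""
  }

  def main(args: Array[String]): Unit =
    println(objectIdJson("507f1f77bcf86cd799439011"))
}
```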
