
Commit 900bd29

(DOCS-15355) Fix example (#129)
1 parent 1bcfc8d commit 900bd29

File tree

2 files changed: +33 -27 lines changed
source/includes/warn-console-stream.txt (new file)

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+.. important::
+
+   Avoid streaming large datasets to your console. Streaming to your
+   console is memory intensive and intended only for testing purposes.
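
The sink this warning targets is Spark's console sink, which prints every micro-batch to standard output. As a rough PySpark illustration (not part of this commit), the sink's standard numRows and truncate options bound how much each batch prints; streamingDataFrame here stands in for any streaming DataFrame:

    # Sketch of the console sink the warning is about: testing only.
    # numRows caps the rows printed per micro-batch; truncate shortens
    # wide column values so each batch stays readable.
    query = (streamingDataFrame
        .writeStream
        .format("console")
        .option("numRows", 20)       # print at most 20 rows per batch
        .option("truncate", "true")  # truncate long cell values
        .outputMode("append")
        .start()
    )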

source/structured-streaming.txt

Lines changed: 29 additions & 27 deletions
@@ -211,14 +211,14 @@ more about continuous processing, see the `Spark documentation <https://spark.ap
          .load()
       )
 
-      query = (streamingDataFrame
+      dataStreamWriter = (streamingDataFrame
         .writeStream
         .trigger(continuous="1 second")
         .format("memory")
         .outputMode("append")
       )
 
-      query.start()
+      query = dataStreamWriter.start()
 
    .. note::
 
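The rename in this hunk tracks a real API distinction rather than style: the object built by writeStream is a DataStreamWriter, which only configures the sink, trigger, and output mode, while start() launches the stream and returns a separate StreamingQuery handle. A minimal PySpark sketch of the corrected pattern, assuming the streamingDataFrame built earlier in the docs example:

    # The writer only describes the sink, trigger, and output mode;
    # nothing runs until start() is called.
    dataStreamWriter = (streamingDataFrame
        .writeStream
        .trigger(continuous="1 second")
        .format("memory")
        .outputMode("append")
    )

    # start() launches the stream and returns a StreamingQuery,
    # which is why the fixed example assigns its result to query.
    query = dataStreamWriter.start()

    # The handle, not the writer, manages the running stream:
    query.awaitTermination(10)  # block for up to 10 seconds
    query.stop()                # shut the stream down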
@@ -279,12 +279,12 @@ more about continuous processing, see the `Spark documentation <https://spark.ap
       .format("mongodb")
       .load()
 
-    val query = streamingDataFrame.writeStream
+    val dataStreamWriter = streamingDataFrame.writeStream
       .trigger(Trigger.Continuous("1 second"))
       .format("memory")
       .outputMode("append")
 
-    query.start()
+    val query = dataStreamWriter.start()
 
 .. note::
 
@@ -334,7 +334,7 @@ Stream to MongoDB from a CSV File
       .getOrCreate()
 
    # define a streaming query
-   query = (spark
+   dataStreamWriter = (spark
      .readStream
      .format("csv")
     .option("header", "true")
@@ -352,7 +352,7 @@ Stream to MongoDB from a CSV File
    )
 
    # run the query
-   query.start()
+   query = dataStreamWriter.start()
 
 - id: scala
   content: |
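
Taken together, the two Python hunks above leave the CSV-to-MongoDB example reading roughly as follows. This is a sketch rather than the verbatim file: the span between the hunks is filled in from connector options shown elsewhere in this diff, and the checkpointLocation option is an assumption, since Spark requires a checkpoint location for durable sinks such as MongoDB:

    # define a streaming query
    dataStreamWriter = (spark
        .readStream
        .format("csv")
        .option("header", "true")
        .schema(<csv-schema>)
        .load(<csv-file-name>)  # placeholder input path (assumed)
        # manipulate your streaming data
        .writeStream
        .format("mongodb")
        .option("checkpointLocation", "/tmp/checkpoint/")  # assumed
        .option("spark.mongodb.connection.uri", <mongodb-connection-string>)
        .option("spark.mongodb.database", <database-name>)
        .option("spark.mongodb.collection", <collection-name>)
        .outputMode("append")
    )

    # run the query and keep the returned StreamingQuery handle
    query = dataStreamWriter.start()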
@@ -381,7 +381,7 @@ Stream to MongoDB from a CSV File
       .getOrCreate()
 
    // define a streaming query
-   val query = spark.readStream
+   val dataStreamWriter = spark.readStream
      .format("csv")
      .option("header", "true")
      .schema(<csv-schema>)
@@ -397,10 +397,10 @@ Stream to MongoDB from a CSV File
      .outputMode("append")
 
    // run the query
-   query.start()
+   val query = dataStreamWriter.start()
 
-Stream to a CSV File from MongoDB
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Stream to your Console from MongoDB
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. tabs-drivers::
 
@@ -409,17 +409,19 @@
 - id: python
   content: |
 
-   To create a :ref:`read stream <read-structured-stream>` to a
-   ``.csv`` file from MongoDB, first create a `DataStreamReader <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamReader.html>`__
+   To create a :ref:`read stream <read-structured-stream>`
+   output to your console from MongoDB, first create a `DataStreamReader <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamReader.html>`__
    from MongoDB, then use that ``DataStreamReader`` to
    create a `DataStreamWriter <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamWriter.html>`__
-   to a new ``.csv`` file. Finally, use the ``start()`` method
+   to the console. Finally, use the ``start()`` method
    to begin the stream.
 
    As new data is inserted into MongoDB, MongoDB streams that
-   data out to a ``.csv`` file in the `outputMode <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamWriter.outputMode.html#pyspark.sql.streaming.DataStreamWriter.outputMode>`__
+   data out to your console in the `outputMode <https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamWriter.outputMode.html#pyspark.sql.streaming.DataStreamWriter.outputMode>`__
    you specify.
 
+   .. include:: /includes/warn-console-stream.txt
+
    .. code-block:: python
       :copyable: true
       :emphasize-lines: 19, 27, 30
@@ -438,10 +440,10 @@ Stream to a CSV File from MongoDB
       .add('company_name', StringType())
       .add('price', DoubleType())
       .add('tx_time', TimestampType())
-   )
+      )
 
    # define a streaming query
-   query = (spark
+   dataStreamWriter = (spark
       .readStream
       .format("mongodb")
       .option("spark.mongodb.connection.uri", <mongodb-connection-string>)
@@ -451,29 +453,30 @@ Stream to a CSV File from MongoDB
       .load()
       # manipulate your streaming data
       .writeStream
-      .format("csv")
-      .option("path", "/output/")
+      .format("console")
       .trigger(continuous="1 second")
       .outputMode("append")
    )
 
    # run the query
-   query.start()
+   query = dataStreamWriter.start()
 
 - id: scala
   content: |
 
-   To create a :ref:`read stream <read-structured-stream>` to a
-   ``.csv`` file from MongoDB, first create a `DataStreamReader <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamReader.html>`__
+   To create a :ref:`read stream <read-structured-stream>`
+   output to your console from MongoDB, first create a `DataStreamReader <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamReader.html>`__
    from MongoDB, then use that ``DataStreamReader`` to
    create a `DataStreamWriter <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html>`__
-   to a new ``.csv`` file. Finally, use the ``start()`` method
+   to the console. Finally, use the ``start()`` method
    to begin the stream.
 
    As new data is inserted into MongoDB, MongoDB streams that
-   data out to a ``.csv`` file in the `outputMode <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html#outputMode(outputMode:String):org.apache.spark.sql.streaming.DataStreamWriter[T]>`__
+   data out to your console in the `outputMode <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html#outputMode(outputMode:String):org.apache.spark.sql.streaming.DataStreamWriter[T]>`__
    you specify.
 
+   .. include:: /includes/warn-console-stream.txt
+
    .. code-block:: scala
       :copyable: true
      :emphasize-lines: 17, 25, 28
@@ -494,7 +497,7 @@ Stream to a CSV File from MongoDB
      .add("tx_time", TimestampType())
 
    // define a streaming query
-   val query = spark.readStream
+   val dataStreamWriter = spark.readStream
      .format("mongodb")
      .option("spark.mongodb.connection.uri", <mongodb-connection-string>)
      .option("spark.mongodb.database", <database-name>)
@@ -503,10 +506,9 @@ Stream to a CSV File from MongoDB
      .load()
      // manipulate your streaming data
      .writeStream
-     .format("csv")
-     .option("path", "/output/")
+     .format("console")
      .trigger(Trigger.Continuous("1 second"))
      .outputMode("append")
 
    // run the query
-   query.start()
+   val query = dataStreamWriter.start()
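
Assembled from the Python hunks above, the corrected MongoDB-to-console example now reads roughly like this. It is a sketch: the database and collection options fall in a span this diff elides and are assumed to mirror the Scala version:

    # define a streaming query: read from MongoDB, write to the console
    dataStreamWriter = (spark
        .readStream
        .format("mongodb")
        .option("spark.mongodb.connection.uri", <mongodb-connection-string>)
        .option("spark.mongodb.database", <database-name>)      # assumed
        .option("spark.mongodb.collection", <collection-name>)  # assumed
        .load()
        # manipulate your streaming data
        .writeStream
        .format("console")               # replaces the old csv sink
        .trigger(continuous="1 second")
        .outputMode("append")
    )

    # run the query
    query = dataStreamWriter.start()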
