@@ -57,6 +57,77 @@ Schema Inference
.. include:: /scala/schema-inference.rst
+ .. _spark-schema-hint:
+
+ Specify Known Fields with Schema Hints
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ To supply the names and types of fields that you already know, set the
+ ``schemaHint`` configuration option. The schema that you provide is used
+ during schema inference. You can specify the ``schemaHint`` value in any of
+ the following Spark formats:
+
+ .. list-table::
+    :header-rows: 1
+    :widths: 35 65
+
+    * - Type
+      - Format
+
+    * - DDL
+      - ``<field one name> <FIELD ONE TYPE>, <field two name> <FIELD TWO TYPE>``
+
+    * - SQL DDL
+      - ``STRUCT<<field one name>: <FIELD ONE TYPE>, <field two name>: <FIELD TWO TYPE>>``
+
+    * - JSON
+      - .. code-block:: json
+           :copyable: false
+
+           { "type": "struct", "fields": [
+             { "name": "<field name>", "type": "<field type>", "nullable": <true/false> },
+             { "name": "<field name>", "type": "<field type>", "nullable": <true/false> }]}
+
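As a rough illustration of the three templates in the table above, the following sketch fills them in for a string field named ``value`` and an integer field named ``count`` using only the Python standard library. The helper names (``ddl_hint``, ``sql_ddl_hint``, ``json_hint``) and the ``DDL_NAMES`` mapping are hypothetical and exist only for this example. Spark itself may format the strings slightly differently, for example by quoting field names, so treat the output as a reading aid and not as the exact text Spark produces.

```python
import json

# Example fields: (field name, Spark JSON type name).
FIELDS = [("value", "string"), ("count", "integer")]

# Hypothetical lookup from JSON type names to the upper-case DDL type names.
DDL_NAMES = {"string": "STRING", "integer": "INT"}


def ddl_hint(fields):
    """Fill in the DDL template: '<name> <TYPE>, <name> <TYPE>'."""
    return ", ".join(f"{name} {DDL_NAMES[kind]}" for name, kind in fields)


def sql_ddl_hint(fields):
    """Fill in the SQL DDL template: 'STRUCT<<name>: <TYPE>, ...>'."""
    body = ", ".join(f"{name}: {DDL_NAMES[kind]}" for name, kind in fields)
    return f"STRUCT<{body}>"


def json_hint(fields, nullable=True):
    """Fill in the JSON template with one entry per field."""
    return json.dumps({
        "type": "struct",
        "fields": [
            {"name": name, "type": kind, "nullable": nullable}
            for name, kind in fields
        ],
    })


print(ddl_hint(FIELDS))
print(sql_ddl_hint(FIELDS))
print(json_hint(FIELDS))
```

Note that the SQL DDL form closes the ``STRUCT<`` bracket with a final ``>``, and that the JSON form uses lower-case type names such as ``integer`` where the DDL forms use ``INT``.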
+ The following example shows how to generate the ``schemaHint`` value in each
+ format by using the Spark shell. The example schema contains a string-valued
+ field named ``"value"`` and an integer-valued field named ``"count"``.
+
+ .. code-block:: scala
+
+    import org.apache.spark.sql.types._
+
+    val mySchema = StructType(Seq(
+      StructField("value", StringType),
+      StructField("count", IntegerType)))
+
+    // Generate DDL format
+    mySchema.toDDL
+
+    // Generate SQL DDL format
+    mySchema.sql
+
+    // Generate Simple String DDL format
+    mySchema.simpleString
+
+    // Generate JSON format
+    mySchema.json
+
+ You can also specify the ``schemaHint`` option in the Simple String DDL format
+ or in JSON format by using PySpark, as shown in the following example:
+
+ .. code-block:: python
+
+    from pyspark.sql.types import StructType, StructField, StringType, IntegerType
+
+    mySchema = StructType([
+      StructField('value', StringType(), True),
+      StructField('count', IntegerType(), True)])
+
+    # Generate Simple String DDL format
+    mySchema.simpleString()
+
+    # Generate JSON format
+    mySchema.json()
+
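The examples above only generate the hint string. A minimal sketch of how that string might then be handed to a read is shown here. It assumes that ``schemaHint`` is passed like other read options through ``DataFrameReader.option()``, that the data source is registered under the ``mongodb`` format name, and that connection settings are already configured on the session. The function name ``read_with_schema_hint`` is hypothetical.

```python
def read_with_schema_hint(spark, schema_hint):
    """Load a DataFrame while passing a schema hint string to the reader.

    `spark` is an existing SparkSession. The "mongodb" format name and the
    use of .option() to carry `schemaHint` are assumptions of this sketch.
    """
    return (
        spark.read.format("mongodb")
        .option("schemaHint", schema_hint)
        .load()
    )


# Simple String DDL form of the example schema, in the shape that
# mySchema.simpleString() returns for the two example fields.
SIMPLE_STRING_HINT = "struct<value:string,count:int>"
```

With a live session, a call such as ``read_with_schema_hint(spark, SIMPLE_STRING_HINT)`` would request that the ``value`` and ``count`` fields be treated as a string and an integer during inference.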
Filters
-------