@@ -163,7 +175,7 @@ First fetch the HTML using python-requests and then feed the response body to ``
 
 Select syntaxes
 +++++++++++++++
-It is possible to select which syntaxes to extract by passing a list with the desired ones to extract. Valid values: 'microdata', 'json-ld', 'opengraph', 'microformat', 'rdfa'. If no list is passed all syntaxes will be extracted and returned::
+It is possible to select which syntaxes to extract by passing a list with the desired ones to extract. Valid values: 'microdata', 'json-ld', 'opengraph', 'microformat', 'rdfa' and 'dublincore'. If no list is passed all syntaxes will be extracted and returned::
 
 >>> r = requests.get('http://www.songkick.com/artists/236156-elysian-fields')
 >>> base_url = get_base_url(r.text, r.url)
@@ -207,9 +219,9 @@ It is possible to select which syntaxes to extract by passing a list with the de
 
 Uniform
 +++++++
-Another option is to uniform the output of microformat, opengraph, microdata and json-ld syntaxes to the following structure: ::
+Another option is to uniform the output of microformat, opengraph, microdata, dublincore and json-ld syntaxes to the following structure: ::
 
-{'@context': 'http://example.com',
+{'@context': 'http://example.com',
 '@type': 'example_type',
 /* All other the properties in keys here */
 }
@@ -584,6 +596,80 @@ Microformat extraction
 }
 }]
 
+DublinCore extraction
+++++++++++++++++++++++++++++++
+::
+
+>>> import pprint
+>>> pp = pprint.PrettyPrinter(indent=2)
+>>> from extruct.dublincore import DublinCoreExtractor
+>>> html = '''<head profile="http://dublincore.org/documents/dcq-html/">
+... <title>Expressing Dublin Core in HTML/XHTML meta and link elements</title>