-
Notifications
You must be signed in to change notification settings - Fork 881
Fix footnote ordering #1546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix footnote ordering #1546
Conversation
…heir references appear Fixes #1367
The test failures for the |
While I understand that it may be easier to implement this way, I am unconvinced that it is imposable to implement in the inline processor. My concern is that it may match things it shouldn't. For example, what if a footnote is included in a code span? Some tests might convince me. In fact, this needs some new tests added to demonstrate that it is working as advertised. |
Thank you for the quick feedback. The code span example is definitely a valid concern. I can add some tests to demonstrate the functionality + a footnote reference inside a code span. I'll think about how to implement this with the inline processor. The problem is not detecting footnote references in the inline processor, it's the timing, since the tree processor which renders the footnotes runs before that. |
- Add changelog entry - Fix minor formatting issue
I've added a bunch of tests, including footnote references within code spans. See in particular Concerning the block vs inline discussion: One consideration I would to add is that my implementation does not remove or modify any of the footnote references/labels during block processing. It just registers the ids of any footnote references it finds in the blocks. I have not touched the parts of the code that remove substrings or render html elements. I merely reorder the dict that holds the ids and footnote content just before the footnotes are rendered. |
Sorry that I didn't mention this previously, but all new tests should be under the appropriate subdir of |
- Move new tests to proper file (tests/test_syntax/extensions/test_footnotes.py) - Adapt new tests to use markdown.test_tools.TestCase and style convention in test_footnotes.py - Remove a few of the new tests which turned out to be redundant, given the existing ones in test_footnotes.py
Ah, I see. No worries, I've moved and adapted the new tests. |
Sorry, don't review or merge this just yet. I've come up with a test case that messes up the order (when a footnote reference id inside e.g. a code span is identical to a proper footnote reference id elsewhere). It is an edge case, but I'd rather get this corrected before review. I'll provide a fix early next week. |
I've gone down the rabbithole with this and can report a few things:
It is indeed possible (trivial, even) to implement tracking the order of footnote references in an Inlineprocessor. What is not possible, however, is to then generate the list of footnotes in the order we have tracked, because the list of footnotes is produced by a Treeprocessor. To my knowledge, that is simply how the architecture is designed; Treeprocessors always run before Inlineprocessors. So instead of trying to generate the footnotes div in the correct order, I have explored different ways of reordering it in a Postprocessor. They all lead to a fair amount of reparsing that seems counterintuitive and excessive at this stage in the pipeline. I've had some succes with a regex-based reordering method that works on the generated footnotes div. But it fails So. Based on my work thus far, I will suggest that for this PR we go back to 0c2e4bf and proceed with review. This entails ignoring the edge case that ce45459 tests. It is a pretty remote possibility, but even if it happens, it will not produce any errors. It will simply result in an illogical order of footnotes. This is, in my view, much preferable to the current behavior where the order of footnote definitions dictates the order of footnotes. |
Actually, Inlineprocessors are wrapped in and run from markdown.treeprocessors.InlineProcessor, which is the first treeprocessor by default (as listed here). Therefore, it is possible to run a Treeprocessors either before or after inline processors by setting the priority either higher or lower than |
- Remove parts of the previous implementation - Add FootnoteReorderingProcessor which reorders the footnotes if necessary - Move backlink title compatability trick to main class init method to avoid repetition
Ah, that makes a lot of sense. Thank you for explaining that. 5d7f8ca adds an implementation that uses a treeprocessor, which is the cleanest solution of the ones I've experimented with. It does fail one test, however: |
The tests should be cleaning up after themselves. I would probably need to understand more about what the differences are and why they are different. |
The specific input file that fails to parse correctly is tests/extensions/extra/raw-html.txt But let me see if I can show this in a simpler way. The problem occurs when we use the import markdown
MD = """
<div markdown="1">
Reference to [^first]
[^first]: First footnote definition
</div>
<div markdown="1">
Reference to [^second]
[^second]: Second footnote definition
</div>
"""
result = markdown.markdown(MD, extensions=["extra"], extension_configs={})
print(result) Prints this: <div>
<p>Reference to <sup id="fnref:first"><a class="footnote-ref" href="#fn:first">2</a></sup></p>
</div>
<div>
<p>Reference to <sup id="fnref:second"><a class="footnote-ref" href="#fn:second">1</a></sup></p>
</div>
<div class="footnote">
<hr />
<ol>
<li id="fn:second">
<p>Second footnote definition <a class="footnote-backref" href="#fnref:second" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:first">
<p>First footnote definition <a class="footnote-backref" href="#fnref:first" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div> Which should be this: <div>
<p>Reference to <sup id="fnref:first"><a class="footnote-ref" href="#fn:first">1</a></sup></p>
</div>
<div>
<p>Reference to <sup id="fnref:second"><a class="footnote-ref" href="#fn:second">2</a></sup></p>
</div>
<div class="footnote">
<hr />
<ol>
<li id="fn:first">
<p>First footnote definition <a class="footnote-backref" href="#fnref:first" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:second">
<p>Second footnote definition <a class="footnote-backref" href="#fnref:second" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div> Why? |
I will state that plugins can do almost anything, and can cause some content to be processed out of the expected order, so there may be other 3rd party plugins that could potentially manifest something similar. If it turns out we cannot guarantee order, I do wonder if all this effort is worth it. I do imagine it is possible to reasonably ensure order, but it may not be through the current approach. Just to be clear, I have not explored the current approach, so I don't know yet if what you are doing can simply be tweaked or not. I'll have to find some time to explore further. |
I will note that the failing test passes without any of the changes here. In other words, the original order is preserved. So, the changes here seem to be changing the order. |
With the changes here, this is what I am getting: >>> src = """
... <div markdown="1">
... Reference to [^first]
...
... [^first]: First footnote definition
... </div>
...
... <div markdown="1">
... Reference to [^second]
...
... [^second]: Second footnote definition
... </div>
... """
>>> md = markdown.Markdown(extensions=['footnotes', 'md_in_html'])
>>> print(md.convert(src))
<div>
<p>Reference to <sup id="fnref:first"><a class="footnote-ref" href="#fn:first">2</a></sup></p>
</div>
<div>
<p>Reference to <sup id="fnref:second"><a class="footnote-ref" href="#fn:second">1</a></sup></p>
</div>
<div class="footnote">
<hr />
<ol>
<li id="fn:second">
<p>Second footnote definition <a class="footnote-backref" href="#fnref:second" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:first">
<p>First footnote definition <a class="footnote-backref" href="#fnref:first" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div> That matches your result. But let's dig a little deeper... >>> md.inlinePatterns[2]
<markdown.extensions.footnotes.FootnoteInlineProcessor object at 0x000001A3C5160B00>
>>> md.inlinePatterns[2].footnotes.footnote_order
['second', 'first'] Here we can see that the diff --git a/markdown/extensions/footnotes.py b/markdown/extensions/footnotes.py
index 0d08dc7..7f29d06 100644
--- a/markdown/extensions/footnotes.py
+++ b/markdown/extensions/footnotes.py
@@ -163,6 +163,7 @@ class FootnoteExtension(Extension):
def addFootnoteRef(self, id: str) -> None:
""" Store a footnote reference id in order of appearance. """
if id not in self.footnote_order:
+ print(f'Saving footnote ID: {id}')
self.footnote_order.append(id)
def get_separator(self) -> str: and check the output... >>> markdown.markdown(src, extensions=['footnotes', 'md_in_html'])
Saving footnote ID: second
Saving footnote ID: first
'<div>\n<p>Reference to <sup id="fnref:first"><a class="footnote-ref" href="#fn:first">2</a></sup></p>\n</div>\n<div>\n<p>Reference to <sup id="fnref:second"><a class="footnote-ref" href="#fn:second">1</a></sup></p>\n</div>\n<div class="footnote">\n<hr />\n<ol>\n<li id="fn:second">\n<p>Second footnote definition <a class="footnote-backref" href="#fnref:second" title="Jump back to footnote 1 in the text">↩</a></p>\n</li>\n<li id="fn:first">\n<p>First footnote definition <a class="footnote-backref" href="#fnref:first" title="Jump back to footnote 2 in the text">↩</a></p>\n</li>\n</ol>\n</div>' Sure enough, the |
EDIT: I wrote this before reading your latest reply, @waylan .
Correct. I am fairly certain that is because the order is dictated by footnote definitions, which are detected during block processing. As opposed to my recent changes which detect the order of footnote references during inline processing. My most recent implementation assumes that inline matches are processed in document order, depth first. I suspect that is not the case in this particular test, due to Though I would also suspect that other plugins which rely upon inline matches to occur in document order would be susceptible to similar issues. |
I will also say that the test we are talking about passes in my earlier implementation 0c2e4bf. That implementation relies upon block processing to detect the order. I think we can safely disregard the edge case I mentioned afterwards and tested in 5d7f8ca. Worst case scenario is: A user uses the same footnote id in a code example as in a real footnote in the same document. That footnote may get a different place in the order of footnotes than expected, but it will do so silently and not produce an error. For me, that would be preferable to the current behavior, where improper order can be achieved much easier than in the edge case; one just has to follow the instructions in the documentation and put definitions "anywhere in the document", as stated in the PHP Markdown Extra docs. |
I have done some spelunking with the Python debugger and md_in_html is not processing anything out of order. It behaves exactly as you would think. In fact, after the blockparser has run, but before any treeprocessors have run, the tree contains the following structure: <div>
<p>Reference to [^first]</p>
</div>
<div>
<p>Reference to [^second]</p>
</div> The issue would seem to be with treeprocessors or inlineprocessors. What would be causing them to process things out of order I don't know. |
Oh, that's interesting. I may dig a little if/when I get some time. |
I'll await for now then, as I don't know much about the internals of the processors. Do let me know if I can do anything to assist. |
I resolved the issue with one minor change in 1c160ef. >>> src = '''
... * Reference to [^first]
...
... * Reference to [^second]
...
... [^first]: First footnote definition
... [^second]: Second footnote definition
... '''
>>> print(markdown.markdown(src, extensions=['footnotes']))
<ul>
<li>
<p>Reference to <sup id="fnref:first"><a class="footnote-ref" href="#fn:first">2</a></sup></p>
</li>
<li>
<p>Reference to <sup id="fnref:second"><a class="footnote-ref" href="#fn:second">1</a></sup></p>
</li>
</ul>
<div class="footnote">
<hr />
<ol>
<li id="fn:second">
<p>Second footnote definition <a class="footnote-backref" href="#fnref:second" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:first">
<p>First footnote definition <a class="footnote-backref" href="#fnref:first" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div> Note that that example uses a "loose list" (there is a blank line between the list items), which results in each list item being wrapped in a Fortunately, none of the tests seem to rely on this weird behavior, so I think it should be safe to move forward. Hopefully no third-party extensions are relying on this and break from the change. ¯\_(ツ)_/¯ |
I'm wondering if perhaps 1c160ef should be in a separate PR by itself and this PR should be rebased on top of that. At a minimum, we need to note this change in the changelog. @facelessuser any thoughts? |
I'm not sure I have any strong opinions on the change being separate. If we want to specifically track when this change occured, then yeah, maybe submit it as a separate change. This footnote change exposed potential issues with this, and seems like a fine place to fix it as well. I would definitely note the change in the changelog though, if there are concerns, it might cause issues. |
- Fix docstrings in footnotes extension - Modify style of new tests to use self.dedent() - Add an entry in the changelog concerning the change to ensure inline processing occurs in document order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thank you for all your work on this.
I'm happy to help. Thank you for a nice process. One small thing that keeps lingering at the back of my mind: I wonder if the change to |
Perhaps. I briefly thought about that as well, but what could we do to address any potential performance issue? Would using collections.deque for the stack make sense here? If you are so inclined, feel free to benchmark and report back. |
That's what I was thinking. I should be able to do some benchmarking in the next day or two. |
So I managed to do a bit of benchmarking. I benchmarked the processing of two markdown documents using
Let me know if you want to see the contents of these files. The I am no expert on how the inline processor works, but I think the reason we are not seeing big differences here is that inline nesting mostly does not go very deep. I believe this is what @waylan mentioned earlier. As I understand it, Datasample_doc.md, default pyperf timeit benchmarkv3.8.2 with no changes: Mean +- std dev: 50.6 ms +- 0.9 ms nested_doc.md, default pyperf timeit benchmarkHere I used v3.8.2 with no changes: Mean +- std dev: 156 ms +- 3 ms nested_doc.md, rigorous pyperf timeit benchmarkHere I used v3.8.2 with no changes: Mean +- std dev: 160 ms +- 9 ms TLDRThere does not appear to be a noteworthy performance impact from the It is possible to use |
- Don't register FootnoteReorderingProcessor if USE_DEFINITION_ORDER - Footnote reference numbering follows config - Test added - Config option added to docs with note on behavior change in v3.9.0 - Update changelog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! This looks good to me. Thanks for doing the work on this.
Footnotes are now displayed in the order they are referenced in the document text, not in the order they are defined.
Tracking footnote references is something the logically fits within inline processing. But it must happen before tree processing, because that is where the list of footnotes is rendered to html. For that reason, it is done within the block processor.
Fixes #1367