JavaScript: Add new query InvalidEntityTranscoding
#556
No particular rush to review this, it's not going into 1.19.
LGTM overall, just a clarification and maybe a comment.
 * A call to `String.prototype.replace` that replaces all instances of a pattern.
 */
class Replacement extends DataFlow::Node {
  RegExpLiteral pattern;
I take it string patterns here will always be reported by IncompleteSanitization, and that's why we don't include them here? Maybe worth leaving a comment about this.
Correct; I'll add a note.
LGTM too, but I see two features that could go in a later PR. How about escapes with backslashes and the escaping of the backslash itself?
That's a very interesting idea; let me play with that a little bit.
@xiemaisi - this LGTM. I only made a very minor suggestion, which you can reject if you think it's OTT. Just let me know and I'll approve instead of requesting changes. Thanks!
</p>

<p>
Instead, the decoding function should decode ampersand last:
Tiny suggestion:
Instead, the decoding function should decode THE ampersand last:
713a27d to 9fab6a9
I have significantly rewritten the query based on @esben-semmle's suggestion, generalising it beyond HTML transcoding to a few other kinds of (un-)escaping. I like the new results a lot, for instance this one from an old version of underscore. Full comparison is running.
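To illustrate the backslash case raised earlier in the thread, here is a minimal sketch of the same ordering mistake applied to backslash escaping. The function names `escapeBad` and `escapeGood` are hypothetical and exist only for this illustration; they are not taken from the query, its tests, or any library.

```javascript
// Hypothetical helpers for illustration only.

// Buggy: the quote is escaped first, so the backslash it introduces
// is escaped again by the second replacement.
function escapeBad(s) {
  return s.replace(/"/g, '\\"').replace(/\\/g, "\\\\");
}

// Better: escape the escape character itself first, then the quote.
function escapeGood(s) {
  return s.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

console.log(escapeBad('"'));  // prints \\"  (two backslashes, then the quote)
console.log(escapeGood('"')); // prints \"   (one backslash, then the quote)
```

In both the HTML and the backslash case, the character that introduces escape sequences has to be escaped before any replacement that produces it.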
The generalization LGTM, thank you for supporting the suggestion.
Results from full run look good as well: internal link.
cf0020f to 10166be
Rebased to resolve conflict on change notes. Perhaps @mc-semmle wants to take another look at the query help, which has been rewritten to reflect the expanded scope of the query (but no rush; it's not relevant for 1.19).
Documentation LGTM, I've approved the changes :)
Thanks, @mc-semmle! I've resolved the conflict on the change notes, so this should be good to go in once it's green again.
A simple, lightweight query that spots a common mistake people make when writing HTML entity encoders/decoders: when encoding, `&` has to be encoded first to avoid double-encoding ampersands introduced by the encoding of other characters; conversely, when decoding, it has to be decoded last to avoid the decoded ampersand being interpreted as part of an entity reference later on.

Finds good results on LGTM.com; here is the full report (internal link). One of the results is in a moderately popular (but not very actively maintained) utility-belt library.
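As a rough illustration of the ordering rule described above, the following sketch contrasts a buggy and a corrected encoder/decoder pair. The function names are hypothetical and only `&` and `<` are handled to keep it short; this is not code from the query's tests or from any flagged project.

```javascript
// Hypothetical helpers for illustration only.

// Buggy encoder: "<" is encoded first, so the "&" in the resulting
// "&lt;" is then encoded a second time.
function encodeBad(s) {
  return s.replace(/</g, "&lt;").replace(/&/g, "&amp;");
}

// Corrected encoder: "&" is encoded first.
function encodeGood(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Buggy decoder: "&amp;" is decoded first, so the freshly decoded "&"
// can combine with the following text into a new entity reference.
function decodeBad(s) {
  return s.replace(/&amp;/g, "&").replace(/&lt;/g, "<");
}

// Corrected decoder: "&amp;" is decoded last.
function decodeGood(s) {
  return s.replace(/&lt;/g, "<").replace(/&amp;/g, "&");
}

console.log(encodeBad("a < b"));     // a &amp;lt; b
console.log(encodeGood("a < b"));    // a &lt; b
console.log(decodeBad("&amp;lt;"));  // <
console.log(decodeGood("&amp;lt;")); // &lt;
```

The input `&amp;lt;` is the encoding of the literal text `&lt;`, so the corrected decoder returns that text while the buggy one decodes it twice.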
@esben-semmle suggested looking for a similar problem with URL encoding and `%`, but a quick exploratory query seems to indicate that people don't often implement URL transcoding by hand.