Skip to content

JavaScript: Add new query InvalidEntityTranscoding. #556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
2 commits merged into from
Dec 3, 2018

Conversation

xiemaisi
Copy link

A simple, lightweight query that spots a common mistake people make when writing HTML entity encoders/decoders: when encoding, & has to be encoded first to avoid double-encoding ampersands introduced by the encoding of other characters; conversely, when decoding it has to be decoded last to avoid the decoded ampersand being interpreted as part of an entity reference later on.

Finds good results on LGTM.com, here is the full report (internal link). One of the results is in a moderately popular (but not very actively maintained) utility-belt library.

@esben-semmle suggested looking for a similar problem with URL encoding and %, but a quick exploratory query seems to indicate that people don't often implement URL transcoding by hand.

@xiemaisi xiemaisi added the JS label Nov 28, 2018
@xiemaisi xiemaisi requested a review from a team as a code owner November 28, 2018 10:29
@xiemaisi
Copy link
Author

No particular rush to review this, it's not going into 1.19.

asger-semmle
asger-semmle previously approved these changes Nov 28, 2018
Copy link
Contributor

@asger-semmle asger-semmle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just a clarification and maybe a comment.

* A call to `String.prototype.replace` that replaces all instances of a pattern.
*/
class Replacement extends DataFlow::Node {
RegExpLiteral pattern;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it string patterns here will always be reported by IncompleteSanitization, and that's why we don't include them here? Maybe worth leaving a comment about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct; I'll add a note.

asger-semmle
asger-semmle previously approved these changes Nov 28, 2018
@xiemaisi xiemaisi requested a review from mchammer01 November 28, 2018 13:20
@xiemaisi
Copy link
Author

xiemaisi commented Nov 28, 2018

Ping @mc-semmle for doc review (but #555 #559 is more urgent).

@esbena
Copy link
Contributor

esbena commented Nov 28, 2018

LGTM too, but I see two features that could go in a later PR.

How about escapes with backslashes and the escaping of the backslash it self?

getPreviousReplacement seems relevant for:

https://github.com/Semmle/ql/blob/d31c9950f9a9dec68d06c2d7af78761520879871/javascript/ql/src/Security/CWE-116/IncompleteSanitization.ql#L88-L94

@xiemaisi
Copy link
Author

How about escapes with backslashes and the escaping of the backslash it self?

That's a very interesting idea; let me play with that a little bit.

Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiemaisi - this LGTM. I only made a very minor suggestion, which you can reject if you think it's OTT. Just let me know and I'll approve instead of requesting changes. Thanks!

</p>

<p>
Instead, the decoding function should decode ampersand last:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny suggestion:
Instead, the decoding function should decode THE ampersand last:

@xiemaisi xiemaisi force-pushed the js/invalid-entity-transcoding branch 2 times, most recently from 713a27d to 9fab6a9 Compare November 29, 2018 11:56
@xiemaisi
Copy link
Author

I have significantly rewritten the query based on @esben-semmle's suggestion, generalising it beyond HTML transcoding to a few other kinds of (un-)escaping.

I like the new results a lot, for instance this one from an old version of underscore. Full comparison is running.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generalization LGTM, thank you for supporting the suggestion.

@xiemaisi
Copy link
Author

Results from full run look good as well: internal link.

@xiemaisi xiemaisi force-pushed the js/invalid-entity-transcoding branch from cf0020f to 10166be Compare November 30, 2018 09:39
@xiemaisi
Copy link
Author

Rebased to resolve conflict on change notes.

Perhaps @mc-semmle wants to take another look at the query help, which has been rewritten to reflect the expanded scope of the query (but no rush; it's not relevant for 1.19).

mchammer01
mchammer01 previously approved these changes Nov 30, 2018
@mchammer01
Copy link
Contributor

Documentation LGTM, I've approved the changes :)

@xiemaisi
Copy link
Author

Thanks, @mc-semmle! I've resolved the conflict on the change notes, so this should be good to go in once it's green again.

@ghost ghost merged commit 2cc235d into github:master Dec 3, 2018
@xiemaisi xiemaisi deleted the js/invalid-entity-transcoding branch December 3, 2018 09:40
cklin pushed a commit that referenced this pull request May 23, 2022
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants