Improve output of tokens in Parse Errors #5722
You can use scripts/dev/bless_test.php to update tests. For column numbers, I expect that you will need #3948 as a base, so I'd suggest leaving that for a separate PR.
Ah, thanks. I was thinking there must be a way to automate that. (I also need to sort out a proper VM to run tests under, because WSL confuses them.)
Yes, I have an initial prototype working with the default Bison location struct (first_line, first_column, last_line, last_column), and was indeed planning to open a separate PR once I get it working properly. I see that patch was reverted on performance grounds, which I had wondered about; I'll discuss some further thoughts once I raise the PR.
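For reference, a minimal sketch of the four fields mentioned above and of how a span over two locations can be computed. The names `location` and `location_span` are hypothetical and chosen for illustration; this is not the prototype's code, only the general shape of Bison's default location type.

```c
/* Same four fields as Bison's default location type. */
typedef struct location {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} location;

/* Build a location that runs from the start of `first` to the end of `last`.
 * Hypothetical helper, for illustration only. */
static location location_span(location first, location last)
{
    location span;
    span.first_line = first.first_line;
    span.first_column = first.first_column;
    span.last_line = last.last_line;
    span.last_column = last.last_column;
    return span;
}
```

A rule made of several tokens would typically take its start from the first component and its end from the last one, which is all the helper does.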
Instead of |
The bulk of the error message is generated by Bison; we currently only override a function that formats the individual tokens, which are then substituted into a printf-style template. For the same reason, it's tricky to capitalise the "s" of "syntax error". We could potentially abuse the i18n facility by providing our own
Looks good, just some style nits.
Currently, unexpected tokens in the parser are shown as the text found, plus the internal token name, including the notorious "unexpected '::' (T_PAAMAYIM_NEKUDOTAYIM)".

This commit replaces that with a more user-friendly format, with two main types of token:

* Tokens which always represent the same text are shown like 'unexpected token "::"' and 'expected "::"'
* Tokens which have variable text are given a user-friendly name, and show like 'unexpected identifier "foo"', and 'expected identifier'.

A few tokens have special cases:

* unexpected token """ -> unexpected double-quote mark
* unexpected quoted string "'foo'" -> unexpected single-quoted string "foo"
* unexpected quoted string ""foo"" -> unexpected double-quoted string "foo"
* unexpected illegal character "_" -> unexpected character 0xNN (where _ is almost certainly a control character, and NN is the hexadecimal value of the byte)

The \ token has a special case in the implementation just to stop bison making a mess of escaping it and it coming out as \\
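The rules in this commit message can be sketched roughly as follows. This is an illustration only, not the code in the commit: the `token_kind` enum, the `describe_unexpected` function and its parameters are hypothetical names, and the upper-case hex digits in the last case are an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token categories, for illustration only. */
enum token_kind {
    TOKEN_FIXED,     /* always the same text, e.g. "::" */
    TOKEN_VARIABLE,  /* variable text with a friendly name, e.g. identifier */
    TOKEN_QUOTED,    /* quoted string, source text includes its quotes */
    TOKEN_BAD_CHAR   /* illegal character, reported by byte value */
};

/* Write the "unexpected ..." fragment for one token into buf.
 * label is the friendly name (only used for TOKEN_VARIABLE);
 * text is the source text that was found. */
static void describe_unexpected(char *buf, size_t size, enum token_kind kind,
                                const char *label, const char *text)
{
    size_t len = strlen(text);

    switch (kind) {
    case TOKEN_FIXED:
        if (strcmp(text, "\"") == 0) {
            /* A lone double quote would otherwise print as """. */
            snprintf(buf, size, "unexpected double-quote mark");
        } else {
            snprintf(buf, size, "unexpected token \"%s\"", text);
        }
        break;
    case TOKEN_VARIABLE:
        snprintf(buf, size, "unexpected %s \"%s\"", label, text);
        break;
    case TOKEN_QUOTED:
        if (len >= 2 && (text[0] == '\'' || text[0] == '"')
                && text[len - 1] == text[0]) {
            /* Strip the source quotes so the text is not quoted twice. */
            snprintf(buf, size, "unexpected %s-quoted string \"%.*s\"",
                     text[0] == '\'' ? "single" : "double",
                     (int)(len - 2), text + 1);
        } else {
            snprintf(buf, size, "unexpected quoted string \"%s\"", text);
        }
        break;
    case TOKEN_BAD_CHAR:
        /* Hex digit case is an assumption; the message only says 0xNN. */
        snprintf(buf, size, "unexpected character 0x%02X",
                 (unsigned int)(unsigned char)text[0]);
        break;
    }
}
```

Calling `describe_unexpected(buf, sizeof buf, TOKEN_VARIABLE, "identifier", "foo")` fills `buf` with `unexpected identifier "foo"`; in the real parser the equivalent fragment is then substituted into Bison's message template.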
Thanks for the review, @nikic. Tidied, rebased to master, and with updated examples for the last string change here: https://rwec.co.uk/x/php-parse-errors/comparison.html
@@ -8,4 +8,4 @@ class A {}
 
 ?>
 --EXPECTF--
-Parse error: syntax error, unexpected '$x' (T_VARIABLE), expecting identifier (T_STRING) or static (T_STATIC) or namespace (T_NAMESPACE) or \\ (T_NS_SEPARATOR) in %s on line %d
+Parse error: syntax error, unexpected variable "$x", expecting identifier or "static" or "namespace" or "\" in %s on line %d
Could the "syntax error" start with a capital letter? If it's not too difficult to achieve then I'd love it, since a lot of messages have been capitalized for PHP 8. If this change would require a non-negligible amount of work, please ignore this request :)
@kocsismate I looked at that, and unfortunately that string comes from a different function somewhere deep in Bison's generated code. :(
The best approach I can think of is to copy it to a new string and fix the capital letter somewhere in the process of creating the Error object.
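A rough sketch of that copy-and-capitalise idea, assuming the message arrives as a plain NUL-terminated C string. The function name `capitalise_copy` is hypothetical and this is not code from the PR.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of msg with its first character upper-cased.
 * The caller frees the result. Returns NULL if allocation fails.
 * Hypothetical helper, for illustration only. */
static char *capitalise_copy(const char *msg)
{
    size_t len = strlen(msg);
    char *copy = malloc(len + 1);

    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, msg, len + 1); /* includes the terminating NUL */
    if (len > 0) {
        copy[0] = (char)toupper((unsigned char)copy[0]);
    }
    return copy;
}
```

The cost is an extra allocation and copy on every parse error, purely to change one letter, which is the kind of quirk being weighed here.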
If it requires quirks like that then I think we can also live with a lowercase letter :) Thanks for checking it!
Nice work, I really like the final outcome! Merged as 55a15f3.
Currently, unexpected tokens in the parser are shown as the text found, plus the internal token name, including the notorious "unexpected '::' (T_PAAMAYIM_NEKUDOTAYIM)".
This PR replaces that with a more user-friendly format, with two types of token:

* Tokens which always represent the same text are shown like unexpected token "::" and expected "::"
* Tokens which have variable text are given a user-friendly name, and show like unexpected identifier "foo", and expected identifier.

As a special case, quoted strings are not quoted an extra time, so show like unexpected quoted string "foo" rather than unexpected quoted string ""foo"" or unexpected quoted string "'foo'".

Examples

For a larger selection of examples, see https://rwec.co.uk/x/php-parse-errors/comparison.html
TODO