Skip to content

Improve output of tokens in Parse Errors #5722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

IMSoP
Copy link
Contributor

@IMSoP IMSoP commented Jun 15, 2020

Currently, unexpected tokens in the parser are shown as the text found, plus the internal token name, including the notorious "unexpected '::' (T_PAAMAYIM_NEKUDOTAYIM)".

This PR replaces that with a more user-friendly format, with two types of token:

  • Tokens which always represent the same text are shown like unexpected token "::" and expected "::"
  • Tokens which have variable text are given a user-friendly name, and show like unexpected identifier "foo", and expected identifier.

As a special-case, quoted strings are not quoted an extra time, so show like unexpected quoted string "foo" rather than unexpected quoted string ""foo"" or unexpected quoted string "'foo'".

Examples

Parse error: syntax error, unexpected token "<<", expecting identifier or "static" or "namespace" or "\" in Command line code on line 1
Parse error: syntax error, unexpected token "::", expecting end of file in Command line code on line 1
Parse error: syntax error, unexpected identifier "foo", expecting "," or ";" in Command line code on line 1
Parse error: syntax error, unexpected quoted string "foo" in Command line code on line 1

For a larger selection of examples, see https://rwec.co.uk/x/php-parse-errors/comparison.html

TODO

  • Test for edge cases, such as zero-length tokens, backslashes.
  • Update all the phpt tests which expect a particular error output.
  • Put together some before and after examples for discussion / RFC.

@IMSoP IMSoP changed the title DRAFT: Improve output of tokens in Parse Errors Improve output of tokens in Parse Errors Jun 15, 2020
@nikic
Copy link
Member

nikic commented Jun 16, 2020

Update all the phpt tests which expect a particular error output; if I can get column numbers working, I will update the tests for both patches in one go.

You can use scripts/dev/bless_test.php to update tests. For column numbers, I expect that you will need #3948 as a base, so I'd suggest leaving that for a separate PR.

@IMSoP
Copy link
Contributor Author

IMSoP commented Jun 16, 2020

You can use scripts/dev/bless_test.php to update tests.

Ah, thanks. I was thinking there must be a way to automate that. (I also need to sort out a proper VM to run tests under, because WSL confuses them.)

For column numbers, I expect that you will need #3948 as a base, so I'd suggest leaving that for a separate PR.

Yes, I have an initial prototype working with the default Bison location struct (first_line, first_column, last_line, last_column), and was indeed planning to open a separate PR once I get it working properly. I see that patch was reverted on performance grounds, which I had wondered about; I'll discuss some further thoughts once I raise the PR.

@IMSoP IMSoP force-pushed the better-parse-error-tokens branch 2 times, most recently from 4f64bc2 to ea696b4 Compare June 25, 2020 21:57
@IMSoP
Copy link
Contributor Author

IMSoP commented Jun 25, 2020

  • Rebased to current master
  • Fixed handling of zero-length tokens
  • Blessed most tests to reflect the new format (there's a couple I want to look at more closely as they may be real bugs)

@KalleZ
Copy link
Member

KalleZ commented Jun 27, 2020

@hikari-no-yume

@hikari-no-yume
Copy link
Contributor

Instead of 'foo' or 'bar' or 'baz', would it be possible to do 'foo', 'bar', or 'baz'?

@IMSoP
Copy link
Contributor Author

IMSoP commented Jun 27, 2020

Instead of 'foo' or 'bar' or 'baz', would it be possible to do 'foo', 'bar', or 'baz'?

The bulk of the error message is generated by Bison; we currently only over-ride a function that formats the individual tokens, which are then substituted into a printf-style template. For the same reason, it's tricky to capitalise the "s" of "syntax error".

We could potentially abuse the i18n facility by providing our own YY_ macro to re-map the message templates, but I'm not sure the improvement is worth the complexity, and it might not even work correctly on all tool versions, as it's explicitly marked internal.

@IMSoP IMSoP force-pushed the better-parse-error-tokens branch 3 times, most recently from 3e426bd to 022e6cd Compare June 28, 2020 15:09
@IMSoP IMSoP marked this pull request as ready for review June 28, 2020 17:01
@IMSoP IMSoP force-pushed the better-parse-error-tokens branch 2 times, most recently from da71157 to d9c1c90 Compare July 5, 2020 16:01
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some style nits.

@IMSoP IMSoP force-pushed the better-parse-error-tokens branch 3 times, most recently from 239bde7 to cea470c Compare July 12, 2020 18:32
IMSoP added 2 commits July 12, 2020 18:36
Currently, unexpected tokens in the parser are shown as the text
found, plus the internal token name, including the notorious
"unexpected '::' (T_PAAMAYIM_NEKUDOTAYIM)".

This commit replaces that with a more user-friendly format, with
two main types of token:

* Tokens which always represent the same text are shown like
  'unexpected token "::"' and 'expected "::"'
* Tokens which have variable text are given a user-friendly
  name, and show like 'unexpected identifier "foo"', and
  'expected identifer'.

A few tokens have special cases:

* unexpected token """ -> unexpected double-quote mark
* unexpected quoted string "'foo'" -> unexpected single-quoted
  string "foo"
* unexpected quoted string ""foo"" -> unexpected double-quoted
  string "foo"
* unexpected illegal character "_" -> unexpected character 0xNN
  (where _ is almost certainly a control character, and NN is the
   hexadecimal value of the byte)

The \ token has a special case in the implementation just to stop
bison making a mess of escaping it and it coming out as \\
@IMSoP IMSoP force-pushed the better-parse-error-tokens branch from cea470c to b7fa69d Compare July 12, 2020 18:36
@IMSoP
Copy link
Contributor Author

IMSoP commented Jul 12, 2020

Thanks for the review @nikic

Tidied, rebased to master, and with updated examples for the last string change here: https://rwec.co.uk/x/php-parse-errors/comparison.html

@@ -8,4 +8,4 @@ class A {}

?>
--EXPECTF--
Parse error: syntax error, unexpected '$x' (T_VARIABLE), expecting identifier (T_STRING) or static (T_STATIC) or namespace (T_NAMESPACE) or \\ (T_NS_SEPARATOR) in %s on line %d
Parse error: syntax error, unexpected variable "$x", expecting identifier or "static" or "namespace" or "\" in %s on line %d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the syntax error start with a capital letter? If it's not too difficult to achieve then I'd love it, since a lot of messages have been capitalized for PHP 8. If this change would require a non-negligible amount of work, please ignore this request :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kocsismate I looked at that, and unfortunately that string comes from a different function somewhere deep in Bison's generated code. :(

The best approach I can think of is to copy it to a new string and fix the capital letter somewhere in the process of creating the Error object.

Copy link
Member

@kocsismate kocsismate Jul 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it requires quirks like that then I think we can also live with a lowercase letter :) Thanks for checking it!

@nikic
Copy link
Member

nikic commented Jul 13, 2020

Nice work, I really like the final outcome! Merged as 55a15f3.

@nikic nikic closed this Jul 13, 2020
@carusogabriel carusogabriel added this to the PHP 8.0 milestone Jul 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants