-
Notifications
You must be signed in to change notification settings - Fork 0
Optimize the start position of bounded look-behinds #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: captureless-lookbehinds
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the starting position calculation for bounded look-behinds in the regex automata engine. Key changes include:
- Replacing the flat list of look-behind start states with a tree structure to capture nesting.
- Constructing and sorting a vector of look-behind tuples in PikeVM to ensure correct evaluation order.
- Updating the compiler and builder components to handle the new look-behind tree structure.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
regex-automata/src/nfa/thompson/pikevm.rs | Updated look-behind evaluation logic and added sorting of look-behind states. |
regex-automata/src/nfa/thompson/nfa.rs | Refactored look-behind representation from a vector of state IDs to a tree structure. |
regex-automata/src/nfa/thompson/compiler.rs | Modified look-around compilation to incorporate look-behind offset and nesting. |
regex-automata/src/nfa/thompson/builder.rs | Updated builder API to construct the look-behind tree according to nesting paths. |
Comments suppressed due to low confidence (3)
regex-automata/src/nfa/thompson/nfa.rs:1579
- [nitpick] The condition using preorder in the Debug implementation is not immediately clear. Adding an inline comment to clarify its intent would help future maintainers understand the logic.
.iter().any(|i| !i.preorder(&|e| e.start_id() != sid))
regex-automata/src/nfa/thompson/compiler.rs:1060
- [nitpick] Clarify in a comment the assumption behind combining relative_start and maximum_len to calculate start_offset, ensuring that the resulting value correctly bounds the look-around start position.
let start_offset = match (relative_start, maximum_len) {
regex-automata/src/nfa/thompson/builder.rs:722
- [nitpick] The new start_lookbehind API relies on a nesting path. Adding a brief example or more detailed inline comment about the structure and expected format of 'path' would improve clarity.
pub fn start_lookbehind(&mut self, start_id: StateID, offset_from_start: Option<usize>, path: &[usize]) {
*self.lookaround_index.borrow_mut() = SmallIndex::ZERO; | ||
*self.lookbehind_nesting_path.borrow_mut() = vec![0]; | ||
*self.current_lookbehind_offset_from_start.borrow_mut() = Some(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can/should be moved to the Builder
?
a4b50d3
to
4740ceb
Compare
No description provided.