Skip to content

Optimize the start position of bounded look-behinds #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: captureless-lookbehinds
Choose a base branch
from

Conversation

shilangyu
Copy link
Collaborator

No description provided.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the starting position calculation for bounded look-behinds in the regex automata engine. Key changes include:

  • Replacing the flat list of look-behind start states with a tree structure to capture nesting.
  • Constructing and sorting a vector of look-behind tuples in PikeVM to ensure correct evaluation order.
  • Updating the compiler and builder components to handle the new look-behind tree structure.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
regex-automata/src/nfa/thompson/pikevm.rs Updated look-behind evaluation logic and added sorting of look-behind states.
regex-automata/src/nfa/thompson/nfa.rs Refactored look-behind representation from a vector of state IDs to a tree structure.
regex-automata/src/nfa/thompson/compiler.rs Modified look-around compilation to incorporate look-behind offset and nesting.
regex-automata/src/nfa/thompson/builder.rs Updated builder API to construct the look-behind tree according to nesting paths.
Comments suppressed due to low confidence (3)

regex-automata/src/nfa/thompson/nfa.rs:1579

  • [nitpick] The condition using preorder in the Debug implementation is not immediately clear. Adding an inline comment to clarify its intent would help future maintainers understand the logic.
                .iter().any(|i| !i.preorder(&|e| e.start_id() != sid))

regex-automata/src/nfa/thompson/compiler.rs:1060

  • [nitpick] Clarify in a comment the assumption behind combining relative_start and maximum_len to calculate start_offset, ensuring that the resulting value correctly bounds the look-around start position.
let start_offset = match (relative_start, maximum_len) {

regex-automata/src/nfa/thompson/builder.rs:722

  • [nitpick] The new start_lookbehind API relies on a nesting path. Adding a brief example or more detailed inline comment about the structure and expected format of 'path' would improve clarity.
pub fn start_lookbehind(&mut self, start_id: StateID, offset_from_start: Option<usize>, path: &[usize]) {

Comment on lines 977 to 979
*self.lookaround_index.borrow_mut() = SmallIndex::ZERO;
*self.lookbehind_nesting_path.borrow_mut() = vec![0];
*self.current_lookbehind_offset_from_start.borrow_mut() = Some(0);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should be moved to the Builder?

@shilangyu shilangyu force-pushed the feat/pikevm_bounded_lb branch from a4b50d3 to 4740ceb Compare June 27, 2025 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant