updating step_times for e2e tests to avoid false positives using recent data #346

Merged
merged 7 commits into from
Jul 25, 2025

Conversation

vlasenkoalexey
Collaborator

  • fixing update_step_time tool
  • expanding confidence level to 99%
  • updating step time bounds to eliminate false positives
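
To illustrate how bounds like these could be derived, here is a minimal sketch. It is not the repository's actual update_step_time tool. It assumes step times are roughly normally distributed and that the bounds are the sample mean plus or minus z standard deviations at the chosen confidence level; the names `step_time_bounds` and `within_bounds` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def step_time_bounds(step_times, confidence=0.99):
    """Return (lower, upper) bounds covering `confidence` of a fitted normal.

    Assumption: step times are approximately normal. This is an
    illustrative sketch, not the project's real bound computation.
    """
    if len(step_times) < 2:
        raise ValueError("need at least two step-time samples")
    # Two-sided z-score, e.g. about 2.576 for 99% and about 1.960 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    center = mean(step_times)
    half_width = z * stdev(step_times)
    return center - half_width, center + half_width


def within_bounds(step_time, bounds):
    """True when a measured step time falls inside the accepted range."""
    lower, upper = bounds
    return lower <= step_time <= upper


# Hypothetical step times (seconds) from recent healthy runs.
recent = [2.0, 2.1, 1.9, 2.05, 1.95]
bounds_99 = step_time_bounds(recent, confidence=0.99)
bounds_95 = step_time_bounds(recent, confidence=0.95)
```

Under this model, raising the confidence level widens the accepted range, which reduces false positives at the cost of missing smaller regressions. Recomputing from recent healthy runs instead moves the center of the range without widening it.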

Collaborator

@jialei777 jialei777 left a comment

How did you get those numbers? It seems your "step_time" values come from both old runs and new runs that include the regression. Is this the right thing to do?

@vlasenkoalexey vlasenkoalexey requested a review from jialei777 July 23, 2025 18:27
Collaborator

@jialei777 jialei777 left a comment

Please change the PR title, as it is no longer "expanding confidence level"; it just recomputes the bounds with the latest data.

@vlasenkoalexey vlasenkoalexey merged commit db2d731 into main Jul 25, 2025
16 checks passed
@vlasenkoalexey vlasenkoalexey deleted the alekseyv_llama4_training2 branch July 25, 2025 21:27
@vlasenkoalexey vlasenkoalexey changed the title expanding confidence level on e2e tests to eliminate false positives updating step_times for e2e tests to avoid false positives Jul 25, 2025
@vlasenkoalexey vlasenkoalexey changed the title updating step_times for e2e tests to avoid false positives updating step_times for e2e tests to avoid false positives using recent data Jul 25, 2025