
Split out numpy and numpy-tests, and update to NumPy v2.3.1 #86

Merged (34 commits) on Jun 26, 2025

Conversation

agriyakhetarpal
Member

@agriyakhetarpal agriyakhetarpal commented May 3, 2025

This PR splits numpy into two packages using Meson's install tags feature: one that installs the runtime,python-runtime,devel tags and another that installs the tests tag. This is an easier way to unvendor the tests from NumPy's wheels. Note that the package is still built in full; the relevant files are just not installed into the wheel.
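For readers unfamiliar with install tags, here is a minimal sketch of how the two tag sets could be passed to a meson-python build. It assumes the `python -m build` front end and meson-python's `install-args` config setting; the helper name `install_args_for` is hypothetical, and the way the recipes in this PR actually pass the tags is not shown in this conversation.

```python
def install_args_for(tags):
    """Return a meson-python config setting that limits installation to `tags`."""
    return "-Cinstall-args=--tags=" + ",".join(tags)


# Tag sets named in the PR description.
RUNTIME_TAGS = ["runtime", "python-runtime", "devel"]
TESTS_TAGS = ["tests"]

# Command lines are only assembled here, not executed.
runtime_cmd = ["python", "-m", "build", "--wheel", install_args_for(RUNTIME_TAGS)]
tests_cmd = ["python", "-m", "build", "--wheel", install_args_for(TESTS_TAGS)]

print(" ".join(runtime_cmd))
print(" ".join(tests_cmd))
```

Both commands compile the same targets; only the set of files copied into the wheel differs.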

I've also updated to NumPy version 2.3.1, which was recently released, and dropped the associated patch from numpy/numpy#28936 as it is no longer needed.

@agriyakhetarpal
Member Author

AttributeError: module 'numpy' has no attribute 'zeros' is rather a strange problem – does this suggest that NumPy's test unvendoring is broken?

@agriyakhetarpal
Member Author

The log still reports [34/36] (thread 4) built numpy-tests in 1m 24s, but that's much less than the 3.5 minutes required to build NumPy alone. So either this is working well and we are good to go, or it isn't and the measurement is misleading: NumPy is built alongside other packages, which slows it down, so it might also have taken 1m 24s if no other packages were being built.

@ryanking13
Member

I checked the build log of numpy-tests, and it looks like it still builds all of the source code in numpy (you can download the log from the GHA artifacts). Is this the intended behavior?

numpy-tests.log

@agriyakhetarpal
Member Author

Yes, this is intended behaviour. The install tags only affect what files are installed, i.e., copied into the final wheels. In this case, these are the test-specific extension modules and Python files, but all files are still built.

I checked the artifacts and it is working as expected: the numpy wheel is 3.1 MiB, and the numpy-tests wheel is 1.6 MiB, the total of which matches the regular wheel build size.
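Matching sizes are a good sign; a stricter check is to compare file lists. The sketch below assumes you have an unsplit wheel to compare against, and the function names are hypothetical. It reports files shipped by both halves and files shipped by neither.

```python
import zipfile


def wheel_members(path):
    """Return the set of file names stored in a wheel (a zip archive)."""
    with zipfile.ZipFile(path) as wheel:
        return {name for name in wheel.namelist() if not name.endswith("/")}


def check_split(full_wheel, runtime_wheel, tests_wheel):
    """Compare a split pair of wheels against an unsplit wheel.

    Metadata under *.dist-info is ignored, because each wheel carries its own.
    """

    def payload(path):
        return {n for n in wheel_members(path) if ".dist-info/" not in n}

    full, runtime, tests = map(payload, (full_wheel, runtime_wheel, tests_wheel))
    return {
        "overlap": sorted(runtime & tests),           # files shipped twice
        "missing": sorted(full - (runtime | tests)),  # files in neither half
    }
```

An empty "overlap" and an empty "missing" list would mean the two wheels partition the original payload exactly.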

Now I have to figure out why it doesn't work… because if/when it does, we have previously seen that we can compile from a shared directory in around 8 seconds, and we can take advantage of that.

@agriyakhetarpal
Member Author

Ah, okay, I got why it doesn't work by checking the logs. The build dir for numpy-tests is:

Build dir: /home/runner/work/pyodide-recipes/pyodide-recipes/packages/numpy-tests/build/numpy-tests-2.2.5/build

when it should be:

Build dir: /home/runner/work/pyodide-recipes/pyodide-recipes/packages/numpy/build/numpy-2.2.5/build

We need to make the path: field in

source:
  path: ../numpy/build/numpy-2.2.5/

work to do so. The documentation for source/path says that relative paths are already supported, so maybe it's just that I didn't add this correctly?

@ryanking13
Member

ryanking13 commented May 4, 2025

work to do so. The documentation for source/path says that relative paths are already supported, so maybe it's just that I didn't add this correctly?

No, I think it is how pyodide-build works (for now). When one sets source/path, we copy it to the build directory first to avoid polluting the source directory (code pointer).

So what happens now is that the numpy/build/numpy-2.2.5/ directory is copied to numpy-tests/build/numpy-tests-2.2.5/, and the build happens in that directory.

Now I have to figure out why it doesn't work… because if/when it does, we have previously seen that we can compile from a shared directory in around 8 seconds, and we can take advantage of that.

One possibility I can think of is that the mtimes of the files were modified when they were copied, so Meson thinks it needs to recompile all the files, which causes all the compilation to happen again. Or, if recompilation is the expected behavior, it could be that the paths of the files have changed, causing a cache miss.
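The mtime hypothesis can be tested in isolation. The sketch below copies a one-file tree and compares timestamps; `copied_mtime` is a hypothetical helper, and the result only describes `shutil.copytree` itself, not the full pyodide-build pipeline.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copied_mtime(copy_function):
    """Copy a one-file tree with `copy_function`; return (source mtime, copy mtime)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        source_file = src / "module.c"
        source_file.write_text("int x;\n")
        # Backdate the source file so that a fresh timestamp is easy to spot.
        os.utime(source_file, (1_000_000_000, 1_000_000_000))

        dst = Path(tmp) / "dst"
        shutil.copytree(src, dst, copy_function=copy_function)
        return (
            int(source_file.stat().st_mtime),
            int((dst / "module.c").stat().st_mtime),
        )


print(copied_mtime(shutil.copy2))  # copy2 is copytree's default copy function
print(copied_mtime(shutil.copy))   # copy does not carry timestamps over
```

Since `copytree` defaults to `copy2`, which tries to preserve file timestamps, a plain `copytree` call would probably leave file mtimes intact. That makes the changed-path explanation look more likely than the mtime one, though this sketch alone does not prove it.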

@ryanking13
Member

I know that this is in contrast to #74

BTW, this is not in contrast to #74; my intention for #74 was to remove all internal test packages, such as cpp-exception-test or fpcast-test, not the unvendored tests of other packages.

@agriyakhetarpal
Member Author

agriyakhetarpal commented May 4, 2025

we copy it to the build directory first to avoid polluting the source directory (code pointer).

I see, that makes sense. Considering that we've been advertising/using this for local testing and debugging only, would you be willing to change this behaviour? That is, we could modify source/path so that we don't copy anything to the /build/ directory, and rather assume that this field will be used to represent the location of an already extracted archive (which is why we don't need the checksum for verification either).

I think this approach would be easier to manage than comparing mtimes.

However, one issue will still exist which we haven't sorted out: numpy-tests is separately installable and shouldn't need to depend on NumPy at either build-time or runtime, but it does here because there's no other way to ensure that NumPy is already built. In our case, we don't have the flexibility to change this. One hacky way could be to build all recipes first, and build numpy-tests in a separate pyodide build-recipes command.

@ryanking13
Member

Considering that we've been advertising/using this for local testing and debugging only, would you be willing to change this behaviour? That is, we could modify source/path so that we don't copy anything to the /build/ directory, and rather assume that this field will be used to represent the location of an already extracted archive (which is why we don't need the checksum for verification either).

Yeah, I think you can try to see if it works. I guess changing the code in _prepare_source to something like

        srcdir = self.source_metadata.path.resolve()
+        self.src_extract_dir = srcdir
-        if not srcdir.is_dir():
-            raise ValueError(f"path={srcdir} must point to a directory that exists")
-
-        def ignore(path: str, names: list[str]) -> list[str]:
-            ignored: list[str] = []
-
-            if fnmatch.fnmatch(path, "*/dist"):
-                # Do not copy dist/*.whl files from a dirty source tree;
-                # this can lead to "Exception: Unexpected number of wheels" later.
-                ignored.extend(name for name in names if name.endswith(".whl"))
-            return ignored
-
-        shutil.copytree(srcdir, self.src_extract_dir, ignore=ignore)

would do the job, but not 100% sure.
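To make the two behaviours easier to compare, here is a self-contained sketch of both. `prepare_source` and `ignore_stale_wheels` are hypothetical stand-ins, not pyodide-build API, and unlike the diff above this version keeps the directory check in both modes.

```python
import fnmatch
import shutil
from pathlib import Path


def ignore_stale_wheels(path, names):
    """copytree `ignore` callback: skip *.whl files inside any dist/ directory."""
    if fnmatch.fnmatch(path, "*/dist"):
        return [name for name in names if name.endswith(".whl")]
    return []


def prepare_source(srcdir, extract_dir, in_place):
    """Return the directory to build in (hypothetical helper)."""
    srcdir = Path(srcdir).resolve()
    if not srcdir.is_dir():
        raise ValueError(f"path={srcdir} must point to a directory that exists")
    if in_place:
        # Proposed behaviour: build directly in the existing tree.
        return srcdir
    # Current behaviour: build in a private copy of the tree.
    shutil.copytree(srcdir, extract_dir, ignore=ignore_stale_wheels)
    return Path(extract_dir)
```

The in-place mode is what would let numpy-tests reuse numpy's already-compiled build directory, at the cost of the source tree no longer staying pristine.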

Reusing the src directory as the build directory will make cleaning up the build directory or rebuilding a little more complex, but I don't think it is a big problem.

there's no other way to ensure that NumPy is already built.

I think the current approach (setting numpy as a host dependency) is sufficient, at least for now. It's a very special case for two recipes to share source code like this, and I don't want to add too much complex behavior to handle this.

@agriyakhetarpal agriyakhetarpal changed the title from "numpy and numpy-tests debugging (do not merge)" to "numpy and numpy-tests debugging (do not merge) [full build]" on May 20, 2025
@agriyakhetarpal
Member Author

Sorry for getting back to this a little late, and thanks for this patch! Yes, it worked perfectly for the build case, barring one minor problem: the numpy-tests wheel gets placed in two locations, packages/numpy-tests/dist/ (which is where we want it to be) and also packages/numpy-tests/build/numpy-tests-2.2.5/dist/. I assume that the wheel is copied into the former directory from the latter. This is quite easily overridable using the --install-dir option, which we are already setting.

However, I do have reservations about whether we need such a patch at all (especially for such a special case). I've pushed a few commits, please feel free to take a look!

@agriyakhetarpal
Member Author

agriyakhetarpal commented May 21, 2025

Okay, so apparently pyodide build-recipes "!numpy-tests" is skipping numpy as well here, when it should only skip the package whose name matches exactly. This looks like a bug in how we parse the package queries in the graph builder. I'll take a look at it.

I resolved this by using '*,!numpy-tests' instead, which seems to work well.
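The expected semantics of such a query can be illustrated with a toy selector. This is a simplified sketch written for this discussion, not pyodide-build's actual parser: `select_packages` is a hypothetical function, and it ignores tags and dependency resolution.

```python
def select_packages(query, available):
    """Resolve a comma-separated query against `available` package names.

    '*' selects everything, 'name' selects one package, and '!name' removes
    exactly that package. Simplified: no tags, no dependency handling.
    """
    selected, excluded = set(), set()
    for term in (t.strip() for t in query.split(",")):
        if not term:
            continue
        if term == "*":
            selected.update(available)
        elif term.startswith("!"):
            excluded.add(term[1:])  # exact name only, never a prefix match
        else:
            selected.add(term)
    return sorted(selected - excluded)


packages = ["numpy", "numpy-tests", "pytz"]
print(select_packages("*,!numpy-tests", packages))
```

In this sketch an exclusion on its own selects nothing, which is one reason to pair it with '*'. Whether the real parser behaves the same way is not established here.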

@agriyakhetarpal agriyakhetarpal changed the title from "numpy and numpy-tests debugging (do not merge)" to "Split out numpy and numpy-tests" on Jun 24, 2025
@agriyakhetarpal
Member Author

Okay, there's just one test to get through:

@pytest.mark.skip_pyproxy_check
def test_runpythonasync_numpy(selenium_standalone):
    selenium_standalone.run_async(
        """
        import numpy as np
        x = np.zeros(5)
        """
    )
    for i in range(5):
        assert selenium_standalone.run_js(
            f"return pyodide.globals.get('x').toJs()[{i}] == 0"
        )

which says

FAILED packages/numpy/test_numpy.py::test_runpythonasync_numpy[chrome] - pytest_pyodide.runner.JavascriptException: PythonError: Traceback (most recent call last):
  File "/lib/python313.zip/_pyodide/_base.py", line 597, in eval_code_async
    await CodeRunner(
    ...<9 lines>...
    .run_async(globals, locals)
  File "/lib/python313.zip/_pyodide/_base.py", line 411, in run_async
    coroutine = eval(self.code, globals, locals)
  File "<exec>", line 3, in <module>
AttributeError: module 'numpy' has no attribute 'zeros'

I'm not too sure why, but I can debug it with NumPy outside this build – probably broken from the test splits.
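One situation that produces exactly this kind of AttributeError is importing a directory that has the package's name but no `__init__.py`, which Python treats as a namespace package with no attributes of its own. Whether that is what happened here is a guess, not a diagnosis. The hypothetical helper below reports where an import actually came from, which can help rule the case in or out.

```python
import importlib


def describe_import(name):
    """Import `name` and report where it came from and whether it looks complete."""
    module = importlib.import_module(name)
    spec = module.__spec__
    # A namespace package has no __init__.py, so it has no origin file,
    # but it does have a list of directories to search for submodules.
    is_namespace = (
        spec is not None
        and spec.origin is None
        and spec.submodule_search_locations is not None
    )
    return {
        "file": getattr(module, "__file__", None),
        "search_path": list(getattr(module, "__path__", [])),
        "is_namespace_package": is_namespace,
    }
```

Running `describe_import("numpy")` in the failing environment and looking at "file" and "search_path" would show which directory the interpreter picked up.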


github-actions bot commented Jun 24, 2025

Package Build Results

Total packages built: 30
Total build time: 0:05:22

Package Build Times

| Package | Build Time |
|---|---|
| openssl | 4m 16s |
| numpy | 3m 48s |
| sqlite3 | 1m 39s |
| numpy-tests | 1m 33s |
| liblzma | 1m 9s |
| test | 25s |
| regex | 12s |
| ssl | 12s |
| hashlib | 11s |
| lzma | 5s |
| pydecimal | 4s |
| pydoc_data | 4s |
| MarkupSafe | 4s |
| atomicwrites | 3s |
| packaging | 2s |
| pytz | 1s |
| exceptiongroup | 1s |
| pytest | 1s |
| Jinja2 | 1s |
| more-itertools | 1s |
| micropip | 1s |
| iniconfig | 1s |
| attrs | 1s |
| tblib | 1s |
| pytest-asyncio | 1s |
| pluggy | 1s |
| py | 1s |
| setuptools | 1s |
| pyparsing | 1s |
| six | 0s |

Longest build: openssl (4m 16s)
Packages built in more than 10 minutes: 0
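The per-package times overlap because packages are built concurrently, which is the caveat raised earlier about reading individual timings. A small sketch that makes this visible using the five longest entries from the report; `parse_duration` is a hypothetical helper.

```python
import re


def parse_duration(text):
    """Convert strings such as '4m 16s' or '25s' to seconds."""
    parts = {unit: int(value) for value, unit in re.findall(r"(\d+)([ms])", text)}
    return parts.get("m", 0) * 60 + parts.get("s", 0)


# Five longest builds, copied from the report above.
build_times = {
    "openssl": "4m 16s",
    "numpy": "3m 48s",
    "sqlite3": "1m 39s",
    "numpy-tests": "1m 33s",
    "liblzma": "1m 9s",
}

cumulative = sum(parse_duration(t) for t in build_times.values())
wall_clock = 5 * 60 + 22  # "Total build time: 0:05:22"
print(cumulative, wall_clock)
```

The summed time of these five packages alone exceeds the reported total, so a single package's duration says little about how long it would take when built on its own.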

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review June 24, 2025 18:46
@agriyakhetarpal agriyakhetarpal changed the title from "Split out numpy and numpy-tests" to "Split out numpy and numpy-tests, and update to NumPy v2.3.1" on Jun 24, 2025
@ryanking13
Member

I'm not too sure why, but I can debug it with NumPy outside this build – probably broken from the test splits.

I have seen this happen from time to time. I think there is some unknown flakiness in our build system, but I really don't understand why.

Member

@ryanking13 ryanking13 left a comment

Awesome! Thanks for your work @agriyakhetarpal, and also thanks for updating the numpy version.

After merging this, we should also update the numpy version in pyodide/pyodide to align it with the numpy version in xbuildenv (yes, it is a bit annoying, but it should be done until we implement pyodide/pyodide-build#43).

@@ -3,7 +3,7 @@


 def test_numpy(selenium):
-    selenium.load_package("numpy")
+    selenium.load_package(["numpy", "numpy-tests"])
Member

I don't think we need to import numpy-tests in this file.

Would you like to add a separate test file under packages/numpy-tests that actually runs some tests that are included in numpy-tests instead?

Member Author

Makes sense, thanks! I added the full test suite in ae85b6d, and I will reduce it in subsequent commits before we merge this, so that CI time is not impacted.

Member Author

I think a subset of the tests that would be fine for us would be those from numpy.linalg, numpy.fft, numpy.polynomial, numpy.random, and numpy.lib. We can leave out numpy.f2py (won't work anyway), numpy.strings, numpy.char, and numpy.ma. That is, unless you have other ideas here.

Member

Yes, numpy is very important, so I am okay with having more tests, but running the whole test suite would be too time-consuming, so it would be nice if we could find a good Goldilocks point.

@agriyakhetarpal
Member Author

I tried to run the tests locally after splitting them out, but I don't think this is the right approach for testing them. For example, https://docs.scipy.org/doc/scipy/building/redistributable_binaries.html does not mention how to run the tests from the split wheels. I also notice that the numpy-tests package is slightly broken: it fails with ERROR: module or package not found: numpy (missing __init__.py?).

One way to get around this is to install numpy-tests, copy the package tree entirely (which consists of just the tests), uninstall it, install numpy, copy the tests into the numpy wheels, and then run the tests from --pyargs numpy. In such a case, it is just better to build and test numpy without removing its tests at all.
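The copy step of that workaround can be sketched as follows. `overlay_tests` is a hypothetical helper; it assumes the tests-only tree has already been saved somewhere and that the runtime package directory exists. It deliberately never overwrites files that the runtime wheel installed.

```python
import shutil
from pathlib import Path


def overlay_tests(tests_tree, package_dir):
    """Copy a tests-only package tree on top of an installed package directory.

    Existing files in `package_dir` are left alone; returns the files added.
    """
    tests_tree, package_dir = Path(tests_tree), Path(package_dir)
    added = []
    for source in sorted(tests_tree.rglob("*")):
        if not source.is_file():
            continue
        target = package_dir / source.relative_to(tests_tree)
        if target.exists():
            continue  # never overwrite files that the runtime wheel installed
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        added.append(target.relative_to(package_dir).as_posix())
    return added
```

After such an overlay the tests could be run with --pyargs numpy as described above, which illustrates why the workaround offers little over testing an unsplit build.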

I tried to follow the approach in pandas-dev/pandas#53007, but I don't think this is something we should be doing here – we should wait for developments on numpy/numpy#26289 first. It is only after numpy-tests becomes installable as a separate package alongside numpy that we will be able to test this properly.

So, would you be okay with proceeding without these tests, given that NumPy functionality is being tested out-of-tree to a reasonable extent, and also here in the numpy recipe, where the tests have been left unchanged in this PR? These would have been just additional tests, and no tests have been removed at this time as part of the split.

@ryanking13
Member

So, would you be okay with proceeding without these tests, given that NumPy functionality is being tested out-of-tree to a reasonable extent, and also here in the numpy recipe, where the tests have been left unchanged in this PR? These would have been just additional tests, and no tests have been removed at this time as part of the split.

Sure, that is okay with me.

@agriyakhetarpal
Member Author

Thanks! I'll open a follow-up issue to discuss more testing for NumPy and a follow-up PR to update NumPy on the Pyodide repository side.

@agriyakhetarpal agriyakhetarpal merged commit d7901ed into pyodide:main Jun 26, 2025
4 checks passed
@agriyakhetarpal agriyakhetarpal deleted the numpy-tests branch June 26, 2025 10:14
ryanking13 added a commit that referenced this pull request Jul 3, 2025