Skip to content

feat(index): append #1282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

cmp0xff
Copy link
Contributor

@cmp0xff cmp0xff commented Jul 18, 2025

  • Tests added: Please use assert_type() to assert the type of any return value

Index.append used not to be typed. In this PR, typings for Index.append are added and tested.

@overload
def append(self, other: Index[Never]) -> Index: ...
@overload
def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Index[S1]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be nicer here to have a second S1 here as the resulting Index can contain a mix of different types.

def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

45e0ad3, but this one works less well.

  • mypy is not happy with Index[int].append(Index[int | str]) and gives Index[Any]
  • pyright is not happy with Index[int | str].append([Index[int], Index[str]]) and gives Index[int | Any]. In particular, the typing for [Index[int], Index[str]] seems to be list[Index[int] | Index[str]], instead of list[Index[int | str]].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • mypy is not happy with Index[int].append(Index[int | str]) and gives Index[Any]
  • pyright is not happy with Index[int | str].append([Index[int], Index[str]]) and gives Index[int | Any]. In particular, the typing for [Index[int], Index[str]] seems to be list[Index[int] | Index[str]], instead of list[Index[int | str]].

While that is annoying for testing on the CI, I think that is the safer choice for user: rather expect a wider type that includes Any than suggesting it is a narrower type. This needs input from @Dr-Irv.

If S1 and S2 were covariant, it seems to work for at least pyright in a simple toy example (but they are invariant)

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str, covariant=True)
S2 = TypeVar("S2", bound=int | str, covariant=True)

class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...

    def append(self: Index[S1], other: Sequence[Index[S2]]) -> Index[S1 | S2]: ...

strings = Index(["a"])
reveal_type(strings)
ints = Index([1])
reveal_type(ints)

reveal_type(strings.append([ints]))
reveal_type(ints.append([strings]))

string_ints = Index(["a", 1])
reveal_type(string_ints)
reveal_type(string_ints.append([ints]))
reveal_type(strings.append([string_ints]))

reveal_type(strings.append([ints]))
reveal_type(strings.append([strings, ints]))

reveal_type(strings.append([ints, strings, string_ints]))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If S1 and S2 were covariant, it seems to work for at least pyright in a simple toy example (but they are invariant)

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str, covariant=True)
S2 = TypeVar("S2", bound=int | str, covariant=True)

class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...

    def append(self: Index[S1], other: Sequence[Index[S2]]) -> Index[S1 | S2]: ...

Hi, I am new to covariance / contravariance, but I read PEP484 (covariance-and-contravariance) and it says covariant is for classes, not for functions, where the latter case is prohibited. In your example, S1 is find, but not S2. Could you help me and explain a bit? Thanks.

B_co = TypeVar('B_co', covariant=True)

def bad_func(x: B_co) -> B_co:  # Flagged as error by a type checker
    ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't change S1 to be covariant. While the following is not exactly what we like to have, it is probably the closest we can get (but it doesn't work with mypy, unless the caller casts).

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str)
IndexT = TypeVar("IndexT", bound="Index")


class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...

    def append(self: Index[S1], other: Sequence[IndexT]) -> Index[S1] | IndexT: ...


strings = Index(["a"])
reveal_type(strings)
ints = Index([1])
reveal_type(ints)

reveal_type(strings.append([ints]))
reveal_type(ints.append([strings]))

string_ints = Index(["a", 1])
reveal_type(string_ints)
reveal_type(string_ints.append([ints]))
reveal_type(strings.append([string_ints]))

reveal_type(strings.append([ints]))
reveal_type(strings.append([strings, ints]))

reveal_type(strings.append([ints, strings, string_ints]))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I ran the script myself. In the most complicated case, I see Index[int | str] | Index[int] | Index[str]. To be honest, as a user I would rather see Index[Unknown], because it's simpler, and in both cases I would probably still need a manual cast. Nevertheless, 3844062

@cmp0xff cmp0xff requested a review from twoertwein July 18, 2025 16:43
@overload
def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Self: ...
@overload
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's right. I think the result would be Index[S1] | Index[S2] .

I'm not sure how the downstream Index stuff would work with the union type inside the generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we append Index([1]) to Index(["a"]), we actually get Index(["a", 1]), which has the type Index[str | int].

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't narrow it to a specific type that is one of the types in S1, then I think it will need to be just Index without specifying the subtype, or Index[Any], or even UnknownIndex .

Right now, if you did Index(["a", 1]), the revealed type by pyright is Index[str | int], but with mypy it is Index[Any], so I think I want to avoid placing unions inside the generic type because I don't know what else will break and there could be inconsistencies in the type checkers in handling this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

67e6bde, except for that mypy is unable to handle pd.Index([1, "a"]).append(pd.Index([1, "a"])) and gives Index[Any].

@overload
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
@overload
def append(self, other: Sequence[_T_INDEX]) -> Self | _T_INDEX: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use Sequence[Index] here without the TypeVar ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea came from #1282 (comment). Removed in 67e6bde.

Comment on lines 1041 to 1062
check(assert_type(first.append(third), "pd.Index[int | str]"), pd.Index) # type: ignore[assert-type]
check(assert_type(first.append([third]), "pd.Index[int | str]"), pd.Index) # type: ignore[assert-type]
check(
assert_type( # type: ignore[assert-type]
first.append([second, third]), # pyright: ignore[reportAssertTypeFailure]
"pd.Index[int | str]",
),
pd.Index,
)

check(assert_type(third.append([]), "pd.Index[int | str]"), pd.Index) # type: ignore[assert-type]
check(
assert_type(third.append(cast("list[Index[Any]]", [])), "pd.Index[int | str]"), # type: ignore[assert-type]
pd.Index,
)
check(assert_type(third.append([first]), "pd.Index[int | str]"), pd.Index) # type: ignore[assert-type]
check(
assert_type( # type: ignore[assert-type]
third.append([first, second]), # pyright: ignore[reportAssertTypeFailure]
"pd.Index[int | str]",
),
pd.Index,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests need to work without having ignore in them. So you'll need to fix the types in the append() declarations to make that happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three type: ignore remains, which I believe are mypy bugs. I tried to add

    @overload
    def append(self, other: Sequence[Never]) -> Self: ...

as the first overload, which did not help.

Comment on lines 1085 to 1090
"""Test pd.Index[list[str]].append"""
first = pd.Index([["str", "rts"]])
second = pd.Index([["srt", "trs"]])
check(assert_type(first.append([]), "pd.Index[list[str]]"), pd.Index, list)
check(assert_type(first.append(second), "pd.Index[list[str]]"), pd.Index, list)
check(assert_type(first.append([second]), "pd.Index[list[str]]"), pd.Index, list)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be supporting Index[list[str]] because list[str] is not hashable and labels in an Index should be hashable.

But this is a bug in pandas, I think. See pandas-dev/pandas#61937

So can you remove this test.

Then we will have to separate out list[str] from S1 and have an I1 that includes everything in S1 except list[str], while S1 includes everything.

So can you make that change as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing Index[list[str]] is hard. The following code runs:

import pandas as pd

pd.Index(["a_b"]).str.split("_")

It produces Index([['a', 'b']], dtype='object'). We have Index[S1] all over pandas-stubs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it really isn't supported to have list[str] in an Index:

ind = pd.Index(["a_b", "c_d"])
spl = ind.str.split("_")
spl.duplicated()

This will create an exception because the list is unhashable.

Might be best in this PR to just leave out the test with append and Index[list[str]] and not worry for now about splitting it out until the issue in pandas is straightened out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmp0xff cmp0xff requested a review from Dr-Irv July 29, 2025 11:13
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not return things like Index[str | int] and just make that Index

def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
@overload
def append(self, other: Sequence[_T_INDEX]) -> Self | _T_INDEX: ...
def append(self, other: Index[S2]) -> Index[S1 | S2]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def append(self, other: Index[S2]) -> Index[S1 | S2]: ...
def append(self, other: Index[S2]) -> Index: ...

I really want to avoid having the union types here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check(
assert_type( # type: ignore[assert-type]
first.append([second, third]), # pyright: ignore[reportAssertTypeFailure]
"pd.Index[int | str]",
first.append(third), "pd.Index[int | str]"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be Index[Any] (or just pd.Index)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d7a538d. Now just two # type: ignore[assert-type]s remain.

@cmp0xff cmp0xff requested a review from Dr-Irv July 29, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants