Bytes Vectors from r.hget vs Bytes string returned from r.ft().search(query="*") #2772

@ghost

Description

Redis Python Lib Version: version 4.5.5

Redis Stack Version: version 7.0.0

Platform: Python 3.10.6 and Ubuntu 22.04

After storing several NumPy vectors as bytes in Redis hashes (HSET) and creating a search index (FT), I am trying to retrieve all of the embeddings using FT.SEARCH with a "*" query. However, the vector comes back as a string that differs from the bytes I get when using HGET. Here are a few lines of code as an example:

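import json
import os

import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# Assumption: the environment variable holds a JSON object of redis.Redis keyword arguments.
_redis_match_config = json.loads(os.getenv("NQAI_REDIS_MATCH_CONFIG"))
r = redis.Redis(**_redis_match_config)

# Store one float32 vector as raw bytes in a hash.
fake_vec = np.array([0.1, 0.2, 0.3, 0.4])
expert_hash = {"person_id": 1, "vector_emb": fake_vec.astype(np.float32).tobytes()}
r.hset("person:1", mapping=expert_hash)

# Index every hash whose key starts with "person:".
index_name = "person"
person_prefix = f"{index_name}:"
vector_search_attributes = {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}
schema = (
    TagField("person_id"),
    # The vector field name matches the hash field stored above.
    VectorField("vector_emb", algorithm="HNSW", attributes=vector_search_attributes),
)
r.ft(index_name).create_index(
    fields=schema,
    definition=IndexDefinition(prefix=[person_prefix], index_type=IndexType.HASH),
)

# Reading the field directly with HGET returns the original bytes.
bytes_person_1 = r.hget("person:1", "vector_emb")
print(bytes_person_1)
print(np.frombuffer(bytes_person_1, dtype=np.float32))
> output : b"\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>"
> output : array([0.1, 0.2, 0.3, 0.4], dtype=float32)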

However, when I do:

query = (
    Query("*")
    .return_fields("id", "vector_emb")
)
all_of = r.ft(index_name).search(query=query, query_params={}).docs
print(all_of[0]["vector_emb"])
print(all_of[0]["vector_emb"].encode("utf-32"))
print(np.frombuffer(bytes(all_of[0]["vector_emb"].encode("utf-32")), dtype=np.float32))
> output : "=L>>>"
> output: b'\xff\xfe\x00\x00=\x00\x00\x00L\x00\x00\x00>\x00\x00\x00>\x00\x00\x00>\x00\x00\x00'
> output : array([9.1475e-41 8.5479e-44 1.0650e-43 8.6881e-44 8.6881e-44 8.6881e-44], dtype=float32)

I have tried different combinations of .encode("utf-xx") and dtype=np.floatxx to no avail! Please help. Thanks.
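
One possible explanation, sketched below without a Redis server: the string "=L>>>" is exactly what remains if the 16 raw bytes are decoded as UTF-8 with invalid bytes silently dropped. That the search result path decodes field values this way is an assumption on my part, not something confirmed here. The sketch uses only the standard library (`struct` stands in for NumPy) and the values from the example above.

```python
import struct

# The four float32 values from the example, packed little-endian
# (the same bytes shown in the HGET output above).
raw = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)

# Assumption: the search path decodes each field value roughly like this.
lossy = raw.decode("utf-8", "ignore")

# Only the bytes that happen to be valid ASCII survive the decode.
survivors = lossy.encode("utf-8")

# The untouched bytes still unpack to the original values.
restored = struct.unpack("<4f", raw)

print(raw)
print(repr(lossy))
print(len(raw), "bytes before,", len(survivors), "bytes after")
print(restored)
```

If this is what happens, 11 of the 16 bytes are already gone by the time the string is returned, so no choice of .encode("utf-xx") or dtype can rebuild the vector. A workaround under that assumption would be to take the document ids from the search result and read the raw field with r.hget(doc_id, "vector_emb") for each one.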
