@meso it's not that hard to implement yourself tbh, most of my time was spent wrestling with file formats and shitty Microsoft webservers.
Just figure out a way to cut your documents into chunks that are small enough but keep as much meaning as possible intact, run them through an embedding model, and save the results into a database that can handle querying vectors with like 1000 dimensions.
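Rough sketch of that pipeline in plain Python, just to show the shape of it. Everything named here is made up for illustration: `chunk_text`, `toy_embed` and `VectorStore` are hypothetical helpers, not from any library. `toy_embed` is only a hashed bag-of-words stand-in so the example runs without dependencies; in practice you'd call a real embedding model there, and swap the in-memory list for an actual vector database.

```python
import hashlib
import math
import re

DIM = 1024  # assumption: stands in for the "like 1000 dimensions" above


def chunk_text(text, max_words=50):
    """Pack whole sentences into chunks of at most max_words words,
    so a chunk never cuts a sentence in half (a single over-long
    sentence still becomes its own chunk)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text, dim=DIM):
    """Placeholder for a real embedding model: hash each word into one
    of `dim` buckets, then L2-normalise the counts."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self.rows = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.rows.append((chunk, toy_embed(chunk)))

    def query(self, question, k=3):
        """Return the k best (score, chunk) pairs by cosine similarity.
        Vectors are unit length, so the dot product is the cosine."""
        q = toy_embed(question)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), chunk)
            for chunk, vec in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = VectorStore()
    doc = "The cat sat on the mat. Stock prices fell sharply today."
    for chunk in chunk_text(doc, max_words=6):
        store.add(chunk)
    print(store.query("cat mat", k=1))
```

The brute-force scan in `query` is fine for a few thousand chunks; past that is where a database with a proper vector index starts to earn its keep.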