
Relevanssi Premium users have asked for PDF indexing since day one, and version 2.0 finally introduced this feature. Coming up with a fast and reliable method hasn’t been easy, but we’re pretty proud of what we have now. Our PDF indexer doesn’t tax your server, as it runs as a service on a separate server.

Since Relevanssi is a WordPress search, it operates on WordPress posts (including all the different post types). So, to have Relevanssi index your PDFs, they need to be WordPress posts. Fortunately, that’s really simple: upload your PDF files to the Media library, and they become posts with the post type of attachment.

Relevanssi can only parse and read PDF files that contain text. If the PDF file is all images (for example, a scanned document that hasn’t been OCR processed), Relevanssi cannot read it. An easy way to check is to try to select the text in a PDF reader: if you can select the text, Relevanssi can read it; if you can’t, the text is an image, and Relevanssi can’t read it.

Relevanssi can also handle lots of different file formats besides PDF. Our server uses Apache Tika to process the files, giving us a wide variety of supported formats. The indexing server has a hard file size limit of 256 megabytes.
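If you have a large batch of PDFs to upload, it may be worth checking them against the 256-megabyte limit mentioned above before they go into the Media library. The following is only a rough sketch and not part of Relevanssi: the function name `files_over_limit` is hypothetical, and it assumes that "256 megabytes" means 256 × 1024 × 1024 bytes and that a file exactly at the limit is still accepted, neither of which is confirmed here.

```python
from pathlib import Path

# Assumption: "256 megabytes" is read as 256 * 1024 * 1024 bytes.
# The indexing server may count megabytes differently.
MAX_INDEXABLE_BYTES = 256 * 1024 * 1024


def files_over_limit(folder, limit=MAX_INDEXABLE_BYTES):
    """Return a sorted list of .pdf files under `folder` larger than `limit` bytes.

    Hypothetical helper for a local pre-upload check. The "*.pdf" pattern
    is matched case-sensitively on most systems, so "REPORT.PDF" may be
    skipped.
    """
    return sorted(
        path
        for path in Path(folder).rglob("*.pdf")
        if path.is_file() and path.stat().st_size > limit
    )
```

Calling `files_over_limit("path/to/your/pdfs")` returns the files that would probably be rejected for size, so you can split or compress them first. It says nothing about whether a PDF contains selectable text; that still needs the manual check described above.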
