Community Forums

xpdf integration

(9 posts)
  • Started 1 year ago by cnielsen
  • Latest reply from rocket2009

  1. cnielsen
    Member

    I've integrated xpdf for searching in pdf files. The command "pdftotext -raw example.pdf example.txt" works fine. Here's my config.inc.php:

    <?php
    $win = (substr(PHP_OS, 0, 3) == "WIN");

    // change this if you install xpdf to other directory
    $file_conf['extract_tool']['pdf'] = ($win) ? APP_EXTRA_MODULE_DIR . 'file_extractors/xpdf/win/'
    : '/usr/local/groundwork/apache2/htdocs/kb/admin/extra/file_extractors/xpdf/win/';
    ?>

    Has anyone got this running in their environment?
    Thanks!

    Posted 1 year ago #
  2. Sorry, what is the question? Does it work for you?

    Posted 1 year ago #
  3. cnielsen
    Member

    Sorry for the misunderstanding. No, it doesn't work for me in my KBPublisher installation. I can convert PDFs to text files at the command prompt, but I cannot search in PDF documents attached to KB articles.

    Posted 1 year ago #
  4. This tool does not search in files; it extracts raw text from PDF files, and KBPublisher indexes that text.
    So search will work for newly uploaded files (those added after xpdf was installed).
    In the next KBPublisher release it will be possible to reindex existing files.

    Run a test from PHP using the real path:
    system('/usr/path_to_xpdf/pdftotext -raw file_read.pdf file_write.txt', $return);

    Also try setting $file_conf['extract_tool']['pdf'] = '';

    Posted 1 year ago #
  5. cnielsen
    Member

    So when I upload a new PDF file, KBP will extract the PDF to text, and I'll then have two files in my kb_file folder, one called test.pdf and one called test.txt. Is this correct? And how long do I have to wait for the new files to be indexed?

    Posted 1 year ago #
  6. Text from the PDF file is extracted into the database, where it is indexed by a MySQL fulltext index.
    The text is extracted when you add the file.
    There is a "Text" field in the files listing; if extraction was successful, you can view the extracted text there.

    Posted 1 year ago #
  7. cnielsen
    Member

    Ah, now I see how it works :-) Sorry, this isn't part of my daily business...
    And how long do I have to wait until the MySQL fulltext index runs? Every hour, or when?

    Posted 1 year ago #
  8. It runs when you add or update a file. No need to wait. Files added before xpdf was enabled have never been indexed. You have to update them.

    Posted 1 year ago #
  9. I had your team install my system, and I was the test case for xpdf. I have uploaded several PDF files, and they were converted to text by xpdf: I can see the "yes" and click on it.

    My problem is that I can't seem to create a search that finds any of the articles. I have tried small words, long words, and multiple words. In advanced search I select attachments and inline files, and I select all categories, both with the "all" button and by highlighting all of them.

    Search works on articles, but I can't seem to get it to work reliably on attachments. Any hints?

    Posted 1 year ago #
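
The PHP test suggested in reply 4 can be sketched roughly as follows. This is only an illustration, not KBPublisher's own extraction code: the function names build_pdftotext_command and extract_pdf_text are hypothetical, and it assumes that an empty tool directory (as with $file_conf['extract_tool']['pdf'] = '') means pdftotext is found through the system PATH.

```php
<?php
// Hypothetical helper (not part of KBPublisher): builds the pdftotext
// command line for a given tool directory.
function build_pdftotext_command(string $toolDir, string $pdf, string $txt): string
{
    // Assumption: an empty directory means "rely on the system PATH".
    $binary = ($toolDir === '')
        ? 'pdftotext'
        : rtrim($toolDir, '/\\') . DIRECTORY_SEPARATOR . 'pdftotext';

    // Quote every argument so paths containing spaces survive the shell.
    return escapeshellarg($binary) . ' -raw '
        . escapeshellarg($pdf) . ' ' . escapeshellarg($txt);
}

// Hypothetical helper: runs pdftotext and returns the extracted text,
// or null if the tool is missing or reports an error.
function extract_pdf_text(string $toolDir, string $pdf): ?string
{
    $txt = tempnam(sys_get_temp_dir(), 'pdf');
    $output = [];
    $status = 1;

    // A non-zero exit status means the extraction did not succeed.
    exec(build_pdftotext_command($toolDir, $pdf, $txt) . ' 2>&1', $output, $status);

    $text = ($status === 0) ? file_get_contents($txt) : false;
    @unlink($txt);

    return ($text === false) ? null : $text;
}
```

Calling extract_pdf_text('/usr/path_to_xpdf/', 'file_read.pdf') from a script run by the web server user should return the PDF's text. A null result points to a wrong path or a permissions problem rather than to the search itself.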


© 2008 Double Jade LLC | customer.service@kbpublisher.com