
From LuedtkT@mail.nlm.nih.gov Tue Sep 19 11:03:21 2000
Date: Tue, 19 Sep 2000 09:52:36 -0400
From: Terry Luedtke <LuedtkT@mail.nlm.nih.gov>
To: htdig@htdig.org
Subject: [htdig] infinite loop in doc2html.pl

Hello,

I ran into an infinite loop using doc2html.pl.  When it parses a PDF document it tries to reassemble hyphenated words.  Unfortunately, I have documents whose last line ends with a dash, like "text-", so the loop spins forever looking for the other half of the word.  Adding a check for eof fixed it.

In sub try_text(), the original loop is shown first and the patched loop second (the added line is marked with +):

      while (<CAT>) {
        while ( m/[A-Za-z\300-\377]-\s*$/ && $set->{'hyph'}) {
          ($_ .= <CAT>) || last;
          s/([A-Za-z\300-\377])-\s*\n\s*([A-Za-z\300-\377])/$1$2/s;
        }
--
      while (<CAT>) {
        while ( m/[A-Za-z\300-\377]-\s*$/ && $set->{'hyph'}) {
          ($_ .= <CAT>) || last;
          s/([A-Za-z\300-\377])-\s*\n\s*([A-Za-z\300-\377])/$1$2/s;
+          last if eof;
        }
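
For anyone who wants to reproduce this outside doc2html.pl, here is a rough standalone sketch.  As far as I can tell, the original guard fails because <CAT> returns undef at end of file; appending undef leaves $_ unchanged and still true, so "|| last" never fires while the pattern keeps matching.  The sub name join_hyphenated, the lexical filehandle and the in-memory sample text are all made up for illustration and are not part of doc2html.pl; it also tests the read directly rather than calling eof, which should be equivalent for this case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of doc2html.pl): joins words that are
# hyphenated across line breaks, reading from an already-open filehandle.
sub join_hyphenated {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        # Keep pulling in the next line while this one ends in "letter-".
        while ($line =~ /[A-Za-z\300-\377]-\s*$/) {
            my $next = <$fh>;
            last unless defined $next;   # end of file: nothing left to join
            $line .= $next;
            $line =~ s/([A-Za-z\300-\377])-\s*\n\s*([A-Za-z\300-\377])/$1$2/s;
        }
        push @lines, $line;
    }
    return @lines;
}

# Usage: an in-memory "file" whose last line ends with a dash.
my $text = "hyphen-\nated words\ntext-\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my @out = join_hyphenated($fh);
close($fh);
print @out;
```

The trailing "text-" line is passed through unchanged instead of hanging the loop.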


Terry Luedtke
National Library of Medicine






