Date: Wed, 25 Sep 2002 12:17:53 -0500 (CDT)
From: Gilles Detillieux
To: "ht://Dig mailing list"
Subject: [htdig] PATCH 3.1.6 to skip over JavaScript correctly

A frequent complaint about the new JavaScript-skipping code in 3.1.6's
HTML.cc parser is that a "<" inside the JavaScript code confuses it,
so it misses the closing </script> tag. Here is a patch that fixes this
problem. As far as I can tell, it works correctly and doesn't break
anything else, but of course I'd appreciate it if others could test it too.

Apply it from the top-level source directory using
"patch -p0 < this-message-file".

--- htdig/HTML.cc.orig	Wed Jan  9 16:12:31 2002
+++ htdig/HTML.cc	Wed Sep 25 11:50:50 2002
@@ -308,6 +308,13 @@ HTML::parse(Retriever &retriever, URL &b
 		if (!q)
 		    break;	// Syntax error in the doc.  Tag never ends.
 		position++;
+		if (noindex & TAGscript)
+		{	// Special handling in case '<' is part of JavaScript code
+		    while (isspace(*position))
+			position++;
+		    if (mystrncasecmp((char *)position, "/script", 7) != 0)
+			continue;
+		}
 		tag = 0;
 		tag.append((char*)position, q - position);
 		while (isspace(*position))

This patch should also work fine with 3.2.0b4 snapshots on or after
Sunday, August 2, 2001.

-- 
Gilles R. Detillieux              E-mail: 
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
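For anyone who wants to see the idea behind the patch in isolation, here
is a minimal standalone sketch. It is NOT the HTML.cc code: the names
find_script_end and starts_with_nocase are hypothetical, and
starts_with_nocase is only a portable stand-in for htdig's mystrncasecmp.
Like the patch, it treats a "<" as the start of the closing tag only if,
after optional whitespace, it is followed by "/script" (compared
case-insensitively). Any other "<" is assumed to be part of the
JavaScript code and skipped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive test of whether text, starting
// at offset pos, begins with word. Stands in for mystrncasecmp(..., n).
static bool starts_with_nocase(const std::string &text, std::size_t pos,
                               const std::string &word)
{
    if (pos > text.size() || text.size() - pos < word.size())
        return false;                       // not enough text left to match
    for (std::size_t i = 0; i < word.size(); ++i)
    {
        unsigned char a = static_cast<unsigned char>(text[pos + i]);
        unsigned char b = static_cast<unsigned char>(word[i]);
        if (std::tolower(a) != std::tolower(b))
            return false;
    }
    return true;
}

// Hypothetical helper: given the offset just past an opening <script> tag,
// return the offset of the '<' that begins the closing </script> tag,
// or std::string::npos if the script never ends.
std::size_t find_script_end(const std::string &html, std::size_t start)
{
    std::size_t lt = html.find('<', start);
    while (lt != std::string::npos)
    {
        // Skip whitespace after '<', as the patch does with isspace().
        std::size_t p = lt + 1;
        while (p < html.size() &&
               std::isspace(static_cast<unsigned char>(html[p])))
            ++p;
        if (starts_with_nocase(html, p, "/script"))
            return lt;                      // found the closing tag
        // This '<' belongs to the JavaScript code; keep scanning after it.
        lt = html.find('<', lt + 1);
    }
    return std::string::npos;
}
```

For example, with html = "<script>if (a < b) x = 1;</SCRIPT>" and
start = 8 (just past "<script>"), find_script_end returns 25, the
position of "</SCRIPT>". The "<" in "a < b" is skipped because it is
followed by "b", not "/script".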