
From mikes@mail.sv.dialogic.com Sat Nov 14 22:22:07 1998
Date: Sat, 14 Nov 1998 00:53:59 -0800 (PST)
From: Michael Spann <mikes@mail.sv.dialogic.com>
To: htdig@sdsu.edu
Subject: htdig: [PATCH] nofollow not always obeyed

A <meta name="robots" content="none"> tag, or any of the other ways of
telling htdig not to follow links on a page, is not always obeyed because of
two small bugs in HTML.cc.  Neither bug by itself would cause the problem I
saw.  The following patch seems to have fixed it.

*** HTML.orig	Mon Nov  2 16:21:51 1998
--- HTML.cc	Sat Nov 14 00:40:55 1998
*************** HTML::parse(Retriever &retriever, URL &b
*** 256,262 ****
  		if (description.length() > max_description_length)
  		{
  		    description << " ...";
! 		    retriever.got_href(*href, description);
  		    in_ref = 0;
  		    description = 0;
  		}
--- 256,263 ----
  		if (description.length() > max_description_length)
  		{
  		    description << " ...";
! 			if (dofollow)
! 		      retriever.got_href(*href, description);
  		    in_ref = 0;
  		    description = 0;
  		}
*************** HTML::do_tag(Retriever &retriever, Strin
*** 512,520 ****
  	}
  
  	case 3:		// "/a"
! 	    if (dofollow && in_ref)
  	    {
! 		retriever.got_href(*href, description);
  		in_ref = 0;
  	    }
  	    break;
--- 513,522 ----
  	}
  
  	case 3:		// "/a"
! 	    if (in_ref)
  	    {
! 		if (dofollow)
! 		  retriever.got_href(*href, description);
  		in_ref = 0;
  	    }
  	    break;


