
From grdetil@scrc.umanitoba.ca Thu Jan 27 17:30:05 2000
Date: Thu, 27 Jan 2000 17:28:59 -0600 (CST)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: htdig3-dev@htdig.org
Cc: henson@intranet.csupomona.edu
Subject: [htdig3-dev] call for votes - Local file system access enhancements

Hi, folks.

This patch from Paul Henson fixes some long-standing complaints about
limitations in the local_urls and local_user_urls support.  It's for
3.1.4, so I don't know how easily it'll apply to 3.2.0b1, but it looks
to me like a clean implementation.  Can we have a vote on including it
in 3.2.0b1, despite the feature freeze?

Here's my
+1

--- begin forwarded message from Paul B. Henson ---
>From henson@intranet.csupomona.edu  Thu Jan 27 16:23:52 2000
Date: Thu, 27 Jan 2000 14:23:46 -0800 (PST)
From: "Paul B. Henson" <henson@intranet.csupomona.edu>
To: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
cc: ht3bugs@htdig.org, htdig3-bugs@htdig.org
Subject: Re: Local file system access enhancements (PR#744)
In-Reply-To: <200001201740.LAA27780@cliff.scrc.umanitoba.ca>
Message-ID: <Pine.GSO.4.10.10001271413180.10558-100000@kenny.intranet.csupomona.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Thu, 20 Jan 2000, Gilles Detillieux wrote:

> According to Paul B. Henson (henson@intranet.csupomona.edu):
> > I would like to request that the local_default_doc allow a list of strings
> > specifying multiple default index files sorted by preference.
> 
> It's not really a bug, but it is a limitation that several others have
> requested we address.  It's on the to-do list, but won't make it into
> 3.2.0b1.  Hopefully someone will come forward to implement it for the
> following release.
>
> > We also have multiple locations where a ~name reference is redirected (e.g.,
> > /~foo/ could be either /dfs/user/foo or /dfs/group/foo).
> > 
> > I would like to request that the local_user_urls be enhanced to allow ~foo
> > to be found by searching multiple locations in preference order.
[...]
> 
> Ideally, though, htdig should keep trying other matches in the list if the
> first one fails, rather than only trying the first match in local_urls or
> local_user_urls.  Again, hopefully someone will implement this eventually.

Here is a patch to htdig-3.1.4 that allows you to specify multiple default
index files separated by white space, e.g.,

   local_default_doc:  index.html index.htm

that will be tried in order until one is found or the list is exhausted.
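
As a rough standalone illustration of that expansion (not the patch
itself, which uses ht://Dig's own String and StringList classes), the
idea can be sketched with the standard library. The function name
candidate_files is hypothetical and exists only for this example:

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of ht://Dig: expand one local path into
// the ordered list of filenames to try.  A path ending in '/' yields one
// candidate per configured default document, in preference order; any
// other path is returned unchanged as the only candidate.
std::vector<std::string> candidate_files(
    const std::string &local_path,
    const std::vector<std::string> &default_docs)
{
    std::vector<std::string> candidates;
    if (!local_path.empty() && local_path.back() == '/' &&
        !default_docs.empty()) {
        for (const std::string &doc : default_docs)
            candidates.push_back(local_path + doc);
    } else {
        candidates.push_back(local_path);
    }
    return candidates;
}
```

The caller would then walk the returned list and take the first entry
that exists as a regular file.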

The patch also adds the ability to have multiple identical prefixes with
different paths in the local_urls configuration variable, e.g.,

   local_urls:  http://www.csupomona.edu/=/dfs/web/public/ \
                http://www.csupomona.edu/=/dfs/web/data/

that will be tried in order until one is found or the list is exhausted.
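
A minimal sketch of the lookup, again in standalone C++ rather than the
ht://Dig classes: every entry whose prefix matches contributes one
candidate, instead of stopping at the first match. The name
map_local_url and the pair-based representation are assumptions made
for this example only:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of ht://Dig: map a URL to every local
// filename implied by a list of (URL prefix, filesystem path) pairs,
// preserving the order in which the pairs were configured.
std::vector<std::string> map_local_url(
    const std::string &url,
    const std::vector<std::pair<std::string, std::string>> &local_urls)
{
    std::vector<std::string> candidates;
    for (const auto &entry : local_urls) {
        const std::string &prefix = entry.first;
        // Keep going after a match so identical prefixes all contribute.
        if (url.compare(0, prefix.size(), prefix) == 0)
            candidates.push_back(entry.second + url.substr(prefix.size()));
    }
    return candidates;
}
```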

Finally, the patch adds the ability to have multiple locations for
local_user_urls, which are searched in order in the same way, e.g.,

   local_user_urls:  http://www.csupomona.edu/=/dfs/group/,/ \
                     http://www.csupomona.edu/=/dfs/user/,/
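
For illustration only, here is a standalone sketch of how a ~name URL
could expand against such a list, assuming each entry has the form
prefix=path,dir and maps prefix~user/rest to path + user + dir + rest.
The UserMapping struct and map_user_url function are hypothetical
names, and the case of an empty path (looking up the user's home
directory) is deliberately left out:

```cpp
#include <string>
#include <vector>

// Hypothetical representation of one local_user_urls entry.
struct UserMapping {
    std::string prefix;  // URL prefix, e.g. "http://www.csupomona.edu/"
    std::string path;    // directory holding the user trees
    std::string dir;     // component appended after the user name
};

// Hypothetical helper, not part of ht://Dig: return one candidate
// filename per matching entry, in configuration order.
std::vector<std::string> map_user_url(
    const std::string &url, const std::vector<UserMapping> &mappings)
{
    std::vector<std::string> candidates;
    for (const UserMapping &m : mappings) {
        if (url.compare(0, m.prefix.size(), m.prefix) != 0)
            continue;                       // prefix does not match
        std::string tail = url.substr(m.prefix.size());
        if (tail.empty() || tail[0] != '~')
            continue;                       // not a ~user reference
        std::string::size_type slash = tail.find('/');
        if (slash == std::string::npos)
            continue;                       // no path after the user name
        std::string user = tail.substr(1, slash - 1);
        std::string rest = tail.substr(slash + 1);
        candidates.push_back(m.path + user + m.dir + rest);
    }
    return candidates;
}
```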


The patch is not overly complicated; hopefully, you will be able to
incorporate it into your next release.


Thanks...

-----------------------------------------------------------------------------------
diff -r -c htdig-3.1.4/htdig/Document.cc htdig-3.1.4-new/htdig/Document.cc
*** htdig-3.1.4/htdig/Document.cc	Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Document.cc	Thu Jan 27 11:51:52 2000
***************
*** 571,589 ****
  
  
  //*****************************************************************************
! // DocStatus Document::RetrieveLocal(time_t date, char *filename)
  //   Attempt to retrieve the document pointed to by our internal URL
! //   using a local filename given. Returns Document_ok,
  //   Document_not_changed or Document_not_local (in which case the
  //   retriever tries it again using HTTP).
  //
  Document::DocStatus
! Document::RetrieveLocal(time_t date, char *filename)
  {
      struct stat stat_buf;
!     // Check that it exists, and is a regular file. 
!     if ((stat(filename, &stat_buf) == -1) || !S_ISREG(stat_buf.st_mode))
! 	return Document_not_local;
  
      modtime = stat_buf.st_mtime;
      if (modtime <= date)
--- 571,602 ----
  
  
  //*****************************************************************************
! // DocStatus Document::RetrieveLocal(time_t date, StringList *filenames)
  //   Attempt to retrieve the document pointed to by our internal URL
! //   using a list of potential local filenames given. Returns Document_ok,
  //   Document_not_changed or Document_not_local (in which case the
  //   retriever tries it again using HTTP).
  //
  Document::DocStatus
! Document::RetrieveLocal(time_t date, StringList *filenames)
  {
      struct stat stat_buf;
!     String *filename;
! 
!     filenames->Start_Get();
! 
!     // Loop through list of potential filenames until the list is exhausted
!     // or a suitable file is found.
!     while ((filename = (String *)filenames->Get_Next()) &&
! 	   ((stat(*filename, &stat_buf) == -1) || !S_ISREG(stat_buf.st_mode)))
!         if (debug > 1)
! 	    cout << "  tried local file " << *filename << endl;
!     
!     if (!filename)
!         return Document_not_local;
! 
!     if (debug > 1)
!         cout << "  found existing file " << *filename << endl;
  
      modtime = stat_buf.st_mtime;
      if (modtime <= date)
***************
*** 592,598 ****
      // Process only HTML files (this could be changed if we read
      // the server's mime.types file).
      // (...and handle a select few other types for now...)
!     char *ext = strrchr(filename, '.');
      if (ext == NULL)
        	return Document_not_local;
      if ((mystrcasecmp(ext, ".html") == 0) || (mystrcasecmp(ext, ".htm") == 0))
--- 605,611 ----
      // Process only HTML files (this could be changed if we read
      // the server's mime.types file).
      // (...and handle a select few other types for now...)
!     char *ext = strrchr(*filename, '.');
      if (ext == NULL)
        	return Document_not_local;
      if ((mystrcasecmp(ext, ".html") == 0) || (mystrcasecmp(ext, ".htm") == 0))
***************
*** 607,613 ****
    	return Document_not_local;
  
      // Open it
!     FILE *f = fopen(filename, "r");
      if (f == NULL)
   	return Document_not_local;
  
--- 620,626 ----
    	return Document_not_local;
  
      // Open it
!     FILE *f = fopen(*filename, "r");
      if (f == NULL)
   	return Document_not_local;
  
diff -r -c htdig-3.1.4/htdig/Document.h htdig-3.1.4-new/htdig/Document.h
*** htdig-3.1.4/htdig/Document.h	Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Document.h	Thu Jan 27 11:51:16 2000
***************
*** 19,24 ****
--- 19,25 ----
  #include "Object.h"
  #include "URL.h"
  #include "htString.h"
+ #include "StringList.h"
  #if TIME_WITH_SYS_TIME
  # include <sys/time.h>
  # include <time.h>
***************
*** 79,85 ****
  	Document_not_local
      };
      DocStatus			RetrieveHTTP(time_t date);
!     DocStatus			RetrieveLocal(time_t date, char *filename);
  
      //
      // Return an appropriate parsable object for the document type.
--- 80,86 ----
  	Document_not_local
      };
      DocStatus			RetrieveHTTP(time_t date);
!     DocStatus			RetrieveLocal(time_t date, StringList *filenames);
  
      //
      // Return an appropriate parsable object for the document type.
diff -r -c htdig-3.1.4/htdig/Retriever.cc htdig-3.1.4-new/htdig/Retriever.cc
*** htdig-3.1.4/htdig/Retriever.cc	Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Retriever.cc	Thu Jan 27 12:18:26 2000
***************
*** 139,148 ****
  	{
  	    String robotsURL = "http://";
  	    robotsURL << u.host() << "/robots.txt";
! 	    String *localRobotsFile = GetLocal(robotsURL.get());
! 	    server = new Server(u.host(), u.port(), localRobotsFile);
  	    servers.Add(u.signature(), server);
! 	    delete localRobotsFile;
  	}
  	else if (from && visited.Exists(url)) 
  	{
--- 139,148 ----
  	{
  	    String robotsURL = "http://";
  	    robotsURL << u.host() << "/robots.txt";
! 	    StringList *localRobotsFiles = GetLocal(robotsURL.get());
! 	    server = new Server(u.host(), u.port(), localRobotsFiles);
  	    servers.Add(u.signature(), server);
! 	    delete localRobotsFiles;
  	}
  	else if (from && visited.Exists(url)) 
  	{
***************
*** 402,413 ****
      // Retrive document, first trying local file access if possible.
      Document::DocStatus status;
      server = (Server *) servers[url.signature()];
!     String *local_filename = GetLocal(url.get());
!     if (local_filename)
      {  
          if (debug > 1)
! 	    cout << "Trying local file " << *local_filename << endl;
!         status = doc->RetrieveLocal(date, *local_filename);
          if (status == Document::Document_not_local)
          {
  	    if (local_urls_only)
--- 402,413 ----
      // Retrive document, first trying local file access if possible.
      Document::DocStatus status;
      server = (Server *) servers[url.signature()];
!     StringList *local_filenames = GetLocal(url.get());
!     if (local_filenames)
      {  
          if (debug > 1)
! 	    cout << "Trying local files" << endl;
!         status = doc->RetrieveLocal(date, local_filenames);
          if (status == Document::Document_not_local)
          {
  	    if (local_urls_only)
***************
*** 421,427 ****
  	    else
  		status = Document::Document_no_server;
          }
!         delete local_filename;
      }
      else if (server && !server->IsDead())
          status = doc->RetrieveHTTP(date);
--- 421,427 ----
  	    else
  		status = Document::Document_no_server;
          }
!         delete local_filenames;
      }
      else if (server && !server->IsDead())
          status = doc->RetrieveHTTP(date);
***************
*** 747,762 ****
  
  
  //*****************************************************************************
! // String* Retriever::GetLocal(char *url)
! //   Returns a string containing the (possible) local filename
  //   of the given url, or 0 if it's definitely not local.
! //   THE CALLER MUST FREE THE STRING AFTER USE!
  //
! String*
  Retriever::GetLocal(char *url)
  {
      static StringList *prefixes = 0;
      static StringList *paths = 0;
  
      //
      // Initialize prefix/path list if this is the first time.
--- 747,763 ----
  
  
  //*****************************************************************************
! // StringList* Retriever::GetLocal(char *url)
! //   Returns a list of strings containing the (possible) local filenames
  //   of the given url, or 0 if it's definitely not local.
! //   THE CALLER MUST FREE THE STRINGLIST AFTER USE!
  //
! StringList*
  Retriever::GetLocal(char *url)
  {
      static StringList *prefixes = 0;
      static StringList *paths = 0;
+     static StringList *defaultdocs = 0;
  
      //
      // Initialize prefix/path list if this is the first time.
***************
*** 766,771 ****
--- 767,773 ----
      {
      	prefixes = new StringList();
  	paths = new StringList();
+ 	defaultdocs = new StringList();
  
  	String t = config["local_urls"];
  	char *p = strtok(t, " \t");
***************
*** 782,793 ****
              paths->Add(path);
  	    p = strtok(0, " \t");
  	}
      }
  
      // Check first for local user...
      if (strchr(url, '~'))
      {
! 	String *local = GetLocalUser(url);
  	if (local)
  	    return local;
      }
--- 784,804 ----
              paths->Add(path);
  	    p = strtok(0, " \t");
  	}
+ 	t = config["local_default_doc"];
+ 	p = strtok(t, " \t");
+ 	while (p)	
+ 	{
+ 	    defaultdocs->Add(p);
+ 	    p = strtok(0, " \t");
+ 	}
+ 	if (defaultdocs->Count() == 0)
+ 	    { delete defaultdocs; defaultdocs = 0; }
      }
  
      // Check first for local user...
      if (strchr(url, '~'))
      {
! 	StringList *local = GetLocalUser(url, defaultdocs);
  	if (local)
  	    return local;
      }
***************
*** 797,802 ****
--- 808,814 ----
          return 0;
      
      String *prefix, *path;
+     StringList *local_names = new StringList();
      prefixes->Start_Get();
      paths->Start_Get();
      while ((prefix = (String*) prefixes->Get_Next()))
***************
*** 807,830 ****
  	    int l = strlen(url)-prefix->length()+path->length()+4;
  	    String *local = new String(*path, l);
  	    *local += &url[prefix->length()];
! 	    if (local->last() == '/' && config["local_default_doc"] != "")
! 	      *local += config["local_default_doc"];
! 	    return local;
  	}	
      }
      return 0;
  }
  
  
  //*****************************************************************************
! // String* Retriever::GetLocalUser(char *url)
! //   If the URL has ~user part, returns a string containing the
! //   (possible) local filename of the given url, or 0 if it's
  //   definitely not local.
! //   THE CALLER MUST FREE THE STRING AFTER USE!
  //
! String*
! Retriever::GetLocalUser(char *url)
  {
      static StringList *prefixes = 0, *paths = 0, *dirs = 0;
      static Dictionary home_cache;
--- 819,854 ----
  	    int l = strlen(url)-prefix->length()+path->length()+4;
  	    String *local = new String(*path, l);
  	    *local += &url[prefix->length()];
! 	    if (local->last() == '/' && defaultdocs) {
! 	      defaultdocs->Start_Get();
! 	      while (String *defaultdoc = (String *)defaultdocs->Get_Next()) {
! 		String *localdefault = new String(*local, local->length()+defaultdoc->length()+1);
! 		localdefault->append(*defaultdoc);
! 		local_names->Add(localdefault);
! 	      }
! 	      delete local;
! 	    }
! 	    else
! 	      local_names->Add(local);
  	}	
      }
+     if (local_names->Count() > 0)
+         return local_names;
+ 
+     delete local_names;
      return 0;
  }
  
  
  //*****************************************************************************
! // StringList* Retriever::GetLocalUser(char *url, StringList *defaultdocs)
! //   If the URL has ~user part, return a list of strings containing the
! //   (possible) local filenames of the given url, or 0 if it's
  //   definitely not local.
! //   THE CALLER MUST FREE THE STRINGLIST AFTER USE!
  //
! StringList*
! Retriever::GetLocalUser(char *url, StringList *defaultdocs)
  {
      static StringList *prefixes = 0, *paths = 0, *dirs = 0;
      static Dictionary home_cache;
***************
*** 882,887 ****
--- 906,912 ----
      paths->Start_Get();
      dirs->Start_Get();
      String *prefix, *path, *dir;
+     StringList *local_names = new StringList();
      while ((prefix = (String*) prefixes->Get_Next()))
      {
          path = (String*) paths->Get_Next();
***************
*** 906,912 ****
  	    if (home)
  	        *local += *home;
  	    else
! 	        return 0;
  	}
  	else
  	{
--- 931,937 ----
  	    if (home)
  	        *local += *home;
  	    else
! 	        continue;
  	}
  	else
  	{
***************
*** 915,924 ****
  	}
  	*local += *dir;
  	*local += rest;
! 	if (local->last() == '/' && config["local_default_doc"] != "")
! 	  *local += config["local_default_doc"];
! 	return local;
      }
      return 0;
  }
  
--- 940,962 ----
  	}
  	*local += *dir;
  	*local += rest;
! 	if (local->last() == '/' && defaultdocs) {
! 	  defaultdocs->Start_Get();
! 	  while (String *defaultdoc = (String *)defaultdocs->Get_Next()) {
! 	    String *localdefault = new String(*local, local->length()+defaultdoc->length()+1);
! 	    localdefault->append(*defaultdoc);
! 	    local_names->Add(localdefault);
! 	  }
! 	  delete local;
! 	}
! 	else
! 	  local_names->Add(local);
      }
+ 
+     if (local_names->Count() > 0)
+         return local_names;
+ 
+     delete local_names;
      return 0;
  }
  
***************
*** 933,939 ****
  {
      int ret;
  
!     String *local_filename = GetLocal(url);
      ret = (local_filename != 0);
      delete local_filename;
  
--- 971,977 ----
  {
      int ret;
  
!     StringList *local_filename = GetLocal(url);
      ret = (local_filename != 0);
      delete local_filename;
  
***************
*** 1174,1180 ****
  		    //
  		    String robotsURL = "http://";
  		    robotsURL << url.host() << "/robots.txt";
! 		    String *localRobotsFile = GetLocal(robotsURL.get());
  		    server = new Server(url.host(), url.port(), localRobotsFile);
  		    servers.Add(url.signature(), server);
  		    delete localRobotsFile;
--- 1212,1218 ----
  		    //
  		    String robotsURL = "http://";
  		    robotsURL << url.host() << "/robots.txt";
! 		    StringList *localRobotsFile = GetLocal(robotsURL.get());
  		    server = new Server(url.host(), url.port(), localRobotsFile);
  		    servers.Add(url.signature(), server);
  		    delete localRobotsFile;
***************
*** 1307,1313 ****
  		    //
  		    String robotsURL = "http://";
  		    robotsURL << url.host() << "/robots.txt";
! 		    String *localRobotsFile = GetLocal(robotsURL.get());
  		    server = new Server(url.host(), url.port(), localRobotsFile);
  		    servers.Add(url.signature(), server);
  		    delete localRobotsFile;
--- 1345,1351 ----
  		    //
  		    String robotsURL = "http://";
  		    robotsURL << url.host() << "/robots.txt";
! 		    StringList *localRobotsFile = GetLocal(robotsURL.get());
  		    server = new Server(url.host(), url.port(), localRobotsFile);
  		    servers.Add(url.signature(), server);
  		    delete localRobotsFile;
diff -r -c htdig-3.1.4/htdig/Retriever.h htdig-3.1.4-new/htdig/Retriever.h
*** htdig-3.1.4/htdig/Retriever.h	Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Retriever.h	Thu Jan 27 12:05:41 2000
***************
*** 12,17 ****
--- 12,18 ----
  #include "Dictionary.h"
  #include "Queue.h"
  #include "List.h"
+ #include "StringList.h"
  
  class URL;
  class Document;
***************
*** 68,75 ****
      //
      // Routines for dealing with local filesystem access
      //
!     String *            GetLocal(char *url);
!     String *            GetLocalUser(char *url);
      int			IsLocalURL(char *url);
  	
  private:
--- 69,76 ----
      //
      // Routines for dealing with local filesystem access
      //
!     StringList *            GetLocal(char *url);
!     StringList *            GetLocalUser(char *url, StringList *defaultdocs);
      int			IsLocalURL(char *url);
  	
  private:
diff -r -c htdig-3.1.4/htdig/Server.cc htdig-3.1.4-new/htdig/Server.cc
*** htdig-3.1.4/htdig/Server.cc	Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Server.cc	Thu Jan 27 11:04:37 2000
***************
*** 20,28 ****
  
  
  //*****************************************************************************
! // Server::Server(char *host, int port, String *local_robots_file)
  //
! Server::Server(char *host, int port, String *local_robots_file)
  {
      if (debug > 0)
  	cout << endl << "New server: " << host << ", " << port << endl;
--- 20,28 ----
  
  
  //*****************************************************************************
! // Server::Server(char *host, int port, StringList *local_robots_files)
  //
! Server::Server(char *host, int port, StringList *local_robots_files)
  {
      if (debug > 0)
  	cout << endl << "New server: " << host << ", " << port << endl;
***************
*** 47,57 ****
      static int		local_urls_only = config.Boolean("local_urls_only");
      time_t		timeZero = 0;
      Document::DocStatus	status;
!     if (local_robots_file)
      {  
          if (debug > 1)
! 	    cout << "Trying local file " << *local_robots_file << endl;
!         status = doc.RetrieveLocal(timeZero, *local_robots_file);
          if (status == Document::Document_not_local)
          {
  	    if (local_urls_only)
--- 47,57 ----
      static int		local_urls_only = config.Boolean("local_urls_only");
      time_t		timeZero = 0;
      Document::DocStatus	status;
!     if (local_robots_files)
      {  
          if (debug > 1)
! 	    cout << "Trying local files " << endl;
!         status = doc.RetrieveLocal(timeZero, local_robots_files);
          if (status == Document::Document_not_local)
          {
  	    if (local_urls_only)
diff -r -c htdig-3.1.4/htdig/Server.h htdig-3.1.4-new/htdig/Server.h
*** htdig-3.1.4/htdig/Server.h	Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Server.h	Thu Jan 27 11:58:29 2000
***************
*** 11,16 ****
--- 11,17 ----
  
  #include "Object.h"
  #include "htString.h"
+ #include "StringList.h"
  #include "Stack.h"
  #include "Queue.h"
  #include "StringMatch.h"
***************
*** 25,31 ****
  	//
  	// Construction/Destruction
  	//
! 	Server(char *host, int port, String *local_robots_file = NULL);
  	~Server();
  
  	//
--- 26,32 ----
  	//
  	// Construction/Destruction
  	//
! 	Server(char *host, int port, StringList *local_robots_files = NULL);
  	~Server();
  
  	//
-----------------------------------------------------------------------------------




-- 
Paul B. Henson  |  (909) 869-3781  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson@intranet.csupomona.edu
California State Polytechnic University  |  Pomona CA 91768

--- end forwarded message from Paul B. Henson ---

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


