Skåne Sjælland Linux User Group - http://www.sslug.dk
 

Re: [LOCALE] Web scanning



Bo:

> Does anyone happen to have a script lying around that
> (perhaps with the help of lynx?) strips all the words
> out of an entire site?

I thought I had one, but I can't find it. Instead, here is
a rough sketch.

 1) Remember to check /robots.txt for the parts of the
    website that are off limits.
 2) Keep track of the URLs with two files:
     * one with all the pages already read
     * one with all observed URLs (both read and unread
       pages)
 3) `egrep '[ 0-9][ 0-9][ 0-9][ 0-9][.] http:'` picks out
    the URLs in the numbered link list at the end of the
    `lynx -dump ${URL}` output.
 4) `lynx -dump -nolist ${URL}` prints the page text
    without the URL list and without link numbers in the
    text.
 5) `sleep 2` pauses for two seconds between fetches.
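
Steps 2 to 5 could be tied together roughly as follows. This is
only a sketch under some assumptions: the file names (seen.txt,
read.txt), the function names, and the restriction to URLs
beginning with the start URL are illustrative and not part of
the outline above. The pattern from step 3 is anchored to the
start of the line here, and like the original it only sees
http: links. The /robots.txt check from step 1 is not automated
and still has to be done by hand.

```shell
#!/bin/sh
# Rough crawl loop for steps 2-5. File and function names are
# illustrative assumptions, not part of the original sketch.

SEEN=${SEEN:-seen.txt}   # every URL observed so far (read and unread)
READ=${READ:-read.txt}   # every URL already fetched

# Step 3: keep only the numbered "   1. http://..." lines from the
# link list of `lynx -dump`, then strip the number and any #fragment.
# `grep -E` is the current spelling of `egrep`.
extract_urls() {
    grep -E '^[ 0-9][ 0-9][ 0-9][ 0-9][.] http:' |
        sed -e 's/^[ 0-9]*[.] //' -e 's/#.*$//'
}

# Step 2: merge URLs from stdin into the "observed" file, no duplicates.
record_seen() {
    sort -u - "$SEEN" > "$SEEN.tmp" && mv "$SEEN.tmp" "$SEEN"
}

# Step 2: print the first observed URL that has not been read yet
# (prints nothing when every observed URL has been read).
next_unread() {
    grep -vxFf "$READ" "$SEEN" | sed -n '1p'
}

# Crawl everything below the URL prefix $1, writing page text to stdout.
crawl() {
    site=$1
    touch "$SEEN" "$READ"
    echo "$site" | record_seen
    while url=$(next_unread) && [ -n "$url" ]; do
        lynx -dump -nolist "$url"               # step 4: text only
        lynx -dump "$url" | extract_urls |
            awk -v p="$site" 'index($0, p) == 1' | record_seen
        echo "$url" >> "$READ"
        sleep 2                                 # step 5: pause between pages
    done
}
```

With the functions loaded, something like
`crawl http://www.example.org/ | tr -s '[:space:]' '\n' | sort -u`
(example.org is a placeholder) should give a rough list of
whitespace-separated tokens. Note that each page is fetched
twice, once for the text and once for the link list; saving a
single dump and splitting it would halve the traffic.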

Jacob
-- 
Growing older is compulsory. Growing up isn't.



 

 
 
Questions about the web-pages to <www_admin>. Last modified 2005-08-10, 20:52 CEST
This page is maintained by MHonArc