scrape

(This command has been awarded a Yubnub Golden Egg)

http://efridge.net/yubnub/parse.php?ynp_tokens=${tokens}&ynp_dirs=${dirs}&yndesturl=${url}&ynp_debug=${debug=0}&ynp_http=${method=get}&ynp_scrape=${textonly=1}
NAME
     scrape - display a snippet of text parsed from a web page.

SYNOPSIS
     scrape -tokens TOKENS -dirs DIRECTIONS -url URL [options]

DESCRIPTION
     Mandatory arguments:
     -tokens <token_list_string>
        A list of tokens (delimited by spaces) which the user has
        determined to be sufficient to consistently point the
        parser to the exact location of the required data. For
        example
          -tokens <title> </title>
        would return all characters between these two HTML tags.

     -dirs <direction_list_string>
        A list of directions associated with each token. 0
        instructs the parser to look in the forward direction,
        while 1 instructs the parser to look in the reverse
        direction. In the above example, we would have
          -dirs 0 0
        since we will look for <title> from the beginning of
        the file in the forward direction, and then continue
        in the forward direction to find </title>. However,
        we could also specify
          -tokens </title> <title -dirs 0 1
        which would first find </title> and then reverse search
        for <title. This is useful if an HTML tag may carry
        attributes, so that you cannot assume the tag name is
        immediately followed by ">".
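
        The token and direction lists can be read as a walk over
        the page with a cursor. The Python sketch below is one
        plausible interpretation, not the actual parser: the name
        scrape_between, the cursor rules, and the choice to return
        the text lying between the last two matched tokens are all
        assumptions made for illustration.

```python
def scrape_between(html, tokens, dirs):
    """Hypothetical sketch: walk the tokens, return text between the last two."""
    cursor = 0
    spans = []  # (start, end) offsets of each matched token
    for token, direction in zip(tokens, dirs):
        if direction == 0:
            # Forward search, starting at the cursor.
            start = html.find(token, cursor)
        else:
            # Reverse search, restricted to the text before the cursor.
            start = html.rfind(token, 0, cursor)
        if start == -1:
            return None  # token not found, nothing to scrape
        end = start + len(token)
        spans.append((start, end))
        # Assumption: continue after a forward match, before a reverse match.
        cursor = end if direction == 0 else start
    if len(spans) < 2:
        return None
    (s1, e1), (s2, e2) = spans[-2], spans[-1]
    # Text strictly between the two matches, whichever comes first in the page.
    return html[min(e1, e2):max(s1, s2)]


page = "<html><title>Hello</title></html>"
print(scrape_between(page, ["<title>", "</title>"], [0, 0]))
```

        How the real parser treats any attribute text left over in
        the reverse-search case is not stated here, so the sketch
        makes no attempt to strip it.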
        
     -url <url>
        The full URL of the desired web page, written in the same
        format whether its variables are sent by GET or by POST.
        Example: -url http://www.site.com?var1=${a}&var2=${b}
        Example: -url http://www.site2.com?var=%s
        Example: -url http://www.site3.com/%s.html
        The site address must be separated from the HTTP variables
        by a "?", and each subsequent variable by a "&".
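
        To illustrate the "?" and "&" separators, the short Python
        sketch below splits a URL of this shape with the standard
        urllib.parse module. The values a and b stand in for
        whatever the ${a} and ${b} placeholders would expand to;
        they are assumptions, as is the idea that scrape splits the
        URL this way internally.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL after placeholder substitution.
url = "http://www.site.com?var1=a&var2=b"

parts = urlsplit(url)
site = parts.scheme + "://" + parts.netloc + parts.path  # everything before "?"
variables = parse_qsl(parts.query)                       # pairs separated by "&"

print(site)
print(variables)
```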

     Optional arguments:
     -method <http_method = get>
        If not included, the variables will be sent using the HTTP
        GET method. If set to any other value (including, of course,
        post), the variables will be sent using the HTTP POST method.
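
        The difference between the two methods can be sketched in
        Python with the standard urllib.request module. The helper
        name build_request is hypothetical, and moving the query
        string into the request body for POST is an assumption
        about how the variables are sent, not a description of the
        actual implementation. The request is only built, not sent.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def build_request(url, method="get"):
    """Hypothetical helper: GET keeps variables in the URL, POST moves them to the body."""
    if method == "get":
        return Request(url)
    parts = urlsplit(url)
    # Strip the query string from the address and send it as form data instead.
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return Request(base, data=parts.query.encode("ascii"))


req = build_request("http://www.site.com?var1=a&var2=b", method="post")
print(req.get_method(), req.full_url, req.data)
```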

     -textonly <display_method = 1>
        If not included (the default value of 1), the scraped text
        is returned as a simple string of text, easily fed into
        other YubNub commands. If set to anything else, scrape
        returns the scraped text to a "mock" YubNub command line
        (see the defw, defn, and postalcode commands).

     -debug <debugger = 0>
        If not included, no debugging information will be returned.
        If set to 1, some debugging information will be returned.
        This may help you see why a certain scrape is not working.
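
        When a scrape fails, the usual cause is a token that is
        never found. The format of the real debugging output is
        not described here; the Python sketch below is only a
        hypothetical stand-in (the name trace_tokens and the cursor
        rules are assumptions) that records where each token
        matched and stops at the first miss.

```python
def trace_tokens(html, tokens, dirs):
    """Hypothetical trace: list of (token, direction, offset), offset -1 on a miss."""
    cursor = 0
    trace = []
    for token, direction in zip(tokens, dirs):
        if direction == 0:
            pos = html.find(token, cursor)      # forward from the cursor
        else:
            pos = html.rfind(token, 0, cursor)  # backward from the cursor
        trace.append((token, direction, pos))
        if pos == -1:
            break  # later tokens cannot be located once one is missing
        cursor = pos + len(token) if direction == 0 else pos
    return trace


for step in trace_tokens("<title>Hi</title>", ["<title>", "</h1>", "x"], [0, 0, 0]):
    print(step)
```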

     Forthcoming arguments:
     -a character offset argument
     -a length argument
     -a word offset argument
     -a word delimiter argument
     -a numwords argument

EXAMPLE
     1. The qpostal command sends the user to the Canada Post site
     which displays the requested postal code:

        qpostal -n 1708 -s charles -t court -c val caron -p ON

     To scrape this postal code from this web page, you would
     have to examine the HTML of the above Canada Post site, and
     identify various tokens that will consistently guide the
     parser to the postal code string. One could define a command
     called qpostal_lite in this way:

        scrape -tokens >Postal Code< <tr> </tr> tblcell <br> >
               -dirs 0 0 0 1 0 1 -method post
               -url http://www.canadapost.ca/...
                  (see the yndesturl variable of the
                   qpostal command for the full URL)

     Typing:

        qpostal_lite -n 1708 -s charles -t court -c val caron -p ON

     will now return a simple string to your browser, which can
     be piped into other commands that expect a postal code
     as an argument: cbc_pc {qpostal_lite ...}

     2. The wikt command sends the user to the Wiktionary site
     which displays the requested definition:

        wikt eon

     To scrape this definition from this page, you would again
     examine the HTML of the above Wiktionary site, and
     identify various tokens that will consistently guide the
     parser to the definition string. This is often difficult.
     The following command returns only the first definition,
     which is not always the one you want. Define wikt_lite as:

        scrape -tokens </ol> <ol> -dirs 0 1 -textonly 0
               -url http://en.wiktionary.org/wiki/%s

     Typing wikt_lite eon will now return the definition to a "mock"
     YubNub command line prompt. If the textonly switch is dropped,
     the definition is returned to the browser, and can also be
     piped into other YubNub commands expecting a string as an
     argument: fspell {wikt_lite eon}

NOTES
     This command is very beta at the moment. Please bear with
     me. Comments and suggestions are welcome at the YubNub Google
     group (type yubgroup at the YubNub command line.)

AUTHOR
    Sean O'Hagan
    
145872 uses - Created 2005-07-22 03:42:49 - Last used 2024-10-10 06:05:36