NAME
    scrape - display a snippet of text parsed from a web page.

SYNOPSIS
    scrape -tokens TOKENS -dirs DIRECTIONS -url URL [options]

DESCRIPTION
    Mandatory arguments:

    -tokens <token_list_string>
        A space-delimited list of tokens that the user has determined
        to be sufficient to consistently point the parser to the exact
        location of the required data. For example,
            -tokens <title> </title>
        would return all characters between these two HTML tags.

    -dirs <direction_list_string>
        A list of directions, one for each token. 0 instructs the
        parser to search in the forward direction, while 1 instructs
        it to search in the reverse direction. In the above example,
        we would have
            -dirs 0 0
        since we look for <title> from the beginning of the file in
        the forward direction, and then continue in the forward
        direction to find </title>. However, we could also specify
            -tokens </title> <title -dirs 0 1
        which would first find </title> and then reverse search for
        <title. This is useful if an HTML tag has attributes and you
        therefore cannot assume that the ">" in <title> will be
        present.

    -url <url>
        The full URL to the desired web page, using the same format
        regardless of which HTTP variable-sending method is needed.
            Example: -url http://www.site.com?var1=${a}&var2=${b}
            Example: -url http://www.site2.com?var=%s
            Example: -url http://www.site3.com/%s.html
        The website name and any HTTP variables must be separated by
        a "?" and subsequent HTTP variables by a "&".

    Optional arguments:

    -method <http_method = get>
        If not included, the variables will be sent using the HTTP
        GET method. If set to anything else (including, of course,
        post), the variables will be sent using the HTTP POST method.

    -textonly <display_method = 1>
        If not included, the default value of 1 is used and the
        scraped text is returned as a simple string of text, easily
        fed into other YubNub functions. If set to anything else,
        scrape will return the scraped text to a "mock" YubNub
        command line (see the defw, defn, and postalcode commands).

    -debug <debugger = 0>
        If not included, no debugging information will be returned.
        If set to 1, some debugging information will be returned.
        This may help you see why a certain scrape is not working.

    Forthcoming arguments:
        - a character offset argument
        - a length argument
        - a word offset argument
        - a word delimiter argument
        - a numwords argument

EXAMPLE
    1. The qpostal command sends the user to the Canada Post site,
       which displays the requested postal code:
           qpostal -n 1708 -s charles -t court -c val caron -p ON
       To scrape this postal code from the web page, you would have
       to examine the HTML of the above Canada Post site and identify
       tokens that will consistently guide the parser to the postal
       code string. One could define a command called qpostal_lite in
       this way:
           scrape -tokens >Postal Code< <tr> </tr> tblcell <br> > -dirs 0 0 0 1 0 1 -method post -url http://www.canadapost.ca/...
       (see the yndesturl variable of the qpostal command for the
       full URL)
       Typing:
           qpostal_lite -n 1708 -s charles -t court -c val caron -p ON
       will now return a simple string to your browser, which can be
       piped into other commands that expect a postal code as an
       argument:
           cbc_pc {qpostal_lite ...}

    2. The wikt command sends the user to the Wiktionary site, which
       displays the requested definition:
           wikt eon
       To scrape this definition from the page, you would again
       examine the HTML of the above Wiktionary site and identify
       tokens that will consistently guide the parser to the
       definition string. This is often difficult. The following
       command gets the first definition, which is not always the
       one expected. Define wikt_lite as:
           scrape -tokens </ol> <ol> -dirs 0 1 -textonly 0 -url http://en.wiktionary.org/wiki/%s
       Typing
           wikt_lite eon
       will now return the definition to a "mock" YubNub command line
       prompt. If the textonly switch is dropped, the definition is
       returned to the browser, and can also be piped into other
       YubNub commands expecting a string as an argument:
           fspell {wikt_lite eon}

NOTES
    This command is very beta at the moment. Please bear with me.
    Comments and suggestions are welcome at the YubNub Google group
    (type yubgroup at the YubNub command line).

AUTHOR
    Sean O'Hagan
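
The token and direction walk described for -tokens and -dirs can be hard to picture from prose alone, so here is a minimal Python sketch of one way it might work. This is not scrape's actual implementation. The function name scrape_text is hypothetical, and the sketch rests on three assumptions that the text above does not spell out: the cursor sits at the start of the most recent match, a reverse search only sees text before the cursor, and the returned snippet is the text lying between the last two matched tokens.

```python
from typing import Optional, Sequence


def scrape_text(html: str, tokens: Sequence[str], dirs: Sequence[int]) -> Optional[str]:
    """Follow tokens through html and return the text between the last two matches.

    Hypothetical sketch only: cursor placement and the "between the last two
    matches" rule are assumptions, not documented behaviour of scrape.
    """
    if len(tokens) != len(dirs) or len(tokens) < 2:
        raise ValueError("need at least two tokens and one direction per token")

    # Assumption: a forward first search starts at the beginning of the page,
    # a reverse first search starts at the end.
    cursor = 0 if dirs[0] == 0 else len(html)
    matches = []
    for token, direction in zip(tokens, dirs):
        if direction == 0:
            # Forward: look from the cursor towards the end of the page.
            start = html.find(token, cursor)
        else:
            # Reverse: look only at the text that ends at the cursor.
            start = html.rfind(token, 0, cursor)
        if start == -1:
            return None  # a token was not found, so there is nothing to return
        matches.append((start, start + len(token)))
        cursor = start  # assumption: cursor rests at the start of the match

    # Order the last two matches by position and take what lies between them.
    (_, first_end), (second_start, _) = sorted(matches[-2:])
    return html[first_end:second_start]


if __name__ == "__main__":
    page = "<html><head><title>Hello</title></head></html>"
    print(scrape_text(page, ["<title>", "</title>"], [0, 0]))  # prints "Hello"
```

Under these assumptions, a tag with attributes can be handled by splitting the opening tag into two forward tokens, for example the tokens <title, > and </title> with directions 0 0 0. Resting the cursor at the start of a match is also what lets a final reverse search for ">" (as in the qpostal_lite example) skip the ">" that closes the preceding <br> token.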