org.matalon.pagerankhits.util
Class UrlUtil

java.lang.Object
  extended by org.matalon.pagerankhits.util.UrlUtil

public class UrlUtil
extends java.lang.Object

This class provides basic URL-related utilities.

Author:
Yonatan Matalon

Field Summary
private static java.lang.String DEFAULT_PROTOCOL
           
private static java.lang.String FULL_DEFAULT_PROTOCOL
           
private static java.lang.String PROTOCOL_IDENTIFIER
           
private static java.util.Collection tlds
           
private static java.util.Collection validWebPageExtensions
           
 
Constructor Summary
UrlUtil()
           
 
Method Summary
private static void addLink(java.util.List links, java.lang.String url, java.lang.String parentUrl)
          Adds the given URL to the given list of links; a relative URL is first resolved against the given parent URL.
private static java.net.URL createNewURL(java.lang.String address)
          Creates a new URL object using the given address.
static java.lang.Object[] extractTitleAndLinks(java.lang.String webPageAddress)
          Extracts the title and the links of the web page at the given address.
static java.lang.String getPageTitle(java.net.URL url)
           
private static java.lang.String getParentDirectory(java.lang.String fileOrPath)
           
static boolean isValidWebPageAddress(java.lang.String address, boolean validateSyntax, boolean validateExistance)
          Checks if the given address is a valid web page address.
static boolean isValidWebPageAddress(java.lang.String address, java.lang.String parentAddress, boolean validateSyntax, boolean validateExistance)
          Checks if the given address is a valid web page address.
static java.lang.String normalizeUrl(java.lang.String url)
          Converts the given URL into a valid absolute address.
static java.lang.String normalizeUrl(java.lang.String url, java.lang.String parentUrl)
          Converts the given URL into a valid absolute address.
static boolean validateUrl(java.lang.String url, boolean validateSyntax, boolean validateExistance)
          Checks whether the given URL is syntactically valid and/or exists.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_PROTOCOL

private static final java.lang.String DEFAULT_PROTOCOL
See Also:
Constant Field Values

PROTOCOL_IDENTIFIER

private static final java.lang.String PROTOCOL_IDENTIFIER
See Also:
Constant Field Values

FULL_DEFAULT_PROTOCOL

private static final java.lang.String FULL_DEFAULT_PROTOCOL
See Also:
Constant Field Values

validWebPageExtensions

private static final java.util.Collection validWebPageExtensions

tlds

private static final java.util.Collection tlds
Constructor Detail

UrlUtil

public UrlUtil()
Method Detail

validateUrl

public static final boolean validateUrl(java.lang.String url,
                                        boolean validateSyntax,
                                        boolean validateExistance)
Checks whether the given URL is syntactically valid and/or exists.

Parameters:
url -
validateSyntax -
validateExistance -
Returns:
Returns true if the given URL passes the requested checks (syntax and/or existence), false otherwise.
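
The two flags can be read as independent switches. The sketch below is one hedged interpretation, not the UrlUtil source: the class ValidateUrlSketch and its helpers are hypothetical, restricting the syntax check to http/https is an assumption, and so is testing existence with an HTTP HEAD request.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch; the real UrlUtil.validateUrl may behave differently.
class ValidateUrlSketch {

    // Syntax check: the string must parse as an http(s) URI with a host.
    static boolean isSyntacticallyValid(String url) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Existence check (assumption): a HEAD request answered with a 2xx or 3xx status.
    static boolean exists(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URI(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status >= 200 && status < 400;
        } catch (Exception e) {
            // Malformed address, non-HTTP scheme, or network failure.
            return false;
        }
    }

    // Each flag enables one check; the result is true only if every enabled check passes.
    static boolean validate(String url, boolean validateSyntax, boolean validateExistence) {
        if (validateSyntax && !isSyntacticallyValid(url)) {
            return false;
        }
        if (validateExistence && !exists(url)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Syntax-only checks, so no network access is needed here.
        System.out.println(validate("http://example.com/index.html", true, false));
        System.out.println(validate("not a url", true, false));
    }
}
```

With both flags false the sketch accepts any input, which is one reason callers would normally enable at least the syntax check.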

isValidWebPageAddress

public static final boolean isValidWebPageAddress(java.lang.String address,
                                                  boolean validateSyntax,
                                                  boolean validateExistance)
Checks if the given address is a valid web page address, i.e. a valid URL that points to a web page rather than to another kind of resource such as an image or a PDF.

Parameters:
address -
validateSyntax -
validateExistance -
Returns:
Returns true if the given address is a valid web page address, false otherwise.

isValidWebPageAddress

public static final boolean isValidWebPageAddress(java.lang.String address,
                                                  java.lang.String parentAddress,
                                                  boolean validateSyntax,
                                                  boolean validateExistance)
Checks if the given address is a valid web page address, i.e. a valid URL that points to a web page rather than to another kind of resource such as an image or a PDF.

Parameters:
address -
parentAddress -
validateSyntax -
validateExistance -
Returns:
Returns true if the given address is a valid web page address, false otherwise.
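
The "web page rather than image or PDF" part of the check could plausibly be based on the file extension, given the validWebPageExtensions field. The sketch below assumes exactly that. The class WebPageAddressSketch, the extension list, and the rule that extension-less paths count as pages are all assumptions, since the contents of validWebPageExtensions are not documented here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an extension-based web page test; not the UrlUtil source.
class WebPageAddressSketch {

    // Assumed list; the real validWebPageExtensions collection may differ.
    private static final Set<String> PAGE_EXTENSIONS =
            new HashSet<>(Arrays.asList("html", "htm", "php", "asp", "jsp"));

    static boolean looksLikeWebPage(String address) {
        if (address == null) {
            return false;
        }
        String path;
        try {
            path = new URI(address).getPath();
        } catch (URISyntaxException e) {
            return false;
        }
        if (path == null) {
            return false; // opaque URI such as mailto:, not a page
        }
        if (path.isEmpty() || path.endsWith("/")) {
            return true; // no file name: assume a directory index page
        }
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return true; // no extension: assume a page
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return PAGE_EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeWebPage("http://example.com/a/index.html"));
        System.out.println(looksLikeWebPage("http://example.com/logo.png"));
    }
}
```

A check like this only covers the resource type; the syntax and existence flags would still be handled separately, for instance by validateUrl.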

normalizeUrl

public static final java.lang.String normalizeUrl(java.lang.String url)
Converts the given URL into a valid absolute address.

Parameters:
url -
Returns:
Returns an absolute address if the given url is a valid relative/absolute URL, null otherwise.

normalizeUrl

public static final java.lang.String normalizeUrl(java.lang.String url,
                                                  java.lang.String parentUrl)
Converts the given URL into a valid absolute address. If the given URL is relative, the method uses the given parent URL to deduce its absolute form.

Parameters:
url -
parentUrl -
Returns:
Returns an absolute address if the given url is a valid relative/absolute URL, null otherwise.
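
Resolving a relative URL against its parent can be illustrated with the standard java.net.URI class. This is a minimal sketch under stated assumptions, not the UrlUtil implementation: the class NormalizeUrlSketch and the method normalize are hypothetical, and handling of the default protocol (the DEFAULT_PROTOCOL fields) is left out because the constant values are not shown in this page.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of relative-to-absolute resolution; not the UrlUtil source.
class NormalizeUrlSketch {

    // Returns an absolute address, or null if the input cannot be made absolute.
    static String normalize(String url, String parentUrl) {
        if (url == null) {
            return null;
        }
        try {
            URI uri = new URI(url.trim());
            if (!uri.isAbsolute()) {
                if (parentUrl == null) {
                    return null; // relative URL with nothing to resolve against
                }
                uri = new URI(parentUrl.trim()).resolve(uri);
            }
            uri = uri.normalize(); // removes "." and ".." path segments
            return (uri.isAbsolute() && uri.getHost() != null) ? uri.toString() : null;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String parent = "http://example.com/docs/guide/index.html";
        System.out.println(normalize("../img/a.html", parent));
        System.out.println(normalize("http://example.com/a/./b.html", null));
    }
}
```

Returning null for unresolvable input mirrors the documented contract of normalizeUrl, so callers need a null check before using the result.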

extractTitleAndLinks

public static final java.lang.Object[] extractTitleAndLinks(java.lang.String webPageAddress)
Extracts the title and the links of the web page at the given address.

Parameters:
webPageAddress -
Returns:
Returns a two-element Object[]: the extracted title (java.lang.String) at index 0 and the extracted links (java.util.List) at index 1.
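
Because the result is an untyped Object[], callers have to cast both elements. The sketch below shows that unpacking together with a simplified, regex-based extraction that works on an HTML string instead of fetching a page. The class TitleAndLinksSketch, the method extract, and the regular expressions are hypothetical; the real method may well use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; operates on an HTML string instead of a web page address.
class TitleAndLinksSketch {

    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches quoted href values of anchor tags only (a deliberate simplification).
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns {title, links} in the same shape as the documented Object[2].
    static Object[] extract(String html) {
        Matcher titleMatcher = TITLE.matcher(html);
        String title = titleMatcher.find() ? titleMatcher.group(1).trim() : null;

        List<String> links = new ArrayList<>();
        Matcher hrefMatcher = HREF.matcher(html);
        while (hrefMatcher.find()) {
            links.add(hrefMatcher.group(1));
        }
        return new Object[] { title, links };
    }

    public static void main(String[] args) {
        String html = "<html><head><title> Demo </title></head>"
                + "<body><a href=\"a.html\">A</a></body></html>";
        Object[] result = extract(html);

        // Unpacking: index 0 is the title, index 1 is the list of links.
        String title = (String) result[0];
        List<?> links = (List<?>) result[1];
        System.out.println(title + " " + links);
    }
}
```

The links come back exactly as written in the page, so relative ones would still need to be resolved against the page address.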

addLink

private static void addLink(java.util.List links,
                            java.lang.String url,
                            java.lang.String parentUrl)
Adds the given URL to the given list of links; a relative URL is first resolved against the given parent URL.

Parameters:
links -
url -
parentUrl -

getPageTitle

public static final java.lang.String getPageTitle(java.net.URL url)
Parameters:
url -
Returns:
Returns the title of the page.

createNewURL

private static final java.net.URL createNewURL(java.lang.String address)
Creates a new URL object using the given address.

Parameters:
address -
Returns:
Returns the newly created URL.

getParentDirectory

private static final java.lang.String getParentDirectory(java.lang.String fileOrPath)
Parameters:
fileOrPath -
Returns:
Returns the parent directory of the given file or path; the parent directory of the root is taken to be the root itself.
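
The "root's parent is root" rule can be sketched in a few lines. This is an illustration only: the class ParentDirectorySketch and the method parentOf are hypothetical, and returning the parent with a trailing slash, as well as mapping a path without any slash to the root, are assumptions rather than documented behavior.

```java
// Hypothetical sketch of the parent-directory rule; not the UrlUtil source.
class ParentDirectorySketch {

    static String parentOf(String fileOrPath) {
        String path = fileOrPath;
        // Drop trailing slashes so "/a/b/" is treated like "/a/b", but keep a lone "/".
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        int lastSlash = path.lastIndexOf('/');
        if (lastSlash <= 0) {
            return "/"; // root, a top-level entry, or no directory part at all
        }
        return path.substring(0, lastSlash + 1); // keep the trailing slash
    }

    public static void main(String[] args) {
        System.out.println(parentOf("/a/b/c.html"));
        System.out.println(parentOf("/"));
    }
}
```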