Regex Beginner Information Reading Material

This is published in the Compliance Sheriff Help Manual as a confirmed good resource on regex.

Please follow the left-hand navigation for additional sections of information.

Here is another great regex checker; Matt Coulter has provided an excellent example. Note that any line that becomes highlighted because it contains one of the terms is a line that would be excluded from the search.

Regex Examples

Include or exclude subpages that contain the following terms

Contact, Form, or Login

If the URL structure looks like this:

  • https://your.website.com/contact/
  • https://your.website.com/form/
  • https://your.website.com/login/
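use this pattern:

/(contact|form|login)/

The / at the beginning and end are literal slashes, so the pattern only matches when one of the terms is a complete path segment between two slashes. Inside the parentheses, the terms are separated by |, which means "or", so each term is tested individually and any one of them can match.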
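The behavior described above can be checked with a short script. This is a minimal sketch that uses Python's `re` module purely for illustration; the scanning tool's regex engine may differ in details, and the `about` and `contact-us` URLs and the `matches_term` helper are hypothetical additions.

```python
import re

# Pattern from the text: literal slashes around an alternation of terms.
pattern = re.compile(r"/(contact|form|login)/")

# The first three URLs come from the text; the last two are hypothetical.
urls = [
    "https://your.website.com/contact/",
    "https://your.website.com/form/",
    "https://your.website.com/login/",
    "https://your.website.com/about/",
    "https://your.website.com/contact-us/",
]


def matches_term(url: str) -> bool:
    """Return True if one of the terms appears as a full path segment."""
    return pattern.search(url) is not None


matched = [u for u in urls if matches_term(u)]
for u in matched:
    print(u)
```

Only the first three URLs are printed. The `contact-us` URL is skipped because the pattern requires a slash immediately after the term.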

If the terms are actual page files rather than directories:

  • https://your.website.com/contact.htm
  • https://your.website.com/form.html
  • https://your.website.com/login.aspx
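use this pattern:

/(contact|form|login)\.

The escape character \ will ensure that the period is taken as a literal period and not as the regex wildcard that matches any character. Do not follow the escaped period with a *: \.* means "zero or more periods", which would also match URLs that have no file extension at all.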
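To see why the escape matters, the sketch below compares an escaped and an unescaped period. It uses Python's `re` module for illustration only, and the `contacts` URL is a hypothetical example that is not in the original list.

```python
import re

# \. matches only a literal period.
escaped = re.compile(r"/(contact|form|login)\.")
# An unescaped . matches any single character.
unescaped = re.compile(r"/(contact|form|login).")

page_url = "https://your.website.com/contact.htm"
other_url = "https://your.website.com/contacts/"  # hypothetical

# Both patterns match the real page file.
escaped_page = bool(escaped.search(page_url))
unescaped_page = bool(unescaped.search(page_url))

# Only the unescaped pattern matches "contacts": the "." consumed the "s".
escaped_other = bool(escaped.search(other_url))
unescaped_other = bool(unescaped.search(other_url))

print(escaped_page, unescaped_page, escaped_other, unescaped_other)
```

The unescaped version silently widens the match, which can include or exclude pages you did not intend.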

Scanning Multiple Domains

When scanning multiple domains, you may want to anchor the pattern to the start of the URL with the ^ character:

^http://test\.syr\.edu/(contact|form|login)/

The ^ (caret) anchors the match to the beginning of the string, so the pattern only matches URLs that start with this domain. The periods in the domain name are escaped so that they match literal periods. Please note that if you are scanning multiple domains with a link checker, this may need to be adjusted, as most sites use relative links, which do not begin with the domain.
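The effect of the anchor, and the caveat about relative links, can be sketched as follows. This uses Python's `re` module for illustration only; `other.syr.edu` is a hypothetical second domain, and the unanchored pattern is included only for comparison.

```python
import re

# Anchored: the URL must start with this scheme and domain.
anchored = re.compile(r"^http://test\.syr\.edu/(contact|form|login)/")
# Unanchored: the term may appear under any domain or in a relative link.
unanchored = re.compile(r"/(contact|form|login)/")

links = [
    "http://test.syr.edu/contact/",   # absolute link on the target domain
    "http://other.syr.edu/contact/",  # hypothetical second domain
    "/contact/",                      # relative link, as found in most page markup
]

anchored_hits = [link for link in links if anchored.search(link)]
unanchored_hits = [link for link in links if unanchored.search(link)]

print(anchored_hits)
print(unanchored_hits)
```

The anchored pattern matches only the first link, while the unanchored pattern matches all three. If the tool evaluates relative links as written rather than resolving them to absolute URLs first, an anchored pattern will miss them.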

Exclude certain file types

If you wish to exclude certain file types, you can do the following:

\.(pdf|doc|docx|xls|xlsx|ppt|pptx)$

The $ character will designate the end of a string. You may not want to include the $ if your CMS or site uses file versioning and appends additional information after the file type.
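The trade-off around $ can be illustrated with a small sketch. It uses Python's `re` module for illustration only, and the `?v=2` suffix is a hypothetical versioning scheme; your CMS may append something different.

```python
import re

# xlsx is the standard Excel extension.
with_end = re.compile(r"\.(pdf|doc|docx|xls|xlsx|ppt|pptx)$")
without_end = re.compile(r"\.(pdf|doc|docx|xls|xlsx|ppt|pptx)")

plain = "https://your.website.com/files/report.pdf"
versioned = "https://your.website.com/files/report.pdf?v=2"  # hypothetical

# With $, the extension must be the very last thing in the URL.
plain_with_end = bool(with_end.search(plain))
versioned_with_end = bool(with_end.search(versioned))

# Without $, the extension may be followed by anything.
plain_without_end = bool(without_end.search(plain))
versioned_without_end = bool(without_end.search(versioned))

print(plain_with_end, versioned_with_end)
print(plain_without_end, versioned_without_end)
```

With $, the versioned URL is not matched. Dropping the $ catches it, at the cost of also matching any URL that merely contains one of these extensions somewhere in the middle.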

 

 

 
