How does Google work?
- Googlebot is a web crawler that finds and fetches web pages. Googlebot finds pages in two ways: through the add-URL form, www.google.com/addurl.html, and by following links as it crawls the web.
- The indexer – The web pages fetched by Googlebot are stored in Google’s index database. This index is sorted alphabetically by search term, with each index entry storing a list of the documents in which the term appears and the locations within the text where it occurs. This data structure allows quick access to documents that contain the user’s query terms (see the sketch after this list).
- The query processor – It compares the search query to the index and evaluates and ranks the documents that it considers most relevant.
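To make the indexer’s data structure concrete, here is a minimal sketch of an inverted index in Python. This is an illustration only, not Google’s actual implementation; the pages and document ids are made up.

from collections import defaultdict

def build_index(docs):
    # Map each term to a list of (document id, position) pairs.
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].append((doc_id, position))
    return index

# Hypothetical pages, keyed by made-up document ids
docs = {
    "page1": "all that glitters is not gold",
    "page2": "gold prices rise",
}
index = build_index(docs)
print(index["gold"])  # [('page1', 5), ('page2', 0)]

A query processor can then answer a query like “gold” with a single lookup in the index, instead of scanning the text of every page.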
Google Hacking tricks
Been looking for the ultimate tips and tricks for so long? The wait is over. With the following tricks, you will start googling like a professional. Here are a few Google hacking tricks:
- To look for a specific phrase, put it in quotes: “All that glitters is not gold”
- Synonym search looks for words that mean similar things. Use the tilde symbol before your keyword, like this: ~ethic
- Exclude specific keywords with the minus operator: Honda civic -eBay excludes all results that mention eBay.
- You can also ask Google to fill in the blank with the * wildcard. Try: Christopher Columbus discovered *
- Search for a particular numerical range with the numrange operator. For example, search for a Sony TV priced between 20000 and 30000 with the string Sony TV 20000..30000.
- Exclude entire file types using the same minus syntax: Honda civic -filetype:doc
- Google has many powerful, hidden search parameters. For example, “intitle” only searches page titles. Try intitle:herbs
- The inurl modifier only searches the web address of a page; give inurl:spices a go.
- Find live webcams by searching for inurl:view/view.shtml
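These tricks can be combined in a single query. For instance, the following illustrative query searches page titles for the phrase while excluding eBay mentions and Word documents:

intitle:”honda civic” -ebay -filetype:doc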
Google basic hacks
Google basic hacks are nothing but ways of extracting vital information about anything indexed by Google. People generally search for so many things on Google, but searching for the right thing in the right place is the key to success. Here are a few operators, used by both Black Hat and White Hat hackers, to collect the desired information about their target/vendor using the following syntax.
INTITLE:
intitle: restricts your search to the titles of web pages. The variation, allintitle:, finds pages whose titles contain all of the specified words. It’s good to avoid the allintitle: variation, because it does not mix well with some of the other syntaxes.
intitle:”Barack Obama”
allintitle:”Hacking Computer” Keylogger
INURL:
inurl: restricts your search results to the URLs of web pages. This syntax tends to work well for finding search and help pages, because they tend to be rather regular in composition. An allinurl: variation finds all of the listed words in a URL, but doesn’t mix well with some other special syntaxes.
inurl:help
allinurl:search help
INTEXT:
intext: searches only body text (i.e., it ignores link text, URLs, and titles). There is an allintext: variation, but once again, this doesn’t play well with others. While its uses are limited, it’s perfect for finding query words that might be too common in URLs or titles.
intext:”yahoo.com” intext:html
SITE:
site: allows you to narrow your search to either a site or a top-level domain. AltaVista has two syntaxes for this function (host: and domain:), but Google has only the one.
site:loc.gov
site:thomas.loc.gov
site:edu
LINK:
link: returns a list of pages linking to the specified URL. Enter link:www.google.com and you’ll be returned a set of pages that link to Google. Don’t worry about including the http:// bit; you don’t need it, and, indeed, Google appears to ignore it even if you do put it in. link: works just as well with “deep” URLs (http://www.raelity.org/apps/blosxom/, for instance) as with top-level URLs such as raelity.org.
CACHE:
cache: gives a copy of the page as Google indexed it, even if that page is no longer available at its original URL or has since changed its content completely. This is particularly useful for pages that change often. If Google returns a result that appears to have little to do with your query, you’re almost sure to find what you are looking for in the latest cached version of the page at Google.
cache:www.yahoo.com
FILETYPE:
filetype: searches the suffixes or filename extensions. These are usually, but not necessarily, different file types. It is worth making this distinction, because searching for filetype:htm and filetype:html will give you different result counts, even though they’re the same file type. You can even search for different page generators, such as ASP, PHP, CGI, and so forth, presuming the site isn’t hiding them behind redirection and proxying. Google indexes several different Microsoft formats, including PowerPoint (PPT), Excel (XLS), and Word (DOC).
homeschooling filetype:pdf
“leading economic indicators” filetype:ppt
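Most of these special syntaxes can also be mixed in a single query (the all* variants being the exception). For example, an illustrative query for PowerPoint files about leading economic indicators on university sites:

“leading economic indicators” filetype:ppt site:edu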
How can Google hacking help an ethical hacker?
- Google hacking plays a very important role in gathering information about the target
- It is an important part of reconnaissance
- You can collect lots of information over the net, sometimes even including usernames and passwords
- Lists of employees, their personal details, etc.
Preventing Google from crawling your content | Google Hacking
Many times, people who own websites do not wish the important data on their websites to appear in Google’s search results, for many reasons, e.g. security, incomplete data, confidential data, etc. They can exercise the following options if they want to keep Googlebot and other spiders from accessing that content.
1. ROBOTS.TXT –
The robots.txt file is like an electronic No Trespassing sign. It tells Googlebot and some other crawlers which files and directories on your server should not be crawled. In order to use a robots.txt file, one needs to have access to the root of the host.
In addition, while all respectable robots will respect the directives in a robots.txt file, some may interpret them differently. However, robots.txt is not enforceable, and spammers and other troublemakers may simply ignore it. Hence, it is recommended to password-protect confidential information (see below).
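For example, a minimal robots.txt might look like this (the directory names here are hypothetical placeholders):

# Tell all crawlers to stay out of these (hypothetical) directories
User-agent: *
Disallow: /private/
Disallow: /confidential/

The file must be named robots.txt and live at the top level of the host, e.g. www.example.com/robots.txt.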
2. PASSWORD-PROTECTED DIRECTORY –
If it is unavoidable to keep confidential content on your server, save it in a password-protected directory. Googlebot and other spiders will be unable to access the content. This is the simplest yet most effective way to prevent Googlebot and other spiders from crawling and indexing content in password-protected directories.
If the website is hosted on an Apache web server, a .htaccess file can be edited to password-protect the directory on the server, as sketched below. Various tools are also available on the Internet that will let you do this comfortably.
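A minimal sketch, assuming Apache’s basic authentication module is enabled and assuming a hypothetical password-file path of /etc/apache2/.htpasswd; the .htaccess file placed inside the protected directory would contain:

# Require a valid username/password for everything in this directory
# (the .htpasswd path below is an assumed example location)
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user

The password file itself can be created with Apache’s htpasswd utility, e.g. htpasswd -c /etc/apache2/.htpasswd someuser, and the server’s AllowOverride setting must permit AuthConfig for the .htaccess file to take effect.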
3. NOINDEX META TAG –
This prevents content from appearing in Google’s search results. When Google or another spider sees a noindex meta tag on a page, the search engine completely drops the page from its search results, even if other pages link to it. If the page is already in a spider’s index, it is removed the next time the page containing the noindex meta tag is crawled.
Some search engines, however, may interpret this directive differently. As a result, a link to the page can still be shown in search results.
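The tag itself goes in the page’s <head> section. To keep all compliant crawlers from indexing the page, it looks like this (replacing robots with googlebot targets only Google’s crawler):

<!-- tell compliant crawlers not to index this page -->
<meta name="robots" content="noindex">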
Written By: Mayank Mevada
Reviewed By: Sayan Chatterjee