Friday, 16 September 2011

The power of the Hrefer software

This post explains the basic functionality of the Hrefer software.
Hrefer is a powerful search engine parser with extensive capabilities. Even the basic version can parse the most popular search engines, such as Google, Bing, Yahoo, AltaVista, MSN and others.

Video tutorial in HD format part 1:



Video tutorial in HD format part 2:




The Hrefer program functions as follows:
 
Let’s have a look at the flexible program settings:

Convert all links to index. This option converts all harvested links to the index page (this applies to forums only) as soon as you start parsing.

Reject domains with level lower than 2. If this option is enabled, only second-level domains are added to the database; all others are filtered out.
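A minimal Python sketch of this check, assuming a domain's level is simply the number of dot-separated labels in its hostname (so `example.com` is level 2 and `forum.example.com` is level 3). This naive count does not know about multi-part suffixes such as `co.uk`, and the function names are hypothetical, not part of Hrefer.

```python
from urllib.parse import urlsplit


def domain_level(url: str) -> int:
    """Count the dot-separated labels in the URL's hostname.

    Naive on purpose: multi-part public suffixes such as co.uk are not
    recognised, so forum.example.co.uk counts as level 4.
    """
    host = urlsplit(url).hostname or ""
    return len([label for label in host.split(".") if label])


def keep_second_level(urls: list[str]) -> list[str]:
    """Keep only URLs whose hostname is a second-level domain, e.g. example.com."""
    return [url for url in urls if domain_level(url) == 2]
```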

Check all links “200 OK” response (will work SLOWLY). If this option is enabled, every link is checked for a “200 OK” response, which considerably slows down link harvesting.

Log founded hight-PR freehostings into the FreeBonus.txt. When this exclusive option is on, the program saves subdomains of free hosting services with a high PageRank to the FreeBonus.txt file while parsing (the file is saved in the program's root folder).

Enable filtering of duplicated links by hostnames. Duplicate domains are not added to the compiled database; links with the same hostname are filtered out.

Enable filtering of duplicated links on loading links database. If this option is enabled, duplicates are removed from the database every time it is loaded when the program starts; because this happens on every load, it slows the program down.

By hostnames and by entire URL. These two options determine how duplicates are detected: by hostname or by the entire URL.
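The two duplicate-filtering modes can be sketched in Python as follows. This is an illustration, not Hrefer's code: `dedupe` is a hypothetical helper that keeps the first link seen for each key, where the key is either the hostname or the whole URL.

```python
from urllib.parse import urlsplit


def dedupe(urls: list[str], by: str = "hostname") -> list[str]:
    """Drop duplicate links, keeping the first occurrence of each.

    by="hostname": keep one link per host (example.com/a and example.com/b
    count as duplicates).
    by="url": drop only exact URL repeats.
    """
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        key = (urlsplit(url).hostname or "") if by == "hostname" else url
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result
```

Keeping the first occurrence means the order of the original database is preserved, which matters if you later split it into files.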

Deep of parsing (pages). This option allows you to limit the number of pages to be parsed

Do not use additive words. If this option is selected, additive words are not used for parsing.

Disable filtering harvested links by Sieve-filter. This option turns off the Sieve-filter, so harvested links are saved without being checked against its template.

Query ordering. This option sets the order in which words are “glued” together when queries are built from the Words and Additive Words databases and sent to the search engines.

When parsing multiple search engines, you can either send a different query to each search engine or send the same queries to all of them.

Auto resumption parsing after program starting. This option resumes parsing automatically as soon as Hrefer is started.

Parsing delay. This option sets the interval between queries for each search engine.

Save ‘query -> URL’ into filename_query.txt. This option saves, for each URL, the keywords (queries) under which it was found.


Search engine parsing: basic principles

The Words and Additive Words databases are required to parse any search engine effectively.

Additive Words. This database holds the tags of the websites we want to harvest; it defines the structure of the parsing queries.
Words. This database is combined with the website tags to expand the list of queries. The bigger the Words database, the more complete the resulting link database.


Let’s have a closer look at the Words database tab




The “Create New!” button creates a new Words database.

Creating a Words database is very simple.

There are several ways to do this:

1) Use an existing word database.
I am happy to share some of my English word databases:
Top 500 English words - Download Link (2 kb)
Top 2300 English words - Download Link (8 kb)
26,000 English words - Download Link (81 kb)
150,000 English words - Download Link (460 kb)

2) Search online for a word list.
Open Google.com and enter the query “English dictionary filetype:txt”.
The search engine returns the following results:

This returns a large number of text files matching the query. The same approach works for finding word lists in any other language.


3) Add words to the Words database from other text files.
A huge number of e-books are freely available for public download; books in plain text (.txt) format are preferred.

Take a look at one such site: manybooks.net.

It is a very good source of free books.

Open a book's download page, choose “Plain text (.txt)” from the Download drop-down menu, and click Download.

The downloaded book looks like this:

Create a new Words database by clicking the “Create New!” button.

In Hrefer, click “Add words from text file…” and select the downloaded book.

After I added my first book, my new Words database contained 6,250 lines.

The new Words database now looks like this:

You can now see how to build up your Words database by adding downloaded books.
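Hrefer's exact rules for turning a book into words are not described here, so the following Python sketch only assumes a plausible approach: split the text into runs of letters, lowercase them, and keep each word once, in order of first appearance. `extract_words` is a hypothetical helper name.

```python
import re

# A "word" is a run of Unicode letters, optionally joined by straight
# apostrophes (e.g. "cat's"); digits, underscores and punctuation are dropped.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def extract_words(text: str) -> list[str]:
    """Return the unique lowercased words of `text` in order of first appearance."""
    seen: set[str] = set()
    words: list[str] = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words
```

To use it on a downloaded book, read the file (for example, a hypothetical `book.txt` opened with `encoding="utf-8"`) and write the returned words one per line.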

We may also want to automate this process.

When you have many book files, you can combine them into a single file, which speeds up adding new words to Hrefer. Text Magician, a free program, is a good tool for this.

The procedure is as follows:

Click “Add files” and select all the downloaded books (you can select many books at once), then click “Open”.
Tick “Click this box to combine files” and “Include blank line either side”, then press “Do it!”.

When processing is complete, the program shows the following message:
 
A text file named aacombined.txt is created, containing the full text of all the selected books. Add this file to Hrefer, and that's it!
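If you prefer a script to Text Magician, the same combining step can be sketched in Python. This assumes UTF-8 text files and reads “blank line” as one empty line between consecutive books; `combine_texts` and `combine_files` are hypothetical helper names, and the default output name simply mirrors aacombined.txt.

```python
from pathlib import Path


def combine_texts(texts: list[str], blank_line: bool = True) -> str:
    """Join several file contents into one string.

    Leading/trailing newlines of each text are trimmed; with blank_line=True
    an empty line separates consecutive texts.
    """
    separator = "\n\n" if blank_line else "\n"
    return separator.join(text.strip("\n") for text in texts) + "\n"


def combine_files(paths: list[str], out_path: str = "aacombined.txt") -> None:
    """Read every file in `paths` (as UTF-8) and write their combined text to `out_path`."""
    texts = [Path(p).read_text(encoding="utf-8", errors="replace") for p in paths]
    Path(out_path).write_text(combine_texts(texts), encoding="utf-8")
```

Keeping the string-joining logic in `combine_texts` separates it from file handling, so the joining rule is easy to check on its own.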

You can download my English Words database collected from books: Download Link


4) Add words from Google search.

To do this, click “Add words from Google…”.
In the “Enter keyword” box, type any keyword you need.

The “Choose Language” drop-down menu lets you select the language you need.
 

Let’s have a closer look at the Additive Words tab

On this tab you can add your own additive words.

If you plan to compile a database of forums, you need to collect as many forum tags as possible and enter them as additive words.

For example, visit an English-language forum and look for tags that are characteristic of its engine.


Here is the main page of a forum running on the Simple Machines (SMF) engine:

The tags highlighted in red are typical keywords to use in searches.

I picked out the following keywords on this page:
Welcome, Guest
Please login or register
Login with username
General Category
General Discussion
Posts
Topics
Last post by
Posts in
Topics by
Members
Latest Member
View the most recent posts on the forum
recent posts
More Stats
Users Online
Guests
Users
Users active in past 15 minutes
Most Online Today
Most Online Ever
Login
Forgot your password
Username
Password
Minutes to stay logged in
Always stay logged in
Powered by SMF
SMF ©2006-2011
Simple Machines LLC

A forum category page on the Simple Machines engine:

I picked out the following keywords on this page:

Subject
Started by
Replies
Views
Last post
Jump to
Topic you have posted in
Normal Topic
Hot Topic
More than 15 replies
Very Hot Topic
More than 25 replies
Locked Topic
Sticky Topic
Poll

A forum topic page on the Simple Machines engine:

I picked out the following keywords on this page:
Did you miss your activation email
Pages
Administrator
Member
Newbie
Logged
previous
next
Author
Print
Topic
Reply

A member profile page on the Simple Machines engine:

I picked out the following keywords on this page:
Summary
Name
Posts
Position
Date Registered
Last Active
ICQ
AIM
MSN
YIM
Current Status
Picture/Text
Gender
Age
Location
Local Time
Language
Signature
Show the last posts of this person
Show general statistics for this member
The links on such forums contain the following URL patterns:
index.php/board
index.php/topic
index.php?action=profile
index.php?action=register
index.php?action=login
index.php?action=help

All keywords for all pages on the Simple Machines engine:  
welcome, guest
please login or register
login with username
general category
general discussion
posts
topics
last post by
posts in
topics by
members
latest member
view the most recent posts on the forum
recent posts
more stats
users online
guests
users
users active in past 15 minutes
most online today
most online ever
login
forgot your password
username
password
minutes to stay logged in
always stay logged in
powered by smf
smf © 2006-2011
simple machines llc
subject
started by
replies
views
last post
jump to
topic you have posted in
normal topic
hot topic
more than 15 replies
very hot topic
more than 25 replies
locked topic
sticky topic
poll
did you miss your activation email
pages
administrator
member
newbie
logged
previous
next
author
print
topic
reply
summary
name
posts
position
date registered
last active
icq
aim
msn
yim
email
website
current status
picture/text
gender
age
location
local time
language
signature
show the last posts of this person
show general statistics for this member
We enter them into the Additive Words database and get an excellent database of English-language SMF forums for next to nothing!
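The combined list above is essentially the per-page lists lowercased and joined together. A small Python sketch of that step, assuming you already have each page's keywords as a Python list; `merge_keywords` is a hypothetical helper, and unlike the list above it also drops repeated entries such as “posts”.

```python
def merge_keywords(*page_lists: list[str]) -> list[str]:
    """Merge keyword lists from several pages into one list.

    Each keyword has its whitespace collapsed and is lowercased; duplicates
    are dropped, and the first occurrence keeps its position.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for page in page_lists:
        for keyword in page:
            key = " ".join(keyword.split()).lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```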

What if we need forums in languages other than English?

Well, there are several options:

1) Repeat all the steps above on a forum in the target language.
2) Use a machine translator to translate the English tags into the target language.

Suppose you need a large number of French-language forums.


Translate the selected tags into French.

The result is the following list of tags:

Bienvenue, Invité
S'il vous plaît connecter ou vous inscrire
Connexion avec identifiant
catégorie générale
discussion générale
messages
sujets
Dernier message par
messages dans
sujets par des
membres
dernier membre
Afficher les plus récents messages du forum
les messages récents
plus de stats
utilisateurs en ligne
invités
utilisateurs
utilisateurs actifs dans les 15 dernières minutes
aujourd'hui la plupart en ligne
Record de connexion
Connectez-vous
Mot de passe oublié
nom d'utilisateur
mot de passe
minutes pour rester connecté
Toujours connecté
Propulsé par SMF
SMF © 2006-2011
Simple Machines LLC
sous réserve
ouvertes par
réponses
vues
dernier message
sauter à
le sujet que vous avez écrit dans
sujet normal
sujet chaud
Plus de 15 réponses
sujet très chaud
plus de 25 réponses
sujet verrouillé
des post-it
sondage
Avez-vous perdu votre courriel d'activation
pages
l'administrateur
membres
Débutant
connecté
précédente
la prochaine
l'auteur
Imprimer
le sujet
Répondre
Résumé
nom
messages
position
date d'inscription
Dernière connexion
ICQ
visent
MSN
Yim
email
site web
état actuel
image / texte
des sexes
l'âge
Lieu
heure locale
la langue
signature
Voir les derniers messages de cette personne
Voir les statistiques générales pour ce membre
I cannot vouch for the quality of this translation, but the tags can be used for parsing without any problem.


Here is the main page of a forum running on the phpBB engine:

The tags highlighted in red are typical keywords to use in searches.

I picked out the following keywords on this page:
Board index
FAQ
Login
It is currently
View unanswered posts
View active topics
Topics
Posts
Last post
Login
Username
Password
Who is online
Log me on automatically each visit
In total there are
users online
registered
hidden
guests
based on users active over the past 5 minutes
Registered users
Google [Bot]
Yahoo [Bot]
Legend
Administrators
Global moderators
Total posts
Total topics
Total members
Our newest member
Powered by phpBB
phpbb
phpBB Group
Delete all board cookies
All times are UTC

A forum category page on the phpBB engine:



I picked out the following keywords on this page:
Return to Board index
Users browsing this forum
No registered users
Forum permissions
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum
Jump to
Display topics from previous
Sort by
Topics
Replies
Views
Last post
viewforum
Download all tags collected from all forum engines: Download Link

The most effective tags:
viewtopic
showthread
viewforum
forumdisplay
phpbb3
yabb2
member
forum
forums
foro
phorum
yabb
smf
vbulletin
forum_viewtopic
profile
viewthread
thread-view
forum_topics
register
ubbthreads
phpbb2
forum_id
posting
forum_viewforum
memberlist
viewthread.php
entry_ubb
discuss
printview
discussionid
showflat
forum_posts
ftopic
printthread
yaf_postst
topic
thread
arcade
printertopic
newbb
messages
display_topic_threads
dcforum
messageview
board_entry
board
  
Now let’s look at the Search Engines options & Filter tab

On this tab you select the search engines you are going to parse.

The Sieve-filter is a template: only URLs that match it are saved to the database.


Hrefer primarily uses a template like this one to parse forums:
forum
phorum
topic
bulletin
thread
modules.php
yabb
ultimatebb
board
phpbb
act=ST
act=SF
list.php
posting
profile.php
act=Reg
post.php
ubb
exbb
newbb
ipb
invision
foro/
/sutra
lofiversion

I recommend replacing it with the following stricter template, which yields a higher percentage of actual forums in the database:
topic.php?forum=
yabb.cgi?board=
yabb.pl?board=
index.php?topic=
index.php?board=
posting.php?mode=
ikonboard.cgi?s=
viewtopic.php?topic_id=
showflat.php?cat=
newreply.php?s=
showthread.php?postid=
showtopic.php?threadid=
viewthread.php?s=
dcboard.cgi?az=
forum_viewpost.asp?tid=
newreply.php?do=
viewtopic.php
showthread.php
showtopic
forumdisplay.php
viewforum.php
showforum
http://forum.
http://forums.
http://foro.
http://phorum/
/forum/
/forums/
/foro/
/phorum/
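How a Sieve-filter template decides which URLs to keep can be sketched in Python. This assumes each template line is a plain, case-insensitive substring and that a URL is kept if it contains any of them; Hrefer's actual matching rules may differ, and `sieve_filter` is a hypothetical name.

```python
def sieve_filter(urls: list[str], template_lines: list[str]) -> list[str]:
    """Keep only URLs containing at least one template line as a substring.

    Assumption: matching is case-insensitive plain substring matching;
    blank template lines are ignored.
    """
    patterns = [line.strip().lower() for line in template_lines if line.strip()]
    return [url for url in urls if any(p in url.lower() for p in patterns)]
```

Under this reading, a stricter template simply means longer, more specific substrings, so fewer non-forum URLs match by accident.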

This is how Hrefer displays the statistics for each search engine:

Let’s look at the Multithreading tab

This tab sets the number of threads for the proxy checker and the parser.
It also lets you set the interval between queries and the percentage split between the various queries.


Let’s have a closer look at the Proxylist tab

This tab lets you update the proxy list and has an “Options” button.

In “Options” you can specify the address of your own proxy checker script (engine.php).


When opened, the proxy checker script looks like this:

The /proxyc/ folder contains a text file, list.txt, with a list of addresses. The proxy checker checks these proxies one by one.


There are three ways to get good proxies:

1) Buy proxies.

2) Create a private proxy list.
To do this, upload the /proxyc/ folder to your own hosting, edit list.txt, and update the Options tab in Hrefer.

3) To search for the files /proxyc/engine.php belonging to other people and pick up the best ones.

Open http://google.com and type the next query “inurl:/proxyc/engine.php HTTP_HOST”

The following we have:

One by one you type addresses in the search line of Hrefer software and refresh the proxy list.
engine.php from which Hrefer harvests the most proxies and use at your own.


Sort link database by PR

Click Tools => Sort current link database by PR:

The following sorting methods are available:

1) Standard
Sorts the database by PageRank in descending order.

2) Multisort
Splits the database into 10 files by PageRank value.

3) Sort in range
Removes from the database all links whose PageRank falls outside the range you set.
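The three sorting methods can be sketched in Python, assuming the link database is a list of `(url, pagerank)` pairs with integer PageRank values. The function names are hypothetical; `multisort` groups links by exact PageRank value, which is one reading of the 10-file split described above, and `filter_in_range` only removes links, keeping their original order.

```python
from collections import defaultdict


def sort_standard(links: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sort (url, pagerank) pairs by PageRank, highest first (stable for ties)."""
    return sorted(links, key=lambda link: link[1], reverse=True)


def multisort(links: list[tuple[str, int]]) -> dict[int, list[str]]:
    """Group URLs by their PageRank value; each group could be written to its own file."""
    groups: dict[int, list[str]] = defaultdict(list)
    for url, pagerank in links:
        groups[pagerank].append(url)
    return dict(groups)


def filter_in_range(links: list[tuple[str, int]], low: int, high: int) -> list[tuple[str, int]]:
    """Keep only links whose PageRank lies within [low, high], in original order."""
    return [link for link in links if low <= link[1] <= high]
```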


The PageRank check in progress looks like this:

Hrefer offers a wide range of options and features.

I hope this post helps you make the most of these features.
 

Organizer of contest - BotmasterLabs.Net
