The Web Design Group

... Making the Web accessible to all.

> How contents of Blogs/Articles are crawled/indexed by Google or other Search Engines?
venkat_walking
post Apr 29 2012, 08:40 PM
Post #1


Member
***

Group: Members
Posts: 30
Joined: 22-May 09
Member No.: 8,678



Hello,

This is technical question. I am planning to create a website which will have articles and questions on technical subjects.

As far as I know, Google returns the results (text search) that best match the content of static web pages (correct me if I am wrong).
E.g., if I search for 'xyz' in Google, it returns all the results where 'xyz' matches, i.e. from the static text.

Millions of blogs and articles are written every day, and of course they are saved in databases; a static file is not created for every article. So how are search engines able to search the DB to retrieve results?

I hope my question is clear to all. Answers would help me in designing a website.

Thanks

Venkat
Darin McGrew
post Apr 29 2012, 10:08 PM
Post #2


WDG Member
********

Group: Root Admin
Posts: 7,999
Joined: 4-August 06
From: Mountain View, CA
Member No.: 3



Please see the FAQ entry "Is there a way to get indexed better by the search engines?", especially the article "Improving Search Engine Rankings". They discuss some of the basic mechanisms behind search engines.


--------------------
Darin McGrew
WDG Member since 1998
venkat_walking
post Apr 29 2012, 10:19 PM
Post #3


Member
***

Group: Members
Posts: 30
Joined: 22-May 09
Member No.: 8,678



QUOTE(Darin McGrew @ Apr 29 2012, 10:08 PM) *

Please see the FAQ entry "Is there a way to get indexed better by the search engines?", especially the article "Improving Search Engine Rankings". They discuss some of the basic mechanisms behind search engines.


Thanks for the reply, Darin.

But this was not the answer I was looking for. I already have a website, and I know that the META tags and TITLE are important for SEO.

Suppose I have 100 articles to publish. My planned design is:
1) Insert the contents of all articles in the DB, so 100 entries in total, each identified by a unique key.
2) Create one dynamic page (.jsp) and display the articles based on ID.


Now the META tag and TITLE content would be different for every article. How would I tell a search engine that the specific article someone is searching for is in my database, so that it should query it?


Please discuss.

Thanks

Venkat
Darin McGrew
post Apr 30 2012, 12:29 AM
Post #4


WDG Member
********

Group: Root Admin
Posts: 7,999
Joined: 4-August 06
From: Mountain View, CA
Member No.: 3



Each article needs its own URL. Search engines don't care whether the server sends plain HTML files, or whether a server-side program assembles the HTML from a database. All they care about is the URL and the HTML that the server sends.

Consider this forum. Every page is generated by a server-side program pulling content from a database. Google has no problem indexing it.
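
To make the point concrete, here is a minimal sketch in Python (standing in for the .jsp page). The table layout, the sample articles, and the function names `make_demo_db` and `render_article` are all assumptions for illustration, not anything taken from this thread. One handler serves every article, yet each ID yields a distinct page with its own TITLE and META description:

```python
import sqlite3
from html import escape


def make_demo_db():
    """Create an in-memory database holding two hypothetical articles."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE articles "
        "(id INTEGER PRIMARY KEY, title TEXT, summary TEXT, body TEXT)"
    )
    db.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?)",
        [
            (1, "Intro to JSP", "What a JSP page is.",
             "A JSP page is compiled into a servlet."),
            (2, "Using JDBC", "Querying a database from Java.",
             "JDBC is the standard Java database API."),
        ],
    )
    return db


def render_article(db, article_id):
    """Return the full HTML for one article, or None if the ID is unknown."""
    row = db.execute(
        "SELECT title, summary, body FROM articles WHERE id = ?",
        (article_id,),
    ).fetchone()
    if row is None:
        return None  # a real server would answer 404 here
    # Escape database text so it cannot break the surrounding markup.
    title, summary, body = (escape(value) for value in row)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title>\n"
        f'<meta name="description" content="{summary}"></head>\n'
        f"<body><h1>{title}</h1><p>{body}</p></body></html>"
    )
```

A crawler that requests the URL for article 2 receives only the returned HTML; nothing in it reveals that a database was involved.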


--------------------
Darin McGrew
WDG Member since 1998
Brian Chandler
post Apr 30 2012, 12:40 AM
Post #5


Jocular coder
********

Group: Members
Posts: 2,212
Joined: 31-August 06
Member No.: 43



QUOTE
Now the META tag and TITLE content would be different for every article. How would I tell a search engine that the specific article someone is searching for is in my database, so that it should query it?


You can't. Google (or whatever) cannot see anything except the published pages, and Google at least follows the robots.txt protocol. So if you think some META tag is important for "SEO" (it isn't, but never mind), just arrange to put that meta tag on the published copy of the particular article.
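
For reference, the robots.txt check that a well-behaved crawler performs can be sketched with Python's standard `urllib.robotparser` module. The rules, the bot name `ExampleBot`, and the URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler may fetch anything except /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def allowed(url, user_agent="ExampleBot"):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these assumed rules the article URLs stay crawlable while anything under /admin/ is skipped. Either way, the crawler only ever sees what the server sends for a URL it is permitted to request.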


--------------------
Brian Chandler
Nothing in this post constitutes "commercial solicitation". PayPal does not solicit residents of Japan. Contents may settle in transit. "Legal mind" may or may not be brain-damaged.
venkat_walking
post Apr 30 2012, 02:11 AM
Post #6


Member
***

Group: Members
Posts: 30
Joined: 22-May 09
Member No.: 8,678



Thanks, Darin and Brian, for the replies.

@Brian, I did not understand what you meant by the term "published page", as there is just one dynamic page (.jsp) that retrieves the content from the DB, just like this forum.

@Darin
I don't know what to call it, crawling or indexing.

Let me make it a case study.

I posted the same question at http://www.webdeveloper.com/forum (here is the link), where it has not been answered by anyone yet.

If I search Google for part of the text of my question posted above, "This is technical question. I am planning to create a website", I get these results:

[Image: screenshot of the search results]

I get top results from webdeveloper.com, but no results from htmlhelp.com (even if I append "htmlhelp.com" in the search box). The same happens for questions posted by other people in this forum.

Why is that? The results are only from webdeveloper.com and not from the htmlhelp.com forum, even though they too pull their content from a DB, just like ours.

There must be some programming difference between the two, right? After all, htmlhelp.com matches webdeveloper.com in page rank.


Please discuss

Thanks

Venkat

This post has been edited by venkat_walking: Apr 30 2012, 02:24 AM
pandy
post Apr 30 2012, 02:37 AM
Post #7


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 15,231
Joined: 9-August 06
Member No.: 6



I get htmlhelp.com at the top of the search and webdeveloper.com isn't even on the first page.
http://www.google.com/search?q=How%20conte...ch%20Engines%3F

Anyway, pages aren't indexed by Google as they are created. Googlebot crawls with some regularity, but it takes some time before new pages make it to the search results. For pages with a high page rank and frequently added content this time is very short nowadays (hours), but it doesn't happen instantly, even if it is sometimes close to instant.

It has nothing to do with the DB. Google is aware of the backend as little as your browser is. Googlebot basically "clicks" links and slurps up the text that is delivered as a result. Whether that text comes from a database or from a static HTML file doesn't matter.

There used to be a limit to how many dynamic URLs were crawled for each site, or at least that was the rumor. I don't think that's true anymore, but I don't really know.
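
The "clicks links and slurps up the text" idea can be sketched as a toy crawler. This is only an illustration, not how Googlebot is actually implemented: the `SITE` dictionary stands in for a web server, and the URLs, the `PageParser` class, and the `crawl` function are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical site: URL -> HTML the server would send. Whether each page
# came from a file or from a database query is invisible at this level.
SITE = {
    "http://www.abc.com/":
        '<a href="/?topic=1">One</a> <a href="/?topic=2">Two</a>',
    "http://www.abc.com/?topic=1":
        '<p>First article</p> <a href="/?topic=2">Next</a>',
    "http://www.abc.com/?topic=2":
        "<p>Second article</p>",
}


class PageParser(HTMLParser):
    """Collect link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start_url):
    """Follow links breadth-first and return {url: extracted text}."""
    index = {}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in index or url not in SITE:
            continue  # already visited, or not part of the toy site
        parser = PageParser()
        parser.feed(SITE[url])
        parser.close()
        index[url] = " ".join(parser.text)
        # Resolve each link against the current page and queue it.
        queue.extend(urljoin(url, link) for link in parser.links)
    return index
```

Starting from the home page, the crawler reaches both topic pages purely by following links, which is why each article needs its own URL and a link pointing to it.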


--------------------
"Never go to excess, but let moderation be your guide."
- Cicero

venkat_walking
post Apr 30 2012, 02:53 AM
Post #8


Member
***

Group: Members
Posts: 30
Joined: 22-May 09
Member No.: 8,678



QUOTE(pandy @ Apr 30 2012, 02:37 AM) *

I get htmlhelp.com at the top of the search and webdeveloper.com isn't even on the first page.
http://www.google.com/search?q=How%20conte...ch%20Engines%3F

Anyway, pages aren't indexed by Google as they are created. Googlebot crawls with some regularity, but it takes some time before new pages make it to the search results. For pages with a high page rank and frequently added content this time is pretty short nowadays (hours), but it doesn't happen instantly.


I made a spelling error in the title at webdeveloper.com, hence it is not showing up. But I was talking about the topic content, not the title of the question.

You edited your reply later.

So what should I do? Shall I go ahead with my earlier design?
1) Insert the contents of all articles in the DB, so 100 entries in total, each identified by a unique key.
2) Create one dynamic page (.jsp) and display the articles based on the unique key.

Thanks

Venkat

This post has been edited by venkat_walking: Apr 30 2012, 02:59 AM
pandy
post Apr 30 2012, 02:55 AM
Post #9


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 15,231
Joined: 9-August 06
Member No.: 6



There you see. ;)


--------------------
"Never go to excess, but let moderation be your guide."
- Cicero

venkat_walking
post Apr 30 2012, 04:13 AM
Post #10


Member
***

Group: Members
Posts: 30
Joined: 22-May 09
Member No.: 8,678



QUOTE(pandy @ Apr 30 2012, 02:37 AM) *

I get htmlhelp.com at the top of the search and webdeveloper.com isn't even on the first page.
http://www.google.com/search?q=How%20conte...ch%20Engines%3F

Anyway, pages aren't indexed by Google as they are created. Googlebot crawls with some regularity, but it takes some time before new pages make it to the search results. For pages with a high page rank and frequently added content this time is very short nowadays (hours), but it doesn't happen instantly, even if it is sometimes close to instant.

It has nothing to do with the DB. Google is aware of the backend as little as your browser is. Googlebot basically "clicks" links and slurps up the text that is delivered as a result. Whether that text comes from a database or from a static HTML file doesn't matter.

There used to be a limit to how many dynamic URLs were crawled for each site, or at least that was the rumor. I don't think that's true anymore, but I don't really know.


Now results from forums.htmlhelp.com are coming up at the top. As you said, it takes hours to be indexed by a search engine.

One more question.

Say the URL for every article looks like this:

www.abc.com?topic=1
www.abc.com?topic=2
www.abc.com?topic=3
www.abc.com?topic=4
www.abc.com?topic=5

and so on

Of course, the IDs are retrieved as per the user's request.

So, as you said, "Googlebot basically "clicks" links and slurps up the text that is delivered as a result." Do I need to maintain the URLs (just the URLs) in a separate static file or HTML file? If yes, how do I make it useful, so that the search engine is automatically directed to the appropriate page?


Thanks

Venkat

pandy
post Apr 30 2012, 05:37 AM
Post #11


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 15,231
Joined: 9-August 06
Member No.: 6



No, you don't need static files. As I said, Googlebot and browsers don't "know" where the content comes from. For them it's just a data stream in both cases.
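
What the crawler does need is some page that links to each topic, and that page can be dynamic too. A minimal sketch, assuming the `?topic=N` URL scheme from the question and a hypothetical `render_index` helper fed with (id, title) rows from the database:

```python
from html import escape


def render_index(articles):
    """Build an HTML list of links from (id, title) pairs.

    The pairs would normally be rows returned by a database query.
    """
    items = "\n".join(
        f'<li><a href="/?topic={article_id}">{escape(title)}</a></li>'
        for article_id, title in articles
    )
    return f"<ul>\n{items}\n</ul>"
```

Because the list is rebuilt on every request, a newly added topic gets a link automatically, with no static file of URLs to maintain.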

It takes hours for content on this site to be indexed. For other sites it can take days or weeks or not happen at all. It depends on page rank, popularity, update frequency, the moon phase, the sunspots...


--------------------
"Never go to excess, but let moderation be your guide."
- Cicero

venkat_walking
post Apr 30 2012, 06:07 AM
Post #12


Member
***

Group: Members
Posts: 30
Joined: 22-May 09
Member No.: 8,678



QUOTE(pandy @ Apr 30 2012, 05:37 AM) *

No, you don't need static files. As I said, Googlebot and browsers don't "know" where the content comes from. For them it's just a data stream in both cases.

It takes hours for content on this site to be indexed. For other sites it can take days or weeks or not happen at all. It depends on page rank, popularity, update frequency, the moon phase, the sunspots...



I have also been searching for this answer in other sources.

I came to know about sitemaps.

Is that what we need to use to tell Googlebot about the dynamic content? If yes, do I need to update the sitemap with the URL whenever a new topic is created?

E.g., say the sitemap contains the URLs below:
www.abc.com?topic=1
www.abc.com?topic=2
www.abc.com?topic=3

Now a new topic is created,
so append:

www.abc.com?topic=4


Please reply

Thanks

Venkat

This post has been edited by venkat_walking: Apr 30 2012, 06:08 AM
pandy
post Apr 30 2012, 06:33 AM
Post #13


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 15,231
Joined: 9-August 06
Member No.: 6



I don't know. If it's a forum it would be ridiculous to do so; if it's the occasional article it's more doable.
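
If a sitemap is wanted, it need not be edited by hand: it can be generated from the same list of topic IDs the site already has. A minimal sketch, assuming the www.abc.com URL scheme from the question and a hypothetical `build_sitemap` helper. The namespace is the one defined by the sitemaps.org protocol, which expects absolute URLs including the scheme:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(topic_ids, base="http://www.abc.com/"):
    """Return sitemap XML with one <url><loc> entry per topic ID."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for topic_id in topic_ids:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base}?topic={topic_id}"
    return ET.tostring(urlset, encoding="unicode")
```

Serving this output from a dynamic URL means topic 4 appears in the sitemap as soon as it exists in the database, so nothing has to be appended manually.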

You are thinking about this the wrong way. Create your site and fill it with content. Then think about SEO if you need to. I'm not saying that SEO is pointless, but it isn't what makes or breaks a site. If the site is good, people will come and so will Google, SEO or not. If it sucks, it doesn't matter much how much you put into SEO. To verify this, just look at some SEO companies' sites. They claim they know all the tricks in the book, but more often than not they don't have an exceptional page rank; quite often it is the opposite. You can find suitable links to such sites by finding some spammy sigs in this and other forums.


--------------------
"Never go to excess, but let moderation be your guide."
- Cicero
