
HTMLHelp Forums > General Web Design > How contents of Blogs/Articles are crawled/indexed by Google or other Search Engines?

Posted by: venkat_walking Apr 29 2012, 08:40 PM

Hello,

This is a technical question. I am planning to create a website that will have articles/questions on technical subjects.

As far as I know, Google returns the results (text search) that best match the content of static web pages (correct me if I am wrong).
E.g., if I search for 'xyz' on Google, it will return all the results where 'xyz' matches, i.e. in the static text.
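As a rough illustration of the text-matching idea described here, the sketch below builds a toy inverted index in Java (the language behind the .jsp pages discussed in this thread). It is not how Google works internally; it only shows that such an index is built from the text a page delivers, keyed by that page's URL. The class name `ToyIndex` and the sample page texts are hypothetical.

```java
import java.util.*;

/** Toy inverted index: maps each lowercase word to the set of URLs whose text contains it.
 *  A teaching sketch only; real search engines add ranking, stemming, and much more. */
public class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Index the text that a page delivered, keyed by the page's URL. */
    public void addPage(String url, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue; // split can yield "" if the text starts with a non-word char
            postings.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
        }
    }

    /** Return the (sorted) URLs whose text contained the query word. */
    public Set<String> search(String word) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(word.toLowerCase(), Collections.emptySet()));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addPage("www.abc.com?topic=1", "Notes on xyz and servlets");
        index.addPage("www.abc.com?topic=2", "JSP tags explained");
        System.out.println(index.search("xyz")); // [www.abc.com?topic=1]
    }
}
```

Note that the index only ever sees a URL and some text; nothing in it records whether that text came from a file or a database.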

Millions of blogs and articles are written every day, and of course they are saved in databases; a static file is not created for every article. So how are search engines able to search into the DB to retrieve the results?

I hope my question is clear to all. The answers will help me design my website.

Thanks

Venkat

Posted by: Darin McGrew Apr 29 2012, 10:08 PM

Please see the FAQ entry http://www.htmlhelp.com/faq/html/publish.html#index-better especially the article http://www.htmlhelp.com/feature/seo/. They discuss some of the basic mechanisms behind search engines.

Posted by: venkat_walking Apr 29 2012, 10:19 PM

Thanks for the reply, Darin.

But this is not the answer I was looking for. I already have a website, and I know that the META tags and TITLE are important for SEO.

Suppose I have 100 articles to publish. This is how I plan to design the site:
1) Insert the contents of all the articles into the DB: 100 entries in total, each identified by a unique key.
2) Create one dynamic page (.jsp) that displays an article based on its ID.

Now, the META tags and TITLE will be different for every article. How would I tell a search engine that the specific article someone is searching for is in my database, so that it should query it?

Please discuss.

Thanks

Venkat

Posted by: Darin McGrew Apr 30 2012, 12:29 AM

Each article needs its own URL. Search engines don't care whether the server sends plain HTML files, or whether a server-side program assembles the HTML from a database. All they care about is the URL and the HTML that the server sends.
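A minimal sketch of this point, assuming the JDK's built-in `com.sun.net.httpserver` package in place of a JSP container: one handler serves every article, yet each `?topic=N` URL returns its own HTML. The `ARTICLES` map is a hypothetical stand-in for the database table, and `ArticleServer`/`render` are illustrative names only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical sketch: one server-side handler, many article URLs. */
public class ArticleServer {
    // Stand-in for the DB: article ID -> body text (hypothetical data).
    static final Map<Integer, String> ARTICLES = Map.of(
            1, "First article body",
            2, "Second article body");

    /** Build the HTML for a query string such as "topic=2", or return null if there is no such article. */
    static String render(String query) {
        if (query == null || !query.startsWith("topic=")) return null;
        try {
            int id = Integer.parseInt(query.substring("topic=".length()));
            String body = ARTICLES.get(id);
            return body == null ? null : "<html><body><p>" + body + "</p></body></html>";
        } catch (NumberFormatException e) {
            return null; // e.g. "topic=abc"
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            String html = render(exchange.getRequestURI().getQuery());
            int status = (html == null) ? 404 : 200;
            byte[] bytes = (html == null ? "Not found" : html).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes); // closing the body stream also ends the exchange
            }
        });
        server.start();
        System.out.println("Try http://localhost:8080/?topic=1 and http://localhost:8080/?topic=2");
    }
}
```

Fetching the two URLs gives two distinct pages; a crawler, like a browser, cannot tell that a single handler and a lookup produced them. The sketch only understands a bare `topic=N` query and does not HTML-escape the body text.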

Consider this forum. Every page is generated by a server-side program pulling content from a database. Google has no problem indexing it.

Posted by: Brian Chandler Apr 30 2012, 12:40 AM

QUOTE
Now META tag and TITLE content for every article would be different. How would I tell search engine that specific article one is searching is in my Database, so query it.


You can't. Google (or whatever) cannot see anything except the published pages, and Google at least follows the robots.txt protocol. So if you think some META tag is important for "SEO" (it isn't, but never mind), just arrange to put that meta tag on the published copy of the particular article.
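In other words, whatever TITLE and META content an article should have must appear in the HTML sent for that article's URL. A sketch, assuming the title and description are hypothetical fields read from the article's database row; `ArticleHead` and its methods are illustrative names.

```java
/** Hypothetical sketch: each article gets its own TITLE and META description
 *  in the page the server sends, which is all a crawler ever sees. */
public class ArticleHead {
    /** Minimal HTML escaping for text placed in element content or a double-quoted attribute. */
    static String escape(String s) {
        return s.replace("&", "&amp;") // must run first so later entities are not double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Build the <head> section for one article from its database fields. */
    static String head(String title, String description) {
        return "<head>\n"
             + "  <title>" + escape(title) + "</title>\n"
             + "  <meta name=\"description\" content=\"" + escape(description) + "\">\n"
             + "</head>";
    }

    public static void main(String[] args) {
        System.out.println(head("Servlets & JSP", "How one \"dynamic\" page serves many articles"));
    }
}
```

The `escape` helper covers only `&`, `<`, `>` and `"`, which is enough for element text and double-quoted attribute values but is not a general-purpose sanitizer.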

Posted by: venkat_walking Apr 30 2012, 02:11 AM

Thanks Darin and Brian for the reply,

@Brian, I don't understand what you mean by the term "published page", as there is just one dynamic .jsp page that retrieves the content from the DB, just like this forum.

@Darin
I don't know what to call it, crawling or indexing.

Let me make a case study of it.

I posted the same question at http://www.webdeveloper.com/forum (http://www.webdeveloper.com/forum/showthread.php?t=260004), where nobody has answered it yet.

If I google part of the question text I posted above, "This is technical question. I am planning to create a website", I get these results:

[Screenshot of the Google search results]

The top results are from webdeveloper.com, but there are none from htmlhelp.com (even if I append "htmlhelp.com" in the search box). The same happens for questions posted by other people in this forum.

Why are the results only from webdeveloper.com and not from the htmlhelp.com forum, even though both sites would be pulling their content out of a DB?

There must be some programming difference between the two, right? After all, htmlhelp.com is on par with webdeveloper.com in page ranking.


Please discuss

Thanks

Venkat

Posted by: pandy Apr 30 2012, 02:37 AM

I get htmlhelp.com at the top of the search and webdeveloper.com isn't even on the first page.
http://www.google.com/search?q=How%20contents%20of%20Blogs%2FArticles%20are%20crawled%2Findexed%20by%20Google%20or%20other%20Search%20Engines%3F

Anyway, pages aren't indexed by Google as they are created. Google crawls with some regularity, but it takes some time before new pages make it into the search results. For pages with a high PageRank and frequently added content, this delay is very short nowadays (hours), but it doesn't happen instantly, even if it is sometimes close to instant.

It has nothing to do with the DB. Google knows as little about the backend as your browser does. Googlebot basically "clicks" links and slurps up the text that is delivered as a result. Whether that text comes from a database or from a static HTML file doesn't matter.
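To make the "clicks links" idea concrete, here is a toy version of the link-following step: it pulls `href` values out of whatever HTML string it is handed. This is a hypothetical illustration using a simple regular expression; real crawlers use full HTML parsers and handle far more cases (single-quoted attributes, relative URL resolution, robots.txt).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration of a crawler's link-extraction step. It only sees the HTML it is given;
 *  whether that HTML came from a static file or a database is invisible to it. */
public class LinkExtractor {
    // Simplistic pattern for href="..." attributes (double quotes only, case-insensitive).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> links(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            found.add(m.group(1)); // the URL between the quotes
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"?topic=1\">One</a> <a HREF=\"?topic=2\">Two</a>";
        System.out.println(links(html)); // [?topic=1, ?topic=2]
    }
}
```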

There used to be a limit to how many dynamic URLs were crawled for each site, or at least that was the rumor. I don't think that's true anymore, but I don't really know.

Posted by: venkat_walking Apr 30 2012, 02:53 AM

QUOTE(pandy @ Apr 30 2012, 02:37 AM) *

I get htmlhelp.com at the top of the search and webdeveloper.com isn't even on the first page.
http://www.google.com/search?q=How%20contents%20of%20Blogs%2FArticles%20are%20crawled%2Findexed%20by%20Google%20or%20other%20Search%20Engines%3F

Anyway, pages aren't indexed by google as they are created. They crawl with some regularity, but it takes some time before new pages make it to the search results. For pages with a high page rank and frequently added content this time is pretty short nowadays (hours) but it doesn't happen instantly.


I made a spelling error in the title at webdeveloper.com, hence it's not showing up. But I was talking about the topic content, not the title of the question.

You edited your reply later.

So what should I do? Shall I go ahead with my earlier design?
1) Insert the contents of all the articles into the DB: 100 entries in total, each identified by a unique key.
2) Create one dynamic page (.jsp) that displays an article based on its unique key.

Thanks

Venkat

Posted by: pandy Apr 30 2012, 02:55 AM

There you see.

Posted by: venkat_walking Apr 30 2012, 04:13 AM

Now results from forums.htmlhelp.com are coming up at the top. As you said, it takes hours for pages to be indexed by the search engine.

One more question:

Say the URL for every article looks like this:

www.abc.com?topic=1
www.abc.com?topic=2
www.abc.com?topic=3
www.abc.com?topic=4
www.abc.com?topic=5

and so on

Of course, the IDs are retrieved per user request.

So, as you said, "Googlebot basically "clicks" links and slurps up the text that is delivered as a result." Do I need to maintain the URLs (just the URLs) in a separate static or HTML file? If yes, how do I make that useful, so that the search engine automatically sends people to the appropriate page?


Thanks

Venkat


Posted by: pandy Apr 30 2012, 05:37 AM

No, you don't need static files. As I said, Googlebot and browsers don't "know" where the content comes from. For them it's just a data stream in both cases.
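Since Googlebot discovers pages by following links, one way to make every article reachable without any static files is a listing page that is itself generated from the database. A minimal sketch, assuming hypothetical article IDs obtained from the DB (the SQL in the comment is illustrative only, not a real schema):

```java
import java.util.List;

/** Hypothetical sketch: an index page generated from the list of article IDs, so a crawler
 *  that "clicks" these links can reach every article. */
public class ArticleIndexPage {
    static String render(List<Integer> articleIds) {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (int id : articleIds) {
            html.append("  <li><a href=\"?topic=").append(id)
                .append("\">Article ").append(id).append("</a></li>\n");
        }
        return html.append("</ul>").toString();
    }

    public static void main(String[] args) {
        // In a real site these IDs might come from a query such as "SELECT id FROM articles" (hypothetical table).
        System.out.println(render(List.of(1, 2, 3)));
    }
}
```

Because the list is rebuilt on each request, a new article shows up as soon as its row exists; nothing has to be regenerated by hand.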

It takes hours for content on this site to be indexed. For other sites it can take days or weeks or not happen at all. It depends on page rank, popularity, update frequency, the moon phase, the sunspots...

Posted by: venkat_walking Apr 30 2012, 06:07 AM

I have also been searching for this answer in other sources.

I came across sitemaps.

Is a sitemap what we need to use to tell Googlebot about the dynamic content? If yes, do I need to update the sitemap with the new URL every time a new topic is created?

E.g., say the sitemap contains the URLs below:
www.abc.com?topic=1
www.abc.com?topic=2
www.abc.com?topic=3

Now a new topic is created,
so I append:

www.abc.com?topic=4
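If a sitemap is used, it does not have to be edited by hand each time a topic is added: the server can generate it from the current list of IDs whenever it is requested. A sketch following the sitemaps.org 0.9 XML format, with `http://www.abc.com/?topic=` as a placeholder base URL (the protocol expects absolute URLs including the scheme) and the IDs assumed to come from the database:

```java
import java.util.List;

/** Hypothetical sketch: build sitemap.xml from the current article IDs on every request,
 *  so a new topic appears automatically instead of being appended by hand. */
public class SitemapGenerator {
    // Placeholder base URL; must be absolute (scheme included) in a real sitemap.
    static final String BASE = "http://www.abc.com/?topic=";

    static String sitemap(List<Integer> articleIds) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (int id : articleIds) {
            // These URLs contain no '&'; any URL that did would need it escaped as &amp; in XML.
            xml.append("  <url><loc>").append(BASE).append(id).append("</loc></url>\n");
        }
        return xml.append("</urlset>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(sitemap(List.of(1, 2, 3, 4)));
    }
}
```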


Please reply

Thanks

Venkat

Posted by: pandy Apr 30 2012, 06:33 AM

I don't know. If it's a forum, it would be ridiculous to do so; if it's the occasional article, it's more doable.

You're thinking about this the wrong way. Create your site and fill it with content. Then think about SEO if you need to. I'm not saying that SEO is pointless, but it isn't what makes or breaks a site. If the site is good, people will come and so will Google, SEO or not. If it sucks, it doesn't matter much how much you put into SEO. To verify this, just look at some SEO companies' sites. They claim they know all the tricks in the book, but more often than not they don't have an exceptional page rank; quite often it is the opposite. You can find suitable links to such sites by finding some spammy sigs in this and other forums.

Powered by Invision Power Board (http://www.invisionboard.com)
© Invision Power Services (http://www.invisionpower.com)