Help - Search - Members - Calendar
Full Version: Designing an efficient message board system from scratch
HTMLHelp Forums > Programming > Server-side Scripting
Christopher Tidy
Hi folks,

I want to write a message board application for a website I'm building. I want to write it from scratch, partly for the experience and partly so that I can have it exactly the way I want it.

Actually, I wrote a simple message board application a few years ago. It's PHP based. Each message is stored as a separate PHP file, in which the data is stored in separate PHP variables. The message date forms part of the file name, to allow for faster ordering of the messages. To retrieve the data, the PHP message file is included. The application works fine for a site with low traffic.

The requirements for the new message board are as follows:

* Semi-threaded (i.e., single-level threaded) design.
* Pages produced by application must be valid HTML or XHTML.
* Fonts, colours, headers and footers must be customisable.
* No JavaScript or Flash permitted.
* URL parsing required.
* HTML in postings not permitted.
* Message board must not require registration or login.
* Some rudimentary form of protection against spamming must be included (e.g., an enforced time delay between postings).
* Application must be designed to minimise load on server when posts are archived online in perpetuity.
* Likely traffic level: 5 posts per day.

So I'm wondering if the system I wrote several years ago will meet the requirements, or whether a system based around a MySQL database will be more efficient.

Can anyone advise me?

Best wishes,

Chris Tidy
Brian Chandler
QUOTE(Christopher Tidy @ Oct 30 2007, 02:19 PM) *

Hi folks,
I want to write a message board application for a website I'm building. I want to write it from scratch, partly for the experience and partly so that I can have it exactly the way I want it.


There's a lot to be said for that...

QUOTE

Actually, I wrote a simple message board application a few years ago. It's PHP based. Each message is stored as a separate PHP file, in which the data is stored in separate PHP variables. The message date forms part of the file name, to allow for faster ordering of the messages. To retrieve the data, the PHP message file is included. The application works fine for a site with low traffic.

The requirements for the new message board are as follows:

* Semi-threaded (i.e., single-level threaded) design.
* Pages produced by application must be valid HTML or XHTML.
* Fonts, colours, headers and footers must be customisable.
* No JavaScript or Flash permitted.
* URL parsing required.
* HTML in postings not permitted.
* Message board must not require registration or login.
* Some rudimentary form of protection against spamming must be included (e.g., an enforced time delay between postings).
* Application must be designed to minimise load on server when posts are archived online in perpetuity.
* Likely traffic level: 5 posts per day.

So I'm wondering if the system I wrote several years ago will meet the requirements, or whether a system based around a MySQL database will be more efficient.


The Include mechanism is basically a security hole unless only controlled people write it. So I think it's a much^H^H^H^Hvastly better idea to use a database. You will then get all obvious search, sort, and similar functions more or less for free. Here are a few comments...

* Semi-threaded (i.e., single-level threaded) design.
You mean one "forum", with many "threads". Sounds simple enough.

* Pages produced by application must be valid HTML or XHTML.
No problem at all. (Problems would be HTML problems, not PHP ones. There is no demonstratable benefit to using XHTML.)

* Fonts, colours, headers and footers must be customisable.
Oh, I thought you were writing a one-off? If you write your own program, *everything* is customizable, because you simply change the program, but (at least for unary properties) you can do all the style customising with CSS.

* No JavaScript or Flash permitted.
Makes thing simpler. You have to wash the input text anyway.

* URL parsing required.
You mean make URLs into links? Should be a library function to do that somewhere...

* HTML in postings not permitted.
But you have to all *some* formatting mechanism, to allow italics, accented characters and similar whatnot. Using UTF-8 will help.

* Message board must not require registration or login.
That's some more programming you don't have to do then.

* Some rudimentary form of protection against spamming must be included (e.g., an enforced time delay between postings).
Uh huh

* Application must be designed to minimise load on server when posts are archived online in perpetuity.
* Likely traffic level: 5 posts per day.

Google for "MYSQL millions". 1000000/5/365 = longer than I will live, at least.

Christopher Tidy
Hi Brian,

Thanks for your thoughtful response. Sorry I've taken a few days to follow it up.

QUOTE(Brian Chandler @ Oct 30 2007, 02:55 PM) *

The Include mechanism is basically a security hole unless only controlled people write it. So I think it's a much^H^H^H^Hvastly better idea to use a database. You will then get all obvious search, sort, and similar functions more or less for free. Here are a few comments...


Is it a security hole because someone with no access to the server could, by editing the URL, provoke the inclusion of a script which wasn't meant to be used in that way, perhaps with unexpected consequences? Or is it only a security hole if you don't entirely trust the people who have access to the server?

QUOTE

* Semi-threaded (i.e., single-level threaded) design.
You mean one "forum", with many "threads". Sounds simple enough.


I mean that if you reply to the first message in a thread, that message is treated in the same way as if you reply to a reply. I would regard this forum at HTMLHelp as being single-level threaded. Sometimes you see a message board where all the messages are shown in a kind of tree structure, and a reply to a reply is treated differently from a reply to an original message. I think this looks messy, so I'd rather avoid it. But as I only anticipate a moderate amount of activity, I'd like to show links to individual messages on the index page, so that you can go straight to a new reply.

QUOTE

* Pages produced by application must be valid HTML or XHTML.
No problem at all. (Problems would be HTML problems, not PHP ones. There is no demonstratable benefit to using XHTML.)


I guess I just like the well-formed nature of XHTML.

QUOTE

* Fonts, colours, headers and footers must be customisable.
Oh, I thought you were writing a one-off? If you write your own program, *everything* is customizable, because you simply change the program, but (at least for unary properties) you can do all the style customising with CSS.


I am. But I'd like to keep the headers, footers and styling separate from the functional code, just in case I choose to use the message board for a second site in the future.

QUOTE

* No JavaScript or Flash permitted.
Makes thing simpler. You have to wash the input text anyway.

* URL parsing required.
You mean make URLs into links? Should be a library function to do that somewhere...


I've actually never managed to find a URL parsing library function in PHP before, but I could look up the official syntax for URLs and write one using a regular expression.

QUOTE

* HTML in postings not permitted.
But you have to all *some* formatting mechanism, to allow italics, accented characters and similar whatnot. Using UTF-8 will help.


It's a matter of opinion I think. I'm not a huge fan of message board systems which allow users to do a lot of text formatting. I might include a "quote" feature, but that's probably all. After all, Usenet works well without formatting.

QUOTE

* Message board must not require registration or login.
That's some more programming you don't have to do then.


I will probably have an administrator login feature for myself, probably setting a cookie with an MD5 hash of my password on the local computer.

QUOTE

* Some rudimentary form of protection against spamming must be included (e.g., an enforced time delay between postings).
Uh huh

* Application must be designed to minimise load on server when posts are archived online in perpetuity.
* Likely traffic level: 5 posts per day.

Google for "MYSQL millions". 1000000/5/365 = longer than I will live, at least.


It seems like a database is the best choice, even if it involves a bit more learning.

Would you also expect a database to be more efficient in terms of the load on the server (compared to messages stored as individual files) when the number of messages stored is large? I would anticipate that only the most recent messages will be viewed frequently, and the older messages will be viewed much less often.

Thanks very much for the advice.

Best wishes,

Chris
Brian Chandler
[quote name='Christopher Tidy' date='Nov 6 2007, 12:23 PM' post='17953']
Hi Brian,

Thanks for your thoughtful response. Sorry I've taken a few days to follow it up.

[quote name='Brian Chandler' post='17725' date='Oct 30 2007, 02:55 PM']
The Include mechanism is basically a security hole unless only controlled people write it. So I think it's a much^H^H^H^Hvastly better idea to use a database. You will then get all obvious search, sort, and similar functions more or less for free. Here are a few comments...
[/quote]

Is it a security hole because someone with no access to the server could, by editing the URL, provoke the inclusion of a script which wasn't meant to be used in that way, perhaps with unexpected consequences? Or is it only a security hole if you don't entirely trust the people who have access to the server?
[/quote]

I mean that if you do an include on a file you haven't eyeballed, or someone you trust hasn't, then you really have to be very careful what you allow in it - you have to have thought of all possible tricks. But if you are just echoing plaintext, there is no /include/ injection risk, because the plaintext is not "executed" (in any sense). If you try to have the parameterising stuff in the plaintext, without using include, you have to parse it . It's just much simpler letting a database provide the field structure. (Hope you see what I mean.)


[quote]
[quote]
* Semi-threaded (i.e., single-level threaded) design.
You mean one "forum", with many "threads". Sounds simple enough.
[/quote]

I mean that if you reply to the first message in a thread, that message is treated in the same way as if you reply to a reply. I would regard this forum at HTMLHelp as being single-level threaded. Sometimes you see a message board where all the messages are shown in a kind of tree structure, and a reply to a reply is treated differently from a reply to an original message. I think this looks messy, so I'd rather avoid it. But as I only anticipate a moderate amount of activity, I'd like to show links to individual messages on the index page, so that you can go straight to a new reply.

[quote]
* Pages produced by application must be valid HTML or XHTML.
No problem at all. (Problems would be HTML problems, not PHP ones. There is no demonstratable benefit to using XHTML.)
[/quote]

I guess I just like the well-formed nature of XHTML.
[/quote]

[quote]
Hmm. Can you give a coherent definition of "well-formed" such that XHTML is, and HTML isn't? XHTML is valid XML (if I've got it right), HTML is valid HTML. Neither is quite valid SGML (or something; I forget the details). But it you use XHTML to serve pages to current browsers, AIUI you are relying on some pretty ill-formed kludges, so it's hardly very philosophically pukkah.

[quote]
* Fonts, colours, headers and footers must be customisable.
Oh, I thought you were writing a one-off? If you write your own program, *everything* is customizable, because you simply change the program, but (at least for unary properties) you can do all the style customising with CSS.
[/quote]

I am. But I'd like to keep the headers, footers and styling separate from the functional code, just in case I choose to use the message board for a second site in the future.
[/quote]
OK, well just put the CSS and stuff for the furniture in a separate file from the body text stuff.

Anyway, I advocate POD (Programming On Demand). The best way to customise to a second site is just to change the bits you need to change. After doing this for about four sites, you *then* know enough about it to make sensible decisions when writing a "proper" customization system.

[quote]

[quote]
* No JavaScript or Flash permitted.
Makes thing simpler. You have to wash the input text anyway.

* URL parsing required.
You mean make URLs into links? Should be a library function to do that somewhere...
[/quote]

I've actually never managed to find a URL parsing library function in PHP before, but I could look up the official syntax for URLs and write one using a regular expression.

[quote]
* HTML in postings not permitted.
But you have to all *some* formatting mechanism, to allow italics, accented characters and similar whatnot. Using UTF-8 will help.
[/quote]

It's a matter of opinion I think. I'm not a huge fan of message board systems which allow users to do a lot of text formatting. I might include a "quote" feature, but that's probably all. After all, Usenet works well without formatting.

[quote]
* Message board must not require registration or login.
That's some more programming you don't have to do then.
[/quote]

I will probably have an administrator login feature for myself, probably setting a cookie with an MD5 hash of my password on the local computer.

[quote]
* Some rudimentary form of protection against spamming must be included (e.g., an enforced time delay between postings).
Uh huh

* Application must be designed to minimise load on server when posts are archived online in perpetuity.
* Likely traffic level: 5 posts per day.

Google for "MYSQL millions". 1000000/5/365 = longer than I will live, at least.
[/quote]

It seems like a database is the best choice, even if it involves a bit more learning.

Would you also expect a database to be more efficient in terms of the load on the server (compared to messages stored as individual files) when the number of messages stored is large? I would anticipate that only the most recent messages will be viewed frequently, and the older messages will be viewed much less often.
[/quote]

I am no expert. Frankly I have no grasp at all of how big the overhead of a trip to the DB server. But I'm still sure that problems will be slower coming with a DB than with any simple file-based system. File systems just weren't designed with the assumption that people would have thousands of files (I think).

HTH
Brian Chandler
http://imaginatorium.org

***************** SORRY! No idea why the QUOTEs are broken....
Christopher Tidy
QUOTE(Brian Chandler @ Nov 6 2007, 07:40 PM) *

I am no expert. Frankly I have no grasp at all of how big the overhead of a trip to the DB server. But I'm still sure that problems will be slower coming with a DB than with any simple file-based system. File systems just weren't designed with the assumption that people would have thousands of files (I think).


Intuitively, that makes good sense. A database (I would imagine) will be designed with the possibility of a very large number of records in mind, whereas a system involving files will probably just be designed to function at the time of testing.

I guess it's a possibility that a system using individual files would be more efficient than a database for a small number of messages, but less efficient for a large number. But I'm interested in what happens for large numbers.

Thanks for your advice. Time to learn some MySQL I think. I'll visit again when the project is finished.

Best wishes,

Chris
This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.
Invision Power Board © 2001-2010 Invision Power Services, Inc.