The Web Design Group

... Making the Web accessible to all.

> "Web safe" file name characters
Christian J
post Jun 18 2009, 05:52 PM
Post #1


.
********

Group: WDG Moderators
Posts: 9,656
Joined: 10-August 06
Member No.: 7



I'm writing a script that accepts file uploads from the user's computer, and I want the script to make sure the file names are appropriate for the web. For practical purposes I'll probably not allow uppercase characters or spaces.

Is it a good idea to change an inappropriate file name automatically (with a message that "original file name" was changed to "new file name"), or should I ask the user to rename the file himself before uploading it again? The latter means more work for the user, but the renamed file on his computer will then have the same name as the uploaded file, which might be good for future reference.

What to do with non-reserved special characters? http://www.ietf.org/rfc/rfc1738.txt says that

QUOTE
only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL

...which I gather means that servers and browsers understand all of these in file names:

CODE
$-_.+!*'(),

...but are there other practical pitfalls with any of them?
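A minimal Python sketch of checking a file name against the character set quoted from RFC 1738 above. The helper name `unsafe_chars` is hypothetical, and the check says nothing about how a particular server or file system treats these characters.

```python
import string

# Characters the RFC 1738 quote allows unencoded (reserved characters aside):
# alphanumerics plus the specials $-_.+!*'(),
RFC1738_SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def unsafe_chars(file_name):
    """Return the sorted, distinct characters that would need encoding in a URL."""
    return sorted({ch for ch in file_name if ch not in RFC1738_SAFE})

print(unsafe_chars("photo(1).png"))    # []
print(unsafe_chars("my photo#2.png"))  # [' ', '#']
```

An empty list means the name could appear in a URL as-is under that rule.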

What to do with non-ASCII alphabetical characters? I might let the script rename those I can anticipate (e.g. the Swedish letters "å ä ö" can be renamed to "a a o" or "aa ae oe"), but what to do with the rest? Replace all of them with an "x"? Or, if the script deletes them, what to do if nothing remains of the file name?
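One way the renaming idea could look in Python, as a sketch only: anticipated letters use an explicit table (the "aa ae oe" style is assumed here), other accented letters are reduced to their base letter, anything else becomes "x", and a fallback name is used when no character could be kept. The names `KNOWN`, `ascii_stem` and the fallback "file" are hypothetical, and the function handles only the part of the name before the extension.

```python
import unicodedata

# Explicit replacements for letters we anticipate (assumed "aa ae oe" style).
KNOWN = {"å": "aa", "ä": "ae", "ö": "oe", "Å": "Aa", "Ä": "Ae", "Ö": "Oe"}
ALLOWED = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")

def ascii_stem(stem, fallback="file"):
    """Reduce the stem of a file name (no extension) to ASCII letters, digits, _ and -."""
    stem = unicodedata.normalize("NFC", stem)  # compose "a" + ring into "å" first
    out = []
    kept_any = False
    for ch in stem:
        if ch in KNOWN:
            out.append(KNOWN[ch])
            kept_any = True
            continue
        # Split e.g. "é" into "e" plus a combining accent, then drop non-ASCII parts.
        base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
        base = "".join(c for c in base if c in ALLOWED)
        if base:
            out.append(base)
            kept_any = True
        else:
            out.append("x")  # placeholder for anything we cannot map
    # If nothing recognisable survived, use the fallback instead of a run of "x".
    return "".join(out) if kept_any else fallback

print(ascii_stem("småkakor"))  # smaakakor
print(ascii_stem("café"))      # cafe
print(ascii_stem("日本語"))     # file
```

Two different originals can still collapse to the same result, so a real script would also need to check for name collisions.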

Anything else I haven't thought about?
Brian Chandler
post Jun 19 2009, 02:59 AM
Post #2


Jocular coder
********

Group: Members
Posts: 2,460
Joined: 31-August 06
Member No.: 43



Depends on the context of these uploads. After the user has uploaded it, what happens?

In general you should probably just convert the filenames to URL-encoded or somesuch. If you are just providing online storage for some reason, you can then show them the names as they were uploaded. Using an ad hoc kludge for Swedish seems a bad idea.

It seems a Very Bad idea to bar 26 of the 52 commonest characters! If your users have operating systems with flaky ideas about case, well, that's their problem, I'd say.
Christian J
post Jun 19 2009, 08:38 AM
Post #3





QUOTE(Brian Chandler @ Jun 19 2009, 09:59 AM) *

Depends on the context of these uploads. After the user has uploaded it, what happens?

Image uploads to a web site, so browsers must understand the file names.

QUOTE
In general you should probably just convert the filenames to URL-encoded or somesuch.

But then the user may not recognize it anymore.

QUOTE
It seems a Very Bad idea to bar 26 of the 52 commonest characters!

Yes I guess case doesn't matter anyway, since the script generates the HTML for IMG elements automatically based on the file names. When people get problems with this it's because they write the HTML manually and forget the case.


Brian Chandler
post Jun 19 2009, 11:28 AM
Post #4





QUOTE(Christian J @ Jun 19 2009, 10:38 PM) *

QUOTE(Brian Chandler @ Jun 19 2009, 09:59 AM) *

Depends on the context of these uploads. After the user has uploaded it, what happens?

Image uploads to a web site, so browsers must understand the file names.

QUOTE
In general you should probably just convert the filenames to URL-encoded or somesuch.

But then the user may not recognize it anymore.


Well, obviously you can _show_ the users the name they gave it, even in hieroglyphics; you only need to convert it for use in the URL or filename.
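A short Python sketch of that separation, assuming UTF-8 and the standard `urllib.parse` functions; `stored_name` and `display_name` are hypothetical helper names. The stored name is plain ASCII, and the original name is recovered only for display.

```python
from urllib.parse import quote, unquote

def stored_name(original):
    """Name to use on disk: percent-encoded, with "/" encoded too (safe="")."""
    return quote(original, safe="")

def display_name(stored):
    """Name to show the user: the original, recovered from the stored form."""
    return unquote(stored)

on_disk = stored_name("åäö.png")
print(on_disk)                # %C3%A5%C3%A4%C3%B6.png
print(display_name(on_disk))  # åäö.png
```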
Christian J
post Jun 19 2009, 05:37 PM
Post #5





QUOTE(Brian Chandler @ Jun 19 2009, 06:28 PM) *

obviously you can _show_ the users the name they gave it, even in hieroglyphics; you only need to convert it for use in the URL or filename.

What if the user later downloads the urlencoded file (perhaps because he lost it on his own computer), and the downloaded file is called something else than it's shown as on the web page?

BTW I get some bug when urlencoding uploaded files. A file called "åäö.png" is saved urlencoded by the script as "%E5%E4%F6.png", and it exists in my script's upload directory under that name, but when my browser requests that file the server returns either error 403 or 404. But on the server's directory listing page the link looks like

CODE
<a href="%25E5%25E4%25F6.png">%E5%E4%F6.png</a>

and with that HREF value the browser finds the image.
Brian Chandler
post Jun 19 2009, 11:54 PM
Post #6





QUOTE
What if the user later downloads the urlencoded file (perhaps because he lost it on his own computer), and the downloaded file is called something else than it's shown as on the web page?


By the "urlencoded file", you mean the file whose filename - as stored on your server - is the urlencoded version of the original name. Remember that "urlencoding" simply means changing one string to another.

I would have expected to show the users a list of their uploaded files, in which case the text you show is the original filename, but the actual file sent is the same as it was.

I suppose if you expect people to just type in a URL including their own filename, hmm, well you might be stuck. AFAIK, UNIX filenames cannot include '/' (and you surely don't want people creating arbitrary directory structures on your server?), so it's really not plausible.

As for the file called %E5%E4%F6.png ... obviously to access it with a URL, you need to urlencode the name, replacing '%' -- the only problem character -- with its urlencoded form of %25. So of course the following works:

<a href="%25E5%25E4%25F6.png">%E5%E4%F6.png</a>

This is not a bug.
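The two encoding steps can be reproduced in Python as a sketch. The Latin-1 encoding is an assumption inferred from the bytes %E5%E4%F6 in the example; the thread does not say which language or encoding the upload script uses.

```python
from urllib.parse import quote, unquote

original = "åäö.png"

# Step 1: the upload script saves the file under the urlencoded name.
# Assumption: Latin-1, which matches the %E5%E4%F6 seen above.
stored = quote(original, encoding="latin-1")
print(stored)  # %E5%E4%F6.png

# Step 2: the stored name contains literal "%" characters, so a URL that
# points at it must encode them again as %25.
href = quote(stored)
print(href)  # %25E5%25E4%25F6.png

# The server decodes the URL once and arrives at the name on disk.
assert unquote(href) == stored
```

Requesting `%E5%E4%F6.png` directly makes the server decode it once and look for a file whose name contains the raw bytes, which does not exist.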

Christian J
post Jun 20 2009, 08:11 AM
Post #7





QUOTE(Brian Chandler @ Jun 20 2009, 06:54 AM) *

I suppose if you expect people to just type in a URL including their own filename, hmm, well you might be stuck.

Or if the user right-clicks on an image and gets confused when seeing a different file name. Or if someone later wants to manage the files with an FTP program, and the urlencoded file names are not descriptive at all. Or if the images are meant to be sorted by name when displayed on a web page, and the user originally named them with that in mind (this also becomes a problem when converting "å ä ö" to "a a o" or similar).

Seems much simpler to give the user clear rules about which file name characters he can use, probably

CODE
a-zA-Z0-9_-.

and refuse everything else.
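Such a rule could be enforced with a check like the following Python sketch. The function name `is_allowed` is hypothetical, and rejecting the names "." and ".." is an extra precaution assumed here, not something stated above.

```python
import re

# Whole name must consist of the allowed characters only.
# The hyphen is placed last in the class so it is not read as a range.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")

def is_allowed(file_name):
    """True if file_name uses only a-z, A-Z, 0-9, underscore, hyphen and dot."""
    if file_name in {".", ".."}:  # assumption: directory references are refused
        return False
    return ALLOWED_NAME.fullmatch(file_name) is not None

print(is_allowed("photo_01.png"))  # True
print(is_allowed("my photo.png"))  # False
print(is_allowed("åäö.png"))       # False
```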