"Web safe" file name characters |
"Web safe" file name characters |
Christian J |
Jun 18 2009, 05:52 PM
Post
#1
|
. Group: WDG Moderators Posts: 9,656 Joined: 10-August 06 Member No.: 7 |
I'm writing a script that accepts file uploads from the user's computer, and I want the script to make sure the file names are appropriate for the web. For practical purposes I'll probably not allow uppercase characters or spaces.
Is it a good idea to change an inappropriate file name automatically (with a message that "original file name" was changed to "new file name"), or should I ask the user to rename the file himself before uploading it again? The latter means more work for the user, but the renamed file on his computer will then have the same name as the uploaded file, which might be good for future reference.

What to do with non-reserved special characters? http://www.ietf.org/rfc/rfc1738.txt says that
QUOTE only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL
...which I gather means that servers and browsers understand all of these in file names:
CODE $-_.+!*'(),
...but are there other practical pitfalls with any of them?

What to do with non-ASCII alphabetical characters? I might let the script rename those I can anticipate (e.g. the Swedish letters "å ä ö" can be renamed to "a a o" or "aa ae oe"), but what to do with the rest? Replace all of them with an "x"? Or, if the script deletes them, what to do if nothing remains of the file name?

Anything else I haven't thought about? |
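A minimal sketch of the automatic-renaming option raised in the post above, written in Python (the thread does not say which language the upload script uses). The function name `sanitize_filename`, the transliteration table, and the `"file"` fallback for names that end up empty are all assumptions for illustration, not anything the poster specified.

```python
import re
import unicodedata

# Hypothetical table: the post only mentions the Swedish letters.
TRANSLITERATIONS = {"å": "aa", "ä": "ae", "ö": "oe"}


def sanitize_filename(original, fallback="file"):
    """Return a lowercase ASCII name using only a-z, 0-9, '_', '-' and '.'."""
    # Compose first so that "a" + combining ring matches "å" in the table.
    name = unicodedata.normalize("NFC", original).lower()
    for char, replacement in TRANSLITERATIONS.items():
        name = name.replace(char, replacement)
    # Decompose remaining accented letters and drop everything non-ASCII.
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Spaces become underscores; any other disallowed character is deleted.
    name = re.sub(r"\s+", "_", name.strip())
    name = re.sub(r"[^a-z0-9_.-]", "", name)
    # Split off the extension so an emptied stem can be replaced.
    stem, dot, extension = name.rpartition(".")
    if not dot:  # no extension at all
        stem, extension = extension, ""
    stem = stem.strip(".") or fallback
    return f"{stem}.{extension}" if extension else stem


original = "Åäö Bild.PNG"
renamed = sanitize_filename(original)
if renamed != original:
    print(f'"{original}" was changed to "{renamed}"')
```

Two different originals can collapse to the same result (for example, every name written entirely in a non-Latin script becomes `file` plus its extension), so a real script would also need to check for collisions before saving.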
Brian Chandler |
Jun 19 2009, 02:59 AM
Post
#2
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
Depends on the context of these uploads. After the user has uploaded it, what happens?
In general you should probably just convert the filenames to URL-encoded or somesuch. If you are just providing online storage for some reason, you can then show them the names as they were uploaded. Using an ad hoc kludge for Swedish seems a bad idea.

It seems a Very Bad idea to bar 26 of the 52 commonest characters! If your users have operating systems with flaky ideas about case, well, that's their problem, I'd say. |
Christian J |
Jun 19 2009, 08:38 AM
Post
#3
|
. Group: WDG Moderators Posts: 9,656 Joined: 10-August 06 Member No.: 7 |
QUOTE Depends on the context of these uploads. After the user has uploaded it, what happens?
Image uploads to a web site, so browsers must understand the file names.

QUOTE In general you should probably just convert the filenames to URL-encoded or somesuch.
But then the user may not recognize it anymore.

QUOTE It seems a Very Bad idea to bar 26 of the 52 commonest characters!
Yes, I guess case doesn't matter anyway, since the script generates the HTML for IMG elements automatically based on the file names. When people have problems with this, it's because they write the HTML manually and forget the case. |
Brian Chandler |
Jun 19 2009, 11:28 AM
Post
#4
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE Depends on the context of these uploads. After the user has uploaded it, what happens?
QUOTE Image uploads to a web site, so browsers must understand the file names.
QUOTE In general you should probably just convert the filenames to URL-encoded or somesuch.
QUOTE But then the user may not recognize it anymore.

Well, obviously you can _show_ the users the name they gave it, even in hieroglyphics; you only need to convert it for use in the URL or filename. |
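One way to read the suggestion above is that the stored name and the displayed name are two views of the same string. A minimal Python sketch, assuming UTF-8 percent-encoding and the hypothetical helper names `stored_name` and `display_name`:

```python
from urllib.parse import quote, unquote


def stored_name(original):
    """Percent-encode the uploaded name for use as the file name on disk."""
    # safe="" means "/" is encoded as well, so the name cannot
    # introduce a directory separator.
    return quote(original, safe="")


def display_name(stored):
    """Recover the name the user originally gave, for showing on a page."""
    return unquote(stored)


on_disk = stored_name("åäö.png")
print(on_disk)                # the ASCII-only name saved on the server
print(display_name(on_disk))  # the name shown to the user
```

Because the encoding is reversible, no separate lookup table is needed to show the original name. Note that `quote` leaves "." untouched, so names such as "." and ".." would still need to be rejected separately.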
Christian J |
Jun 19 2009, 05:37 PM
Post
#5
|
. Group: WDG Moderators Posts: 9,656 Joined: 10-August 06 Member No.: 7 |
QUOTE obviously you can _show_ the users the name they gave it, even in hieroglyphics; you only need to convert it for use in the URL or filename.
What if the user later downloads the urlencoded file (perhaps because he lost it on his own computer), and the downloaded file is called something else than it's shown as on the web page?

BTW, I'm seeing what looks like a bug when urlencoding uploaded files. A file called "åäö.png" is saved urlencoded by the script as "%E5%E4%F6.png", and it exists in my script's upload directory under that name, but when my browser requests that file the server returns either error 403 or 404. But on the server's directory listing page the link looks like
CODE <a href="%25E5%25E4%25F6.png">%E5%E4%F6.png</a>
and with that HREF value the browser finds the image. |
Brian Chandler |
Jun 19 2009, 11:54 PM
Post
#6
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE What if the user later downloads the urlencoded file (perhaps because he lost it on his own computer), and the downloaded file is called something else than it's shown as on the web page?
By the "urlencoded file", you mean the file whose filename, as stored on your server, is the urlencoded version of the original name. Remember that "urlencoding" simply means changing one string to another. I would have expected you to show the users a list of their uploaded files, in which case the text you show is the original filename, but the actual file sent is the same as it was.

I suppose if you expect people to just type in a URL including their own filename, hmm, well you might be stuck. AFAIK, UNIX filenames cannot include '/' (and you surely don't want people creating arbitrary directory structures on your server?), so it's really not plausible.

As for the file called %E5%E4%F6.png ... obviously to access it with a URL, you need to urlencode the name, replacing '%' (the only problem character) with its urlencoded form, %25. So of course the following works:
CODE <a href="%25E5%25E4%25F6.png">%E5%E4%F6.png</a>
This is not a bug. |
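The double encoding explained above can be reproduced in a few lines of Python. This is a sketch under two assumptions: the upload script encoded the name as Latin-1 (which is what yields %E5%E4%F6 for "åäö"), and the server percent-decodes the request path exactly once before looking for the file. The function names are hypothetical.

```python
from urllib.parse import quote, unquote


def stored_name_latin1(original):
    """Name the upload script wrote to disk: percent-encoded Latin-1."""
    return quote(original, safe="", encoding="latin-1")


def href_for(stored):
    """URL for a file whose name on disk contains literal '%' characters."""
    # Each '%' in the stored name becomes '%25' in the URL.
    return quote(stored, safe="")


def name_server_looks_up(href):
    """What the server searches for after decoding the request path once."""
    return unquote(href, encoding="latin-1")


on_disk = stored_name_latin1("åäö.png")  # "%E5%E4%F6.png"
good = href_for(on_disk)                 # "%25E5%25E4%25F6.png"

# Requesting the doubly encoded URL decodes back to the name on disk.
print(name_server_looks_up(good) == on_disk)
# Requesting the stored name directly decodes to "åäö.png", which is
# not the name of any file in the directory, hence the error.
print(name_server_looks_up(on_disk) == on_disk)
```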
Christian J |
Jun 20 2009, 08:11 AM
Post
#7
|
. Group: WDG Moderators Posts: 9,656 Joined: 10-August 06 Member No.: 7 |
QUOTE I suppose if you expect people to just type in a URL including their own filename, hmm, well you might be stuck.
Or if the user right-clicks on an image and gets confused when seeing a different file name. Or if someone later wants to manage the files with an FTP program, and the urlencoded file names are not descriptive at all. Or if the images are meant to be sorted by name when displayed on a web page, and the user originally named them with that in mind (this also becomes a problem when converting "å ä ö" to "a a o" or similar).

Seems much simpler to give the user clear rules about which file name characters he can use, probably
CODE a-zA-Z0-9_-.
and refuse everything else. |
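The "refuse everything else" rule above amounts to a whitelist check. A minimal Python sketch, with the hypothetical function name `is_acceptable`; rejecting names that begin with a dot (which covers ".", ".." and files like ".htaccess") is an extra assumption not stated in the post.

```python
import re

# The character set proposed in the post: a-zA-Z0-9_-.
ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")


def is_acceptable(filename):
    """True if the whole name consists only of whitelisted characters."""
    if filename.startswith("."):  # assumption: no hidden or special names
        return False
    # fullmatch requires every character to be in the set and
    # rejects the empty string.
    return ALLOWED.fullmatch(filename) is not None


for candidate in ["holiday_01.png", "åäö.png", "my photo.png"]:
    verdict = "accepted" if is_acceptable(candidate) else "refused"
    print(f"{candidate}: {verdict}")
```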