Help - Search - Members - Calendar
Full Version: Importing Data, PHPAdmin, Invisible Characters
HTMLHelp Forums > Programming > Databases
colinkites2000
Hello,

I've been trying to import data into a simple table into PHPmyadmin that will dictate the location of my products. There are 2 columns; product_id and location.

When I try to select the data based on the location, it does not find anything when I use;

CODE
WHERE location = 'index2'


However when I use

CODE
WHERE location LIKE "%index2%"


the data is found.

I checked the data and there are no visible spaces before and after. When I delete the location record by clicking NULL in PHPmyadmin and then re-type the exact same data, it will select. Does anyone have any idea why this might be happening and / or how I can fix it?

Best,
C
Brian Chandler
QUOTE(colinkites2000 @ Aug 7 2009, 10:07 PM) *

Hello,

I've been trying to import data into a simple table into PHPmyadmin that will dictate the location of my products. There are 2 columns; product_id and location.



This doesn't quite make sense. You can't "import into Phpmyadmin" -- you may import into your database *using* Phpma, or you may look at the data afterwards using Phpma...


QUOTE


When I try to select the data based on the location, it does not find anything when I use;

CODE
WHERE location = 'index2'


However when I use

CODE
WHERE location LIKE "%index2%"


the data is found.

I checked the data and there are no visible spaces before and after. When I delete the location record by clicking NULL in PHPmyadmin and then re-type the exact same data, it will select. Does anyone have any idea why this might be happening and / or how I can fix it?


The evidence is that there are "invisible characters" there. Try LIKE "%index2" and LIKE "index2%" to see which end (if not both).

Then use php functions to print the byte values one by one; or just use a utility like od to look at the source text. Unfortunately there are all sorts of (generally stupid) ideas around for adding bits of crud to fix some particular situation, while messing up all others. If the data is in UTF-8 and you have let M$ anywhere near it for example, you may find "BOM" starter bytes, which you need to remove.

colinkites2000
I'm not sure I understand the statement "to print the byte values one by one; or just use a utility like od to look at the source text." Can you be more specific in terms of what this will do and the reason for it?


QUOTE
The evidence is that there are "invisible characters" there. Try LIKE "%index2" and LIKE "index2%" to see which end (if not both).

Then use php functions to print the byte values one by one; or just use a utility like od to look at the source text. Unfortunately there are all sorts of (generally stupid) ideas around for adding bits of crud to fix some particular situation, while messing up all others. If the data is in UTF-8 and you have let M$ anywhere near it for example, you may find "BOM" starter bytes, which you need to remove.
Brian Chandler
QUOTE
I'm not sure I understand the statement "to print the byte values one by one; or just use a utility like od to look at the source text." Can you be more specific in terms of what this will do and the reason for it?



Yes, the basic point is thatany text handled in software is *actually* a sequence of bytes. It is *not* a sequence of squiggles drawn on a screen. In the good old (simple) days, the bytes were ASCII codes for enough characters to write (semi-literate) English and programs in C or the like; now typically the bytes are actually an encoding of Unicode character representations, of which there are far too many to remember, and which include all sorts of control codes that probably don't mean anything. So you need to be able to look at these codes.

If you have a source text file (not "Excel" or whatever), you can look at it with a dump utility such as od (Google od+utility: eg http://www.linuxjournal.com/article/1326 )

Or you can write a little bit of debugging code in your php program, using ord() to get the numeric values of the string:
http://jp2.php.net/manual/en/function.ord.php

If you get a sequence of more numbers than there are ASCII characters, you know there is some extre crud in there, which you can remove with a bit of code in the importing program.

If you have a source text file with the data, you could put it here as an attachment (perhaps zip it first, so it can't be "corrected" by some "helpful" software along the line).
colinkites2000
Thanks for this Brian. I was comparing this table with my other table which seems to find the data fine. The main difference is that this column in my new location table is the final column. Another clue about this was when I tried your suggestion of LIKE '%column' and LIKE 'column%'. Since only the second one returned results, it became apparent that something may be going on with the carriage return/line break portion of the data. So I changed my export setting from the spreadsheet program to use a different line break and now everything seems to be working flawlessly.

Typically do people use spreadsheet programs like Excel to create their data and then export to CSV? This seemed like a reasonable route but is there a better or standardized way that might not allow for as many potential errors/extra bytes?


QUOTE(Brian Chandler @ Aug 8 2009, 12:58 AM) *

QUOTE
I'm not sure I understand the statement "to print the byte values one by one; or just use a utility like od to look at the source text." Can you be more specific in terms of what this will do and the reason for it?



Yes, the basic point is thatany text handled in software is *actually* a sequence of bytes. It is *not* a sequence of squiggles drawn on a screen. In the good old (simple) days, the bytes were ASCII codes for enough characters to write (semi-literate) English and programs in C or the like; now typically the bytes are actually an encoding of Unicode character representations, of which there are far too many to remember, and which include all sorts of control codes that probably don't mean anything. So you need to be able to look at these codes.

If you have a source text file (not "Excel" or whatever), you can look at it with a dump utility such as od (Google od+utility: eg http://www.linuxjournal.com/article/1326 )

Or you can write a little bit of debugging code in your php program, using ord() to get the numeric values of the string:
http://jp2.php.net/manual/en/function.ord.php

If you get a sequence of more numbers than there are ASCII characters, you know there is some extre crud in there, which you can remove with a bit of code in the importing program.

If you have a source text file with the data, you could put it here as an attachment (perhaps zip it first, so it can't be "corrected" by some "helpful" software along the line).
Brian Chandler
QUOTE
Typically do people use spreadsheet programs like Excel to create their data and then export to CSV? This seemed like a reasonable route but is there a better or standardized way that might not allow for as many potential errors/extra bytes?


Well, you may always need to import data via CSV, or similar format, but this is always a bit tedious, precisely because of getting the fiddly details like terminators/separators correct.

If you are going to make a fresh data table, it is very likely you need to knock up an admin script for updating it, and personally I would normally use that to enter the data in the first place (assuming it really comes from keystrokes). This initial stage often helps sort out bugs in the way you have designed things.

This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.
Invision Power Board © 2001-2009 Invision Power Services, Inc.