Strategy for the column naming

Emile_Schwarz · November 30, 2022, 12:32pm

The reference usually says about how to name a COLUMN in SQL:
Column names must contain only A to Z, 0 to 9, and underscore (_) characters

What strategy do you use if you have, say space, # or any other illegal character in your COLUMN NAME ?

Example:
Your database is about a magazine and you have for the first COLUMN “Issue #"…
You may, in a different database, use “Date (Start)” and “Date (End)” as COLUMN names…

Of course, I can set a filter to avoid illegal characters in the COLUMN name(s), but in 2022 ? Nowaday, only very few characters are illegal in the OS file names…

The question is:

How do you deal with these “illegal” characters ?
(or “What is your strategy to have your own COLUMN names displayed to the users…”)

Jeff_Tullin · November 30, 2022, 12:38pm

Often a bad idea… column names are often shorthand or not user-friendly.

What strategy do you use if you have, say space, # or any other illegal character in your COLUMN NAME ?

I don’t use illegal characters, no matter what year it is.
Nothing to stop you having a column called ISSNUM and showing the value on screen next to a label showing “Issue #”

Can you access these fields in whatever database you are looking at by simply enclosing the name in Quotes?
I have seen some systems (eg SQL Server) where you use brackets to resolve this too…

eg [Issue #]

Emile_Schwarz · November 30, 2022, 1:01pm

Thank you Jeff for your answer.

I think I miss an information.

What about a TABLE who will hold the user displayed name ?

The DB is defined at creation; later modification may or may not be allowed…

Rick_Araujo · November 30, 2022, 1:16pm

There’s a very known strategy: don’t try it, use the known standard.

Some DBMS allows SOME unusual chars, and they allow it enclosed in some kind of brackets, mostly just quoted “My Silly Column Name” or [My Silly Column Name]

The universal rule says “lowercase chars, ASCII a-z 0-9 chars only, underscores as separators, avoid larger names, max 30 chars”

If you want that users use some “friendly name” to a column, you should develop some kind of aliasing engine where a display name as [My silly field for name #3] end just as “name_3” or “name3” in the DBMS but never allowing the user (as they are not trained for) naming fields freely. If you allow, your naming engine should follow the universal rule to avoid troubles and crashes and even migration without fail to diverse DBMS systems in the future.

Jeff_Tullin · November 30, 2022, 1:26pm

Why not. Thats just a lookup.
Reading your first post, I could believe you are confusing the name of the column with the contents.

But the bottom line is this:
If you have an edit screen that allows people to edit the contents of a record,
what you display as the field name is under your control, it does NOT have to match the actual column name , and can even be dynamic so that the language changes.

eg Column backColor
could be exposed to the user as Back Color, Back Colour, Hintergrundfarbe
etc.

Emile_Schwarz · November 30, 2022, 1:38pm

Thanks Rick,

some years ago, I forget that and used “Illegal Characters”.

SQLite complain at DB creation time. So, I searched for a work-around and used a TABLE to do that. But that DB cannot be changed (except by me).
I never read anything about that potential problem since 2013 (that software is in use since april 2012).

Today was the day I had to ask how people do:
Enforce the use of A-Z, 0-9 and underscore
or
Use a strategy to display what a user want and do some trick to use a legal name internally / display the user strings.

Example of User inertia (with data base):
for a ProBono application, I used SQLDates (Birth Date, Date In / Date Out) for either display and internally (just because I was in a hurry for the first version.

In a second version, I added three PopupMenus to enforce the dates (Day, Month and Year) and store the result as SQLDate. I have my three dates OK.

Until one day, they wanted to have a new Column (Called Childs) and they put the Child(s) birth date using all kinds for date format, even in the same field, one child date can be DD-MM-YY and its brother or sister can be YY-DD-MM or whatever format they have in mind (the field separator goes from space, dot, - or /…)
And one day, they asked me for statistic about the childs. My answer was: Yes, of course, but you have to normalyze the birth dates first !… two years ago, they never normalyse the birth dates … I never implemented anything to deal with these…

This is the reason why I asked today this question.

Emile_Schwarz · November 30, 2022, 1:41pm

That was the idea: Column Name (db internal) vs Header Name (screen display).

Rick_Araujo · November 30, 2022, 1:50pm

If you have a controlled environment, not a chaotic environment, and you know what’s all the column names and contents, you should implement a DB versioning system. Let’s for example add a column to a table called “configs” named “db_version”.

If you don’t find such column, you know your db is a very old version, rename fields for the proper ones, fix some data, and set “db_version” to “1.0”

If you find “1.0” there, just run your new version of the app with the new columns and corrections.

If you find something else, as “1.1” or “2.0” display an error message like “Database version is incompatible, you may need to update your app”.

Jeff_Tullin · November 30, 2022, 3:08pm

So we are not talking only about the column names.
Again, this app should have sanitised the date data for the Childs column.
Take what the user enters, turn it into a known date format, and store the same thing every time.
If you let a user put anything they like in a field, you have (these) reporting problems, and run a big risk of SQL injection.

You should sanitise the date data yourself with an update, or a programatic loop.
Read each, try to work out what they entered, convert it into SQLDate, and write it back.

Whenever a customer asked me for a ‘Notes’ field, I had to ask firmly:
‘What do you intend to put there? Because I promise anything you put in there will be useless within a couple of weeks’
Users will add data like

‘As agreed with manager’
‘We owe them $20’
‘Not available on Wednesdays’
‘New phone number will be 123445 from Friday’
It is meaningless.

Bob_Grommes · November 30, 2022, 3:29pm

I’m not 100% clear on exactly why the OP is asking about this or needs it, but in my day job I process daily to monthly data contributions from thousands of clients. Many of these are Excel or csv formats that I have no control over. I find that at least with our usual process of reading the raw data into memory and then bulk-loading it to a DB temp table in order to “chunk” the data into 10,000 row partitions that don’t use too much memory, Sql Server at least has no real issue with spaces in column heads (or with non-alphanumeric ASCII punctuation), you just have to be careful to use square brackets when referencing them in SQL statements. I believe we replace any embedded tabs or line endings in the column head with spaces though.

Other DBs will have their own ways of dealing with this with varying degrees of success. Postgres uses double quotes rather than brackets, and in fact, by default can’t handle mixed case column heads and converts everything to lower case unless you use double quotes in SQL statements to preserve casing; hence the tendency of Postgres users to name everything all lower case with underscores between “words” for clarity in order to get away from peppering queries with ugly double quotes.

I agree with everyone here that it’s best to stick with the standard conventions of the DB you’re using but sometimes at least on initial intake you do have to deal with what your data source dictates. It is possible but requires extra effort and is more error-prone so you build up exception handlers around these issues to give you meaningful information when something goes wrong.

Our final DB tables that all this is scrubbed / cleansed into DO use Sql Server-standard column names and other object names. I am just speaking here of the intake and staging parts of the process.

By the way, OleDb is also pretty tolerant of this kind of thing – that is how we usually convert fixed-width text data initially for example (using the so-called “desktop DB drivers”, e.g., Access, Excel, csv, or fixed width defined with a schema.ini file). What OleDb is weird about is the set of filename extensions it will recognize or work with, it doesn’t understand .dat or .tab for example, I think it’s limited to .txt and .csv so you have to do some renaming to make it acceptable.

Emile_Schwarz · December 1, 2022, 10:46am

Bob, all:
Here’s a partial screenshot:

You can see the ToolBar and the ListBox Header (View by List).

And you can see that I use open-close parents, vowels with accent, space, slash, etc.
This Header comes from a specific Table and is not the name of the Row’s Columns.

As an example, “Envoyé(e) par” DB Column Name is “SentBy” (an English translation and correct SQL name); “Né(e) le” → “Birth_Date” (same), etc.

The TABLE is added to the .sqlite file at creation time as well as the TABLE who hold the data.

Remember:
Users may be newbies not developer like we can be; in this case, they are elderly people, volunteers in their free time with a charity association. We have to be nice to them…

I was simply checking if there is a better solution. It seems this is a good one in this case (they do not have access to change the Data Base Shema).

Importing data was done 9 years ago and will probably never be done again. I’ve created an analyzer to achive that (stip multiples spaces, check dates, etc. and report an error log for manual corrections). The original data was created with Excel, exported into a text file.

The software runs in one computer, the software do not access to the internet neither for read nor write (but who knows what they can do by themselves ? ).
Manual backup(s) of the .sqlite file are done now and then.
There are 2,000 Records and 750 are actives (1250 are “old records” or legacy records). Numbers from this year file.