Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature-level parameter for defaults (ColumnOptions constructor) #164

Open
MV10 opened this issue Oct 12, 2018 · 3 comments
Open

Feature-level parameter for defaults (ColumnOptions constructor) #164

MV10 opened this issue Oct 12, 2018 · 3 comments

Comments

@MV10
Copy link
Contributor

MV10 commented Oct 12, 2018

As time goes on, SQL server releases add or remove certain features. Similarly, this sink has many defaults that exist purely for backwards-compatibility reasons, despite learning later that other defaults would be better. For example, the Level Standard Column defaults to 128 bytes when 12 would be adequate (#146), or the use of a DateTime column for TimeStamp rather than the superior SQL 2008 DateTimeOffset type.

The PR I'm working on consolidates most of these defaults into the ColumnOptions constructor, and I'm considering an override (not in the same PR) that accepts an enumeration which maps to groups of default values.

public enum MSSqlSinkFeatureLevel
{
    BackwardsCompatible, // the default, whatever the sink does today
    OptimizedDefaults,   // baseline SQL, better indexing and column sizes
    SQL2008,             // DateTimeOffset TimeStamp column
    SQL2017              // CCI supports max-length character data
}

var colOpts = new ColumnOptions(MSSqlSinkFeatureLevel.OptimizedDefaults);

The most important aspect of this, in my opinion, would be OptimizedDefaults which are performance-oriented without getting into changes that require a specific version of SQL Server:

  • omit the Id column (see benchmarks)
  • default Level to 12 characters instead of 128
  • include the LogEvent Standard Column by default instead of Properties
  • set DisableTriggers to true by default

While of less importance, this could also help inform configuration validation which could reduce support questions. For example, clustered columnstore indexing only supports max length character data with SQL 2017 or later. If this index type was chosen but the feature-level didn't map to SQL 2017, an exception could be thrown. When SQL 2017 is selected the max length would be allowed.

Perhaps we can start a discussion about how these settings would impact the defaults, other possible levels, or whether this is just unnecessary noise (I'm thick-skinned, it's fine). I'm even strongly tempted to suggest a breaking change by making OptimizedDefaults the default (the default defaults?) and anyone who is truly tied to backwards-compatibility can explicitly specify that level if necessary.

Ideas?

@GunnarHakansson
Copy link

The level BackwardsCompatible begs the question "Backwards compatible with what?" I'm assuming the answer will be "SQL Server 2005 or older", but then it could just be called SQL2005. Backwards Compatible is a correct term for anything but the newest release, but it can be a bit unclear which version is targeted.
Likewise, the level OptimizedDefaults can instead be called SQL2005Optimized. That would make it clear from the name alone that you should consider selecting a newer level when running on a newer version of SQL Server.

It seems that you want to do two things with the same enumeration. You want to provide a way to specify a group of default values, as well as providing a way to validate the configuration for a specific version of SQL Server.
Perhaps two enumerations would be better, one to validate and one to set the defaults?
I imagine that someone will want to make a group of settings suitable for CCI (see #68) while others will prefer the standard row store. That would make more than one group of default settings for SQL Server 2017 necessary.

@MV10
Copy link
Contributor Author

MV10 commented Oct 23, 2018

Great feedback.

I agree BackwardsCompatible is a bad choice because it's unclear what it means. I changed it several times when I wrote the original message and still wasn't happy with it. I'm glad you pointed it out. Normally I would agree it's a bad idea to use an enum to do unrelated things, but providing defaults can be viewed as an aspect of recognizing available features.

OptimizedDefaults is also not good for similar reasons. Compared to what? Is SQL2008 somehow less optimized? I think the biggest problem in my original idea was that the names of the enum members were too obviously doing different things.

So I would instead suggest the enum only reflects SQL versions, plus Legacy as the default setting (constructor with no arguments) which avoids breaking existing code by not requiring the enum and also by using the old non-optimized defaults. I wouldn't even map it to a specific SQL level because that suggests someone might want to use it intentionally. The name Legacy signals to the user that they should consider reviewing what this new flag does. (I'd also considered Undefined but I think the message it sends to the user to accept Undefined vs Legacy is very different.)

Regarding other enum values, I'm proposing a given SQL level merely reflects a minimally optimized variation on the Legacy defaults. The key word is "minimally"... so if SQL2023 comes along and Microsoft makes a bombshell announcement that max-length is no longer supported, then from that version forward, none of the standard columns would use max-length. But I would not expect anyone to add other variations to the enum to achieve default configurations for completely optional features such as CCI.

So it was probably wrong of me to say configuration validation was a less-important role for this idea. It's actually the primary role, and mapping the defaults is more of a side-effect of declaring the SQL version.

(And speaking of CCI, if we accept validation as an important role, we'd probably also want SQL2012 since I think that's the first version of CCI with max-length restrictions.)

@GunnarHakansson
Copy link

Bit of a late reply...

I'm not sure if I would view the providing of defaults as an aspect of (recognizing) available features. The available features determine what set of configurations are valid. Among those, one will be the most suitable default (in general). Another may be better suited to a specific use case.

The name Legacy sounds good. If it said Undefined I might believe that it was an opt-out of any validation, or I might interpret that in another way. Hard to guess.
One opt-out value would be a good idea, but then it could have a name that just says it: NoValidation?
Maybe this is a case where breaking compatibility would be a better option? If you run on some old system, I'd imagine you would be restrictive with updating a package. Maybe even reading the documentation before...
Also, removing support in a future version of this Nuget package for versions of SQL Server that Microsoft has stopped supporting should not be too controversial.

The best argument I can think of for having multiple default configurations is that it would make people select one of the defaults, and that should be a good configuration depending on their specifics. If you have to make your own configuration from scratch, it's easier to make a mistake. The smaller the selection, the easier to make the optimal choice.
I'm picturing code that uses this would first select a default configuration, then maybe modifying it, and lastly use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants