diff --git a/content/best-practices/no-cargo-cults.md b/content/best-practices/no-cargo-cults.md index 183cc15c..8f63daf4 100644 --- a/content/best-practices/no-cargo-cults.md +++ b/content/best-practices/no-cargo-cults.md @@ -7,9 +7,9 @@ type = "docs" Do not [cargo cult](https://en.wikipedia.org/wiki/Cargo_cult_programming) -settings in proto files. If \ -you are creating a new proto file based on existing schema definitions, don't -apply option settings except for those that you understand the need for. +settings in proto files. If you are creating a new proto file based on existing +schema definitions, don't apply option settings except for those that you +understand the need for. ## Best Practices Specific to Editions {#editions} diff --git a/content/getting-started/pythontutorial.md b/content/getting-started/pythontutorial.md index 34a6a7a0..595e431e 100644 --- a/content/getting-started/pythontutorial.md +++ b/content/getting-started/pythontutorial.md @@ -76,14 +76,16 @@ each field in the message. Here is the `.proto` file that defines your messages, `addressbook.proto`. ```proto -syntax = "proto2"; +edition = "2023"; package tutorial; +option features.field_presence = EXPLICIT; + message Person { - optional string name = 1; - optional int32 id = 2; - optional string email = 3; + string name = 1; + int32 id = 2; + string email = 3; enum PhoneType { PHONE_TYPE_UNSPECIFIED = 0; @@ -93,8 +95,8 @@ message Person { } message PhoneNumber { - optional string number = 1; - optional PhoneType type = 2 [default = PHONE_TYPE_HOME]; + string number = 1; + PhoneType type = 2 [default = PHONE_TYPE_HOME]; } repeated PhoneNumber phones = 4; @@ -135,39 +137,9 @@ less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization. -Each field must be annotated with one of the following modifiers: - -- `optional`: the field may or may not be set. 
If an optional field value - isn't set, a default value is used. For simple types, you can specify your - own default value, as we've done for the phone number `type` in the example. - Otherwise, a system default is used: zero for numeric types, the empty - string for strings, false for bools. For embedded messages, the default - value is always the "default instance" or "prototype" of the message, which - has none of its fields set. Calling the accessor to get the value of an - optional (or required) field which has not been explicitly set always - returns that field's default value. -- `repeated`: the field may be repeated any number of times (including zero). - The order of the repeated values will be preserved in the protocol buffer. - Think of repeated fields as dynamically sized arrays. -- `required`: a value for the field must be provided, otherwise the message - will be considered "uninitialized". Serializing an uninitialized message - will raise an exception. Parsing an uninitialized message will fail. Other - than this, a required field behaves exactly like an optional field. - -{{% alert title="Important" color="warning" %}} **Required Is Forever** -You should be very careful about marking fields as `required`. If at some point -you wish to stop writing or sending a required field, it will be problematic to -change the field to an optional field -- old readers will consider messages -without this field to be incomplete and may reject or drop them unintentionally. -You should consider writing application-specific custom validation routines for -your buffers instead. Within Google, `required` fields are strongly disfavored; -most messages defined in proto2 syntax use `optional` and `repeated` only. -(Proto3 does not support `required` fields at all.) -{{% /alert %}} - You'll find a complete guide to writing `.proto` files -- including all the possible field types -- in the -[Protocol Buffer Language Guide](/programming-guides/proto2). 
+[Protocol Buffer Language Guide](/programming-guides/editions). Don't go looking for facilities similar to class inheritance, though -- protocol buffers don't do that. diff --git a/content/installation.md b/content/installation.md index 22781a22..4e1ddc90 100644 --- a/content/installation.md +++ b/content/installation.md @@ -23,14 +23,15 @@ binaries, follow these instructions: ```sh PB_REL="https://github.com/protocolbuffers/protobuf/releases" - curl -LO $PB_REL/download/v< param protoc-version >/protoc-< param protoc-version >-linux-x86_64.zip + curl -LO $PB_REL/download/v30.2/protoc-30.2-linux-x86_64.zip + ``` 2. Unzip the file under `$HOME/.local` or a directory of your choice. For example: ```sh - unzip protoc-< param protoc-version >-linux-x86_64.zip -d $HOME/.local + unzip protoc-30.2-linux-x86_64.zip -d $HOME/.local ``` 3. Update your environment's path variable to include the path to the `protoc` diff --git a/content/news/2025-03-18.md b/content/news/2025-03-18.md new file mode 100644 index 00000000..fe810aa2 --- /dev/null +++ b/content/news/2025-03-18.md @@ -0,0 +1,14 @@ ++++ +title = "Changes Announced on March 18, 2025" +linkTitle = "March 18, 2025" +toc_hide = "true" +description = "Changes announced for Protocol Buffers on March 18, 2025." +type = "docs" ++++ + +## Dropping Ruby 3.0 Support + +As per our official +[Ruby support policy](https://cloud.google.com/ruby/getting-started/supported-ruby-versions), +we will be dropping support for Ruby 3.0 and lower in Protobuf version 31, due +to release in April, 2025. The minimum supported Ruby version will be 3.1. diff --git a/content/news/v31.md b/content/news/v31.md new file mode 100644 index 00000000..0c26f6c1 --- /dev/null +++ b/content/news/v31.md @@ -0,0 +1,23 @@ ++++ +title = "News Announcements for Version 31.x" +linkTitle = "Version 31.x" +toc_hide = "true" +description = "Changes announced for Protocol Buffers version 31.x." 
+type = "docs" ++++ + +The following announcements are specific to Version 31.x. For information +presented chronologically, see [News](/news). + +The following sections cover planned breaking changes in the v31 release, +expected in 2025 Q2. Also included are some changes that aren't breaking but may +require action on your part. These describe changes as we anticipate them being +implemented, but due to the flexible nature of software some of these changes +may not land or may vary from how they are described in this topic. + +### Dropping Ruby 3.0 Support + +As per our official +[Ruby support policy](https://cloud.google.com/ruby/getting-started/supported-ruby-versions), +we will be dropping support for Ruby 3.0. The minimum supported Ruby version +will be 3.1. diff --git a/content/programming-guides/editions.md b/content/programming-guides/editions.md index 326b2381..441a700e 100644 --- a/content/programming-guides/editions.md +++ b/content/programming-guides/editions.md @@ -1003,7 +1003,11 @@ following rules: effect as if you had cast the number to that type in C++ (for example, if a 64-bit number is read as an int32, it will be truncated to 32 bits). * `sint32` and `sint64` are compatible with each other but are *not* - compatible with the other integer types. + compatible with the other integer types. If the value written was between + INT_MIN and INT_MAX inclusive it will parse as the same value with either + type. If an sint64 value was written outside of that range and parsed as an + sint32, the varint is truncated to 32 bits and then zigzag decoding occurs + (which will cause a different value to be observed). * `string` and `bytes` are compatible as long as the bytes are valid UTF-8. * Embedded messages are compatible with `bytes` if the bytes contain an encoded instance of the message. 
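Reviewer note on the `sint32`/`sint64` rule added in the hunk above: the behavior it describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any protobuf runtime. The helper names `zigzag_encode`, `zigzag_decode`, and `parse_as_sint32` are hypothetical, and the sketch assumes the reader keeps the low 32 bits of the varint before ZigZag decoding, as the added text says.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
MASK32 = 0xFFFFFFFF


def zigzag_encode(n: int) -> int:
    """ZigZag-encode a signed 64-bit integer into an unsigned varint payload."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(u: int) -> int:
    """Invert ZigZag encoding for an unsigned payload."""
    return (u >> 1) ^ -(u & 1)


def parse_as_sint32(u: int) -> int:
    """Model a reader that declares the field as sint32.

    Assumption: the varint payload is truncated to 32 bits first,
    and only then ZigZag-decoded.
    """
    return zigzag_decode(u & MASK32)


INT_MIN, INT_MAX = -(2**31), 2**31 - 1

# Values inside [INT_MIN, INT_MAX] survive a sint64 -> sint32 round trip.
for value in (0, 5, -3, INT_MIN, INT_MAX):
    assert parse_as_sint32(zigzag_encode(value)) == value

# A value just outside the range is observed as a different number.
print(parse_as_sint32(zigzag_encode(INT_MAX + 1)))
```

Under these assumptions, the in-range values round-trip unchanged, while `INT_MAX + 1` comes back as a different value because the high bits of the varint are dropped before decoding.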
diff --git a/content/programming-guides/encoding.md b/content/programming-guides/encoding.md index edf8f2d5..3a7c0526 100644 --- a/content/programming-guides/encoding.md +++ b/content/programming-guides/encoding.md @@ -29,13 +29,15 @@ discuss aspects of the wire format. The Protoscope tool can also dump encoded protocol buffers as text. See https://github.com/protocolbuffers/protoscope/tree/main/testdata for examples. +All examples in this topic assume that you are using Edition 2023 or later. + ## A Simple Message {#simple} Let's say you have the following very simple message definition: ```proto message Test1 { - optional int32 a = 1; + int32 a = 1; } ``` @@ -241,7 +243,7 @@ Consider this message schema: ```proto message Test2 { - optional string b = 2; + string b = 2; } ``` @@ -275,7 +277,7 @@ an embedded message of our original example message, `Test1`: ```proto message Test3 { - optional Test1 c = 3; + Test1 c = 3; } ``` @@ -293,36 +295,49 @@ and a length of 3, exactly the same way as strings are encoded. In Protoscope, submessages are quite succinct. ` ``1a03089601`` ` can be written as `3: {1: 150}`. -## Optional and Repeated Elements {#optional} +## Missing Elements {#optional} -Missing `optional` fields are easy to encode: we just leave out the record if +Missing fields are easy to encode: we just leave out the record if it's not present. This means that "huge" protos with only a few fields set are quite sparse. -`repeated` fields are a bit more complicated. Ordinary (not [packed](#packed)) -repeated fields emit one record for every element of the field. Thus, if we have + + +## Repeated Elements {#repeated} + +Starting in Edition 2023, `repeated` fields of a primitive type +(any [scalar type](/programming-guides/proto2#scalar) +that is not `string` or `bytes`) are ["packed"](/editions/features#repeated_field_encoding) by default. 
+ +Packed `repeated` fields, instead of being encoded as one +record per entry, are encoded as a single `LEN` record that contains each +element concatenated. To decode, elements are decoded from the `LEN` record one +by one until the payload is exhausted. The start of the next element is +determined by the length of the previous, which itself depends on the type of +the field. Thus, if we have: ```proto message Test4 { - optional string d = 4; - repeated int32 e = 5; + string d = 4; + repeated int32 e = 6; } ``` and we construct a `Test4` message with `d` set to `"hello"`, and `e` set to -`1`, `2`, and `3`, this *could* be encoded as `` `220568656c6c6f280128022803` -``, or written out as Protoscope, +`3`, `270`, and `86942`, this *could* be encoded as +`` `220568656c6c6f3206038e029ea705` ``, or written out as Protoscope, ```proto 4: {"hello"} -5: 1 -5: 2 -5: 3 +6: {3 270 86942} ``` -However, records for `e` do not need to appear consecutively, and can be -interleaved with other fields; only the order of records for the same field with -respect to each other is preserved. Thus, this could also have been encoded as +However, if the repeated field is set to expanded (overriding the default packed +state) or is not packable (strings and messages) then an entry for each +individual value is encoded. Also, records for `e` do not need to appear +consecutively, and can be interleaved with other fields; only the order of +records for the same field with respect to each other is preserved. Thus, this +could look like the following: ```proto 5: 1 @@ -331,6 +346,24 @@ respect to each other is preserved. Thus, this could also have been encoded as 5: 3 ``` +Only repeated fields of primitive numeric types can be declared "packed". These +are types that would normally use the `VARINT`, `I32`, or `I64` wire types. + +Note that although there's usually no reason to encode more than one key-value +pair for a packed repeated field, parsers must be prepared to accept multiple +key-value pairs. 
In this case, the payloads should be concatenated. Each pair +must contain a whole number of elements. The following is a valid encoding of +the same message above that parsers must accept: + +```proto +6: {3 270} +6: {86942} +``` + +Protocol buffer parsers must be able to parse repeated fields that were compiled +as `packed` as if they were not packed, and vice versa. This permits adding +`[packed=true]` to existing fields in a forward- and backward-compatible way. + ### Oneofs {#oneofs} [`Oneof` fields](/programming-guides/proto2#oneof) are @@ -368,53 +401,6 @@ message.MergeFrom(message2); This property is occasionally useful, as it allows you to merge two messages (by concatenation) even if you do not know their types. -### Packed Repeated Fields {#packed} - -Starting in v2.1.0, `repeated` fields of a primitive type -(any [scalar type](/programming-guides/proto2#scalar) -that is not `string` or `bytes`) can be declared as "packed". In proto2 this is -done using the field option `[packed=true]`. In proto3 it is the default. - -Instead of being encoded as one record per entry, they are encoded as a single -`LEN` record that contains each element concatenated. To decode, elements are -decoded from the `LEN` record one by one until the payload is exhausted. The -start of the next element is determined by the length of the previous, which -itself depends on the type of the field. - -For example, imagine you have the message type: - -```proto -message Test5 { - repeated int32 f = 6 [packed=true]; -} -``` - -Now let's say you construct a `Test5`, providing the values 3, 270, and 86942 -for the repeated field `f`. Encoded, this gives us `` `3206038e029ea705` ``, or -as Protoscope text, - -```proto -6: {3 270 86942} -``` - -Only repeated fields of primitive numeric types can be declared "packed". These -are types that would normally use the `VARINT`, `I32`, or `I64` wire types. 
- -Note that although there's usually no reason to encode more than one key-value -pair for a packed repeated field, parsers must be prepared to accept multiple -key-value pairs. In this case, the payloads should be concatenated. Each pair -must contain a whole number of elements. The following is a valid encoding of -the same message above that parsers must accept: - -```proto -6: {3 270} -6: {86942} -``` - -Protocol buffer parsers must be able to parse repeated fields that were compiled -as `packed` as if they were not packed, and vice versa. This permits adding -`[packed=true]` to existing fields in a forward- and backward-compatible way. - ### Maps {#maps} Map fields are just a shorthand for a special kind of repeated field. If we have @@ -430,8 +416,8 @@ this is actually the same as ```proto message Test6 { message g_Entry { - optional string key = 1; - optional int32 value = 2; + string key = 1; + int32 value = 2; } repeated g_Entry g = 7; } diff --git a/content/programming-guides/field_presence.md b/content/programming-guides/field_presence.md index df940f70..bee2fa6a 100644 --- a/content/programming-guides/field_presence.md +++ b/content/programming-guides/field_presence.md @@ -13,12 +13,6 @@ are two different manifestations of presence for protobufs: *implicit presence*, where the generated message API stores field values (only), and *explicit presence*, where the API also stores whether or not a field has been set. -Historically, proto2 has mostly followed *explicit presence*, while proto3 -exposes only *implicit presence* semantics. Singular proto3 fields of basic -types (numeric, string, bytes, and enums) which are defined with the `optional` -label have *explicit presence*, like proto2 (this feature is enabled by default -as release 3.15). - {{% alert title="Note" color="note" %}} We recommend always adding the `optional` label for proto3 basic types. 
This provides a smoother path to editions, which uses explicit presence by @@ -179,10 +173,8 @@ affirmatively expose presence, although the same set of hazzer methods may not be generated as in proto2 APIs. This default behavior of not tracking presence without the `optional` label is -different from the proto2 behavior. We reintroduced -[explicit presence](/editions/features#field_presence) as -the default in edition 2023. We recommend using the `optional` field with proto3 -unless you have a specific reason not to. +different from the proto2 behavior. We recommend using the `optional` label with +proto3 unless you have a specific reason not to. Under the *implicit presence* discipline, the default value is synonymous with "not present" for purposes of serialization. To notionally "clear" a field (so @@ -195,6 +187,28 @@ required to have an enumerator value which maps to 0. By convention, this is an the domain of valid values for the application, this behavior can be thought of as tantamount to *explicit presence*. +### Presence in Editions APIs + +This table outlines whether presence is tracked for fields in editions APIs +(both for generated APIs and using dynamic reflection): + +Field type | Explicit Presence +-------------------------------------------- | ----------------- +Singular numeric (integer or floating point) | ✔️ +Singular enum | ✔️ +Singular string or bytes | ✔️ +Singular message† | ✔️ +Repeated | +Oneofs† | ✔️ +Maps | + +† Messages and oneofs have never had implicit presence, and editions +doesn't allow you to set `field_presence = IMPLICIT`. + +Editions-based APIs track field presence explicitly, similarly to proto2, unless +`features.field_presence` is set to `IMPLICIT`. Similar to proto2 APIs, +editions-based APIs do not track presence explicitly for repeated fields. 
+ ## Semantic Differences {#semantic-differences} The *implicit presence* serialization discipline results in visible differences diff --git a/content/programming-guides/json.md b/content/programming-guides/json.md index f4695416..a56fa673 100644 --- a/content/programming-guides/json.md +++ b/content/programming-guides/json.md @@ -40,6 +40,13 @@ field in any edition of protobuf supports field presence and if set will appear in the output. Proto3 implicit-presence scalar fields will only appear in the JSON output if they are not set to the default value for that type. +When representing numerical data in a JSON file, if the number that is parsed +from the wire doesn't fit in the corresponding type, you will get the same +effect as if you had cast the number to that type in C++ (for example, if a +64-bit number is read as an int32, it will be truncated to 32 bits). + +The following table shows how data is represented in JSON files. + diff --git a/content/programming-guides/proto2.md b/content/programming-guides/proto2.md index e68362af..7e06faeb 100644 --- a/content/programming-guides/proto2.md +++ b/content/programming-guides/proto2.md @@ -1056,7 +1056,11 @@ following rules: effect as if you had cast the number to that type in C++ (for example, if a 64-bit number is read as an int32, it will be truncated to 32 bits). * `sint32` and `sint64` are compatible with each other but are *not* - compatible with the other integer types. + compatible with the other integer types. If the value written was between + INT_MIN and INT_MAX inclusive it will parse as the same value with either + type. If an sint64 value was written outside of that range and parsed as an + sint32, the varint is truncated to 32 bits and then zigzag decoding occurs + (which will cause a different value to be observed). * `string` and `bytes` are compatible as long as the bytes are valid UTF-8. * Embedded messages are compatible with `bytes` if the bytes contain an encoded instance of the message. 
diff --git a/content/programming-guides/proto3.md b/content/programming-guides/proto3.md index 34d5cb2c..eb723775 100644 --- a/content/programming-guides/proto3.md +++ b/content/programming-guides/proto3.md @@ -879,9 +879,14 @@ Instead of moving the `.proto` file directly and updating all the call sites in a single change, you can put a placeholder `.proto` file in the old location to forward all the imports to the new location using the `import public` notion. -**Note that the public import functionality is not available in Java, Kotlin, -TypeScript, JavaScript, GCL, as well as C++ targets that use protobuf static -reflection.** +**Note:** The public import functionality available in Java is most effective +when moving an entire .proto file or when using `java_multiple_files = true`. In +these cases, generated names remain stable, avoiding the need to update +references in your code. While technically functional when moving a subset of a +.proto file without `java_multiple_files = true`, doing so requires simultaneous +updates to many references, thus might not significantly ease migration. The +functionality is not available in Kotlin, TypeScript, JavaScript, GCL, or with +C++ targets that use protobuf static reflection. `import public` dependencies can be transitively relied upon by any code importing the proto containing the `import public` statement. For example: @@ -1006,7 +1011,11 @@ following rules: effect as if you had cast the number to that type in C++ (for example, if a 64-bit number is read as an int32, it will be truncated to 32 bits). * `sint32` and `sint64` are compatible with each other but are *not* - compatible with the other integer types. + compatible with the other integer types. If the value written was between + INT_MIN and INT_MAX inclusive it will parse as the same value with either + type. 
If an sint64 value was written outside of that range and parsed as an + sint32, the varint is truncated to 32 bits and then zigzag decoding occurs + (which will cause a different value to be observed). * `string` and `bytes` are compatible as long as the bytes are valid UTF-8. * Embedded messages are compatible with `bytes` if the bytes contain an encoded instance of the message. diff --git a/content/reference/protobuf/textformat-spec.md b/content/reference/protobuf/textformat-spec.md index c4b2e949..14e57c6c 100644 --- a/content/reference/protobuf/textformat-spec.md +++ b/content/reference/protobuf/textformat-spec.md @@ -171,12 +171,12 @@ escape = "\a" (* ASCII #7 (bell) *) Octal escape sequences consume up to three octal digits. Additional digits are passed through without escaping. For example, when unescaping the input `\1234`, -the parser consumes three octal digits (123) to unescape the byte value 0x83 -(ASCII 'S') and the subsequent '4' passes through as the byte value 0x34 (ASCII -'4'). To ensure correct parsing, express octal escape sequences with 3 octal -digits, using leading zeros as needed, such as: `\000`, `\001`, `\063`, `\377`. -Fewer than three digits are consumed when a non-numeric character follows the -numeric characters, such as `\5Hello`. +the parser consumes three octal digits (123) to unescape the byte value 0x53 +(ASCII 'S', 83 in decimal) and the subsequent '4' passes through as the byte +value 0x34 (ASCII '4'). To ensure correct parsing, express octal escape +sequences with 3 octal digits, using leading zeros as needed, such as: `\000`, +`\001`, `\063`, `\377`. Fewer than three digits are consumed when a non-numeric +character follows the numeric characters, such as `\5Hello`. Hexadecimal escape sequences consume up to two hexadecimal digits. 
For example, when unescaping `\x213`, the parser consumes only the first two digits (21) to diff --git a/content/support/cross-version-runtime-guarantee.md b/content/support/cross-version-runtime-guarantee.md index af7daa3e..1d0db2bd 100644 --- a/content/support/cross-version-runtime-guarantee.md +++ b/content/support/cross-version-runtime-guarantee.md @@ -13,9 +13,9 @@ using the generated code. When these come from different releases of protobuf, we are in a "cross version runtime" situation. We intend to offer the following guarantees across all languages except -[C++](#cpp). These are the default guarantees; however, owners of protobuf code -generators and runtimes may explicitly override them with more specific -guarantees for that language. +[C++ and Rust](#cpp). These are the default guarantees; however, owners of +protobuf code generators and runtimes may explicitly override them with more +specific guarantees for that language. Protobuf cross-version usages outside the guarantees are **error-prone and not supported**. Version skews can lead to *flakes and undefined behaviors* that are @@ -120,9 +120,10 @@ matrix: Coexistence of multiple major versions in the same process is **not** supported. -## C++ Specific Guarantees {#cpp} +## C++ and Rust Specific Guarantees {#cpp} + +Protobuf C++ and Rust disclaim all cross-runtime support and require an exact +match between the generated code version and the runtime version at all times. -Protobuf C++ disclaims all cross-runtime support and requires an exact match -between its generated code version and its runtime version at all times. Additionally, Protobuf C++ makes no guarantees about ABI stability across any releases (major, minor, or micro). 
diff --git a/content/support/version-support.md b/content/support/version-support.md index 014f3209..72201e32 100644 --- a/content/support/version-support.md +++ b/content/support/version-support.md @@ -121,7 +121,7 @@ Future plans are shown in *italics* and are subject to change. - + @@ -178,7 +178,7 @@ Future plans are shown in *italics* and are subject to change. - + @@ -194,7 +194,7 @@ Future plans are shown in *italics* and are subject to change. - + @@ -210,7 +210,7 @@ Future plans are shown in *italics* and are subject to change. - + @@ -365,28 +365,28 @@ Future plans are shown in *italics* and are subject to change. - + - + - - + +
25 May 2022 31 Mar 2024
4.x 16 Feb 2023 31 Mar 2025
4.x 22.x-25.x 4.22
5.x 26.x-29.x 5.29 5.29
6.x 30.x-33.x
3.x 16 Feb 202331 Mar 2026*31 Mar 2027*
4.x 13 Mar 202431 Mar 202731 Mar 2028
5.xQ1 2026*31 Mar 2028Q1 2027*31 Mar 2029
{{% alert title="Note" color="note" %}} -The maintenance support window for the Protobuf Java 3.x release will be 24 +The maintenance support window for the Protobuf Java 3.x release will be 36 months rather than the typical 12 months for the final release in a major -version line. Future major version updates (5.x, planned for Q1 2026) will adopt +version line. Future major version updates (5.x, planned for Q1 2027) will adopt an improved ["rolling compatibility window"](/support/cross-version-runtime-guarantee/#major) that should allow a return to 12-month support windows. There will be no major -version bump in Q1 2025.{{% /alert %}} +version bumps in Q1 2025 and Q1 2026.{{% /alert %}} **Release support chart** @@ -522,7 +522,7 @@ Future plans are shown in *italics* and are subject to change. 25Q3 25Q4 - + 3.x 22.x-29.x 3.22 @@ -538,7 +538,7 @@ Future plans are shown in *italics* and are subject to change. 3.29 3.29 - + 4.x 30.x+ @@ -592,7 +592,7 @@ Future plans are shown in *italics* and are subject to change. Release date End of support - + 3.x 16 Feb 2023 31 Mar 2025 @@ -623,7 +623,7 @@ Future plans are shown in *italics* and are subject to change. 25Q3 25Q4 - + 3.x 22.x-25.x 3.22 @@ -701,7 +701,7 @@ Future plans are shown in *italics* and are subject to change. Release date End of support - + 4.x 16 Feb 2023 31 Mar 2025 @@ -737,7 +737,7 @@ Future plans are shown in *italics* and are subject to change. 25Q3 25Q4 - + 4.x 22.x-25.x 4.22 @@ -753,7 +753,7 @@ Future plans are shown in *italics* and are subject to change. - + 5.x 26.x-29.x @@ -769,7 +769,7 @@ Future plans are shown in *italics* and are subject to change. 5.29 5.29 - + 6.x 30.x+ @@ -831,7 +831,7 @@ Future plans are shown in *italics* and are subject to change. Release date End of support - + 3.x 16 Feb 2023 31 Mar 2025 @@ -862,7 +862,7 @@ Future plans are shown in *italics* and are subject to change. 25Q3 25Q4 - + 3.x 22.x-25.x 3.22