Documentation change #248

Merged
merged 2 commits into from
Apr 8, 2025
6 changes: 3 additions & 3 deletions content/best-practices/no-cargo-cults.md
@@ -7,9 +7,9 @@ type = "docs"

Do not
[cargo cult](https://en.wikipedia.org/wiki/Cargo_cult_programming)
settings in proto files. If \
you are creating a new proto file based on existing schema definitions, don't
apply option settings except for those that you understand the need for.
settings in proto files. If you are creating a new proto file based on existing
schema definitions, don't apply option settings except for those that you
understand the need for.

## Best Practices Specific to Editions {#editions}

46 changes: 9 additions & 37 deletions content/getting-started/pythontutorial.md
@@ -76,14 +76,16 @@ each field in the message. Here is the `.proto` file that defines your messages,
`addressbook.proto`.

```proto
syntax = "proto2";
edition = "2023";

package tutorial;

option features.field_presence = EXPLICIT;

message Person {
optional string name = 1;
optional int32 id = 2;
optional string email = 3;
string name = 1;
int32 id = 2;
string email = 3;

enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
@@ -93,8 +95,8 @@ message Person {
}

message PhoneNumber {
optional string number = 1;
optional PhoneType type = 2 [default = PHONE_TYPE_HOME];
string number = 1;
PhoneType type = 2 [default = PHONE_TYPE_HOME];
}

repeated PhoneNumber phones = 4;
@@ -135,39 +137,9 @@ less-commonly used optional elements. Each element in a repeated field requires
re-encoding the tag number, so repeated fields are particularly good candidates
for this optimization.

Each field must be annotated with one of the following modifiers:

- `optional`: the field may or may not be set. If an optional field value
isn't set, a default value is used. For simple types, you can specify your
own default value, as we've done for the phone number `type` in the example.
Otherwise, a system default is used: zero for numeric types, the empty
string for strings, false for bools. For embedded messages, the default
value is always the "default instance" or "prototype" of the message, which
has none of its fields set. Calling the accessor to get the value of an
optional (or required) field which has not been explicitly set always
returns that field's default value.
- `repeated`: the field may be repeated any number of times (including zero).
The order of the repeated values will be preserved in the protocol buffer.
Think of repeated fields as dynamically sized arrays.
- `required`: a value for the field must be provided, otherwise the message
will be considered "uninitialized". Serializing an uninitialized message
will raise an exception. Parsing an uninitialized message will fail. Other
than this, a required field behaves exactly like an optional field.

{{% alert title="Important" color="warning" %}} **Required Is Forever**
You should be very careful about marking fields as `required`. If at some point
you wish to stop writing or sending a required field, it will be problematic to
change the field to an optional field -- old readers will consider messages
without this field to be incomplete and may reject or drop them unintentionally.
You should consider writing application-specific custom validation routines for
your buffers instead. Within Google, `required` fields are strongly disfavored;
most messages defined in proto2 syntax use `optional` and `repeated` only.
(Proto3 does not support `required` fields at all.)
{{% /alert %}}

You'll find a complete guide to writing `.proto` files -- including all the
possible field types -- in the
[Protocol Buffer Language Guide](/programming-guides/proto2).
[Protocol Buffer Language Guide](/programming-guides/editions).
Don't go looking for facilities similar to class inheritance, though -- protocol
buffers don't do that.

5 changes: 3 additions & 2 deletions content/installation.md
@@ -23,14 +23,15 @@ binaries, follow these instructions:

```sh
PB_REL="https://github.com/protocolbuffers/protobuf/releases"
curl -LO $PB_REL/download/v< param protoc-version >/protoc-< param protoc-version >-linux-x86_64.zip
curl -LO $PB_REL/download/v30.2/protoc-30.2-linux-x86_64.zip

```

2. Unzip the file under `$HOME/.local` or a directory of your choice. For
example:

```sh
unzip protoc-< param protoc-version >-linux-x86_64.zip -d $HOME/.local
unzip protoc-30.2-linux-x86_64.zip -d $HOME/.local
```

3. Update your environment's path variable to include the path to the `protoc`
14 changes: 14 additions & 0 deletions content/news/2025-03-18.md
@@ -0,0 +1,14 @@
+++
title = "Changes Announced on March 18, 2025"
linkTitle = "March 18, 2025"
toc_hide = "true"
description = "Changes announced for Protocol Buffers on March 18, 2025."
type = "docs"
+++

## Dropping Ruby 3.0 Support

In line with our official
[Ruby support policy](https://cloud.google.com/ruby/getting-started/supported-ruby-versions),
we will drop support for Ruby 3.0 and lower in Protobuf version 31, which is due
for release in April 2025. The minimum supported Ruby version will be 3.1.
23 changes: 23 additions & 0 deletions content/news/v31.md
@@ -0,0 +1,23 @@
+++
title = "News Announcements for Version 31.x"
linkTitle = "Version 31.x"
toc_hide = "true"
description = "Changes announced for Protocol Buffers version 31.x."
type = "docs"
+++

The following announcements are specific to Version 31.x. For information
presented chronologically, see [News](/news).

The following sections cover planned breaking changes in the v31 release,
expected in 2025 Q2. Also included are some changes that aren't breaking but may
require action on your part. These describe changes as we anticipate them being
implemented, but due to the flexible nature of software some of these changes
may not land or may vary from how they are described in this topic.

### Dropping Ruby 3.0 Support

As per our official
[Ruby support policy](https://cloud.google.com/ruby/getting-started/supported-ruby-versions),
we will be dropping support for Ruby 3.0. The minimum supported Ruby version
will be 3.1.
6 changes: 5 additions & 1 deletion content/programming-guides/editions.md
@@ -1003,7 +1003,11 @@ following rules:
effect as if you had cast the number to that type in C++ (for example, if a
64-bit number is read as an int32, it will be truncated to 32 bits).
* `sint32` and `sint64` are compatible with each other but are *not*
compatible with the other integer types.
compatible with the other integer types. If the value written is between
`INT_MIN` and `INT_MAX` inclusive, it parses as the same value with either
type. If an `sint64` value outside that range is written and then parsed as an
`sint32`, the varint is truncated to 32 bits before zigzag decoding, so a
different value is observed.
* `string` and `bytes` are compatible as long as the bytes are valid UTF-8.
* Embedded messages are compatible with `bytes` if the bytes contain an
encoded instance of the message.
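
The `sint32`/`sint64` rule above can be illustrated with a minimal Python sketch. It only models the arithmetic described in the text (zigzag encode as 64-bit, truncate to 32 bits, zigzag decode); it is not the parser code of any protobuf runtime, and the helper names are hypothetical.

```python
def zigzag_encode(n: int, bits: int) -> int:
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    mask = (1 << bits) - 1
    return ((n << 1) ^ (n >> (bits - 1))) & mask


def zigzag_decode(z: int) -> int:
    """Invert zigzag encoding for an unsigned value."""
    return (z >> 1) ^ -(z & 1)


def read_sint64_as_sint32(value: int) -> int:
    """Model writing `value` as sint64 and parsing the result as sint32."""
    wire = zigzag_encode(value, 64)   # unsigned varint payload on the wire
    truncated = wire & 0xFFFFFFFF     # the reader keeps only the low 32 bits
    return zigzag_decode(truncated)


if __name__ == "__main__":
    # Values inside the 32-bit range survive; values outside it do not.
    for v in (150, -150, 2**31 - 1, -2**31, 2**31, -2**31 - 1):
        print(v, "->", read_sint64_as_sint32(v))
```

Under these assumptions, values inside the 32-bit range round-trip unchanged, while `2**31` comes back as `0` and `-2**31 - 1` comes back as `-1`.
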
118 changes: 52 additions & 66 deletions content/programming-guides/encoding.md
@@ -29,13 +29,15 @@ discuss aspects of the wire format.
The Protoscope tool can also dump encoded protocol buffers as text. See
https://github.com/protocolbuffers/protoscope/tree/main/testdata for examples.

All examples in this topic assume that you are using Edition 2023 or later.

## A Simple Message {#simple}

Let's say you have the following very simple message definition:

```proto
message Test1 {
optional int32 a = 1;
int32 a = 1;
}
```

@@ -241,7 +243,7 @@ Consider this message schema:

```proto
message Test2 {
optional string b = 2;
string b = 2;
}
```

@@ -275,7 +277,7 @@ an embedded message of our original example message, `Test1`:

```proto
message Test3 {
optional Test1 c = 3;
Test1 c = 3;
}
```

@@ -293,36 +295,49 @@ and a length of 3, exactly the same way as strings are encoded.
In Protoscope, submessages are quite succinct. ` ``1a03089601`` ` can be written
as `3: {1: 150}`.
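
As a rough illustration of how the bytes `1a03089601` arise, the following Python sketch builds the record by hand. The helper names are hypothetical and the code does not use the protobuf library; it only assumes the standard wire type numbers (`VARINT` = 0, `LEN` = 2) and non-negative values.

```python
VARINT, LEN = 0, 2  # wire type numbers


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is the field number shifted left by 3, OR-ed with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_len_record(field_number: int, payload: bytes) -> bytes:
    """A LEN record is tag, payload length, then the payload bytes."""
    return encode_tag(field_number, LEN) + encode_varint(len(payload)) + payload


# Test1 with a = 150, i.e. the Protoscope text `1: 150`.
test1 = encode_tag(1, VARINT) + encode_varint(150)
# Test3 with c set to that Test1, i.e. `3: {1: 150}`.
test3 = encode_len_record(3, test1)

if __name__ == "__main__":
    print(test1.hex())
    print(test3.hex())
```

The submessage is simply the inner message's bytes wrapped in a `LEN` record, which is why it is encoded the same way as a string.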

## Optional and Repeated Elements {#optional}
## Missing Elements {#optional}

Missing `optional` fields are easy to encode: we just leave out the record if
Missing fields are easy to encode: we just leave out the record if
it's not present. This means that "huge" protos with only a few fields set are
quite sparse.

`repeated` fields are a bit more complicated. Ordinary (not [packed](#packed))
repeated fields emit one record for every element of the field. Thus, if we have
<span id="packed"></span>

## Repeated Elements {#repeated}

Starting in Edition 2023, `repeated` fields of a primitive type
(any [scalar type](/programming-guides/proto2#scalar)
that is not `string` or `bytes`) are ["packed"](/editions/features#repeated_field_encoding) by default.

Packed `repeated` fields, instead of being encoded as one
record per entry, are encoded as a single `LEN` record that contains each
element concatenated. To decode, elements are decoded from the `LEN` record one
by one until the payload is exhausted. The start of the next element is
determined by the length of the previous, which itself depends on the type of
the field. Thus, if we have:

```proto
message Test4 {
optional string d = 4;
repeated int32 e = 5;
string d = 4;
repeated int32 e = 6;
}
```

and we construct a `Test4` message with `d` set to `"hello"`, and `e` set to
`1`, `2`, and `3`, this *could* be encoded as `` `220568656c6c6f280128022803`
``, or written out as Protoscope,
`3`, `270`, and `86942`, this *could* be encoded as
`` `220568656c6c6f3206038e029ea705` ``, or written out as Protoscope,

```proto
4: {"hello"}
5: 1
5: 2
5: 3
6: {3 270 86942}
```
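
The packed record `6: {3 270 86942}` corresponds to the bytes `3206038e029ea705`. The following Python sketch shows one way to derive them by hand. The function names are hypothetical, the protobuf library is not used, and only non-negative values are handled (a negative `int32` would need a ten-byte varint).

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_len_record(field_number: int, payload: bytes) -> bytes:
    """Tag with wire type LEN (2), then payload length, then payload."""
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


def encode_packed_varints(field_number: int, values) -> bytes:
    """Concatenate the element varints into a single LEN record."""
    payload = b"".join(encode_varint(v) for v in values)
    return encode_len_record(field_number, payload)


packed_e = encode_packed_varints(6, [3, 270, 86942])
string_d = encode_len_record(4, "hello".encode("utf-8"))

if __name__ == "__main__":
    print(packed_e.hex())               # the `6: {...}` record alone
    print((string_d + packed_e).hex())  # preceded by `4: {"hello"}`
```

Note that the element count is never written. The reader recovers it by decoding varints until the six payload bytes are used up.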

However, records for `e` do not need to appear consecutively, and can be
interleaved with other fields; only the order of records for the same field with
respect to each other is preserved. Thus, this could also have been encoded as
However, if the repeated field is set to `EXPANDED` (overriding the default
packed encoding) or is not packable (strings and messages), a separate record
is encoded for each value. Also, records for `e` do not need to appear
consecutively, and can be interleaved with other fields; only the order of
records for the same field with respect to each other is preserved. Thus, this
could look like the following:

```proto
5: 1
@@ -331,6 +346,24 @@ respect to each other is preserved. Thus, this could also have been encoded as
5: 3
```

Only repeated fields of primitive numeric types can be declared "packed". These
are types that would normally use the `VARINT`, `I32`, or `I64` wire types.

Note that although there's usually no reason to encode more than one key-value
pair for a packed repeated field, parsers must be prepared to accept multiple
key-value pairs. In this case, the payloads should be concatenated. Each pair
must contain a whole number of elements. The following is a valid encoding of
the same message above that parsers must accept:

```proto
6: {3 270}
6: {86942}
```

Protocol buffer parsers must be able to parse repeated fields that were compiled
as `packed` as if they were not packed, and vice versa. This permits adding
`[packed=true]` to existing fields in a forward- and backward-compatible way.
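
The parsing requirements above can be sketched in Python as a reader that accepts a repeated varint field whether it arrives packed, split across several packed records, or unpacked. This is an illustrative sketch with hypothetical function names, not the behavior of a specific protobuf runtime. It handles only the `VARINT` and `LEN` wire types and does no bounds checking.

```python
from typing import List, Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_repeated_varints(buf: bytes, field_number: int) -> List[int]:
    """Collect every element of a repeated varint field, in wire order."""
    values = []
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # VARINT: one unpacked element (or another field)
            value, pos = decode_varint(buf, pos)
            if number == field_number:
                values.append(value)
        elif wire_type == 2:  # LEN: a packed chunk (or another field to skip)
            length, pos = decode_varint(buf, pos)
            end = pos + length
            if number == field_number:
                while pos < end:
                    value, pos = decode_varint(buf, pos)
                    values.append(value)
            pos = end
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
    return values


if __name__ == "__main__":
    one_chunk = bytes.fromhex("3206038e029ea705")       # 6: {3 270 86942}
    two_chunks = bytes.fromhex("3203038e0232039ea705")  # 6: {3 270} 6: {86942}
    unpacked = bytes.fromhex("3003308e02309ea705")      # 6: 3  6: 270  6: 86942
    for data in (one_chunk, two_chunks, unpacked):
        print(decode_repeated_varints(data, 6))
```

All three inputs yield the same list, which is what makes switching a field between packed and unpacked encoding wire-compatible.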

### Oneofs {#oneofs}

[`Oneof` fields](/programming-guides/proto2#oneof) are
@@ -368,53 +401,6 @@ message.MergeFrom(message2);
This property is occasionally useful, as it allows you to merge two messages (by
concatenation) even if you do not know their types.

### Packed Repeated Fields {#packed}

Starting in v2.1.0, `repeated` fields of a primitive type
(any [scalar type](/programming-guides/proto2#scalar)
that is not `string` or `bytes`) can be declared as "packed". In proto2 this is
done using the field option `[packed=true]`. In proto3 it is the default.

Instead of being encoded as one record per entry, they are encoded as a single
`LEN` record that contains each element concatenated. To decode, elements are
decoded from the `LEN` record one by one until the payload is exhausted. The
start of the next element is determined by the length of the previous, which
itself depends on the type of the field.

For example, imagine you have the message type:

```proto
message Test5 {
repeated int32 f = 6 [packed=true];
}
```

Now let's say you construct a `Test5`, providing the values 3, 270, and 86942
for the repeated field `f`. Encoded, this gives us `` `3206038e029ea705` ``, or
as Protoscope text,

```proto
6: {3 270 86942}
```

Only repeated fields of primitive numeric types can be declared "packed". These
are types that would normally use the `VARINT`, `I32`, or `I64` wire types.

Note that although there's usually no reason to encode more than one key-value
pair for a packed repeated field, parsers must be prepared to accept multiple
key-value pairs. In this case, the payloads should be concatenated. Each pair
must contain a whole number of elements. The following is a valid encoding of
the same message above that parsers must accept:

```proto
6: {3 270}
6: {86942}
```

Protocol buffer parsers must be able to parse repeated fields that were compiled
as `packed` as if they were not packed, and vice versa. This permits adding
`[packed=true]` to existing fields in a forward- and backward-compatible way.

### Maps {#maps}

Map fields are just a shorthand for a special kind of repeated field. If we have
@@ -430,8 +416,8 @@ this is actually the same as
```proto
message Test6 {
message g_Entry {
optional string key = 1;
optional int32 value = 2;
string key = 1;
int32 value = 2;
}
repeated g_Entry g = 7;
}
34 changes: 24 additions & 10 deletions content/programming-guides/field_presence.md
@@ -13,12 +13,6 @@ are two different manifestations of presence for protobufs: *implicit presence*,
where the generated message API stores field values (only), and *explicit
presence*, where the API also stores whether or not a field has been set.

Historically, proto2 has mostly followed *explicit presence*, while proto3
exposes only *implicit presence* semantics. Singular proto3 fields of basic
types (numeric, string, bytes, and enums) which are defined with the `optional`
label have *explicit presence*, like proto2 (this feature is enabled by default
as of release 3.15).

{{% alert title="Note" color="note" %}} We
recommend always adding the `optional` label for proto3 basic types. This
provides a smoother path to editions, which uses explicit presence by
@@ -179,10 +173,8 @@ affirmatively expose presence, although the same set of hazzer methods may not
be generated as in proto2 APIs.

This default behavior of not tracking presence without the `optional` label is
different from the proto2 behavior. We reintroduced
[explicit presence](/editions/features#field_presence) as
the default in edition 2023. We recommend using the `optional` field with proto3
unless you have a specific reason not to.
different from the proto2 behavior. We recommend using the `optional` label with
proto3 unless you have a specific reason not to.

Under the *implicit presence* discipline, the default value is synonymous with
"not present" for purposes of serialization. To notionally "clear" a field (so
@@ -195,6 +187,28 @@ required to have an enumerator value which maps to 0. By convention, this is an
the domain of valid values for the application, this behavior can be thought of
as tantamount to *explicit presence*.

### Presence in Editions APIs

This table outlines whether presence is tracked for fields in editions APIs
(both for generated APIs and using dynamic reflection):

Field type | Explicit Presence
-------------------------------------------- | -----------------
Singular numeric (integer or floating point) | ✔️
Singular enum | ✔️
Singular string or bytes | ✔️
Singular message&#8224; | ✔️
Repeated |
Oneofs&#8224; | ✔️
Maps |

&#8224; Messages and oneofs have never had implicit presence, and editions
doesn't allow you to set `field_presence = IMPLICIT`.

Editions-based APIs track field presence explicitly, similarly to proto2, unless
`features.field_presence` is set to `IMPLICIT`. Similar to proto2 APIs,
editions-based APIs do not track presence explicitly for repeated fields.

## Semantic Differences {#semantic-differences}

The *implicit presence* serialization discipline results in visible differences