Add JsonDecoder.peekToken() ? #2223

Open
martinbonnin opened this issue Mar 9, 2023 · 11 comments

Comments

@martinbonnin
Contributor

martinbonnin commented Mar 9, 2023

Some backends return polymorphic data without a descriptor:

{
  "value": "someValue"
}

vs

{
  "value": {
    "version": 0,
    "content": "someValue"
  }
}

I know this is quite the edge case but I haven't found a way to decode this without going through JsonElement and buffering everything.

Would it be possible to introduce jsonDecoder.peekToken():

enum class JsonToken {
  BEGIN_OBJECT,
  BEGIN_ARRAY,
  STRING
  // others?
}

public interface JsonDecoder : Decoder, CompositeDecoder {
  /**
   * peeks into the json and returns the next token (without consuming it)
   */
  fun peekToken(): JsonToken
}

This way, users that know they are in a JSON context could do stuff like this:

  override fun deserialize(decoder: Decoder): Schema {
    // unsafe cast, the user needs to assume JSON but in some cases it's doable
    decoder as JsonDecoder
    return when (decoder.peekToken()) {
      JsonToken.BEGIN_OBJECT -> decoder.decodeStructure(/*...*/)
      JsonToken.STRING -> decoder.decodeString()
      else -> error("unexpected token")
    }
  }

@martinbonnin martinbonnin changed the title from "Add JsonDecoder.peekNextToken() ?" to "Add JsonDecoder.peekToken() ?" on Mar 9, 2023

@Kantis
Contributor

Kantis commented May 11, 2023

Would love to see a JsonDecoder#discardToken() feature as well, so we could easily create a collection serializer which discards illegal entries for instance.
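
Until something like discardToken() exists, a rough approximation is possible by buffering the array as a JsonElement and filtering it before decoding. The sketch below assumes kotlinx-serialization-json is on the classpath; `LenientIntListSerializer` is a hypothetical name, and a list of Int is chosen only for illustration. It still buffers the whole array, so it does not address the streaming concern raised in this issue.

```kotlin
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.JsonTransformingSerializer
import kotlinx.serialization.json.intOrNull
import kotlinx.serialization.json.jsonArray

// Hypothetical serializer: drops every array entry that is not an integer literal.
object LenientIntListSerializer :
    JsonTransformingSerializer<List<Int>>(ListSerializer(Int.serializer())) {

    override fun transformDeserialize(element: JsonElement): JsonElement =
        JsonArray(
            element.jsonArray.filter { entry ->
                // Keep unquoted primitives that parse as Int; discard everything else.
                entry is JsonPrimitive && !entry.isString && entry.intOrNull != null
            }
        )
}

fun main() {
    val input = """[1, "two", 3, {"four": 4}, null, 5]"""
    println(Json.decodeFromString(LenientIntListSerializer, input))
}
```

The filter runs on the already-parsed tree, so an entry that is malformed JSON (rather than merely the wrong type) would still fail the whole decode.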

@venator85

Hi @martinbonnin, I have exactly the same case as you. Could you please share how you managed to decode it with the current version of kotlinx.serialization? Thanks ;)

@martinbonnin
Contributor Author

@venator85 sorry for the late reply, I don't have all the context anymore but I probably ended up decoding everything to a JsonElement
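
For readers with the same question, here is a minimal sketch of what a JsonElement-based workaround could look like for the two payload shapes in the issue description. It is not necessarily what was used originally. It assumes kotlinx-serialization-json and the serialization compiler plugin are set up; `Content`, `ContentOrStringSerializer`, and the default `version = 0` are hypothetical choices made for the example.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.JsonTransformingSerializer
import kotlinx.serialization.json.buildJsonObject

// Hypothetical model for the object form of "value".
@Serializable
data class Content(val version: Int = 0, val content: String)

// Buffers "value" as a JsonElement and rewrites a bare string into the object form.
object ContentOrStringSerializer :
    JsonTransformingSerializer<Content>(Content.serializer()) {

    override fun transformDeserialize(element: JsonElement): JsonElement =
        if (element is JsonPrimitive && element.isString) {
            buildJsonObject { put("content", element) }
        } else {
            element
        }
}

@Serializable
data class Schema(
    @Serializable(with = ContentOrStringSerializer::class)
    val value: Content,
)

fun main() {
    val fromString = Json.decodeFromString(Schema.serializer(), """{"value": "someValue"}""")
    val fromObject = Json.decodeFromString(
        Schema.serializer(),
        """{"value": {"version": 0, "content": "someValue"}}""",
    )
    println(fromString)
    println(fromObject)
}
```

Only the "value" subtree is buffered here, not the whole document, but the buffering is exactly the cost that a peekToken() API would avoid.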

@nomisRev

+1 on the solution @martinbonnin mentioned. It allows me to consume a section from the Json, and manually parse or "peek".

@Thomas-Vos

This would be amazing! I have a JSON document that needs to be parsed. But depending on the first token, whether that is an object or an array, it needs to choose between two different serializers. Currently I have to parse the entire document using decodeJsonElement() just to see the first token, which is obviously not that fast.
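
When the raw JSON text is available before decoding starts, the top-level case can be handled without any parser support by looking at the first non-whitespace character. The sketch below uses only the Kotlin standard library; `peekFirstToken` and this `JsonToken` enum are hypothetical helpers modelled on the proposal above, not part of kotlinx.serialization. It does not help for values nested inside a document.

```kotlin
// Hypothetical token kinds, mirroring the enum proposed in this issue.
enum class JsonToken { BEGIN_OBJECT, BEGIN_ARRAY, STRING, OTHER }

// JSON insignificant whitespace: space, tab, line feed, carriage return.
private fun Char.isJsonWhitespace(): Boolean =
    this == ' ' || this == '\t' || this == '\n' || this == '\r'

// Classifies the first non-whitespace character without parsing the rest of the text.
fun peekFirstToken(json: CharSequence): JsonToken {
    val first = json.firstOrNull { !it.isJsonWhitespace() }
        ?: error("empty JSON input")
    return when (first) {
        '{' -> JsonToken.BEGIN_OBJECT
        '[' -> JsonToken.BEGIN_ARRAY
        '"' -> JsonToken.STRING
        else -> JsonToken.OTHER
    }
}

fun main() {
    println(peekFirstToken("  {\"a\": 1}"))
    println(peekFirstToken("\n[1, 2, 3]"))
}
```

The result can then be used to pick one of the two serializers before handing the same text to the decoder.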

@Thomas-Vos

Just found out there is actually an internal API for this, I guess this will do for now.

@Suppress("INVISIBLE_REFERENCE", "INVISIBLE_MEMBER")
run {
    decoder as kotlinx.serialization.json.internal.StreamingJsonDecoder
    when (decoder.lexer.peekNextToken()) {
        kotlinx.serialization.json.internal.TC_BEGIN_OBJ -> {
        }
        else -> {}
    }
}

@hansenji

The ability to parse Json or other formats token by token would be huge for low memory devices. I have some large files that don't fit entirely into memory. This would make parsing them possible.

@pdvrieze
Contributor

@hansenji The problem is how you are going to deal with partially decoded values. It almost only makes sense for collections, but at what level of the hierarchy? (Top only?)
The best approach I see is to use custom serializers for this (possibly using direct format access through casting).

@sandwwraith
Member

Note that StreamingJsonDecoder is not the only implementation, so it is not guaranteed to work in all cases. In particular, if a polymorphic value is being decoded, the decoder would already hold a JsonElement without any notion of tokens.

@hansenji

hansenji commented Jun 5, 2024

@pdvrieze Ideally the solution would be a complete API: peek, nextName, nextValue, skip, beginObject/Array, endObject/Array. With these I can manually parse the file.
Ideally you could then pass a reader to a generated serializer, have it read just that value, and leave the stream open so that the remaining parts can be read in as needed.

@fab1an

fab1an commented Dec 3, 2024

Btw. I implemented this as a separate library: https://github.com/fab1an/kotlin-json-stream

API-Docs: https://fab1an.github.io/kotlin-json-stream/
