case classes generated by scalaxb cannot be serialized to Parquet #564

Open
ghost opened this issue Mar 30, 2021 · 6 comments


@ghost

ghost commented Mar 30, 2021

case class Type(code: Option[String] = None,
  description: Option[String] = None,
  any: Seq[scalaxb.DataRecord[Any]] = Nil)
case class Types(rdc_type: Seq[Type] = Nil,
  attributes: Map[String, scalaxb.DataRecord[Any]] = Map.empty) {
  lazy val typeValue = attributes("@type").as[String]
}
case class RID(
  rdc_types: Seq[Types] = Nil,
  any: Seq[scalaxb.DataRecord[Any]] = Nil)

When I try to write these to a Parquet file or a DataFrame, I get an error about DataRecord[Any]. How should I resolve this?

@eed3si9n
Owner

@ag4s

> When I try to write a parquet file or Dataframe I am getting issue for DataRecord[Any]. How should I resolve the issue?

Could you provide more details, please? See https://scalaxb.org/issue-reporting-guideline.
From what you posted above, I can't tell what the actual problem is.

@ghost
Author

ghost commented Mar 30, 2021

The case classes above were generated by scalaxb and contain DataRecord[Any] fields. Using these case classes, I read the XML file with fromXML and store the result in a val. Then I try to save it in Parquet format using Spark (or any other tool), and that is where the problem happens: when Spark processes it, it does not recognize DataRecord[Any]. Reading XML with the generated case classes works fine; my problem is how to save the result as Parquet or as a DataFrame, in particular how to handle DataRecord[Any]. An example of converting DataRecord[Any] values when creating Parquet would be great.
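One common workaround, sketched here under assumptions rather than as scalaxb's or Spark's own approach: before writing, map the generated classes onto parallel case classes that contain only types Parquet encoders already understand, rendering each `DataRecord[Any]` value as a string. The `DataRecord` below is a simplified stand-in so the snippet compiles on its own (in real code you would use `scalaxb.DataRecord`, which exposes `namespace`, `key` and `value`); `AnyField`, the `*Row` classes and `Flatten` are hypothetical names.

```scala
// Simplified stand-in for scalaxb.DataRecord (hypothetical; import scalaxb.DataRecord in real code).
final case class DataRecord[+A](namespace: Option[String], key: Option[String], value: A)

// The generated classes from the issue, trimmed to their fields.
case class Type(code: Option[String] = None,
  description: Option[String] = None,
  any: Seq[DataRecord[Any]] = Nil)
case class Types(rdc_type: Seq[Type] = Nil,
  attributes: Map[String, DataRecord[Any]] = Map.empty)
case class RID(rdc_types: Seq[Types] = Nil,
  any: Seq[DataRecord[Any]] = Nil)

// Hypothetical Parquet-friendly mirrors: no Any, only strings, options, sequences and string maps.
case class AnyField(namespace: Option[String], key: Option[String], value: String)
case class TypeRow(code: Option[String], description: Option[String], any: Seq[AnyField])
case class TypesRow(rdc_type: Seq[TypeRow], attributes: Map[String, String])
case class RIDRow(rdc_types: Seq[TypesRow], any: Seq[AnyField])

object Flatten {
  // Render an untyped value as text via toString; null becomes the empty string.
  def render(v: Any): String = if (v == null) "" else v.toString

  def anyField(r: DataRecord[Any]): AnyField = AnyField(r.namespace, r.key, render(r.value))

  def typeRow(t: Type): TypeRow = TypeRow(t.code, t.description, t.any.map(anyField))

  // Attribute maps keep their keys (e.g. "@type") and store only the rendered value.
  def typesRow(t: Types): TypesRow =
    TypesRow(t.rdc_type.map(typeRow), t.attributes.map { case (k, v) => k -> render(v.value) })

  def ridRow(r: RID): RIDRow = RIDRow(r.rdc_types.map(typesRow), r.any.map(anyField))
}
```

The row classes use only `Option[String]`, sequences of case classes and `Map[String, String]`, which Spark's encoders and parquet4s should both be able to derive writers for, so something like `ParquetWriter.writeAndClose(path, Seq(Flatten.ridRow(rid)))` ought to compile without a custom codec. The trade-off is that the original value types inside each `DataRecord` are lost.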

@eed3si9n
Owner

Could you copy-paste the actual error message that you see during runtime? Is it missing Jackson databinding?

@eed3si9n changed the title from "scalaxb.DataRecord[Any]" to "case classes generated by scalaxb cannot be serialized to Parquet" on Mar 30, 2021
@ghost
Author

ghost commented Mar 30, 2021

ParquetWriter.writeAndClose(path, val)
:15: error: could not find implicit value for parameter writerFactory: com.github.mjakubowski84.parquet4s.ParquetWriter.ParquetWriterFactory[RID]
Error occurred in an application involving default arguments.
ParquetWriter.writeAndClose(path, val)

where val .... = scalaxb.fromXML[RID](pathofXML)
This is with ParquetWriter (parquet4s). I also tried to create a DataFrame with createDataFrame from SQLContext, without success. Unfortunately, the error gives no further details.
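It may help to see how this kind of implicit lookup generally works. As far as I can tell, parquet4s derives the writer for `RID` from instances for each of its field types, so a single field type without an instance (here `DataRecord[Any]`) makes the whole search fail, and the compiler only reports the missing instance for the top-level type. The toy model below is hypothetical (`ColumnSchema`, `DataRecordSchema` and the stand-in `DataRecord` are not parquet4s names) and mirrors only the resolution rules, not the real encoding.

```scala
// Toy typeclass (hypothetical, not the parquet4s API) describing how a type maps to a column.
trait ColumnSchema[A] { def describe: String }

object ColumnSchema {
  def instance[A](d: String): ColumnSchema[A] = new ColumnSchema[A] { val describe: String = d }

  // Instances for the building blocks; each wrapper needs an instance for its element type.
  implicit val stringSchema: ColumnSchema[String] = instance("string")
  implicit def optionSchema[A](implicit a: ColumnSchema[A]): ColumnSchema[Option[A]] =
    instance(s"optional ${a.describe}")
  implicit def seqSchema[A](implicit a: ColumnSchema[A]): ColumnSchema[Seq[A]] =
    instance(s"repeated ${a.describe}")
}

// Simplified stand-in for scalaxb.DataRecord (hypothetical).
final case class DataRecord[+A](namespace: Option[String], key: Option[String], value: A)

case class Type(code: Option[String], description: Option[String], any: Seq[DataRecord[Any]])

object Type {
  // Hand-written stand-in for automatic derivation: it resolves only when every field type has an instance.
  implicit def typeSchema(implicit
      code: ColumnSchema[Option[String]],
      description: ColumnSchema[Option[String]],
      any: ColumnSchema[Seq[DataRecord[Any]]]
  ): ColumnSchema[Type] =
    ColumnSchema.instance(
      s"group(code: ${code.describe}, description: ${description.describe}, any: ${any.describe})")
}

// Without importing this, implicitly[ColumnSchema[Type]] does not compile
// ("could not find implicit value"), analogous to the writerFactory error for RID.
object DataRecordSchema {
  implicit val dataRecordAsString: ColumnSchema[DataRecord[Any]] =
    ColumnSchema.instance("string (rendered DataRecord)")
}
```

Bringing an instance for the missing field type into scope at the call site (here `import DataRecordSchema._`) is what lets `implicitly[ColumnSchema[Type]]` resolve. In the real library a custom type may need more than a value codec for writing, so the parquet4s README is the place to check which pieces are required.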

@eed3si9n
Owner

According to the parquet4s README, I think this is how you can write a custom codec:

import com.github.mjakubowski84.parquet4s.{OptionalValueCodec, Value, ValueCodecConfiguration}
import scalaxb.DataRecord

implicit def datarecordDummyCodec[A]: OptionalValueCodec[DataRecord[A]] =
  new OptionalValueCodec[DataRecord[A]] {
    override protected def decodeNonNull(value: Value, configuration: ValueCodecConfiguration): DataRecord[A] = ???
    override protected def encodeNonNull(data: DataRecord[A], configuration: ValueCodecConfiguration): Value = ???
  }

@ghost
Author

ghost commented Mar 30, 2021

import com.github.mjakubowski84.parquet4s.{OptionalValueCodec, Value, ValueCodec, ValueCodecConfiguration}
import scalaxb.DataRecord

implicit val scalaxbCodec: OptionalValueCodec[DataRecord[Any]] = new OptionalValueCodec[DataRecord[Any]] {
  override protected def decodeNonNull(value: Value, configuration: ValueCodecConfiguration): DataRecord[Any] = ???
  override protected def encodeNonNull(data: DataRecord[Any], configuration: ValueCodecConfiguration): Value = data match {
    case DataRecord(_, _, value: Int) => implicitly[ValueCodec[Int]].encode(value, configuration) // unapply yields (namespace, key, value)
  }
}

I wrote the code above, but I still get the same error. I'm not sure what else is required to make it work.
