Native IFC and asynchronous workflows #2

Open
brunopostle opened this issue Nov 15, 2024 · 45 comments

Comments

@brunopostle

@theoryshaw suggested that it would be a good idea to bring this discussion here.

First, some background; you can skip this if you have heard it all before:

The Bonsai BIM tool is a Native IFC application, ie. instead of working with an application-specific data model, Native IFC tools operate by editing and authoring IFC models directly - rather than relegating IFC to a reference or archiving role.

Specifically, in a Native IFC application such as Bonsai BIM or FreeCAD, the data is serialised in a consistent format - to the extent that an edited and saved IFC file will be identical to the original file (aside from just those entities that have been edited).

IFC-SPF is a line-oriented file format, so it lends itself nicely to storage in a version control system such as Git. We find that because files are only partially updated and repositories are compressed, a Git repository recording a full history of hundreds of discrete changes can be smaller than the raw IFC file itself.

An unexpected development was that a combination of this parsimonious approach to file handling and a quirk of the SPF file format gives us an asynchronous collaboration workflow practically for free - we can have multiple authors working on the same IFC model without requiring a continuous connection to a central database.

Twenty-first century software development is characterised by this kind of asynchronous working. The reason we use Git for software development is that two authors can make independent changes to the same file, 'forking' it, and then what is called a three-way-merge can resolve these changes into a single combined result - unifying the forked 'branches'.

We can do exactly the same thing with IFC. In fact, in many ways the process is more robust, because in an IFC-SPF file every entity is fully addressable due to the STEP-IDs (the #12434 numbering you see when you open one in a text editor). Three-way merging of software source code can easily result in invalid and broken files, but we don't have that problem because IFC-SPF files are well-structured and this kind of corruption is hard to achieve. For a live demonstration of this in action, see this video presentation using Bonsai BIM (at the time called BlenderBIM); you can browse a copy of the IFC repository used in the demonstration here.
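To make the role of the STEP-IDs concrete, here is a minimal sketch of a three-way merge keyed on those IDs. It is not how ifcmerge is implemented; it assumes one entity per line and does not renumber IDs, so two authors who both add an entity with the same new ID are simply reported as a conflict. The function names are made up for illustration.

```python
import re

ENTITY = re.compile(r"^#(\d+)\s*=\s*(.+);\s*$")

def parse_entities(text):
    """Map STEP-ID -> entity body for each '#id=...;' line (one entity per line assumed)."""
    entities = {}
    for line in text.splitlines():
        match = ENTITY.match(line.strip())
        if match:
            entities[int(match.group(1))] = match.group(2)
    return entities

def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, entity by entity."""
    merged, conflicts = {}, []
    for step_id in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(step_id), ours.get(step_id), theirs.get(step_id)
        if o == t:
            result = o              # both sides agree (including "both deleted")
        elif o == b:
            result = t              # only theirs changed this entity
        elif t == b:
            result = o              # only ours changed this entity
        else:
            conflicts.append(step_id)
            result = o              # keep ours, but report the conflict
        if result is not None:
            merged[step_id] = result
    return merged, conflicts

base = parse_entities("#1=IFCWALL('a');\n#2=IFCDOOR('b');")
ours = parse_entities("#1=IFCWALL('a2');\n#2=IFCDOOR('b');")
theirs = parse_entities("#1=IFCWALL('a');\n#2=IFCDOOR('b2');\n#3=IFCSLAB('c');")
merged, conflicts = three_way_merge(base, ours, theirs)
```

Because each entity is looked up by its ID, no fuzzy context matching is needed: an edit either touches an entity or it does not.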

That was an overview of the situation, hopefully the links above give more comprehensive information.

IFC5 is text based, the JSON encoding and other features have lots of potential. Native IFC editing will clearly continue to work, and storage in version control systems such as Git will be fine. However it would be a major backwards step if this asynchronous collaboration isn't supported (and potentially improved-on).

IFC5 does away with STEP-IDs, the file format appears to be one big anonymous list, but one where the order of entities is significant.

Entities have a 'name' attribute which resembles the SPF GUID, but it isn't unique: the hello-wall sample has six entities with the name N93791d5d5beb437bb8ec2f1f0ba4bf3b. Does each of them override or add-to information on the previous entity with the same name?

How do we address these entities? This is important to be able to track changes. Do we have to refer to the 'fifth N93791d5d5beb437bb8ec2f1f0ba4bf3b' entity? What if some software deletes the second (so that the fifth entity is now the fourth)?

Is this intended to be an append-only file format? I can see that this would have some benefits, as the entire history could be reconstructed by trimming later entities? Is multi-user collaboration intended by simply appending all changes from all sources, with the most recent arrival 'winning'?
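The addressing problem can be shown with a small sketch. It assumes the file is a flat list of objects that each carry a "name" key; the shortened name "Nwall" and the attribute values are invented for illustration and are not taken from hello-wall.ifcx.

```python
from collections import Counter

def positional_addresses(entries):
    """Address each entry as (name, n), where n counts earlier entries with the same name."""
    seen = Counter()
    addresses = []
    for entry in entries:
        name = entry["name"]
        addresses.append((name, seen[name]))
        seen[name] += 1
    return addresses

entries = [
    {"name": "Nwall", "attributes": {"ifc:class": "IfcWall"}},
    {"name": "Nwall", "attributes": {"height": 3.0}},
    {"name": "Nwall", "attributes": {"colour": "white"}},
]

before = positional_addresses(entries)
# Some software deletes the second entry: the third entry silently changes address.
after = positional_addresses(entries[:1] + entries[2:])
```

The "colour" entry is ("Nwall", 2) before the deletion and ("Nwall", 1) afterwards, so any change record that referred to it by position now points at the wrong thing.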

@yorikvanhavre

yorikvanhavre commented Nov 18, 2024

Where can we see the hello-wall example you are referring to, @brunopostle ?

@brunopostle
Author

It's here: hello-wall.ifcx.

@aothms
Contributor

aothms commented Nov 19, 2024

I think an important aspect in this discussion is that IFC5 borrows the concept of layers from USD. So there is less of a need to work around EXPRESS's limitation of not supporting relationships across file boundaries. Asynchronously created models can remain independent layers and be folded into the same 'stage' conflict-free by selecting the layer order, while still allowing rich relationships and data integration by means of the central tree.

Of course this is only one possible way of working.

Working towards a single layer by means of explicit conflict resolution for the entire stage is still possible, either line-based or maybe more object-model based. I think the models proposed in this repo work really well for that because the model is guaranteed to be a flat list, so no arbitrarily deep diffs.

The duplicate names you see are specific 'components' that are all to be superimposed/composed to form the complete definition of that node ({'def': 'over'}). These are namespaced, so the pair (N93791d5d5beb437bb8ec2f1f0ba4bf3b, 'ifc:class') could perhaps be unique. But whether that is something that would be enforced is not clear to me.

Is this intended to be an append-only file format?

It could indeed be like that. If you look at USD, they indeed don't have a deletion operation specifically, but with an active=False (or similar) flag they can 'remove' subtrees from the model. So indeed that is akin to something append only. But of course you could also remove data from the model if you 'own' the layer.

The basis in all of this is very flexible and allows for multiple ways of working.

Also see the basic viewer prototype that is online now: https://ifc5.technical.buildingsmart.org/viewer/
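As a rough illustration of the superimposition and layer ordering described in this comment, the sketch below folds several layers into one table of nodes. It assumes that layers later in the list win, that each component has "name" and "attributes" keys, and that an "active": False opinion hides a node; the "fire:rating" namespace is invented. None of this is confirmed IFC5 behaviour.

```python
def compose(layers):
    """Fold an ordered list of layers into {name: attributes}; later opinions win."""
    nodes = {}
    for layer in layers:
        for component in layer:
            # Components sharing a name are superimposed onto the same node.
            nodes.setdefault(component["name"], {}).update(component["attributes"])
    # An 'active': False opinion removes the node from the composed result.
    return {name: attrs for name, attrs in nodes.items() if attrs.get("active", True)}

architect = [{"name": "Nwall", "attributes": {"ifc:class": "IfcWall", "fire:rating": "REI60"}}]
engineer = [{"name": "Nwall", "attributes": {"fire:rating": "REI120"}}]
hidden = [{"name": "Nwall", "attributes": {"active": False}}]

engineer_wins = compose([architect, engineer])
architect_wins = compose([engineer, architect])
removed = compose([architect, hidden])
```

The two layers never need to be merged into one file: swapping the layer order is enough to change whose opinion on "fire:rating" ends up in the composed model.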

@yorikvanhavre

If I understand correctly, the json format allows things the step/express format wouldn't. But how does that translate in file formats support? There is no plan to "remove" the IFC-STP format, right? Right? 😅

If not, then how will that work? Will there be certain features that cannot be translated to IFC-STP?

@aothms
Contributor

aothms commented Nov 22, 2024

I know what you mean. People can get emotionally attached to the craziest things ;) The step file grammar is nice and elegant, but for the majority outside of people like us it's an impediment to use IFC when it doesn't come with a builtin parser in your favourite programming language.

@brunopostle
Author

@yorikvanhavre the overlay system won't survive translation back to STEP, and I guess that since all attributes are named rather than part of a fixed length list, any custom attributes you add won't survive either.

I find the JSON incredibly hard to read, but once I converted the sample to YAML with yq it was much clearer what is going on as this is much less verbose. Also note that the trailing ']' in the JSON effectively prevents using this as a truly append-only file format - YAML doesn't have this problem.

@aothms
Contributor

aothms commented Nov 22, 2024

I think that's an interesting suggestion. YAML also adds support for references (but not to a named path it seems, only to explicit anchors). Maybe TOML is more in favour these days, but I never really bothered to understand the differences.

I'm also a fan of https://jsonlines.org/ because of the streaming/appending you mentioned, but I guess it wouldn't work well for this application area.

@brunopostle
Author

I guess with JSON any appended data could be a new list starting with [. This would allow atomic updates, i.e. if you wanted to wind back it would be clear which chunk needed to be removed.
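A file made of several JSON lists written one after another is not a single valid JSON document, but it can still be read chunk by chunk. The sketch below uses `json.JSONDecoder.raw_decode` from the Python standard library; the sample content is invented.

```python
import json

def read_chunks(text):
    """Yield each top-level JSON value from a stream of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

log = '[{"name": "a"}]\n[{"name": "b"}, {"name": "c"}]\n'
chunks = list(read_chunks(log))

# Winding back one update means dropping the last chunk.
previous = chunks[:-1]
```

Each appended list is then one atomic update, and an editor never has to rewrite the trailing `]` of an earlier chunk.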

@theoryshaw

How about GUIDs for all entities?

@aothms
Contributor

aothms commented Nov 23, 2024 via email

@brunopostle
Author

@aothms what does ECS refer to here? I'm not good with all these acronyms.

If a project is defined by a cloud of linked files providing overlays, how is this configured? Does each file refer to its parent internally, or is there a separate configuration 'file' that defines the inheritance relationships?

@aothms
Contributor

aothms commented Nov 23, 2024

Sorry, ECS is Entity-Component-System here.

That's a very good and relevant question though. In USD it's as you describe (but reversed): layers have explicit sublayers, so in the end you load only one file. Currently the working hypothesis is that AEC is a bit more fragmented (anarchic?) and that it's more flexible to just let every end-user individually compose their view of the state of the project ad hoc locally by determining their own (non-hierarchical) layer order.

@brunopostle
Author

How would this system deal with changes in a parent that break children?

eg:

  1. A parent file creates and uses a material (or a type etc..)
  2. A child file uses this material
  3. The parent file has no more need for the material and removes it

The child file now has a broken reference to a nonexistent material.

(ifcmerge will break under the same circumstances, but Bonsai just ignores the error)
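The three-step scenario above can be checked mechanically. This is a minimal sketch that assumes each entry lists the names it depends on under a "references" key; that key and the names "Nslab" and "Nconcrete" are hypothetical and are not part of any published IFC5 sample.

```python
def dangling_references(parent, child):
    """Return (entry name, missing target) pairs for references in `child` that nothing defines."""
    defined = {entry["name"] for entry in parent} | {entry["name"] for entry in child}
    missing = []
    for entry in child:
        for target in entry.get("references", []):
            if target not in defined:
                missing.append((entry["name"], target))
    return missing

parent_v1 = [{"name": "Nwall", "references": ["Nconcrete"]}, {"name": "Nconcrete"}]
parent_v2 = [{"name": "Nwall"}]  # the parent no longer needs the material and removes it
child = [{"name": "Nslab", "references": ["Nconcrete"]}]

ok = dangling_references(parent_v1, child)
broken = dangling_references(parent_v2, child)
```

Whether a tool refuses to load, warns, or silently ignores the result is then a policy decision for the software.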

@aothms
Contributor

aothms commented Nov 23, 2024

I would say it's up to the software. There are quite a few possible scenarios:

  • Both authors used a def, for example in case of a building storey. Author 1 deleting their def essentially makes no difference as the def of author 2 still exists.
  • Author 2 used an over. In this case author 2's opinions are essentially not applied onto the tree, but this is detectable.
  • The relationship is not a matter of defs and overs, but it's a schema datatype (the spaceboundaries in the hello wall example for ex).

@tomi-p

tomi-p commented Nov 24, 2024

Is there also some kind of json-zip solution to be developed? We already have a problem with IFC files as the amount of data they contain has grown (and is still growing). IFC files larger than 1 GB are by no means exceptional these days. The json format is going to increase the size of raw files many times over. Although network speeds have increased, files of several gigabytes in size will be a problem. And while the vision is probably API-based data transfer, files will certainly be used well into the future. What is the plan?

@aothms
Contributor

aothms commented Nov 25, 2024

The json format is going to increase the size of raw files many times over

Not necessarily. It's not a more or less fixed multiplier, as it was with the ifc4-json encodings that have been experimented with.

What is the plan?

I think the plan is to build more meaningful and incremental exchanges. A megabyte is about a book worth of data (sorry, lame analogy). Buildings are complex, sure, but I don't think we necessarily need to accept that 1GB of data is the norm for a mid-sized model.

That said, yes scalability needs to be addressed. We have several options, such as one of the binary json encodings, USD has a binary serialization, or zipped variant, which also has advantages that heterogeneous assets can be bundled. I think it's more or less on purpose not really investigated because first the data model and workflows need to be a little bit clearer.
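The effect of a zipped variant is easy to try out. The sketch below compresses a synthetic list of key/value records with `zlib` from the Python standard library; the records are invented and highly repetitive, so the result says nothing about real models, only that repeated attribute names are the kind of redundancy general-purpose compression handles well.

```python
import json
import zlib

# Synthetic, highly repetitive records; not representative of a real model.
entities = [
    {"Class": "IFCLOCALPLACEMENT", "PlacementRelTo": i, "RelativePlacement": i + 1}
    for i in range(1000)
]

raw = json.dumps(entities).encode("utf-8")
packed = zlib.compress(raw)
ratio = len(packed) / len(raw)

# The round trip is lossless.
restored = json.loads(zlib.decompress(packed))
```

A proper comparison would need real IFC5 samples and a like-for-like STEP export, which is part of the experimentation the thread is asking for.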

@brunopostle
Author

That's a very good and relevant question though. In USD it's as you describe (but reversed): layers have explicit sublayers, so in the end you load only one file. Currently the working hypothesis is that AEC is a bit more fragmented (anarchic?) and that it's more flexible to just let every end-user individually compose their view of the state of the project ad hoc locally by determining their own (non-hierarchical) layer order.

It seems to me that this overlay system, as I understand it, is not at all like a CAD XREF system.

Say you have one dataset A, and another dataset B that extends and provides an overlay on A:

A can exist without B, but B is meaningless without A, so it would make sense for B to internally include a link to A.

@aothms
Contributor

aothms commented Nov 27, 2024

but B is meaningless without A

I think it's a spectrum: it could be that B is near-independent of A, like adding a domain-specific view for the elements in A, or indeed a minimal correction of some sort.

For me, every bit of layer metadata that adds semantics, makes the object in the exchange less pure somehow. I'd rather see layers as flat ordered collections of objects that can be concatenated. The layer-metadata interferes with that, but it's not insurmountable of course.

@marwiss

marwiss commented Jan 6, 2025

@brunopostle @aothms I hope that @brunopostle's work on GIT collaboration for IFC could be applied equally well (or even better) to IFC5 by means of the concept of layers from USD. I think that collaboration could be more precise using layers. GIT tracks changes to lines, whereas USD tracks changes to individual GUIDs/objects/prims regardless of their representation in text.

To allow for "management of change", not just adding information, I am also thinking about the possibility of not only overlaying using def:over, but also allowing for cuts in the model using def:remove ...or def:over "NULL". I also wonder how to track the history of layers/changes to the model: who added this layer, when, why, etc. I also wish that all objects could have both a human-readable name AND a GUID at the same time, otherwise it seems like we need to create each object using two JSON objects.

Conclusion: having GIT opportunities to track USD-style layers in IFC5 as GIT commits would be great! What does it take to apply GIT workflows to USD-style layers in IFC5? It would be great if GIT could track changes to GUIDs also, not only track changes to lines of text.

@aothms
Contributor

aothms commented Jan 7, 2025

What does it take to apply GIT workflows to USD layers?

More experiments?

@marwiss

marwiss commented Jan 7, 2025

@brunopostle @aothms

Some alternatives:

  1. Create a JSON-USD version with one JSON-object (one def with one UUID) per line of text (almost like in STEP). And then use existing GIT and plain JSON text, just like the implementation by @brunopostle. Maybe it could even be possible to implement USD-JSON using STEP (he he :D, I know they want to get rid of STEP).

  2. Add functionality similar to GIT in BCF5 API. BCF API should be about collaboration. But could we collaborate by making changes to models directly, not just by communicating viewpoints, issues and comments? BCF API used to have a BIM-snippet route, for making changes to models (see "Remove BIM snippet", BCF-XML#314). This route was never implemented by anyone, because it seemed impossible. With IFC5 and layers the BIM-snippet route can have a revival, together with other new routes for versioning. One USD-JSON layer would be equal to one BIM-snippet in BCF API. Then you also have to develop your own BCF client and BCF API server implementing this.

  3. Look for existing software that tracks changes to objects, JSON dicts, or UUIDs, and see if it can be used.

  4. See if GIT could be manipulated or adapted to keep track of changes to JSON objects based on their UUIDs, instead of keeping track of changes to lines in text files. https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain

  5. Import the JSON objects into some graph database (or triple store?) tailored for keeping track of changes to nodes. One object would equal one node in the database. Eventually you might like to compose the graph, and flatten the data into one layer.

@brunopostle
Author

brunopostle commented Jan 7, 2025

[@marwiss I'm sure you know this, but I'm hoping this discussion has a wider readership]

In response to 1, 3 & 4: IFC Git merging works currently because every entity has a STEP-ID, rather than every entity being on one line. Ifcmerge doesn't rely on source-code diff/patch with fuzzy-context matching (stuff that is too unreliable to use with BIM data). So in principle it would work exactly the same in Git if the step format was rewritten in a multi-line format that retained the STEP-IDs:

#46=IFCBUILDINGSTOREY('2dSYcF59PE4hH6e4u_Cztn',$,'2','Storey 2',$,#52,$,'Storey 2',.ELEMENT.,0.);
#52=IFCLOCALPLACEMENT(#45,#51);

46:
    Class: IFCBUILDINGSTOREY
    GlobalId: 2dSYcF59PE4hH6e4u_Cztn
    Name: '2'
    Description: 'Storey 2'
    ObjectPlacement: 52
    LongName: 'Storey 2'
    CompositionType: ELEMENT
    Elevation: 0.0
52:
    Class: IFCLOCALPLACEMENT
    PlacementRelTo: 45
    RelativePlacement: 51

This illustrates one of the main drawbacks of STEP: every entity is a fixed-length list, which makes it stupidly difficult to evolve the format - whereas attributes as key/value pairs are legible and extendible (bloat isn't a problem when the data is compressed).
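The conversion from the fixed-length list to key/value pairs shown above can be sketched in a few lines. The attribute tables below are hard-coded for just the two classes in the example (a real tool would read them from the schema), and the parser is a simplification that only handles single-line entities; none of this is an existing library API.

```python
import re

# Attribute names in positional order for the two classes used in the example.
ATTRIBUTES = {
    "IFCBUILDINGSTOREY": ["GlobalId", "OwnerHistory", "Name", "Description", "ObjectType",
                          "ObjectPlacement", "Representation", "LongName",
                          "CompositionType", "Elevation"],
    "IFCLOCALPLACEMENT": ["PlacementRelTo", "RelativePlacement"],
}

LINE = re.compile(r"^#(\d+)=(\w+)\((.*)\);$")

def split_arguments(text):
    """Split on top-level commas, ignoring commas inside quotes or nested lists."""
    parts, current, depth, quoted = [], "", 0, False
    for char in text:
        if char == "'":
            quoted = not quoted
        elif not quoted and char == "(":
            depth += 1
        elif not quoted and char == ")":
            depth -= 1
        if char == "," and depth == 0 and not quoted:
            parts.append(current)
            current = ""
        else:
            current += char
    parts.append(current)
    return parts

def to_record(line):
    """Turn one '#id=CLASS(args);' line into (id, {attribute: value}), dropping unset ($) values."""
    step_id, cls, args = LINE.match(line).groups()
    record = {"Class": cls}
    for key, value in zip(ATTRIBUTES[cls], split_arguments(args)):
        if value != "$":
            record[key] = value
    return int(step_id), record

storey_id, storey = to_record(
    "#46=IFCBUILDINGSTOREY('2dSYcF59PE4hH6e4u_Cztn',$,'2','Storey 2',$,#52,$,'Storey 2',.ELEMENT.,0.);"
)
placement_id, placement = to_record("#52=IFCLOCALPLACEMENT(#45,#51);")
```

The STEP-ID survives as the record key, which is the property the merge relies on; only the attribute list has changed shape.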

..but IFC5 doesn't want to just get rid of the attribute list, the plan is to get rid of STEP-IDs altogether. Now the entities are just one big list and they are not addressable in any kind of reliable way (diff/patch context matching is not reliable). We should do some testing, as Thomas says, but I suspect that the sort of three-way merge that ifcmerge performs isn't going to work here.

This isn't all bad. Ifcmerge isn't perfect, the merge is asymmetrical because it has to rewrite STEP-IDs, and would work better if all entities had full GUIDs that didn't collide. Whereas IFC5 offers an 'overlay' system which potentially provides multi-user collaborative workflows, but they would be a different kind of thing - you could still store the data in Git, but probably asynchronous fork/merge workflows are only possible if Native IFC5 editors were strictly append-only - maybe this is what we want?

@marwiss

marwiss commented Jan 7, 2025

Thanks for explaining!

Could "IFC5 Git" use the JSON-USD object GUID directly, when there is no STEP ID as in IFC<5?
(Let's assume hypothetically that all objects would be rooted and would have a GUID in IFC5).

Maybe the addition of a "def:remove" would enable (not only appending but also) deleting IFC objects, using layers, in IFC/JSON-USD. Then all types of "CRUD change operations" could be performed using layers in IFC5: Creation, Updating (overlaying) and Deleting.

@brunopostle
Author

I don't think the plan is to give all entities a GUID.

Yes, an append-only workflow would require some way of effectively deleting an earlier entity by creating a new entity tagged "def:remove" or "active=False" or similar.
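An append-only workflow with tombstones could look something like the sketch below. The "def": "remove" marker is the hypothetical tag suggested in this thread, not an existing IFC5 or USD feature, and the entry layout ("name", "attributes") is an assumption.

```python
def replay(log, upto=None):
    """Rebuild the current state from an append-only log; `upto` replays only the first N entries."""
    state = {}
    for entry in log[:upto]:
        if entry.get("def") == "remove":
            state.pop(entry["name"], None)   # tombstone: hide the earlier entity
        else:
            state.setdefault(entry["name"], {}).update(entry.get("attributes", {}))
    return state

log = [
    {"def": "def", "name": "Nwall", "attributes": {"height": 3.0}},
    {"def": "over", "name": "Nwall", "attributes": {"height": 3.5}},
    {"def": "remove", "name": "Nwall"},
]

current = replay(log)            # the wall has been removed
earlier = replay(log, upto=2)    # history is recovered by trimming later entries
```

Nothing is ever deleted from the file itself, so every earlier state remains reconstructible by replaying a prefix of the log.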

@yorikvanhavre

We will still need to be able to reference an entity from another, right? The IFC format relies a lot on transversal relationships (relates to, is assigned to...). I mean, not everything can be just contained in one another, like in classical json. IIRC in IFC4-json some kind of ID is still used so you can relate one element to another even if it's contained in another structure.

If that mechanism is going to stay, that should also serve Git...

@marwiss

marwiss commented Jan 8, 2025

I don't think the plan is to give all entities a GUID.

I do not understand how entities could be overlaid using def:over without a GUID. I think all prims can be referenced directly in USD somehow, using the name/path. The name is a GUID in the examples of IFC5-JSON-USD. This is why I assume that all entities will have a GUID in IFC5. But I think this is something still up for discussion, maybe not decided yet. If you use USD then I think everything has a name, though I am not sure. Everything had a name in the examples at least.

Yes, an append-only workflow would require some way of effectively deleting an earlier entity by creating a new entity tagged "def:remove" or "active=False" or similar.

@theoryshaw

theoryshaw commented Jan 8, 2025

I don't think the plan is to give all entities a GUID.

In light of this discussion, is this something that should be reevaluated? Perhaps the benefits of each entity having their own GUID would outweigh the overhead. It seems there's a general demand to root a lot of other entities as well, regardless of this issue of creating a GIT-friendly schema.

@marwiss

marwiss commented Jan 8, 2025

In light of this discussion, is this something that should be reevaluated? Perhaps the benefits of each entity having their own GUID would outweigh the overhead. It seems there's a general demand to root a lot of other entities as well, regardless of this issue of creating a GIT-friendly schema.

I don't think that GUIDs on all entities would be an overhead compared to IFC4/STEP. In STEP all lines have an ID. That is not necessary if all entities already have a GUID. I think the biggest overhead in IFC is that there are too many entities, and too many objectified relationships. Some of these entities and relationships could be converted into attributes instead, to make it (and the whole data-structure/graph) simpler. In JSON, attributes could also be arrays or dicts instead?

In USD entities are called prims. All prims have a path with a name used for identification, according to the USD-standard.
https://docs.omniverse.nvidia.com/usd/latest/learn-openusd/terms/prim.html

Don't know if this will be the case for all entities in IFC5 though. But if IFC5 is based on USD, then this could well be the case. It could even be necessary for all entities to have a unique identifier to use the overlay functionality from USD. Can't overlay something that you cannot identify.

And in IFC5-JSON the human readable USD-name was replaced by a GUID. GUIDs are not used in USD. Prims normally have human readable identifiers.

https://remedy-entertainment.github.io/USDBook/terminology/path.html
https://docs.omniverse.nvidia.com/dev-guide/latest/programmer_ref/usd/hierarchy-traversal/find-prim-by-name.html
https://lucascheller.github.io/VFX-UsdSurvivalGuide/pages/core/elements/prim.html

So.. until anything else is decided, I think we could assume that all entities could have unique identifiers/GUIDs in IFC5. Not sure if this will be the final decision though. But don't know how overlaying would work otherwise.

@aothms
Contributor

aothms commented Jan 9, 2025

I think the biggest overhead in IFC is that there are too many entities

I agree, and that's an overhead you see both in file size as well as performance due to the extra indirections.

Can't overlay something that you cannot identify.

True

And in IFC5-JSON the human readable USD-name was replaced by a GUID

Small nuance. The human readable names are still used in the path of a prim, but the convention is established to follow this pattern:

parent {
  child (inherit=</child_guid_asasd123>) {
  }
}
child_guid_asasd123 {
  grandchild (inherit=</grand_child_guid_asasd123>) {
  }
}
grand_child_guid_asasd123 {
}

This takes some time to familiarize yourself with, but essentially this:

  • peels down the tree into single level parent child relationships, hence resulting in a more streamable, consistent-depth structure
  • because we do over on grand_child_guid_asasd123, even if it's moved in the tree, there is still something stable to refer to
  • when a schema is built (see the recent typespec additions), along with APIs, this can be hidden from the user
  • hence (maybe?) the best of both worlds of USD's name paths and stable guids

This extra indirection is a bit funky, but this is what I meant with the compromise between USD, ECS and JSON.
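The indirection in the pattern above can be sketched as a lookup that follows one parent-child step at a time. The dict encoding below is an invented stand-in for the pseudo-USD snippet, reusing its placeholder names; it is not the ifcx format.

```python
# Each prim maps human-readable child names to the stable name they inherit from.
prims = {
    "parent": {"children": {"child": "child_guid_asasd123"}},
    "child_guid_asasd123": {"children": {"grandchild": "grand_child_guid_asasd123"}},
    "grand_child_guid_asasd123": {"children": {}},
}

def resolve(prims, root, path):
    """Follow a human-readable path, one level at a time, to the stable identifier at its end."""
    current = root
    for segment in path.split("/"):
        current = prims[current]["children"][segment]
    return current

original = resolve(prims, "parent", "child/grandchild")

# Move the grandchild directly under the parent: its path changes, its identifier does not.
moved_tree = {
    "parent": {"children": {"child": "child_guid_asasd123",
                            "grandchild": "grand_child_guid_asasd123"}},
    "child_guid_asasd123": {"children": {}},
    "grand_child_guid_asasd123": {"children": {}},
}
moved = resolve(moved_tree, "parent", "grandchild")
```

Every entry is exactly one level deep, and an over that targets `grand_child_guid_asasd123` keeps applying wherever the prim sits in the tree.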


So the point here is that the entities themselves (class prims essentially in our example) have a guid (by convention - it's still a regular name string when converted to USD that happens to be a guid). The def prims are used to build the tree which governs local placement and inherits from these class prims. The over prims (components in ECS) reference the stable guid class prims *but in the examples currently do not have a stable identifier themselves*.

With this last point it would be good to experiment:

  • If we format the ifcx consistently; one prim per line I'd propose - lexicographically sorted keys
  • sort them lexicographically by their name (as well as their attributes namespace when specifier=over)
  • do we already get meaningful diffs, can we do a 3-way merge, in what situations does this break down?
  • how does this change/improve when we *do* add stable identifiers to the over components and sort based on those?

One potential caveat here is that the sorting I propose is not necessarily allowed. If you have conflicting information within one file then the order in which this is defined affects what opinion wins in the composition. Reshuffling changes this. But the question here is whether conflicting opinions within one file are allowed. In USD you cannot have multiple definitions for the same prim path within one layer.

These kinds of nuances need to be figured out and also really shape the conceptual discussions around IFC5: do these files/layers represent streams of updates linear in time or a snapshot of the in-memory model? Or both, depending on use case?
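The proposed experiment is cheap to set up. Here is a minimal sketch of the canonical formatting (one prim per line, sorted keys, prims sorted by name and then by attribute namespace) followed by a plain line diff using `difflib` from the Python standard library. The prim layout is assumed, and as noted above the reordering is only safe if a file cannot contain conflicting opinions whose order matters.

```python
import difflib
import json

def canonical_lines(prims):
    """One prim per line, keys sorted, prims sorted by name then first attribute namespace."""
    def sort_key(prim):
        namespaces = sorted(prim.get("attributes", {}))
        return (prim["name"], namespaces[0] if namespaces else "")
    return [json.dumps(prim, sort_keys=True) for prim in sorted(prims, key=sort_key)]

def diff(old, new):
    """Line diff of the canonical forms, without surrounding context lines."""
    return list(difflib.unified_diff(canonical_lines(old), canonical_lines(new),
                                     lineterm="", n=0))

a = {"name": "Na", "attributes": {"ifc:class": "IfcWall"}}
b = {"name": "Nb", "attributes": {"ifc:class": "IfcDoor"}}
b_edited = {"attributes": {"ifc:class": "IfcWindow"}, "name": "Nb"}

# Entry order and key order differ between the two versions; only one prim really changed.
changes = [line for line in diff([b, a], [a, b_edited])
           if line[0] in "+-" and line[:3] not in ("---", "+++")]
```

With this formatting the diff contains one removed and one added line for the edited prim, and nothing for the reshuffled one, which is the starting point for testing a three-way merge.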

@yorikvanhavre

If we format the ifcx consistently; one prim per line I'd propose - lexicographically sorted keys

We could get inspiration from STEP there... (Ok I'm out 😅 )

@marwiss

marwiss commented Jan 9, 2025

I think the biggest overhead in IFC is that there are too many entities

I agree, and that's an overhead you see both in file size as well as performance due to the extra indirections.

According to Junxiang Zhu, geometric information (mostly BREPs) accounts for about 50 % of all entities/data/file size in IFC files for building models on average, and for about 95 % of the entities/data/file size in IFC files for road models on average.

In particular, if we could store geometric information more efficiently, without having to use so many entities for this purpose, then much overhead/storage could be saved. For more information, see: 4.3.3. Graph type of IFC-graph in https://www.sciencedirect.com/science/article/pii/S0926580523000389.

But I also think other data that uses too many entities in IFC<5 could be stored more efficiently in JSON attributes, as arrays or dicts, instead of having to use entities. For example, maybe property sets and property set data.

This takes some time to familiarize yourself with, but essentially this:

  • peels down the tree into single level parent child relationships, hence resulting in a more streamable, consistent-depth structure
  • because we do over on grand_child_guid_asasd123, even if it's moved in the tree, there is still something stable to refer to

I am thinking about the possibility of doing something different from the USD standard here. USD does not have GUIDs; it only has a human-readable path with a name. But we also need GUIDs for AEC because we need something stable to refer to. Do we have to use the path/name property for this purpose? When we have to do a def:over just to create the GUID, this means that we have to create two entities for every object. That means a lot of unnecessary entities, and a lot of unnecessary text/overhead. Why not instead simply make a child class that inherits from the C++ UsdPrim class and adds an extra GUID attribute, with getters and setters? This way each entity will have both a path/name property AND a GUID property directly from the beginning. No need to do a def:over. The extra def:over entity with the GUID could then be automatically generated during the process, when the IFCX-JSON is translated/converted into USDA.

  • when a schema is built (see the recent typespec additions), along with APIs, this can be hidden from the user
  • hence (maybe?) the best of both worlds of USD's name paths and stable guids

I think this is very good. GUIDs are needed. But I would like to experiment with a solution where the path/name and the GUID could be created directly in the same entity, without having to do a def:over, because I think that could save about 20-30 % of file size. This extra indirection is also a bit "funky" :). Having a single entity instead of two entities for the same object could also facilitate GIT work and implementation of IFC5 at the same time. This is because the two entities representing the same object could be located in two completely different places in the text file.

This extra indirection is a bit funky, but this is what I meant with the compromise between USD, ECS and JSON.

So the point here is that the entities themselves (class prims essentially in our example) have a guid (by convention - it's still a regular name string when converted to USD that happens to be a guid). The def prims are used to build the tree which governs local placement and inherits from these class prims. The over prims (components in ECS) reference the stable guid class prims *but in the examples currently do not have a stable identifier themselves*.

@tomi-p

tomi-p commented Jan 9, 2025

I think the biggest overhead in IFC is that there are too many entities

What do you mean by this? Too many entities to use to express geometry and relationships, or too many entities to represent actual building components? If you mean entities corresponding to real building components, I think there are currently too few of them in IFC. I think we need more of them.

Splitting the IFC schema and the entity class schema in two could be a good solution (maybe this was the intention from the beginning). However, the challenge is standardization. If (and when) the IFC technical schema and the class hierarchy of building elements are separated, the latter should also be certified by ISO (e.g. ISO 16739 part 2). It should be noted that as a process this is likely to be considerably longer than updating ISO 16739 part 1.

For a model to be truly machine-readable, we need a sufficient number of classes (corresponding to real-life components), so that we can understand the components of the model in the same way in all use cases anywhere in the world. If we can accomplish this with an IFC entity dictionary (e.g. bSDD) that is certified by ISO, then that is certainly a valid solution. A user-defined dictionary is not the solution; after a while we have several classes with the same meaning and we don't know what the difference is (if any).

@marwiss

marwiss commented Jan 9, 2025

I think the biggest overhead in IFC is that there are too many entities

What do you mean by this? Too many entities to use to express geometry and relationships, or too many entities to represent actual building components? If you mean entities corresponding to real building components, I think there are currently too few of them in IFC. I think we need more of them.

I am not sure what you mean by "we need more of them". I am not talking about the number of "types of entities" (the number of classes) in the schema. I am talking about the number of individual objects needed to create the information in the model - in general - regardless of their type. I am especially thinking about objects representing BREPs and PSETs. I wasn't thinking about real building components (neither the number of objects nor the number of classes of building components). Maybe information about real building components should be expressed using more entities, to enable more precise queries. But I think it is up to the creator of the model to decide upon LOIN, MMI etc. If you want more detail and more information then you add more objects, up to you (but you don't necessarily add more "types of objects"). As a general rule, I think "as simple as possible to get the intended result" is good.

Splitting the IFC schema and the entity class schema in two could be a good solution (maybe this was the intention from the beginning). However, the challenge is standardization. If (and when) the IFC technical schema and the class hierarchy of building elements are separated, the latter should also be certified by ISO (e.g. ISO 16739 part 2). It should be noted that as a process this is likely to be considerably longer than updating ISO 16739 part 1.

I do not understand the difference between "IFC schema" and "entity class schema". The IFC schema contains classes that are templates for entities. In other words: IFC is an entity class schema ... or what do you mean? I think it is good to have only one schema. I am not sure how the schema will be specified in IFC5 though, since those ideas have not been presented in detail yet. I only know about some general ideas about future schemas, from older material.

For a model to be truly machine-readable, we need a sufficient number of classes (corresponding to real-life components), so that we can understand the components of the model in the same way in all use cases anywhere in the world. If we can accomplish this with an IFC entity dictionary (e.g. bSDD) that is certified by ISO, then that is certainly a valid solution. A user-defined dictionary is not the solution; after a while we have several classes with the same meaning and we don't know what the difference is (if any).

Yeah, I think using some other standard classification is the way to go in general, to get all the classes you need. Look at IEC 81346 parts 1, 2 and 12, ISO 12006 part 3, ETIM, BSAB, Uniclass, OmniClass, CoClass, etc., because there will be different needs for different classes in different contexts. IFC contains only the foundation classes. That's explicit in the name: Industry Foundation Classes.

tomi-p commented Jan 9, 2025

I do not understand the difference between "IFC schema" and "entity class schema". The IFC schema contains classes that are templates for entities. In other words: IFC is an entity class schema ... or what do you mean?

I mean that the technical definitions and the definitions of construction elements are divided into two different standards (IFC Part 1 and 2). The first part describes the different ways of defining geometry, the connections and dependencies between components in a model, the technical coupling of attributes and properties to components, etc. However, it would not include any specification of actual building components such as a wall or a window. This part would replace the current ISO 16739 part 1.

Part 2 would contain all semantic definitions of real building components and their attributes. It, too, has a specific structure, but the term schema may be incorrect in this case. Part 2 would also define the necessary hierarchy (project, plot, building, floor, space, building element, etc.), although the current hierarchy in IFC needs some rethinking. Part 2 could be split into several independent parts dealing with buildings, bridges, railways, tunnels, etc. (ISO 16739 part 2, 3, 4, 5, etc.). Technically, this section would be linked to part 1 with bSDD.

Together, these would work so that the geometry of any building element could (in theory) be described by any specification in Part 1. That is, the same principle could be used to describe the geometry of a road or a window. If it were found that this would lead to chaos, Part 2 could specify which Part 1 definitions are available for each component. Part 1 could then be kept as simple as possible, as it would not need to consider the differences in geometry and relations between the various components.
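
A minimal sketch of the Part 1 / Part 2 split described above, in Python. Everything here is hypothetical and only illustrates the principle: the geometry kinds stand in for "Part 1" definitions, the `DICTIONARY` stands in for a "Part 2" class dictionary (such as bSDD), and none of the names come from IFC or ISO 16739.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# "Part 1" side (hypothetical): class-agnostic ways of describing geometry.
GEOMETRY_KINDS: FrozenSet[str] = frozenset({"extrusion", "brep", "alignment_sweep"})


@dataclass(frozen=True)
class ElementClass:
    """'Part 2' side (hypothetical): a semantic class and the geometry it permits."""
    code: str
    allowed_geometry: FrozenSet[str]


# Stand-in for a standardized class dictionary; the entries are invented.
DICTIONARY: Dict[str, ElementClass] = {
    "Wall": ElementClass("Wall", frozenset({"extrusion", "brep"})),
    "Road": ElementClass("Road", frozenset({"alignment_sweep", "brep"})),
}


@dataclass
class Element:
    """A model element: geometry from 'Part 1', meaning from 'Part 2'."""
    name: str
    class_code: str
    geometry_kind: str


def check(element: Element) -> List[str]:
    """Return a list of problems; an empty list means the element is acceptable."""
    problems = []
    if element.geometry_kind not in GEOMETRY_KINDS:
        problems.append(f"unknown geometry kind: {element.geometry_kind}")
    element_class = DICTIONARY.get(element.class_code)
    if element_class is None:
        problems.append(f"unknown class: {element.class_code}")
    elif element.geometry_kind not in element_class.allowed_geometry:
        problems.append(f"{element.class_code} does not allow {element.geometry_kind}")
    return problems


print(check(Element("W1", "Wall", "extrusion")))
print(check(Element("W2", "Wall", "alignment_sweep")))
```

The restriction table lives entirely on the "Part 2" side, so the geometry definitions never need to know which building components exist.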

I hope this clarifies my thinking.

marwiss commented Jan 9, 2025

@tomi-p OK, I think about "The Schema" as the data model, practically specified in machine-readable EXPRESS code. I do not think about the theoretical paper/PDF standards and how to write them, even though they also give useful documentation.

The EXPRESS schema contains a lot of classes for objects that define geometry, and it takes too many objects to describe geometry. For example, this class:
https://standards.buildingsmart.org/IFC/RELEASE/IFC2x3/TC1/HTML/ifcgeometricmodelresource/lexical/ifcfacetedbrep.htm

And then there will be thousands of IfcCartesianPoint objects:
https://standards.buildingsmart.org/IFC/RELEASE/IFC2x3/TC1/HTML/ifcgeometryresource/lexical/ifccartesianpoint.htm

I hope that in IFC5 there will not just be standards and not just schemas, but also official code libraries that implement the full schema practically as class definitions, so that implementers do not have to start from nothing.
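
To make the point about object counts concrete, here is a rough Python sketch that counts entity instances per type in a fragment of IFC-SPF text. The fragment is hand-written for illustration (a single triangular face, which is not a valid closed solid), and the regular expression is a simplification that only handles one instance per line.

```python
import re
from collections import Counter

# Hand-written, simplified DATA-section lines; not taken from a real model.
SPF_SNIPPET = """\
#1=IFCCARTESIANPOINT((0.,0.,0.));
#2=IFCCARTESIANPOINT((1.,0.,0.));
#3=IFCCARTESIANPOINT((1.,1.,0.));
#4=IFCPOLYLOOP((#1,#2,#3));
#5=IFCFACEOUTERBOUND(#4,.T.);
#6=IFCFACE((#5));
#7=IFCCLOSEDSHELL((#6));
#8=IFCFACETEDBREP(#7);
"""

# Matches "#<id>=<TYPENAME>(" at the start of a line.
INSTANCE_RE = re.compile(r"^#(\d+)\s*=\s*([A-Z0-9_]+)\s*\(", re.MULTILINE)


def count_instances(spf_text: str) -> Counter:
    """Count entity instances per type name in SPF text."""
    return Counter(name for _id, name in INSTANCE_RE.findall(spf_text))


counts = count_instances(SPF_SNIPPET)
for name, n in counts.most_common():
    print(f"{name}: {n}")
print("total instances:", sum(counts.values()))
```

Even this one triangle takes eight separate instances, and every additional face adds its own loop, bound and face instances on top of the points.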

tomi-p commented Jan 10, 2025

OK, I think about "The Schema" as the data model, practically specified in machine-readable EXPRESS code.

I apologise for the careless use of the term schema. I should have known better as I am actively involved in terminology work and standardisation. I will try to be more accurate in the future.

I do not think about the theoretical paper/PDF standards and how to write them, even though they also give useful documentation.

We should bear in mind that ISO certification is one of the greatest assets of IFC. Without the ISO standard, IFC would not have achieved its current status. The examples @marwiss gave are from the ISO standard (i.e. from the corresponding bSI documentation).

I hope that in IFC5 there will not just be standards and not just schemas, but also official code libraries that implement the full schema practically as class definitions, so that implementers do not have to start from nothing.

I support this idea. It is in no way in conflict with formal standardisation.

Contributor

aothms commented Jan 10, 2025

If (and when) the IFC technical schema and the class hierarchy of building elements are separated, the latter should also be certified by ISO (e.g. ISO 16739 part 2).

yes this was more or less anticipated transitioning from ISO 16739:2013 to multipart ISO 16739-1:2018, but yeah that remained only symbolic as there was never a part 2.

ifc5 is probably a chance there to do better, although ISO is very much out of the picture atm I think

also official code libraries

This is a sensitive subject and will likely not happen as it conflicts with the interests of bSI members. I also don't see it as necessarily beneficial. I think innovation is quicker in a more open, pluralistic ecosystem.

re less indirections, more elements

Fully in agreement here. A lot of the indirections in the schema come from backwards compatibility, unclear scope, and the entity-relationship model in EXPRESS (i.e. the idea that there is semantics to the identity of atomic data blocks such as loops and points, even though we never do anything with that).

A json-based encoding will almost automatically fold more of these instances into monolithic components. This makes the graph much more consistent and predictable.
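
As a rough illustration of this folding, the Python sketch below takes a small graph of separately identified instances and inlines them into one nested JSON value. The `instances` table and the `fold` helper are hypothetical; this is not the actual IFC5 encoding, only the general idea of replacing references with embedded data.

```python
import json

# Entity-relationship style: every point and loop is its own addressable instance.
instances = {
    1: ("POINT", [0.0, 0.0, 0.0]),
    2: ("POINT", [1.0, 0.0, 0.0]),
    3: ("POINT", [1.0, 1.0, 0.0]),
    4: ("LOOP", [1, 2, 3]),  # refers to points by id
    5: ("FACE", [4]),        # refers to loops by id
}


def fold(instance_id):
    """Recursively replace references with the data they point to."""
    kind, payload = instances[instance_id]
    if kind == "POINT":
        return payload  # coordinates are already plain data
    return [fold(ref) for ref in payload]


# One monolithic component instead of five linked instances.
face_component = {"face": fold(5)}
print(json.dumps(face_component))
```

After folding, the points and the loop no longer have identities of their own, which is exactly the semantics that is given up.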

At the same time, reuse of complex aggregates was more or less impossible in IFC4 and is well supported by the USD-inspired data inheritance, where subprims carry over by means of the inheritance arc. Therefore we can go into much higher levels of decomposed detail while also eliminating a lot of the pluriformity in the association of semantics, because aggregation will be the only means of composition (no more CompositeProfileDef, MaterialList/ConstituentSet, ProfileProps, MaterialProps, ShapeAspect, etc.).
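
A toy Python sketch of children carrying over through an inheritance link. It is only loosely modelled on the idea described above: the `prims` structure, the `inherits` key and the merge rule are assumptions made for illustration and do not reproduce USD's composition rules or the IFC5 format.

```python
# Hypothetical prims: a reusable type and an occurrence that inherits from it.
prims = {
    "WindowType": {
        "inherits": None,
        "children": {
            "Frame": {"material": "wood"},
            "Glazing": {"material": "glass"},
        },
    },
    "Window_001": {
        "inherits": "WindowType",
        # Only the difference from the type is stored on the occurrence.
        "children": {"Frame": {"material": "aluminium"}},
    },
}


def resolve_children(name):
    """Return the children of a prim, including those carried over by inheritance."""
    prim = prims[name]
    base = resolve_children(prim["inherits"]) if prim["inherits"] else {}
    merged = {child: dict(attrs) for child, attrs in base.items()}
    for child, attrs in prim["children"].items():
        merged.setdefault(child, {}).update(attrs)  # local values win
    return merged


print(resolve_children("Window_001"))
```

The occurrence gets its `Glazing` child from the type without repeating it, while its local `Frame` material overrides the inherited one.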

I know the migration path is a challenge, but these things I'm really enthusiastic about.

marwiss commented Jan 10, 2025

also official code libraries

This is a sensitive subject and will likely not happen as it conflicts with the interests of bSI members. I also don't see it as necessarily beneficial. I think innovation is quicker in a more open, pluralistic ecosystem.

I am just thinking about basic code libraries that implement the fundamental class definitions of the schema in various object-oriented programming languages. These should preferably be implemented uniformly by all members and implementers anyway, and I can't see how this would conflict with legitimate interests. There could be such conflicts of interest, but then I wonder why. There should not be any difference between providing a schema and providing ready-made class definitions. I can't see how any member would be disadvantaged by that.

@aothms (and the IfcOpenShell community) have done a lot of work providing C++ class definitions (and much more), and there are also many other projects and implementations for pure Python, C#, Java, etc. But they all have in common that they are far from full implementations. This is a real problem for development in the AEC industry, because it is more constructive if implementers spend their time developing apps instead of parsing schemas. It is also good if the parsing is complete and uniformly implemented in all apps. So my idea is not to provide any helper methods, only the bare-bones class definitions. I think those class definitions would be of benefit: they are just another, more efficient way to provide the schema.
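
As a rough idea of what such bare-bones class definitions might look like in Python: the attribute names and list bounds below follow the EXPRESS definitions of IfcCartesianPoint and IfcPolyLoop, but the mapping to dataclasses, the use of `float` for IfcLengthMeasure and the decision to skip the UNIQUE constraint are my own assumptions, not an official binding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IfcCartesianPoint:
    # EXPRESS: Coordinates : LIST [1:3] OF IfcLengthMeasure
    Coordinates: List[float]

    def __post_init__(self):
        if not 1 <= len(self.Coordinates) <= 3:
            raise ValueError("Coordinates must hold 1 to 3 values")


@dataclass
class IfcPolyLoop:
    # EXPRESS: Polygon : LIST [3:?] OF UNIQUE IfcCartesianPoint
    # The UNIQUE constraint is not checked in this sketch.
    Polygon: List[IfcCartesianPoint]

    def __post_init__(self):
        if len(self.Polygon) < 3:
            raise ValueError("Polygon must hold at least 3 points")


loop = IfcPolyLoop([
    IfcCartesianPoint([0.0, 0.0, 0.0]),
    IfcCartesianPoint([1.0, 0.0, 0.0]),
    IfcCartesianPoint([1.0, 1.0, 0.0]),
])
print(len(loop.Polygon), "points in loop")
```

Even in this small case there are choices to make (how to represent measures, whether and when to enforce bounds), which different implementers may reasonably answer differently.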

Contributor

aothms commented Jan 10, 2025

The fact that code generation from EXPRESS is non-trivial is a major reason to move away from it. In the repo here there is now a TypeSpec schema that supports code generation for some languages (but not all we need).

The amount of choices to be made for a C++ generated schema is immense and deeply coupled with choices on parser paradigm.

People have tried in the past to come up with standardized APIs, e.g. SDAI; I wouldn't call it a success. Nobody has ever asked for an SDAI binding in IfcOpenShell, or even hinted at it in conversations.

tomi-p commented Jan 10, 2025

yes this was more or less anticipated transitioning from ISO 16739:2013 to multipart ISO 16739-1:2018, but yeah that remained only symbolic as there was never a part 2.

ifc5 is probably a chance there to do better, although ISO is very much out of the picture atm I think

Sorry to repeat myself. ISO certification is what distinguishes IFC from other similar data transfer formats. I understand very well that from a programmer's point of view, ISO standards are dull and often look backwards. But without them, we would have to use mobile phones from the same manufacturer, for example, to make a call from one person to another. On top of that, several 'wild' data formats might work between phones from different manufacturers, but when used, they would not allow some of the content of the calls to be understood. After a while, the format would no longer be maintained and could no longer be used on new phones.

For end-users, formal standards provide security and reliability. I do not want to block progress, but I would like to stress the importance of ISO standards in the implementation of IFC. We need to find a balance that supports both points of view. Through formal standardisation, IFC has achieved market leadership; let's not lose it. If you need help with ISO standardisation, I am available (anyway, I'm better at standardisation than programming).

marwiss commented Jan 13, 2025

The fact that code generation from EXPRESS is non-trivial is a major reason to move away from it. In the repo here there is now a TypeSpec schema that supports code generation for some languages (but not all we need).

The amount of choices to be made for a C++ generated schema is immense and deeply coupled with choices on parser paradigm.

@aothms I am curious about the typespec schema, it is presented here:

https://github.com/buildingSMART/IFC5-development/tree/main/schema

It seems like in IFC5, the "schema" is only used to validate IFCX files.
That would be equivalent to validating STEP files.

In IFC5, the content of the IFC<5 schema seems instead to be defined using a standardized API.

I think this is a really neat and interesting idea.

Contributor

aothms commented Jan 13, 2025

If you need help with ISO standardisation, I am available (anyway, I'm better at standardisation than programming).

Good to hear. I was only talking about the near term though. The focus is now on building consensus and reliability, not necessarily yet on formalising that into a coherent, structured document.

It seems like in IFC5, the "schema" is only used to validate IFCX files.

Code generation is possible as well from typespec. Could you clarify what you mean?

marwiss commented Jan 13, 2025

It seems like in IFC5, the "schema" is only used to validate IFCX files.

Code generation is possible as well from typespec. Could you clarify what you mean?

@aothms

I mean that this schema:
https://github.com/buildingSMART/IFC5-development/tree/main/schema

is something completely different from the IFC schema, this one:
https://technical.buildingsmart.org/standards/ifc/ifc-schema-specifications/

They do not have the same purpose and do not contain the same information at all. They are completely different things.

And thus I assume that the real IFC schema, this one:
https://technical.buildingsmart.org/standards/ifc/ifc-schema-specifications/
... might be implemented in IFC5 using some kind of API specification, instead of using TypeSpec.

I will see, later.

Contributor

aothms commented Jan 14, 2025

I don't know if that's a goal. Many of the things in the IFC EXPRESS schema are artefacts of the EXPRESS language or the SPF file encoding. Things like property sets are not needed in a more flexible encoding, the duplication of the single-inheritance element taxonomy in occurrences and types can be handled more elegantly, objectified relationships are not needed when all atomic bits of information are expressed in an ECS component, etc. And then there are aspects such as the monolithic nature of what should be a much more modular schema.
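
A small Python sketch of the entity-component idea, for comparison with IFC4, where attaching a property to an element goes through an IfcPropertySet and an objectified IfcRelDefinesByProperties relationship. The component names and values here are invented for illustration and are not taken from the IFC5 schema.

```python
from typing import Dict, List

# Entity-component style: an entity is just an id; all data lives in named components.
# Component names such as "fire_rating" are hypothetical.
components: Dict[str, Dict[str, dict]] = {
    "wall-1": {
        "class": {"code": "Wall"},
        "fire_rating": {"value": "REI60"},
        "thermal": {"u_value": 0.25},
    },
    "door-1": {
        "class": {"code": "Door"},
        "thermal": {"u_value": 1.4},
    },
}


def entities_with(component_name: str) -> List[str]:
    """Return the ids of all entities that carry the given component."""
    return sorted(eid for eid, comps in components.items() if component_name in comps)


print(entities_with("thermal"))
print(entities_with("fire_rating"))
```

There is no separate property-set container and no relationship object here: a component is attached to an entity simply by being stored under its id.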

I don't see why these schemas would serve different purposes though. Keep in mind though the typespec schema is an incomplete sketch at this moment. Things like modularity still need to be handled.

marwiss commented Jan 15, 2025

I just wonder if IFC5 will define the data model for the same information as IFC<5, in one way or another.

Doesn't matter if the data model is defined using a single inheritance class schema, or using components in ECS. But I think it has to be defined somewhere, and I couldn't find any data model definitions of any components... yet.

Will find out later.
