The solution ORMs provide is to “map” an object class, which is a type (or domain), onto a table, which is a relation variable (a.k.a. relvar). This supposedly abstracts away an “impedance mismatch” between the two. ORMs are already off to a bad start, mapping a type to a variable, but I’ll continue.
The real impedance mismatch is a set of very fundamental differences between application data and data in a well-designed relational database.
Application data:
- is ephemeral in nature (that is, temporary)
- is subject to changes in structure between revisions of the same application
- is subject to changes in semantics (meaning) between revisions of the same application
- has highly context-sensitive meaning
- is highly dependent on physical structure
- is not all held at once: only part of the data the application has ever processed is present at any moment, so contradictions (for instance, something as simple as a duplicate serial number) are often impossible to detect
While data in a well-designed database:
- has a definite predicate that maps the data to real world facts independent of the application
- is kept as long as the facts it represents are still believed to be true and relevant
- is context-insensitive
- is independent of any physical structure or implementation
- is held all at once, allowing detection of contradictions through complex logical relationships, enforced by declarative constraints (sketched below)
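To make that last point concrete, here is a minimal sketch (the table and column names are hypothetical) of a declarative constraint that catches the duplicate-serial-number contradiction mentioned in the first list:

```sql
-- Predicate: "The device with serial number S is of model M."
-- The PRIMARY KEY is a declarative constraint: because the database
-- holds all of the facts at once, a duplicate serial number is a
-- detectable contradiction rather than a silent error.
CREATE TABLE device (
    serial_number text PRIMARY KEY,
    model         text NOT NULL
);
```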
Some people like to think that an ORM is something that should just be plugged in to add “persistence” without requiring database design. This idea views the difference between application data and relational data as an impedance mismatch that can be overcome with clever code. The entire premise behind that idea is that an RDBMS is a legacy technology, or that the only reason an RDBMS is needed is for performance, reliability, stability, backup procedures, and other services that any DBMS provides.
In reality, the impedance mismatch is the difficulty of mapping application data to facts in the real world. The process of inserting new information into a database is not just the process of making that data persistent; it is the process of reconciling that new information, through automated logical inferences, with all the other data in the system and, if a logical contradiction is detected, rejecting the new information with a meaningful error.
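Continuing the hypothetical device table sketched above, that reconciliation looks like this in practice:

```sql
-- Accepted: consistent with every other fact in the system.
INSERT INTO device VALUES ('SN-1001', 'X200');

-- Rejected: this asserts a fact that contradicts one already stored,
-- so the DBMS refuses it with a meaningful error instead of persisting it.
INSERT INTO device VALUES ('SN-1001', 'X300');
-- ERROR:  duplicate key value violates unique constraint "device_pkey"
```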
Back to ORMs. Why “map” the object, and not just store it directly? Modern relational databases support a wide range of types, as well as the ability to declare your own, including sophisticated types (like an object). You can define input and output routines (for transmission to/from the client application), and then all of the operators on that type. There is some duplication in first defining the object in the application and then defining it again in the database, but I don’t think that’s the only reason.
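As a rough sketch of that capability (PostgreSQL syntax; the names are made up, and a composite type is the simplest flavor, since full input/output functions take more work):

```sql
-- A user-defined type, declared in the database itself.
CREATE TYPE complex AS (
    r double precision,
    i double precision
);

-- An operator on that type, also defined in the database.
CREATE FUNCTION complex_add(a complex, b complex) RETURNS complex AS $$
    SELECT ROW((a).r + (b).r, (a).i + (b).i)::complex;
$$ LANGUAGE SQL IMMUTABLE;

CREATE OPERATOR + (
    LEFTARG  = complex,
    RIGHTARG = complex,
    FUNCTION = complex_add
);
```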
If you put the entire object into one field, you can see right away that no meaning has been stored. Moreover, you will realize soon after that the implementation of the object has been solidified in the database, and therefore it’s no longer so easy to change the implementation details of the application. However, when you dump the internals of an object into separate fields in a table, there is the illusion that the meaning of that data has been stored as well, and the illusion that the data is independent of the implementation. This is the same illusion as when an object is serialized as XML: it looks like there’s meaning there, but there really isn’t; it’s just serialized data made from some internal application state, with no meaning outside of context.
If you actually take the time to design predicates that have meaning in the real world, and from which inferences can be made when they are logically combined with other predicates, and then map the application data to real-world facts that match those predicates, only then are you free from the implementation details of the specific revision of the specific application that inserted the data, and only then do you have enough information to make automated inferences from that data.
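For example (a hypothetical pair of predicates): each relation states a fact about the world, and logically combining them yields an automated inference that no single application revision owns:

```sql
-- Predicate: "Employee E works in department D."
CREATE TABLE works_in (
    employee   text,
    department text,
    PRIMARY KEY (employee, department)
);

-- Predicate: "Department D is located on floor F."
CREATE TABLE located_on (
    department text PRIMARY KEY,
    floor      integer NOT NULL
);

-- Automated inference from combining the two predicates:
-- "Employee E works on floor F."
CREATE VIEW works_on_floor AS
    SELECT w.employee, l.floor
    FROM works_in w
    JOIN located_on l USING (department);
```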
The point of all this is not that users of ORMs are necessarily wrong; my point is that there is no “holy grail” ORM solution that will solve these problems for you. Often, ORMs get in the way when you try to solve those problems. Recognize the limits of an ORM and, as long as you don’t sacrifice data integrity, work within those limits.
This is a very interesting subject. I posted a similar Java OOP question about this problem in this thread: http://www.utteraccess.com/forums/showflat.php?Cat=&Board=98&Number=1500406&page=0&view=collapsed&sb=5&o=&fpart=1
It seems that academia has dropped the ball in defining a standard mapping between object-relational modeling and normalized table designs.
I don’t think academia has completely dropped the ball. See “Databases, Types, and the Relational Model: The Third Manifesto” by CJ Date and Hugh Darwen.
The authors propose a database system that supports sophisticated data types, including type inheritance. They reject the “class = table” analogy.
Keep in mind that application data and database data are inherently different, for the reasons I list above. OOP and pointers are inherently closer to physical representations, and applications must inherently manipulate physical representations. So, an OOP approach to application design is natural.
However, at some point you need to translate that data into something with meaning outside of context and abstracted from physical representation. Relations are a great way to do that, and you absolutely can use complex types in your relations’ predicates.
Your code doesn’t know the meaning of your physical representations of data. That is, a pointer represents a “relationship” between two things, but that is arbitrary and holds little meaning. Only when you transform the physical representation to match a predicate does it have a logical meaning from which other code (not just humans) can draw inferences.
What? There’s no silver bullet? ****. Just when I thought I had my vampire problem solved.
More seriously, we’ve had lots of attempts at removing the impedance mismatch with object databases (making the database look and act more like the app’s data). But AFAIK there haven’t been as many attempts at going the other way: making programming languages that can deal directly with relations.
Here’s an attempt:
http://eigenclass.org/hiki/addressing-orm-problem-typed-relational-algebra
I think that’s an interesting idea that will go a long way toward bridging the gap.
It doesn’t address all the differences, however. For instance, the application still only has a small subset of the overall data.
This could save your life: silver bullets work against werewolves, not vampires.
Great piece, and something that’s been annoying me for a while now. To someone who knows what function a database is supposed to perform for an organization, the misconceptions you highlight are frustrating, to say the least. And ORMs getting in my way was frustrating enough for me to start tinkering with other ways of gluing applications and databases together (http://tmdbc.dev.java.net)
Interesting idea. I find compile-time type safety between the application and database especially interesting.
I think that’s a more promising line of research, because it works from the analogy “type is analogous to type”, not “type is analogous to table (relation variable)”. Smoothing over type mismatches would be a great way to encourage people to use an RDBMS correctly.
For me, having grown up as a software developer back when Perl’s DBI was in vogue, I have never seen the point of mapping tables to classes, except as a tool to make accessing and aggregating that data more efficient in the code. I guess you have summarized what I have been feeling ever since I was introduced to the idea of an ORM. Too many people take the O’s that an ORM gives you and treat them like first-class application objects, when they are really a portal to the database world, a world that lives under different rules and assumptions than the application.
It seems to me that the entire “introspection” piece of ORM systems is based on three assumptions:
- Writing SQL by hand is bad.
- Mapping SQL to methods by hand is also bad.
- Automating the creation of SQL or its mapping to the OO (or other) code saves time or other resources over the scope of the project.
In my experience, these are all false.
The kinds of relationships between classes in the object model can be summarized as:
1. Link (1-1, N-1, 1-N, N-M) / +ternary
2. IsA (3 flavours) / Inheritance
3. PartOf (Composition or inner class)
All these relationships have their counterparts in the relational model, with integrity constraints like foreign keys, etc.
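A minimal sketch of that correspondence (the tables are hypothetical): an N-M Link between two classes becomes a relation of its own, with foreign keys as the integrity constraints:

```sql
CREATE TABLE student (
    student_id text PRIMARY KEY
);

CREATE TABLE course (
    course_id text PRIMARY KEY
);

-- The N-M "Link" relationship is a table of its own;
-- the foreign keys are the declarative integrity constraints.
CREATE TABLE enrollment (
    student_id text REFERENCES student,
    course_id  text REFERENCES course,
    PRIMARY KEY (student_id, course_id)
);
```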
I agree that a Class is not equal to a Table.
But the primary key is important to decrease the impedance mismatch, so in the object world we have to create an identifier for a class, even if the object has its own object identifier (an address in memory). That way, the data inserted into the relational database can be used by other tools and parties (business objects, Crystal Reports, …).
The transition to 100% objects is not yet resolved, because persistence still needs an RDBMS instead of a file system.
We can see this in Entity EJBs (BMP and CMP), and now with EJB3.
I think that the mapping is now mature, but the way to program it is less straightforward.
The bridge between application data and data in a well-designed database is built automatically by tools, but sometimes there are pitfalls.
Concerning updates, you have to make them from the object model, and make queries from both sides (object model and relational) to keep data integrity.
If we store the entire object, we can’t use SQL to join directly on column data, so we can make queries only from the object side and not from the relational side.
Hope that helps.
Mohamed ROMDANE
“All these relationships have their counterparts in the relational model, with integrity constraints like foreign keys, etc.”
Any information can be expressed in many different structures. However, not all structures are equal.
“But the primary key is important to decrease the impedance mismatch”
I assume here that you are referring to a surrogate key, that is, a generated number that is visible to the application, but hidden from the user (and hidden from the business overall).
I already made the argument that surrogate keys are a bad idea here:
http://people.planetpostgresql.org/jdavis/index.php?/archives/4-Terminology-Confusion.html
Surrogate keys do nothing to decrease the real impedance mismatch, which I described in this article in detail. For just one instance, how does a surrogate key make the data in an application less context-sensitive?
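To illustrate with the hypothetical device example from earlier: a surrogate key adds a generated number, but by itself it enforces nothing about the real-world identifier, so the contradiction comes right back:

```sql
-- With only a surrogate key, the duplicate serial number is NOT detected:
CREATE TABLE device_surrogate (
    id            serial PRIMARY KEY,  -- generated; meaningless to the business
    serial_number text,
    model         text
);

INSERT INTO device_surrogate (serial_number, model) VALUES ('SN-1001', 'X200');
INSERT INTO device_surrogate (serial_number, model) VALUES ('SN-1001', 'X300');
-- Both rows are accepted; the database now holds contradictory facts.
```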
“The transition to 100% objects is not yet resolved, because persistence still needs an RDBMS instead of a file system.”
Object oriented data structures are essentially graphs (or perhaps trees). Graphs are very difficult to work with compared to relations. Graph data structures have been around longer than relations, and the Relational Model was invented to solve many of the problems created by storing data in graph structures.
I think the “transition to 100% object” will never be complete, because I think it’s fundamentally a bad idea. Using object-oriented database management systems is a regression.
“The bridge between application data and data in a well-designed database is built automatically by tools, but sometimes there are pitfalls.”
You completely ignored the fundamental differences between application data and database data that I listed in this article. The differences are not avoidable by “tools”.
Jeff,
The first time I read this blog, I only slightly understood the concepts that you were mentioning. However, I am in the middle of reading the book “Data Access Patterns: Database Interactions in Object-Oriented Applications,” and I feel that I now have a better understanding of your arguments. Since my SQL skills are much stronger than my coding skills, one of the thoughts that concerned me while reading the book was that the mindset of an ORM is to process data in singular pieces instead of larger chunks at a time. This would essentially eliminate 90% of the SQL that could be used to replace repetitive, time-consuming manual tasks in an application.
However, would you say that there are certain cases where using an ORM is useful? In my case, I am working towards developing an interface that will allow a user to view and interact with data represented in the format of a diagram (an instrument loop diagram, to be exact). Would using an ORM have any advantage when trying to display data in a format other than the typical tabular format?
Background: I am playing with implementing NetBeans’ Visual Library in my application, in an effort to display data in a format that engineers understand.
There’s nothing wrong with using a specific ORM as a tool for a specific application.
The problems appear when someone tries to use an ORM to solve a general problem. In particular, using an ORM to try to solve the “impedance mismatch” is going to fail.
For your application, the question you should be asking is not what’s presented to the user, but whether the information needs to make sense to other users. If it does, then you should design the database accordingly. If the data only has value to that one user, then an ORM may be safe to use.
Pingback: Why DBMSs are so complex « Experimental Thoughts