Welcome to part 4 of my WriterTool tutorial. If you followed the previous parts, you should find this project in your workspace.
If you didn't, then download the file and use "Import ..." from the File menu. Select "Existing Projects into Workspace" from the folder General.
Choose "Select archive file:" and enter the path to the downloaded file. Eclipse should now offer the project "WriterTool" in the big area in the middle. Click Finish to import it into your workspace.
When you use a tool, there always comes a time when you will look back and ask yourself: Is it worth it?
Does EMF really save me so much time? Or do the limitations eat the savings?
Up to now, you probably have mixed feelings. When overriding generated code, it's very easy to forget to change @generated to @generated NOT. The factories are pretty hard to extend.
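To make that pitfall concrete, here is a small sketch (the class, field, and method are made up for illustration, not taken from the WriterTool model) of a generated getter that was modified by hand. The generator leaves a method alone only when its Javadoc says @generated NOT; with a plain @generated tag, the hand-written change would be silently overwritten on the next regeneration.

```java
// Hypothetical example: a getter as the EMF generator might emit it,
// then modified by hand. The "NOT" after @generated tells the code
// generator to keep our version on the next regeneration.
public class Chapter {
    private String title;

    /**
     * @generated NOT
     */
    public String getTitle() {
        // Hand-written change: never return null.
        return title == null ? "" : title;
    }
}
```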
What advantage does it actually have?
Well, when you look at the model in the sample Ecore editor, you see your whole model at once. When you look at it from the Java code, the same information is spread over many classes.
So it gives you a better overview.
It makes it simpler to move things around and to experiment.
But the biggest advantage is what we want to tackle now: Persistence.
How do we want to persist our model? In one file? Several? A database?
As I said, the number of chapters can become arbitrarily large. Chapters themselves can become very large, and there will be many tiny ideas and events alongside them.
We'd rather not limit a user to some arbitrary number of objects anywhere in the model, because she just might need one more.
It might seem that a database is the best choice to solve our problem. Databases know how to handle arbitrarily large amounts of data, they offer searching and transactions.
Maybe, but in our case, a database is not really what we want.
A group of software developers once wanted to search a couple of records from a database and then to sort them.
So they fired up their SQL tools, wrote the query and ran it.
Unfortunately, this took very long: over five minutes.
Just to get an idea, one of the developers created another query which just dumped the whole table into a file and then ran grep and sort on it.
That (including the dump) took only 3 seconds.
Moral: The most obvious solution isn't always the best.
To know if we want to put our data into a database, we must again look at the usage patterns.
Will people often search data in our model?
Probably not. Most navigation will happen through parent-child relationships.
Can the model become arbitrarily large?
Probably.
Will the accesses be local or global? Will a user often work with related objects (like all chapters of a book)? Or will he want to search and modify arbitrary parts?
Accesses will be local.
Does the data have to be versioned?
Absolutely. We also want different authors to be able to comfortably merge their changes to the same chapter.
Databases are bad at merging changes from different sources into the same row.
They are great at searching data in a big heap. But what if you don't need that? Also, searching is only fast if you create the right indexes.
How about transactions?
I imagine that we'll have big transactions. Users will write for a few hours on some part of the book before they want to commit it. Databases can do that but it either means someone else can't work on the same part (bad) or the second one to save his or her work is going to see an error (worse).
So basically, the only argument for a database is the size of the model, and the biggest arguments against it are the merging and the cooperative work.
And there is another reason: Our model is basically hierarchical. Like a filesystem.
Most filesystems consist of roughly 100K lines of code. Databases come in at millions of lines of code.
Filesystems are used by every computer user in the world, every day, constantly.
Filesystems have much fewer bugs than databases.
Every developer understands how a filesystem works.
Only very few developers understand what happens when they work with a database. It's magic.
So every time someone wants you to throw a database at a problem, ask yourself if that's really necessary.
But how about EMF? Can EMF handle arbitrarily large models?
If it cannot, then we have to use a database, no matter the cost.
Well, not out of the box, but you can tell it to.
When you look at references in the model, you can find a property "Resolve Proxies". It's true by default.
This means that, instead of holding the real object, a reference can hold a proxy. The proxy loads the real object only when you actually need it.
While you work on the ideas, you don't have to have all authors in memory.
While working on a single chapter, you don't need to have all the other chapters in memory at the same time.
So the basic idea is to use proxies for the large objects (like chapter texts) while we keep the navigation information (just a few lists and references or proxies) in memory.
This way, the user will be able to navigate quickly while the application loads just those parts of the model which she actually wants to edit.
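To illustrate the idea, here is a minimal plain-Java sketch of lazy loading through a proxy. All names in it are made up; EMF's real proxy mechanism works quite differently (on the EObject and URI level), but the principle is the same: keep a cheap stand-in in memory and pay the loading cost only on first access.

```java
import java.util.function.Supplier;

// The common interface: callers don't care whether they talk
// to the real object or to a proxy.
interface ChapterText {
    String getText();
}

// The "real" object: expensive to load, so we want it on demand.
class LoadedChapterText implements ChapterText {
    private final String text;
    LoadedChapterText(String text) { this.text = text; }
    public String getText() { return text; }
}

// The proxy: cheap to keep in memory. It asks its loader for the
// real object only on the first access, then caches the result.
class ChapterTextProxy implements ChapterText {
    private final Supplier<ChapterText> loader;
    private ChapterText resolved; // null until first access

    ChapterTextProxy(Supplier<ChapterText> loader) {
        this.loader = loader;
    }

    public String getText() {
        if (resolved == null) {
            resolved = loader.get(); // load only when actually needed
        }
        return resolved.getText();
    }
}
```

The navigation structures can then hold many such proxies, and the expensive load happens only for the chapters the user actually opens.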
public static Author addAuthor (Project project, String name, String summary, String text) {
    Author author = WriterToolFactory.eINSTANCE.createAuthor();
    init (author, project.getDefaultLanguage(), name, summary, text);
    project.getAuthors().add(author);
    return author;
}