Some time ago we decided to start an ISO 27001 certification project. We believed that being certified for the way we handle sensitive data would improve our overall quality. An auditor visited our company and interviewed our employees. The outcome? We did a good job, but we could do a lot better. One of the things we needed to improve was the centralisation of our documents. Being a 20-year-old company, we had managed to accumulate 3 different locations to store documents:
- Our intranet (Liferay Portal CE v6.1);
- Google Docs;
- Local file server at our office.
We secure our local file server at the office very well, but having a local file server at all was not appreciated by the auditor. We had to come up with a file security policy that excluded the need for this server. And so we decided to integrate the files that were stored at option 3 into our intranet (option 1). This sounds like a simple solution, as Liferay offers a Document Management System out of the box. But… our intranet was still running on Liferay Portal v6.1, and to be honest, it was still GA1. Because it was only accessible through VPN, there had never been any urgency to update the thing. Now that we were about to integrate our locally stored files, it felt like the right moment to update the portal as well.
‘Update’ actually isn’t the right terminology. I wanted our intranet to be switched to the latest version of Liferay Portal, so we decided to ‘upgrade’ to Liferay Portal CE.
We had already done a lot of upgrades from v6.1 to v6.2, and most of the time this went ‘ok-ish’ (to avoid saying ‘well’). One cannot expect everything (especially custom plugins) to work properly after an upgrade, so there is always some manual work involved. Upgrading is usually not as straightforward as the manual wants you to believe. But hey, we have experienced Liferay people and our intranet is as vanilla as Liferay out of the box can be. No custom plugins, no hooks, not even a theme… So what could possibly go wrong?! Well, apparently multiple things…
I can’t remember exactly during which update (from which GA to which GA), but at some point Liferay’s migration script just stopped. It bailed out. Let me describe what went wrong and, more importantly, how we fixed things.
Private non-working copies
In the logs, just before the conversion script bailed out, there was a message indicating that a ‘false file version’ had been detected. Not entirely sure what was meant, I started by inspecting a dump of the database. And there it was:
There were character-like strings inside the column labelled ‘version’, whereas the conversion script apparently expected these to be floating-point-like strings. PWC means ‘Private Working Copy’. These entries occur in the database when someone checks out a file from the Document & Media library, which locks the file (until that someone checks in a new version of the file again). Apparently, over time we had managed to check some files out without them ever being checked in again. A short inspection of the other versions available for these files led me to believe a simple fix would be to just put in a version that was ‘0.1’ higher than the previous one, with statements such as:
update DLFileVersion set version='1.3' where fileVersionId=366018;
With this we were able to fix the issue, knowing that we were probably breaking filesystem consistency by doing so, but hey, we wanted the conversion at least to pass. Re-running the conversion showed that it no longer broke on this issue. However, it did break on another one…
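The version bump we applied can be sketched in a few lines. This is a hypothetical Python sketch, not what we actually ran (we fixed the handful of rows by hand), and it assumes versions follow the ‘major.minor’ pattern seen above:

```python
def next_version(versions):
    """Pick a replacement for a dangling 'PWC' row: the highest
    numeric version, bumped by 0.1 (as described above)."""
    numeric = [v for v in versions if v != "PWC"]
    # Compare '1.2'-style version strings numerically, not lexically.
    major, minor = max(tuple(int(p) for p in v.split(".")) for v in numeric)
    return f"{major}.{minor + 1}"

# A file with versions 1.0, 1.1, 1.2 and a dangling PWC entry:
print(next_version(["1.0", "1.1", "1.2", "PWC"]))  # prints 1.3
```

For our handful of broken rows a manual UPDATE per row was enough, but the rule was the same every time.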
Do you speak Creole?
So was that all? Well, to get Liferay Portal to boot: yes. To make Liferay work: no. The search didn’t work anymore, and re-indexing wasn’t possible: it bailed out with a cryptic message indicating the wiki parser wasn’t able to understand every article in our wiki. Ok, we are good at writing cryptic messages ourselves, but this message didn’t reveal anything about the exact articles that were causing the issue. We set the log level to ‘DEBUG’ and even to ‘ALL’ (only do that if you do not mind going through hundreds of log lines to find issues), but the log didn’t contain any article indication whatsoever.
Most of the articles in our wiki use the Creole syntax, although we have a bunch written in HTML too. Because the only clue I had was that it was an issue with interpreting Creole, I was able to filter the article set down to about 200 articles. I tried visually inspecting the database dump for diacritics and strange syntax. To make a long story short: I didn’t find any. Multiple times I thought I had found the cause of the Creole syntax error, but every time I tried to reindex, it bailed out again with the exact same message.
I had been working on it for about 2 days when I got so fed up that I decided to split the set of articles in half (that is, remove half of the articles) to narrow down the set of possibly troublesome articles. I repeated this process many times, and in about 4 hours I found exactly 1(!) article that was causing the problem. HOORAY! It turned out to be an article that was originally written in TWiki, our previously used wiki implementation, and whose content had been converted to the Liferay wiki.
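The repeated halving described above is really just a manual binary search. Sketched in Python — the failure oracle here is hypothetical; in reality it meant restoring a subset of the wiki articles and re-running the Liferay reindex:

```python
def find_bad_item(items, subset_fails):
    """Binary-search for the single item that makes subset_fails(subset)
    return True. Assumes exactly one bad item, and that a subset fails
    if and only if it contains that item."""
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the first half reproduces the failure, the culprit is in it;
        # otherwise it must be in the other half.
        candidates = half if subset_fails(half) else candidates[len(half):]
    return candidates[0]

# Toy oracle: pretend article 137 is the one the parser chokes on.
articles = list(range(200))
print(find_bad_item(articles, lambda subset: 137 in subset))  # prints 137
```

With 200 candidate articles this takes about 8 rounds instead of reading them all, which is roughly how the 2 days turned into 4 hours.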
Now, assuming this conversion from the old TWiki to Liferay was the issue would be short-sighted, as many articles in our wiki were converted from TWiki. So there must have been ‘something special’ about this one article. It is a shame I don’t have a 6.1.0 instance running anymore, so I can’t show you exactly what went wrong. However, it had to do with multiple nested lists defined in Creole, some starting with an asterisk ‘*’ and others starting with a dash ‘-’. Eventually, I didn’t bother to fully fix it, because it turned out the information in the article was outdated anyway. To clean up some rubbish that was left behind in the database, I used a statement such as:
select * from WikiPageResource
where resourcePrimKey not in (select resourcePrimKey from WikiPage);
File Consistency
Intrigued by the inconsistencies in the wiki data, I also looked at the tables that store data about the entries in the Document & Media library. After writing some simple bash scripts that checked the file system entries against the database entries, I found a lot of inconsistencies on both ends. Doing some Googling on this issue, I found two portlets from Xtivia in the marketplace that come in very handy in this case:
- The Documents Media Database Checker and
- The Documents Media File System Checker.
Using those cleaned up a lot of entries, both in the database and on the filesystem. So I would definitely recommend using them!
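Our bash scripts essentially computed two set differences: files on disk that the database does not know about, and database records without a backing file. A Python sketch of the same idea — the directory layout and the way the database names are obtained are placeholders here, not the actual Liferay document store format:

```python
import os


def consistency_report(fs_root, db_names):
    """Compare the files found under fs_root with the names recorded in
    the database (db_names would come from e.g. a dump of DLFileEntry).
    Returns (on_disk_only, in_db_only)."""
    on_disk = set()
    for dirpath, _dirs, files in os.walk(fs_root):
        for name in files:
            # Store paths relative to the document store root,
            # so they are comparable to the database entries.
            on_disk.add(os.path.relpath(os.path.join(dirpath, name), fs_root))
    in_db = set(db_names)
    return sorted(on_disk - in_db), sorted(in_db - on_disk)
```

Anything in the first list is an orphaned file, anything in the second is a dangling database record; the Xtivia portlets do a much more thorough version of exactly this check.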
Garbled Document Previews
Now we had the upgrade done, the search functionality restored and the obsolete data removed. When my colleagues used the ‘new and improved’ intranet, they complained that the document previews that Liferay generates were garbled. Especially PDFs automatically generated from Open Office documents were fully unreadable. And yes: we did install Open Office and Xuggler, and all properties were set right. We know how to do that, we are a managed hosting company, remember ;). However, it turned out that some preferences’ values were stored in the database. Newbie mistake… I quickly removed the corresponding preferences from the PortalPreferences table; a restart and a re-index made the previews work again. If you ever need to remove property values from the database, please remember that Liferay stores all these properties in 1 (one) cell, in XML format. So in the database (we use Percona, but that doesn’t matter), you will see a whole XML document crammed into a single column.
So unless you can come up with some sophisticated (and error-free) queries that do inline XML replacement, I would recommend you SELECT the values from the database, make the changes in your favourite editor, and then run an UPDATE on the record with the edited value. One thing that personally comes in handy when debugging or changing not-so-straightforward things in the database is making a full dump of the database in which each record is printed on a separate line. Yes, I know the file will get much bigger and re-importing will take much more time, but at least you are able to find the records you are looking for much more easily. All you need to add to your ‘mysqldump’ command is the option ‘--extended-insert=false’. This will get you each record on a separate line.
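That SELECT-edit-UPDATE routine can also be scripted instead of done in an editor. A sketch using Python’s standard library — the XML below is only illustrative of the ‘everything in one cell’ problem, and the preference name is made up; it is not the actual Liferay preferences schema:

```python
import xml.etree.ElementTree as ET


def drop_preference(prefs_xml, name):
    """Remove the <preference> element whose <name> matches, and return
    the serialized XML, ready to paste back into an UPDATE statement."""
    root = ET.fromstring(prefs_xml)
    for pref in list(root):  # copy the child list so we can remove safely
        if pref.findtext("name") == name:
            root.remove(pref)
    return ET.tostring(root, encoding="unicode")


# Illustrative one-cell value, loosely shaped like portlet preferences.
cell = ("<portlet-preferences>"
        "<preference><name>previewGenerated</name><value>true</value></preference>"
        "<preference><name>other</name><value>1</value></preference>"
        "</portlet-preferences>")
print(drop_preference(cell, "previewGenerated"))
```

Parsing the XML properly avoids the classic trap of a string-replace that leaves the cell malformed and the portlet unable to read its own preferences.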
Done… right?
Yes, finally everything is working fine now. This made it possible to add the files that were stored locally on our server to the Liferay Portal. To be honest, since writing this, all the files have been imported. But that was a hassle of its own.
Why managed Liferay hosting?
Discover how Firelay boosts your Liferay in our extended features document.