November 20, 2020 ipernity service break
Record for the archives

To answer the many questions about what actually happened:

Friday, November 20, 2020, 18:00 CET: the ima team published the fortnightly 'Club News' (previously called 'Newsflash'). This was the ima team's last action that week, carried out with the normal ipernity blog editor. No changes had been made to the software on the servers. Apart from the notice banner, no new HTML files had been uploaded, and uploading the banner is a proven, low-risk routine.

Friday, November 20, 2020, 20:00 CET: members of the ima team were alerted from outside that uploading and commenting on our website no longer worked. An immediate check of the log files showed that the last entries had been written on Friday at 19:38 CET. We therefore have to assume that this is when the malfunction occurred.

An immediate check of the AWS servers via the AWS console showed that all 16 servers were running in a technically sound state and were not overloaded. A hardware error could be ruled out. Instead, it had to be a malfunction in the ipernity software, and we could not explain why it had suddenly appeared.

Our own volunteer immediately tried to locate the error. Unfortunately, his self-taught skills were not sufficient for this, even with support from other club members.

Less than an hour after the malfunction occurred, it was therefore clear that we would need help from IT professionals, because as a club of amateur photographers we have no IT specialists of our own. Up to that point we had done everything that IT laypeople are capable of doing.

We therefore decided to engage an external service provider on Monday, November 23. To inform the members, a banner was also published on Friday evening. In addition, detailed information was posted the same evening in this ipernity team blog, which is the statutory publication medium of the ima team. It remained accessible to everyone until the emergency shutdown on Wednesday, November 25, 2020. We thus fulfilled our statutory obligation to inform all members promptly.

During the night and on Saturday morning, we discovered that one server was executing write commands in quick succession, which had filled its hard drive to 99% of capacity. We were able to delete these files. However, because the servers had fallen out of sync in the meantime, our website did not run correctly. Furthermore, it was not possible to determine why the out-of-sync server was behaving this way.

In addition, by 07:26 CET on Saturday morning we had answered all tickets received up to that point. Our attempts to get help from the creator and former operator of the site were unsuccessful.

On Monday, November 23, at 07:32 CET, we alerted our professional IT service provider Qwellcode (Salzkotten, Germany). After reviewing all available information, Qwellcode promised to pull an expert off another ongoing project that afternoon.

In parallel, we also tried to contact the AWS specialist from Innovations ON (Ulm, Germany), who had successfully helped us solve a server hardware problem in June 2020. His company headquarters told us that he was on vacation. Shortly afterwards, one of his colleagues contacted us. Together with Qwellcode, he checked the AWS servers and confirmed our own finding that everything was technically in order on the AWS side.

Qwellcode started its work on Monday, November 23, at 16:11 CET. By 19:00 CET we knew not only which server was misbehaving, but also that it was useless to delete the command causing the problem. This command was completely nonsensical and sat in an illogical place. Deleting it did not help: after some time it reappeared and resumed its harmful work, causing the servers to fall out of sync and the internal hard disk to fill up.

Such abnormal behaviour is typically seen on virus-infected PCs. However, since ipernity is protected by professional AWS firewalls, a virus infection is extremely unlikely. A hacker attack, or some other unknown cause that could not be identified quickly, is more likely. For reasons of time and cost, we stopped searching for the malicious code on Tuesday evening and decided instead to find and install a clean, uninfected backup. We were able to begin installing this backup on Wednesday, November 25, 2020, at 12:00 CET.

Ipernity runs on 16 interconnected servers. If there are signs of malicious code in the system, all connected servers must be shut down completely, cleaned, and rebooted. If even one is left out, the suspected malware can spread from it to the others. That is why the frontend servers, which connect the website to the internet, also had to be switched off.

In such situations, the IP address would normally be redirected to a separate server showing an information page: not merely to another website, but to a physically separate machine. However, when ipernity was programmed back in 2013, no one anticipated such a situation. Renting and setting up another AWS server would have taken us several hours, which was too long in such a crisis. We therefore accepted that visitors would temporarily see a blank white screen.

In addition, we continued to provide up-to-date information on the two leading social media platforms, ipernity@Facebook and ipernity@twitter.

We regret that no standby server for such announcements exists yet. We will look into what setting up such a separate news server would cost. We will then decide whether the ongoing rental of an additional AWS server, plus the programming it requires, is worthwhile for a small club like ours given how rarely such emergencies occur.

Installing the backups on the 16 servers took until Friday, November 27, 18:00 CET. Hundreds of gigabytes had to be replicated, and the servers had to be started up and synchronised one after another. Up to that point everything went well. However, when the whole system was booted, it turned out that some connections between the servers were still interrupted.

Since the responsible specialist at Qwellcode had already exceeded his legally permitted maximum working hours, we sent him home for the weekend. Work has continued since Tuesday, December 1. The AWS specialist from Innovations ON who helped us in June is now also investigating what can be done.

Update, December 3, 14:00 CET: Considerable progress has been made in restoring the website. Functions that are not yet working correctly still need manual corrections, which the experts are currently working on.


11 comments

Clickity Click said:

Hi Sami, was able to access your "Article test". :)
3 years ago

Sami Serola (inactiv… replied to Clickity Click:

Cheers! Happy to see this works.
3 years ago

Sami Serola (inactiv… replied to Clickity Click:

Changed this to an archived report of the incident.
3 years ago

Antje P. said:

Many many thanks to the whole ima team for your great commitment!
This is not at all to be taken for granted. I hope it is a little compensation for you to see how happy everybody is that Ipernity is up and running again.
3 years ago

Amelia said:

I was able to access the help desk throughout the shutdown, and so I knew roughly what was happening. Thank you to all the IMA team for their hard work and total commitment to Ipernity. Everyone must be so grateful that things are running smoothly again, and that now we can lose ourselves on the site again, and forget the ongoing world problems of Coronavirus.

THANK YOU ALL once again.
3 years ago

Andy Rodker said:

Thank you, IMA team for the hard work and alacrity with which you attempted to resolve the situation. Great to be back!!
3 years ago

Gudrun said:

A big heartfelt thank you to all concerned for the hard work!
3 years ago

Erhard Bernstein replied to Gudrun:

+1
3 years ago

StoneRoad2013 said:

A record is the right and proper thing to keep.
3 years ago

PhLB - Luc Boonen said:

thanks for the information, and shame on the suspected hacker
3 years ago