MPC Status Page: Archive (2004 July)
This page describes enhancements to or problems that have occurred with the MPC's webpages
and scripts and the fixes that have been made.
Recent problems are listed elsewhere.
Index of other older problems.
Older Enhancements and Resolved Problems
Local circumstances for geocentric orbits in NEOCP
2004 July 21: 20:00. The calculation of local circumstances for geocentric
orbits in the NEOCP is not correct (I knew there was a reason that block
of code was commented out...). Until the program is fixed, please ignore
the local circumstances. I've only just got back home and I have to eat
before I attempt to modify the code.
- 21:00. Having eaten, I have now fixed (I think) the code.
Network outage July 21
2004 July 21: 12:35. The main UPS in the CF died on 14 July and since then
the CF machines and network routers have been running on Cambridge power
(residents will understand why this is not desirable, particularly in
the summer months...). They are planning to shut down the CF at 17:30
EDT today, in order to put everything back on the UPS. They expect to
be back up by 19:00. All MPC services will be unavailable during this
window.
- 19:45. Normal service has been resumed.
Sluggish response from CfA webserver
2004 July 19: 20:40. We've noticed that the CfA webserver is very sluggish,
as is our webserver (we see a long pause between entering form data and
clicking the action button and the CGI script starting to run). We are
investigating with the CF whether this is a general problem with the
network. We do not see a problem invoking processes on our webserver
machine (which suggests that the problem is not something local to our
system). We have just shut down and then restarted the webserver, but
that didn't help.
- 20:50. The copying of this HTML file to the CfA webserver was
very slow, further indicating some network problem or computer problem
in the CF.
- 21:55. No response from the CF. Network activity within our
cluster appears to be normal. However, there are significant delays in
running commands on the CF machines (e.g., 40 seconds before getting
the menu screen after typing 'pine').
- July 20: 00:30. Still no response and the problems persist.
- 08:40. E-mail to @cfa.harvard.edu addresses is beginning to trickle in.
Perhaps the problems are being fixed. We'll find out when we get into
the office.
- 09:56. Arrive in office to find e-mail being delivered and better
response from both webservers. Wonder whether we'll get an explanation
for the problems?
- 11:00. It seems that the earlier network problems weren't quite
as fixed as was hoped. Replacement of a CF network switch is pending.
Hopefully, this replacement will not affect our cluster.
- 13:47. The CF has informed us that the faulty network switch has
been replaced.
Planned network disruption July 17 III
2004 July 17: 13:00. Well, we didn't get an OK from the CF, but I went
down there at 12:20 to enquire about the status of the investigation
only to find no-one there! In the hope that if the
problems weren't fixed the CF staff would not have gone to lunch/home, we
are assuming that the problems are fixed. One of the shutdown machines
was brought on-line as a test. This successfully rejoined the cluster,
so all the remaining machines were brought back on-line. Normalcy should
have been restored.
Planned network disruption July 17 II
2004 July 17: 08:00. We have begun to prepare for shutting down most of our cluster
machines prior to the pre-announced network outage.
- 08:50. Shutdown of all but four machines achieved from home.
- 09:30. Heading into office to await all clear to restart machines.
No mid-month MPS batch this week II
2004 July 16: 22:10. Assuming that the CF investigation tomorrow is
satisfactory, we have decided to proceed with this week's mid-month
MPS batch, albeit with a day's delay.
Planned network disruption July 17
2004 July 16: 16:54. The CF has announced that they are planning downtime
of the CfA network from 09:00 to 12:00 EDT tomorrow (Saturday) morning
in order to check that today's fix has actually solved the network
problem (flooding of network traffic due to a router loop).
We intend to power down our machines shortly before 09:00 to ensure
that they are brought down in a controlled fashion. They will be brought
back on-line as soon as possible after we get the OK from the CF.
Network disruption July 16 part II
2004 July 16: 16:30. Another network outage has occurred. The CF apparently
got good logs from the outage this morning and are trouble-shooting with
Cisco. A faulty router has been replaced. Our machines are back up, but
we need to restart various automated procedures.
- 16:50. The restarts have been performed. Normalcy should have been
restored.
No mid-month MPS batch this week
2004 July 16: 15:30. In light of continuing network stability problems, the
decision has been made to not issue a mid-month MPS batch this
week. The preparation of this journal is extremely network intensive
and the failure of the network at an inopportune moment in the
preparation would require many hours of fixing. In addition, the
preparation occurs over a weekend, a period when CF response to problems
is slower than during the week. Network willing, we hope to resume normal
service for the next issue.
Network disruption July 16
2004 July 16: 09:30. Another network outage has knobbled our machines.
No information from the CF as to the reason for the outage, or, for that
matter, any notification that an outage occurred. Machines will become
available again as soon as we have some confidence that the network
will not die immediately.
- 11:00. We have restored normalcy. Again. No guarantees from the CF
that the problem will not reoccur.
- 11:50. It turns out that the DOU MPEC did not complete running.
It seems there were at least two network outages last night, the first
around 03:00 and the second sometime after 08:00. The bits of the
DOU MPEC that did not run are now being completed.
- 12:00. A number of other overnight jobs did not complete and these
are also being rerun.
MPCs in ADS
2004 July 15: 20:00. We have noticed that recent issues of the
MPCs are not indexed in the ADS.
This has apparently been caused by the ADS mail system rejecting our
e-mails containing author and title information for recent batches. We
are investigating this problem with the ADS staff.
Telnet computer services on CFAPS8
2004 July 14: 15:00. The recent loss of a data disk on the old VAX that
powers one half of the telnet computer service has caused us to retire
this functionality, at least temporarily, on CFAPS8. Some of the
facilities offered on CFAPS8 are also accessible on CFAPS1. Most of the
telnet-accessible features have, in any case, been replaced by features
on the web service.
Network disruption July 14
2004 July 13: 16:58. We have been informed that the CF will be replacing
a faulty network device on July 14 from 06:00 to 08:00 EDT. We have
been warned to expect network instability during this period. Machines
(such as the webserver) may not be accessible for extended periods during
this time frame.
- July 14: 09:40. The network caused numerous connectivity problems
in the cluster, as expected. As we were clearing a number of remaining problems
the network froze again--there was an extended power outage in the CF.
- 11:30. After trekking into the office, it seems that normalcy is
restored.
Minor Planet Checker
2004 July 13: 21:30. A user reported that the Minor Planet Checker was
reporting inconsistent offsets for an NEA when pasting in observations.
Investigation showed that the program was doing a geocentric computation,
ignoring the observatory code on the entered observation. This was caused
by a failure to modify the arguments of a library routine following an
upgrade. This has now been done.
Network loss?
2004 July 11: 11:00. Early this morning, the machines in our cluster
lost their connectivity. The first machine went off-line and rebooted
around 00:30. Over the next eight hours, all but one of the other
cluster machines rebooted. This is presumably network related
and similar to the problems of July 6 (we suspect this because a home connection
to one of the CF's boxes was also disconnected). Unfortunately, this
interruption has caused havoc with our nightly publications (DOU MPEC
and the mid-month MPS batch). We are working to restore the missing
publications.
- 11:30. It has been determined that the DOU MPEC preparation routine
did not even begin running. This morning's DOU MPEC is therefore
abandoned.
- 11:42. The SMTP queues on six machines that did not restart automatically
have been restarted.
- 11:45. The bits of the mid-month MPS preparation routine
that did not run are being run.
Network loss
2004 July 6: 14:00. Late this morning, the network at the Observatory
died. Most of our cluster members lost their connectivity. No
explanation for the outage is as yet forthcoming. We have restored
normal service.
- 15:00. Our desktops again froze around 14:30. Normal service
was restored after about 20 minutes.
Blank e-mail and web version of the DOU MPEC
2004 July 4: 09:15. A blank version of MPEC 2004-N20 was
e-mailed out this morning. The correct version has been sent out.