NTPd + NMEA + PPS in a modern Linux system with PPSAPI

SamsonovAnton · March 3, 2015, 7:13pm

I am quite confused by my experience with ntpd and PPS (pulse-per-second) support in openSUSE 13.2. It is probably no different from other contemporary Linux-based systems, but it is far from FreeBSD 10, where the same setup requires almost no effort.

That is, with FreeBSD 10.0/10.1, all I needed to do was recompile the kernel with the PPS_SYNC option and then configure ntpd (re-compiled with PPS/ATOM driver support) to sync to 127.127.20.0 with “flag1 1”. This makes the NMEA driver process both NMEA sentences and PPS pulses from a GNSS receiver on /dev/gpspps0, a symlink to the actual /dev/cuau0 serial port.

With openSUSE 13.2 and the stock “desktop” kernel, all PPS and PTP timekeeping options are compiled as modules, so I manually loaded the pps_ldisc module (for serial ports), as well as the pps_core module and other drivers. This, however, didn’t have the desired effect, as ntpd still complained:

refclock_ppsapi: time_pps_create: Operation not supported

Digging into this, I found references to PPSAPI testing tools that should be in /usr/src/linux/Documentation/pps, but there were none, apart from the very same pps.txt file I had already found online. After checking the official source tree for all versions back to 2.6.32, it became apparent that these tools were never actually included upstream, although their accompanying pps.txt was. Nevertheless, it is easy to find two versions of pps-tools in the wild, but both of them still said:

# ppstest /dev/ttyS0
trying PPS source "/dev/ttyS0"
cannot create a PPS source from device "/dev/ttyS0" (Operation not supported)

Then I came across instructions on the LinuxPPS site that mention the “ldattach 18 /dev/port” command (with /dev/port being the serial device, e.g. /dev/ttyS0). It is needed to actually enable the PPS line discipline on a serial port, and it creates a /dev/ppsN device along with the /sys/class/pps/ppsN pseudo-file tree. After that, I was finally able to launch ntpd, although only in combo mode: with both 127.127.20.0 “flag1 0” and 127.127.22.0 “flag3 1”. This means that NMEA timecodes and PPS pulses are handled by two separate drivers, which must rely on fragile heuristics to work as one.

The NMEA driver then needs a “time2 0.124” fudge to bring its timecode near the start of the second (this is adjusted automatically when NMEA and PPS are configured as a single source); otherwise this source is considered a falseticker that lags ~124 milliseconds behind all other known sources. And without an NMEA (or other preferred) source, the PPS driver will never be selected, or will be deselected once the NMEA source is unselected by the clustering algorithm. To make things worse, both ntptime and ntpq -c kerninfo report the kernel status as non-PPS:

ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 2.318 us, frequency 31.128 ppm, interval 1 s,
  maximum error 6000 us, estimated error 1 us,
  status 0x2001 (PLL,NANO),
  time constant 4, precision 0.001 us, tolerance 500 ppm,

That is, ntptime output stops right before the usual PPS lines. Then I tried to relink /dev/gpspps0 to /dev/pps0 and configure ntpd as usual, but a Linux PPS device apparently does not expose the underlying serial port, so that attempt failed. Retrying ppstest failed as well, unlike ppswatch (from the alternative pps-tools package), which started printing timestamps (those are simply read from the assert and clear pseudo-files in /sys/class/pps/ppsN).

To me, that makes PPS setup on Linux completely uncontrollable and undebuggable. Even if it somehow works (until the NMEA source is sooner or later deselected, bringing PPS down with it), it can barely be considered a production-grade solution, even at home. So I wonder whether I am missing something obvious, and Linux does provide some trouble-free way to use NMEA+PPS as a single source, as I saw in FreeBSD. Does anyone here have experience with this?

arvidjaar · March 3, 2015, 8:38pm

Did you see http://linuxpps.org/wiki/index.php/LinuxPPS_NTPD_support?

SamsonovAnton · March 4, 2015, 12:50pm

That is exactly where I finally found definitive instructions for getting PPS working on Linux, as mentioned above. My question, however, was whether it is possible to get it working as in FreeBSD, where NMEA and PPS are configured as a single source and handled as one, without heuristics and with full debugging support.

# uname -sr
FreeBSD 10.1-RELEASE-p5

# cat /etc/ntp.conf | grep 127.127
server 127.127.20.0 mode 95 iburst prefer
 fudge 127.127.20.0 refid GNSS flag1 1 flag2 1 flag3 1

# /usr/local/sbin/ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*GPS_NMEA(0)     .GNSS.           0 l    2   64  377    0.000    0.003   0.002
...

# /usr/local/sbin/ntptime
ntp_gettime() returns code 0 (OK)
  time d8a032e7.e4322294  Tue, Mar  3 2015 13:17:27.891, (.891390586),
  maximum error 2000 us, estimated error 5 us, TAI offset 35
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 4.839 us, frequency 31.501 ppm, interval 256 s,
  maximum error 2000 us, estimated error 5 us,
  status 0x2107 (PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO),
  time constant 4, precision 0.001 us, tolerance 496 ppm,
  pps frequency 31.501 ppm, stability 0.020 ppm, jitter 2.230 us,
  intervals 364, jitter exceeded 4, stability exceeded 0, errors 2.
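
The status word printed by ntptime is a bit mask, so the difference between the two systems can be read off mechanically: the FreeBSD value above, 0x2107, carries the three PPS bits (PPSFREQ, PPSTIME, PPSSIGNAL), while the Linux value from the first post, 0x2001, has only PLL and NANO, i.e. the kernel's own PPS discipline is not receiving pulses at all. A small decoder, assuming the STA_* bit values from <sys/timex.h> that ntptime's labels follow; decode_status and kernel_pps_active are hypothetical helper names:

```python
# STA_* bits of the kernel clock status word (<sys/timex.h>),
# listed in the order ntptime prints them.
STA_BITS = (
    (0x0001, "PLL"),
    (0x0002, "PPSFREQ"),
    (0x0004, "PPSTIME"),
    (0x0008, "FLL"),
    (0x0010, "INS"),
    (0x0020, "DEL"),
    (0x0040, "UNSYNC"),
    (0x0080, "FREQHOLD"),
    (0x0100, "PPSSIGNAL"),
    (0x0200, "PPSJITTER"),
    (0x0400, "PPSWANDER"),
    (0x0800, "PPSERROR"),
    (0x1000, "CLOCKERR"),
    (0x2000, "NANO"),
    (0x4000, "MODE"),
    (0x8000, "CLK"),
)

# PPSFREQ | PPSTIME | PPSSIGNAL: pulses present and used for time and frequency.
STA_PPS_ACTIVE = 0x0002 | 0x0004 | 0x0100


def decode_status(status):
    """Return the names of the STA_* bits set in status, like ntptime does."""
    return [name for bit, name in STA_BITS if status & bit]


def kernel_pps_active(status):
    """True only if the kernel sees a PPS signal and disciplines to it."""
    return (status & STA_PPS_ACTIVE) == STA_PPS_ACTIVE
```

For example, decode_status(0x2107) gives PLL, PPSFREQ, PPSTIME, PPSSIGNAL and NANO, matching the FreeBSD ntptime line, while decode_status(0x2001) gives only PLL and NANO.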

# uname -sr
Linux 3.16.6-2-desktop

# cat /etc/ntp.conf | grep 127.127
server 127.127.20.0 mode 95 minpoll 4 maxpoll 4 iburst prefer
 fudge 127.127.20.0 refid GNSS time2 0.265
server 127.127.22.1 minpoll 4 maxpoll 4 iburst
 fudge 127.127.22.1 flag2 1 flag3 1

# /usr/local/sbin/ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*GPS_NMEA(0)     .GNSS.           0 l    4   16  377    0.000    0.977   0.140
oPPS(1)          .PPS.            0 l    2   16  377    0.000   -0.002   0.002
...
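
In this combo configuration the PPS peer (marked “o”) disciplines the clock, and the NMEA line still shows a residual offset of 0.977 ms even with time2 0.265. If, as I understand ntpd's generic refclock code, time2 is simply added to every NMEA sample offset, that residual could be folded back into time2. A sketch of that arithmetic; refine_time2 is a hypothetical helper, and it only makes sense with offsets collected while PPS is the active peer:

```python
from statistics import median


def refine_time2(time2_s, nmea_offsets_ms, pps_offsets_ms):
    """Suggest a new NMEA time2 fudge (seconds) from paired ntpq -p offsets (ms).

    Assumption: ntpd adds time2 to each NMEA sample offset, so a positive
    NMEA-minus-PPS residual means time2 is that much too large.
    """
    if not nmea_offsets_ms or len(nmea_offsets_ms) != len(pps_offsets_ms):
        raise ValueError("need non-empty, equal-length offset lists")
    # The median of the paired differences resists occasional outliers.
    residual_ms = median(n - p for n, p in zip(nmea_offsets_ms, pps_offsets_ms))
    return time2_s - residual_ms / 1000.0
```

With only the single pair shown above, refine_time2(0.265, [0.977], [-0.002]) suggests about 0.264 s. Given the NMEA jitter of roughly 0.14 ms in the same output, refining time2 beyond a fraction of a millisecond is probably not meaningful.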

# /usr/local/sbin/ntptime
ntp_gettime() returns code 0 (OK)
  time d8a1612d.33850df0  Wed, Mar  4 2015 13:47:09.201, (.201249485),
  maximum error 7500 us, estimated error 1 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset -1.682 us, frequency 31.139 ppm, interval 1 s,
  maximum error 7500 us, estimated error 1 us,
  status 0x2001 (PLL,NANO),
  time constant 4, precision 0.001 us, tolerance 500 ppm,

# /usr/local/sbin/ntpq -c kerninfo
associd=0 status=011b leap_none, sync_pps, 1 event, leap_event,
pll offset:            0.001586
pll frequency:         31.1226
maximum error:         0.0045
estimated error:       1e-06
kernel status:         pll nano
pll time constant:     4
precision:             1e-06
frequency tolerance:   500
pps frequency:         0
pps stability:         0
pps jitter:            0
calibration interval   0
calibration cycles:    0
jitter exceeded:       0
stability exceeded:    0
calibration errors:    0

After some time:

# /usr/local/sbin/ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*GPS_NMEA(0)     .GNSS.           0 l   11   16  377    0.000    0.004   0.163
 PPS(1)          .PPS.            0 l 176m   16    0    0.000   -0.001   0.000

# /usr/local/sbin/ntpq -c kerninfo
associd=0 status=041b leap_none, sync_uhf_radio, 1 event, leap_event,
pll offset:            -0.075859
pll frequency:         30.1064
maximum error:         0.0065
estimated error:       0.000186
kernel status:         pll nano
pll time constant:     4
precision:             1e-06
frequency tolerance:   500
pps frequency:         0
pps stability:         0
pps jitter:            0
calibration interval   0
calibration cycles:    0
jitter exceeded:       0
stability exceeded:    0
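
The most telling change between the two kerninfo snapshots is the system status word, 011b versus 041b. A small decoder, assuming the field layout from the NTP status-word documentation (2-bit leap indicator, 6-bit clock source, 4-bit event counter, 4-bit event code); decode_system_status is a hypothetical helper name:

```python
# Field names used by ntpq when it prints the system status word.
LEAP = ("leap_none", "leap_add_sec", "leap_del_sec", "leap_alarm")
SOURCE = ("sync_unspec", "sync_pps", "sync_lf_radio", "sync_hf_radio",
          "sync_uhf_radio", "sync_local", "sync_ntp", "sync_other",
          "sync_wristwatch", "sync_telephone")
EVENT = ("unspecified", "freq_not_set", "freq_set", "spike_detect",
         "freq_mode", "clock_sync", "restart", "panic_stop",
         "no_system_peer", "leap_armed", "leap_disarmed", "leap_event",
         "clock_step", "kern")


def decode_system_status(word):
    """Split a 16-bit system status word into (leap, source, count, event)."""
    leap = (word >> 14) & 0x3     # bits 15-14: leap indicator
    source = (word >> 8) & 0x3F   # bits 13-8: clock source
    count = (word >> 4) & 0xF     # bits 7-4: event counter
    code = word & 0xF             # bits 3-0: most recent event code
    src = SOURCE[source] if source < len(SOURCE) else f"source_{source}"
    event = EVENT[code] if code < len(EVENT) else f"event_{code}"
    return LEAP[leap], src, count, event
```

Both words decode to leap_none, 1 event, leap_event; only the clock source changes, from sync_pps to sync_uhf_radio, which matches the GPS_NMEA peer taking over once PPS becomes unreachable in the ntpq -p output.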

So how do I debug this situation, if the only information I get is that the kernel is no longer synchronized to PPS (which is already obvious from ntpq -p), given that the /sys/class/pps/pps1/assert and …/clear pseudo-files still contain ever-increasing timestamps? It is totally unclear what happened and which component to blame: NTPd itself, its ATOM driver, the pps_ldisc module, the pps_core module, the kernel timekeeping subsystem, or even the GNSS hardware.
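
Since the sysfs files are the one place where the kernel side is still visible, they can at least be sampled directly. Each file holds the last event as seconds.nanoseconds#sequence, as written by pps_core. The sketch below, with hypothetical helper names and the pps1/assert names taken from this thread, collects a few new events and expresses each nanosecond field as an offset from the nearest whole second. If the sequence keeps advancing and the offsets stay small, pps_ldisc and pps_core are delivering pulses, and the problem more likely lies above them. The offsets only reflect how well the system clock is aligned at that moment (plus any receiver or cable delay), so they will grow once ntpd falls back to NMEA.

```python
import os
import time


def parse_pps_sysfs(text):
    """Split '<sec>.<nsec>#<sequence>' from a pps_core sysfs file into ints."""
    stamp, _, seq = text.strip().partition("#")
    sec, _, nsec = stamp.partition(".")
    return int(sec), int(nsec), int(seq)


def signed_offset_ns(nsec):
    """Express the nanosecond field as an offset from the nearest whole second."""
    return nsec - 1_000_000_000 if nsec >= 500_000_000 else nsec


def sample_edges(dev="pps1", edge="assert", count=5, poll_s=0.2, timeout_s=30.0):
    """Collect up to `count` new events as (sequence, offset_ns) pairs.

    Stops early after timeout_s, so a missing signal shows up as a short list.
    """
    path = os.path.join("/sys/class/pps", dev, edge)
    deadline = time.monotonic() + timeout_s
    results, last_seq = [], None
    while len(results) < count and time.monotonic() < deadline:
        with open(path) as f:
            _, nsec, seq = parse_pps_sysfs(f.read())
        # Skip the first reading, which may be an old event.
        if last_seq is not None and seq != last_seq:
            results.append((seq, signed_offset_ns(nsec)))
        last_seq = seq
        time.sleep(poll_s)
    return results
```

Running sample_edges("pps1") while ntpd shows the PPS peer as unreachable would show whether the pulses themselves are still arriving on time.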

PS. My previous statement about ppstest still not working was somewhat misleading: it did not work with /dev/ttyS0 after enabling the PPS line discipline on that port, but it did work with the /dev/pps1 virtual device created by the PPS engine, and it keeps working with it even after NTPd somehow lost communication with that device.

tsu2 · March 4, 2015, 4:37pm

It’s probably been over a decade since I last tried to configure an external GPS, since built-in GPS is so ubiquitous today (unless, probably, you’re talking about an embedded board).

I remember similar difficulties in a general sense. Solutions were very specific to the capabilities and features of the GPS device: once a device was selected, you had to make sure the proper drivers were installed (and inspect the features the driver supported), then make sure the software you were running made the proper (compatible) calls to the driver.

What you’re describing sounds very similar, except that driver installation is probably replaced by kernel modules. That would be good, because it would hopefully standardize functionality (one of the many parts of the solution). But whether that standardization is implemented fully and properly is anyone’s guess.

So I recommend you submit a bug report. The standard openSUSE Bugzilla (http://bugzilla.opensuse.org) is fine, since you can note that the issues are likely in the upstream kernel and not specific to openSUSE. When you submit it, you’ll need to include many things you haven’t included in this thread, in particular your GPS hardware (likely including the chipset) and the specific commands you used to set up your partially working solution. Include the exact errors you can capture in logs or on stdout.

A general suggestion:
It looks like your posts include snapshot-type readings, i.e. status at a particular moment in time.
Although I can’t suggest a specific app, you might try running a real-time app that continuously reads your serial data, along with a similar app watching your Linux system (free, if you want to monitor memory availability?).
The objective is to identify whether the dropped serial connection is sudden or gradual, and whether it might correlate with a lack of specific resources. Keep in mind, particularly on the Linux side (and likely unlike BSD), that with systemd, multiple instances of the same app run in the same parent process. So something like htop, which can display a process tree, might be useful to see whether something is unexpectedly running and exhausting resources. But these are just suggestions for surrounding your problem and are largely grasping at straws, not something
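
For the continuous-monitoring idea, the PPS side can be logged without extra tools by polling the same sysfs file mentioned earlier in the thread. A minimal sketch, assuming one pulse per second and the /sys/class/pps/pps1/assert file (read_assert, find_gaps and watch are hypothetical names): it logs a stall when the sequence number stops changing, and a gap when fewer pulses were counted than whole seconds elapsed between two recorded pulses. Running it alongside ntpd should show whether pulses actually stop (sudden) or keep arriving while ntpd drops the PPS peer.

```python
import time


def read_assert(dev):
    """Return (event_time_s, sequence) from /sys/class/pps/<dev>/assert."""
    with open(f"/sys/class/pps/{dev}/assert") as f:
        stamp, _, seq = f.read().strip().partition("#")
    return float(stamp), int(seq)


def find_gaps(events):
    """Report pulses missing between consecutive (event_time_s, sequence) samples.

    Assumes one pulse per second: the whole seconds elapsed between two
    recorded pulses should equal their sequence-number difference.
    A clock step between samples would show up as a spurious gap.
    """
    gaps = []
    for (t0, s0), (t1, s1) in zip(events, events[1:]):
        missed = round(t1 - t0) - (s1 - s0)
        if missed > 0:
            gaps.append((t0, t1, missed))
    return gaps


def watch(dev="pps1", period_s=10.0, log=print):
    """Poll forever; log stalls (no new pulse) and gaps (lost pulses)."""
    prev = read_assert(dev)
    while True:
        time.sleep(period_s)
        cur = read_assert(dev)
        now = time.strftime("%Y-%m-%d %H:%M:%S")
        if cur[1] == prev[1]:
            log(f"{now} {dev}: no new pulse for {period_s:.0f} s")
        for _, _, missed in find_gaps([prev, cur]):
            log(f"{now} {dev}: {missed} pulse(s) lost")
        prev = cur
```

Comparing the pulse timestamps rather than the polling times keeps the gap count independent of when the file happens to be read.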

Also, you may be able to find some good kernel debugging tools. Too bad the presenters of the excellent talk I attended at SCALE 13x haven’t posted their slide deck, which, although minimal, included a good list of tools. If they ever post the slides, they will be at this link:
http://www.socallinuxexpo.org/scale/13x/presentations/kernel-debug-tools-and-techniques

TSU

SamsonovAnton · March 5, 2015, 6:16pm

Things are indeed much simpler today. The pps_ldisc kernel driver (the serial line discipline) is completely GPS-agnostic and will work with any synchronisation pulse fed to the DCD pin of a COM port. Moreover, the pps_core driver above pps_ldisc is entirely hardware-independent and can accommodate other PPS sources, such as pps_parport for the ACK pin of a parallel port, again independently of any other information that may travel over the data pins of that port.
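
Switching a port to pps_ldisc is what the ldattach step from the LinuxPPS instructions does: ldattach opens the port, sets line discipline number 18 (N_PPS) with the TIOCSETD ioctl, and then stays in the background so the port remains open, since the discipline is dropped when the last descriptor is closed. A minimal Python sketch of the same steps, assuming Linux, root privileges and an already loaded pps_ldisc module; attach_pps_ldisc and current_ldisc are hypothetical helper names, and /dev/ttyS0 stands in for the actual port:

```python
import fcntl
import os
import struct
import termios

# Line discipline number of the PPS discipline (N_PPS in <linux/tty.h>);
# this is the "18" in "ldattach 18 ...".
N_PPS = 18
# Generic Linux ioctl numbers (asm-generic/ioctls.h), used only if the
# termios module does not export them.
TIOCSETD = getattr(termios, "TIOCSETD", 0x5423)
TIOCGETD = getattr(termios, "TIOCGETD", 0x5424)


def attach_pps_ldisc(tty_path):
    """Open tty_path and switch it to the PPS line discipline.

    The discipline stays attached only while the returned descriptor
    (or another one on the same tty) is open.
    """
    fd = os.open(tty_path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        fcntl.ioctl(fd, TIOCSETD, struct.pack("i", N_PPS))
    except OSError:
        os.close(fd)
        raise
    return fd


def current_ldisc(fd):
    """Return the line discipline number currently set on fd (0 is N_TTY)."""
    buf = fcntl.ioctl(fd, TIOCGETD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]
```

Used as root, fd = attach_pps_ldisc("/dev/ttyS0") followed by signal.pause() keeps the discipline attached for as long as the script runs, much like the backgrounded ldattach process.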

The only thing that really puzzles me (aside from the ldattach step, which apparently was not needed with earlier versions of LinuxPPS, according to hobbyist how-tos I read earlier) is the behaviour of current versions of NTPd (which need no patching, according to the LinuxPPS site). When instructed to process PPS input from a regular NMEA source, they still try to create a new PPS source in the system and fail without any fallback. Yet they communicate successfully with the system-created PPS device as if it were a regular COM port, but only if that PPS (ATOM) source is configured separately from the NMEA source, because the latter needs to listen to the actual COM port. So, if there is a “bug” on the configuration side, I suppose NTPd is to blame.

Nevertheless, such a setup is apparently viable, and the main question that arises is how to monitor and debug a system configured this way. By “debugging” I do not mean software debugging tools, but rather NTP-related monitoring facilities that help to configure NTPd properly and then keep an eye on it, providing valuable information in case of failure; NTP and NTPd would never have become the de facto standard were it not for their in-depth monitoring tools and APIs, including those in the system kernel. However, with LinuxPPS, no internal PPS-related information can be seen with ntptime and ntpq -c kerninfo. This makes me wonder whether that is by design or a fault, and, if by design, what other tools are available for monitoring (short of a debugger)?

> So, I recommend you submit a bug report.
First, I would like to make sure I am not missing any up-to-date information on the matter. Then I need to determine which side is actually to blame: the Linux kernel developers, the developers of the NTP daemon and its ATOM driver, or nobody, if no functionality beyond what I have already seen was ever meant to be implemented. So I prefer to gather more data on the topic before going any further.