Planet Linux Plumbers Conf

June 03, 2009

Darrick Wong

Picspam!

/me rounded up a bunch of (old) panoramas and put them into the high-definition panorama viewer. Be sure to check out the (huge spike in memory cache when you load the) panorama previewer (click the "See All" button).

June 03, 2009 02:27 AM

August 27, 2008

Stephen Hemminger

Exploring transactional filesystems

In order to implement router-style semantics, Vyatta allows setting many different configuration variables and then applying them all at once with a commit command. Currently, this is implemented by a combination of shell magic and unionfs. The problem is that keeping unionfs up to date and fixing the resulting crashes is a major pain.

There must be better alternatives; current options include:
  • Replace unionfs with aufs, which has fewer users yelling at it and more developers.
  • Use a filesystem like btrfs, which has snapshots. This changes the model and makes APIs like "what changed?" hard to implement.
  • Move to a pure userspace model using git. The problem here is that git as currently written is meant for users, not transactions.
  • Use a combination of copy, bind mount, and rsync.
  • Use a database for configuration. This is easier for general queries but is the most work. Conversion from existing format would be a pain.
Looks like a fun/hard problem. Don't expect any resolution soon.
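
To make the requirements concrete, here is a minimal in-memory sketch of the set-many-then-commit model, including the "what changed?" query. It is not how Vyatta implements it: the `ConfigStore` class and its method names are made up for illustration, and a real system would also have to apply the changes to the running system and cope with failures partway through.

```python
class ConfigStore:
    """Toy model of set-many-then-commit configuration (hypothetical, not Vyatta code)."""

    def __init__(self, active=None):
        self.active = dict(active or {})    # configuration currently in effect
        self.pending = dict(self.active)    # working copy that set/delete modify

    def set(self, key, value):
        self.pending[key] = value

    def delete(self, key):
        self.pending.pop(key, None)

    def changes(self):
        """Answer "what changed?" as {key: (old, new)}; None stands for "absent"."""
        keys = set(self.active) | set(self.pending)
        return {
            k: (self.active.get(k), self.pending.get(k))
            for k in sorted(keys)
            if self.active.get(k) != self.pending.get(k)
        }

    def commit(self):
        """Make every pending change active at once and return what was applied."""
        applied = self.changes()
        self.active = dict(self.pending)
        return applied

    def discard(self):
        """Throw away all pending changes."""
        self.pending = dict(self.active)
```

Nothing becomes active until `commit()` is called, and `changes()` can be queried at any point before that, which is the part that a snapshot-only filesystem approach makes awkward.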

by Linux Network Plumber (noreply@blogger.com) at August 27, 2008 10:20 PM

June 14, 2017

Paul E. McKenney

Stupid RCU Tricks: Simplifying Linux-kernel RCU

The last month or two has seen a lot of work simplifying the Linux-kernel RCU implementation, with more than 2700 net lines of code removed. The remainder of this post lists the user-visible changes, along with alternative ways to get the corresponding job done.

  1. The infamous CONFIG_RCU_KTHREAD_PRIO Kconfig parameter is now defunct, but the rcutree.kthread_prio kernel boot parameter gets the job done.
  2. The CONFIG_NO_HZ_FULL_SYSIDLE Kconfig parameter has kicked the bucket. There is no replacement because no one was using it. If you need it, revert the -rcu commit tagged by sysidle.2017.05.11a.
  3. The CONFIG_PROVE_RCU_REPEATEDLY Kconfig parameter is no more. There is no replacement because as far as I know, no one has used it for many years. It was a great help in tracking down lockdep-RCU warnings back in the day, but these warnings are now sufficiently rare that finding them one boot at a time is no longer a problem. If you need it, do the obvious hacking on Kconfig and lockdep.c.
  4. The CONFIG_SPARSE_RCU_POINTER Kconfig parameter now rests in peace. There is no replacement because there doesn't seem to be any reason for RCU's sparse checking to be the only such checking that is optional. If you really need to disable RCU's sparse checking, hand-edit the definition as needed.
  5. The CONFIG_CLASSIC_SRCU Kconfig parameter bought the farm. This was only present to handle massive failures of the new Tree/Tiny SRCU implementations, but these appear to be quite reliable and should be used instead of Classic SRCU.
  6. RCU's debugfs tracing is done for. As far as I know, I was the only real user, and I haven't used it in years. If you need it, revert the -rcu commit tagged by debugfs.2017.05.15a.
  7. The CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO, and CONFIG_RCU_NOCB_CPU_ALL Kconfig parameters have departed. Use the rcu_nocbs kernel boot parameter instead, which can do quite a bit more than those Kconfig parameters ever could.
  8. Tiny RCU's event tracing and RCU CPU stall warnings are now pushing up daisies. The point of Tiny RCU is to be tiny and educational, and these added features were not helping reach either of these two goals. The replacement is to reproduce the problem with Tree RCU.
  9. These changes should matter only to people running rcutorture:

    1. The CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT and CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT_DELAY Kconfig parameters have been entombed: Use the rcutree.gp_preinit_delay kernel boot parameter instead.
    2. The CONFIG_RCU_TORTURE_TEST_SLOW_INIT and CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY Kconfig parameters have given up the ghost: Use the rcutree.gp_init_delay kernel boot parameter instead.
    3. The CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP and CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig parameters have passed on: Use the rcutree.gp_cleanup_delay kernel boot parameter instead.
There will probably be a few more simplifications in the near future, but this should be at least enough for one merge window!
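
For anyone carrying old .config files around, the list above can be turned into a small lookup. This is only a sketch: the table repeats the replacements named in this post, and the `audit` helper is a hypothetical convenience, not part of any kernel tooling.

```python
# Boot-parameter replacements exactly as listed in the post.
# None means the post names no boot parameter for that option.
REPLACEMENTS = {
    "CONFIG_RCU_KTHREAD_PRIO": "rcutree.kthread_prio",
    "CONFIG_NO_HZ_FULL_SYSIDLE": None,
    "CONFIG_PROVE_RCU_REPEATEDLY": None,
    "CONFIG_SPARSE_RCU_POINTER": None,
    "CONFIG_CLASSIC_SRCU": None,
    "CONFIG_RCU_NOCB_CPU_NONE": "rcu_nocbs",
    "CONFIG_RCU_NOCB_CPU_ZERO": "rcu_nocbs",
    "CONFIG_RCU_NOCB_CPU_ALL": "rcu_nocbs",
    "CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT": "rcutree.gp_preinit_delay",
    "CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT_DELAY": "rcutree.gp_preinit_delay",
    "CONFIG_RCU_TORTURE_TEST_SLOW_INIT": "rcutree.gp_init_delay",
    "CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY": "rcutree.gp_init_delay",
    "CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP": "rcutree.gp_cleanup_delay",
    "CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY": "rcutree.gp_cleanup_delay",
}


def audit(config_text):
    """Return {option: replacement} for removed options that a .config sets."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name = line.split("=", 1)[0]
        if name in REPLACEMENTS:
            found[name] = REPLACEMENTS[name]
    return found
```

Feeding it the text of a .config returns only the removed options that are actually set, each paired with the boot parameter to look at instead.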

June 14, 2017 09:03 PM

June 09, 2017

Paul E. McKenney

Stupid RCU Tricks: rcutorture Accidentally Catches an RCU Bug

With the Linux-kernel v4.13 merge window coming up, it is time to do at least a little heavy-duty testing of the patches destined for v4.14, which had been but lightly tested on my laptop. An overnight run on a larger test machine looked very good—with the exception of scenario TREE01 (defined by tools/testing/selftests/rcutorture/configs/rcu/TREE01{.boot,} in the Linux-kernel source tree), which got no fewer than 190 failures in a half-hour run. In other words, rcutorture saw 190 too-short grace periods in 30 minutes, for about one every 20 seconds.

This is not just bad. This is RCU completely and utterly failing to be RCU.

My first action was to re-run the tests on the commits slated for v4.13. You can imagine my relief to see them pass on all scenarios, including TREE01.

Then it was time for bisection. I have been burned many times by false bisections due to RCU's probabilistic failure modes, so I ran 24 30-minute tests on each commit. Fortunately, I could run six in parallel, so that each commit only consumed about two hours of test time. The bisection converged on a commit that adds a --kconfig argument to the rcutorture scripts, which allows me to do things like force lockdep to run in all scenarios. However, this commit should have absolutely no effect on the inner workings of RCU.

OK, perhaps this commit managed to fatally mess up the .config file. But no, the .config files from this commit compare equal to those from the preceding commit. Some additional poking gives me confidence that the kernels being built are also identical. Still, the one fails and the other does not.

The next step is to look very carefully at the console output from the failing runs, most of which contain many complaints about RCU grace periods being too short. Except that one of them also contains RCU CPU stall warnings. In fact, one of the stall warnings lists no fewer than 26 CPUs as stalling the current RCU grace period.

This came as a bit of a surprise, partly because I don't recall ever seeing that many CPUs stalling a single grace period, but mostly because the test was only supposed to use eight CPUs.

A look at the beginning of the console output showed that RCU was inexplicably prepared to deal with 43 CPUs instead of the expected eight. A bit more digging showed that the qemu command used to run the failing test had “-smp 43”, while the qemu command for the successful test instead had “-smp 8”. In both cases, the qemu command also included the kernel boot parameter “maxcpus=8”. And a very stupid bug in the --kconfig change to the scripts turned out to be responsible for the bogus -smp argument.

The next step is to swap the values of qemu's -smp argument. And the failure follows the “-smp 43” setting. This means that it is possible that the RCU failures are due to a latent timing bug in RCU. After all, the test system has only 64 CPUs, and I was running 43*6=258 CPUs worth of tests on it. But running six concurrent rcutorture tests with both -smp and maxcpus set to 43 passes with flying colors. So RCU must be suffering from some other problem.

The next question is exactly what is supposed to happen when qemu and the kernel have very different ideas of how many CPUs there are. The ever-helpful Documentation/admin-guide/kernel-parameters.txt file states that maxcpus= limits not the overall number of CPUs, but rather the number that are brought up at boot time. Another look at the console output confirms that in the failing case, eight CPUs are brought up at boot time. However, the other 35 come online some time after boot, sometimes taking a few minutes to come up. Which explains another anomaly I noticed while bisecting, namely that about half the tests ran 30 minutes without failure, but the ones that failed did so within the first five minutes of the run. Apparently the RCU failures are connected somehow to the late arrival of the extra 35 CPUs.

Except that RCU configured itself for the full 43 CPUs, and RCU is supposed to be able to handle CPUs coming and going. In fact, RCU has repeatedly demonstrated its ability to handle CPUs coming and going for more than a decade. So it is time to enable event tracing on a failure scenario (thank you, Steve!). One of the traces shows that there is no RCU callback connected with the first failure, which points the finger of suspicion at RCU expedited grace periods.

A quick inspection of the expedited code shows missing synchronization for the case where a CPU makes its very first appearance just as an expedited grace period starts. Oh, the leaf rcu_node structure's ->lock is held both when updating the number of CPUs that have ever been seen (which is the rcu_state structure's ->ncpus field) and when updating the bitmasks indicating exactly which CPUs have ever been seen (which is the leaf rcu_node structure's ->expmaskinitnext field), but it drops that lock between those two updates.

This means that the expedited grace period might sample the ->ncpus field, notice the change, and therefore check all the ->expmaskinitnext fields—but before those fields had been updated. Not a problem for this grace period, since the new CPUs haven't yet started and thus cannot yet be running any RCU read-side critical sections, which means that there is no reason whatsoever for this grace period to pay any attention to them. However, the next expedited grace period would again sample the ->ncpus field, see no change, and thus not bother checking the ->expmaskinitnext fields. Thus, this grace period would also ignore the new CPUs, which by this time could be very much alive and running RCU read-side critical sections. Hence the too-short grace periods, and hence them showing up within the first few minutes of the run, during the time that the extra 35 CPUs are in the process of coming online.

The fix is easy: Just move the update of ->ncpus to the same critical section as the update of ->expmaskinitnext. With this fix, rcutorture passes the TREE01 scenario even with bogus -smp arguments to qemu. There is therefore once again a bug in rcutorture: There are still bugs in RCU somewhere, and rcutorture is failing to find them!
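
The interleaving described above can be replayed in a toy, single-threaded model. This is a sketch of the reasoning, not kernel code: `ncpus` and `expmaskinitnext` mirror the fields named in the text, while `ncpus_snap`, `expmaskinit`, and the helper functions are illustrative names chosen for this example.

```python
class ExpeditedModel:
    """Toy model of the bookkeeping an expedited grace period relies on."""

    def __init__(self, boot_cpus):
        self.ncpus = boot_cpus                        # number of CPUs ever seen
        self.expmaskinitnext = (1 << boot_cpus) - 1   # bitmask of CPUs ever seen
        self.ncpus_snap = self.ncpus                  # count at the last mask refresh
        self.expmaskinit = self.expmaskinitnext       # CPUs a grace period waits on

    def start_grace_period(self):
        """Refresh the mask only if the CPU count changed; return the mask used."""
        if self.ncpus != self.ncpus_snap:
            self.ncpus_snap = self.ncpus
            self.expmaskinit = self.expmaskinitnext
        return self.expmaskinit


def bring_up_cpu(model, cpu, fixed):
    """Yield at each point where a grace period is able to start during bring-up."""
    if fixed:
        # Fixed ordering: both updates happen in one critical section.
        model.ncpus += 1
        model.expmaskinitnext |= 1 << cpu
        yield
    else:
        # Buggy ordering: the lock is dropped between the two updates.
        model.ncpus += 1
        yield
        model.expmaskinitnext |= 1 << cpu
        yield


def new_cpu_covered(fixed):
    """Bring up CPU 8 and report whether a later grace period waits on it."""
    model = ExpeditedModel(boot_cpus=8)
    new_cpu = 8
    for _ in bring_up_cpu(model, new_cpu, fixed):
        model.start_grace_period()   # a grace period starts at every opportunity
    # By now the new CPU could be running RCU read-side critical sections.
    mask = model.start_grace_period()
    return bool(mask & (1 << new_cpu))
```

In this model `new_cpu_covered(fixed=False)` returns `False`, because the first grace period consumes the changed count before the bitmask is updated and later grace periods see no change. With both updates in one step, `new_cpu_covered(fixed=True)` returns `True`.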

Strangely enough, I might never have noticed the bug in expedited grace periods had I not made a stupid mistake in the scripting. Sometimes it takes a bug to locate a bug!

June 09, 2017 08:49 PM

September 22, 2016

Sri Ramkrishna

Making money from copylefted code

I wanted to put this out there while I still have it fresh in my mind. I'm here at the copyleft BoF with Bradley Kuhn at LAS GNOME. One of the biggest takeaways from this is something that Bryan Lunduke said: people are able to make money from copyleft if we don’t actually brand it as free and open source software. So it seems that if we don’t advertise something as free or open source, or that there is software available, then there is a decent chance that you can make money.

Which goes back to the interesting conversation we had the previous day on pretty much the same topic. Just fascinating stuff.

by sri at September 22, 2016 06:04 PM

September 20, 2016

Sri Ramkrishna

We’re going to partay, karamu, fiesta, forever

GNOME release 3.22 happens to be during one of the core days of the Libre Application Summit Hosted by GNOME (LAS GNOME). On top of a high rise in Portland, Oregon, we’re going to celebrate GNOME 3.22 in grand style with the conference participants and end the core days at LAS GNOME!

by sri at September 20, 2016 12:05 AM

September 06, 2016

Greg KH

4.9 == next LTS kernel

As I briefly mentioned a few weeks ago on my G+ page, the plan is for the 4.9 Linux kernel release to be the next “Long Term Supported” (LTS) kernel.

Last year, at the Linux Kernel Summit, we discussed just how to pick the LTS kernel. Many years ago, we tried to let everyone know ahead of time what the kernel version would be, but that caused a lot of problems as people threw crud in there that really wasn’t ready to be merged, just to make it easier for their “day job”. That was many years ago, and people insist they aren’t going to do this again, so let’s see what happens.

I reserve the right to not pick 4.9 and support it for two years, if it’s a major pain because people abused this notice. If so, I’ll possibly drop back to 4.8, or just wait for 4.10 to be released. I’ll let everyone know by updating the kernel.org releases page when it’s time (many months from now.)

If people have questions about this, email me and I will be glad to discuss it.

September 06, 2016 07:59 AM

January 15, 2014

Greg KH

kdbus details

Now that linux.conf.au is over, there has been a bunch of information running around about the status of kdbus and the integration of it with systemd. So, here’s a short summary of what’s going on at the moment.

Lennart Poettering gave a talk about kdbus at linux.conf.au. The talk can be viewed here, and the slides are here. Go read the slides and watch the talk; odds are, most of your questions will be answered there already.

For those who don’t want to take the time watching the talk, lwn.net wrote up a great summary of the talk, and that article is here. For those of you without a lwn.net subscription, what are you waiting for? You’ll have to wait two weeks before it comes out from behind the paid section of the website before reading it, sorry.

There will be a systemd hack-fest a few days before FOSDEM, where we should hopefully pound out the remaining rough edges on the codebase and get it ready to be merged. Lennart will also be giving his kdbus talk again at FOSDEM if anyone wants to see it in person.

The kdbus code can be found in two places, both on google code, and on github, depending on where you like to browse things. In a few weeks we’ll probably be creating some patches and submitting them for inclusion in the main kernel, but more testing with the latest systemd code needs to be done first.

If you want more information about the kdbus interface, and how it works, please see the kdbus.txt file for details.

Binder vs. kdbus

A lot of people have asked about replacing Android’s binder code with kdbus. I originally thought this could be done, but as time has gone by, I’ve come to the conclusion that this will not happen with the first version of kdbus, and possibly can never happen.

First off, go read that link describing binder that I pointed to above, especially all of the links to different resources from that page. That should give you more than you ever wanted to know about binder.

Short answer

Binder is bound to the CPU; D-Bus (and hence kdbus) is bound to RAM.

Long answer

Binder

Binder is an interface that Android uses to provide synchronous calling (CPU) from one task to a thread of another task. There is no queueing involved in these calls, other than the caller process is suspended until the answering process returns. RAM is not interesting besides the fact that it is used to share the data between the different callers. The fact that the caller process gives up its CPU slice to the answering process is key for how Android works with the binder library.

This is just like a syscall, and it behaves a lot like a mutex. The communicating processes are directly connected to each other. There is an upper limit of how many different processes can be using binder at once, and I think it’s around 16 for most systems.

D-Bus

D-Bus is asynchronous, it queues (RAM) messages, keeps the messages in order, and the receiver dequeues the messages. The CPU does not matter at all other than it is used to do the asynchronous work of passing the RAM around between the different processes.

This is a lot like network communication protocols. It is a very “disconnected” communication method between processes. The upper limit of message sizes and numbers is usually around 8Mb per connection and a normal message is around 200-800 bytes.

Binder

The model of Binder was created for a microkernel-like device (side note, go read this wonderful article about the history of Danger written by one of the engineers at that company for a glimpse into where the Android internals came from, binder included.) The model of binder is very limited, inflexible in its use-cases, but very powerful and extremely low-overhead and fast. Binder ensures that the same CPU timeslice will go from the calling process into the called process’s thread, and then come back into the caller when finished. There is almost no scheduling involved, and it is much like a syscall into the kernel that does work for the calling process. This interface is very well suited for cheap devices with almost no RAM and very low CPU resources.

So, for systems like Android, binder makes total sense, especially given the history of it and where it was designed to be used.

D-Bus

D-Bus is a create-store-forward, compose reply and then create-store-forward messaging model which is more complex than binder, but because of that, it is extremely flexible, versatile, network transparent, much easier to manage, and very easy to let fully untrusted peers take part in the communication model (hint, never let this happen with binder, or bad things will happen…) D-Bus can scale up to huge amounts of data, and with the implementation of kdbus it is possible to pass gigabytes of buffers to every connection on the bus if you really wanted to. CPU-wise, it is not as efficient as binder, but is a much better general-purpose solution for general-purpose machines and workloads.

CPU vs. RAM

Yes, it’s an oversimplification of a different set of complex IPC methods, but these 3 words should help you explain the differences between binder and D-Bus and why kdbus isn’t going to be able to easily replace binder anytime soon.

Never say never

Ok, before you start to object to the above statements, yes, we could add functionality to kdbus to have some blocking ioctl calls that implement something like: write question -> block for reply and read reply one answer for the request side, and then on the server side do: write answer -> block in read. That would get kdbus a tiny bit closer to the binder model, by queueing stuff in RAM instead of relying on a thread pool.
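
The shape of that "write question -> block for reply" exchange can be sketched in userspace with ordinary queues and threads. This is only an illustration of blocking request/reply layered on top of queued messages: it is not the kdbus interface, and `server`, `call`, and `demo` are names invented for this example.

```python
import queue
import threading


def server(requests, handler):
    """Dequeue questions in order and write each answer to the caller's reply queue."""
    while True:
        item = requests.get()              # block in read
        if item is None:                   # sentinel: shut down
            break
        question, reply_box = item
        reply_box.put(handler(question))   # write answer


def call(requests, question):
    """Write a question, then block until exactly one reply arrives."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((question, reply_box))    # write question
    return reply_box.get()                 # block for reply and read it


def demo():
    """Run one round trip against a server thread that upper-cases its input."""
    requests = queue.Queue()
    worker = threading.Thread(target=server, args=(requests, str.upper))
    worker.start()
    answer = call(requests, "ping")
    requests.put(None)
    worker.join()
    return answer
```

The caller looks synchronous from the outside, but every message still sits in a queue (RAM) until the scheduler gets around to the receiver, which is exactly the difference from binder's direct hand-off of the CPU timeslice.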

That might work, but would require a lot of work on the binder library side in Android, and as a very limited number of people have write access to that code (they all can be counted on one hand), and it’s a non-trivial amount of work for a core function of Android that is working very well today, I don’t know if it will ever happen.

But anything is possible, it’s just software you know…

Thanks

Many thanks to Kay Sievers who came up with the CPU vs. RAM description of binder and D-Bus and whose email I pretty much just copied into this post. Also thanks to Kay and Lennart for taking the time and energy to put up with my silly statements about how kdbus could replace binder, and totally proving me wrong, sorry for having you spend so much time on this, but I now know you are right.

Also thanks to Daniel Mack and Kay for doing so much work on the kdbus kernel code, that I don’t think any of my original implementation is even present anymore, which is probably a good thing. Also thanks to Tejun Heo for help with the memfd implementation and cgroups help in kdbus.

January 15, 2014 08:57 PM

September 03, 2009

Valerie Aurora

Carbon METRIC BUTTLOAD print

I just read Charlie Stross's rant on reducing his household's carbon footprint. Summary: He and his wife can live a life of monastic discomfort, wearing moldy scratchy 10-year-old bamboo fiber jumpsuits and shivering in their flat - or, they can cut out one transatlantic flight per year and achieve the equivalent carbon footprint reduction.

I did a similar analysis back around 2007 or so and had the same result: I've got a relatively trim carbon footprint compared to your average first-worlder, except for the air travel that turns it into a bloated planet-eating monster too extreme to fall under the delicate term "footprint." Like Charlie, I am too practical, too technophilic, and too hopeful to accept that the only hope of saving the planet is to regress to third world living standards (fucking eco-ascetics!). I decided that I would only make changes that made my life better, not worse - e.g., living in a walkable urban center (downtown Portland, now SF). But the air travel was a stumper. I liked traveling, and flying around the world for conferences is a vital component of saving the world through open source. Isn't it? Isn't it?

Two things happened that made me re-evaluate my air travel philosophy. One, I started a file systems consulting business and didn't have a lot of spare cash to spend on fripperies. Two, I hurt my back and sitting became massively uncomfortable (still recovering from that one). So I cut down on the flying around the world to Linux conferences involuntarily.

You know what I discovered? I LOVE not flying around the world for Linux conferences. I love taking only a few flights a year. I love flying mostly in the same time zone (yay, West coast). I love having the energy to travel for fun because I'm not all dragged out by the conference circuit. I love hanging out with my friends who live in the same city instead of missing out on all the parties because I'm in fucking Venezuela instead.

Save the planet. Burn your frequent flyer card.

September 03, 2009 07:04 AM

February 18, 2009

Stephen Hemminger

Parallelizing netfilter

The Linux networking receive performance has been mostly single threaded until the advent of MSI-X and multiqueue receive hardware. Now with many cards, it is possible to be processing packets on multiple CPUs and cores at once. All this is great, and improves performance for the simple case.

But most users don't just use simple networking. They use useful features like netfilter to do firewalling, NAT, connection tracking and all other forms of weird and wonderful things. The netfilter code has been tuned over the years, but there are still several hot locks in the receive path. Most of these are reader-writer locks which are actually the worst kind, much worse than a simple spin lock. The problem with locks on modern CPUs is that even for the uncontended case, a lock operation means a full-stop cache miss.

With the help of Eric Dumazet, Rick Jones, Martin Josefsson and others, it looks like there is a solution to most of these. I am excited to see how it all pans out but it could mean a big performance increase for any kind of netfilter packet intensive processing. Stay tuned.

by Linux Network Plumber (noreply@blogger.com) at February 18, 2009 05:51 AM

September 25, 2010

Andy Grover

Plumbers Down Under

Since the original Linux Plumbers Conference (http://www.linuxplumbersconf.org/) drew much inspiration from the continuing success of LCA (http://lca2011.linux.org.au/), it's cool to see some of what Plumbers has done be seen as worthy of emulating at next year's LCA (http://airlied.livejournal.com/73491.html)!

LCA seems like a great opportunity to specifically try to make progress on cross-project issues. It's quite well-attended so it's likely the people you need in the room to make a decision will be in the room.

by andy.grover at September 25, 2010 01:50 PM

September 10, 2010

Andy Grover

Increasing office presence for remote workers

I work from home. My basement, actually. I recently read an article in the Times about increasing the office presence of remote employees with robots (http://www.nytimes.com/2010/09/05/science/05robots.html?_r=1&pagewanted=1). Pretty interesting. How much does one of those robo-Beltzners cost? $5k? This is a neat idea but it's still not released so who knows.

I've been thinking about other options for establishing a stronger office presence for myself. Recently I bought a webcam. If I used this to broadcast me, sitting at my desk, on Ustream or Livestream, that would certainly make it so my coworkers (and the rest of the world) could see what I was up to, every second of the workday. This is actually a lot more exposure than an office worker, even in a cubicle, would expect. If I'm in an office cube, I might have people stop by, but I'll know they're there, and they won't always be there. There is still generally solitude and privacy to concentrate on the code and be productive. I'm currently trying something that I think is closer to the balance of a real office:
  • Take snapshots from webcam every 15 minutes
  • Only during normal working hours
  • Give 3 second audible warning before capturing
  • Upload to an intranet webserver
I haven't found this to be too much of an imposition -- in fact, the quarter-hourly beeps are somewhat like a clock chime.

In the beginning, it's hard to resist mugging for the camera, but that passes:

[Image: whassup??? (http://oss.oracle.com/%7Eagrover/pics/blog/whassup.jpg)]

Think about how this is better than irc or IM, both of which do have activity/presence indicators, but which either aren't used, or are poorly implemented and often wrong. How much more likely are you, as a colleague of mine, to IM, email, video chat, or call me if you can see I'm at my desk and working? No more "around?" messages needed. You could even see if I'm looking cheerful, or perhaps otherwise indisposed, heh heh:

[Image: hello kitty (http://oss.oracle.com/%7Eagrover/pics/blog/cat1.jpg)]

On a technical note, although there were many Debian packages that kind-of did what I wanted, it turned out to be surprisingly easy to roll my own in about 20 lines of Python (http://github.com/agrover/pysnapper/blob/master/webcam.py).

[Image: working hard. (http://oss.oracle.com/%7Eagrover/pics/blog/working.jpg)]

Anyways, just something I've been playing around with, while I wait for my robo-avatar to be set up down at HQ...
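
The scheduling rules in the list above (quarter-hourly, working hours only, a 3 second warning) can be sketched as follows. This is not the linked webcam.py: the 09:00-17:00 Monday-to-Friday window is an assumption, since the post does not say what "normal working hours" are, the function names are invented for this example, and the actual capture and upload steps are left out.

```python
from datetime import datetime, time, timedelta

WORK_START = time(9, 0)    # assumed working hours; the post does not give any
WORK_END = time(17, 0)
INTERVAL_MINUTES = 15      # from the post
WARNING_SECONDS = 3        # from the post


def in_working_hours(moment):
    """True Monday to Friday from WORK_START (inclusive) to WORK_END (exclusive)."""
    return moment.weekday() < 5 and WORK_START <= moment.time() < WORK_END


def next_capture(after):
    """Return the first quarter-hour mark after `after` that is in working hours."""
    candidate = after.replace(second=0, microsecond=0)
    # Step forward to the next multiple of INTERVAL_MINUTES past the hour.
    candidate += timedelta(minutes=INTERVAL_MINUTES - candidate.minute % INTERVAL_MINUTES)
    while not in_working_hours(candidate):
        candidate += timedelta(minutes=INTERVAL_MINUTES)
    return candidate


def warning_time(capture):
    """When to sound the audible warning for a given capture time."""
    return capture - timedelta(seconds=WARNING_SECONDS)
```

A loop built on this would sleep until `warning_time(next_capture(datetime.now()))`, beep, wait the remaining three seconds, take the snapshot, and upload it.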

by andy.grover at September 10, 2010 05:20 PM

November 08, 2009

Valerie Aurora

Migrated to WordPress

My LiveJournal blog name - valhenson - was the last major holdover from my old name, Val Henson. I got a new Social Security card, passport, and driver's license with my new name several months ago, but migrating my blog? That's hard! Or something. I finally got around to moving to a brand-spanking-new blog at WordPress:

Valerie Aurora's blog

Update your RSS reader with the above if you still want to read my blog - I won't be republishing my posts to my new blog on this LiveJournal blog.

If you're aware of any other current instances of "Val Henson" or "Valerie Henson," let me know! I obviously can't change my name on historical documents, like research papers or interviews, but if it's vaguely real-time-ish, I'd like to update it.

One web page I'm going to keep as Val Henson for historical reasons is my Val Henson is a Man joke. Several of the pages on my web site were created after the fact as vehicles for amusing pictures or graphics I had lying around. In this case, my friend Dana Sibera created a pretty damn cool picture of me with a full beard and I had to do something with it.



It's doubly wild now that I have such short hair.

November 08, 2009 11:36 PM