2024 dbt Labs Coalesce Conference

I traveled for much of the last two months and during that time I went to four conferences. I usually like to write about them when they are happening but I wasn’t able to manage it, so I’m offering them up a little delayed. I’m going to start with the last one, which was the Coalesce conference hosted by dbt Labs.

I’m a big dbt Labs fan, as they create a ton of useful open source software focused mainly on the “T” part of ETL. I went to their conference last year, which was in San Diego, and it was a lot of fun, but I must be honest in that when they announced it would be in Las Vegas this year I was not that excited.

I get my fill of Vegas at the annual AWS re:Invent conference, and while I’ve met a lot of friendly residents in that town it is not a favorite. I don’t often gamble, and I don’t like crowds or rampant commercialism. Vegas does have decent food options.

That said, I rather enjoyed this trip. The conference was held at Resorts World, which is a Hilton-owned property, and I stayed in the Conrad tower. While I have zero status at Hilton I got a nice room at a reasonable price, and it was great to be able to return to it just by taking an elevator. The conference was on the second floor, while the ground floor hosted a number of events held in various restaurants and bars (as well as being the location of the main casino). I didn’t leave the building for over two days.

I arrived Monday mid-afternoon and that was a day set aside for partner meetings. AWS was well represented and Yev Kravchenko and Siva Raghupathy did a nice presentation on modern data foundations using dbt and AWS.

That evening the expo floor opened up. One nice thing about being in Vegas is they know hospitality, and the snacks and drinks were top notch. Outside of keynotes, wandering around the expo is one of my favorite parts of any conference.

Since AWS had so many people there, we all got together for dinner. I think there were 12 of us at the table. It was nice to spend time with old friends and to make some new ones. One thing I love about working at AWS is being around people who are so much smarter than me.

The next morning the conference officially started. My admission included breakfast and lunch, so I went down to the second floor for food. I liked to sit towards the back of the area next to the windows so I could get a view of The Sphere.

I thought it was funny that Resorts World also had a sort of mini-Sphere:

The main theme of the conference was “One dbt”, reflecting the goal of “an integrated, governed, and scalable approach to data analytics”.

dbt Labs has two main products: the open source “dbt Core” and the managed offering “dbt Cloud”. A goal of One dbt is to make it much easier to work with and switch between them. For example, production could run on cloud due to the extra security and reliability but development could be done locally on core.

Tuesday’s keynote introduced us to our hosts for the week: Lesley Greene, Grace Goheen and Alexis Jones.

They welcomed us to the conference and pointed out that there were around 2000 people in attendance and another 8000 or so watching virtually. They also thanked the sponsors, and AWS joined Tableau and Snowflake at the platinum level.

Alexis then introduced the first speaker, co-founder and CEO Tristan Handy.

He talked about the origins of dbt Labs, which began life as Fishtown Analytics (Fishtown is a section of Philadelphia, PA, USA, where the company was founded) with its mission to create a mature workflow system for analytics. This has now grown into something they are calling the “data control plane”:

He also announced “dbt Copilot”, a generative AI product to leverage genAI models to help write code, analyze logs, improve query performance and use natural language to ask business questions.

Tristan then brought out Yannick Misteli, who is the head of engineering at Roche. He talked about their five-year journey with dbt: they needed to build a world-class analytics platform and dbt “checked all the boxes”.

The next presentation brought back an old Coalesce favorite, The Jaffle Shop. The Jaffle Shop is an example company created to demonstrate practical uses for dbt Labs’ products, and this year they were expanding. Since a lot of their sales occurred at venues, they decided to buy one: Cirque du Jafflé.

Amy Chen walked us through how one would take two discrete organizations, both of which were using dbt products, and combine them. In this hypothetical example, The Jaffle Shop used dbt Cloud with data in Amazon Redshift while Cirque du Jafflé used dbt Core over Amazon Athena.

dbt Cloud now supports a dbt to Athena connector, so it is a somewhat trivial exercise to access that data, and Amy did a great job with the example to show how to do it.

The next guest speaker was Tobi Humpert from Siemens.

Amy broke the ice by bringing up the fact that at the Munich location, Siemens uses sheep to keep the lawn at the proper height instead of mowing it. These sheep are removed in the winter and return the following spring, which was a segue into cross-platform lineage (if you had lineage you would know where they went). Of course the talk then moved into more practical applications of cross-platform data, enabled by dbt Mesh.

Amy then handed the floor over to Greg McKeon to talk about a new visual editor available in dbt Cloud. He did a demo which also included an integration with dbt Copilot, which involved adding tests and documentation to an existing dbt model.

Greg was followed by Roxi Dahlke, who talked about how to maintain high-quality data you can trust during transformation.

This is done through three new dbt features: Advanced CI (Continuous Integration), Trust and Usage Signals (how data is consumed and whether it is trustworthy), and the Semantic Layer (custom system metrics across all tools). She also did a dbt Copilot demo leveraging the existing semantic layer to allow for natural language queries.

Roxi invited a special guest, James Dorado from Bilt Rewards, who discussed how they were using the semantic layer to present data directly to end customers, which requires even more trust in that data than when you use it internally.

For the final part of the keynote, Tristan returned to discuss how dbt partners were supporting analytics workflows from end-to-end and it featured Dan Jewett from Tableau and Neeraja Rentachintala from AWS. The discussion focused on how the new product announcements were going to improve the customer experience.

Overall I enjoyed the keynote and I saw a real natural progression for dbt as it continued to focus on making analytics workflows more powerful, more relevant to business needs, and easier to use.

Outside of the keynotes, Coalesce provided a lot of other sessions. I don’t directly use dbt so many (most) of them were above my head, but I did attend all of the ones by AWS folks.

The first one was a talk by Darshit Thakkar and BP Yau called “Amazon Athena and dbt: Unlocking serverless data transformations”.

I understood a lot of this, and I know that “serverless” is becoming the preferred way to manage cloud resources, but I did have a problem with the demo. BP likes to use dark mode. I’m not a fan. I grew up reading dark text on a light background so that is my preference, but in this case it made it almost impossible to see the text on the projection screen, so I just took his word for it (grin).

The Wednesday morning keynotes kicked off with a presentation by Allie K. Miller on artificial intelligence. GenAI is such a hot topic in the industry that you can’t go to any conference these days without encountering it. While I’m at the edge of burnout on the topic, I did enjoy her presentation, which included a graphic on the progression of true autonomous analytics.

She also sent a link to a cool use of genAI that takes a picture and turns it into a kind of squishy play-dough thing, which I found fascinatingly weird.

This was followed by a customer panel led by Brandon Sweeney, featuring Kayleigh Lavorini from Fifth Third Bank and Srini Vemuru from Salesforce. The discussion involved their thoughts on using AI for their analytics needs and how they leverage dbt.

As my focus is open source, I was drawn to any presentation involving dbt Core, and one of the more amusing ones was given by Grace Goheen. I had met Grace and Jeremy Cohen at one of the casino bars the night before, and they are both theatre nerds (spoken as an ex-theatre nerd myself). Jeremy currently lives in France (and he was the only one to pronounce “Cirque du Jafflé” in the proper accent) and he made a cameo as Elvis (we were in Vegas, after all). I expect some sort of “rat pack” reference next year. (grin)

To be serious for a moment, all of us in the open source community have witnessed established projects making big changes in their licenses over the last couple of years. While I don’t want to comment on that in this post, it is always nice to see folks like dbt confirm that they have no plans to change the license on dbt Core. I’ve heard it from the top, including Tristan and Mark Porter, and in this presentation Grace renewed that vow in the form of a “wedding” officiated by Elvis.

It was a cute and entertaining way of addressing a very serious subject.

There was another session by AWS speakers Wednesday afternoon with the lofty title of “Journey to Generative AI driven near-real-time-operational analytics with zero-ETL and dbt Cloud”. Whew.

This was presented by Neeraja Rentachintala (who you may remember from the Tuesday keynote) and Neela Kulkarni. They presented a reference architecture built on Amazon Aurora, Amazon Redshift and dbt Cloud for operational analytics.

Neela did the demo, and I am happy to say she used both light mode and a large typeface so I could see what she was doing. That doesn’t mean that I totally understood what she was doing, but the audience seemed to get it. (grin)

Wednesday night was the Coalesce party, and as I’ve mentioned before, Vegas is known for its hospitality industry, so it was top notch. Held in one of the nightclubs in Resorts World, it featured a DJ, lights, and lots of food and drink.

I am not one for large crowds so I didn’t stay very long. In fact, I had been traveling for almost six weeks straight, so I ended up finding a flight home that still allowed me to attend Thursday’s keynote. The people I talked to who stayed had a great time.

Thursday’s keynote focused on dbt Core and the community, and was hosted by Grace and Jeremy, along with Amada Echeverría presenting the Community Awards.

While a lot of the focus of the keynotes was on dbt Cloud, this one focused on all of the work being done in dbt Core, including a feature to unit test dbt models, improvements to snapshots, and the ability to do “microbatches”, which allows a user to break up large datasets into smaller chunks for processing.

Then it was time for the Community Awards.

I was extremely happy to see my friends on the dbt-athena adapter get the Trailblazing Innovator award. Last year at Coalesce, my boss David Nalley made a special reference to this team, made up of Jérémy Guiselin, Nicola Corda, Jesse Dobbelaere, Mattia Sappa, and Serhii Dimchenko. The adapter they created was so well made that it achieved “trusted” status this year and, as mentioned above, is now supported in dbt Cloud as well as dbt Core.

Three of the award winners, Influencer Extraordinaire Opeyemi Fabiyi, Data Governance Excellence winner Jenna Jordan and Catalyst of Impact winner Bruno Souza de Lima, were in attendance and participated in a panel discussion on the rewards and challenges of their work in the community.

The final part of the keynote brought our hosts back out to talk about Coalesce 2025, which will be back in Las Vegas on 6-9 October.

As reluctant as I was to consider another Vegas conference, I have to say that the Resorts World location really worked for Coalesce, and I look forward to returning for my third event next year.

[Note: if you are part of an open source startup, or want to start an open source company, and you are coming to re:Invent, be sure to attend this panel featuring Tristan Handy]

The Yearbook Game

This year marks the 40th anniversary of my graduation from the North Carolina School of Science and Mathematics (NCSSM). Attending NCSSM was one of the most formative experiences of my life, but to be honest I have never been that big on the reunions. For some reason, though, I am super eager to go to the 40th.

Since it has been four decades since I’ve seen most of my classmates, I thought it would be cool to come up with a little game. In the US most schools used to produce a “yearbook” (sometimes called an “annual”) which is a large book containing pictures from the school year. While formats can differ, there is always a section for each class with pictures of the people in that class. I thought it would be cool to scan in all of those pictures and then create an online game where you have to match the person’s picture to their name.

The problem is that I do not consider myself a programmer. While I got my first computer in 1978 and did a lot of coding back then in BASIC, FORTRAN, Pascal and a little C when in college, I haven’t programmed seriously in decades. I mean, I can write a simple bash script and I can usually hack something that kind of does what I need into something that actually does what I need, but I would struggle to write “Hello world!” in any modern language.

TL;DR: We made a yearbook game where you try to match a name to the correct picture using generative AI.

But I don’t need to know how to code if I have access to generative AI, right? (grin)

I have to admit that I’m a little burned out by all the GenAI hype going on lately, but recently my boss pointed out we were in the same boat when cloud computing was first a thing. It was so bad that my friend Spot Callaway did a presentation [download] where the first four minutes was nothing but slides with the word “cloud” on them.

Now I work for a major cloud provider and cloud computing is pretty much everywhere. It doesn’t seem so bad anymore. My boss’s perspective on GenAI really changed my mind and now I had a reason to try and use it.

I am not the sharpest knife in the drawer, but my superpower is that I know a lot of very smart people. One of those is Ricardo Sueiras, and I figured he would be just the person to help with my project. Last Wednesday we set up a time to meet and to create my game.

First I had to digitize the pictures. People need to remember that, back in 1984, pictures involved this thing called “film”. Things we take for granted today were just impossible back then, and I am actually somewhat in awe of the people who put together the yearbook without the help of digital publishing. The pictures for our class photos were taken in color, but in the yearbook they were published in black and white, which can wash out details when printed. I didn’t have access to anything better, though. I started with our senior year yearbook since the senior class pictures were printed larger than the junior class ones.

After scanning in twelve pages and cropping each person into their own photo, I saved them in the format $NAME.png. Since I suffer from mild OCD, I had to go back to the junior year yearbook to get pictures of people who didn’t have a senior picture, either because they didn’t return for the second year or they missed picture day. When I was done I had a little over 200 pictures.

When I originally talked to Ricardo about the game, I figured we would show a picture and then offer some choices as to who it was. In fact, he went ahead and experimented with creating that game.

But as I thought about it, I figured most of the fun would be in seeing the pictures and it would take a while to cycle through 200 or more of them, so I changed the game to present a single name and then four pictures as choices.

One other thing I did was create a CSV file containing the student’s name and their gender. Note that back in 1984 people weren’t as publicly gender fluid as they are now, so I went with the gender by which I knew each person. Feel free to gently correct me if I need to swap one, but I thought it would be a little more difficult to guess if a person with a masculine-sounding name was presented with four masculine-looking pictures.
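The gender column makes the round-building logic possible: for a given name, all four candidate pictures should share that person's gender so the name alone gives nothing away. I didn't keep the exact code Q generated, but here is a minimal sketch of that selection step, assuming a CSV with `name` and `gender` columns (the function name `build_round` is mine):

```python
import csv
import random

def build_round(csv_path, target_name):
    """Pick four candidate pictures for a target name, all of the
    same gender so the name alone doesn't give the answer away."""
    with open(csv_path, newline="") as f:
        students = list(csv.DictReader(f))  # columns: name, gender

    target = next(s for s in students if s["name"] == target_name)
    decoy_pool = [s for s in students
                  if s["gender"] == target["gender"]
                  and s["name"] != target_name]

    # Three same-gender decoys plus the correct person, shuffled.
    choices = random.sample(decoy_pool, 3) + [target]
    random.shuffle(choices)
    return choices
```

With a couple hundred classmates split roughly evenly by gender, there is always a big enough decoy pool to draw from.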

With everything ready I jumped on a call with Ricardo. Now another thing that has changed since I used to program is that most people use an IDE. When I first started writing code I used a line editor, not even a full screen editor, so IDEs are just magic to me. Ricardo uses Visual Studio, and we decided to use Amazon Q (‘natch) which has an easy to use plugin.

A screenshot showing the Visual Studio interface

The first thing we had to do was map the pictures to a new name. If you were asked to match “Tarus Balog” to the right picture, it wouldn’t do to just mouse over all four until you found “Tarus Balog.png”. Not only did Q create a script to rename all of the files by numbers, it updated the CSV file so that the correct file name was now included as a column, so now the file had name, gender and image number.
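I don't have Q's exact script, but the renaming step amounts to something like the following sketch, assuming the `$NAME.png` files sit in one directory and the roster CSV has `name` and `gender` columns (the function name `anonymize_pictures` is mine):

```python
import csv
import os

def anonymize_pictures(photo_dir, roster_csv):
    """Rename each $NAME.png to a bare number and record that number
    in the roster CSV so the filename no longer reveals the answer."""
    with open(roster_csv, newline="") as f:
        rows = list(csv.DictReader(f))  # columns: name, gender

    for i, row in enumerate(rows, start=1):
        src = os.path.join(photo_dir, f"{row['name']}.png")
        dst = os.path.join(photo_dir, f"{i}.png")
        os.rename(src, dst)
        row["image"] = str(i)  # new column linking name to picture

    # Rewrite the roster with the image number as a third column.
    with open(roster_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "gender", "image"])
        writer.writeheader()
        writer.writerows(rows)
```

After this runs, only the CSV knows which numbered picture belongs to which person, which is exactly what the game needs.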

Ricardo decided to use Python to write the code and to host it in Flask, which is a simple Python web framework. It was surprisingly easy to get the first pass working, but we could never get the match to work. After debugging, it seemed like it was trying to match the correct person from one round against the random result of the next iteration. What’s weird is that once in a while it would work, if the picture you chose just happened to be the correct guess for the next game, but it was very rare.

We tried a number of prompts to get Q to fix it, but the final solution involved pasting the code into Amazon Bedrock, choosing a different LLM and explaining what was happening. It created a result using a Python session object and, voila!, it worked.
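The pattern behind the fix is simple once you see it: the correct answer for the current round has to be remembered server-side before the choices are shown, and a guess has to be scored against that stored value rather than against whatever the next round generates. Here is a minimal sketch of that pattern using a plain dict to stand in for Flask's session object (the function names are mine, not from our actual code):

```python
import random

def new_round(session, students):
    """Start a round: pick a target and remember the right answer in
    the per-user session *before* the choices are shown."""
    target = random.choice(students)  # each student: {"name", "image"}
    session["answer"] = target["image"]
    session["name"] = target["name"]
    return target

def check_guess(session, guessed_image):
    """Score a guess against the answer stored for *this* round, not
    against whatever the next round happens to generate."""
    return guessed_image == session.pop("answer", None)
```

The buggy version effectively skipped the "remember it" step, so a guess was only ever right by coincidence, which explains why it occasionally appeared to work.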

That turned out to be about half of the project. I still wanted to be able to host it on my web server, but I had been lazy about upgrading Ubuntu and the version I was running didn’t support Python 3.10 (which was required for one of the libraries we used). So it was time to do a production upgrade and deal with those issues. I also spent a little time editing the web page templates and the CSS file (it turns out NCSSM publishes a brand style guide). I also went to find an old NCSSM logo, what we used to call “burning diapers” versus the modern one that looks like a stylized atom, to use as a background.

Flask runs by default on localhost port 5000, so I set up a proxy in the Apache web server to make it accessible (I didn’t want to mess with SSL in Flask as I know my way around Apache configurations). But since I run WordPress in the root of https://tarus.io the submit didn’t work when accessed through the proxy, so I ended up creating a subdomain to host it. The finished product is at https://classof1984.tarus.io.

And to save you the time refreshing to find my picture, here it is.

A screenshot of the Yearbook Game showing my name and four choices. I'm the upper right picture when I had a huge "Greg Brady" perm

Yes, my hair was magnificent.

One last note about those pictures. Not everyone in our class managed to live until the reunion. One of my beta testers mentioned that she was a bit triggered by seeing the picture of a friend of hers who had died from cancer. She asked me if I could remove pictures of those people we knew were no longer with us.

We talked about it and I could understand her issue but I had a different reaction. To me seeing pictures of those who had passed made me think of them fondly. Another beta tester said that he would feel like we were “erasing” them if we took them out. So it was decided to leave them in. It was funny to me how much thought needs to go into even the simplest app (well, if you care about not being a d*ck).

The last task was to upload the files to GitHub, which was my first ever repository even though I joined in 2006. We have generative AI to thank for that.

A screenshot of my GitHub page with a "first repository" image

It was a lot of fun and now I’ve downloaded Visual Studio to my own computer and I plan to make even more programs.

Phear.

Confluent’s “Data In Motion” Tour

One of the perks of my job is that I get to work with some incredible partners. One of those is Confluent, probably best known for being the primary maintainer of the Apache Kafka project.

This year, Confluent is doing a multi-city “Data In Motion” tour. The name comes from a focus on real-time data processing. Modern applications often have a requirement to collect data from one or more sources, enrich it and then use the enriched data to provide useful information to the end user, usually in real-time. This tour was a half-day seminar exploring some solutions to that use case.

The event was held at a place called Boxyard RTP. It has been many years since I worked in the Research Triangle Park and it has really grown in that time. Boxyard is made up of repurposed shipping containers. There are restaurants, a bar, a stage (when I walked up a band was playing) and for the purposes of this seminar there was an area on the second level with a conference room and a patio.

The agenda consisted of four main items: an overview of what is going on with Confluent’s offerings, a “fireside chat” about using real-time data for security, an hour-long demo of new functionality and a customer success story.

It was cool to see that AWS was a Diamond Sponsor of this event.

The first presenter was Ananda Bose, who is the Director of Solutions Engineering at Confluent. He covered some of the new products available from Confluent, especially Kora. Kora is a cloud native implementation of Apache Kafka.

At my previous company we wanted to be able to offer our technology as a managed service, which was difficult since it was a monolithic Java application. The ultimate goal was to have a cloud native version, and by that I mean a version of the application that can take advantage of cloud technologies that provide resilience and automatic scalability. Apache Kafka is also a Java app and a lot of work must have gone into decoupling the storage, identity management, metrics and other aspects of the program to fit the cloud native paradigm.

One thing I liked about Ananda’s presentation style was that he was very direct. Confluent has just completed an integration with Apache Flink, which is a stream processing framework. One thing that Flink brings is ANSI-compliant SQL. Prior to this integration people used KSQL, but the words that Ananda used to describe KSQL are not really appropriate for this family-friendly blog. (grin)

Kora reinforces something I’ve been saying about open source for some time. When it comes to open source software, people are willing to pay for three things: simplicity, stability and security. Kora does all three and the design of Kora even won the “Best Industry Paper” award at last year’s Very Large Databases conference.

We would see Kora and Flink in action in the demo section.

The second talk was a fireside chat between Ananda and Dr. Jared Smith. Jared works at SecurityScorecard, a security risk mitigation company.

SecurityScorecard has to consume petabytes of data in order to detect malicious behavior on the network. The way their system works, the payload of a given message may be 20 megabytes or larger, and when they used RabbitMQ it simply couldn’t handle the workload. When they switched to Kora their scaling issues went away.

One cool story Jared told happened during the start of the war in Ukraine. SecurityScorecard placed a “honeypot” server in Kyiv and it was able to detect a large Russian botnet attacking the network. They were able to collect and block the IP addresses of the bots and thus mitigate the damage.

The next hour was taken up by a demo. ChunSing Tsui, a Senior Solutions Engineer, walked us through an example using Confluent Cloud, MongoDB and Terraform. The whole demo is available on GitHub if you’d like to recreate it on your own.

In this example, a shoe store called HappyFeet wants to monitor website traffic to identify customers who visit but don’t stay on the site very long. Then they could use this information to try and re-engage with them through a marketing campaign offering discounts, etc.

While I am in no way an expert at this stuff, it was engaging. There were four data sources that would be processed to provide an enriched data stream to MongoDB tables. What I did like about it is that the heart of the demo was all written in SQL.

As an “old” I am not as up to date on the new hotness in cloud computing as I would like to be, but SQL I know. This was a product that took a difficult concept and made it accessible.

The final presentation was a customer success story from SAS Institute. It was given by Bryan Stines, Director of Product Management for the SAS Cloud, and Justin Dempsey, a Senior Manager for SAS Cloud Innovation and Automation Services.

It was a nice close to the meeting as the “Data In Motion” theme was very present here. One of the products SAS provides involves fraud detection for credit and debit card transactions. When a person swipes or taps a credit card, that sets off a series of events to detect fraud that may involve numerous checks. This has to be done on the order of milliseconds.

Now I am a big open source software enthusiast, but free software doesn’t mean free solution. With my previous project we used technology such as Apache Kafka, Apache Cassandra, PostgreSQL and others. Our users had to either acquire or develop some of that expertise in house or they needed to find a partner, and that was the issue facing SAS. By partnering with Confluent they were able to get the most out of the software from the people who knew it best.

I no longer live that close to RTP but I felt the three hour round trip was worth it for this event. There are still several dates on the calendar so if this interests you, please check it out.

2024 Open Source Founders Summit

I’m not sure where I first learned about the inaugural Open Source Founders Summit, but I can remember thinking that I’d wished this had been around 20 years ago.

In many ways, starting and running an open source business is no different than any other business. You still need to take care of your customers, create a useful product, and control expenses. But even though open source businesses are software businesses, they differ greatly in that the software isn’t the product.

My first computer was a TRS-80 from Radio Shack that I got for Christmas in 1978. A few years later my father would buy one of the first IBM PCs and the main piece of software he ran was Lotus 1-2-3. Mitch Kapor was the first person I’d heard of to become wealthy off of software (followed soon by Bill Gates). From a business perspective the model was compelling: create a useful, high margin product and distribute it for next to nothing.

Because open source involves software, people still think that the standard software business model applies, and that the open source aspect is more of a “loss leader” for a proprietary product. For those of us who want to run a truly open source business, this isn’t an option. As someone who spent two decades running an open source business, I found the idea of getting a bunch of us together exciting.

The conference took place in Paris on 27-28 May, and was organized by Emily Omier and Remy Bertot. About 75 people showed up for the conference, and I got to see some old acquaintances like Frank Karlitschek, Peter Zaitsev, Monty Widenius, and Peter Farkas. Brian Proffitt from Red Hat was also there (and I finally got to meet his wife as they came to Paris early to celebrate their wedding anniversary).

The event kicked off Sunday night with a social activity at a brewery. That was a lot of fun as I got to make many new friends, although the venue was a bit loud for conversation. It did make me very eager for the conference to start the next morning.

The venue was pretty cool. It was called The Loft and once you exited off the street you entered a covered courtyard (which was nice since it did rain off and on). A short flight of stairs brought you up to the main room. On one side were chairs and two screens for presentations, and on the other was an area with high-top tables where food was served and a bar for juice and coffee in the morning (they added soft drinks during lunch).

I really liked the conference format, which consisted of several 30 minute presentations in the morning that everyone could attend. While I understand the need and usefulness of multiple tracks at many larger conferences, it was nice not to have to miss anything when at a smaller event. The presentations were followed by an hour of five minute “lightning talks” and then by lunch.

In the afternoon we would break out into smaller groups for an open discussion of topics around various aspects of running an open source business. One group would stay in the main area while the others would move to spaces in a loft and downstairs in the basement. In every group I attended we would sit in a circle while the moderator would start off the discussion, but after that it was a pretty open format.

One of the main tenets of the conference was that it was a safe space in which to share stories, so none of the sessions were recorded and I will be somewhat circumspect in what I share from the conference as well. It created an intimacy and a level of trust I haven’t experienced at other conferences.

Remy and Emily started us off with an overview of the conference and what was planned for the next two days.

We then jumped into the presentations, starting with Thomas di Giacomo from SUSE.

Thomas talked about “projects vs. products”. In an open source business where you are commercializing the work of an open source project, you don’t have total control of the product roadmap. He discussed strategies for working with your project to better align both the needs of it and of the business. He also talked about pricing open source products. One of the things we did wrong in the beginning at my company was we priced our offerings too low. We were able to adjust that over time but you should never, ever compete on price in an open source business. You compete by being a better and more flexible solution, and one that doesn’t come with vendor lock-in.

The next speaker was Pierre Burgy from Strapi.

I had never heard of Strapi, but they are a popular (and successful) content management system. They have raised a lot of investment money over several rounds and their business model was originally based on “open core” or having a proprietary “enterprise” version.

Now as my three readers know, for over 20 years I’ve been writing about open source business and in many of those posts I have railed against the open core model. I just don’t like it. But it is much more palatable to investors than a pure open source model, and several of the speakers at this conference went the VC route. Strapi did and it seems to be working well for them.

But the basis of Pierre’s talk was on Strapi’s addition of a SaaS model and how it compared to their enterprise offering. I am a huge fan of companies offering a hosted version of open source software. I’ve found that people are willing to pay for three things when it comes to open source: simplicity, security and stability. One of the best ways to offer this is through a managed product.

Selling Strapi’s enterprise offering had similarities to selling proprietary software. It had longer sales cycles and needed a lot of one to one contact. But the deals were large. Contrast that to the SaaS offering which was self-serve, required much less customer contact, and resulted in a faster sales cycle. But the deals were smaller and there was a greater chance that the customer would later leave.

The next speaker was someone I’ve followed for decades but never met: Gaël Duval.

Gaël started Mandrake Linux, which is where I was first introduced to him. He is now the CEO of Murena, a privacy-focused smartphone provider based on the /e/ phone operating system.

There was a time when I was extremely into running truly free and open source software on my mobile devices. I bought a CopperheadOS phone and later ran GrapheneOS. I loaded Asteroid on my smartwatch. For $REASONS I’m back in the Apple ecosystem but I still applaud efforts to bring open source to mobile. It is one area where we in the FOSS community have failed (for example, I could be perfectly happy running Linux Mint as my sole desktop environment but there is nothing truly equivalent in mobile).

However, Gaël wasn’t here to talk about that; instead he focused on the change from being a hands-on founder to becoming a CEO. That is something I had to work through as well, and the issue of delegation was something with which I struggled. He talked about how he addressed that and more, and I identified with a lot of what he said.

As with many conferences my favorite part was the hallway track. I didn’t know that Brian Proffitt and Monty Widenius had never met, so I made introductions. This kicked off a spirited discussion over licensing (Red Hat is owned by IBM which will soon own Hashicorp, which opens up some interesting licensing possibilities).

As you can see in this picture I’ve hit my goal of spreading joy and happiness throughout the conference. (grin)

Amandine Le Pape was the next speaker, and she talked about building an open ecosystem.

Amandine is the CEO of Element, the company behind the open source Matrix protocol. What I found interesting about this talk was hearing from a company with an extremely successful project but one where the company behind it struggles to share in that success. She was extremely transparent with numbers, including that Element often contributed 50% of its operating budget to the project and that they had to downsize. But the decisions they had to make have worked and Element is doing much better.

The project I was associated with never had the adoption of something like Matrix, but I do remember the two times we had to downsize and those were the worst days of my professional life. This talk really drove home the reality that open source businesses are businesses, and you have to make decisions based on the health of the company as well as the project.

The last talk of the morning was followed by an hour of five minute lightning talks. The first one was by Olga Rusakova who talked about what types of content from engineers best drive engagement.

Engineers don’t necessarily write compelling content, even though such content can drive business leads. Olga talked about the best practices for getting engineers to write such content.

Kevin Muller was next with a talk about the best KPIs to measure when trying to gauge the health of an open source startup. Spoiler: it isn’t Github stars. I didn’t get a good picture of Kevin but I did get a very clear shot of the guy’s head who was sitting in front of me (sigh).

I did get a decent picture of Alexandre Brianceau.

Alexandre talked about becoming the CEO of an open source company while not being a founder. In many cases when an open source startup gets a funding round, the investors will insert a new CEO into the mix. In the case of Alexandre he was already part of the company and he was just a good fit for the role. Founder and CEO are different roles, and sometimes it is best for a company if someone other than a founder takes that role.

And for the viewpoint from the other side, Peter Zaitsev talked about replacing himself in the company he founded with a new CEO.

He covered his rationale for turning over the reins of his company to another, and having met Ann Schlemmer I can see why he made the choice.

Elisabeth Poirier-Defoy finished up the talks with a discussion of the organization behind GTFS, the specification that allows municipalities to publish transit schedules for consumption by third parties, especially mapping applications.

I feel especially close to this technology since I was doing a lot of work with the City of Portland in 2005, and it was their local transportation authority, TriMet, that started publishing transit schedules in an accessible format. It is great to see how that effort has grown into a worldwide practice. Many times in Paris I pulled out my mobile to look up the best bus or metro to take to get to my destination.

We then took a break for a nice lunch and more conversation.

I didn’t take any pictures of the afternoon sessions. There was something about this conference that kind of discouraged my usual need to record everything. As I mentioned above, after lunch we would break out into groups of 20 or so to discuss a certain aspect of open source business. They worked surprisingly well, and in the sessions I attended everyone seemed eager to contribute yet no one really talked over anyone else. It was a refreshingly frank exchange of ideas.

As the afternoon came to a close, I ended up in the courtyard talking with Stephen Augustus. I know we’ve been in the same room before, but this was the first time we were formally introduced. Stephen is the head of open source at Cisco, and we had a great conversation walking to the evening event, which Cisco sponsored.

We met at the Hotel du Nord for drinks and dinner. It was wonderful, although the bar space was a little constrained so it made circulating difficult (and like Sunday night, it was loud). I ended up grabbing a seat (my bad ankle was starting to bother me) but was soon enjoying more conversation and appetizers. We then adjourned to the dining room where the great conversation continued (and with a different set of tablemates so I wouldn’t wear out my welcome). It capped off a nearly perfect day.

Day Two started off very much like the first: with opening remarks from Emily and Remy.

Frank Karlitschek started off the day with a discussion of generating leads when you are an open source company.

I am an unabashed Karlitschek fan-boy. He started a company called ownCloud, and when his vision for the product and the investors diverged, he took a huge leap and left to form a fork called Nextcloud. I use Nextcloud several times a day, and it is amazing, and I am always eager to hear what Frank has to say.

Nextcloud is self-financed (i.e. no VC) and is profitable. They do not subscribe to the open core model, but unlike many other open source companies they also do not perform any professional services (at my company professional services made up about 35% of revenue).

He talked about their sales funnel, where they generate leads into the CRM system which are then qualified by marketing and eventually assigned to a salesperson. It’s called a funnel for a reason, and the number drops significantly from leads to MQLs (marketing-qualified leads) to SALs (sales-accepted leads). He was kind enough to share actual numbers from the past week, which gave us an example to set realistic expectations.

The next speaker was Alexis Richardson, the founder of RabbitMQ who went on to found Weaveworks, a startup focused on cloud native development tooling. Unfortunately, despite generating a lot of revenue (numbers which he shared) they ended up having to close their doors earlier this year. It was the most brutally honest presentation of the conference. One thing I found interesting was that his company did both an enterprise product and a SaaS product, and they decided to focus on the enterprise product. This was a bit of a departure from my current belief that managed open source is the best option for open source startups.

Next up was Matt Barker. Matt is in sales and he started us off with a story about going from England to the US to sell magazine subscriptions door to door. That is something I could never do, but it turns out he was good at it.

Plus he referenced Crossing the Chasm, which is probably my favorite business book of all time. Anyone who references that must be cool (granted, I got to talk to Matt a lot after his talk and he is cool, especially when sharing some of his experiences at Canonical where his time overlapped with my friend Jono). A lot of the things he learned selling magazines can be applied to selling open source products.

Continuing the sales theme, Peter Zaitsev returned with a talk about how you can incentivize sales teams.

People often focus on engineering and product management when it comes to open source businesses, but sales plays an important role, too. In Peter’s experience the best solution is to offer salespeople commission-based compensation (or “eat what you kill”) and make it unlimited. If they are successful they could end up getting paid more than the CEO, but this can be a good thing for the company.

Garima Kapoor, co-founder of MinIO, closed out the main talks with a discussion about how to convert widespread open source adoption into financial success for the company behind the project. She covered many topics but one line on a slide caught my eye concerning per-incident support. I constantly had to explain to our customers why we didn’t provide it and I was hoping to follow up with her but never got the chance.

Like the previous day, the final morning hour was reserved for lightning talks. First up was David Aronchick.

David founded Expanso, a company that commercializes the Bacalhau project. Bacalhau is a Portuguese seafood dish consisting of salted cod, and that project implements a Compute over Data (CoD) solution. Get it? Expanso is Portuguese for distributed, or so I was told. He talked about how one can drive community engagement through management of the product cycle.

Julien Coupey followed with a talk about how expertise in an open source project is something that can be commercialized.

At my old company I used to compare us to plumbers. There are a lot of things I can do around my farm, woodworking and electrical come to mind, but I suck at plumbing. I end up wet and frustrated. So even though I can buy the same pipes and fittings as a professional, I call someone because overall it will save me time and money. In much the same way, contracting with the people who actually make the open source software you use can save you time and money, since chances are they have the experience to solve your issues more quickly than you could on your own.

The next speaker was Taco Potze, from OpenSocial.

Taco’s company interacts a lot with non-profits, other charities and governments. These organizations represent unique challenges when it comes to the sales cycle. He shared some of his experiences in this field.

Continuing with that theme, Yann Lechelle discussed developing a commercial enterprise around Scikit-learn, a publicly sponsored open-source library for Machine Learning.

He talked about the challenges in developing and funding a mission-driven organization around an open source project while still remaining true to its open source nature.

The final speaker was Eyal Bukchin, who talked about using an open source minimal viable product (MVP) to start a business.

I think open source provides a great way to demonstrate the value and demand for a software solution. By working on mirrord, he was able to secure funding for MetalBear, his business aimed at making the lives of backend developers easier.

At lunch I introduced myself to Greg Austic, who works on open source software for farms.

He had on a T-shirt from upstate New York, but I learned he is actually in the Minneapolis-St. Paul area. When he asked me where I was from the conversation went a little like this:

Greg: Where are you located?
Tarus: Oh, I live in North Carolina.
Greg: Where in North Carolina?
Tarus: Central North Carolina.
Greg: Where in central North Carolina? I spent five years in Pittsboro.
Tarus: I used to live in Pittsboro.

Turns out we know a lot of the same people, yet we didn’t meet until we both went to France.

After lunch we broke out into more workshops. I can’t stress enough how much I liked the format of this conference.

Unfortunately, while Monday was a holiday in the US, Tuesday wasn’t, and being six hours ahead of the New York time zone I ended up leaving a bit early to deal with some meetings I couldn’t miss.

In case this hasn’t come out in this post, I really enjoyed this conference. A lot. It was very eye-opening. There were companies that embraced VC and did well, and others that embraced VC and did poorly. Some were steadfastly against outside investment, preferring to grow organically. Some focused on the public sector, some focused on the enterprise and some focused on managed services. Everyone had a story to tell and something to add, and I eagerly await next year’s event.

The one challenge will be to maintain the intimacy and honesty of this first conference as it grows. It should be a must-attend event for open source founders and anyone seriously interested in forming an open source company.

Computer Nostalgia

Two stories this week caught me up in a little bit of old computer nostalgia. The first was that chip manufacturer Zilog was ending production of the Z80 CPU chip, and the second was that NASA managed to restore communications with the Voyager 1 spacecraft.

Now to be honest I didn’t know that Zilog still made the Z80, and I was impressed it had such a long run. My very first computer was a Radio Shack TRS-80 that I got for Christmas in 1978, when I was 12 years old. At a price of $599 (about $3k today) it was an expensive present, but my father saw it as an investment in my future. I spent a lot of time on that machine, and I thought it was cool that “TRS” were the consonants in my first name.

Compared to the simplest computers in use today it is a dinosaur. The Z80 had a 1.78MHz clock (that’s “mega” hertz) and could only address 64kB of memory. The operating system and programming language (BASIC) were stored in ROM, leaving the rest for RAM. My initial machine came with 4kB of RAM, and by March of 1979 I had written my first program that wouldn’t fit within that limit.

I upgraded it to “Level II” ROM (which ate up 16kB) and 48kB of RAM to max out the machine, and I eventually added two floppy disk drives.

Back in the day finding information on computers, especially in my rural North Carolina town, was difficult. I got a lot of it from subscribing to computer magazines, which is how I learned about assembly. I couldn’t afford an actual assembler so I would hand code it into the system by typing in pairs of hexadecimal digits. Now what ties this whole experience to Voyager 1 is that what the team at NASA did to restore communications was similar to things I did with the TRS-80.

Another thing I couldn’t afford was a printer. Dot matrix printers at the time ran about $2k, or about $9500 in today’s currency (you could buy a new Toyota Corolla for less than $4k). My Dad, however, worked for General Electric and they had just discontinued the TermiNet 300 – a beast of a machine – and he managed to get one cheap. The way it printed was crazy. It had a bank of 118 “hammers” that faced the paper. In between the hammers and the paper was a band containing little metal “fingers” oriented vertically, and on each finger was one of the characters the machine could print. The band rotated around at high speed, and when the machine wanted to produce a character, the ink ribbon would pop up and a hammer would hit the specific finger for that character as it flew by. That it could do this at 30 characters a second was amazing, and loud.

The problem was that it didn’t have a parallel interface and was instead serial, so I had to buy an RS-232C board and write a printer driver. It was then that I kind of fell in love with the Z80 instruction set. Note that I had to come up with the instruction, convert it to the proper hex code and then “POKE” it into memory (I could then “PEEK” to make sure it worked). Luckily printer drivers aren’t that complicated (take a byte as an input, send it to the interface as an output). It worked, and I made a lot of extra money typing in term papers for other students (the advantage of the TermiNet was that it was typewriter quality output).

I can only imagine the NASA engineers sitting around doing the same thing to fix Voyager (hand assembling code, not typing term papers).

Modern computing today is so abstracted from what I learned. Now you can just ask GenAI to write a program for you. I’m not one of those old guys who thinks what I went through was better, but it was nice to see some of those techniques prove useful today.

And I should point out that Voyager 2 is working just fine, which goes to show that you should never buy the first release of a technology product (grin).

2024 FOSS Backstage

I was a speaker at this year’s FOSS Backstage conference, which was held 4-5 March in Berlin, Germany. FOSS Backstage is a gathering dedicated to the “non-code” aspects of open source, and I was surprised I had not heard of it before this year. This was the sixth edition of the event, and it was held in a new location due to growing attendance.

TL;DR; I really, really enjoyed this conference, to the point where it is in contention to be my favorite one of the year. The attendees were knowledgeable and friendly, the conference was well run, and it was not so big that I felt I was missing out due to there being too many tracks. I hope I am able to attend again next year.

This was my first time in Berlin, and although I have been to Germany on numerous other trips, for some reason I have never made it to this historic city. It does have a reputation for being a center for hacker culture in Europe, hosting the annual Chaos Communication Congress, and several of my friends who were at FOSS Backstage told me they were in Berlin quite often.

The event was held at a co-working space and we had access to a lobby, a large auditorium, and then downstairs there were two smaller rooms: Wintergarten and a “workshop” room that was used mainly for remote speakers. Each day started off with a keynote in the auditorium followed by breakout sessions of two or three tracks across the three available rooms.

Monday’s keynote (video) was by Melanie Rieback, CEO and co-founder of Radically Open Security, a not-for-profit computer security consulting firm. Her company donates 90% of net profits to charity, and they openly share the tools and techniques they use so that their clients can learn to audit their own security.

As someone who spends way too much time focused on open source business models, it was encouraging to see a company like Radically Open Security succeed and thrive. But I wasn’t sure I bought in to the premise that all open source companies should be like hers. Consulting firms have a particular business model that is similar to those used by accounting firms, management firms or other service firms such as plumbers or HVAC repair. Software companies have a much different model. For example, I am writing this using WordPress. I didn’t have to pay someone to show me how to use it. If WordPress wants to continue to produce software they need to make money in a different fashion, such as in their case with hosting services and running a marketplace. Those products require capital to create, and since that often can’t be bootstrapped, this means they have investors, investors who will one day expect a return.

Now it is easy to find examples of where investors, specifically venture capitalists, did bad things, but we can’t rule out the model entirely. If you use a Linux desktop, most likely you are using software that companies like Red Hat and Canonical helped to create. Both of those companies are for-profit and have (or in the case of Red Hat, had) investors. The Linux desktop would simply not exist in its current form without them.

The keynote did, however, make me think, which is one of the main reasons I come to conferences.

[Note: I used WordPress as an example because it was handy, and we can discuss the current concerns about selling data for GenAI use another time (grin)]

After the keynote the breakout sessions started, and I headed downstairs to hear Dawn Foster talk about understanding the viability of open source projects (video). Dr. Foster is the Director of Data Science for CHAOSS, a Linux Foundation project for Community Health Analytics in Open Source Software.

Open source viability isn’t something a lot of people think about. Many of us just kind of assume that a piece of open source software will just always be there and always be kept up to date and secure. This can be a dangerous assumption, as illustrated by a famous xkcd comic that I saw no less than three times during the two-day conference.

In my $JOB we often use data analytics to examine the health of a project. In addition to metrics such as number of releases, bugs and pull requests, we also look at something called the Pony Factor and the Elephant Factor.

I’ve been using the term Pony Factor for two years now and while I can trace its origin I’m not sure how it got its name. To calculate it, simply rank all the contributors to an open source project by the number of contributions (PRs, lines of code, whatever you think is best) and then start counting from the largest to smallest until you get to 50% of all contributions (usually over a period of time). For example, if for a given month you have 20 contributions and the largest contributor was responsible for 6 and the second largest for 5 you would have a Pony Factor of 2, since the sum of 6 and 5 is 11 which is more than 50% of 20. It is similar to the Bus Factor, which is a little more grisly in that it counts the number of contributors who could get hit by a bus before the project becomes non-viable. People leave open source projects for a number of reasons (and I am thankful that it is rarely of the “hit by bus” type) and if you depend on that project you have a vested interest in its health.

The Elephant Factor is similar, except you count the number of organizations that contribute to a project. In the example above, if the two contributors both worked for the same company, then the project’s Elephant Factor would be 1 (the number of organizations responsible for at least 50% of the project’s contributions). While we often assume that open source software is a pure meritocracy based on the community that creates it, a low Elephant Factor means that the project is controlled by a small number of parties. This isn’t intrinsically a bad thing, but it could result in the interests of those organizations outweighing the interests of the project as a whole.
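Both factors come down to the same counting exercise, so here is a minimal Python sketch. The contributor names, counts, and employer groupings below are invented for illustration, built around the 20-contribution worked example from the text:

```python
def factor(counts):
    """Smallest number of top contributors whose combined contributions
    exceed 50% of the total. With per-person counts this is the Pony
    Factor; with per-organization counts it is the Elephant Factor."""
    total = sum(counts.values())
    running = 0
    # Rank contributions from largest to smallest and count how many
    # it takes to pass the halfway mark.
    for rank, n in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += n
        if running * 2 > total:
            return rank
    return len(counts)

# Worked example from the text: 20 contributions in a month, the largest
# contributor responsible for 6 and the second largest for 5 (6 + 5 = 11,
# which is more than half of 20), so the Pony Factor is 2.
per_person = {"alice": 6, "bob": 5, "carol": 3, "dave": 3, "erin": 2, "frank": 1}
print(factor(per_person))  # → 2

# Regroup the same contributions by a hypothetical employer: if the top
# two contributors share one company, that company alone clears 50%.
per_org = {"acme": 6 + 5, "globex": 3 + 3 + 2 + 1}
print(factor(per_org))     # → 1
```

Note the sketch counts strictly more than 50%, which matches the worked example; a project where contributions are spread evenly across many people will score a reassuringly high Pony Factor with the same code.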

This was just part of what Dawn covered, as there are other metrics to consider, and she didn’t go into detail about what you can do when open source projects that are important to you have low Pony/Elephant factors, but I found the presentation very interesting.

As a nice segue from Dawn’s talk, the next presentation I attended was on how to change the governance model of an existing open source project (video), given by Ruth Cheesley. Ruth is the project lead for Mautic, and they faced an issue when their project, which had an Elephant Factor of 1, found out that the company behind it was no longer going to support development.

Now I want to admit upfront that I will get some of the details of this story wrong, but it is my understanding that Mautic, which does marketing automation, was originally sponsored by Acquia, the company behind the Drupal content management system and other projects. When Acquia decided to step back from the project, those involved had to either pivot to a different governance model or it would die.

There is a myth about open source that simply releasing code under an OSI-approved license means that hundreds of qualified people will give up their nights and weekends to work on it. Creating a sustainable open source community takes a lot of effort, and one of the main tools for building that community is the governance model. No one wants to be a part of a community where they feel their contributions aren’t appreciated and their opinions are not heard, and fewer still want to work in an environment that can be overly aggressive or even hostile. Ruth talked about the path that her project took and how it directly impacted the success of Mautic.

The next session I attended was a panel discussion on open source being used in the transportation, energy, automotive and public sector realms (video). I’m not a big fan of panel discussions, and I was also surprised to see that all the panelists were men (making this a “manel”). FOSS Backstage did a really good job of promoting diversity in other aspects of the conference (and the moderator was female, but I don’t think that avoids the “manel” definition). It was cool to add another data point that “open source is everywhere” and it was interesting to see where each of the panelists was in their “open source journey”. A couple of them seemed to have a good understanding of what it gets them, but some others were obviously in the “what the heck have we gotten ourselves into” phase. I did get introduced to Wolfgang Gehring for the first time. Wolfgang works for Mercedes, and I’m an unabashed Mercedes fan. I’ve owned at least one car from most of the humanly affordable brands in my lifetime, and six of them have been Mercedes (I’ve also owned four Ford trucks). While Wolfgang obviously knows his stuff when it comes to open source, I don’t think he can score me F1 tickets. (sniff)

After the lunch break I also revisited Wolfgang and his presentation on Inner Source (video). Most people understand that open source is code that you can see, modify and share. Most open source licenses are based on copyright law, so they don’t apply until you actually “share” the code with a third party. What happens if you want to use open source solely within an organization? In the case of large organizations you might have a number of disparate groups all working on similar projects, and there can be advantages to organizing them. Not only can they share code, they can also share experiences and build a community, albeit an internal one, to maximize the value of the open source solutions they use.

The next three sessions I attended represented kind of a “mini” AWS track. The first one featured Spot Callaway. Spot is something of a savant when it comes to thinking up fun ways to get people involved in open source communities. I’ve known Spot for over two decades and I’ve worked on his team for the last two years, and it is amazing to watch his mind at work. His talk charted a history of his involvement in coming up with cool and weird ways to engage with people in open source (video), and I was around for some of his efforts so I can attest to its effectiveness (and that it was, indeed, fun).

The second session was by Kyle Davis. I had only met him once before seeing him again in Berlin, and this was the first time I’d seen him speak. His topic was the importance of how you write when it comes to open source projects (video). Now, AWS is very much a document-driven culture, and my own writing style has changed in the two years I’ve been there (goodbye passive voice). Kyle’s presentation talked about considerations you should make when communicating to an open source community. Realizing that your information may be read by people from different cultures and where, say, English isn’t a first language can go a long way toward making your project feel more inclusive.

Rich Bowen presented the third session, and he discussed how to talk to management about open source (video). My favorite part of this talk is when he posted a disclaimer that while many managers don’t understand open source, his does (our boss, David Nalley, is currently the President of the Apache Software Foundation).

I made up this graphic which I will use every time I need a disclaimer in the future.

The last session presenter was Carmen Delgado, who talked about getting new contributors involved in open source (video). She compared and contrasted three such programs, Outreachy, Google Summer of Code, and CANOSP, as different “flavors” of encouraging open source involvement.

Monday’s presentations ended with a set of “lightning” talks. I’ve always wanted to do one of these – a short talk of no more than five minutes in length (there was a timer) but my friends point out that I can’t introduce myself in five minutes much less give a whole, useful talk.

Two talks stood out for me. In the first one the speaker brought her young daughter (in a Pokémon top, ‘natch) and it really made me glad to see people getting into open source at a young age.

I also liked Miriam Seyffarth’s presentation on the Open Source Business Alliance. I was happy to see both MariaDB and the fine folks at Netways are involved.

Tuesday started with a remote keynote (video) given by Giacomo Tenaglia from CERN. As a physics nerd I’ve always wanted to visit CERN. I know a few people who work there but I have not been able to schedule a trip. I was surprised to learn that the CERN Open Source Programs Office (OSPO) is less than a year old. Considering the sheer amount of open source software used by academia and research I would have expected it to be much older.

The next talk was definitely the worst of the bunch (video), but I had to attend since I was giving it (grin). As this blog can attest, I’ve been working in open source for a long time, and I’ve spent way too much of it thinking about open source business models. There are a number of them, but my talk comes to the conclusion that the best way to create a vibrant open source business is to offer a managed (hosted) version of the software you create. I’ve found that when it comes to open source, people are willing to pay for three things: simplicity, security and stability. If you can offer a service that makes open source easier to use, more secure and/or more stable, you can create a business that can scale.

I took a picture just before I started and I was humbled that by the time I was finished the room was packed. Attention is in great demand these days and I really appreciated folks giving me theirs.

I then attended a talk by Celeste Horgan on growing a maintainer community (video). While I have written computer code I do not consider myself a developer, yet I felt that I was able to bring value to the open source project I worked on for decades. This session covered how to get non-coders to contribute and how to manage a project as it grows.

Brian Proffitt gave a talk, very relevant to my current role, on the difficulties of measuring the return on investment (ROI) of being involved in open source events (video). While I almost always assume engagement with open source communities will generate positive value, how do you put a dollar figure on it? For example, event sponsorship usually gets you a booth space in whatever exhibit hall or expo the event is hosting. When I was at OpenNMS we would sometimes decline the booth because while we wanted to financially sponsor the conference, we couldn’t afford to do that and host a booth. There are a lot of tangible expenses associated with booth duty, such as swag and the travel expenses for the people working in it, as well as intangible expenses such as opportunity cost. For commercial enterprises attendance at an event can be measured in things like how many orders or leads were generated. That doesn’t usually happen at open source events. It turns out that it isn’t an easy question to answer, but Brian had some suggestions.

For most of the conference there were two “in person” tracks and one remote track. The only remote track I attended was a talk given by Ashley Sametz comparing outlaw biker gangs to open source communities. Ashley is amazing (she used to work on our team before pursuing a career in law enforcement) and I really enjoyed her talk. Both communities are tightly knit, have their own jargon and different ways of attracting people to the group.

While I wouldn’t have called them part of an “outlaw motorcycle gang”, many years ago I got to meet a bunch of people in a motorcycle club. It was just before Daytona Bike Week and a lot of people were riding down to Florida. North Carolina is a good stopping point about midway into the trip. While it was explained to me and my friend David that “Mama is having a few folks over for a cookout, you two should come” we were a little surprised to find out that all those bikes we saw on the way there were also coming. There were at least 100 bikes, many cars, and one cab from a semi-tractor trailer. It was amazing. If you have ever seen the first part of the movie Mask that is just what it was like. And yes I could rock the John Oates perm back then.

That session ended up being the last session I attended that day. I spent the rest of the conference in the hallway track and got to meet a lot of really interesting people.

That evening I did manage to get Jägerschnitzel, which was on my list of things to do while in Germany. Missed the Döner kebab, however.

I found FOSS Backstage to be well worth attending. I wish I’d known about it earlier, so perhaps this post will get more people interested in attending next year. Open source is so much more than code and it was nice to see an event focused on the non-code aspects of it.

Using rclone to Sync Data to the Cloud

I am working hard to digitize my life. Last year I moved for the first time in 24 years and I realized I have way too much stuff. A lot of that stuff was paper, in the form of books and files, so I’ve been busy trying to get digital copies of all of it. Also, a lot of my life was already digital. I have e-mails starting in 1998 and a lot of my pictures were taken with a digital camera.

TL;DR: This is a tutorial for using the open source rclone command line tool to securely synchronize files to a cloud storage provider, in this case Backblaze. It is based on MacOS but should work in a similar fashion on other operating systems.

That brings up the issue of backups. A friend of mine was the victim of a home robbery, and while the thieves took a number of expensive things, the greatest loss was his archive of photos, which was irreplaceable. This has made me paranoid about backing up my data. I have about 500GB of “must save” data and around 7TB of “would be nice to save” data.

At my old house the best option I had for network access was DSL. It was usable for downstream but upstream was limited to about 640kbps. At that rate I might be able to back up my data – once.

I can remember in college we were given a test question about moving a large amount of data across the United States. The best answer was to put a physical drive in a FedEx box and overnight it there. So in that vein my backup strategy was to buy three Western Digital MyBooks. I created a script to rsync my data to the external drives. One I kept in a fire safe at the house. It wasn’t guaranteed to survive a hot fire in there (fire safes are rated to protect paper, which needs a much higher temperature to burn than a drive needs to be damaged) but there was always a chance it might, depending on where the fire was hottest. I took the other two drives and stored one at my father’s house and the other at a friend’s house. Periodically I’d take out the drive from the safe, rsync it, and switch it with one of the remote drives. I’d then rsync that drive and put it back in the safe.

It didn’t keep my data perfectly current, but it would mitigate any major loss.

At my new house I have gigabit fiber. It has symmetrical upload and download speeds so my ability to upload data is much, much better. I figured it was time to choose a cloud storage provider and set up a much more robust way of backing up my data.

I should stress that when I use the term “backup” I really mean “sync”. I run MacOS and I use the built-in Time Machine app for backups. The term “backup” in this case means keeping multiple copies of files, so not only is your data safe, but if you happen to screw up a file you can go back and get a previous version.

Since my offsite “backup” strategy is just about dealing with a catastrophic data loss, I don’t care about multiple versions of files. I’m happy just having the latest one available in case I need to retrieve it. So it is more a matter of synchronizing my current data with the remote copy.

The first thing I had to do was choose a cloud storage provider. Now as my three readers already know I am not a smart person, but I surround myself with people who are. I asked around and several people recommended Backblaze, so I decided to start out with that service.

Now I am also a little paranoid about privacy, so anything I send to the cloud I want to be encrypted. Furthermore, I want to be in control of the encryption keys. Backblaze can encrypt your data but they help you manage the keys, and while I think that is fine for many people it isn’t for me.

I went in search of a solution that both supported Backblaze and contained strong encryption. I have a Synology NAS which contains an application called “Cloud Sync” and while that did both things I wasn’t happy that while the body of the file was encrypted, the file names were not. If someone came across a file called WhereIBuriedTheMoney.txt it could raise some eyebrows and bring unwanted attention. (grin)

Open source to the rescue. In trying to find a solution I came across rclone, an MIT licensed command-line tool that lets you copy and sync data to a large number of cloud providers, including Backblaze. Furthermore, it is installable on MacOS using the very awesome Homebrew project, so getting it on my Mac was as easy as

$ brew install rclone

However, as with most open source tools, free software does not mean a free solution, so I did have a small learning curve to climb. I wanted to share what I learned in case others find it useful.

Once rclone is installed it needs to be configured. Run

$ rclone config

to access a script to help with that. In rclone syntax a cloud provider, or a particular bucket at a cloud provider, is called a “remote”. When you run the configurator for the first time you’ll get the following menu:

No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Select “n” to set up a new remote, and it will ask you to name it. Choose something descriptive but keep in mind you will use this on the command line so you may want to choose something that isn’t very long.

Enter name for new remote.
name> BBBackup

The next option in the configurator will ask you to choose your cloud storage provider. Many are specific commercial providers, such as Backblaze B2, Amazon S3, and Proton Drive, while some are generic, such as Samba (SMB) and WebDAV.

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 2 / Akamai NetStorage
   \ (netstorage)
 3 / Alias for an existing remote
   \ (alias)
 4 / Amazon Drive
   \ (amazon cloud drive)
 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, ArvanCloud, Ceph, ChinaMobile, Cloudflare, DigitalOcean, Dreamhost, GCS, HuaweiOBS, IBMCOS, IDrive, IONOS, LyveCloud, Leviia, Liara, Linode, Minio, Netease, Petabox, RackCorp, Rclone, Scaleway, SeaweedFS, StackPath, Storj, Synology, TencentCOS, Wasabi, Qiniu and others
   \ (s3)
 6 / Backblaze B2
   \ (b2)

...

I chose “6” for Backblaze.

At this point you’ll need to set up the storage on the provider side, and then access it using an application key.

Log in to your Backblaze account. Note that you don’t need any kind of credit card to get started. They will limit you to 10GB (and I don’t know how long that offer lasts), but it does let you play with the service before deciding.

Go to Buckets in the menu and click on Create a Bucket

Note that you can choose to have Backblaze encrypt your data, but since I’m going to do that with rclone I left it disabled.

Once you have your bucket you need to create an application key. Click on Application Keys in the menu and choose Add a New Application Key.

Now one annoying issue with Backblaze is that all bucket names have to be unique across the entire system, so “rcloneBucket” and “Media1” etc. have already been taken. Since I’m just using this as an example it was fine for the screenshot, but note that when I add an application key I usually limit it to a particular bucket. When you click on the dropdown it will list the available buckets.

Once you create a new key, Backblaze will display the keyID, the keyName and the applicationKey values on the screen. Copy them somewhere safe because you won’t be able to get them back. If you lose them you can always create a new key, but you can’t modify a key once it has been created.

Now with your new keyID, return to the rclone configuration:

Option account.
Account ID or Application Key ID.
Enter a value.
account> xxxxxxxxxxxxxxxxxxxxxxxx

Option key.
Application Key.
Enter a value.
key> xxxxxxxxxxxxxxxxxxxxxxxxxx

This will allow rclone to connect to the remote cloud storage. Finally, rclone will ask you a couple of questions. I just chose the defaults:

Option hard_delete.
Permanently delete files on remote removal, otherwise hide files.
Enter a boolean value (true or false). Press Enter for the default (false).
hard_delete>

Edit advanced config?
y) Yes
n) No (default)
y/n>

The last step is to confirm your remote configuration. Note that you can always go back and change it later if you want.

Configuration complete.
Options:
- type: b2
- account: xxxxxxxxxxxxxxxxxxxxxx
- key: xxxxxxxxxxxxxxxxxxxxxxxxxx
Keep this "BBBackup" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

Current remotes:

Name                 Type
====                 ====
BBBackup             b2

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

At this point, quit out of the configurator for a moment.

You may have realized that we have done nothing with respect to encryption. That is because we need to add a wrapper service around our Backblaze remote to make this work (this is that there learning curve thing I mentioned earlier).

While I don’t know if this is true or not, it was recommended that you not put encrypted files in the root of your bucket. I can’t really see why it would hurt, but just in case we should put a folder in the bucket at which we can then point the encrypted remote. With Backblaze you can use the webUI or you can just use rclone. I recommend the latter since it is a good test to make sure everything is working. On the command line type:

$ rclone mkdir BBBackup:rcloneBackup/Backup

2024/01/23 14:13:25 NOTICE: B2 bucket rcloneBackup path Backup: Warning: running mkdir on a remote which can't have empty directories does nothing

To test that it worked you can look at the webUI and click on Browse Files, or you can test it from the command line as well:

$ rclone lsf BBBackup:rcloneBackup/
Backup/

Another little annoying thing about Backblaze is that the File Browser in the webUI isn’t in real time, so if you do choose that method note that it may take several minutes for the directory (and later any files you send) to show up.

Okay, now we just have one more step. We have to create the encrypted remote, so go back into the configurator:

$ rclone config

Current remotes:

Name                 Type
====                 ====
BBBackup             b2

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n

Enter name for new remote.
name> crypt

Just like last time, choose a name that you will be comfortable typing on the command line. This is the main remote you will be using with rclone from here on out. Next we have to choose the storage type:

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 2 / Akamai NetStorage
   \ (netstorage)

...

14 / Encrypt/Decrypt a remote
   \ (crypt)
15 / Enterprise File Fabric
   \ (filefabric)
16 / FTP
   \ (ftp)
17 / Google Cloud Storage (this is not Google Drive)
   \ (google cloud storage)
18 / Google Drive
   \ (drive)

...

Storage> crypt

You can type the number (currently 14) or just type “crypt” to choose this storage type. Next we have to point this new remote at the first one we created:

Option remote.
Remote to encrypt/decrypt.
Normally should contain a ':' and a path, e.g. "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a value.
remote> BBBackup:rcloneBackup/Backup

Note that it contains the name of the remote (BBBackup), the name of the bucket (rcloneBackup), and the name of the directory we created (Backup). Now for the fun part:

Option filename_encryption.
How to encrypt the filenames.
Choose a number from below, or type in your own string value.
Press Enter for the default (standard).
   / Encrypt the filenames.
 1 | See the docs for the details.
   \ (standard)
 2 / Very simple filename obfuscation.
   \ (obfuscate)
   / Don't encrypt the file names.
 3 | Adds a ".bin", or "suffix" extension only.
   \ (off)
filename_encryption>

This is the bit where you get to solve the filename problem I mentioned above. I always choose the default, which is “standard”. Next you get to encrypt the directory names as well:

Option directory_name_encryption.
Option to either encrypt directory names or leave them intact.
NB If filename_encryption is "off" then this option will do nothing.
Choose a number from below, or type in your own boolean value (true or false).
Press Enter for the default (true).
 1 / Encrypt directory names.
   \ (true)
 2 / Don't encrypt directory names, leave them intact.
   \ (false)
directory_name_encryption>

I chose the default of “true” here as well. Look, I don’t expect to ever become the subject of an in-depth digital forensics investigation, but the less information out there the better. Should Backblaze ever get a subpoena to let someone browse through my files on their system, I want to minimize what they can find.

Finally, we have to choose a passphrase:

Option password.
Password or pass phrase for encryption.
Choose an alternative below.
y) Yes, type in my own password
g) Generate random password
y/g> y
Enter the password:
password:
Confirm the password:
password:

Option password2.
Password or pass phrase for salt.
Optional but recommended.
Should be different to the previous password.
Choose an alternative below. Press Enter for the default (n).
y) Yes, type in my own password
g) Generate random password
n) No, leave this optional password blank (default)
y/g/n>

Now, unlike your application key ID and password, these passphrases are ones you need to remember. If you lose them you will not be able to access your data. I did not choose a salt password but it does appear to be recommended. Now we are almost done:

Edit advanced config?
y) Yes
n) No (default)
y/n>

Configuration complete.
Options:
- type: crypt
- remote: BBBackup:rcloneBackup/Backup
- password: *** ENCRYPTED ***
Keep this "crypt" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

Now your remote is ready to use. Note that when using a remote with encrypted files and directories do not use the Backblaze webUI to create folders underneath your root or rclone won’t recognize them.

I bring this up because there is one frustrating thing with rclone. If I want to copy a directory to the cloud storage remote it copies the contents of the directory and not the directory itself. For example, if I type on the command line:

$ cp -r Music /Media

it will create a “Music” directory under the “Media” directory. But if I type:

$ rclone copy Music crypt:Media

it will copy the contents of the Music directory into the root of the Media directory. To get the outcome I want I need to run:

$ rclone mkdir crypt:Media/Music

$ rclone copy Music crypt:Media/Music

Make sense?

While rclone has a lot of commands, the ones I have used are “mkdir” and “rmdir” (just like on a regular command line) and “copy” and “sync”. I use “copy” for the initial transfer and then “sync” for subsequent updates.

Now all I have to do for cloud synchronization is set up a crontab to run these commands on occasion (I set mine up for once a day).
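For reference, the crontab entry looks something like this (the schedule, paths, and log file here are hypothetical; the remote name matches the examples above, and the rclone path is where Homebrew installs it on Apple Silicon):

```shell
# Hypothetical crontab entry: sync the local Media directory to the
# encrypted remote daily at 2:00 am. Cron runs with a minimal PATH,
# so spell out the full path to rclone (find yours with `which rclone`).
# m  h  dom mon dow  command
0  2  *   *   *   /opt/homebrew/bin/rclone sync /Users/me/Media crypt:Media --log-file /Users/me/rclone.log
```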

I can check that the encryption is working by using the Backblaze webUI. First I see the folder I created to hold my encrypted files:

But the directories in that folder have names that sound like I’m trying to summon Cthulhu:

As you can see from this graph, I was really eager to upload stuff when I got this working:

and on the first day I sent up nearly 400GB of files. Backblaze B2 pricing is currently $6/TB/month, and this seems about right:

I have since doubled my storage so it should run about 20 cents a day. Note that downloading your data is free up to three times the amount of data stored. In other words, you could download all of the data you have in B2 three times in a given month and not incur fees. Since I am using this simply for catastrophic data recovery I shouldn’t have to worry about egress fees.
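Should the worst happen, retrieval is just the copy in reverse; because you go through the crypt remote, file contents and names are decrypted transparently (the local destination path here is just an example):

```shell
# Restore a directory from the encrypted remote; rclone decrypts
# contents and filenames on the fly.
$ rclone copy crypt:Media/Music ~/restored/Music

# Listing through the underlying B2 remote instead shows only the
# scrambled names:
$ rclone lsf BBBackup:rcloneBackup/Backup/
```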

I am absolutely delighted to have this working and extremely impressed with rclone. For my needs open source once again outshines commercial offerings. And remember if you have other preferences for cloud storage providers you have a large range of choices, and the installation should be similar to the one I did here.

2023 Percona Live – Day 1

The first day of sessions at Percona Live saw me recovered from the food poisoning I experienced on Monday. It was a miserable experience but I’m happy that it didn’t last very long.

Whenever I go to a conference I always like the opening keynotes as they tend to set the tone for the rest of the event. The room in which the keynotes were held was dominated by a large screen featuring the new Percona logo.

The show was opened by Dave Stokes who, like me, is a technology evangelist.

He welcomed us all to the conference and covered the usual housekeeping notes before turning the stage over to Ann Schlemmer, who is the new CEO of Percona.

Schlemmer took over as CEO from founder Peter Zaitsev last autumn, and she seems to have settled into her new role pretty well.

One of the topics she covered was the new Percona Logo.

While I can’t do the description justice, it represents mountains, which refer both to the bedrock on which Percona solutions are built and to the challenges people sometimes have to overcome when working in IT (think climbing the mountain). The sun represents the shining of light into dark places, as well as kind of looking like a “P” (while the mountains themselves look like the “A” in the name).

At least that is what I took away from it. (grin)

I asked her later if they designed it in-house or if they hired an outside firm and she told me they did it themselves. Either way I like it and think they did a good job.

She was followed by Peter Zaitsev, one of the two Founders of Percona.

I first met Peter at this year’s FOSDEM back in February. When I found out he lived near me I invited him to lunch and we had a great discussion of open source business models and open source in general. As someone who once ran an open source services company, I identify strongly with his business, although he has been more successful than I was.

He is also known for not holding back when he has a strong opinion, and as part of his talk on the state of open source relational databases he leveled some criticism on AWS, who is also my employer.

Note: These thoughts on my personal blog are mine and mine alone, and may or may not align with my employer, Amazon Web Services.

One of the reasons I joined AWS was to take on the challenge of changing both the perception and processes by which Amazon interacts with open source communities. I’m part of a wonderful team and I think we have made progress toward that goal, so while I won’t either agree or disagree with Peter’s statements, my hope is to earn enough trust that there will be no need to have this as a topic in future conversations.

Peter ended his presentation by bringing up Ann and officially passing the torch by gifting her with a dartboard with his face on it, to be used whenever she might feel the need.

It took a couple of tries before the dart stuck, mainly because Peter had kept the dart in his back pocket and forgot to take off the safety cover on the sharp tip.

The next keynote speaker, Rachel Stephens, was new to me, although I’ve known about the company she works for, Redmonk, for a long time.

Redmonk is an analyst firm focused on software development, and she had my attention by basing her presentation on The Princess Bride, one of my favorite movies. It is very quotable, and she had slides like this one:

She also had a slide where she used the term “fauxpen” source:

Back in 2009 I hosted a party in which I was trying to explain open source vs. open core to a non-technical friend of mine. He replied “oh, so it’s fauxpen source”. I immediately registered the domain name (although I no longer own it as it was sold when I sold my company). I did a search back then and I could find no other references to the term, but I’ve seen it a number of times since. I like to think I had some part in popularizing it but it is clever enough that I’m certain others came up with it, too.

After the keynotes the individual sessions began, and since I’m not a DBA a lot of them are over my head. I did go to the one by Jignesh Shah, who is the General Manager of open source databases for Amazon RDS.

Jignesh gave a “state of” talk on the AWS offerings in this space, and also announced that the “trusted language extensions” feature for PostgreSQL that was introduced last autumn now supports Rust.

As I understand it, trusted language extensions give cloud providers a way to allow their users to extend the functionality of the database without introducing security concerns. There is a limit on which languages can be used, however, since some offer no way to “sandbox” an extension and keep it from accessing, say, the memory used by the database. The C language was not supported for that reason.

By supporting Rust, this allows end users to create powerful extensions in a language similar to C but with memory protections.

After his talk I spent some time wandering around the sponsor showcase. This is not a large conference, probably around 300 people, and so the “expo” is simply a hallway with booths along one wall. I actually like this because it facilitates easier interactions between attendees and sponsors.

The “premium” sponsors (AWS, Microsoft and Percona) had slightly larger booths on one end of the hallway.

Jignesh had brought along a number of AWS subject matter experts and there was a lot of activity at the booth, as it gave folks a way to ask questions and get answers from the people best able to provide them.

One last note on Day 1 is that lunch was an actual buffet and not the boxed lunch that you often get at such conferences.

While I live in North Carolina and have almost sacred opinions on pork barbecue and cornbread, it was pretty good. The only complaint I would make is that the baked beans were not labeled well since, as is common in the South, they appeared to include small pieces of pork. As many of the attendees are vegetarian it would have been nice to either offer them without meat or make it clear that meat was included.

I was just happy that lunch was good and didn’t result in the same issues I experienced after lunch on Monday. (grin)

Obligatory 20 Year Blog Post

Not to misquote the Beatles, but it was 20 years ago today that I posted my first entry to this blog.

By 2003 blogs were pretty popular so I was somewhat late to the game. My friend Ben Reed had a blog that he used kind of like a proto-Twitter, where he would post many times during the day on what he was doing, which at the time focused on porting KDE to MacOS. Back then a lot of open source projects used blogs as a communication platform, and since I was maintaining an open source project I figured I should start one. He used Movable Type as his blogging software so I did as well.

Movable Type was very popular back then, but when they started to move their licensing to a more proprietary model, people were turned off and migrated to WordPress. I find it delightfully ironic that WordPress, which is open source, now forms the basis for around 40% of all websites, whereas most people these days have probably never heard of Movable Type.

If there happen to be any younger readers here, blogs twenty years ago were like podcasts today: practically everyone had one. Also like podcasts, most were sporadically updated, which is why Really Simple Syndication (RSS) became important. RSS is a protocol that lets you find out when websites are updated. Using a “news reader” like Google Reader, you could aggregate all the websites you were interested in following into one application. It was pretty cool.

But then along came social media sites and what people used to post on blogs they started posting there instead of on their own sites. Even with a lot of hosting options, running a blog is incrementally harder than posting to, say, Facebook. In 2013 Google killed Reader which pretty much ended blogging (although I still use RSS and find that the open source Nextcloud News is a great Reader replacement).

But I’m old and stubborn so I kept blogging. In fact I think I have something like five or six blogs that I update periodically. I use another blogging technology called a “planet” to aggregate all of those blogs so my three readers can easily keep up with what I’m doing.

Another thing that social media brought about was this idea of engagement. People still look at metrics such as number of followers as an indication of how far a particular post reached, and even when I started this thing folks would brag about their stats. As a contrarian I took the opposite approach and decided that I’d be happy if just three people read my posts. I got a chuckle the first time someone came up to me and said “hey, I’m one of your three readers”. Made the whole thing much more personal.

And to me blogging is personal. I love to write and the best way to become a better writer is to do it. A lot. I really wish I had more time to post but between my job (which involves a lot of writing) and the farm it is hard to find the time. As someone who loves the culture around open source software, sharing is key and I hope some of the stuff I’ve posted here has helped someone else as so many other blogs have helped me.

That’s about it for this update. I would promise that I’ll post more often and with better content in the future but I don’t like to lie (grin), and in any case thanks for reading.

2022 Open Source Summit – Day 3

Thursday at the Open Source Summit started as usual at the keynotes.

Picture of Robin Bender Ginn on stage

Robin Bender Ginn opened today’s session with a brief introduction and then we jumped into the first session by Matt Butcher of Fermyon.

Picture of Matt Butcher on stage

I’ve enjoyed these keynotes so far, but to be honest nothing has made me go “wow!” as much as this presentation by Fermyon. I felt like I was witnessing a paradigm shift in the way we provide services over the network.

To digress quite a bit, I’ve never been happy with the term “cloud”. An anecdotal story is that the cloud got its name from the fact that the Visio icon for the Internet was a cloud (it’s not true) but I’ve always preferred the term “utility computing”. To me cloud services should be similar to other utilities such as electricity and water where you are billed based on how much you use.

Up until this point, however, instead of buying just electricity it has been more like you are borrowing someone else’s generator. You still have to pay for infrastructure.

Enter “serverless“. While there are many definitions of serverless, the idea is that when you are not using a resource your cost should be zero. I like this definition because, of course, there have to be servers somewhere, but under the utility model you shouldn’t be paying for them if you aren’t using them. This is even better than normal utilities because, for example, my electricity bill includes fees for things such as the meter and even if I don’t use a single watt I still have to pay for something.

Getting back to the topic at hand, the main challenge with serverless is how do you spin up a resource fast enough to be responsive to a request without having to expend resources when it is quiescent? Containers can take seconds to initialize and VMs much longer.

Fermyon hopes to address this by applying WebAssembly to microservices. WebAssembly (Wasm) was created to allow high performance applications, written in languages other than JavaScript, to be served via web pages, although as Fermyon went on to demonstrate this is not its only use.

The presentation used a game called Finicky Whiskers to demonstrate the potential. Slats the cat is a very finicky eater. Sometimes she wants beef, sometimes chicken, sometimes fish and sometimes vegetables. When the game starts Slats will show you an icon representing the food she wants, and you have to tap or click on the right icon in order to feed her. After a short time, Slats will change her choice and you have to switch icons. You have 30 seconds to feed her as many correct treats as possible.

Slide showing infrastructure for Finicky Whiskers: 7 microservices, Redis in a container, Nomad cluster on AWS, Fermyon

Okay, so I doubt it will have the same impact on game culture as Doom, but they were able to implement it using only seven microservices, all in Wasm. There is a detailed description on their blog, but I liked the fact that it was language agnostic. For example, the microservice that controls the session was written in Ruby, but the one that keeps track of the tally was written in Rust. The cool part is that these services can be spun up on the order of a millisecond or less and the whole demo runs on three t2.small AWS instances.

This is the first implementation I’ve seen that really delivers on the promise of serverless, and I’m excited to see where it will go. But don’t let me put words into their mouth, as they have a blog post on Fermyon and serverless that explains it better than I could.

Picture of Carl Meadows on stage

The next presentation was on OpenSearch by Carl Meadows, a Director at AWS.

Note: Full disclosure, I am an AWS employee and this post is a personal account that has not been endorsed or reviewed by my employer.

OpenSearch is an open source (Apache 2.0 licensed) set of technologies for storing large amounts of text that can then be searched and visualized in near real time. Its main use case is making sense of streaming data that you might get from, say, log files or other types of telemetry. It uses the Apache Lucene search engine, and the latest version is based on Lucene 9.1.

One of the best ways to encourage adoption of an open source solution is by having it integrate with other applications. With OpenSearch this has traditionally been done using plugins, but there is an initiative underway to create an “extension” framework.

Plugins have a number of shortcomings, especially in that they tend to be tightly coupled to a particular version of OpenSearch, so if a new version comes out your existing plugins may not be compatible until they, too, are upgraded. I run into this with a number of applications I use such as Grafana and it can be annoying.

The idea behind extensions is to provide an SDK and API that are much more resistant to changes in OpenSearch so that important integrations are decoupled from the main OpenSearch application. This also provides an extra layer of security as these extensions will be more isolated from the main code.

I found this encouraging. It takes time to build a community around an open source project, but one of the best ways to do it is to provide easy methods to get involved, and extensions are a step in the right direction. In addition, OpenSearch has decided not to require a Contributor License Agreement (CLA) for contributions. While I have strong opinions on CLAs, this should make contributing more welcoming for developers who don’t like them.

Picture of Taylor Dolezal on stage

The next speaker was Taylor Dolezal from the Cloud Native Computing Foundation (CNCF). I liked him from the start, mainly because he posted a picture of his dog:

Slide of a white background with the head and sad eyes of a cute black dog

and it looks a lot like one of my dogs:

Picture of the head of my black Doberman named Kali

Outside of having a cool dog, Dolezal has a cool job and talked about building community within the CNCF. Just saying “hey, here’s some open source code” doesn’t mean that qualified people will give up nights and weekends to work on your project, and his experiences can be applied to other projects as well.

The final keynote was from Chris Wright of Red Hat, who talked about open source in automobiles.

Picture of Chris Wright on stage

A while ago I actually applied for a job with Red Hat to build a community around their automotive vertical (I didn’t get it). I really like cars and I thought that combining that with open source would just be a dream job (plus I wanted the access). We are on the cusp of a sea change with automobiles as the internal combustion engine gives way to electric motors. Almost all manufacturers have announced the end of production for ICEs, and electric cars are much more focused on software. Wright showed a quote predicting that automobile companies will need four times the software-focused talent they have now.

A slide with a quote stating that automobile companies will need more than four times the software talent they have now

I think this is going to be a challenge, as the automobile industry is locked into 100+ years of “this is the way we’ve always done it”. For example, in many states it is still illegal to sell cars outside of a dealership. When it comes to technology, these companies have recently been focused on locking their customers into high-margin proprietary features (think navigation) and only recently have they realized that they need to be more open, such as supporting Android Auto or CarPlay. As open source has disrupted most other areas of technology, I expect it to do the same for the automobile industry. It is just going to take some time.

I actually found some time to explore a bit of Austin outside the conference venue. Well, to be honest, I went looking for a place to grab lunch and all the restaurants near the hotel were packed, so I decided to walk further out.

Picture of the wide Brazos river from under the Congress Avenue bridge

The Colorado River flows through Austin, and so I decided to take a walk on the paths beside it. Another Texas river, the Brazos, plays a role in the latest Neal Stephenson novel, Termination Shock. I really enjoyed reading it and, spoiler alert, it does actually have an ending (fans of Stephenson’s work will know what I’m talking about).

I walked under the Congress Avenue bridge, which I learned was home to the largest urban bat colony in the world. I heard mention at the conference of “going to watch the bats” and now I had context.

A sign stating that drones were not permitted to fly near the bat colony under the Congress Avenue bridge

Back at the Sponsor Showcase I made my way over to the Fermyon booth, where I spent a lot of time talking with Mikkel Mørk Hegnhøj. When I asked if they had any referenceable customers he laughed, as they have only been around for a very short time. He did tell me that in addition to the cat game they had a project called Bartholomew, a CMS built on Fermyon and Wasm, and that was what they were using for their own website.

Picture of the Fermyon booth with people clustered around

If you think about it, it makes sense, as a web server is, at its heart, a file server, and those already run well as microservices.

They had a couple of devices up so that people could play Finicky Whiskers, and if you got a score of 100 or more you could get a T-shirt. I am trying to simplify my life which includes minimizing the amount of stuff I have, but their T-shirts were so cool I just had to take one when Mikkel offered.

Note that when I got back to my room and actually played the game, I came up short.

A screenshot of my Finicky Whiskers score of 99

The Showcase closed around 4pm and a lot of the sponsors were eager to head out, but air travel disruptions affected a lot of them. I’m staying around until Saturday and so far so good on my flights. I’m happy to be traveling again but I can’t say I’m enjoying this travel anxiety.

[Note: I overcame my habit of sitting toward the back and off to the side, so the quality of the speaker pictures has improved greatly.]