2024 dbt Labs Coalesce Conference

I traveled for much of the last two months and during that time I went to four conferences. I usually like to write about them when they are happening but I wasn’t able to manage it, so I’m offering them up a little delayed. I’m going to start with the last one, which was the Coalesce conference hosted by dbt Labs.

I’m a big dbt Labs fan, as they create a ton of useful open source software focused mainly on the “T” part of ETL. I went to their conference last year, which was in San Diego, and it was a lot of fun, but I must be honest: when they announced it would be in Las Vegas this year, I was not that excited.

I get my fill of Vegas at the annual AWS re:Invent conference, and while I’ve met a lot of friendly residents in that town it is not a favorite. I don’t often gamble, and I don’t like crowds or rampant commercialism. Vegas does have decent food options.

That said, I rather enjoyed this trip. The conference was held at Resorts World, which is a Hilton-owned property, and I stayed in the Conrad tower. While I have zero status at Hilton I got a nice room at a reasonable price, and it was great to be able to return to it just by taking an elevator. The conference was on the second floor, while the ground floor hosted a number of events held in various restaurants and bars (as well as being the location of the main casino). I didn’t leave the building for over two days.

I arrived Monday mid-afternoon and that was a day set aside for partner meetings. AWS was well represented and Yev Kravchenko and Siva Raghupathy did a nice presentation on modern data foundations using dbt and AWS.

That evening the expo floor opened up. One nice thing about being in Vegas is they know hospitality, and the snacks and drinks were top notch. Outside of keynotes, wandering around the expo is one of my favorite parts of any conference.

Since AWS had so many people there, we all got together for dinner. I think there were 12 of us at the table. It was nice to spend time with old friends and to make some new ones. One thing I love about working at AWS is being around people who are so much smarter than me.

The next morning the conference officially started. My admission included breakfast and lunch, so I went down to the second floor for food. I liked to sit towards the back of the area next to the windows so I could get a view of The Sphere.

I thought it was funny that Resorts World also had a sort of mini-Sphere:

The main theme of the conference was “One dbt”, reflecting the goal of “an integrated, governed, and scalable approach to data analytics”.

dbt Labs has two main products: the open source “dbt Core” and the managed offering “dbt Cloud”. A goal of One dbt is to make it much easier to work with and switch between them. For example, production could run on Cloud for the extra security and reliability, while development could be done locally on Core.

Tuesday’s keynote introduced us to our hosts for the week: Lesley Greene, Grace Goheen and Alexis Jones.

They welcomed us to the conference and pointed out that there were around 2000 people in attendance and another 8000 or so watching virtually. They also thanked the sponsors, and AWS joined Tableau and Snowflake at the platinum level.

Alexis then introduced the first speaker, co-founder and CEO Tristan Handy.

He talked about the origins of dbt Labs, which began life as Fishtown Analytics (Fishtown is a section of Philadelphia, PA, USA, where the company was founded) with its mission to create a mature workflow system for analytics. This has now grown into something they are calling the “data control plane”:

He also announced “dbt Copilot”, a generative AI assistant that helps write code, analyze logs, and improve query performance, and lets you ask business questions in natural language.

Tristan then brought out Yannick Misteli, who is the head of engineering at Roche. He talked about their five-year journey with dbt: they needed to build a world-class analytics platform and dbt “checked all the boxes”.

The next presentation brought back an old Coalesce favorite, The Jaffle Shop. The Jaffle Shop is an example company created to demonstrate practical uses for dbt Labs’ products, and this year they were expanding. Since a lot of their sales occurred at venues, they decided to buy one: Cirque du Jafflé.

Amy Chen walked us through how one would take two discrete organizations, both of which were using dbt products, and combine them. In this hypothetical example, The Jaffle Shop used dbt Cloud with data in Amazon Redshift while Cirque du Jafflé used dbt Core over Amazon Athena.

dbt Cloud now supports a dbt to Athena connector, so it is a somewhat trivial exercise to access that data, and Amy did a great job with the example to show how to do it.

The next guest speaker was Tobi Humpert from Siemens.

Amy broke the ice by bringing up the fact that at the Munich location, Siemens uses sheep to keep the lawn at the proper height instead of mowing it. The sheep are removed in the winter and return the following spring, which was a segue into cross-platform lineage (if you had lineage you would know where they went). Of course the talk then moved into more practical uses for cross-platform data, enabled by dbt Mesh.

Amy then handed the floor over to Greg McKeon to talk about a new visual editor available in dbt Cloud. He did a demo that also included an integration with dbt Copilot, adding tests and documentation to an existing dbt model.

Greg was followed by Roxi Dahlke, who talked about how to maintain high-quality data you can trust during transformation.

This is done through three new dbt features: Advanced CI (Continuous Integration), Trust and Usage Signals (how data is consumed and whether it is trustworthy), and the Semantic Layer (consistent metrics across all tools). She also did a dbt Copilot demo leveraging the existing semantic layer to allow for natural language queries.

Roxi invited a special guest, James Dorado from Bilt Rewards, who discussed how they were using the semantic layer to present data directly to end customers, which requires even more trust in that data than when you use it internally.

For the final part of the keynote, Tristan returned to discuss how dbt partners were supporting analytics workflows from end to end, featuring Dan Jewett from Tableau and Neeraja Rentachintala from AWS. The discussion focused on how the new product announcements were going to improve the customer experience.

Overall I enjoyed the keynote and I saw a real natural progression for dbt as it continued to focus on making analytics workflows more powerful, more relevant to business needs, and easier to use.

Outside of the keynotes, Coalesce provided a lot of other sessions. I don’t directly use dbt so many (most) of them were above my head, but I did attend all of the ones by AWS folks.

The first one was a talk by Darshit Thakkar and BP Yau called “Amazon Athena and dbt: Unlocking serverless data transformations”.

I understood a lot of this, and I know that “serverless” is becoming the preferred way to manage cloud resources, but I did have a problem with the demo. BP likes to use dark mode. I’m not a fan. I grew up reading dark text on a light background so that is my preference, but in this case it made it almost impossible to see the text on the projection screen, so I just took his word for it (grin).

The Wednesday morning keynotes kicked off with a presentation by Allie K. Miller on artificial intelligence. GenAI is such a hot topic in the industry that you can’t go to any conference these days without encountering it. While I’m at the edge of burnout on the topic, I did enjoy her presentation, which included a graphic on the progression toward truly autonomous analytics.

She also sent a link to a cool use of genAI that takes a picture and turns it into a kind of squishy play-dough figure, which I found fascinatingly weird.

This was followed by a customer panel led by Brandon Sweeney, featuring Kayleigh Lavorini from Fifth Third Bank and Srini Vemuru from Salesforce. The discussion involved their thoughts on using AI for their analytics needs and how they leverage dbt.

As my focus is open source, I was drawn to any presentation involving dbt Core, and one of the more amusing ones was given by Grace Goheen. I had met Grace and Jeremy Cohen at one of the casino bars the night before, and they are both theatre nerds (spoken as an ex-theatre nerd myself). Jeremy currently lives in France (and he was the only one to pronounce “Cirque du Jafflé” in the proper accent) and he made a cameo as Elvis (we were in Vegas, after all). I expect some sort of “rat pack” reference next year. (grin)

To be serious for a moment, all of us in the open source community have witnessed established projects making big changes in their licenses over the last couple of years. While I don’t want to comment on that in this post, it is always nice to see folks like dbt confirm that they have no plans to change the license on dbt Core. I’ve heard it from the top, including Tristan and Mark Porter, and in this presentation Grace renewed that vow in the form of a “wedding” officiated by Elvis.

It was a cute and entertaining way of addressing a very serious subject.

There was another session by AWS speakers Wednesday afternoon with the lofty title of “Journey to Generative AI driven near-real-time-operational analytics with zero-ETL and dbt Cloud”. Whew.

This was presented by Neeraja Rentachintala (who you may remember from the Tuesday keynote) and Neela Kulkarni. They presented a reference architecture built on Amazon Aurora, Amazon Redshift and dbt Cloud for operational analytics.

Neela did the demo, and I am happy to say she used both light mode and a large typeface so I could see what she was doing. That doesn’t mean that I totally understood what she was doing, but the audience seemed to get it. (grin)

Wednesday night was the Coalesce party, and as I’ve mentioned before Vegas is known for its hospitality industry, so it was top notch. Held in one of the nightclubs in Resorts World, it featured a DJ, lights, and lots of food and drink.

I am not one for large crowds so I didn’t stay very long. In fact, I had been traveling for almost six weeks straight, so I ended up finding a flight home that still allowed me to attend Thursday’s keynote. The people I talked to who stayed at the party had a great time.

Thursday’s keynote focused on dbt Core and the community, and was hosted by Grace and Jeremy, along with Amada Echeverría presenting the Community Awards.

While a lot of the focus of the keynotes was on dbt Cloud, this one focused on all of the work that is being done in dbt Core, including a feature to unit test dbt models, improvements to snapshots, and the ability to do “microbatches”, which allows a user to break up large datasets into smaller chunks for processing.

Then it was time for the Community Awards.

I was extremely happy to see my friends on the dbt-athena adapter get the Trailblazing Innovator award. Last year at Coalesce, my boss David Nalley made a special reference to this team, made up of Jérémy Guiselin, Nicola Corda, Jesse Dobbelaere, Mattia Sappa, and Serhii Dimchenko. They created an adapter so well made that it achieved “trusted” status this year and, as mentioned above, is now supported in dbt Cloud as well as dbt Core.

Three of the award winners, Influencer Extraordinaire Opeyemi Fabiyi, Data Governance Excellence winner Jenna Jordan and Catalyst of Impact winner Bruno Souza de Lima, were in attendance and participated in a panel discussion on the rewards and challenges of their work in the community.

The final part of the keynote brought our hosts back out to talk about Coalesce 2025, which will be back in Las Vegas on 6-9 October.

As reluctant as I was to consider another Vegas conference, I have to say that the Resorts World location really worked for Coalesce, and I look forward to returning for my third event next year.

[Note: if you are part of an open source startup, or want to start an open source company, and you are coming to re:Invent, be sure to attend this panel featuring Tristan Handy]

The Yearbook Game

This year marks the 40th anniversary of my graduation from the North Carolina School of Science and Mathematics (NCSSM). Attending NCSSM was one of the most formative experiences of my life, but to be honest I have never been that big on the reunions. For some reason, though, I am super eager to go to the 40th.

Since it has been four decades since I’ve seen most of my classmates, I thought it would be cool to come up with a little game. In the US most schools used to produce a “yearbook” (sometimes called an “annual”), which is a large book containing pictures from the school year. While formats can differ, there is always a section for each class with pictures of the people in that class. My idea was to scan in all of those pictures and then create an online game where you have to match each person’s picture to their name.

The problem is that I do not consider myself a programmer. While I got my first computer in 1978 and did a lot of coding back then in BASIC, FORTRAN, Pascal and a little C when in college, I haven’t programmed seriously in decades. I mean, I can write a simple bash script and I can usually hack something that kind of does what I need into something that does what I need, but I would struggle to write “Hello world!” in any modern language.

TL;DR: Using generative AI, we made a yearbook game where you try to match a name to the correct picture.

But I don’t need to know how to code if I have access to generative AI, right? (grin)

I have to admit that I’m a little burned out by all the GenAI hype going on lately, but recently my boss pointed out we were in the same boat when cloud computing was first a thing. It was so bad that my friend Spot Callaway did a presentation [download] where the first four minutes were nothing but slides with the word “cloud” on them.

Now I work for a major cloud provider and cloud computing is pretty much everywhere. It doesn’t seem so bad anymore. My boss’s perspective on GenAI really changed my mind and now I had a reason to try and use it.

I am not the sharpest knife in the drawer, but my superpower is that I know a lot of very smart people. One of those is Ricardo Sueiras, and I figured he would be just the person to help with my project. Last Wednesday we set up a time to meet and to create my game.

First I had to digitize the pictures. People need to remember that, back in 1984, pictures involved this thing called “film”. Things we take for granted today were just impossible back then, and I am actually somewhat in awe of the people who put together the yearbook without the help of digital publishing. The pictures for our class photos were taken in color but in the yearbook they were published in black and white, which can wash out details when printed, but I didn’t have access to anything better. I started with our senior year yearbook since the senior class pictures were printed larger than the junior class ones.

After scanning in twelve pages and cropping each person into their own photo, I saved them in the format $NAME.png. Since I suffer from mild OCD, I had to go back to the junior year yearbook to get pictures of people who didn’t have a senior picture, either because they didn’t return for the second year or they missed picture day. When I was done I had a little over 200 pictures.

When I originally talked to Ricardo about the game, I figured we would show a picture and then offer some choices as to who it was. In fact, he went ahead and experimented with creating that game.

But as I thought about it, I figured most of the fun would be in seeing the pictures, and it would take a while to cycle through 200 or more of them, so I changed the game to present a single name and then four pictures as choices.

One other thing I did was create a CSV file containing each student’s name and gender. Note that back in 1984 people weren’t as publicly gender fluid as they are now, so I went with the gender by which I knew each person (feel free to gently correct me if I need to swap one). I recorded gender because I figured it would be a little more difficult to guess if a person with a masculine-sounding name was presented with four masculine-looking pictures.

With everything ready I jumped on a call with Ricardo. Now another thing that has changed since I used to program is that most people use an IDE. When I first started writing code I used a line editor, not even a full screen editor, so IDEs are just magic to me. Ricardo uses Visual Studio, and we decided to use Amazon Q (‘natch), which has an easy-to-use plugin.

A screenshot showing the Visual Studio interface

The first thing we had to do was map the pictures to new names. If you were asked to match “Tarus Balog” to the right picture, it wouldn’t do to just mouse over all four until you found “Tarus Balog.png”. Not only did Q create a script to rename all of the files to numbers, it also updated the CSV file to include the new file name as a column, so the file now had name, gender and image number.
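
For the curious, the renaming step itself is only a few lines of Python. This is a sketch of the idea rather than the script Q actually generated, and the students.csv file name and its name/gender columns are just what I’d use for illustration:

    import csv
    import os

    # Load the roster (name, gender), rename each "<Name>.png" to a number so
    # the file name no longer gives the answer away, and record the new file
    # name as a third column.
    with open("students.csv", newline="") as f:
        students = list(csv.DictReader(f))

    for i, student in enumerate(students, start=1):
        os.rename(f'{student["name"]}.png', f"{i}.png")
        student["image"] = f"{i}.png"

    with open("students.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "gender", "image"])
        writer.writeheader()
        writer.writerows(students)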

Ricardo decided to write the code in Python and serve it with Flask, which is a simple Python web framework. It was surprisingly easy to get the first pass working, but we could never get the matching to work. After debugging it, it seemed like it was trying to match the correct person from one round against the randomly generated answer of the next round. What’s weird is that once in a while it would work, if the picture you chose just happened to be the correct guess for the next game, but it was very rare.

We tried a number of prompts to get Q to fix it, but the final solution involved pasting the code into Amazon Bedrock, choosing a different LLM and explaining what was happening. It came back with code that used a session object and, voilà, it worked.
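
In case it helps anyone building something similar, the gist of the fix is to stash the correct answer in the session when a question is rendered and compare the submitted guess against that, rather than against a freshly generated round. Here is a hedged sketch along those lines, assuming the same illustrative students.csv (now with an image column) and a couple of simple templates; it is not our exact code:

    import csv
    import random
    from flask import Flask, render_template, request, session

    app = Flask(__name__)
    app.secret_key = "change-me"  # Flask sessions require a secret key

    with open("students.csv", newline="") as f:
        STUDENTS = list(csv.DictReader(f))  # columns: name, gender, image

    @app.route("/")
    def question():
        target = random.choice(STUDENTS)
        others = [s for s in STUDENTS
                  if s["gender"] == target["gender"] and s["name"] != target["name"]]
        choices = [target["image"]] + [s["image"] for s in random.sample(others, 3)]
        random.shuffle(choices)
        session["answer"] = target["image"]  # remember the right photo for this round
        return render_template("question.html", name=target["name"], choices=choices)

    @app.route("/guess", methods=["POST"])
    def guess():
        correct = request.form["choice"] == session.get("answer")
        return render_template("result.html", correct=correct)

Because the answer for the round you are actually looking at lives in the session, the comparison no longer races ahead to the next randomly generated round.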

That turned out to be about half of the project. I still wanted to be able to host it on my web server, but I had been lazy about upgrading Ubuntu and the version I was running didn’t support Python 3.10 (which was required for one of the libraries we used). So it was time to do a production upgrade and deal with those issues. I also spent a little time editing the web page templates and the CSS file (it turns out NCSSM publishes a brand style guide). I also went to find an old NCSSM logo, what we used to call “burning diapers” versus the modern one that looks like a stylized atom, to use as a background.

Flask runs by default on localhost port 5000, so I set up a proxy in the Apache web server to make it accessible (I didn’t want to mess with SSL in Flask as I know my way around Apache configurations). But since I run WordPress in the root of https://tarus.io, the game’s form submission didn’t work when accessed through the proxy, so I ended up creating a subdomain to host it. The finished product is at https://classof1984.tarus.io.
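
For reference, here is a minimal sketch of the kind of Apache virtual host that handles the proxying, assuming the subdomain above; the real configuration also deals with SSL, which I have left out of the sketch:

    # Requires mod_proxy and mod_proxy_http to be enabled.
    <VirtualHost *:80>
        ServerName classof1984.tarus.io
        ProxyPreserveHost On
        ProxyPass        / http://127.0.0.1:5000/
        ProxyPassReverse / http://127.0.0.1:5000/
    </VirtualHost>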

And to save you the time refreshing to find my picture, here it is.

A screenshot of the Yearbook Game showing my name and four choices. I'm the upper right picture when I had a huge "Greg Brady" perm

Yes, my hair was magnificent.

One last note about those pictures. Not everyone in our class lived to see the reunion. One of my beta testers mentioned that she was a bit triggered by seeing the picture of a friend of hers who had died from cancer. She asked me if I could remove pictures of those people we knew were no longer with us.

We talked about it and I could understand her issue, but I had a different reaction: seeing pictures of those who had passed made me think of them fondly. Another beta tester said that he would feel like we were “erasing” them if we took them out. So it was decided to leave them in. It was funny to me how much thought needs to go into even the simplest app (well, if you care about not being a d*ck).

The last task was to upload the files to GitHub, creating my first ever repository even though I joined back in 2006. We have generative AI to thank for that.

A screenshot of my GitHub page with a "first repository" image

It was a lot of fun and now I’ve downloaded Visual Studio to my own computer and I plan to make even more programs.

Phear.

Fun with Networks

This is a long post about overcoming some challenges I had with a recent network install. It should have been pretty straightforward but instead I hit a couple of speed bumps that caused it to take much longer than I expected.

Last year I moved for the first time in 24 years, and the new place presented some challenges. One big upside is that it has gigabit fiber, which is a massive improvement over my last place.

The last place, however, had a crawl space and I was able to run Ethernet cable pretty much everywhere it was needed. The new place is a post and beam house (think “ski lodge”) which doesn’t lend itself well to pulling cable, so I needed a wireless solution.

I used to use Ubiquiti gear, and it is quite nice, but it had a couple of downsides. You had to install controller software on a personal device for ease of configuration, and I kind of soured on the company over how it dealt with GPL compliance issues and how it disclosed the scale of a security incident. I wanted to check out other options for the new place.

I looked on Wirecutter for their mesh network recommendations and they suggested the ASUS ZenWiFi product for gigabit fiber networks, so I ended up buying several nodes. You can connect each node via cable if you want, but there is also a dedicated 5 GHz network used for wireless backhaul between the nodes, which is what I needed.

There were a couple of issues with the stock firmware, specifically that it didn’t support dynamic DNS for my registrar (the awesome Namecheap) and it also didn’t support SNMP, which is a must-have for me. Luckily I found the Merlin project, specifically the gnuton group, which makes an open source firmware (with binary blobs for the proprietary bits) for my device with the features I need and more.

[Note: As I write this there is a CVE that ASUS has addressed that has not been patched in gnuton. It’s frustrating as I’ve had to close down NAT for now and the fix is taking some time. I tried to join the team’s Discord channel but you have to be approved and my approval has been in the queue for a week now (sigh). Still love the project, though.]

Anyway, while I had some stability issues initially (I am still a monitoring nerd so I was constantly observing the state of the network) those seem to have gone away in the last few months and the system has been rock-solid.

The new farm is nice but there were no outbuildings, so we ended up building a barn. The barn is about 200m from the house, and I wanted to get internet access out there. We live in a dead zone for mobile wireless access so if you are in the barn you are basically cut off, and it would be nice to be able to connect to things like cameras and smart switches.

For a variety of reasons running a cable wasn’t an option, so I started looking for some sort of wireless bridge. I found a great article on Ars Technica on just such a solution, and ended up going with their recommendation of the TP-Link CPE-210, 2.4 GHz version. It has a maximum throughput of 100Mbps but that is more than sufficient for the barn.

I bought these back in December and they were $40 each (you’ll need two) but I just recently got around to installing them.

You configure one as the access point (AP) and one as the client. It doesn’t really matter which is which, but I put the access point on the house side. Note that the “quick setup” has an option called “bridge mode”, which sounds like exactly what I wanted to create, but that term means something different in TP-Link-speak, so stick with the AP/Client configuration.

I plugged the first unit into my old Netgear GS110TP switch, but even though it has PoE ports it is not able to drive the CPE-210 (which uses passive 24V PoE rather than standard 802.3af), so I ended up using the included PoE injector. I simply followed the instructions: I set the IP addresses on each unit to ones I can access from my LAN, I created a separate wireless network for the bridge, and with the units sitting about a meter apart I was able to plug a laptop in on the client side and get internet access.

Now I wanted to be able to extend my mesh network out to the barn, so I bought another ASUS node. The one I got was actually version 2 of the hardware and even though it has the same model number (AX-6600) the software is different enough that gnuton doesn’t support it. From what I can tell this shouldn’t make a difference in the mesh since it will just be a remote node, but I made sure to update the ASUS firmware in any case. The software has a totally different packaging system, which just seems weird to me since the model number didn’t change. I plugged it in and made sure it was connected to my AiMesh network.

I was worried that alignment might be a problem, so I bought a powerful little laser pointer and figured I could use that to help align the radios as I could aim it from one to another.

I had assumed the TP-Link hardware could be directly mounted on the wall, but it is designed to be tie-wrapped to a pole. I don’t have any poles, so it was once again off to Amazon to buy two small masts that I could use for the installation.

Now with everything ready, I started the install. I placed the AP right outside of my office window, which has a clear line of sight to the hayloft window of the barn.

I mounted the Client unit on the side of the barn window, and ran the Ethernet cable down through the ceiling into the tack room.

The tack room is climate controlled and I figured it would be the best place for electronics. The Ethernet cable went into the PoE injector, and I plugged the ASUS ZenWiFi node into the LAN port on the same device. I then crossed my fingers and went back to my desk to test everything.

Yay! I could ping the AP, the Client and the ASUS node. Preparation has its benefits.

Or does it?

When I went back to the barn to test out the connection, I could connect to the wireless SSID just fine, but then it would immediately drop. The light on the unit, which should be white, was glowing yellow indicating an issue. I ended up taking the unit back to the house to troubleshoot.

It took me about 90 minutes to figure out the issue. The ASUS device has four ports on the back: three switched LAN ports and a single WAN port. On the primary unit the WAN port is used to connect to the gigabit fiber device provided by my ISP. What I didn’t realize is that you also use the WAN port if you want to use a cabled backhaul instead of the 5 GHz wireless network. While my bridge isn’t a wired connection per se, I needed to plug it into the blue WAN port instead of the yellow LAN port I was using. The difference is about a centimeter, yet it cost me an hour and a half.

Everything is a physical layer problem. (sigh)

Once I did that I was golden. I used the laser pointer to make sure I was as aligned as I could be, but it really wasn’t necessary as these devices are meant to span kilometers. My 0.2km run was nothing, and the connection doesn’t require a lot of accuracy. I did a speed test and got really close to the 100Mbps promised.

So I was golden, right?

Nope.

I live somewhere so remote that I have an open guest network. There is no way my neighbors would ever come close enough to pick up my signal, and I think it is easier to tell visitors to just connect without having to give them a password. Most modern phones support calling over WiFi, and with mobile wireless service being so weak here my guests can get access easily.

We often have people in the barn so I wanted to make sure that they could have access as well, but when I tested it out, the guest network wouldn’t work. You could connect to the network but it would never assign an IP address.

(sigh)

More research turned up the fact that ASUS uses VLAN tagging to route guest traffic over the network. Something along the way must have been gobbling up those tagged packets.

I found what looked like a promising post covering just that issue with the CPE-210, but changing the configuration didn’t work for me.

Finally it dawned on me what the problem must be. Remember that old Netgear switch I mentioned above? I had plugged the bridge into that switch instead of directly into the ASUS. I did this because I thought I could drive it off the switch without using a PoE injector. When I swapped cables around to connect the bridge directly to the AiMesh node, everything started working as it should.

Success! I guess the switch was mangling the VLAN tags just enough to cause the guest network to fail.

If at least one of my three readers has made it this far, I want to point out that several things here made it difficult to pinpoint the problem. When I initially brought up the bridge I could ping the ASUS remote node reliably, so it was hard to diagnose that it was plugged into the wrong port. When the Netgear switch was causing issues with the guest network, the main, password protected SSID worked fine. Had either of these things not worked I doubt it would have taken me so long to figure out the issues.

I am very happy with how it all turned out. I was able to connect the Gree mini-split HVAC unit in the tack room to the network for remote control, and I added TP-Link Kasa smart switches so we could turn the stall fans on and off from HomeKit. I’m sure the next time we have people out here working they will appreciate the access.

Anyway, hope this proves helpful for someone else, and be sure to check out the CPE-210 if you have a similar need.

Confluent’s “Data In Motion” Tour

One of the perks of my job is that I get to work with some incredible partners. One of those is Confluent, probably best known for being the primary maintainer of the Apache Kafka project.

This year, Confluent is doing a multi-city “Data In Motion” tour. The name comes from a focus on real-time data processing. Modern applications often have a requirement to collect data from one or more sources, enrich it and then use the enriched data to provide useful information to the end user, usually in real-time. This tour was a half-day seminar exploring some solutions to that use case.

The event was held at a place called Boxyard RTP. It has been many years since I worked in the Research Triangle Park and it has really grown in that time. Boxyard is made up of repurposed shipping containers. There are restaurants, a bar, a stage (when I walked up a band was playing) and for the purposes of this seminar there was an area on the second level with a conference room and a patio.

The agenda consisted of four main items: an overview of what is going on with Confluent’s offerings, a “fireside chat” about using real-time data for security, an hour-long demo of new functionality and a customer success story.

It was cool to see that AWS was a Diamond Sponsor of this event.

The first presenter was Ananda Bose, who is the Director of Solutions Engineering at Confluent. He covered some of the new products available from Confluent, especially Kora. Kora is a cloud native implementation of Apache Kafka.

At my previous company we wanted to be able to offer our technology as a managed service, which was difficult since it was a monolithic Java application. The ultimate goal was to have a cloud native version, and by that I mean a version of the application that can take advantage of cloud technologies that provide resilience and automatic scalability. Apache Kafka is also a Java app, and a lot of work must have gone into decoupling the storage, identity management, metrics and other aspects of the program to fit the cloud native paradigm.

One thing I liked about Ananda’s presentation style was that he was very direct. Confluent has just completed an integration with Apache Flink, which is a stream processing framework. One thing that Flink brings is ANSI-compliant SQL. Prior to this integration people used KSQL, but the words that Ananda used to describe KSQL are not really appropriate for this family-friendly blog. (grin)

Kora reinforces something I’ve been saying about open source for some time. When it comes to open source software, people are willing to pay for three things: simplicity, stability and security. Kora does all three and the design of Kora even won the “Best Industry Paper” award at last year’s Very Large Databases conference.

We would see Kora and Flink in action in the demo section.

The second talk was a fireside chat between Ananda and Dr. Jared Smith. Jared works at SecurityScorecard, a security risk mitigation company.

SecurityScorecard has to consume petabytes of data in order to detect malicious behavior on the network. The way their system works, the payload of a given message may be 20 megabytes or larger, and RabbitMQ simply couldn’t handle that workload. When they switched to Kora their scaling issues went away.

One cool story Jared told happened during the start of the war in Ukraine. SecurityScorecard placed a “honeypot” server in Kyiv and it was able to detect a large Russian botnet attacking the network. They were able to collect and block the IP addresses of the bots and thus mitigate the damage.

The next hour was taken up by a demo. ChunSing Tsui, a Senior Solutions Engineer, walked us through an example using Confluent Cloud, MongoDB and Terraform. The whole demo is available on GitHub if you’d like to recreate it on your own.

In this example, a shoe store called HappyFeet wants to monitor website traffic to identify customers who visit but don’t stay on the site very long. Then they could use this information to try and re-engage with them through a marketing campaign offering discounts, etc.

While I am in no way an expert at this stuff, it was engaging. There were four data sources that were processed to provide an enriched data stream to MongoDB collections. What I liked about it was that the heart of the demo was written entirely in SQL.

As an “old” I am not as up to date on the new hotness in cloud computing as I would like to be, but SQL I know. This was a product that took a difficult concept and made it accessible.

The final presentation was a customer success story from SAS Institute. It was given by Bryan Stines, Director of Product Management for the SAS Cloud, and Justin Dempsey, a Senior Manager for SAS Cloud Innovation and Automation Services.

It was a nice close to the meeting as the “Data In Motion” theme was very present here. One of the products SAS provides involves fraud detection for credit and debit card transactions. When a person swipes or taps a credit card, that sets off a series of events to detect fraud that may involve numerous checks. This has to be done on the order of milliseconds.

Now I am a big open source software enthusiast, but free software doesn’t mean free solution. With my previous project we used technology such as Apache Kafka, Apache Cassandra, PostgreSQL and others. Our users had to either acquire or develop some of that expertise in house or they needed to find a partner, and that was the issue facing SAS. By partnering with Confluent they were able to get the most out of the software from the people who knew it best.

I no longer live that close to RTP but I felt the three hour round trip was worth it for this event. There are still several dates on the calendar so if this interests you, please check it out.

2024 Open Source Founders Summit

I’m not sure where I first learned about the inaugural Open Source Founders Summit, but I remember thinking that I wished it had been around 20 years ago.

In many ways, starting and running an open source business is no different than any other business. You still need to take care of your customers, create a useful product, and control expenses. But even though open source businesses are software businesses, they differ greatly in that the software isn’t the product.

My first computer was a TRS-80 from Radio Shack that I got for Christmas in 1978. A few years later my father would buy one of the first IBM PCs and the main piece of software he ran was Lotus 1-2-3. Mitch Kapor was the first person I’d heard of to become wealthy off of software (followed soon by Bill Gates). From a business perspective the model was compelling: create a useful, high margin product and distribute it for next to nothing.

Because open source involves software, people still think that the standard software business model applies, and that the open source aspect is more of a “loss leader” for a proprietary product. For those of us who want to run a truly open source business, this isn’t an option. As someone who spent two decades running an open source business, I found the idea of getting a bunch of us together exciting.

The conference took place in Paris on 27-28 May, and was organized by Emily Omier and Remy Bertot. About 75 people showed up for the conference, and I got to see some old acquaintances like Frank Karlitschek, Peter Zaitsev, Monty Widenius, and Peter Farkas. Brian Proffitt from Red Hat was also there (and I finally got to meet his wife as they came to Paris early to celebrate their wedding anniversary).

The event kicked off Sunday night with a social activity at a brewery. That was a lot of fun as I got to make many new friends, although the venue was a bit loud for conversation. It did make me very eager for the conference to start the next morning.

The venue was pretty cool. It was called The Loft, and once you exited off the street you entered a covered courtyard (which was nice since it did rain off and on). A short flight of stairs brought you up to the main room. On one side were chairs and two screens for presentations, and on the other was an area with high-top tables where food was served and a bar for juice and coffee in the morning (they added soft drinks during lunch).

I really liked the conference format, which consisted of several 30 minute presentations in the morning that everyone could attend. While I understand the need and usefulness of multiple tracks at many larger conferences, it was nice not to have to miss anything when at a smaller event. The presentations were followed by an hour of five minute “lightning talks” and then by lunch.

In the afternoon we would break out into smaller groups for an open discussion of topics around various aspects of running an open source business. One group would stay in the main area while the others would move to spaces in a loft and downstairs in the basement. In every group I attended we would sit in a circle while the moderator would start off the discussion, but after that it was a pretty open format.

One of the main tenets of the conference was that it was a safe space in which to share stories, so none of the sessions were recorded and I will be somewhat circumspect in what I share from the conference as well. It created an intimacy and a level of trust I haven’t experienced at other conferences.

Remy and Emily started us off with an overview of the conference and what was planned for the next two days.

We then jumped into the presentations, starting with Thomas di Giacomo from SUSE.

Thomas talked about “projects vs. products”. In an open source business where you are commercializing the work of an open source project, you don’t have total control of the product roadmap. He discussed strategies for working with your project to better align its needs with those of the business. He also talked about pricing open source products. One of the things we did wrong in the beginning at my company was pricing our offerings too low. We were able to adjust that over time, but you should never, ever compete on price in an open source business. You compete by being a better and more flexible solution, and one that doesn’t come with vendor lock-in.

The next speaker was Pierre Burgy from Strapi.

I had never heard of Strapi, but they make a popular (and successful) content management system. They have raised a lot of investment money over several rounds and their business model was originally based on “open core”, or having a proprietary “enterprise” version.

Now as my three readers know, for over 20 years I’ve been writing about open source business and in many of those posts I have railed against the open core model. I just don’t like it. But it is much more palatable to investors than a pure open source model, and several of the speakers at this conference went the VC route. Strapi did and it seems to be working well for them.

But the basis of Pierre’s talk was on Strapi’s addition of a SaaS model and how it compared to their enterprise offering. I am a huge fan of companies offering a hosted version of open source software. I’ve found that people are willing to pay for three things when it comes to open source: simplicity, security and stability. One of the best ways to offer this is through a managed product.

Selling Strapi’s enterprise offering had similarities to selling proprietary software. It had longer sales cycles and needed a lot of one to one contact. But the deals were large. Contrast that to the SaaS offering which was self-serve, required much less customer contact, and resulted in a faster sales cycle. But the deals were smaller and there was a greater chance that the customer would later leave.

The next speaker was someone I’ve followed for decades but never met: Gaël Duval.

Gaël started Mandrake Linux, which is where I was first introduced to him. He now is the CEO of Murena, a privacy-focused smartphone provider based on the /e/ phone operating system.

There was a time when I was extremely into running truly free and open source software on my mobile devices. I bought a CopperheadOS phone and later ran GrapheneOS. I loaded Asteroid on my smartwatch. For $REASONS I’m back in the Apple ecosystem but I still applaud efforts to bring open source to mobile. It is one area where we in the FOSS community have failed (for example, I could be perfectly happy running Linux Mint as my sole desktop environment but there is nothing truly equivalent in mobile).

However, Gaël wasn’t here to talk about that, instead he focused on the change from being a hands-on founder to becoming a CEO. That is something I had to work through as well, and the issue of delegation was something with which I struggled. He talked about how he addressed that and more and I identified with a lot of what he said.

As with many conferences my favorite part was the hallway track. I didn’t know that Brian Proffitt and Monty Widenius had never met, so I made introductions. This kicked off a spirited discussion over licensing (Red Hat is owned by IBM, which will soon own HashiCorp, which opens up some interesting licensing possibilities).

As you can see in this picture I’ve hit my goal of spreading joy and happiness throughout the conference. (grin)

Amandine Le Pape was the next speaker, and she talked about building an open ecosystem.

Amandine is the CEO of Element, the company behind the open source Matrix protocol. What I found interesting about this talk was hearing from the company behind an extremely successful project that has nonetheless struggled to share in that success. She was extremely transparent with numbers, including that Element often contributed 50% of its operating budget to the project and that they had to downsize. But the decisions they had to make have worked, and Element is doing much better.

The project I was associated with never had the adoption of something like Matrix, but I do remember the two times we had to downsize and those were the worst days of my professional life. This talk really drove home the reality that open source businesses are businesses, and you have to make decisions based on the health of the company as well as the project.

The last talk of the morning was followed by an hour of five minute lightning talks. The first one was by Olga Rusakova who talked about what types of content from engineers best drive engagement.

Engineers don’t necessarily write compelling content, even though such content can drive business leads. Olga talked about best practices for helping engineers produce it.

Kevin Muller was next with a talk about the best KPIs to measure when trying to gauge the health of an open source startup. Spoiler: it isn’t GitHub stars. I didn’t get a good picture of Kevin but I did get a very clear shot of the head of the guy sitting in front of me (sigh).

I did get a decent picture of Alexandre Brianceau.

Alexandre talked about becoming the CEO of an open source company without being a founder. In many cases when an open source startup gets a funding round, the investors will insert a new CEO into the mix. In Alexandre’s case he was already part of the company and was just a good fit for the role. Founder and CEO are different roles, and sometimes it is best for a company if someone other than a founder takes that role.

And for the viewpoint from the other side, Peter Zaitsev talked about replacing himself in the company he founded with a new CEO.

He covered his rationale for turning over the reins of his company to another, and having met Ann Schlemmer I can see why he made the choice.

Elisabeth Poirier-Defoy finished up the talks with a discussion of the organization behind GTFS, the specification that allows municipalities to publish transit schedules for consumption by third parties, especially mapping applications.

I feel especially close to this technology since I was doing a lot of work with the City of Portland in 2005, and it was their local transportation authority, TriMet, that started publishing transit schedules in an accessible format. It is great to see how that effort has grown into a worldwide practice. Many times in Paris I pulled out my mobile to look up the best bus or metro to take to get to my destination.

We then took a break for a nice lunch and more conversation.

I didn’t take any pictures of the afternoon sessions. There was something about this conference that kind of discouraged my usual need to record everything. As I mentioned above, after lunch we would break out into groups of 20 or so to discuss a certain aspect of open source business. They worked surprisingly well, and in the sessions I attended everyone seemed eager to contribute yet no one really talked over anyone else. It was a refreshingly frank exchange of ideas.

As the afternoon came to a close, I ended up in the courtyard talking with Stephen Augustus. I know we’ve been in the same room before, but this was the first time we were formally introduced. Stephen is the head of open source at Cisco, and we had a great conversation walking to the evening event, which Cisco sponsored.

We met at the Hotel du Nord for drinks and dinner. It was wonderful, although the bar space was a little constrained so it made circulating difficult (and like Sunday night, it was loud). I ended up grabbing a seat (my bad ankle was starting to bother me) but was soon enjoying more conversation and appetizers. We then adjourned to the dining room where the great conversation continued (and with a different set of tablemates so I wouldn’t wear out my welcome). It capped off a nearly perfect day.

Day Two started off very much like the first: with opening remarks from Emily and Remy.

Frank Karlitschek started off the day with a discussion of generating leads when you are an open source company.

I am an unabashed Karlitschek fan-boy. He started a company called ownCloud, and when his vision for the product and the investors diverged, he took a huge leap and left to form a fork called Nextcloud. I use Nextcloud several times a day, and it is amazing, and I am always eager to hear what Frank has to say.

Nextcloud is self-financed (i.e. no VC) and is profitable. They do not subscribe to the open core model, but unlike other open source companies they also do not perform any professional services (at my company professional services made up about 35% of revenue).

He talked about their sales funnel, where leads are generated into the CRM system, qualified by marketing and eventually assigned to a salesperson. It’s called a funnel for a reason, and the numbers drop significantly from leads to MQLs (marketing qualified leads) to SALs (sales accepted leads). He was kind enough to share actual numbers from the past week, which gave us an example to set realistic expectations.

The next speaker was Alexis Richardson, the founder of RabbitMQ who went on to found Weaveworks, a startup focused on cloud native development tooling. Unfortunately, despite generating a lot of revenue (numbers which he shared) they ended up having to close their doors earlier this year. It was the most brutally honest presentation of the conference. One thing I found interesting was that his company did both an enterprise product and a SaaS product, and they decided to focus on the enterprise product. This was a bit of a departure from my current belief that managed open source is the best option for open source startups.

Next up was Matt Barker. Matt is in sales and he started us off with a story about going from England to the US to sell magazine subscriptions door to door. That is something I could never do, but it turns out he was good at it.

Plus he referenced Crossing the Chasm, which is probably my favorite business book of all time. Anyone who references that must be cool (granted, I got to talk to Matt a lot after his talk and he is cool, especially when sharing some of his experiences at Canonical where his time overlapped with my friend Jono). A lot of the things he learned selling magazines can be applied to selling open source products.

Continuing the sales theme, Peter Zaitsev returned with a talk about how you can incentivize sales teams.

People often focus on the engineering and product management when it comes to open source businesses, but sales plays an important role, too. In Peter’s experience the best solution is to offer salespeople a commission-based compensation (or “eat what you kill”) and make it unlimited. If they are successful they could end up getting paid more than the CEO, but this can be a good thing for the company.

Garima Kapoor, co-founder of MinIO, closed out the main talks with a discussion about how to convert widespread open source adoption into financial success for the company behind the project. She covered many topics but one line on a slide caught my eye concerning per-incident support. I constantly had to explain to our customers why we didn’t provide it and I was hoping to follow up with her but never got the chance.

Like the previous day, the final morning hour was reserved for lightning talks. First up was David Aronchick.

David founded Expanso, a company that commercializes the Bacalhau project. Bacalhau is a Portuguese seafood dish consisting of salted cod, and that project implements a Compute over Data (CoD) solution. Get it? Expanso is Portuguese for distributed, or so I was told. He talked about how one can drive community engagement through management of the product cycle.

Julien Coupey followed with a talk about how expertise in an open source project is something that can be commercialized.

At my old company I used to compare us to plumbers. There are a lot of things I can do around my farm, woodworking and electrical come to mind, but I suck at plumbing. I end up wet and frustrated. So even though I can buy the same pipes and fittings as a professional, I call someone because overall it will save me time and money. In much the same way, contracting with the people who actually make the open source software you use can save you time and money, since chances are they have the experience to solve your issues more quickly than you could on your own.

The next speaker was Taco Potze, from OpenSocial.

Taco’s company interacts a lot with non-profits, other charities and governments. These organizations present unique challenges when it comes to the sales cycle. He shared some of his experiences in this field.

Continuing with that theme, Yann Lechelle discussed developing a commercial enterprise around Scikit-learn, a publicly sponsored open source library for machine learning.

He talked about the challenges in developing and funding a mission-driven organization around an open source project while still remaining true to its open source nature.

The final speaker was Eyal Bukchin, who talked about using an open source minimal viable product (MVP) to start a business.

I think open source provides a great way to demonstrate the value and demand for a software solution. By working on mirrord, he was able to secure funding for MetalBear, his business aimed at making the lives of backend developers easier.

At lunch I introduced myself to Greg Austic, who works on open source software for farms.

He had on a T-shirt from upstate New York, but I learned he is actually in the Minneapolis-St. Paul area. When he asked me where I was from the conversation went a little like this:

Greg: Where are you located?
Tarus: Oh, I live in North Carolina.
Greg: Where in North Carolina?
Tarus: Central North Carolina.
Greg: Where in central North Carolina? I spent five years in Pittsboro.
Tarus: I used to live in Pittsboro.

Turns out we know a lot of the same people, yet we didn’t meet until we both went to France.

After lunch we broke out into more workshops. I can’t stress enough how I liked the format of this conference.

Unfortunately, while Monday was a holiday in the US, Tuesday wasn’t, and being six hours ahead of the New York time zone I ended up leaving a bit early to deal with some meetings I couldn’t miss.

In case this hasn’t come out in this post, I really enjoyed this conference. A lot. It was very eye-opening. There were companies that embraced VC and did well, and others that embraced VC and did poorly. Some were steadfastly against outside investment, preferring to grow organically. Some focused on the public sector, some focused on the enterprise and some focused on managed services. Everyone had a story to tell and something to add, and I eagerly await next year’s event.

The one challenge will be to maintain the intimacy and honesty of this first conference as it grows. It should be a must attend event for open source founders and anyone seriously interested in forming an open source company.

Amazon Web Services – Two Years In

Today marks my second anniversary as an AWS employee. Time flies and I’m still having a lot of fun.

After I sold the company I founded in 2019 and parted ways with it two years later, I wasn’t sure what I wanted to do. I’ve often joked that I enjoy working in open source and in network monitoring so much that I would do it for free, so I still wanted to keep working (although probably not so much with the “free” part). When my friend Spot Callaway told me about an open source opportunity at AWS, I was intrigued. After meeting the amazing people I would be working with, I was eager to survive the AWS interview process (called “The Loop”) and join the team.

I wasn’t sure I would be comfortable at a company with roughly 100,000 times the employees of my small venture, but I find that I really enjoy how AWS is structured. There is a philosophy that teams at Amazon should never be bigger than one that can be fed with two pizzas (“two pizza” teams) and so you feel a lot of autonomy. There is also this idea of “two-way” vs. “one-way” doors. Decisions that can be reversed, i.e. “two-way doors”, can be made quickly and at a lower level, whereas decisions that can’t be reversed require more thought and a higher level of approval. The team I’m on, the Open Source Strategy and Marketing team (OSSM, pronounced “awesome”), is amazing, and the closest I’ve ever come to experiencing imposter syndrome has been working with this group of incredible people (and remember kiddos, it’s not imposter syndrome if you are an actual imposter).

Recently, Michael Coté posted a link on Mastodon to an article written back in 2018 by Luke Kanies on “Why We Hate Working for Big Companies”. Outside of Amazon I’ve only worked for one big company, and that was Northern Telecom. A lot of what he writes about resonated with my experience at NORTEL, but not so much my experience at AWS. Even though AWS has a hierarchical management structure, in practice it seems very flat to me. I’ve always felt that I have access to my skip-level VP, and Amazon has a culture of escalation so that issues can be resolved quickly. I personally have a lot of autonomy to decide how to best implement our team and corporate goals, and I don’t feel that there is some sort of “central planning organization” dictating what I do.

The late, great Anthony Bourdain once talked about how the reason he did his television show was the access. He simply didn’t have an easier way to get the experiences he wanted outside of it. I like that, in my role at AWS, I get to work with the organizations that in many ways form the heart of modern open source software, such as the Apache Software Foundation and the Linux Foundation. I get to step up my open source game and grow as a person.

But the main reason I like my job is that I get to work with some amazing customers. I like to think that AWS is the best place for companies, especially startups, to run their cloud workloads, and while that may not be true for everyone it is my goal to help make it true. Prior to our acquisition we were an AWS shop and that was mainly because AWS made it easy for us to get started with using the cloud. I literally get paid to meet with these companies that in many cases are defining their market with innovative products and to help them be successful.

It’s a lot of fun.

These days, especially in technology, nothing is certain, but my plan is to keep working at AWS until I retire at age 65. My hope is that I’ll continue in my current role but there are so many opportunities at AWS that I can explore new things as well. My friend William Hurley has gotten me interested in quantum computing, and Amazon is doing some foundational work in that field with Amazon Braket. I’m still skeptical about the hype around Generative AI but the tools I get to play with at AWS have been more powerful than the other things I’ve tried.

For example, I have a group of friends who get together once a week for lunch. One of them meant to write that this week it was at “my abode” but somehow that got autocorrected to “mime abides”. The image that popped into my head was of Jeff Bridges in his role in “The Big Lebowski” but dressed as a mime. I wanted to create that picture so I tried it on a couple of GenAI image services and they came up short. I decided to try it on AWS and not only did I get an image I could use, I could send my prompt to a number of different models with just one click and choose the best one.

Yeah – it’s definitely not perfect and I am no “prompt engineer”, but being able to create this quickly and easily served my purpose, which was to make my joke.

When I am asked in interviews what my biggest weakness is, my answer is that I can’t handle a job that requires a lot of rote and repetitive work. I thrive in constantly changing environments, and I am grateful that my job at AWS is such a good match for my temperament. Many thanks to my teammates for creating such a great environment in which to grow.

Computer Nostalgia

Two stories this week caught me up in a little bit of old computer nostalgia. The first was that chip manufacturer Zilog was ending production of the Z80 CPU chip, and the second was that NASA managed to restore communications with the Voyager 1 spacecraft.

Now to be honest I didn’t know that Zilog still made the Z80, and I was impressed it had such a long run. My very first computer was a Radio Shack TRS-80 that I got for Christmas in 1978, when I was 12 years old. At a price of $599 (about $3k today) it was an expensive present, but my father saw it as an investment in my future. I spent a lot of time on that machine, and I thought it was cool that “TRS” were the consonants in my first name.

Compared to the simplest computers in use today it is a dinosaur. The Z80 had a 1.78MHz clock (that’s “mega” hertz) and could only address 64kB of memory. The operating system and programming language (BASIC) were stored in ROM, leaving the rest of that address space for RAM. My initial machine came with 4kB of RAM, and in March of 1979 I wrote my first program that wouldn’t fit within that limit.

I upgraded it to “Level II” ROM (which ate up 16kB) and 48kB of RAM to max out the machine, and I eventually added two floppy disk drives.

Back in the day, finding information on computers, especially in my rural North Carolina town, was difficult. I got a lot of it from subscribing to computer magazines, which is how I learned about assembly. I couldn’t afford an actual assembler, so I would hand-assemble the code and enter it into the system by typing in pairs of hexadecimal digits. Now what ties this whole experience to Voyager 1 is that what the team at NASA did to restore communications was similar to things I did with the TRS-80.

Another thing I couldn’t afford was a printer. Dot matrix printers at the time ran about $2k, or about $9500 in today’s currency (you could buy a new Toyota Corolla for less than $4k). My Dad, however, worked for General Electric and they had just discontinued the TermiNet 300 – a beast of a machine – and he managed to get one cheap. The way it printed was crazy. It had a bank of 118 “hammers” that faced the paper. In between the hammers and the paper was a band containing little metal “fingers” oriented vertically, and on each finger was one of the characters the machine could print. The band rotated around at high speed, and when the machine wanted to produce a character, the ink ribbon would pop up and a hammer would hit the specific finger for that character as it flew by. That it could do this at 30 characters a second was amazing, and loud.

The problem was that it didn’t have a parallel interface and was instead serial, so I had to buy an RS-232C board and write a printer driver. It was then that I kind of fell in love with the Z80 instruction set. Note that I had to come up with the instructions, convert them to the proper hex codes and then “POKE” them into memory (I could then “PEEK” to make sure it worked). Luckily printer drivers aren’t that complicated (take a byte as an input, send it to the interface as an output). It worked, and I made a lot of extra money typing in term papers for other students (the advantage of the TermiNet was that it produced typewriter-quality output).

I can only imagine the NASA engineers sitting around doing the same thing to fix Voyager (hand assembling code, not typing term papers).

Modern computing is so far abstracted from what I learned back then. Now you can just ask GenAI to write a program for you. I’m not one of those old guys who thinks what I went through was better, but it was nice to see some of those techniques prove useful today.

And I should point out that Voyager 2 is working just fine, which goes to show that you should never buy the first release of a technology product (grin).

2024 FOSS Backstage

I was a speaker at this year’s FOSS Backstage conference, which was held 4-5 March in Berlin, Germany. FOSS Backstage is a gathering dedicated to the “non-code” aspects of open source, and I was surprised I had not heard of it before this year. This was the sixth edition of the event, and it was held in a new location due to growing attendance.

TL;DR; I really, really enjoyed this conference, to the point where it is in contention to be my favorite one of the year. The attendees were knowledgeable and friendly, the conference was well run, and it was not so big that I felt I was missing out due to there being too many tracks. I hope I am able to attend again next year.

This was my first time in Berlin, and although I have been to Germany on numerous other trips, for some reason I had never made it to this historic city. It does have a reputation for being a center of hacker culture in Europe, hosting the annual Chaos Communication Congress, and several of my friends who were at FOSS Backstage told me they were in Berlin quite often.

The event was held at a co-working space where we had access to a lobby and a large auditorium, and downstairs there were two smaller rooms: the Wintergarten and a “workshop” room that was used mainly for remote speakers. Each day started off with a keynote in the auditorium followed by breakout sessions of two or three tracks across the three available rooms.

Monday’s keynote (video) was by Melanie Rieback, the CEO and co-founder of Radically Open Security, a not-for-profit computer security consulting firm. Her company donates 90% of net profits to charity, and they openly share the tools and techniques they use so that their clients can learn to audit their security on their own.

As someone who spends way too much time focused on open source business models, it was encouraging to see a company like Radically Open Security succeed and thrive. But I wasn’t sure I bought into the premise that all open source companies should be like hers. Consulting firms have a particular business model that is similar to those used by accounting firms, management firms or other service businesses such as plumbing or HVAC repair. Software companies have a much different model. For example, I am writing this using WordPress. I didn’t have to pay someone to show me how to use it. If WordPress wants to continue to produce software they need to make money in a different fashion, which in their case means hosting services and running a marketplace. Those products require capital to create, and since that often can’t be bootstrapped, this means they have investors, investors who will one day expect a return.

Now it is easy to find examples of where investors, specifically venture capitalists, did bad things, but we can’t rule out the model entirely. If you use a Linux desktop, most likely you are using software that companies like Red Hat and Canonical helped to create. Both of those companies are for-profit and have (or in the case of Red Hat, had) investors. The Linux desktop would simply not exist in its current form without them.

The keynote did, however, make me think, which is one of the main reasons I come to conferences.

[Note: I used WordPress as an example because it was handy, and we can discuss the current concerns about selling data for GenAI use another time (grin)]

After the keynote the breakout sessions started, and I headed downstairs to hear Dawn Foster talk about understanding the viability of open source projects (video). Dr. Foster is the Director of Data Science for CHAOSS, a Linux Foundation project for Community Health Analytics in Open Source Software.

Open source viability isn’t something a lot of people think about. Many of us just kind of assume that a piece of open source software will always be there and always be kept up to date and secure. This can be a dangerous assumption, as illustrated by a famous xkcd comic that I saw no fewer than three times during the two-day conference.

In my $JOB we often use data analytics to examine the health of a project. In addition to metrics such as the number of releases, bugs and pull requests, we also look at two metrics called the Pony Factor and the Elephant Factor.

I’ve been using the term Pony Factor for two years now and while I can trace its origin I’m not sure how it got its name. To calculate it, simply rank all the contributors to an open source project by the number of contributions (PRs, lines of code, whatever you think is best) and then start counting from the largest to smallest until you get to 50% of all contributions (usually over a period of time). For example, if for a given month you have 20 contributions and the largest contributor was responsible for 6 and the second largest for 5 you would have a Pony Factor of 2, since the sum of 6 and 5 is 11 which is more than 50% of 20. It is similar to the Bus Factor, which is a little more grisly in that it counts the number of contributors who could get hit by a bus before the project becomes non-viable. People leave open source projects for a number of reasons (and I am thankful that it is rarely of the “hit by bus” type) and if you depend on that project you have a vested interest in its health.

The Elephant Factor is similar, except you count the number of organizations that contribute to a project. In the example above, if the two contributors both worked for the same company, then the project’s Elephant Factor would be 1 (the number of organizations responsible for at least 50% of the project’s contributions). While we often assume that open source software is a pure meritocracy based on the community that creates it, a low Elephant Factor means that the project is controlled by a small number of parties. This isn’t intrinsically a bad thing, but it could result in the interests of those organizations outweighing the interests of the project as a whole.
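To make the Pony Factor concrete, here is a rough sketch of how you might calculate it yourself for a git repository, assuming commit count is your contribution metric (the CHAOSS project provides far more sophisticated tooling; this is just the back-of-the-napkin version):

$ git shortlog -sn --since="1 year ago" | \
    awk '{ total += $1; count[NR] = $1 }
         END { for (i = 1; i <= NR; i++) {
                 running += count[i]
                 if (running >= total / 2) { print "Pony Factor:", i; exit }
               } }'

Since “git shortlog -sn” already lists committers from largest to smallest, the script just keeps a running total until it crosses half of all commits and prints how many people it took to get there. Calculating an Elephant Factor needs one more step, mapping each author to an employer, which git alone can’t tell you.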

This was just part of what Dawn covered, as there are other metrics to consider. She didn’t go into detail about what you can do when open source projects that are important to you have low Pony/Elephant factors, but I found the presentation very interesting.

As a nice segue from Dawn’s talk, the next presentation I attended was on how to change the governance model of an existing open source project (video), given by Ruth Cheesley. Ruth is the project lead for Mautic, and the project, which had an Elephant Factor of 1, faced an issue when they found out that the company behind it was no longer going to support development.

Now I want to admit upfront that I will get some of the details of this story wrong, but it is my understanding that Mautic, which does marketing automation, was originally sponsored by Acquia, the company behind the Drupal content management system and other projects. When Acquia decided to step back from the project, those involved had to either pivot to a different governance model or watch the project die.

There is the myth about open source that simply releasing code under an OSI-approved license means that hundreds of qualified people will give up their nights and weekends to work on it. Creating a sustainable open source community takes a lot of effort, and one of the main tools for building that community is the governance model. No one wants to be a part of a community where they feel their contributions aren’t appreciated and their opinions are not heard, and fewer still want to work in an environment that can be overly aggressive or even hostile. Ruth talked about the path that her project took and how it directly impacted the success of Mautic.

The next session I attended was a panel discussion on open source being used in the transportation, energy, automotive and public sector realms (video). I’m not a big fan of panel discussions, and I was also surprised to see that all the panelists were men (making this a “manel”). FOSS Backstage did a really good job of promoting diversity in other aspects of the conference (and the moderator was female, but I don’t think that avoids the “manel” definition). It was cool to add another data point that “open source is everywhere” and it was interesting to see where each of the panelists was in their “open source journey”. A couple of them seemed to have a good understanding of what it gets them, but some others were obviously in the “what the heck have we gotten ourselves into” phase. I did get introduced to Wolfgang Gehring for the first time. Wolfgang works for Mercedes, and I’m an unabashed Mercedes fan. I’ve owned at least one car from most of the humanly affordable brands in my lifetime, and six of them have been Mercedes (I’ve also owned four Ford trucks). While Wolfgang obviously knows his stuff when it comes to open source, I don’t think he can score me F1 tickets. (sniff)

After the lunch break I caught Wolfgang again, this time presenting on Inner Source (video). Most people understand that open source is code that you can see, modify and share. Most open source licenses are based on copyright law, so they don’t apply until you actually “share” the code with a third party. What happens if you want to use open source solely within an organization? In the case of large organizations you might have a number of disparate groups all working on similar projects, and there can be advantages to organizing them. Not only can they share code, they can also share experiences and build a community, albeit an internal one, to maximize the value of the open source solutions they use.

The next three sessions I attended represented kind of a “mini” AWS track. The first one featured Spot Callaway. Spot is something of a savant when it comes to thinking up fun ways to get people involved in open source communities. I’ve known Spot for over two decades and I’ve worked on his team for the last two years, and it is amazing to watch his mind at work. His talk charted a history of his involvement in coming up with cool and weird ways to engage with people in open source (video), and I was around for some of his efforts so I can attest to their effectiveness (and that they were, indeed, fun).

The second session was by Kyle Davis. I had only met him once before seeing him again in Berlin, and this was the first time I’d seen him speak. His topic was the importance of how you write when it comes to open source projects (video). Now, AWS is very much a document-driven culture, and my own writing style has changed in the two years I’ve been there (goodbye passive voice). Kyle’s presentation talked about considerations you should make when communicating to an open source community. Realizing that your information may be read by people from different cultures, where English may not be a first language, can go a long way toward making your project feel more inclusive.

Rich Bowen presented the third session, and he discussed how to talk to management about open source (video). My favorite part of this talk was when he posted a disclaimer that while many managers don’t understand open source, his does (our boss, David Nalley, is currently the President of the Apache Software Foundation).

I made up this graphic which I will use every time I need a disclaimer in the future.

The last session presenter was Carmen Delgado, who talked about getting new contributors involved in open source (video). She compared and contrasted three different “flavors” of programs that encourage open source involvement: Outreachy, Google Summer of Code, and CANOSP.

Monday’s presentations ended with a set of “lightning” talks. I’ve always wanted to do one of these – a short talk of no more than five minutes in length (there was a timer) – but my friends point out that I can’t even introduce myself in five minutes, much less give a whole, useful talk.

Two talks stood out for me. In the first one the speaker brought her young daughter (in a Pokémon top, ‘natch) and it really made me glad to see people getting into open source at a young age.

I also liked Miriam Seyffarth’s presentation on the Open Source Business Alliance. I was happy to see both MariaDB and the fine folks at Netways are involved.

Tuesday started with a remote keynote (video) given by Giacomo Tenaglia from CERN. As a physics nerd I’ve always wanted to visit CERN. I know a few people who work there but I have not been able to schedule a trip. I was surprised to learn that the CERN Open Source Programs Office (OSPO) is less than a year old. Considering the sheer amount of open source software used by academia and research I would have expected it to be much older.

The next talk was definitely the worst of the bunch (video), but I had to attend since I was giving it (grin). As this blog can attest, I’ve been working in open source for a long time, and I’ve spent way too much of it thinking about open source business models. There are a number of them, but my talk comes to the conclusion that the best way to create a vibrant open source business is to offer a managed (hosted) version of the software you create. I’ve found that when it comes to open source, people are willing to pay for three things: simplicity, security and stability. If you can offer a service that makes open source easier to use, more secure and/or more stable, you can create a business that can scale.

I took a picture just before I started and I was humbled that by the time I was finished the room was packed. Attention is in great demand these days and I really appreciated folks giving me theirs.

I then attended a talk by Celeste Horgan on growing a maintainer community (video). While I have written computer code I do not consider myself a developer, yet I felt that I was able to bring value to the open source project I worked on for decades. This session covered how to get non-coders to contribute and how to manage a project as it grows.

Brian Proffitt gave a talk, very relevant to my current role, on the difficulties of measuring the return on investment (ROI) of being involved in open source events (video). While I almost always assume engagement with open source communities will generate positive value, how do you put a dollar figure on it? For example, event sponsorship usually gets you a booth space in whatever exhibit hall or expo the event is hosting. When I was at OpenNMS we would sometimes decline the booth because while we wanted to financially sponsor the conference, we couldn’t afford to do that and host a booth. There are a lot of tangible expenses associated with booth duty, such as swag and the travel expenses for the people working in it, as well as intangible expenses such as opportunity cost. For commercial enterprises, attendance at an event can be measured in things like how many orders or leads were generated. That doesn’t usually happen at open source events. It turns out that it isn’t an easy question to answer, but Brian had some suggestions.

For most of the conference there were two “in person” tracks and one remote track. The only remote session I attended was a talk given by Ashley Sametz comparing outlaw biker gangs to open source communities. Ashley is amazing (she used to work on our team before pursuing a career in law enforcement) and I really enjoyed her talk. Both communities are tightly knit, have their own jargon and have their own ways of attracting people to the group.

While I wouldn’t have called them part of an “outlaw motorcycle gang”, many years ago I got to meet a bunch of people in a motorcycle club. It was just before Daytona Bike Week and a lot of people were riding down to Florida. North Carolina is a good stopping point about midway into the trip. While it was explained to me and my friend David that “Mama is having a few folks over for a cookout, you two should come” we were a little surprised to find out that all those bikes we saw on the way there were also coming. There were at least 100 bikes, many cars, and one cab from a semi-tractor trailer. It was amazing. If you have ever seen the first part of the movie Mask that is just what it was like. And yes I could rock the John Oates perm back then.

That session ended up being the last session I attended that day. I spent the rest of the conference in the hallway track and got to meet a lot of really interesting people.

That evening I did manage to get Jägerschnitzel, which was on my list of things to do while in Germany. Missed the Döner kebab, however.

I found FOSS Backstage to be well worth attending. I wish I’d known about it earlier, so perhaps this post will get more people interested in attending next year. Open source is so much more than code and it was nice to see an event focused on the non-code aspects of it.

This Blog Can Now Vote

It’s hard to believe but this blog is now 21 years old, having started back on this day in 2003. In the beginning I only had the one reader, but now I’m up to a whole three readers! I’m hoping by the time it turns 30 I can get to four.

I am writing this from the French Quarter in New Orleans, where I am attending a meeting. I was trying to think about a topic for this auspicious anniversary, and my go-to was to complain, once again, about the death of Google Reader also killing blogs, but instead I thought I’d just mention a few of the events I’ll be attending in the first part of 2024.

Next week I’ll be heading to Berlin for FOSS Backstage. I’ve been to Germany many times and love visiting that country, and this will be my first visit to Berlin so I am looking forward to it. I’m speaking on open source business models, which is a topic I’m passionate about. It should be a lot of fun.

April will find me in Seattle for the Open Source Summit. I won’t be speaking, but I will be in the AWS booth and would love to meet up if you happen to be attending as well.

Finally, in May I’ll be in Paris for the inaugural Open Source Founders Summit. If you run a commercial open source company, or are thinking about starting an open source company, consider applying to attend this conference. Emily Omier is bringing together pretty much the brain-trust when it comes to open source business (and me!) and it promises to be a great discussion on how to make money with open source while remaining true to its values.

Using rclone to Sync Data to the Cloud

I am working hard to digitize my life. Last year I moved for the first time in 24 years and I realized I have way too much stuff. A lot of that stuff was paper, in the form of books and files, so I’ve been busy trying to get digital copies of all of it. Also, a lot of my life was already digital. I have e-mails starting in 1998 and a lot of my pictures were taken with a digital camera.

TL;DR; This is a tutorial for using the open source rclone command line tool to securely synchronize files to a cloud storage provider, in this case Backblaze. It is based on MacOS but should work in a similar fashion on other operating systems.

That brings up the issue of backups. A friend of mine was the victim of a home robbery, and while the thieves took a number of expensive things, the greatest loss was his archive of photos, which was irreplaceable. This has made me paranoid about backing up my data. I have about 500GB of must-save data and around 7TB of “would be nice” to save data.

At my old house the best option I had for network access was DSL. It was usable for downstream but upstream was limited to about 640kbps. At that rate I might be able to back up my data – once.

I can remember in college we were given a test question about moving a large amount of data across the United States. The best answer was to put a physical drive in a FedEx box and overnight it there. So in that vein my backup strategy was to buy three Western Digital MyBooks. I created a script to rsync my data to the external drives. One I kept in a fire safe at the house. It wasn’t guaranteed to survive a hot fire in there (fire safes are rated to protect paper, which can withstand much higher temperatures than a hard drive) but there was always a chance it might depending on where the fire was hottest. I took the other two drives and stored one at my father’s house and the other at a friend’s house. Periodically I’d take out the drive from the safe, rsync it, and switch it with one of the remote drives. I’d then rsync that drive and put it back in the safe.
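For the curious, such a script doesn’t need to be anything fancy. A minimal sketch, with made-up paths standing in for wherever your data and the external drive actually live, might look like this:

#!/bin/sh
# Mirror the local data directory onto the attached MyBook.
# --delete removes anything on the drive that no longer exists locally,
# so the copy stays an exact mirror rather than accumulating cruft.
rsync -avh --delete /path/to/data/ /mnt/mybook/data/

The trailing slashes matter to rsync: with them, it copies the contents of the source directory into the destination rather than creating a nested directory.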

It didn’t keep my data perfectly current, but it would mitigate any major loss.

At my new house I have gigabit fiber. It has symmetrical upload and download speeds, so my ability to upload data is much, much better. I figured it was time to choose a cloud storage provider and set up a much more robust way of backing up my data.

I should stress that when I use the term “backup” I really mean “sync”. I run MacOS and I use the built-in Time Machine app for backups. The term “backup” in this case means keeping multiple copies of files, so not only is your data safe, if you happen to screw up a file you can go back and get a previous version.

Since my offsite “backup” strategy is just about dealing with a catastrophic data loss, I don’t care about multiple versions of files. I’m happy just having the latest one available in case I need to retrieve it. So it is more of synchronizing my current data with the remote copy.

The first thing I had to do was choose a cloud storage provider. Now as my three readers already know I am not a smart person, but I surround myself with people who are. I asked around and several people recommended Backblaze, so I decided to start out with that service.

Now I am also a little paranoid about privacy, so anything I send to the cloud I want to be encrypted. Furthermore, I want to be in control of the encryption keys. Backblaze can encrypt your data but they help you manage the keys, and while I think that is fine for many people it isn’t for me.

I went in search of a solution that both supported Backblaze and offered strong encryption. I have a Synology NAS which includes an application called “Cloud Sync”, and while it did both things, I wasn’t happy that it encrypted the body of each file but not the file names. If someone came across a file called WhereIBuriedTheMoney.txt it could raise some eyebrows and bring unwanted attention. (grin)

Open source to the rescue. In trying to find a solution I came across rclone, an MIT licensed command-line tool that lets you copy and sync data to a large number of cloud providers, including Backblaze. Furthermore, it is installable on MacOS using the very awesome Homebrew project, so getting it on my Mac was as easy as

$ brew install rclone

However, as is often the case with open source tools, free software does not mean a free solution, so I did have a small learning curve to climb. I wanted to share what I learned in case others find it useful.

Once rclone is installed it needs to be configured. Run

$ rclone config

to access a script to help with that. In rclone syntax a cloud provider, or a particular bucket at a cloud provider, is called a “remote”. When you run the configurator for the first time you’ll get the following menu:

No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Select “n” to set up a new remote, and it will ask you to name it. Choose something descriptive but keep in mind you will use this on the command line so you may want to choose something that isn’t very long.

Enter name for new remote.
name> BBBackup

The next option in the configurator will ask you to choose your cloud storage provider. Many are specific commercial providers, such as Backblaze B2, Amazon S3, and Proton Drive, but some are generic, such as Samba (SMB) and WebDAV.

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 2 / Akamai NetStorage
   \ (netstorage)
 3 / Alias for an existing remote
   \ (alias)
 4 / Amazon Drive
   \ (amazon cloud drive)
 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, ArvanCloud, Ceph, ChinaMobile, Cloudflare, DigitalOcean, Dreamhost, GCS, HuaweiOBS, IBMCOS, IDrive, IONOS, LyveCloud, Leviia, Liara, Linode, Minio, Netease, Petabox, RackCorp, Rclone, Scaleway, SeaweedFS, StackPath, Storj, Synology, TencentCOS, Wasabi, Qiniu and others
   \ (s3)
 6 / Backblaze B2
   \ (b2)

...

I chose “6” for Backblaze.

At this point you’ll need to set up the storage on the provider side, and then access it using an application key.

Log in to your Backblaze account. If you want to try it out, note that you don’t need any kind of credit card to get started. They will limit you to 10GB (and I don’t know how long that stays around), but it is a good way to play with the service before committing to it.

Go to Buckets in the menu and click on Create a Bucket.

Note that you can choose to have Backblaze encrypt your data, but since I’m going to do that with rclone I left it disabled.

Once you have your bucket you need to create an application key. Click on Application Keys in the menu and choose Add a New Application Key.

Now one annoying issue with Backblaze is that all bucket names have to be unique across the entire system, so “rcloneBucket” and “Media1” and the like have already been taken. Since I’m just using this as an example it was fine for the screenshot, but note that when I add an application key I usually limit it to a particular bucket. When you click on the dropdown it will list available buckets.

Once you create a new key, Backblaze will display the keyID, the keyName and the applicationKey values on the screen. Copy them somewhere safe because you won’t be able to get them back. If you lose them you can always create a new key, but you can’t modify a key once it has been created.

Now with your new keyID, return to the rclone configuration:

Option account.
Account ID or Application Key ID.
Enter a value.
account> xxxxxxxxxxxxxxxxxxxxxxxx

Option key.
Application Key.
Enter a value.
key> xxxxxxxxxxxxxxxxxxxxxxxxxx

This will allow rclone to connect to the remote cloud storage. Finally, rclone will ask you a couple of questions. I just chose the defaults:

Option hard_delete.
Permanently delete files on remote removal, otherwise hide files.
Enter a boolean value (true or false). Press Enter for the default (false).
hard_delete>

Edit advanced config?
y) Yes
n) No (default)
y/n>

The last step is to confirm your remote configuration. Note that you can always go back and change it later if you want.

Configuration complete.
Options:
- type: b2
- account: xxxxxxxxxxxxxxxxxxxxxx
- key: xxxxxxxxxxxxxxxxxxxxxxxxxx
Keep this "BBBackup" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

Current remotes:

Name                 Type
====                 ====
BBBackup             b2

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

At this point, quit out of the configurator for a moment.

You may have realized that we have done nothing with respect to encryption. That is because we need to add a second remote that wraps our Backblaze remote to make this work (this is that there learning curve thing I mentioned earlier).

While I don’t know if this is true or not, it was recommended that you not put encrypted files in the root of your bucket. I can’t really see why it would hurt, but just in case we should put a folder in the bucket at which we can then point the encrypted remote. With Backblaze you can use the webUI or you can just use rclone. I recommend the latter since it is a good test to make sure everything is working. On the command line type:

$ rclone mkdir BBBackup:rcloneBackup/Backup

2024/01/23 14:13:25 NOTICE: B2 bucket rcloneBackup path Backup: Warning: running mkdir on a remote which can't have empty directories does nothing

To test that it worked you can look at the WebUI and click on Browse Files, or you can test it from the command line as well:

$ rclone lsf BBBackup:rcloneBackup/
Backup/

Another little annoying thing about Backblaze is that the File Browser in the webUI isn’t in real time, so if you do choose that method note that it may take several minutes for the directory (and later any files you send) to show up.

Okay, now we just have one more step. We have to create the encrypted remote, so go back into the configurator:

$ rclone config

Current remotes:

Name                 Type
====                 ====
BBBackup             b2

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n

Enter name for new remote.
name> crypt

Just like last time, choose a name that you will be comfortable typing on the command line. This is the main remote you will be using with rclone from here on out. Next we have to choose the storage type:

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 2 / Akamai NetStorage
   \ (netstorage)

...

14 / Encrypt/Decrypt a remote
   \ (crypt)
15 / Enterprise File Fabric
   \ (filefabric)
16 / FTP
   \ (ftp)
17 / Google Cloud Storage (this is not Google Drive)
   \ (google cloud storage)
18 / Google Drive
   \ (drive)

...

Storage> crypt

You can type the number (currently 14) or just type “crypt” to choose this storage type. Next we have to point this new remote at the first one we created:

Option remote.
Remote to encrypt/decrypt.
Normally should contain a ':' and a path, e.g. "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a value.
remote> BBBackup:rcloneBackup/Backup

Note that it contains the name of the remote (BBBackup), the name of the bucket (rcloneBackup), and the name of the directory we created (Backup). Now for the fun part:

Option filename_encryption.
How to encrypt the filenames.
Choose a number from below, or type in your own string value.
Press Enter for the default (standard).
   / Encrypt the filenames.
 1 | See the docs for the details.
   \ (standard)
 2 / Very simple filename obfuscation.
   \ (obfuscate)
   / Don't encrypt the file names.
 3 | Adds a ".bin", or "suffix" extension only.
   \ (off)
filename_encryption>

This is the bit where you get to solve the filename problem I mentioned above. I always choose the default, which is “standard”. Next you get to encrypt the directory names as well:

Option directory_name_encryption.
Option to either encrypt directory names or leave them intact.
NB If filename_encryption is "off" then this option will do nothing.
Choose a number from below, or type in your own boolean value (true or false).
Press Enter for the default (true).
 1 / Encrypt directory names.
   \ (true)
 2 / Don't encrypt directory names, leave them intact.
   \ (false)
directory_name_encryption>

I choose the default of “true” here as well. Look, I don’t expect to ever become the subject of an in-depth digital forensics investigation, but the less information out there the better. Should Backblaze ever get a subpoena to let someone browse through my files on their system, I want to minimize what they can find.

Finally, we have to choose a passphrase:

Option password.
Password or pass phrase for encryption.
Choose an alternative below.
y) Yes, type in my own password
g) Generate random password
y/g> y
Enter the password:
password:
Confirm the password:
password:

Option password2.
Password or pass phrase for salt.
Optional but recommended.
Should be different to the previous password.
Choose an alternative below. Press Enter for the default (n).
y) Yes, type in my own password
g) Generate random password
n) No, leave this optional password blank (default)
y/g/n>

Now, unlike your application key ID and application key, these passwords you need to remember. If you lose them you will not be able to get access to your data. I did not choose a salt password, but it does appear to be recommended. Now we are almost done:

Edit advanced config?
y) Yes
n) No (default)
y/n>

Configuration complete.
Options:
- type: crypt
- remote: BBBackup:rcloneBackup/Backup
- password: *** ENCRYPTED ***
Keep this "crypt" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

Now your remote is ready to use. Note that when using a remote with encrypted files and directories do not use the Backblaze webUI to create folders underneath your root or rclone won’t recognize them.

I bring this up because there is one frustrating thing with rclone. If I want to copy a directory to the cloud storage remote it copies the contents of the directory and not the directory itself. For example, if I type on the command line:

$ cp -r Music /Media

it will create a “Music” directory under the “Media” directory. But if I type:

$ rclone copy Music crypt:Media

it will copy the contents of the Music directory into the root of the Media directory. To get the outcome I want I need to run:

$ rclone mkdir crypt:Media/Music

$ rclone copy Music crypt:Media/Music

Make sense?

While rclone has a lot of commands, the ones I have used are “mkdir” and “rmdir” (just like on a regular command line) and “copy” and “sync”. I use “copy” for the initial transfer and then “sync” for subsequent updates.
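In practice, and sticking with the hypothetical Music example from above, that looks something like this (the -P flag just displays progress):

$ rclone copy Music crypt:Media/Music -P
$ rclone sync Music crypt:Media/Music -P

One thing to keep in mind is that “sync” will also delete files on the remote that no longer exist locally, which fits the “latest copy only” approach I described above but may not be what you want for every use case.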

Now all I have to do for cloud synchronization is set up a crontab to run these commands on occasion (I set mine up for once a day).
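Here is a sketch of what that crontab entry might look like; the paths are placeholders (run “which rclone” to see where Homebrew actually put the binary), and on MacOS a scheduled job may also need to be granted Full Disk Access before it can read protected folders:

$ crontab -e

# m h dom mon dow   command
# Sync the local Media directory to the encrypted remote at 2:00 AM every day
0 2 * * * /opt/homebrew/bin/rclone sync /Users/me/Media crypt:Media --log-file /Users/me/rclone.log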

I can check that the encryption is working by using the Backblaze webUI. First I see the folder I created to hold my encrypted files:

But the directories in that folder have names that sound like I’m trying to summon Cthulhu:
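You can also do a quick sanity check from the command line by listing the same data through each remote; the crypt remote decrypts the names on the fly, while the underlying B2 remote shows what is actually stored:

$ rclone lsf crypt:                          # readable file and directory names
$ rclone lsf BBBackup:rcloneBackup/Backup/   # the encrypted names as Backblaze sees them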

As you can see from this graph, I was real eager to upload stuff when I got this working:

and on the first day I sent up nearly 400GB of files. Backblaze B2 pricing is currently $6/TB/month, and this seems about right:

I have since doubled my storage, so at $6/TB/month it should run about 20 cents a day. Note that downloading your data is free up to three times the amount of data stored. In other words, you could download all of the data you have in B2 three times in a given month and not incur fees. Since I am using this simply for catastrophic data recovery I shouldn’t have to worry about egress fees.

I am absolutely delighted to have this working and extremely impressed with rclone. For my needs open source once again outshines commercial offerings. And remember, if you prefer a different cloud storage provider, you have a large range of choices, and the setup should be similar to the one I walked through here.