Tag: apache spark
Python Now a First-Class Language on Spark, Databricks Says
The Apache Spark community has improved Python support so much over the past few years that Python is now a “first-class” language, and no longer the “clunky” add-on it once was, Databricks co-fou Read more…
Nvidia Bolsters RAPIDS Graph Analytics with NetworkX Expansion
Nvidia has expanded its support of NetworkX graph analytic algorithms in RAPIDS, its open source library for accelerated computing. The expansion means data scientists can run 40-plus NetworkX algorithms on Nvidia GPUs w Read more…
Voltron Aims to Unblock AI with GPU-Accelerated Data Processing
Data is at the heart of artificial intelligence, but it’s also emerging as one of its biggest bottlenecks. Without sufficient quantities of good, clean data to feed into models, companies simply can’t reap the reward Read more…
How Acceldata Helped T-Mobile’s Data Modernization Strategy
When T-Mobile started migrating some of its data estate from an on-prem Hadoop system to cloud-based data platforms, it found the move liberating. But as it settled into a hybrid-cloud world, T-Mobile realized costs were Read more…
IBM to Showcase Open Analytics Push at PrestoCon Day
The PrestoDB community will come together this Wednesday for PrestoCon Day, the third annual virtual event showcasing the popular open source SQL engine. Representatives from Uber, Adobe, Alibaba, and TikTok will share s Read more…
Hugging Face and Databricks Streamline Dataset Creation with Spark
Databricks and Hugging Face have unveiled a new integration that will allow users to create a Hugging Face dataset from an Apache Spark dataframe. Databricks has written and committed these Spark changes to the Huggin Read more…
Apache Beam Cuts Processing Time 94% for LinkedIn
Like many large companies, LinkedIn relied on the Lambda architecture to run separate batch and streaming workloads, with a form of reconciliation at the end. After implementing Apache Beam, it was able to combine batch a Read more…
A Dozen Questions for Databricks CTO Matei Zaharia
Matei Zaharia is a very busy man. When he’s not helping to shape the future of Databricks as its CTO, he is helping to shape the future of computer science as an assistant professor at Stanford University. He also fi Read more…
Is Real-Time Streaming Finally Taking Off?
Like commercial fusion reactors, real-time streaming is a tantalizing technology, but one that perpetually needs just a few more years (or decades) of R&D. But some in the industry are sensing that something has shif Read more…
Databricks Bolsters Governance and Secure Sharing in the Lakehouse
Data governance is one of the four pillars necessary for the future of AI, along with past-looking analytics, future-looking AI, and real-time decision-making. To that end, Databricks rolled out several new governance ca Read more…
It’s Not ‘Mobile Spark,’ But It’s Close
On April 1, 2015, Apache Spark PMC member Reynold Xin wrote a compelling blog post detailing plans to deliver a mobile version of Spark. It was all a joke, of course: Spark was a heavy bit of code designed for distributed sys Read more…
Databricks Scores ACM SIGMOD Awards for Spark and Photon
Databricks announced it has won two awards at the ACM SIGMOD (Association for Computing Machinery’s Special Interest Group on Management of Data) Conference in Philadelphia. Apache Spark was awarded the SIGMOD Sy Read more…
Spark Gets Closer Hooks to Pandas, SQL with Version 3.2
The Apache Spark community last week announced Spark 3.2, a significant new release of the distributed computing framework. Among the more exciting features are deeper support for the Python data ecosystem, including the Read more…
Machine Learning, from Single Core to Whole Cluster
The demand for production-quality software for mining insights from datasets across scales has exploded in the last several years. The growing size of datasets throughout industry, government, and other fields has increa Read more…
Meet Sean Knapp, a 2021 Datanami Person to Watch
Getting data to the right place at the right time has never been more important than it is now. But for many organizations, the data movement task largely remains a manual affair. Sean Knapp founded Ascend.io because he Read more…
ML Scaling Requires Upgraded Data Management Plan
Successful data strategies are built on a foundation of meticulous data management, creating enterprise architectures that “democratize” data access and usage, yielding measurable results from machine learning platfo Read more…
Cloudera, Nvidia Team to Speed Cloud AI via Spark
Cloud access to GPUs for AI development will expand under a partnership between Cloudera and Nvidia that calls for the data cloud provider to integrate Nvidia’s accelerated Apache Spark 3.0 platform as a way to scale d Read more…
No-Coder Upsolver Aims to Ease Use of Cloud Data Lakes
Upsolver, the no-code data lake platform vendor, closed a $25 million funding round this week, boosting total venture funding for its cloud analytics tools to about $42 million. The financing round announced Tuesd Read more…
Databricks Plotting IPO in 2021, Bloomberg Reports
Databricks, which runs a unified data platform in the cloud and is the driving force behind Apache Spark, is preparing for an initial public offering (IPO), possibly in the first half of 2021, according to a report in Bl Read more…